Plotting
========

The plotting interface on streamz DataFrame and Series objects attempts to
mirror the pandas plotting API, but instead of plotting with matplotlib_ it
uses HoloViews_ to generate dynamically streaming bokeh_ plots. You can use
this interface in a Jupyter notebook or deploy it as a
:ref:`bokeh server app <plotting.apps>`.

HoloViews provides several constructs which make it well suited to streaming
visualizations. All plotting methods return so-called ``DynamicMap`` objects,
which update the plot whenever streamz triggers an event. For additional
information about working and plotting with HoloViews_ see the `User Guide `_,
as we will focus on using the high-level plotting API in this overview and
skip most of the mechanics going on behind the scenes.

*All plots generated by the streamz plotting interface dynamically stream
data. Since the documentation cannot easily embed streaming plots, all plots
shown here are static screenshots.*

Basic plotting
--------------

Throughout this section we will be using the ``Random`` construct, which
provides an easy way of generating a DataFrame of random streaming data.

.. code-block:: python

    from streamz.dataframe import Random
    df = Random()

.. image:: images/random_df.png
   :alt: example Random streaming dataframe output
   :width: 300 px

The ``plot`` method on Series and DataFrame is a simple wrapper around a line
plot. Called on a DataFrame, it plots all columns:

.. code-block:: python

    df.plot()

.. image:: images/simple_plot.png
   :alt: a line plot of the Random dataframe

The ``plot`` method can also be called on a Series, plotting a specific
column:

.. code-block:: python

    df.z.cumsum().plot()

Another, more general way to express the same thing is to explicitly define
``x`` and ``y`` in the DataFrame ``plot`` method:

.. code-block:: python

    df.cumsum().plot(x='index', y='z')

.. image:: images/series_plot.png
   :alt: a line plot of the Random Series

Other plots
-----------

Plotting methods allow for a handful of plot styles other than the default
line plot. These styles can be selected with the ``kind`` keyword argument to
:meth:`~DataFrame.plot`. They include:

* :ref:`'bar' <plotting.barplot>` or :ref:`'barh' <plotting.barplot>` for bar plots
* :ref:`'hist' <plotting.hist>` for histograms
* :ref:`'box' <plotting.box>` for box plots
* :ref:`'kde' <plotting.kde>` or ``'density'`` for density plots
* :ref:`'area' <plotting.area_plot>` for area plots
* :ref:`'scatter' <plotting.scatter>` for scatter plots
* :ref:`'table' <plotting.table>` for tables

For example, a bar plot can be created the following way:

.. code-block:: python

    df.groupby('y').x.sum().plot(kind='bar')

.. image:: images/bar_plot.png
   :alt: a bar plot of the summed x values grouped by y

You can also create these other plots using the methods
``DataFrame.plot.<kind>`` instead of providing the ``kind`` keyword argument.
This makes it easier to discover plot methods and the specific arguments they
use:

.. code-block:: python

    In [14]: df = Random()

    In [15]: df.plot.<TAB>
    df.plot.area     df.plot.barh     df.plot.density  df.plot.kde      df.plot.scatter
    df.plot.bar      df.plot.box      df.plot.hist     df.plot.line     df.plot.table

.. _plotting.barplot:

Bar plots
~~~~~~~~~

For labeled, non-time series data, you may wish to produce a bar plot. In
addition to the simple bar plot shown above we can also produce grouped bars:

.. code-block:: python

    df.groupby('y').sum().plot.bar(x='y')

.. image:: images/bar_grouped.png
   :alt: a grouped bar plot of the summed values grouped by y

Alternatively you may also stack the bars:

.. code-block:: python

    df.groupby('y').sum().plot.bar(x='y', stacked=True)

.. image:: images/bar_stacked.png
   :alt: a stacked bar plot of the summed values grouped by y

.. _plotting.hist:

Histograms
~~~~~~~~~~

Histograms can be drawn by using the ``DataFrame.plot.hist()`` and
``Series.plot.hist()`` methods. The number of bins can be declared using the
``bins`` keyword and normalization can be disabled with the ``normed``
keyword.

.. code-block:: python

    df.z.plot.hist(bins=50, backlog=5000, normed=False)

.. image:: images/series_hist.png
   :alt: a histogram of a series

Calling ``DataFrame.plot.hist()`` plots all columns. To make the columns
easier to compare you can lower the ``alpha`` and define a ``bin_range``:

.. code-block:: python

    df.plot.hist(bin_range=(-3, 3), bins=50, backlog=5000, alpha=0.3)

.. image:: images/df_hist.png
   :alt: a histogram of a dataframe

.. _plotting.box:

Box Plots
~~~~~~~~~

Box plots can be drawn by calling ``Series.plot.box()`` and
``DataFrame.plot.box()`` to visualize the distribution of values within each
column. For example, here we plot each column:

.. code-block:: python

    df.plot.box()

.. image:: images/df_boxplot.png
   :alt: a box plot of a dataframe

Or we can generate a box plot of a Series:

.. code-block:: python

    df.x.plot.box(width=300)

.. image:: images/series_boxplot.png
   :alt: a box plot of a series
   :width: 350 px

It is also possible to group a box plot by a secondary variable:

.. code-block:: python

    df.plot.box(by='y', height=400)

.. image:: images/grouped_boxplot.png
   :alt: a box plot of a dataframe grouped by the 'y' column

.. _plotting.kde:

KDE plots
~~~~~~~~~

You can create density plots using the ``Series.plot.kde()`` and
``DataFrame.plot.kde()`` methods.

.. code-block:: python

    df.x.plot.kde()

.. image:: images/series_kde.png
   :alt: a KDE plot of a series

.. _plotting.area_plot:

Area plots
~~~~~~~~~~

You can create area plots with ``Series.plot.area()`` and
``DataFrame.plot.area()``. To produce a stacked area plot, each column must
contain either all positive or all negative values.

.. code-block:: python

    df.x.plot.area()

.. image:: images/series_area.png
   :alt: an area plot of a series

When plotting multiple columns on a DataFrame the areas may be stacked:

.. code-block:: python

    df[['x', 'y']].plot.area(stacked=True)

.. image:: images/stacked_area.png
   :alt: a stacked area plot of a dataframe

.. _plotting.scatter:

Scatter plots
~~~~~~~~~~~~~

Scatter plots can be drawn by using the ``DataFrame.plot.scatter()`` method.
Scatter plots require numeric or datetime columns for the x- and y-axis.
These can be specified with the ``x`` and ``y`` keywords.

.. code-block:: python

    df.plot.scatter(x='x', y='z')

.. image:: images/df_scatter.png
   :alt: a scatter plot of the 'x' and 'z' columns of the dataframe
   :width: 350 px

The scatter points can also be colored by a certain column using the ``c``
keyword. Additionally we will enable a colorbar and set the x-axis limits
using ``xlim``:

.. code-block:: python

    df.plot.scatter(x='y', y='z', c='x', cmap='viridis',
                    width=400, colorbar=True, xlim=(-1, 6))

.. image:: images/colored_scatter.png
   :alt: a scatter plot colored by the 'x' column
   :width: 450 px

.. _plotting.table:

Tables
~~~~~~

We can also stream a table view of the data:

.. code-block:: python

    df.plot.table(width=600)

.. image:: images/df_table.png
   :alt: a table view of the data
   :width: 600 px

.. _plotting.composition:

Composing Plots
---------------

One of the core strengths of HoloViews is the ease of composing different
plots. Individual plots can be composed using the ``*`` and ``+`` operators,
which overlay plots and compose them into layouts respectively. For more
information on composing objects see the HoloViews `User Guide `_.

By using these operators we can combine multiple plots into composite
``Overlay`` and ``Layout`` objects, and lay them out in two columns using the
``Layout.cols`` method:

.. code-block:: python

    (df.plot.line(width=400) * df.plot.scatter(width=400) +
     df.groupby('y').sum().plot.bar('y', 'x', width=400) +
     df.plot.box(width=400) + df.x.plot.kde(width=400)).cols(2)

.. image:: images/composite_plot.png
   :alt: a composite layout of several plots in two columns

.. _plotting.customization:

Customizing the visualization
-----------------------------

In addition to specific options for different plot types the plotting API
exposes a number of general options including:

- ``backlog`` (default=1000): Number of rows of streamed data to accumulate
  in a buffer and plot at the same time
- ``grid`` (default=False): Whether to show a grid
- ``hover`` (default=False): Whether to show hover tooltips
- ``legend`` (default=True): Whether to show a legend
- ``logx``/``logy`` (default=False): Enables a logarithmic x- and y-axis
  respectively
- ``shared_axes`` (default=False): Whether to link axes between plots
- ``title`` (default=''): Title for the plot
- ``xlim``/``ylim`` (default=None): Plot limits of the x- and y-axis
- ``xticks``/``yticks`` (default=None): Ticks along the x- and y-axis,
  specified as an integer, a list of tick positions, or a list of tuples of
  tick positions and labels
- ``width`` (default=800)/``height`` (default=300): The width and height of
  the plot in pixels

In addition, options can be passed directly to HoloViews, providing greater
control over the plots. The options can be provided as dictionaries via the
``plot_opts`` and ``style_opts`` keyword arguments. You can also apply
options using the HoloViews API (for more information see the HoloViews
`User Guide `_).

.. _plotting.apps:

Deployment as bokeh apps
------------------------

In the Jupyter notebook HoloViews objects will automatically be rendered, but
when deploying a plot as a bokeh app it has to be rendered explicitly. The
following example describes how to set up a streaming DataFrame, declare some
plots, compose them, set up a callback to update the plot and finally convert
the composite plot to a bokeh Document, which can be served from a script
using ``bokeh serve`` on the command line.

.. code-block:: python

    import numpy as np
    import pandas as pd
    import holoviews as hv
    from streamz import Stream
    from streamz.dataframe import DataFrame
    import streamz.dataframe.holoviews

    renderer = hv.renderer('bokeh').instance(mode='server')

    # Set up streaming DataFrame
    stream = Stream()
    index = pd.DatetimeIndex([])
    example = pd.DataFrame({'x': [], 'y': [], 'z': []},
                           columns=['x', 'y', 'z'], index=index)
    df = DataFrame(stream, example=example)
    cumulative = df.cumsum()[['x', 'z']]

    # Declare plots
    line = cumulative.plot.line(width=400)
    scatter = cumulative.plot.scatter(width=400)
    bars = df.groupby('y').sum().plot.bar(width=400)
    box = df.plot.box(width=400)
    kde = df.x.plot.kde(width=400)

    # Compose plots
    layout = (line * scatter + bars + box + kde).cols(2)

    # Set up callback with streaming data
    def emit():
        # Emit one batch covering the last 500 ms at 100 ms intervals
        now = pd.Timestamp.now()
        delta = pd.Timedelta(500, 'ms')
        index = pd.date_range(now - delta, now, freq='100ms')
        batch = pd.DataFrame({'x': np.random.randn(len(index)),
                              'y': np.random.randint(0, 10, len(index)),
                              'z': np.random.randn(len(index))},
                             columns=['x', 'y', 'z'], index=index)
        stream.emit(batch)

    # Render layout to bokeh server Document and attach callback
    doc = renderer.server_doc(layout)
    doc.title = 'Streamz HoloViews based Plotting API Bokeh App Demo'
    doc.add_periodic_callback(emit, 500)

.. image:: images/server_plot.png
   :alt: a bokeh server app demo
   :width: 600 px

For more details on deploying bokeh apps see the HoloViews `User Guide `_.

.. _plotting.holoviews:

Using HoloViews directly
------------------------

HoloViews includes first-class support for streamz DataFrame and Series. For
more details see the `Streaming Data section `_ in the HoloViews
documentation.

.. _matplotlib: https://matplotlib.org/
.. _HoloViews: https://holoviews.org/
.. _bokeh: https://bokeh.pydata.org/en/latest/
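
As a conceptual aside, the interaction between a periodic ``emit`` callback
and the ``backlog`` option can be sketched with the standard library alone.
This is only an illustration of a bounded buffer that keeps the newest rows;
it does not use or reproduce streamz or HoloViews internals, and the names
``BacklogBuffer`` and ``make_batch`` are hypothetical helpers introduced for
this example.

```python
from collections import deque
from datetime import datetime, timedelta
import random


class BacklogBuffer:
    """Hypothetical stand-in for a plot buffer that keeps only the newest rows."""

    def __init__(self, backlog=1000):
        # A deque with maxlen silently drops the oldest rows once it is full.
        self.rows = deque(maxlen=backlog)

    def extend(self, batch):
        self.rows.extend(batch)


def make_batch(now, span_ms=500, step_ms=100):
    """Build rows covering the last ``span_ms`` milliseconds, one per step."""
    start = now - timedelta(milliseconds=span_ms)
    n_rows = span_ms // step_ms + 1  # both endpoints are included
    return [
        {
            "time": start + timedelta(milliseconds=i * step_ms),
            "x": random.gauss(0, 1),
            "y": random.randint(0, 9),
            "z": random.gauss(0, 1),
        }
        for i in range(n_rows)
    ]


buffer = BacklogBuffer(backlog=10)
now = datetime(2024, 1, 1, 12, 0, 0)  # arbitrary fixed start time for the sketch
for tick in range(3):
    # Each iteration plays the role of one periodic callback, 500 ms apart.
    buffer.extend(make_batch(now + timedelta(milliseconds=500 * tick)))

print(len(buffer.rows))
```

Three batches of six rows are emitted, but only the ten most recent rows
remain in the buffer. In this simplified picture, a larger ``backlog`` means
each plot update is drawn from a longer window of recent data.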