Collections API¶
Collections¶
Streaming([stream, example]) | Superclass for streaming collections
Streaming.map_partitions(func, *args, **kwargs) | Map a function across all batch elements of this stream
Streaming.accumulate_partitions(func, *args, …) | Accumulate a function with state across batch elements
Streaming.verify(x) | Verify elements that pass through this stream
Batch¶
Batch |
Batch.filter |
Batch.map |
Batch.pluck |
Batch.to_dataframe |
Batch.to_stream |
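The Batch entries above have the usual element-wise semantics: a Batch wraps a group of elements, and map, filter, and pluck act on every element within the batch. A minimal pure-Python sketch of those semantics (illustrative only, not the streamz implementation):

```python
# Illustrative sketch of Batch-style element-wise operations on one batch
# of records; plain Python, not the streamz implementation.

batch = [
    {"name": "Alice", "x": 1},
    {"name": "Bob", "x": 2},
    {"name": "Carol", "x": 3},
]

# Batch.map: apply a function to every element of the batch
doubled = [dict(rec, x=rec["x"] * 2) for rec in batch]

# Batch.filter: keep only the elements satisfying a predicate
evens = [rec for rec in batch if rec["x"] % 2 == 0]

# Batch.pluck: extract a single field from every element
names = [rec["name"] for rec in batch]

print(names)  # ['Alice', 'Bob', 'Carol']
```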
Dataframes¶
DataFrame |
DataFrame.groupby |
DataFrame.rolling |
DataFrame.assign |
DataFrame.sum |
DataFrame.mean |
DataFrame.cumsum |
DataFrame.cumprod |
DataFrame.cummin |
DataFrame.cummax |
DataFrame.plot |
GroupBy |
GroupBy.count |
GroupBy.mean |
GroupBy.size |
GroupBy.std |
GroupBy.sum |
GroupBy.var |
Rolling(sdf, window, min_periods) | Rolling aggregations
Rolling.aggregate(*args, **kwargs) | Rolling aggregation
Rolling.count(*args, **kwargs) | Rolling count
Rolling.max() | Rolling maximum
Rolling.mean() | Rolling mean
Rolling.median() | Rolling median
Rolling.min() | Rolling minimum
Rolling.quantile(*args, **kwargs) | Rolling quantile
Rolling.std(*args, **kwargs) | Rolling standard deviation
Rolling.sum() | Rolling sum
Rolling.var(*args, **kwargs) | Rolling variance
DataFrame.window |
Window.apply |
Window.count |
Window.groupby |
Window.sum |
Window.size |
Window.std |
Window.var |
Random([freq, interval, dask]) | A streaming dataframe of random data
Details¶
class streamz.collection.Streaming(stream=None, example=None)¶
Superclass for streaming collections.

Do not create this class directly; use one of the subclasses instead.

Parameters:
stream: streamz.Stream
example: object
    An object to represent an example element of this stream

See also
streamz.dataframe.StreamingDataFrame, streamz.dataframe.StreamingBatch

Methods
accumulate_partitions(func, *args, **kwargs) | Accumulate a function with state across batch elements
emit(x) |
map_partitions(func, *args, **kwargs) | Map a function across all batch elements of this stream
verify(x) | Verify elements that pass through this stream
accumulate_partitions(func, *args, **kwargs)¶
Accumulate a function with state across batch elements

map_partitions(func, *args, **kwargs)¶
Map a function across all batch elements of this stream

The output stream type will be determined by the action of that function on the example.

verify(x)¶
Verify elements that pass through this stream
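The two partition-level operations differ in whether state crosses batch boundaries: map_partitions applies a function to each batch independently, while accumulate_partitions threads state from one batch to the next. A plain-Python sketch of these semantics, assuming each partition is simply a list of elements (illustrative names, not the streamz internals):

```python
# Illustrative sketch of partition-level semantics; not the streamz code.

def map_partitions(func, partitions):
    """Apply func independently to each batch (partition) of the stream."""
    return [func(part) for part in partitions]

def accumulate_partitions(func, partitions, start):
    """Carry state across batches: func(state, batch) -> (state, emitted)."""
    state, emitted = start, []
    for part in partitions:
        state, out = func(state, part)
        emitted.append(out)
    return emitted

partitions = [[1, 2], [3, 4], [5]]

# Stateless: sum each batch on its own
per_batch = map_partitions(sum, partitions)  # [3, 7, 5]

# Stateful: running total across all batches seen so far
running = accumulate_partitions(
    lambda total, part: (total + sum(part), total + sum(part)),
    partitions, start=0)                     # [3, 10, 15]

print(per_batch, running)
```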
class streamz.dataframe.Rolling(sdf, window, min_periods)¶
Rolling aggregations

This intermediate class enables rolling aggregations across either a fixed number of rows or a time window.

Examples
>>> sdf.rolling(10).x.mean()
>>> sdf.rolling('100ms').x.mean()

Methods
aggregate(*args, **kwargs) | Rolling aggregation
count(*args, **kwargs) | Rolling count
max() | Rolling maximum
mean() | Rolling mean
median() | Rolling median
min() | Rolling minimum
quantile(*args, **kwargs) | Rolling quantile
std(*args, **kwargs) | Rolling standard deviation
sum() | Rolling sum
var(*args, **kwargs) | Rolling variance
aggregate(*args, **kwargs)¶
Rolling aggregation

count(*args, **kwargs)¶
Rolling count

max()¶
Rolling maximum

mean()¶
Rolling mean

median()¶
Rolling median

min()¶
Rolling minimum

quantile(*args, **kwargs)¶
Rolling quantile

std(*args, **kwargs)¶
Rolling standard deviation

sum()¶
Rolling sum

var(*args, **kwargs)¶
Rolling variance
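For the fixed-row-count case, a rolling aggregation keeps a buffer of the last window rows and re-aggregates each time a new row arrives. A plain-Python sketch of a rolling mean (illustrative only; the real Rolling also honors min_periods and time-based windows like '100ms'):

```python
from collections import deque

# Illustrative sketch of a row-count rolling mean, the semantics behind
# sdf.rolling(10).x.mean(); plain Python, not the streamz implementation.

def rolling_mean(values, window):
    buf = deque(maxlen=window)  # automatically drops rows older than `window`
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

print(rolling_mean([1, 2, 3, 4, 5], window=3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```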
class streamz.dataframe.Random(freq='100ms', interval='500ms', dask=False)¶
A streaming dataframe of random data

The x column is uniformly distributed. The y column is Poisson distributed. The z column is normally distributed.

This class is experimental and will likely be removed in the future.

Parameters:
freq: timedelta
    The time interval between records
interval: timedelta
    The time interval between new dataframes; should be significantly larger than freq

Attributes
columns | dtypes | index

Methods
accumulate_partitions(func, *args, **kwargs) | Accumulate a function with state across batch elements
assign(**kwargs) | Assign new columns to this dataframe
cummax() | Cumulative maximum
cummin() | Cumulative minimum
cumprod() | Cumulative product
cumsum() | Cumulative sum
emit(x) |
groupby(other) | Groupby aggregations
map_partitions(func, *args, **kwargs) | Map a function across all batch elements of this stream
mean() | Average
plot([backlog, width, height]) | Plot streaming dataframe as Bokeh plot
rolling(window[, min_periods]) | Compute rolling aggregations
round([decimals]) | Round elements in frame
stop() |
sum() | Sum frame
tail([n]) | Last n elements of frame
to_frame() | Convert to a streaming dataframe
verify(x) | Verify consistency of elements that pass through this stream
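A static sketch of one batch of the data Random describes, built with numpy and pandas (assumed available), using the stated column distributions; the batch size and Poisson rate here are arbitrary illustrations, not values Random uses:

```python
import numpy as np
import pandas as pd

# One illustrative batch with Random's column distributions:
# x uniform, y Poisson, z normal. Size and rate are arbitrary choices.
rng = np.random.default_rng(0)
n = 5  # records per batch; in Random this follows from freq and interval

df = pd.DataFrame({
    "x": rng.uniform(0.0, 1.0, size=n),  # uniformly distributed on [0, 1)
    "y": rng.poisson(100, size=n),       # Poisson distributed
    "z": rng.normal(0.0, 1.0, size=n),   # normally distributed
})
print(df.columns.tolist())  # ['x', 'y', 'z']
```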