Collections API

Collections

Streaming([stream, example]) | Superclass for streaming collections
Streaming.map_partitions(func, *args, **kwargs) | Map a function across all batch elements of this stream
Streaming.accumulate_partitions(func, *args, …) | Accumulate a function with state across batch elements
Streaming.verify(x) | Verify elements that pass through this stream
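As a quick orientation, here is a minimal sketch of the pattern these methods support, using the usual Stream-plus-DataFrame setup. The partition-wise hooks above are what elementwise operations like the one below are built on; their exact calling convention varies between streamz versions, so they are not invoked directly here, and the column name and values are illustrative only.

>>> import pandas as pd
>>> from streamz import Stream
>>> from streamz.dataframe import DataFrame
>>> source = Stream()
>>> sdf = DataFrame(source, example=pd.DataFrame({'x': [0.0]}))
>>> # an elementwise operation, applied to every batch that flows through
>>> doubled = sdf.x * 2
>>> doubled.stream.sink(print)
>>> source.emit(pd.DataFrame({'x': [1.0, 2.0]}))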
Batch

Batch |
Batch.filter |
Batch.map |
Batch.pluck |
Batch.to_dataframe |
Batch.to_stream |
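Batch carries streams of plain Python records (lists or tuples of objects). The sketch below is assumption-heavy: it presumes Batch shares the Streaming constructor (stream, example) documented in the Details section, that the import path is streamz.batch, and that to_stream() returns an ordinary streamz Stream; the records are made up.

>>> from streamz import Stream
>>> from streamz.batch import Batch             # import path assumed
>>> source = Stream()
>>> example = [{'name': 'Alice', 'amount': 0}]  # an example batch of records
>>> b = Batch(source, example=example)          # constructor mirrors Streaming (assumed)
>>> amounts = b.pluck('amount').to_stream()     # pluck one field, drop to a plain Stream
>>> amounts.sink(print)
>>> source.emit([{'name': 'Alice', 'amount': 100},
...              {'name': 'Bob', 'amount': 200}])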
Dataframes

DataFrame |
DataFrame.groupby |
DataFrame.rolling |
DataFrame.assign |
DataFrame.sum |
DataFrame.mean |
DataFrame.cumsum |
DataFrame.cumprod |
DataFrame.cummin |
DataFrame.cummax |
DataFrame.plot |
GroupBy |
GroupBy.count |
GroupBy.mean |
GroupBy.size |
GroupBy.std |
GroupBy.sum |
GroupBy.var |
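The streaming DataFrame mirrors a subset of the pandas API. Below is a minimal sketch of a streaming groupby aggregation in the style of the streamz documentation; it assumes the Stream-plus-example construction shown there, and the column names and values are illustrative.

>>> import pandas as pd
>>> from streamz import Stream
>>> from streamz.dataframe import DataFrame
>>> source = Stream()
>>> example = pd.DataFrame({'name': [], 'amount': []})
>>> sdf = DataFrame(source, example=example)
>>> # running mean of amount per name, updated as new batches arrive
>>> means = sdf.groupby(sdf.name).amount.mean()
>>> means.stream.sink(print)
>>> source.emit(pd.DataFrame({'name': ['Alice', 'Bob', 'Alice'],
...                           'amount': [100, 200, 300]}))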
Rolling(sdf, window, min_periods) | Rolling aggregations
Rolling.aggregate(*args, **kwargs) | Rolling aggregation
Rolling.count(*args, **kwargs) | Rolling count
Rolling.max() | Rolling maximum
Rolling.mean() | Rolling mean
Rolling.median() | Rolling median
Rolling.min() | Rolling minimum
Rolling.quantile(*args, **kwargs) | Rolling quantile
Rolling.std(*args, **kwargs) | Rolling standard deviation
Rolling.sum() | Rolling sum
Rolling.var(*args, **kwargs) | Rolling variance
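The rolling aggregations summarized above can be driven end to end as in the following sketch, a self-contained, hedged version of the sdf.rolling(10).x.mean() example shown in the Details section below; the setup and values are illustrative.

>>> import pandas as pd
>>> from streamz import Stream
>>> from streamz.dataframe import DataFrame
>>> source = Stream()
>>> sdf = DataFrame(source, example=pd.DataFrame({'x': [0.0]}))
>>> # mean of x over the trailing 3 rows, re-emitted on every update
>>> rolled = sdf.rolling(3).x.mean()
>>> rolled.stream.sink(print)
>>> source.emit(pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0]}))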
DataFrame.window |
Window.apply |
Window.count |
Window.groupby |
Window.sum |
Window.size |
Window.std |
Window.var |
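Windowed aggregations recompute over a sliding block of recent data rather than over everything seen so far. A hedged sketch follows; the n= keyword for a fixed-size window is an assumption here, as is the rest of the setup.

>>> import pandas as pd
>>> from streamz import Stream
>>> from streamz.dataframe import DataFrame
>>> source = Stream()
>>> sdf = DataFrame(source, example=pd.DataFrame({'x': [0.0]}))
>>> # sum over a sliding window of the most recent rows
>>> windowed = sdf.window(n=20).sum()    # the n= keyword is assumed
>>> windowed.stream.sink(print)
>>> source.emit(pd.DataFrame({'x': [1.0, 2.0]}))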
Random([freq, interval, dask]) | A streaming dataframe of random data

Details
class streamz.collection.Streaming(stream=None, example=None)
Superclass for streaming collections.

Do not create this class directly; use one of the subclasses instead.

Parameters:
stream: streamz.Stream
example: object
    An object to represent an example element of this stream

See also: streamz.dataframe.StreamingDataFrame, streamz.dataframe.StreamingBatch

Methods:
accumulate_partitions(func, *args, **kwargs) | Accumulate a function with state across batch elements
emit(x) |
map_partitions(func, *args, **kwargs) | Map a function across all batch elements of this stream
verify(x) | Verify elements that pass through this stream
accumulate_partitions(func, *args, **kwargs)
    Accumulate a function with state across batch elements
    See also: Streaming.map_partitions

map_partitions(func, *args, **kwargs)
    Map a function across all batch elements of this stream
    The output stream type will be determined by the action of that function on the example.
    See also: Streaming.accumulate_partitions

verify(x)
    Verify elements that pass through this stream
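map_partitions and accumulate_partitions are the low-level hooks on which the typed operations are built; their exact calling convention differs across streamz versions, so the sketch below instead demonstrates the stateful behaviour they provide through the higher-level sum: the running total is carried across batches, so each emit prints an updated value. The setup and numbers are illustrative.

>>> import pandas as pd
>>> from streamz import Stream
>>> from streamz.dataframe import DataFrame
>>> source = Stream()
>>> sdf = DataFrame(source, example=pd.DataFrame({'x': [0.0]}))
>>> total = sdf.x.sum()                  # a stateful, accumulating reduction
>>> total.stream.sink(print)
>>> source.emit(pd.DataFrame({'x': [1.0, 2.0]}))   # running total: 3.0
>>> source.emit(pd.DataFrame({'x': [4.0]}))        # running total: 7.0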
class streamz.dataframe.Rolling(sdf, window, min_periods)
Rolling aggregations

This intermediate class enables rolling aggregations across either a fixed number of rows or a time window.

Examples
>>> sdf.rolling(10).x.mean()
>>> sdf.rolling('100ms').x.mean()

Methods:
aggregate(*args, **kwargs) | Rolling aggregation
count(*args, **kwargs) | Rolling count
max() | Rolling maximum
mean() | Rolling mean
median() | Rolling median
min() | Rolling minimum
quantile(*args, **kwargs) | Rolling quantile
std(*args, **kwargs) | Rolling standard deviation
sum() | Rolling sum
var(*args, **kwargs) | Rolling variance

aggregate(*args, **kwargs)
    Rolling aggregation
count(*args, **kwargs)
    Rolling count
max()
    Rolling maximum
mean()
    Rolling mean
median()
    Rolling median
min()
    Rolling minimum
quantile(*args, **kwargs)
    Rolling quantile
std(*args, **kwargs)
    Rolling standard deviation
sum()
    Rolling sum
var(*args, **kwargs)
    Rolling variance
class streamz.dataframe.Random(freq='100ms', interval='500ms', dask=False)
A streaming dataframe of random data

The x column is uniformly distributed, the y column is Poisson distributed, and the z column is normally distributed.

This class is experimental and will likely be removed in the future.

Parameters:
freq: timedelta
    The time interval between records
interval: timedelta
    The time interval between new dataframes; should be significantly larger than freq

Attributes:
columns
dtypes
index

Methods:
accumulate_partitions(func, *args, **kwargs) | Accumulate a function with state across batch elements
assign(**kwargs) | Assign new columns to this dataframe
cummax() | Cumulative maximum
cummin() | Cumulative minimum
cumprod() | Cumulative product
cumsum() | Cumulative sum
emit(x) |
groupby(other) | Groupby aggregations
map_partitions(func, *args, **kwargs) | Map a function across all batch elements of this stream
mean() | Average
plot([backlog, width, height]) | Plot streaming dataframe as Bokeh plot
rolling(window[, min_periods]) | Compute rolling aggregations
round([decimals]) | Round elements in frame
stop() |
sum() | Sum frame
tail([n]) | Last n rows of the frame
to_frame() | Convert to a streaming dataframe
verify(x) | Verify consistency of elements that pass through this stream
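As a closing sketch, Random can drive a live pipeline end to end. The constructor arguments below are the documented defaults, stop() is the method listed above, and the random-walk transformation mirrors the example used in the streamz documentation; treat it as an illustration rather than a definitive recipe.

>>> from streamz.dataframe import Random
>>> source = Random(freq='100ms', interval='500ms')
>>> # x is uniform on [0, 1], so (x - 0.5).cumsum() drifts like a random walk
>>> walk = (source.x - 0.5).cumsum()
>>> walk.stream.sink(print)
>>> # ...later, shut down the background generator
>>> source.stop()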