Collections API

Collections

Streaming([stream, example]) Superclass for streaming collections
Streaming.map_partitions(func, *args, **kwargs) Map a function across all batch elements of this stream
Streaming.accumulate_partitions(func, *args, …) Accumulate a function with state across batch elements
Streaming.verify(x) Verify elements that pass through this stream

Batch

Batch
Batch.filter
Batch.map
Batch.pluck
Batch.to_dataframe
Batch.to_stream

Dataframes

DataFrame
DataFrame.groupby
DataFrame.rolling
DataFrame.assign
DataFrame.sum
DataFrame.mean
DataFrame.cumsum
DataFrame.cumprod
DataFrame.cummin
DataFrame.cummax
DataFrame.plot
GroupBy
GroupBy.count
GroupBy.mean
GroupBy.size
GroupBy.std
GroupBy.sum
GroupBy.var
Rolling(sdf, window, min_periods) Rolling aggregations
Rolling.aggregate(*args, **kwargs) Rolling aggregation
Rolling.count(*args, **kwargs) Rolling count
Rolling.max() Rolling maximum
Rolling.mean() Rolling mean
Rolling.median() Rolling median
Rolling.min() Rolling minimum
Rolling.quantile(*args, **kwargs) Rolling quantile
Rolling.std(*args, **kwargs) Rolling standard deviation
Rolling.sum() Rolling sum
Rolling.var(*args, **kwargs) Rolling variance
DataFrame.window
Window.apply
Window.count
Window.groupby
Window.sum
Window.size
Window.std
Window.var
Random([freq, interval, dask]) A streaming dataframe of random data

Details

class streamz.collection.Streaming(stream=None, example=None)

Superclass for streaming collections

Do not create this class directly; use one of the subclasses instead.

Parameters:

stream: streamz.Stream

example: object

An object to represent an example element of this stream

See also

streamz.dataframe.StreamingDataFrame, streamz.dataframe.StreamingBatch

Methods

accumulate_partitions(func, *args, **kwargs) Accumulate a function with state across batch elements
emit(x) Push data into the stream at this point
map_partitions(func, *args, **kwargs) Map a function across all batch elements of this stream
verify(x) Verify elements that pass through this stream
accumulate_partitions(func, *args, **kwargs)

Accumulate a function with state across batch elements

map_partitions(func, *args, **kwargs)

Map a function across all batch elements of this stream

The output stream type will be determined by the action of that function on the example

verify(x)

Verify elements that pass through this stream

class streamz.dataframe.Rolling(sdf, window, min_periods)

Rolling aggregations

This intermediate class enables rolling aggregations across either a fixed number of rows or a time window.

Examples

>>> sdf.rolling(10).x.mean()  
>>> sdf.rolling('100ms').x.mean()  

Methods

aggregate(*args, **kwargs) Rolling aggregation
count(*args, **kwargs) Rolling count
max() Rolling maximum
mean() Rolling mean
median() Rolling median
min() Rolling minimum
quantile(*args, **kwargs) Rolling quantile
std(*args, **kwargs) Rolling standard deviation
sum() Rolling sum
var(*args, **kwargs) Rolling variance
aggregate(*args, **kwargs)

Rolling aggregation

count(*args, **kwargs)

Rolling count

max()

Rolling maximum

mean()

Rolling mean

median()

Rolling median

min()

Rolling minimum

quantile(*args, **kwargs)

Rolling quantile

std(*args, **kwargs)

Rolling standard deviation

sum()

Rolling sum

var(*args, **kwargs)

Rolling variance

class streamz.dataframe.Random(freq='100ms', interval='500ms', dask=False)

A streaming dataframe of random data

The x column is uniformly distributed. The y column is Poisson distributed. The z column is normally distributed.

This class is experimental and will likely be removed in the future

Parameters:

freq: timedelta

The time interval between records

interval: timedelta

The time interval between new dataframes; this should be significantly larger than freq

Attributes

columns
dtypes
index

Methods

accumulate_partitions(func, *args, **kwargs) Accumulate a function with state across batch elements
assign(**kwargs) Assign new columns to this dataframe
cummax() Cumulative maximum
cummin() Cumulative minimum
cumprod() Cumulative product
cumsum() Cumulative sum
emit(x) Push data into the stream at this point
groupby(other) Groupby aggregations
map_partitions(func, *args, **kwargs) Map a function across all batch elements of this stream
mean() Average
plot([backlog, width, height]) Plot streaming dataframe as Bokeh plot
rolling(window[, min_periods]) Compute rolling aggregations
round([decimals]) Round elements in frame
stop()
sum() Sum frame
tail([n]) Last n rows of the frame
to_frame() Convert to a streaming dataframe
verify(x) Verify consistency of elements that pass through this stream