It may become necessary to traverse the elements of a series or the rows of a dataframe in a way that the next element or next row is dependent on the previously selected element or row. This is called path dependency.
Consider the following time series s
with irregular frequency.
#starting python community conventions
import numpy as np
import pandas as pd
# n is number of observations
n = 5000
day = pd.to_datetime(['2013-02-06'])
# irregular seconds spanning 28800 seconds (8 hours)
seconds = np.random.rand(n) * 28800 * pd.Timedelta(1, 's')
# start at 8 am
start = pd.offsets.Hour(8)
# irregular timeseries
tidx = day + start + seconds
tidx = tidx.sort_values()
s = pd.Series(np.random.randn(n), tidx, name='A').cumsum()
s.plot();
Let's assume a path dependent condition. Starting with the first member of the series, I want to grab each subsequent element such that the absolute difference between that element and the current element is greater than or equal to x
.
We'll solve this problem using python generators.
Generator function
def mover(s, move_size=10):
"""Given a reference, find next value with
an absolute difference >= move_size"""
ref = None
for i, v in s.iteritems():
if ref is None or (abs(ref - v) >= move_size):
yield i, v
ref = v
Then we can define a new series moves
like so
moves = pd.Series({i:v for i, v in mover(s, move_size=10)},
name='_{}_'.format(s.name))
Plotting them both
moves.plot(legend=True)
s.plot(legend=True)
The analog for dataframes would be:
def mover_df(df, col, move_size=2):
ref = None
for i, row in df.iterrows():
if ref is None or (abs(ref - row.loc[col]) >= move_size):
yield row
ref = row.loc[col]
df = s.to_frame()
moves_df = pd.concat(mover_df(df, 'A', 10), axis=1).T
moves_df.A.plot(label='_A_', legend=True)
df.A.plot(legend=True)