It may become necessary to traverse the elements of a series or the rows of a dataframe in a way that the next element or next row is dependent on the previously selected element or row. This is called path dependency.
Consider the following time series s with irregular frequency.
#starting python community conventions
import numpy as np
import pandas as pd
# n is number of observations
n = 5000
day = pd.to_datetime(['2013-02-06'])
# irregular seconds spanning 28800 seconds (8 hours)
seconds = np.random.rand(n) * 28800 * pd.Timedelta(1, 's')
# start at 8 am
start = pd.offsets.Hour(8)
# irregular timeseries
tidx = day + start + seconds
tidx = tidx.sort_values()
s = pd.Series(np.random.randn(n), tidx, name='A').cumsum()
s.plot();
Let's assume a path dependent condition. Starting with the first member of the series, I want to grab each subsequent element such that the absolute difference between that element and the current element is greater than or equal to x.
We'll solve this problem using python generators.
Generator function
def mover(s, move_size=10):
"""Given a reference, find next value with
an absolute difference >= move_size"""
ref = None
for i, v in s.iteritems():
if ref is None or (abs(ref - v) >= move_size):
yield i, v
ref = v
Then we can define a new series moves like so
moves = pd.Series({i:v for i, v in mover(s, move_size=10)},
name='_{}_'.format(s.name))
Plotting them both
moves.plot(legend=True)
s.plot(legend=True)
The analog for dataframes would be:
def mover_df(df, col, move_size=2):
ref = None
for i, row in df.iterrows():
if ref is None or (abs(ref - row.loc[col]) >= move_size):
yield row
ref = row.loc[col]
df = s.to_frame()
moves_df = pd.concat(mover_df(df, 'A', 10), axis=1).T
moves_df.A.plot(label='_A_', legend=True)
df.A.plot(legend=True)