python - Use Pandas' NaNs to filter out holes in time series
I'm having a bit of trouble filtering data on pandas NAs. I have a data frame looking like this:
            jan       feb       mar       apr       may  june
    0  0.349143  0.249041  0.244352       NaN  0.425336   NaN
    1  0.530616  0.816829       NaN  0.212282  0.099364   NaN
    2  0.713001  0.073601  0.242077  0.553908       NaN   NaN
    3  0.245295  0.007016  0.444352  0.515705  0.497119   NaN
    4  0.195662  0.007249       NaN  0.852287       NaN   NaN
And I need to filter out the rows that have "holes". Think of the rows as time series; by a hole I mean NAs in the middle of the series, not at the end. I.e., in the data frame above, lines 0, 1 and 4 have holes, but 2 and 3 do not (they have NAs only at the end of the row).
The only way I can think of so far is this:
    for rowindex, row in df.iterrows():
        # step through each entry in the row
        # and, after encountering the first NA,
        # check if all subsequent values are NA too.
But I was hoping there might be a less convoluted and more efficient way to do it.
Thanks, Anne
As you say, looping (iterrows) is a last resort. Try this, which uses apply with axis=1 instead of iterating through the rows.
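    In [19]: def holey(s):
       ....:     # Position of the first non-null value in the row.
       ....:     starts_at = s.notnull().argmax()
       ....:     # Skip any leading NaNs, then find the first null after the start.
       ....:     tail = s.iloc[starts_at:]
       ....:     next_null = tail.isnull().argmax()
       ....:     # 0 here means no null follows the start (or the row is all null).
       ....:     if next_null == 0:
       ....:         return False
       ....:     # A hole exists if any real value follows that first null.
       ....:     any_values_left = tail.iloc[next_null:].notnull().any()
       ....:     return any_values_left
       ....:

    In [20]: df.apply(holey, axis=1)
    Out[20]:
    0     True
    1     True
    2    False
    3    False
    4     True
    dtype: bool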
And then you can filter with df[~df.apply(holey, axis=1)].
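For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It assumes a recent pandas where Series.argmax() returns an integer position, rebuilds the frame from the numbers in the question, and uses .iloc so that the slices are explicitly positional. The names mask and clean are only illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Data copied from the question; column names as shown there.
df = pd.DataFrame(
    [
        [0.349143, 0.249041, 0.244352, nan, 0.425336, nan],
        [0.530616, 0.816829, nan, 0.212282, 0.099364, nan],
        [0.713001, 0.073601, 0.242077, 0.553908, nan, nan],
        [0.245295, 0.007016, 0.444352, 0.515705, 0.497119, nan],
        [0.195662, 0.007249, nan, 0.852287, nan, nan],
    ],
    columns=["jan", "feb", "mar", "apr", "may", "june"],
)


def holey(s):
    """Return True if row s has a NaN that is followed later by a real value."""
    starts_at = s.notnull().argmax()    # position of the first non-null value
    tail = s.iloc[starts_at:]           # ignore any leading NaNs
    next_null = tail.isnull().argmax()  # position of the first null in the tail
    if next_null == 0:                  # no null after the start (or all null)
        return False
    return bool(tail.iloc[next_null:].notnull().any())


mask = df.apply(holey, axis=1)  # True for rows 0, 1 and 4
clean = df[~mask]               # keeps rows 2 and 3
print(clean)
```

Rows whose only NaNs are trailing are kept. Rows that start with NaNs are judged from their first real value onward, which is an assumption about what should count as a hole.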
A handy idiom here: use argmax() to find the first occurrence of True in a series of boolean values.
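A small sketch of that idiom, again assuming a recent pandas where Series.argmax() returns a position (idxmax() returns the label instead). The helper first_true_position is a hypothetical name used only for illustration. It also guards against the one trap: argmax() returns 0 when no value is True at all.

```python
import pandas as pd


def first_true_position(flags):
    """Position of the first True in a boolean Series, or None if there is none.

    Hypothetical helper for illustration; not part of pandas.
    """
    # argmax() alone cannot tell "True at position 0" from "no True anywhere",
    # so check any() first.
    if not flags.any():
        return None
    return int(flags.argmax())


flags = pd.Series([False, False, True, False, True])
first = first_true_position(flags)

none_true = pd.Series([False, False, False])
missing = first_true_position(none_true)
```

Here first is 2 and missing is None, whereas none_true.argmax() on its own would give 0.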