python - Use Pandas' NaNs to filter out holes in time series -


I am having a bit of trouble filtering data with pandas NAs. I have a data frame looking like this:

        jan       feb       mar       apr       may  june
0  0.349143  0.249041  0.244352       NaN  0.425336   NaN
1  0.530616  0.816829       NaN  0.212282  0.099364   NaN
2  0.713001  0.073601  0.242077  0.553908       NaN   NaN
3  0.245295  0.007016  0.444352  0.515705  0.497119   NaN
4  0.195662  0.007249       NaN  0.852287       NaN   NaN

I need to filter out the rows that have "holes". I think of the rows as time series, and by a hole I mean NAs in the middle of the series, but not at the end. I.e. in the data frame above, lines 0, 1 and 4 have holes, but 2 and 3 do not (they have NAs only at the end of the row).

The only way I could think of so far is something like this:

for rowindex, row in df.iterrows():
    # step through each entry in the row
    # and, after encountering the first NA,
    # check if all subsequent values are NA too.

But I was hoping there might be a less convoluted and more efficient way to do it.

Thanks, Anne

As you say, looping (iterrows) is a last resort. Try this, which uses apply with axis=1 instead of iterating through the rows.

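In [19]: def holey(s):
    # position of the first non-null value
    starts_at = s.notnull().argmax()
    # offset from starts_at of the first null after it; 0 means none was found
    next_null = s.iloc[starts_at:].isnull().argmax()
    if next_null == 0:
        return False
    # the row has a hole if any value follows that first null
    any_values_left = s.iloc[starts_at + next_null:].notnull().any()
    return any_values_left
   ....:

In [20]: df.apply(holey, axis=1)
Out[20]:
0     True
1     True
2    False
3    False
4     True
dtype: bool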

And then you can filter with df[~df.apply(holey, axis=1)].
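For anyone who wants to run the whole thing end to end, here is a minimal self-contained sketch. The data frame is rebuilt from the values in the question. This version of holey works on the underlying NumPy array so that all slicing is purely positional; that is a choice made for this sketch, not part of the answer above. It also assumes that NAs before the first valid value do not count as a hole, which the question does not spell out.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Data frame rebuilt from the values shown in the question.
df = pd.DataFrame(
    [
        [0.349143, 0.249041, 0.244352, nan, 0.425336, nan],
        [0.530616, 0.816829, nan, 0.212282, 0.099364, nan],
        [0.713001, 0.073601, 0.242077, 0.553908, nan, nan],
        [0.245295, 0.007016, 0.444352, 0.515705, 0.497119, nan],
        [0.195662, 0.007249, nan, 0.852287, nan, nan],
    ],
    columns=["jan", "feb", "mar", "apr", "may", "june"],
)


def holey(s):
    """Return True if some NA in the row is followed by a later valid value."""
    valid = s.notnull().to_numpy()
    # Position of the first valid value (0 if the row is entirely NA).
    starts_at = valid.argmax()
    # Offset, relative to starts_at, of the first NA after it.
    # An offset of 0 means no NA was found (or the row is entirely NA).
    next_null = (~valid[starts_at:]).argmax()
    if next_null == 0:
        return False
    # A hole exists if any valid value follows that first NA.
    return bool(valid[starts_at + next_null:].any())


mask = df.apply(holey, axis=1)   # True for rows with holes
clean = df[~mask]                # keep only rows without holes
print(clean)
```

With the sample data this keeps rows 2 and 3, matching the expectation stated in the question.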

A handy idiom here: use argmax() to find the first occurrence of True in a series of boolean values.
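A small sketch of that idiom, with one caveat worth knowing: argmax() also returns 0 when there is no True at all, so a result of 0 is ambiguous unless you check any() as well. The helper name first_true_position is made up for this illustration.

```python
import pandas as pd

flags = pd.Series([False, False, True, False, True])
# True counts as the maximum, and argmax returns its first position.
first_true = flags.to_numpy().argmax()

none_set = pd.Series([False, False, False])
# With no True present, every value ties for the maximum and argmax returns 0.
ambiguous = none_set.to_numpy().argmax()


def first_true_position(flags):
    """Hypothetical helper: position of the first True, or None if there is none."""
    arr = flags.to_numpy()
    return int(arr.argmax()) if arr.any() else None


print(first_true, ambiguous, first_true_position(none_set))
```

This is why holey treats a next_null of 0 as "no NA found": the slice it searches always starts at a valid value, so position 0 can never be a genuine NA.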

