Python - Reindexing and filling NaN values in Pandas

February 15, 2013

Consider this dataset:

    import pandas as pd

    data_dict = {'ind': [1, 2, 3, 4], 'location': [301, 301, 302, 303],
                 'ind_var': [4, 8, 10, 15], 'loc_var': [1, 1, 7, 3]}
    df = pd.DataFrame(data_dict)

    df_indexed = df.set_index(['ind', 'location'])
    df_indexed

which looks like this:

                  ind_var  loc_var
    ind location
    1   301             4        1
    2   301             8        1
    3   302            10        7
    4   303            15        3

The ind_var variable varies by ind (= individual), and loc_var varies by location. (I also have a variable that varies by both ind and location, but I'm omitting it to simplify the presentation.)

I need to transform the data so that each individual index contains all possible locations. I can reindex it this way (just showing individuals 1 to 3):

    new_shape = [(1, 301), (1, 302), (1, 303),
                 (2, 301), (2, 302), (2, 303),
                 (3, 301), (3, 302), (3, 303)]
    # a list of tuples becomes a two-level MultiIndex
    idx = pd.MultiIndex.from_tuples(new_shape)
    df2 = df_indexed.reindex(idx, method=None)
    df2.index.names = ['id', 'location']

which gives:

                 ind_var  loc_var
    id location
    1  301             4        1
       302           NaN      NaN
       303           NaN      NaN
    2  301             8        1
       302           NaN      NaN
       303           NaN      NaN
    3  301           NaN      NaN
       302            10        7
       303           NaN      NaN
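Writing out new_shape by hand does not scale beyond a few individuals. A minimal sketch, assuming a reasonably recent pandas, that builds the complete grid with pd.MultiIndex.from_product from the individuals and locations already present in df_indexed (so it covers all four individuals rather than just 1 to 3). The name full_idx is illustrative only:

```python
import pandas as pd

data_dict = {'ind': [1, 2, 3, 4], 'location': [301, 301, 302, 303],
             'ind_var': [4, 8, 10, 15], 'loc_var': [1, 1, 7, 3]}
df_indexed = pd.DataFrame(data_dict).set_index(['ind', 'location'])

# Every individual paired with every location, taken from the data itself
# instead of being typed out as tuples.
full_idx = pd.MultiIndex.from_product(
    [df_indexed.index.get_level_values('ind').unique(),
     df_indexed.index.get_level_values('location').unique()],
    names=['ind', 'location'])

# Pairs that are missing from the original data come back as NaN rows.
df2 = df_indexed.reindex(full_idx)
df2.index.names = ['id', 'location']
print(df2)
```

With four individuals and three locations this yields 12 rows, 8 of which have NaN in both columns, ready for the filling step.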

But I need a way to fill in the missing values, to get:

                 ind_var  loc_var
    id location
    1  301             4        1
       302             4        7
       303             4        3
    2  301             8        1
       302             8        7
       303             8        3
    3  301            10        1
       302            10        7
       303            10        3

I've tried two different things without success:

1) Using loc_dict = {301: 1, 302: 7, 303: 3} to replace loc_var, and ind_dict = {1: 4, 2: 8, 3: 10, 4: 15} to replace ind_var.

2) Using a groupby method:

    # first reset the index
    df_non_indexed = df2.reset_index()
    # forward-fill loc_var within each location
    df_non_indexed['loc_var'] = (df_non_indexed.groupby(['location'])['loc_var']
                                 .transform(lambda x: x.ffill()))

This works, but it only fills forward (or backwards), so any row with no earlier (or later) value in its group stays NaN.
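The direction problem goes away if every row simply takes the first non-missing value of its group, because GroupBy's 'first' skips NaN. A minimal sketch of that idea (a swapped-in technique, not the asker's code), assuming each variable really is constant within its key as described above. It reindexes all four individuals: location 303 only appears for individual 4, so with individuals 1 to 3 alone there would be no loc_var value for 303 to copy.

```python
import pandas as pd

data_dict = {'ind': [1, 2, 3, 4], 'location': [301, 301, 302, 303],
             'ind_var': [4, 8, 10, 15], 'loc_var': [1, 1, 7, 3]}
df_indexed = pd.DataFrame(data_dict).set_index(['ind', 'location'])

# Reindexed frame as in the question, but for all four individuals.
idx = pd.MultiIndex.from_product([[1, 2, 3, 4], [301, 302, 303]],
                                 names=['ind', 'location'])
df2 = df_indexed.reindex(idx)
df2.index.names = ['id', 'location']

# 'first' returns the first non-NaN value in each group and transform
# broadcasts it to every row, so NaNs before or after it are both filled.
df2['ind_var'] = df2.groupby(level='id')['ind_var'].transform('first')
df2['loc_var'] = df2.groupby(level='location')['loc_var'].transform('first')
print(df2)
```

The columns stay float because of the intermediate NaNs; df2.astype(int) converts them back once nothing is missing.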

There must be a simple way of doing this, but I haven't been able to figure it out! Thanks for your time.

Note: this is related to a question about reshaping from wide to long. I've taken a different approach and simplified it in the hope that this one is easier to understand.

This can be done easily with stack/unstack and groupby:

    # unstack to wide, filling the missing cells with 0s
    df_wide = df_indexed.unstack().fillna(0)
    # stack back to long
    df_long = df_wide.stack()
    # replace the 0s with each group's max (assumes all real values are > 0)
    df_long['ind_var'] = df_long['ind_var'].groupby(level=0).transform('max')
    df_long['loc_var'] = df_long['loc_var'].groupby(level=1).transform('max')
    print(df_long)

This gives the following result:

                  ind_var  loc_var
    ind location
    1   301             4        1
        302             4        7
        303             4        3
    2   301             8        1
        302             8        7
        303             8        3
    3   301            10        1
        302            10        7
        303            10        3
    4   301            15        1
        302            15        7
        303            15        3
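The 0 placeholder plus max only recovers the right value when every real value is greater than 0. A sketch of an alternative without a placeholder, assuming pandas 1.2 or later (for merge with how='cross') and that each variable is constant within its own key, as the question states: build one lookup table per variable, in the spirit of the dictionaries tried in the question, and cross-join them. The names ind_table and loc_table are illustrative only.

```python
import pandas as pd

data_dict = {'ind': [1, 2, 3, 4], 'location': [301, 301, 302, 303],
             'ind_var': [4, 8, 10, 15], 'loc_var': [1, 1, 7, 3]}
df = pd.DataFrame(data_dict)

# One row per individual and one row per location; keeping the first
# duplicate is safe because each variable is constant within its key.
ind_table = df[['ind', 'ind_var']].drop_duplicates('ind')
loc_table = df[['location', 'loc_var']].drop_duplicates('location')

# The cross join pairs every individual with every location.
df_long = (ind_table.merge(loc_table, how='cross')
           .set_index(['ind', 'location'])
           .sort_index())
print(df_long)
```

Because no NaN is ever introduced, the columns keep their integer dtype, and the result does not depend on the sign of the values.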
