python - Reindexing and filling NaN values in Pandas
Consider the following dataset:
    import pandas as pd

    data_dict = {'ind': [1, 2, 3, 4],
                 'location': [301, 301, 302, 303],
                 'ind_var': [4, 8, 10, 15],
                 'loc_var': [1, 1, 7, 3]}
    df = pd.DataFrame(data_dict)
    df_indexed = df.set_index(['ind', 'location'])
    df_indexed

which looks like:
                  ind_var  loc_var
    ind location
    1   301             4        1
    2   301             8        1
    3   302            10        7
    4   303            15        3

ind_var is a variable that varies only by ind (= individual), and loc_var varies only by location. (I also have a variable that varies by both ind and location, but I'm omitting it to simplify the presentation.)
I need to transform the data so that each individual's index contains all possible locations. I can reindex in this way (just showing individuals 1 to 3):
    new_shape = [(1, 301), (1, 302), (1, 303),
                 (2, 301), (2, 302), (2, 303),
                 (3, 301), (3, 302), (3, 303)]
    idx = pd.Index(new_shape)
    df2 = df_indexed.reindex(idx, method=None)
    df2.index.names = ['id', 'location']

which gives:
                 ind_var  loc_var
    id location
    1  301             4        1
       302           NaN      NaN
       303           NaN      NaN
    2  301             8        1
       302           NaN      NaN
       303           NaN      NaN
    3  301           NaN      NaN
       302            10        7
       303           NaN      NaN

But I need a way to fill in the missing values, so that I get:
                 ind_var  loc_var
    id location
    1  301             4        1
       302             4        7
       303             4        3
    2  301             8        1
       302             8        7
       303             8        3
    3  301            10        1
       302            10        7
       303            10        3

I tried two different things with no success:
1) Using loc_dict = {301: 1, 302: 7, 303: 3} to replace loc_var, and ind_dict = {1: 4, 2: 8, 3: 10, 4: 15} to replace ind_var.
2) Using the groupby method:
    # First reset the index
    df_non_indexed = df2.reset_index()
    # Forward-fill loc_var within each location
    df_non_indexed['loc_var'] = df_non_indexed.groupby(['location'])['loc_var'].transform(lambda x: x.ffill())

This works, but it only fills forward (or backward).
There must be a simple way of doing this, but I haven't been able to figure it out! Thanks for your time.
Note: this is related to a question about reshaping from wide to long. I've taken a different approach and simplified it, in the hope that this one is easier to understand.
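As a side note on the reindexing step, the hand-written list of index tuples can also be generated from the data. The following is only a minimal sketch: it assumes that "all possible locations" means every location observed in the data, and the names full_idx and df_full are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ind': [1, 2, 3, 4],
    'location': [301, 301, 302, 303],
    'ind_var': [4, 8, 10, 15],
    'loc_var': [1, 1, 7, 3],
})
df_indexed = df.set_index(['ind', 'location'])

# Assumption: the target index is every observed individual
# crossed with every observed location.
full_idx = pd.MultiIndex.from_product(
    [df['ind'].unique(), df['location'].unique()],
    names=['ind', 'location'],
)

# Reindexing adds the missing (ind, location) pairs as all-NaN rows.
df_full = df_indexed.reindex(full_idx)
print(df_full)
```

This only builds the expanded frame; the missing values still have to be filled afterwards.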
This can be done with stack/unstack and groupby pretty easily:
    # Unstack to wide, filling the NaNs with 0s
    df_wide = df_indexed.unstack().fillna(0)
    # Stack back to long
    df_long = df_wide.stack()
    # Change the 0s to the group max using groupby
    df_long['ind_var'] = df_long['ind_var'].groupby(level=0).transform(lambda x: x.max())
    df_long['loc_var'] = df_long['loc_var'].groupby(level=1).transform(lambda x: x.max())
    print(df_long)

This gives the results:
                  ind_var  loc_var
    ind location
    1   301             4        1
        302             4        7
        303             4        3
    2   301             8        1
        302             8        7
        303             8        3
    3   301            10        1
        302            10        7
        303            10        3
    4   301            15        1
        302            15        7
        303            15        3
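One caveat with the approach above: filling with 0 and then taking the group max relies on every real value being larger than 0, which holds for this sample data but may not hold in general. A possible variant that avoids the placeholder value is sketched below. It is not part of the original answer; it assumes each individual has exactly one ind_var value and each location exactly one loc_var value, and the names full_idx and df_filled are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'ind': [1, 2, 3, 4],
    'location': [301, 301, 302, 303],
    'ind_var': [4, 8, 10, 15],
    'loc_var': [1, 1, 7, 3],
})
df_indexed = df.set_index(['ind', 'location'])

# Expand to every (ind, location) pair; new rows start as NaN.
full_idx = pd.MultiIndex.from_product(
    [df['ind'].unique(), df['location'].unique()],
    names=['ind', 'location'],
)
df_filled = df_indexed.reindex(full_idx)

# 'first' picks the first non-missing value in each group, and
# transform broadcasts it back to every row of that group.
df_filled['ind_var'] = df_filled.groupby(level='ind')['ind_var'].transform('first')
df_filled['loc_var'] = df_filled.groupby(level='location')['loc_var'].transform('first')

# No NaNs remain, so the columns can go back to integers.
df_filled = df_filled.astype(int)
print(df_filled)
```

Because no placeholder value is introduced, this variant should behave the same way for negative or zero-valued data.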