python - Organizing data read from Excel to Pandas DataFrame -

February 15, 2015

my goal script to: 1.read timseries data in excel file (>100,000k rows) headers (labels, units) 2.convert excel numeric dates best datetime object pandas dataframe 3.be able use timestamps reference rows , series labels reference columns

so far used xlrd read excel data list. made pandas series each list , used time list index. combined series series headers make python dictionary. passed dictionary pandas dataframe. despite efforts df.index seems set column headers , i'm not sure when convert dates datetime object.

i started using python 3 days ago advice great! here's code:

    #open excel workbook , first sheet     wb = xlrd.open_workbook("c:\greencsv\calgary\cwater.xlsx")     sh = wb.sheet_by_index(0)      #read rows containing labels , units     labels = sh.row_values(1, start_colx=0, end_colx=none)     units = sh.row_values(2, start_colx=0, end_colx=none)      #initialize list hold data     data = [none] * (sh.ncols)      #read column column , store in list     colnum in range(sh.ncols):         data[colnum] = sh.col_values(colnum, start_rowx=5, end_rowx=none)      #delete unecessary rows , columns     del labels[3],labels[0:2], units[3], units[0:2], data[3], data[0:2]         #create pandas series     s = [none] * (sh.ncols - 4)     colnum in range(sh.ncols - 4):         s[colnum] = series(data[colnum+1], index=data[0])      #create dictionary of series     dictionary = {}     in range(sh.ncols-4):         dictionary[i]= {labels[i] : s[i]}      #pass dictionary pandas dataframe     df = pd.dataframe.from_dict(dictionary)

you can use pandas directly here, create dictionary of dataframes (with keys being sheet name):

in [11]: xl = pd.excelfile("c:\greencsv\calgary\cwater.xlsx")  in [12]: xl.sheet_names  # in example may different out[12]: [u'sheet1', u'sheet2', u'sheet3']  in [13]: dfs = {sheet: xl.parse(sheet) sheet in xl.sheet_names}  in [14]: dfs['sheet1'] # access dataframe sheet name

you can check out docs on parse offers more options (for example skiprows), , these allows parse individual sheets more control...

Search This Blog

Live

python - Organizing data read from Excel to Pandas DataFrame -

Comments

Post a Comment

Popular posts from this blog

How to mention the localhost in android -

php - Calling a template part from a post -

java - How should I set a HttpURLConnection to be the same as a HttpServletRequest? -