Python CSV - Need to sum up values in a column grouped by value in another column


I have data in a CSV file that needs to be parsed. It looks like this:

date, name, subject, sid, mark
2/2/2013, andy cole, history, 216351, 98
2/2/2013, andy cole, maths, 216351, 87
2/2/2013, andy cole, science, 217387, 21
2/2/2013, bryan carr, maths, 216757, 89
2/2/2013, carl jon, botany, 218382, 78
2/2/2013, bryan carr, biology, 216757, 27

I need to use sid as the key and sum the values in the mark column for each key. The output should look like this:

sid     mark
216351  185
217387   21
216757  116
218382   78

I do not have to write the output to a file; I only need it printed when I execute the Python file. This is similar to another question. How should that solution be changed to skip the columns in between?

This is essentially the concept of a histogram. Use defaultdict(int) from collections and iterate through the rows. Use the 'sid' value as the dict key and add the 'mark' value to the current value for that key.

A defaultdict of type int makes sure that if a key does not exist yet, its value is initialized to 0.
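To make the default-to-zero behaviour concrete, here is a minimal sketch using two marks from the sample data. The names totals and plain are only illustrative, and the plain-dict line is shown purely for comparison.

```python
from collections import defaultdict

totals = defaultdict(int)   # missing keys are created with int() == 0
totals[216351] += 98        # first access creates the key with 0, then adds 98
totals[216351] += 87        # the key exists now, so this just accumulates

# The same update with a plain dict needs an explicit default.
plain = {}
plain[216351] = plain.get(216351, 0) + 98

print(totals[216351])
```

Note that merely reading a missing key from a defaultdict also inserts it with the value 0, which is harmless here but worth knowing.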

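from collections import defaultdict

d = defaultdict(int)  # missing keys start at 0

with open("data.txt") as f:
    for line in f:
        # Split on commas and trim the surrounding whitespace.
        tokens = [t.strip() for t in line.split(",")]
        try:
            sid = int(tokens[3])
            mark = int(tokens[4])
        except (ValueError, IndexError):
            # Skip the header row and blank or malformed lines.
            continue
        d[sid] += mark

print(d)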

output:

defaultdict(<type 'int'>, {217387: 21, 216757: 116, 218382: 78, 216351: 185}) 
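The printed dict is not quite the two-column layout the question asks for. The following is a minimal sketch of one way to print it that way; format_totals is a hypothetical helper, the column widths are an assumption based on the sample output, and the totals are copied from the result above in first-appearance order (a plain dict keeps that order on Python 3.7 and later).

```python
# Totals copied from the output above, listed in first-appearance order.
totals = {216351: 185, 217387: 21, 216757: 116, 218382: 78}

def format_totals(totals):
    """Return the totals as aligned text lines, starting with a header row."""
    lines = ["{:<8}{}".format("sid", "mark")]
    for sid, mark in totals.items():
        # sid is left-aligned in 8 columns, mark is right-aligned in 3.
        lines.append("{:<8}{:>3}".format(sid, mark))
    return lines

print("\n".join(format_totals(totals)))
```

If a fixed order is preferred, iterating over sorted(totals.items()) instead would list the rows by sid.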

You can change the parsing part to something else (e.g. use csv.reader or perform other validations). The key point here is to use defaultdict(int) and update it like so:

d[sid] += mark 
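As one possible variation on the parsing part, here is a sketch that uses the standard csv module instead of splitting lines by hand. It assumes the file has the header row shown in the question and a space after each comma; sum_marks and SAMPLE are hypothetical names, and the sample rows are held in memory only so the example is self-contained.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question, kept in memory for the sake of the example.
SAMPLE = """\
date, name, subject, sid, mark
2/2/2013, andy cole, history, 216351, 98
2/2/2013, andy cole, maths, 216351, 87
2/2/2013, andy cole, science, 217387, 21
2/2/2013, bryan carr, maths, 216757, 89
2/2/2013, carl jon, botany, 218382, 78
2/2/2013, bryan carr, biology, 216757, 27
"""

def sum_marks(lines):
    """Sum the 'mark' column per 'sid' for an iterable of CSV lines."""
    totals = defaultdict(int)
    # skipinitialspace drops the space that follows each comma.
    reader = csv.DictReader(lines, skipinitialspace=True)
    for row in reader:
        try:
            sid = int(row["sid"])
            mark = int(row["mark"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric fields
        totals[sid] += mark
    return totals

totals = sum_marks(io.StringIO(SAMPLE))
print(dict(totals))
```

To read from a file instead, an open file object can be passed in place of the io.StringIO wrapper, since csv.DictReader accepts any iterable of lines. Looking columns up by name also means the columns in between are skipped without counting positions.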
