Python CSV - Need to sum up values in a column grouped by value in another column
I have data in a CSV file that needs to be parsed. It looks like this:
```
date, name, subject, sid, mark
2/2/2013, andy cole, history, 216351, 98
2/2/2013, andy cole, maths, 216351, 87
2/2/2013, andy cole, science, 217387, 21
2/2/2013, bryan carr, maths, 216757, 89
2/2/2013, carl jon, botany, 218382, 78
2/2/2013, bryan carr, biology, 216757, 27
```
I need to use sid as the key and sum the values in the mark column for each key. The output should look like this:
```
sid mark
216351 185
217387 21
216757 116
218382 78
```
I don't have to write the output to a file; I just need to see it when I execute the Python file. There is a similar question, but how should its solution be changed to skip the columns in between?
This is essentially the concept of a histogram. Use `defaultdict(int)` from `collections` and iterate through the rows. Use the 'sid' value as the key in the dict, and add the 'mark' value to the current value for that key.

A `defaultdict` of type `int` makes sure that if a key does not exist yet, its value is initialized to 0.
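A minimal sketch of that behaviour: `defaultdict(int)` calls `int()` for a missing key, and `int()` returns 0, so `+=` works even the first time a sid is seen. The plain-dict lines are only there for comparison; the values are taken from the question's data for andy cole's sid.

```python
from collections import defaultdict

# defaultdict(int) creates missing keys with int(), i.e. 0
d = defaultdict(int)
print(d["never_seen"])  # 0: the key is created on first access
d[216351] += 98         # no KeyError: starts at 0, then adds 98
d[216351] += 87
print(d[216351])        # 185

# With a plain dict the default has to be supplied explicitly
plain = {}
plain[216351] = plain.get(216351, 0) + 98
print(plain[216351])    # 98
```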
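```python
from collections import defaultdict

d = defaultdict(int)

with open("data.txt") as f:
    for line in f:
        # split the line on commas and strip the surrounding whitespace
        tokens = [t.strip() for t in line.split(",")]
        try:
            sid = int(tokens[3])   # 4th column: sid
            mark = int(tokens[4])  # 5th column: mark
        except (ValueError, IndexError):
            # skip the header line, blank lines and malformed rows
            continue
        d[sid] += mark

print(d)
```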
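Output (Python 2; under Python 3.7+ the type is shown as `<class 'int'>` and the keys appear in the order they were first inserted):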
defaultdict(<type 'int'>, {217387: 21, 216757: 116, 218382: 78, 216351: 185})
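The question asked for a two-column `sid mark` listing rather than the `defaultdict` repr shown above. A small sketch of printing the totals that way, assuming Python 3.7+ (where dicts, including `defaultdict`, keep insertion order, so each sid appears in the order it was first seen). The `(sid, mark)` pairs are hardcoded from the question's sample data instead of being read from a file.

```python
from collections import defaultdict

# (sid, mark) pairs from the question's sample data
pairs = [(216351, 98), (216351, 87), (217387, 21),
         (216757, 89), (218382, 78), (216757, 27)]

d = defaultdict(int)
for sid, mark in pairs:
    d[sid] += mark

# header line followed by one "sid total" line per key
lines = ["sid mark"] + [f"{sid} {total}" for sid, total in d.items()]
print("\n".join(lines))
```

If a numeric ordering is preferred instead, iterating over `sorted(d.items())` would list the sids in ascending order.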
You can change the parsing part to something else (e.g. use `csv.reader` or perform other validations). The key point here is to use `defaultdict(int)` and update it like so:

```python
d[sid] += mark
```
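A minimal sketch of the `csv`-module variant, which also answers how to skip the columns in between: `csv.DictReader` lets you pick the `sid` and `mark` columns by header name, so `date`, `name` and `subject` are simply never read. It assumes the file has a header row like the question's; `sum_marks_by_sid` is a hypothetical helper name, and `io.StringIO` stands in for the real file so the example is self-contained.

```python
import csv
import io
from collections import defaultdict

# Sample data from the question; for a real file use
# open("data.txt", newline="") instead of io.StringIO(data).
data = """date, name, subject, sid, mark
2/2/2013, andy cole, history, 216351, 98
2/2/2013, andy cole, maths, 216351, 87
2/2/2013, andy cole, science, 217387, 21
2/2/2013, bryan carr, maths, 216757, 89
2/2/2013, carl jon, botany, 218382, 78
2/2/2013, bryan carr, biology, 216757, 27
"""

def sum_marks_by_sid(f):
    """Sum the 'mark' column grouped by 'sid', ignoring all other columns."""
    totals = defaultdict(int)
    # skipinitialspace=True drops the space after each comma, so the
    # header fields become 'sid' and 'mark' rather than ' sid' and ' mark'
    reader = csv.DictReader(f, skipinitialspace=True)
    for row in reader:
        # only the two needed columns are looked up by name
        totals[int(row["sid"])] += int(row["mark"])
    return totals

totals = sum_marks_by_sid(io.StringIO(data))
print(dict(totals))
```

Selecting columns by name rather than by index (`tokens[3]`, `tokens[4]`) keeps the loop working if columns are later added or reordered.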