memory - oversized Python dictionary


I have large CSV files (around 15e6 rows each).

Each row looks like: id_user|id_software.

My aim is to build a dictionary whose keys are tuples of 2 distinct softwares and whose values are the probability that these 2 softwares are installed on the same computer (a computer = id_user).

The first step consists in reading the CSV file and building a dictionary whose keys are user ids and whose values are tuples containing the softwares installed on that user's computer.

The second step creates the final dictionary by reading the first one.
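
For illustration only (the ids below are hypothetical, not taken from the real file), the two structures are meant to look like this:

    # step 1: user id -> tuple of softwares installed on that user's computer
    dict_apparition = {1: (10, 11, 12), 2: (10, 11)}
    nb_user = 2

    # step 2: pair of softwares -> probability that both are on the same computer
    # (10, 11) appears for 2 users out of 2; (10, 12) and (11, 12) for 1 user out of 2
    dict_prob_jointe = {(10, 11): 1.0, (10, 12): 0.5, (11, 12): 0.5}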

My problem is that the first 5e5 lines of the CSV file already produce a 1 GB dictionary (I use heapy to profile my algorithm).
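
For reference, the profiling is done with heapy; a minimal sketch of the kind of call used (assuming the standard guppy entry point) looks like this:

    from guppy import hpy

    hp = hpy()
    # ... build the dictionaries here ...
    print hp.heap()  # prints the total heap size and a breakdown by object type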

Any idea to help me solve this issue?

Here is the code:

    import csv
    import itertools

    dict_apparition = {}
    dict_prob_jointe = {}
    nb_user = 0

    with open('discovery_requests_id.csv', 'rU') as f:
        f = csv.reader(f)
        f.next()  # skip the header line
        for row in f:
            a = int(row[1])
            b = int(row[0])
            try:
                # add the software to the user's tuple if not already there
                if a not in dict_apparition[b]:
                    dict_apparition[b] += (a,)
            except KeyError:
                # first software seen for this user
                dict_apparition[b] = (a,)
                nb_user += 1

    # second step: count each pair of softwares seen on the same computer
    for key1 in dict_apparition.keys():
        l = itertools.combinations(dict_apparition[key1], 2)
        for item in l:
            try:
                dict_prob_jointe[item] += 1.0 / nb_user
            except KeyError:
                dict_prob_jointe[item] = 1.0 / nb_user

    return dict_prob_jointe  # the code apparently lives inside a function

Thanks!

