Memory: oversized Python dictionary
I have a large CSV file (around 15e6 rows).
Each row looks like: id_user|id_software.
My aim is to build a dictionary whose keys are tuples of 2 distinct softwares and whose values are the probability that these 2 softwares are installed on the same computer (a computer = an id_user).
The first step consists in reading the CSV file and building a dictionary whose keys are user ids and whose values are tuples containing the softwares installed on that user's computer.
The second step creates the final dictionary by reading the first one.
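For illustration, here is a small toy example of the two dictionaries I am describing (the data is made up, not from my real file):

# Step 1: user id -> tuple of softwares installed on that user's computer.
dict_apparition = {
    1: (10, 20, 30),   # user 1 has softwares 10, 20 and 30
    2: (10, 20),       # user 2 has softwares 10 and 20
}
nb_user = 2

# Step 2: (software_a, software_b) -> probability that both are installed
# on the same computer, i.e. number of users having both / nb_user.
dict_prob_jointe = {
    (10, 20): 1.0,   # both users have 10 and 20
    (10, 30): 0.5,   # only user 1 has 10 and 30
    (20, 30): 0.5,   # only user 1 has 20 and 30
}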
My problem is that the first 5e5 lines of the CSV file already give birth to a 1 GB dictionary (I use heapy to profile my algorithm).
Any idea to help me solve this issue?
Here is the code:
import csv
import itertools

def compute_probabilities():
    dict_apparition = {}
    dict_prob_jointe = {}
    nb_user = 0
    with open('discovery_requests_id.csv', 'rU') as f:
        f = csv.reader(f)
        f.next()  # skip the header row
        for row in f:
            a = int(row[1])  # id_software
            b = int(row[0])  # id_user
            try:
                if a not in dict_apparition[b]:
                    dict_apparition[b] += (a,)
            except KeyError:
                dict_apparition[b] = (a,)
                nb_user += 1  # first time this user is seen
    for key1 in dict_apparition.keys():
        l = itertools.combinations(dict_apparition[key1], 2)
        for item in l:
            try:
                dict_prob_jointe[item] += 1.0 / nb_user
            except KeyError:
                dict_prob_jointe[item] = 1.0 / nb_user
    return dict_prob_jointe
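And this is roughly how I profile the memory with heapy (a minimal sketch, assuming the guppy package that provides heapy, and calling the function above):

from guppy import hpy

hp = hpy()
dict_prob_jointe = compute_probabilities()  # runs the code above
print hp.heap()  # prints a breakdown of the objects currently on the heap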
Thanks!