On CPU cache access


I decided to experiment with the code examples in the paper What Every Programmer Should Know About Memory (PDF), and saw a different result in section 6.2.

The programs do matrix multiplications. I have made a few small changes to them and put them at https://github.com/herberteuler/cpumemory.

The paper says that going from matrix1.c to matrix2.c, i.e. transposing the second matrix, saves 76.6% of the CPU cycles (section 6.2.1, page 50):

                Original          Transposed
    Cycles      16,765,297,870    3,922,373,010
    Relative    100%              23.4%
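For reference, the two loop orders being compared can be sketched in C as below. This is a minimal sketch written from the paper's description, not the code in the linked repository; the size N = 64 and the function names multiply_naive, transpose, and multiply_transposed are hypothetical. The only difference between the two versions is that the transposed one first copies mul2 into tmp, so the innermost loop reads both operands along a row instead of stepping down a column of mul2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical small size for this sketch only. */
#define N 64

/* Naive order: the inner k loop walks mul2 down a column, so successive
   loads of mul2 are N * sizeof(double) bytes apart. */
void multiply_naive(double res[N][N], double mul1[N][N], double mul2[N][N]) {
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            for (size_t k = 0; k < N; ++k)
                res[i][j] += mul1[i][k] * mul2[k][j];
}

/* Copy src into dst with rows and columns swapped. dst must not alias src,
   because this loop would overwrite entries before reading them. */
void transpose(double dst[N][N], double src[N][N]) {
    assert(dst != src);
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            dst[i][j] = src[j][i];
}

/* Transposed order: after transposing mul2 into tmp, the inner k loop reads
   both mul1 and tmp along a row, i.e. at consecutive addresses. */
void multiply_transposed(double res[N][N], double mul1[N][N],
                         double mul2[N][N]) {
    static double tmp[N][N]; /* scratch copy of mul2, transposed */
    transpose(tmp, mul2);
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            for (size_t k = 0; k < N; ++k)
                res[i][j] += mul1[i][k] * tmp[j][k];
}
```

Both multiply functions add into res, so res must start zeroed (static arrays are zero-initialized in C). The transposed version pays for an extra N-by-N copy up front in exchange for unit-stride reads in the inner loop.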

But on my machine, the result is different from the table above.

This is my kernel and CPU information:

    $ uname -a
    Linux herberteuler 3.9-1-amd64 #1 SMP Debian 3.9.8-1 x86_64 GNU/Linux
    $ cat /proc/cpuinfo
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 23
    model name      : Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz
    stepping        : 10
    microcode       : 0xa07
    cpu MHz         : 2401.000
    cache size      : 3072 KB
    physical id     : 0
    siblings        : 2
    core id         : 0
    cpu cores       : 2
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm tpr_shadow vnmi flexpriority
    bogomips        : 4788.33
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:

    processor       : 1
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 23
    model name      : Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz
    stepping        : 10
    microcode       : 0xa07
    cpu MHz         : 800.000
    cache size      : 3072 KB
    physical id     : 0
    siblings        : 2
    core id         : 1
    cpu cores       : 2
    apicid          : 1
    initial apicid  : 1
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm tpr_shadow vnmi flexpriority
    bogomips        : 4788.33
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
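The clflush size and cache_alignment lines above report a 64-byte cache line. As a rough worked example, the helper below counts how many distinct lines a loop touches when it reads n doubles at a given stride. The name lines_touched and its assumptions (cold cache, line-aligned start, stride a multiple of sizeof(double)) are hypothetical simplifications, not a model of the Core 2's actual cache behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Cache line size reported as "clflush size" / "cache_alignment" above. */
#define LINE_BYTES 64

/* Number of distinct cache lines touched when reading n doubles whose
   addresses are stride_bytes apart. Assumes the first element starts on a
   line boundary and the stride is a multiple of sizeof(double), so no
   element straddles two lines. */
size_t lines_touched(size_t n, size_t stride_bytes) {
    assert(stride_bytes % sizeof(double) == 0);
    if (n == 0)
        return 0;
    /* Stride of a full line or more: every element sits in its own line. */
    if (stride_bytes >= LINE_BYTES)
        return n;
    /* Smaller stride: every line between the first and last element is hit,
       so count the lines spanned by the whole range. */
    size_t span = (n - 1) * stride_bytes + sizeof(double);
    return (span + LINE_BYTES - 1) / LINE_BYTES;
}
```

For a hypothetical N = 1000, one pass of the inner loop over a column of mul2 (stride N * sizeof(double) = 8000 bytes) touches 1000 lines, while the same pass over a row of the transposed copy touches 125.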

And here is the result of executing matrix1 and matrix2:

    $ ./matrix1
    cpu cycles: 18071621964
    $ ./matrix2
    cpu cycles: 15716582775
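To put these two runs side by side with the paper's table, the cycle counts can be expressed relative to the naive version. relative_percent is a hypothetical helper; its inputs are the numbers printed above and the ones in the table from section 6.2.1.

```c
#include <assert.h>

/* Cycles of `other` as a percentage of `base`. */
double relative_percent(unsigned long long base, unsigned long long other) {
    assert(base > 0);
    /* Counts of this size (well below 2^53) convert to double exactly. */
    return 100.0 * (double)other / (double)base;
}
```

With these inputs, matrix2 uses roughly 87.0% of matrix1's cycles (a saving of about 13%), whereas the paper's table gives 23.4% for the transposed version.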

Why don't I see a huge reduction in CPU cycles for matrix2, as expected?

Thanks in advance.

