performance - Why does Haskell perform so poorly when executing C-like codes? (in this instance at least) -
i trying figure out performance issues have been having haskell. part of that, have written small comparison program compare c , haskell. specifically, translated c program haskell few changes could. speed-measured part of haskell program written in imperative style.
the program makes 2 list of random numbers in range, , calculates integral of graph formed connecting points, 1 list being x-values , 1 list being y-values. essentially, trapezoidal rule.
here 2 codes:
main.c
#include <stdio.h> #include <stdlib.h> #include <time.h> #define n 5000000 #define maxy 1e5f/n #define maxxgap 1 int main(){ int i; float *y, *x; float xaccum, area; clock_t begin, end; double time_spent; y = (float*)malloc(sizeof(float)*n); x = (float*)malloc(sizeof(float)*n); srand(50546345); // change seed different numbers //populate y , x fields random points for(i = 0; < n; i++){ y[i] = ((float)rand())/((float)rand_max)*maxy; } xaccum = 0; for(i = 0; < n; i++){ x[i] = xaccum; xaccum += ((float)rand())/((float)rand_max)*maxxgap; } begin = clock(); //perform trapezoidal integration using x y coordinates area = 0; for(i = 0; < n-1; i++){ area += (y[i+1]+y[i])/2*(x[i+1]-x[i]); } end = clock(); time_spent = (double)(end - begin) / clocks_per_sec * 1000; printf("%i points\n%f area\n%f ms\n", n, area, time_spent); }
main.hs
{-# language bangpatterns #-} module main import data.array.unboxed import data.array.io import data.list import system.random import system.cputime import text.printf import control.exception main :: io () main = (x,y) <- initarrays area <- time $ integrate x y print area n :: int n = 5000000 maxy :: float maxy = 100000.0/(fromintegral n) maxxgap :: float maxxgap = 1 --initialize arrays random floats --this part not measured in running time (very slow) initarrays :: io (iouarray int float, iouarray int float) initarrays = y <- newlistarray (0,n-1) (randomlist maxy n (mkstdgen 23432)) x <- newlistarray (0,n-1) (scanl1 (+) $ randomlist maxxgap n (mkstdgen 5462)) return (x,y) randomlist :: float -> int -> stdgen -> [float] randomlist max n gen = map (abs . ((*) max)) (take n . unfoldr (just . random) $ gen) integrate :: iouarray int float -> iouarray int float -> io float integrate x y = iterative x y 0 0 iterative :: iouarray int float -> iouarray int float -> int -> float -> io float iterative x y !i !accum = if == n-1 return accum else x1 <- readarray x x2 <- readarray x (i+1) y1 <- readarray y y2 <- readarray y (i+1) iterative x y (i+1) (accum + (y2+y1)/2*(x2-x1)) time :: io t -> io t time = start <- getcputime v <- end <- getcputime let diff = (fromintegral (end-start)) / (10^9) printf "computation time %0.5f ms\n" (diff :: double) return v
the c integration runs in 7 ms , haskell integration in 60 ms on system. of course haskell version slower, wondering why slower. there lot of inefficiency in haskell code.
why haskell code slower? how 1 fix it?
thanks answers.
out of curiosity, ran llvm:
ghc test.hs -o2 -xbangpatterns -fllvm -optlo-o3
and took down 60ms 24ms. still not ideal.
so, 1 of first things i'll when want know why benchmark slow, dump prepared core. is, core after optimisations.
ghc test.hs -o2 -ddump-prep -dsuppress-all -xbangpatterns > test.hscore
after looking through core, found $wa, iterative loop defined. turns out it's making surprisingly many index bound checks. see, use data.vector.unboxed, has "unsaferead" , "unsafeindex" functions, remove bounds checks. these useful here. personally, think vector package superior.
if @ $wa, you'll notice it's unboxing arguments @ start:
case w_s3o9 of _ { stuarray l_s3of u_s3oi ds1_s3ol ds2_s3oh -> case l_s3of of wild2_s3os { i# m_s3oo -> case u_s3oi of wild3_s3ot { i# n1_s3ov -> case ds1_s3ol of wild4_s3oc { i# y1_s3oe ->
this looks bad, turns out in recursive call it's using specialised version, integrate_$s$wa, unboxed integers etc. good.
in summary, think should improvement using vector unsafe indexing.
edit: here modified version data.vector. runs in 7ms. good vector code, think slowness compared c should due incomplete alias analysis. https://gist.github.com/amosr/6026995
Comments
Post a Comment