Make Hadoop split LZO input files
I'm using hadoop-lzo to run MapReduce jobs on large compressed data. The jobs are automatically generated by a tool, but that should not matter here. LZO compression works on the nodes (I tried DistributedLzoIndexer), and I can use streaming on split LZO files from the command line:
hadoop jar /path/to/jar/hadoop-streaming-1.2.0.1.3.0.0-107.jar \
    -input /path/to/testfile.lzo -output wc_test -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat \
    -mapper 'cat' -reducer 'wc -l'

It creates 11 map tasks (based on the file size, I guess) and processes the file normally. When I try with another jar file, the LZO file is processed using only 1 map task. My questions are:
Does Hadoop normally choose the input format according to the compression codec? hadoop-lzo-0.4.3.jar is in the path, so I do not understand why it still uses the default text input format.
Is there a way of forcing Hadoop to use LzoTextInputFormat?
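
To make the second question concrete, here is a minimal sketch of one way the input format might be forced from the command line without touching the job code. It rests on several assumptions: the generated job uses the old mapred API, it is launched through ToolRunner/GenericOptionsParser (so that -D options are honored), and it does not set its own input format explicitly. The jar path /path/to/your-job.jar and the class com.example.YourJob are hypothetical placeholders. The function only prints the command so it can be checked before running it.

```shell
# Input format class from hadoop-lzo for the old (mapred) API,
# the same one passed to -inputformat in the streaming command above.
LZO_INPUT_FORMAT="com.hadoop.mapred.DeprecatedLzoTextInputFormat"

# Build (but do not run) a job command line that overrides the
# old-API input format property, mapred.input.format.class.
# Assumption: the job parses generic options (ToolRunner) and does
# not hard-code its input format.
# $1 = job jar, $2 = main class, $3 = input path, $4 = output path
lzo_job_cmd() {
  printf 'hadoop jar %s %s -D mapred.input.format.class=%s %s %s\n' \
    "$1" "$2" "$LZO_INPUT_FORMAT" "$3" "$4"
}

# Hypothetical jar and class names; replace with the generated job's.
lzo_job_cmd /path/to/your-job.jar com.example.YourJob /path/to/testfile.lzo wc_test
```

If the job is written against the new mapreduce API instead, the property name differs, and the class to try would be the new-API one from hadoop-lzo rather than the Deprecated one.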
Thanks for reading.