Make Hadoop split LZO input files


I'm using hadoop-lzo to process MapReduce jobs on large compressed data. The jobs are generated automatically by a tool, but that doesn't matter here. LZO compression works on the node (I tried it with DistributedLzoIndexer), and I can run streaming on split LZO files from the command line:

hadoop jar /path/to/jar/hadoop-streaming-1.2.0.1.3.0.0-107.jar \
    -input /path/to/testfile.lzo -output wc_test \
    -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat \
    -mapper 'cat' -reducer 'wc -l'

It creates 11 map tasks (based on the file size, I guess) and processes them normally. When I try with another jar file, the LZO file is processed using only 1 map. My questions are:

Does Hadoop normally choose the input format according to the compression codec? hadoop-lzo-0.4.3.jar is in the path, and I do not understand why it still uses the default TextInputFormat.
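To make the 11-versus-1 difference concrete, here is a rough sketch of how the map count relates to splittability. It assumes that a splittable file gets about one map per split and that a compressed file read without a split-aware input format gets a single map. The function name `estimate_maps`, the 700 MB file size, and the 64 MB split size are hypothetical values chosen for illustration, not figures from the job above; with an indexed LZO file the real split boundaries follow the LZO blocks, so the actual count may differ slightly.

```shell
# Estimate the number of map tasks for one input file.
# Arguments: file size in MB, split size in MB, splittable flag (1 or 0).
estimate_maps() {
  size_mb=$1
  split_mb=$2
  splittable=$3
  if [ "$splittable" -eq 1 ]; then
    # Ceiling division: a partial last split still gets its own map.
    echo $(( (size_mb + split_mb - 1) / split_mb ))
  else
    # A non-splittable file is read from start to end by a single map.
    echo 1
  fi
}

# Hypothetical numbers: a 700 MB file with 64 MB splits.
estimate_maps 700 64 1   # read as a splittable (indexed) file
estimate_maps 700 64 0   # read as one non-splittable compressed stream
```

With these assumed sizes the two calls print 11 and 1, which matches the pattern described above: the same file yields many maps or one map depending only on whether the input format can split it.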

Is there a way of forcing Hadoop to use LzoTextInputFormat?
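One possible way to do this, sketched under assumptions, is to set the input format through a job property on the command line, mirroring what `-inputformat` does for streaming. This only has a chance of working if the other jar's main class parses generic options (for example through ToolRunner) and uses the old `mapred` API, where the input format is stored under `mapred.input.format.class`; neither is known for the generated jobs. The jar path `/path/to/other-job.jar` is a hypothetical placeholder, and the job's own arguments are left out.

```shell
# Hypothetical placeholder: replace with the jar that currently runs with 1 map.
JOB_JAR=/path/to/other-job.jar

# Old-API (mapred) input format class shipped with hadoop-lzo.
INPUT_FORMAT=com.hadoop.mapred.DeprecatedLzoTextInputFormat

# Assumption: the job's main class honors generic -D options
# (e.g. it is launched through ToolRunner). If it does not, the
# property is ignored and the job's code must set the input format itself.
CMD="hadoop jar $JOB_JAR -D mapred.input.format.class=$INPUT_FORMAT"

# Print the command for review; append the job's own arguments before running it.
echo "$CMD"
```

If the job overrides the input format in its own code, or uses the newer `mapreduce` API, this property will not take effect and the input format has to be changed where the job is configured.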

Thanks for reading.

