Make Hadoop split LZO input files

September 15, 2015

I'm using hadoop-lzo to run MapReduce jobs on large compressed data. The jobs are generated automatically by a tool, but that doesn't matter here. LZO compression works on the nodes (I tried DistributedLzoIndexer), and I can run streaming on split LZO files from the command line:

hadoop jar /path/to/jar/hadoop-streaming-1.2.0.1.3.0.0-107.jar \
    -input /path/to/testfile.lzo \
    -output wc_test \
    -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat \
    -mapper 'cat' \
    -reducer 'wc -l'

This creates 11 map tasks (based on the file size, I guess) and processes the file normally. When I try the other jar file, the same LZO file is processed by only one map task. My questions are:

Doesn't Hadoop normally choose the input format according to the compression codec? hadoop-lzo-0.4.3.jar is on the classpath, so I don't understand why it still uses the default TextInputFormat.

Is there a way to force Hadoop to use LzoTextInputFormat?
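The 11-versus-1 difference is consistent with how Hadoop decides splits: the input format, not the codec, decides whether a file may be split. The default TextInputFormat does not split input compressed with a non-splittable codec such as LZOP, while hadoop-lzo's LZO input formats can split a .lzo file that has an .index file (the kind DistributedLzoIndexer writes). Below is a minimal Python model of that, a sketch under assumptions rather than Hadoop's code: count_splits and compute_split_size are hypothetical helpers that roughly mirror the split-slop loop in FileInputFormat.getSplits, and the 700 MB file and 64 MB block size are made-up numbers. It ignores how LzoTextInputFormat moves split boundaries onto LZO block boundaries, so real counts can differ slightly.

```python
# A rough model (hypothetical helpers, not Hadoop's source) of how
# FileInputFormat turns one input file into splits, i.e. map tasks.

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """max(minSize, min(maxSize, blockSize)), as in new-API FileInputFormat."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_length: int, block_size: int, splittable: bool = True,
                 min_size: int = 1, max_size: int = 2**63 - 1) -> int:
    """Number of splits (map tasks) produced for a single input file."""
    if file_length == 0 or not splittable:
        # An empty file still yields one split, and a non-splittable
        # file (e.g. .lzo read via TextInputFormat) goes whole to one mapper.
        return 1
    split_size = compute_split_size(block_size, min_size, max_size)
    remaining = file_length
    splits = 0
    # Cut full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # The leftover (at most 1.1 * split_size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    # Hypothetical numbers: a 700 MB .lzo file stored with 64 MB blocks.
    print("indexed, LZO-aware input format:", count_splits(700 * MB, 64 * MB))
    print("default TextInputFormat:",
          count_splits(700 * MB, 64 * MB, splittable=False))
```

With these made-up sizes the script prints 11 for the splittable case and 1 for the non-splittable case, which matches the pattern of 11 maps with the LZO input format and a single map with the default one.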

Thanks for reading.
