hadoop - Flume NG and HDFS -

February 15, 2012

I am new to Hadoop, so please excuse the dumb questions.

Here is what I understand so far: Hadoop works best with large files, because they make MapReduce tasks run more efficiently.

Keeping that in mind, I am somewhat confused about Flume NG. Assume I am tailing a log file and logs are produced every second. The moment the log gets a new line, it is transferred to HDFS via Flume.

a) Does this mean that Flume creates a new file for every line written to the log file I am tailing, or does it append to an existing HDFS file?

b) Is append allowed in HDFS in the first place?

c) If the answer to b) is yes, i.e. contents are appended constantly, how and when should I run my MapReduce application?

These questions may sound silly, but answers to them would be highly appreciated.

PS: I have not set up Flume NG or Hadoop yet. I am just reading articles to understand them and how they could add value to my company.

Flume writes to HDFS by means of the HDFS sink. When Flume starts and begins to receive events, the sink opens a new file and writes events into it. At some point the opened file has to be closed, and until then the data in the block currently being written is not visible to other readers.

As described in the documentation, the Flume HDFS sink has several file-closing strategies:

every N seconds (specified by the rollInterval option)
after writing N bytes (rollSize option)
after writing N received events (rollCount option)
after N seconds of inactivity (idleTimeout option)
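
To make the four strategies concrete, here is a minimal Python sketch of how such a close decision could be modelled. It is not Flume's code: the class RollPolicy, its field names, and the method close_reason are hypothetical, chosen only to mirror the option names listed above, and the sketch assumes that a value of 0 switches a strategy off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RollPolicy:
    """Hypothetical model of the four file-closing strategies.

    Assumption in this sketch: a value of 0 disables that strategy.
    """
    roll_interval: float = 0.0  # seconds since the file was opened
    roll_size: int = 0          # bytes written to the current file
    roll_count: int = 0         # events written to the current file
    idle_timeout: float = 0.0   # seconds since the last write

    def close_reason(self, opened_at: float, last_write_at: float,
                     bytes_written: int, events_written: int,
                     now: float) -> Optional[str]:
        """Return the name of the first strategy that fires, or None."""
        if self.roll_interval > 0 and now - opened_at >= self.roll_interval:
            return "rollInterval"
        if self.roll_size > 0 and bytes_written >= self.roll_size:
            return "rollSize"
        if self.roll_count > 0 and events_written >= self.roll_count:
            return "rollCount"
        if self.idle_timeout > 0 and now - last_write_at >= self.idle_timeout:
            return "idleTimeout"
        return None


# Illustrative values only: close every 5 minutes, at 1000 bytes,
# or after 60 idle seconds; the event-count strategy is disabled.
policy = RollPolicy(roll_interval=300, roll_size=1000, idle_timeout=60)
reason = policy.close_reason(opened_at=0, last_write_at=10,
                             bytes_written=100, events_written=5, now=20)
print(reason)
```

For the concern about small files, the point is that larger thresholds mean fewer, bigger HDFS files, at the cost of the data becoming visible later.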

So, to your questions:

a) Flume writes events to the currently opened file until it is closed (and a new file is opened).

b) Append is allowed in HDFS, but Flume does not use it. After a file is closed, Flume does not append any data to it.

c) To hide the currently opened file from your MapReduce application, use the inUsePrefix option. Files whose names start with . are not visible to MR jobs.
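
The effect of the prefix can be sketched in a few lines of Python. This only imitates the naming rule: MapReduce's default file input handling skips files whose names begin with "." or "_". The function name visible_to_mapreduce and the paths in the listing are made up for illustration.

```python
import posixpath


def visible_to_mapreduce(path: str) -> bool:
    """Imitate the hidden-file rule: names starting with '.' or '_' are skipped."""
    name = posixpath.basename(path)
    return not name.startswith((".", "_"))


# Hypothetical directory listing: one file still being written
# (given a "." prefix while in use) and one file already closed.
listing = [
    "/flume/events/.FlumeData.1329300000000.tmp",
    "/flume/events/FlumeData.1329299000000",
]

ready = [p for p in listing if visible_to_mapreduce(p)]
print(ready)
```

With this in place a job can be pointed at the whole directory at any time, since it will only pick up files that Flume has already closed.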
