hadoop - Flume NG and HDFS
I am new to Hadoop, so please excuse what are probably dumb questions.
My understanding so far is that the best use case for Hadoop is large files, which help efficiency when running MapReduce tasks.
Keeping the above in mind, I am somewhat confused about Flume NG. Assume I am tailing a log file and logs are produced every second; the moment the log gets a new line, it is transferred to HDFS via Flume.
a) Does this mean that Flume creates a new file for every line that is logged in the log file I am tailing, or does it append to an existing HDFS file?
b) Is append allowed in HDFS in the first place?
c) If the answer to b is yes, i.e. contents are appended constantly, how and when should I run my MapReduce application?
The above questions may sound silly, but answers to them would be highly appreciated.
PS: I have not yet set up Flume NG or Hadoop; I am just reading articles to understand how it could add value to my company.
Flume writes to HDFS by means of the HDFS sink. When Flume starts and begins to receive events, the sink opens a new file and writes events into it. At some point the opened file should be closed, and until then the data in the current block being written is not visible to other readers.
As described in the documentation, the Flume HDFS sink has several file closing strategies:
- every N seconds (specified by the hdfs.rollInterval option)
- after writing N bytes (the hdfs.rollSize option)
- after writing N received events (the hdfs.rollCount option)
- after N seconds of inactivity (the hdfs.idleTimeout option)
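As a rough illustration of the options listed above, here is a minimal sink configuration sketch. The agent, sink, and channel names (a1, k1, c1), the path, and all the values are assumptions chosen for the example, not recommendations; setting a roll option to 0 disables that trigger.

```properties
# Hypothetical names: agent a1, sink k1, channel c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# Hypothetical target directory in HDFS
a1.sinks.k1.hdfs.path = /flume/events

# Close the current file and open a new one every 300 seconds ...
a1.sinks.k1.hdfs.rollInterval = 300
# ... or once about 128 MB have been written, whichever comes first
a1.sinks.k1.hdfs.rollSize = 134217728
# 0 disables rolling by event count
a1.sinks.k1.hdfs.rollCount = 0
# Close a file that has received no events for 60 seconds
a1.sinks.k1.hdfs.idleTimeout = 60
```

With settings along these lines, a log that produces one line per second ends up in a small number of larger files rather than one file per line, which fits the "large files" use case better.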
So, to answer your questions:
a) Flume writes events to the currently opened file until it is closed (and a new file is opened).
b) Append is allowed in HDFS, but Flume does not use it. After a file is closed, Flume does not append any data to it.
c) To hide the currently open file from a MapReduce application, use the hdfs.inUsePrefix option: files whose names start with . are not visible to MR jobs.
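To make point c) concrete, here is a small Python sketch that mimics the rule MapReduce's default input filter applies: names starting with "_" or "." are skipped. It is only a simulation of that rule, not Hadoop code, and the file names are made up, assuming the sink is configured with hdfs.inUsePrefix set to a dot.

```python
def visible_to_mapreduce(file_name: str) -> bool:
    """Mimic the default hidden-file rule: skip names starting with '_' or '.'."""
    return not file_name.startswith(("_", "."))


# Hypothetical directory listing: one file still being written by Flume
# (dot prefix from inUsePrefix), one closed file, and a marker file.
listing = [
    ".FlumeData.1700000000000.tmp",
    "FlumeData.1699999990000",
    "_SUCCESS",
]

job_inputs = [name for name in listing if visible_to_mapreduce(name)]
print(job_inputs)  # ['FlumeData.1699999990000']
```

So a job can be run against the sink's output directory at any time: it will only pick up files that Flume has already closed and renamed.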