Thursday 26 January 2017

Flume Tutorial

Download the Hortonworks Sandbox and import it into a VM.

Once it is imported, start the VM and, when the sandbox is ready, open the URL shown on screen.
[Screenshot: Hortonworks Sandbox URL]

Enter the above URL in a browser and open Ambari to make sure the Flume service is running.


[Screenshot: Flume service check]

Connect to the sandbox from any SSH client such as PuTTY (I've used MobaXterm here).

[Screenshot: HWX sandbox connection from SSH client]

Create a config file as below in any directory of your choice (the command further down assumes /root/flume_conf.conf).
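
The configuration itself appears only in the screenshot, so its exact contents are not reproduced here. As a rough sketch of what a spooling-directory-to-HDFS configuration for an agent named agt1 could look like, something along these lines should work. The component names (src1, ch1, sink1), the spool directory and the HDFS path are all assumptions made for illustration; only the agent name and the /root/flume_conf.conf location come from this post.

```shell
#!/bin/sh
# Sketch: write a minimal Flume agent config (spooldir source -> memory channel -> HDFS sink).
# CONF_FILE, SPOOL_DIR and HDFS_DIR are illustrative defaults, not values from the post.
# Set CONF_FILE=/root/flume_conf.conf to match the flume-ng command used in this post.
CONF_FILE="${CONF_FILE:-./flume_conf.conf}"
SPOOL_DIR="${SPOOL_DIR:-/root/flume_spool}"      # assumed local spool directory
HDFS_DIR="${HDFS_DIR:-/user/root/flume_sink}"    # assumed HDFS sink directory

cat > "$CONF_FILE" <<EOF
# Name the components of agent agt1 (src1, ch1, sink1 are made-up names)
agt1.sources = src1
agt1.channels = ch1
agt1.sinks = sink1

# Source: watch a local directory for new files
agt1.sources.src1.type = spooldir
agt1.sources.src1.spoolDir = $SPOOL_DIR
agt1.sources.src1.channels = ch1

# Channel: buffer events in memory
agt1.channels.ch1.type = memory
agt1.channels.ch1.capacity = 1000
agt1.channels.ch1.transactionCapacity = 100

# Sink: write events to HDFS as plain text
agt1.sinks.sink1.type = hdfs
agt1.sinks.sink1.hdfs.path = $HDFS_DIR
agt1.sinks.sink1.hdfs.fileType = DataStream
agt1.sinks.sink1.hdfs.writeFormat = Text
agt1.sinks.sink1.channel = ch1
EOF

echo "Wrote $CONF_FILE"
```

Note that a source lists its channels with the plural key `channels`, while a sink takes a single `channel`. The spool directory has to exist before the agent starts.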

[Screenshot: Flume configuration]

Change the directory to /usr/hdp/current/flume-server/bin

Run the command below to start the Flume agent (agt1, as named in the conf file above) and connect it to its channel and sink.
flume-ng agent --conf conf --conf-file /root/flume_conf.conf --name agt1 -Dflume.root.logger=INFO,console
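
The value passed to --name has to match the prefix used in the conf file, otherwise the agent starts with no sources or sinks. A small wrapper can catch that before launching; the sketch below is only an illustration, and `start_agent` is a hypothetical helper, not something shipped with Flume.

```shell
#!/bin/sh
# Sketch: validate the config file before handing it to flume-ng.
# start_agent is a hypothetical helper, not part of Flume.
start_agent() {
    conf_file="$1"
    agent_name="$2"

    if [ ! -f "$conf_file" ]; then
        echo "Config file not found: $conf_file" >&2
        return 1
    fi

    # The agent name must appear as the property prefix in the config file.
    if ! grep -q "^${agent_name}\.sources" "$conf_file"; then
        echo "Agent '$agent_name' is not defined in $conf_file" >&2
        return 2
    fi

    # Same command as in the post, with the two values parameterised.
    flume-ng agent --conf conf --conf-file "$conf_file" \
        --name "$agent_name" -Dflume.root.logger=INFO,console
}

# Usage (from /usr/hdp/current/flume-server/bin):
#   start_agent /root/flume_conf.conf agt1
```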

[Screenshot: Flume stats]

Make sure that the source and sink have started and are connected to an active channel (see the screenshot above).

Now go to the source (spool) directory given in the conf file and create a sample file.
You will see the file being consumed immediately after it is created.
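
This step can also be scripted. The sketch below polls for the .COMPLETED marker that Flume adds once it has consumed a file; `wait_for_completed` is a hypothetical helper and the spool path in the usage lines is an assumption, so substitute the directory from your own conf file.

```shell
#!/bin/sh
# Sketch: wait until Flume has renamed a spooled file to <file>.COMPLETED.
# wait_for_completed is a hypothetical helper; the timeout is in seconds.
wait_for_completed() {
    file="$1"
    timeout="${2:-30}"
    waited=0
    while :; do
        if [ -e "$file.COMPLETED" ]; then
            echo "Consumed: $file.COMPLETED"
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out waiting for $file.COMPLETED" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage (the spool directory is an assumed path):
#   echo "hello flume" > /root/flume_spool/test.txt
#   wait_for_completed /root/flume_spool/test.txt 30
```

If the call times out, the agent console output is the first place to look, since the source logs each file it picks up.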

[Screenshot: test file creation in spool directory]

[Screenshot: started moving the file]

[Screenshot: process completed]

[Screenshot: source file renamed]

[Screenshot: file created in HDFS (sink directory)]

Observation: the file in the source directory is renamed to <filename>.COMPLETED once Flume has consumed it.
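
The sink side can be checked from the same shell by listing the HDFS directory. This is a minimal sketch: `list_sink` is a hypothetical helper, `HDFS_CMD` exists only so the call can be dry-run on a machine without Hadoop, and the path in the usage line is an assumption standing in for whatever hdfs.path is set to in your conf file.

```shell
#!/bin/sh
# Sketch: list the files Flume has written to the HDFS sink directory.
# list_sink is a hypothetical helper; HDFS_CMD can be overridden for a dry run.
HDFS_CMD="${HDFS_CMD:-hdfs}"

list_sink() {
    sink_dir="$1"
    "$HDFS_CMD" dfs -ls "$sink_dir"
}

# Usage (assumed sink path):
#   list_sink /user/root/flume_sink
```

With default HDFS sink settings the output files are named with a FlumeData prefix, and a file still being written carries a .tmp suffix until it is rolled.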

A good reference for shell scripting: https://linuxcommand.org/lc3_writing_shell_scripts.php