Saturday 9 August 2014

Hadoop Interview Questions

Hi Reader,

Below are the questions I faced in a recent interview. Hope they help.

Hadoop Framework:
  1. What is the difference between an existing (local) file system and HDFS? Why do we need HDFS?
  2. What are the different modes in which Hadoop can run?
  3. What are the configuration files and their properties?
  4. Where do we set the NameNode, DataNode, and TaskTracker address/location?
  5. How many instances of the JobTracker run on a cluster?
  6. What is the difference between a job and a task?
  7. How does the JobTracker manage jobs?
  8. How many TaskTrackers exist on a DataNode?
  9. What happens if the JobTracker fails?
  10. What happens if the NameNode fails?
  11. How does the Secondary NameNode get the data present in the NameNode?
Map Reduce:
  1. Explain the word count example flow / MapReduce job flow.
  2. What are the phases of a reducer?
  3. What is speculative execution?
  4. If two instances of the same mapper complete at the same time, which factors does the JobTracker consider when selecting the completed task?
  5. What do you mean by a combiner?
  6. Where do we need to use a combiner?
  7. Which class/interface is used to write a combiner?
  8. What is partitioning?
  9. How do you implement a custom partitioning method?
  10. What is a map-side/reduce-side join? When do we go for each? What are the advantages and disadvantages?
  11. What is the distributed cache?
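
For the word count flow question above, here is a minimal sketch in plain Python. It is not the Hadoop API, only a single-process simulation of the map, combine, partition, shuffle/sort, and reduce steps. All function names (map_phase, combine, partition, reduce_phase, word_count) are hypothetical names chosen for illustration, and the CRC32-based partitioner is an assumption standing in for a hash partitioner.

```python
import zlib
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-sum counts on the map side to reduce shuffled data."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: a deterministic hash decides which reducer gets a key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(word, counts):
    """Reducer: sum all partial counts received for one word."""
    return word, sum(counts)


def word_count(lines, num_reducers=2):
    # One bucket per simulated reducer, each mapping key -> list of values.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line plays the role of one input split
        for word, count in combine(map_phase(line)):
            buckets[partition(word, num_reducers)][word].append(count)

    result = {}
    for bucket in buckets:
        for word in sorted(bucket):  # each reducer sees its keys in sorted order
            key, total = reduce_phase(word, bucket[word])
            result[key] = total
    return result


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat on the mat"]))
```

The combiner here applies the same summing logic as the reducer, which works for word count because addition is associative and commutative.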
Hive:
  1. When is Hive used?
  2. How do you change the location of a schema while creating it?
  3. What are the different properties that can be set while defining a schema?
  4. What are the types of partitions?
  5. Explain a scenario where we need a partition.
  6. What is bucketing?
  7. What are the properties that can be set in a Hive query?
  8. What does an explain plan contain?
  9. How is the number of mappers and reducers decided in a Hive query? Give an example.
  10. How many MapReduce jobs are created for a join query on 3 tables by the same key?
  11. What are the properties to be set for query optimization?
  12. What is Avro?
  13. Write a query to find the details of the student with the second-highest marks.
  14. Write a query to filter out all duplicate records.
  15. Which methods must be implemented for a UDF?
  16. Which commands must be run before using a UDF in a Hive query?
  17. Explain the factors to be considered in schema/table design.
Pig:
  1. What is Pig? When do we use it?
  2. What is the difference between Hive and Pig? When should you use Pig and when Hive?
  3. What are the different joins supported by Pig?
  4. How do you limit the number of records?
  5. Explain foreach.
  6. What is co-group?
  7. What is a bag? Can a bag contain duplicate values?
  8. How do you find the length of a column value in Pig?
  9. Explain UDF implementation.
  10. How do you define a constant in a Pig script?
Other:
  1. What is Oozie? When do we use it?
  2. Give an example Oozie program/script/code.
  3. When do we use Sqoop? What is the architecture of Sqoop?
  4. Is a Sqoop job a map-only job or a MapReduce job?

Please add more questions in the comments if you have any. All the very best :)

A good reference for shell scripting: https://linuxcommand.org/lc3_writing_shell_scripts.php