Hi Reader,
Below are the questions I was asked in a recent interview. I hope they help.
Hadoop Framework:
- What is the difference between a conventional file system and HDFS? Why do we need HDFS?
- What are the different modes in which Hadoop can run?
- What are the configuration files and their important properties?
- Where do we set the NameNode, DataNode, and TaskTracker addresses/locations?
- How many instances of the JobTracker run on a cluster?
- What is the difference between a job and a task?
- How does the JobTracker manage jobs?
- How many TaskTrackers run on a DataNode?
- What happens if the JobTracker fails?
- What happens if the NameNode fails?
- How does the Secondary NameNode get the data held by the NameNode?
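For the Secondary NameNode question, a toy model of a checkpoint may help. It assumes the classic Hadoop 1.x behaviour: the Secondary NameNode periodically copies the fsimage and the edits log from the NameNode, replays the edits onto the image, and sends the merged fsimage back. The dict-based namespace and the operation names (`create`, `set_replication`, `delete`) are hypothetical simplifications, not the real HDFS edit-log format.

```python
# Toy model of an HDFS checkpoint (NOT Hadoop code).
# Namespace: dict of path -> replication factor.
# Edit log: list of (op, path, value) tuples; op names are hypothetical.

def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace."""
    op, path, value = edit
    if op in ("create", "set_replication"):
        namespace[path] = value
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(fsimage, edits):
    """Replay the edit log onto a copy of fsimage and return the new image.

    Mirrors the idea of the Secondary NameNode merging fsimage + edits
    into a fresh, compact fsimage; the input image is left unchanged.
    """
    merged = dict(fsimage)  # work on a copy, like a downloaded fsimage
    for edit in edits:
        apply_edit(merged, edit)
    return merged
```

For example, `checkpoint({"/logs/a.txt": 3}, [("create", "/logs/b.txt", 3), ("set_replication", "/logs/a.txt", 2), ("delete", "/logs/b.txt", None)])` returns `{"/logs/a.txt": 2}`: the edits collapse into a single up-to-date image, so the NameNode can start a new, empty edits log.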
MapReduce:
- Walk through the word count example: what is the flow of a MapReduce job?
- What are the phases of a reducer?
- What is speculative execution?
- If two instances of the same map task complete at the same time, what factors does the JobTracker consider when choosing which completed task to accept?
- What is a combiner?
- When should we use a combiner?
- Which class/interface is used to write a combiner?
- What is partitioning?
- How do you implement a custom partitioner?
- What is a map-side/reduce-side join? When do we use each? What are the advantages and disadvantages?
- What is the distributed cache?
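The word count flow, combiner, and partitioning questions can be illustrated together with a plain-Python simulation (not the Hadoop Java API). It assumes each input line is one split handled by one mapper. The partition formula `(hash & Integer.MAX_VALUE) % numReduceTasks` follows Hadoop's default HashPartitioner, but `zlib.crc32` stands in for `key.hashCode()`, so real reducer assignments in Hadoop would differ.

```python
import zlib
from collections import Counter, defaultdict


def map_phase(line):
    # Mapper: emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.split()]


def combine(pairs):
    # Combiner: a "mini reducer" on one mapper's output that pre-sums counts
    # before the shuffle. Valid here because addition is associative and commutative.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partition(key, num_reducers):
    # HashPartitioner-style formula; crc32 is a stand-in for key.hashCode().
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def reduce_phase(key, values):
    # Reducer: sum all partial counts received for one word.
    return key, sum(values)


def word_count(lines, num_reducers=2):
    # Shuffle: route each combined pair to its reducer, grouping values by key.
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line plays the role of one input split
        for word, count in combine(map_phase(line)):
            shuffled[partition(word, num_reducers)][word].append(count)

    result = {}
    for reducer_input in shuffled:
        for word in sorted(reducer_input):  # sort: each reducer sees keys in order
            key, total = reduce_phase(word, reducer_input[word])
            result[key] = total
    return result
```

For example, `word_count(["the cat sat", "the cat ran the race"])["the"]` is 3: the combiner turns the second line's two `("the", 1)` pairs into one `("the", 2)`, so the reducer only receives `[1, 2]`.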
Hive:
- When is Hive used?
- How do you change the location of a schema while creating it?
- What are the different properties that can be set while defining a schema?
- What are the types of partitions?
- Explain a scenario where we need partitioning.
- What is bucketing?
- What properties can be set in a Hive query?
- What does the EXPLAIN plan contain?
- How are the number of mappers and reducers decided for a Hive query? Give an example.
- How many MapReduce jobs are created for a query that joins 3 tables on the same key?
- Which properties should be set for query optimization?
- What is Avro?
- Write a query to find the details of the student with the second-highest marks.
- Write a query to filter out all duplicate records.
- Which methods must be implemented for a UDF?
- Which commands must be run before using a UDF in a Hive query?
- Explain the factors to consider in schema/table design.
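For the two query questions, here is a sketch in standard SQL, run through Python's built-in `sqlite3` module as a stand-in for Hive. The `students` table and its columns are hypothetical. In HiveQL the second-highest question is often answered with the `DENSE_RANK()` window function; the nested-`MAX` form below avoids window functions so it runs on any SQLite version. "Filter duplicates" is read here as "remove exact duplicate rows".

```python
import sqlite3


def demo_queries(rows):
    """Load rows into an in-memory SQLite table (a stand-in for a Hive table)
    and return (students_with_second_highest_marks, distinct_rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema used only for this illustration.
        conn.execute("CREATE TABLE students (id INTEGER, name TEXT, marks INTEGER)")
        conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)

        # Second-highest marks: the largest value strictly below the maximum.
        # Ties are all returned, which matches DENSE_RANK() = 2 semantics.
        second = conn.execute(
            """
            SELECT id, name, marks FROM students
            WHERE marks = (SELECT MAX(marks) FROM students
                           WHERE marks < (SELECT MAX(marks) FROM students))
            ORDER BY id
            """
        ).fetchall()

        # Remove exact duplicate rows (SELECT DISTINCT also exists in HiveQL).
        distinct = conn.execute(
            "SELECT DISTINCT id, name, marks FROM students ORDER BY id"
        ).fetchall()
        return second, distinct
    finally:
        conn.close()
```

With rows `(1, "Asha", 91)`, `(2, "Ravi", 85)`, `(3, "Meena", 85)`, `(4, "John", 70)` and a second copy of `(4, "John", 70)`, both Ravi and Meena come back as second-highest, and the duplicate John row is dropped.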
Pig:
- What is Pig? When do we use it?
- What is the difference between Hive and Pig? When should we use Pig and when Hive?
- Which joins does Pig support?
- How do you limit the number of records?
- Explain FOREACH.
- What is COGROUP?
- What is a bag? Can a bag contain duplicate values?
- How do you find the length of a column value in Pig?
- Explain how to implement a UDF.
- How do you define a constant in a Pig script?
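The COGROUP and bag questions can be pictured with a small Python simulation (not Pig Latin). Relations are lists of tuples and bags are plain lists, so duplicate tuples survive, as they do in Pig bags. As with Pig's default (outer) COGROUP, a key present in only one relation gets an empty bag for the other. The function name `cogroup` and the sample data are hypothetical.

```python
from collections import defaultdict


def cogroup(left, right, key_index=0):
    """Simulate Pig's COGROUP of two relations on one field.

    Returns a dict: key -> (bag of left tuples, bag of right tuples).
    Bags are lists, so duplicate tuples are kept.
    """
    # The lambda runs once per new key, so every group gets its own two lists.
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[key_index]][0].append(row)
    for row in right:
        groups[row[key_index]][1].append(row)
    return dict(groups)
```

For example, cogrouping `owners = [("alice", "cat"), ("alice", "cat"), ("bob", "dog")]` with `toys = [("alice", "ball"), ("carol", "rope")]` gives `alice` a left bag holding both identical `("alice", "cat")` tuples, `bob` an empty right bag, and `carol` an empty left bag. Unlike JOIN, COGROUP does not flatten the bags into a cross product.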
Other:
- What is Oozie? When do we use it?
- Give an example Oozie program/script.
- When do we use Sqoop? What is Sqoop's architecture?
- Is a Sqoop job map-only or a full MapReduce job?
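For the map-only question, this sketch shows why no reduce phase is needed. Sqoop import looks up the minimum and maximum of the `--split-by` column, divides that range among `--num-mappers` map tasks (default 4), and each mapper writes its rows straight to HDFS. The even-split arithmetic below is a simplified illustration, not Sqoop's exact splitter, and the function names are hypothetical.

```python
def split_ranges(min_value, max_value, num_mappers=4):
    """Divide the integer range [min_value, max_value] into up to num_mappers
    contiguous, non-overlapping (low, high) ranges, one per map task.

    Simplified illustration; Sqoop's own splitter differs in details.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    total = max_value - min_value + 1
    if total <= 0:
        return []
    num_splits = min(num_mappers, total)  # never more splits than values
    base, extra = divmod(total, num_splits)
    ranges = []
    start = min_value
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_where_clauses(column, ranges):
    # Each map task imports only the rows matching its own condition and
    # writes them out directly, so nothing needs to be shuffled or reduced.
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]
```

For example, `split_ranges(1, 10, 4)` returns `[(1, 3), (4, 6), (7, 8), (9, 10)]`, so four mappers each import a disjoint slice of the table.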
If you have more questions, please add them in the comments. All the very best :)