Wednesday 22 April 2015

Hadoop Interview Questions 2015 - Part 1



A few more questions. Happy reading!



1.Explain how Hadoop is different from other parallel computing solutions.

2.What are the modes Hadoop can run in?

3.What is a NameNode and what is a DataNode?

4.What is Shuffling in MapReduce?

5.What is the functionality of Task Tracker and Job Tracker in Hadoop? How many instances of a Task Tracker and Job Tracker can be run on a single Hadoop Cluster?

6.How does NameNode tackle DataNode failures?

7.What is InputFormat in Hadoop?

8.What is the purpose of RecordReader in Hadoop?

9.Why can't we use Java primitive data types in MapReduce?

10.How do you decide between Managed and External tables in Hive?

11.Can we change the default location of Managed tables?

12.What are the points to consider when moving from an Oracle database to Hadoop clusters? How would you decide the correct size and number of nodes in a Hadoop cluster?

13.If you want to analyze 100TB of data, what is the best architecture for that?

14.What is InputSplit in MapReduce?

15.In Hadoop, if a custom partitioner is not defined, how is data partitioned before it is sent to the reducer?

16.What is the replication factor in Hadoop, and what default replication factor does Hadoop come with?

17.What is a SequenceFile in Hadoop, and why is it important?

18.What is Speculative execution in Hadoop?

19.What factors do you consider when creating a Hive table?

20.What are the compression techniques, and how do you decide which one to use?

21.What is COGROUP in Pig?

22.If you are the user of a MapReduce framework, then what are the configuration parameters you need to specify?

23.How do you benchmark your Hadoop Cluster with Hadoop tools?

24.What is the difference between ORDER BY and SORT BY in Hive?

25.What is WebDAV in Hadoop?

26.How many Daemon processes run on a Hadoop System?

27.Hadoop attains parallelism by isolating tasks across various nodes, so a few slow nodes can rate-limit and slow down the rest of the program. What method does Hadoop provide to combat this?

28.How are HDFS blocks replicated?

29.What will a Hadoop job do if developers try to run it with an output directory that is already present?

30.What happens if the number of reducers is 0?

31.What is meant by Map-side and Reduce-side join in Hadoop?

32.How can the NameNode be restarted?

33.How do you include a partitioned column in the data in Hive?

34.What exactly does the hadoop fs -put command do?

35.What is the limit on Distributed cache size?

36.How do you handle skewed data?

37.When doing a join in Hadoop, you notice that one reducer is running for a very long time. How will you address this problem in Pig?

38.How can you debug your Hadoop code?

39.What is distributed cache and what are its benefits?

40.Why would a Hadoop developer write a MapReduce job with the reduce step disabled?

41.Explain the major difference between an HDFS block and an InputSplit.

42.Are there any problems that can only be solved by MapReduce and cannot be solved by Pig? In what kinds of scenarios are MapReduce jobs more useful than Pig?

43.What is the need for having a password-less SSH in a distributed environment?

44.Give an example scenario on the usage of counters.

45.Does HDFS make block boundaries between records?

46.What is streaming access?

47.What do you mean by “Heartbeat” in HDFS?

48.Suppose 10 HDFS blocks are to be copied from one machine to another, but the other machine has room for only 7.5 blocks. Is there a possibility that blocks will be broken down during replication?

49.What is the significance of conf.setMapperClass?

50.What are combiners and when are these used in a MapReduce job?

51.What are the different joins in Hive?

52.Explain the SMB (sort-merge-bucket) join in Hive.

53.Which command is used to do a file system check in HDFS?

54.Explain the different parameters of the mapper and reducer functions.

55.How can you set an arbitrary number of mappers and reducers for a Hadoop job?

56.Have you ever built a production process in Hadoop? If yes, what was the process when a Hadoop job failed for any reason? (Open-ended question)

57.Explain how the master-slave architecture works in Hadoop.

58.What is fault tolerance in HDFS?

59.Give some examples of companies that are using Hadoop architecture extensively.

60.How does a DataNode know the location of the NameNode in Hadoop cluster?

61.How can you check whether the NameNode is working or not?

62.Explain the different types of “writes” in HDFS.


Hope this helps!

4 comments:

  1. Answers, please.

  2. Does anyone have answers to the above questions?

  3. Very good collection of questions, but you could add the answers as well. Thank you for sharing these questions.

  4. I have seen a lot of information on other blogs and websites, but the information in this Hadoop blog is very useful. Thanks for sharing it.


A good reference for shell scripting: https://linuxcommand.org/lc3_writing_shell_scripts.php