Hadoop, Spark, and related tools: Hive, Pig, HBase, Sqoop, Oozie, etc.
Write your driver source code using a text editor like vi (or emacs):
vi MaxTemperature.java
Write your mapper and reducer source code:
vi MaxTemperatureMapper.java
vi MaxTemperatureReducer.java
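To make the mapper and reducer roles concrete, here is a minimal plain-Java sketch of the data flow, written without the Hadoop API so it can be run on its own. It is an illustration only, not the contents of MaxTemperatureMapper.java or MaxTemperatureReducer.java. The class name MaxTemperatureSketch, its method names, and the input format (one year<TAB>temperature pair per line) are all assumptions; these notes do not specify the format of temperatureInputs.txt.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local, Hadoop-free sketch of the max-temperature data flow (hypothetical class).
// ASSUMPTION: each input line looks like "1950<TAB>22" (year, tab, integer temperature).
public class MaxTemperatureSketch {

    // "Map" step: turn one input line into a (year, temperature) pair.
    // Returns null for a line that does not match the assumed format.
    public static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.trim().split("\t");
        if (fields.length != 2) {
            return null;
        }
        try {
            int temperature = Integer.parseInt(fields[1].trim());
            return new AbstractMap.SimpleEntry<String, Integer>(fields[0], temperature);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // "Reduce" step: the maximum of all temperatures collected for one year.
    // Expects at least one value, as a reducer is only called for keys that have values.
    public static int reduce(Iterable<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Stand-in for the framework's shuffle: group the pairs by year, then reduce each group.
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            if (pair == null) {
                continue; // skip malformed lines
            }
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed format.
        List<String> sample = Arrays.asList("1949\t111", "1950\t0", "1950\t22", "1949\t78");
        System.out.println(run(sample)); // prints {1949=111, 1950=22}
    }
}
```

In the real job, the framework does the grouping that run() imitates here: the mapper only emits pairs and the reducer only sees the values for one key at a time.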
Compile your Java code:
java -version
yarn classpath
javac -classpath `yarn classpath` -d . MaxTemperatureMapper.java
javac -classpath `yarn classpath` -d . MaxTemperatureReducer.java
javac -classpath `yarn classpath`:. -d . MaxTemperature.java
Create your jar file:
jar -cvf maxTemp.jar *.class
Create your input data file on the local file system:
vi temperatureInputs.txt
Put your input data file into HDFS:
hdfs dfs -ls /
hdfs dfs -ls /user
hdfs dfs -ls /user/cloudera
hdfs dfs -mkdir /user/cloudera/class1
hdfs dfs -put temperatureInputs.txt /user/cloudera/class1
hdfs dfs -cat /user/cloudera/class1/temperatureInputs.txt
Run your MapReduce program (the output directory must not already exist, or the job will fail):
hadoop jar maxTemp.jar MaxTemperature /user/cloudera/class1/temperatureInputs.txt /user/cloudera/class1/output
Verify that the program ran and the results are correct:
hdfs dfs -ls /user/cloudera/class1/output
hdfs dfs -cat /user/cloudera/class1/output/part-r-00000
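One way to check the results is to compare the lines of part-r-00000 against values you worked out by hand. The sketch below assumes the job writes the default text output, one key<TAB>value pair per line, with integer values. The class OutputCheck and its methods are hypothetical helpers, and the sample lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for comparing job output with hand-computed values.
// ASSUMPTION: output lines look like "1950<TAB>22" (default text output, integer values).
public class OutputCheck {

    // Parse output lines into a sorted key -> value map, skipping blank lines.
    public static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            result.put(fields[0], Integer.parseInt(fields[1].trim()));
        }
        return result;
    }

    // Return one message per key that is wrong, missing, or unexpected.
    public static List<String> differences(Map<String, Integer> expected,
                                           Map<String, Integer> actual) {
        List<String> problems = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : expected.entrySet()) {
            Integer got = actual.get(e.getKey());
            if (!e.getValue().equals(got)) {
                problems.add(e.getKey() + ": expected " + e.getValue() + ", got " + got);
            }
        }
        for (String key : actual.keySet()) {
            if (!expected.containsKey(key)) {
                problems.add(key + ": unexpected key");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hand-computed maxima for a made-up input.
        Map<String, Integer> expected = new TreeMap<String, Integer>();
        expected.put("1949", 111);
        expected.put("1950", 22);
        // Lines as they might appear in part-r-00000 (made-up sample).
        List<String> jobOutput = Arrays.asList("1949\t111", "1950\t22");
        System.out.println(differences(expected, parse(jobOutput))); // prints []
    }
}
```

An empty list means every expected key was present with the expected value and no extra keys appeared.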