To run the Java file, I used the following command:
hadoop jar prac01.jar AggregateJob prac01/input prac01/output
As an alternative, you can use Hadoop Streaming with a Python mapper and reducer.
Depending on which version of Hadoop is installed on the VM, a different path and jar file are required.
You may need to check that you have the right jar file - use
Now run Hadoop with the Hadoop Streaming utility (the program in the jar file), passing in parameters that tell Hadoop Streaming which mapper and reducer programs to use (these are on the Unix file system), where the input data is (on HDFS), and where the output should be written (on HDFS).
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.1.jar -mapper "python $PWD/mapper.py" -reducer "python $PWD/reducer.py" -input python01/test1 -output python01/output1
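The contents of mapper.py and reducer.py are not shown here, so the following is only a minimal sketch of what such a pair might look like, assuming a simple word-count job. The function names map_lines, reduce_lines and simulate are hypothetical. Hadoop Streaming passes each record to the mapper as a line on standard input, sorts the mapper output by key, and passes the sorted lines to the reducer, again on standard input. The code avoids newer syntax so that it should also run on the older Python typically found on such a VM.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step (assumed word count): emit "word<TAB>1" for every word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)


def reduce_lines(lines):
    """Reducer step: sum the counts for each key.

    Relies on the input being sorted by key, which Hadoop Streaming
    guarantees between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (key, total)


def simulate(lines):
    """Hypothetical local stand-in for map -> sort -> reduce, for testing only."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In a real job each function would live in its own script: mapper.py would print every record from map_lines(sys.stdin), and reducer.py would do the same with reduce_lines(sys.stdin). Calling simulate on a few lines of text is a quick way to check the logic before submitting the job to the cluster.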