- copy the files to a directory:
  git clone https://gist.github.com/cc7c8cec1188fd387cc2e3ec0f4fed7a.git wordcount
  and then cd wordcount.
- see the input files:
  cat *.txt
- make sure the mapper and reducer are executable:
  chmod +x *.scala
- see how the mapper works:
  cat baa.txt | ./mapper.scala
- see how the reducer works:
  cat baa.txt | ./mapper.scala | ./reducer.scala
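The gist's script bodies are not reproduced here, so as a rough sketch, the same map/shuffle/reduce pipeline can be emulated with standard Unix tools (the sample file and its contents below are made up, not from the gist). Note that on a real cluster Hadoop sorts the map output by key before it reaches the reducer, which is what the `sort` stage stands in for:

```shell
# Emulate the map -> shuffle -> reduce pipeline with standard tools.
# The sample input and file names here are hypothetical, not from the gist.
printf 'baa baa black sheep\nbaa moo\n' > baa_sample.txt

# "map": emit one "word<TAB>1" line per word, as a streaming mapper typically does
tr -s ' ' '\n' < baa_sample.txt | awk '{print $1 "\t1"}' > mapped.txt

# "shuffle": Hadoop sorts map output by key before handing it to the reducer
sort mapped.txt > shuffled.txt

# "reduce": sum the counts for each word
awk -F'\t' '{count[$1] += $2} END {for (w in count) print w "\t" count[w]}' shuffled.txt | sort
```

Because the shuffle sorts by key, each word's lines arrive at the reducer contiguously, so the reducer only ever needs to track one running count at a time.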
- copy the files to a directory:
  git clone https://gist.github.com/cc7c8cec1188fd387cc2e3ec0f4fed7a.git wordcount
  and then cd wordcount.
- create a directory on HDFS:
  hadoop fs -mkdir -p /wc/in
- copy the input files into HDFS:
  hadoop fs -put *.txt /wc/in
- make sure the files are transferred:
  hadoop fs -ls /wc/in
  You can also read their content using -cat.
- make sure the mapper and reducer scripts are executable:
  chmod +x *.scala
- make sure the output directory does NOT exist (the job fails if it does; remove a stale one with hadoop fs -rm -r /wc/out):
  hadoop fs -ls /wc/out
- issue the streaming job:
  hadoop jar /home/user/hadoop-2.7.3/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar -mapper mapper.scala -reducer reducer.scala -input /wc/in/* -output /wc/out
  If the job fails because the scripts cannot be found on the task nodes, ship them with the job by adding -file mapper.scala -file reducer.scala.
- make sure the above job ran successfully:
  hadoop fs -ls /wc/out
  You should see a zero-byte file called _SUCCESS.
- read the output:
  hadoop fs -cat /wc/out/part-00000
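The part file contains one tab-separated word/count pair per line. As a hypothetical example of inspecting such output after copying it locally (the sample data below is made up), you can sort by the count column to see the most frequent words:

```shell
# Hypothetical sample of part-00000 content: tab-separated "word<TAB>count" lines
printf 'baa\t3\nblack\t1\nmoo\t1\nsheep\t1\n' > part-00000.sample

# sort numerically by the count column, highest first, and show the top 3 words
sort -k2,2nr part-00000.sample | head -n 3
```

On a real run you would first fetch the file with hadoop fs -get /wc/out/part-00000 . and then run the same sort on it.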