Deploy Hadoop clusters on Docker
This repository is based on big-data-europe/docker-hadoop
The default version is 3.2.1.
To select another version, specify it in docker-compose.yml (the Hadoop version is part of each image tag).
To see all supported versions, review the branches of big-data-europe/docker-hadoop.
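For example, to check which Hadoop version the compose file currently uses (assuming the default docker-compose.yml from this repository; the tag format below comes from the big-data-europe/docker-hadoop images):
# list the image tags; the Hadoop version appears in each tag,
# e.g. bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
grep 'image:' docker-compose.yml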
Additional Datanode clusters are deployed from a separate docker-compose-datanode-clusters.yml file (see the Datanode cluster section below for the reason).
To deploy a basic HDFS cluster, run:
./start.sh
Stop and remove all HDFS containers, network:
./stop.sh
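Both scripts are presumably thin wrappers around docker-compose; a rough sketch of their equivalents, assuming the default docker-compose.yml (the real scripts may pass extra options):
# equivalent of ./start.sh: start all HDFS services in the background
docker-compose up -d
# equivalent of ./stop.sh: stop the containers and remove them together with the network
docker-compose down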
Access all dashboards:
Namenode: http://localhost:9870
History server: http://localhost:8188
Datanode: http://localhost:9864
Nodemanager: http://localhost:8042
Resource manager: http://localhost:8088
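To verify from the command line that the Namenode is up (a quick sanity check, not part of the original instructions; the JMX servlet is standard in Hadoop 3.x):
# query the Namenode's JMX endpoint for its status bean
curl -s 'http://localhost:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'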
To deploy a Datanode cluster on another machine: on the cluster machine, edit the datanode-cluster.env file used by the Datanode cluster docker-compose file, replacing 10.0.0.4 with the IP address of the host machine running the Namenode container.
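For example (the address 192.168.1.20 below is only a hypothetical illustration; use your own Namenode host's IP):
# replace the placeholder Namenode address in the env file
sed -i 's/10\.0\.0\.4/192.168.1.20/g' datanode-cluster.env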
Then deploy the cluster:
./start-datanode-cluster.sh
Stop and remove the Datanode cluster:
./stop-datanode-cluster.sh
After the cluster is deployed successfully, open Namenode Dashboard > Datanodes. Make sure the new Datanode has been added, is bound to its host's IP address, and is balanced with the correct number of HDFS blocks.
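The same information can be checked from the command line with the standard hdfs dfsadmin tool, run inside the Namenode container:
# print the cluster report, listing every live Datanode and its usage statistics
docker exec -it namenode hdfs dfsadmin -report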
To test Hadoop, attach to the Namenode container:
docker exec -it namenode bash
Create a simple text file as the input:
echo "This is a simple test for Hadoop" > test.txt
Then create the corresponding input folder on HDFS:
hadoop fs -mkdir -p input
And copy our test file to HDFS:
hdfs dfs -put test.txt input/test.txt
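Optionally confirm the upload before running the job:
# list the HDFS input directory to check that test.txt is there
hdfs dfs -ls input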
After preparing the input file, download the WordCount program for Hadoop 3.2.1, which ships in the hadoop-mapreduce-examples executable jar file (if you use another Hadoop version, change the version in the path):
curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-examples/3.2.1/hadoop-mapreduce-examples-3.2.1.jar --output map_reduce.jar
Submit our WordCount job to Hadoop (the wordcount program can have a different name in other hadoop-mapreduce-examples versions):
hadoop jar map_reduce.jar wordcount input output
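If wordcount is rejected as an unknown program name, running the jar without arguments prints the example programs bundled in your version:
# print the list of available example program names
hadoop jar map_reduce.jar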
If everything runs fine, we can see the output by requesting data from HDFS:
hdfs dfs -cat output/part-r-00000
Result:
Hadoop 1
This 1
a 1
for 1
is 1
simple 1
test 1
See more details in big-data-europe/docker-hadoop.
You can use my Docker Commands Toolkit to clean up your host machine.
Note that the Namenode registers namenode as its host address, so downloading a file via the Namenode File System Browser will auto-redirect to http://namenode:9870/webhdfs/v1/..., which causes errors if that hostname cannot be resolved from your browser's machine.
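A common workaround (my assumption, not part of this repository) is to make the namenode hostname resolvable from the machine running the browser, for example by mapping it to the Docker host in /etc/hosts:
# map the container hostname to the Docker host so the WebHDFS redirect resolves
# (assumes the dashboards are reachable on localhost)
echo '127.0.0.1 namenode' | sudo tee -a /etc/hosts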