NOTE: HDFS is required for Flink's DistributedCache, which distributes Python plans to the worker nodes. We use BlueData Hadoop CDH nodes.
Make sure your Python plans do not call env.execute(local=True), which runs the plan in local mode instead of on the cluster!
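One way to catch a leftover env.execute(local=True) before submitting a plan is a small check like the sketch below. It is not part of Flink: the helper name local_execute_lines and the regular expression are our own assumptions, and the comment stripping is deliberately naive (it treats any # as the start of a comment).

```python
import re
import sys

# Hypothetical helper, not part of Flink: matches ".execute(local=True)"
# with optional whitespace around the parentheses and the "=".
LOCAL_EXECUTE = re.compile(r"\.execute\s*\(\s*local\s*=\s*True\s*\)")


def local_execute_lines(source):
    """Return the 1-based line numbers in source that call execute(local=True)."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Naive comment stripping: drop everything after the first '#',
        # so commented-out calls are not reported.
        code = line.split("#", 1)[0]
        if LOCAL_EXECUTE.search(code):
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    # Usage (assumed): python check_plan.py plan1.py plan2.py
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as plan:
            for lineno in local_execute_lines(plan.read()):
                print(f"{path}:{lineno}: execute(local=True) found")
```

Running it over the plan files prints one line per offending call and nothing when the plans are clean.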
On the master node:
- Install git and other useful things that we like:
      sudo yum install git bzip2 -y