Getting Ready to Run a Cascading Cluster on Windows


In this article, I’m going to discuss what you need to do to prepare your local Windows machine to run a Cascading cluster. (I’m using a VMware clone running Windows 8.1 Pro x64 with 8 GB of RAM.)

First things first, let’s download our shopping list:

(Still trying to figure this one out but apparently it helps with the network mappings – see Cascading install instructions on GitHub)

Download hadoop-2.3.0.tar.* from http://www.us.apache.org/dist/hadoop/core/hadoop-2.3.0/ (save to the Hadoop_temporary folder)

  • hadoop-2.3.0.tar.gz
  • hadoop-2.3.0.tar.gz.asc
  • hadoop-2.3.0.tar.gz.gz
  • hadoop-2.3.0.tar.gz.mds

Download hbase-0.96.2-hadoop2-bin.tar.* from http://www.apache.org/dist/hbase/hbase-0.96.2/ (save to the Hadoop_temporary folder)

  • hbase-0.96.2-hadoop2-bin.tar.gz
  • hbase-0.96.2-hadoop2-bin.tar.gz.asc
  • hbase-0.96.2-hadoop2-bin.tar.gz.gz

The reason for downloading these from Apache by hand is that the cluster automation is supposed to fetch them itself, but there seems to be an issue with Puppet on Windows and the downloads tend to fail, which makes the whole install fail. Trust me … this will save you hours of troubleshooting later.
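Since a missing or half-finished download is exactly what breaks the install, it can be worth checking the folder before moving on. Here is a minimal Python sketch of that check. The folder path `C:\Hadoop_temporary` and the function name `missing_downloads` are my own assumptions for illustration; only the two archive names come from the lists above.

```python
from pathlib import Path

# Archive names taken from the download lists in this article.
REQUIRED_FILES = [
    "hadoop-2.3.0.tar.gz",
    "hbase-0.96.2-hadoop2-bin.tar.gz",
]


def missing_downloads(folder, required=REQUIRED_FILES):
    """Return the required file names that are absent or empty in `folder`."""
    folder = Path(folder)
    missing = []
    for name in required:
        path = folder / name
        # A zero-byte file usually means the download was interrupted.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical location: point this at wherever you saved the files.
    gaps = missing_downloads(r"C:\Hadoop_temporary")
    print("All downloads present" if not gaps else f"Missing: {gaps}")
```

This only checks that the files exist and are not empty. It does not verify the .asc or .mds files that sit next to the archives.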

While you are waiting for the downloads to finish, I suggest you read the instructions on the GitHub site: https://github.com/Cascading/vagrant-cascading-hadoop-cluster . There are a few pitfalls I will guide you through, but it is not a bad read.

Make a folder called “C:\BigDataProject”

Once the downloads have finished, install VirtualBox.

Follow the install and click YES to any questions. Don’t start it yet.

Install Vagrant to “C:\BigDataProject\Vagrant”

Install Bonjour. Click NO when it offers to install the auto-update options.

Unzip the content of the “vagrant-cascading-hadoop-cluster-2.5.zip” to the following directory “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5”

Copy all the files from Hadoop_temporary to the root of the Cascading folder, the one that contains the Vagrantfile: “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5”
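If you would rather script the copy than drag files around in Explorer, something like the following Python sketch would do it. The function name `copy_downloads` is made up for this example, and the check for a Vagrantfile is just my own safety net so the files don't land in the wrong folder.

```python
import shutil
from pathlib import Path


def copy_downloads(source_dir, target_dir):
    """Copy every regular file in source_dir into target_dir.

    Returns the list of file names that were copied.
    """
    source, target = Path(source_dir), Path(target_dir)
    # Refuse to copy unless the target really is the folder with the Vagrantfile.
    if not (target / "Vagrantfile").is_file():
        raise FileNotFoundError(f"No Vagrantfile found in {target}")
    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():  # skip sub-folders
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied
```

Called with the Hadoop_temporary folder as the first argument and the cluster folder as the second, it returns the names it copied so you can eyeball the list.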

Install Image 1

Open VirtualBox – RUN AS ADMINISTRATOR

Open Command Prompt  – RUN AS ADMINISTRATOR

Navigate to “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5”

Type: vagrant up

Install Image 2

(Side note: if you don’t want to build the full 4-node cluster and just want to get a single node up and running, navigate to “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5\single-node” instead and follow the exact same instructions from here.)

[Configuration option: By default, each VM is assigned 512 MB of RAM. If you would like to assign more on startup, edit the “Vagrantfile” and change the “512” to the number of megabytes you want.]
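For anyone who prefers to make that change with a script, here is a rough Python sketch of the search-and-replace. I am assuming the memory setting shows up in the Vagrantfile as the standalone number 512, as described above; the sample line in the code is a hypothetical illustration, not a copy of the real file, and `set_vm_memory` is a name I made up.

```python
import re


def set_vm_memory(vagrantfile_text, new_mb, old_mb=512):
    """Replace the standalone number `old_mb` with `new_mb`.

    Returns (updated_text, number_of_replacements). Longer numbers that
    merely contain the digits (such as 5120) are left alone.
    """
    pattern = rf"(?<!\d){old_mb}(?!\d)"
    return re.subn(pattern, str(new_mb), vagrantfile_text)


if __name__ == "__main__":
    # Hypothetical Vagrantfile line, for illustration only.
    sample = 'v.customize ["modifyvm", :id, "--memory", "512"]\n'
    updated, count = set_vm_memory(sample, 1024)
    print(count, updated, end="")
```

Check the replacement count before saving: if it is zero, the value is written differently in your copy of the file, and if it is higher than you expect, some other 512 got changed too.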

Since this is the first time you are running it, Vagrant will download the default VirtualBox image. Once that is done, the image is re-used on later runs.

Install Image 3

Once everything is complete and all 4 VMs have been created, you are good to go. If the install fails at any point, don’t bother continuing because it will not work. Fix the problem and restart.

Install Image 4

Next, you need to set up Putty to connect to the master VM. In the command prompt, type the following: vagrant ssh master

Install Image 5

This will tell you 2 very important things:

a)      Where to find the encryption key : “C:/Users/AlienED/.vagrant.d/insecure_private_key”

b)      Which local port was configured to handle access to the master VM : 2202
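Vagrant also has a `vagrant ssh-config` command that prints the same connection details in OpenSSH config format, which is easier to read from a script. Below is a small Python sketch that pulls the port and key path out of that kind of output. The sample text is something I typed up from the values above to show the shape of the output; it is not captured from a real run, and `parse_ssh_config` is a hypothetical helper name.

```python
def parse_ssh_config(text):
    """Parse 'Key value' lines of OpenSSH-style config into a dict.

    Only the first occurrence of each key is kept.
    """
    settings = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            key, value = parts
            settings.setdefault(key, value.strip('"'))
    return settings


# Illustrative sample only, built from the values shown in this article.
SAMPLE_OUTPUT = """\
Host master
  HostName 127.0.0.1
  User vagrant
  Port 2202
  IdentityFile C:/Users/AlienED/.vagrant.d/insecure_private_key
"""

if __name__ == "__main__":
    config = parse_ssh_config(SAMPLE_OUTPUT)
    print(config["Port"], config["IdentityFile"])
```

To use it for real, run `vagrant ssh-config master` in the cluster folder and feed its output to the function instead of the sample.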

Copy the “insecure_private_key” to where you put PuttyGen and run it. Click Load and point it to the key file. You should get the following result:

Install Image 6

Click “Save Private Key” and click “Yes” when it moans about a passphrase (security is not our concern here).

Save file as AccessKey.ppk and close PuttyGen

Open Putty and under Session

a)      Set IP to 127.0.0.1

b)      Port to 2202

Install Image 7

Go to Connection / SSH / Auth, click Browse and point it to AccessKey.ppk

Install Image 8

Go to Connection / Data and set Auto-login Username to “vagrant”

Install Image 9

Go back to Session, enter “master” under Saved Sessions and hit Save.

Install Image 10

All done. From now on you should be able to start Putty, click on master and hit Load. For now, just click “Open”. When it asks you to cache the server key, say “YES”.
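If Putty refuses to connect, one thing worth ruling out is that nothing is listening on the forwarded port. Here is a tiny Python check for that, assuming the same 127.0.0.1 and 2202 as above (your port may differ). The function name `port_is_open` is my own.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 2202 is the port reported earlier in this article; adjust if yours differs.
    print("master reachable:", port_is_open("127.0.0.1", 2202))
```

A True result only tells you something accepted the connection on that port. It does not prove the key or the username are right.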

Congratulations! You have just built your very first 4-node Hadoop cluster running Cascading and Lingual.

In the next blog, I will show you how to set up a connection to a data source using Lingual.

Install Image 11

Finally, if you want to shut down the VMs, it is pretty easy. Go to “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5” and simply type: “vagrant halt”

To get them back up and running, simply type: “vagrant up”

Happy hunting!
