First things first, let’s download our shopping list:
- Latest Version of Vagrant (1.6.3): http://www.vagrantup.com/downloads.html
- Latest Version of Oracle VirtualBox (4.3.12): https://www.virtualbox.org/wiki/Downloads
- Latest Version of Putty & PuttyGen: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
- Cascading Cluster Setup files: https://github.com/Cascading/vagrant-cascading-hadoop-cluster (since I don’t have Git installed, I am simply going to use the Download Zip option)
- Bonjour Print Services for Windows v2.0.2 : http://support.apple.com/kb/dl999
(Still trying to figure this one out but apparently it helps with the network mappings – see Cascading install instructions on GitHub)
- Download hadoop-2.3.0.tar.* from http://www.us.apache.org/dist/hadoop/core/hadoop-.3.0/ (save to a Hadoop_temporary folder)
  - hadoop-2.3.0.tar.gz
  - hadoop-2.3.0.tar.gz.asc
  - hadoop-2.3.0.tar.gz.gz
  - hadoop-2.3.0.tar.gz.mds
- Download hbase-0.96.2-hadoop2-bin.tar.* from http://www.apache.org/dist/hbase/hbase-.96.2/ (save to the same Hadoop_temporary folder)
  - hbase-0.96.2-hadoop2-bin.tar.gz
  - hbase-0.96.2-hadoop2-bin.tar.gz.asc
  - hbase-0.96.2-hadoop2-bin.tar.gz.gz
The reason for the Apache downloads is that the cluster automation is supposed to fetch these files itself, but there seems to be an issue with Puppet on Windows: the downloads tend to fail, which causes the whole install process to fail. Trust me … this will save you hours of troubleshooting later.
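Since a missing or half-finished download is what breaks the install, it can be worth checking the folder before moving on. Here is a minimal Python sketch, assuming Python 3 is installed. The function name `missing_downloads` and the folder location are my own placeholders, and it only checks the two main tarballs from the list above.

```python
from pathlib import Path

# The two main archives from the download list above.
EXPECTED_FILES = [
    "hadoop-2.3.0.tar.gz",
    "hbase-0.96.2-hadoop2-bin.tar.gz",
]


def missing_downloads(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are absent or empty in folder."""
    folder = Path(folder)
    missing = []
    for name in expected:
        path = folder / name
        # A zero-byte file usually means the download was interrupted.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_downloads(r"C:\Hadoop_temporary")` (a hypothetical location, use wherever you saved the files) returns an empty list when both archives are in place.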
While you are waiting for the downloads to finish, I suggest you read the instructions on the GitHub site: https://github.com/Cascading/vagrant-cascading-hadoop-cluster. There are a few pitfalls I will guide you through, but it is not a bad read.
Make a folder called “C:\BigDataProject”
Once the downloads have finished, install VirtualBox.
Follow the install and click YES to any questions. Don’t start it yet.
Install Vagrant to “C:\BigDataProject\Vagrant”
Install Bonjour. Click NO to install auto update options.
Unzip the content of the “vagrant-cascading-hadoop-cluster-2.5.zip” to the following directory “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5”
Copy all the files from Hadoop_temporary to the root of the cascading folder, the one that contains the Vagrantfile: “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5”
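If you prefer to script the copy rather than drag files around in Explorer, the step above could be sketched in Python like this. It is only an illustration: `copy_downloads` is a made-up helper name, and the Vagrantfile check is my own safety net to make sure the destination really is the cascading root folder.

```python
import shutil
from pathlib import Path


def copy_downloads(src_dir, dest_dir):
    """Copy every regular file in src_dir into dest_dir; return the names copied."""
    src, dest = Path(src_dir), Path(dest_dir)
    # Refuse to copy into a folder that does not hold the Vagrantfile.
    if not (dest / "Vagrantfile").is_file():
        raise FileNotFoundError(f"No Vagrantfile found in {dest}")
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dest / item.name)
            copied.append(item.name)
    return copied
```

You would call it with your Hadoop_temporary folder as the first argument and “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5” as the second.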
Open VirtualBox – RUN AS ADMINISTRATOR
Open Command Prompt – RUN AS ADMINISTRATOR
Navigate to “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5”
Type: vagrant up
(Side note: if you do not want to build the full 4-node cluster but just want to get a single node up and running, simply navigate to “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5\single-node” and follow the exact same instructions from here.)
[Configuration option: the default VMs will be assigned 512 MB of RAM. If you would like to assign more on startup, simply edit the “Vagrantfile” and change the “512” to the appropriate number.]
Since this is the first time you run it, it will download the VirtualBox default image. Once downloaded, the image is re-used on later runs.
Once everything is complete and all 4 VMs have been created, you are good to go. If at any point the install has failed, don’t bother continuing because it will not work. Fix the problem and restart.
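For anyone who would rather script the memory change than edit the file by hand, here is a rough Python sketch. It assumes that the number 512 appears in the Vagrantfile only as the memory setting, which you should confirm by opening the file first. The helper name `set_vm_memory` is hypothetical.

```python
import re


def set_vm_memory(vagrantfile_text, new_mb):
    """Replace every standalone 512 in the text with new_mb.

    Returns a (new_text, number_of_replacements) tuple.
    """
    if new_mb <= 0:
        raise ValueError("memory must be a positive number of megabytes")
    # \b keeps numbers such as 5120 or 1512 untouched.
    return re.subn(r"\b512\b", str(new_mb), vagrantfile_text)
```

Read the Vagrantfile into a string, pass it through `set_vm_memory(text, 1024)`, check that the replacement count is what you expect, and only then write the result back.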
Next, you need to set up Putty to connect to the master VM. In the command prompt, type the following: vagrant ssh master
This will tell you 2 very important things:
a) Where to find the encryption key : “C:/Users/AlienED/.vagrant.d/insecure_private_key”
b) Which local port was configured to handle access to the master VM : 2202
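The same two values can also be read from the output of `vagrant ssh-config master`, which prints the connection settings in OpenSSH config style. The sketch below parses that kind of text in Python. The parser itself is my own illustration, not part of Vagrant, and it assumes one setting per line with the key and value separated by a space.

```python
def parse_ssh_config(text):
    """Parse `vagrant ssh-config` style output into a dict.

    If several hosts are listed, the first value seen for each key wins.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        # Paths containing spaces may be wrapped in double quotes.
        settings.setdefault(key, value.strip().strip('"'))
    return settings
```

After parsing, `settings["Port"]` and `settings["IdentityFile"]` hold the local port and the key location that you need for the Putty setup.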
Copy the “insecure_private_key” to where you put PuttyGen and run it. Click Load and point it to the key file. You should get the following result:
Click “Save Private Key” and click “Yes” when it moans about the missing passphrase (security is not our concern here).
Save file as AccessKey.ppk and close PuttyGen
Open Putty and under Session
a) Set IP to 127.0.0.1
b) Port to 2202
Go to Connection / SSH / Auth, click Browse and point to AccessKey.ppk
Go to Connection / Data and set Auto-login Username to “vagrant”
Go back to Session, enter “master” under Saved Sessions and click Save.
All done. From now on you should be able to start Putty, click on master and hit Load. For now, just click “Open”. When it asks you to cache the server key, say “YES”.
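If Putty refuses to connect, a quick way to tell whether the problem is Putty or the VM is to check that the forwarded port is accepting connections at all. This small Python sketch does that. The function name `port_is_open` is my own, and 2202 is simply the port reported earlier, which may differ on your machine.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

With the cluster running, `port_is_open("127.0.0.1", 2202)` should come back True. If it does not, the VM is probably not up or the port forwarding differs from what you entered in Putty.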
Congratulations! You have just built your very first 4-node Hadoop cluster running Cascading and Lingual.
In the next blog, I will show you how to set up a connection to a data source using Lingual.
Finally, if you want to shut down the VMs, it is pretty easy. Go to “C:\BigDataProject\vagrant-cascading-hadoop-cluster-2.5” in the command prompt and simply type: “vagrant halt”
To get them back up and running, simply type: “vagrant up”