Saturday, December 1, 2012

Getting Started with Hadoop on Ubuntu

So, I'm trying to play with Hadoop again (I haven't done it in a while), and since Ubuntu is my current weapon of choice, I found a great tutorial (Noll's), but I wanted something even simpler: a script, plus a sample program and instructions on how to compile and run it. So I created one (the differences from Noll's tutorial are at the very bottom). It is available at: .

You just need to download it and make it executable. You probably want to look at it in your favorite editor first (it is not a good idea to just run a script from the internet; I trust myself, but you shouldn't trust me), and you may want to change the mirror while you're at it (I live in Atlanta, so I use Georgia Tech's). Once you're happy, run it as root, and you should be done with the installation!

The script creates a user for Hadoop, called hduser; switch to that user. Then, as that user, set up your path and classpath (the classpath is needed for compiling), and start Hadoop.

Now download my sample program (it is the standard WordCount example from the tutorial, but without the package statement, so you can compile it directly from that folder), compile it, and create a jar file.

Next, we need to put some data into Hadoop. First we create a folder and copy a file into it (the sample program's own source will do, since we just need a text file). Then we copy that folder into Hadoop and list it to verify it's there. Now we can run our program in Hadoop.

When you want to stop Hadoop, just run the stop command; also, if you want to copy the output to your file system, use the -copyToLocal option of hadoop's dfs.
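The steps above, from setting up the path through copying the output back, might look roughly like the following when run as hduser. This is only a sketch under assumptions that are not in the post: it supposes a Hadoop 1.x install under /usr/local/hadoop (held in a made-up HADOOP_DIR variable), the sample source saved as WordCount.java, and placeholder names wordcount.jar, input and output. The start-all.sh and stop-all.sh scripts are the ones Hadoop 1.x ships, but they may not be the exact commands the post originally showed.

```shell
#!/bin/sh
# Sketch of the post-install workflow, meant to be run as hduser.
# Assumed, not taken from the post: the install location, the jar name,
# and the input/output folder names.

HADOOP_DIR="${HADOOP_DIR:-/usr/local/hadoop}"   # assumed install location

# Print a command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Put the hadoop tools on PATH and the core jar on CLASSPATH.
setup_env() {
    PATH="$PATH:$HADOOP_DIR/bin"
    # Take the first hadoop-core jar found; if none exists, CLASSPATH
    # ends up holding the unexpanded pattern and javac will complain.
    for jar in "$HADOOP_DIR"/hadoop-core-*.jar; do
        CLASSPATH="$jar"
        break
    done
    export PATH CLASSPATH
}

start_hadoop() { run start-all.sh; }
stop_hadoop()  { run stop-all.sh; }

# Compile the sample program and package it, nested classes included.
build_jar() {
    run javac -classpath "$CLASSPATH" WordCount.java
    run jar cf wordcount.jar WordCount*.class
}

# Create a local folder with one text file, copy it into HDFS, list it.
load_input() {
    run mkdir -p input
    run cp WordCount.java input/
    run hadoop dfs -copyFromLocal input input
    run hadoop dfs -ls input
}

# Run the job: main class WordCount, HDFS folders input and output.
run_job() {
    run hadoop jar wordcount.jar WordCount input output
}

# Copy the job's output folder from HDFS to the local file system.
fetch_output() {
    run hadoop dfs -copyToLocal output ./output
}
```

Calling setup_env, start_hadoop, build_jar, load_input and run_job in that order mirrors the walkthrough; fetch_output and stop_hadoop cover the last paragraph. Setting DRY_RUN=1 first only prints the commands, which is a cheap way to review them before touching the cluster.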
The install script is completely automated, so you can even use it when starting an Amazon EC2 instance; for example, you can start a micro instance with an Ubuntu 12.04 daily build (for Dec-1-2012; change the AMI ID to get a different one :) and a key named mac-vm.
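One way to do such a launch today is to hand the install script to the instance as user data, which Ubuntu cloud images run as root on first boot when it starts with a #! line. The sketch below uses the current AWS CLI, which is an assumption (the post may well have used a different tool), and the AMI_ID and USER_DATA values are placeholders, since the post's actual AMI ID and script name are not shown.

```shell
#!/bin/sh
# Sketch: launch a micro instance that runs the install script at first boot.
# Assumed, not taken from the post: the AWS CLI is installed and configured,
# and both values below are placeholders to be replaced.

AMI_ID="${AMI_ID:-ami-REPLACE_ME}"            # placeholder AMI ID
USER_DATA="${USER_DATA:-install-script.sh}"   # hypothetical local file name

# Print a command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# t1.micro was the micro type in 2012; newer accounts may need another type.
launch_hadoop_instance() {
    run aws ec2 run-instances \
        --image-id "$AMI_ID" \
        --instance-type t1.micro \
        --key-name mac-vm \
        --user-data "file://$USER_DATA"
}
```

Running launch_hadoop_instance with DRY_RUN=1 set prints the full command, so the AMI ID and key name can be checked before anything is billed.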

