Sunday, February 24, 2013

Set up Hadoop on OSX

Let's say you're looking to get your MacBook Pro all set up with a local Hadoop instance to play with on that long flight across the Atlantic (or any other time you really want to be running locally).

The following steps work for OSX 10.7 Lion.
The following tips may be useful the first time you're setting this up on a new computer:
  1. Turn on SSH (System Preferences => Sharing => Remote Login => "On")
  2. For the JAVA_HOME variable, you could hard-code the current full path:
    • export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home
    or you could do it the smart way and let OSX look up the path for you:
    • export JAVA_HOME=`/usr/libexec/java_home`
  3. If you get this error: 
    ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /Users/jchen/Data/Hadoop/dfs/data: namenode namespaceID = 773619367; datanode namespaceID = 2049079249

    It's because you've formatted the namenode twice, which happens easily when you're walking through tutorials. The answer is well spelled out here (which is another good setup tutorial). In summary, you have two options: either start over (delete the datanode directory and then reformat the namenode), or manually edit the VERSION file in the datanode directory so that its namespaceID matches the namenode's.
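
The manual fix for tip 3 can be sketched as a small shell helper. This is only a rough sketch, not an official Hadoop tool. It assumes the Hadoop 1.x storage layout, where each storage directory holds a current/VERSION file containing a line of the form namespaceID=<number>. The function names get_namespace_id and sync_namespace_id are made up for this example.

```shell
#!/bin/sh
# Sketch: copy the namenode's namespaceID into the datanode's VERSION file.
# Assumes each VERSION file has a line like "namespaceID=773619367".
# The function names are hypothetical helpers, not part of Hadoop.

# Print the namespaceID value stored in the VERSION file given as $1.
get_namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Rewrite the namespaceID line of the datanode VERSION file ($2)
# so it matches the namenode VERSION file ($1). Keeps a .bak copy.
sync_namespace_id() {
    name_version=$1
    data_version=$2
    id=$(get_namespace_id "$name_version")
    if [ -z "$id" ]; then
        echo "no namespaceID found in $name_version" >&2
        return 1
    fi
    cp "$data_version" "$data_version.bak"
    # Avoid 'sed -i': its syntax differs between BSD (OSX) and GNU sed.
    sed "s/^namespaceID=.*/namespaceID=$id/" "$data_version.bak" > "$data_version"
}
```

To use it, stop the Hadoop daemons first, then call sync_namespace_id with the path to the namenode's current/VERSION file followed by the path to the datanode's current/VERSION file. Both files live under whatever directories your own configuration sets for dfs.name.dir and dfs.data.dir. The original datanode file is kept alongside as VERSION.bak in case you need to roll back.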