Running hdfs from the freshly built hadoop 0.23.3 in pseudo distributed mode

So, here’s what I did to run the freshly built hadoop 0.23.3 bits in pseudo distributed mode (this is the mode where each of the hadoop daemons runs in its own process in a single physical/virtual machine).

First I configured passwordless ssh back into the same machine (localhost). To accomplish that, I needed to turn off selinux on this CentOS 6.3 VM. Seems like selinux is working very hard to make CentOS/Redhat completely unusable. I edited /etc/selinux/config, changed the line SELINUX=enforcing to SELINUX=disabled, and rebooted. Then I ran ‘ssh-keygen’ followed by ‘ssh-copy-id -i ~/.ssh/id_rsa.pub jagane@localhost’.
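The SELinux and ssh steps above can be sketched as follows (the username jagane is from my setup; substitute your own):

```shell
# Disable selinux; this takes effect only after a reboot.
sudo sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
sudo reboot

# After the reboot, set up passwordless ssh back into localhost.
ssh-keygen                   # accept the defaults, empty passphrase
ssh-copy-id -i ~/.ssh/id_rsa.pub jagane@localhost
ssh jagane@localhost true    # should succeed with no password prompt
```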

Next, I untarred <src_root>/hadoop-dist/target/hadoop-0.23.3.tar.gz into /opt and created the directories /opt/hadoop-0.23.3/conf, /opt/nn and /opt/dn. Then I created the following files in /opt/hadoop-0.23.3/conf.
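The untar and mkdir steps look like this (run from the source root; /opt typically requires root):

```shell
# Unpack the freshly built tarball into /opt,
# then create the conf dir plus the namenode and datanode dirs.
sudo tar -xzf hadoop-dist/target/hadoop-0.23.3.tar.gz -C /opt
sudo mkdir -p /opt/hadoop-0.23.3/conf /opt/nn /opt/dn
```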

/opt/hadoop-0.23.3/conf/core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
 <name>fs.default.name</name>
 <value>hdfs://localhost:8020</value>
 </property>
</configuration>

/opt/hadoop-0.23.3/conf/hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
 <name>dfs.replication</name>
 <value>1</value>
 </property>
 <property>
 <name>dfs.namenode.name.dir</name>
 <value>/opt/nn</value>
 </property>
 <property>
 <name>dfs.datanode.data.dir</name>
 <value>/opt/dn</value>
 </property>
</configuration>

/opt/hadoop-0.23.3/conf/hadoop-env.sh:

export JAVA_HOME=/usr/java/java
export HADOOP_HOME=/opt/hadoop-0.23.3
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/conf/
export YARN_CONF_DIR=${HADOOP_HOME}/conf/

Now, run the following command to format hdfs:

$ (cd /opt/hadoop-0.23.3; ./bin/hdfs namenode -format)

Next, start up the namenode as follows:

$ (cd /opt/hadoop-0.23.3; ./sbin/hadoop-daemon.sh --config /opt/hadoop-0.23.3/conf start namenode)

Finally, start up the datanode:

$ (cd /opt/hadoop-0.23.3; ./sbin/hadoop-daemon.sh --config /opt/hadoop-0.23.3/conf start datanode)

At this point, running the jps command will show the NameNode and DataNode processes running.
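To confirm that hdfs is actually serving files, you can exercise it end to end with the standard hdfs dfs commands (the /test path and hello.txt file here are just illustrative). If either daemon is missing from the jps output, the logs under $HADOOP_HOME/logs are the first place to look:

```shell
cd /opt/hadoop-0.23.3
jps                                     # should list NameNode and DataNode

# Exercise the filesystem: create a dir, copy a file in, read it back.
echo "hello hdfs" > /tmp/hello.txt
./bin/hdfs dfs -mkdir /test
./bin/hdfs dfs -put /tmp/hello.txt /test/
./bin/hdfs dfs -cat /test/hello.txt

# If something failed to start, check the daemon logs.
tail -n 50 logs/hadoop-*-namenode-*.log
tail -n 50 logs/hadoop-*-datanode-*.log
```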

That’s all for now.

About Jagane Sundar
