Wednesday, July 27, 2011

[Linux] Installing standalone Hadoop 0.20.203 + HBase 0.90.3 @ Ubuntu 10.04.2 64Bit Server

I installed Hadoop again a few days ago. Checking my notes, the last time was really only about nine months ago :P It feels that way because I honestly don't get many chances to use Hadoop Orz so every time I reinstall it from scratch, and tear it down again not long after. This round, the install was mainly so that a newbie like me wouldn't accidentally wreck a colleague's machine.


Previous installation notes:



Going through my notes, I realized my first contact with Hadoop was in the summer of 2009. That setup post was, frankly, just a record of whatever configuration I had done; I entered a competition knowing nothing, and only started using Hadoop the very week of the preliminaries. Looking back, I'm actually glad I didn't understand Hadoop/HBase at the time, so I wasn't boxed in by its built-in features XD and could instead fully enjoy the data design, mapping whatever I wanted onto Hadoop's architecture. For example, I built a Merge Sort that ran on Hadoop; anyone familiar with Hadoop's features or architecture would probably call that somewhat superfluous Orz As for the previous installation, it was a heterogeneous setup mixing FreeBSD 32bit/64bit and Linux 64bit machines. It was my first time setting up that many nodes, even if it was only five; a junior schoolmate needed to run Hadoop back then, so we gave the department workstations a try XD In the end, though, he chose the public experimental Hadoop cluster at NCHC (the National Center for High-performance Computing) instead: one application, and a dozen or so machines were ready to use!


This time I installed Hadoop with the default settings, and running wordcount produced:


Task Id : attempt_#############_0001_r_000000_0, Status : FAILED
Error: java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)


My guess is that it has to do with the machine having multiple network interfaces, or with both IPv4 and IPv6 being enabled. In any case, the default configuration targets the Pseudo-Distributed scenario and fills in localhost everywhere; replacing it with a custom hostname and mapping that name to an IP (127.0.0.1) in /etc/hosts is enough to resolve it.
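

In short, the fix looks like the following (the full walkthrough is below; mycloud is simply the custom hostname I picked, which then replaces the localhost entries in the Hadoop config files):


hadoop@pc:~$ sudo vim /etc/hosts
127.0.0.1 mycloud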


So anyway, I've set it up once again Orz Honestly, lately I get the feeling the cloud-computing craze is over; hardly anyone around me brings up Hadoop/HBase anymore. Maybe it's like MySQL, which has become such a natural part of web development that everyone considers it obvious and not worth discussing? Or is it that environments which genuinely need Hadoop/HBase are just rare? It also reminds me that the best technique is no technique: chasing lots of new things looks like broadening your horizons, but it can just as easily narrow your thinking.


Other details:


Installed from the ubuntu-10.04.2-server-amd64.iso disc, with hadoop as the user account:


hadoop@hadoop:~$ uname -a
Linux hadoop 2.6.32-28-server #55-Ubuntu SMP Mon Jan 10 23:57:16 UTC 2011 x86_64 GNU/Linux


Set up passwordless SSH login:


hadoop@hadoop:~$ ssh-keygen -t rsa -P '' && cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hadoop@hadoop:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is ##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##.
Are you sure you want to continue connecting (yes/no)? yes
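

If ssh still prompts for a password after this, overly loose permissions on ~/.ssh are the usual culprit. These are standard OpenSSH requirements rather than anything Hadoop-specific; the second command should return without asking for a password:


hadoop@hadoop:~$ chmod 700 $HOME/.ssh && chmod 600 $HOME/.ssh/authorized_keys
hadoop@hadoop:~$ ssh localhost exit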


Download jdk-6u26-linux-x64.bin / hadoop-0.20.203.0 / hbase-0.90.3:


hadoop@pc:~$ wget http://download.oracle.com/otn-pub/java/jdk/6u26-b03/jdk-6u26-linux-x64.bin
hadoop@pc:~$ wget http://apache.cdpa.nsysu.edu.tw//hadoop/common/hadoop-0.20.203.0/hadoop-0.20.203.0rc1.tar.gz
hadoop@pc:~$ wget http://apache.stu.edu.tw//hbase/hbase-0.90.3/hbase-0.90.3.tar.gz
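

The unpacking step isn't shown here; assuming the archives are extracted into the home directory with the originals parked in ~/tarball, it would go roughly like this (the JDK .bin is a self-extracting installer that unpacks into ./jdk1.6.0_26):


hadoop@pc:~$ mkdir tarball && mv jdk-6u26-linux-x64.bin hadoop-0.20.203.0rc1.tar.gz hbase-0.90.3.tar.gz tarball/
hadoop@pc:~$ tar zxf tarball/hadoop-0.20.203.0rc1.tar.gz
hadoop@pc:~$ tar zxf tarball/hbase-0.90.3.tar.gz
hadoop@pc:~$ sh tarball/jdk-6u26-linux-x64.bin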


Directory layout (after extraction everything sits in the home directory; tarball/ just keeps the original archives):


hadoop@pc:~$ ls && echo 'tarball:' && ls tarball/
hadoop-0.20.203.0  hbase-0.90.3  jdk1.6.0_26  tarball
tarball:
hadoop-0.20.203.0rc1.tar.gz  hbase-0.90.3.tar.gz  jdk-6u26-linux-x64.bin


Hadoop configuration files:


hadoop@pc:~$ vim hadoop-0.20.203.0/conf/hadoop-env.sh
export JAVA_HOME=$HOME/jdk1.6.0_26


hadoop@pc:~$ vim hadoop-0.20.203.0/conf/core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://mycloud:9000</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
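

A small aside: dfs.replication is an HDFS property and conventionally lives in conf/hdfs-site.xml rather than core-site.xml. It should still take effect here, since core-site.xml is loaded by every daemon, but the conventional placement would look like this:


hadoop@pc:~$ vim hadoop-0.20.203.0/conf/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>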


hadoop@pc:~$ vim hadoop-0.20.203.0/conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>mycloud:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>


hadoop@pc:~$ vim hadoop-0.20.203.0/conf/masters
mycloud


hadoop@pc:~$ vim hadoop-0.20.203.0/conf/slaves
mycloud


hadoop@pc:~$ sudo vim /etc/hostname
mycloud


hadoop@pc:~$ sudo vim /etc/hosts
127.0.0.1 mycloud


hadoop@pc:~$ vim ~/.bashrc
PATH=$HOME/hadoop-0.20.203.0/bin:$HOME/hbase-0.90.3/bin:$HOME/jdk1.6.0_26/bin:${PATH}
hadoop@pc:~$ sudo reboot
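

The reboot applies both the new hostname and the new PATH in one go; if rebooting is inconvenient, the same effect can probably be had without it:


hadoop@pc:~$ sudo hostname mycloud
hadoop@pc:~$ source ~/.bashrc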


Initialize Hadoop (format the HDFS namenode):


hadoop@mycloud:~$ hadoop namenode -format
INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = mycloud/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
************************************************************/
INFO util.GSet: VM type       = 64-bit
INFO util.GSet: 2% max memory = 19.33375 MB
INFO util.GSet: capacity      = 2^21 = 2097152 entries
INFO util.GSet: recommended=2097152, actual=2097152
INFO namenode.FSNamesystem: fsOwner=hadoop
INFO namenode.FSNamesystem: supergroup=supergroup
INFO namenode.FSNamesystem: isPermissionEnabled=true
INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
INFO namenode.NameNode: Caching file names occuring more than 10 times 
INFO common.Storage: Image file of size 112 saved in 0 seconds.
INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at mycloud/127.0.0.1
************************************************************/
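

One thing worth noticing in the log above: with no dfs.name.dir / hadoop.tmp.dir configured, everything lands under /tmp/hadoop-hadoop, which may get wiped on reboot and take the whole HDFS with it. For anything you want to keep, pin hadoop.tmp.dir somewhere persistent before formatting; the path below is just an example:


hadoop@mycloud:~$ vim hadoop-0.20.203.0/conf/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/hadoop-tmp</value>
</property>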


Start Hadoop, plus a few ways to check that it came up:


hadoop@mycloud:~$ start-all.sh 
starting namenode, logging to /home/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hadoop-namenode-mycloud.out
mycloud: starting datanode, logging to /home/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hadoop-datanode-mycloud.out
mycloud: starting secondarynamenode, logging to /home/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hadoop-secondarynamenode-mycloud.out
starting jobtracker, logging to /home/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hadoop-jobtracker-mycloud.out
mycloud: starting tasktracker, logging to /home/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hadoop-tasktracker-mycloud.out


hadoop@mycloud:~$ jps
984 NameNode
1526 TaskTracker
1626 Jps
1368 JobTracker
1139 DataNode
1303 SecondaryNameNode


hadoop@mycloud:~$ hadoop dfsadmin -report
Configured Capacity: 81232019456 (75.65 GB)
Present Capacity: 75465768975 (70.28 GB)
DFS Remaining: 75465744384 (70.28 GB)
DFS Used: 24591 (24.01 KB)
DFS Used%: 0%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 81232019456 (75.65 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 5766250481 (5.37 GB)
DFS Remaining: 75465744384(70.28 GB)
DFS Used%: 0%
DFS Remaining%: 92.9%
Last contact: ##########


hadoop@mycloud:~$ netstat -plten | grep java
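

The netstat output isn't reproduced here; with the settings above you'd expect java processes listening on, among others, 9000 (fs.default.name) and 9001 (mapred.job.tracker), plus the usual web UI ports, which make for another quick check:


http://mycloud:50070/   (NameNode web UI)
http://mycloud:50030/   (JobTracker web UI)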


Test with wordcount:


hadoop@mycloud:~$ hadoop fs -mkdir in
hadoop@mycloud:~$ hadoop fs -put ~/hadoop-0.20.203.0/README.txt in
hadoop@mycloud:~$ hadoop jar ~/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar wordcount in out
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapred.JobClient: Running job: job_####_0001
INFO mapred.JobClient:  map 0% reduce 0%
INFO mapred.JobClient:  map 100% reduce 0%
INFO mapred.JobClient:  map 100% reduce 100%
INFO mapred.JobClient: Job complete: job_####_0001
INFO mapred.JobClient: Counters: 25
INFO mapred.JobClient:   Job Counters 
INFO mapred.JobClient:     Launched reduce tasks=1
INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=14829
INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
INFO mapred.JobClient:     Launched map tasks=1
INFO mapred.JobClient:     Data-local map tasks=1
INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10740
INFO mapred.JobClient:   File Output Format Counters 
INFO mapred.JobClient:     Bytes Written=1306
INFO mapred.JobClient:   FileSystemCounters
INFO mapred.JobClient:     FILE_BYTES_READ=1836
INFO mapred.JobClient:     HDFS_BYTES_READ=1476
INFO mapred.JobClient:     FILE_BYTES_WRITTEN=45881
INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1306
INFO mapred.JobClient:   File Input Format Counters 
INFO mapred.JobClient:     Bytes Read=1366
INFO mapred.JobClient:   Map-Reduce Framework
INFO mapred.JobClient:     Reduce input groups=131
INFO mapred.JobClient:     Map output materialized bytes=1836
INFO mapred.JobClient:     Combine output records=131
INFO mapred.JobClient:     Map input records=31
INFO mapred.JobClient:     Reduce shuffle bytes=1836
INFO mapred.JobClient:     Reduce output records=131
INFO mapred.JobClient:     Spilled Records=262
INFO mapred.JobClient:     Map output bytes=2055
INFO mapred.JobClient:     Combine input records=179
INFO mapred.JobClient:     Map output records=179
INFO mapred.JobClient:     SPLIT_RAW_BYTES=110
INFO mapred.JobClient:     Reduce input records=131


hadoop@mycloud:~$ hadoop fs -lsr 
drwxr-xr-x   - hadoop supergroup          0 /user/hadoop/in
-rw-r--r--   3 hadoop supergroup       1366 /user/hadoop/in/README.txt
drwxr-xr-x   - hadoop supergroup          0 /user/hadoop/out
-rw-r--r--   3 hadoop supergroup          0 /user/hadoop/out/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 /user/hadoop/out/_logs
drwxr-xr-x   - hadoop supergroup          0 /user/hadoop/out/_logs/history
-rw-r--r--   3 hadoop supergroup      10522 /user/hadoop/out/_logs/history/job_####_0001_####_hadoop_word+count
-rw-r--r--   3 hadoop supergroup      19894 /user/hadoop/out/_logs/history/job_####_0001_conf.xml
-rw-r--r--   3 hadoop supergroup       1306 /user/hadoop/out/part-r-00000


hadoop@mycloud:~$ hadoop fs -cat out/p*
BIS),  1
(ECCN)  1
(TSU)   1
(see    1
5D002.C.1,      1
...
...
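

To pull the result back out of HDFS onto the local filesystem, hadoop fs -get works too (the local filename is arbitrary):


hadoop@mycloud:~$ hadoop fs -get out/part-r-00000 wordcount-result.txt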


Configure HBase (http://hbase.apache.org/book/quickstart.html):


hadoop@mycloud:~$ mkdir /home/hadoop/hStoreDir
hadoop@mycloud:~$ vim hbase-0.90.3/conf/hbase-env.sh
export JAVA_HOME=$HOME/jdk1.6.0_26
hadoop@mycloud:~$ vim hbase-0.90.3/conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/hadoop/hStoreDir/hbase</value>
  </property>
</configuration>
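

Note that a file:/// rootdir means HBase runs in standalone mode, writing to the local filesystem and never touching the Hadoop cluster set up above. To back it with HDFS instead (pseudo-distributed mode, per the HBase book), the settings would look roughly like this:


<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycloud:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>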


hadoop@mycloud:~$ start-hbase.sh 
starting master, logging to /home/hadoop/hbase-0.90.3/bin/../logs/hbase-hadoop-master-mycloud.out


hadoop@mycloud:~$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.3, r1100350, ### ###  # ##:##:## PDT 2011

hbase(main):001:0> 
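

A quick smoke test from the shell (table and column-family names follow the quickstart; any names work):


hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):003:0> scan 'test'
hbase(main):004:0> disable 'test'
hbase(main):005:0> drop 'test'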


1 comment:

  1. Hi,
    I've recently been wanting to study this area,
    so I've been following your installation steps.
    First I'd like to ask about these steps:
    hadoop@pc:~$ vim hadoop-0.20.203.0/conf/masters
    mycloud
    hadoop@pc:~$ vim hadoop-0.20.203.0/conf/slaves
    mycloud
    hadoop@pc:~$ sudo vim /etc/hostname
    mycloud
    hadoop@pc:~$ sudo vim /etc/hosts
    127.0.0.1 mycloud
    Am I supposed to delete the original text in each file and replace it with the text above?
    Also, would using a different HBase version be incompatible?
    0.90.3 is no longer available for download.
    Thanks!

    Blog owner's reply: (02/20/2013 03:28:59 PM)


    It's been a long time since I touched this, so I don't dare guarantee compatibility; you might look for someone's more recent installation notes :)
    As for the edits you mentioned above, I see your machine is described as "hadoop @ PC",
    so replacing mycloud with pc should probably do it.
    Give it a try.
