@2010/02/07. This post records the process of upgrading an existing Hadoop 0.18 installation to 0.20. If you need to install from scratch instead, see these two posts:
- [Linux] Installing a standalone Hadoop 0.20.1 Single-Node Cluster (Pseudo-Distributed) @ Ubuntu 9.04
- [Linux] Installing a Hadoop 0.20.1 Multi-Node Cluster @ Ubuntu 9.10
Download locations:
- http://hadoop.apache.org/common/releases.html
- http://apache.ntu.edu.tw/hadoop/core/
- Hadoop Virtual Image
While upgrading from 0.18 to 0.20, I also walked through the original VMWare image once more; it runs Hadoop 0.18 on CentOS. The following steps are performed as root:
- Update software (install packages as root)
- Install screen
- # yum -y install screen
- Install lftp
- # yum -y install lftp
- Install the Apache web server and PHP
- # yum -y install httpd php
- Configure HTTPD
- # vim /etc/httpd/conf/httpd.conf
- Start HTTPD (a quick sanity check is sketched after this list)
- # /etc/init.d/httpd start
- Set up or create .screenrc (can be done as the hadoop user)
- # vim ~/.screenrc and add this single line:
- caption always "%{bw}%M/%d %c %{wb} %-w%{c}%n %t%{w}%+w%{k} %=%{G}[%H] %l%"
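A quick sanity check for the Apache/PHP part, sketched here as an extra step (it is not in the original procedure): confirm the configuration parses, make httpd start on boot, and check that it answers locally. This assumes curl is present on the image.
- # httpd -t
- # chkconfig httpd on
- # curl -I http://localhost/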
Hadoop setup; the following steps are performed as the hadoop user:
- Download Hadoop 0.20 into the hadoop home directory (/home/hadoop, i.e. the default location after logging in as the hadoop user)
- # wget http://apache.ntu.edu.tw/hadoop/core/hadoop-0.20.0/hadoop-0.20.0.tar.gz
- Extract it; that is all the installation there is
- # tar -xvf hadoop-0.20.0.tar.gz
- Switch into hadoop-0.20.0 (all directories mentioned from here on are relative to hadoop-0.20.0)
- # cd /home/hadoop/hadoop-0.20.0/
- Configure hadoop-env.sh
- # vim /home/hadoop/hadoop-0.20.0/conf/hadoop-env.sh and add one line (a quick check is sketched below):
- export JAVA_HOME=/usr/lib/jre1.6.0_13
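Before relying on that JAVA_HOME line, it is worth confirming the path actually exists on your image and that the 0.20 scripts pick it up; the jre1.6.0_13 path comes from the original 0.18 image, so adjust it if your directory name differs. A minimal check:
- # ls -d /usr/lib/jre1.6.0_13
- # /usr/lib/jre1.6.0_13/bin/java -version
- # ./bin/hadoop version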
That completes the standalone Hadoop environment. You can now start it and list the folders and files in the current HDFS directory:
- # ./bin/start-all.sh
- # ./bin/hadoop dfs -ls
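If start-all.sh seems to have worked, a quick way to confirm all five daemons are really up (before touching HDFS) is to list the Java processes. jps ships with the JDK, not the plain JRE, so fall back to ps if it is missing; this is a general check rather than anything specific to this upgrade.
- # jps
- # ps aux | grep [j]ava
- You should see NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker.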
The following sets up the Hadoop DFS environment, done as the hadoop user:
- Copy the Hadoop 0.18 configuration files
- # cp /home/hadoop/start-hadoop /home/hadoop/hadoop-0.20.0/
- # cp /home/hadoop/stop-hadoop /home/hadoop/hadoop-0.20.0/
- # cp /home/hadoop/conf/hadoop-site.tmp /home/hadoop/hadoop-0.20.0/conf
- Then edit /home/hadoop/hadoop-0.20.0/start-hadoop and /home/hadoop/hadoop-0.20.0/stop-hadoop
- Change Hadoophome="/home/hadoop" to Hadoophome="/home/hadoop/hadoop-0.20.0"
- Generate the configuration file
- This is purely to generate the contents of conf/hadoop-site.xml; the new version no longer reads that file itself (see the layout note after this block)
- # /home/hadoop/hadoop-0.20.0/start-hadoop
- # /home/hadoop/hadoop-0.20.0/stop-hadoop
- # cp conf/hadoop-site.xml conf/core-site.xml
- # vim conf/core-site.xml
- <property>
<name>fs.default.name</name>
<value>hdfs://IP:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9002</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>${hadoop.tmp.dir}/dfs/data</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
<property>
<name>mapred.job.tracker.http.address</name>
<value>0.0.0.0:50030</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50475</value>
</property>
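A side note on the file layout: 0.20 split the old hadoop-site.xml into three files, and the stock configuration would place these properties in core-site.xml, hdfs-site.xml and mapred-site.xml respectively. Keeping everything in core-site.xml as above seems to work for this single-node setup, but if you prefer the canonical split, it would look roughly like this (a sketch, not part of the original procedure):
- # vim conf/core-site.xml (fs.default.name, hadoop.tmp.dir)
- # vim conf/hdfs-site.xml (dfs.replication, dfs.name.dir, dfs.data.dir and the dfs.*.address entries)
- # vim conf/mapred-site.xml (mapred.job.tracker, mapred.job.tracker.http.address)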
- Format the namenode (any previously existing data will be wiped)
- # ./bin/hadoop namenode -format
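With the hadoop.tmp.dir and dfs.name.dir values above (and ${user.name} being hadoop), a successful format should leave a fresh name directory under /tmp; a quick way to confirm before going further:
- # ls /tmp/hadoop-hadoop/dfs/name/current/
- A VERSION file and an fsimage should be listed if the format succeeded.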
- Before starting Hadoop, tweak the start-hadoop script
- # vim /home/hadoop/hadoop-0.20.0/start-hadoop
- Comment out line 9 with # ( #sed "s/\$hostip/$host_ip/g........ ); it is fine to skip this, it just keeps spewing messages otherwise
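If you would rather not open vim just for that one change, the same thing can be done with sed; this is only an equivalent shortcut, and it assumes the sed call really does sit on line 9 of your copy of start-hadoop, so keep a backup.
- # cp start-hadoop start-hadoop.bak
- # sed -i '9 s/^/#/' start-hadoop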
- From here on, everything works just like it did on 0.18
- [hadoop@hadoop hadoop-0.20.0]$ ./start-hadoop
Starting Hadoop ...
Job Admin: http://IP:50030/
HDFS: http://IP:50070/
[hadoop@hadoop hadoop-0.20.0]$ ./bin/hadoop dfs -ls
ls: Cannot access .: No such file or directory.
[hadoop@hadoop hadoop-0.20.0]$ ./bin/hadoop dfs -mkdir input
[hadoop@hadoop hadoop-0.20.0]$ ./bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2009-08-12 15:00 /user/hadoop/input
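To be sure writes work end to end (and not just -ls and -mkdir), a small round trip like the one below makes a handy smoke test; test.txt is just a throwaway file name used for illustration, and it is exactly this kind of -put that surfaces the datanode problem described next.
- # echo "hello hadoop" > test.txt
- # ./bin/hadoop dfs -put test.txt input/
- # ./bin/hadoop dfs -cat input/test.txt
- # ./bin/hadoop dfs -rm input/test.txt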
Error messages and fixes:
- Listing works fine, but as soon as you try to add data it fails with a pile of error messages; after some digging, it turned out the datanode was not running
- # ./bin/hadoop dfs -ls
- OK
- # ./bin/hadoop dfs -put test .
- ERROR
- # ./bin/hadoop dfsadmin -report
- Shows Datanodes available: 0 (0 total, 0 dead)
- The fix is to delete and rebuild the datanode data created by the old 0.18 installation. This works, but the data will be lost; it is better to check the Trouble Shooting section below first
- # ./stop-hadoop
- # rm /tmp/hadoop*
- # ./bin/hadoop namenode -format
- # ./start-hadoop
- # ./bin/hadoop dfs -ls
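For reference, when a datanode will not come up after a namenode -format, the datanode log and the two VERSION files are usually where the answer is; the classic culprit is a namespaceID left over from the old filesystem that no longer matches the freshly formatted namenode. The paths below follow from the hadoop.tmp.dir used in this setup.
- # tail -n 50 logs/hadoop-hadoop-datanode-*.log
- # cat /tmp/hadoop-hadoop/dfs/name/current/VERSION
- # cat /tmp/hadoop-hadoop/dfs/data/current/VERSION
- The namespaceID in the data VERSION must match the namenode's; if it does not, wiping the data directory (as above) or editing the ID by hand clears it up.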
- Trouble Shooting
- If the namenode misbehaves or files cannot be put into HDFS
- Better solution
- # hadoop fsck
- Worst solution (all data will be lost)
- # stop-all.sh
- # rm -fr /fs/HDFS
- # mkdir /fs/HDFS
- # hadoop namenode -format
- # start-all.sh
- If the DFS can only see itself
- Manually clear the Hadoop tmp directory on every node (default: /tmp/hadoop-root), then restart the DFS
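For the fsck route above, the usual invocation on 0.20 looks like the line below, run as the hadoop user from the hadoop-0.20.0 directory; it only reports missing or corrupt blocks and does not modify anything.
- # ./bin/hadoop fsck / -files -blocks -locations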