
Open-Source Cloud Computing Technology Series (4): Installing and Configuring Cloudera

[Date: 2009-09-02] Author: 清

To save space, let's get straight to the point.

First, use VirtualBox to set up a virtual machine running Debian 5.0.

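Among open-source Linux distributions, Debian has always had the purest Linux pedigree: convenient to use and efficient to run. Taking a fresh look at the latest release, 5.0, feels like meeting an old friend again.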

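You only need to download debian-501-i386-CD-1.iso for the installation. Everything else can be installed and configured over the network with Debian's package management. The installation steps are omitted here; you can find all the information you need at www.debian.org.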

Next, let's see how convenient and simple the stable release, 0.18.3, is.

Step 1. Configure the Cloudera Repository

Create a new configuration file with vi /etc/apt/sources.list.d/cloudera.list. Its contents:

more /etc/apt/sources.list.d/cloudera.list
deb http://archive.cloudera.com/debian lenny contrib
deb-src http://archive.cloudera.com/debian lenny contrib
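
The same file can also be written without opening an editor. The following is a minimal sketch, assuming the repository URL and the lenny contrib suite shown above; the function name write_cloudera_list and its arguments are hypothetical and not part of any Cloudera tooling.

```shell
#!/bin/sh
# Hypothetical helper: write the two Cloudera APT source lines to a file.
# $1 = target file (e.g. /etc/apt/sources.list.d/cloudera.list)
# $2 = Debian codename (defaults to lenny, as used in this article)
write_cloudera_list() {
    target=$1
    codename=${2:-lenny}
    cat > "$target" <<EOF
deb http://archive.cloudera.com/debian $codename contrib
deb-src http://archive.cloudera.com/debian $codename contrib
EOF
}
```

Run as root, this would be called as write_cloudera_list /etc/apt/sources.list.d/cloudera.list. Note that it overwrites the target file rather than appending to it.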

Add the Cloudera Key

debian:~# curl -s http://archive.cloudera.com/debian/archive.key | apt-key add -
OK

Update the APT Index

debian:~# apt-get update
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny Release.gpg
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny/main Translation-en_US
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny Release 
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny/main Packages/DiffIndex
Get:1 http://archive.cloudera.com lenny Release.gpg [197B]                                            
Get:2 http://volatile.debian.org lenny/volatile Release.gpg [189B]                                    
Ign http://volatile.debian.org lenny/volatile/main Translation-en_US                                  
Hit http://ftp.us.debian.org lenny Release.gpg                                                        
Ign http://archive.cloudera.com lenny/contrib Translation-en_US                           
Hit http://security.debian.org lenny/updates Release.gpg                                  
Ign http://security.debian.org lenny/updates/main Translation-en_US 
Get:3 http://volatile.debian.org lenny/volatile Release [40.7kB]    
Ign http://ftp.us.debian.org lenny/main Translation-en_US                                       
Hit http://security.debian.org lenny/updates Release                                            
Get:4 http://archive.cloudera.com lenny Release [2391B]                                        
Hit http://ftp.us.debian.org lenny Release                                                      
Ign http://security.debian.org lenny/updates/main Packages/DiffIndex                           
Ign http://archive.cloudera.com lenny/contrib Packages                     
Ign http://security.debian.org lenny/updates/main Sources/DiffIndex        
Ign http://ftp.us.debian.org lenny/main Packages/DiffIndex                 
Ign http://ftp.us.debian.org lenny/main Sources/DiffIndex                                  
Hit http://security.debian.org lenny/updates/main Packages          
Hit http://ftp.us.debian.org lenny/main Packages                    
Ign http://archive.cloudera.com lenny/contrib Sources               
Ign http://volatile.debian.org lenny/volatile/main Packages/DiffIndex
Hit http://security.debian.org lenny/updates/main Sources           
Ign http://volatile.debian.org lenny/volatile/main Sources/DiffIndex
Hit http://ftp.us.debian.org lenny/main Sources                     
Get:5 http://archive.cloudera.com lenny/contrib Packages [4480B]
Get:6 http://volatile.debian.org lenny/volatile/main Packages [7471B]
Get:7 http://volatile.debian.org lenny/volatile/main Sources [2350B]     
Get:8 http://archive.cloudera.com lenny/contrib Sources [1431B]
Fetched 59.2kB in 4s (12.5kB/s)
Reading package lists... Done
debian:~#

List the Cloudera packages

debian:~# apt-cache search hadoop
hadoop - A software platform for processing vast amounts of data
hadoop-conf-pseudo - Pseudo-distributed Hadoop configuration
hadoop-datanode - Data Node for Hadoop
hadoop-doc - Documentation for Hadoop
hadoop-jobtracker - Job Tracker for Hadoop
hadoop-namenode - Name Node for Hadoop
hadoop-native - Native libraries for Hadoop (e.g., compression)
hadoop-pipes - Interface to author Hadoop MapReduce jobs in C++
hadoop-secondarynamenode - Secondary Name Node for Hadoop
hadoop-tasktracker - Task Tracker for Hadoop
hive - A data warehouse infrastructure built on top of Hadoop
libhdfs0 - JNI Bindings to access Hadoop HDFS from C
pig - A platform for analyzing large data sets using Hadoop
debian:~#
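
When only the package names are needed, for example to feed them to another command, the descriptions can be stripped from the search output. This is a small sketch that assumes the "name - description" layout shown above; package_names is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `apt-cache search` output on stdin and print
# only the package names (the text before the first " - ").
package_names() {
    sed 's/ - .*$//'
}
```

A typical call would be apt-cache search hadoop | package_names.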

 

OK, that completes the preparation. Now for the actual installation, which is also very convenient.

We choose to install Hadoop in Pseudo-Distributed Mode, which lets us try out the full functionality of Hadoop on a single machine.

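Yesterday we tried hadoop-conf-pseudo 0.18.3-0cloudera0.3.0~intrepid. Today Cloudera released trial packages based on the latest Hadoop 0.20, so let's take the chance to try them while they are fresh. That is the pace of open-source software: something new every day.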

Java 6 is required.

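Configure /etc/apt/sources.list so that the non-free component, which carries sun-java6-jre on lenny, is enabled: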

debian:~/codeblue2/client/examples# more /etc/apt/sources.list
#
# deb cdrom:[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10]/ lenny main

deb cdrom:[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10]/ lenny main

deb http://ftp.us.debian.org/debian/ lenny main contrib non-free
deb-src http://ftp.us.debian.org/debian/ lenny main contrib non-free

deb http://security.debian.org/ lenny/updates main contrib non-free
deb-src http://security.debian.org/ lenny/updates main contrib non-free

deb http://volatile.debian.org/debian-volatile lenny/volatile main contrib non-free
deb-src http://volatile.debian.org/debian-volatile lenny/volatile main contrib non-free

 

Then run apt-get update.

debian:~# apt-get install sun-java6-jre

The installation is practically foolproof, so the output is omitted here.

Before trying 0.20, let's first cover the installation of 0.18.3, since it is the stable release.

apt-get -y install hadoop-conf-pseudo
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  hadoop hadoop-native liblzo2-2
The following NEW packages will be installed:
  hadoop hadoop-conf-pseudo hadoop-native liblzo2-2
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 12.0MB/12.1MB of archives.
After this operation, 21.5MB of additional disk space will be used.
Get:1 http://archive.cloudera.com lenny/contrib hadoop 0.18.3-4cloudera0.3.0~lenny [11.9MB]
Get:2 http://archive.cloudera.com lenny/contrib hadoop-conf-pseudo 0.18.3-4cloudera0.3.0~lenny [93.1kB]
Get:3 http://archive.cloudera.com lenny/contrib hadoop-native 0.18.3-4cloudera0.3.0~lenny [92.7kB]    
Fetched 4336kB in 23s (184kB/s)                                                                       
Selecting previously deselected package liblzo2-2.
(Reading database ... 103556 files and directories currently installed.)
Unpacking liblzo2-2 (from .../lzo2/liblzo2-2_2.03-1_i386.deb) ...
Selecting previously deselected package hadoop.
Unpacking hadoop (from .../hadoop_0.18.3-4cloudera0.3.0~lenny_all.deb) ...
Selecting previously deselected package hadoop-conf-pseudo.
Unpacking hadoop-conf-pseudo (from .../hadoop-conf-pseudo_0.18.3-4cloudera0.3.0~lenny_all.deb) ...
Selecting previously deselected package hadoop-native.
Unpacking hadoop-native (from .../hadoop-native_0.18.3-4cloudera0.3.0~lenny_i386.deb) ...
Processing triggers for man-db ...
Setting up liblzo2-2 (2.03-1) ...
Setting up hadoop (0.18.3-4cloudera0.3.0~lenny) ...
Setting up hadoop-conf-pseudo (0.18.3-4cloudera0.3.0~lenny) ...
Setting up hadoop-native (0.18.3-4cloudera0.3.0~lenny) ...

 

Check where the files were installed.

debian:~# dpkg -L hadoop-conf-pseudo
/.
/etc
/etc/hadoop
/etc/hadoop/conf.pseudo
/etc/hadoop/conf.pseudo/hadoop-default.xml
/etc/hadoop/conf.pseudo/configuration.xsl
/etc/hadoop/conf.pseudo/log4j.properties
/etc/hadoop/conf.pseudo/slaves
/etc/hadoop/conf.pseudo/sslinfo.xml.example
/etc/hadoop/conf.pseudo/hadoop-env.sh
/etc/hadoop/conf.pseudo/masters
/etc/hadoop/conf.pseudo/hadoop-metrics.properties
/etc/hadoop/conf.pseudo/commons-logging.properties
/etc/hadoop/conf.pseudo/hadoop-site.xml
/usr
/usr/share
/usr/share/doc
/usr/share/doc/hadoop-conf-pseudo
/usr/share/doc/hadoop-conf-pseudo/copyright
/usr/share/doc/hadoop-conf-pseudo/changelog.Debian.gz
/usr/share/doc/hadoop-conf-pseudo/changelog.gz
/usr/share/lintian
/usr/share/lintian/overrides
/usr/share/lintian/overrides/hadoop-conf-pseudo

 

debian:~# ls -l /var/lib/hadoop/cache/hadoop/dfs/name
total 8
drwxr-xr-x 2 hadoop hadoop 4096 2009-06-24 02:58 current
drwxr-xr-x 2 hadoop hadoop 4096 2009-06-24 02:58 image

 

Start the Hadoop services:

debian:~# /etc/init.d/hadoop-namenode start
Starting Hadoop namenode daemon: starting namenode, logging to /var/log/hadoop/hadoop-hadoop-namenode-debian.out
hadoop-namenode.

 

debian:~# /etc/init.d/hadoop-datanode start
Starting Hadoop datanode daemon: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-debian.out
hadoop-datanode.
debian:~# /etc/init.d/hadoop-jobtracker start
Starting Hadoop jobtracker daemon: starting jobtracker, logging to /var/log/hadoop/hadoop-hadoop-jobtracker-debian.out

hadoop-jobtracker.
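
The init scripts above can be started in one loop. This is a minimal sketch, assuming the scripts follow the /etc/init.d/hadoop-<daemon> naming seen in the transcript; start_hadoop_daemons is a hypothetical helper, and the list of daemons is passed in by the caller rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: start each named Hadoop daemon through its init script.
# $1 = directory holding the init scripts (normally /etc/init.d)
# remaining arguments = daemon names, e.g. namenode datanode jobtracker
start_hadoop_daemons() {
    init_dir=$1
    shift
    for daemon in "$@"; do
        script="$init_dir/hadoop-$daemon"
        if [ -x "$script" ]; then
            # Stop at the first daemon that fails to start.
            "$script" start || return 1
        else
            echo "missing init script: $script" >&2
            return 1
        fi
    done
}
```

For the three daemons started above, the call would be start_hadoop_daemons /etc/init.d namenode datanode jobtracker. The order of the arguments is the start order, so the namenode comes first.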

 

Check whether the processes are running normally:

hadoop    7926     1  0 03:01 ?        00:00:12 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop    8007     1  1 03:02 ?        00:00:14 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop    8053     1  0 03:02 ?        00:00:13 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop    8108     1  0 03:02 ?        00:00:11 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dhadoop.log
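
The command that produced this listing is not shown in the original; it looks like ps -ef output filtered for Java processes, and the sketch below rests on that assumption. count_java_procs is a hypothetical helper that counts the Java processes owned by a given user, so the result can be compared with the number of daemons started.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and count the Java
# processes owned by the given user (default: hadoop).
count_java_procs() {
    awk -v user="${1:-hadoop}" '$1 == user && /java/ { n++ } END { print n + 0 }'
}
```

A typical call would be ps -ef | count_java_procs. A count lower than the number of daemons started suggests looking at the logs under /var/log/hadoop.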

 

Installing Hive and Pig also takes just one command each, which is convenient and practical.

apt-get install hive

apt-get install pig

OK, let's autoremove 0.18.3 and try the latest 0.20.

debian:~# apt-get autoremove hadoop-conf-pseudo

 

debian:~# wget http://archive.cloudera.com/hadoop-summit-09/hadoop-20-debs/deb_lenny_i386/hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb

debian:~# dpkg -i hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb
Selecting previously deselected package hadoop-0.20.
(Reading database ... 103589 files and directories currently installed.)
Unpacking hadoop-0.20 (from hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb) ...
Setting up hadoop-0.20 (0.20.0-1cloudera0.5.0~lenny) ...
Processing triggers for man-db ...
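
The file name of the downloaded package follows the usual Debian package_version_architecture.deb convention, which is why dpkg reports the package as hadoop-0.20 at version 0.20.0-1cloudera0.5.0~lenny. The sketch below splits such a name into its three fields; deb_fields is a hypothetical helper, and it assumes the file name really follows that convention.

```shell
#!/bin/sh
# Hypothetical helper: split a .deb file name of the conventional form
# package_version_architecture.deb and print the three fields, one per line.
deb_fields() {
    base=${1##*/}        # drop any leading directory
    base=${base%.deb}    # drop the .deb suffix
    pkg=${base%%_*}      # text before the first underscore
    arch=${base##*_}     # text after the last underscore
    ver=${base#*_}       # strip "package_" ...
    ver=${ver%_*}        # ... then "_architecture"
    printf '%s\n' "$pkg" "$ver" "$arch"
}
```

Called on the file downloaded above, it prints the package name, the version, and the architecture on separate lines.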

We will keep following new developments in 0.20.
