Deploying and installing CDH on Ubuntu, and some pitfalls

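Over the past two days I set up Cloudera Manager on my own machines to play around with it. I assumed it would be simple, just mindlessly clicking Next in the Web UI,
but I actually ran into quite a few problems.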

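Installation

Steps on the server

At the start I basically followed the steps in the official documentation. First, some preparation:

  1. Configure the apt repository.
  2. Install the JDK.
  3. Install NTP for time synchronization.
  4. Install one of MySQL, MariaDB, or PostgreSQL.
    Only one of these databases is needed. At first I thought all of them had to be installed, and then I had to uninstall MariaDB and the rest one by one.
  5. Create the databases and users that CM needs in MySQL, as shown below.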
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY '123456';
CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY '123456';
CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY '123456';
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY '123456';
CREATE DATABASE metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY '123456';
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY '123456';
CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY '123456';
CREATE DATABASE navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY '123456';
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY '123456';
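  6. Use the scm_prepare_database.sh script to finish setting up the Cloudera Manager database:
    sudo /opt/cloudera/cm/schema/scm_prepare_database.sh [options]
    It only needs to be run once, as sudo /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm mypassword, where mypassword stands for the scm user's password.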
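The CREATE DATABASE / GRANT pairs in step 5 all follow one pattern, so they can be generated from a list of database:user pairs instead of being typed by hand. The sketch below is only an illustration: the function name gen_cm_sql and the DB_PASS variable are made up for this example, and it emits CREATE USER IF NOT EXISTS followed by a plain GRANT, because MySQL 8.0 no longer accepts IDENTIFIED BY inside GRANT (the form used above is fine on MySQL 5.7).

```shell
#!/usr/bin/env bash
# Print CREATE DATABASE / CREATE USER / GRANT statements for each db:user pair.
# gen_cm_sql and DB_PASS are hypothetical names used only for this sketch.
gen_cm_sql() {
  local pass pair db user
  pass=$1
  shift
  for pair in "$@"; do
    db=${pair%%:*}      # part before the colon
    user=${pair##*:}    # part after the colon
    printf 'CREATE DATABASE %s DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;\n' "$db"
    printf "CREATE USER IF NOT EXISTS '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$pass"
    printf "GRANT ALL ON %s.* TO '%s'@'%%';\n" "$db" "$user"
  done
}

# Same databases and users as in the SQL above; note metastore is owned by hive.
DB_PASS=${DB_PASS:-123456}
gen_cm_sql "$DB_PASS" scm:scm amon:amon rman:rman hue:hue metastore:hive \
  sentry:sentry nav:nav navms:navms oozie:oozie
```

The output can be reviewed first and then piped into the client, for example with mysql -u root -p.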

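Installation in the Web UI

  1. First, remember to configure /etc/hosts on every machine.
  2. Installation in the Web UI is mostly a matter of clicking Continue, with a few things to watch out for.
    The step that downloads and installs the parcels takes quite a long time.
  3. After that it is mostly clicking Next. At this point I created a user named cloudera; it must be given sudo rights without a password prompt (passwordless sudo).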
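Passwordless sudo for such a user is commonly granted through a drop-in file under /etc/sudoers.d. A minimal sketch, assuming the user is called cloudera as above; the helper name sudoers_entry and the drop-in file name are illustrative only:

```shell
#!/usr/bin/env bash
# Print a sudoers rule that lets the given user run any command without a password.
# sudoers_entry is a hypothetical helper name used only in this sketch.
sudoers_entry() {
  printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

sudoers_entry cloudera

# To install the rule (as root), write it to a drop-in file and let visudo check the syntax:
#   sudoers_entry cloudera > /etc/sudoers.d/cloudera
#   chmod 0440 /etc/sudoers.d/cloudera
#   visudo -cf /etc/sudoers.d/cloudera
```

Checking the file with visudo before logging out is worthwhile, since a syntax error in sudoers can lock sudo entirely.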

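By the way, because mine is a single-node setup, HDFS reports a warning about under-replicated blocks.
With only one node, there is no other node to hold backup copies of a block. By default HDFS keeps three replicas of every block (dfs.replication = 3), that is, two extra copies placed on other nodes, so on a single node this warning is expected.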

Starting CDH

sudo systemctl start cloudera-scm-server
# Watch the scm server log; SCM stands for Service and Configuration Manager
sudo tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
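
Instead of watching the log by eye, one can check for the line the server writes once the web console is up. This is a small sketch: the helper name cm_server_started is made up, and it relies on the server logging "Started Jetty server" when the admin console (port 7180 by default) is ready.

```shell
#!/usr/bin/env bash
# Return 0 if the given CM server log contains the "Started Jetty server" line.
# cm_server_started is a hypothetical helper name used only in this sketch.
cm_server_started() {
  grep -q 'Started Jetty server' "${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}"
}

LOG=/var/log/cloudera-scm-server/cloudera-scm-server.log
if [ -r "$LOG" ]; then
  if cm_server_started "$LOG"; then
    echo "Cloudera Manager web console should be reachable on port 7180"
  else
    echo "Cloudera Manager is still starting"
  fi
fi
```

The log is kept across restarts, so a match may come from an earlier start; looking at the timestamp of the matching line avoids that confusion.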

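Stopping CDH

Sometimes the machines need to be shut down for maintenance, so the cluster has to be stopped. This is simple: go to the Cluster home page in the CDH Web UI, click the Action dropdown at the top left, and choose Stop. After a few minutes all components of the cluster will have stopped.

Then open a terminal on the master node and run sudo systemctl stop cloudera-scm-server; after that everything is stopped.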

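Some CDH configuration

YARN: running the RM and NM on the same host

By default the ResourceManager gets a node to itself. However, my RM host (device1) had memory and CPU to spare, and pushing all the load onto device2 when running applications was a bit wasteful, so I put device1's resources to use as well.
First go to the YARN section, open the Action dropdown, and click Add Role Instance. Then assign a NodeManager role to the RM host.

Note: if the commission state of the new instance is decommissioned, change it to commissioned.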

Adding services

To add new components such as Kafka or Spark, click the Action dropdown in the Cluster section and choose the first entry, Add Service,
which opens the Add Service page.

Some problems

Problem 1

When adding a node I got the following error:

Host with invalid Cloudera Manager GUID is detected
...
Error, CM server guid updated, expected c3b5fe15-5f29-434b-ae0a-4750b56c72ab, received dc1d28d4-4c78-4a07-919b-a9eaf7190d41

Solution:

Check the following config file and make sure the hostname in it is correct:
$ nano /etc/cloudera-scm-agent/config.ini
The hostname there should be the same as what the command $ hostname returns.
Then delete cm_guid on every node:
$ rm /var/lib/cloudera-scm-agent/cm_guid
Then restart the Cloudera agent and server:
$ service cloudera-scm-agent restart
$ service cloudera-scm-server restart
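
The hostname comparison can also be done without opening an editor. The sketch below prints the machine's hostname next to the relevant values from config.ini; the helper name agent_conf_value is made up, and I am assuming the keys of interest are server_host and listening_hostname, which is how they appear in a stock agent config as far as I know.

```shell
#!/usr/bin/env bash
# Print the value of the first uncommented "key=value" line for the given key.
# agent_conf_value is a hypothetical helper name used only in this sketch.
agent_conf_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

CONF=/etc/cloudera-scm-agent/config.ini
if [ -r "$CONF" ]; then
  echo "hostname:           $(hostname)"
  echo "server_host:        $(agent_conf_value server_host "$CONF")"
  echo "listening_hostname: $(agent_conf_value listening_hostname "$CONF")"
fi
```

An empty listening_hostname simply means the line is commented out or unset, in which case the agent falls back to the system hostname.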

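Problem 2

During NameNode Format I got the following error:
Running in non-interactive mode, and data appears to exist in Storage Directory /dfs/nn. Not formatting.

Solution:
Delete all data under /dfs/nn and /dfs/dn.
The data was there because I had previously installed a single-node cluster and stored some data in HDFS.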
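
Deleting these directories throws away everything stored in HDFS, so a slightly more cautious variant is to move the contents aside first. The following is a sketch only: clear_storage_dir is a made-up helper, it has to run as root with HDFS stopped, and the backup lands next to the original directory.

```shell
#!/usr/bin/env bash
# Move everything inside a storage directory into a timestamped sibling backup
# directory, leaving the (now empty) directory and its ownership in place.
# clear_storage_dir is a hypothetical helper name used only in this sketch.
clear_storage_dir() {
  local dir backup
  dir=$1
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
  mkdir "$backup" || return 1
  # -mindepth 1 skips the directory itself; dotfiles are included.
  find "$dir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup"/ \;
  echo "$backup"
}

# Intended use, as root and with HDFS stopped:
#   clear_storage_dir /dfs/nn
#   clear_storage_dir /dfs/dn
```

Once the new cluster works, the .bak directories can be removed to free the disk space.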

Problem 3

While Cloudera Manager was running Validate Hive Metastore schema, the following error appeared; it turned out that the metastore database had no VERSION table:

Fri Jul 19 14:06:33 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version, Cause:Table 'metastore.VERSION' doesn't exist
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version, Cause:Table 'metastore.VERSION' doesn't exist
at org.apache.hadoop.hive.metastore.CDHMetaStoreSchemaInfo.getMetaStoreSchemaVersion(CDHMetaStoreSchemaInfo.java:342)
at org.apache.hive.beeline.HiveSchemaTool.validateSchemaVersions(HiveSchemaTool.java:685)
at org.apache.hive.beeline.HiveSchemaTool.doValidate(HiveSchemaTool.java:578)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1142)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
*** schemaTool failed ***

Solution: initialize the metastore schema with schematool:
dennis@device1:/opt/cloudera/parcels/CDH/lib/hive/bin$ schematool -dbType mysql -initSchema -passWord password -userName hive

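Problem 4

Running insert and count(*) statements in Hive from Hue would hang forever, with no response and no error.
The logs showed that the MR job had never started. On the Cloudera community I found the advice to set the mapred-site.xml parameter mapreduce.framework.name to local.

So I changed mapreduce.framework.name to local under the YARN service in CDH and restarted the cluster, which fixed it: select count(*) and insert no longer hang. Keep in mind that with local, MapReduce jobs run inside a single local JVM instead of being submitted to YARN, so this works around the hang rather than making the jobs run on the cluster.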

Problem 5

Hive works fine from Hue, and the hive CLI works on device2. But on device1, running the hive command in bash and typing show databases gives this error:

SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Solution:
First run hive -hiveconf hive.root.logger=DEBUG,console
to debug and get more useful error messages.
Sure enough, it turned up the following:

Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect
at com.sun.crypto.provider.JceKeyStore.engineLoad(JceKeyStore.java:865) ~[sunjce_provider.jar:1.8.0_112]
at java.security.KeyStore.load(KeyStore.java:1445) ~[?:1.8.0_121]
at org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.locateKeystore(AbstractJavaKeyStoreProvider.java:322) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
at org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.<init>(AbstractJavaKeyStoreProvider.java:86) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
at org.apache.hadoop.security.alias.LocalJavaKeyStoreProvider.<init>(LocalJavaKeyStoreProvider.java:58) ~[hadoop-common-3.0.0-cdh6.2.0.jar:?]
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:237) ~[hive-exec-2.1.1-cdh6.2.0.jar:2.1.1-cdh6.2.0]
... 23 more
Caused by: java.security.UnrecoverableKeyException: Password verification failed

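According to the log, the password for my metastore database was wrong, so I looked at the hive-site.xml config file and found that
/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/hive/conf/hive-site.xml,
which is the same as /etc/hive/conf/hive-site.xml (my guess is that CDH copies all config files from the directory above into /etc/hive/conf/),
was a file I had edited earlier, so its configuration was broken. After I restored the default configuration and restarted, everything was back to normal!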

Problem 6

Starting cloudera-scm-server failed with the following error:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'metastore.CM_VERSION' doesn't exist

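Solution:
sudo /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm mypassword
Run this command again. Earlier I had run the script once for every service database (amon, rman, … metastore, … etc.) because I had misunderstood what scm_prepare_database.sh does, which apparently left Cloudera Manager pointing at the metastore database.
According to the official documentation: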

Cloudera Manager Server includes a script that can create and configure a database for itself. The script can:
Create the Cloudera Manager Server database configuration file.
(MariaDB, MySQL, and PostgreSQL) Create and configure a database for Cloudera Manager Server to use.
(MariaDB, MySQL, and PostgreSQL) Create and configure a user account for Cloudera Manager Server.

The script only needs to be run once, namely sudo /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm mypassword
Restarting cloudera-scm-server afterwards solved the problem. In addition, /etc/cloudera-scm-server/db.properties shows which database scm is currently using.
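
A quick way to read that file is to pull out the com.cloudera.cmf.db.* keys. A minimal sketch, with cm_db_setting as a made-up helper name; the file is normally readable only by root or the cloudera-scm user, so the check at the bottom is skipped when the file cannot be read.

```shell
#!/usr/bin/env bash
# Print one com.cloudera.cmf.db.<key> value from a db.properties file.
# $1 = short key (type, host, name, user), $2 = path to the properties file.
# cm_db_setting is a hypothetical helper name used only in this sketch.
cm_db_setting() {
  sed -n "s/^com\.cloudera\.cmf\.db\.$1=//p" "$2" | head -n 1
}

PROPS=/etc/cloudera-scm-server/db.properties
if [ -r "$PROPS" ]; then
  echo "type: $(cm_db_setting type "$PROPS")"
  echo "host: $(cm_db_setting host "$PROPS")"
  echo "name: $(cm_db_setting name "$PROPS")"
  echo "user: $(cm_db_setting user "$PROPS")"
fi
```

If name prints anything other than scm here, the server is pointed at the wrong database, which matches the CM_VERSION error above.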

Problem 7

The host's NTP service could not be located or did not respond to a request for the clock offset.

Solution:

sudo service ntp restart
# Enable it at boot as well
sudo systemctl enable ntp
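
After the restart it can take a few minutes before ntpd has picked a time source. One way to check is the peer list from ntpq -p, where the currently selected peer is marked with a leading asterisk. The helper name ntp_has_sync_peer below is made up for this sketch.

```shell
#!/usr/bin/env bash
# Read `ntpq -p` output on stdin; succeed if some peer line starts with '*',
# which is how ntpq marks the peer the clock is currently synchronized to.
# ntp_has_sync_peer is a hypothetical helper name used only in this sketch.
ntp_has_sync_peer() {
  grep -q '^\*'
}

if command -v ntpq >/dev/null 2>&1; then
  if ntpq -pn 2>/dev/null | ntp_has_sync_peer; then
    echo "NTP: synchronized to a peer"
  else
    echo "NTP: no peer selected yet"
  fi
fi
```

Until a peer is selected, the Cloudera Manager health check may keep showing the clock offset warning.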

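Problem 8

CDH officially requires the Oracle JDK. Other JDK builds may also work, but compatibility is not guaranteed.
While adding a node, apt reported the following error during installation of the agent: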

root@device3:/data/software/jdk1.8.0_121# apt --fix-broken install
Reading package lists... Done
Building dependency tree
Reading state information... Done
Correcting dependencies... Done
The following additional packages will be installed:
cloudera-manager-daemons
The following NEW packages will be installed:
cloudera-manager-daemons
0 upgraded, 1 newly installed, 0 to remove and 106 not upgraded.
26 not fully installed or removed.
Need to get 0 B/1,218 MB of archives.
After this operation, 1,420 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
(Reading database ... 106482 files and directories currently installed.)
Preparing to unpack .../cloudera-manager-daemons_6.2.0~968826.ubuntu1804_all.deb ...
+======================================================================+
| Error: Unable to find a compatible version of Java on this host,|
| either because JAVA_HOME has not been set or because a |
| compatible version of Java is not installed. |
+----------------------------------------------------------------------+
| Please download a supported version of the Oracle JDK from the |
| Oracle Java web site: |
| |
| > http://www.oracle.com/technetwork/java/javase/index.html < |
| |
| Cloudera Manager requires Oracle JDK 1.8 or later. |
| NOTE: Cloudera Manager will find the Oracle JDK when starting, |
| regardless of whether you installed the JDK using a binary |
| installer or the RPM-based installer. |
+======================================================================+
dpkg: error processing archive /var/cache/apt/archives/cloudera-manager-daemons_6.2.0~968826.ubuntu1804_all.deb (--unpack):
new cloudera-manager-daemons package pre-installation script subprocess returned error exit status 1
Errors were encountered while processing:
/var/cache/apt/archives/cloudera-manager-daemons_6.2.0~968826.ubuntu1804_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

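This happens because sudo java -version does not work: java cannot be found in sudo's environment.
Solution:
Fix the Java environment under sudo and it will be OK.
Alternatively, switch to the root user and run apt --fix-broken install there, since root does not have the problem of the java path not being found.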
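
The usual reason java is visible to a normal user but not under sudo is that sudo replaces PATH with its own secure_path. The sketch below checks whether java can be found on such a PATH; find_in_path is a made-up helper, and the SECURE_PATH value is my assumption of the Ubuntu default, so compare it with the secure_path line in /etc/sudoers.

```shell
#!/usr/bin/env bash
# Look up a command using only the given colon-separated PATH.
# Runs in a subshell so the caller's PATH is left untouched.
# find_in_path is a hypothetical helper name used only in this sketch.
find_in_path() {
  ( PATH=$2; command -v "$1" )
}

# Assumed default secure_path on Ubuntu; verify against /etc/sudoers.
SECURE_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

if find_in_path java "$SECURE_PATH" >/dev/null; then
  echo "java is visible on sudo's PATH"
else
  echo "java is NOT on sudo's PATH, so 'sudo java -version' will fail"
fi
```

If the check fails while plain java -version works, the JDK is only on the login user's PATH (for example via ~/.bashrc) and has to be made visible system-wide.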

Also, when adding a node, remember to copy the /etc/hosts entries to the newly added node.
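
Whether a node's /etc/hosts actually knows every cluster member can be checked with a few lines of awk. This is a sketch under assumptions: hosts_has is a made-up helper, and device1 to device3 are the host names used in this post; substitute your own.

```shell
#!/usr/bin/env bash
# Succeed if the given hostname appears as a name (not an address) on an
# uncommented line of a hosts file. $2 defaults to /etc/hosts.
# hosts_has is a hypothetical helper name used only in this sketch.
hosts_has() {
  awk -v h="$1" '
    /^[[:space:]]*#/ { next }                 # skip comment lines
    { sub(/#.*/, "")                          # drop trailing comments
      for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
    END { exit !found }
  ' "${2:-/etc/hosts}"
}

for h in device1 device2 device3; do
  if hosts_has "$h"; then
    echo "$h: ok"
  else
    echo "$h: missing from /etc/hosts"
  fi
done
```

Running the same loop on every node, old and new, catches the case where only the new node was updated.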

Reference

https://www.cloudera.com/documentation/enterprise/6/6.2/topics/introduction.html
https://blog.csdn.net/qq_24409555/article/details/76139886
https://community.cloudera.com/t5/Batch-SQL-Apache-Hive/Hive-Errors-happend-when-execute-hive-service-metastore/m-p/93050#M3282