Issue:
When a pipeline running on an unmanaged Cloudera node tries to read from Hadoop, the following error appears:
/usr/bin/hadoop: No such file or directory
Solution:
- On the external host, change to the /etc/yum.repos.d/ directory and download the CDH repo file there (adjust the URL to match the OS release and CDH version of the client you need):
curl -O https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
- Edit the baseurl in the cloudera-cdh5.repo file to install a specific CDH version (otherwise, the latest version is installed). For example, to install the 5.7.1 hadoop-client, update the baseurl to:
baseurl=https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/
- Install the hadoop-client rpm:
$ yum clean all
$ yum install hadoop-client
Note: You can also download the RPM file and install it locally if desired.
(See http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/RPMS/x86_64/)
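The baseurl edit described above can also be scripted. The following is a minimal sketch, not part of any Cloudera tooling: set_baseurl is a hypothetical helper, and the scratch file contents (including the "5/" starting URL) are placeholders that only assume the repo file has a single line starting with baseurl=.

```shell
#!/bin/sh
# Hypothetical helper: point the baseurl line of a yum repo file at a
# specific URL.
set_baseurl() {
    # $1 = repo file, $2 = new base URL
    tmp_file="$(mktemp)" || return 1
    # Write the edited copy to a temp file, then overwrite the original's
    # contents so its ownership and permissions are preserved.
    sed "s|^baseurl=.*|baseurl=$2|" "$1" > "$tmp_file" && cat "$tmp_file" > "$1"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# Demonstration on a scratch file with placeholder contents (assumed layout).
demo_repo="$(mktemp)"
printf '%s\n' \
    '[cloudera-cdh5]' \
    'name=placeholder' \
    'baseurl=https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5/' \
    'gpgcheck=1' > "$demo_repo"

set_baseurl "$demo_repo" 'https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.7.1/'
grep '^baseurl=' "$demo_repo"
```

Against the real file, the call would be set_baseurl /etc/yum.repos.d/cloudera-cdh5.repo followed by the URL for the desired version, run as root.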
Once the required packages are installed, configure the client on the unmanaged node.
- In Cloudera Manager, navigate to HDFS -> "Actions" drop-down -> "Download Client Configuration" (this downloads a zip file called hdfs-clientconfig.zip).
- Move the zip file over to the external host and unzip it.
- Copy all the unzipped configuration files to /etc/hadoop/conf. Example:
$ cp * /etc/hadoop/conf
- Run Hadoop commands. Example:
$ sudo -u hdfs hadoop fs -ls
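Before running Hadoop commands, it can help to confirm that both halves of the setup above are in place. The following pre-flight check is a rough sketch: the function names are hypothetical, and the two file names (core-site.xml and hdfs-site.xml) are an assumption about what the downloaded client configuration contains.

```shell
#!/bin/sh
# Hypothetical check: is the hadoop launcher present and executable?
# Defaults to the path named in the original error message.
check_hadoop_binary() {
    bin="${1:-/usr/bin/hadoop}"
    if [ ! -x "$bin" ]; then
        echo "missing: $bin"
        return 1
    fi
}

# Hypothetical check: are the expected client configuration files present?
# The file list is an assumption, not taken from Cloudera documentation.
check_client_conf() {
    conf_dir="${1:-/etc/hadoop/conf}"
    missing=0
    for f in core-site.xml hdfs-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing: $conf_dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

check_hadoop_binary || echo "hint: install the hadoop-client package"
check_client_conf   || echo "hint: copy the client configuration files"
```

Both functions accept an optional path argument, so they can be pointed at a different install location or configuration directory.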