You try to write some data to HDFS and encounter the following error:
HADOOPFS_44 - Could not verify the base directory: 'org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: sdc is not allowed to impersonate hadoop'
If your Hadoop cluster is Kerberized, you must have a Kerberos service principal for Data Collector. Typically it is sdc/<HOST> (where <HOST> is the hostname where Data Collector runs), and the Data Collector user name for Hadoop is sdc.
If your Hadoop cluster is not Kerberized, the Data Collector user name for Hadoop is the Unix user name that started Data Collector. This could be sdc if you are running it as a service, or your own user name.
It looks like your Data Collector user name for Hadoop is sdc, so I'll use that in the remainder of this answer.
In the Hadoop FS destination, if you want to impersonate a Hadoop user other than the one running Data Collector (user sdc), set HDFS User on the Hadoop FS tab to the desired user. That is all you have to do in Data Collector.
Next, you'll have to configure the HDFS name node to allow the Data Collector user (user sdc) to act as a proxy user for other users. You do that by setting the following properties in the hdfs-site.xml of your name node, or in the corresponding safety valve if you're using Cloudera Manager:
<property>
  <name>hadoop.proxyuser.sdc.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sdc.groups</name>
  <value>*</value>
</property>
Remember, this assumes your Data Collector is using the Hadoop user name sdc.
Once you make those changes, you need to restart the name node.
If you are running a production setup, make sure you configure the proxy user properties above in the most restrictive manner possible for your usage (instead of using *, which means ALL).
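As a rough sketch of what a more restrictive setup could look like, suppose Data Collector runs on a single host and only needs to impersonate users from one group. The host name sdc-host.example.com and the group name etl below are placeholders I made up for illustration, so substitute the values from your own cluster:

```xml
<!-- Sketch only: sdc-host.example.com and etl are hypothetical placeholders. -->
<property>
  <!-- Only accept impersonation requests from sdc that originate on this host. -->
  <name>hadoop.proxyuser.sdc.hosts</name>
  <value>sdc-host.example.com</value>
</property>
<property>
  <!-- Only allow sdc to impersonate users that belong to this group. -->
  <name>hadoop.proxyuser.sdc.groups</name>
  <value>etl</value>
</property>
```

Both values accept comma-separated lists, so you can name several hosts or groups if you need to. With settings like these, an impersonation request from any other host, or for a user outside the listed group, should be rejected with the same kind of AuthorizationException shown in the question.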
NOTE: If you leave the HDFS User configuration of the Hadoop FS destination empty, your pipeline will interact with HDFS as the Hadoop user running Data Collector (user sdc).