Product: StreamSets Data Collector
Question:
When running a Hadoop cluster batch pipeline with a MapReduce stage in SDC, can we pass a custom mapred-site.xml configuration from the SDC pipeline?
In this case the default value of mapreduce.task.timeout is 600000, and we want to run the MR job with this value set to 200000.
One way to validate this is to check the job configuration dumped into the XML file generated by the JobHistory Server (JHS):
hdfs dfs -get /user/history/done/2020/10/16/000000/*
[root@node-1 94-yarn-JOBHISTORY]# cat job_1602856174452_0001_conf.xml | grep -i mapreduce.task.timeout
<property><name>mapreduce.task.timeout</name><value>600000</value><final>false</final><source>mapred-site.xml</source><source>job.xml</source></property>
Answer:
Please find the attached screenshots, which describe how you can set MR parameters from the pipeline, and use the snippet below to validate the setting in your job.
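(The screenshots are not reproduced here. In essence, the change amounts to adding the property as an extra key/value pair in the pipeline's or stage's additional Hadoop/MapReduce configuration properties; the exact field name depends on the stage and SDC version, so treat the pair below as an illustrative sketch using the values from this case.)

mapreduce.task.timeout=200000

Because the value is injected by the pipeline rather than read from mapred-site.xml, the job configuration records its source as "programatically", as seen in the validation output below.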
[root@node-1 86-hdfs-NAMENODE]# hdfs dfs -get /user/history/done/2020/10/16/000000/* .
get: `job_1602856174452_0001-1602870975867-hive-insert+into+dbo+values+%28%27a1%27%29+%28Stage%2D1%29-1602870993928-1-0-SUCCEEDED-root.users.hive-1602870985998.jhist': File exists
get: `job_1602856174452_0001_conf.xml': File exists
get: `job_1602856174452_0002-1602885381377-sdc-StreamSets+Data+Collector%3A+Case15342%2DAPACHE%2DCluste-1602885426029-1-0-SUCCEEDED-root.users.sdc-1602885394665.jhist': File exists
get: `job_1602856174452_0002_conf.xml': File exists
get: `job_1602856174452_0003-1602886921764-sdc-StreamSets+Data+Collector%3A+Case15342%2DAPACHE%2DCluste-1602886971341-1-0-SUCCEEDED-root.users.sdc-1602886938098.jhist': File exists
get: `job_1602856174452_0003_conf.xml': File exists
[root@node-1 86-hdfs-NAMENODE]# cat job_1602856174452_0005_conf.xml | grep -i mapreduce.task.timeout
<property><name>mapreduce.task.timeout</name><value>200000</value><final>false</final><source>programatically</source><source>job.xml</source></property>
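As an optional alternative to copying the conf XML out of HDFS, the same property can be checked through the JobHistory Server REST API. This is a minimal sketch assuming the JHS web endpoint runs on node-1 at its default port 19888; substitute your own host and job ID:

curl -s http://node-1:19888/ws/v1/history/mapreduce/jobs/job_1602856174452_0005/conf | grep -i mapreduce.task.timeout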

