How to initiate an EMR cluster through StreamSets

  • 2 November 2022
  • 6 replies

How do I initiate an EMR cluster through StreamSets?


Hi @Priya151997, I assume you are asking about the Transformer engine with the EMR cluster type.

Please check out the following doc:

Yes, but it does not mention how I can trigger my EMR cluster when an event happens in AWS S3.

@Priya151997 The document mentioned above describes how to provision a new EMR cluster. Each time you trigger the Transformer job, it creates a new cluster and runs the job on that cluster.

Is this what you are looking for, or something else? Please share your complete use case details.


Hi, thanks for the help. I am new to StreamSets.

In my use case, I want to trigger (launch) an EMR cluster using a Python script. Is this possible in Data Collector?


You can use the HTTP Client processor to call any API, so if you can start a new cluster through an API, you can do it from within a StreamSets pipeline. This applies to an SDC (Data Collector) pipeline.


For Transformer, when you start your pipeline, the underlying EMR cluster that you have configured should be initiated automatically.
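To make the HTTP Client suggestion more concrete, here is a rough sketch of what a cluster-launch request to the EMR API (the RunJobFlow action) might look like. It only builds the URL, headers, and JSON body that a POST request would carry. The function name `build_run_job_flow_request`, the release label, the instance types, and the role names are illustrative assumptions, not values from this thread. AWS also requires the request to be signed with Signature Version 4, which this sketch does not do; how to handle that from the HTTP Client stage is not covered here.

```python
import json


def build_run_job_flow_request(region, cluster_name):
    """Build an unsigned EMR RunJobFlow request (hypothetical helper).

    Returns (url, headers, body). The request must still be signed with
    AWS Signature Version 4 before AWS will accept it.
    """
    # Regional EMR endpoint; the request is sent as an HTTP POST.
    url = f"https://elasticmapreduce.{region}.amazonaws.com/"
    headers = {
        "Content-Type": "application/x-amz-json-1.1",
        "X-Amz-Target": "ElasticMapReduce.RunJobFlow",
    }
    body = {
        "Name": cluster_name,
        "ReleaseLabel": "emr-6.9.0",  # assumed release, adjust as needed
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",   # assumed instance type
            "InstanceCount": 3,                 # assumed cluster size
            # Keep the cluster running after launch so later jobs can use it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR role names; your account may use different ones.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    return url, headers, json.dumps(body)
```

In a pipeline, the same URL, headers, and body would be entered in the HTTP Client configuration rather than built in Python; the sketch is only meant to show the shape of the call.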

Hi All,

We are not going to use an EMR cluster within StreamSets. I need to know how to initiate an EMR cluster that will launch in my AWS account for further job executions.

This will only be a triggering event, just to start the cluster.

I also want to know whether Data Collector alone will be enough for this purpose.

Again, I want to mention that the EMR cluster will be hosted as an individual service in the AWS account, not provisioned by StreamSets for use with Transformer.
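For the Python script mentioned earlier in the thread, a minimal sketch of launching a standalone EMR cluster with boto3 could look like the following. It runs outside StreamSets and assumes boto3 is installed and AWS credentials are configured. The helper names (`build_cluster_config`, `launch_cluster`, `launch_with_boto3`), the release label, the instance settings, and the role names are assumptions for illustration only.

```python
def build_cluster_config(name):
    """Return keyword arguments for the EMR run_job_flow call (hypothetical helper)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.9.0",  # assumed release, adjust as needed
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",   # assumed instance type
            "InstanceCount": 3,                 # assumed cluster size
            # Keep the cluster up with no steps, since jobs are submitted later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR role names; your account may use different ones.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(emr_client, config):
    """Start the cluster and return its ID (the JobFlowId, e.g. 'j-...')."""
    response = emr_client.run_job_flow(**config)
    return response["JobFlowId"]


def launch_with_boto3(region, name):
    """Create a real EMR client and launch the cluster in the given region."""
    import boto3  # imported here so the helpers above work without boto3

    client = boto3.client("emr", region_name=region)
    return launch_cluster(client, build_cluster_config(name))
```

Calling `launch_with_boto3("us-east-1", "my-cluster")` would start a cluster in that region and return its ID. The EMR client is passed into `launch_cluster` as a parameter so that the launch logic can be exercised with a stand-in client before pointing it at a real account.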