Question

How do I configure a Transformer pipeline to run on an AWS EMR cluster?

  • 27 January 2023

Hi

I have created a simple Transformer pipeline and am trying to run it on an EMR cluster. In the Cluster configuration, I have entered the required configuration details.

I am facing the following issue:

Can anyone help me with this?


@akshayp2 

Please find the EMR cluster configuration attached, and check whether the cluster ID has been set properly.
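One quick way to check the cluster ID is to look it up with the AWS SDK before starting the pipeline. Below is a minimal Python sketch, assuming boto3 is installed and AWS credentials are configured on the machine running it. The function names are hypothetical, the ID pattern (`j-` followed by uppercase letters and digits) is an assumption based on how EMR cluster IDs are usually displayed, and treating only RUNNING and WAITING as usable states is a deliberately strict assumption.

```python
import re

# Assumption: EMR cluster IDs look like "j-" followed by uppercase letters and digits.
_CLUSTER_ID_RE = re.compile(r"j-[A-Z0-9]+")

# Assumption: only these cluster states are treated as ready to run a pipeline.
_USABLE_STATES = {"RUNNING", "WAITING"}


def looks_like_cluster_id(cluster_id: str) -> bool:
    """Offline check that catches typos, stray whitespace, or a cluster *name* pasted instead of the ID."""
    return _CLUSTER_ID_RE.fullmatch(cluster_id) is not None


def check_cluster(cluster_id: str, region: str) -> str:
    """Return the cluster state, or raise if the ID is malformed or the cluster is not ready."""
    if not looks_like_cluster_id(cluster_id):
        raise ValueError(f"{cluster_id!r} does not look like an EMR cluster ID")

    import boto3  # imported here so the offline check works without boto3 installed

    emr = boto3.client("emr", region_name=region)
    state = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]
    if state not in _USABLE_STATES:
        raise RuntimeError(f"cluster {cluster_id} is {state}; expected RUNNING or WAITING")
    return state
```

For example, `check_cluster("j-XXXXXXXXXXXXX", "ap-south-1")` uses a placeholder ID; the region here is only an illustration, taken from the S3 log path that appears in the logs later in this thread.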

@Bikram 

As shown in the snippets, we have configured the pipeline and are trying to run this Transformer pipeline on AWS EMR. The pipeline starts properly and goes into the RUNNING state, and the run completes successfully on the EMR cluster, but in Control Hub it fails with the error “START_ERROR: Application has completed. Clearing staged files..”.

Pipeline error log message:
This Transformer generated the following error message: "Http failure response for https://na01.hub.streamsets.com/tunneling/rest/3b28673a-87a4-4dcb-a749-13643daf3bff/rest/v1/pipeline/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/driverLogs?runCount=1&endingOffset=-1&jobId=55c36ec1-2536-4ad6-95d3-d38ae28dd36b:b601f953-5a66-11ed-b8e4-b31442db6663&jobRunCount=9&TUNNELING_INSTANCE_ID=tunneling-1: 500 Internal Server Error". Check if you can access the Transformer URL from this browser

Pipeline detailed logs:
2023-01-27 10:44:46,788    INFO    Current Step Status is PENDING. Waiting for 30 seconds before checking status again..    EMRAppLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        runner-pool-2-thread-29
2023-01-27 10:45:16,824    INFO    Application started successfully. Current Status is RUNNING    EMRAppLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        runner-pool-2-thread-29
2023-01-27 10:45:16,824    INFO    DataTransformerLauncher start method finished    DataTransformerLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        runner-pool-2-thread-29
2023-01-27 10:46:16,857    INFO    Application has completed. Clearing staged files..    EMRAppLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        Status Check Executor
2023-01-27 10:46:16,858    WARN    java.nio.file.NoSuchFileException: /data/transformer/runInfo/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255936/driver-topLevelError.log    TransformerUtil    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        runner-pool-2-thread-10
2023-01-27 10:46:16,868    INFO    Deleting elasticmapreduce/streamsets/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255941/pipeline.json,elasticmapreduce/streamsets/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255941/offset.json,elasticmapreduce/streamsets/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255941/etc.tar.gz,s3://aws-logs-292681323151-ap-south-1/elasticmapreduce/streamsets/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255941    EMRAppLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        Status Check Executor
2023-01-27 10:46:16,928    INFO    Removing runner for pipeline EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663

@akshayp2

The Spark cluster must be able to reach Transformer to send status, metrics, and offsets for running pipelines. Please set transformer.base.http.url (or the cluster callback URL) to a URL that is accessible from the cluster.
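To confirm the callback path, the URL you plan to use for transformer.base.http.url can be tested from the EMR master node (for example over SSH). Below is a minimal Python 3 sketch. The function names and example URLs are hypothetical, port 19630 is assumed to be Transformer's HTTP port, and a successful result only shows that the network path works, not that Transformer accepts the cluster's requests.

```python
import ipaddress
import urllib.error
import urllib.request
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """True if the URL has no host or points at the local machine (unusable as a callback from EMR nodes)."""
    host = urlparse(url).hostname
    if host is None or host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a regular hostname; whether it resolves is tested by transformer_reachable


def transformer_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """True if an HTTP request to base_url gets any HTTP response back."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        err.close()
        return True  # the server answered (e.g. 401/404), so the network path works
    except (urllib.error.URLError, OSError, ValueError):
        return False  # DNS failure, refused connection, timeout, or malformed URL
```

For example, `transformer_reachable("http://<transformer-host>:19630")` run on the master node should return `True`. A `False` result, or a configured URL for which `is_loopback_url` returns `True`, would be consistent with the driverLogs and START_ERROR failures described earlier in this thread.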