
I am running a simple Transformer pipeline on an EMR cluster (6.5.0). The EMR logs show that the job succeeded, but the Transformer pipeline fails with the error “START_ERROR: Application has completed. Clearing staged files..”. What could be the issue here? The pipeline logs are below:

2023-01-27 10:44:46,788    INFO    Current Step Status is PENDING. Waiting for 30 seconds before checking status again..    EMRAppLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        runner-pool-2-thread-29
2023-01-27 10:45:16,824    INFO    Application started successfully. Current Status is RUNNING    EMRAppLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        runner-pool-2-thread-29
2023-01-27 10:45:16,824    INFO    DataTransformerLauncher start method finished    DataTransformerLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        runner-pool-2-thread-29
2023-01-27 10:46:16,857    INFO    Application has completed. Clearing staged files..    EMRAppLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        Status Check Executor
2023-01-27 10:46:16,858    WARN    java.nio.file.NoSuchFileException: /data/transformer/runInfo/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255936/driver-topLevelError.log    TransformerUtil    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        runner-pool-2-thread-10
2023-01-27 10:46:16,868    INFO    Deleting elasticmapreduce/streamsets/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255941/pipeline.json,elasticmapreduce/streamsets/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255941/offset.json,elasticmapreduce/streamsets/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255941/etc.tar.gz,s3://aws-logs-292681323151-ap-south-1/elasticmapreduce/streamsets/EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663/run1674816255941    EMRAppLauncher    *842bf273-5a66-11ed-b52a-ab8a97a1d11f@b601f953-5a66-11ed-b8e4-b31442db6663        Status Check Executor
2023-01-27 10:46:16,928    INFO    Removing runner for pipeline EMRAWS__55c36ec1-2536-4ad6-95d3-d38ae28dd36b__b601f953-5a66-11ed-b8e4-b31442db6663

@madhusudan_shastri :

 

The Spark cluster must be able to reach Transformer in order to send the status, metrics, and offsets for running pipelines. Please set transformer.base.http.url, or the cluster callback URL, to an address that is accessible from the cluster.
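
One rough way to confirm this is to test, from a node in the EMR cluster, whether the URL you configured accepts a TCP connection. The snippet below is only a sketch under a few assumptions: Python 3 is available on the node, the helper name can_reach is made up for illustration, and the URL in the usage note is a placeholder, not a value from this thread. It checks network reachability only, so a pass does not prove that the callback itself works.

```python
import socket
from urllib.parse import urlparse


def can_reach(url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; it is not part of Transformer.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        # The string did not contain a host (for example, no scheme was given).
        return False
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, calling can_reach("http://transformer.example.com:19630") on the EMR master node (placeholder host, and a port you should replace with your own) returns False if a security group, firewall, or an unresolvable hostname blocks the cluster from reaching Transformer.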

