How can I access the Spark driver logs in EMR?

  • 17 January 2022


Product: Transformer, all versions.

 

Transformer supports only cluster mode when the cluster manager type is EMR.

 

When you submit a Spark application in cluster mode, the driver process runs in the application master container, which is the first container that runs when the Spark application executes. The client logs the YARN application report. To get the driver logs:

1.    Get the application ID from the client logs. In the following example, application_1572839353552_0008 is the application ID.

19/11/04 05:24:42 INFO Client: Application report for application_1572839353552_0008 (state: ACCEPTED)
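If you would rather not copy the ID by hand, a pattern match can pull it out of the client log. This is a minimal sketch, assuming the log line has the format shown above; the variable names are illustrative only.

```shell
# Sample client log line (copied from the example above).
log_line='19/11/04 05:24:42 INFO Client: Application report for application_1572839353552_0008 (state: ACCEPTED)'

# Extract the first token of the form application_<digits>_<digits>.
app_id=$(printf '%s\n' "$log_line" | grep -oE 'application_[0-9]+_[0-9]+' | head -n 1)

echo "$app_id"
```

In practice you would pipe the real client log into the same `grep` instead of a hard-coded line.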

2.    Identify the application master container logs. The following is an example list of Spark application logs. In this list, container_1572839353552_0008_01_000001 is the first container, which means that it's the application master container.

s3://aws-logs-111111111111-us-east-1/elasticmapreduce/j-35PUYZBQVIJNM/containers/application_1572839353552_0008/container_1572839353552_0008_01_000001/stderr.gz

s3://aws-logs-111111111111-us-east-1/elasticmapreduce/j-35PUYZBQVIJNM/containers/application_1572839353552_0008/container_1572839353552_0008_01_000001/stdout.gz

s3://aws-logs-111111111111-us-east-1/elasticmapreduce/j-35PUYZBQVIJNM/containers/application_1572839353552_0008/container_1572839353552_0008_01_000002/stderr.gz

s3://aws-logs-111111111111-us-east-1/elasticmapreduce/j-35PUYZBQVIJNM/containers/application_1572839353552_0008/container_1572839353552_0008_01_000002/stdout.gz
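With many containers, filtering the listing for the container ID that ends in _000001 may be quicker than scanning it by eye. This sketch runs on the example paths above. It assumes, as in that example, that the application master is the first container of the first application attempt. The variable names are illustrative.

```shell
# Sample listing of log objects (paths copied from the example above).
listing='s3://aws-logs-111111111111-us-east-1/elasticmapreduce/j-35PUYZBQVIJNM/containers/application_1572839353552_0008/container_1572839353552_0008_01_000001/stderr.gz
s3://aws-logs-111111111111-us-east-1/elasticmapreduce/j-35PUYZBQVIJNM/containers/application_1572839353552_0008/container_1572839353552_0008_01_000001/stdout.gz
s3://aws-logs-111111111111-us-east-1/elasticmapreduce/j-35PUYZBQVIJNM/containers/application_1572839353552_0008/container_1572839353552_0008_01_000002/stderr.gz
s3://aws-logs-111111111111-us-east-1/elasticmapreduce/j-35PUYZBQVIJNM/containers/application_1572839353552_0008/container_1572839353552_0008_01_000002/stdout.gz'

# Keep only the objects under the container directory whose ID ends in _000001.
am_logs=$(printf '%s\n' "$listing" | grep -E '/container_[^/]*_000001/')

printf '%s\n' "$am_logs"
```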

3.    Download the application's container logs, including the application master container logs, to an EC2 instance:

aws s3 sync s3://aws-logs-111111111111-us-east-1/elasticmapreduce/j-35PUYZBQVIJNM/containers/application_1572839353552_0008/ application_1572839353552_0008/

4.    Open the Spark application log folder:

cd application_1572839353552_0008/

5.    Uncompress the log files:

find . -type f -exec gunzip {} \;

6.    Search all container logs for errors and warnings:

egrep -Ril "ERROR|WARN" . | xargs egrep "WARN|ERROR"

7.    Open the container logs that are returned in the output of the previous command.
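Steps 5 and 6 can be wrapped in a small helper if you repeat them often. The sketch below is not part of the original procedure: `scan_container_logs` is a hypothetical function name, and the demo runs on a throwaway directory holding one synthetic log file rather than real EMR output. Unlike the command in step 6, the match here is case-sensitive.

```shell
# Hypothetical helper: uncompress every .gz log under a directory, then print
# the WARN and ERROR lines, each prefixed with the file it came from.
scan_container_logs() {
  log_dir=$1
  # Step 5: uncompress the .gz files in place.
  find "$log_dir" -type f -name '*.gz' -exec gunzip {} \;
  # Step 6: search recursively; grep -r prefixes each match with its file name.
  grep -rE 'WARN|ERROR' "$log_dir"
}

# Demo on a temporary directory with synthetic content (not real EMR logs).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/container_demo"
printf 'INFO starting\nERROR synthetic failure\n' | gzip > "$demo_dir/container_demo/stderr.gz"

result=$(scan_container_logs "$demo_dir")
printf '%s\n' "$result"

# Clean up the demo directory.
rm -rf "$demo_dir"
```

On real logs, you would call the function on the folder downloaded in step 3, for example `scan_container_logs application_1572839353552_0008/`.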

On a running cluster, you can use the YARN CLI to get the YARN application container logs. For a Spark application submitted in cluster mode, you can access the Spark driver logs by pulling the application master container logs like this:

# 1. Get the address of the node that the application master container ran on
$ yarn logs -applicationId application_1585844683621_0001 | grep 'Container: container_1585844683621_0001_01_000001'
20/04/02 19:15:09 INFO client.RMProxy: Connecting to ResourceManager at ip-xxx-xx-xx-xx.us-west-2.compute.internal/xxx.xx.xx.xx:8032
Container: container_1585844683621_0001_01_000001 on ip-xxx-xx-xx-xx.us-west-2.compute.internal_8041

# 2. Use the node address to pull the container logs
$ yarn logs -applicationId application_1585844683621_0001 -containerId container_1585844683621_0001_01_000001 -nodeAddress ip-xxx-xx-xx-xx.us-west-2.compute.internal
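The node address used in the second command is the host part of the `Container:` line, without the trailing `_8041` port suffix. This minimal sketch extracts it, assuming the line has exactly the format shown in the output above. The host name stays redacted as in the source, and the variable names are illustrative.

```shell
# Sample "Container:" line from the yarn logs output above (host is redacted in the source).
container_line='Container: container_1585844683621_0001_01_000001 on ip-xxx-xx-xx-xx.us-west-2.compute.internal_8041'

# Capture the text after " on " and drop the trailing _<port> suffix.
node_address=$(printf '%s\n' "$container_line" | sed -E 's/^Container: [^ ]+ on (.*)_[0-9]+$/\1/')

echo "$node_address"
```

The resulting value can then be passed to `-nodeAddress` in the second `yarn logs` command.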

 

