Solved

STAGE ERROR: Data Collector pipeline loading into Snowflake DB

  • 18 November 2021
  • 6 replies
  • 210 views

Userlevel 3
Badge +1

I have a data collector pipeline that runs on data collector version 3.21.x

This pipeline reads data using a JDBC Multi-table Consumer Origin and writes to a Snowflake DB/table.

 

The pipeline used to run successfully. But in the past two days we encounter the following error,

 

"errorMessage": "SNOWFLAKE_59 - Could not get a Stage File Writer instance to write the records: 'SNOWFLAKE_65 - File upload to Snowflake failed: net.snowflake.client.jdbc.SnowflakeSQLLoggedException: S3 operation failed: Operation=listObjects, Error type=Client, Error code=AccessDenied, Error message=Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: DK041FE70QRW4B85; S3 Extended Request ID: DCSX2ikBaa3k/Vwcsc5+Cbb+pH+7KkLITtJYr+e3ea7cHlOVvVeDG9xhpZtQnbc/6FrAsef4C/w=), Request id=DK041FE70QRW4B85, Extended request id=DCSX2ikBaa3k/Vwcsc5+Cbb+pH+7KkLITtJYr+e3ea7cHlOVvVeDG9xhpZtQnbc/6FrAsef4C/w='",

 

Note: I am using the Snowflake Internal Stage NOT AWS S3 and hence unsure what permission is required. I’ve already created the Stage object in Snowflake and have loaded the data successfully previously. 

 

Any help would be appreciated.

 

I’ve set,

log4j.logger.com.streamsets.pipeline=DEBUG
log4j.logger.com.streamsets.datacollector=DEBUG

 

I see,

    DEBUG    Checking if any active runner is idle

 

Any clue and what this mean?

icon

Best answer by Srinivasan Sankar 23 November 2021, 08:52

View original

6 replies

Userlevel 2
Badge

Hi @Srinivasan Sankar ,

 

The issue seems to be a permission problem when connecting to the stagging area. Is it possible that your Snowflake credentials have been expired or changed?

 

I bet that the AWS errors is because Snowflake have it underneath, so the underlying error is related to that, but first let's check that your credentials are still valid.

Userlevel 3
Badge +1

Hi @alex.sanchez 

I can confirm my Snowflake credentials are good. I logged into Snowflake successfully. I ensured that the Schema Role is the owner of the Stage object in Snowflake.

 

I can see the following command was executed successfully in Snowflake,

PUT file:///tmp/sdc697734874214020810/sdc-aeeef657-cd00-4758-8f4d-0b06de9503ed.csv.gz @"SANDBOX"."USER_XXX550"."TEST_STG" parallel=1 auto_compress=false SOURCE_COMPRESSION=NONE

 

But list @TEST_STG; produced no results. I guess this correlates with the error message in the description. Although I do not understand why the PUT operation is marked as Completed Successfully in Snowflake.

 

Any clue? 

 

 

Userlevel 2
Badge

Hi, 

Checking the code, Snowflake is throwing back a SQLException, can you please check the SDC log and get us the full exception, there might be more information down below with the exact cause coming from Snowflake.

 

Thanks

Userlevel 3
Badge +1

Hi @alex.sanchez,

 

Could not see any info / error other than the below in the sdc.log file. I infact set my log level to Debug.

 

2021-11-18 11:05:08,648 [user:*srinivasan.sankar@aabbcc.com] [pipeline:Test_StreamSets_To_Snowflake/TestStreae49d56f8-0da7-4e9a-8a95-18c6edd8ac19] [runner:0] [thread:ProductionPipelineRunnable-TestStreae49d56f8-0da7-4e9a-8a95-18c6edd8ac19-Test_StreamSets_To_Snowflake] [stage:Snowflake_01] DEBUG SnowflakeRunnable - Could not get an instance of Stage File Writer. Thread name: ProductionPipelineRunnable-TestStreae49d56f8-0da7-4e9a-8a95-18c6edd8ac19-Test_StreamSets_To_Snowflake, Thread number: 0

 

And this is present in the Stage Error file (I write the stage error to S3),

        "errorPipelineName": "TestStreae49d56f8-0da7-4e9a-8a95-18c6edd8ac19",
        "errorStage": "Snowflake_01",
        "errorStageLabel": "Snowflake 1",
        "errorCode": "SNOWFLAKE_59",
        "errorMessage": "SNOWFLAKE_59 - Could not get a Stage File Writer instance to write the records: 'SNOWFLAKE_65 - File upload to Snowflake failed: net.snowflake.client.jdbc.SnowflakeSQLLoggedException: S3 operation failed: Operation=listObjects, Error type=Client, Error code=AccessDenied, Error message=Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: QEG56E8F7232AY06; S3 Extended Request ID: P0da1Ez8wWH1i+p6oiivltumSmrsyj4YQ3ItsZ5r5DtWChreKt+13qPtk8oIIIdLcwT5DySq220=), Request id=QEG56E8F7232AY06, Extended request id=P0da1Ez8wWH1i+p6oiivltumSmrsyj4YQ3ItsZ5r5DtWChreKt+13qPtk8oIIIdLcwT5DySq220='",

 

 

 

Thanks

 

Userlevel 3
Badge

hi @Srinivasan Sankar , could you try and copy some data from an existing Snowflake table into a file in that internal stage to see if that also generates a similar error?


I think it should be a command such as:

COPY INTO @SANDBOX.USER_XXX550.TEST_STG/tmp/table.csv FROM table

Userlevel 3
Badge +1

Thanks @Giuseppe Mura. The command that you shared worked. 

--

Ours was a different issue, in our Java Opts (Args) we had the following entries for -Dhttp.nonProxyHosts,

 

-Dhttp.nonProxyHosts = “…|*.s3.ap-southeast-2.amazonaws.com|*.s3.amazonaws.com|*.amazonaws.com

 

We had issues previously with all S3 traffic from SDC using the proxy server even when the proxy bypass rules were set.

 

Hence we used *. to test whether the traffic from SDC bypasses proxy.

 

This *. eventually bypassed the proxy (we hope) when writing to S3 (i.e. to those buckets that are accessible through our VPC).

 

But *. setting meant the request issued by StreamSets to write to Snowflake’s staging object (that resides in Snowflake’s internal S3) also bypassed our proxy server. Note: Our Snowflake region is also ap-southeast-2.

Hence the permission error as mentioned in the question. 

 

When we removed *.s3.ap-southeast-2.amazonaws.com|*.s3.amazonaws.com|*.amazonaws.com  and re-deployed our sdc image and reran our StreamSets pipeline it completed successfully writing data into Snowflake.

 

Luckily this pipeline was a working pipeline and we were able to rollback our recent changes, build & redeploy the image, test and confirm it was the proxy settings that lead to this error.

Reply