Skip to main content

SNOWFLAKE_23 - Could not Merge staged file.


AkshayJadhav
StreamSets Employee
Forum|alt.badge.img

Salesforce CDC to Snowflake destination writes fail for updated records.SNOWFLAKE_23 - Could not Merge staged file
 

2020-01-14 14:13:28,758 [user:*admin] [pipeline:CDC_Load_Salesforce_Contact_Snowflake/CDCLoadSalesforceContactSnowflakef9dfcbf8-6333-408c-8bd4-b8689040dd67] [runner:0] [thread:ProductionPipelineRunnable-CDCLoadSalesforceContactSnowflakef9dfcbf8-6333-408c-8bd4-b8689040dd67-CDC_Load_Salesforce_Contact_Snowflake] [stage:Snowflake_01] DEBUG SnowflakeRunnable - Exception when performing load trigger: 
com.streamsets.pipeline.api.StageException: SNOWFLAKE_23 - Could not Merge staged file 'sdc-ad001fd1-af05-42f2-a3cc-91211b41eca6.csv.gz': net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
Expression type does not match column data type, expecting VARIANT but got VARCHAR(16777216) for column MAILINGADDRESS
 at com.streamsets.pipeline.lib.snowflake.MergeTrigger.process(MergeTrigger.java:43)
 at com.streamsets.pipeline.stage.destination.snowflake.SnowflakeRunnable.call(SnowflakeRunnable.java:262)
 at com.streamsets.pipeline.stage.destination.snowflake.SnowflakeRunnable.call(SnowflakeRunnable.java:30)

CDC notifications from Salesforce contain only the changed data and there is no option to receive the full record. The task here would be to enrich the updated records and ensure the updated records are sent to salesforce along with all the other fields that have not been changed.  Salesforce CDC is omitting the columns that are not modified when it is sending the modified records or updates. Which is causing issues in snowflake to merge.

That is not an error of the connector, that is how Salesforce works with CDC data. Whenever you have CDC data, it is required for your CDC producer to include the FULL record that is being updated including all the fields that do not change.

We are doing a MERGE from a staged file, there is no way, for different rows in the file to have different fields.

The pipeline will need to be modified to send full records with all columns for updated records to mitigate this issue with merging records in Snowflake.

You will need to use a Salesforce Lookup processor. 

Refer to the attached screenshot.

You will need to set the lookup to Retrieve mode since this is most efficient when dealing with batches and to reduce memory footprint.

Salesforce Fields will be the list of columns which updated records will extract to be sent over to Snowflake

Object Type This is the Salesforce object that we need to retrieve.

 

Did this topic help you find an answer to your question?
This topic has been closed for comments