Question

Unable to ingest data from Azure SQL (CDC) into Azure Databricks using StreamSets.

August 11, 2022
4 replies
117 views

gkognole
Fan

I am trying to build a data pipeline with an Azure SQL Server database (CDC) as the source and Azure Databricks (Delta tables) as the destination.

I referred to the sample pipeline at:
https://github.com/streamsets/pipeline-library/tree/master/datacollector/sample-pipelines/pipelines/SQLServer%20CDC%20to%20Delta%20Lake

I also get the following error for a few records in Schema preview:

DELTA_LAKE_34 - Databricks Delta Lake load request failed: 'DELTA_LAKE_32 - Could not copy staged file 'sdc-4a076fce-7a73-45ba-8dd7-29e58848cf23.csv': java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Unable to infer schema for CSV. It must be specified manually.

Note: On Preview/Draft Run, the pipeline captures changes from the source DB, successfully creates files in the stage (ADLS container), and creates the Delta tables at the destination, but it fails to ingest the records into those tables.

saleempothiwala
Headliner
August 11, 2022

@gkognole I have seen this kind of error when the file name starts with something like _ or the file is empty. From the looks of it, your file names start with sdc-, so it could be a good idea to check whether any temp files are being created and read from.
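To act on this suggestion, one could scan a listing of the staging container for files that Spark would not read. The sketch below is a hypothetical helper (not part of StreamSets or any Azure SDK): it assumes you have already obtained (name, size in bytes) pairs for the staged files by whatever means you prefer, and it flags names that Spark's file listing generally treats as hidden (a leading `_` or `.`) as well as zero-byte files.

```python
from typing import Iterable, List, Tuple


def suspicious_staged_files(files: Iterable[Tuple[str, int]]) -> List[str]:
    """Return staged files likely to cause 'Unable to infer schema for CSV'.

    Hypothetical helper: Spark generally skips file names that begin with
    '_' or '.', and an empty CSV gives schema inference nothing to read.
    """
    flagged = []
    for name, size in files:
        base = name.rsplit("/", 1)[-1]  # look only at the final path segment
        if base.startswith(("_", ".")):
            flagged.append(f"{name}: hidden name, ignored by Spark")
        elif size == 0:
            flagged.append(f"{name}: empty file")
    return flagged


if __name__ == "__main__":
    # Made-up listing for illustration only; substitute your real ADLS listing.
    listing = [
        ("stage/sdc-example-1.csv", 0),
        ("stage/_tmp-example.csv", 512),
        ("stage/sdc-example-2.csv", 2048),
    ]
    for line in suspicious_staged_files(listing):
        print(line)
```

If this prints nothing for the real staging location, hidden or empty files are probably not the cause.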

alex.sanchez
StreamSets Employee
August 12, 2022

@gkognole

Could it be that you are using an unsupported cluster version? (We support only 6.x, 7.x, and 8.x.)

Alex - StreamSets Engineering Manager @ Collector
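A quick way to compare a cluster against the versions listed in this reply is to parse the major number from the runtime string shown in the Databricks UI. This is a hypothetical helper; `SUPPORTED_MAJORS` simply encodes the 6.x/7.x/8.x list stated here as of August 2022 and may not match current StreamSets documentation.

```python
import re

# Hypothetical: major Databricks Runtime versions named as supported in this
# thread (August 2022). Verify against current StreamSets docs before use.
SUPPORTED_MAJORS = {6, 7, 8}


def runtime_supported(runtime: str) -> bool:
    """Return True if the runtime's major version is in SUPPORTED_MAJORS."""
    match = re.match(r"\s*(\d+)\.(\d+)", runtime)
    if match is None:
        raise ValueError(f"unrecognized runtime version: {runtime!r}")
    return int(match.group(1)) in SUPPORTED_MAJORS


if __name__ == "__main__":
    print(runtime_supported("10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)"))  # False
```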

gkognole
Author
Fan
August 12, 2022

Thank you @saleempothiwala for your reply.

Yes, my staged file names start with sdc-, and no temp files with names starting with _ are being created.

gkognole
Author
Fan
August 12, 2022

Thank you @alex.sanchez for your reply.

I am using Databricks Runtime Version: 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).

I will try an 8.x version to see if it resolves the issue.
