Question

Unable to ingest data from Azure SQL (CDC) to Azure Databricks using StreamSets.


I am trying to build a data pipeline with Azure SQL Server DB (CDC) as the source and Azure Databricks (Delta tables) as the destination.

I referred to the sample data pipeline at:
https://github.com/streamsets/pipeline-library/tree/master/datacollector/sample-pipelines/pipelines/SQLServer%20CDC%20to%20Delta%20Lake

 

I am getting the error below for a few records, in Schema preview as well:

DELTA_LAKE_34 - Databricks Delta Lake load request failed: 'DELTA_LAKE_32 - Could not copy staged file 'sdc-4a076fce-7a73-45ba-8dd7-29e58848cf23.csv': java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Unable to infer schema for CSV. It must be specified manually.
 

Note: On Preview/Draft Run, the pipeline captures changes from the source DB, successfully creates files in the stage (ADLS container), and creates the Delta tables at the destination, but it fails to ingest records into them.

 

4 replies

saleempothiwala
Headliner

@gkognole I have seen this kind of error when the file name starts with something like _ or the file is empty. From the looks of it, your file names start with sdc-, so it could be a good idea to check whether any temp files are being created and read from.
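To make that check concrete, here is a minimal sketch, assuming you can get a listing of staged file names and sizes from the ADLS container by whatever means you prefer. The function name `find_suspect_staged_files` and the sample listing (including the sizes) are hypothetical and only for illustration. It flags files whose names start with `_` or `.`, and files that are empty.

```python
from typing import Iterable, List, Tuple


def find_suspect_staged_files(files: Iterable[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Return (name, reason) pairs for staged files that may not load as CSV data.

    `files` is an iterable of (path, size_in_bytes) pairs. How the listing is
    obtained from the stage is left to the caller (an assumption of this sketch).
    """
    suspects = []
    for name, size in files:
        base = name.rsplit("/", 1)[-1]  # file name without the directory part
        if base.startswith(("_", ".")):
            suspects.append((name, "hidden or temporary file name"))
        elif size == 0:
            suspects.append((name, "empty file"))
    return suspects


# Hypothetical listing: the first name comes from the error message,
# the other two entries and all sizes are made up for illustration.
listing = [
    ("stage/sdc-4a076fce-7a73-45ba-8dd7-29e58848cf23.csv", 1024),
    ("stage/_tmp-part.csv", 512),
    ("stage/sdc-empty.csv", 0),
]

for name, reason in find_suspect_staged_files(listing):
    print(f"{name}: {reason}")
```

If nothing is flagged, the staged files themselves are probably not the cause and it is worth looking elsewhere.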


alex.sanchez
StreamSets Employee
  • 73 replies
  • August 12, 2022

@gkognole

Could it be that you are using an unsupported version of the cluster? (we support 6.x, 7.x and 8.x only)


  • Author
  • Fan
  • 2 replies
  • August 12, 2022
saleempothiwala wrote:

@gkognole I have seen this kind of error when the file name starts with something like _ or the file is empty. From the looks of it, your file names start with sdc-, so it could be a good idea to check whether any temp files are being created and read from.

Thank you @saleempothiwala for your reply.

Yes, my stage file names start with sdc-, and no temp files starting with _ are created.


  • Author
  • Fan
  • 2 replies
  • August 12, 2022
alex.sanchez wrote:

@gkognole

Could it be that you are using an unsupported version of the cluster? (we support 6.x, 7.x and 8.x only)

Thank you @alex.sanchez for your reply.

I am using Databricks Runtime Version: 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).

I will try an 8.x version and see if that resolves the issue.
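As a quick sanity check of the version question, here is a small sketch that compares a runtime version string against the supported major versions mentioned in this thread (6.x, 7.x and 8.x). The helper name `is_supported_runtime` is hypothetical, and the supported set is taken only from the reply above, not from any official compatibility matrix.

```python
import re

# Supported major versions as stated in this thread (6.x, 7.x and 8.x).
SUPPORTED_MAJOR_VERSIONS = {6, 7, 8}


def is_supported_runtime(version: str) -> bool:
    """Return True if the major version of a runtime string is in the supported set."""
    match = re.match(r"\s*(\d+)\.", version)
    if match is None:
        raise ValueError(f"Cannot parse runtime version: {version!r}")
    return int(match.group(1)) in SUPPORTED_MAJOR_VERSIONS


print(is_supported_runtime("10.4 LTS"))  # False
print(is_supported_runtime("8.x"))       # True
```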

