
We are ingesting data from Oracle to Databricks. While ingesting, I can see that some of the staging files (CSV) in the S3 bucket fail to insert into Databricks and show up as stage errors. Is there a way to move these staging files to a different bucket and retry them?

Hi @harshith, we don’t have an automatic way to do that. Those files were marked as errors for a reason, so you will have to check why they errored out and fix the cause before processing them again. In general, if you find a pattern, the best option is to ensure (using processors or some other mechanism) that the incoming records are correct.
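If you do want to retry specific files by hand in the meantime, something like the sketch below could move the errored staging CSVs to a separate bucket for reprocessing. This is not a StreamSets feature, just plain S3 operations; the bucket names and prefix are hypothetical placeholders.

```python
# Minimal sketch: move errored staging CSVs to a retry bucket with boto3.
# All bucket names and the prefix below are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "my-staging-bucket"       # hypothetical: bucket StreamSets stages to
RETRY_BUCKET = "my-staging-retry-bucket"  # hypothetical: bucket for files to reprocess
PREFIX = "delta-lake-stage/"              # hypothetical: prefix of the errored CSVs

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if not key.endswith(".csv"):
            continue
        # Copy the file to the retry bucket, then remove the original
        # so it is not picked up twice.
        s3.copy_object(
            Bucket=RETRY_BUCKET,
            Key=key,
            CopySource={"Bucket": SOURCE_BUCKET, "Key": key},
        )
        s3.delete_object(Bucket=SOURCE_BUCKET, Key=key)
```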


@alex.sanchez thanks for your reply. We went through the logs and data, and saw the error below for the records that went to stage errors:

DELTA_LAKE_34 - Databricks Delta Lake load request failed: 'DELTA_LAKE_32' - Could not copy staged file '<filename>': Error running query, at least one column must be specified for the table

When we inserted the same record manually in Databricks, it was inserted successfully, so we found no issues with the data as such.
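Since the message says "at least one column must be specified for the table", one thing worth ruling out is a staged CSV that arrived empty or header-only, which would leave the load with no columns to work from. A minimal check along these lines (bucket and key names are hypothetical placeholders):

```python
# Minimal sketch: inspect a staged CSV suspected of causing the
# "at least one column must be specified" error.
# Bucket and key names are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-staging-bucket", Key="delta-lake-stage/sdc-file.csv")
body = obj["Body"].read().decode("utf-8", errors="replace")

lines = body.splitlines()
print(f"line count: {len(lines)}")
print("first line:", lines[0] if lines else "<file is empty>")
```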


The point then is to find out why those DELTA_LAKE_32 errors are triggered. I saw that you opened another discussion about it, so it might be worth following up there.


Hi @harshith, when you say you “inserted the same records manually in Databricks”, what do you mean? Did you try to copy the data from the actual CSV files generated by StreamSets?


@Giuseppe Mura yes, for the records that went to stage errors, we copied them and ran an INSERT command on Databricks; those records were inserted. But we are not sure why StreamSets is unable to insert them.

@Giuseppe Mura this is the error I'm getting while ingesting through StreamSets:

Databricks Delta Lake load request failed: DELTA_LAKE_32 - Could not copy staged file
'<filename.csv>': java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement.
Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
at least one column must be specified for the table
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute
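For reference, one way we have been trying to reproduce this outside of StreamSets is to replay the load of a single staged file from a Databricks notebook. I can't confirm the exact statement the connector issues over JDBC, but COPY INTO exercises the same staged-load path; the table name and file path below are hypothetical placeholders.

```python
# Minimal sketch, run from a Databricks notebook where `spark` is predefined.
# Replays the copy of one staged CSV into an existing Delta table so the
# failure can be reproduced outside StreamSets.
# Table name and file path are hypothetical placeholders.
staged_file = "s3://my-staging-bucket/delta-lake-stage/sdc-file.csv"

spark.sql(f"""
    COPY INTO my_schema.my_target_table
    FROM '{staged_file}'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
""")
```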


@harshith Please open a support ticket: https://streamsets.com/support/

I have notified the Support team of your questions and comments above.
