Question

Unable to ingest to Databricks using StreamSets

2 years ago
10 February 2022
7 replies
92 views

harshith
Discovered Fame
11 replies

We are ingesting data from Oracle to Databricks. While ingesting, I can see that some of the staging files (CSV) in the S3 bucket cannot be inserted into Databricks; they show up as stage errors. Is there a way to move these staging files to a different bucket and retry them?

7 replies

Userlevel 2

alex.sanchez
StreamSets Employee
73 replies
2 years ago
10 February 2022

Hi @harshith, we don't have an automatic way to do that. Those files were marked as errors for a reason, so you will have to find out why they failed and fix the cause before processing them again. In general, if you find a pattern, the best option is to ensure (using processors or some other mechanism) that the incoming records are correct.
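
For readers who still want to set the failed staging files aside while they investigate, the move itself can be scripted outside the pipeline. The following is a minimal sketch, not a StreamSets feature: it assumes the boto3 S3 client (or any object exposing the same `copy_object` and `delete_object` calls), and the function name, bucket names, and `retry/` prefix are hypothetical. It only relocates the objects; reloading them into Databricks is a separate step and will fail again unless the underlying cause is fixed.

```python
from typing import Iterable, List


def quarantine_staged_files(
    s3_client,
    source_bucket: str,
    dest_bucket: str,
    keys: Iterable[str],
    dest_prefix: str = "retry/",  # hypothetical prefix, chosen for illustration
) -> List[str]:
    """Copy each staged object to dest_bucket, then delete the original.

    s3_client is expected to behave like boto3.client("s3").
    Returns the destination keys of the objects that were moved.
    """
    moved = []
    for key in keys:
        dest_key = dest_prefix + key
        # Copy first; the delete runs only if the copy call did not raise.
        s3_client.copy_object(
            Bucket=dest_bucket,
            Key=dest_key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
        s3_client.delete_object(Bucket=source_bucket, Key=key)
        moved.append(dest_key)
    return moved


# Example call (assumes boto3 is installed and AWS credentials are configured;
# bucket names and the key are placeholders):
#
#   import boto3
#   quarantine_staged_files(
#       boto3.client("s3"), "my-staging-bucket", "my-retry-bucket",
#       ["path/to/failed-file.csv"],
#   )
```

Passing the client in as a parameter keeps the function easy to dry-run against a stub before pointing it at a real bucket. Note that a single `copy_object` call is limited to objects of up to 5 GB.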

harshith
Author
Discovered Fame
11 replies
2 years ago
10 February 2022

@alex.sanchez thanks for your reply. We went through the logs and the data, and we saw the error below for the records that went to stage errors:

Error: DELTA_LAKE_34 - Databricks Delta Lake load request failed: 'DELTA_LAKE_32' - Could not copy staged file <filename>. Error running query: at least one column must be specified for the table

When we inserted the same record manually in Databricks, the record was inserted. We found no issues with the data as such.

Userlevel 2

alex.sanchez
StreamSets Employee
73 replies
2 years ago
10 February 2022

The key question, then, is why those DELTA_LAKE_32 errors are triggered. I saw that you opened another discussion about it, so it might be worth following up there.

Userlevel 3

Giuseppe Mura
StreamSets Employee
37 replies
2 years ago
10 February 2022

Hi @harshith, when you say you “inserted the same records manually in Databricks”, what do you mean? Did you try to copy the data from the actual CSV files generated by StreamSets?
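
One way to look at what a staged file actually contains, as opposed to a record retyped into an INSERT statement, is to summarize its shape after downloading it. The sketch below uses only the Python standard library; the function name is hypothetical, and the comma delimiter is an assumption about how the staging files were written. It does not diagnose DELTA_LAKE_32 by itself; it only reports how many rows the file has and how many fields each row carries, which makes empty files, blank lines, and ragged rows easy to spot.

```python
import csv
from collections import Counter
from typing import Dict, TextIO


def summarize_staged_csv(handle: TextIO, delimiter: str = ",") -> Dict[str, object]:
    """Count the rows in a CSV stream and tally the number of fields per row.

    Blank lines are counted as rows with zero fields.
    """
    field_counts: Counter = Counter()
    rows = 0
    for record in csv.reader(handle, delimiter=delimiter):
        rows += 1
        field_counts[len(record)] += 1
    return {"rows": rows, "field_counts": dict(field_counts)}


# Example call on a locally downloaded copy (the file name is a placeholder):
#
#   with open("failed-file.csv", newline="") as fh:
#       print(summarize_staged_csv(fh))
```

If every row reports the same field count and that count matches the target table, the file shape is probably not the problem and the table definition or stage configuration is the next thing to compare.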

harshith
Author
Discovered Fame
11 replies
2 years ago
10 February 2022

@Giuseppe Mura yes. For the records that went to stage errors, we copied them and ran an INSERT command on Databricks, and the records were inserted. We are not sure why StreamSets is unable to insert them.

harshith
Author
Discovered Fame
11 replies
2 years ago
10 February 2022

@Giuseppe Mura this is the error I'm getting while ingesting through StreamSets:

Databricks Delta Lake load request failed: DELTA_LAKE_32 - Could not copy staged file '<filename.csv>': java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute

Userlevel 5

Drew Kreiger
Senior Community Builder at StreamSets
95 replies
2 years ago
15 February 2022

@harshith Please open a support ticket. https://streamsets.com/support/

I have notified the Support team of your previous asks and comments above.
