Unable to ingest to Databricks using StreamSets

  • 10 February 2022
  • 7 replies

We are ingesting data from Oracle to Databricks. While ingesting, I can see that some of the staging files (CSV) in the S3 bucket are not being inserted into Databricks; they show up as stage errors. Is there a way to move these staging files to a different bucket and retry them?





Hi @harshith, we don’t have an automatic way to do that. Those files were marked as errors for a reason, so you will have to check why they errored out and fix the cause before reprocessing them. In general, if you find a pattern, the best option is to ensure (using processors or some other mechanism) that the incoming records are correct.
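Since there is no built-in retry, a manual move is one possibility once the cause has been fixed. The following is only a minimal sketch, not a StreamSets feature: it assumes the failed files sit under a known key prefix, that `s3` is a boto3 S3 client, and that the bucket names and prefix in the usage comment are placeholders. How the moved files get re-ingested (for example, by a separate pipeline reading from the second bucket) is left open.

```python
def move_staged_files(s3, src_bucket, dest_bucket, prefix):
    """Copy every object under `prefix` to `dest_bucket`, then delete the original.

    `s3` is expected to behave like a boto3 S3 client. Returns the moved keys.
    """
    moved = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            s3.copy_object(
                Bucket=dest_bucket,
                Key=key,
                CopySource={"Bucket": src_bucket, "Key": key},
            )
            # Delete only after the copy call has returned without raising.
            s3.delete_object(Bucket=src_bucket, Key=key)
            moved.append(key)
    return moved


# Example usage (requires boto3 and AWS credentials; all names are hypothetical):
#   import boto3
#   move_staged_files(boto3.client("s3"), "my-stage-bucket", "my-retry-bucket", "stage-errors/")
```

Passing the client in as an argument keeps the function easy to dry-run against a stub before pointing it at real buckets.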

@alex.sanchez thanks for your reply. We went through the logs and the data, and we saw the error below for the records that went to stage errors:

Error: DELTA_LAKE_34 - Databricks Delta Lake load request failed: 'DELTA_LAKE_32' - Could not copy staged file <filename>: Error running query: at least one column must be specified for the table

When we inserted the same record manually in Databricks, the record was inserted without problems. We found no issues with the data itself.


The point, then, is to find out why those DELTA_LAKE_32 errors are triggered. I saw that you opened another discussion about it, so it might be worth following up there.


Hi @harshith, when you say you “inserted the same records manually in Databricks”, what do you mean? Did you try to copy the data from the actual CSV files generated by StreamSets?
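To check the actual staged files rather than retyped records, a quick structural summary can help. This is only a rough sketch: it assumes a failed stage file has been downloaded locally and uses plain comma-separated values with default quoting, which may not match how the pipeline is configured. The function name and path are hypothetical. It does not claim to reproduce the error; it only shows whether a file is empty or has rows with differing field counts.

```python
import csv
from collections import Counter


def summarize_csv(path):
    """Return (row_count, Counter mapping fields-per-row to number of rows)."""
    widths = Counter()
    rows = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            rows += 1
            # A completely blank line is read as an empty list, i.e. width 0.
            widths[len(row)] += 1
    return rows, widths


# Example usage (the file name is a placeholder):
#   rows, widths = summarize_csv("failed-stage-file.csv")
#   print(rows, dict(widths))
```

A file with zero rows, or with more than one distinct width, would be a concrete difference between the staged file and the records that were inserted by hand.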

@Giuseppe Mura yes, for the records that went to stage errors, we copied them and ran an INSERT command on Databricks. Those records were inserted, but we are not sure why StreamSets is unable to insert them.


@Giuseppe Mura this is the error I'm getting while ingesting through StreamSets:

Databricks Delta Lake load request failed: DELTA_LAKE_32 - Could not copy staged file
'<filename.csv>': java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR
processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table at
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute


@harshith Please open a support ticket.


I have notified the Support team of your previous asks and comments above.