Question

issues while ingesting data to databricks through streamsets

  • 10 February 2022
  • 7 replies
  • 85 views

while ingesting data from oracle to databricks, i could see few staging files(s3 bucket) are moving into stage errors

error:’DELTA_LAKE_32’ -could not copy stage file <filename> 

Error running query , at least one column must be specified for the table

while we inserted the same record manually in databricks , we could see the record got inserted. No issues found on data as such.

can you please give suggestions on the error which we are receiving.

 


7 replies

Userlevel 2
Badge

Hi @harshith ,

Are you using “normal” data or cdc data?

In any case, looks like your data is missing some key information about which columns are being changed
 

Hi @alex.sanchez we are using cdc data, we are retreiving the type of operation happening (INSERT ,UPDATE, DELETE) as one of the meta column using ${record:attribute(‘oracle.cdc.operation’)}

we could see its updating fine for few tables but some tables its going to stage errors on databricks level

Userlevel 2
Badge

Hi @harshith, can you please provide the full error that you are getting after the DELTA_LAKE_32?

@alex.sanchez 

Databricks Delta Lake load request failed: DELTA_LAKE_32 - Could not copy staged file
'<filename.csv>': java.sql.SQLException: [Simba) (SparkJDBCDriver) (500051) ERROR
processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQL Exception: Error running query:
org.apache.spark.sql. AnalysisException: org.apache.hadoop.hive.ql.metadata. HiveException: at least one colúmn must be specified for the table at
org.apache.spark.sql.hive.thriftserver. SparkExecuteStatement Operation.org$apache $spark$sql$hive$thriftserver $SparkExecuteStatement Operations Sexecute

Userlevel 2
Badge

Hi @harshith,

Googling around, looks like these errors tends to appear when datatype or columns does not match.

If your data can contain any characters like commas or new lines, I would recommend you changing the default values for the quoting mode and setting different values there, since they can corrupt the CSV.

@alex.sanchez went through the data and datatype, everything seems to be fine. tried with one of the record to insert it manually on databricks and its getting inserted.

is their a way to move those csv files which did not get processed(staging layer) to another folder and retry the same operation of inserting into databricks in streamsets?

 

Userlevel 2
Badge

No, currently we don’t have a way to do that :(

To give extra context, we are trying to copy the data to a temporary table and from there we do the merge into the destination table, for some reason, we are struggling to copy into the temporary table.

Reply