
While ingesting data from Oracle to Databricks, I can see a few staging files (S3 bucket) moving into stage errors.

Error: 'DELTA_LAKE_32' - Could not copy staged file <filename>

Error running query: at least one column must be specified for the table

When we inserted the same record manually in Databricks, we could see that it got inserted. No issues were found with the data as such.

Can you please give suggestions on the error we are receiving?

 

Hi @harshith ,

Are you using “normal” data or CDC data?

In any case, it looks like your data is missing some key information about which columns are being changed.
 


Hi @alex.sanchez, we are using CDC data. We retrieve the type of operation (INSERT, UPDATE, DELETE) as one of the meta columns using ${record:attribute('oracle.cdc.operation')}.

We can see it updating fine for a few tables, but for some tables it goes to stage errors at the Databricks level.


Hi @harshith, can you please provide the full error that you are getting after the DELTA_LAKE_32?


@alex.sanchez 

Databricks Delta Lake load request failed: DELTA_LAKE_32 - Could not copy staged file
'<filename.csv>': java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR
processing query/statement. Error Code: 0, SQL state:
org.apache.hive.service.cli.HiveSQLException: Error running query:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
at least one column must be specified for the table
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute


Hi @harshith,

Googling around, it looks like these errors tend to appear when the datatypes or columns do not match.

If your data can contain characters like commas or new lines, I would recommend changing the defaults for the quoting mode and trying different values there, since those characters can corrupt the CSV.
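To illustrate what I mean with the quoting (just plain Python with made-up values, nothing StreamSets-specific): an unquoted comma inside a value silently adds an extra column, which is exactly the kind of column mismatch that error complains about.

```python
import csv
import io

# One record whose second field contains a comma (values are made up).
record = ["1", "ACME, Inc.", "INSERT"]

# Written with quoting, the comma stays inside one quoted field and the
# row parses back into exactly 3 columns.
quoted = io.StringIO()
csv.writer(quoted, quoting=csv.QUOTE_ALL).writerow(record)
print(quoted.getvalue().strip())                          # "1","ACME, Inc.","INSERT"
print(next(csv.reader(io.StringIO(quoted.getvalue()))))   # 3 columns

# Written naively without quoting, the embedded comma becomes a field
# separator and the same record now has 4 columns -- a mismatch with the table.
unquoted = ",".join(record) + "\n"
print(unquoted.strip())                                   # 1,ACME, Inc.,INSERT
print(next(csv.reader(io.StringIO(unquoted))))            # 4 columns
```

Embedded new lines do the same kind of damage, splitting one record across two rows, so it is worth enabling quoting for those cases as well.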


@alex.sanchez I went through the data and datatypes, and everything seems fine. I tried inserting one of the records manually in Databricks and it got inserted.

Is there a way in StreamSets to move those CSV files that did not get processed (staging layer) to another folder and retry the same operation of inserting into Databricks?

 


No, currently we don’t have a way to do that :(

To give extra context: we copy the data to a temporary table, and from there we merge it into the destination table. For some reason, it is the copy into the temporary table that is failing.
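Conceptually it is something like the sketch below. This is only an illustration using the databricks-sql-connector; the table, bucket and column names (tmp_orders_stage, orders, order_id, amount, status, op) and the connection details are all made up, and it is not the exact SQL the destination generates.

```python
# Rough sketch of the staging flow: copy the staged CSV files into a
# temporary/staging table, then merge them into the destination table.
# All names and connection details below are placeholders.
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="<workspace-hostname>",
    http_path="<sql-warehouse-http-path>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        # 1) Load the staged file(s) from S3 into the (pre-existing) staging table.
        cur.execute("""
            COPY INTO tmp_orders_stage
            FROM 's3://my-stage-bucket/orders/'
            FILEFORMAT = CSV
            FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
        """)

        # 2) Merge the staged rows into the destination, using the CDC
        #    operation column to decide between insert, update and delete.
        cur.execute("""
            MERGE INTO orders AS t
            USING tmp_orders_stage AS s
              ON t.order_id = s.order_id
            WHEN MATCHED AND s.op = 'DELETE' THEN DELETE
            WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.status = s.status
            WHEN NOT MATCHED AND s.op <> 'DELETE' THEN
              INSERT (order_id, amount, status) VALUES (s.order_id, s.amount, s.status)
        """)
```

When that first copy step fails, the whole load request is reported as "Could not copy staged file", which matches the error you pasted.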

