
Hi,

 

I am using the Oracle JDBC origin in Transformer and am having an issue saving the data frame into a Hive table. Below is the error screenshot.

However, the data is saved to the table when I run the pipeline in preview mode. In preview mode, I can see that all the data types are string, decimal, and date/timestamp.

I have tried reading only one string field and still see the same error message. Any help/suggestion would be appreciated.

Transformer version is 3.13.
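
For context, what the pipeline is doing is roughly equivalent to the following Spark code (a sketch only; the URL, credentials, and table names below are placeholders, not values from my pipeline):

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("oracle-to-hive")
  .enableHiveSupport()                 // needed for saveAsTable against the Hive metastore
  .getOrCreate()

// Read from Oracle through Spark's JDBC data source (the JDBC origin)
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")   // placeholder URL
  .option("dbtable", "SCHEMA.SOURCE_TABLE")                    // placeholder table
  .option("user", "user")
  .option("password", "password")
  .option("driver", "oracle.jdbc.OracleDriver")
  .load()

// Write to a Hive table (the Hive destination)
df.write
  .mode(SaveMode.Overwrite)
  .format("parquet")
  .saveAsTable("target_db.target_table")                       // placeholder table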

 

Hi @BharathiMaddela, which File Format have you selected in the Hive destination?

  • Could you please share the pipeline export?
  • Also, please share the full error trace.

 


Hi Rishi,

 

I have tried both Text and Parquet formats, overwriting the table on each run.

 

Thanks,

Bharathi M


org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
    at org.apache.spark.sql.hive.execution.SaveAsHiveFile$class.saveAsHiveFile(SaveAsHiveFile.scala:86)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.saveAsHiveFile(InsertIntoHiveTable.scala:66)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:195)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99)
    at org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:88)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
    at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:465)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:444)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:400)
    at com.streamsets.pipeline.spark.destination.hive.HiveDestination.write(HiveDestination.scala:82)
    at com.streamsets.datatransformer.api.operator.Destination.measureAndWrite(Destination.java:32)
    at com.streamsets.datatransformer.api.spark.SparkDestination.measureAndWrite(SparkDestination.java:47)
    at com.streamsets.datatransformer.dag.BaseBatchDAGRunner$$anon$2.call(BaseBatchDAGRunner.scala:693)
    at com.streamsets.datatransformer.dag.BaseBatchDAGRunner$$anon$2.call(BaseBatchDAGRunner.scala:687)
    at com.streamsets.datatransformer.dag.BaseBatchDAGRunner$$anonfun$materialize$2$$anon$3.call(BaseBatchDAGRunner.scala:728)
    at com.streamsets.datatransformer.dag.BaseBatchDAGRunner$$anonfun$materialize$2$$anon$3.call(BaseBatchDAGRunner.scala:724)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job 1 cancelled
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1890)
    at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1825)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2077)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2060)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2049)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:740)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2081)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:167)
    ... 38 more

I was expecting the full stack trace to have more details.

  1. Could you please change the log level of the Transformer pipeline to DEBUG and see if you get more details?
  2. Quick test: could you try changing the write mode to Overwrite complete existing Table, just for testing purposes, to see if this helps? (A rough Spark-level sketch of the difference is below.)
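
For reference only (this is not the actual Transformer code), the two write behaviours correspond roughly to the following at the Spark level; the DataFrame df and the table name are placeholders:

import org.apache.spark.sql.SaveMode

// "Overwrite complete existing Table": drop and recreate the table via saveAsTable
df.write
  .mode(SaveMode.Overwrite)
  .format("parquet")
  .saveAsTable("target_db.target_table")

// Overwriting only the data in an existing table: insertInto with overwrite mode
df.write
  .mode(SaveMode.Overwrite)
  .insertInto("target_db.target_table")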

Also, I would recommend opening a support ticket if you are an enterprise customer.


Thanks Rishi,

I will try modifying the log level and see if I can find anything. The write mode is already set to Overwrite complete existing Table.

 

Thanks


Hi Rishi,

 

The issue has been identified; it is down to a driver incompatibility.
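
For anyone who lands here with the same error: this thread does not spell out which driver change resolved it, but a common thing to verify in a Spark/JDBC setup is that a matching Oracle JDBC jar is available on the driver and executor classpath. A general Spark illustration (not the exact fix applied here; paths and values are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("oracle-jdbc-driver-check")
  .config("spark.jars", "/path/to/ojdbc8.jar")     // placeholder path; ships the jar to executors
  .enableHiveSupport()
  .getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("driver", "oracle.jdbc.OracleDriver")    // load fails if the jar is missing or incompatible
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")
  .option("dbtable", "SCHEMA.SOURCE_TABLE")
  .option("user", "user")
  .option("password", "password")
  .load()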

