I am on Data Collector 3.21.x.

I observed the following behavior and would like to share it with the community. I am unsure whether this is a bug or a gap that needs to be addressed in the documentation.


My pipeline is simple: it reads from a SQL Server database table and writes to AWS S3.

I use an origin stage followed by a "Field Renamer" stage.


Scenario 1: 

When I use the JDBC Multitable Consumer origin, which reads all the fields from the source table, I do not need to use the "REPLACE" option on the "Rename" tab of the "Field Renamer" stage. Refer to the pictures below:


Pipeline Design



Field Renamer Stage Config.




Scenario 2:

When I use the JDBC Query Consumer as my origin and issue a SELECT statement other than SELECT * FROM …, i.e., one that selects only a few columns (for example, SELECT <column1> FROM <table1>), I have to set "Target Field Already Exists" to REPLACE; otherwise my pipeline fails.


Pipeline Design


com.streamsets.pipeline.api.base.OnRecordErrorException: FIELD_RENAMER_01 - Target Fields '/<column1>' cannot be overwritten for record 'SELECT <column1> FROM XXXX.dbo.<table1>::rowCount:0'
    at com.streamsets.pipeline.stage.processor.fieldrenamer.FieldRenamerProcessor.process(
    at com.streamsets.pipeline.api.base.SingleLaneRecordProcessor.process(
    at com.streamsets.pipeline.api.base.SingleLaneProcessor.process(
    at com.streamsets.pipeline.api.base.configurablestage.DProcessor.process(
    at com.streamsets.datacollector.runner.StageRuntime.lambda$execute$2(
    at com.streamsets.datacollector.runner.StageRuntime.execute(
    at com.streamsets.datacollector.runner.StageRuntime.execute(
    at com.streamsets.datacollector.runner.StagePipe.process(
    at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.lambda$runSourceLessBatch$0(
    at com.streamsets.datacollector.runner.PipeRunner.acceptConsumer(
    at com.streamsets.datacollector.runner.PipeRunner.forEachInternal(
    at com.streamsets.datacollector.runner.PipeRunner.executeBatch(
    at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.runSourceLessBatch(
    at com.streamsets.datacollector.runner.preview.PreviewPipelineRunner.runPollSource(
    at com.streamsets.datacollector.execution.preview.sync.SyncPreviewer.start(
    at com.streamsets.datacollector.execution.preview.async.AsyncPreviewer.lambda$start$1(
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$SafeCallable.lambda$call$0(
    at com.streamsets.pipeline.lib.executor.SafeScheduledExecutorService$
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(
    at java.util.concurrent.ScheduledThreadPoolExecutor$
    at com.streamsets.datacollector.metrics.MetricSafeScheduledExecutorService$
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$


Here, <column1> and <table1> refer to a column and a table in my SQL Server database.



Field Renamer Stage Config.


To get past this error, I had to set the REPLACE option, as shown below:



Let me know if my understanding is correct. 


I am not sure whether this is a bug or a gap in the documentation. When the origin is the JDBC Query Consumer, whether you need to adjust the "Field Renamer" stage depends on whether the query uses SELECT * or selects only a subset of columns.
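To make the failure mode concrete, here is a minimal Python model of a rename step with a "target field already exists" policy. It is a hypothetical sketch, not StreamSets code: the function rename_fields, the exception TargetFieldExistsError, and the policy names TO_ERROR and REPLACE are assumptions chosen to mirror the FIELD_RENAMER_01 error and the REPLACE option above. It only shows that a rename fails unless REPLACE is set whenever the target name is already a field in the record; it does not explain why that condition arises with the JDBC Query Consumer but not with the multitable origin.

```python
from typing import Any, Dict


class TargetFieldExistsError(Exception):
    """Hypothetical stand-in for the FIELD_RENAMER_01 error."""


def rename_fields(record: Dict[str, Any],
                  renames: Dict[str, str],
                  policy: str = "TO_ERROR") -> Dict[str, Any]:
    """Apply source -> target renames to a flat record (hypothetical model).

    policy (assumed names):
      "TO_ERROR" - fail if the target field already exists
      "REPLACE"  - overwrite the existing target field
    """
    if policy not in ("TO_ERROR", "REPLACE"):
        raise ValueError(f"unknown policy: {policy}")
    out = dict(record)  # do not mutate the caller's record
    for source, target in renames.items():
        if source not in out:
            continue  # nothing to rename for this mapping
        if target in out and policy == "TO_ERROR":
            raise TargetFieldExistsError(
                f"Target field '/{target}' cannot be overwritten")
        # REPLACE (or no conflict): move the source value onto the target name
        out[target] = out.pop(source)
    return out
```

For example, rename_fields({"old_name": 1, "new_name": 2}, {"old_name": "new_name"}) raises the error under the default policy, while passing policy="REPLACE" returns {"new_name": 1}.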
