Question

Seeking Assistance with Column Renaming and SDC Field Update in StreamSets Pipeline. SDK

  • 26 September 2023
  • 6 replies
  • 48 views

Hello,

I am currently running a pipeline that reads data from Oracle Bulkload as the source and utilizes JDBC Producer for PostgreSQL as the destination.

Now, I am facing an issue where I need to rename a column from 'FIRST_NAME' to 'second_name' in the destination table, which is in PostgreSQL. I've attempted the following methods:

1. Using Oracle Bulkload with a Field Renamer stage followed by JDBC Producer.
   - Error: JDBC_07 - Invalid column mapping from field '/Second_name' to column '/first_name'.

 

2. Using JDBC Consumer Table with a Field Renamer stage followed by JDBC Producer.
   - Error: JDBC_07 - Invalid column mapping from field '/Second_name' to column '/first_name'.

 

3. Using JDBC Consumer Table with a Field Renamer stage followed by JDBC Producer.
   - Error: com.streamsets.pipeline.api.base.OnRecordErrorException: JDBC_90 - Record doesn't have any columns for table 'table1'.
   - Note: For option 3, I left the Field to Column Mapping empty in JDBC Producer.

 

………………………………………………………………………….

 

  • Additionally, within the same pipeline, I attempted to incorporate the following Python SDK script to investigate the possibility of adding and altering values:

postgres_jdbc_table.field_to_column_mapping= [{'columnName' : 'SECOND_name', 'sdcField' : '/FIRST_NAME', 'dataType' : 'STRING'}]

  • While this, script successfully updated all the columns within the pipeline, but  the SDC field remained unaltered. 

 

1: I'm seeking guidance on what might be causing these issues and how to properly handle the error message stating, "Record doesn't have any columns for the table in the destination."

2: Also If anyone has insights into how to update the SDC field as well, your guidance would be greatly appreciated.

Thank you for any assistance provided :)


6 replies

Userlevel 5
Badge +1

@swapniel22 

 

can you please try to rename the column as given below.

 

For sending data to DB using jdbc producer , please make the column name should match with your destination table.

Please let me know if you are still facing issues.

 

Thanks & Regards

Bikram_

Userlevel 5
Badge +1

@swapniel22 

I will check on SDC field how to rename it and let you know on it.

Thanks & regards

Bikram_

Userlevel 5
Badge +1

@swapniel22 

kindly provide some sample data ,so i can try to reproduce the issue in my environment.

Hi Bikram

Thank you, now  from Streamsets UI its working. Field name are getting changed.

 

………………………………………..

But when I tried using Python SDK with various options . below is just one 

postgres_jdbc_table.schema_name=''

postgres_jdbc_table.field_to_column_mapping= [{'columnName' : 'first_name', 'sdcField' : '/FIRST_NAME', 'dataType' : 'STRING'}]

  • It still shows same error i.e VALIDATION_0008 - Invalid configuration: 'sdcField'

Regards,

Swapnil….

Userlevel 5
Badge +1

@swapniel22 

can you please give a try with below code snippet 

 

pipeline_builder = PipelineBuilder()
pipeline_builder.add_stage('Dev Raw Data Source')  # Replace with your source stage
pipeline_builder.add_stage('Field Renamer').set_attributes(
    renames=[{"fromField": "old_field_name", "toField": "new_field_name"}]
)
 

Hi Bikram,

Tried the above code but got an error stating below.

File "/Users/xxxx/sdk /just_demo.py", line 1, in <module>
    from streamsets.sdk import ControlHub, PipelineBuilder

ImportError: cannot import name 'PipelineBuilder' from 'streamsets.sdk'

 

  • Than  just removed the field_to_column_mapping from the stage i.e jdbc_producer.set_attributes and instead just included field_renamer.set_attributes as below and the fields got renamed.
     

Used the same fields rom Streamsets UI for the Field Renamer Stage into the Python SDK .

  • field_renamer.set_attributes(fields_to_rename = [{'fromSourceFieldExpression': '/FIRST_NAME', 'toTargetFieldExpression': '/first_name'}, {'fromSourceFieldExpression': '/CREATED_AT', 'toTargetFieldExpression': '/created_at'}])

But will try again importing pipelinebuilder. I might would have missed to add something here.

Thank you very much for the assist.

Reply