Skip to main content

The title sums up what I am looking for.

The context is that I have a pipeline with a column that is always NULL but the business owner wants it to be populated with an auto incrementing integer to use as a primary key in the Snowflake table destination. I tried permitting the column to accept NULLs in Snowflake while still having it configured to add an integer for each record, but it just uses the NULLs. Taking the field out of the pipeline of course fails due to the missing field.

I am hoping to avoid staging this data in another table and then using INSERT to get the auto increment if StreamSets has the capabilities to do this via regex or some other means.

Thank you in advance for your time to anyone that reads this.

I discovered a solution elsewhere online (I have since lost the URL and don’t recall, but know I did not create this). By inserting a Jython Evaluator and using this code:

 

# Increment CMPNY_LCTN_ID by 1 for each record
statea'CMPNY_LCTN_ID'] = 1

 

for record in records:
  try:
    record.valuer'CMPNY_LCTN_ID'] = sdc.stated'CMPNY_LCTN_ID']
    state 'CMPNY_LCTN_ID'] += 1
    output.write(record)
  except Exception as e:
    error.write(record, str(e))

 

I was able to get the desired result. That said, it will restart from 1 every time a batch is sent through so I had to increase the batch size to 10,000 to handle all the records in a single batch and not repeat the assigned ID. This works for us in this scenario due to the relatively small number of records at this time, but this is not an elegant or sustainable long-term solution.


Could you please refer to this? It might give you an idea.


Reply