Solved

How to configure kinesis shard


Hi,

I use the kinesis consumer on DC 5.3. Configuration shows no errors. When I preview the pipeline everything works but no records are read. This seems to be due to that only one shard contains the actual data. How can I configure the shard id to read from.

 

Thanks,

 

Marco

icon

Best answer by AkshayJadhav 2 April 2023, 20:10

View original

10 replies

Userlevel 4
Badge

Hello @mstoffel - Thank you for reaching out to the StreamSets Community. As per my understanding, I do not see any configuration that will fetch the data from the particular shard however would you please try with the “maxConnections” to 2 and let us know if that works?

 

Userlevel 4
Badge

Please keep the connection = total shards. 

Hi,

tried that, I still get nothing.

 

Kind Regards,

 

Marco

Userlevel 4
Badge

Hello mstoffel - Thank you for the update. For a testing purpose, I have created a 3 shard stream and inserted 1 records and was able to consume without any issue. I had to change the initial position to TRIM_HORIZON which means read message from the beginning of the stream.

Could you give it a try with the above configuration and see if that works? 

I use that exact setting. I also use TimeHorizon when I’m testing within aws kinesis eventViewer. It’s the only way to get non live events. In StreamSets i get just nothing. And I think the connection is correct, sice I don’t get any error on pipeline preview.There are 4 (0-3) Shards in my stream. Data is send on Shard1.

 

kind regards,

 

Marco 

Userlevel 4
Badge

Hello @mstoffel  - What is the strategy that you are using ExplicitHashKey or PartitionKey while sending data to the Kinesis?

Hi, this is a received payload:

 

{

"Records": [

{

"kinesis": {

"kinesisSchemaVersion": "1.0",

"partitionKey": "/f8mlm56bg6/mkhtcm5781",

"sequenceNumber": "49639146678182334589956014474114599020442513762966044690",

"data": "eyJ0aW1lc3RhbXAiOiIyMDIzLTAzLTMxIDEwOjI3OjMwLjk1NSIsImV2ZW50SWQiOiI1NGYzNWNmNGRkZWUtMjQxIiwidmVyc2lvbiI6IjEuMCIsInByb2plY3REaXNwbGF5TmFtZSI6ImN1bXVsb2NpdHktcmVtb3RlLW1vbml0b3JpbmciLCJzaXRlRGlzcGxheU5hbWUiOiJTaXRlIDEiLCJhc3NldERpc3BsYXlOYW1lIjoiV2luZG1pbGwiLCJzZW5zb==",

"approximateArrivalTimestamp": 1680258458.496

},

"eventSource": "aws:kinesis",

"eventVersion": "1.0",

"eventID": "shardId-000000000001:49639146678182334589956014474114599020442513762966044694",

"eventName": "aws:kinesis:record",

"invokeIdentityArn": "arn:aws:iam::572053383:role/service-role/sendMeasurementToC8y",

"awsRegion": "us-east-1",

"eventSourceARN": "arn:aws:kinesis:us-east-1:572086383:stream/monitorn-data"

}

]

}

Looks like partitionKey

Userlevel 4
Badge

@mstoffel - Thank you for sharing the information. We actually just implement a KCL interface provided by AWS in the streamSet framework. As per my findings, it is not possible from StreamSets side however could you please try adding configuration partitionKey as below and start the pipeline with rest origin and start option?

/f8mlm56bg6/mkhtcm5781

This doesn seem to be a valid configuration:

 

 

Userlevel 4
Badge

@mstoffel - Thank you for testing. It seems fetching data from particular shard is not possible with Kinesis origin at the moment. However I will check with the engineering team to raise a feature request.

Thank you.

Reply