Question

How to read files from an S3 bucket using Groovy?


himanshu1234567
Discovered Fame

Can someone provide working Groovy code to read files from an S3 bucket?

8 replies

Sanjeev
StreamSets Employee
  • StreamSets Employee
  • 53 replies
  • July 11, 2023

@himanshu1234567 is there any reason you can’t use the S3 origin?


himanshu1234567
Discovered Fame

@Sanjeev the requirement is to use groovy_scripting only.


Sanjeev
StreamSets Employee
  • StreamSets Employee
  • 53 replies
  • July 12, 2023

@himanshu1234567 would you be able to share more details on the use case, so we can understand why custom code is needed? The reason I’m advocating for the built-in origin is that if you go the custom code route, you’ll need to write the code for the necessary processing, event generation, error record handling, offset tracking, etc. The S3 origin handles all of this automatically.
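
For anyone who still needs the custom route, here is a minimal sketch of the S3-reading part in plain Groovy. It assumes the AWS SDK for Java v1 (`aws-java-sdk-s3`) is on the classpath and that credentials and region come from the SDK's default provider chain. The function names (`keysAfter`, `readObjectLines`, `readNewObjects`) are hypothetical helpers made up for illustration, and the "offset" is simply the last key read, which assumes key names sort the same way in Groovy as in the S3 listing (true for ASCII-only keys). Handing the results to the Groovy Scripting origin as records, events and offsets is not shown.

```groovy
import com.amazonaws.services.s3.AmazonS3

// Hypothetical helper: keep only the keys that sort after the last key
// already processed. A null lastKey means nothing has been processed yet.
List<String> keysAfter(List<String> keys, String lastKey) {
    def sorted = keys.sort(false)          // sort(false) returns a sorted copy
    lastKey == null ? sorted : sorted.findAll { it > lastKey }
}

// Hypothetical helper: read one object as UTF-8 text, one entry per line.
// withReader closes the underlying S3 stream when the closure returns.
List<String> readObjectLines(AmazonS3 s3, String bucket, String key) {
    s3.getObject(bucket, key).objectContent.withReader('UTF-8') { it.readLines() }
}

// Read every object under a prefix that comes after lastKey.
// Returns the lines per key, the keys that failed, and the new offset.
Map readNewObjects(AmazonS3 s3, String bucket, String prefix, String lastKey) {
    // A single listObjectsV2 call returns one page of keys (up to 1,000);
    // pagination is left out of this sketch.
    def listed = s3.listObjectsV2(bucket, prefix).objectSummaries*.key
    def linesByKey = [:]
    def failedKeys = []
    def offset = lastKey
    for (key in keysAfter(listed, lastKey)) {
        try {
            linesByKey[key] = readObjectLines(s3, bucket, key)
        } catch (Exception e) {
            failedKeys << key              // stand-in for error record handling
        }
        offset = key                       // advance the offset past this key
    }
    [lines: linesByKey, failed: failedKeys, offset: offset]
}
```

To try it, build a client with `com.amazonaws.services.s3.AmazonS3ClientBuilder.defaultClient()` and call `readNewObjects(s3, 'my-bucket', 'incoming/', null)`, where the bucket and prefix are placeholders. Pass the returned `offset` back in as `lastKey` on the next call so already-read objects are skipped.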

 


himanshu1234567
Discovered Fame

@Sanjeev can we also apply a SQL query in the S3 origin? The team wants to filter the data with a SQL query.


Bikram
Headliner
  • Headliner
  • 486 replies
  • July 12, 2023

@himanshu1234567 

To run SQL queries against the data stored in S3, you will need to set up Athena for your S3 environment.

 


himanshu1234567
Discovered Fame

@Bikram OK, that’s why they are forcing us to use groovy_scripting as the origin.


Sanjeev
StreamSets Employee
  • StreamSets Employee
  • 53 replies
  • July 16, 2023

@himanshu1234567 if the requirement is to query S3 data using SQL, then you can use Athena as Bikram suggested. I’m not quite clear on why you want to use Groovy to do that from within StreamSets.


Nitika
Fan
  • Fan
  • 1 reply
  • January 19, 2025

I have a similar requirement. Kafka is the source, and I want to use Groovy to read data from an AWS Athena table. I am using StreamSets Data Collector.

