
Hi,

We are processing S3 files with a batch size of 1000, and we plan to store the output in S3 as well. Our input file has 10,000 records, so we are seeing 10 output files in S3. Per the client's requirement, we need to create a single file.

Is there a way to create a single S3 output file from StreamSets?

Hi @gowrinadh, assuming you have a single source file with 10,000 records:

You can use the Whole File data format to transfer entire files from an origin system to a destination system (S3 to S3 in your case). The Whole File data format can transfer any type of file, and it will produce a single output file, as you want.

Please configure the Whole File data format on both the origin and the destination.


@Rishi When I use the Whole File format, I cannot work with the individual records. I need to read each record and call a web service using the HTTP Client stage.



Yes, with the Whole File data format you cannot modify individual records. I was assuming your pipeline was S3 → S3.

Another workaround I can think of:

  • First write to a local file (Local FS destination), which results in a single file, then write that file to S3 using the Whole File data format.
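
To illustrate the idea outside StreamSets, here is a minimal Python sketch of the same two-step pattern: process records batch by batch, append every batch to one local file, then hand the finished file over once as a whole. This is not StreamSets code. The function names, the CSV layout, and the `upload` callable are all hypothetical placeholders (in practice the upload step could be something like boto3's `upload_file`, which is an assumption and is not used here).

```python
import csv
import os
import tempfile
from itertools import islice


def batches(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def write_single_file(records, path, batch_size=1000):
    """Append every batch to the same local file and return the batch count."""
    count = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for batch in batches(records, batch_size):
            # Per-record work (for example an HTTP call) would happen here.
            writer.writerows(batch)
            count += 1
    return count


def upload_whole_file(path, upload):
    """Pass the finished file to a caller-supplied upload function exactly once."""
    upload(path)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the 10,000 input records.
    records = [(i, f"value-{i}") for i in range(10000)]
    path = os.path.join(tempfile.mkdtemp(), "output.csv")
    n = write_single_file(records, path)
    upload_whole_file(path, lambda p: print("would upload", p))
    print(n, "batches written to one file")
```

The point of the split is that batching only affects how records are processed, not how many files are produced: the file is closed once after the last batch, and only then transferred.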

@Rishi 

Thanks for the reply. We decided to go with that approach.

