Solved

Producing single S3 output file

  • January 19, 2022
  • 4 replies
  • 109 views

Hi,

We are processing S3 files with a batch size of 1000, and we plan to store the output in S3 as well. Our input file has 10000 records, so we are seeing 10 output files in S3. Per the client's requirement, we need to create a single file.

Is there a way to create a single S3 output file from StreamSets?
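
To make the arithmetic concrete, here is a minimal Python sketch of the behaviour described above. It assumes the destination writes one object per batch, which is what the 10 files suggest; the function names are hypothetical and this is not StreamSets code.

```python
import math


def count_output_files(total_records: int, batch_size: int) -> int:
    """Number of objects written if every batch is flushed to its own object."""
    return math.ceil(total_records / batch_size)


def split_into_batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# 10000 records read with a batch size of 1000, as in the question.
records = [f"record-{i}" for i in range(10_000)]
batches = list(split_into_batches(records, 1_000))

print(len(batches))                        # 10
print(count_output_files(10_000, 1_000))   # 10
```

Under that assumption, the file count only drops to one if all batches end up in the same object.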



Rishi
StreamSets Employee
  • 96 replies
  • January 20, 2022

Hi @gowrinadh, assuming you have a single source file with 10000 records:

You can use the Whole File data format to transfer entire files from an origin system to a destination system (i.e., S3 to S3 in your case). With the Whole File data format you can transfer any type of file, and this will result in one output file, as you want.

Please configure the Whole File data format at both the origin and the destination.


  • Author
  • Fan
  • 2 replies
  • January 20, 2022

@Rishi When I use the Whole File data format, it does not let me work with the individual records. I need to read each record and make a call to the web service using the HTTP Client stage.


Rishi
StreamSets Employee
  • 96 replies
  • Answer
  • January 21, 2022
gowrinadh wrote:

@Rishi When I use the Whole File data format, it does not let me work with the individual records. I need to read each record and make a call to the web service using the HTTP Client stage.

Yes, with the Whole File data format you cannot modify individual records. I was assuming your pipeline was S3 → S3.

Another workaround I can think of:

  • First write to a local file (Local FS), which will result in a single file, then write that file to S3 using the Whole File data format.
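
The idea behind this workaround can be sketched in plain Python as below. This is only an illustration under stated assumptions, not StreamSets code: `append_batch`, `stage_then_upload`, and `fake_upload` are hypothetical names, and the "upload" is a local copy standing in for the S3 write.

```python
import os
import shutil
import tempfile


def append_batch(path, batch):
    """Append one batch of records to the single local staging file."""
    with open(path, "a", encoding="utf-8") as f:
        for record in batch:
            f.write(record + "\n")


def stage_then_upload(batches, staging_path, upload):
    """Write every batch to one local file, then hand that file to `upload` once."""
    # Start from an empty staging file so reruns do not accumulate old records.
    open(staging_path, "w", encoding="utf-8").close()
    for batch in batches:
        append_batch(staging_path, batch)
    return upload(staging_path)


def fake_upload(path, dest_dir):
    """Hypothetical stand-in for the S3 write: copies the whole file into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.copy(path, dest_dir)


work_dir = tempfile.mkdtemp()
staging = os.path.join(work_dir, "output.txt")
dest_dir = os.path.join(work_dir, "bucket")

# Ten batches of 1000 records each, as in the original question.
batches = [[f"record-{b * 1000 + i}" for i in range(1000)] for b in range(10)]
uploaded = stage_then_upload(batches, staging, lambda p: fake_upload(p, dest_dir))

print(os.listdir(dest_dir))  # ['output.txt']
```

Per-record work such as the web service call would happen before each batch is appended; only the finished file is moved as a whole.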

  • Author
  • Fan
  • 2 replies
  • February 2, 2022

@Rishi 

Thanks for the reply. We decided to go with that approach.

