We are processing S3 files with a batch size of 1000, and we plan to store the output in S3 as well. Our input file has 10,000 records, so we are seeing 10 output files in S3. Per the client's requirement, we need to create a single file.
Is there a way to create a single S3 output file from StreamSets?
Best answer by Rishi
gowrinadh wrote:
@Rishi When I use the Whole File data format, it does not let me work with the individual records. I need to read each record and call a web service using the HTTP Client stage.
Yes, with the Whole File data format you cannot modify individual records. I was assuming your pipeline was S3 → S3.
Another workaround I can think of:
First write to a local file (Local FS destination), which will result in a single file, then write that file to S3 using the Whole File data format.
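To illustrate the idea behind this workaround outside of StreamSets, here is a minimal Python sketch. It is not StreamSets configuration, only an analogy: every batch is appended to the same local file, and that file is then handed over once as a whole object. The function names and the `upload` callback are hypothetical; in practice the callback could wrap an S3 client call.

```python
import os
from typing import Callable, Iterable, List


def write_batches_to_single_file(batches: Iterable[List[str]], local_path: str) -> int:
    """Append every record from every batch to one local file.

    Returns the total number of records written.
    """
    count = 0
    # The file is opened once, so batch boundaries never create new files.
    with open(local_path, "w", encoding="utf-8", newline="\n") as out:
        for batch in batches:
            for record in batch:
                out.write(record + "\n")
                count += 1
    return count


def upload_whole_file(local_path: str, upload: Callable[[str, bytes], None]) -> None:
    """Pass the finished file to `upload` as a single object.

    `upload` is a hypothetical callback taking (object key, file contents).
    """
    with open(local_path, "rb") as f:
        upload(os.path.basename(local_path), f.read())
```

With 10 batches of 1000 records each, the first function produces one local file containing 10,000 lines, and the second uploads it in a single call, so only one object reaches the destination.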
Hi @gowrinadh, assuming you have a single source file with 10,000 records:
You can use the Whole File data format to transfer entire files from an origin system to a destination system (i.e. S3 to S3 in your case). With the Whole File data format you can transfer any type of file, and this will result in one file, as you wanted.
Please configure the Whole File data format at both the origin and the destination.
@Rishi When I use the Whole File data format, it does not let me work with the individual records. I need to read each record and call a web service using the HTTP Client stage.