Question

Data copy mismatch for same s3 bucket

  • 3 January 2022
  • 5 replies
  • 77 views

Hi,

I tried to copy all the files from one folder to another folder within the same S3 bucket using a StreamSets job, but I am seeing more files in the destination folder than in the source folder (for example, the source folder has 7 files, but the destination folder has 8, 10, or 12). The issue occurs only on the first run of the day. If I run the same job again that day, the record count matches between source and destination. Can anyone help me with this issue?

 

Thanks

Murali




Hi Murali, 

Please verify that every time you’re copying the files, the Data Format is set to Whole File mode. 

Thanks,

bob

Hi Bob,

Thanks for the reply. I will check with Whole File mode and will get back to you.

 

Thanks

Murali.

 

Hi Bob,

When I run the job with the Whole File data format, I get the error below.

Error happened when processing record    
com.streamsets.pipeline.stage.lib.hive.exceptions.HiveStageCheckedException: HIVE_19 - Unsupported Type: FILE_REF

 

Thanks

Murali.

 

Hi,

When I run the job with the Whole File data format, I get the error below.

Error happened when processing record    
com.streamsets.pipeline.stage.lib.hive.exceptions.HiveStageCheckedException: HIVE_19 - Unsupported Type: FILE_REF

 

Can anyone help me with this issue?

 

Thanks

Murali.


Hi Murali, 

From the error you pasted, it looks like the destination you’re using is Hive? From the initial problem description, it seemed you were copying from S3 to S3? Hive is record-based and does not support Whole File. To copy complete files without processing them in the pipeline, you might want to use HDFS as the destination.

If your organization has an Enterprise support contract, please open a support ticket.

Thanks,

bob  
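
To narrow down which files are extra on the first run of the day, it may help to compare the two folder listings outside of StreamSets. The following is only a rough sketch, not part of StreamSets: the function names and the "in/" and "out/" prefixes are made up for illustration, and it assumes file names are preserved by the copy (as they would be in Whole File mode). The list_keys helper assumes boto3 is installed and AWS credentials are configured.

```python
def relative_names(keys, prefix):
    """Strip the folder prefix from each key, skipping the folder placeholder itself."""
    return {k[len(prefix):] for k in keys if k.startswith(prefix) and k != prefix}


def compare_folders(source_keys, dest_keys, source_prefix, dest_prefix):
    """Return (names found only in the destination, names found only in the source)."""
    src = relative_names(source_keys, source_prefix)
    dst = relative_names(dest_keys, dest_prefix)
    return sorted(dst - src), sorted(src - dst)


def list_keys(bucket, prefix):
    """List every object key under a prefix (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the comparison helpers work without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


# Example with hard-coded listings (hypothetical file names):
source = ["in/", "in/a.csv", "in/b.csv"]
dest = ["out/a.csv", "out/b.csv", "out/c.csv"]
extra, missing = compare_folders(source, dest, "in/", "out/")
print("only in destination:", extra)
print("only in source:", missing)
```

Against a real bucket, the two listings would come from list_keys("my-bucket", "in/") and list_keys("my-bucket", "out/"), where the bucket name is a placeholder. If the extra names look nothing like the source names, the pipeline is probably writing record-based output files rather than copying whole files.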
