Question

SPOOLDIR_01 & SPOOLDIR_35 in Hadoop FS standalone

  • 11 July 2023
  • 9 replies
  • 82 views

Hi,

I’m using the Hadoop FS Standalone origin in my pipeline, with the read order set to last-modified timestamp.
The file directory is /user/msc/* and the file name pattern is *.

Under msc there are multiple folders, and the origin reads all the files present in them. Some functions are run on those files, and the files are then moved to other locations. The pipeline is working fine, but sometimes I get an error like SPOOLDIR_01 - failed to process file, even though the file has been read and processed.

I’m also getting an error like Running error: SPOOLDIR_35 - spool directory runner failed, reason: java.io.FileNotFoundException: file does not exist. After this, the pipeline restarts itself.

 

Please help me out if anyone knows the reason.

 

Thanks,
Madhusudan


9 replies


@msc “some functions are run on those files and the files are then moved to other locations” - is this outside of the StreamSets pipeline? The issue could be caused by files being moved by an external process while they were also queued by SDC for processing.
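To illustrate the kind of race this describes, here is a minimal sketch only: plain Python on a local temp directory with hypothetical file names, not SDC or HDFS APIs. A reader that lists a directory first and opens the entries later can fail with a FileNotFoundException-style error if another process moves a file in between, which is analogous to what SPOOLDIR_35 reports:

```python
import os
import shutil
import tempfile
import threading
import time

# Hypothetical stand-ins: spool_dir plays the role of a folder under /user/msc,
# and the mover thread plays the role of the external shell script.
spool_dir = tempfile.mkdtemp()
archive_dir = tempfile.mkdtemp()

# Create a file that the "reader" will queue for processing.
path = os.path.join(spool_dir, "data-001.txt")
with open(path, "w") as f:
    f.write("some records\n")

def external_mover():
    # Moves the file while the reader still holds it in its queue.
    time.sleep(0.1)
    shutil.move(path, os.path.join(archive_dir, "data-001.txt"))

threading.Thread(target=external_mover).start()

# Reader: list the directory first (the "queue"), then open each entry later.
queued = [os.path.join(spool_dir, name) for name in os.listdir(spool_dir)]
time.sleep(0.5)  # the mover wins the race during this gap

for entry in queued:
    try:
        with open(entry) as f:
            print(f.read())
    except FileNotFoundError as e:
        # Analogous to SPOOLDIR_35: the queued file no longer exists.
        print("file vanished before it could be read:", e)
```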

@Sanjeev In the pipeline itself, I have used a shell block that calls some scripts. Those scripts move the files out of that folder.


@msc are you using the shell script to archive the files after processing? I’m asking because you can configure that from within the pipeline on the ‘Post Processing’ tab, and it will be a much cleaner solution.

@Sanjeev Yes, I’m using shell scripts in the shell block. Those shell scripts read the files and perform some operations, such as moving them.


@msc if you are using the shell action to move/archive the files after the Hadoop FS origin reads them, then it’s better to use the archive options available on the ‘Post Processing’ tab for the Hadoop FS origin. That eliminates the possibility of moving a file before it is completely processed by the origin, which is quite possibly the issue you are running into.

@Sanjeev Can you give more info on Post Processing, like what the use case is?


@msc please refer to step #4 in our docs. Also, perhaps sharing your pipeline will help me understand what you are trying to accomplish.

@Sanjeev If I use post processing, the files will be moved or deleted, but I don’t need that to happen.
The Hadoop FS origin searches for the files, and I use the filename and path from the origin as input to the shell script; the shell script is what moves the files.


@msc I was suggesting the ‘Post Processing’ option only because you were using the shell script for that. If you don’t want to move/archive the files after processing, then that’s the default behavior. Again, it’s difficult to advise further without more details on the use case.
