
I have created a pipeline with a parameter ‘date’ that selects data from Hive for that particular date.

Is there a way, if this parameter isn't populated, to select max(date) from the Hive table instead?

 

The idea is that each time this is scheduled it will pick up the latest data in the table, and the date parameter can be set if a certain date needs to be rerun for whatever reason.

 

Thanks in advance

@JB1 

 

Has the query below been tested in the JDBC Query Consumer? If not, could you please try it and check whether it helps?

SELECT * FROM my_table WHERE date_column = (SELECT MAX(date_column) FROM my_table);
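
To cover the optional parameter from the question, the max-date lookup can be folded into the same query as a fallback. The following is only a minimal sketch of that idea: it uses Python's built-in sqlite3 as a stand-in for Hive, my_table and date_column are the placeholder names from the query above, and run_date is a hypothetical name for the pipeline's date parameter. Whether your Hive version accepts a scalar subquery in the WHERE clause is an assumption you would need to check.

```python
import sqlite3

# Hypothetical query: use the supplied date when present,
# otherwise fall back to MAX(date_column).
QUERY = """
SELECT id, date_column
FROM my_table
WHERE date_column = COALESCE(
    NULLIF(:run_date, ''),                   -- empty string counts as "not set"
    (SELECT MAX(date_column) FROM my_table)  -- fallback: latest date in the table
)
ORDER BY id
"""


def select_for_date(conn, run_date=None):
    """Return rows for run_date, or for the latest date if run_date is None or ''."""
    return conn.execute(QUERY, {"run_date": run_date}).fetchall()


def demo_connection():
    """Build a small in-memory table (made-up sample rows) to try the query on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER, date_column TEXT)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?)",
        [(1, "2024-01-01"), (2, "2024-01-02"), (3, "2024-01-02")],
    )
    return conn
```

Calling select_for_date(conn) or select_for_date(conn, "") returns the rows for the latest date, while select_for_date(conn, "2024-01-01") returns only that day's rows. In the pipeline, the :run_date placeholder would be replaced by however your origin references the ‘date’ parameter.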

Also try the steps below.

 

 

Step 1: Retrieve all data from the Hive database.

Step 2: Use the Expression Evaluator to call time:now() and get the current time, converting it into the appropriate date-time format. Store this value in a variable, for instance, named "last_changed_datetime."

Step 3: Use the Stream Selector processor to check whether the DB date is null or matches the calculated timestamp. For instance, if DB_DateTime is null, or if DB_DateTime equals last_changed_datetime, then proceed with the latest data.
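
The condition in Steps 2 and 3 can be sketched in plain Python to make the routing logic explicit. This is only an illustration under assumptions, not pipeline configuration: records are modelled as dictionaries, DB_DateTime is the field name used above, and date.today() stands in for the value computed from the current time.

```python
from datetime import date


def keep_record(db_date, last_changed_date):
    """Step 3 condition: pass when the DB date is null or equals the computed date."""
    return db_date is None or db_date == last_changed_date


def select_latest(records, last_changed_date=None):
    """Filter records the way the Stream Selector condition would.

    last_changed_date defaults to today, as a stand-in for the
    current-time value from Step 2.
    """
    if last_changed_date is None:
        last_changed_date = date.today()
    return [
        r for r in records
        if keep_record(r.get("DB_DateTime"), last_changed_date)
    ]
```

Note that this compares against the current date rather than max(date), so it only picks up the latest data if the table is loaded on the day the pipeline runs.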

By following these steps, you can efficiently manage data retrieval and selection based on the specified criteria.

If you are still having issues, please let me know and share a sample pipeline so I can take a look at it.

 

Thanks & Regards

Bikram_

