Skip to main content

Let us say we have a dataset like below. Requirement is to use JDBC Query Consumer Origin to load id, title, ratings based on whatever is the latest from created_date or updated_date. It means we want to do incremental load but not on primary key ‘id’.

 

popular_movies

Whenever the pipeline runs it should pull in the updated ratings with other fields. In order to incrementally load data based on our requirement we have written query like below.

SQL Query

Query is failing by saying Invalid column name ‘offset_date’ though it is valid per SQL when we do validation. 

Solution: We can derive the offset column by writing it in inner query and projecting the needed columns from the inner query’s alias.

select * from
(SELECT id, title, ratings, COALESCE(created_date, updated_date) as offset_date
FROM popular_movies ) as pm
WHERE pm.offset_date > ${OFFSET} ORDER BY pm.offset_date

This kind of solution is also helpful in cases where we do not have a primary key and need to derive a key using multiple column combinations(composite key) for incremental loads.

Note: Invalid column name error can also occur if incorrect or non-existent column is referred in the query.


Reply