I’m working in Transformer, and I need to stack two tables one below the other, like a UNION.
A UNION only works when both tables have the same column names, but in my scenario only two columns match; the remaining columns do not.
Example
table-1
table-2
Output
I want the output like this.
Thanks
Tamilarasu
@tamilarasup You can use the `Spark SQL Expression` stage to write a SQL query that achieves the expected output.
My input:
Query (you can try something like the below):
Hope this helps.
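To illustrate the idea behind such a query (the actual query from the screenshot is not reproduced here): you union tables with mismatched columns by padding each side with NULLs for the columns it lacks. Below is a minimal pure-Python sketch of that logic; the table and column names are hypothetical examples, not from the original post.

```python
# Sketch: union two row sets with different columns by padding
# missing columns with None. The SQL equivalent is along the lines of:
#   SELECT id, name, dept, NULL AS salary FROM table1
#   UNION ALL
#   SELECT id, name, NULL AS dept, salary FROM table2
# (column names here are hypothetical)

def union_by_name(rows_a, rows_b):
    """Stack two lists of dict-rows, filling absent columns with None."""
    # Collect the combined column list from the first row of each side.
    all_cols = []
    for row in rows_a[:1] + rows_b[:1]:
        for col in row:
            if col not in all_cols:
                all_cols.append(col)
    # Emit every row with the full column set, None where a column is missing.
    return [{col: row.get(col) for col in all_cols} for row in rows_a + rows_b]

table1 = [{"id": 1, "name": "a", "dept": "x"}]
table2 = [{"id": 2, "name": "b", "salary": 100}]

for row in union_by_name(table1, table2):
    print(row)
```

The same NULL-padding approach is what the suggested SQL query would do inside the stage.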
Hi @Pradeep, thank you for your response.
The Spark SQL Expression stage accepts only one input lane; if we try two, it just generates an error message.
The Spark SQL stage is the same: it also accepts only one input lane.
I also tried another way: I created the non-matching columns in both tables, then used the Union processor.
Lane 1 received 5 input records and lane 2 received 5 records, but the output goes out separately: lane 2 starts only after lane 1's records have been written to files.
The pic below shows that only the first lane's records reach the output of the next stage.
The file destination stores the output as two separate files.
Thanks Tamilarasu
@tamilarasup You might want to try the Join stage to do a full outer join, and then see whether the Spark SQL Expression or Filter stage helps select the output columns as needed.
The output will likely look like the below after a full outer join.
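To make the full-outer-join suggestion concrete, here is a minimal pure-Python sketch of what a full outer join on a shared key produces; the key and column names are hypothetical, not from the original tables.

```python
# Sketch: full outer join of two row sets on a shared key column.
# Rows present on only one side get None for the other side's columns,
# which is the same shape a Join stage's full outer join would emit.
# Column names below are hypothetical examples.

def full_outer_join(rows_a, rows_b, key):
    index_b = {row[key]: row for row in rows_b}
    cols_a = [c for c in (rows_a[0] if rows_a else {}) if c != key]
    cols_b = [c for c in (rows_b[0] if rows_b else {}) if c != key]
    out, seen = [], set()
    # Left side: merge in the matching right-side row, or None-fill.
    for row in rows_a:
        match = index_b.get(row[key], {})
        seen.add(row[key])
        merged = dict(row)
        for c in cols_b:
            merged[c] = match.get(c)
        out.append(merged)
    # Right-only rows: None-fill the left side's columns.
    for row in rows_b:
        if row[key] not in seen:
            merged = {key: row[key]}
            for c in cols_a:
                merged[c] = None
            merged.update(row)
            out.append(merged)
    return out

left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 2, "salary": 100}, {"id": 3, "salary": 200}]
for row in full_outer_join(left, right, "id"):
    print(row)
```

After the join, a downstream Spark SQL Expression or Filter stage would keep only the columns you need from this merged shape.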
Hi @Pradeep,
I need a horizontal type of merge, like this:
not a vertical type.
If we use a full join, I need to concatenate more columns and drop the duplicate columns. So is there any other solution to solve this?
Thanks
Tamilarasu
Hi @tamilarasup, the query I shared in my first response is an idea of how you could achieve the expected output. It was written based on your expected output, quoted below:
If we use a full join, I need to concatenate more columns and drop the duplicate columns. So is there any other solution to solve this?
This is most likely not true for Spark. If you have Ludicrous mode disabled for the pipeline, the underlying Spark engine owns the plan optimization, and it will probably fetch only the columns you filter to after the full outer join.