Question

MERGE

  • 24 April 2023
  • 7 replies
  • 105 views

Hi 

I have 2 tables 

table one have 3 columns- ID, NAME,LOCATION.

table two have 3 columns-ID,LOCATION, EMPLOYMENT TYPE

I need to merge the two data table as one after another not a join  and expect output as combine both table in single table like

ID, NAME,LOCATION,EMPLOYMENT TYPE

 


7 replies

Userlevel 3
Badge +1

@tamilarasup You can use `Spark SQL Expression` stage to write SQL query to achieve the expected.

My input:

Query(you can try something like below):

Hope this helps.

Userlevel 4
Badge

Hello @tamilarasup - Thank you for reaching out to the StreamSets community. 

Could you please try using the Field Remover stage in between the pipeline and keep the fields which are necessary?

Thank you - Akshay Jadhav

Hi AkshayJadhav Thank you for your reply

 

now I’m working in a transformer, I need to join two tables one below another table like a union.

The union will work only both tables have same column name. But in my scenario I have only matching two column remaining non-matching columns.

 

Example

table -1

table-2

Output

i want the output like this.

 

Thanks

Tamilarasu

@tamilarasup You can use `Spark SQL Expression` stage to write SQL query to achieve the expected.

My input:

Query(you can try something like below):

Hope this helps.

Hi, @Pradeep Thank you for your response,

The Spark Sql expression is only accept 1 input lane, if we tried two means, it generate an error message only.

Same as Spark sql also, it accepts  1 input lane

 

 

And I Tried another way, i Created the columns in both tables which are not matched.

After that i use union processor

 

 lane one received input 5 records and Lane 2 received  5 records but the  output will go

Separately after 1 plane is received in files, then plane 2 will start.

 

the below pic shows only first lane records go to the output of next stage

The output of the file will store two files separately.

 

Thanks Tamilarasu

Userlevel 3
Badge +1

@tamilarasup You might want to try “Join” stage for doing full outer join and then try see if either Spark SQL Expression or Filter helps to select the columns output as needed.

Output likely will look like as below after full outer join.

 

hi @Pradeep 

I need a horizontal type or merge like

not a vertical Type. 

we use full join means I need to concat more columns and need to drop duplicate columns.So, Is there any other solution for solve this.

 

Thanks 

Tamilarasu

Userlevel 3
Badge +1

Hi @tamilarasup, Query have shared as my first response is an idea how you could achieve the expected output. It is given based on your expected output as below?

 

 

  1. we use full join means I need to concat more columns and need to drop duplicate columns.So, Is there any other solution for solve this.
  2. This is most likely be not true for Spark. If you have disabled Ludicrous mode for the pipeline, underlying Spark engine owns the plan optimization which probably will only fetch the columns filtered during full outer join.

Reply