I’m working in Transformer, and I need to stack two tables one below the other, like a UNION.
A UNION only works when both tables have the same column names, but in my scenario only two columns match; the remaining columns do not.
Example
table-1
table-2
Output
I want the output like this.
Thanks
Tamilarasu
@tamilarasup You can use the `Spark SQL Expression` stage to write a SQL query that achieves the expected output.
My input:
Query (you can try something like the below):
Hope this helps.
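To illustrate the idea behind such a query (the actual query from the screenshot is not reproduced here): you union tables with mismatched columns by padding each side with NULLs for the columns it lacks. Below is a minimal pure-Python sketch of that logic; the table and column names are hypothetical examples, not from the original post.

```python
# Sketch: union two row sets with different columns by padding
# missing columns with None. The SQL equivalent is along the lines of:
#   SELECT id, name, dept, NULL AS salary FROM table1
#   UNION ALL
#   SELECT id, name, NULL AS dept, salary FROM table2
# (column names here are hypothetical)

def union_by_name(rows_a, rows_b):
    """Stack two lists of dict-rows, filling absent columns with None."""
    # Collect the combined column list from the first row of each side.
    all_cols = []
    for row in rows_a[:1] + rows_b[:1]:
        for col in row:
            if col not in all_cols:
                all_cols.append(col)
    # Emit every row with the full column set, None where a column is missing.
    return [{col: row.get(col) for col in all_cols} for row in rows_a + rows_b]

table1 = [{"id": 1, "name": "a", "dept": "x"}]
table2 = [{"id": 2, "name": "b", "salary": 100}]

for row in union_by_name(table1, table2):
    print(row)
```

The same NULL-padding approach is what the suggested SQL query would do inside the stage.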
Hi @Pradeep, thank you for your response.
The Spark SQL Expression stage accepts only one input lane; if we try two, it just generates an error message.
The Spark SQL stage is the same: it also accepts only one input lane.
I also tried another way: I created the non-matching columns in both tables, then used the Union processor.
Lane 1 received 5 input records and lane 2 received 5 records, but the output goes out separately: lane 2 starts only after lane 1's records have been written to files.
The pic below shows that only the first lane's records reach the output of the next stage.
The file destination stores the output as two separate files.
Thanks Tamilarasu
@tamilarasup You might want to try the Join stage to do a full outer join, and then see whether the Spark SQL Expression or Filter stage helps select the output columns as needed.
The output will likely look like the below after a full outer join.
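To make the full-outer-join suggestion concrete, here is a minimal pure-Python sketch of what a full outer join on a shared key produces; the key and column names are hypothetical, not from the original tables.

```python
# Sketch: full outer join of two row sets on a shared key column.
# Rows present on only one side get None for the other side's columns,
# which is the same shape a Join stage's full outer join would emit.
# Column names below are hypothetical examples.

def full_outer_join(rows_a, rows_b, key):
    index_b = {row[key]: row for row in rows_b}
    cols_a = [c for c in (rows_a[0] if rows_a else {}) if c != key]
    cols_b = [c for c in (rows_b[0] if rows_b else {}) if c != key]
    out, seen = [], set()
    # Left side: merge in the matching right-side row, or None-fill.
    for row in rows_a:
        match = index_b.get(row[key], {})
        seen.add(row[key])
        merged = dict(row)
        for c in cols_b:
            merged[c] = match.get(c)
        out.append(merged)
    # Right-only rows: None-fill the left side's columns.
    for row in rows_b:
        if row[key] not in seen:
            merged = {key: row[key]}
            for c in cols_a:
                merged[c] = None
            merged.update(row)
            out.append(merged)
    return out

left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right = [{"id": 2, "salary": 100}, {"id": 3, "salary": 200}]
for row in full_outer_join(left, right, "id"):
    print(row)
```

After the join, a downstream Spark SQL Expression or Filter stage would keep only the columns you need from this merged shape.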
Hi @Pradeep,
I need a horizontal type of merge, like this:
not a vertical type.
If we use a full join, I need to concatenate more columns and drop the duplicate columns. So is there any other solution to solve this?
Thanks
Tamilarasu
Hi @tamilarasup, the query I shared in my first response is an idea of how you could achieve the expected output. It was written based on your expected output, quoted below:
If we use a full join, I need to concatenate more columns and drop the duplicate columns. So is there any other solution to solve this?
This is most likely not true for Spark. If you have Ludicrous mode disabled for the pipeline, the underlying Spark engine owns the plan optimization, and it will probably fetch only the columns you filter to after the full outer join.