Question

Where can I find Transpose processor documentation?


Hello,

I am looking for documentation for the transpose processor in transformer pipelines.

I searched here : https://docs.streamsets.com/portal/platform-transformer/latest/search.html?searchQuery=transpose but did not find it.

Can someone share the link to the relevant documentation?

-Dhanashri

 


10 replies

Userlevel 5
Badge +1

@Dhanashri_Bhate 

There are no document for parser in the Streamsets page.

Just for your information Transpose is used to convert data from Row to column or column to row.

Please provide me your use cases with input details and expected output your expecting for , then i can try to help you on it.

 

Please refer to the below video how to transpose the data . It will help you to the similar configuration in the Streamsets pipeline.

 

https://www.youtube.com/watch?v=GOVIwGDIuxc

 

Thanks & Regards

Bikram_

Thank you Bikram.

The example in the video you shared is similar to the pivot example in the Transformer for Snowflake. I cannot use Transformer for Snowflake though, as I need to use a different spark cluster for this requirement.

My data is like this: (Population field has a numeric values indicating the population at the end of that decade)

COUTRY    DECADE        POPULATION
IND        1991-2000        n1 
IND        2001-2010        n2
IND        2011-2020        n3
USA        1991-2000        p1
USA        2001-2010        p2
USA        2011-2020        p3

.

.

.

I want to transpose this as shown below to find out the countries for which the population has been steadily decreasing over the last 3 decades.
COUNTRY    1991-2000    2001-2010    2011-2020
IND        n1        n2        n3
USA        p1        p2        p3

Unfortunately I was unable to figure out how to set up the parameter values in the StreamSets  transpose stage (in the screenshot shared earlier) to get desired result.

Figured out how to set the configuration and have got transpose data now. Closing the thread. 

Userlevel 5
Badge +1

@Dhanashri_Bhate 

please provide the details for the future reference.

 

Group By fields : Country

Transpose Field: Decade

Distinct values for the transposed field: "1991-2000","2001-2010","2011-2020" 

Aggregate Field: Population

Aggregate Function : SUM ( I don't really need aggregate function here, because I know there is only one value for each decade.But had to choose it, as it is mandatory. ) 

@Dhanashri_Bhate

please provide the details for the future reference.

 

Sure, I was just testing it out a bit. But here it is:

Group By fields : Country

Transpose Field: Decade

Distinct values for the transposed field: "1991-2000","2001-2010","2011-2020" 

Aggregate Field: Population

Aggregate Function : SUM ( I don't really need aggregate function here, because I know there is only one value for each decade.But had to choose it, as it is mandatory. ) 

Userlevel 5
Badge +1

@Dhanashri_Bhate 

I think I was working on wrong input data format ,so the result was not coming as expected.

 if you don’t mind kindly share the pipeline to check where i was doing wrong and so I can correct it in my pipeline.

 

Thanks & Regards

Biikram_

@Dhanashri_Bhate

I think I was working on wrong input data format ,so the result was not coming as expected.

 if you don’t mind kindly share the pipeline to check where i was doing wrong and so I can correct it in my pipeline.

 

Thanks & Regards

Biikram_

 Hello,

For some reason I am unable to attach the exported pipeline, will try to upload it in a separate comment.

Meanwhile, you can use the following details - 

SQLs I use to create test table: 

create or replace TABLE TRAINING.PUBLIC.POPULATION (
    COUNTRY VARCHAR(16777216),
    DECADE VARCHAR(16777216),
    POPULATION NUMBER(38,0)
);

insert into population values ( 'IND','1991-2000', 40000);
insert into population values ( 'IND','2001-2010', 65365);
insert into population values ( 'IND','2011-2020', 73454);
insert into population values ( 'USA','1991-2000', 43724);
insert into population values ( 'USA','2001-2010', 65467);
insert into population values ( 'USA','2011-2020', 54324);

 

Transpose Config:

Output table: 

 

 

Here’s the exported pipeline.

Userlevel 5
Badge +1

@Dhanashri_Bhate 

Thanks for the details.

Reply