The example in the video you shared is similar to the pivot example in the Transformer for Snowflake. I cannot use Transformer for Snowflake though, as I need to use a different spark cluster for this requirement.
My data is like this: (Population field has a numeric values indicating the population at the end of that decade)
COUTRY DECADE POPULATION IND 1991-2000 n1 IND 2001-2010 n2 IND 2011-2020 n3 USA 1991-2000 p1 USA 2001-2010 p2 USA 2011-2020 p3
.
.
.
I want to transpose this as shown below to find out the countries for which the population has been steadily decreasing over the last 3 decades. COUNTRY 1991-2000 2001-2010 2011-2020 IND n1 n2 n3 USA p1 p2 p3
Unfortunately I was unable to figure out how to set up the parameter values in the StreamSets transpose stage (in the screenshot shared earlier) to get desired result.
Figured out how to set the configuration and have got transpose data now. Closing the thread.
@Dhanashri_Bhate
please provide the details for the future reference.
Group By fields : Country
Transpose Field: Decade
Distinct values for the transposed field: "1991-2000","2001-2010","2011-2020"
Aggregate Field: Population
Aggregate Function : SUM ( I don't really need aggregate function here, because I know there is only one value for each decade.But had to choose it, as it is mandatory. )
@Dhanashri_Bhate
please provide the details for the future reference.
Sure, I was just testing it out a bit. But here it is:
Group By fields : Country
Transpose Field: Decade
Distinct values for the transposed field: "1991-2000","2001-2010","2011-2020"
Aggregate Field: Population
Aggregate Function : SUM ( I don't really need aggregate function here, because I know there is only one value for each decade.But had to choose it, as it is mandatory. )
@Dhanashri_Bhate
I think I was working on wrong input data format ,so the result was not coming as expected.
if you don’t mind kindly share the pipeline to check where i was doing wrong and so I can correct it in my pipeline.
Thanks & Regards
Biikram_
@Dhanashri_Bhate
I think I was working on wrong input data format ,so the result was not coming as expected.
if you don’t mind kindly share the pipeline to check where i was doing wrong and so I can correct it in my pipeline.
Thanks & Regards
Biikram_
Hello,
For some reason I am unable to attach the exported pipeline, will try to upload it in a separate comment.
Meanwhile, you can use the following details -
SQLs I use to create test table:
create or replace TABLE TRAINING.PUBLIC.POPULATION ( COUNTRY VARCHAR(16777216), DECADE VARCHAR(16777216), POPULATION NUMBER(38,0) );
insert into population values ( 'IND','1991-2000', 40000); insert into population values ( 'IND','2001-2010', 65365); insert into population values ( 'IND','2011-2020', 73454); insert into population values ( 'USA','1991-2000', 43724); insert into population values ( 'USA','2001-2010', 65467); insert into population values ( 'USA','2011-2020', 54324);