
In this pipeline we solve a fun data problem with the help of StreamSets Transformer. The Transformer execution engine runs data pipelines on Apache Spark, so this pipeline can run on any type of Spark cluster.

 

Problem Statement: Find the average number of friends for each age and sort the results in ascending order.

We are given a fake friends dataset from a social networking platform. It is a CSV file stored in Google Cloud Storage, and a sample looks like this:

id, Name, Age, Number of Friends
0,Will,33,385
1,Jean-Luc,26,2
2,Hugh,55,221
3,Deanna,40,465
4,Quark,68,21
5,Weyoun,59,318
6,Gowron,37,220
7,Will,54,307
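
To make the expected result concrete, here is a minimal sketch of the same aggregation in plain Python, run against the sample rows above. It is not the Transformer pipeline itself, just an illustration of the logic (group by age, average the friend counts, sort). The function name `average_friends_by_age` is made up for this example, and sorting by age is an assumption, since the problem statement does not say which column to sort on.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the post; the real file lives in Google Cloud Storage.
SAMPLE = """id, Name, Age, Number of Friends
0,Will,33,385
1,Jean-Luc,26,2
2,Hugh,55,221
3,Deanna,40,465
4,Quark,68,21
5,Weyoun,59,318
6,Gowron,37,220
7,Will,54,307
"""


def average_friends_by_age(csv_text):
    """Return (age, average friends) pairs sorted by age, ascending.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(lambda: [0, 0])  # age -> [sum of friends, row count]
    # skipinitialspace strips the blank after each comma in the header row.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        entry = totals[int(row["Age"])]
        entry[0] += int(row["Number of Friends"])
        entry[1] += 1
    # Assumption: "ascending order" means ascending by age.
    return sorted((age, total / count) for age, (total, count) in totals.items())


result = average_friends_by_age(SAMPLE)
for age, avg in result:
    print(age, avg)
```

In the sample every age appears only once, so each average equals that person's friend count. With the full dataset, ages repeat and the averages become meaningful.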

 

 

 
