
Solving a Fun Data Problem with StreamSets Transformer

  • February 16, 2022

Rishi
StreamSets Employee

In this pipeline, we solve a fun data problem with the help of StreamSets Transformer. The Transformer execution engine runs data pipelines on Apache Spark, so this pipeline can run on any Spark cluster type that Transformer supports.

 

Problem Statement: Find the average number of friends for each age, and sort the results in ascending order.

We are given a fake dataset of friends on a social networking platform, stored as a CSV file in Google Cloud Storage. The first rows look like this:

id, Name, Age, Number of Friends
0,Will,33,385
1,Jean-Luc,26,2
2,Hugh,55,221
3,Deanna,40,465
4,Quark,68,21
5,Weyoun,59,318
6,Gowron,37,220
7,Will,54,307
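To make the expected result concrete, here is a minimal plain-Python sketch of the same computation: group rows by age, average the friend counts, and sort. This is not the Transformer pipeline itself. It rests on a few assumptions. The sort key is taken to be age, since the problem statement does not say which column to sort on. The function name `average_friends_by_age` is hypothetical. The input is only the sample rows shown above, not the full file in Google Cloud Storage.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the dataset above. The real pipeline reads the full
# CSV file from Google Cloud Storage; this sketch only uses these rows.
SAMPLE_CSV = """id, Name, Age, Number of Friends
0,Will,33,385
1,Jean-Luc,26,2
2,Hugh,55,221
3,Deanna,40,465
4,Quark,68,21
5,Weyoun,59,318
6,Gowron,37,220
7,Will,54,307
"""


def average_friends_by_age(csv_text):
    """Hypothetical helper: return (age, average_friends) pairs.

    Assumption: results are sorted by age in ascending order.
    """
    totals = defaultdict(int)  # age -> sum of friend counts
    counts = defaultdict(int)  # age -> number of people with that age
    # skipinitialspace handles the "id, Name, ..." header with spaces after commas
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        age = int(row[2])
        friends = int(row[3])
        totals[age] += friends
        counts[age] += 1
    # Compute the average per age, then sort by age (the group key) ascending
    return sorted((age, totals[age] / counts[age]) for age in totals)


if __name__ == "__main__":
    for age, avg in average_friends_by_age(SAMPLE_CSV):
        print(f"{age}\t{avg:.2f}")
```

Every age in these sample rows appears only once, so each average equals that person's friend count. The averaging step only matters once several rows share the same age.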