Use Case: Receive device data from machines around the world, sent by Python scripts, Go programs, MQTT clients, Kafka, or files. All of the data lands on MQTT topics, and a StreamSets pipeline reads it from those topics.
Once device data is read from the MQTT topics, the pipeline adds location details for each machine (country, city, latitude, longitude) using a Geo IP lookup.
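As a rough illustration of what the Geo IP step does to each record, here is a minimal Python sketch. It is not the StreamSets processor itself: the lookup table, the function names (fake_lookup, enrich_with_geo), and the output field names are all hypothetical, and the location values are copied from the sample row in this document rather than taken from a real Geo IP database.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a Geo IP database. The values are copied from the
# sample row in this document, not resolved from a real database.
_FAKE_GEO_DB: Dict[str, Dict[str, object]] = {
    "111.161.225.1": {
        "country": "United States",
        "latitude": 38.0,
        "longitude": -97.0,
    },
}


def fake_lookup(ip: str) -> Optional[Dict[str, object]]:
    """Return location fields for an IP, or None if the IP is unknown."""
    return _FAKE_GEO_DB.get(ip)


def enrich_with_geo(
    record: Dict[str, object],
    lookup: Callable[[str], Optional[Dict[str, object]]],
) -> Dict[str, object]:
    """Return a copy of the record with location fields added when the IP resolves."""
    enriched = dict(record)  # leave the incoming record untouched
    location = lookup(str(record.get("ip", "")))
    if location is not None:
        enriched.update(location)
    return enriched


if __name__ == "__main__":
    print(enrich_with_geo({"device_id": 1, "ip": "111.161.225.1"}, fake_lookup))
```

Passing the lookup in as a function keeps the enrichment logic independent of whichever Geo IP source is actually used; records whose IP does not resolve pass through unchanged.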
Raw data is stored in S3.
Processed data is stored in Databricks Delta Lake and Elasticsearch for analysis.
Sample analysis:
Count all devices for a particular country and map them.
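The counting part of this analysis can be sketched in plain Python. This is only an illustration of the logic, not the query run against Delta Lake or Elasticsearch. It assumes each record carries the device_id and cca3 fields from the sample data, and that a device reporting several times should be counted once; the records in the example are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def count_devices_by_country(records: Iterable[Dict[str, object]]) -> Counter:
    """Count distinct device_ids per three-letter country code (cca3)."""
    seen = set()
    counts: Counter = Counter()
    for rec in records:
        key = (rec["cca3"], rec["device_id"])
        if key not in seen:  # count each device once per country
            seen.add(key)
            counts[rec["cca3"]] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        {"device_id": 1, "cca3": "USA"},
        {"device_id": 1, "cca3": "USA"},  # second reading from the same device
        {"device_id": 2, "cca3": "USA"},
        {"device_id": 3, "cca3": "IND"},
    ]
    print(count_devices_by_country(records))
```

The resulting per-country counts are the values a map visualization would plot against each country code.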
Sample Data:
battery_level,c02_level,cca2,cca3,cn,device_id,device_name,humidity,ip,latitude,lcd,longitude,scale,temp,timestamp
1,234,US,USA,United States,1,meter-gauge-abcdef,10,111.161.225.1,38,green,-97,Celsius,14,1458444054093
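For anyone working with this sample outside the pipeline, the following Python sketch parses the CSV into typed records using only the standard library. The field types are inferred from the single sample row, and the timestamp is assumed to be epoch milliseconds in UTC; the function name parse_devices is hypothetical.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Dict, List

SAMPLE = (
    "battery_level,c02_level,cca2,cca3,cn,device_id,device_name,humidity,"
    "ip,latitude,lcd,longitude,scale,temp,timestamp\n"
    "1,234,US,USA,United States,1,meter-gauge-abcdef,10,111.161.225.1,38,"
    "green,-97,Celsius,14,1458444054093\n"
)

# Types inferred from the sample row; adjust if the real data differs.
INT_FIELDS = ("battery_level", "c02_level", "device_id", "humidity", "temp")
FLOAT_FIELDS = ("latitude", "longitude")


def parse_devices(text: str) -> List[Dict[str, object]]:
    """Parse device CSV text into a list of dicts with typed values."""
    rows: List[Dict[str, object]] = []
    for row in csv.DictReader(io.StringIO(text)):
        rec: Dict[str, object] = dict(row)
        for name in INT_FIELDS:
            rec[name] = int(row[name])
        for name in FLOAT_FIELDS:
            rec[name] = float(row[name])
        # Assumption: timestamp is milliseconds since the Unix epoch, in UTC.
        rec["timestamp"] = datetime.fromtimestamp(
            int(row["timestamp"]) / 1000, tz=timezone.utc
        )
        rows.append(rec)
    return rows


if __name__ == "__main__":
    for record in parse_devices(SAMPLE):
        print(record["device_name"], record["cn"], record["timestamp"].isoformat())
```

Under the epoch-milliseconds assumption, the sample timestamp falls on 20 March 2016.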
Complete Pipeline: