We built this pipeline to compare the profile images received from Facebook and Twitter and determine whether two different profiles belong to the same customer.
We have a business case where we pull social media comments from different users on Facebook and Twitter. Apart from carrying out sentiment analysis, we also need to know whether the same customer is being vocal on multiple platforms. Since customers normally don’t share their PII, we store the profile images that we pull along with the comments and use image matching as one of the criteria for the probability score that two comments received from different platforms belong to the same person.
Before the above pipeline runs, the data has already been pre-processed: name, city, age, and sentiment have been matched to produce a probability score, and the best set of candidate pairs to compare is available in a database.
In the above pipeline:
- The origin is a MySQL JDBC origin. It holds the metadata of the data already pulled from Facebook and Twitter, such as Facebook ID, Facebook image ID, Twitter ID, and Twitter image ID.
- In the Expression Evaluator, we build the S3 URL paths for the Facebook and Twitter image files that we have stored in an S3 bucket.
- From the HTTP Client processor, we call the API service for Amazon Rekognition, which takes the S3 paths of the Facebook and Twitter images and returns the matching result (0 or 1).
- Based on the matching result (0 or 1), the Stream Selector drops the non-matching IDs, so only the matched IDs are stored in the analytics DB for the next set of processing.
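
The stages listed above can be sketched outside the pipeline as a few small Python functions. This is only an illustration of the flow, not the pipeline's actual configuration: the field names, the bucket name `example-profile-images`, the `facebook/` and `twitter/` key prefixes, and the `.jpg` extension are all hypothetical, and `compare` stands in for the call to the image-matching API service. It also assumes that a result of 1 means the two images match.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ProfilePair:
    """One candidate pair read from the metadata table (field names are hypothetical)."""
    facebook_id: str
    facebook_image_id: str
    twitter_id: str
    twitter_image_id: str


def image_paths(pair: ProfilePair, bucket: str = "example-profile-images") -> Tuple[str, str]:
    """Expression Evaluator step: derive the S3 path of each stored image.

    The bucket name and key layout are assumptions made for this sketch.
    """
    fb_path = f"s3://{bucket}/facebook/{pair.facebook_image_id}.jpg"
    tw_path = f"s3://{bucket}/twitter/{pair.twitter_image_id}.jpg"
    return fb_path, tw_path


def matched_pairs(
    pairs: Iterable[ProfilePair],
    compare: Callable[[str, str], int],
    bucket: str = "example-profile-images",
) -> Iterator[ProfilePair]:
    """HTTP Client and Stream Selector steps combined.

    `compare` is a placeholder for the API call; it receives the two S3 paths
    and returns 1 for a match or 0 otherwise (assumed meaning of the result).
    Only matching pairs are yielded, ready to be written to the analytics DB.
    """
    for pair in pairs:
        fb_path, tw_path = image_paths(pair, bucket)
        if compare(fb_path, tw_path) == 1:
            yield pair


if __name__ == "__main__":
    # Stub comparator so the sketch runs without any AWS access.
    def always_match(fb_path: str, tw_path: str) -> int:
        return 1

    candidates = [ProfilePair("fb-1", "fb-img-1", "tw-1", "tw-img-1")]
    for pair in matched_pairs(candidates, always_match):
        print(pair.facebook_id, pair.twitter_id)
```

Passing the comparison in as a function keeps the path-building and filtering logic testable without network access; in a real run, `compare` would wrap the HTTP request to the matching service.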