I wanted a way to send chess game data from lichess.org (a popular chess server) to Elasticsearch and Snowflake so I could visualize statistics (e.g., how often does World Champion GM Magnus Carlsen lose to GM Daniel Naroditsky?) and generate reports in Snowflake (e.g., when Magnus does lose, which openings does he tend to play?). I ended up accomplishing this with two pipelines:
The first is a batch pipeline that ingests data from the lichess REST API and pushes it to a Kafka topic. It uses an endpoint that returns all games for a given username, so I’ve parameterized the username and can use Control Hub jobs to pick which specific players I want to study.
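The pipeline itself is configured in the Control Hub UI, but a minimal Python sketch of the same batch logic might look like the following. The lichess endpoint and its NDJSON format are real; the topic name, bootstrap server, and `max` cap are assumptions for illustration, and `DrNykterstein` is Magnus Carlsen's well-known lichess handle:

```python
import json

import requests
from kafka import KafkaProducer  # pip install kafka-python

LICHESS_USER = "DrNykterstein"   # parameterized username (Magnus Carlsen)
KAFKA_TOPIC = "lichess-games"    # assumed topic name

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The lichess export endpoint streams one game per line as NDJSON.
resp = requests.get(
    f"https://lichess.org/api/games/user/{LICHESS_USER}",
    headers={"Accept": "application/x-ndjson"},
    params={"max": 100},  # cap the batch size for illustration
    stream=True,
)
resp.raise_for_status()

# Push each game record to the Kafka topic as JSON.
for line in resp.iter_lines():
    if line:
        producer.send(KAFKA_TOPIC, json.loads(line))

producer.flush()
```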
The second pipeline consumes game data from my Kafka topic, does some basic cleanup (adds a field with the name of the winner rather than the color of the winning pieces, and converts timestamps from long to datetime) and some basic enrichment (adds a field that calculates how long the game took) before sending the records off to Elasticsearch and Snowflake. Since this pipeline is streaming, it runs continuously, ready to process new data whenever I stumble upon interesting games or players. Expressed as plain Python, the per-record logic amounts to something like the sketch below.
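In the pipeline these transformations are handled by processors; the sketch shows the equivalent per-record function. The input field names (`winner`, `players`, `createdAt`, `lastMoveAt`) follow the lichess NDJSON export as I understand it; the output field names are my own:

```python
from datetime import datetime, timezone

def enrich(game: dict) -> dict:
    """Clean up and enrich one lichess game record before it heads downstream."""
    # Cleanup: resolve the winning color ("white"/"black") to the player's name.
    # A drawn game has no "winner" key, so we skip it in that case.
    winner_color = game.get("winner")
    if winner_color:
        player = game["players"][winner_color]
        game["winnerName"] = player.get("user", {}).get("name", "anonymous")

    # Cleanup: convert epoch-millisecond longs to datetimes.
    created = datetime.fromtimestamp(game["createdAt"] / 1000, tz=timezone.utc)
    last_move = datetime.fromtimestamp(game["lastMoveAt"] / 1000, tz=timezone.utc)
    game["createdAt"] = created.isoformat()
    game["lastMoveAt"] = last_move.isoformat()

    # Enrichment: how long the game took, in seconds.
    game["durationSeconds"] = (last_move - created).total_seconds()
    return game
```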