IIM-DEVOPS - (demo) Real-time Twitter hashtag analytics... fully managed by Google Cloud
Note: Servers may be down cuz I’m really poor.
Q: What’s this?
A: Hashtagsbattle is a web app that displays real-time analytics based on Twitter, such as hourly trending hashtags, daily hashtags, and worldwide activity. It is inspired by the awesome One Million Tweet Map. (NB: this is a demo.)
Q: How does it work?
A:
First off, there’s a tweet listener built with Tweepy that retrieves tweets from the Twitter API. It does some basic cleaning and filtering before publishing them to a Pub/Sub topic, which is basically a global-scale messaging buffer/bus. The listener runs on a Google Compute Engine instance, since that is fairly cheap and the listener doesn’t require auto-scaling.
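To make the cleaning and filtering step more concrete, here is a minimal sketch in plain Python. The function name `to_pubsub_payload`, the kept fields, and the "drop tweets without hashtags" rule are assumptions for illustration, not the listener's actual code.

```python
import json
import re
from typing import Optional

# Assumption: hashtags are pulled from the tweet text with a simple regex.
HASHTAG_RE = re.compile(r"#(\w+)")


def to_pubsub_payload(tweet: dict) -> Optional[bytes]:
    """Return a compact JSON payload for a tweet, or None if it should be dropped."""
    text = tweet.get("text", "")
    # Normalize to lowercase and de-duplicate so "#GCP" and "#gcp" count once.
    hashtags = sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})
    if not hashtags:
        return None  # filtering: a tweet without hashtags is not published
    cleaned = {
        "id": tweet.get("id"),
        "hashtags": hashtags,
        "coordinates": tweet.get("coordinates"),  # hypothetical field kept for the map
    }
    # Pub/Sub message data must be bytes.
    return json.dumps(cleaned, sort_keys=True).encode("utf-8")
```

With the google-cloud-pubsub client, the returned bytes would be passed as the data argument of `PublisherClient.publish(topic_path, data)`; the topic name is whatever the project configures.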
Then, there’s a small Express server using Socket.IO, running on App Engine. It exposes an endpoint that receives Pub/Sub push messages and emits events through a web socket. It uses the Supercluster library to cluster points server-side, which reduces network and client-side rendering delay.
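The real endpoint lives in the Express server, but the decoding step it has to perform is the same in any language, so here it is sketched in Python. A Pub/Sub push request carries a JSON envelope whose `message.data` field is base64-encoded. The helper name `decode_push_envelope` and the assumption that the payload itself is JSON are illustrative only.

```python
import base64
import json


def decode_push_envelope(body: bytes) -> dict:
    """Extract and decode the payload of a Pub/Sub push request body."""
    envelope = json.loads(body)
    message = envelope["message"]
    # Pub/Sub push delivers the message data base64-encoded.
    raw = base64.b64decode(message["data"])
    # Assumption: the publisher sent UTF-8 encoded JSON.
    return json.loads(raw)
```

The decoded dictionary is what the server would then forward to connected clients as a web socket event.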
The heart of my project is the Apache Beam streaming pipeline running on the Cloud Dataflow runner. It consumes events from the source Pub/Sub topic and applies some data transformations (grouping, counting, filtering, batching…) before sending the pre-aggregated output to another Pub/Sub topic. I’m playing with windows and triggers to keep latency quite low.
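To illustrate what "windowing and counting" computes, here is a plain-Python sketch (not Beam code) that groups events into fixed windows and counts hashtags per window. The 60-second window size and the function names are assumptions; the actual pipeline's windows and triggers are not specified here. In Beam, the rough equivalents would be `beam.WindowInto(beam.window.FixedWindows(60))` followed by `beam.combiners.Count.PerElement()`.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60  # assumed window size, for illustration only


def count_per_window(events, window_seconds=WINDOW_SECONDS):
    """Count hashtags per fixed window.

    events: iterable of (timestamp_seconds, hashtag) pairs.
    Returns a dict mapping each window start time to a Counter of hashtags.
    """
    windows = defaultdict(Counter)
    for ts, tag in events:
        # Assign the event to the fixed window that contains its timestamp.
        window_start = int(ts // window_seconds) * window_seconds
        windows[window_start][tag] += 1
    return dict(windows)


def top_n(counter, n=3):
    """Return the n most frequent hashtags in one window."""
    return counter.most_common(n)
```

A streaming runner does this incrementally and emits results when triggers fire, which is where the latency tuning comes in; the batch version above only shows the shape of the aggregated output.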
Finally, the output Pub/Sub topic triggers Cloud Functions instances that do some computation on the data before saving it to Firestore.
The web app is built with Stencil and deployed to Firebase Hosting.
As you can see, this is fully managed by Google Cloud Platform.
Work in progress.
The application is made of 4 components. Almost every component is Dockerized and has its own CI/CD pipeline using Cloud Build.