Near real-time ETL to populate a dashboard.
This is an ETL pipeline that pulls bitcoin exchange data from the CoinCap API and loads it into our data warehouse. For more details, check out the blog post at https://startdataengineering.com/post/data-engineering-project-to-impress-hiring-managers/
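As a rough illustration of the extract and transform steps, the sketch below flattens a CoinCap-style exchanges payload into typed rows. It assumes the response shape of CoinCap's v2 exchanges endpoint: a "data" list whose numeric fields arrive as strings (or null) and an "updated" timestamp in epoch milliseconds. The ExchangeRow schema and the parse_exchanges function are hypothetical and are not taken from the bitcoinMonitor code. The HTTP call is kept outside the function so the parsing logic can be tested without network access.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class ExchangeRow:
    """One row destined for the warehouse (hypothetical schema)."""
    exchange_id: str
    name: str
    rank: int
    percent_total_volume: Optional[float]
    volume_usd: Optional[float]
    updated_utc: datetime


def _to_float(value: Any) -> Optional[float]:
    # Assumed CoinCap behaviour: numbers come back as strings and may be null.
    return None if value is None else float(value)


def parse_exchanges(payload: dict) -> list[ExchangeRow]:
    """Turn an assumed CoinCap-style {"data": [...]} payload into typed rows."""
    rows = []
    for item in payload.get("data", []):
        rows.append(
            ExchangeRow(
                exchange_id=item["exchangeId"],
                name=item["name"],
                rank=int(item["rank"]),
                percent_total_volume=_to_float(item.get("percentTotalVolume")),
                volume_usd=_to_float(item.get("volumeUsd")),
                # "updated" is assumed to be epoch milliseconds (UTC).
                updated_utc=datetime.fromtimestamp(item["updated"] / 1000, tz=timezone.utc),
            )
        )
    return rows
```

In a live run, the payload might come from a call such as requests.get("https://api.coincap.io/v2/exchanges").json(). Check the current CoinCap documentation before relying on this, because the endpoint version and authentication requirements may have changed.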
The code is available in the bitcoinMonitor repository.
You can run this data pipeline using GitHub codespaces. Follow the instructions below.
1. Click the Create codespaces on main button.
2. In the codespace terminal, run make up.
3. Wait for make up to complete, and then wait another 30s to give Metabase some time to set up.
4. Go to the ports tab and click on the link exposing port 3000 to access the Metabase UI (the username and password are sdeuser and sdepassword1234, respectively). See the Metabase connection settings screenshot below for connection details.

Note: The screenshots show how to run a project on codespaces; please make sure to use the instructions above for this specific project.
Metabase connection settings:
The Metabase UI will look like the following:
Note: Make sure to switch off your codespaces instance, since you only have limited free usage; see docs here.
To run locally, you need:
Clone the repo and run the following commands to start the data pipeline:
git clone https://github.com/josephmachado/bitcoinMonitor.git
cd bitcoinMonitor
make up
sleep 30 # wait for Metabase to start
make ci # run checks and tests
Go to http://localhost:3000 to see the Metabase UI.
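The fixed sleep 30 above is a simple way to give Metabase time to start. If 30s turns out to be too short or too long on your machine, one alternative is to poll until port 3000 accepts connections. The wait_for_port helper below is a hypothetical sketch and is not part of the repository's make targets. An open port only shows that something is listening, not that Metabase has finished its setup, so a short extra wait may still be needed.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds elapse.

    Returns True as soon as the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the service is not up yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 3000, timeout=120) could replace the sleep 30 step in a small startup script.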
We use Python to pull, transform, and load data. Our warehouse is Postgres. We also spin up a Metabase instance as our presentation layer.
All of the components run as Docker containers.
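To make the load step concrete, here is a sketch of an idempotent upsert keyed on exchange id and update time. It uses SQLite from the Python standard library as a stand-in for Postgres, so the example runs without a database server. The INSERT ... ON CONFLICT ... DO UPDATE syntax shown is also valid in Postgres 9.5 and later. A real loader would instead use a Postgres driver and that driver's placeholder style (for example %s rather than ?). The exchange_snapshot table and its columns are hypothetical.

```python
import sqlite3
from typing import Iterable

# Hypothetical warehouse table; SQLite stands in for Postgres in this sketch.
DDL = """
CREATE TABLE IF NOT EXISTS exchange_snapshot (
    exchange_id TEXT NOT NULL,
    updated_utc TEXT NOT NULL,
    name        TEXT NOT NULL,
    volume_usd  REAL,
    PRIMARY KEY (exchange_id, updated_utc)
)
"""

# On a key collision, overwrite the mutable columns instead of inserting a duplicate.
UPSERT = """
INSERT INTO exchange_snapshot (exchange_id, updated_utc, name, volume_usd)
VALUES (?, ?, ?, ?)
ON CONFLICT (exchange_id, updated_utc)
DO UPDATE SET name = excluded.name, volume_usd = excluded.volume_usd
"""


def load_rows(conn: sqlite3.Connection, rows: Iterable[tuple]) -> None:
    """Upsert rows so that re-running the same batch does not create duplicates."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(DDL)
        conn.executemany(UPSERT, rows)
```

Keying on the update timestamp keeps one row per exchange per snapshot, which preserves history for the dashboard. It also means a retried or overlapping pipeline run overwrites rows rather than duplicating them.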
Read this post for information on setting up CI/CD, IaC (Terraform), “make” commands, and automated testing.