:factory: Schedule a data pipeline in Google Cloud using Cloud Functions, BigQuery, Cloud Storage, Cloud Scheduler, Stackdriver Trace, Cloud Build, and Pub/Sub
Deploy an end-to-end data pipeline for Chicago traffic API data and measure function performance using: Cloud Functions, Pub/Sub, Cloud Storage, Cloud Scheduler, BigQuery, Stackdriver Trace
Usage:
Data Pipeline Operations:
Technologies: Cloud Shell, Cloud Functions, Pub/Sub, Cloud Storage, Cloud Scheduler, BigQuery, Stackdriver Trace
Languages: Python 3.7, SQL (Standard)
Technical Concepts:
Further Reading: For those looking for production-level deployments
Prerequisites:
OR
git clone https://github.com/sungchun12/serverless_data_pipeline_gcp.git
#Set your project ID
gcloud config set project [your-project-id]
#Enable Google service APIs
gcloud services enable \
cloudfunctions.googleapis.com \
cloudtrace.googleapis.com \
pubsub.googleapis.com \
cloudscheduler.googleapis.com \
storage-component.googleapis.com \
bigquery-json.googleapis.com \
cloudbuild.googleapis.com
Note: If you want to automate the build and deployment of this pipeline, run the commands below in order in Cloud Shell after completing the steps above. This path skips step 5 below, which is redundant for automated deployment.
Find your Cloud Build service account in the IAM console. Ex: [unique-id]@cloudbuild.gserviceaccount.com
#Add role permissions to the Cloud Build service account
#(add-iam-policy-binding accepts one --role per invocation, so run it once per role)
gcloud projects add-iam-policy-binding [your-project-id] \
--member serviceAccount:[unique-id]@cloudbuild.gserviceaccount.com \
--role roles/cloudfunctions.developer

gcloud projects add-iam-policy-binding [your-project-id] \
--member serviceAccount:[unique-id]@cloudbuild.gserviceaccount.com \
--role roles/cloudscheduler.admin

gcloud projects add-iam-policy-binding [your-project-id] \
--member serviceAccount:[unique-id]@cloudbuild.gserviceaccount.com \
--role roles/logging.viewer
#Run the deploy steps defined in the Cloud Build configuration file
gcloud builds submit --config cloudbuild.yaml .
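The build is driven by the repo's cloudbuild.yaml. As a rough illustration only (the step arguments below are hypothetical, not a copy of the repo's actual file), a Cloud Build config that deploys the function and the scheduler job could look like:

```yaml
steps:
  # Deploy the Pub/Sub-triggered Cloud Function (names are illustrative)
  - name: "gcr.io/cloud-builders/gcloud"
    args:
      [
        "functions", "deploy", "demo_function",
        "--entry-point", "handler",
        "--runtime", "python37",
        "--trigger-topic", "demo_topic",
      ]
  # Create the Cloud Scheduler job that publishes to the topic every 5 minutes
  - name: "gcr.io/cloud-builders/gcloud"
    args:
      [
        "beta", "scheduler", "jobs", "create", "pubsub", "schedule_function",
        "--schedule", "*/5 * * * *",
        "--topic", "demo_topic",
        "--message-body", "Can you see this? With love, cloud scheduler",
        "--time-zone", "America/Chicago",
      ]
```

Each step runs the same gcloud commands you would otherwise type by hand, using the Cloud Build service account permissions granted above.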
cd serverless_data_pipeline_gcp/src
gcloud functions deploy [function-name] --entry-point handler --runtime python37 --trigger-topic [topic-name]
Ex:
gcloud functions deploy demo_function --entry-point handler --runtime python37 --trigger-topic demo_topic
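The `--entry-point handler` flag assumes the function's source file defines a `handler(event, context)` function. A minimal sketch of what that Pub/Sub-triggered entry point could look like (the field names follow the background-function contract; the print statement is illustrative):

```python
import base64


def handler(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    event   -- dict whose 'data' field holds the base64-encoded message body
    context -- metadata about the triggering event (event_id, timestamp, ...)
    """
    # Pub/Sub delivers the message body base64-encoded under 'data'
    message = base64.b64decode(event["data"]).decode("utf-8")
    print(f"Received message: {message}")
    return message
```

You can exercise this locally by passing a hand-built event dict with a base64-encoded `data` field, no emulator required.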
gcloud pubsub topics publish [topic-name] --message "<your-message>"
Ex:
gcloud pubsub topics publish demo_topic --message "Can you see this?"
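When this published message reaches the subscribed function, its body arrives base64-encoded in the event's `data` field. A quick sketch of that envelope (simulated locally, per the Pub/Sub background-function contract):

```python
import base64

# Simulate the event a Pub/Sub-triggered function receives for the
# message published above
event = {"data": base64.b64encode(b"Can you see this?").decode("utf-8")}

# The function must base64-decode 'data' to recover the original body
print(base64.b64decode(event["data"]).decode("utf-8"))  # Can you see this?
```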
gcloud functions logs read --limit 50
gcloud beta scheduler jobs create pubsub [job-name] \
--schedule "*/5 * * * *" \
--topic [topic-name] \
--message-body '{"<Message-to-publish-to-pubsub>"}' \
--time-zone 'America/Chicago'
Ex:
gcloud beta scheduler jobs create pubsub schedule_function \
--schedule "*/5 * * * *" \
--topic demo_topic \
--message-body '{"Can you see this? With love, cloud scheduler"}' \
--time-zone 'America/Chicago'
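The `--schedule` flag takes standard unix-cron syntax; `*/5` in the minute field means every minute evenly divisible by 5. A small sketch of the minutes this job fires each hour (pure illustration, not how Cloud Scheduler evaluates the expression):

```python
# "*/5 * * * *" fires at every minute divisible by 5, in every hour
fire_minutes = [minute for minute in range(60) if minute % 5 == 0]
print(fire_minutes)  # twelve runs per hour: 0, 5, 10, up through 55
```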
gcloud beta scheduler jobs run [job-name]
Ex:
gcloud beta scheduler jobs run schedule_function
YOUR PIPELINE IS DEPLOYED AND MEASURABLE!
Note: You’ll notice extraneous blocks of comments and commented-out code throughout the Python scripts.