项目作者: shgtkshruch

项目描述 :
study for loading concealed data with Embulk
高级语言: HCL
项目地址: git://github.com/shgtkshruch/embulk-masking-sample.git
创建时间: 2020-09-18T14:53:39Z
项目社区:https://github.com/shgtkshruch/embulk-masking-sample

开源协议:

下载


study for loading concealed data with Embulk.

Diagram



Requirements

Setup

  1. # Download test data
  2. # ref: https://dev.mysql.com/doc/employee/en/
  3. $ gh repo clone datacharmer/test_db
  4. # Lunch MySQL server on 4306 port
  5. $ dip provition
  6. # Import test data to MySQL
  7. $ docker-compose exec db /bin/bash -c 'mysql -u root -p"$MYSQL_ROOT_PASSWORD" < employees.sql'
  8. # Install embulk gems
  9. $ docker-compose exec -w /tmp/embulk/bundle embulk bash
  10. $ embulk bundle install

Example

Official example

  1. $ embulk example ./try1
  2. $ embulk guess ./try1/seed.yml -o ./try1/config.yml
  3. $ embulk preview ./try1/config.yml
  4. +---------+--------------+-------------------------+-------------------------+----------------------------+
  5. | id:long | account:long | time:timestamp | purchase:timestamp | comment:string |
  6. +---------+--------------+-------------------------+-------------------------+----------------------------+
  7. | 1 | 32,864 | 2015-01-27 19:23:49 UTC | 2015-01-27 00:00:00 UTC | embulk |
  8. | 2 | 14,824 | 2015-01-27 19:01:23 UTC | 2015-01-27 00:00:00 UTC | embulk jruby |
  9. | 3 | 27,559 | 2015-01-28 02:20:02 UTC | 2015-01-28 00:00:00 UTC | Embulk "csv" parser plugin |
  10. | 4 | 11,270 | 2015-01-29 11:54:36 UTC | 2015-01-29 00:00:00 UTC | |
  11. +---------+--------------+-------------------------+-------------------------+----------------------------+
  12. $ embulk run ./try1/config.yml
  13. 1,32864,2015-01-27 19:23:49,20150127,embulk
  14. 2,14824,2015-01-27 19:01:23,20150127,embulk jruby
  15. 3,27559,2015-01-28 02:20:02,20150128,Embulk "csv" parser plugin
  16. 4,11270,2015-01-29 11:54:36,20150129,

MySQL

  1. $ docker-compose exec embulk bash
  2. $ embulk guess -b bundle -o ./mysql/config.yml ./mysql/seed.yml
  3. $ embulk preview -b bundle ./mysql/config.yml
  4. +-------------+-------------------------+-------------------+------------------+---------------+-------------------------+
  5. | emp_no:long | birth_date:timestamp | first_name:string | last_name:string | gender:string | hire_date:timestamp |
  6. +-------------+-------------------------+-------------------+------------------+---------------+-------------------------+
  7. | 10,001 | 1953-09-01 15:00:00 UTC | Georgi | Facello | M | 1986-06-25 15:00:00 UTC |
  8. | 10,002 | 1964-06-01 15:00:00 UTC | Bezalel | Simmel | F | 1985-11-20 15:00:00 UTC |
  9. | 10,003 | 1959-12-02 15:00:00 UTC | Parto | Bamford | M | 1986-08-27 15:00:00 UTC |
  10. | 10,004 | 1954-04-30 15:00:00 UTC | Chirstian | Koblick | M | 1986-11-30 15:00:00 UTC |
  11. | 10,005 | 1955-01-20 15:00:00 UTC | Kyoichi | Maliniak | M | 1989-09-11 15:00:00 UTC |
  12. ...
  13. ...
  14. $ embulk run -b bundle ./mysql/config.yml

AWS

Create IAM user for Terraform

create .env-aws file.

  1. AWS_ACCESS_KEY_ID=xxx
  2. AWS_SECRET_ACCESS_KEY=xxx
  3. AWS_DEFAULT_REGION=xxx
  1. $ dip aws iam create-user --user-name embulk-mysql-rds-masking
  2. $ dip aws iam create-access-key --user-name embulk-mysql-rds-masking
  3. $ dip aws iam attach-user-policy \
  4. --policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
  5. --user-name embulk-mysql-rds-masking

Terraform

create .env-tf file with embulk-mysql-rds-masking credential.

  1. AWS_ACCESS_KEY_ID=xxx
  2. AWS_SECRET_ACCESS_KEY=xxx
  3. AWS_DEFAULT_REGION=xxx
  1. # generage zip files of lambda
  2. $ cd terraform/lambda && zip -r create-onetime-rds.zip create-onetime-rds.js && zip -r delete-onetime-rds.zip delete-onetime-rds.js && cd -
  3. $ dip terraform init
  4. $ dip terraform plan
  5. $ dip terraform apply

Load test_data to RDS

  1. $ docker-compose exec db /bin/bash -c 'mysql -h HOST -u dbuser -ppassword < employees.sql'

Create onetime RDS

  1. $ dip aws stepfunctions start-execution \
  2. --state-machine-arn <value> \
  3. --input '{ "DBInstanceIdentifier": "RDS_IDENTIFIER" }'

Transfer data from RDS to local MySQL

  1. Set db host to mysql/seed.yml
  2. Run embulk
  1. $ docker-compose exec embulk bash
  2. $ embulk guess -b bundle -o ./mysql/config.yml ./mysql/seed.yml
  3. $ embulk preview -b bundle ./mysql/config.yml
  4. $ embulk run -b bundle ./mysql/config.yml

Cleaning

Remove aws resources.

  1. $ dip terraform destroy