Sync data from one Postgres database to another
Sync data from one Postgres database to another (like pg_dump/pg_restore). Designed for:
Battle-tested at Instacart
pgsync is a command line tool. To install, run:
gem install pgsync
This will give you the pgsync command. If installation fails, you may need to install dependencies.
You can also install it with Homebrew:
brew install pgsync
Or Docker.
In your project directory, run:
pgsync --init
This creates .pgsync.yml for you to customize. We recommend checking this into your version control (assuming it doesn’t contain sensitive information). pgsync commands can be run from this directory or any subdirectory.
First, make sure your schema is set up in both databases. We recommend using a schema migration tool for this, but pgsync also provides a few convenience methods. Once that’s done, you’re ready to sync data.
Sync tables
pgsync
Sync specific tables
pgsync table1,table2
Works with wildcards as well
pgsync "table*"
Sync specific rows (existing rows are overwritten)
pgsync products "where store_id = 1"
You can also preserve existing rows
pgsync products "where store_id = 1" --preserve
Or truncate them
pgsync products "where store_id = 1" --truncate
Exclude specific tables
pgsync --exclude table1,table2
Add to .pgsync.yml to exclude by default
exclude:- table1- table2
Sync tables from all schemas or specific schemas (by default, only the search path is synced)
pgsync --all-schemas# orpgsync --schemas public,other# orpgsync public.table1,other.table2
Define groups in .pgsync.yml:
groups:group1:- table1- table2
And run:
pgsync group1
You can also use groups to sync a specific record and associated records in other tables.
To get product 123 with its reviews, last 10 coupons, and store, use:
groups:product:products: "where id = {1}"reviews: "where product_id = {1}"coupons: "where product_id = {1} order by created_at desc limit 10"stores: "where id in (select store_id from products where id = {1})"
And run:
pgsync product:123
Sync the schema before the data (this wipes out existing data)
pgsync --schema-first
Specify tables
pgsync table1,table2 --schema-first
Sync the schema without data (this wipes out existing data)
pgsync --schema-only
pgsync does not try to sync Postgres extensions.
Prevent sensitive data like email addresses from leaving the remote server.
Define rules in .pgsync.yml:
data_rules:email: unique_emaillast_name: random_letterbirthday: random_dateusers.auth_token:value: secretvisits_count:statement: "(RANDOM() * 10)::int"encrypted_*: null
last_name matches all columns named last_name and users.last_name matches only the users table. Wildcards are supported, and the first matching rule is applied.
Options for replacement are:
unique_emailunique_phoneunique_secretrandom_letterrandom_intrandom_daterandom_timerandom_ipvaluestatementnulluntouchedRules starting with unique_ require the table to have a single column primary key. unique_phone requires a numeric primary key.
Foreign keys can make it difficult to sync data. Three options are:
To defer constraints, use:
pgsync --defer-constraints
To manually specify the order of tables, use --jobs 1 so tables are synced one-at-a-time.
pgsync table1,table2,table3 --jobs 1
To disable foreign key triggers and potentially break referential integrity, use:
pgsync --disable-integrity
This requires superuser privileges on the to database. If syncing to (not from) Amazon RDS, use the rds_superuser role. If syncing to (not from) Heroku, there doesn’t appear to be a way to disable integrity.
Disable user triggers with:
pgsync --disable-user-triggers
Skip syncing sequences with:
pgsync --no-sequences
For extremely large, append-only tables, sync in batches.
pgsync large_table --in-batches
Note: This requires the table to have a numeric, increasing primary key
The script will resume where it left off when run again, making it great for backfills.
Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.
To keep you from accidentally overwriting production, the destination is limited to localhost or 127.0.0.1 by default.
To use another host, add to_safe: true to your .pgsync.yml.
To use with multiple databases, run:
pgsync --init db2
This creates .pgsync-db2.yml for you to edit. Specify a database in commands with:
pgsync --db db2
If you run pgsync --init in a Django project, migrations will be excluded in .pgsync.yml.
exclude:- django_migrations
If you run pgsync --init in a Heroku project, the from database will be set in .pgsync.yml.
from: $(heroku config:get DATABASE_URL)?sslmode=require
If you run pgsync --init in a Laravel project, migrations will be excluded in .pgsync.yml.
exclude:- migrations
If you run pgsync --init in a Rails project, Active Record metadata and schema migrations will be excluded in .pgsync.yml.
exclude:- ar_internal_metadata- schema_migrations
To view the SQL that’s run, use:
pgsync --debug
Help
pgsync --help
Version
pgsync --version
List tables
pgsync --list
Use groups when possible to take advantage of parallelism.
For Ruby scripts, you may need to do:
Bundler.with_unbundled_env dosystem "pgsync ..."end
Get the Docker image with:
docker pull ankane/pgsyncalias pgsync="docker run -ti --rm -v .:/conf -w /conf ankane/pgsync"
This will give you the pgsync command.
For databases on the host machine, use host.docker.internal as the hostname (on Linux, this requires --add-host=host.docker.internal:host-gateway).
If installation fails, your system may be missing Ruby or libpq.
On Mac, run:
brew install libpq
On Ubuntu, run:
sudo apt-get install ruby-dev libpq-dev build-essential
Run:
gem install pgsync
To use master, run:
gem install specific_installgem specific_install https://github.com/ankane/pgsync.git
With Homebrew, run:
brew upgrade pgsync
With Docker, run:
docker pull ankane/pgsync
Also check out:
Inspired by heroku-pg-transfer.
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
To get started with development:
git clone https://github.com/ankane/pgsync.gitcd pgsyncbundle installcreatedb pgsync_test1createdb pgsync_test2createdb pgsync_test3bundle exec rake test