Author: yaooqinn

Description:
PostgreSQL and GreenPlum Data Source for Apache Spark
Language: Scala
Repository: git://github.com/yaooqinn/spark-postgres.git
Created: 2019-03-14T07:08:34Z
Project page: https://github.com/yaooqinn/spark-postgres

License: Apache License 2.0


PostgreSQL & GreenPlum Data Source for Apache Spark

A library for reading data from and transferring data to Greenplum databases with Apache Spark, for Spark SQL and DataFrames.

This library is 100x faster than Apache Spark's JDBC DataSource when transferring data from Spark to Greenplum databases.

Also, this library is fully transactional.

Try it now!

CTAS

  CREATE TABLE tbl
  USING greenplum
  options (
    url "jdbc:postgresql://greenplum:5432/",
    delimiter "\t",
    dbschema "gptest",
    dbtable "store_sales",
    user 'gptest',
    password 'test')
  AS
  SELECT * FROM tpcds_100g.store_sales WHERE ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520;
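
The same transfer can probably also be expressed through the DataFrame API. The following is a minimal sketch, not taken from the project's documentation: it assumes the `greenplum` format name and the option keys from the SQL above are accepted unchanged by `DataFrameWriter`, and that the Spark session can see the `tpcds_100g` database. The object name `GreenplumWriteSketch` is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object GreenplumWriteSketch {
  // Option keys and values copied from the CTAS statement above.
  // Assumption: the data source accepts the same keys via the DataFrame API.
  val gpOptions: Map[String, String] = Map(
    "url"       -> "jdbc:postgresql://greenplum:5432/",
    "delimiter" -> "\t",
    "dbschema"  -> "gptest",
    "dbtable"   -> "store_sales",
    "user"      -> "gptest",
    "password"  -> "test"
  )

  def main(args: Array[String]): Unit = {
    // Assumption: launched via spark-submit (or pasted into spark-shell),
    // with this library's jar on the classpath.
    val spark = SparkSession.builder()
      .appName("greenplum-write-sketch")
      .getOrCreate()

    // Same row filter as the SELECT in the CTAS statement.
    val sales = spark.table("tpcds_100g.store_sales")
      .where("ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520")

    // The default save mode (ErrorIfExists) is kept, so the write fails
    // if the target table already exists, as a plain CREATE TABLE would.
    sales.write
      .format("greenplum")
      .options(gpOptions)
      .save()

    spark.stop()
  }
}
```

Keeping the options in a single `Map` makes it easy to reuse the same connection settings for several tables by overriding only `dbtable`.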

View & Insert

  CREATE TEMPORARY TABLE tbl
  USING greenplum
  options (
    url "jdbc:postgresql://greenplum:5432/",
    delimiter "\t",
    dbschema "gptest",
    dbtable "store_sales",
    user 'gptest',
    password 'test');

  INSERT INTO TABLE tbl SELECT * FROM tpcds_100g.store_sales WHERE ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520;

Please refer to the "JDBC To Other Databases" section of the Spark SQL Guide to learn more about similar usage.
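
For comparison, here is a minimal sketch of reading the same table through Spark's built-in `jdbc` source, which is what that guide describes. It reuses the connection details from the examples above and assumes the PostgreSQL JDBC driver is on the classpath. The built-in source has no `dbschema` option, so the schema is folded into `dbtable`. The object name `JdbcReadSketch` is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcReadSketch {
  // Options understood by Spark's built-in JDBC data source.
  // The table name is schema-qualified because there is no separate schema option.
  val jdbcOptions: Map[String, String] = Map(
    "url"      -> "jdbc:postgresql://greenplum:5432/",
    "dbtable"  -> "gptest.store_sales",
    "user"     -> "gptest",
    "password" -> "test"
  )

  // Returns a lazily evaluated DataFrame backed by the remote table.
  def readStoreSales(spark: SparkSession): DataFrame =
    spark.read.format("jdbc").options(jdbcOptions).load()

  def main(args: Array[String]): Unit = {
    // Assumption: launched via spark-submit, so the master is set externally.
    val spark = SparkSession.builder()
      .appName("jdbc-read-sketch")
      .getOrCreate()

    // Print a few rows to confirm the connection works.
    readStoreSales(spark).show(5)

    spark.stop()
  }
}
```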