项目作者: embulk

项目描述 :
高级语言: Java
项目地址: git://github.com/embulk/embulk-filter-calcite.git
创建时间: 2017-02-26T18:16:30Z
项目社区:https://github.com/embulk/embulk-filter-calcite

开源协议:Apache License 2.0

下载


Apache Calcite filter plugin for Embulk

Build Status

Overview

  • Plugin type: filter

This plugin allows users to translate rows flexibly by SQL queries specified by them.

Architecture

This plugin allows translating rows by SQL queries in Pages received from input plugin and sending the query results to next filter or output plugin as modified Pages. It uses Apache Calcite, which is the foundation for your next high-performance database and enbles executing SQL queries to customized storage by the custom adaptor. The plugin applies Page storage adaptor to Apache Calcite and then enables executing SQL queries to Pages via JDBC Driver provided.

Here is Embulk config example for this plugin:

  1. filters:
  2. - type: calcite
  3. query: SELECT * FROM $PAGES

Users can define SELECT query as query option in the filter config section. $PAGES represents Pages that input plugin creates and sends. $PAGES schema is Embulk input schema given. On the other hand, the output schema of the plugin is built from the metadata of query result. Embulk types are converted into Apache Calcite types internally. This is type mapping between Embulk and Apache Calcite.

Embulk type Apache Calcite type JDBC type
boolean BOOLEAN java.lang.Boolean
long BIGINT java.lang.Long
double DOUBLE java.lang.Double
timestamp TIMESTAMP java.sql.Timestamp
string VARCHAR java.lang.String
json VARCHAR java.lang.String

Data types by Apache Calcite: https://calcite.apache.org/docs/reference.html#data-types

Configuration

  • query: SQL to run (string, required)
  • default_timezone: Configure timezone that is used for JDBC connection properties and Calcite engine. This option is one of JDBC connect parameters provided by Apache Calcite. java.util.TimeZone’s AvailableIDs can be specified. (string, default: ‘UTC’)
  • options: extra JDBC properties. See JDBC connect parameters. (hash, default: {})

Example

This config enables removing rows not associated to id 1 and 2 from Pages.

  1. filters:
  2. - type: calcite
  3. query: SELECT * FROM $PAGES WHERE id IN (1, 2)

The following is an example by LIKE operator and enables removing rows not matched at a specified pattern from Pages.

  1. filters:
  2. - type: calcite
  3. query: SELECT * FROM $PAGES WHERE message LIKE '%EMBULK%'

This enables adding new column and inserting the value combined 2 string column values.

  1. filters:
  2. - type: calcite
  3. query: SELECT first_name || last_name AS name, * FROM $PAGES

Adds the new column by CURRENT_TIMESTAMP function.

  1. filters:
  2. - type: calcite
  3. query: SELECT CURRENT_TIMESTAMP, * FROM $PAGES
  4. default_timezone: 'America/Los_Angeles'

SQL language provided by Apache Calcite: https://calcite.apache.org/docs/reference.html

Build

  1. $ ./gradlew gem

Release

  1. $ ./gradlew gemPush