项目作者: sgsfak

项目描述 :
A PostgreSQL Foreign Data Mapper for accessing the OpenCageData API
高级语言: C
项目地址: git://github.com/sgsfak/ocgeo_fdw.git
创建时间: 2020-08-10T07:30:13Z
项目社区:https://github.com/sgsfak/ocgeo_fdw

开源协议:Other

下载


Foreign Data Wrapper for OpenCageData API

This PostgreSQL extension implements a Foreign Data Wrapper (FDW) for the
OpenCageData API.

Please note that this version of ocgeo_fdw works with PostgreSQL version 9.6, 10, 11, 12, 13, and 14.

Installation

To build on POSIX-compliant systems you need to ensure that the PostgreSQL “-dev” package in
installed and that the pg_config executable is in your path when you run make. This executable
is typically in your PostgreSQL installation’s bin directory. E.g. in a recent
Ubuntu Linux you can install
postgresql-server-dev-12 :

  1. sudo apt-get install postgresql-server-dev-12

Also libcurl should have been installed, for example this package for Ubuntu, and please make sure that curl-config is in the PATH.

Then it’s a matter of running (GNU) make like so:

  1. make && sudo make install

Usage

Load the extension in your database

  1. CREATE EXTENSION ocgeo_fdw;

Create a “Server”

The server provides global information about the OpenCageData API. The only configuration
option required is the API endpoint. The server can be defined as follows:

  1. CREATE SERVER ocdata_server FOREIGN DATA WRAPPER ocgeo_fdw
  2. OPTIONS (uri 'https://api.opencagedata.com/geocode/v1/json');

The name of option for the endpoint should be uri and the URL given should correspond to
the API returning JSON format. Other than that, the name
of the server can be anything you like, instead of ocdata_server.

Create a “user mapping”

Different users of your database can have different keys for accessing the API. The following
statement is necessary for configuring the API key used for the current user.

  1. CREATE USER MAPPING FOR current_user SERVER ocdata_server
  2. OPTIONS (api_key '6d0e711d72d74daeb2b0bfd2a5cdfdba',
  3. max_reqs_sec '1');

Please make sure that you are using the correct server name you have defined if you specified
something different in the CREATE SERVER command

For testing purposes the above SQL command will store the 6d0e711d72d74daeb2b0bfd2a5cdfdba
testing API key that returns successful, but always
the same, responses. The above can be used to check whether the FDW works properly but you can
register for the free plan to get a valid key for more
extensive testing purposes, or, better yet, become a customer!.

If you want to change the API key, you can use the following:

  1. ALTER USER MAPPING FOR current_user SERVER ocdata_server
  2. OPTIONS (SET api_key '<your API key>');

The max_reqs_sec option is the maximum number of requests per second that according to the
selected plan the user is allowed to make. This option is
optional and defaults to 1 which is the corresponding value for the free plan.

Create a “foreign table”

The foreign table is a “virtual” table allowing to make requests to the OpenCageData API when used
in SQL SELECT commands. The following is a typical definition that allows to all the information
returned by the JSON API in the components and formatted fields:

  1. CREATE FOREIGN TABLE ocgdata_api (
  2. json_response JSONB,
  3. bounding_box BOX,
  4. location POINT,
  5. _category TEXT,
  6. _type TEXT,
  7. city TEXT,
  8. city_district TEXT,
  9. continent TEXT,
  10. country TEXT,
  11. country_code TEXT,
  12. county TEXT,
  13. house_number TEXT,
  14. political_union TEXT,
  15. neighbourhood TEXT,
  16. postcode TEXT,
  17. road TEXT,
  18. road_type TEXT,
  19. state TEXT,
  20. state_code TEXT,
  21. state_district TEXT,
  22. suburb TEXT,
  23. confidence INTEGER,
  24. formatted TEXT,
  25. -- The following is the input to the select
  26. q text
  27. ) SERVER ocdata_server;

The name of the table can be anything you like (instead of ocgdata_api) but, as
with the definition of the User Mapping, it is important to use the correct server name.

With the above definition in place, you can do single API calls like so:

  1. SELECT confidence, _type, _category, location, formatted
  2. FROM ocgdata_api
  3. WHERE q='Trierer Straße 15, 99423, Weimar, Deutschland';

The important thing to remember is to use the q attribute to provide the search query.
Providing an “equal restriction” on q is required, otherwise no API call will be made
(but see below on ‘More advanced queries’ for JOIN queries). Giving a latitude - longitude
coordinates separated by , (comma) as a query will result in a
reverse geocoding request. Another attribute
that can be used as input is the confidence: Putting a >= condition on this, will make an API
request using the min_confidence parameter.

Example:

  1. > SELECT _type, formatted FROM ocgdata_api WHERE q='eiffel tower, france' AND confidence>=5;
  2. +-------------+-----------------------------------------------------------------------------+
  3. _type formatted
  4. +-------------+-----------------------------------------------------------------------------+
  5. attraction Eiffel Tower, 5 Avenue Anatole France, 75007 Paris, France
  6. information Eiffel Tower, Esplanade des Ouvriers de la Tour Eiffel, 75007 Paris, France
  7. road Rue Gustave Eiffel, 13010 Marseille, France
  8. road Rue Gustave Eiffel, 45000 Orléans, France
  9. road Rue Gustave Eiffel, 34000 Montpellier, France
  10. road Pont Eiffel, 27000 Évreux, France
  11. road Rue Gustave Eiffel, 33100 Bordeaux, France
  12. road Rue Gustave Eiffel, 84000 Avignon, France
  13. road Rue Gustave Eiffel, 18000 Bourges, France
  14. road Avenue Gustave Eiffel, 21000 Dijon, France
  15. road Rue Gustave Eiffel, 72100 Le Mans, France
  16. road Rue Gustave Eiffel, 30000 Nîmes, France
  17. road Rue Gustave Eiffel, 38000 Grenoble, France
  18. road Impasse Eiffel, 44700 Orvault, France
  19. road Rue Gustave Eiffel, 82000 Montauban, France
  20. road Rue Gustave Eiffel, 81000 Albi, France
  21. road Rue Gustave Eiffel, 89000 Auxerre, France
  22. road Rue Gustave Eiffel, 79000 Niort, France
  23. road Rue Gustave Eiffel, 94000 Créteil, France
  24. road Rue Gustave Eiffel, 11000 Carcassonne, France
  25. road Rue Gustave Eiffel, 22000 Saint-Brieuc, France
  26. road Rue Gustave Eiffel, 86100 Châtellerault, France
  27. road Rue Gustave Eiffel, 85000 La Roche-sur-Yon, France
  28. road Rue Gustave Eiffel, 60000 Beauvais, France
  29. road Avenue Eiffel, 78420 Carrières-sur-Seine, France
  30. +-------------+-----------------------------------------------------------------------------+

More advanced queries

How many cities named Paris exist all over the world?

  1. > SELECT country, COUNT(*) FROM ocgdata_api WHERE q='paris' AND _type='city' GROUP BY country ORDER BY 2;
  2. +--------------------------+-------+
  3. | country | count |
  4. +--------------------------+-------+
  5. | France | 1 |
  6. | Canada | 2 |
  7. | United States of America | 7 |
  8. +--------------------------+-------+

Joining with other tables

Assuming we have a (local) table with our users and there’s an address attribute with
the user supplied home address, we can join the two tables to get geocoding information
as follows:

  1. SELECT user_id, ocgdata_api.*
  2. FROM users LEFT JOIN ocgdata_api ON q=users.address
  3. WHERE confidence >= 5;

For executing this, PostgreSQL will select a “nested loop” plan where for each row in the
users table an API call will be performed through the FDW. So this will result in making
as many API requests as rows in the users table.

Note: This may result in a large number of API requests sent in quick succession. The FDW
performs a rate limiting mechanism to make sure that the maximum number of API calls per
second do not exceed the user’s plan, as configured by the USER MAPPING. But it
currently does not perform any throttling with respect to the maximum number of calls
per day. To make sure that even this limitation is respected, you can use the
ocgeo_stats function described bellow.

If such command is used frequently, it may be a good idea to create a materialized view:

  1. CREATE MATERIALIZED VIEW users_locations AS
  2. SELECT user_id, ocgdata_api.*
  3. FROM users LEFT JOIN ocgdata_api ON q=users.address

..and then perform any subsequent queries on the local materialized view. When the users
base table is updated you need to perform a REFRESH MATERIALIZED VIEW users_locations to
recreate the contents of the view.

Accessing the JSON results

If the definition of the foreign table includes an attribute of type JSONB, ocgeo_fdw will store the JSON result message there. This permits more information to be retrieved since there are plenty PostgreSQL operators and functions for processing JSON data. For example, we can get the currency information from the annotations field of the JSON result, as shown next:

  1. WITH temp(js) AS
  2. (SELECT json_response FROM ocgdata_api WHERE q='taj mahal, India')
  3. SELECT jsonb_pretty(js->'annotations'->'currency') AS currency
  4. FROM temp;
  1. +---------------------------------+
  2. currency
  3. +---------------------------------+
  4. {
  5. "name": "Indian Rupee",
  6. "symbol": "₹",
  7. "subunit": "Paisa",
  8. "iso_code": "INR",
  9. "html_entity": "₹",
  10. "iso_numeric": "356",
  11. "decimal_mark": ".",
  12. "symbol_first": 1,
  13. "subunit_to_unit": 100,
  14. "alternate_symbols": [
  15. "Rs",
  16. "৳",
  17. "૱",
  18. "௹",
  19. "रु",
  20. "₨"
  21. ],
  22. "thousands_separator": ",",
  23. "smallest_denomination": 50
  24. }
  25. +---------------------------------+

Here I am using a “Common Table Expression” (CTE) (WITH query) to define the subquery temp to the foreign table which is then used in the main query.

Geometric information

ocgeo_fdw uses the POINT and BOX geometric data types of PostgreSQL for the geometry and bounds JSON fields of the API response. This allows for more advanced queries using the various geometric functions and operators of PostgreSQL..

Support for PostGIS in on plan..

Miscellaneous

Debug messages

Using the following in the psql command session you can debug information and the actual API
URL used:

  1. SET client_min_messages TO DEBUG1;

For example, the following:

  1. SELECT confidence, _type, _category, location, formatted
  2. FROM ocgdata_api
  3. WHERE q='Trierer Straße 15, 99423, Weimar, Deutschland';

will print:

  1. DEBUG: ocgeoGetForeignRelSize: remote conds: 1, local conds: 0
  2. DEBUG: ocgeoGetForeignPaths: param paths 0
  3. DEBUG: printRestrictInfoList: (q = 'Trierer Straße 15, 99423, Weimar, Deutschland')
  4. DEBUG: ocgeoGetForeignPlan, 6 column list, 1 scan clauses
  5. DEBUG: function ocgeoBeginForeignScan qual: q='Trierer Straße 15, 99423, Weimar, Deutschland' and confidence>=0
  6. DEBUG: API https://api.opencagedata.com/geocode/v1/json?q=Trierer%20Stra%C3%9Fe%2015%2C%2099423%2C%20Weimar%2C%20Deutschland&key=6d0e711d72d74daeb2b0bfd2a5cdfdba&limit=50&no_annotations=0 returned status: 200, results: 1, time: 1226.36 msec

Statistics (usage) information

This FDW provides an ocgeo_stats function that can be used to retrieve information about the
usage of the API in the current session. The information returned contains:

  • The total number of API calls (requests)
  • The total number of failed API calls (e.g. because of network errors, rate limiting, etc)
  • The total number of seconds taken in the submission of the API requests and the parsing of
    the JSON response, for all calls (successful and failed)
  • The rate limit information as returned by the
    API in the most recent call. This includes the limit (i.e. the maximum number of requests per
    day), the remaining API calls in the current day, and the date time when the counter will be
    reset.

You can use this function as follows:

  1. > SELECT * FROM ocgeo_stats();
  2. +-----------+------------+------------+------------+----------------+------------------------+
  3. | nbr_calls | nbr_failed | total_time | rate_limit | rate_remaining | rate_reset |
  4. +-----------+------------+------------+------------+----------------+------------------------+
  5. | 4 | 0 | 3.09 | 2500 | 2482 | 2020-07-15 03:00:00+03 |
  6. +-----------+------------+------------+------------+----------------+------------------------+

The rate information is returned by the API server on each request so it is the most accurate,
according to the most recent API call. On the other hand, the number of calls and total time
are recorded per session so they are “local” to the current PostgreSQL connection. So, for
example, based on the above we see that we have 2482 remaining API calls in the current “day”,
and therefore we have made 2500 - 2482 = 8 requests, but only 4 of them were performed in the
current session.