Project author: banditelol

Project description:
Airtable backup script package
Language: Jupyter Notebook
Repository: git://github.com/banditelol/airscraper.git
Created: 2020-09-03T09:06:42Z
Project home: https://github.com/banditelol/airscraper

License: MIT License



Airscraper


A simple scraper to programmatically download a CSV from any Airtable shared view.
Use it if:

  • You want to download a shared view periodically
  • You don’t mind the shared view being accessible essentially without authorization

Requirements

Because it’s a simple scraper, only a couple of dependencies are needed:

  • BeautifulSoup4
  • Pandas

Installation

```shell
pip install airscraper
```

Build From Source

  • Install build dependencies:
    1. pip install --upgrade pip setuptools wheel
    2. pip install tqdm
    3. pip install --user --upgrade twine
  • Build the package:
    • python setup.py bdist_wheel
  • Install the built package:
    • pip install --upgrade dist/airscraper-0.1-py3-none-any.whl
  • Run it without prefixing the command with python:
    • airscraper [url]

Direct Execution (Testing Purpose)

  • Clone this project
  • Install the requirements
    • pip install -r requirements.txt
  • Run the code
    • python airscraper/airscraper.py [url]

Usage

Create a shared view link, and use that link to download the shared view as CSV. Every [url] in the examples below refers to the shared view link you get from this step.

As CLI

```shell
# Print the result to the terminal
python airscraper/airscraper.py [url]

# Pipe the result to a csv file
python airscraper/airscraper.py [url] > [filename].csv
```
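If you use it for the periodic-download case mentioned above, one option is a cron job. This is a hypothetical crontab fragment (the schedule, the output path, and the assumption that the `airscraper` CLI is on cron's PATH are mine, not from the project):

```shell
# m h dom mon dow  command
# Download the shared view every day at 02:00 and overwrite backup.csv
0 2 * * * airscraper [url] > /path/to/backup.csv
```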

As Python Package

```python
from airscraper import AirScraper

client = AirScraper([url])
data = client.get_table().text

# print the result
print(data)

# save as file
with open('data.csv', 'w') as f:
    f.write(data)

# use it with pandas
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(data), sep=',')
df.head()
```
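The snippet above always writes to `data.csv`; for the periodic-backup use case you may want timestamped files instead. A minimal sketch (the `backup_filename` helper and its date format are my own, not part of airscraper):

```python
from datetime import datetime

def backup_filename(prefix: str, when: datetime) -> str:
    """Build a timestamped csv filename, e.g. mytable_2020-09-03.csv."""
    return f"{prefix}_{when:%Y-%m-%d}.csv"

# Hypothetical usage with the package ([url] is your shared view link):
# client = AirScraper([url])
# with open(backup_filename("mytable", datetime.now()), "w") as f:
#     f.write(client.get_table().text)

print(backup_filename("mytable", datetime(2020, 9, 3)))  # mytable_2020-09-03.csv
```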

Help

```
usage: airscraper [-h] [-l LOCALE] [-tz TIMEZONE] view_url

Download CSV from Airtable Shared View Link, You can pass the result to file
using '> name.csv'

positional arguments:
  view_url              url generated from sharing view using link in airtable

optional arguments:
  -h, --help            show this help message and exit
  -l LOCALE, --locale LOCALE
                        Your locale, default to 'en'
  -tz TIMEZONE, --timezone TIMEZONE
                        Your timezone, use URL encoded string, default to
                        'Asia/Jakarta'
```
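The `-tz` flag expects a URL-encoded timezone string. One way to produce it with the Python standard library (this snippet is illustrative and not part of airscraper):

```python
from urllib.parse import quote

# Percent-encode an IANA timezone name for use with the -tz flag
tz = quote("Asia/Jakarta", safe="")
print(tz)  # Asia%2FJakarta
```

You could then pass it as, e.g., `airscraper -tz Asia%2FJakarta [url]`.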

What’s next

Currently I have several things in mind:

  • ✅ Make this an installable package
  • Add support for running it on FaaS platforms (most use cases I can think of are related to this)
  • ✅ Create a proper package that can be imported (so I can use it in my ETL scripts)
  • ✅ Fill in LICENSE and setup.py (to be honest, I had no idea yet what to put into them)
    • It turns out there are a lot of resources out there if you know what to look for :)

Contributing

If you have a similar problem or any idea to improve this package, please let me know in the issues, or hit me up on Twitter @BanditelolRP.

Development

If you’re going to develop it yourself, here’s my overall workflow.

1. Create a virtual environment

I usually use venv on Python 3.8 to create a new virtual environment:

```shell
python -m venv venv
# and activate the environment
source venv/bin/activate
```

2. Install the requirements

Install the necessary requirements, then install the package in editable mode for development:

```shell
pip install wheel pytest -q
pip install -r requirements.txt
pip install -e .
```

3. Play around with the code

You can browse the notebook for an explanation of how it works and some example use cases. I'd really appreciate help with documentation and testing. Have fun!