Project author: banditelol

Project description:
Airtable backup script package
Language: Jupyter Notebook
Repository: git://github.com/banditelol/airscraper.git
Created: 2020-09-03T09:06:42Z
Project home: https://github.com/banditelol/airscraper

License: MIT License



Airscraper


A simple scraper to programmatically download a CSV from any Airtable shared view.
Use it if:

  • You want to download a shared view periodically
  • You don’t mind the shared view being accessible essentially without authorization

Requirements

Because it’s a simple scraper, only a couple of dependencies are needed:

  • BeautifulSoup4
  • Pandas

Installation

```shell
pip install airscraper
```

Build From Source

  • Install build dependencies:
    1. pip install --upgrade pip setuptools wheel
    2. pip install tqdm
    3. pip install --user --upgrade twine
  • Build the package:
    • python setup.py bdist_wheel
  • Install the built package:
    • pip install --upgrade dist/airscraper-0.1-py3-none-any.whl
  • Run it without prefixing the command with python:
    • airscraper [url]

Direct Execution (Testing Purpose)

  • Clone this project
  • Install the requirements
    • pip install -r requirements.txt
  • Run the code
    • python airscraper/airscraper.py [url]

Usage

Create a shared view link, and use that link to download the shared view as CSV. Every [url] in the examples below refers to the shared view link you get from this step.

As CLI

```shell
# Print the result to the terminal
python airscraper/airscraper.py [url]

# Pipe the result to a csv file
python airscraper/airscraper.py [url] > [filename].csv
```
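If you use it for the periodic-download case mentioned above, one option is a cron job. This is a hypothetical crontab fragment (the schedule, the output path, and the assumption that the `airscraper` CLI is on cron's PATH are mine, not from the project):

```shell
# m h dom mon dow  command
# Download the shared view every day at 02:00 and overwrite backup.csv
0 2 * * * airscraper [url] > /path/to/backup.csv
```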

As Python Package

```python
from airscraper import AirScraper

client = AirScraper([url])
data = client.get_table().text

# print the result
print(data)

# save as file
with open('data.csv', 'w') as f:
    f.write(data)

# use it with pandas
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(data), sep=',')
df.head()
```
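The snippet above always writes to `data.csv`; for the periodic-backup use case you may want timestamped files instead. A minimal sketch (the `backup_filename` helper and its date format are my own, not part of airscraper):

```python
from datetime import datetime

def backup_filename(prefix: str, when: datetime) -> str:
    """Build a timestamped csv filename, e.g. mytable_2020-09-03.csv."""
    return f"{prefix}_{when:%Y-%m-%d}.csv"

# Hypothetical usage with the package ([url] is your shared view link):
# client = AirScraper([url])
# with open(backup_filename("mytable", datetime.now()), "w") as f:
#     f.write(client.get_table().text)

print(backup_filename("mytable", datetime(2020, 9, 3)))  # mytable_2020-09-03.csv
```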

Help

```
usage: airscraper [-h] [-l LOCALE] [-tz TIMEZONE] view_url

Download CSV from Airtable Shared View Link, You can pass the result to file
using '> name.csv'

positional arguments:
  view_url              url generated from sharing view using link in airtable

optional arguments:
  -h, --help            show this help message and exit
  -l LOCALE, --locale LOCALE
                        Your locale, default to 'en'
  -tz TIMEZONE, --timezone TIMEZONE
                        Your timezone, use URL encoded string, default to
                        'Asia/Jakarta'
```
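The `-tz` flag expects a URL-encoded timezone string. One way to produce it with the Python standard library (this snippet is illustrative and not part of airscraper):

```python
from urllib.parse import quote

# Percent-encode an IANA timezone name for use with the -tz flag
tz = quote("Asia/Jakarta", safe="")
print(tz)  # Asia%2FJakarta
```

You could then pass it as, e.g., `airscraper -tz Asia%2FJakarta [url]`.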

What’s next

Currently I have several things in mind:

  • ✅ Make this an installable package
  • Add support for running it on FaaS platforms (most use cases I can think of are related to this)
  • ✅ Create a proper package that can be imported (so I can use it in my ETL scripts)
  • ✅ Fill in LICENSE and setup.py (to be honest, I had no idea yet what to put into them)
    • It turns out there are a lot of resources out there if you know what to look for :)

Contributing

If you have a similar problem or any idea to improve this package, please let me know in the issues, or hit me up on Twitter @BanditelolRP.

Development

If you’re going to develop it yourself, here’s my overall workflow.

1. Create a virtual environment

I usually use venv on Python 3.8 to create a new virtual environment:

```shell
python -m venv venv
# and activate the environment
source venv/bin/activate
```

2. Install the requirements

Install the necessary requirements, then install the package in editable mode for development:

```shell
pip install wheel pytest -q
pip install -r requirements.txt
pip install -e .
```

3. Play around with the code

You can browse the notebook for an explanation of how it works and some example use cases. I'd really appreciate help with documentation and testing. Have fun!