This is a useful and practical data analysis project for new Python users who want to improve their skills. Whatever is worth doing is worth doing well!
Data:
This project uses the NYC taxi data from the period before July 2016, described and available at nyc.gov. The data is also available through Google BigQuery, in smaller samples, or from archive.org.
You need to clean the raw data first: strip the whitespace from the column names, combine the trip data and fare data into one CSV, and remove rows with unreasonable values.
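The cleaning steps above can be sketched with pandas. This is only a minimal illustration under stated assumptions: the column names (`medallion`, `pickup_datetime`, `trip_distance`, `fare_amount`), the sample rows, and the "greater than zero" thresholds are made up for the example, and combining the two files with a key-based merge is one possible reading of "combine"; adapt all of these to the real files.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for the raw trip and fare files.
# The rows and column names are hypothetical; note the stray spaces in the headers.
trip_csv = io.StringIO(
    "medallion, pickup_datetime, trip_distance\n"
    "A1,2013-01-01 00:00:00,1.2\n"
    "B2,2013-01-01 00:05:00,-3.0\n"
)
fare_csv = io.StringIO(
    "medallion, pickup_datetime, fare_amount\n"
    "A1,2013-01-01 00:00:00,6.5\n"
    "B2,2013-01-01 00:05:00,9.0\n"
)


def load_with_clean_columns(source):
    """Read a CSV and strip leading/trailing whitespace from its column names."""
    df = pd.read_csv(source)
    df.columns = df.columns.str.strip()
    return df


trips = load_with_clean_columns(trip_csv)
fares = load_with_clean_columns(fare_csv)

# Combine trip and fare records that describe the same ride (assumed shared keys).
merged = trips.merge(fares, on=["medallion", "pickup_datetime"], how="inner")

# Drop rows with unreasonable values; these thresholds are illustrative only.
cleaned = merged[(merged["trip_distance"] > 0) & (merged["fare_amount"] > 0)]

# Serialize the result as a single CSV (pass a file path to write it to disk).
csv_text = cleaned.to_csv(index=False)
print(csv_text)
```

With the real data you would pass file paths to `load_with_clean_columns` instead of the in-memory buffers, and add further filters (for example on coordinates or passenger counts) as needed.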
The original taxi zone shapefile uses the projected coordinate system EPSG:2263 (NAD_1983_StatePlane_New_York_Long_Island_FIPS_3104_Feet). However, the longitude and latitude values in the CSV data
are in the geographic coordinate system EPSG:4326 (GCS_WGS_1984), so you need to transform coordinates between the two systems to find the LocationID for each pickup and drop-off.
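As a rough sketch of the transformation step, the snippet below converts one longitude/latitude pair from EPSG:4326 to EPSG:2263 with the `pyproj` package. Using `pyproj` is an assumption for illustration (the project may rely on a different package), and the sample point is an arbitrary location in lower Manhattan rather than a row from the data.

```python
from pyproj import Transformer

# Build a transformer from WGS 84 (EPSG:4326) to NY Long Island State Plane, feet (EPSG:2263).
# always_xy=True makes the input order (longitude, latitude) and the output order (x, y).
to_state_plane = Transformer.from_crs("EPSG:4326", "EPSG:2263", always_xy=True)

# Hypothetical pickup point in lower Manhattan.
lon, lat = -74.0060, 40.7128

x_ft, y_ft = to_state_plane.transform(lon, lat)
print(f"x = {x_ft:.1f} ft, y = {y_ft:.1f} ft")
```

Once the points are in the same coordinate system as the taxi zone polygons, a point-in-polygon test against each zone gives the LocationID. Reprojecting the zone polygons to EPSG:4326 instead would work equally well; the choice mainly depends on which dataset is cheaper to transform.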
You can use the following Python packages: