Project author: xshi0001

Project description:
This is a useful and practical data analysis project for new Python users to improve their skills. Whatever is worth doing is worth doing well.
Language: Python
Repository: git://github.com/xshi0001/NYC-taxiAnalysis.git
Created: 2018-06-06T09:15:41Z
Project page: https://github.com/xshi0001/NYC-taxiAnalysis

License:


TLC Trip Record Data, from http://www.nyc.gov

Data:
This project uses NYC taxi data from the period before July 2016, described and available at nyc.gov. The data is also available through Google BigQuery, in smaller samples, or from archive.org.

step1 clean_data

When working with the raw data, you will find that the column names need to be stripped of surrounding whitespace. You also need to combine the trip data and fare data into one CSV and remove rows with unreasonable values.
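
The cleaning step could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the column names (medallion, pickup_datetime, trip_distance, fare_amount and so on), the key columns, and the plausibility thresholds are all assumptions, and it joins the two tables with a key-based merge rather than a plain concatenation, which assumes both files share those key columns. Small in-memory tables stand in for the real CSV files.

```python
import pandas as pd

# Hypothetical key columns assumed to be shared by the trip and fare tables.
KEYS = ["medallion", "pickup_datetime"]


def clean_taxi_data(trip: pd.DataFrame, fare: pd.DataFrame) -> pd.DataFrame:
    """Strip column names, combine trip and fare rows, drop implausible rows."""
    # Remove leading/trailing whitespace from every column name.
    trip = trip.rename(columns=str.strip)
    fare = fare.rename(columns=str.strip)

    # Combine the two tables on their shared key columns (an assumption).
    merged = trip.merge(fare, on=KEYS, how="inner")

    # Illustrative thresholds only; tune them against the real data.
    plausible = (
        (merged["trip_distance"] > 0)
        & (merged["fare_amount"] > 0)
        & merged["pickup_longitude"].between(-75, -72)
        & merged["pickup_latitude"].between(40, 42)
    )
    return merged[plausible].reset_index(drop=True)


# Tiny made-up tables with padded column names, standing in for the raw CSVs.
trip = pd.DataFrame({
    " medallion": ["A", "B", "C"],
    " pickup_datetime": ["2013-01-01 00:00:00"] * 3,
    " trip_distance": [1.2, 0.0, 3.4],
    " pickup_longitude": [-73.98, -73.97, 0.0],
    " pickup_latitude": [40.75, 40.76, 0.0],
})
fare = pd.DataFrame({
    "medallion": ["A", "B", "C"],
    " pickup_datetime": ["2013-01-01 00:00:00"] * 3,
    " fare_amount": [6.5, 5.0, 12.0],
})

cleaned = clean_taxi_data(trip, fare)
```

With real files, the two frames would come from pd.read_csv and the result could be written back out with cleaned.to_csv(..., index=False).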

step2 pick_drop_locationID

The original taxi zone shapefile uses the projected coordinate system EPSG:2263 (NAD_1983_StatePlane_New_York_Long_Island_FIPS_3104_Feet). However, the longitude and latitude in the CSV data
are given in the geographic coordinate system EPSG:4326 (GCS_WGS_1984), so you need a coordinate transformation between the two to find the locationID for each pickup and drop-off.
You can use these Python packages:

  • Shapely - a Python package for set-theoretic analysis and manipulation of planar features using (via Python’s ctypes module) functions from the well known and widely deployed GEOS library.
  • GeoPandas - an open source project to make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types.
  • Fiona - a package that focuses on reading and writing geographic data in standard Python IO style and relies upon familiar Python types and protocols.
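
One way to do the location lookup with these packages is sketched below. It is an illustration under stated assumptions, not the project's own code: the zone polygon is a made-up rectangle standing in for the real shapefile, the column names (LocationID, pickup_longitude, pickup_latitude, pickup_locationID) are assumed, and the predicate argument of sjoin assumes GeoPandas 0.10 or newer. The idea is to reproject the pickup points from EPSG:4326 into the zones' EPSG:2263 and then run a point-in-polygon spatial join.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Stand-in for the taxi zone shapefile: one made-up zone stored in EPSG:2263.
# With the real data this frame would come from gpd.read_file on the .shp file.
zones = gpd.GeoDataFrame(
    {"LocationID": [1]},
    geometry=[box(-74.00, 40.75, -73.98, 40.77)],
    crs="EPSG:4326",
).to_crs("EPSG:2263")

# Stand-in for cleaned CSV rows, with longitude/latitude in EPSG:4326.
trips = pd.DataFrame({
    "pickup_longitude": [-73.9855, -73.7800],
    "pickup_latitude": [40.7580, 40.6400],
})

# Build point geometries in EPSG:4326, then reproject to the zones' CRS.
points = gpd.GeoDataFrame(
    trips,
    geometry=gpd.points_from_xy(trips["pickup_longitude"], trips["pickup_latitude"]),
    crs="EPSG:4326",
).to_crs(zones.crs)

# Left join keeps every trip; points outside all zones get a missing LocationID.
joined = gpd.sjoin(
    points, zones[["LocationID", "geometry"]], how="left", predicate="within"
)
trips["pickup_locationID"] = joined["LocationID"]
```

The same steps would be repeated with the drop-off coordinates to get the drop-off locationID. Reprojecting the zones to EPSG:4326 instead of reprojecting the points would also work; only one side needs to move so that both share a coordinate system.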

Reference: