My attempts at using the R language to collect, save, and visualize daily police reports, which are listed on the Newport News Police Open Data page.
Using the daily-collection.R file, you can run the runDailyCollection function to automatically download all the daily Newport News Police Open Data reports and append them to CSV data sets. Note that you must pass the function the path to the directory where these repo files are stored.
repo <- "/Users/adam/Documents/Newport News Open Police Data"
source(file.path(repo, "daily-collection.R"))
runDailyCollection(repo)
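The download itself happens inside runDailyCollection, whose code isn't reproduced here. To illustrate only the "append to CSV data sets" step, here is a minimal base-R sketch that appends one day's records to a running CSV file and drops exact duplicate rows, so re-running a collection for the same day doesn't double-count. The function name appendDailyReports is hypothetical, and the sketch assumes each day's data arrives as a data frame with the same column names as the existing CSV.

```r
# Hypothetical helper, not the repo's code: append one day's reports to a
# cumulative CSV file and drop rows that are already stored there.
appendDailyReports <- function(daily, csvPath) {
  # Store everything as text so re-reading the file can't change column types
  daily[] <- lapply(daily, as.character)
  if (file.exists(csvPath)) {
    existing <- read.csv(csvPath, colClasses = "character", check.names = FALSE)
    # rbind() on data frames matches columns by name
    daily <- rbind(existing, daily)
  }
  # unique() removes exact duplicate rows, e.g. from re-running the same day
  combined <- unique(daily)
  write.csv(combined, csvPath, row.names = FALSE)
  invisible(nrow(combined))
}
```

Keeping every column as character is a deliberate choice in this sketch: it avoids type-guessing differences between the fresh download and the re-read file, which would otherwise make identical rows look different to unique().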
If you’re savvy, you could use a scheduler like cron to run this daily, shortly after midnight. This would let you build your own daily snapshot of police reports.
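Under cron, R code is usually run through Rscript. Instead of hard-coding the repo path, a small wrapper can take it as a command-line argument. The sketch below is hypothetical (the file name run-daily.R and the function runDailyFromArgs are not part of this repo); it only sources daily-collection.R and calls runDailyCollection with the repo path, as in the example above.

```r
# Hypothetical wrapper script (e.g. run-daily.R); not part of this repo.
# Intended use: Rscript run-daily.R "/path/to/Newport News Open Police Data"
runDailyFromArgs <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) < 1) {
    stop("Pass the repo directory as the first argument")
  }
  # Fail early with a clear error if the directory doesn't exist
  repo <- normalizePath(args[[1]], mustWork = TRUE)
  # Source daily-collection.R into its own environment, then call
  # runDailyCollection() from there with the repo path
  env <- new.env()
  source(file.path(repo, "daily-collection.R"), local = env)
  env$runDailyCollection(repo)
}
```

The wrapper file would end with a single runDailyFromArgs() call. A crontab entry along the lines of 15 0 * * * /path/to/Rscript /path/to/run-daily.R "/path/to/repo" would then run it at 12:15 a.m. each day; cron starts jobs with a minimal PATH, so giving Rscript's full path is usually safer.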
Using the plot-reports.R file, you can run the plotReports function to automatically plot all police activity stored in the CSV data sets on a Leaflet map. It saves the plot in an HTML file that you can view from a web server. I use MAMP on my Mac for this; on Windows, you can use WAMP or XAMPP to serve the HTML Leaflet plot. Note that you must pass the function the path to the directory where these repo files are stored.
You can view a sample of the Leaflet plot here.
repo <- "/Users/adam/Documents/Newport News Open Police Data"
source(file.path(repo, "plot-reports.R"))
plotReports(repo)
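plotReports does this work in the repo, and its internals aren't shown here. For readers curious about the general approach, below is a minimal sketch using the leaflet and htmlwidgets packages. Both function names are hypothetical, and the coordinate column names lat and lon are assumptions, not necessarily the headers in this repo's CSV files.

```r
# Hypothetical sketch, not the repo's plotReports(). Assumes the CSVs share the
# same columns, including latitude/longitude columns named "lat" and "lon".
loadMappableReports <- function(csvFiles) {
  frames <- lapply(csvFiles, read.csv, stringsAsFactors = FALSE)
  reports <- do.call(rbind, frames)
  # Coerce coordinates to numbers; unparseable values become NA
  reports$lat <- suppressWarnings(as.numeric(reports$lat))
  reports$lon <- suppressWarnings(as.numeric(reports$lon))
  # Drop rows that can't be placed on the map
  reports[!is.na(reports$lat) & !is.na(reports$lon), , drop = FALSE]
}

plotToHtml <- function(reports, outFile = "index.html") {
  if (!requireNamespace("leaflet", quietly = TRUE) ||
      !requireNamespace("htmlwidgets", quietly = TRUE)) {
    stop("The leaflet and htmlwidgets packages are required")
  }
  map <- leaflet::leaflet(reports)
  map <- leaflet::addTiles(map)  # default OpenStreetMap tiles
  map <- leaflet::addCircleMarkers(map, lng = ~lon, lat = ~lat, radius = 3)
  # selfcontained = TRUE asks for a single HTML file with the assets inlined
  htmlwidgets::saveWidget(map, outFile, selfcontained = TRUE)
  invisible(outFile)
}
```

Usage might look like plotToHtml(loadMappableReports(list.files(repo, pattern = "\\.csv$", full.names = TRUE)), file.path(repo, "index.html")). The saveWidget step is where, in at least some htmlwidgets versions, Pandoc comes into play.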
I’ve included some scripts for automating the entire process of downloading and cleaning the daily CSVs, uploading them to a Google Cloud Storage account (via API keys), and creating the HTML Leaflet plot.
Modify them for your needs, but here’s how they work for me:
cron-job.R: This R script automates the jobs. You must change the repo variable to point to your repo directory.
cronScript: This bash shell script calls the cron-job.R R script to run the jobs. It then pushes the updated index.html Leaflet plot to your fork of this repo. As you’d guess, I call this script daily via cron on my Mac.

Just one caveat…
Before generating the Leaflet plot with R from the command line, I suggest you also install Pandoc. On a Mac, you can install it via Homebrew. Pandoc is responsible for combining and encoding all the Leaflet HTML and JavaScript assets into the single index.html file. Pandoc is included with RStudio’s binaries and runs automatically via the GUI, but it’s unavailable if you’re running R headless.
If you don’t install Pandoc, you’ll still get the index.html file, but you’ll also get a subfolder, index_files, in the repo with all the assets needed to run Leaflet, which is quite a lot of files.
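A headless script can check for Pandoc before deciding how to save the plot. The helper name pandocAvailable is hypothetical. RSTUDIO_PANDOC is the environment variable RStudio sets to point at its bundled Pandoc, so a check like this typically succeeds inside RStudio even when Pandoc isn't on the PATH, and fails in a plain Rscript session unless Pandoc has been installed separately.

```r
# Hypothetical helper: TRUE if a Pandoc install can be found from this session.
pandocAvailable <- function() {
  # Pandoc on the PATH, e.g. after a Homebrew install
  onPath <- nzchar(Sys.which("pandoc"))
  # Inside RStudio, this variable points at the directory of its bundled Pandoc
  rstudioDir <- Sys.getenv("RSTUDIO_PANDOC")
  bundled <- nzchar(rstudioDir) && dir.exists(rstudioDir)
  unname(onPath) || bundled
}
```

A plotting script could then call htmlwidgets::saveWidget(map, "index.html", selfcontained = pandocAvailable()). With selfcontained = FALSE, saveWidget copies the assets into a sibling folder named after the file (index_files by default), which matches the behavior described above.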
Public versions of the appended daily reports are available as CSV files on Google Cloud Storage, which I maintain:
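Because these files are public, they can also be read straight into R. Publicly readable Cloud Storage objects are served over HTTPS at https://storage.googleapis.com/ followed by the bucket name and the object name, so no API key is needed to download them. The helper gcsPublicUrl is hypothetical, and the bucket and file names below are placeholders, not the actual names of these files.

```r
# Hypothetical helper: build the public HTTPS URL for a Cloud Storage object.
gcsPublicUrl <- function(bucket, object) {
  # reserved = FALSE escapes spaces and similar characters but keeps "/"
  # so object names with folder-like prefixes still work
  paste0("https://storage.googleapis.com/", bucket, "/",
         utils::URLencode(object, reserved = FALSE))
}

# Placeholder names; substitute the real bucket and file names:
# reports <- read.csv(gcsPublicUrl("my-bucket", "daily-offenses.csv"))
```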
I still need to incorporate these daily arrest and offense reports, since they list additional charges that don’t show up in the CSV data sets (see Things I’ve learned below):
Daily Offense Reports:
Daily Arrest Reports:
Arrest IDs can have multiple charges, which are all shown on the Daily Arrest Reports. In the Daily Arrest Report (24 hours) on the Newport News Police Open Data page, only the first charge is listed.
The times of arrest on the Daily Arrest Reports do not necessarily match the times listed on the Daily Arrest Report (24 hours). For my purposes, the earliest times will be kept.
The Daily Offense Reports show the Beat number, whereas the Daily Offenses report on the Newport News Police Open Data page does not.
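The two arrest rules above (list every charge for an arrest ID, and keep the earliest reported time) amount to a per-ID summary. This base-R sketch stacks rows from both sources and collapses each arrest ID to one row. The function name mergeArrests and the column names arrest_id, charge, and arrest_time (stored as POSIXct) are assumptions for illustration, not the headers used in the actual reports.

```r
# Hypothetical sketch: combine arrest rows from the 24-hour open data CSV and
# the Daily Arrest Reports, producing one output row per arrest ID.
mergeArrests <- function(openData, dailyReports) {
  cols <- c("arrest_id", "charge", "arrest_time")
  rows <- rbind(openData[, cols], dailyReports[, cols])
  ids <- unique(rows$arrest_id)
  out <- lapply(ids, function(id) {
    hits <- rows[rows$arrest_id == id, ]
    data.frame(
      arrest_id = id,
      # Keep every distinct charge, in first-seen order
      charges = paste(unique(hits$charge), collapse = "; "),
      # Keep the earliest of the reported times
      arrest_time = min(hits$arrest_time),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, out)
}
```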
Precinct and Beat in data sets
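Since the Daily Offense Reports carry the Beat number, one way to bring it into the offenses data set is a left join on a shared report identifier. The sketch covers Beat only. It assumes both tables have a hypothetical report_number column and that the offense reports have a beat column; whether the actual files share such a key is an assumption, and addBeat is a hypothetical name.

```r
# Hypothetical sketch: add Beat to the offenses data set via a left join.
addBeat <- function(offenses, offenseReports) {
  # Keep one Beat per report number so the join can't duplicate offense rows
  beats <- unique(offenseReports[, c("report_number", "beat")])
  beats <- beats[!duplicated(beats$report_number), ]
  # all.x = TRUE keeps offenses with no matching report (their beat is NA)
  merge(offenses, beats, by = "report_number", all.x = TRUE, sort = FALSE)
}
```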