From using xpdf, rvest, and quanteda on United Nations Digital Library search results to applying dictionaries to speeches in United Nations meeting records
How to:
- use rvest and download.file to scrape the pdfs from the search results
- use quanteda to create a dictionary to see which speakers/organisations are talking about a topic of interest the most

I had to do this for a project, so I thought I’d share my code to save someone else the pain of having to figure this out from scratch.
Find the guide here (it’s an R Markdown file). I’ve set the working directories in the Markdown file to make it work if you clone the whole repository to your ~/Desktop and run the code; the pdf and txt folders are empty and ready to receive downloads of the pdf files/converted txt files. If you’d like to see the nicely rendered html result of the Markdown file, you’ll need to first clone/download the repository.
I’ve included some random search results in the results.csv file for you to experiment with.
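
To give a rough idea of the steps listed in the how-to, here is a minimal sketch. It is not the code from the guide. The inline HTML page, the example.org links, the download_pdfs helper, the three made-up speeches and the dictionary terms are all assumptions for illustration, and the pdf-to-txt conversion step is left out entirely.

```r
library(rvest)     # read_html(), html_elements(), html_attr()
library(quanteda)  # corpus(), tokens(), dictionary(), tokens_lookup(), dfm()

# Step 1: pull pdf links out of a page.
# A tiny inline HTML snippet stands in for a real search results page.
page <- read_html('<html><body>
  <a href="https://example.org/files/meeting_1.pdf">Record 1</a>
  <a href="https://example.org/about">About</a>
  <a href="https://example.org/files/meeting_2.pdf">Record 2</a>
</body></html>')

# html_elements() needs rvest >= 1.0; older versions call it html_nodes()
links <- html_attr(html_elements(page, "a"), "href")
pdf_links <- links[grepl("\\.pdf$", links, ignore.case = TRUE)]

# Step 2: download each pdf into a local folder (hypothetical helper).
download_pdfs <- function(urls, dest_dir = "pdf") {
  dir.create(dest_dir, showWarnings = FALSE)
  for (u in urls) {
    # mode = "wb" keeps binary files intact on Windows
    download.file(u, file.path(dest_dir, basename(u)), mode = "wb")
  }
}
# download_pdfs(pdf_links)  # not run here: the example.org links are placeholders

# Step 3: apply a dictionary to speeches (made-up text, one entry per speaker).
speeches <- c(
  "Delegate A" = "Climate change and rising emissions threaten small islands.",
  "Delegate B" = "We welcome the peacekeeping mandate and the budget.",
  "Delegate C" = "Emissions targets matter, and climate finance and climate adaptation matter too."
)
toks <- tokens(corpus(speeches), remove_punct = TRUE)

# Glob patterns are the default, so "emission*" also matches "emissions"
topic_dict <- dictionary(list(climate = c("climate", "emission*")))
topic_dfm <- dfm(tokens_lookup(toks, topic_dict))

# Dictionary hits per speaker, most hits first
counts <- as.matrix(topic_dfm)[, "climate"]
sort(counts, decreasing = TRUE)
```

With these made-up speeches, Delegate C comes out on top of the sorted counts and Delegate B has no hits at all. In a real run, the speeches would come from the converted txt files rather than a hand-written vector.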
This will be quite a detailed walkthrough; for a more advanced alternative try this guide by Dr Pablo Barberá.
Many thanks to Dr Pablo Barberá and Dr Gokhan Ciflikli for their invaluable help and advice.
Feedback would be greatly appreciated!