Using R: Scatter Plot Matrix (SPLOM) and Bar Chart
Kamla Kant Tripathi, CS 725/825, Fall 2017
To plot a SPLOM in R, I chose the VI2 data set, “universityData-csv.csv”. Following the assignment instructions,
I selected three attributes from this data, namely the number of faculty, the number of students, and the number of staff, for US universities only. After refining the data in OpenRefine,
97 rows remained. The resulting csv file is named “vi6.csv”.
Through this SPLOM, I visualize how the numbers of faculty, staff, and students of the US universities relate to one another.
Since there are more than two attributes to visualize, a scatter plot matrix is a good fit for the plot.
# developer: K.K. Tripathi
# date: 10/26/2017
#-------------------------
# scatterplot matrix
# read csv data
data <- read.csv("vi6.csv")
# plot the selected columns (2 to 4) against each other
pairs(data[2:4], pch = 23, col = "red")
# scatterplot title
title("Scatterplot Matrix (SPLOM) of University Data", line = 3)
In the R code above, I first read the csv data file and then passed columns 2 to 4 of the dataset to the ‘pairs’ function. These columns are
the numerical attributes numFaculty, numStaff, and numStudents respectively. In this scatter plot matrix, I used “pch = 23” to draw the plot
points as diamonds, because this shape makes it easier to tell two overlapping points apart.
Additionally, to make the points visually distinct, I set the ‘col’ parameter to red.
The ‘title’ function gives the plot its title, ‘Scatterplot Matrix (SPLOM) of University Data’, and the
‘line’ argument is set to 3 to position the title correctly.
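The column selection described above can also be written with column names instead of positions. The following is a minimal sketch, not the code used for the assignment: it assumes the columns are named numFaculty, numStaff, and numStudents as stated above, the small data frame `uni` holds made-up values standing in for “vi6.csv”, and the title is passed through the ‘main’ argument of ‘pairs’ as an alternative to a separate ‘title’ call. The call to pdf(NULL) is there only so that the sketch runs without opening a plot window.

```r
# Made-up stand-in for vi6.csv (illustrative values only, not the real data)
uni <- data.frame(
  university  = c("A", "B", "C", "D", "E"),
  numFaculty  = c(500, 1200, 800, 2500, 950),
  numStaff    = c(1500, 4000, 2100, 9000, 3000),
  numStudents = c(8000, 25000, 12000, 40000, 15000)
)

# Select the three numeric attributes by name rather than by position
cols <- c("numFaculty", "numStaff", "numStudents")
splom_data <- uni[, cols]

pdf(NULL)  # null graphics device: nothing is written to disk
# pch = 23 draws diamonds; main sets the title of the matrix
pairs(splom_data, pch = 23, col = "red",
      main = "Scatterplot Matrix (SPLOM) of University Data")
invisible(dev.off())
```

Selecting by name keeps the plot correct even if the column order in the csv file changes.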
After plotting the scatter plot matrix, I wanted to list the universities with lower faculty counts in a bar chart in R.
Hence, I refined the same data again and kept only the rows with 1000 or fewer faculty members.
This bar chart shows the faculty count on the x-axis and the US universities on the y-axis.
There are 47 universities (rows) in total in this data file, named ‘vi6barchart.csv’.
With this bar chart, the minimum and maximum faculty counts are easy to read, because a horizontal bar chart makes it easy to compare the
faculty counts of the universities.
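The filter on faculty count could also be applied in R itself instead of in a separate refinement step. This is a hedged sketch only: the data frame `uni` contains made-up values, and the column name numFaculty is assumed to match the csv file.

```r
# Made-up stand-in for the refined university data (illustrative values only)
uni <- data.frame(
  university = c("A", "B", "C", "D"),
  numFaculty = c(500, 1200, 1000, 2500)
)

# Keep only the rows with 1000 or fewer faculty members
small <- subset(uni, numFaculty <= 1000)

# Minimum and maximum faculty count among the remaining universities
faculty_range <- range(small$numFaculty)
```

Here ‘subset’ keeps universities A and C, and ‘faculty_range’ holds the smallest and largest faculty count among them.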
Universities with the highest faculty counts.
# developer: K.K. Tripathi
# date: 10/26/2017
#-------------------------
# bar chart
# barchart() is provided by the lattice package
library(lattice)
# read csv data
data <- read.csv("vi6barchart.csv")
# plot horizontal bar chart: one bar per university, bar length = faculty count
barchart(university ~ numFaculty,
         data = data,
         scales = list(x = list(cex = 0.8)),
         xlab = "Total faculty count",
         ylab = "US universities",
         main = "US Universities with upto total 1000 faculties")
In this bar chart code, I used the csv file ‘vi6barchart.csv’ as input. To plot the chart, I used the ‘barchart’ function from the lattice package, which plots the
‘university’ attribute against the ‘numFaculty’ attribute of the csv data file. The ‘xlab’ and ‘ylab’
arguments label the x and y axes as the total faculty count and the US universities respectively. The ‘main’ argument sets the title of this bar
chart, which is ‘US Universities with upto total 1000 faculties’.
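Since the purpose of the chart is to compare faculty counts, the bars can also be sorted by count. The sketch below is one possible variation under stated assumptions, not the code used for the assignment: the data frame `uni` holds made-up values, and ‘reorder’ from base R is used to order the universities by numFaculty before plotting. The calls to pdf(NULL) and ‘print’ are only there so that the sketch runs as a script without opening a plot window.

```r
library(lattice)  # provides barchart()

# Made-up stand-in for vi6barchart.csv (illustrative values only)
uni <- data.frame(
  university = c("A", "B", "C", "D"),
  numFaculty = c(500, 950, 120, 730)
)

# reorder() turns university into a factor whose levels are sorted by
# faculty count, so the bars are drawn in increasing order of numFaculty
p <- barchart(reorder(university, numFaculty) ~ numFaculty,
              data = uni,
              xlab = "Total faculty count",
              ylab = "US universities")

pdf(NULL)  # null graphics device: nothing is written to disk
print(p)   # lattice plots are objects and must be printed inside scripts
invisible(dev.off())
```

With sorted bars, the universities with the smallest and largest faculty counts sit at opposite ends of the chart.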
The csv data used in this project is taken from the following link: