Convenience Functions, Moving Window Statistics, and Graphics
Dane Van Domelen
vandomed@gmail.com
2020-04-11
This package contains miscellaneous functions that I think are useful
for various purposes, e.g. for:
Running and summarizing statistical simulation studies (sumsim
,iterate
)
Visualizing data (histo
, cart_app
)
Calculating moving/sliding statistics (sliding_cov
, sliding_cor
,moving_mean
)
Doing something convenient (bmi3
, cleancut
ral
)
In this README, I’ll showcase a few functions.
This function creates tables summarizing results of statistical
simulations, providing common metrics of performance like mean bias,
standard deviation, mean standard error, mean squared error, and
confidence interval coverage.
To illustrate, suppose (X_1, …, X_n \sim N(\mu, \sigma^2)), and we
wish to compare two estimators for (\sigma^2): the MLE ((n) in
denominator) vs. the sample variance ((n-1) in denominator).
MLE <- c()
s2 <- c()
for (ii in 1: 1000) {
x <- rnorm(n = 25)
MLE[ii] <- sum((x - mean(x))^2) / 25
s2[ii] <- sum((x - mean(x))^2) / 24
}
kable(sumsim(estimates = cbind(MLE, s2), truth = 1))
Mean bias | SD | MSE | |
---|---|---|---|
MLE | -0.036 | 0.275 | 0.077 |
s2 | 0.004 | 0.286 | 0.082 |
You can request different performance metrics through the statistics
input; some of them, like confidence interval coverage, require
specifying ses
with standard errors.
This function is similar to the base R function hist
, but with two
added features:
Can overlay one or more fitted probability density/mass functions
(PDFs/PMFs) for any univariate distribution supported in R (see?Distributions
).
Can generate more of a barplot type histogram, where each possible
value gets its own bin centered over its value (useful for discrete
variables with not too many possible values).
Here are two examples:
# Histogram for 1,000 values from Bin(8, 0.25)
x <- rbinom(n = 1000, size = 5, prob = 0.25)
histo(x, dis = "binom", size = 5, colors = "blue", points_list = list(type = "b"))
# Histogram for 10,000 values from lognormal(0, 0.35) and various fitted PDFs.
x <- rlnorm(n = 10000, meanlog = 0, sdlog = 0.35)
histo(x, c("lnorm", "norm", "gamma"), main = "X ~ Lognormal(0, 0.35)")
The function moving_mean is one of dozens of moving average functions
available in R. I’m not sure it’s the absolute fastest, but it is much
faster than roll_mean in RcppRoll.
library("RcppRoll")
lengths <- c(10, 100, 1000, 10000)
multiples1 <- multiples2 <- c()
for (ii in 1: 4) {
n <- lengths[ii]
x <- rnorm(n)
medians <- summary(microbenchmark(roll_mean(x, 5), moving_mean(x, 5),
roll_mean(x, n / 5), moving_mean(x, n / 5),
times = 50))$median
multiples1[ii] <- medians[1] / medians[2]
multiples2[ii] <- medians[3] / medians[4]
}
par(mfrow = c(1, 2))
plot(1: 4, multiples1, type = "b", col = "blue", main = "5-unit MA",
ylab = "Speed multiple", xlab = "Vector length", xaxt = "n",
ylim = c(0, max(multiples1) * 1.05))
axis(side = 1, at = 1: 4, labels = lengths)
abline(h = 1)
plot(1: 4, multiples2, type = "b", col = "blue", main = "length(x)/5-unit MA",
ylab = "Speed multiple", xlab = "Vector length", xaxt = "n",
ylim = c(0, max(multiples2) * 1.05))
axis(side = 1, at = 1: 4, labels = lengths)
abline(h = 1)
Whenever I try to use cut
to categorize a continuous variable, I find
myself taking a suboptimal approach: (1) Call cut
without specifyinglabels
, and with arguments I think will create the groups I want
(\Rightarrow) (2) Run table
to see if it worked (\Rightarrow) (3)
Return to (1) if necessary (Rightarrow) (4) Call cut
once again withlabels
specified.
The idea of cleancut
is to provide a simple character string-based
alternative. To illustrate, here’s how you break a continuous variable
into “low” (\< -1), “medium” (-1 to 1, inclusive), and “high” (> 1).
I’ll do it two ways, once without and once with labels:
x <- rnorm(100)
y.nolabels <- cleancut(x, "(-Inf, -1), [-1, 1], [1, Inf)")
y.labels <- cleancut(x, "(-Inf, -1), [-1, 1], [1, Inf)", labels = c("low", "medium", "high"))
table(y.nolabels, y.labels)
low | medium | high | |
---|---|---|---|
(-Inf, -1) | 20 | 0 | 0 |
[-1, 1] | 0 | 64 | 0 |
[1, Inf) | 0 | 0 | 16 |