---
title: "Area Under the Curve Speed Test"
---

# Benchmarking Area Under the Curve

## Background

I had a very similar problem only last month. I created this **R Markdown** notebook to track and test the answers from a recent discussion on StackOverflow, where I asked:

> I want to estimate the likelihood that a randomly selected item from one group will have a higher score than a randomly selected item from a different group; that is, the *Probability of Superiority*, sometimes called the *Common-language Effect Size*. See, for example: https://rpsychologist.com/d3/cohend/. This can be solved algebraically if we accept that the data are normally distributed (McGraw & Wong, 1992, *Psychological Bulletin*, 111, doi: 10.1037/0033-2909.111.2.361), but I know that my data are not normally distributed, making such an estimate unreliable.
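For reference, the normal-theory estimate mentioned in the question takes only a few lines of base R. This is a sketch of the McGraw & Wong formula; `cles_normal` is my own name for it, not a function used in the benchmarks below:

```r
# Common-language effect size under normality (McGraw & Wong, 1992):
# P(A > B) = pnorm((mean_A - mean_B) / sqrt(var_A + var_B))
cles_normal <- function(a, b) {
  pnorm((mean(a) - mean(b)) / sqrt(var(a) + var(b)))
}
```

With two identical groups this returns exactly 0.5, as it should; the estimate is only trustworthy when both groups really are normal.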

The resulting discussion produced several alternatives, most of which were a great deal better than my first efforts.

This notebook runs benchmarking tests to compare the relative speed of each algorithm across a range of sample sizes.

# Setup

### Load required libraries

```
# for analysis
library(data.table)
library(bigstatsr)
# for plotting
library(ggplot2) # pretty plots
library(scales) # commas in large numbers
# for recording & comparing runtimes
library(microbenchmark)
# for output to PDF & HTML
library(tinytex)
options(width = 90)
```

## Functions to test

Eight alternative functions are benchmarked against each other, each run 100 times at a prespecified sample size. Summary tables report absolute and relative times, and the results are then presented as violin plots.

### The functions

- My original For-If-Else loop
- `expand.grid()`: bring the pairs together in a data frame
- `CJ` (Cross Join): a data.table formed from the cross product of the two vectors
- Cartesian product of data.tables
- `wilcox.test()` from base R
- Modified Wilcox: the Wilcoxon statistic computed directly from ranks
- Base R AUC, taking advantage of properties of the matrix outer product
- AUC from **bigstatsr**
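As a concrete check of the definition, the quantity every function below estimates can be computed by brute force on a tiny example (my own illustration, not one of the benchmarked functions):

```r
# two items per group: the pairs are (2,1) (2,4) (4,1) (4,4)
a <- c(2, 4)
b <- c(1, 4)
# wins count 1, ties count 0.5: (1 + 0 + 1 + 0.5) / 4
ps <- mean(outer(a, b, ">") + 0.5 * outer(a, b, "=="))
ps # 0.625
```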

#### My Original For-If-Else function

```
MyForIfElse.function <- function() {
  bigger <- 0
  equal <- 0
  for (i in alpha) {
    for (j in beta) {
      if (i > j) {
        bigger <- bigger + 1
      } else if (i == j) {
        equal <- equal + 0.5
      }
    }
  }
  PaGTb <- (bigger + equal) / nPairs
  PaGTb
}
```

#### Expand.grid function

```
Myexpandgrid.function <- function() {
  # My second attempt ("pairs" rather than "c", to avoid masking base::c)
  pairs <- expand.grid(a = alpha, b = beta)
  aGTb <- length(which(pairs$a > pairs$b))
  aEQb <- length(which(pairs$a == pairs$b))
  aGTb <- aGTb + (0.5 * aEQb)
  PaGTb <- aGTb / nPairs
  PaGTb
}
```

#### Cross Join function

```
MyCJ.function <- function() {
  data.table::CJ(a = alpha, b = beta, sorted = FALSE)[, {
    aGTb <- sum(a > b)
    aEQb <- sum(a == b)
    aGTb <- aGTb + 0.5 * aEQb
    PaGTb <- aGTb / nPairs # data.table returns the last value in {}
  }]
}
```

#### bigstatsr AUC function

```
MyAUC.function <- function() {
  # label the alpha observations 1 and the beta observations 0
  bigstatsr::AUC(c(alpha, beta), rep(1:0, c(length(alpha), length(beta))))
}
```
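The second argument to `bigstatsr::AUC()` is a vector of 0/1 class labels. A quick sketch of how the `rep()` call builds it (my own illustration, with tiny group sizes):

```r
# rep(1:0, c(2, 3)): the value 1 repeated twice (labels for a group
# of 2 alpha values), then 0 repeated three times (labels for a group
# of 3 beta values), matching the ordering of c(alpha, beta)
labels <- rep(1:0, c(2, 3))
labels # 1 1 0 0 0
```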

#### DataTable function

```
MyDT.function <- function() {
  # frequency tables of each group
  DTa <- data.table(a = alpha)[, .N, keyby = a]
  DTb <- data.table(b = beta)[, .N, keyby = b]
  # non-equi join counts pairs with b < a; equi join counts ties
  (DTb[DTa, on = .(b < a), allow.cartesian = TRUE,
       nomatch = 0L, sum(N * i.N)] +
     0.5 * DTb[DTa, on = .(b = a), allow.cartesian = TRUE,
               nomatch = 0L, sum(N * i.N)]) / nPairs
}
```
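To see why the two joins add up to the pair count, here is the same idea on a tiny example (my own sketch; the data values are arbitrary):

```r
library(data.table)

# frequency tables for two tiny groups
DTa <- data.table(a = c(2, 4))[, .N, keyby = a]
DTb <- data.table(b = c(1, 4))[, .N, keyby = b]
# non-equi join: pairs where b < a (two of them: 2 > 1 and 4 > 1)
bigger <- DTb[DTa, on = .(b < a), allow.cartesian = TRUE,
              nomatch = 0L, sum(N * i.N)]
# equi join: tied pairs (one of them: 4 == 4)
ties <- DTb[DTa, on = .(b = a), allow.cartesian = TRUE,
            nomatch = 0L, sum(N * i.N)]
(bigger + 0.5 * ties) / 4 # 0.625, matching the brute-force count
```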

#### Wilcoxon test function

```
MyWilcox.function <- function() {
  wilcox.test(alpha, beta)$statistic / nPairs
}
```

#### Modified Wilcox

```
wilcox_AUC <- function() {
  r <- rank(c(alpha, beta))
  n.x <- length(alpha)
  n.y <- length(beta)
  (sum(r[seq_along(alpha)]) - n.x * (n.x + 1) / 2) / n.x / n.y
}
```
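The shortcut works because of the rank-sum identity $U = R_A - n_x(n_x+1)/2$, where $R_A$ is the sum of group A's ranks in the pooled sample, and AUC $= U / (n_x n_y)$. A quick self-contained check against the brute-force pair count (my own, with arbitrary data):

```r
a0 <- c(1, 3, 5)
b0 <- c(2, 3, 4)
# brute force: wins plus half-ties over all 9 pairs
brute <- mean(outer(a0, b0, ">") + 0.5 * outer(a0, b0, "=="))
# rank-sum shortcut, the same formula as wilcox_AUC() above
r <- rank(c(a0, b0))
nx <- length(a0)
ny <- length(b0)
shortcut <- (sum(r[seq_along(a0)]) - nx * (nx + 1) / 2) / nx / ny
c(brute, shortcut) # both 0.5
```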

#### Base AUC

```
AUC2 <- function() {
  # tabulate() does not count values <= 0, so shift everything into
  # positive integer bins; note that tabulate() truncates non-integer
  # values, so with continuous data ties are counted per integer bin
  m <- min(c(alpha, beta))
  A_Freq <- tabulate(alpha - m + 1)
  B_Freq <- tabulate(beta - m + 1)
  # outer product of the two frequency tables
  out_matrix <- outer(A_Freq, B_Freq)
  # cells below the diagonal have a > b; diagonal cells are ties
  bigger <- sum(out_matrix[lower.tri(out_matrix)])
  equal <- 0.5 * sum(diag(out_matrix))
  (bigger + equal) / length(alpha) / length(beta)
}
```

## Benchmark all of the Functions

```
MyBenchmark.function <- function(x) {
  mbm <<- microbenchmark(
    "ForIfElse" = MyForIfElse.function(),
    "expandgrid" = Myexpandgrid.function(),
    "CrossJoin" = MyCJ.function(),
    "CartesianDT" = MyDT.function(),
    "Wilcox" = MyWilcox.function(),
    "BaseWilcox" = wilcox_AUC(),
    "BigStatsAUC" = MyAUC.function(),
    "BaseAUC" = AUC2(),
    times = x,
    unit = "ns"
  )
}
```

# Charting functions

## Function to visualise the data distributions

```
MyDistPlot <- function() {
  a <- data.frame(val = alpha, id = "alpha")
  b <- data.frame(val = beta, id = "beta")
  both <- rbind(a, b)
  p2 <- ggplot(data = both, aes(x = val, group = id, fill = id)) +
    geom_density(adjust = 1.5, alpha = .2) +
    theme_classic() +
    labs(title = NULL,
         subtitle = paste("Sample sizes =", MyRows),
         caption = paste("AUC = Alpha:", round(PaGTb, 2) * 100,
                         ", Beta:", round(1 - PaGTb, 2) * 100,
                         "\nNote truncated x-axis: values can range ± 30")) +
    theme(legend.position = c(0.9, 0.9)) +
    xlim(-8, 8)
  p2
}
```

## Function to plot results of simulation

```
MyPlot.function <- function() {
  ggplot(mbm, aes(expr, time)) +
    geom_violin(scale = "width", trim = TRUE,
                fill = "blue", alpha = 0.1,
                colour = "#999999") +
    geom_jitter(height = 0, width = 0.2,
                alpha = 0.1,
                colour = "navy") +
    theme_classic() +
    scale_y_log10() +
    labs(title = "Speed test: alternative Probability of Superiority calculations",
         subtitle = paste(comma(MyRows), "x",
                          comma(MyRows), "=",
                          comma(nPairs), "comparisons"),
         y = "Nanoseconds (log10)", x = NULL) +
    coord_flip()
}
```

# Generate some artificial data

## Generating data

In this example, two non-normal data sets are compared, so it is not obvious in advance which group should have the higher probability of superiority. Group Alpha is heavy-tailed, with many large negative and positive values, while group Beta follows a shifted $\chi^2$ distribution (df = 1), so it is highly skewed.

```
# for reproducibility
set.seed(42)
# standard normal distribution, cubed: symmetric but heavy-tailed
MakeAlpha <- function() {rnorm(MyRows, mean = 0, sd = 1)^3}
# standard normal squared (chi-squared, df = 1), shifted down by 0.6
MakeBeta <- function() {rnorm(MyRows, mean = 0, sd = 1)^2 - 0.6}
```

# Run the simulation

#### Change the number in `MyRows` for different sample sizes

Note: large sample sizes can take a long time to run.

```
MyRows <- 25
```

The complete benchmark is run 100 times.

*(Reduce this number for a faster result)*

```
alpha <- MakeAlpha()
beta <- MakeBeta()
nPairs <- length(alpha) * length(beta)
# For interest, we calculate the sample Probability of Superiority
# (Area Under the Curve)
PaGTb <- AUC2()
MyDistPlot()
```

## Print summary results and visualise benchmarks

```
MyBenchmark.function(100)
# print comparison speed test table
print(mbm, unit = "ms")
print(mbm, unit = "relative")
# Visualise the comparison
MyPlot.function()
```

### Interpretation of small sample experiment

At this small sample size, the original ForIfElse algorithm works about as well as, or better than, the other options.

The Modified Wilcox and Base R AUC algorithms are the fastest most often.

# Rerun with larger sample size

```
MyRows <- 1000
set.seed(42)
alpha <- MakeAlpha()
beta <- MakeBeta()
nPairs <- length(alpha) * length(beta)
PaGTb <- MyAUC.function()
MyDistPlot()
MyBenchmark.function(100)
# print comparison benchmark tables
print(mbm, unit = "ms")
print(mbm, unit = "relative")
MyPlot.function()
```

### Interpretation of larger sample experiment

The Modified Wilcox, bigstatsr AUC and Base R AUC algorithms are the fastest most often.

The ForIfElse algorithm is now among the poorer performers.

# Rerun with much larger sample size

#### This can take some time ... You may want to go make a cup of tea while it's running

```
MyRows <- 5000
set.seed(42)
alpha <- MakeAlpha()
beta <- MakeBeta()
nPairs <- length(alpha) * length(beta)
PaGTb <- MyAUC.function()
MyDistPlot()
MyBenchmark.function(100)
# print comparison benchmark tables
print(mbm, unit = "ms")
print(mbm, unit = "relative")
MyPlot.function()
```

### Interpretation of the much larger sample experiment

Base R AUC, followed by Modified Wilcox and bigstatsr AUC, are the fastest most often.

Base R AUC runs at roughly 5,000 times the speed of the ForIfElse algorithm.

# Conclusions

The *Area Under the Curve (AUC)* function from the **bigstatsr** package *(Statistical Tools for Filebacked Big Matrices)* by **Florian Privé** (Grenoble, France) was extraordinarily efficient.

The two close winners were created by **Cole**, who prefers to remain anonymous, with significant input from **chinsoon12** (Singapore). These were:

- a Wilcoxon statistic extracted from the base R `wilcox.test()` source, stripped down to this single task rather than a general statistical routine.
- an AUC function that relies on `tabulate()` and the matrix outer product.

Both algorithms are attractive because they rely only on **Base R**.