The good news is
shapiro.testBIG <- function (x)
{
DNAME <- deparse1(substitute(x))
stopifnot(is.numeric(x))
x <- sort(x[complete.cases(x)])
n <- length(x)
if (is.na(n) || n < 5L || n > 10000L)
stop("sample size must be between 5 and 10,000")
rng <- x[n] - x[1L]
if (rng == 0)
stop("all 'x' values are identical")
if (rng < 1e-10)
x <- x/rng
res <- .Call(stats:::C_SWilk, x)
RVAL <- list(statistic = c(W = res[1]), p.value = res[2],
method = "Shapiro-Wilk normality test", data.name = DNAME)
class(RVAL) <- "htest"
return(RVAL)
}
The bad news is
shapiro.testBIG(rnorm(5001))
#> Error in shapiro.testBIG(rnorm(5001)): ifault=2. This should not happen
Created on 2023-03-11 with reprex v2.0.2
ifault
typically arises when a C
routine begins muddling memory management.
C_SWilk
resides in stats.so
and has utterly defeated my search-foo to find the associated C
source. The best I can do is
stats:::C_SWilk
#> $name
#> [1] "SWilk"
#>
#> $address
#> <pointer: 0x11ff54400>
#> attr(,"class")
#> [1] "RegisteredNativeSymbol"
#>
#> $dll
#> DLL name: stats
#> Filename:
#> /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/stats/libs/stats.so
#> Dynamic lookup: FALSE
#>
#> $numParameters
#> [1] 1
#>
#> attr(,"class")
#> [1] "CallRoutine" "NativeSymbolInfo"
The way in which stats::shapiro.test()
calls C_SWilk
takes care to impose a range of 5...5000 beforehand
shapiro.test <- function(x)
{
DNAME <- deparse(substitute(x))
stopifnot(is.numeric(x))
x <- sort(x[complete.cases(x)])
n <- length(x)
if(is.na(n) || n < 3L || n > 5000L)
stop("sample size must be between 3 and 5000")
rng <- x[n] - x[1L]
if(rng == 0) stop("all 'x' values are identical")
if(rng < 1e-10) x <- x/rng # rescale to avoid ifault=6 with single version.
res <- .Call(C_SWilk, x)
RVAL <- list(statistic = c(W = res[1]), p.value = res[2],
method = "Shapiro-Wilk normality test", data.name = DNAME)
class(RVAL) <- "htest"
return(RVAL)
}
because the purpose of C_SWilk
was to relieve the original FORTRAN
limit of 2,000, which in turn relieved the ancestral limit of 50, which you can read about in the papers cited in shapiro.test()
.
See this S/O post by the estimable Ben Bolker.
UPDATE:
Line 127 in swilk.c
has
if (n > 5000) *ifault = 2;
and it would be possible to increase that in a private version and to make the appropriate adjustments.
However, please also carefully review this additional S/O post on the hazards inherent in pushing Shapiro-Wilk beyond the intended limits.