how to include a c function when modifying shapiro.test

MODIFIED SHAPIRO WILK TEST FUNCTION includes a call to a C function ( C_SWilk).
It is not recognized by RStudio. How do I include C_SWilk ?
I simply want to increase the allowable sample size.

shapiro.testBIG <- function (x) 
{
    DNAME <- deparse1(substitute(x))
    stopifnot(is.numeric(x))
    x <- sort(x[complete.cases(x)])
    n <- length(x)
    if (is.na(n) || n < 5L || n > 10000L) 
        stop("sample size must be between 5 and 10,000")
    rng <- x[n] - x[1L]
    if (rng == 0) 
        stop("all 'x' values are identical")
    if (rng < 1e-10) 
        x <- x/rng
    res <- .Call(C_SWilk, x)
    RVAL <- list(statistic = c(W = res[1]), p.value = res[2], 
        method = "Shapiro-Wilk normality test", data.name = DNAME)
    class(RVAL) <- "htest"
    return(RVAL)
}

Function Call is done here

library(stats)
for(i in 1:11){
v <- paste("x",i,sep="")
data=df2[[v]]
SW <- shapiro.testBIG(data)
print(SW)
}

The good news is

shapiro.testBIG <- function (x) 
{
  DNAME <- deparse1(substitute(x))
  stopifnot(is.numeric(x))
  x <- sort(x[complete.cases(x)])
  n <- length(x)
  if (is.na(n) || n < 5L || n > 10000L) 
    stop("sample size must be between 5 and 10,000")
  rng <- x[n] - x[1L]
  if (rng == 0) 
    stop("all 'x' values are identical")
  if (rng < 1e-10) 
    x <- x/rng
  res <- .Call(stats:::C_SWilk, x)
  RVAL <- list(statistic = c(W = res[1]), p.value = res[2], 
               method = "Shapiro-Wilk normality test", data.name = DNAME)
  class(RVAL) <- "htest"
  return(RVAL)
}

The bad news is

shapiro.testBIG(rnorm(5001))
#> Error in shapiro.testBIG(rnorm(5001)): ifault=2. This should not happen

Created on 2023-03-11 with reprex v2.0.2

ifault typically arises when a C routine begins muddling memory management.

C_SWilk resides in stats.so and has utterly defeated my search-foo to find the associated C source. The best I can do is

stats:::C_SWilk
#> $name
#> [1] "SWilk"
#> 
#> $address
#> <pointer: 0x11ff54400>
#> attr(,"class")
#> [1] "RegisteredNativeSymbol"
#> 
#> $dll
#> DLL name: stats
#> Filename:
#>         /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/stats/libs/stats.so
#> Dynamic lookup: FALSE
#> 
#> $numParameters
#> [1] 1
#> 
#> attr(,"class")
#> [1] "CallRoutine"      "NativeSymbolInfo"

The way in which stats::shapiro.test() calls C_SWilk takes care to impose a range of 5...5000 beforehand

shapiro.test <- function(x)
{
    DNAME <- deparse(substitute(x))
    stopifnot(is.numeric(x))
    x <- sort(x[complete.cases(x)])
    n <- length(x)
    if(is.na(n) || n < 3L || n > 5000L)
	stop("sample size must be between 3 and 5000")
    rng <- x[n] - x[1L]
    if(rng == 0) stop("all 'x' values are identical")
    if(rng < 1e-10) x <- x/rng # rescale to avoid ifault=6 with single version.
    res <- .Call(C_SWilk, x)
    RVAL <- list(statistic = c(W = res[1]), p.value = res[2],
		 method = "Shapiro-Wilk normality test", data.name = DNAME)
    class(RVAL) <- "htest"
    return(RVAL)
}

because the purpose of C_SWilk was to relieve the original FORTRAN limit of 2,000, which in turn relieved the ancestral limit of 50, which you can read about in the papers cited in shapiro.test().

See this S/O post by the estimable Ben Bolker.

UPDATE:

Line 127 in swilk.c has

    if (n > 5000) *ifault = 2;

and it would be possible to increase that in a private version and to make the appropriate adjustments.

However, please also carefully review this additional S/O post on the hazards inherent in pushing Shapiro-Wilk beyond the intended limits.

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.