When I run a regression on a relatively large data set in RStudio, I get the following warning and nonsensical results:
Warning message:
glm.fit: algorithm did not converge
However, when I run the same code in an interactive R session, the model fits without issue. The sessionInfo() output is identical in the two environments, and I have listed it below.
Sample Code:
The code below works in an interactive R session but produces the warning and nonsensical results in RStudio. If I reduce the sample size from 4096 to 4000, the regression works without issue in either environment.
# set seed and create sample data
set.seed(1234)
true_class <- factor(sample(paste0("Class", 1:2),
                            size = 4096,
                            prob = c(.2, .8), replace = TRUE))
true_class <- sort(true_class)
class1_probs <- rbeta(sum(true_class == "Class1"), 4, 1)
class2_probs <- rbeta(sum(true_class == "Class2"), 1, 2.5)
test_set <- data.frame(obs = true_class,
                       Class1 = c(class1_probs, class2_probs))
# run regression analysis
test_set.fit <- glm(obs ~ Class1, data = test_set,
                    family = "binomial")
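In case it helps with diagnosis, here is a minimal sketch of a check that could be run in both environments to compare how the fit behaves. It rebuilds the same sample data so it runs on its own, then refits with `glm.control()` set to print the deviance at each iteration and to allow more iterations than the default of 25. The name `diag_fit` and the value `maxit = 50` are only illustrative choices, not part of my original code.

```r
# Recreate the sample data so this block is self-contained
set.seed(1234)
true_class <- factor(sample(paste0("Class", 1:2),
                            size = 4096,
                            prob = c(.2, .8), replace = TRUE))
true_class <- sort(true_class)
class1_probs <- rbeta(sum(true_class == "Class1"), 4, 1)
class2_probs <- rbeta(sum(true_class == "Class2"), 1, 2.5)
test_set <- data.frame(obs = true_class,
                       Class1 = c(class1_probs, class2_probs))

# Refit with the deviance printed at every iteration (trace = TRUE)
# and a higher iteration cap (maxit = 50 is an arbitrary choice)
diag_fit <- glm(obs ~ Class1, data = test_set,
                family = "binomial",
                control = glm.control(maxit = 50, trace = TRUE))

# Values to compare between the two environments
diag_fit$converged  # logical: did the fitting algorithm converge?
diag_fit$iter       # number of iterations actually used
coef(diag_fit)      # fitted coefficients
```

If the traced deviances or the coefficients differ between the two environments, that would suggest the numerical computation itself differs rather than the data or the model specification.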
I am using the latest version of RStudio Server, version 1.2.5019.
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin/libmkl_rt.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.6.1
I would greatly appreciate any help on this issue.
Thanks,
greg