How to resolve Inf. $y values when using MASS::boxcox?

Here is a sample of the code I'm running:

```r
meps_analytic2 %>%
  filter(as.logical(yes)) %>%
  MASS::boxcox(sb2, lambda = seq(-1, 1, len = 5), plotit = FALSE, data = .)
#> $x
#> [1] -1.0 -0.5  0.0  0.5  1.0
#>
#> $y
#> [1] Inf Inf Inf Inf Inf
```

I've spent 3+ hours trying to get to the bottom of the issue, including by running is.finite() and is.nan() for all variables that make up sb2, but no infinite values or NaNs are found. Any ideas about what would cause me to keep getting Inf. responses for $y? Thank you in advance!

What happens with

seq(-1,1,length=5)

instead of len=5?

@technocrat Thank you for your response. I am still getting all Inf values for $y after changing len to length.

Here is some more of my code if it is helpful:

meps_2013b <-
  meps_2013 %>%
  filter(age13x >= 18, mnhlth53 > 0, rthlth53 > 0) %>%
  mutate(sex2 = if_else(sex == 1, 1, 0),
         insured = if_else(ins13x == 1, 1, 0)) %>%
  select(age13x, racethx, mnhlth53, rthlth53, ipdis13, insured, sex2)

meps_analytic2 <-
  meps_2013b %>%
  filter(ipdis13 %in% 0:17) %>%
  mutate(yes = ipdis13 >= 1,
         no = ipdis13 == 0)

meps_analytic2 <-
  meps_analytic2 %>%
  select(age13x, racethx, mnhlth53, rthlth53, sex2, no, yes, insured)

meps_analytic2 <-
  meps_analytic2 %>%
  mutate(across(c(age13x, sex2, racethx, rthlth53, mnhlth53), ~ as.factor(.)))

sb2 <- as.formula("yes ~ no + insured + sex2 + racethx + mnhlth53 + rthlth53 + age13x")

meps_analytic2 %>%
  filter(as.logical(yes)) %>%
  MASS::boxcox(sb2, lambda = seq(-1, 1, length = 5), plotit = FALSE, data = .)

And here is output from summary(meps_analytic2):
     age13x       racethx    mnhlth53  rthlth53  sex2       no
 19     :  579   1: 7792   1:9329    1:6252    0:13967   Mode :logical
 26     :  559   2:10196   2:7354    2:8163    1:12212   FALSE:1875
 22     :  539   3: 5435   3:7187    3:7759              TRUE :24304
 21     :  534   4: 2094   4:1887    4:3125
 25     :  533   5:  662   5: 422    5: 880
 29     :  533
 (Other):22902
    yes            insured
 Mode :logical   Min.   :0.0000
 FALSE:24304     1st Qu.:0.0000
 TRUE :1875      Median :1.0000
                 Mean   :0.7225
                 3rd Qu.:1.0000
                 Max.   :1.0000

Ah! boxcox only handles lm and aov objects (or a formula that it passes on to lm), whereas the model in sb2 calls for a glm fit, because the response variable is binary. See my notes on Hosmer, David W., Stanley Lemeshow, and Rodney X. Sturdivant, Applied Logistic Regression, particularly the illustration of the lm diagnostics applied to a binary response variable compared with the diagnostics for a numeric response variable. (Just search for "mpg".)
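
To make the point concrete, here is a minimal sketch on made-up data. The data frame `toy` and its columns are hypothetical stand-ins for meps_analytic2, not the MEPS data. It assumes the Inf values arise because filtering on `yes` leaves a constant response: the linear fit then has zero residuals, so the profile log-likelihood involves the log of 0. The last step fits the same formula with glm and a binomial family, which is one common choice for a binary response rather than the only option.

```r
library(MASS)

set.seed(1)

# Hypothetical toy data: a logical response and one 0/1 predictor
toy <- data.frame(
  yes     = rep(c(TRUE, FALSE), times = c(20, 80)),
  insured = rbinom(100, 1, 0.7)
)

# Keeping only the rows where yes is TRUE leaves a constant response
only_yes <- toy[toy$yes, ]

# boxcox() fits lm() internally; with a constant response the residual
# sum of squares is 0, so every log-likelihood value is infinite
bc <- boxcox(yes ~ insured, data = only_yes,
             lambda = seq(-1, 1, length = 5), plotit = FALSE)
print(bc$y)

# On the unfiltered data the response contains zeros (FALSE), and
# boxcox() refuses to run because it needs a strictly positive response
full_try <- tryCatch(
  boxcox(yes ~ insured, data = toy,
         lambda = seq(-1, 1, length = 5), plotit = FALSE),
  error = function(e) e
)
print(inherits(full_try, "error"))

# A binary response is usually modelled with logistic regression instead
fit <- glm(yes ~ insured, data = toy, family = binomial)
print(coef(fit))
```

If this sketch matches what is happening with your data, no choice of lambda will help, because a Box-Cox transformation is only meaningful for a positive, non-constant numeric response.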
