English is a world language; even among those who have it as a first language there is no uniform way of either speaking or writing it. Like all languages, it's for communication and communication requires equal effort between sender and receiver. Your English is better than my attempts at any of the languages that I've studied.
The data described is not an obvious candidate for logistic regression modeling. It appears that the number of cases is the dependent (response, or outcome) variable, and it sits on the borderline between categorical and continuous. As a rule of thumb, if it takes on more than about 12 distinct values, the conventional approach is to treat it as continuous.
fit <- lm(cases ~ neighborhood + precipitation + temperature, data = dengue)
Or fewer variables could be chosen initially.
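As a rough sketch of starting with fewer variables, a reduced model can be compared against the fuller one with anova(). The dengue data frame below is simulated purely for illustration: the column names follow the formula above, but the values, units, and neighborhood labels are my assumptions, not the original data.

```r
# Simulated stand-in for the dengue data (hypothetical values, illustration only)
set.seed(42)
n <- 90
dengue <- data.frame(
  neighborhood  = factor(rep(c("A", "B", "C"), each = n / 3)),
  precipitation = runif(n, 0, 200),  # assumed unit: mm
  temperature   = runif(n, 20, 35)   # assumed unit: degrees C
)
dengue$cases <- rpois(n, lambda = exp(1 + 0.005 * dengue$precipitation))

# How many distinct values does the response take?
n_distinct <- length(unique(dengue$cases))
print(n_distinct)

# Start with one predictor, then add the rest
fit_small <- lm(cases ~ precipitation, data = dengue)
fit_full  <- lm(cases ~ neighborhood + precipitation + temperature, data = dengue)

# F test of whether the extra terms improve the fit
cmp <- anova(fit_small, fit_full)
print(cmp)
```

A large p-value in the second row of the table would suggest that the extra terms add little, in which case the smaller model is a reasonable starting point.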
Many types of continuous variables follow a Gaussian distribution. A variable with a relatively small number of distinct values, however, especially if reported as integers, may follow a Poisson distribution. An lm model is appropriate in the first case and a glm model in the second, with family = poisson or family = quasipoisson:
d.AD <- data.frame(treatment = gl(3, 3),
                   outcome = gl(3, 1, 9),
                   counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12))
glm.D93 <- glm(counts ~ outcome + treatment, d.AD, family = poisson())
summary(glm.D93)
#>
#> Call:
#> glm(formula = counts ~ outcome + treatment, family = poisson(),
#> data = d.AD)
#>
#> Deviance Residuals:
#> 1 2 3 4 5 6 7 8
#> -0.67125 0.96272 -0.16965 -0.21999 -0.95552 1.04939 0.84715 -0.09167
#> 9
#> -0.96656
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 3.045e+00 1.709e-01 17.815 <2e-16 ***
#> outcome2 -4.543e-01 2.022e-01 -2.247 0.0246 *
#> outcome3 -2.930e-01 1.927e-01 -1.520 0.1285
#> treatment2 1.338e-15 2.000e-01 0.000 1.0000
#> treatment3 1.421e-15 2.000e-01 0.000 1.0000
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 10.5814 on 8 degrees of freedom
#> Residual deviance: 5.1291 on 4 degrees of freedom
#> AIC: 56.761
#>
#> Number of Fisher Scoring iterations: 4
## Quasipoisson: compare with above / example(glm) :
glm.qD93 <- glm(counts ~ outcome + treatment, d.AD, family = quasipoisson())
summary(glm.qD93)
#>
#> Call:
#> glm(formula = counts ~ outcome + treatment, family = quasipoisson(),
#> data = d.AD)
#>
#> Deviance Residuals:
#> 1 2 3 4 5 6 7 8
#> -0.67125 0.96272 -0.16965 -0.21999 -0.95552 1.04939 0.84715 -0.09167
#> 9
#> -0.96656
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.045e+00 1.944e-01 15.665 9.7e-05 ***
#> outcome2 -4.543e-01 2.299e-01 -1.976 0.119
#> outcome3 -2.930e-01 2.192e-01 -1.337 0.252
#> treatment2 1.338e-15 2.274e-01 0.000 1.000
#> treatment3 1.421e-15 2.274e-01 0.000 1.000
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for quasipoisson family taken to be 1.2933)
#>
#> Null deviance: 10.5814 on 8 degrees of freedom
#> Residual deviance: 5.1291 on 4 degrees of freedom
#> AIC: NA
#>
#> Number of Fisher Scoring iterations: 4
(From help(family).)
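One way to choose between the two families is to estimate the dispersion from the Poisson fit: the sum of squared Pearson residuals divided by the residual degrees of freedom. This is a minimal sketch on the same d.AD data. The variable names pearson_chisq, dispersion, se_poisson, and se_quasi are mine.

```r
# Same data and Poisson model as in the help(family) example
d.AD <- data.frame(treatment = gl(3, 3),
                   outcome = gl(3, 1, 9),
                   counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12))
glm.D93 <- glm(counts ~ outcome + treatment, data = d.AD, family = poisson())

# Pearson chi-square statistic divided by residual degrees of freedom
pearson_chisq <- sum(residuals(glm.D93, type = "pearson")^2)
dispersion <- pearson_chisq / df.residual(glm.D93)
print(dispersion)

# Quasipoisson standard errors are the Poisson ones scaled by sqrt(dispersion)
se_poisson <- summary(glm.D93)$coefficients[, "Std. Error"]
se_quasi <- se_poisson * sqrt(dispersion)
print(cbind(se_poisson, se_quasi))
```

The result should match the dispersion parameter reported in the quasipoisson summary (1.2933). A value near 1 is consistent with a plain Poisson model; values well above 1 point toward quasipoisson or another way of handling overdispersion.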
Only if cases were coded as 0 = none and 1 = some would a logistic fit be considered.
lfit <- glm(cases_yn ~ ., family = "binomial", data = dengue)
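To make the 0/1 coding concrete, here is a sketch that derives cases_yn from a count column and fits the logistic model with the predictors named explicitly, so the raw count is not used as a predictor of its own recoding. The dengue data frame is simulated for illustration only; the values and neighborhood labels are assumptions.

```r
# Simulated stand-in for the dengue data (hypothetical values, illustration only)
set.seed(1)
n <- 60
dengue <- data.frame(
  neighborhood  = factor(rep(c("A", "B", "C"), each = n / 3)),
  precipitation = runif(n, 0, 200),
  temperature   = runif(n, 20, 35)
)
dengue$cases <- rpois(n, lambda = exp(-1 + 0.01 * dengue$precipitation))

# Recode the counts: 0 = no cases, 1 = at least one case
dengue$cases_yn <- as.integer(dengue$cases > 0)

# Logistic regression on the binary outcome
lfit <- glm(cases_yn ~ neighborhood + precipitation + temperature,
            family = binomial, data = dengue)

# Fitted probabilities of at least one case
p_hat <- predict(lfit, type = "response")
print(summary(p_hat))
```

Note that collapsing counts to 0/1 discards information about how many cases occurred, which is one reason to prefer the count models above when the counts are available.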
The neighborhood and date variables introduce the potential for spatial and temporal autocorrelation: adjacent neighborhoods may share underlying conditions conducive to disease, and the disease may have seasonality, such that one August, for example, is much like the next. There are tools in the time series domain to deal with the temporal case. I haven't used the spatial equivalents, but I assume that they exist as well. (Of course, if the commonalities by neighborhood are unrelated to location, spatial autocorrelation wouldn't be a concern.)
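For the temporal case, one simple check is the autocorrelation of the model residuals, using acf() from the stats package. The sketch below uses simulated monthly counts with a built-in 12-month cycle (not real dengue data) and compares an intercept-only Poisson fit with one that adds sine and cosine terms for the season. The harmonic terms are just one of several ways to model seasonality, chosen here for brevity.

```r
# Simulated monthly counts with a 12-month cycle (hypothetical, illustration only)
set.seed(7)
months <- 1:72
cases <- rpois(length(months),
               lambda = exp(2 + 0.8 * sin(2 * pi * months / 12)))

# Poisson fit that ignores the season
fit_plain <- glm(cases ~ 1, family = poisson)

# Poisson fit with one pair of harmonic terms for the annual cycle
fit_season <- glm(cases ~ sin(2 * pi * months / 12) + cos(2 * pi * months / 12),
                  family = poisson)

# Residual autocorrelation up to lag 24
acf_plain <- acf(residuals(fit_plain, type = "pearson"),
                 lag.max = 24, plot = FALSE)
acf_season <- acf(residuals(fit_season, type = "pearson"),
                  lag.max = 24, plot = FALSE)

# Element 13 is lag 12, because the first element is lag 0
print(c(plain = acf_plain$acf[13], seasonal = acf_season$acf[13]))
```

A pronounced lag-12 autocorrelation in the residuals of the plain fit that shrinks once the seasonal terms are added would indicate that the season accounts for much of the temporal dependence.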
Finally, neighborhood can be cast as a factor so that the models treat it as a categorical variable. See this description of the forcats package.
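A minimal base R sketch of the factor conversion follows, in case forcats isn't installed. The neighborhood names are hypothetical. forcats provides more convenient helpers for reordering and lumping levels, but factor() and relevel() cover the basics.

```r
# Hypothetical neighborhood labels stored as character strings
neighborhood <- c("Centro", "Norte", "Sur", "Norte", "Centro", "Norte")

# Convert to a factor; levels are sorted alphabetically by default
nb <- factor(neighborhood)
print(levels(nb))

# Choose the reference level that the model coefficients are compared against
nb <- relevel(nb, ref = "Norte")
print(levels(nb))

# Count of observations per neighborhood
print(table(nb))
```

In lm or glm, each non-reference level then gets its own coefficient, interpreted relative to the reference neighborhood.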