jackknife mean - all values return NA

Hello! I am trying to do what is apparently called a jackknife (leave-one-out) operation. I asked this question previously on Stack Overflow and was redirected to an answer (courtesy of Dave Armstrong). However, that answer doesn't work for me, and I now think several functions I wrote before failed for the same reason.

My initial question:

The answer:

When I try to run this code

countryedategroup2 %>% mutate(across(c(econw, forpolw, libauthw, demw, urbruralw, ethnicw, educationw, envprow, euintw, decentw), function(x) 
  map_dbl(seq_along(x), function(rn) mean(x[-rn])), 
  .names = "{.col}_mean"))

I get this error:

Error in `mutate()`:
ℹ In argument: `across(...)`.
Caused by error in `across()`:
! Can't compute column `econw_mean`.
Caused by error in `map_dbl()`:
ℹ In index: 1.
Caused by error in `h()`:
! error in evaluating the argument 'x' in selecting a method for function 'mean': object 'x' not found
Run `rlang::last_error()` to see where the error occurred.

So I started stripping this code down. I finally arrived at this:

countryedategroup2 %>% mutate(econwjack = map_dbl(seq_along(econw), function(rn) mean(econw[-rn])))

This returned all NA values. Several for loops I wrote before also returned all NA values, but I assumed I had made a mistake in the code. I now think there may be a problem with the data. What could be the reason? When I do the calculation manually as below:

countryedategroup2 <- countryedategroup2 %>% mutate(
  econwjack = c(mean(countryedategroup2$econw[which(countryedategroup2$econw!=countryedategroup2[1,22])], na.rm = TRUE),
                mean(countryedategroup2$econw[which(countryedategroup2$econw!=countryedategroup2[2,22])], na.rm = TRUE),
                mean(countryedategroup2$econw[which(countryedategroup2$econw!=countryedategroup2[3,22])], na.rm = TRUE),
                mean(countryedategroup2$econw[which(countryedategroup2$econw!=countryedategroup2[4,22])], na.rm = TRUE),
                mean(countryedategroup2$econw[which(countryedategroup2$econw!=countryedategroup2[5,22])], na.rm = TRUE),
                mean(countryedategroup2$econw[which(countryedategroup2$econw!=countryedategroup2[6,22])], na.rm = TRUE),
                mean(countryedategroup2$econw[which(countryedategroup2$econw!=countryedategroup2[7,22])], na.rm = TRUE)))

I get the result below:

  urbruralw_mean ethnichw_mean educationw_mean envprow_mean euintw_mean decentw_mean econwjack
1       4.587094     0.2378293       0.2020795    0.3101839    0.182376    0.1730582 0.4614283
2       4.587094     0.2378293       0.2020795    0.3101839    0.182376    0.1730582 0.5667689
3       4.587094     0.2378293       0.2020795    0.3101839    0.182376    0.1730582       NaN
4       4.587094     0.2378293       0.2020795    0.3101839    0.182376    0.1730582       NaN
5       4.587094     0.2378293       0.2020795    0.3101839    0.182376    0.1730582 0.1909023
6       4.587094     0.2378293       0.2020795    0.3101839    0.182376    0.1730582 0.5870974
7       4.587094     0.2378293       0.2020795    0.3101839    0.182376    0.1730582 0.5881766

I have 10 variables across 159 groups, each with 7 to 15 observations, that need to be computed this way, so writing this out manually is not an option. Thank you very much for your help!

Two problems

econw <- c(0.211374468421053, 0.0495444789473684, 0.0883421052631579, 0.254285829473684, 0.288283657894737, 0.0914804, 0.214830792105263, 0, 0.219519663157895, 0.167761411578947, 0.16954572631579, "NA", "NA", "NA", "NA", "NA", "NA", 0.554445789473684, 0.412379641578948, 0.666462765263158, 0.043414707368421, 0.0141368757894737, 1.04711717157895, 0.0240855263157895, 0.00793917526315789, 0.219133023684211, 0.040332177368421, "NA", "NA", 0.00572921631578947, "NA", "NA", 0.0320958084210526, 0.0740951968421053)
mean(econw)
#> Warning in mean.default(econw): argument is not numeric or logical: returning
#> NA
#> [1] NA
econw <- as.numeric(econw)
#> Warning: NAs introduced by coercion
mean(econw)
#> [1] NA
mean(econw,na.rm = TRUE)
#> [1] 0.204014

Created on 2023-06-24 with reprex v2.0.2

  1. Can't take the mean() of a character vector
  2. Can't take the mean() of a numeric vector containing one or more NA without excluding them

Correct this before moving on to the grouping problem.
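To make the two fixes concrete, here is a minimal base R sketch of a leave-one-out mean on a small made-up vector. The names `econw_chr`, `econw_num` and `jack_mean` are hypothetical and the values are invented for illustration; it only assumes the column holds numbers stored as text, with missing entries written as the string "NA".

```r
# Hypothetical stand-in for one group's column, stored as character
econw_chr <- c("0.2", "0.4", "NA", "0.6")

# Fix 1: convert to numeric so mean() has numbers to work with;
# the string "NA" becomes a real missing value
econw_num <- suppressWarnings(as.numeric(econw_chr))

# Fix 2: leave-one-out mean that drops position i, then ignores
# any remaining missing values via na.rm = TRUE
jack_mean <- function(x) {
  vapply(seq_along(x), function(i) mean(x[-i], na.rm = TRUE), numeric(1))
}

jack <- jack_mean(econw_num)
print(jack)
```

Each element of `jack` is the mean of all the other non-missing values, so a row whose own value is missing still gets a result.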

Thank you very much! Sorry for the basic NA mistake 🙂 Adding na.rm = TRUE worked for the code below.

view(countryedategroup2 %>% mutate(econwjack = map_dbl(seq_along(econw), function(rn) mean(econw[-rn], na.rm = TRUE))))

What I don't understand is that when I run skimr:::skim(countryedategroup2), the econw variable shows as numeric. When I add as.numeric, I still get an error that econw is not found.

> countryedategroup2 %>% mutate(across(as.numeric(c(econw, forpolw, libauthw, demw, urbruralw, ethnicw, educationw, envprow, euintw, decentw)), function(x) 
+   map_dbl(seq_along(x), function(rn) mean(x[-rn], na.rm = TRUE)), 
+   .names = "{.col}_jack"))
Error in `mutate()`:
ℹ In argument: `across(...)`.
Caused by error in `across()`:
! Problem while evaluating `as.numeric(...)`.
Caused by error:
! object 'econw' not found
Run `rlang::last_error()` to see where the error occurred.

However, I am fine with doing this for each variable separately 😃 When I introduce group_by, there are no problems in the bigger data frame.

cee_niche %>% group_by(countryedate) %>%  mutate(econwjack = map_dbl(seq_along(econw), function(rn) mean(econw[-rn], na.rm = TRUE)))

Thanks a lot again for taking the time!
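On the `as.numeric()` attempt above: the first argument of `across()` is a column selection, not data, so `as.numeric(c(econw, ...))` is evaluated as an ordinary expression in which `econw` is not an object, hence "object 'econw' not found". Any conversion has to happen inside the function that `across()` applies. The sketch below shows one way this might look on made-up data; `toy`, `jack_mean` and the values are hypothetical, and only the column names `countryedate`, `econw` and `forpolw` come from the thread.

```r
library(dplyr)

# Hypothetical toy data: two weight columns stored as character, one grouping column
toy <- tibble(
  countryedate = c("a", "a", "a", "b", "b", "b"),
  econw        = c("1", "2", "3", "4", "NA", "6"),
  forpolw      = c("2", "4", "6", "1", "1", "1")
)

# Leave-one-out mean of a vector, ignoring missing values
jack_mean <- function(x) {
  vapply(seq_along(x), function(i) mean(x[-i], na.rm = TRUE), numeric(1))
}

result <- toy %>%
  # Step 1: select columns by name, convert inside the applied function
  mutate(across(c(econw, forpolw), ~ suppressWarnings(as.numeric(.x)))) %>%
  group_by(countryedate) %>%
  # Step 2: add one *_jack column per selected column, computed within each group
  mutate(across(c(econw, forpolw), jack_mean, .names = "{.col}_jack")) %>%
  ungroup()

print(result)
```

If the columns are already numeric, as the skim output suggests, step 1 can be dropped and only the `na.rm = TRUE` part matters.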


The first thing to check is what is being passed into the pipe:

countryedategroup2$econw
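Beyond printing the column, a few quick checks can tell which of the two problems applies. This is only a sketch on a made-up data frame; `toy_df` and its values are hypothetical stand-ins for `countryedategroup2`.

```r
# Hypothetical stand-in for the real data frame
toy_df <- data.frame(econw = c("0.21", "NA", "0.05"), stringsAsFactors = FALSE)

# How is the column stored? "character" points to problem 1
col_class <- class(toy_df$econw)

# Is it usable by mean() as-is?
col_is_numeric <- is.numeric(toy_df$econw)

# How many values are missing once converted? Any at all points to problem 2
n_missing <- sum(is.na(suppressWarnings(as.numeric(toy_df$econw))))

print(col_class)
print(col_is_numeric)
print(n_missing)
```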
