How can I run this loop?

Hi friends!!!
I have a dataframe like this:

study year test n_controls n_patients age_controls age_patients mean_controls mean_patients sd_controls sd_patients yi vi
Caraquez 1998 n-back 12 15 39.1 23.4 1.01 15.4 31.1 0.9 -0.67601411442671 0.158462871905632
Caraquez 1998 n-back 12 15 39.1 23.4 1.01 54 2.1 7 -9.47933975490664 1.81403485535099
Franzen 2012 n-back 65 56 67.1 66.9 10 47 91.1 5 -0.549607753881664 0.03448997594063
Franzen 2012 n-back 65 56 67.1 66.9 50 67 4.1 2 -5.11910608650243 0.141527903385833
Franzen 2012 n-back 65 56 67.1 66.9 50 67 6.1 2 -3.61295800630745 0.087181698553085
Lieu 2012 digit_span 102 99 21.1 23.4 24 10 21.5 5 0.887578672128736 0.021864622961224
Lieu 2012 digit_span 102 99 21.1 23.4 12 20 1.5 5 -2.17285011427459 0.031649403358949
Lieu 2012 digit_span 102 99 21.1 23.4 12 20 1.9 9 -1.23386292200698 0.023692040401736
McMullin 1990 n-back 10 10 35.6 25.1 1.02 111 0.29 0.29 -363.179151876099 3297.67740893607
McMullin 1990 n-back 10 10 35.6 25.1 1.02 131 0.5 0.29 -304.549916872949 2318.966296683
McMullin 1990 n-back 10 10 35.6 25.1 1.02 1 2.5 0.29 0.01076236528614 0.200002895712664
Meyer 1992 digit_span 75 80 22.2 54.5 56 61 0.9 6.8 -1.00999884488257 0.029123970967734
Meyer 1992 digit_span 75 80 22.2 54.5 56 61 0.9 6.8 -1.00999884488257 0.029123970967734

I have to run rma.mv function from package metafor using the following code:
ma_model_all <- rma.mv(yi, vi, data = all_csv, subset = (study=="McMullin"))
Here, the subset option is used to run the function rma.mv for specific rows (here for "McMullin"). And it gives an output ma_model_all which is run by following code-
RE_table <- coef(summary(ma_model_all))
so that it produces a table like:

           estimate       se        zval      pval      ci.lb     ci.ub
intrcpt -0.03752519 0.447184 -0.08391443 0.9331245 -0.9139897 0.8389393

Now, I want two things:
Firstly, the intrcpt in the RE_table to be replaced by the study name from where it comes (i.e. here McMullin), and

Secondly, run these two commands in a loop and produce this type (RE_model) table for every row name (i.e., Caraquez, Franzen, Lieu, McMullin, Meyer) and produce a single datatframe from all the RE_tables like this:

estimate se zval pval ci.lb ci.ub
Caraquez 0.567525188100193 0.447183987 -0.123914426703275 0.933124469167552 -0.45897017593 0.566565325558915
Franzen 0.12345567525188 0.442383987 0.67777777703275 0.12344469167552 -0.233232317593 0.4434434344
Lieu 0.7895251881 0.1237183987 -0.6666426703275 0.555552446916755 -0.444447593 0.838939325558915
McMullin -0.037525188100193 0.447183989385799 -0.083914426703275 0.933124469167552 -0.9139897017593 0.838939325558915
Meyer 0.53411331 0.554212878 -0.323323121 0.54124469167552 -0.800898917593 0.87777778915

Can you please help me?

Thanks in advance,
DC7

Not sure why my values are different than what you showed (apart from McMullin), but does this work?

study_types <- unique(all_csv$study)
# [1] "Caraquez" "Franzen"  "Lieu"     "McMullin" "Meyer"  

# Create data frame to store
df <- data.frame(matrix(NA, ncol = 7, nrow = 5))
df[, 1] <- study_types
names(df) <- c("Study", names(RE_table)) # just a convenient way to get the col names from your earlier object
df

     Study estimate se zval pval ci.lb ci.ub
1 Caraquez       NA NA   NA   NA    NA    NA
2  Franzen       NA NA   NA   NA    NA    NA
3     Lieu       NA NA   NA   NA    NA    NA
4 McMullin       NA NA   NA   NA    NA    NA
5    Meyer       NA NA   NA   NA    NA    NA

for(i in 1:length(study_types)){
  ma_model <- rma.mv(yi, vi, data = all_csv, subset = (study == study_types[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

df

     Study    estimate        se         zval         pval      ci.lb      ci.ub
1 Caraquez -1.38323940 0.3817494  -3.62342295 2.907298e-04 -2.1314544 -0.6350244
2  Franzen -1.96817463 0.1450495 -13.56898662 6.116178e-42 -2.2524664 -1.6838828
3     Lieu -0.67038982 0.0914624  -7.32967639 2.307091e-13 -0.8496528 -0.4911268
4 McMullin -0.03752519 0.4471840  -0.08391443 9.331245e-01 -0.9139897  0.8389393
5    Meyer -1.00999884 0.1206731  -8.36971327 5.775860e-17 -1.2465137 -0.7734840
1 Like

Many many thanks @bayesian for your response. It works. But, let me tell you some improvement of the code, if possible.

  • Firstly, can we automatize the number of rows of the dataframe df according to the number of unique study? Because, I just give you an example data frame. The original dataframe may be of several hundred rows long. Number of study_types may vary from 100-150.

  • Also, the column names for dfare coming from RE_table that I have provided in the question for convenience. But, it would not be originally provided as RE_table is generated in the loop. So, somehow, is it possible to create the column names of df from the RE_table generated in the loop?

Thanks a lot for givingyour valuable time solving my problem.

Yes it can easily be done. On hindsight I should've done that in the first place. My bad!

  1. To automatically create the number of rows = how many studies you have, just use nrow = length(study_types) instead, which would draw the number from how many study names/authors you have stored in study_types.

  2. To manually specify the column names, you would type them in quotations in a vector like so: names(df) <- c("Study", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub"). I suppose the number of columns will not change so we can leave ncol = 7 when creating the df. But otherwise you can use the same length() way to set column numbers.

1 Like

Thanks a lot @bayesian. But, I don't know why I am facing this error now:

Error in rma.mv(yi, vi, data = all_csv, subset = (study == study_types[i])) :
Processing terminated since k <= 1.

Here's my code: (I have changed the name of first column of all_csv from study to Bacteria)

library(RCurl)
library(bitops)
library(metafor)
library(Formula)
library(dplyr)
library(ggforestplot)

all_csv <- read.csv("~/Desktop/all_es.csv")
Bacteria <- unique(all_csv$Bacteria)



# Create data frame to store
df <- data.frame(matrix(NA, ncol = 7, nrow = length(Bacteria)))
df[, 1] <- Bacteria
names(df) <- c("Bacteria", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub") # just a convenient way to get the col names from your earlier object
df

for(i in 1:length(Bacteria)){
  ma_model <- rma.mv(yi, vi, data = all_csv, subset = (Bacteria == Bacteria[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

I have tried several times with different tricks, nothing is working.
Thanks

I think the issue is because you named the vector of Bacteria names as Bacteria (same name as the column). This causes a problem in the for-loop at this part subset = (Bacteria == Bacteria[i]))

I stored the bacteria names in a differently named vector called bacteria_names and edited the other occurrences accordingly, and it works fine:

names(all_csv)[1] <- "Bacteria"

bacteria_names <- unique(all_csv$Bacteria)

df <- data.frame(matrix(NA, ncol = 7, nrow = length(bacteria_names)))
df[, 1] <- bacteria_names
names(df) <- c("Bacteria", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub")
df

for(i in 1:length(bacteria_names)){
  ma_model <- rma.mv(yi, vi, data = all_csv, subset = (Bacteria == bacteria_names[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

df
  Bacteria    estimate        se         zval         pval      ci.lb      ci.ub
1 Caraquez -1.38323940 0.3817494  -3.62342295 2.907298e-04 -2.1314544 -0.6350244
2  Franzen -1.96817463 0.1450495 -13.56898662 6.116178e-42 -2.2524664 -1.6838828
3     Lieu -0.67038982 0.0914624  -7.32967639 2.307091e-13 -0.8496528 -0.4911268
4 McMullin -0.03752519 0.4471840  -0.08391443 9.331245e-01 -0.9139897  0.8389393
5    Meyer -1.00999884 0.1206731  -8.36971327 5.775860e-17 -1.2465137 -0.7734840

Edit:

While my above discussion still stands (if you named both column and vector Bacteria you will get repeated results), I just noticed your quoted error was with the earlier naming of study and study_types. It seems the Processing terminated since k <= 1. error comes from the metafor package itself rather than an issue with the loop. I suppose the rma.mv function terminates when there is k <= 1 studies left?

1 Like

Yes. @bayesian. I am facing the same issue after renaming as youtold. But problem still exists. I am seeing the loop runs for some rows and terminates. Gives a result like:

 > df
                Bacteria    estimate        se         zval         pval      ci.lb      ci.ub
1     Methanobrevibacter -0.03752519 0.4471840  -0.08391443 9.331245e-01 -0.9139897  0.8389393
2         Methanosphaera -0.98189028 0.1154835  -8.50243170 1.856599e-17 -1.2082337 -0.7555468
3  Methanomassiliicoccus  3.83246261 0.5414903   7.07761968 1.466515e-12  2.7711611  4.8937641
4            Actinomyces  0.75429295 0.5958761   1.26585527 2.055649e-01 -0.4136028  1.9221887
5          Aeriscardovia -1.91938071 0.1637011 -11.72490696 9.500536e-32 -2.2402291 -1.5985324
6        Bifidobacterium  2.50389122 0.2804214   8.92903146 4.297546e-19  1.9542754  3.0535070
7            Gardnerella -0.19910302 0.3238758  -0.61475107 5.387191e-01 -0.8338880  0.4356820
8         Brevibacterium -2.32479799 0.3074499  -7.56155179 3.982888e-14 -2.9273886 -1.7222074
9                 Rothia  0.29377681 0.1113252   2.63890753 8.317366e-03  0.0755835  0.5119701
10            Tropheryma -1.23295572 0.1672961  -7.36989900 1.707574e-13 -1.5608501 -0.9050613
11     Propionibacterium          NA        NA           NA           NA         NA         NA
12             Atopobium          NA        NA           NA           NA         NA         NA
13                Enorma          NA        NA           NA           NA         NA         NA
14         Adlercreutzia          NA        NA           NA           NA         NA         NA
15       Asaccharobacter          NA        NA           NA           NA         NA         NA

I managed to recreate the problem using the old dataset. It happens when you have only one of a type of study/bacteria. For example, if I remove the first row using data = all_csv[-1, ], the problem occurs because there is only one Caraquez left so you cannot meta-analyse it (you need 2 or more Caraquez).

for(i in 1:length(bacteria_names)){
  ma_model <- rma.mv(yi, vi, data = all_csv[-1, ], subset = (Bacteria == bacteria_names[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}
Error in rma.mv(yi, vi, data = all_csv[-1, ], subset = (Bacteria == bacteria_names[i])) : 
  Processing terminated since k <= 1.
1 Like

Ah. ok. Now I understand. Thanks a lot @bayesian. You saved me.

1 Like

Hi @bayesian- I have noticed that the loop abruptly stops in the middle. For example, here is my dataset below. The first bacteria in the dataset is on;y single time present. So, the loop does not even go beyond it. It just stops at the begining giving the following result.

Dataset:

Bacteria year test n_controls n_patients age_controls age_patients mean_controls mean_patients sd_controls sd_patients yi vi country
Adlercreutzia 2012 digit_span 102 99 21.1 23.4 12 20 1.9 9 -1.23386292200698 0.023692040401736 India
Aeriscardovia 2005 digit_span 20 19 65 65.1 2.9 123.1 2.8 8.6 -18.615474202867 4.54539926866026 australia
Aeriscardovia 2000 digit_span 102 120 19.1 23.7 12 25.1 3.02 6 -2.68398773123057 0.034362007472673 Denmark
Aeriscardovia 2005 digit_span 20 19 65 65.1 22.9 13.1 9.8 2.6 1.32355279414728 0.125090450728074 India
Asaccharobacter 2012 n-back 65 56 67.1 66.9 50 67 6.1 2 -3.61295800630745 0.087181698553085 India
Atopobium 2012 n-back 65 56 67.1 66.9 50 67 4.1 2 -5.11910608650243 0.141527903385833 Denmark
Bifidobacterium 2005 n-back 46 37 25.7 24.7 39 9.7 2.8 12.2 3.45698804727594 0.120758725889573 australia
Bifidobacterium 2005 digit_span 20 19 65 65.1 22.9 13.1 2.8 2.6 3.54948861309056 0.264155545799414 Denmark
Bifidobacterium 2005 n-back 46 37 25.7 24.7 39 97 2.8 4.5 -15.7232515863889 1.53804712402133 India
Brevibacterium 2010 n-back 38 48 55.6 54.5 1.7 21 6 0.25 -4.79797305176145 0.18098950307114 australia
Brevibacterium 2007 n-back 12 9 44.4 45 22 18 11 8.8 0.378959033077559 0.197863728938518 Denmark

Result:

> df
                   Study estimate se zval pval ci.lb ci.ub
1          Adlercreutzia       NA NA   NA   NA    NA    NA
2          Aeriscardovia       NA NA   NA   NA    NA    NA
3        Asaccharobacter       NA NA   NA   NA    NA    NA
4              Atopobium       NA NA   NA   NA    NA    NA
5        Bifidobacterium       NA NA   NA   NA    NA    NA
6         Brevibacterium       NA NA   NA   NA    NA    NA
7                 Enorma       NA NA   NA   NA    NA    NA
8            Gardnerella       NA NA   NA   NA    NA    NA
9     Methanobrevibacter       NA NA   NA   NA    NA    NA
10 Methanomassiliicoccus       NA NA   NA   NA    NA    NA
11        Methanosphaera       NA NA   NA   NA    NA    NA
12     Propionibacterium       NA NA   NA   NA    NA    NA
13                Rothia       NA NA   NA   NA    NA    NA
14            Tropheryma       NA NA   NA   NA    NA    NA
15           Actinomyces       NA NA   NA   NA    NA    NA

Is there an way to solve the problem. I don't know earlier I did with the same data and at least the loop completed.

We just have to remove the types of Bacteria that only have one study/instance:

# See which bacteria only has one of its type
library(dplyr)
all_csv %>% group_by(Bacteria) %>% count(sort = TRUE) %>% filter(n == 1) 
# A tibble: 3 x 2
# Groups:   Bacteria [3]
  Bacteria            n
  <chr>           <int>
1 Adlercreutzia       1
2 Asaccharobacter     1
3 Atopobium           1
# Remove these studies
all_csv <- all_csv %>% filter(!Bacteria == "Adlercreutzia" &
                                 !Bacteria == "Asaccharobacter" &
                                 !Bacteria == "Atopobium")

Our usual code

bacteria_names <- unique(all_csv$Bacteria)

df <- data.frame(matrix(NA, ncol = 7, nrow = length(bacteria_names)))
df[, 1] <- bacteria_names
names(df) <- c("Bacteria", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub")
df

for(i in 1:length(bacteria_names)){
  ma_model <- rma.mv(yi, vi, data = all_csv, subset = (Bacteria == bacteria_names[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

df

         Bacteria  estimate        se       zval         pval     ci.lb     ci.ub
1   Aeriscardovia -1.919381 0.1637011 -11.724907 9.500536e-32 -2.240229 -1.598532
2 Bifidobacterium  2.503891 0.2804214   8.929031 4.297546e-19  1.954275  3.053507
3  Brevibacterium -2.324798 0.3074499  -7.561552 3.982888e-14 -2.927389 -1.722207
1 Like

Thanks @bayesian for your constant help. And sorry for continuously bothering you. Is it possible to run the loop for Bacteria >1. I just want to run without removing them. Because for a big dataframe, these number may be super high to write them manually.

Thanks.

No worries. Is it okay if you create a separate data frame for the bacteria that can be meta-analysed? So we preserve the original dataset all_csv.

This will automate the identification and filtering of all the single-instance bacteria, and keeps all the meta-analysable studies in a new df called final_csv:


# Extract a vector of all the single bacteria studies
single_bacteria <- all_csv %>% count(Bacteria, sort = TRUE) %>% 
                   filter(n == 1) %>% select(Bacteria) %>% .$Bacteria
single_bacteria
[1] "Adlercreutzia"   "Asaccharobacter" "Atopobium"

# Keep only those bacteria that have >1 studies without having to type each bacteria out
final_csv <- all_csv[which(!all_csv$Bacteria %in% single_bacteria), ]

# Our usual code, except now all_csv is replaced with final_csv
bacteria_names <- unique(final_csv$Bacteria)

df <- data.frame(matrix(NA, ncol = 7, nrow = length(bacteria_names)))
df[, 1] <- bacteria_names
names(df) <- c("Bacteria", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub")
df

for(i in 1:length(bacteria_names)){
  ma_model <- rma.mv(yi, vi, data = final_csv, subset = (Bacteria == bacteria_names[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

df

         Bacteria  estimate        se       zval         pval     ci.lb     ci.ub
1   Aeriscardovia -1.919381 0.1637011 -11.724907 9.500536e-32 -2.240229 -1.598532
2 Bifidobacterium  2.503891 0.2804214   8.929031 4.297546e-19  1.954275  3.053507
3  Brevibacterium -2.324798 0.3074499  -7.561552 3.982888e-14 -2.927389 -1.722207
1 Like

Many many thanks @bayesian. It is working fine now. And also it seems better than the previous one, as I don;t have to write a long list of bacteria. I really appreciate your help. Thanks again.

Deep

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.