How can I run this loop?

DC7 · December 27, 2020, 10:23pm

Hi friends!!!
I have a dataframe like this:

study	year	test	n_controls	n_patients	age_controls	age_patients	mean_controls	mean_patients	sd_controls	sd_patients	yi	vi
Caraquez	1998	n-back	12	15	39.1	23.4	1.01	15.4	31.1	0.9	-0.67601411442671	0.158462871905632
Caraquez	1998	n-back	12	15	39.1	23.4	1.01	54	2.1	7	-9.47933975490664	1.81403485535099
Franzen	2012	n-back	65	56	67.1	66.9	10	47	91.1	5	-0.549607753881664	0.03448997594063
Franzen	2012	n-back	65	56	67.1	66.9	50	67	4.1	2	-5.11910608650243	0.141527903385833
Franzen	2012	n-back	65	56	67.1	66.9	50	67	6.1	2	-3.61295800630745	0.087181698553085
Lieu	2012	digit_span	102	99	21.1	23.4	24	10	21.5	5	0.887578672128736	0.021864622961224
Lieu	2012	digit_span	102	99	21.1	23.4	12	20	1.5	5	-2.17285011427459	0.031649403358949
Lieu	2012	digit_span	102	99	21.1	23.4	12	20	1.9	9	-1.23386292200698	0.023692040401736
McMullin	1990	n-back	10	10	35.6	25.1	1.02	111	0.29	0.29	-363.179151876099	3297.67740893607
McMullin	1990	n-back	10	10	35.6	25.1	1.02	131	0.5	0.29	-304.549916872949	2318.966296683
McMullin	1990	n-back	10	10	35.6	25.1	1.02	1	2.5	0.29	0.01076236528614	0.200002895712664
Meyer	1992	digit_span	75	80	22.2	54.5	56	61	0.9	6.8	-1.00999884488257	0.029123970967734
Meyer	1992	digit_span	75	80	22.2	54.5	56	61	0.9	6.8	-1.00999884488257	0.029123970967734

I have to run rma.mv function from package metafor using the following code:
ma_model_all <- rma.mv(yi, vi, data = all_csv, subset = (study=="McMullin"))
Here, the subset option is used to run the function rma.mv for specific rows (here for "McMullin"). And it gives an output ma_model_all which is run by following code-
RE_table <- coef(summary(ma_model_all))
so that it produces a table like:

           estimate       se        zval      pval      ci.lb     ci.ub
intrcpt -0.03752519 0.447184 -0.08391443 0.9331245 -0.9139897 0.8389393

Now, I want two things:
Firstly, the intrcpt in the RE_table to be replaced by the study name from where it comes (i.e. here McMullin), and

Secondly, run these two commands in a loop and produce this type (RE_model) table for every row name (i.e., Caraquez, Franzen, Lieu, McMullin, Meyer) and produce a single datatframe from all the RE_tables like this:

	estimate	se	zval	pval	ci.lb	ci.ub
Caraquez	0.567525188100193	0.447183987	-0.123914426703275	0.933124469167552	-0.45897017593	0.566565325558915
Franzen	0.12345567525188	0.442383987	0.67777777703275	0.12344469167552	-0.233232317593	0.4434434344
Lieu	0.7895251881	0.1237183987	-0.6666426703275	0.555552446916755	-0.444447593	0.838939325558915
McMullin	-0.037525188100193	0.447183989385799	-0.083914426703275	0.933124469167552	-0.9139897017593	0.838939325558915
Meyer	0.53411331	0.554212878	-0.323323121	0.54124469167552	-0.800898917593	0.87777778915

Can you please help me?

Thanks in advance,
DC7

bayesian · December 28, 2020, 5:10am

Not sure why my values are different than what you showed (apart from McMullin), but does this work?

study_types <- unique(all_csv$study)
# [1] "Caraquez" "Franzen"  "Lieu"     "McMullin" "Meyer"  

# Create data frame to store
df <- data.frame(matrix(NA, ncol = 7, nrow = 5))
df[, 1] <- study_types
names(df) <- c("Study", names(RE_table)) # just a convenient way to get the col names from your earlier object
df

     Study estimate se zval pval ci.lb ci.ub
1 Caraquez       NA NA   NA   NA    NA    NA
2  Franzen       NA NA   NA   NA    NA    NA
3     Lieu       NA NA   NA   NA    NA    NA
4 McMullin       NA NA   NA   NA    NA    NA
5    Meyer       NA NA   NA   NA    NA    NA

for(i in 1:length(study_types)){
  ma_model <- rma.mv(yi, vi, data = all_csv, subset = (study == study_types[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

df

     Study    estimate        se         zval         pval      ci.lb      ci.ub
1 Caraquez -1.38323940 0.3817494  -3.62342295 2.907298e-04 -2.1314544 -0.6350244
2  Franzen -1.96817463 0.1450495 -13.56898662 6.116178e-42 -2.2524664 -1.6838828
3     Lieu -0.67038982 0.0914624  -7.32967639 2.307091e-13 -0.8496528 -0.4911268
4 McMullin -0.03752519 0.4471840  -0.08391443 9.331245e-01 -0.9139897  0.8389393
5    Meyer -1.00999884 0.1206731  -8.36971327 5.775860e-17 -1.2465137 -0.7734840

DC7 · December 28, 2020, 2:47pm

Many many thanks @bayesian for your response. It works. But, let me tell you some improvement of the code, if possible.

Firstly, can we automatize the number of rows of the dataframe df according to the number of unique study? Because, I just give you an example data frame. The original dataframe may be of several hundred rows long. Number of study_types may vary from 100-150.
Also, the column names for dfare coming from RE_table that I have provided in the question for convenience. But, it would not be originally provided as RE_table is generated in the loop. So, somehow, is it possible to create the column names of df from the RE_table generated in the loop?

Thanks a lot for givingyour valuable time solving my problem.

bayesian · December 28, 2020, 2:56pm

Yes it can easily be done. On hindsight I should've done that in the first place. My bad!

To automatically create the number of rows = how many studies you have, just use nrow = length(study_types) instead, which would draw the number from how many study names/authors you have stored in study_types.
To manually specify the column names, you would type them in quotations in a vector like so: names(df) <- c("Study", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub"). I suppose the number of columns will not change so we can leave ncol = 7 when creating the df. But otherwise you can use the same length() way to set column numbers.

DC7 · December 30, 2020, 3:08pm

Thanks a lot @bayesian. But, I don't know why I am facing this error now:

Error in rma.mv(yi, vi, data = all_csv, subset = (study == study_types[i])) :
Processing terminated since k <= 1.

Here's my code: (I have changed the name of first column of all_csv from study to Bacteria)

library(RCurl)
library(bitops)
library(metafor)
library(Formula)
library(dplyr)
library(ggforestplot)

all_csv <- read.csv("~/Desktop/all_es.csv")
Bacteria <- unique(all_csv$Bacteria)



# Create data frame to store
df <- data.frame(matrix(NA, ncol = 7, nrow = length(Bacteria)))
df[, 1] <- Bacteria
names(df) <- c("Bacteria", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub") # just a convenient way to get the col names from your earlier object
df

for(i in 1:length(Bacteria)){
  ma_model <- rma.mv(yi, vi, data = all_csv, subset = (Bacteria == Bacteria[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

I have tried several times with different tricks, nothing is working.
Thanks

bayesian · December 30, 2020, 3:29pm

I think the issue is because you named the vector of Bacteria names as Bacteria (same name as the column). This causes a problem in the for-loop at this part subset = (Bacteria == Bacteria[i]))

I stored the bacteria names in a differently named vector called bacteria_names and edited the other occurrences accordingly, and it works fine:

names(all_csv)[1] <- "Bacteria"

bacteria_names <- unique(all_csv$Bacteria)

df <- data.frame(matrix(NA, ncol = 7, nrow = length(bacteria_names)))
df[, 1] <- bacteria_names
names(df) <- c("Bacteria", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub")
df

for(i in 1:length(bacteria_names)){
  ma_model <- rma.mv(yi, vi, data = all_csv, subset = (Bacteria == bacteria_names[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

df

  Bacteria    estimate        se         zval         pval      ci.lb      ci.ub
1 Caraquez -1.38323940 0.3817494  -3.62342295 2.907298e-04 -2.1314544 -0.6350244
2  Franzen -1.96817463 0.1450495 -13.56898662 6.116178e-42 -2.2524664 -1.6838828
3     Lieu -0.67038982 0.0914624  -7.32967639 2.307091e-13 -0.8496528 -0.4911268
4 McMullin -0.03752519 0.4471840  -0.08391443 9.331245e-01 -0.9139897  0.8389393
5    Meyer -1.00999884 0.1206731  -8.36971327 5.775860e-17 -1.2465137 -0.7734840

Edit:

While my above discussion still stands (if you named both column and vector Bacteria you will get repeated results), I just noticed your quoted error was with the earlier naming of study and study_types. It seems the Processing terminated since k <= 1. error comes from the metafor package itself rather than an issue with the loop. I suppose the rma.mv function terminates when there is k <= 1 studies left?

DC7 · December 30, 2020, 3:43pm

Yes. @bayesian. I am facing the same issue after renaming as youtold. But problem still exists. I am seeing the loop runs for some rows and terminates. Gives a result like:

 > df
                Bacteria    estimate        se         zval         pval      ci.lb      ci.ub
1     Methanobrevibacter -0.03752519 0.4471840  -0.08391443 9.331245e-01 -0.9139897  0.8389393
2         Methanosphaera -0.98189028 0.1154835  -8.50243170 1.856599e-17 -1.2082337 -0.7555468
3  Methanomassiliicoccus  3.83246261 0.5414903   7.07761968 1.466515e-12  2.7711611  4.8937641
4            Actinomyces  0.75429295 0.5958761   1.26585527 2.055649e-01 -0.4136028  1.9221887
5          Aeriscardovia -1.91938071 0.1637011 -11.72490696 9.500536e-32 -2.2402291 -1.5985324
6        Bifidobacterium  2.50389122 0.2804214   8.92903146 4.297546e-19  1.9542754  3.0535070
7            Gardnerella -0.19910302 0.3238758  -0.61475107 5.387191e-01 -0.8338880  0.4356820
8         Brevibacterium -2.32479799 0.3074499  -7.56155179 3.982888e-14 -2.9273886 -1.7222074
9                 Rothia  0.29377681 0.1113252   2.63890753 8.317366e-03  0.0755835  0.5119701
10            Tropheryma -1.23295572 0.1672961  -7.36989900 1.707574e-13 -1.5608501 -0.9050613
11     Propionibacterium          NA        NA           NA           NA         NA         NA
12             Atopobium          NA        NA           NA           NA         NA         NA
13                Enorma          NA        NA           NA           NA         NA         NA
14         Adlercreutzia          NA        NA           NA           NA         NA         NA
15       Asaccharobacter          NA        NA           NA           NA         NA         NA

bayesian · December 30, 2020, 4:05pm

I managed to recreate the problem using the old dataset. It happens when you have only one of a type of study/bacteria. For example, if I remove the first row using data = all_csv[-1, ], the problem occurs because there is only one Caraquez left so you cannot meta-analyse it (you need 2 or more Caraquez).

for(i in 1:length(bacteria_names)){
  ma_model <- rma.mv(yi, vi, data = all_csv[-1, ], subset = (Bacteria == bacteria_names[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

Error in rma.mv(yi, vi, data = all_csv[-1, ], subset = (Bacteria == bacteria_names[i])) : 
  Processing terminated since k <= 1.

DC7 · December 30, 2020, 4:06pm

Ah. ok. Now I understand. Thanks a lot @bayesian. You saved me.

DC7 · December 30, 2020, 4:20pm

Hi @bayesian- I have noticed that the loop abruptly stops in the middle. For example, here is my dataset below. The first bacteria in the dataset is on;y single time present. So, the loop does not even go beyond it. It just stops at the begining giving the following result.

Dataset:

Bacteria	year	test	n_controls	n_patients	age_controls	age_patients	mean_controls	mean_patients	sd_controls	sd_patients	yi	vi	country
Adlercreutzia	2012	digit_span	102	99	21.1	23.4	12	20	1.9	9	-1.23386292200698	0.023692040401736	India
Aeriscardovia	2005	digit_span	20	19	65	65.1	2.9	123.1	2.8	8.6	-18.615474202867	4.54539926866026	australia
Aeriscardovia	2000	digit_span	102	120	19.1	23.7	12	25.1	3.02	6	-2.68398773123057	0.034362007472673	Denmark
Aeriscardovia	2005	digit_span	20	19	65	65.1	22.9	13.1	9.8	2.6	1.32355279414728	0.125090450728074	India
Asaccharobacter	2012	n-back	65	56	67.1	66.9	50	67	6.1	2	-3.61295800630745	0.087181698553085	India
Atopobium	2012	n-back	65	56	67.1	66.9	50	67	4.1	2	-5.11910608650243	0.141527903385833	Denmark
Bifidobacterium	2005	n-back	46	37	25.7	24.7	39	9.7	2.8	12.2	3.45698804727594	0.120758725889573	australia
Bifidobacterium	2005	digit_span	20	19	65	65.1	22.9	13.1	2.8	2.6	3.54948861309056	0.264155545799414	Denmark
Bifidobacterium	2005	n-back	46	37	25.7	24.7	39	97	2.8	4.5	-15.7232515863889	1.53804712402133	India
Brevibacterium	2010	n-back	38	48	55.6	54.5	1.7	21	6	0.25	-4.79797305176145	0.18098950307114	australia
Brevibacterium	2007	n-back	12	9	44.4	45	22	18	11	8.8	0.378959033077559	0.197863728938518	Denmark

Result:

> df
                   Study estimate se zval pval ci.lb ci.ub
1          Adlercreutzia       NA NA   NA   NA    NA    NA
2          Aeriscardovia       NA NA   NA   NA    NA    NA
3        Asaccharobacter       NA NA   NA   NA    NA    NA
4              Atopobium       NA NA   NA   NA    NA    NA
5        Bifidobacterium       NA NA   NA   NA    NA    NA
6         Brevibacterium       NA NA   NA   NA    NA    NA
7                 Enorma       NA NA   NA   NA    NA    NA
8            Gardnerella       NA NA   NA   NA    NA    NA
9     Methanobrevibacter       NA NA   NA   NA    NA    NA
10 Methanomassiliicoccus       NA NA   NA   NA    NA    NA
11        Methanosphaera       NA NA   NA   NA    NA    NA
12     Propionibacterium       NA NA   NA   NA    NA    NA
13                Rothia       NA NA   NA   NA    NA    NA
14            Tropheryma       NA NA   NA   NA    NA    NA
15           Actinomyces       NA NA   NA   NA    NA    NA

Is there an way to solve the problem. I don't know earlier I did with the same data and at least the loop completed.

bayesian · December 30, 2020, 5:48pm

We just have to remove the types of Bacteria that only have one study/instance:

# See which bacteria only has one of its type
library(dplyr)
all_csv %>% group_by(Bacteria) %>% count(sort = TRUE) %>% filter(n == 1)

# A tibble: 3 x 2
# Groups:   Bacteria [3]
  Bacteria            n
  <chr>           <int>
1 Adlercreutzia       1
2 Asaccharobacter     1
3 Atopobium           1

# Remove these studies
all_csv <- all_csv %>% filter(!Bacteria == "Adlercreutzia" &
                                 !Bacteria == "Asaccharobacter" &
                                 !Bacteria == "Atopobium")

Our usual code

bacteria_names <- unique(all_csv$Bacteria)

df <- data.frame(matrix(NA, ncol = 7, nrow = length(bacteria_names)))
df[, 1] <- bacteria_names
names(df) <- c("Bacteria", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub")
df

for(i in 1:length(bacteria_names)){
  ma_model <- rma.mv(yi, vi, data = all_csv, subset = (Bacteria == bacteria_names[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

df

         Bacteria  estimate        se       zval         pval     ci.lb     ci.ub
1   Aeriscardovia -1.919381 0.1637011 -11.724907 9.500536e-32 -2.240229 -1.598532
2 Bifidobacterium  2.503891 0.2804214   8.929031 4.297546e-19  1.954275  3.053507
3  Brevibacterium -2.324798 0.3074499  -7.561552 3.982888e-14 -2.927389 -1.722207

DC7 · December 30, 2020, 5:55pm

Thanks @bayesian for your constant help. And sorry for continuously bothering you. Is it possible to run the loop for Bacteria >1. I just want to run without removing them. Because for a big dataframe, these number may be super high to write them manually.

Thanks.

bayesian · December 31, 2020, 4:09am

No worries. Is it okay if you create a separate data frame for the bacteria that can be meta-analysed? So we preserve the original dataset all_csv.

This will automate the identification and filtering of all the single-instance bacteria, and keeps all the meta-analysable studies in a new df called final_csv:


# Extract a vector of all the single bacteria studies
single_bacteria <- all_csv %>% count(Bacteria, sort = TRUE) %>% 
                   filter(n == 1) %>% select(Bacteria) %>% .$Bacteria
single_bacteria
[1] "Adlercreutzia"   "Asaccharobacter" "Atopobium"

# Keep only those bacteria that have >1 studies without having to type each bacteria out
final_csv <- all_csv[which(!all_csv$Bacteria %in% single_bacteria), ]

# Our usual code, except now all_csv is replaced with final_csv
bacteria_names <- unique(final_csv$Bacteria)

df <- data.frame(matrix(NA, ncol = 7, nrow = length(bacteria_names)))
df[, 1] <- bacteria_names
names(df) <- c("Bacteria", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub")
df

for(i in 1:length(bacteria_names)){
  ma_model <- rma.mv(yi, vi, data = final_csv, subset = (Bacteria == bacteria_names[i]))
  re_table <- coef(summary(ma_model))
  df[i, -1] <- re_table
}

df

         Bacteria  estimate        se       zval         pval     ci.lb     ci.ub
1   Aeriscardovia -1.919381 0.1637011 -11.724907 9.500536e-32 -2.240229 -1.598532
2 Bifidobacterium  2.503891 0.2804214   8.929031 4.297546e-19  1.954275  3.053507
3  Brevibacterium -2.324798 0.3074499  -7.561552 3.982888e-14 -2.927389 -1.722207

DC7 · December 31, 2020, 11:30am

Many many thanks @bayesian. It is working fine now. And also it seems better than the previous one, as I don;t have to write a long list of bacteria. I really appreciate your help. Thanks again.

Deep

system · January 7, 2021, 11:30am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.