I have to run rma.mv function from package metafor using the following code: ma_model_all <- rma.mv(yi, vi, data = all_csv, subset = (study=="McMullin"))
Here, the subset option is used to run the function rma.mv for specific rows (here for "McMullin"). And it gives an output ma_model_all which is run by following code- RE_table <- coef(summary(ma_model_all))
so that it produces a table like:
Now, I want two things:
Firstly, the intrcpt in the RE_table to be replaced by the study name from where it comes (i.e. here McMullin), and
Secondly, run these two commands in a loop and produce this type (RE_model) table for every row name (i.e., Caraquez, Franzen, Lieu, McMullin, Meyer) and produce a single datatframe from all the RE_tables like this:
Not sure why my values are different than what you showed (apart from McMullin), but does this work?
study_types <- unique(all_csv$study)
# [1] "Caraquez" "Franzen" "Lieu" "McMullin" "Meyer"
# Create data frame to store
df <- data.frame(matrix(NA, ncol = 7, nrow = 5))
df[, 1] <- study_types
names(df) <- c("Study", names(RE_table)) # just a convenient way to get the col names from your earlier object
df
Study estimate se zval pval ci.lb ci.ub
1 Caraquez NA NA NA NA NA NA
2 Franzen NA NA NA NA NA NA
3 Lieu NA NA NA NA NA NA
4 McMullin NA NA NA NA NA NA
5 Meyer NA NA NA NA NA NA
for(i in 1:length(study_types)){
ma_model <- rma.mv(yi, vi, data = all_csv, subset = (study == study_types[i]))
re_table <- coef(summary(ma_model))
df[i, -1] <- re_table
}
df
Study estimate se zval pval ci.lb ci.ub
1 Caraquez -1.38323940 0.3817494 -3.62342295 2.907298e-04 -2.1314544 -0.6350244
2 Franzen -1.96817463 0.1450495 -13.56898662 6.116178e-42 -2.2524664 -1.6838828
3 Lieu -0.67038982 0.0914624 -7.32967639 2.307091e-13 -0.8496528 -0.4911268
4 McMullin -0.03752519 0.4471840 -0.08391443 9.331245e-01 -0.9139897 0.8389393
5 Meyer -1.00999884 0.1206731 -8.36971327 5.775860e-17 -1.2465137 -0.7734840
Many many thanks @bayesian for your response. It works. But, let me tell you some improvement of the code, if possible.
Firstly, can we automatize the number of rows of the dataframe df according to the number of unique study? Because, I just give you an example data frame. The original dataframe may be of several hundred rows long. Number of study_types may vary from 100-150.
Also, the column names for dfare coming from RE_table that I have provided in the question for convenience. But, it would not be originally provided as RE_table is generated in the loop. So, somehow, is it possible to create the column names of df from the RE_table generated in the loop?
Thanks a lot for givingyour valuable time solving my problem.
Yes it can easily be done. On hindsight I should've done that in the first place. My bad!
To automatically create the number of rows = how many studies you have, just use nrow = length(study_types) instead, which would draw the number from how many study names/authors you have stored in study_types.
To manually specify the column names, you would type them in quotations in a vector like so: names(df) <- c("Study", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub"). I suppose the number of columns will not change so we can leave ncol = 7 when creating the df. But otherwise you can use the same length() way to set column numbers.
I think the issue is because you named the vector of Bacteria names as Bacteria (same name as the column). This causes a problem in the for-loop at this part subset = (Bacteria == Bacteria[i]))
I stored the bacteria names in a differently named vector called bacteria_names and edited the other occurrences accordingly, and it works fine:
While my above discussion still stands (if you named both column and vector Bacteria you will get repeated results), I just noticed your quoted error was with the earlier naming of study and study_types. It seems the Processing terminated since k <= 1. error comes from the metafor package itself rather than an issue with the loop. I suppose the rma.mv function terminates when there is k <= 1 studies left?
Yes. @bayesian. I am facing the same issue after renaming as youtold. But problem still exists. I am seeing the loop runs for some rows and terminates. Gives a result like:
> df
Bacteria estimate se zval pval ci.lb ci.ub
1 Methanobrevibacter -0.03752519 0.4471840 -0.08391443 9.331245e-01 -0.9139897 0.8389393
2 Methanosphaera -0.98189028 0.1154835 -8.50243170 1.856599e-17 -1.2082337 -0.7555468
3 Methanomassiliicoccus 3.83246261 0.5414903 7.07761968 1.466515e-12 2.7711611 4.8937641
4 Actinomyces 0.75429295 0.5958761 1.26585527 2.055649e-01 -0.4136028 1.9221887
5 Aeriscardovia -1.91938071 0.1637011 -11.72490696 9.500536e-32 -2.2402291 -1.5985324
6 Bifidobacterium 2.50389122 0.2804214 8.92903146 4.297546e-19 1.9542754 3.0535070
7 Gardnerella -0.19910302 0.3238758 -0.61475107 5.387191e-01 -0.8338880 0.4356820
8 Brevibacterium -2.32479799 0.3074499 -7.56155179 3.982888e-14 -2.9273886 -1.7222074
9 Rothia 0.29377681 0.1113252 2.63890753 8.317366e-03 0.0755835 0.5119701
10 Tropheryma -1.23295572 0.1672961 -7.36989900 1.707574e-13 -1.5608501 -0.9050613
11 Propionibacterium NA NA NA NA NA NA
12 Atopobium NA NA NA NA NA NA
13 Enorma NA NA NA NA NA NA
14 Adlercreutzia NA NA NA NA NA NA
15 Asaccharobacter NA NA NA NA NA NA
I managed to recreate the problem using the old dataset. It happens when you have only one of a type of study/bacteria. For example, if I remove the first row using data = all_csv[-1, ], the problem occurs because there is only one Caraquez left so you cannot meta-analyse it (you need 2 or more Caraquez).
Hi @bayesian- I have noticed that the loop abruptly stops in the middle. For example, here is my dataset below. The first bacteria in the dataset is on;y single time present. So, the loop does not even go beyond it. It just stops at the begining giving the following result.
Dataset:
Bacteria
year
test
n_controls
n_patients
age_controls
age_patients
mean_controls
mean_patients
sd_controls
sd_patients
yi
vi
country
Adlercreutzia
2012
digit_span
102
99
21.1
23.4
12
20
1.9
9
-1.23386292200698
0.023692040401736
India
Aeriscardovia
2005
digit_span
20
19
65
65.1
2.9
123.1
2.8
8.6
-18.615474202867
4.54539926866026
australia
Aeriscardovia
2000
digit_span
102
120
19.1
23.7
12
25.1
3.02
6
-2.68398773123057
0.034362007472673
Denmark
Aeriscardovia
2005
digit_span
20
19
65
65.1
22.9
13.1
9.8
2.6
1.32355279414728
0.125090450728074
India
Asaccharobacter
2012
n-back
65
56
67.1
66.9
50
67
6.1
2
-3.61295800630745
0.087181698553085
India
Atopobium
2012
n-back
65
56
67.1
66.9
50
67
4.1
2
-5.11910608650243
0.141527903385833
Denmark
Bifidobacterium
2005
n-back
46
37
25.7
24.7
39
9.7
2.8
12.2
3.45698804727594
0.120758725889573
australia
Bifidobacterium
2005
digit_span
20
19
65
65.1
22.9
13.1
2.8
2.6
3.54948861309056
0.264155545799414
Denmark
Bifidobacterium
2005
n-back
46
37
25.7
24.7
39
97
2.8
4.5
-15.7232515863889
1.53804712402133
India
Brevibacterium
2010
n-back
38
48
55.6
54.5
1.7
21
6
0.25
-4.79797305176145
0.18098950307114
australia
Brevibacterium
2007
n-back
12
9
44.4
45
22
18
11
8.8
0.378959033077559
0.197863728938518
Denmark
Result:
> df
Study estimate se zval pval ci.lb ci.ub
1 Adlercreutzia NA NA NA NA NA NA
2 Aeriscardovia NA NA NA NA NA NA
3 Asaccharobacter NA NA NA NA NA NA
4 Atopobium NA NA NA NA NA NA
5 Bifidobacterium NA NA NA NA NA NA
6 Brevibacterium NA NA NA NA NA NA
7 Enorma NA NA NA NA NA NA
8 Gardnerella NA NA NA NA NA NA
9 Methanobrevibacter NA NA NA NA NA NA
10 Methanomassiliicoccus NA NA NA NA NA NA
11 Methanosphaera NA NA NA NA NA NA
12 Propionibacterium NA NA NA NA NA NA
13 Rothia NA NA NA NA NA NA
14 Tropheryma NA NA NA NA NA NA
15 Actinomyces NA NA NA NA NA NA
Is there an way to solve the problem. I don't know earlier I did with the same data and at least the loop completed.
Thanks @bayesian for your constant help. And sorry for continuously bothering you. Is it possible to run the loop for Bacteria >1. I just want to run without removing them. Because for a big dataframe, these number may be super high to write them manually.
No worries. Is it okay if you create a separate data frame for the bacteria that can be meta-analysed? So we preserve the original dataset all_csv.
This will automate the identification and filtering of all the single-instance bacteria, and keeps all the meta-analysable studies in a new df called final_csv:
# Extract a vector of all the single bacteria studies
single_bacteria <- all_csv %>% count(Bacteria, sort = TRUE) %>%
filter(n == 1) %>% select(Bacteria) %>% .$Bacteria
single_bacteria
[1] "Adlercreutzia" "Asaccharobacter" "Atopobium"
# Keep only those bacteria that have >1 studies without having to type each bacteria out
final_csv <- all_csv[which(!all_csv$Bacteria %in% single_bacteria), ]
# Our usual code, except now all_csv is replaced with final_csv
bacteria_names <- unique(final_csv$Bacteria)
df <- data.frame(matrix(NA, ncol = 7, nrow = length(bacteria_names)))
df[, 1] <- bacteria_names
names(df) <- c("Bacteria", "estimate", "se", "zval", "pval", "ci.lb", "ci.ub")
df
for(i in 1:length(bacteria_names)){
ma_model <- rma.mv(yi, vi, data = final_csv, subset = (Bacteria == bacteria_names[i]))
re_table <- coef(summary(ma_model))
df[i, -1] <- re_table
}
df
Bacteria estimate se zval pval ci.lb ci.ub
1 Aeriscardovia -1.919381 0.1637011 -11.724907 9.500536e-32 -2.240229 -1.598532
2 Bifidobacterium 2.503891 0.2804214 8.929031 4.297546e-19 1.954275 3.053507
3 Brevibacterium -2.324798 0.3074499 -7.561552 3.982888e-14 -2.927389 -1.722207
Many many thanks @bayesian. It is working fine now. And also it seems better than the previous one, as I don;t have to write a long list of bacteria. I really appreciate your help. Thanks again.