I want to perform a t test on values grouped by the matching names in a specific column. An example would be something like this...
12 boys and 12 girls go and pick 4 types of flowers. The (arbitrary) size of the flower is listed in the 'boy' or 'girl' column.
I want to perform a t test by the flower type. I guess the hypothesis in my example would be something like, "boys pick different sized flowers than girls".
I imagine if I wanted to perform the t test with just rose, I could do:
Here is one method using a custom function and the map function from purrr.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
mydf <- data.frame(boy=1:12, girl=13:24)
mydf$flower[1:3] <- c("lilly")
mydf$flower[4:6] <- c("rose")
mydf$flower[7:9] <- c("petunia")
mydf$flower[10:12] <- c("violet")
Flowers <- unique(mydf$flower)
MyFunc <- function(Nm, DF){
  tmp <- DF %>% filter(flower == Nm) #keep only the rows for one flower
  t.test(tmp$boy, tmp$girl)
}
TESTS <- map(Flowers, MyFunc, DF = mydf) #map sends each value in Flowers to MyFunc
# TESTS is a list of 4 elements, a t.test for each flower
names(TESTS) <- Flowers #set the names of TESTS
TESTS$rose
#>
#> Welch Two Sample t-test
#>
#> data: tmp$boy and tmp$girl
#> t = -14.697, df = 4, p-value = 0.0001247
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -14.266958 -9.733042
#> sample estimates:
#> mean of x mean of y
#> 5 17
#Compare
rose_boy <- c(4,5,6)
rose_girl <- c(16,17,18)
t.test(rose_boy, rose_girl)
#>
#> Welch Two Sample t-test
#>
#> data: rose_boy and rose_girl
#> t = -14.697, df = 4, p-value = 0.0001247
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -14.266958 -9.733042
#> sample estimates:
#> mean of x mean of y
#> 5 17
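Each t.test result is an object of class "htest", which is an ordinary list underneath, so individual pieces such as the p value can be pulled out by name. A minimal sketch using the rose values from the comparison above (the name rose_test is only for illustration):

```r
# Same rose values as in the comparison above
rose_test <- t.test(c(4, 5, 6), c(16, 17, 18))

# List the components stored in the result
names(rose_test)

# Pull out single components by name
rose_test$p.value   # the p value as a plain number
rose_test$conf.int  # the 95 percent confidence interval
rose_test$estimate  # the two group means
```

The same `$` access should work on any element of a list of tests, for example TESTS$rose$p.value.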
@FJCC This is exactly what I was trying to accomplish - thank you!
I see from your solution that the p.value is contained within each list element - is there a way I can extract this for all flowers and add it back to the original dataframe? I realize doing so would produce replicated values, but that is completely ok.
For example, a new column pvals in the original mydf would have the value of 0.0001247 for all "rose" rows.
Here are methods for returning just the p value from the t test or returning more complete information using the broom package. Note that the second output has the p value twice only because I reused mydf after joining the p values in the first method. Also, all of the p values are the same because every flower group has the same mean difference between boy and girl (12) and the same spread within each column.
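One way the two approaches could look is sketched below. This is an illustration under stated assumptions, not necessarily the exact code from the reply: it uses base R's split and lapply instead of purrr, the names tests, pvals and tidy_tests are made up for the example, and the broom step only runs if that package is installed.

```r
# Rebuild the example data from earlier in the thread
mydf <- data.frame(boy = 1:12, girl = 13:24)
mydf$flower <- rep(c("lilly", "rose", "petunia", "violet"), each = 3)

# One Welch t test per flower, stored as a named list
tests <- lapply(split(mydf, mydf$flower), function(d) t.test(d$boy, d$girl))

# Method 1: keep only the p value of each test (a named numeric vector)
pvals <- sapply(tests, function(tt) tt$p.value)

# Look up each row's flower in that vector; the value repeats within a flower
mydf$pvals <- unname(pvals[mydf$flower])
head(mydf)

# Method 2: one row of fuller test output per flower, if broom is available
if (requireNamespace("broom", quietly = TRUE)) {
  tidy_tests <- do.call(
    rbind,
    lapply(tests, function(tt) as.data.frame(broom::tidy(tt)))
  )
  print(tidy_tests)  # row names are the flower names
}
```

Indexing the named vector by mydf$flower avoids a join entirely; a dplyr left_join on flower would be an equivalent choice if the p values are kept in their own data frame.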