Hi! Hoping someone can help me with this - it seems like there should be a fairly simple solution, but internet searches have led me to things that are almost but not quite what I need. I'm an R novice and not sure how to put it together myself.
I would like to see which countries have the largest variety of mosquito species. The data is organized by country, and each species is a vector with "y" if the species appears there and no data point if it does not. (sample below - actual data contains 48 countries and 26 species)
Country
An funestus
An coustani
An ziemanni
An gambiae
Angola
n/a
n/a
n/a
y
Benin
n/a
y
n/a
y
Botswana
n/a
n/a
y
n/a
Burundi
y
n/a
y
y
In this case, the number of species present in Angola would be 1, Benin 2, Botswana 1, and Burundi has 3. How do I get R to count all the "y"s across all species columns while maintaining their association with their country category?
Here is one way. Try executing just the pivot_longer() step first to see what that does and then add on the group_by() and the summarize(). I changed the mosquito names to just A, B and C to save typing.
DF <- data.frame(Country = c("Anglola", "Benin", "Botswana"),
A = c(NA, NA, "Y"),
B = c("Y", "Y", NA),
C = c("Y", NA, NA))
DF
#> Country A B C
#> 1 Anglola <NA> Y Y
#> 2 Benin <NA> Y <NA>
#> 3 Botswana Y <NA> <NA>
library(tidyr)
library(dplyr)
Counts <- DF %>% pivot_longer(A:C, names_to = "Species", values_to = "Value") %>%
group_by(Country) %>%
summarize(Cnt = sum(Value == "Y", na.rm = TRUE))
Counts
#> # A tibble: 3 x 2
#> Country Cnt
#> <fct> <int>
#> 1 Anglola 2
#> 2 Benin 1
#> 3 Botswana 1