Counting in multiple vectors and categories

kelsicle · February 18, 2020, 4:38am

Hi! Hoping someone can help me with this - it seems like there should be a fairly simple solution, but internet searches have led me to things that are almost but not quite what I need. I'm an R novice and not sure how to put it together myself.

I would like to see which countries have the largest variety of mosquito species. The data is organized by country, and each species is a vector with "y" if the species appears there and no data point if it does not. (sample below - actual data contains 48 countries and 26 species)

Country	An funestus	An coustani	An ziemanni	An gambiae
Angola	n/a	n/a	n/a	y
Benin	n/a	y	n/a	y
Botswana	n/a	n/a	y	n/a
Burundi	y	n/a	y	y

In this case, the number of species present in Angola would be 1, Benin 2, Botswana 1, and Burundi has 3. How do I get R to count all the "y"s across all species columns while maintaining their association with their country category?

FJCC · February 18, 2020, 5:09am

Here is one way. Try executing just the pivot_longer() step first to see what that does and then add on the group_by() and the summarize(). I changed the mosquito names to just A, B and C to save typing.

DF <- data.frame(Country = c("Anglola", "Benin", "Botswana"),
                 A = c(NA, NA, "Y"),
                 B = c("Y", "Y", NA),
                 C = c("Y", NA, NA))
DF
#>    Country    A    B    C
#> 1  Anglola <NA>    Y    Y
#> 2    Benin <NA>    Y <NA>
#> 3 Botswana    Y <NA> <NA>
library(tidyr)
library(dplyr)
Counts <- DF %>% pivot_longer(A:C, names_to = "Species", values_to = "Value") %>% 
  group_by(Country) %>% 
  summarize(Cnt = sum(Value == "Y", na.rm = TRUE))
Counts
#> # A tibble: 3 x 2
#>   Country    Cnt
#>   <fct>    <int>
#> 1 Anglola      2
#> 2 Benin        1
#> 3 Botswana     1

^{Created on 2020-02-17 by the reprex package (v0.3.0)}

system · March 10, 2020, 2:08pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.