Hi R gurus,
I want to create a data frame that calculates abundances of traits from two other related data frames but don’t know what code would give me the desired output.
The first data frame, species abundance, contains species (rows) by sites (columns). e.g.;
BaCo_1 BaFa_1 BaSl_1 BrSl_1 CaCo_1 CaFa_1 CaSl_1
Abla.nota 0 0 3 2 9 0 0
Albo.woro 0 0 0 1 0 0 2
Aust.coll 0 2 0 0 2 0 0
Bibu.Kadj 0 0 1 0 0 0 0
cala.sp 0 0 0 0 0 7 1
The second dataframe, species traits, contains species (rows) by traits (columns);
MFA2 MFA4 MFA5 MFA6 MFA12 MFA14 flow1 flow2 flow3
Abla.nota 1 0 0 0 0 0 1 0 0
Albo.woro 0 0 0 1 0 0 1 0 0
alel.aust 0 1 0 0 0 0 0 0 1
Aulo.stri 1 0 0 0 0 0 0 0 1
Aust.anac 0 0 0 0 1 0 1 0 0
Aust.coll 0 0 0 0 1 0 1 0 0
Aust.subt 0 1 0 0 0 0 0 0 1
bero.sp 0 1 0 0 0 0 1 0 0
Bibu.Kadj 1 0 0 0 0 0 0 0 1
Bran.sowe 0 0 1 0 0 0 1 0 0
cala.sp 0 0 0 1 0 0 1 0 0
Note that the “traits” data frame contains more species than the “sites” data frame. This is because the trait data frame contains all species in the study, of which the “sites” data frame is a subset. Both examples above are subsets of the full data.
From these, I want to create a third data frame, sites (rows) by traits (columns), a dataframe which has the trait abundances at each site, regardless of the species that possess those traits;
MFA2 MFA4 MFA5 MFA6 MFA12 MFA14 flow1 flow2 flow3
BaCo_1 0 0 0 0 0 0 0 0 0
BaFa_1 0 0 0 0 2 0 2 0 0
BaSl_1 4 0 0 0 0 0 3 0 1
BrSl_1 2 0 0 1 0 0 3 0 0
CaCo_1 9 0 0 0 2 0 11 0 0
CaFa_1 0 0 0 7 0 0 7 0 0
CaSl_1 0 0 0 3 0 0 3 0 0
I created this output by hand because I don’t know what code would produce this output. Each value is a summation of the abundances of all individuals in the sample (row) that possess a trait (column). For example, CaCo_1 x flow1 in the example output, 11, above is the summation of the 9 alba.nota and 2 Aust.coll from the first data frame that possess the flow1 trait identified in the second dataframe.
What code/package should I use to produce this outcome?
FJCC
September 19, 2023, 4:15am
2
Here are two slightly different version of a solution. In the first, I filter out elements of the data that are zero. This makes the AllData data frame much smaller. However, that means that not all of the listed sites and traits appear in the final data. In the second version, the zeros are kept at the price of a larger data set.
library(tidyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
Traits <- read.csv("~/R/Play/Traits.csv")
Traits
#> Species MFA2 MFA4 MFA5 MFA6 MFA12 MFA14 flow1 flow2 flow3
#> 1 Abla.nota 1 0 0 0 0 0 1 0 0
#> 2 Albo.woro 0 0 0 1 0 0 1 0 0
#> 3 alel.aust 0 1 0 0 0 0 0 0 1
#> 4 Aulo.stri 1 0 0 0 0 0 0 0 1
#> 5 Aust.anac 0 0 0 0 1 0 1 0 0
#> 6 Aust.coll 0 0 0 0 1 0 1 0 0
#> 7 Aust.subt 0 1 0 0 0 0 0 0 1
#> 8 bero.sp 0 1 0 0 0 0 1 0 0
#> 9 Bibu.Kadj 1 0 0 0 0 0 0 0 1
#> 10 Bran.sowe 0 0 1 0 0 0 1 0 0
#> 11 cala.sp 0 0 0 1 0 0 1 0 0
Sites <- read.csv("~/R/Play/Sites.csv")
Sites
#> Species BaCo_1 BaFa_1 BaSl_1 BrSl_1 CaCo_1 CaFa_1 CaSl_1
#> 1 Abla.nota 0 0 3 2 9 0 0
#> 2 Albo.woro 0 0 0 1 0 0 2
#> 3 Aust.coll 0 2 0 0 2 0 0
#> 4 Bibu.Kadj 0 0 1 0 0 0 0
#> 5 cala.sp 0 0 0 0 0 7 1
#version 1
Traits_long <- Traits |> pivot_longer(cols = -Species,
names_to = "Traits",
values_to = "Present") |>
filter(Present == 1)
Traits_long
#> # A tibble: 22 × 3
#> Species Traits Present
#> <chr> <chr> <int>
#> 1 Abla.nota MFA2 1
#> 2 Abla.nota flow1 1
#> 3 Albo.woro MFA6 1
#> 4 Albo.woro flow1 1
#> 5 alel.aust MFA4 1
#> 6 alel.aust flow3 1
#> 7 Aulo.stri MFA2 1
#> 8 Aulo.stri flow3 1
#> 9 Aust.anac MFA12 1
#> 10 Aust.anac flow1 1
#> # ℹ 12 more rows
Sites_long <- Sites |> pivot_longer(cols = -Species,
names_to = "Sites", values_to = "Count") |>
filter(Count > 0)
Sites_long
#> # A tibble: 10 × 3
#> Species Sites Count
#> <chr> <chr> <int>
#> 1 Abla.nota BaSl_1 3
#> 2 Abla.nota BrSl_1 2
#> 3 Abla.nota CaCo_1 9
#> 4 Albo.woro BrSl_1 1
#> 5 Albo.woro CaSl_1 2
#> 6 Aust.coll BaFa_1 2
#> 7 Aust.coll CaCo_1 2
#> 8 Bibu.Kadj BaSl_1 1
#> 9 cala.sp CaFa_1 7
#> 10 cala.sp CaSl_1 1
AllData <- inner_join(Sites_long, Traits_long, by = "Species", relationship = "many-to-many")
Summs <- AllData |> group_by(Sites, Traits) |> summarize(Total = sum(Count)) |>
pivot_wider(names_from = Traits, values_from = Total)
#> `summarise()` has grouped output by 'Sites'. You can override using the
#> `.groups` argument.
Summs
#> # A tibble: 6 × 6
#> # Groups: Sites [6]
#> Sites MFA12 flow1 MFA2 flow3 MFA6
#> <chr> <int> <int> <int> <int> <int>
#> 1 BaFa_1 2 2 NA NA NA
#> 2 BaSl_1 NA 3 4 1 NA
#> 3 BrSl_1 NA 3 2 NA 1
#> 4 CaCo_1 2 11 9 NA NA
#> 5 CaFa_1 NA 7 NA NA 7
#> 6 CaSl_1 NA 3 NA NA 3
#version 2
Traits_long <- Traits |> pivot_longer(cols = -Species,
names_to = "Traits",
values_to = "Present")
Traits_long
#> # A tibble: 99 × 3
#> Species Traits Present
#> <chr> <chr> <int>
#> 1 Abla.nota MFA2 1
#> 2 Abla.nota MFA4 0
#> 3 Abla.nota MFA5 0
#> 4 Abla.nota MFA6 0
#> 5 Abla.nota MFA12 0
#> 6 Abla.nota MFA14 0
#> 7 Abla.nota flow1 1
#> 8 Abla.nota flow2 0
#> 9 Abla.nota flow3 0
#> 10 Albo.woro MFA2 0
#> # ℹ 89 more rows
Sites_long <- Sites |> pivot_longer(cols = -Species,
names_to = "Sites", values_to = "Count")
Sites_long
#> # A tibble: 35 × 3
#> Species Sites Count
#> <chr> <chr> <int>
#> 1 Abla.nota BaCo_1 0
#> 2 Abla.nota BaFa_1 0
#> 3 Abla.nota BaSl_1 3
#> 4 Abla.nota BrSl_1 2
#> 5 Abla.nota CaCo_1 9
#> 6 Abla.nota CaFa_1 0
#> 7 Abla.nota CaSl_1 0
#> 8 Albo.woro BaCo_1 0
#> 9 Albo.woro BaFa_1 0
#> 10 Albo.woro BaSl_1 0
#> # ℹ 25 more rows
AllData <- inner_join(Sites_long, Traits_long, by = "Species", relationship = "many-to-many")
Summs <- AllData |> mutate(Count = Count * Present) |>
group_by(Sites, Traits) |> summarize(Total = sum(Count)) |>
pivot_wider(names_from = Traits, values_from = Total)
#> `summarise()` has grouped output by 'Sites'. You can override using the
#> `.groups` argument.
Summs
#> # A tibble: 7 × 10
#> # Groups: Sites [7]
#> Sites MFA12 MFA14 MFA2 MFA4 MFA5 MFA6 flow1 flow2 flow3
#> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 BaCo_1 0 0 0 0 0 0 0 0 0
#> 2 BaFa_1 2 0 0 0 0 0 2 0 0
#> 3 BaSl_1 0 0 4 0 0 0 3 0 1
#> 4 BrSl_1 0 0 2 0 0 1 3 0 0
#> 5 CaCo_1 2 0 9 0 0 0 11 0 0
#> 6 CaFa_1 0 0 0 0 0 7 7 0 0
#> 7 CaSl_1 0 0 0 0 0 3 3 0 0
Created on 2023-09-18 with reprex v2.0.2
1 Like
thank you for responding so quickly. I have run the code myself and had success
system
Closed
September 26, 2023, 5:35am
4
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.