Data wrangling help with dplyr

Hi, I am a fairly new R user, trying to do something relatively simple.

I have a small dataset of bee counts by species. I am simply trying to get the sum of the normalized counts for each site within each m_y (which is a trapping event).

I first converted the columns to factors.

I have not been able to figure out whether I should use group_by() or filter() for this.

Thank you.
My data:
  mth    m_y event trap site                   taxa counts normalized      units
1 Jan Jan-18     1 NE-1   NE            Agapostemon      1      0.111 number/day
2 Jan Jan-18     1 NE-2   NE Lasioglossum dialictus      2      0.222 number/day
3 Jan Jan-18     1 NW-1   NW Lasioglossum dialictus     12       1.33 number/day
4 Jan Jan-18     1 NW-1   NW         Apis mellifera      2      0.222 number/day
5 Jan Jan-18     1 NW-2   NW          Augochlorella      1      0.111 number/day
6 Jan Jan-18     1 NW-2   NW Lasioglossum dialictus      6      0.667 number/day

My code:


by.site <- bees %>%
  group_by(m-y, site)

#> Error in bees %>% group_by(m - y, site): could not find function "%>%"
  summarise(sum.normal = sum(normalized))
#> Error in summarise(sum.normal = sum(normalized)): could not find function "summarise"

You need to load either the tidyverse or the dplyr package (dplyr is part of the tidyverse). Both %>% and summarise() come from there, which is why R says it cannot find them.

So:

install.packages("tidyverse") # this is only if you haven't installed it yet
library(tidyverse)

Not to mention, watch your hyphens vs. underscores (m-y vs. m_y): R reads m-y as "m minus y", not as the column name.
(Sez he who's made the same mistake!)
😀
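To see why the hyphen matters, base R's quote() shows how the parser reads each spelling without evaluating anything. This is a small illustration only; the object names hyphen and underscore are hypothetical.

```r
# Base R only: inspect how the parser reads each spelling.
hyphen <- quote(m-y)      # parsed as a call to `-` with arguments m and y
underscore <- quote(m_y)  # parsed as one single name

is.call(hyphen)       # TRUE: a subtraction, not a column name
all.names(hyphen)     # "-" "m" "y"
is.name(underscore)   # TRUE: one symbol, matching the column m_y
```

So group_by(m-y, site) asks dplyr to compute m minus y, which fails because there are no columns called m or y.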

@kelaguen, thanks for catching that.

@williaml, I loaded the tidyverse and ran my script, but it did not give me a new data frame with the sums for each site for each m_y.

This is what I got instead; it looks just like my original data frame:

  mth    m_y event trap site                   taxa counts normalized      units
1 Jan Jan-18     1 NE-1   NE            Agapostemon      1      0.111 number/day
2 Jan Jan-18     1 NE-2   NE Lasioglossum dialictus      2      0.222 number/day
3 Jan Jan-18     1 NW-1   NW Lasioglossum dialictus     12       1.33 number/day
4 Jan Jan-18     1 NW-1   NW         Apis mellifera      2      0.222 number/day
5 Jan Jan-18     1 NW-2   NW          Augochlorella      1      0.111 number/day
6 Jan Jan-18     1 NW-2   NW Lasioglossum dialictus      6      0.667 number/day
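One possible explanation, and it is only an assumption since the updated script was not posted: in the code shown earlier, the group_by() line has no trailing %>%, so the statement ends there and summarise() runs separately. The assigned object is then just a grouped copy of bees with the same rows. The sketch below uses a trimmed copy of the sample data (only m_y, site and normalized); the names by_site_broken and by_site are hypothetical.

```r
library(dplyr)

# Trimmed copy of the sample data: only the columns used here
bees <- data.frame(
  m_y = rep("Jan-18", 6),
  site = c("NE", "NE", "NW", "NW", "NW", "NW"),
  normalized = c(0.111, 0.222, 1.33, 0.222, 0.111, 0.667)
)

# No %>% after group_by(): the statement stops here, so the result is
# only a grouped version of bees that still has all 6 rows
by_site_broken <- bees %>%
  group_by(m_y, site)

# With the pipe, summarise() collapses each m_y/site group to one row;
# .groups = "drop" returns an ungrouped result
by_site <- bees %>%
  group_by(m_y, site) %>%
  summarise(sum_normal = sum(normalized), .groups = "drop")
```

The .groups argument needs dplyr 1.0.0 or later.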

If you want specific help, you need to provide a proper REPRoducible EXample (reprex) illustrating your issue. I'm going to make one for you this time, but please keep in mind that this is not your first time here and, at this point, you should be following good practices on the forum.
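For next time, one base-R way to turn your own data into copy/paste-friendly code is dput(), which prints R code that rebuilds the object (the reprex package can then run and format the whole example). The small bees object here is a hypothetical subset, just to show the round trip.

```r
# Hypothetical subset of the data, for illustration only
bees <- data.frame(
  site = c("NE", "NE", "NW"),
  normalized = c(0.111, 0.222, 1.33)
)

# Prints R code that recreates bees; paste this into a forum post
dput(bees)

# Round trip: evaluating the dput() text rebuilds an identical object
rebuilt <- eval(parse(text = capture.output(dput(bees))))
identical(rebuilt, bees)
```

For a larger data set, dput(head(bees, 20)) keeps the post short.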

library(dplyr)

# Sample data in a copy/paste-friendly format
bees <- data.frame(
  stringsAsFactors = FALSE,
               mth = c("Jan", "Jan", "Jan", "Jan", "Jan", "Jan"),
               m_y = c("Jan-18","Jan-18","Jan-18",
                       "Jan-18","Jan-18","Jan-18"),
             event = c(1, 1, 1, 1, 1, 1),
              trap = c("NE-1", "NE-2", "NW-1", "NW-1", "NW-2", "NW-2"),
              site = c("NE", "NE", "NW", "NW", "NW", "NW"),
              taxa = c("Agapostemon",
                       "Lasioglossum dialictus","Lasioglossum dialictus","Apis mellifera",
                       "Augochlorella","Lasioglossum dialictus"),
            counts = c(1, 2, 12, 2, 1, 6),
        normalized = c(0.111, 0.222, 1.33, 0.222, 0.111, 0.667),
             units = c("number/day","number/day",
                       "number/day","number/day","number/day","number/day")
)

bees %>%
    group_by(m_y, site) %>% 
    summarise(sum_normal = sum(normalized))
#> `summarise()` regrouping output by 'm_y' (override with `.groups` argument)
#> # A tibble: 2 x 3
#> # Groups:   m_y [1]
#>   m_y    site  sum_normal
#>   <chr>  <chr>      <dbl>
#> 1 Jan-18 NE         0.333
#> 2 Jan-18 NW         2.33
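To come back to the original group_by() vs. filter() question: filter() only keeps the rows that match a condition, so it gives one site at a time, while group_by() + summarise() handles every m_y/site combination at once. count() with wt = is a shorthand that sums a weight column per group. This sketch uses a trimmed copy of the sample data; ne_only, per_site and per_site_count are hypothetical names.

```r
library(dplyr)

# Trimmed copy of the sample data: only the columns used here
bees <- data.frame(
  m_y = rep("Jan-18", 6),
  site = c("NE", "NE", "NW", "NW", "NW", "NW"),
  normalized = c(0.111, 0.222, 1.33, 0.222, 0.111, 0.667)
)

# filter(): keep the NE rows only, then sum them into a single total
ne_only <- bees %>%
  filter(site == "NE") %>%
  summarise(sum_normal = sum(normalized))

# group_by() + summarise(): one row per m_y/site combination
per_site <- bees %>%
  group_by(m_y, site) %>%
  summarise(sum_normal = sum(normalized), .groups = "drop")

# count() with wt = sums normalized within each m_y/site group
per_site_count <- bees %>%
  count(m_y, site, wt = normalized, name = "sum_normal")
```

Assigning the pipeline to a name (here per_site) is what gives you the new data frame of sums.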
