Now i want to summarize all the data for each Video.ID. The problem are columns like category, where i cannot apply sum or mean or something. I tried the following
small %>%group_by(Video.ID) %>% summarise(sumr=sum(Partner.Revenue),len=mean(Video.Duration..sec.),cat=mean(Category))
mean(Category) is clearly wrong, but I do not know how to fix it. One video id only ever has one category, how to I get dplyr to just pick that in the summary ?
I know its a beginner question and most likely silly, but I just cannot find a solution. All tips are appreciated.
summarise() just needs a function that will return a single value. So if you're certain that there's only one value of Category per group, you could use:
small %>%
group_by(Video.ID) %>%
summarise(
sumr = sum(Partner.Revenue),
len = mean(Video.Duration..sec.),
cat = Category[[1]]
)
If it's possible you might have more than one value of Category per group, then it's up to you to decide how they should be collapsed into a single value. Concatenation is one simple option, though whether it's a good idea or not depends on what you're doing next with the summarized data (displaying in a table? further analysis? ) .
small %>%
group_by(Video.ID) %>%
summarise(
sumr = sum(Partner.Revenue),
len = mean(Video.Duration..sec.),
cat = paste(Category, collapse = ", ")
)
(if you do need to do further analysis on multiple values of Category, summarizing them as a list column with cat = list(Category) is probably the way to go, although then you'll really want to look into purrr and this great tutorial)
P.S. There is nothing wrong with asking beginner questions! We were all beginners once, and we are all still beginners at countless things.
Agreed! @Traumfabrik, I should add a disclaimer that since there was no reprex here, I had to assume that the code in your post was free of typos or other errors. If that's not a safe assumption, then the code in my post may just replicate those problems and therefore not work, for reasons unrelated to your question. Reprexes save us all time and confusion!