How to summarise unique values with respect to another column?

This is a preview of the data:

[Screenshot of the data, not preserved]

Here, I want to summarise the total population for each continent.

[Screenshot of the attempted code, not preserved]

I tried the code above, but I want each continent name to appear only once, together with its total population.

What would be the right code for this?

"Summarize" can mean several things; can you please clarify your intentions? These are daily values, so do you want to keep just the latest record? Or do you want to keep the record with the maximum or minimum population value?

I have updated the image; please check.

There are 7 unique continent names, one of which is an empty string ("").

You get more than one population value per continent because your data contains daily estimates of total population (not daily increments, I think), so you are keeping every variation of the value for each continent.
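A quick way to see both effects (several rows per location, and the empty continent label mentioned above) is to count rows per continent/location pair and list the locations whose continent is "". This is a sketch on hypothetical toy data, since the real data was only shared as a screenshot; the guess that empty-continent rows are aggregates such as a world total is an assumption, not something confirmed in the thread.

```r
library(dplyr)

# Hypothetical toy data standing in for `r` (the real data was only shown as a screenshot)
r <- data.frame(
  continent  = c("Asia", "Asia", "Asia", "", ""),
  location   = c("India", "India", "India", "World", "World"),
  date       = as.Date("2021-01-01") + c(0, 1, 2, 0, 1),
  population = c(1380, 1381, 1382, 7800, 7801),
  stringsAsFactors = FALSE
)

# One row per location per day, so each location has several population values
rows_per_location <- r %>% count(continent, location)
rows_per_location

# Locations whose continent label is the empty string ""
empty_continent <- r %>%
  filter(continent == "") %>%
  distinct(location)
empty_continent
```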

You first need to decide what criterion you are going to use to summarize the data: maybe the latest population estimate, the mean over a period of time, the range of population over a period of time, etc.
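As a sketch of how those criteria differ, assuming a data frame with `continent`, `location`, `date` (already a `Date`) and `population` columns, one approach is to reduce each location to a single number first and only then total by continent. The data and the column names `mean_pop` and `pop_range` below are hypothetical.

```r
library(dplyr)

# Hypothetical daily records: two locations in one continent, three days each (made-up numbers)
daily <- data.frame(
  continent  = rep("Asia", 6),
  location   = rep(c("A", "B"), each = 3),
  date       = as.Date("2021-01-01") + c(0, 1, 2, 0, 1, 2),
  population = c(100, 101, 102, 50, 50, 51),
  stringsAsFactors = FALSE
)

# Step 1: reduce each location's daily series to one number per criterion
per_location <- daily %>%
  group_by(continent, location) %>%
  summarise(
    latest    = population[which.max(date)],        # value on the most recent date
    mean_pop  = mean(population),                    # average over the period
    pop_range = max(population) - min(population),   # spread over the period
    .groups   = "drop"
  )

# Step 2: add the per-location figures up to continent level
by_continent <- per_location %>%
  group_by(continent) %>%
  summarise(latest = sum(latest), mean_pop = sum(mean_pop))

by_continent
```

The range is left at location level because adding ranges across countries does not give a meaningful continent figure.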

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.
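One common way to share data in a reprex is base R's `dput()`, which prints an R expression that recreates a data frame. A minimal sketch, with a hypothetical stand-in for `r`:

```r
# Hypothetical stand-in for `r`; in practice you would use your own data frame
r <- data.frame(
  continent  = c("Asia", "Asia", "Europe", "Europe"),
  location   = c("India", "India", "France", "France"),
  date       = c("01-01-2021", "02-01-2021", "01-01-2021", "02-01-2021"),
  population = c(1380, 1381, 65, 66),
  stringsAsFactors = FALSE
)

# Take a small sample and print it as R code that others can paste and run
sample_rows <- head(r, 3)
dput(sample_rows)
```

Posting that output together with the code you tried lets others run it directly; the reprex package's `reprex()` function can also render code and its output for posting.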

By the latest population

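The code would be something like this:

library(lubridate)
library(dplyr)

r %>%
    mutate(date = dmy(date)) %>%                 # parse day-month-year strings into Dates
    filter(continent != "") %>%                  # drop rows with no continent label
    group_by(continent, location) %>% 
    filter(date == max(date)) %>%                # keep each location's latest record
    group_by(continent) %>% 
    summarise(population = sum(population, na.rm = TRUE))  # total per continent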
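To check the idea on something runnable, here is a sketch on hypothetical toy data whose `date` column holds day-month-year strings, as the code above assumes. It swaps in dplyr's `slice_max()` (dplyr 1.0.0 or later) for `filter(date == max(date))` and drops rows with an empty continent; both are variations, not part of the original answer.

```r
library(lubridate)
library(dplyr)

# Hypothetical toy data with dates stored as "dd-mm-yyyy" strings
r <- data.frame(
  continent  = c("Asia", "Asia", "Asia", "Europe", "Europe", ""),
  location   = c("India", "India", "Japan", "France", "France", "World"),
  date       = c("01-01-2021", "02-01-2021", "02-01-2021",
                 "01-01-2021", "02-01-2021", "02-01-2021"),
  population = c(1380, 1381, 126, 65, 66, 7800),
  stringsAsFactors = FALSE
)

latest_totals <- r %>%
  mutate(date = dmy(date)) %>%                     # parse day-month-year strings
  filter(continent != "") %>%                      # drop rows with no continent
  group_by(continent, location) %>%
  slice_max(date, n = 1, with_ties = FALSE) %>%    # keep one latest row per location
  group_by(continent) %>%
  summarise(population = sum(population, na.rm = TRUE))

latest_totals
```

One difference: `filter(date == max(date))` keeps every row tied on the latest date, whereas `with_ties = FALSE` keeps exactly one row per location, so a duplicated record is not counted twice.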

As I said before, if you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

Thank you for this answer.
I learned something: how to use max at that first stage to get the latest record for each continent/location group.
I think I have code I could simplify using that technique.
