# Sum rows in dataframe and divide by values

Hello,
I am quite new to R and trying to learn as much as I can about data manipulation.

I feel what I am trying to accomplish should be easy, in principle, but whatever approach I try I can’t find a solution. So here I am with a few questions. Thanks for your help.

My main dataset looks something like this, with dates, regions, and a series of variables (X, Y, Z, T, D, E, F). There are many more dates, so the real data is much longer than this.

``````
Date        Region     X   Y   Z   T   D   E   F
01-01-2020  RegionA    2   4   2   3   2   3   4
01-01-2020  RegionB    1   3   2   2   3   3   3
01-01-2020  RegionC    1   4   4   2   3   4   2
01-01-2020  RegionD    2   4   2   3   2   4   4
01-01-2020  RegionE    1   3   2   2   2   2   2
02-01-2020  RegionA    2   4   7   3   2   3   4
02-01-2020  RegionB    1   3   2   2   2   3   3
02-01-2020  RegionC    1   4   4   8   3   4   2
02-01-2020  RegionD    2   3   2   3   2   4   4
02-01-2020  RegionE    1   3   2   2   2   2   2
``````

Then I have a second dataset, which contains the population of these regions:

``````
Region     Pop
RegionA    2000
RegionB    4039
RegionC    24728
RegionD    3738
RegionE    2936
``````

There are two tasks I want to accomplish. The first, related to the first dataset, is to add two rows together: for example, creating a RegionAB whose variables (X, Y, Z, T, D, E, F) are the sum of RegionA and RegionB. This should be done for each date separately, so the final dataset would have a RegionAB row for 01-01-2020 and another for 02-01-2020.

The second task is to divide the values of one of the variables (say Z) by the values of the population contained in the other dataset. This should be done for all dates separately and added in a new column.

My third question is, what type of book do I need to learn this kind of data manipulation?

Thank you

``````
library(dplyr)

# Lookup table: which combined group ("lump") each region belongs to
lumps <- tibble(
  Region = paste0("Region", LETTERS[1:5]),
  lump   = c("AB", "AB", "CD", "CD", "E")
)
``````
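``````
my_df %>%
  left_join(lumps, by = "Region") %>%  # attach each region's lump label
  group_by(Date, lump) %>%
  select(-Region) %>%                  # drop Region so only the numeric columns are summed
  summarise_all(sum)                   # one row per Date and lump
``````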
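
To try this on the sample data from the question, here is a minimal self-contained sketch. It assumes the table is stored in a data frame called `my_df`, as in the pipeline above, and it uses only three regions and the columns X, Y and Z to stay short. Instead of lumping every region, it sums only RegionA and RegionB and appends the result with `bind_rows()`, which is one possible reading of "a RegionAB row in each date"; the names `region_ab` and `result` are chosen here for illustration.

```r
library(dplyr)

# Sample rows copied from the question (two dates, regions A-C, columns X-Z only)
my_df <- tibble(
  Date   = rep(c("01-01-2020", "02-01-2020"), each = 3),
  Region = rep(paste0("Region", LETTERS[1:3]), times = 2),
  X      = c(2, 1, 1, 2, 1, 1),
  Y      = c(4, 3, 4, 4, 3, 4),
  Z      = c(2, 2, 4, 7, 2, 4)
)

# Sum RegionA and RegionB within each date
region_ab <- my_df %>%
  filter(Region %in% c("RegionA", "RegionB")) %>%
  group_by(Date) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop") %>%
  mutate(Region = "RegionAB")

# Append the new rows to the original data and sort by date and region
result <- bind_rows(my_df, region_ab) %>%
  arrange(Date, Region)
```

Here `result` keeps the original rows and gains one RegionAB row per date. If the original RegionA and RegionB rows should disappear, the lump-and-summarise approach above is probably the better fit.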

Thank you very much - that was very quick!
And now, if I can ask for some further help: I want to divide the values of one variable (say X) for each region by that region's population, which is stored in the other dataset.

I could join the population column to the main dataframe, but my issue is that each region is repeated across different dates, and every date should be divided by the same population number.

Thank you again.
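
One way to do the division is the join mentioned above. Because `left_join()` matches on `Region` only, each region's population is copied onto every date on which that region appears, so all dates end up divided by the same number. A minimal sketch, assuming the two tables are held in data frames named `my_df` and `pop_df` (`pop_df`, `per_capita` and the new column `X_per_capita` are names chosen here for illustration):

```r
library(dplyr)

# Two regions over two dates, values of X copied from the question
my_df <- tibble(
  Date   = rep(c("01-01-2020", "02-01-2020"), each = 2),
  Region = rep(c("RegionA", "RegionB"), times = 2),
  X      = c(2, 1, 2, 1)
)

# Population table: one row per region
pop_df <- tibble(
  Region = c("RegionA", "RegionB"),
  Pop    = c(2000, 4039)
)

per_capita <- my_df %>%
  left_join(pop_df, by = "Region") %>%  # Pop is repeated for every date
  mutate(X_per_capita = X / Pop)        # new column, original X is kept
```

The same pattern should work for Z or any other column. For lumped regions such as RegionAB, the populations would presumably need to be summed with the same grouping before dividing.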
