New column with percentages based on other column

floris508 · April 12, 2021, 9:10am

Hi, I have a dataset consisting of 3 columns and 59 rows. I want to create a fourth row, which is based on the data of two other rows. Firstly, it should check in one column which city it is about. Secondly, if that row matches a particular city, I want to divide the "length" (which is another column) by the total amount of length of that particular city. Hence, the new column would be something like: percentage of length per city (but I named it Percentage). The data looks something like this:

A tibble: 59 x 3

Groups: City [4]

City Omschrijving Length

1 Amsterdam Bedrijfsterrein 6756.
2 Amsterdam Begraafplaats 1359.
3 Amsterdam Bos 506.
4 Amsterdam Bouwterrein 1584.
5 Amsterdam Dagrecreatief terrein 2127.
6 Amsterdam Detailhandel en horeca 3179.
7 Amsterdam Hoofdweg 20958.
8 Amsterdam Nat natuurlijk terrein 578.
9 Amsterdam Openbare voorziening 2118.
10 Amsterdam Overig agrarisch gebruik 476.

... with 49 more rows

My code looks something like this

City_Omschrijving <-
City_Omschrijving %>%
filter(!Omschrijving == "Park en plantsoen") %>%
group_by(City, Omschrijving) %>%
mutate(Percentage = Length / sum(Length))

after which I did:

City_Omschrijving$Percentage <-
formattable::percent(City_Omschrijving$Percentage, digits =1)

But this gives me, unfortunately, the percentage of each Length with regards to the total amount of Length of all the column, instead of the percentage of each Length per city:

Groups: City, Omschrijving [59]

City Omschrijving Length Percentage

1 Amsterdam Bedrijfsterrein 6756. 1.6%
2 Amsterdam Begraafplaats 1359. 0.3%
3 Amsterdam Bos 506. 0.1%
4 Amsterdam Bouwterrein 1584. 0.4%
5 Amsterdam Dagrecreatief terrein 2127. 0.5%
6 Amsterdam Detailhandel en horeca 3179. 0.8%
7 Amsterdam Hoofdweg 20958. 4.9%
8 Amsterdam Nat natuurlijk terrein 578. 0.1%
9 Amsterdam Openbare voorziening 2118. 0.5%
10 Amsterdam Overig agrarisch gebruik 476. 0.1%

... with 49 more rows

FJCC · April 12, 2021, 12:07pm

Perhaps like this?

library(dplyr)
DF <- data.frame(City=c("A","A","A","B","B","C","C","C"),
                  Omschrijving=LETTERS[1:8],
                  Length=c(5,3,6,2,5,6,8,4))
DF %>% group_by(City) %>% mutate(SUM=sum(Length),
                                  Percent=Length/SUM) 
# A tibble: 8 x 5
# Groups:   City [3]
  City  Omschrijving Length   SUM Percent
  <chr> <chr>         <dbl> <dbl>   <dbl>
1 A     A                 5    14   0.357
2 A     B                 3    14   0.214
3 A     C                 6    14   0.429
4 B     D                 2     7   0.286
5 B     E                 5     7   0.714
6 C     F                 6    18   0.333
7 C     G                 8    18   0.444
8 C     H                 4    18   0.222

floris508 · April 12, 2021, 2:22pm

It worked, thank you FJCC!

system · April 19, 2021, 2:23pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.