Combine character entries and sum associated numbers

don_joevy · October 30, 2022, 10:46am

Hi Community,

im a very very beginner in R so id like to ask for help here:

I have a dataframe with two columns, 1st contains 47 US states which are duplicated many times, in the 2nd column i have numbers (counts). I want to combine the states to only one set of 47 States and in the 2nd column i want the sums of all associated numbers from the repititions.

Its probably very simple but as i said, im an absolute beginner...
Thanks for any help!!

FactOREO · October 30, 2022, 12:21pm

assuming the columns are state and value:

library(dplyr)

data |>
  group_by(state) |>
  summarise(
    total_value = sum(value)
  )

this will give the desired result

kind regards

don_joevy · October 30, 2022, 2:43pm

df_grouped <- df_combi %>% group_by(states) %>% summarise(total_value = sum(col_lost))

thats what i tried. the two columns are called states and col_lost. Grouping of states works, but not the sum of the numbers...

andresrcs · October 30, 2022, 2:56pm

Does your data have NAs in it? if so, use na.rm = TRUE.

df_grouped <- df_combi %>% 
    group_by(states) %>% 
    summarise(total_value = sum(col_lost, na.rm = TRUE))

If this doesn't solve your problem, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

FactOREO · October 30, 2022, 3:00pm

In your screenshot you executed it twice (the second line starts with a "+", not with ">"). This issue is due to a missing closing bracket at the end of your summarise call. The issue with NA is due to missings in your data, just add na.rm = TRUE in your sum() call.

This is how it should look like (with the same functions as in my first comment):

Data <- data.frame(state = rep(1:5, each = 4),
                   col_lost = sample(1:1000,20))
Data[sample(1:20,5),2] <- NA_integer_
head(Data,10)
#>    state col_lost
#> 1      1      504
#> 2      1      574
#> 3      1       NA
#> 4      1      332
#> 5      2      930
#> 6      2      387
#> 7      2       NA
#> 8      2       NA
#> 9      3      841
#> 10     3       NA

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
Data |>
  group_by(state) |> 
  summarise(total_col_lost = sum(col_lost, na.rm = TRUE))
#> # A tibble: 5 × 2
#>   state total_col_lost
#>   <int>          <int>
#> 1     1           1410
#> 2     2           1317
#> 3     3           2235
#> 4     4            638
#> 5     5           2654

^{Created on 2022-10-30 with reprex v2.0.2}

Kind regards

don_joevy · October 30, 2022, 5:27pm

it worked!!

In the end it was that i forgot to load the package and the na.rm function.
you guys are amazing, thank you so much!

KG

system · November 6, 2022, 5:28pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.