I'm a complete beginner in R, so I'd like to ask for help here.
I have a data frame with two columns. The first contains 47 US states, each repeated many times; the second contains numbers (counts). I want to collapse the states into a single set of 47 rows, with the second column holding the sum of all the counts that belong to each state.
It's probably very simple, but as I said, I'm an absolute beginner...
Thanks for any help!
In your screenshot you ran it twice (the second line starts with a "+" instead of ">", which is R's continuation prompt for an incomplete expression). This is due to a missing closing parenthesis at the end of your summarise() call. The NA results are due to missing values in your data; just add na.rm = TRUE to your sum() call.
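To see what na.rm = TRUE changes, here is a minimal sketch using the four values of state 1 from the example data further down (the vector name counts is only for illustration):

```r
# The four values of state 1 from the example data, one of them missing
counts <- c(504L, 574L, NA, 332L)

# A single NA makes the whole sum NA
total_with_na <- sum(counts)

# na.rm = TRUE drops missing values before summing
total_without_na <- sum(counts, na.rm = TRUE)

print(total_with_na)
print(total_without_na)
```

The second result, 1410, is the value that shows up for state 1 in the summarised table.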
This is how it should look (with the same functions as in my first comment):
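# Example data: 5 states with 4 rows each and random counts
Data <- data.frame(state = rep(1:5, each = 4),
                   col_lost = sample(1:1000, 20))
# Set 5 randomly chosen counts to NA to mimic missing values
Data[sample(1:20, 5), 2] <- NA_integer_
head(Data, 10)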
#>    state col_lost
#> 1      1      504
#> 2      1      574
#> 3      1       NA
#> 4      1      332
#> 5      2      930
#> 6      2      387
#> 7      2       NA
#> 8      2       NA
#> 9      3      841
#> 10     3       NA
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
# Sum the counts per state, ignoring missing values
Data |>
  group_by(state) |>
  summarise(total_col_lost = sum(col_lost, na.rm = TRUE))
#> # A tibble: 5 × 2
#>   state total_col_lost
#>   <int>          <int>
#> 1     1           1410
#> 2     2           1317
#> 3     3           2235
#> 4     4            638
#> 5     5           2654
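If you would rather not load dplyr, roughly the same result can be had with base R's aggregate(). This is only a sketch of an alternative, not the approach from my first comment. The data frame df, its column names and the three state names are made up for illustration, so swap in your own.

```r
# Hypothetical data: state names repeated several times, with one missing count
df <- data.frame(
  state = c("Ohio", "Texas", "Ohio", "Iowa", "Texas", "Ohio"),
  count = c(10, 5, NA, 7, 3, 2)
)

# na.action = na.pass keeps the rows with NA, so that
# na.rm = TRUE (passed on to sum) is what handles them
totals <- aggregate(count ~ state, data = df, FUN = sum,
                    na.rm = TRUE, na.action = na.pass)
print(totals)
```

The result has one row per state, with the counts summed. Without na.action = na.pass, aggregate() drops rows containing NA before summing, so a state whose counts are all missing would disappear from the result instead of showing a total of 0.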