I have a ride start station name column and a corresponding ride start station id column.
But the summary of the data in R shows 625 unique values for ride start station name and only 622 for ride start station id. Why is there a difference, and how can I resolve it using R?
Assuming the data is in a data.frame, you could use something like this to find the ids that appear with more than one name:
library(dplyr)
# Toy data: id 1 appears with two different spellings of its name
df1 <- tibble::tribble(
  ~name, ~id,
  "a",   1,
  "A",   1,
  "b",   2,
  "c",   3
)
# Keep only the rows whose id occurs more than once
df2 <- df1 |>
  group_by(id) |>
  mutate(count = n()) |>
  filter(count > 1)
print(df2)
#> # A tibble: 2 x 3
#> # Groups:   id [1]
#>   name     id count
#>   <chr> <dbl> <int>
#> 1 a         1     2
#> 2 A         1     2
Created on 2022-07-12 by the reprex package (v2.0.1)
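A likely (but unconfirmed) reason for 625 names against 622 ids is that a few ids are paired with more than one spelling of the station name, for example a difference in case or a trailing space. With real ride data each row is a ride, so counting rows per id would flag every station used more than once. The sketch below therefore reduces the data to distinct name/id pairs first, and then shows one possible way to resolve the mismatch by keeping the most frequent name for each id. The data frame `rides`, its column names, and the station names are made up for illustration.

```r
library(dplyr)

# Hypothetical ride-level data: one row per ride, so stations repeat.
rides <- tibble::tribble(
  ~start_station_name, ~start_station_id,
  "Clark St",          1,
  "Clark St",          1,
  "Clark St ",         1,  # trailing space: different string, same id
  "State St",          2,
  "State St",          2,
  "Lake Ave",          3
)

# Reduce to unique (name, id) pairs so repeated rides are not counted.
pairs <- rides |>
  distinct(start_station_name, start_station_id)

# Ids that are attached to more than one name.
ids_with_several_names <- pairs |>
  group_by(start_station_id) |>
  filter(n() > 1) |>
  ungroup()

print(ids_with_several_names)

# One possible fix (an assumption, not the only option):
# keep the most frequently used name for each id.
canonical <- rides |>
  count(start_station_id, start_station_name, name = "n_rides") |>
  group_by(start_station_id) |>
  slice_max(n_rides, n = 1, with_ties = FALSE) |>
  ungroup() |>
  select(start_station_id, canonical_name = start_station_name)

# Overwrite every name with the canonical name for its id.
rides_clean <- rides |>
  left_join(canonical, by = "start_station_id") |>
  mutate(start_station_name = canonical_name) |>
  select(-canonical_name)
```

After this step the number of distinct names in `rides_clean` should match the number of distinct ids. It is worth inspecting `ids_with_several_names` before overwriting anything, since some ids may genuinely have been reassigned to a renamed station, and missing ids (`NA`) would need separate handling.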