Finding the mismatched values

Danish · July 11, 2022, 8:49pm

I have ride start station names and their corresponding ride start station IDs.
But the summary data in R shows 625 unique values for ride start station name and 622 for ride start station id. Why is there a difference, and how can I resolve it using R?

HanOostdijk · July 12, 2022, 2:27pm

Assuming the data is in a data.frame, you could use something like the following, which groups by id and keeps the ids that occur on more than one row:

library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.1.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df1 <- tibble::tribble(
  ~name, ~id,
  "a" , 1, 
  "A" , 1,
  "b" , 2,
  "c" , 3
)

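# Count the rows per id and keep the ids that appear more than once.
# This assumes one row per distinct name/id pair, as in df1.
df2 <- df1 |>
  group_by(id) |>
  mutate(count = n()) |>
  filter(count > 1)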

print(df2)
#> # A tibble: 2 x 3
#> # Groups:   id [1]
#>   name     id count
#>   <chr> <dbl> <int>
#> 1 a         1     2
#> 2 A         1     2
Created on 2022-07-12 by the reprex package (v2.0.1)
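
If the real data has one row per ride rather than one row per station, every id will occur many times, so counting rows per id would flag almost everything. A minimal sketch for that case, assuming hypothetical column names `start_station_name` and `start_station_id` and made-up sample rows (neither is taken from the original data), is to reduce the data to unique name/id pairs first:

```r
library(dplyr)

# Hypothetical ride-level data: one row per ride, so stations repeat.
# Column names and values are assumptions for illustration only.
rides <- tibble::tribble(
  ~start_station_name, ~start_station_id,
  "Clark St",  1,
  "Clark St",  1,
  "Clark St.", 1,
  "Lake Ave",  2,
  "Lake Ave",  2,
  "Park Rd",   3
)

# Keep each name/id pair once, then keep the ids that still
# occur more than once, i.e. ids attached to several names.
mismatched <- rides |>
  distinct(start_station_name, start_station_id) |>
  group_by(start_station_id) |>
  filter(n() > 1) |>
  ungroup()

print(mismatched)
```

Grouping by `start_station_name` instead would show the reverse case, names that are attached to more than one id.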
