Examine if one value in a column matches values in a list

passwordistaco · November 4, 2025, 9:47pm

I'm trying to check whether each number in a column of a data frame matches one of a set of values in a list:

df$column is a range of numerical values, about 4000 rows.
comparisonlist = list of 100 or so values

I want to create a check variable that is set to 1 for a row when the value of column in that row matches one of the 100 numbers in comparisonlist. For that to work, I first need the comparison of column against comparisonlist to behave properly.

I've tried various iterations of column %in% comparisonlist, but it always returns a value of FALSE, even when it should be matching. I've tried coding both as characters and it still returns FALSE every time.

Any guidance on what I'm doing wrong?

Thanks!

prubin · November 4, 2025, 11:18pm

Is this what you want (other than the problem dimensions)?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- data.frame(vals = 1:10)
comparisonlist <- c(3, 5, 7)
# %in% is vectorized, so rowwise() is not needed; name the new column explicitly
df <- df |> mutate(check = vals %in% comparisonlist)

Created on 2025-11-04 with reprex v2.1.1
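Since the original data is not shown, the following is only a guess at one possible cause of an all-FALSE result: comparisonlist might be a data frame (or a list holding one long vector) rather than a plain vector. In that case %in% compares against each column as a whole instead of against the individual values. A minimal sketch, where comparison_df and its value column are hypothetical names chosen for illustration:

```r
vals <- 1:10

# Hypothetical: the comparison values were read in as a one-column data frame
comparison_df <- data.frame(value = c(3, 5, 7))

# str() shows what kind of object is actually being compared against
str(comparison_df)

# Using the whole data frame as the lookup table finds no matches ...
whole_df_match <- vals %in% comparison_df
any(whole_df_match)

# ... while using the column (a plain numeric vector) works as expected
column_match <- vals %in% comparison_df$value
which(column_match)
```

If str(comparisonlist) reports a data.frame or a list, extracting the column with $ or flattening with unlist() before calling %in% may be all that is needed.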

margusl · November 5, 2025, 9:42am

Such tasks are often approached with a join. Here is an example that uses dplyr::starwars and compares sw$mass against comparisonlist:

library(dplyr, warn.conflicts = FALSE)
( sw <- 
  starwars |> 
  select(name, height, mass)
)
#> # A tibble: 87 × 3
#>    name               height  mass
#>    <chr>               <int> <dbl>
#>  1 Luke Skywalker        172    77
#>  2 C-3PO                 167    75
#>  3 R2-D2                  96    32
#>  4 Darth Vader           202   136
#>  5 Leia Organa           150    49
#>  6 Owen Lars             178   120
#>  7 Beru Whitesun Lars    165    75
#>  8 R5-D4                  97    32
#>  9 Biggs Darklighter     183    84
#> 10 Obi-Wan Kenobi        182    77
#> # ℹ 77 more rows

comparisonlist <- c(32, 49)

( cl <- tibble(comparisonlist, in_comp_list = TRUE) )
#> # A tibble: 2 × 2
#>   comparisonlist in_comp_list
#>            <dbl> <lgl>       
#> 1             32 TRUE        
#> 2             49 TRUE

left_join(sw, cl, by = join_by(mass == comparisonlist )) |> 
  # rows without a match get NA; turn those into FALSE with coalesce()
  mutate(in_comp_list = coalesce(in_comp_list, FALSE))
#> # A tibble: 87 × 4
#>    name               height  mass in_comp_list
#>    <chr>               <int> <dbl> <lgl>       
#>  1 Luke Skywalker        172    77 FALSE       
#>  2 C-3PO                 167    75 FALSE       
#>  3 R2-D2                  96    32 TRUE        
#>  4 Darth Vader           202   136 FALSE       
#>  5 Leia Organa           150    49 TRUE        
#>  6 Owen Lars             178   120 FALSE       
#>  7 Beru Whitesun Lars    165    75 FALSE       
#>  8 R5-D4                  97    32 TRUE        
#>  9 Biggs Darklighter     183    84 FALSE       
#> 10 Obi-Wan Kenobi        182    77 FALSE       
#> # ℹ 77 more rows

Or you could loop through all values of df$column (sw$mass in this example) with sapply() or purrr::map_lgl() to use %in% and get a logical vector as a result:

sw$in_list <- sapply(sw$mass, \(x) x %in% cl$comparisonlist)
sw
#> # A tibble: 87 × 4
#>    name               height  mass in_list
#>    <chr>               <int> <dbl> <lgl>  
#>  1 Luke Skywalker        172    77 FALSE  
#>  2 C-3PO                 167    75 FALSE  
#>  3 R2-D2                  96    32 TRUE   
#>  4 Darth Vader           202   136 FALSE  
#>  5 Leia Organa           150    49 TRUE   
#>  6 Owen Lars             178   120 FALSE  
#>  7 Beru Whitesun Lars    165    75 FALSE  
#>  8 R5-D4                  97    32 TRUE   
#>  9 Biggs Darklighter     183    84 FALSE  
#> 10 Obi-Wan Kenobi        182    77 FALSE  
#> # ℹ 77 more rows
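Because %in% is already vectorized over its left-hand side, the explicit loop is optional. The sketch below uses the first few mass values from the output above (copied into a plain vector so it runs without dplyr) to compare the looped and the direct call, and then converts the logical result into the 0/1 check variable asked for in the question:

```r
# First five sw$mass values from the printed tibble above
mass <- c(77, 75, 32, 136, 49)
comparisonlist <- c(32, 49)

# Element-by-element, as in the sapply() call above
looped <- sapply(mass, \(x) x %in% comparisonlist)

# Direct call: %in% checks every element of mass in one go
vectorized <- mass %in% comparisonlist

identical(looped, vectorized)

# Logical to 0/1 flag
check <- as.integer(vectorized)
check
```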

By any chance, are you working with floating-point numbers? If so, keep in mind that the printed value may not match the value actually stored in the object. For example, when a and b are both printed as 1, equality is not guaranteed:

a <- 1 
b <- 1 + 1e-7 
print(c(a, b))
#> [1] 1 1
a == b
#> [1] FALSE
dput(b)
#> 1.0000001

And you should not expect something like 0.3 / 3 == 0.1 to yield TRUE, e.g.:

( a <- 0.3 / 3 )
#> [1] 0.1
( b <- 0.1 ) 
#> [1] 0.1
a == b
#> [1] FALSE
a - b
#> [1] -1.387779e-17

The same applies to %in% and equality joins (join_by(a == b)), so you may need rounding, a comparison with tolerance (e.g. one built on dplyr::near(), since %in% tests exact equality), or a rolling / overlap join instead of an equality join (see ?dplyr::join_by for examples).
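A tolerance-based membership check could look roughly like the sketch below. The helper in_with_tolerance() and its default tol are made up for illustration and are not part of base R or dplyr; the comparison abs(x - y) < tol is the same idea dplyr::near() applies pairwise.

```r
# Hypothetical helper: TRUE where x is within `tol` of any value in `table`
in_with_tolerance <- function(x, table, tol = 1e-8) {
  vapply(x, function(v) any(abs(v - table) < tol), logical(1))
}

x <- c(0.3 / 3, 0.5, 1 + 1e-7)
comparisonlist <- c(0.1, 1)

# Exact matching misses 0.3 / 3, which is not exactly equal to 0.1
exact <- x %in% comparisonlist

# The default tolerance accepts 0.3 / 3 but still rejects 1 + 1e-7
tolerant <- in_with_tolerance(x, comparisonlist)

# A looser tolerance also accepts 1 + 1e-7 as a match for 1
looser <- in_with_tolerance(x, comparisonlist, tol = 1e-6)
```

How large tol should be depends on the data: too small and rounding noise still breaks matches, too large and genuinely different values are treated as equal.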
