Filter not filtering correctly...

I am really struggling to work out what is going wrong here...

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

x <- seq(0.01,0.45,0.01)
y <- seq(0.02,0.05,0.005)

df <- expand.grid(x=x, y=y)

df %>%
  dplyr::filter(x==0.14)
#>      x     y
#> 1 0.14 0.020
#> 2 0.14 0.025
#> 3 0.14 0.030
#> 4 0.14 0.035
#> 5 0.14 0.040
#> 6 0.14 0.045
#> 7 0.14 0.050

df %>%
  dplyr::filter(x==0.36)
#> [1] x y
#> <0 rows> (or 0-length row.names)

Created on 2019-03-28 by the reprex package (v0.2.1)

There are definitely rows in df where x==0.36.

sessionInfo()
#> R version 3.5.3 (2019-03-11)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Mojave 10.14.3
#> 
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.5.3  magrittr_1.5    tools_3.5.3     htmltools_0.3.6
#>  [5] yaml_2.2.0      Rcpp_1.0.1      stringi_1.4.3   rmarkdown_1.12 
#>  [9] highr_0.7       knitr_1.22      stringr_1.4.0   xfun_0.5       
#> [13] digest_0.6.18   evaluate_0.13

Any insights gratefully received.

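This looks like an issue with floating point equality: the value that seq() generates near 0.36 is probably not exactly equal to the literal 0.36, so == returns FALSE.

Try comparing with a tolerance instead, for example using all.equal().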
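To see why the second filter comes back empty, here is a minimal base R sketch. It is not part of the original reprex, and the tolerance of 1e-8 is an arbitrary value chosen for illustration. It assumes, as the empty result above suggests, that the element of x printed as 0.36 is not stored as exactly the same double as the literal 0.36.

```r
# Rebuild the x vector from the question.
x <- seq(0.01, 0.45, 0.01)

# Classic illustration: decimal fractions are stored as binary approximations,
# so arithmetic that "should" give 0.3 is not exactly equal to the literal 0.3.
0.1 + 0.2 == 0.3

# No element of x is exactly equal to the literal 0.36 ...
any(x == 0.36)

# ... although one element is extremely close to it.
close_to_target <- abs(x - 0.36) < 1e-8   # 1e-8 is an illustrative tolerance
sum(close_to_target)

# Printing with more digits makes the discrepancy visible
# (the exact digits are not reproduced here).
print(x[close_to_target], digits = 17)
print(0.36, digits = 17)
```

The same kind of tolerance check is what makes a comparison on computed doubles reliable; exact == is only safe when both sides were produced in exactly the same way.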

Ahh... okay - got it!

So this works.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

x <- seq(0.01,0.45,0.01)
y <- seq(0.02,0.05,0.005)

df <- expand.grid(x=x, y=y)

df %>%
  dplyr::filter(x==0.14)
#>      x     y
#> 1 0.14 0.020
#> 2 0.14 0.025
#> 3 0.14 0.030
#> 4 0.14 0.035
#> 5 0.14 0.040
#> 6 0.14 0.045
#> 7 0.14 0.050

df %>%
  dplyr::filter(near(x, 0.36))
#>      x     y
#> 1 0.36 0.020
#> 2 0.36 0.025
#> 3 0.36 0.030
#> 4 0.36 0.035
#> 5 0.36 0.040
#> 6 0.36 0.045
#> 7 0.36 0.050

Created on 2019-03-28 by the reprex package (v0.2.1)

Thanks!

The answer above using dplyr::near() is the best solution here (I'd forgotten about that one). all.equal() appears to require more fiddling than it's worth.
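For anyone wondering what the fiddling looks like: all.equal() compares two whole objects and returns either TRUE or a character description of the difference, so it cannot be dropped into filter() as an elementwise test. A rough base R sketch follows, where is_close() is a hypothetical helper written for this illustration and not a function from dplyr or base R.

```r
x <- seq(0.01, 0.45, 0.01)
y <- seq(0.02, 0.05, 0.005)
df <- expand.grid(x = x, y = y)

# Comparing the whole column with a single number does not give a logical
# vector: the lengths differ, so all.equal() returns a message, not TRUE.
whole_column <- all.equal(df$x, 0.36)
isTRUE(whole_column)

# Hypothetical helper: apply all.equal() one element at a time and wrap it in
# isTRUE() so that every comparison yields a plain TRUE or FALSE.
is_close <- function(values, target) {
  vapply(values, function(v) isTRUE(all.equal(v, target)), logical(1))
}

# Base R subsetting with the elementwise result.
hits <- df[is_close(df$x, 0.36), ]
nrow(hits)
```

dplyr::near(x, 0.36) does the same job in a single vectorised call; by default it checks abs(x - y) against a tolerance of .Machine$double.eps^0.5, which is why it slots straight into filter().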
