Why aren't the n the same?

juandmaz · October 7, 2024, 6:17pm

I have two data frames that each have a date-of-birth variable, and I want to select the values they have in common.

> head(base$fec_nac)
[1] "1981-06-22" "1974-06-12" "1981-08-20" "1954-07-28" "1982-09-27" "1935-01-02"

> head(base2$fechanacimiento)
[1] "1983-07-13" "1964-06-01" "1950-12-29" "1951-07-03" "1958-09-04" "1961-05-29"

> intersect(base$fec_nac, base2$fechanacimiento) %>%
+   length()
[1] 251

But when I filter either data frame for those values, I get far fewer rows than 251: 6 in base and 186 in base2.

> base %>%
+   filter(fec_nac %in% intersect(base$fec_nac, base2$fechanacimiento)) %>%
+   nrow
[1] 6

> base2 %>%
+   filter(fechanacimiento %in% intersect(base$fec_nac, base2$fechanacimiento)) %>%
+   nrow
[1] 186
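A quick sanity check, sketched with small hypothetical vectors rather than the poster's data: every distinct value counted by intersect() occurs at least once in each data frame. When matching works, filtering either frame should therefore keep at least that many rows, and duplicated dates can only push the count higher. Counts of 6 and 186 against 251 suggest the values are not being compared as dates.

```r
# Hypothetical dates, not the poster's data
a <- as.Date(c("1981-06-22", "1974-06-12", "1981-06-22", "1954-07-28"))
b <- as.Date(c("1981-06-22", "1954-07-28", "1983-07-13"))

n_shared <- length(intersect(a, b))  # distinct dates present in both: 2
n_rows   <- sum(a %in% b)            # elements of `a` with a match: 3

# Duplicated dates mean the row count can exceed, but never fall below,
# the number of distinct shared dates.
n_rows >= n_shared                   # TRUE
```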

The strange thing is that intersect() returns numbers instead of dates.

> head(intersect(base$fec_nac, base2$fechanacimiento))
[1]   4190   1623   4249  -5636   4652 -12783
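The numbers look like R's internal representation of Date values, i.e. days since 1970-01-01: for example, 4190 is the day count for 1981-06-22 and -5636 is 1954-07-28, both of which appear in head(base$fec_nac). A minimal sketch, assuming both columns really are Date vectors (a and b below are hypothetical stand-ins): depending on the R version, intersect() may drop the Date class, and as.Date() with an explicit origin turns the result back into dates in either case.

```r
# Hypothetical Date vectors standing in for base$fec_nac and base2$fechanacimiento
a <- as.Date(c("1981-06-22", "1974-06-12", "1954-07-28"))
b <- as.Date(c("1954-07-28", "1983-07-13", "1981-06-22"))

as.numeric(as.Date("1981-06-22"))  # 4190: days since 1970-01-01

# intersect() may return plain day counts here, depending on the R version.
common <- intersect(a, b)

# as.Date() restores the class if `common` is numeric and leaves it unchanged
# if it is already a Date; the explicit origin keeps this working on older R.
common <- as.Date(common, origin = "1970-01-01")
common                             # "1981-06-22" "1954-07-28"
```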

eric-hunt · October 7, 2024, 6:51pm

Can you provide some example data?

What is the result of doing something like a filtering join instead?

dplyr::semi_join(
    base,
    base2,
    by = dplyr::join_by(fec_nac == fechanacimiento)
)
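For comparison, here is a base-R sketch of what a filtering (semi) join does, using small hypothetical frames that reuse only the column names from the thread. It keeps each row of base whose date appears anywhere in base2 and, unlike a mutating join, never duplicates rows when base2 repeats a date. Because both columns stay Date objects, the comparison never goes through numeric day counts.

```r
# Hypothetical toy frames; only the column names come from the thread
base  <- data.frame(id = 1:4,
                    fec_nac = as.Date(c("1981-06-22", "1974-06-12",
                                        "1981-06-22", "1935-01-02")))
base2 <- data.frame(fechanacimiento = as.Date(c("1981-06-22", "1981-06-22",
                                                "1950-12-29")))

# Keep every row of `base` whose date occurs anywhere in base2; the repeated
# date in base2 does not duplicate rows of `base`.
matched <- base[base$fec_nac %in% base2$fechanacimiento, , drop = FALSE]
nrow(matched)  # 2 (rows with id 1 and 3)
```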

Also, why do you need the intersection at all if you are just filtering for values present in the other vector?

test <- c(1:4)

dplyr::filter(cars, speed %in% intersect(cars$speed, test))
#>   speed dist
#> 1     4    2
#> 2     4   10

dplyr::filter(cars, speed %in% test)
#>   speed dist
#> 1     4    2
#> 2     4   10

Created on 2024-10-07 with reprex v2.1.1.9000

jrkrideau · October 8, 2024, 12:45am

A handy way to supply sample data is to use the dput() function. See ?dput. If you have a very large data set, then something like dput(head(myfile, 100)) will likely supply enough data for us to work with. Just copy and paste the dput() output here between a pair of triple backticks (```).
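A minimal sketch of that workflow, using a small hypothetical data frame (sample_df is an illustrative name): call head() first and pass the result to dput(), so only the first rows are printed. The printed text is R code that recreates the object, Date class included, which is what makes it useful for reproducing a problem like this one. capture.output() and eval(parse()) are used here only to show the round trip.

```r
# A small hypothetical data frame with a Date column, like the ones in the question
sample_df <- data.frame(
  fec_nac = as.Date(c("1981-06-22", "1974-06-12", "1981-08-20"))
)

# Take head() first, then dput() the result, so only 2 rows are printed.
txt <- capture.output(dput(head(sample_df, 2)))
cat(txt, sep = "\n")

# The printed text is R code that rebuilds the object, Date class included.
restored <- eval(parse(text = txt))
class(restored$fec_nac)  # "Date"
```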

system · January 6, 2025, 12:46am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.