Why aren't the n the same?

I have two data frames that each have a date-of-birth variable, and I want to select the values they have in common.

> head(base$fec_nac)
[1] "1981-06-22" "1974-06-12" "1981-08-20" "1954-07-28" "1982-09-27" "1935-01-02"

> head(base2$fechanacimiento)
[1] "1983-07-13" "1964-06-01" "1950-12-29" "1951-07-03" "1958-09-04" "1961-05-29"

intersect(base$fec_nac, base2$fechanacimiento) %>%


But when I go to one of these data frames to select the values, it only selects 9 instead of 251.

> base %>%
+   filter(fec_nac %in% intersect(base$fec_nac, base2$fechanacimiento)) %>%
+   nrow
[1] 6

> base2 %>%
+   filter(fechanacimiento %in% intersect(base$fec_nac, base2$fechanacimiento)) %>%
+   nrow
[1] 186

The strange thing is that intersect() does not return dates but numbers.

> head(intersect(base$fec_nac, base2$fechanacimiento))
[1]   4190   1623   4249  -5636   4652 -12783
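Those numbers look like the internal representation of a Date, which is stored as the number of days since 1970-01-01 (4190 is 1981-06-22, the first value in base$fec_nac). Depending on the R version, intersect() may drop the Date class and return these day counts. A minimal sketch of restoring the class, using made-up toy vectors `a` and `b` rather than the real data:

```r
# Toy vectors (hypothetical values, not the poster's data)
a <- as.Date(c("1981-06-22", "1974-06-12", "1935-01-02"))
b <- as.Date(c("1935-01-02", "1981-06-22", "1990-01-01"))

# A Date is stored as days since 1970-01-01
unclass(as.Date("1981-06-22"))  # 4190

# Depending on the R version, this may come back as plain numbers
common_raw <- intersect(a, b)

# Restore the Date class; an object that is already a Date is left unchanged
common <- as.Date(common_raw, origin = "1970-01-01")

# Filter with values that are Dates again
a[a %in% common]
```

Converting back with as.Date() keeps the comparison between two Date vectors instead of a Date vector and a numeric one.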

Can you provide some example data?

What is the result of doing something like a filtering join instead?

    dplyr::semi_join(base, base2, by = dplyr::join_by(fec_nac == fechanacimiento))
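A filtering join of this kind can be tried on toy data. The sketch below assumes dplyr 1.1.0 or later (for join_by()) and uses invented data frames that only borrow the column names from the question. It also shows one possible reason the row counts differ between the two data frames: a semi join keeps every row with a matching date, so duplicated dates on one side give a larger n on that side.

```r
library(dplyr)

# Invented toy data; only the column names come from the question
base <- data.frame(
  id = 1:4,
  fec_nac = as.Date(c("1981-06-22", "1974-06-12", "1981-06-22", "1935-01-02"))
)
base2 <- data.frame(
  fechanacimiento = as.Date(c("1981-06-22", "1950-12-29"))
)

# Rows of base whose date of birth also appears in base2
in_base <- semi_join(base, base2, by = join_by(fec_nac == fechanacimiento))

# Rows of base2 whose date of birth also appears in base
in_base2 <- semi_join(base2, base, by = join_by(fechanacimiento == fec_nac))

nrow(in_base)   # the shared date occurs twice in base
nrow(in_base2)  # and once in base2
```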

Also, what is the need for the intersection if you are filtering for values present in the other vector?

test <- c(1:4)

dplyr::filter(cars, speed %in% intersect(cars$speed, test))
#>   speed dist
#> 1     4    2
#> 2     4   10

dplyr::filter(cars, speed %in% test)
#>   speed dist
#> 1     4    2
#> 2     4   10

Created on 2024-10-07 with reprex v2.1.1.9000

A handy way to supply sample data is to use the dput() function. See ?dput. If you have a very large data set, then something like dput(head(myfile, 100)) will likely supply enough data for us to work with. Just copy and paste the dput() output here in a code block.
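As a small illustration of sharing data this way, the sketch below uses the built-in `cars` data set as a stand-in for your own data frame. The round trip at the end is only there to show that the pasted text can be turned back into the same data.

```r
# Take the first few rows of a data frame (cars stands in for your data)
sample_rows <- head(cars, 3)

# Print a text representation that can be pasted into a post
dput(sample_rows)

# Someone reading the post can rebuild the data from that text
txt <- capture.output(dput(sample_rows))
rebuilt <- eval(parse(text = txt))

all.equal(rebuilt, sample_rows)
```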


