I have two data frames that each contain a date-of-birth variable, and I want to select the values that appear in both.
> head(base$fec_nac)
[1] "1981-06-22" "1974-06-12" "1981-08-20" "1954-07-28" "1982-09-27" "1935-01-02"
> head(base2$fechanacimiento)
[1] "1983-07-13" "1964-06-01" "1950-12-29" "1951-07-03" "1958-09-04" "1961-05-29"
intersect(base$fec_nac, base2$fechanacimiento) %>%
length()
251
But when I filter either data frame for those values, I do not get the 251 matches I expect:
> base %>%
+ filter(fec_nac %in% intersect(base$fec_nac, base2$fechanacimiento)) %>%
+ nrow
[1] 6
> base2 %>%
+ filter(fechanacimiento %in% intersect(base$fec_nac, base2$fechanacimiento)) %>%
+ nrow
[1] 186
The strange thing is that intersect() returns numbers rather than dates:
> head(intersect(base$fec_nac, base2$fechanacimiento))
[1] 4190 1623 4249 -5636 4652 -12783
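For what it's worth, the numbers printed above look like the dates' internal storage: a Date is held as a count of days since 1970-01-01, and 4190 and 1623 correspond to the first two values of base$fec_nac. Below is a minimal sketch, assuming both columns are of class Date, of converting such numbers back and of getting the shared dates by subsetting instead of intersect(). The two vectors are made-up stand-ins, not the real data.

```r
# Hypothetical stand-ins for base$fec_nac and base2$fechanacimiento
fec_nac <- as.Date(c("1981-06-22", "1974-06-12", "1981-08-20"))
fechanacimiento <- as.Date(c("1974-06-12", "1981-06-22", "1950-12-29"))

# A Date is stored as the number of days since 1970-01-01
day_counts <- as.numeric(fec_nac)
print(day_counts)

# Convert day counts back to Date by supplying the origin explicitly
restored <- as.Date(day_counts, origin = "1970-01-01")
print(restored)

# Shared dates obtained by subsetting, which keeps the Date class
shared <- unique(fec_nac[fec_nac %in% fechanacimiento])
print(shared)
```

If the numeric result of intersect() is then compared against a Date column, the two sides may not be matched on the same representation, which could explain the low row counts. That is a guess without seeing the data.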
Can you provide some example data?
What is the result of doing something like a filtering join instead?
dplyr::semi_join(
base,
base2,
by = dplyr::join_by(fec_nac == fechanacimiento)
)
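As a rough illustration of what that filtering join does, here is a small sketch on made-up data. The data frames base_demo and base2_demo and their id columns are hypothetical, and join_by() assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy versions of the two data frames
base_demo <- data.frame(
  id = 1:3,
  fec_nac = as.Date(c("1981-06-22", "1974-06-12", "1981-08-20"))
)
base2_demo <- data.frame(
  id2 = 1:3,
  fechanacimiento = as.Date(c("1974-06-12", "1981-06-22", "1950-12-29"))
)

# Keep the rows of base_demo whose date also occurs in base2_demo;
# only the columns of base_demo are returned
matched <- semi_join(
  base_demo,
  base2_demo,
  by = join_by(fec_nac == fechanacimiento)
)
print(matched)
```

Because the join compares the two Date columns directly, no intermediate vector of shared values is needed.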
Also, why is the intersection needed? Filtering with %in% against the other vector already keeps only the values present in both:
test <- c(1:4)
dplyr::filter(cars, speed %in% intersect(cars$speed, test))
#>   speed dist
#> 1     4    2
#> 2     4   10
dplyr::filter(cars, speed %in% test)
#>   speed dist
#> 1     4    2
#> 2     4   10
Created on 2024-10-07 with reprex v2.1.1.9000
A handy way to supply sample data is to use the dput() function. See ?dput. If you have a very large data set, then something like dput(head(myfile, 100)) will likely supply enough data for us to work with. Just copy and paste the dput() output here between a pair of triple backticks:
```
```
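A minimal sketch of that workflow, using a made-up data frame (sample_df is hypothetical) in place of the real one. It prints the dput() text that would be pasted into a post, then checks that the text rebuilds the same data.

```r
# Hypothetical data frame standing in for the real data
sample_df <- data.frame(
  id = 1:3,
  fec_nac = as.Date(c("1981-06-22", "1974-06-12", "1981-08-20"))
)

# Print a text representation of (at most) the first 100 rows
dput(head(sample_df, 100))

# Anyone can rebuild the data by evaluating the pasted text
dput_text <- capture.output(dput(head(sample_df, 100)))
rebuilt <- eval(parse(text = dput_text))
print(rebuilt)
```

Unlike a printed table, the dput() output keeps column classes such as Date, which matters for a question like this one.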