Finding one row of an R dataframe in another dataframe

Saurish_Seksaria · September 8, 2023, 10:12am

I have two dataframes df1 and df2 with the exact same columns. df1 is large and df2 has only one row. I wish to find the row of df2 in df1 and delete all the rows of df1 which come before this particular row. Can someone please help out with how to achieve this?

FJCC · September 8, 2023, 1:14pm

This is all I could think of. It is not very elegant.

set.seed(123)
DF <- data.frame(A = sample(1:10, 100, replace = TRUE),
                 B = sample(c("R", "T"), 100, replace = TRUE),
                 C = sample(1:10, 100, replace = TRUE),
                 D = sample(c("Y", "Z"), 100, replace = TRUE))

DF2 <- data.frame(A = 5, B = "R", C = 6, D = "Y")
library(purrr)
# For each column, find the rows of DF whose value equals the value in DF2
AllMatches <- pmap(list(DF, DF2), .f = function(x, y) which(x == y))
AllMatches
#> $A
#>  [1]  6 11 26 33 35 39 46 61 79 80
#> 
#> $B
#>  [1]  3  5  6  8 13 14 15 20 21 22 23 26 27 30 31 34 35 37 42 46 50 51 56 58 60
#> [26] 61 62 65 66 68 69 72 78 79 81 83 84 85 89 92 93 94 95 96 98
#> 
#> $C
#> [1]  2 10 25 39 61 85 88
#> 
#> $D
#>  [1]  5  7  8  9 10 11 12 13 14 15 18 19 20 23 25 27 29 30 31 34 41 45 48 51 53
#> [26] 54 57 59 60 61 64 65 66 67 69 74 77 80 82 87 88 89 94 95 96 97 99
# Keep only the row numbers that appear in every column's matches
Candidates <- Reduce(intersect, AllMatches)
Candidates
#> [1] 61
DFnew <- DF[min(Candidates):nrow(DF), ]

Created on 2023-09-08 with reprex v2.0.2
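
The same idea (match column by column, then keep the rows where every column matches) can probably be written in base R without purrr. This is only a minimal sketch: `find_row` is a hypothetical helper name, the data is made up, and it assumes both data frames have the same columns in the same order. Rows containing NA never count as a match here.

```r
# Hypothetical helper: indices of the rows of `big` that equal the single row `one`
find_row <- function(big, one) {
  stopifnot(nrow(one) == 1, identical(names(big), names(one)))
  # One logical vector per column: TRUE where that column matches
  per_column <- Map(function(x, y) x == y, big, one)
  # A row matches only if every column matches; which() drops NA
  which(Reduce(`&`, per_column))
}

# Made-up example data
big <- data.frame(A = c(1, 5, 5, 2), B = c("T", "R", "R", "R"),
                  stringsAsFactors = FALSE)
one <- data.frame(A = 5, B = "R", stringsAsFactors = FALSE)

idx <- find_row(big, one)

# Drop everything before the first match, if there is one
if (length(idx) > 0) {
  trimmed <- big[min(idx):nrow(big), , drop = FALSE]
}
```

The `length(idx) > 0` check is there because `min()` of an empty vector returns `Inf` with a warning, which would break the row range.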

Saurish_Seksaria · September 8, 2023, 1:40pm

This is giving an error: "result would be too long a vector".

FJCC · September 8, 2023, 2:24pm

Does my example work for you?
How many rows and columns are in your df1?
If you make a small subset of your df1, say, 10 rows and columns, and an appropriate df2, does my code work? If not, post the output of running your small df1 through the dput() function. If the small subset is called df1small, run

dput(df1small)

and post the output here. Put a line with three backticks just before and after the output, like this:
```
output of dput() goes here
```
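
As a rough illustration of what dput() does, here is a sketch with a made-up `df1small` (the real one would be a subset of your df1). The printed text is R code that other people can paste to recreate the data frame.

```r
# Made-up stand-in for a small subset of df1
df1small <- data.frame(A = c(1L, 5L), B = c("T", "R"), stringsAsFactors = FALSE)

# dput() prints R code that recreates the object
dput(df1small)

# Round trip: evaluating the printed text should give back the same data frame
rebuilt <- eval(parse(text = capture.output(dput(df1small))))
identical(rebuilt, df1small)
```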

Saurish_Seksaria · September 8, 2023, 3:13pm

Actually, I found a workaround using the following line:
matching_row_index <- which(apply(df1, 1, identical, df2))
and then I selected the rows from matching_row_index:nrow(df1)

Thank you for the help with the coding
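
One caveat with that line, as far as I can tell: apply() converts df1 to a matrix first, so each row reaches identical() as a named vector, and a vector is never identical to a data frame. If df2 really is a one-row data frame, the result may come back empty. Below is a minimal sketch of a row-wise variant that compares one-row data frames instead, so column types are kept. The data is made up, and it assumes the same columns in the same order and no NA values. It loops over rows, so it may be slow on a very large df1.

```r
# Made-up example data
df1 <- data.frame(A = c(1, 5, 10, 5), B = c("T", "R", "R", "R"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(A = 5, B = "R", stringsAsFactors = FALSE)

# Compare each row of df1 with df2 as one-row data frames
is_match <- vapply(
  seq_len(nrow(df1)),
  function(i) all(df1[i, ] == df2),
  logical(1)
)
matching_row_index <- which(is_match)

# Keep everything from the first match onward, if there is a match
if (length(matching_row_index) > 0) {
  df1_trimmed <- df1[matching_row_index[1]:nrow(df1), , drop = FALSE]
}
```

Using `matching_row_index[1]` rather than the whole vector matters when the row occurs more than once, because `a:b` only uses the first element of `a` (and warns, or errors in recent R versions).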
