Find rows in a data frame that match a list of values within an error and return the whole row

Hi,
The title says it all. I have a data frame with two columns (a value and a string), and I would like to find all the rows whose value matches one of a list of experimental values within an error tolerance. The whole matching rows (string + value) should then be shown. An experimental value may also match several rows. In my example, the experimental value 163.6 matches 3 rows in df, but my code returns only one of them.

I took the code from there (first answer):

m <- c(0,1,2,3)
a <- c(0,1,2,3)
p <- c(1,2,3)
Mname <- c("M","MAc")
M <- c(100,110)
Aname <- c("H2O","MeOH","ACN")
A <- c(18,32,41.05)
Pname <- c("H+","Na+")
P <- c(1,23)

#calculating all the values in the first column of the data frame
function_value <- function(m,a,p,M,A,P){
  mass <- (m*M+a*A+p*P)/p
  return(mass)}

FullGrid_value <- expand.grid(m,a,p,M,A,P)
colnames(FullGrid_value) <- c("m","a","p","M","A","P")

masses <- mapply(function_value, FullGrid_value$a, FullGrid_value$m, FullGrid_value$p,FullGrid_value$A, FullGrid_value$M, FullGrid_value$P)

#creating all the matching string names in the second column of the data frame
function_name <- function(m,a,p,Mname,Aname,Pname){
  name <-paste(m,Mname,a,Aname,p,Pname)
  return(name)}

FullGrid_name <- expand.grid(m,a,p,Mname,Aname,Pname)
colnames(FullGrid_name) <- c("m","a","p","Mname","Aname","Pname")

names <- mapply(function_name, FullGrid_name$a, FullGrid_name$m, FullGrid_name$p,FullGrid_name$Aname, FullGrid_name$Mname, FullGrid_name$Pname)

df<- data.frame(masses, names)

#________________________________________________
#This is where my question starts: first the data frame is ordered by increasing values (masses)
df <- df[order(df$masses),]

#The vector of experimental value to compare with the masses in data frame "df"
experi <-c(1400,163.6,262.1,42.6)

i <- findInterval(experi, df$masses, all.inside = TRUE)
inc <- which(abs(df$masses[i+1L]-experi)<abs(df$masses[i]-experi))
i[inc] <- i[inc]+1L

res <- df$masses[i]

#replace the masses that fall outside the tolerance with NA
res[abs(res-experi)>0.5] <- NA_real_

#returning the result of the search with the error
data.frame(experi,res,error=experi-res)

It returns the following:

  experi    res error
1 1400.0     NA    NA
2  163.6 164.05 -0.45
3  262.1 262.05  0.05
4   42.6     NA    NA

Whereas I would need the following (notice 163.6 has 3 matching values in df):

  experi    res  error name
1 1400.0     NA    NA  NA
2  163.6 164.05 -0.45  3 ACN 3 M 3 Na+
3  163.6 164.05 -0.45  2 ACN 2 M 2 Na+
4  163.6 164.05 -0.45  1 ACN 1 M 1 Na+
5  262.1 262.05  0.05  1 ACN 2 MAc 1 H+
6   42.6     NA    NA  NA

Thank you for your help!

I think I understand what you want: for each point in experi, you want to find all the rows in df where masses is close enough (under a certain tolerance). The result should have one set of rows for each entry in experi, or a single row full of NA if no row in df is close enough.
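
To illustrate why the `findInterval()` approach can only give one match per experimental value, here is a minimal sketch on made-up data. The `toy` data frame, `query`, and their values are hypothetical and not taken from your example; the point is only that `findInterval()` returns one index per query value, while a logical filter keeps every row within the tolerance.

```r
# Hypothetical toy data: three rows share the same mass
toy <- data.frame(masses = c(10, 164.05, 164.05, 164.05, 300),
                  names  = c("a", "b", "c", "d", "e"))
query <- 163.6
tolerance <- 0.5

# findInterval() returns exactly one index per query value,
# so at most one row can be picked up for each experimental value
idx <- findInterval(query, toy$masses, all.inside = TRUE)
length(idx)

# a logical filter keeps every row whose mass is within the tolerance
hits <- toy[abs(toy$masses - query) <= tolerance, ]
nrow(hits)
```

Here `length(idx)` is 1 but `nrow(hits)` is 3, which is the behaviour you are after.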

This code should do what you want:

# define inputs
[...]
df<- data.frame(masses, names)
experi <-c(1400,163.6,262.1,42.6)
tolerance <- 0.5

# define a function to find the rows in df that match a given experi point
find_matching_rows <- function(exp){
    res <- df[abs(df$masses - exp) <= tolerance, ]
    
    if(nrow(res) == 0){
      return(data.frame(experi = exp,
                        masses = NA_real_,
                        names = NA_character_))
    }
    
    # add the experi value and reorder columns
    res$experi <- rep(exp, nrow(res))
    res[, c("experi", "masses", "names")]
}

# find matching rows for each value in experi
res_list <- lapply(experi,
                   find_matching_rows)

# we have a list, assemble as a dataframe
do.call(rbind, res_list)
#>     experi masses            names
#> 1   1400.0     NA             <NA>
#> 486  163.6 164.05  1 ACN 1 M 1 Na+
#> 507  163.6 164.05  2 ACN 2 M 2 Na+
#> 528  163.6 164.05  3 ACN 3 M 3 Na+
#> 247  262.1 262.05 1 ACN 2 MAc 1 H+
#> 11    42.6     NA             <NA>

Created on 2023-08-22 with reprex v2.0.2
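
The output above does not include the `error` column from the requested table. One possible way to add it is sketched below, assuming the same column names (`masses`, `names`). The function name `match_within` and the `toy` data frame are hypothetical stand-ins, and passing the data and tolerance as arguments instead of reading them from the global environment is a design choice, not something the original code requires.

```r
# Hypothetical stand-in for the df built in the question
toy <- data.frame(masses = c(164.05, 164.05, 262.05, 500),
                  names  = c("1 ACN 1 M 1 Na+", "2 ACN 2 M 2 Na+",
                             "1 ACN 2 MAc 1 H+", "other"))

# Same idea as find_matching_rows(), but the data and tolerance are
# arguments and an error column (experi - masses) is added
match_within <- function(x, data, tolerance = 0.5) {
  hits <- data[abs(data$masses - x) <= tolerance, , drop = FALSE]
  if (nrow(hits) == 0) {
    # no match: a single row of NA for this experimental value
    return(data.frame(experi = x, masses = NA_real_,
                      error = NA_real_, names = NA_character_))
  }
  data.frame(experi = x,
             masses = hits$masses,
             error  = x - hits$masses,
             names  = hits$names)
}

experi <- c(1400, 163.6, 262.1, 42.6)
out <- do.call(rbind, lapply(experi, match_within, data = toy))
rownames(out) <- NULL  # drop the row names carried over from the pieces
out
```

Resetting the row names is optional; it only replaces the original row indices of `df` with a plain 1..n sequence.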


Indeed that works. Thanks!
