search multiple columns for values above '60', if true return the value from another column

technocrat · April 6, 2023, 2:12am

This is simple enough not to absolutely require a cut-and-paste reprex (see the FAQ), but it's a good idea to cut friction as much as possible.

This is a concise way to do it, with little syntax to master. I'll unpack it below.

# fake data created by random sampling 
# without a seed, so they are likely to
# be all different and different each
# time data frame is created with this
# snippet
m <- matrix(
  c(plate = 1:50,
  sample(20:100,50, replace = TRUE),
  sample(20:100,50, replace = TRUE),
  sample(20:100,50, replace = TRUE)),
  nrow = 50,
  ncol = 4
)

colnames(m) <- c("plate","m1","m2","m3")
head(m)
#>      plate m1 m2  m3
#> [1,]     1 32 72  51
#> [2,]     2 47 60  89
#> [3,]     3 53 84  23
#> [4,]     4 94 98  78
#> [5,]     5 50 72 100
#> [6,]     6 98 61  47

mark_na <- function(x) ifelse(x < 60,NA,x)

m[,2:4] <- apply(m[,2:4],2,mark_na)

head(m)
#>      plate m1 m2  m3
#> [1,]     1 NA 72  NA
#> [2,]     2 NA 60  89
#> [3,]     3 NA 84  NA
#> [4,]     4 94 98  78
#> [5,]     5 NA 72 100
#> [6,]     6 98 61  NA

Created on 2023-04-05 with reprex v2.0.2

My paradigm of using R is school algebra—f(x)=y. x is an object that needs some transformation, y is the object containing the transformation and f is the function object that does the transformation. Each of these may be, and usually is, composite.
The object chosen for x has a big influence on f. I've used a matrix because all the contents to be subjected to f are numeric. A matrix holds a single type, for example all character or all numeric; internally it is one atomic vector with dimensions attached. A data frame, which is where incoming data usually lands, can mix character and numeric columns. However, a data frame is a list of columns, and a row pulled from it is still a list. This is an important difference because a matrix can be treated as a single object and transformed more simply.
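To make the matrix versus data frame distinction concrete, here is a small sketch. The objects `mat` and `df` are made up for illustration and are not part of the reprex above.

```r
# Hypothetical toy objects, not from the reprex above
mat <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
df  <- data.frame(a = 1:3, b = c("x", "y", "z"))

is.list(mat)       # FALSE: a matrix is one atomic vector with dimensions
is.list(df)        # TRUE: a data frame is a list of columns
typeof(mat)        # "integer": one type for every cell
sapply(df, class)  # column classes can differ within a data frame

# Because a matrix is a single object, one vectorised comparison covers every cell
mat > 2
```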
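The function isolates the logical condition to be tested: whether a value is less than 60, because those are the values to be replaced with NA. As written, a value of exactly 60 is kept (see row 2 of the output); use x <= 60 in mark_na if only values strictly above 60 should survive.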
The matrix object, m, has columns and rows. Here m[,2:4] means all rows of m (because the row position in the brackets is empty) and columns 2:4; if we wanted only the second and fourth columns, it would be m[,c(2,4)]. Think row/column, row/column. With a data frame, columns alone can be selected with the shorthand df[2:3], which is what we usually do; on a matrix, m[2:3] without the comma returns the second and third elements instead, so keep the comma. When we want to change only some rows, it would be m[1:7,2:3]. I find it helpful to always have the comma: one less thing to keep track of.
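The row/column indexing can be tried on a small example. This is only a sketch; `mm` and `dd` are hypothetical names chosen for illustration.

```r
# Hypothetical toy matrix for illustration (filled column by column)
mm <- matrix(1:12, nrow = 3,
             dimnames = list(NULL, c("plate", "m1", "m2", "m3")))

mm[, 2:4]      # all rows, columns 2 through 4
mm[, c(2, 4)]  # all rows, columns 2 and 4 only
mm[1:2, 2:3]   # rows 1 and 2, columns 2 and 3
mm[2:3]        # no comma on a matrix: elements 2 and 3 of the underlying vector

dd <- as.data.frame(mm)
dd[2:3]        # no comma on a data frame: columns 2 and 3
```

Keeping the comma in every subset avoids having to remember which of the last two behaviours applies.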
At this point, we know that we are changing everything in m, except the first column (plate), with a value of less than 60 to NA, and we know how. Now we do that in a single pass by applying our function to the target columns, column by column (we could also do it row-wise). That's what apply does.
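For anyone curious about the column-wise versus row-wise choice, the following sketch compares them on a made-up matrix `x`. It assumes the same `mark_na` rule as the reprex. Note that row-wise `apply()` hands the result back transposed, so `t()` is needed to restore the original shape.

```r
# Hypothetical toy matrix for illustration
x <- matrix(c(10, 70, 65, 20, 90, 30), nrow = 3)
mark_na <- function(v) ifelse(v < 60, NA, v)

by_col <- apply(x, 2, mark_na)     # MARGIN = 2: the function sees one column at a time
by_row <- t(apply(x, 1, mark_na))  # MARGIN = 1: one row at a time, result needs t()
direct <- mark_na(x)               # ifelse() is vectorised, so the whole matrix works too

all.equal(by_col, by_row)
all.equal(by_col, direct)
```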
As for variable dimensions, dim() works like the subset operator: rows first, then columns. For m:

> dim(m)
[1] 50  4

The script will work for any number of rows. Some wand waving is required for a variable number of columns. Come back with a reprex if you need help with that case.
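As a possible starting point for the variable-column case, here is a minimal sketch. It assumes the measurement columns are every column except plate; the object `m2`, the count `n_meas`, and the seed are all made up for illustration and may not match your data.

```r
# Hypothetical data: same layout as above, but with five measurement columns
set.seed(42)  # seed added only so this sketch is repeatable
n_meas <- 5
m2 <- cbind(
  plate = 1:10,
  matrix(sample(20:100, 10 * n_meas, replace = TRUE), nrow = 10,
         dimnames = list(NULL, paste0("m", seq_len(n_meas))))
)

mark_na <- function(x) ifelse(x < 60, NA, x)

# Pick the target columns by name instead of hard-coding 2:4
target <- setdiff(colnames(m2), "plate")
m2[, target] <- apply(m2[, target, drop = FALSE], 2, mark_na)

head(m2)
```

Selecting by name means the same lines run unchanged whether there are three measurement columns or thirty, as long as the identifier column is called plate.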