Problem with parsing variables in an R function

I'm working with a dataset in R and want to write a function that splits values into three categories based on quartiles: values below the first quartile (Q1) should be labeled "Low expression", values above the third quartile (Q3) should be labeled "High expression", and everything in between should be NA.
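
The labeling rule on its own can be sketched in base R on a plain numeric vector, independent of dplyr. `classify_quartiles()` and the toy values are hypothetical, and the boundaries use `<=`/`>=` to match the code further down rather than strict "below/above":

```r
# Hypothetical helper: label values of a numeric vector by its quartiles.
# Values <= Q1 become "Low expression", values >= Q3 become
# "High expression", and everything in between stays NA.
classify_quartiles <- function(x) {
  # type-7 quartiles, the default method of quantile()
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  out <- rep(NA_character_, length(x))
  # which() drops NA comparisons, so missing values stay NA
  out[which(x <= q[["25%"]])] <- "Low expression"
  out[which(x >= q[["75%"]])] <- "High expression"
  out
}

expr <- c(1, 2, 3, 4, 5, 6, 7, 8)  # toy values, not real data
classify_quartiles(expr)
# cut-offs are 2.75 (Q1) and 6.25 (Q3), so 1 and 2 are "Low expression",
# 7 and 8 are "High expression", and 3 to 6 are NA
```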

The purpose of the function is to build a dataset containing only the extreme expression values of a gene, var_1 (the bottom and top quarters, Q1 and Q4), which I will then correlate with several patient outcomes (columns 1 to 7). In other words: do extreme values of var_1 (Q1 and Q4) influence the response of variables 1 to 7?

The problem is that the function labels every observation of var_1 as "High expression". Can someone help me with this one?

function

fun_1 <- function(df, var_1) {
  
  quants <- quantile(df[[var_1]], c(.25, .75))
  
  tmp_1 <- df %>%
    # selecting patient outcomes and gene expression
    select(c(1:7) | {{var_1}}) %>%
    # creating a new variable classifying Q1 and Q4
    mutate(
      expression =
        case_when(
          {{var_1}} >= quants[["75%"]] ~ sprintf(
            "High expression ( >= %.02f )", quants[["75%"]]),
          {{var_1}} <= quants[["25%"]] ~ sprintf(
            "Low expression ( <= %.02f )", quants[["25%"]]),
          .default = NA)) %>%
    # excluding Q2 and Q3
    filter(!is.na(expression))
          
  tmp_1
}

fun_1(dataset_name, "gene_name")

Does it help if you insert rowwise() %>% immediately before mutate?

After some work, I finally found that the problem was how dplyr interpreted the variable var_1 inside the function.
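
A minimal reproduction suggests what goes wrong (the toy data and the `wrong()`/`right()` names are hypothetical). Because var_1 arrives as the string "gene_name", `{{ var_1 }}` injects that string rather than the column. R compares text with a number by converting the number to text, so `"gene_name" >= 7` is a single TRUE that gets recycled to every row, and case_when() picks the "High expression" branch each time. Converting the string with as.symbol() before injecting it with `!!` makes dplyr look up the column instead:

```r
library(dplyr)

# Toy data standing in for the real dataset (hypothetical).
toy <- data.frame(gene_name = c(1, 5, 10))

# Same pattern as the original fun_1: var_1 is passed as a string.
wrong <- function(df, var_1) {
  # {{ var_1 }} injects the string "gene_name" itself, so this compares
  # "gene_name" >= 7 as text, which is TRUE, recycled to every row
  df %>% mutate(is_high = {{ var_1 }} >= 7)
}

# Same pattern as the final fun_1: string -> symbol, then !!.
right <- function(df, var_1) {
  # as.symbol() turns the string into a column reference for !! to inject;
  # the parentheses make the injection scope explicit
  df %>% mutate(is_high = (!!as.symbol(var_1)) >= 7)
}

wrong(toy, "gene_name")$is_high  # TRUE TRUE TRUE
right(toy, "gene_name")$is_high  # FALSE FALSE TRUE
```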

I solved it by converting var_1 to a symbol with as.symbol() and injecting it inside the dplyr verbs with the !! operator. The final function is:

fun_1 <- function(df, var_1) {
  
  # converting var_1 to a symbol for dplyr compatibility
  sym <- as.symbol(var_1)
  # computing the quartile cut-offs
  quants <- quantile(df[[var_1]], c(.25, .75))
  
  tmp_1 <- df %>%
    # selecting patient outcomes and gene expression
    select(c(1:7) | !!sym) %>%
    # creating a new variable classifying Q1 and Q4
    mutate(
      expression =
        case_when(
          !!sym >= quants[["75%"]] ~ sprintf(
            "High expression ( >= %.02f )", quants[["75%"]]),
          !!sym <= quants[["25%"]] ~ sprintf(
            "Low expression ( <= %.02f )", quants[["25%"]]),
          .default = NA)) %>%
    # excluding Q2 and Q3
    filter(!is.na(expression))
          
  tmp_1
}

fun_1(dataset_name, "gene_name")
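
An alternative that avoids building the symbol by hand is to keep var_1 as a string and use the `.data[[var_1]]` pronoun inside data-masking verbs such as mutate() and case_when(), and `all_of(var_1)` inside select(); both are documented tidyverse idioms for column names held in strings. The sketch below assumes dplyr 1.1.0 or later (for `.default`, as in the original), and `fun_2` plus the toy data are hypothetical:

```r
library(dplyr)

# Hypothetical alternative to fun_1: keep var_1 as a string and use
# .data[[var_1]] in data-masking verbs and all_of(var_1) in select().
fun_2 <- function(df, var_1) {
  # quartile cut-offs; na.rm = TRUE ignores missing expression values
  quants <- quantile(df[[var_1]], c(.25, .75), na.rm = TRUE)

  df %>%
    # patient outcomes (columns 1 to 7) plus the gene column
    select(1:7, all_of(var_1)) %>%
    # classify the bottom and top quarters
    mutate(
      expression = case_when(
        .data[[var_1]] >= quants[["75%"]] ~ sprintf(
          "High expression ( >= %.02f )", quants[["75%"]]),
        .data[[var_1]] <= quants[["25%"]] ~ sprintf(
          "Low expression ( <= %.02f )", quants[["25%"]]),
        .default = NA_character_)) %>%
    # drop the middle two quarters
    filter(!is.na(expression))
}

# Toy data: seven placeholder outcome columns (X1..X7) and a gene column.
toy <- data.frame(matrix(0, nrow = 10, ncol = 7), gene_name = 1:10)
res <- fun_2(toy, "gene_name")
res$gene_name  # 1 2 3 8 9 10
```

With the gene values 1 to 10 the cut-offs are 3.25 and 7.75, so only the three lowest and three highest rows survive the filter.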

Unfortunately, no. I tried several approaches, including rowwise(). In the end, the only thing that worked was changing how I referenced the variable inside the function. Thanks for the help, though!
