Conditionally alter multiple columns of data frame

rcepka · August 19, 2023, 9:40pm

I am trying to achieve something like this, preferably in tidyverse way:

If(column1 == TRUE) {
	column2 <- somevalue
	column3 <- somevalue
	column4 <- somevalue
	}

Example data:

data <- tibble(
    price = c(150, 35, 200),
    price_type = NA,
    price_currency = NA
  )

I want to alter price_type and price_currency columns, depending on the result of test on price column.
I tryed several ways, nothing worked for me:

 data %>% 
    rowwise() %>%
      if(data$price == 1) {
		data$price_type <- NA
		data$price_currency <- "USD"
      }

Nor this worked:

testfun <- function() {
    if(price == 1){
      price_type <- NA
      price_currency <- "USD"
    }
  }
  
  data %>% purrrlyr::by_row(testfun)

Many thanks in advance for any help

technocrat · August 19, 2023, 10:31pm

tidyverse isn't necessarily the easiest way to do something like this.

set.seed(42)
# create a data frame with at least two values of a
# variable price that are identical
d <- data.frame(price = sample(seq(1:52),26,replace = TRUE),
  price_type = sample(LETTERS,26),
  price_currency = sample(letters,26))
d <- d[1:8,]
d
#>   price price_type price_currency
#> 1    49          M              z
#> 2    37          E              r
#> 3     1          T              f
#> 4    25          B              x
#> 5    10          H              b
#> 6    36          C              t
#> 7    18          A              c
#> 8    49          J              v

# perform the transformation simply and clearly

# illustrate steps
d[,1] == 49
#> [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
d[d[,1] == 49,]
#>   price price_type price_currency
#> 1    49          M              z
#> 8    49          J              v
# deploy
d[d[,1] == 49,c("price_type","price_currency")] = NA
d
#>   price price_type price_currency
#> 1    49       <NA>           <NA>
#> 2    37          E              r
#> 3     1          T              f
#> 4    25          B              x
#> 5    10          H              b
#> 6    36          C              t
#> 7    18          A              c
#> 8    49       <NA>           <NA>

^{Created on 2023-08-19 with reprex v2.0.2}

Translation

Subset d by all rows where the first variable equals 49 and identify the two variables to be modified by their quoted names, and then set them to NA. The only syntax to remember is the bracket operator [ and ] and that the order goes row,column. I can't say that it's impossible to go wrong, but I will say that when an error does creep in it is much easier to identify where.

scottyd22 · August 19, 2023, 10:35pm

Below is one approach that creates a function to change column values and then maps through each row.

library(tidyverse)

data <- tibble(
  price = c(150, 35, 200, 1),
  price_type = NA,
  price_currency = NA
)

myfun = function(i) {
  mydata = data[i,]
  if(mydata[,1] == 1) {
    mydata[,2] = NA
    mydata[,3] = 'USD'
  }
  mydata
}

# map through each row and bind together
map(1:nrow(data), myfun) |> bind_rows()
#> # A tibble: 4 × 3
#>   price price_type price_currency
#>   <dbl> <lgl>      <chr>         
#> 1   150 NA         <NA>          
#> 2    35 NA         <NA>          
#> 3   200 NA         <NA>          
#> 4     1 NA         USD

Created on 2023-08-19 with reprex v2.0.2

rcepka · August 20, 2023, 9:51am

@technocrat
Many thanks for your response. I am not full-fledged programmer, so tidyverse way of working with R is much easier for me.
Your code is really efficient but hard to understand for me, so I will stick with some tidyverse solution
I appreciate your answer and even if I don't use this solution, it can be helpful for other users.

rcepka · August 20, 2023, 10:02am

@scottyd22
Your code works, thank you!
Is there any way to use column names instead of indexes in your function?
Something like this (the code below doesnt work ofcourse):

myfun = function(i) {
    mydata = data[i,]
    if(mydata[,price] == 1) {
      mydata[,price_type] = NA
      mydata[,price_currency] = 'USD'
    }
    mydata
  }

The reason for this is that column position in data frame may change easily so I would feel more safe having it referenced by name instead of index..
Thanks again..

rcepka · August 20, 2023, 10:20am

It seems that to achieve this is not that striaghtforward with tidyverse. To expand on this topics, also for other R newbies to learn, I have the following related questions:

How can this be done by using "rowwise" apprach? Something like this:

data %>%
  rowwise() %>%
  ... some-code-to-alter-conditionally-several-columns ...

What is the correct way of using the by_row() function from purrrlyr package?
The function created by @scottyd22 in post above applied with by_row() does not work for me:

myfun = function(i) {
    mydata = data[i,]
    if(mydata[,1] == 1) {
      mydata[,2] = NA
      mydata[,3] = 'USD'
    }
    mydata
  }
    
  data %>% purrrlyr::by_row(myfun, .collate = "rows")

This does not work neither:

 testfun <- function(data) {
    if(price == 1){
      price_type <- NA
      price_currency <- "USD"
    }
  }
  
  data %>% purrrlyr::by_row(testfun, .collate = "rows")

scottyd22 · August 20, 2023, 10:44am

Hi @rcepka. Below are a couple more ways to tackle this problem. The first uses the same function approach with column names. The second uses mutate with an ifelse() statement for each column.

# 1. updated function with column names
myfun = function(i) {
  mydata = data[i,]
  if(mydata$price == 1) {
    mydata$price_type = NA
    mydata$price_currency = 'USD'
  }
  mydata
}

# map through each row and bind together
map(1:nrow(data), myfun) |> bind_rows()
#> # A tibble: 4 × 3
#>   price price_type price_currency
#>   <dbl> <lgl>      <chr>         
#> 1   150 NA         <NA>          
#> 2    35 NA         <NA>          
#> 3   200 NA         <NA>          
#> 4     1 NA         USD

# 2. using a mutate with ifelse()
data |>
  mutate(price_type = ifelse(price == 1, NA, price_type),
         price_currency = ifelse(price == 1, 'USD', price_currency)
         )
#> # A tibble: 4 × 3
#>   price price_type price_currency
#>   <dbl> <lgl>      <chr>         
#> 1   150 NA         <NA>          
#> 2    35 NA         <NA>          
#> 3   200 NA         <NA>          
#> 4     1 NA         USD

rcepka · August 20, 2023, 2:10pm

Thank you @scottyd22, it works like a charm.
If I may have a supplementary question please...
Lets assume that I want to supply a data frame name as an argument to your function. This does not work for me:

myfun = function(i, df) {
  mydata = df[i,]
  if(mydata$price == 1) {
    mydata$price_type = NA
    mydata$price_currency = 'USD'
  }
  mydata
}

result_df <- map(1:nrow(my_df_name), ~myfun(1, df)) |> bind_rows()

I am getting error:

Error in `map()`:
ℹ In index: 1.
Caused by error in `.f()`:
! object 'i' not found

scottyd22 · August 21, 2023, 12:55am

When using a function with multiple arguments within map(), the additional arguments need to be added after the function, as shown below. For this specific function, it looks like map(vector, function, data frame).

Here is a link to the documentation for map().

library(tidyverse)

# data frame
my_df_name <- tibble(
  price = c(150, 35, 200, 1),
  price_type = NA,
  price_currency = NA
)

# function
myfun = function(i, df) {
  mydata = df[i,]
  if(mydata$price == 1) {
    mydata$price_type = NA
    mydata$price_currency = 'USD'
  }
  mydata
}

# map(vector, function, data frame)
result_df <- map(1:nrow(my_df_name), myfun, my_df_name) |> bind_rows()

result_df
#> # A tibble: 4 × 3
#>   price price_type price_currency
#>   <dbl> <lgl>      <chr>         
#> 1   150 NA         <NA>          
#> 2    35 NA         <NA>          
#> 3   200 NA         <NA>          
#> 4     1 NA         USD

rcepka · August 21, 2023, 7:44am

@scottyd22,
understood...
Thank you very much for all your efforts and willingness, I appreciate it...

system · August 28, 2023, 7:44am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.