tidyverse isn't necessarily the easiest way to do something like this.
set.seed(42)
# create a data frame with at least two values of a
# variable price that are identical
d <- data.frame(price = sample(seq(1:52),26,replace = TRUE),
price_type = sample(LETTERS,26),
price_currency = sample(letters,26))
d <- d[1:8,]
d
#> price price_type price_currency
#> 1 49 M z
#> 2 37 E r
#> 3 1 T f
#> 4 25 B x
#> 5 10 H b
#> 6 36 C t
#> 7 18 A c
#> 8 49 J v
# perform the transformation simply and clearly
# illustrate steps
d[,1] == 49
#> [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
d[d[,1] == 49,]
#> price price_type price_currency
#> 1 49 M z
#> 8 49 J v
# deploy
d[d[,1] == 49,c("price_type","price_currency")] = NA
d
#> price price_type price_currency
#> 1 49 <NA> <NA>
#> 2 37 E r
#> 3 1 T f
#> 4 25 B x
#> 5 10 H b
#> 6 36 C t
#> 7 18 A c
#> 8 49 <NA> <NA>
Subset d by all rows where the first variable equals 49 and identify the two variables to be modified by their quoted names, and then set them to NA. The only syntax to remember is the bracket operator [ and ] and that the order goes row,column. I can't say that it's impossible to go wrong, but I will say that when an error does creep in it is much easier to identify where.
@technocrat
Many thanks for your response. I am not full-fledged programmer, so tidyverse way of working with R is much easier for me.
Your code is really efficient but hard to understand for me, so I will stick with some tidyverse solution
I appreciate your answer and even if I don't use this solution, it can be helpful for other users.
@scottyd22
Your code works, thank you!
Is there any way to use column names instead of indexes in your function?
Something like this (the code below doesnt work ofcourse):
The reason for this is that column position in data frame may change easily so I would feel more safe having it referenced by name instead of index..
Thanks again..
It seems that to achieve this is not that striaghtforward with tidyverse. To expand on this topics, also for other R newbies to learn, I have the following related questions:
How can this be done by using "rowwise" apprach? Something like this:
data %>%
rowwise() %>%
... some-code-to-alter-conditionally-several-columns ...
What is the correct way of using the by_row() function from purrrlyr package?
The function created by @scottyd22 in post above applied with by_row() does not work for me:
Hi @rcepka. Below are a couple more ways to tackle this problem. The first uses the same function approach with column names. The second uses mutate with an ifelse() statement for each column.
# 1. updated function with column names
myfun = function(i) {
mydata = data[i,]
if(mydata$price == 1) {
mydata$price_type = NA
mydata$price_currency = 'USD'
}
mydata
}
# map through each row and bind together
map(1:nrow(data), myfun) |> bind_rows()
#> # A tibble: 4 × 3
#> price price_type price_currency
#> <dbl> <lgl> <chr>
#> 1 150 NA <NA>
#> 2 35 NA <NA>
#> 3 200 NA <NA>
#> 4 1 NA USD
# 2. using a mutate with ifelse()
data |>
mutate(price_type = ifelse(price == 1, NA, price_type),
price_currency = ifelse(price == 1, 'USD', price_currency)
)
#> # A tibble: 4 × 3
#> price price_type price_currency
#> <dbl> <lgl> <chr>
#> 1 150 NA <NA>
#> 2 35 NA <NA>
#> 3 200 NA <NA>
#> 4 1 NA USD
Thank you @scottyd22, it works like a charm.
If I may have a supplementary question please...
Lets assume that I want to supply a data frame name as an argument to your function. This does not work for me:
When using a function with multiple arguments within map(), the additional arguments need to be added after the function, as shown below. For this specific function, it looks like map(vector, function, data frame).