Is there an "Un-Character" Command in R?

omario · April 10, 2022, 5:28am

I am working with the R programming language.

I have the following dataset:

factor <- c(1,2,3,4,5,6,7,8,9,10)

var_1 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

var_2 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

var_3 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

var_4 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

var_5 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

my_data = data.frame(var_1, var_2, var_3, var_4, var_5)

I also have another dataset of "conditions" that will be used for querying this data frame:

conditions = data.frame(cond_1 = c("1,3,4", "4,5,6"), cond_2 = c("5,6", "7,8,9"))

My question: I tried to run the following command to select rows from "my_data" based on the first row of "conditions", but it returns an empty result:

my_data[my_data$var_1 %in% unlist(conditions[1,1]) &
            my_data$var_2 %in% unlist(conditions[1,2]), ]

[1] var_1 var_2 var_3 var_4 var_5
<0 rows> (or 0-length row.names)

I tried to look more into this by "inspecting" these conditions:

class(conditions[1,1])
[1] "character"

This makes me think that "unlist()" has no effect here because each condition is a single "character" string rather than a "list".

Is there an equivalent command that can be used here that plays the same role as the "unlist()" command so that the above statement can be run?

In short, I want the same result as the following code, while keeping the conditions in the format used above:

my_data[my_data$var_1 %in% c("1", "3", "4") &
            my_data$var_2 %in% c("5", "6"), ]

Thanks!

xvalda · April 10, 2022, 9:32am

Hi @omario,
Would this work for you?

my_data[my_data$var_1 %in% unlist(strsplit(conditions[1,1], ",")) & 
          my_data$var_2 %in% unlist(strsplit(conditions[1,2], ",")),]
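To see why this helps, here is a minimal sketch using the same "conditions" table as in the question. It assumes the columns are character (the default for data.frame() since R 4.0, and what class() reports in the question); the names "raw" and "parts" and the spaced string "1, 3, 4" are only illustrative.

```r
# Same conditions table as in the question
conditions <- data.frame(cond_1 = c("1,3,4", "4,5,6"),
                         cond_2 = c("5,6", "7,8,9"))

# unlist() returns a character vector unchanged: it is still one string
raw <- unlist(conditions[1, 1])
length(raw)       # 1
"1" %in% raw      # FALSE, because "1" is compared with the whole string "1,3,4"

# strsplit() cuts the string at each comma and returns a list;
# unlist() then flattens that list into a plain character vector
parts <- unlist(strsplit(conditions[1, 1], ","))
parts             # "1" "3" "4"
"1" %in% parts    # TRUE

# Assumption: if entries were written with spaces ("1, 3, 4"),
# trimws() would remove the padding left over after splitting
spaced <- trimws(unlist(strsplit("1, 3, 4", ",")))
```

So the missing step is not an "un-character" command but splitting the one string into its pieces before handing it to %in%.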

pe2ju · April 10, 2022, 9:36am

How about this?

factor <- c(1,2,3,4,5,6,7,8,9,10)

var_1 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

var_2 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

var_3 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

var_4 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

var_5 <- as.factor(sample(factor, 10000, replace=TRUE, prob=c(0.1,0.1,0.1,0.1,0.1, 0.1,0.1,0.1,0.1,0.1)))

my_data = data.frame(var_1, var_2, var_3, var_4, var_5)

# Filter conditions
cond_1 = list( c(1,3,4), c(4,5,6) )
cond_2 = list( c(5,6), c(7,8,9) )

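# Filter using base R
# Note: this pairs cond_1[[1]] with cond_2[[2]], i.e. c(7,8,9);
# use cond_2[[1]] to match the first row of the question's conditions
flt1 <- my_data[ my_data$var_1 %in% cond_1[[1]] &
                 my_data$var_2 %in% cond_2[[2]], ]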
head(flt1)
#>    var_1 var_2 var_3 var_4 var_5
#> 1      1     7    10     9     2
#> 26     3     7     5     2    10
#> 33     3     9     1     4     3
#> 56     1     7     1     2     4
#> 58     1     8     2     4     5
#> 78     4     8     8     2     7

# Filter using dplyr/tidyverse package
flt2 <- my_data |> dplyr::filter( var_1 %in% cond_1[[1]] & 
                                  var_2 %in% cond_2[[2]] ) 
head(flt2)
#>   var_1 var_2 var_3 var_4 var_5
#> 1     1     7    10     9     2
#> 2     3     7     5     2    10
#> 3     3     9     1     4     3
#> 4     1     7     1     2     4
#> 5     1     8     2     4     5
#> 6     4     8     8     2     7
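If the conditions need to stay in the comma-separated character format from the question, the splitting and the filtering can be wrapped in one function. This is only a rough sketch: "filter_by_condition_row" is a hypothetical helper name, and it assumes that column k of "conditions" applies to column k of the data. The example data is a smaller stand-in for "my_data".

```r
# Hypothetical helper: keep the rows of `data` that satisfy every
# condition stored in row `i` of `conditions`.
# Assumption: column k of `conditions` applies to column k of `data`.
filter_by_condition_row <- function(data, conditions, i) {
  keep <- rep(TRUE, nrow(data))
  for (k in seq_along(conditions)) {
    # Split "1,3,4" into c("1", "3", "4"); trimws() tolerates "1, 3, 4"
    allowed <- trimws(unlist(strsplit(as.character(conditions[i, k]), ",")))
    # Compare as character so factor columns match on their labels
    keep <- keep & as.character(data[[k]]) %in% allowed
  }
  data[keep, , drop = FALSE]
}

# Smaller stand-in for the data in the question
set.seed(1)
my_data <- data.frame(
  var_1 = as.factor(sample(1:10, 100, replace = TRUE)),
  var_2 = as.factor(sample(1:10, 100, replace = TRUE)),
  var_3 = as.factor(sample(1:10, 100, replace = TRUE))
)
conditions <- data.frame(cond_1 = c("1,3,4", "4,5,6"),
                         cond_2 = c("5,6", "7,8,9"))

# Rows matching the first row of conditions
res <- filter_by_condition_row(my_data, conditions, 1)

# The hard-coded filter from the question, for comparison
expected <- my_data[my_data$var_1 %in% c("1", "3", "4") &
                    my_data$var_2 %in% c("5", "6"), ]
```

Changing the last argument to 2 applies the second row of conditions instead, so the same call can be used in a loop over all rows.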
