Extract cell value in data.table whose colomn name respects a condition

Jonas · January 15, 2020, 10:26am

Hello,

I have several data tables to inspect. They have the same number of columns and rows.The column names only differ in their prefix.

Example:
data table A has the following colnames: A_col1, A_col2, ...
data table B has the following colnames: B_col1, B_col2, ...

I am using a for loop to extract specific values from each data table. Let's say that I want to extract in a new data table all the values from all the col1 of each data table.

How can I do this?

pieterjanvc · January 15, 2020, 12:52pm

Hi,

Welcome to the RStudio community!

Here is a Tidyverse implementation:

library(tidyverse)

#Choose which numbers in the columns you're interested in
columnNumbers = c(1,3,4, 11)

#Generate test data
df1 = data.frame(matrix(0, nrow = 10, ncol = 15))
colnames(df1) = paste0("A", paste0("_col", 1:15))

df2 = data.frame(matrix(0, nrow = 10, ncol = 15))
colnames(df2) = paste0("B", paste0("_col", 1:15))

#Filter the data and combine them
fileredData = 
  df1 %>% select(matches(
    paste(paste0("col", columnNumbers, "$"), collapse = "|")
    )) %>%
  cbind(
    df2 %>% select(matches(
      paste(paste0("col", columnNumbers, "$"), collapse = "|")
      ))
    )

> fileredData
   A_col1 A_col3 A_col4 A_col11 B_col1 B_col3 B_col4 B_col11
1       0      0      0       0      0      0      0       0
2       0      0      0       0      0      0      0       0
3       0      0      0       0      0      0      0       0
4       0      0      0       0      0      0      0       0
5       0      0      0       0      0      0      0       0
6       0      0      0       0      0      0      0       0
7       0      0      0       0      0      0      0       0
8       0      0      0       0      0      0      0       0
9       0      0      0       0      0      0      0       0
10      0      0      0       0      0      0      0       0

If you don't know the tidyverse yet, I suggest you read more about how to use the piping command and the data manipulation functions.

Basically, I filtered all the column names that contained the column numbers you request at the top. For this, I use regex to quickly filter the names. The command looks a bit weird because of all the pastes, but for the example given, it translated into this:

paste(paste0("col", columnNumbers, "$"), collapse = "|")
[1] "col1$|col3$|col4$|col11$"

Note the $ is at the end of each substring because if you don't col1 would match both col1 and col11 in regex.

After I filtered the first dataset, I simply attach the columns of the second one with the cbind function doing the exact same procedure for the second one.

Hope this helps,
PJ

andresrcs · January 15, 2020, 9:43pm

Nice solution but it wouldn't scale well with several data frames (as stated by the OP) how about this modification?

library(tidyverse)

#Choose which numbers in the columns you're interested in
columnNumbers = c(1,3,4, 11)

df1 = data.frame(matrix(0, nrow = 10, ncol = 15))
colnames(df1) = paste0("A", paste0("_col", 1:15))

df2 = data.frame(matrix(0, nrow = 10, ncol = 15))
colnames(df2) = paste0("B", paste0("_col", 1:15))

fileredData <- ls(pattern = "^df") %>% 
    map_dfc(~{
        eval(as.name(.x)) %>% 
            select(matches(paste(paste0("col", columnNumbers, "$"), collapse = "|")))
    })

fileredData 
#>    A_col1 A_col3 A_col4 A_col11 B_col1 B_col3 B_col4 B_col11
#> 1       0      0      0       0      0      0      0       0
#> 2       0      0      0       0      0      0      0       0
#> 3       0      0      0       0      0      0      0       0
#> 4       0      0      0       0      0      0      0       0
#> 5       0      0      0       0      0      0      0       0
#> 6       0      0      0       0      0      0      0       0
#> 7       0      0      0       0      0      0      0       0
#> 8       0      0      0       0      0      0      0       0
#> 9       0      0      0       0      0      0      0       0
#> 10      0      0      0       0      0      0      0       0

Jonas · January 16, 2020, 8:22am

Thank both of you for your solution.

So far I came up with a satisfactory solution: because the prefix of each column names has the same lenght, I decided to cut each prefix, and therefore extracted the data simply by referring directly to the new column names which are then the same in each data table (col1; col2; etc.).

However, I am still curious to find a proper solution.

Initially I though that I could extract values with such a command:
df$paste("A","col1",sep="_")

pieterjanvc · January 16, 2020, 12:48pm

Hi,

What you mean with by this:

I think both @andresrcs and I provided a solution right? If not, please explain more clearly what the ongoing issue is. I also encourage you to provide a reprex. A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

PJ

system · February 6, 2020, 12:48pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.