Extract cell value in data.table whose colomn name respects a condition

Hello,

I have several data tables to inspect. They have the same number of columns and rows.The column names only differ in their prefix.

Example:
data table A has the following colnames: A_col1, A_col2, ...
data table B has the following colnames: B_col1, B_col2, ...

I am using a for loop to extract specific values from each data table. Let's say that I want to extract in a new data table all the values from all the col1 of each data table.

How can I do this?

Hi,

Welcome to the RStudio community!

Here is a Tidyverse implementation:

library(tidyverse)

#Choose which numbers in the columns you're interested in
columnNumbers = c(1,3,4, 11)

#Generate test data
df1 = data.frame(matrix(0, nrow = 10, ncol = 15))
colnames(df1) = paste0("A", paste0("_col", 1:15))

df2 = data.frame(matrix(0, nrow = 10, ncol = 15))
colnames(df2) = paste0("B", paste0("_col", 1:15))

#Filter the data and combine them
fileredData = 
  df1 %>% select(matches(
    paste(paste0("col", columnNumbers, "$"), collapse = "|")
    )) %>%
  cbind(
    df2 %>% select(matches(
      paste(paste0("col", columnNumbers, "$"), collapse = "|")
      ))
    )

> fileredData
   A_col1 A_col3 A_col4 A_col11 B_col1 B_col3 B_col4 B_col11
1       0      0      0       0      0      0      0       0
2       0      0      0       0      0      0      0       0
3       0      0      0       0      0      0      0       0
4       0      0      0       0      0      0      0       0
5       0      0      0       0      0      0      0       0
6       0      0      0       0      0      0      0       0
7       0      0      0       0      0      0      0       0
8       0      0      0       0      0      0      0       0
9       0      0      0       0      0      0      0       0
10      0      0      0       0      0      0      0       0

If you don't know the tidyverse yet, I suggest you read more about how to use the piping command and the data manipulation functions.

Basically, I filtered all the column names that contained the column numbers you request at the top. For this, I use regex to quickly filter the names. The command looks a bit weird because of all the pastes, but for the example given, it translated into this:

paste(paste0("col", columnNumbers, "$"), collapse = "|")
[1] "col1$|col3$|col4$|col11$"

Note the $ is at the end of each substring because if you don't col1 would match both col1 and col11 in regex.

After I filtered the first dataset, I simply attach the columns of the second one with the cbind function doing the exact same procedure for the second one.

Hope this helps,
PJ

Nice solution but it wouldn't scale well with several data frames (as stated by the OP) how about this modification?

library(tidyverse)

#Choose which numbers in the columns you're interested in
columnNumbers = c(1,3,4, 11)

df1 = data.frame(matrix(0, nrow = 10, ncol = 15))
colnames(df1) = paste0("A", paste0("_col", 1:15))

df2 = data.frame(matrix(0, nrow = 10, ncol = 15))
colnames(df2) = paste0("B", paste0("_col", 1:15))

fileredData <- ls(pattern = "^df") %>% 
    map_dfc(~{
        eval(as.name(.x)) %>% 
            select(matches(paste(paste0("col", columnNumbers, "$"), collapse = "|")))
    })

fileredData 
#>    A_col1 A_col3 A_col4 A_col11 B_col1 B_col3 B_col4 B_col11
#> 1       0      0      0       0      0      0      0       0
#> 2       0      0      0       0      0      0      0       0
#> 3       0      0      0       0      0      0      0       0
#> 4       0      0      0       0      0      0      0       0
#> 5       0      0      0       0      0      0      0       0
#> 6       0      0      0       0      0      0      0       0
#> 7       0      0      0       0      0      0      0       0
#> 8       0      0      0       0      0      0      0       0
#> 9       0      0      0       0      0      0      0       0
#> 10      0      0      0       0      0      0      0       0
1 Like

Thank both of you for your solution.

So far I came up with a satisfactory solution: because the prefix of each column names has the same lenght, I decided to cut each prefix, and therefore extracted the data simply by referring directly to the new column names which are then the same in each data table (col1; col2; etc.).

However, I am still curious to find a proper solution.

Initially I though that I could extract values with such a command:
df$paste("A","col1",sep="_")

Hi,

What you mean with by this:

I think both @andresrcs and I provided a solution right? If not, please explain more clearly what the ongoing issue is. I also encourage you to provide a reprex. A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:

PJ

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.