How to use a vector of column name-row name pairs to extract corresponding values from table?

DynV · July 16, 2024, 9:43am

I have a dataframe with a good amount of columns. I have a vector of IDs with names corresponding to a unique columns, that is there may be duplicate IDs but at the most 1 column name. I know I could do this using a for but I'm trying to have the data created from 2 vectors in a way similar to paste(). I've been working on the problem and the closest I currently got to it is

df[which(df$ID %in% ID_list), names(ID_list)]

what I need is a vector of the same length of the ID list corresponding to the content of the column for that ID. For example, with

ID_list <- c(A=53, B=64, C=59)
set.seed(123); df <- as.data.frame(matrix(dimnames = list(53:64, LETTERS[1:3]), sample(10:99, (64-53 + 1) * 3, replace=FALSE), ncol=3))

should yield

40, 82, 71

Thank you kindly for your help

nirgrahamuk · July 16, 2024, 12:56pm

ID_list <- c(A = 53, B = 64, C = 59)
set.seed(123)
df <- as.data.frame(matrix(dimnames = list(53:64, LETTERS[1:3]), sample(10:99, (64 - 53 + 1) * 3, replace = FALSE), ncol = 3))

library(purrr)
library(dplyr)

tdf <- as_tibble(df,rownames = "rn")
vec_res <- imap_int(ID_list, \(x, y){
  filter(tdf,rn==x) |> pull(y)
}) 

vec_res

str(vec_res)

dromano · July 16, 2024, 2:21pm

Here an approach that uses a join:

original data as `ID_list` and `df`

ID_list <- c(A=53, B=64, C=59)
set.seed(123); 
df <- 
  as.data.frame(
    matrix(
      dimnames = list(53:64, LETTERS[1:3]), 
      sample(
        10:99, 
        (64-53 + 1) * 3, 
        replace=FALSE
      ), 
      ncol=3
    )
  )

library(tibble)
# move row names to column and view data types of columns
df |>
  rownames_to_column(var = "row") |> 
  as_tibble() |> 
  head(3)
#> # A tibble: 3 × 4
#>   row       A     B     C
#>   <chr> <int> <int> <int>
#> 1 53       40    18    50
#> 2 54       88    81    32
#> 3 55       60    35    36

# convert ID_list to table
ID_list |> 
  enframe(name = "col", value = "row")
#> # A tibble: 3 × 2
#>   col     row
#>   <chr> <dbl>
#> 1 A        53
#> 2 B        64
#> 3 C        59

# create long version of df 
library(tidyr)
df |> 
  rownames_to_column(var = "row") |>
  pivot_longer(A:C, names_to = "col") |> 
  head(3)
#> # A tibble: 3 × 3
#>   row   col   value
#>   <chr> <chr> <int>
#> 1 53    A        40
#> 2 53    B        18
#> 3 53    C        50

# join long version of df to ID_list table
library(dplyr)
ID_list |> 
  enframe(name = "col", value = "row") |> 
  # convert values of "row" to character to match original row names
  mutate(across(row, as.character)) |> 
  inner_join(
    df |> 
      rownames_to_column(var = "row") |>
      pivot_longer(A:C, names_to = "col")
  ) |> 
  pull(value)
#> Joining with `by = join_by(col, row)`
#> [1] 40 82 71

^{Created on 2024-07-16 with reprex v2.0.2}

system · October 14, 2024, 2:21pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.