subsetting multiple rows

Hello,
I have a large dataframe (1000 rows and 200 columns) consisting of several values (names of something ). I subset and retain only rows that have certain values of my choice (eg football, camera, car,plane)

Thanks

Take a look at the filter function in the package dplyr. Subset rows using column values — filter • dplyr

Here's a tiny example of how you would do this but if you need more help. Please look into providing a reproducible example (FAQ: How to do a minimal reproducible example ( reprex ) for beginners):

library(tidyverse)
subsetteddata <- dataset %>%
   filter(variablename %in% c("football", "camera", "car", "plane"))

Thanks, it still does not give me what I need. Maybe I will need to explain my self well.
I have a dataframe, and in reality it has 50 variables (columns)- patients diagnosis (I10, A43, Y66, R54, Y67, E32...). I want to extract all patients (rows) with certain diagnosis (I10, A43, Y66). Any of those diagnosis can be in any column but they are mutually exclusive.

I think you need to provide a reproducible example at this point so we can best help you.

sample extract

Please see an extract from the df. I need to delete any value from the ICD_10 columns that is not either I10 or Y67 or E32- Then, extract/create a new df which includes only rows with either I10 or Y67 or E32.

This isn't a reproducible example. We can't do much with a screenshot. Please read this and see how to set up a reproducible example.

1 Like
library(tidyverse)

df <- tibble(
  name = rep(seq(1:10), 100), 
  id = 1 + floor(0.1*(0:(length(name)-1))), 
  value = (LETTERS)[ntile(runif(1000), 26)]
  ) %>% 
  pivot_wider(names_prefix = "IDC_")

df
#> # A tibble: 100 x 11
#>       id IDC_1 IDC_2 IDC_3 IDC_4 IDC_5 IDC_6 IDC_7 IDC_8 IDC_9 IDC_10
#>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
#>  1     1 E     Y     Q     D     S     C     K     Q     Z     O     
#>  2     2 B     V     B     L     C     B     B     U     E     E     
#>  3     3 F     G     C     T     S     G     K     T     T     W     
#>  4     4 Q     N     C     Z     B     F     W     B     U     B     
#>  5     5 G     S     X     J     S     N     N     W     I     H     
#>  6     6 J     C     J     N     K     C     U     H     X     M     
#>  7     7 O     T     Q     P     P     I     P     I     D     O     
#>  8     8 K     I     S     Z     R     F     Z     R     Y     B     
#>  9     9 Y     P     I     R     U     M     C     N     C     N     
#> 10    10 B     P     G     P     A     J     V     V     S     O     
#> # ... with 90 more rows

df_deleted <- df %>%
  pivot_longer(starts_with("IDC")) %>%
  mutate(value = if_else(
    value %in% c("A", "B", "C"),
    NA_character_,
    value
  )) %>%
  pivot_wider()

df_deleted
#> # A tibble: 100 x 11
#>       id IDC_1 IDC_2 IDC_3 IDC_4 IDC_5 IDC_6 IDC_7 IDC_8 IDC_9 IDC_10
#>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
#>  1     1 E     Y     Q     D     S     <NA>  K     Q     Z     O     
#>  2     2 <NA>  V     <NA>  L     <NA>  <NA>  <NA>  U     E     E     
#>  3     3 F     G     <NA>  T     S     G     K     T     T     W     
#>  4     4 Q     N     <NA>  Z     <NA>  F     W     <NA>  U     <NA>  
#>  5     5 G     S     X     J     S     N     N     W     I     H     
#>  6     6 J     <NA>  J     N     K     <NA>  U     H     X     M     
#>  7     7 O     T     Q     P     P     I     P     I     D     O     
#>  8     8 K     I     S     Z     R     F     Z     R     Y     <NA>  
#>  9     9 Y     P     I     R     U     M     <NA>  N     <NA>  N     
#> 10    10 <NA>  P     G     P     <NA>  J     V     V     S     O     
#> # ... with 90 more rows

df_only <- df %>%
  filter(select(., -id) %>% apply(1, function(x) any(x %in% c("A", "B", "C"))))

df_only
#> # A tibble: 72 x 11
#>       id IDC_1 IDC_2 IDC_3 IDC_4 IDC_5 IDC_6 IDC_7 IDC_8 IDC_9 IDC_10
#>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
#>  1     1 E     Y     Q     D     S     C     K     Q     Z     O     
#>  2     2 B     V     B     L     C     B     B     U     E     E     
#>  3     3 F     G     C     T     S     G     K     T     T     W     
#>  4     4 Q     N     C     Z     B     F     W     B     U     B     
#>  5     6 J     C     J     N     K     C     U     H     X     M     
#>  6     8 K     I     S     Z     R     F     Z     R     Y     B     
#>  7     9 Y     P     I     R     U     M     C     N     C     N     
#>  8    10 B     P     G     P     A     J     V     V     S     O     
#>  9    11 N     I     D     C     G     J     S     I     P     D     
#> 10    13 A     N     H     X     L     G     J     Y     V     L     
#> # ... with 62 more rows

Created on 2021-06-17 by the reprex package (v1.0.0)

The preferred way to do row-wise operations is now rowwise(), but I can never seem to figure out how to use it beyond very simple cases. I find apply(x, 1, fn) much easier to understand.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.