subsetting multiple rows

lofus77 · June 17, 2021, 5:39pm

Hello,
I have a large dataframe (1000 rows and 200 columns) consisting of several values (names of something ). I subset and retain only rows that have certain values of my choice (eg football, camera, car,plane)

Thanks

StatSteph · June 17, 2021, 6:07pm

Take a look at the filter function in the package dplyr. Subset rows using column values — filter • dplyr

Here's a tiny example of how you would do this but if you need more help. Please look into providing a reproducible example (FAQ: How to do a minimal reproducible example ( reprex ) for beginners):

library(tidyverse)
subsetteddata <- dataset %>%
   filter(variablename %in% c("football", "camera", "car", "plane"))

lofus77 · June 17, 2021, 6:47pm

Thanks, it still does not give me what I need. Maybe I will need to explain my self well.
I have a dataframe, and in reality it has 50 variables (columns)- patients diagnosis (I10, A43, Y66, R54, Y67, E32...). I want to extract all patients (rows) with certain diagnosis (I10, A43, Y66). Any of those diagnosis can be in any column but they are mutually exclusive.

StatSteph · June 17, 2021, 6:58pm

I think you need to provide a reproducible example at this point so we can best help you.

lofus77 · June 17, 2021, 7:52pm

sample extract

Please see an extract from the df. I need to delete any value from the ICD_10 columns that is not either I10 or Y67 or E32- Then, extract/create a new df which includes only rows with either I10 or Y67 or E32.

StatSteph · June 17, 2021, 8:32pm

This isn't a reproducible example. We can't do much with a screenshot. Please read this and see how to set up a reproducible example.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

arthur.t · June 17, 2021, 8:38pm

library(tidyverse)

df <- tibble(
  name = rep(seq(1:10), 100), 
  id = 1 + floor(0.1*(0:(length(name)-1))), 
  value = (LETTERS)[ntile(runif(1000), 26)]
  ) %>% 
  pivot_wider(names_prefix = "IDC_")

df
#> # A tibble: 100 x 11
#>       id IDC_1 IDC_2 IDC_3 IDC_4 IDC_5 IDC_6 IDC_7 IDC_8 IDC_9 IDC_10
#>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
#>  1     1 E     Y     Q     D     S     C     K     Q     Z     O     
#>  2     2 B     V     B     L     C     B     B     U     E     E     
#>  3     3 F     G     C     T     S     G     K     T     T     W     
#>  4     4 Q     N     C     Z     B     F     W     B     U     B     
#>  5     5 G     S     X     J     S     N     N     W     I     H     
#>  6     6 J     C     J     N     K     C     U     H     X     M     
#>  7     7 O     T     Q     P     P     I     P     I     D     O     
#>  8     8 K     I     S     Z     R     F     Z     R     Y     B     
#>  9     9 Y     P     I     R     U     M     C     N     C     N     
#> 10    10 B     P     G     P     A     J     V     V     S     O     
#> # ... with 90 more rows

df_deleted <- df %>%
  pivot_longer(starts_with("IDC")) %>%
  mutate(value = if_else(
    value %in% c("A", "B", "C"),
    NA_character_,
    value
  )) %>%
  pivot_wider()

df_deleted
#> # A tibble: 100 x 11
#>       id IDC_1 IDC_2 IDC_3 IDC_4 IDC_5 IDC_6 IDC_7 IDC_8 IDC_9 IDC_10
#>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
#>  1     1 E     Y     Q     D     S     <NA>  K     Q     Z     O     
#>  2     2 <NA>  V     <NA>  L     <NA>  <NA>  <NA>  U     E     E     
#>  3     3 F     G     <NA>  T     S     G     K     T     T     W     
#>  4     4 Q     N     <NA>  Z     <NA>  F     W     <NA>  U     <NA>  
#>  5     5 G     S     X     J     S     N     N     W     I     H     
#>  6     6 J     <NA>  J     N     K     <NA>  U     H     X     M     
#>  7     7 O     T     Q     P     P     I     P     I     D     O     
#>  8     8 K     I     S     Z     R     F     Z     R     Y     <NA>  
#>  9     9 Y     P     I     R     U     M     <NA>  N     <NA>  N     
#> 10    10 <NA>  P     G     P     <NA>  J     V     V     S     O     
#> # ... with 90 more rows

df_only <- df %>%
  filter(select(., -id) %>% apply(1, function(x) any(x %in% c("A", "B", "C"))))

df_only
#> # A tibble: 72 x 11
#>       id IDC_1 IDC_2 IDC_3 IDC_4 IDC_5 IDC_6 IDC_7 IDC_8 IDC_9 IDC_10
#>    <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
#>  1     1 E     Y     Q     D     S     C     K     Q     Z     O     
#>  2     2 B     V     B     L     C     B     B     U     E     E     
#>  3     3 F     G     C     T     S     G     K     T     T     W     
#>  4     4 Q     N     C     Z     B     F     W     B     U     B     
#>  5     6 J     C     J     N     K     C     U     H     X     M     
#>  6     8 K     I     S     Z     R     F     Z     R     Y     B     
#>  7     9 Y     P     I     R     U     M     C     N     C     N     
#>  8    10 B     P     G     P     A     J     V     V     S     O     
#>  9    11 N     I     D     C     G     J     S     I     P     D     
#> 10    13 A     N     H     X     L     G     J     Y     V     L     
#> # ... with 62 more rows

^{Created on 2021-06-17 by the reprex package (v1.0.0)}

arthur.t · June 17, 2021, 8:41pm

The preferred way to do row-wise operations is now rowwise(), but I can never seem to figure out how to use it beyond very simple cases. I find apply(x, 1, fn) much easier to understand.

system · July 8, 2021, 8:41pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.