Hello, we need to create a subset of a dataset in R by keeping only two labels from a column that has 9 labels. In our new data frame all 9 labels still exist, even though 7 of them have no rows. We would like those 7 not to exist. Does anyone know how to do this?
Thanks for your help
Hello,
Welcome to RStudio Community.
It would be helpful if you were to provide a "reproducible example", detailed here:
FAQ: What's a reproducible example (reprex) and how do I create one?
I believe you are talking about filtering. Here is an example using dplyr:
library(tidyverse)
diamonds
#> # A tibble: 53,940 x 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
#> 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
#> 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
#> 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
#> 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
#> 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
#> 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
#> 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
#> 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
#> 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
#> # ... with 53,930 more rows
diamonds$cut %>% unique()
#> [1] Ideal Premium Good Very Good Fair
#> Levels: Fair < Good < Very Good < Premium < Ideal
best_diamonds <- diamonds %>%
  filter(cut %in% c("Ideal", "Premium"))
best_diamonds
#> # A tibble: 35,342 x 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
#> 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
#> 3 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
#> 4 0.23 Ideal J VS1 62.8 56 340 3.93 3.9 2.46
#> 5 0.22 Premium F SI1 60.4 61 342 3.88 3.84 2.33
#> 6 0.31 Ideal J SI2 62.2 54 344 4.35 4.37 2.71
#> 7 0.2 Premium E SI2 60.2 62 345 3.79 3.75 2.27
#> 8 0.32 Premium E I1 60.9 58 345 4.38 4.42 2.68
#> 9 0.3 Ideal I SI2 62 54 348 4.31 4.34 2.68
#> 10 0.24 Premium I VS1 62.5 57 355 3.97 3.94 2.47
#> # ... with 35,332 more rows
Created on 2021-12-16 by the reprex package (v2.0.1)
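Filtering on its own keeps the unused labels attached to the column if it is a factor, which sounds like what the question describes. Here is a minimal sketch of one way to remove them, using base R's `droplevels()`. The data frame `df` and its columns `label` and `value` are made up for illustration, since no reprex was provided.

```r
# Hypothetical toy data: a factor column with 9 labels, as in the question
df <- data.frame(
  label = factor(rep(letters[1:9], times = 2)),
  value = seq_len(18)
)

# Keep only two labels; the other 7 levels stay attached to the factor
sub <- df[df$label %in% c("a", "b"), ]
nlevels(sub$label)
#> [1] 9

# droplevels() removes every factor level that has no rows left
sub_clean <- droplevels(sub)
levels(sub_clean$label)
#> [1] "a" "b"
```

With the diamonds example above, `droplevels(best_diamonds)` should do the same for the `cut` column, and `forcats::fct_drop()` is a tidyverse alternative if you only want to touch one column.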