Create a subset of a panel data set

mfherman · November 15, 2018, 2:40pm

Hi, @MLent! Thanks for including some of your data. There are a couple of things you can do to make it easier for folks here to help with your question. The first is formatting your code as code so it's easier to read and to copy and paste into an R console. Basically, you just enclose your code between lines of three backticks, like this:

``` r
reg <- plm(y~x, data=subset(df, ID[Variable>1000]), model="within")
```

Also, to make it easier for folks here to read and work with, it's better to create an R object with your sample data and post it here. This post has some good tips for how to include sample data:

Best Practices: how to prepare your own data for use in a `reprex` if you can’t, or don’t know how to reproduce a problem with a built-in dataset?

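
As a rough illustration of that idea, here is a minimal sketch using base R's `dput()`, which prints R code that recreates an object. The data frame `sample_df` and its values are made up for this example; you would substitute a small slice of your own data.

``` r
# hypothetical stand-in for a few rows of your own data
sample_df <- data.frame(
  ID = c(1, 1, 2),
  Time = c(1, 2, 1),
  Variable = c(123, 1001, 1111)
)

# prints a structure(...) call that rebuilds sample_df when run
dput(sample_df)
```

You can paste the printed `structure(...)` call into your post, and anyone who runs it gets the same object back.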

So, with your example, I would do something like the following:

``` r
# create sample data
my_data <- tibble::tribble(
  ~ID, ~Time, ~Variable,
  1, 1, 123,
  1, 2, 1001,
  1, 3, 90,
  2, 1, 1111,
  2, 2, 222,
  2, 3, 2222,
  3, 1, 200,
  3, 2, 2000,
  3, 3, 4000
)
```

(I added more fake data to make the example a bit more clear.)

To manipulate data, I like to use the dplyr package, which is part of the tidyverse. It can sometimes be a little more verbose than other ways of coding in R, but I think it makes the code easier to understand!

So here is how I would create a subset of the data you describe. First I find which IDs meet the conditions you define, and then I use those IDs to subset the full dataset.

``` r
library(dplyr)

# create vector of IDs meeting condition
my_ids <- my_data %>%
  filter(Time == 2 & Variable > 1000) %>%
  pull(ID)
my_ids
#> [1] 1 3

# subset data using that vector
my_subset <- my_data %>%
  filter(ID %in% my_ids)
my_subset
#> # A tibble: 6 x 3
#>      ID  Time Variable
#>   <dbl> <dbl>    <dbl>
#> 1     1     1      123
#> 2     1     2     1001
#> 3     1     3       90
#> 4     3     1      200
#> 5     3     2     2000
#> 6     3     3     4000
```

Created on 2018-11-15 by the reprex package (v0.2.1)
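
If you'd rather not keep a separate vector of IDs around, the same subset can probably be built in one pipeline with a grouped filter. This is just a sketch of an alternative, not something you need to switch to. It repeats the sample data so it runs on its own, and the name `my_subset_grouped` is made up for this example.

``` r
library(dplyr)

# same sample data as above, repeated so this block is self-contained
my_data <- tibble::tribble(
  ~ID, ~Time, ~Variable,
  1, 1, 123,
  1, 2, 1001,
  1, 3, 90,
  2, 1, 1111,
  2, 2, 222,
  2, 3, 2222,
  3, 1, 200,
  3, 2, 2000,
  3, 3, 4000
)

# keep every row of an ID if any of its rows meets the condition
my_subset_grouped <- my_data %>%
  group_by(ID) %>%
  filter(any(Time == 2 & Variable > 1000)) %>%
  ungroup()
```

Here `any()` is evaluated once per ID, so `filter()` keeps or drops each ID's rows as a whole. With this sample data it should keep the same six rows (IDs 1 and 3) as the two-step version.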