select() works with a tibble but not with a tsibble

Hello all!

I am trying to select certain columns after converting a tibble to a tsibble. df is a tibble with columns a, b, and c, indexed by 'Year'.
I run the following:

dff <- df |>
  as_tsibble(key = c(a),
             index = Year)
glimpse(dff)

Unfortunately, columns a, b and c remain in dff.

If I use the select function with the original tibble, i.e. df, it works.

Thank you for any consideration or assistance provided.

For a tsibble, you cannot remove the index column or any column that is used as a key, which means that the a column cannot be deselected.

A reprex with a sample of your data and the code you used would be very helpful to explain why the b and c columns unexpectedly remain.
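To illustrate the point about index and key columns, here is a minimal sketch. The data is made up for illustration (the column names Year, ID, ColA and ColB are assumptions, not your actual data), and it assumes the tibble, dplyr and tsibble packages are installed.

```r
library(tibble)
library(dplyr)
library(tsibble)

# Hypothetical data: two series (Adam, Bob) observed in two years
df <- tibble(
  Year = c(2014, 2015, 2014, 2015),
  ID   = c("Adam", "Adam", "Bob", "Bob"),
  ColA = c(10, 20, 30, 40),
  ColB = c(154, 30, 12, 7)
)

dff <- as_tsibble(df, key = ID, index = Year)

# On the tsibble, the index (Year) and the key (ID) come along with ColA
ts_selected <- dff |> select(ColA)
names(ts_selected)

# Converting back to a plain tibble first lets select() drop them
tbl_selected <- dff |> as_tibble() |> select(ColA)
names(tbl_selected)
```

If you really need a result without Year or ID, converting with as_tibble() before select() is one way to get it, at the cost of losing the time series structure.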

Hello! Thank you for your response and assistance.

My original dataset is a .csv file, which I import into R using 'read_csv', so it becomes a tibble.

My dataset is indexed by year and an id. Each of these observations has more columns (variables) associated with it (in ML-speak, more "features"?).

So my data (the tibble df) looks like the following example:

Year  ID    ColA  ColB  ColC  ...  ColZ
2014  Adam  10    154   123
2015  Bob   20    30    51
.
.
.

I think I fixed this, and was able to create a tsibble that works. I used the following code:

dff <- df |>
  as_tsibble(key = c(ID),
             index = Year)
dfff <- dff |>
  filter(ID == "Adam")  # == tests equality; a single = is an error inside filter()
dfff |>
  select(ColA)

With this code, only Year, ID, and ColA remained.

However, what is the point of the 'key' argument then? In my "solution", I just "keyed" in on the "ID" variable that designates each observation over time. In this regard, I get why the index variable is what it is for a 'time series' object, i.e. a tsibble. On the other hand, what about all the other columns that I think I would include in the "key" designation? Shouldn't they be "keyed" in too? In the above example, I am referring to 'ColB', 'ColC', etc. From your explanation, it would seem that if I had included 'ColB', 'ColC', etc. in the key, then they could not be "deselected". Is that correct?

I hope this makes sense and I apologize for my lack of the appropriate vernacular to describe my situation.

Thank you for your assistance. I appreciate it!

If your data is similar to this:

Year ID .....
2014 Adam .....
2014 Bob .....
2014 Chris .....
2015 Adam .....
2015 Bob .....
2015 Chris .....
2016 Adam .....
2016 Bob .....
2016 Chris .....

then you need to specify both Year as the index and ID as the key so that each row is for a unique observation. The tsibble now knows that there is more than one time series, one each for Adam, Bob and Chris.
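A minimal sketch of this, using made-up values in the layout above (ColA and ColB are placeholder measurement columns, not your actual data), and assuming the tibble and tsibble packages are installed. It also shows that the remaining columns are measured variables rather than keys, so they do not belong in the key argument.

```r
library(tibble)
library(tsibble)

# Hypothetical data: three people observed in each of three years
df <- tibble(
  Year = rep(2014:2016, each = 3),
  ID   = rep(c("Adam", "Bob", "Chris"), times = 3),
  ColA = 1:9,
  ColB = 11:19
)

# Year alone does not identify a row, because each year appears three times
dup_without_key <- is_duplicated(df, index = Year)

# so as_tsibble() without a key is expected to stop with an error;
# tryCatch() captures the message instead of halting the script
no_key_result <- tryCatch(
  as_tsibble(df, index = Year),
  error = function(e) conditionMessage(e)
)

# Year together with ID identifies each row uniquely
dup_with_key <- is_duplicated(df, key = ID, index = Year)
dff <- as_tsibble(df, key = ID, index = Year)

key_vars(dff)       # the column(s) that identify each series
n_keys(dff)         # number of separate time series
measured_vars(dff)  # the remaining columns, i.e. the values being measured
```

The key only needs the columns that distinguish one series from another. Columns such as ColA and ColB are the values recorded for each series at each time point, so they stay out of the key and can be selected or dropped freely.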

I recommend that you read the first part of Chapter 2 of this book:

Hello again!

Yes sir! That is exactly what I am working out of, i.e. the 'Forecasting: Principles and Practice' (3rd ed) book.

Thank you for your explanation and assistance with this. I really appreciate it!