Select function works w/ Tibble not w/ Tsibble

nes125 · September 21, 2024, 2:53am

Hello all!

I am trying to select certain columns from a tsibble that I created from a tibble.
df is a tibble object w/ cols a, b, and c, indexed by 'Year'.
I run the following:

dff <- df |>
  as_tsibble(key = c(a),
             index = Year)
glimpse(dff)

Unfortunately, columns a, b and c remain in dff.

If I use the select function with the original tibble, i.e. df, it works.

Thank you for any consideration or assistance provided.

EconProf · September 21, 2024, 5:09pm

For a tsibble, you cannot remove the index column or any column that is used as a key, which means that the a column cannot be deselected.
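A minimal sketch of this behaviour, assuming the dplyr and tsibble packages are installed. The data is made up for illustration (the Year, ID, ColA and ColB names and values are placeholders, not the poster's data):

```r
library(dplyr)
library(tsibble)

# Hypothetical sample data: two series (Adam, Bob) observed yearly
df <- tibble(
  Year = rep(2014:2016, times = 2),
  ID   = rep(c("Adam", "Bob"), each = 3),
  ColA = c(10, 11, 12, 20, 21, 22),
  ColB = c(154, 150, 149, 30, 31, 33)
)

dff <- as_tsibble(df, key = ID, index = Year)

# On the tsibble, select() keeps the index (Year) and the key (ID)
# in addition to the requested column
kept <- dff |> select(ColA)
names(kept)

# Converting back to a plain tibble drops the tsibble structure,
# so select() returns only the requested column
only_a <- dff |> as_tibble() |> select(ColA)
names(only_a)
```

If only ColA is wanted, converting with as_tibble() first is one option, at the cost of losing the time series structure.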

A reprex with a sample of your data and the code you used would be very helpful to explain why the b and c columns unexpectedly remain.

nes125 · September 21, 2024, 9:06pm

Hello! Thank you for your response and assistance.

My original dataset is a .csv file, which I import into R using 'read_csv', so it becomes a tibble.

My dataset is indexed by year and an ID. Each of these observations has more columns (variables) associated with it (in ML-speak, more "features"?).

So my data (the tibble df) looks like this, for example:

Year  ID    ColA  ColB  ColC  ...  ColZ
2014  Adam  10    154   123
2015  Bob   20    30    51
.
.
.

I think I fixed this, and was able to create a tsibble that works. I used the following code:

dff <- df |>
  as_tsibble(key = c(ID),
             index = Year)
dfff <- dff |>
  filter(ID == "Adam")  # `==` tests equality; a single `=` is an error inside filter()
dfff |>
  select(ColA)

With this code, only the Year, ID, and ColA remained.

However, what is the point of the 'key' argument then? In my "solution", I just "keyed" in on the "ID" variable that designates each observation over time. I get why the index variable is what it is for a time series object, i.e. a tsibble. But what about all the other columns that I thought I would include in the "key" designation? Shouldn't they be "keyed" in too? In the above example, I am referring to 'ColB', 'ColC', etc. From your explanation, it would seem that if I had included 'ColB', 'ColC', etc. in the key, then they could not be "deselected". Is that correct?

I hope this makes sense and I apologize for my lack of the appropriate vernacular to describe my situation.

Thank you for your assistance. I appreciate it!

EconProf · September 21, 2024, 11:17pm

If your data is similar to this:

Year ID .....
2014 Adam .....
2014 Bob .....
2014 Chris .....
2015 Adam .....
2015 Bob .....
2015 Chris .....
2016 Adam .....
2016 Bob .....
2016 Chris .....

then you need to specify both Year as the index and ID as the key so that each row is for a unique observation. The tsibble now knows that there is more than one time series, one each for Adam, Bob and Chris.
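As a rough sketch of the difference between the key and the remaining columns, assuming the tsibble package is installed and using made-up data in the layout above (the ColA and ColB columns and their values are hypothetical):

```r
library(tsibble)

# Hypothetical panel: three people observed in each of three years
df <- tibble::tibble(
  Year = rep(2014:2016, each = 3),
  ID   = rep(c("Adam", "Bob", "Chris"), times = 3),
  ColA = 1:9,
  ColB = 11:19
)

# Year alone does not identify a row (each year appears three times),
# so building a tsibble without a key is expected to fail
no_key <- try(as_tsibble(df, index = Year), silent = TRUE)
inherits(no_key, "try-error")

# With ID as the key, every (ID, Year) pair is unique
dff <- as_tsibble(df, key = ID, index = Year)

key_vars(dff)       # the column(s) identifying each series
index_var(dff)      # the time column
measured_vars(dff)  # everything else: the observed variables
n_keys(dff)         # number of separate series
```

In this sketch ColA and ColB are measured variables, not keys: they hold the values observed for each series and do not need to be listed in key.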

I recommend that you read the first part of Chapter 2 of this book:

nes125 · September 21, 2024, 11:39pm

Hello again!

Yes sir! That is exactly what I am working out of, i.e. the 'Forecasting: Principles and Practice' (3rd ed) book.

Thank you for your explanation and assistance with this. I really appreciate it!
