Learning R. Frustrated and need help.

technocrat · September 27, 2020, 7:01am

Hyndman and Athanasopoulos is a solid introductory text, and the exercises are primarily a matter of comprehension, not coding.

Here are the questions:

The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.
a. Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.
b. Explain why it is necessary to take logarithms of these data before fitting a model.
c. Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
d. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
e. Do boxplots of the residuals for each month. Does this reveal any problems with the model?
f. What do the values of the coefficients tell you about each variable?
g. What does the Breusch-Godfrey test tell you about your model?
h. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
i. Transform your predictions and intervals to obtain predictions and intervals for the raw data.
j. How could you improve these predictions by modifying the model?

These can be profitably approached with school algebra in mind: f(x) = y, where x is a given object, y is a desired object, and f is a function object that makes the transformation. Sometimes it will be necessary to create f as a composed function, like g(f(x)) in school algebra.
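Here is a minimal sketch of that idea in base R. It assumes x is the first five monthly sales values of fancy, as printed by str(fancy) in the reprex, with f = log and g = exp. The helper compose() is hypothetical and written only for illustration; it does not come from fpp2 or any other package.

```r
# x: the given object (first five values of fancy, as shown by str(fancy))
x <- c(1665, 2398, 2841, 3547, 3753)

f <- log      # f: a function object, here the natural logarithm
y <- f(x)     # y = f(x): the desired object
print(y)

# A composed function g(f(x)). Hypothetical helper, not from any package.
compose <- function(g, f) function(x) g(f(x))
g <- exp                      # exp undoes log
g_of_f <- compose(g, f)

print(g_of_f(x))              # back on the original scale, equal to x up to rounding
```

Because exp undoes log, g(f(x)) returns x. The same log/exp pairing is what questions b and i are about.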

Exercise 5 in §15.5 provides the fancy data set as x. Begin by inspecting it.

```r
library(fpp2)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> ── Attaching packages ───────────────────────────────────── fpp2 2.4 ──
#> ✓ ggplot2   3.3.2     ✓ fma       2.4  
#> ✓ forecast  8.13      ✓ expsmooth 2.3
#> 
# user to confirm this is the correct file
fancy <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
str(fancy)
#>  num [1:84] 1665 2398 2841 3547 3753 ...
```

Created on 2020-09-26 by the reprex package (v0.3.0.9001)

For each question, identify y. For example, in question a, y is a time plot of the data.

Next, identify f, preferably the simplest function seen to date. If f(x) does not produce the desired result, the next step is to read the documentation for f, specifically the Arguments section. Is fancy, the object at hand, a proper argument for x?
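One way to check whether an object is a proper argument, sketched with base R only: formals() lists the arguments a function accepts along with their defaults, and class() and is.ts() show what the object at hand actually is. The five values are again the start of fancy as printed by str(fancy); ts() is used here only as an example of a function worth inspecting.

```r
# formals() returns a function's arguments and their defaults; this is the
# programmatic counterpart of the Arguments section in help("ts").
names(formals(ts))

# The default frequency is 1, so ts() adds no monthly calendar unless told to.
formals(ts)$frequency

# A plain numeric vector, like the scanned fancy data, is not a time series.
x <- c(1665, 2398, 2841, 3547, 3753)
class(x)   # "numeric"
is.ts(x)   # FALSE
```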

Look at str(fancy). It's a numeric vector. Can it be used to produce an x-y plot of sales against time? If not, is there a function g that will turn it into an object that can?
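A likely candidate for g is ts(), which attaches a start date and a frequency to a plain vector. The sketch below assumes monthly data beginning in January 1987, as the exercise text states. It uses the five values from str(fancy) so it can run on its own, and it uses a placeholder vector of 84 zeros only to confirm where an 84-month series would end. The commented lines show how this might be applied to the full data; autoplot() for ts objects comes from the forecast package, which fpp2 attaches.

```r
x <- c(1665, 2398, 2841, 3547, 3753)          # first five values of fancy
x_ts <- ts(x, start = c(1987, 1), frequency = 12)

is.ts(x_ts)     # TRUE: the vector now carries a monthly calendar
start(x_ts)     # 1987 1
cycle(x_ts)     # position within the year: 1 (Jan) through 5 (May)

# Placeholder of 84 zeros, standing in only for the length of fancy:
# 84 months starting January 1987 should end in December 1993.
end(ts(numeric(84), start = c(1987, 1), frequency = 12))   # 1993 12

# Applied to the full scanned vector (not run here):
# fancy_ts <- ts(fancy, start = c(1987, 1), frequency = 12)
# autoplot(fancy_ts)   # f(g(x)): a time plot
```

An 84-month series ending in December 1993 matches question h, which asks for forecasts starting in 1994.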

Continue in the same fashion for the remaining questions that involve coding. As preparation, carefully parse the introductory paragraph, paying particular attention to the highlighted words and phrases.

The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. … Over time, the shop has expanded its premises, range of products, and staff.

All of those words and phrases are there as keys to answering the questions, selecting functions, and creating objects where needed.
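For example, "opened in January 1987" and "held every March since 1988" are enough to build the surfing festival dummy that question c asks for. Below is a base-R sketch that assumes 84 monthly observations running from January 1987 to December 1993, which is the length str(fancy) reports. The names n_months, month, year, and surf are illustrative only; they do not come from the book or any package.

```r
# 84 monthly observations, January 1987 to December 1993 (assumed from str(fancy))
n_months <- 84
month <- rep_len(1:12, n_months)                 # 1 = January, ..., 12 = December
year  <- 1987 + (seq_len(n_months) - 1) %/% 12   # calendar year of each observation

# "held every March since 1988": 1 in March from 1988 onward, 0 otherwise.
# March 1987 stays 0 because the festival had not started yet.
surf <- as.integer(month == 3 & year >= 1988)

sum(surf)            # 6: one festival month per year, 1988 to 1993
which(surf == 1)     # 15 27 39 51 63 75: positions of those months in the series
```

The forecast package's tslm() accepts trend and season as built-in formula terms, so surf would be the one extra regressor to supply. How best to pass it (for example, converted with ts() using the same start and frequency) is worth confirming against ?tslm.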

Analysis in general, and R in particular, is much more about posing the right questions, ones that apply to many different fact patterns, than about deriving a particular answer to a particular fact pattern.

See also the homework FAQ. Additional questions would be better posted separately, each focused on a particular difficulty in understanding.