In the example in Section 13.4 (Forecast combinations) of FPP3, the simple average of the forecasts from three models (based on ETS, STL-ETS, and ARIMA) is taken. The comment is made that the "mutate() function... automatically handle[s] the forecast distribution appropriately by taking account of the correlation between the forecast errors of the models that are included."

Does this mean that we don't have to check residual diagnostics if we choose to use the simple average of the ETS, STL-ETS, and ARIMA models? For example, do we need to use the Ljung-Box test to check whether the resulting residuals are correlated, or use a residual plot to check whether the residuals have zero mean?

Likewise, I am also wondering whether we need to check residual diagnostics for the automated bagging method discussed in Section 12.5 (Bootstrapping and bagging), as that section doesn't appear to mention this.

Referred here by Forecasting: Principles and Practice, by Rob J Hyndman and George Athanasopoulos

I think you are confusing correlation of the residuals over time (residuals correlated with past residuals) with correlation of the residuals across models (the ETS residuals are correlated with the ARIMA residuals). The residual diagnostics are about the former. The distribution of the combination depends on the latter: the correlations between the components of that combination.
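To make that distinction concrete, here is a small simulated sketch (made-up numbers, NumPy rather than the R/fable tools the book uses): two residual series that are each white noise over time, yet strongly correlated with each other across models.

```python
import numpy as np

# Hypothetical illustration: draw two residual series that are each
# white noise over time (no autocorrelation), but strongly correlated
# with each other across models (assumed cross-correlation of 0.8).
rng = np.random.default_rng(42)
n = 2000
cov = [[1.0, 0.8], [0.8, 1.0]]
e = rng.multivariate_normal([0.0, 0.0], cov, size=n)
e_ets, e_arima = e[:, 0], e[:, 1]  # stand-ins for ETS and ARIMA residuals

# Correlation over time (what residual diagnostics check):
lag1 = np.corrcoef(e_ets[:-1], e_ets[1:])[0, 1]

# Correlation across models (what the combined distribution depends on):
cross = np.corrcoef(e_ets, e_arima)[0, 1]

print(f"lag-1 autocorrelation of ETS residuals: {lag1:.3f}")  # near 0
print(f"cross-model correlation of residuals:  {cross:.3f}")  # near 0.8
```

Both series would pass a white-noise check individually, yet the cross-model correlation is large, and it is the latter that feeds into the variance of the combination.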

I am not entirely sure what you mean by "need to" in this context. Do you care whether the residuals for each model are white noise, or just the residuals for the combination? If I am trying to decide which models to combine, I might test each one. On the other hand, if each model captures different but incomplete patterns, then the combination should be more accurate than the individual components. Disclaimer: I have not used combination forecasts extensively and only cover this briefly near the end of a forecasting class. Hopefully someone more experienced can chime in.

Just like with any model, you should evaluate the residual diagnostics to better understand how the model is performing - and to identify possible patterns the model is not capturing.
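As a minimal sketch of that diagnostic step, here is a hand-rolled Ljung-Box check in pure NumPy (in practice you would use a library routine such as feasts' ljung_box in R or statsmodels' acorr_ljungbox in Python; the residuals here are simulated white noise standing in for a fitted model's innovation residuals):

```python
import numpy as np

# Simulated residuals standing in for a fitted model's innovation residuals.
rng = np.random.default_rng(0)
resid = rng.normal(size=200)
n, h = len(resid), 10  # series length and number of lags tested

# Sample autocorrelations at lags 1..h
e = resid - resid.mean()
denom = np.sum(e**2)
acf = np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, h + 1)])

# Ljung-Box statistic: under the null hypothesis of no autocorrelation
# it is approximately chi-squared with h degrees of freedom.
Q = n * (n + 2) * np.sum(acf**2 / (n - np.arange(1, h + 1)))

crit = 18.307  # chi-squared 5% critical value for 10 degrees of freedom
print(f"Q = {Q:.2f}, reject white noise at 5%: {Q > crit}")
```

A large Q (exceeding the critical value) suggests the residuals are autocorrelated, i.e. the model has left some time-series structure on the table.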

The comment in the textbook is referring to the way in which forecast distributions are combined.
Assume you have two distributions, X \sim N(\mu_x, \sigma^2_x) and Y \sim N(\mu_y, \sigma^2_y). You can imagine that X is the 1-step forecast from an ARIMA model, and Y is the 1-step forecast from an ETS model.

Now, a forecast combination model may equally weight the forecasts from the ARIMA and ETS models, or in other words Z = (X + Y)/2, where Z is the 1-step forecast from the combination model.

If the two forecast distributions X and Y were independent, you could easily say Z \sim N((\mu_x + \mu_y)/2, (\sigma^2_x + \sigma^2_y)/4). However, X and Y are not independent - they are forecasts of the same thing! Instead we need to take into account the correlation/covariance between the residuals of the combined models. This gives \sigma^2_z = (\sigma^2_x + \sigma^2_y + 2\,\text{cov}(X,Y))/4 instead of \sigma^2_z = (\sigma^2_x + \sigma^2_y)/4.
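A quick numeric check of that formula (a Monte Carlo sketch with made-up variances and covariance, not the fable implementation):

```python
import numpy as np

# Illustrative parameters (made up): two 1-step forecast errors with
# variances 4 and 9 and covariance 3.
var_x, var_y, cov_xy = 4.0, 9.0, 3.0

# Theoretical variance of the equal-weight combination Z = (X + Y)/2:
# Var(Z) = (Var(X) + Var(Y) + 2*Cov(X, Y)) / 4
var_z_theory = (var_x + var_y + 2 * cov_xy) / 4

# Monte Carlo confirmation
rng = np.random.default_rng(1)
xy = rng.multivariate_normal(
    [0.0, 0.0], [[var_x, cov_xy], [cov_xy, var_y]], size=200_000
)
z = xy.mean(axis=1)  # equal-weight average of the two components
var_z_mc = z.var()

print(f"theory: {var_z_theory:.3f}, simulated: {var_z_mc:.3f}")
# Ignoring the covariance would wrongly give (4 + 9)/4 = 3.25 instead of 4.75
```

With a positive covariance the combined forecast distribution is wider than the independence formula would suggest, which is exactly why the covariance term matters.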

This handling of the covariance between forecasts is done automatically, and it is not something you should need to worry about. It does not guarantee that the residuals are well behaved in any way - that is up to your choice of model (where by 'model' I also include combinations of models).