Nested or crossed design?

I need help deciding whether my sample collection design qualifies as nested or crossed. The design is: I collect fish and water property data once per site (at the same 47 sites), once per season (wet and dry; May-Nov and Dec-April), i.e. 2 times per year (12-10 days a year). This is the ideal situation. Due to factors such as tides, bad weather, or just human error, some data get lost or sampling efforts aren't repeated on exactly the same days as in previous years.

When I build a model for my response variable, I know I need a random effect for site to capture within-site variability and avoid pseudoreplication. My question is: is my site variable technically "nested" within my year variable? I'm using "Season" instead of "Month" to capture seasonality as well, since the month data are irregularly spaced and contain large gaps.

One example of nested data I've heard:
If hospital patients are given IDs 1-10 (equivalent to my site variable) and hospitals are labeled A-C (similar to my year variable), it is ambiguous which hospital a given patient belongs to. Patients are nested within hospitals, I believe. Do my data present a similar problem?
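
To make the distinction concrete, here is a minimal base-R sketch on made-up toy data (the data frames `crossed` and `nested` and the helper `is_nested()` are hypothetical and not part of my data set). It treats a factor as nested when each of its levels occurs in only one level of the other factor, and shows that reused patient labels look crossed until they are made unique. Under that definition, an empty cell in a crossed layout does not make it nested.

```r
# Hypothetical toy data; none of these values come from the real data set.
# Crossed: the same three sites are revisited in every year.
crossed <- expand.grid(Site = factor(1:3), CYR = factor(c(2008, 2009)))

# Implicitly nested: patient labels 1-3 are reused in each hospital,
# although they refer to different people.
nested <- data.frame(
  Patient  = factor(rep(1:3, times = 2)),
  Hospital = factor(rep(c("A", "B"), each = 3))
)

# TRUE when every level of `inner` occurs in exactly one level of `outer`.
is_nested <- function(inner, outer) {
  tab <- table(inner, outer)
  all(rowSums(tab > 0) == 1)
}

# Sites appear in more than one year, so they are crossed with year.
site_in_year <- is_nested(crossed$Site, crossed$CYR)

# Reused labels hide the nesting: patient "1" seems to appear in A and B.
reused_labels <- is_nested(nested$Patient, nested$Hospital)

# Building a unique ID per hospital-patient pair makes the nesting explicit.
nested$PatientID <- interaction(nested$Hospital, nested$Patient, drop = TRUE)
unique_labels <- is_nested(nested$PatientID, nested$Hospital)

# Dropping one site-year combination leaves the layout crossed, just unbalanced.
unbalanced <- crossed[-1, ]
still_crossed <- !is_nested(unbalanced$Site, unbalanced$CYR)
```

The same check could be run on the real data with something like `is_nested(df$Site, df$CYR)`, assuming the site column is called `Site`.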

Below is a table of the number of sites sampled per month and year (sites are labeled 1-47; note that not all years have the full 47 x 2 replicates):

> table(df$Month, df$CYR)
    
     2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
  1    47   47   47   47   47    0    0    0    0    0    0    0    0    0    0    0
  2     0    0    0    0    0    0    0    0    0    0   20    0    0    0    9    0
  3     0    0    0    0    0   47   39   30   24   43   27   47   47    0   38   47
  4     0    0    0    0    0    0    8   17   23    0    0    0    0    0    0    0
  7    47   47   47   47    0    0    0    0    0    0    0    0    0    0    0    0
  8     0    0    0    0   47    0   30    0    0    0    0    0    0    0    0    0
  9     0    0    0    0    0   47   17   47   47    0   47   47   47   47   47    0
  10    0    0    0    0    0    0    0    0    0   47    0    0    0    0    0    0

# Distribution by season...(note 2017 and 2021)
> table(df$Season, df$CYR)
     
      2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
  DRY   47   47   47   47   47   47   47   47   47   43   47   47   47    0   47   47
  WET   47   47   47   47   47   47   47   47   47   47   47   47   47   47   47    0

Hi @Nate_L
I suggest you may get more help by posting this question to the R-SIG-mixed-models listserver. https://www.r-project.org/mail.html
Also, is time-series analysis appropriate for these data?

Thank you, I will try that! Probably not, since a time-lag plot (to check for temporal autocorrelation) requires regularly spaced data. I opted for a GAM and am just trying to figure out the right way to model site and year.
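
For reference, one way a GAM with site and year as crossed random effects might be written with `mgcv` is sketched below. Everything here is an assumption for illustration: the data are simulated, the response `y` is Gaussian, and `fCYR` is a factor copy of the year variable that I made up for the example. It is not a recommendation for the final model.

```r
library(mgcv)

set.seed(1)

# Simulated stand-in data; variable names mirror the post, values are made up.
sim <- expand.grid(
  Site   = factor(1:47),
  Season = factor(c("DRY", "WET")),
  CYR    = 2008:2023
)
sim$fCYR <- factor(sim$CYR)  # year as a factor so it can act as a random effect

# Hypothetical site-level and year-level deviations plus residual noise.
site_eff <- rnorm(nlevels(sim$Site), sd = 0.5)
year_eff <- rnorm(nlevels(sim$fCYR), sd = 0.3)
sim$y <- 2 + 0.4 * (sim$Season == "WET") +
  site_eff[as.integer(sim$Site)] +
  year_eff[as.integer(sim$fCYR)] +
  rnorm(nrow(sim), sd = 0.5)

# Separate random-intercept smooths for Site and year treat the two
# factors as crossed rather than nested.
fit <- gam(
  y ~ Season + s(Site, bs = "re") + s(fCYR, bs = "re"),
  data = sim, method = "REML"
)
summary(fit)
```

Writing the two random effects as separate `s(..., bs = "re")` terms reflects a crossed layout, since every site label refers to the same physical site in every year.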

Another good place to post design-level questions would be Stack Exchange's statistics site:

The answer I received is below. Does it seem right that missing values don't make a design nested, even with 0 observations in one level? I'm still looking for a consensus, but it seems like a reasonable answer.
