Needing help on Linear Mixed Models

technocrat · April 28, 2023, 9:39am

That's a thoughtful way to frame questions. Lacking a reprex (see the FAQ: How to do a minimal reproducible example (reprex) for beginners), I'm only going to be able to offer some general thoughts.

First, though, is the movement coefficient continuous or has it been categorized into intervals?

Now, let's get philosophical and return to school algebra—f(x) = y where

x is your tabular data of dimension 1320 × 7.
y is a transformation of x that abstracts away the detail to put a measure on the information contained in x.
f is the function or functions (think f(g(x))) that does the transformation.

Although phrased as a question about linear mixed models, the heart of the analysis is the selection of y. What compact measure(s) best describe(s) the relationship between pupil diameter (the chosen measure for physiological engagement) and the combinations of stimuli that were presented? This question leads to

Is there any relationship at all worth looking at or is it just random?
If there is a relationship, how "close" or distant is it from randomness? The latter is the question that the misunderstood p-value of many statistical tests addresses. An f is applied to y to produce a test statistic, and the p-value puts a measure on the probability that a statistic at least that extreme results simply from random variation. For the conventional default of 0.05, that means only a one in twenty chance. Depending on the phenomenon being gauged, that may mean "it passes the laugh test and it's worth taking a closer look at" or "that's not good enough to trust the lives of millions to."
If the data do show an associative relationship that passes the pre-selected significance threshold, do the data permit causal inference? Can the relationships among the multiple stimuli be teased apart to test for one while keeping the others constant? Are there mediators? Colliders? Stimuli that only have an effect indirectly? Both directly and indirectly? This is the domain of causal inference and in former times was heretical. Today we have tools, such as directed acyclic graphs, that make it possible when applied carefully.
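To make the p-value idea concrete, here is a minimal sketch of a permutation test in plain Python. It is not the analysis for your data: the function name `permutation_p_value`, the two groups, and every number in them are made up for illustration. It asks how often shuffling the group labels at random produces a difference in means at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between label and measurement.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical pupil diameters (mm) under two movement conditions.
static = [3.1, 3.3, 3.0, 3.2, 3.4, 3.1]
macro = [3.6, 3.8, 3.5, 3.9, 3.7, 3.6]

p = permutation_p_value(static, macro)
print(f"permutation p-value: {p:.4f}")
```

A small value says the observed difference would be rare if the labels carried no information. It says nothing about why the difference exists, which is the causal question in the last item above.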

The first question is the most general and the easiest to overlook in the eagerness to get to a conclusion. The tools of exploratory data analysis are designed to hold the drive to select y, the gauge of the outcome, in abeyance to see what the data are capable of revealing.

You may have already done this, in which case we should be looking at what metrics are available to address the case of a continuous outcome variable Y in the presence of X_1 \dots X_6 where the treatment variables are categorical. If you have, I'd encourage writing the EDA phase up in semi-formal fashion. From experience, I know that it's easy to let the spark of an idea die if left to fend for itself among a mass of mixed notes, files, scripts and what not.

The motivation for my question about whether motion coefficients were continuous is that the case of two continuous variables is simple to look at and potentially informative. Linear regression/ANOVA is unreasonably effective in assessing the threshold question: is there any "there" there? If not, move on.
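As a rough illustration of that threshold check, the sketch below fits a least-squares line to two continuous variables and reports the slope, intercept and R². The helper `simple_ols` and the data are hypothetical, and the sketch assumes the movement coefficient really is continuous, which is still an open question here.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (slope, intercept, r_squared) for the line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With a single predictor, R^2 is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up values: movement coefficient vs. pupil diameter (mm).
movement = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
pupil = [3.0, 3.1, 3.1, 3.3, 3.4, 3.4, 3.6, 3.7]

slope, intercept, r_squared = simple_ols(movement, pupil)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

An R² near zero would suggest there is little to chase with this pair of variables. A sizeable one only says the pair deserves a closer look, since this fit ignores repeated measures on the same participant and the other stimuli.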

Are any of the variables ordinal in their categorization, such as movements? If these were measured as giraffe, watermelon and quartz there is no ordering, but static, micro and macro suggest a ranking. That may bear on the choice of model.
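To show what the distinction means for the data a model sees, here is a small sketch contrasting two encodings of the same column. The ordering static < micro < macro is my assumption from the labels, and `ordinal_codes` and `one_hot` are hypothetical helper names, not functions from any library.

```python
# Assumed ranking, lowest to highest; confirm it against the experiment design.
MOVEMENT_LEVELS = ["static", "micro", "macro"]


def ordinal_codes(values, levels):
    """Map each category to its rank, which keeps the ordering."""
    rank = {level: i for i, level in enumerate(levels)}
    return [rank[v] for v in values]


def one_hot(values, levels):
    """Map each category to an indicator row, which discards any ordering."""
    return [[1 if v == level else 0 for level in levels] for v in values]


observed = ["macro", "static", "micro"]

print(ordinal_codes(observed, MOVEMENT_LEVELS))  # ranks follow the assumed order
print(one_hot(observed, MOVEMENT_LEVELS))        # one indicator column per level
```

Integer ranks let a model estimate a trend across levels, at the cost of treating the steps between them as equally spaced. Indicator columns make no such assumption but cannot express "more movement" as a direction.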

Also, a nonlinear mixed effects model may be more appropriate. Depending.

I don't have any experience going through this process and making these decisions. I can only just follow the {lme4} vignette. A long career chasing false hopes has left me cautious about setting sail before fully understanding the seas and winds to be expected.