Looking for help as I am not sure what statistical test/tests would be appropriate to compare my results.
I have a total of 3 graphs, one for each drink tested (water, orange juice and coffee). The plotted points are mean values from 6 participants.
My goal is to measure which drink has the most/least effect on blood pressure/heart rate so I can conclude how each drink affects the cardiovascular system.
My original idea was to perform a correlation test of some description and compare the strength of the correlations, however I'm not sure if this would tell me much, or if it would be worse than a t-test between specific points for example.
Anyway, I would really appreciate it if anyone has any suggestions for me.
Standard statistical tests will only give a measure of how likely it is that differences (in this case, mean differences among drinks) are due to random variation. To go further requires the tools of causal inference, which is a method of carefully considering how to eliminate interactions in the data that may make the results of statistical tests ambiguous. For such a small dataset we can set this aside, but hold the thought.
Restating the question: given six subjects, three treatments, and a single response variable, how can the differences among the treatments with respect to the response be statistically tested?
Posing that question to ChatGPT, this response is one place to start. (See my comments.)
In the R programming language, you can statistically test the differences among treatments with respect to a response variable in a situation with six subjects and three treatments using an Analysis of Variance (ANOVA) test, particularly a one-way ANOVA if there is only one factor (treatment) involved. Here's how you can approach it:
Data Structure: First, ensure your data is structured appropriately. You should have a dataset where each row represents a subject, and there are two columns: one for the treatment type and one for the response variable. COMMENT: In this community, that is referred to as tidy and, more generally in data science, as a table (similar to a schema in SQL). This meaning is more specific than in general English because it prescribes a particular layout of rows and columns without the annotations and decorations commonly seen in publication tables and Excel. The principle is to a) separate data from presentation and b) provide a default arrangement so that the form of the data doesn't have to be specially described to function parameters each time.
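For illustration only (the drinks come from the original question; the subject IDs and response values are invented placeholders, not real measurements), a tidy layout for this design might look like:
# Each row is one subject under one drink; the numbers are made-up examples
data <- data.frame(
  subject   = factor(rep(1:6, times = 3)),
  treatment = factor(rep(c("water", "orange_juice", "coffee"), each = 6)),
  response  = c(118, 121, 117, 120, 119, 122,   # water
                120, 123, 119, 121, 122, 124,   # orange juice
                125, 128, 124, 127, 126, 129)   # coffee
)
str(data)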
Assumptions Check: Before conducting ANOVA, check its assumptions:
Independence of observations
Homogeneity of variances (you can use Bartlett's or Levene's test for this) COMMENT: Commonly done with plot(your_model, which = 3); if the spread of $\sqrt{|\text{standardized residuals}|}$ is consistent across fitted values and the red line is reasonably smooth and flat, no more need be done. When in doubt, use car::leveneTest(your_model); see the sketch after the Q-Q plot example below.
Normality of residuals (which can be checked using a Q-Q plot or Shapiro-Wilk test) COMMENT: Good practice to eyeball this first with a Q-Q plot:
In R, to create a Q-Q (quantile-quantile) plot using the base graphics system, you can use the qqnorm() function, which is a standard way to check the normality of a distribution. The qqline() function can be added to provide a reference line which is helpful in assessing the normality. Here's how you can do it:
Prepare Your Data: Ensure you have a numeric vector of data for which you want to check normality.
Generate Q-Q Plot: Use qqnorm() to create the Q-Q plot.
Add Reference Line: Use qqline() to add a reference line to the plot.
Here's an example R code:
# Assume 'data_vector' is your numeric vector
# data_vector <- c(... your data ...)
# Create a Q-Q plot
qqnorm(data_vector)
qqline(data_vector, col = "red") # Adding a reference line in red
This code will produce a Q-Q plot where the x-axis represents the theoretical quantiles from a normal distribution, and the y-axis represents the quantiles from your data. If your data are normally distributed, the points on the plot will fall approximately along the reference line. Deviations from this line suggest departures from normality.
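To illustrate the homogeneity-of-variance check mentioned in the assumptions list, here is a minimal sketch; it assumes a fitted model named anova_result like the one built in the example code further down, and that the car package is installed:
# Scale-location plot: look for a roughly flat red line and an even spread
# of sqrt(|standardized residuals|) across the fitted values
plot(anova_result, which = 3)
# Levene's test as a formal follow-up (requires the car package)
car::leveneTest(anova_result)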
Perform One-Way ANOVA: Use the aov() function in R to perform the ANOVA. The model will look something like response ~ treatment. COMMENT: The plain one-way model treats the observations as independent, but here each of the six subjects apparently receives all three drinks, so adding subject as a blocking factor (a two-way / repeated-measures layout) is indicated; see the sketch after the example code below. There is a very good walkthrough.
Review the Results: Check the ANOVA table output for the F-statistic and its associated p-value.
Post Hoc Analysis: If your ANOVA results show significant differences, you might want to know which specific treatments differ from each other. This can be done using post hoc tests like Tukey's Honest Significant Difference test.
Here is an example R code to illustrate these steps:
# Assuming your data frame is named 'data' with columns 'treatment' and 'response'
# data <- data.frame(treatment = rep(c("Treatment1", "Treatment2", "Treatment3"), each = 2),
# response = c( ... response values ... ))
# Check assumptions
# bartlett.test(response ~ treatment, data = data)
# shapiro.test(residuals(aov(response ~ treatment, data = data)))
# Perform one-way ANOVA
anova_result <- aov(response ~ treatment, data = data)
summary(anova_result)
# Post hoc test if ANOVA is significant
if (summary(anova_result)[[1]]$'Pr(>F)'[1] < 0.05) {
TukeyHSD(anova_result)
}
This code provides a basic framework. You should adjust it according to your specific dataset and research question. Remember, the interpretation of the results should be in line with your study design and hypothesis.
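Following the comment above about the within-subject layout, a minimal sketch of the repeated-measures variant, assuming a data frame like the illustrative one earlier with a subject column, would be:
# Treatment as the fixed effect, subject as the within-subject error stratum;
# with one observation per subject-by-drink cell, treatment is tested against
# the subject:treatment residuals
rm_result <- aov(response ~ treatment + Error(subject), data = data)
summary(rm_result)
The treatment F-test then uses the within-subject error, which is usually more powerful when the same participants are measured under every drink.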
Thank you for that detailed example. As someone who is attempting to master R after a lifetime of SAS coding and statistical analysis, your code examples are quite helpful.
In that case, just be prepared for some mind-wrenching adaptations. It took me a couple of years to adjust. Well, maybe 5. Vectorization alone was amazing.
Yes, vectorization in particular. But I can see the progress with every day, and that is reinforcing. Getting rid of a massive license fee is also motivating.
Getting rid of a massive license fee is also motivating.
On my first consulting contract using R, I think the PI was ecstatic when he learned we did not have to pay a license fee.
I've been using R for so long that the thought of going back to something like SAS or Systat is horrifying. And SAS was so nice after SPSS.