Need help creating code for testing my assumptions and EDA

b0neD4ddy · December 13, 2023, 12:13am

This is the code I have so far:

```r
# Load necessary libraries
library(tidyverse)
library(ggplot2)
library(broom)

# This will read the data
data <- read.csv("eorgan.csv")

# This will display the top of the data table
head(data)

# This will tidy the data
tidy_data <- data %>%
  gather(key = "Tempature", value = "Cortisol Levels", -Participant_ID) %>%
  mutate(Tempature = factor(Tempature, levels = c("Hot", "Cold")))

# This will display the top of the tidy data table
head(tidy_data)

# EDA: Boxplot for each time point
ggplot(tidy_data, aes(x = Tempature, y = Cortisol_Levels, fill = Tempature)) +
  geom_boxplot() +
  labs(title = "Boxplot of Cortisol Before and After",
       x = "Tempature of Shower",
       y = "Cortisol Levels") +
  theme_minimal()

# EDA: Histogram of differences
tidy_data %>%
  filter(Tempature == "Cold") %>%
  left_join(
    tidy_data %>%
      filter(Tempature == "Hot") %>%
      select(Participant_ID, Hot = Temperature),
    by = "Participant_ID"
  ) %>%
  mutate(Difference = Cold - Hot) %>%
  ggplot(aes(x = Difference)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Temperature Differences",
       x = "Difference (Cold - Hot)",
       y = "Frequency") +
  theme_minimal()

# Normality test for differences
shapiro_test <- tidy_data %>%
  filter(Tempature == "Cold") %>%
  left_join(
    tidy_data %>%
      filter(Tempature == "Hot") %>%
      select(Participant_ID, Hot = Temperature),
    by = "Participant_ID"
  ) %>%
  summarise(shapiro_test = shapiro.test(Cold - Hot)$p.value)

# Print the result of the normality test
print("Shapiro-Wilk Test for Normality of Differences:")
print(shapiro_test)

# Homogeneity of variances test
levene_test <- tidy_data %>%
  group_by(TimePoint) %>%
  summarise(levene_test = leveneTest(MentalHealth ~ TimePoint, data = .)$p.value)

# Print the result of the homogeneity of variances test
print("Levene's Test for Homogeneity of Variances:")
print(levene_test)
```

Every time I run my code, this is what it tells me:

```
Error in make_ansi_style(x[["color"]]) : Unknown style specification: br_magenta

Error in head(tidy_data) : object 'tidy_data' not found
Error in ggplot(tidy_data, aes(x = Tempature, y = Cortisol_Levels, fill = Tempature)) : object 'tidy_data' not found
[1] "Shapiro-Wilk Test for Normality of Differences:"
Error in print(shapiro_test) : object 'shapiro_test' not found
[1] "Levene's Test for Homogeneity of Variances:"
Error in print(levene_test) : object 'levene_test' not found

Error in filter(., Tempature == "Cold") : object 'tidy_data' not found

Error in filter(., Tempature == "Cold") : object 'tidy_data' not found

Error in group_by(., TimePoint) : object 'tidy_data' not found
```

Any help is much appreciated; this is due in two days and I'm freaking out. Some bits of the code work just fine, but I can't figure out why it keeps telling me there are errors.
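Why so many lines report "not found": in R, if the right-hand side of `<-` throws an error, the assignment never happens, so the name is never created and every later line that uses it fails as well, even when that later line is written correctly. A minimal sketch, using a hypothetical name (`demo_tidy`) and a deliberate `stop()` standing in for whatever actually broke the reshape step:

```r
# A deliberate error stands in for whatever broke the reshape step.
# `demo_tidy` is a hypothetical name used only for this illustration.
result <- try(demo_tidy <- stop("the reshape step failed"), silent = TRUE)

inherits(result, "try-error")  # TRUE: the right-hand side errored
exists("demo_tidy")            # FALSE: so the assignment never happened

# A later line that is itself fine still fails, because the name was never created.
later <- try(head(demo_tidy), silent = TRUE)
cat(later)                     # an "object 'demo_tidy' not found" error message
```

So the useful message is the first error in the console, not the chain of "not found" errors after it.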

Matthias · December 13, 2023, 11:17am

First of all, we cannot really help, as we don't have your input data.
Also, this is quite a lot of code and it's difficult to figure out what's not working. It could be something as simple as a column name that doesn't exist: for example, sometimes you use "Tempature" and sometimes "Temperature".
Try to figure out what is working and where the actual error starts.
These errors: "Error in head(tidy_data) : object 'tidy_data' not found" and "Error in ggplot(tidy_data, aes(x = Tempature, y = Cortisol_Levels, fill = Tempature)) : object 'tidy_data' not found" already indicate that the gather step didn't work properly.
Do you see the correct result from "head(tidy_data)"?
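A minimal sketch of the reshape step, assuming the CSV is in wide format with columns `Participant_ID`, `Hot` and `Cold` (inferred from the `gather()` call in the question, not confirmed by the poster). The values are made up so the sketch runs on its own. It uses `pivot_longer()`, tidyr's newer replacement for the superseded `gather()`, and a single spelling (`Temperature`, `Cortisol_Levels`) throughout:

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format data: one row per participant, one column per
# shower temperature. Values are made up purely so the example runs.
data <- data.frame(
  Participant_ID = 1:4,
  Hot  = c(12.1, 10.4, 14.0, 11.3),
  Cold = c(15.2, 12.9, 15.1, 13.8)
)

# Check the real column names before referring to them anywhere else.
names(data)

# Reshape to long format. "Temperature" and "Cortisol_Levels" (no typo,
# no space) can then be used in later calls without backticks.
tidy_data <- data %>%
  pivot_longer(cols = c(Hot, Cold),
               names_to = "Temperature",
               values_to = "Cortisol_Levels") %>%
  mutate(Temperature = factor(Temperature, levels = c("Hot", "Cold")))

# If this prints a table, the object exists and later steps can use it.
head(tidy_data)
```

If `head(tidy_data)` shows these three columns, the boxplot from the question should run once `Tempature` is changed to `Temperature` inside `aes()`.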

For this part:

```r
tidy_data %>%
  filter(Tempature == "Cold") %>%
  left_join(
    tidy_data %>%
      filter(Tempature == "Hot") %>%
      select(Participant_ID, Hot = Temperature),
    by = "Participant_ID"
  ) %>%
  mutate(Difference = Cold - Hot) %>%
  ggplot(aes(x = Difference)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Temperature Differences",
       x = "Difference (Cold - Hot)",
       y = "Frequency") +
  theme_minimal()
```

Run the first part, before ggplot(), and see whether it gives the correct outcome.
Or better, save it into another data frame. Piping directly into ggplot() without saving the intermediate results is convenient, but it makes errors harder to find.
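Following that advice, here is a sketch that saves the paired values into their own data frame (`diffs`, a hypothetical name) and reuses it for both the histogram and the Shapiro-Wilk test. It assumes long-format data like the made-up values below and swaps the self-join for `tidyr::pivot_wider()`, which puts Hot and Cold side by side in one step. In the question's version, `select(Participant_ID, Hot = Temperature)` refers to a column that doesn't exist, and no `Cold` column is ever created for `Cold - Hot` to use.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical long-format data shaped like tidy_data in the question
# (made-up values, for illustration only).
tidy_data <- data.frame(
  Participant_ID  = rep(1:4, times = 2),
  Temperature     = rep(c("Hot", "Cold"), each = 4),
  Cortisol_Levels = c(12.1, 10.4, 14.0, 11.3, 15.2, 12.9, 15.1, 13.8)
)

# Step 1: one row per participant with Hot and Cold side by side,
# saved to a data frame instead of being piped straight into ggplot().
diffs <- tidy_data %>%
  pivot_wider(id_cols = Participant_ID,
              names_from = Temperature,
              values_from = Cortisol_Levels) %>%
  mutate(Difference = Cold - Hot)

# Step 2: inspect the intermediate result before plotting.
print(diffs)

# Step 3: plot the saved data frame.
p <- ggplot(diffs, aes(x = Difference)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Cortisol Differences",
       x = "Difference (Cold - Hot)",
       y = "Frequency") +
  theme_minimal()
print(p)

# The same data frame feeds the normality check on the differences.
shapiro.test(diffs$Difference)
```

Keeping `diffs` around means each step can be checked on its own, and the histogram and the normality test are guaranteed to use the same differences.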

jrkrideau · December 13, 2023, 2:42pm

A handy way to supply some sample data is the dput() function. Just do dput(mydata), where mydata is your data. For a large dataset, something like dput(head(mydata, 100)) should supply the data we need. Copy the output and paste it here between
```

```
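A sketch of what that looks like, using a hypothetical `mydata` with made-up values in place of the real CSV. `dput()` prints R code that recreates the object; pasting that code back into R rebuilds the data, which the last lines check programmatically:

```r
# Hypothetical stand-in for the real data set (made-up values).
mydata <- data.frame(
  Participant_ID = 1:3,
  Hot  = c(12.1, 10.4, 14.0),
  Cold = c(15.2, 12.9, 15.1)
)

# Print R code that recreates the first 100 rows (here: all 3 rows).
# This printed text is what gets pasted into the forum post.
dput(head(mydata, 100))

# Round trip: capture the same text, run it, and compare with the original.
code <- paste(capture.output(dput(mydata)), collapse = "\n")
rebuilt <- eval(parse(text = code))
all.equal(mydata, rebuilt)
```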

Where is leveneTest() coming from? I don't see it in any of the libraries you have loaded.
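leveneTest() is provided by the car package, so it needs library(car) (after install.packages("car") if car isn't installed). A sketch on made-up long-format data: the formula already defines the groups, so no group_by()/summarise() wrapper is needed, and the question's TimePoint and MentalHealth columns don't appear to exist in this data set. By default car centres on the median (the Brown-Forsythe variant). For a paired Hot/Cold comparison, the assumption usually checked is normality of the differences, so Levene's test may not be needed at all; it is shown here because the question asks for it.

```r
library(car)  # provides leveneTest(); install.packages("car") first if needed

# Hypothetical long-format data (made-up values, for illustration only).
# Temperature is a factor so leveneTest() does not have to coerce it.
tidy_data <- data.frame(
  Participant_ID  = rep(1:4, times = 2),
  Temperature     = factor(rep(c("Hot", "Cold"), each = 4), levels = c("Hot", "Cold")),
  Cortisol_Levels = c(12.1, 10.4, 14.0, 11.3, 15.2, 12.9, 15.1, 13.8)
)

# One call on the whole data set: the formula says which column is the
# outcome and which defines the groups.
lev <- leveneTest(Cortisol_Levels ~ Temperature, data = tidy_data)
print(lev)

# The p-value is in the first row of the "Pr(>F)" column.
levene_p <- lev[["Pr(>F)"]][1]
levene_p
```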

system · January 24, 2024, 2:42pm

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.