I encourage you to try to write that code and, if you get stuck, show your code and ask a specific question. There are already several examples of ggplot code in this thread that show how plots change with different code.
OK, so I have been experimenting with the code a lot today, and I have decided that if I were to use a box plot, it would compare the averages of each measurement at a given time.
For example, to find the average Sa value for tool group B, I would take each time (0, 3, 6, 9) and average the values at that time.
However, I am running into a problem: within group B, tool B1 does not have a measurement at 9H.
I am unsure how to automate this averaging process, since I cannot include data for B1 at 9H because there isn't any.
How would I make it take the average of each measurement at each time?
You can calculate the means for the various groups using some functions from the dplyr package. First, you need to make a new column that designates the tool group. That is, you need to grab the A, B, or C from the tool column. You can do that with substr(), as I showed previously. Then you need to group your data by the tool group, time, and parameter. The group_by() function from dplyr does this. Then you need to summarize the data with the summarize() function. If you look at the examples in the help section of summarize, you will see how to do both the group_by() and the summarize(). Run ?summarize to see the help.
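Here is a minimal sketch of those three steps (mutate() with substr(), then group_by(), then summarize()). It assumes the long-format column names used later in this thread (Tool, Time, para, Value), and the small data frame is made-up toy data, not the real measurements. Note that B1 has no 9H row, just like in the real data.

```r
library(dplyr)

# Hypothetical toy data in the thread's long format (Tool, Time, para, Value).
# B1 has no 9H row, mimicking the missing measurement.
plot.df <- data.frame(
  Tool  = c("B1", "B1", "B1", "B2", "B2", "B2", "B2"),
  Time  = c("0H", "3H", "6H", "0H", "3H", "6H", "9H"),
  para  = "Sa",
  Value = c(0.40, 0.41, 0.43, 0.50, 0.52, 0.55, 0.60)
)

group_means <- plot.df %>%
  mutate(ToolGroup = substr(Tool, 1, 1)) %>%   # "B1" -> "B"
  group_by(ToolGroup, Time, para) %>%
  # n counts how many tools contributed to each mean
  summarize(MeanValue = mean(Value), n = n(), .groups = "drop")

print(group_means)
```

Because mean() only sees the rows that exist, the 9H mean for group B is computed from B2 alone (n = 1), so the missing B1 measurement needs no special handling. If the missing value were stored as a row containing NA instead of being absent, you would use mean(Value, na.rm = TRUE).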
What does the argument group = 1 actually do here?
The time variable is of class character and is treated as a factor. I think geom_line() then defaults to group = time. But then each group has only one element, so geom_line() prints a warning, and the points in the plots are not connected by a line because each group has only one value. Using group = 1 creates a dummy group so that all the points in each plot belong to the same group; there are no warnings, and the points are connected with a line. The problem can also be avoided by changing time to a number.
library(dplyr)
library(ggplot2)
plot.df2 <- plot.df[!plot.df$tool %in% c("A2","C2"), ]
> head(plot.df2)
# A tibble: 6 × 4
tool time parameter value
<chr> <chr> <chr> <dbl>
1 B1 3H Sa 0.409
2 B1 3H Ssk 0.266
3 B1 3H Sku 3.09
4 B1 3H Sz 3.47
5 B1 3H Sq 0.515
6 B1 3H Sal 11.6
#No warning with group = 1
ggplot(plot.df2, aes(x = time, y = value, group=1)) +
geom_line() + geom_point() +
facet_grid(parameter ~ tool, scales = "free")
#Without group = 1 we get warnings
ggplot(plot.df2, aes(x = time, y = value)) +
geom_line() + geom_point() +
facet_grid(parameter ~ tool, scales = "free")
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
#The warning is repeated for every subplot!
#Change time to a number and there is no need for group = 1
plot.df2 |> mutate(time = as.numeric(substr(time, 1, 1))) |>
ggplot(aes(x = time, y = value)) +
geom_line() + geom_point() +
facet_grid(parameter ~ tool, scales = "free")
Thank you for the excellent explanation; I am very grateful.
I am very confused about how I would do this.
Would I need to create a new data frame?
I am also unsure how to properly get the average for group B, since B1 only has values up to 6 hours while the others go up to 9 hours.
OK, so I figured out how to do it, but I feel like it is extremely inefficient:
library(dplyr)
AverageA_0H <- plot.df %>%
filter(Time == "0H" & Tool %in% c("A1","A3")) %>%
group_by(para) %>%
summarize(A_0H_Average = mean(Value))
For each time point (0H, 3H, 6H, 9H), I need to repeat the same code and change the time.
It does work; however, it gives me the values in a separate table each time, when I need them together to make a scatter plot or a box-and-whisker plot.
Thanks for posting some code. You can increase the efficiency by grouping by more columns. If you run
AverageA_0H <- plot.df %>%
filter(Tool %in% c("A1","A3")) %>%
group_by(Time, para) %>%
summarize(A_Average = mean(Value))
you will get averages for each Time and each para. The next step would be to make it possible to also group by each of the tool groups. You do not have a column that labels each Value as coming from an A tool or a B tool, but you can make such a column with the mutate() function. Here is a hint:
plot.df <- plot.df %>% mutate(ToolGroup = substr(Tool, ...))
Can you fill in what should replace the ... in that code so that ToolGroup has values of A, B, or C?
You can then continue and group_by() ToolGroup, Time, and para and get all your values at once.
I am confused about what I should replace the ... with.
The code I currently have for each average is:
library(dplyr)
Mid_0H <- plot.df %>%
filter(Time == "0H" & Tool %in% c("A1","A3")) %>%
group_by(para) %>%
summarize(Mid0H = mean(Value))
Mid_3H <- plot.df %>%
filter(Time == "3H" & Tool %in% c("A1","A3")) %>%
group_by(para) %>%
summarize(Mid3H = mean(Value))
Mid_6H <- plot.df %>%
filter(Time == "6H" & Tool %in% c("A1","A3")) %>%
group_by(para) %>%
summarize(Mid6H = mean(Value))
Mid_9H <- plot.df %>%
filter(Time == "9H" & Tool %in% c("A1","A3")) %>%
group_by(para) %>%
summarize(Mid9H = mean(Value))
combined_tableA <- left_join(Mid_0H, Mid_3H, by = "para") %>%
left_join(Mid_6H, by = "para") %>%
left_join(Mid_9H, by = "para")
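The four filter/summarize blocks and three joins can probably be collapsed into one pipeline: group by Time as well as para, then use tidyr's pivot_wider() to spread the times into columns, which should give the same layout as combined_tableA. This is a sketch on made-up toy data with the thread's column names (Tool, Time, para, Value); the "Mid" prefix just mirrors the column names above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two group A tools measured at four times
plot.df <- data.frame(
  Tool  = rep(c("A1", "A3"), each = 4),
  Time  = rep(c("0H", "3H", "6H", "9H"), times = 2),
  para  = "Sa",
  Value = c(1, 2, 3, 4, 3, 4, 5, 6)
)

combined_tableA <- plot.df %>%
  filter(Tool %in% c("A1", "A3")) %>%
  group_by(para, Time) %>%                      # one group per para/time pair
  summarize(Mean = mean(Value), .groups = "drop") %>%
  # One column per time, named Mid0H, Mid3H, ... like the left_join() result
  pivot_wider(names_from = Time, values_from = Mean, names_prefix = "Mid")

print(combined_tableA)
```

For ggplot, the long table produced just before pivot_wider() is usually the more convenient shape, so you may not need the wide version at all.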
For context, Mid is group A.
I don't know what I would put in place of the ...
Here is a slightly edited version of a part of the help file for the substr() function
substr(x, start, stop)
x, a character vector.
start, An integer. The first element to be extracted or replaced.
stop, An integer. The last element to be extracted or replaced.
Given that, run code like the line below, except replace the question marks with numbers. The correct values will cause the function to return "A" "B". I used this earlier in this thread.
substr(x = c("A1", "B2"), start = ?, stop = ?)
You can use the same values in the mutate() function to make a ToolGroup column.
I made another data sheet in Excel that has all of the average values for each group, and it looks like this:
I want to make a graph that compares the values for each parameter,
i.e. a box-and-whisker plot of each concentration at its respective hours, where:
High is B
Mid is A
Low is C
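A box plot summarizes several values per box, so one option is to plot the per-tool values from the long data rather than the Excel averages (with averages there would be only one value per box). The sketch below assumes the thread's columns (Tool, Time, para, Value); the toy data and the Concentration column are hypothetical, and the High/Mid/Low mapping is the one given above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; the real values would come from plot.df
plot.df <- data.frame(
  Tool  = rep(c("A1", "A3", "B1", "B2", "C1", "C3"), each = 2),
  Time  = rep(c("0H", "3H"), times = 6),
  para  = "Sa",
  Value = c(0.30, 0.35, 0.32, 0.36, 0.40, 0.45,
            0.42, 0.47, 0.20, 0.22, 0.21, 0.24)
)

box_df <- plot.df %>%
  mutate(
    ToolGroup = substr(Tool, 1, 1),
    # Map tool groups to concentrations: High = B, Mid = A, Low = C
    Concentration = factor(
      case_when(ToolGroup == "B" ~ "High",
                ToolGroup == "A" ~ "Mid",
                ToolGroup == "C" ~ "Low"),
      levels = c("Low", "Mid", "High")   # controls legend/box order
    )
  )

# One box per concentration at each time, one panel per parameter
p <- ggplot(box_df, aes(x = Time, y = Value, fill = Concentration)) +
  geom_boxplot() +
  facet_wrap(~ para, scales = "free_y")

print(p)
```

If you only want to show the averages, geom_point() plus geom_line(aes(group = Concentration)) on the summarized table may be a better fit than a box plot.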