Order of variable on graphs

Hello,

I have a graph coded as shown below. When plotted, Q18_10 comes second on the x-axis and I want it to come last. I'm not sure why this happens, but any help would be appreciated! Thank you in advance.

final_plot_data %>%
  dplyr::select(c(
    "Condition", "Q18_1_mean", "Q18_2_mean", "Q18_3_mean", "Q18_4_mean", "Q18_5_mean", "Q18_6_mean", "Q18_7_mean", "Q18_10_mean","Q18_1_sd", "Q18_2_sd", "Q18_3_sd",  "Q18_4_sd",  "Q18_5_sd",  "Q18_6_sd",  "Q18_7_sd", "Q18_10_sd"
  )) %>%
  tidyr::pivot_longer(
    cols = -Condition,
    names_to = c("variable", ".value"),
    names_pattern = "(.*)_(.*)"
  ) %>%
  ggplot(aes(x = variable, y = mean, fill = Condition)) +
  geom_col(position = "dodge") +
  geom_errorbar(
    aes(ymin = mean - sd, ymax = mean + sd),
    width = 0.2,
    position = position_dodge(.9)
  ) +
  ggplot2::scale_fill_grey() +
  labs(x = "Relevance Scale Questions (1-7 & 10)", y = "Mean Relevance Rating") +
  theme(text = element_text(family = "Times New Roman", size = 12)) +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  theme(panel.background = element_blank()) +
  theme_classic()

When your categories are stored as character data, ggplot2 doesn't know their intended order and sorts them alphabetically. Q18_10 sorts right after Q18_1 alphabetically, which is why it appears second.

The way to deal with this is to store your categories as factor data, which can be given an ordering that is separate from its labels.

So this will look wrong:


library(ggplot2)

df <- data.frame(variable = c("Q18_1_mean", "Q18_2_mean", "Q18_10_mean"),
                 mean = 1:3)
ggplot(df, aes(variable, mean)) +
  geom_col()


But we can change variable to a factor whose levels are in the order we want to fix it. My favorite way is with the forcats package, which offers a variety of ways to control this with a nice syntax.

For instance, if we know the variables are appearing in our data in the order we want them to appear, we could run

 df$variable = forcats::fct_inorder(df$variable)

and then we rerun our chart to get the intended result.


In your case, the columns encode multiple pieces of information which you might want to use later, so you could use something like:

df %>%
  tidyr::separate(variable, into = c("question", "part", "stat"), remove = FALSE, convert = TRUE) 

to extract the question, the part, and the stat from each category, and to convert the numeric pieces to numbers. That would make it simpler to sort later with something like arrange(question, part) to get everything into the order you want.
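As a rough sketch of how those steps could fit together: the toy data frame df_demo (deliberately scrambled) and the name df_sorted are made up for illustration and are not your data. It splits the names, sorts on the numeric part, and then freezes that order into a factor.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical toy data, deliberately out of order
df_demo <- data.frame(
  variable = c("Q18_10_mean", "Q18_1_mean", "Q18_2_mean"),
  mean = c(3, 1, 2)
)

df_sorted <- df_demo %>%
  # Split "Q18_10_mean" into "Q18", 10 and "mean"; convert = TRUE turns
  # the numeric piece into an integer so it sorts as 1, 2, 10
  separate(variable, into = c("question", "part", "stat"),
           remove = FALSE, convert = TRUE) %>%
  arrange(question, part) %>%
  # Use the current row order as the factor level order
  mutate(variable = fct_inorder(variable))

levels(df_sorted$variable)
```

Plotting df_sorted with aes(variable, mean) should then follow the level order rather than the alphabetical one.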


It's often useful to plot data in order of some variable, so fct_reorder can be useful too. Here I sort in descending order of mean:

df$variable = forcats::fct_reorder(df$variable, -df$mean)


Or you might want to surgically move one level of the factor to a different position. Here's how:

df$variable = forcats::fct_relevel(df$variable, "Q18_1_mean", after = 1)
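For the situation in this thread, a similar call could push Q18_10 to the end instead. This is only a minimal sketch on a toy factor (f and f_last are hypothetical names); after = Inf asks fct_relevel to place the level last.

```r
library(forcats)

# Toy factor; factor() sorts the levels alphabetically by default,
# so "Q18_10" lands between "Q18_1" and "Q18_2"
f <- factor(c("Q18_1", "Q18_2", "Q18_10"))
levels(f)

# Move the "Q18_10" level to the last position
f_last <- fct_relevel(f, "Q18_10", after = Inf)
levels(f_last)
```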

[quote="jonspring, post:2, topic:134492"]
df$variable = forcats::fct_relevel(df$variable, "Q18_1_mean", after = 1)
[/quote]

Hello, thank you for the detailed response. When I try to do that, I get this error message:

Error: `f` must be a factor (or character vector).

Just to test whether the package is working, the code I'm running is:

final_plot_data$variable = forcats::fct_relevel(final_plot_data$variable, "Q18_10_mean", after = 1) 
  

I think you could convert it to a factor first, like so:

df$variable = forcats::fct_relevel(as.factor(df$variable), "Q18_1_mean", after = 1)

Hello, I have been trying all the options above, but I keep getting more errors. I suspect I'm not integrating them into my code correctly.

This is my code at the moment. Any further help on exactly how to integrate the fix without errors would be appreciated. Thank you, and sorry in advance.


final_plot_data <- structure(list(Condition = c("Low-Lyrical", "Lyrical"), Q18_1_mean = c(5.375, 
4.47826086956522), Q18_2_mean = c(5.15625, 4.43478260869565), 
    Q18_3_mean = c(4.59375, 3.8695652173913), Q18_4_mean = c(5.1875, 
    4.21739130434783), Q18_5_mean = c(5.46875, 4.65217391304348
    ), Q18_6_mean = c(4.78125, 3.95652173913043), Q18_7_mean = c(4.78125, 
    4.26086956521739), Q18_10_mean = c(4.90625, 
    4.30434782608696), Q18_1_sd = c(0.975506485486286, 1.53355099560676
    ), Q18_2_sd = c(1.16700263979578, 1.53226175536575), Q18_3_sd = c(1.07341405894253, 
    1.57550418556574), Q18_4_sd = c(0.895778630487862, 1.59420888728064
    ), Q18_5_sd = c(1.10670609788542, 1.46500684615757), Q18_6_sd = c(1.15659051219661, 
    1.49174275352279), Q18_7_sd = c(1.15659051219661, 1.684620035507
    ), Q18_10_sd = c(0.856074122506811, 1.42811963493946
    )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-2L))

final_plot_data %>%
  dplyr::select(c(
    "Condition", "Q18_1_mean", "Q18_2_mean", "Q18_3_mean", "Q18_4_mean", "Q18_5_mean", "Q18_6_mean", "Q18_7_mean", "Q18_10_mean","Q18_1_sd", "Q18_2_sd", "Q18_3_sd",  "Q18_4_sd",  "Q18_5_sd",  "Q18_6_sd",  "Q18_7_sd", "Q18_10_sd"
  )) %>%
  tidyr::pivot_longer(
    cols = -Condition,
    names_to = c("variable", ".value"),
    names_pattern = "(.*)_(.*)"
  ) %>%
  ggplot(aes(x = variable, y = mean, fill = Condition)) +
  geom_col(position = "dodge") +
  geom_errorbar(
    aes(ymin = mean - sd, ymax = mean + sd),
    width = 0.2,
    position = position_dodge(.9)
  ) +
  ggplot2::scale_fill_grey() +
  labs(x = "Relevance Scale Questions (1-7 & 10)", y = "Mean Relevance Rating") +
  theme(text = element_text(family = "Times New Roman", size = 12)) +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  theme_classic()

Add this after the pivot_longer() and before the ggplot():

  mutate(variable = as_factor(variable)) %>%

The levels will be in order of first appearance, which puts Q18_10 at the end.
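Put together, the pipeline might look roughly like the sketch below. Note that the variable column only exists after pivot_longer(), which is why the factor step has to sit inside the pipe rather than be run on final_plot_data directly. To keep it short, plot_data here is a hypothetical cut-down copy of your data (three questions, values rounded to two decimals), and the font and grid tweaks are left out.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

# Hypothetical cut-down copy of final_plot_data (rounded values)
plot_data <- tibble::tibble(
  Condition   = c("Low-Lyrical", "Lyrical"),
  Q18_1_mean  = c(5.38, 4.48),
  Q18_2_mean  = c(5.16, 4.43),
  Q18_10_mean = c(4.91, 4.30),
  Q18_1_sd    = c(0.98, 1.53),
  Q18_2_sd    = c(1.17, 1.53),
  Q18_10_sd   = c(0.86, 1.43)
)

long_data <- plot_data %>%
  pivot_longer(
    cols = -Condition,
    names_to = c("variable", ".value"),
    names_pattern = "(.*)_(.*)"
  ) %>%
  # Levels follow first appearance: Q18_1, Q18_2, Q18_10
  mutate(variable = as_factor(variable))

p <- ggplot(long_data, aes(x = variable, y = mean, fill = Condition)) +
  geom_col(position = "dodge") +
  geom_errorbar(
    aes(ymin = mean - sd, ymax = mean + sd),
    width = 0.2,
    position = position_dodge(0.9)
  ) +
  scale_fill_grey() +
  labs(x = "Relevance Scale Questions", y = "Mean Relevance Rating") +
  theme_classic()

p
```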

Also, because you are using all of the variables in final_plot_data, I think you can eliminate the dplyr::select() step.

Thank you! It worked.
