How were those two columns calculated? - Likert data

Andrzej · July 10, 2023, 9:33am

Hi, this is from here:

https://altair-viz.github.io/gallery/diverging_stacked_bar_chart.html

I want to recreate that plot in R.

Here is df:

source <- data.frame(
  question = c(
    "Question 1", "Question 1", "Question 1", "Question 1", "Question 1",
    "Question 2", "Question 2", "Question 2", "Question 2", "Question 2",
    "Question 3", "Question 3", "Question 3", "Question 3", "Question 3",
    "Question 4", "Question 4", "Question 4", "Question 4", "Question 4",
    "Question 5", "Question 5", "Question 5", "Question 5", "Question 5",
    "Question 6", "Question 6", "Question 6", "Question 6", "Question 6",
    "Question 7", "Question 7", "Question 7", "Question 7", "Question 7",
    "Question 8", "Question 8", "Question 8", "Question 8", "Question 8"
  ),
  type = c(
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree"
  ),
  value = c(
    24, 294, 594, 1927, 376,
    2, 2, 0, 7, 11,
    2, 0, 2, 4, 2,
    0, 2, 1, 7, 6,
    0, 1, 3, 16, 4,
    1, 1, 2, 9, 3,
    0, 0, 1, 4, 0,
    0, 0, 0, 0, 2
  ),
  percentage = c(
    0.7, 9.1, 18.5, 59.9, 11.7,
    18.2, 18.2, 0, 63.6, 0,
    20, 0, 20, 40, 20,
    0, 12.5, 6.3, 43.8, 37.5,
    0, 4.2, 12.5, 66.7, 16.7,
    6.3, 6.3, 12.5, 56.3, 18.8,
    0, 0, 20, 80, 0,
    0, 0, 0, 0, 100
  ),
  percentage_start = c(
    -19.1, -18.4, -9.2, 9.2, 69.2,
    -36.4, -18.2, 0, 0, 63.6,
    -30, -10, -10, 10, 50,
    -15.6, -15.6, -3.1, 3.1, 46.9,
    -10.4, -10.4, -6.3, 6.3, 72.9,
    -18.8, -12.5, -6.3, 6.3, 62.5,
    -10, -10, -10, 10, 90,
    0, 0, 0, 0, 0
  ),
  percentage_end = c(
    -18.4, -9.2, 9.2, 69.2, 80.9,
    -18.2, 0, 0, 63.6, 63.6,
    -10, -10, 10, 50, 70,
    -15.6, -3.1, 3.1, 46.9, 84.4,
    -10.4, -6.3, 6.3, 72.9, 89.6,
    -12.5, -6.3, 6.3, 62.5, 81.3,
    -10, -10, 10, 90, 90,
    0, 0, 0, 0, 100
  )
)

My question is: how were these two columns, percentage_start and percentage_end, calculated?

Is it possible to figure it out?

pieterjanvc · July 10, 2023, 12:26pm

Hello,

Just a quick Google search and you can find multiple tutorials. Here is one:

Hope this helps,
PJ

Andrzej · July 10, 2023, 12:51pm

I want to know how to calculate those two columns from the source dataframe. Can you answer the specific question in my first post? I already know how to make diverging Likert bar charts.

pieterjanvc · July 10, 2023, 1:24pm

Hi,

Well I've taken the tutorial above, and plugged in your data instead.
I changed some of the variable names and added a new filter in the section that creates the first diverging chart, but otherwise kept everything the same (so the title, labels etc need updating). I think you should be able to figure it out from here.

Load Packages

Let's load the two packages we'll use

library(tidyverse)
library(scales)

Generate Data

We'll create some fake data.

school_quality_summary <- data.frame(
  school = c(
    "Question 1", "Question 1", "Question 1", "Question 1", "Question 1",
    "Question 2", "Question 2", "Question 2", "Question 2", "Question 2",
    "Question 3", "Question 3", "Question 3", "Question 3", "Question 3",
    "Question 4", "Question 4", "Question 4", "Question 4", "Question 4",
    "Question 5", "Question 5", "Question 5", "Question 5", "Question 5",
    "Question 6", "Question 6", "Question 6", "Question 6", "Question 6",
    "Question 7", "Question 7", "Question 7", "Question 7", "Question 7",
    "Question 8", "Question 8", "Question 8", "Question 8", "Question 8"
  ),
  opinion = c(
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree"
  ),
  n_answers = c(
    24, 294, 594, 1927, 376,
    2, 2, 0, 7, 11,
    2, 0, 2, 4, 2,
    0, 2, 1, 7, 6,
    0, 1, 3, 16, 4,
    1, 1, 2, 9, 3,
    0, 0, 1, 4, 0,
    0, 0, 0, 0, 2
  ),
  percent_answers = c(
    0.7, 9.1, 18.5, 59.9, 11.7,
    18.2, 18.2, 0, 63.6, 0,
    20, 0, 20, 40, 20,
    0, 12.5, 6.3, 43.8, 37.5,
    0, 4.2, 12.5, 66.7, 16.7,
    6.3, 6.3, 12.5, 56.3, 18.8,
    0, 0, 20, 80, 0,
    0, 0, 0, 0, 100
  )
) %>%
  mutate(percent_answers = percent_answers / 100,
         percent_answers_label = percent(percent_answers, accuracy = 1))

Default Bar Chart

Let's make a summary of our data.

school_quality_summary %>%
  ggplot(aes(x = school, 
             y = percent_answers,
             fill = opinion)) +
  geom_col() +
  geom_text(aes(label = percent_answers_label),
            position = position_stack(vjust = 0.5),
            color = "white",
            fontface = "bold") +
  coord_flip() +
  scale_x_discrete() +
  scale_fill_viridis_d() +
  labs(title = "How good is the education at your school?",
       x = NULL,
       fill = NULL) +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid = element_blank(),
        legend.position = "top")

Basic Diverging Bar Chart

Let's create data that we need to make a basic diverging bar chart.

school_quality_summary_diverging <- school_quality_summary %>%
  mutate(percent_answers = if_else(opinion %in% c("Strongly agree", "Agree"), percent_answers, -percent_answers)) %>% 
  mutate(percent_answers_label = percent(percent_answers, accuracy = 1)) %>% 
  filter(percent_answers != 0)

school_quality_summary_diverging

We can now make a basic diverging bar chart.

school_quality_summary_diverging %>%
  ggplot(aes(x = school, 
             y = percent_answers,
             fill = opinion)) +
  geom_col() +
  geom_text(aes(label = percent_answers_label),
            position = position_stack(vjust = 0.5),
            color = "white",
            fontface = "bold") +
  coord_flip() +
  scale_x_discrete() +
  scale_fill_viridis_d() +
  labs(title = "How good is the education at your school?",
       x = NULL,
       fill = NULL) +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid = element_blank(),
        legend.position = "top")

Positive/Negative Labels

In our previous chart, the labels on the negative side (very bad and bad in the tutorial; the disagree and neutral categories with this data) were negative. Let's change this!

This will make all our labels positive numbers.

school_quality_summary_diverging_good_labels <- school_quality_summary_diverging %>%
  mutate(percent_answers_label = abs(percent_answers)) %>% 
  mutate(percent_answers_label = percent(percent_answers_label, accuracy = 1))

school_quality_summary_diverging_good_labels

school_quality_summary_diverging_good_labels %>% 
  ggplot(aes(x = school, 
             y = percent_answers,
             fill = opinion)) +
  geom_col() +
  geom_text(aes(label = percent_answers_label),
            position = position_stack(vjust = 0.5),
            color = "white",
            fontface = "bold") +
  coord_flip() +
  scale_x_discrete() +
  scale_fill_viridis_d() +
  labs(title = "How good is the education at your school?",
       x = NULL,
       fill = NULL) +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid = element_blank(),
        legend.position = "top")

Reorder Bars

Our bars are out of order. Let's fix this!

school_quality_summary_diverging_right_order <- school_quality_summary_diverging_good_labels %>% 
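  mutate(opinion = fct_relevel(opinion,
                               # levels taken from this data; the tutorial's
                               # "Bad", "Very bad", ... do not exist here
                               "Neither agree nor disagree", "Disagree", "Strongly disagree",
                               "Agree", "Strongly agree"),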
         opinion = fct_rev(opinion)) 

school_quality_summary_diverging_right_order

school_quality_summary_diverging_right_order %>%
  ggplot(aes(x = school, 
             y = percent_answers,
             fill = opinion)) +
  geom_col() +
  geom_text(
    aes(label = percent_answers_label),
    position = position_stack(vjust = 0.5),
    color = "white",
    fontface = "bold"
  ) +
  coord_flip() +
  scale_x_discrete() +
  scale_fill_viridis_d() +
  labs(title = "How good is the education at your school?",
       x = NULL,
       fill = NULL) +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid = element_blank(),
        legend.position = "top")

Make Legend Order Match

The bars are now in the right order, but the legend doesn't match. Let's fix this!

school_quality_summary_diverging_right_order %>%
  ggplot(aes(x = school, 
             y = percent_answers,
             fill = opinion)) +
  geom_col() +
  geom_text(aes(label = percent_answers_label),
            position = position_stack(vjust = 0.5),
            color = "white",
            fontface = "bold") +
  coord_flip() +
  scale_x_discrete() +
  scale_fill_viridis_d(breaks = c("Strongly disagree", "Disagree",
                                  "Neither agree nor disagree",
                                  "Agree", "Strongly agree")) +
  labs(title = "How good is the education at your school?",
       x = NULL,
       fill = NULL) +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid = element_blank(),
        legend.position = "top")

Improve Colors

Let's use a more appropriate color scale for this data.

school_quality_summary_diverging_right_order %>%
  ggplot(aes(x = school, 
             y = percent_answers,
             fill = opinion)) +
  geom_col() +
  geom_text(aes(label = percent_answers_label),
            position = position_stack(vjust = 0.5),
            color = "white",
            fontface = "bold") +
  coord_flip() +
  scale_x_discrete() +
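  # Breaks and names use the levels in this data; the grey for the neutral
  # category is an assumed choice, the tutorial had no neutral level
  scale_fill_manual(breaks = c("Strongly disagree", "Disagree",
                               "Neither agree nor disagree",
                               "Agree", "Strongly agree"),
                    values = c(
                      "Strongly disagree" = "darkorange3",
                      "Disagree" = "orange",
                      "Neither agree nor disagree" = "grey70",
                      "Agree" = "deepskyblue",
                      "Strongly agree" = "deepskyblue4"
                    )) +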
  labs(title = "How good is the education at your school?",
       x = NULL,
       fill = NULL) +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid = element_blank(),
        legend.position = "top")

Andrzej · July 10, 2023, 1:46pm

Thank you very much, this is very helpful.
A lot of code written, I will analyse it.

Still, if possible, I would like to know how these two columns in my "source" dataframe were calculated. I need this for another project.
Thanks again.

nirgrahamuk · July 10, 2023, 1:50pm

I don't think those example numbers were arrived at in a deterministic way...

Andrzej · July 10, 2023, 2:02pm

I asked because they were used for creation of that nice plot with Python code like this:

alt.Chart(source).mark_bar().encode(
    x='percentage_start:Q',
    x2='percentage_end:Q',
    y=alt.Y('question:N', axis=y_axis),
    color=alt.Color(
        'type:N',
        legend=alt.Legend( title='Response'),
        scale=color_scale,
    )
)

from here:
https://altair-viz.github.io/gallery/diverging_stacked_bar_chart.html

So those two columns, "percentage_start" and "percentage_end", must have been calculated somehow so they could be used later to make that plot, I guess. Given that, and the fact that the plot looks correct, I assume those numbers are correct?

nirgrahamuk · July 10, 2023, 4:20pm

How far the bars are shifted to the left is somewhat arbitrary. I think the only correctness requirement is that the difference between the start and the end equals the percentage column.
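
That requirement is easy to check by comparing the bar widths against the percentage column. A minimal sketch using the Question 7 rows from the first post (the object name `q7` and the `width` column are just illustrative):

```r
# Question 7 rows copied from the `source` data frame in the first post
q7 <- data.frame(
  type = c("Strongly disagree", "Disagree", "Neither agree nor disagree",
           "Agree", "Strongly agree"),
  percentage       = c(0, 0, 20, 80, 0),
  percentage_start = c(-10, -10, -10, 10, 90),
  percentage_end   = c(-10, -10, 10, 90, 90)
)

# Width of each bar segment: end minus start
q7$width <- q7$percentage_end - q7$percentage_start

# Compare the widths with the reported percentages
all.equal(q7$width, q7$percentage)
```

For other questions the comparison may be off by 0.1 here and there, because the published columns are rounded to one decimal place.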

As it happens, I think I have spotted what they did. They took the "Neither agree nor disagree" category's percentage, halved it and made it negative to get the start of that bar, then constructed the bounds of the other groups so that they connect to it.
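
To make that concrete, the idea can be worked by hand for a single question. This is only a sketch for Question 3, with variable names chosen for illustration:

```r
# Percentages for "Question 3", in Likert order
pct <- c("Strongly disagree" = 20, "Disagree" = 0,
         "Neither agree nor disagree" = 20,
         "Agree" = 40, "Strongly agree" = 20)

# The neutral bar is centred on zero: it starts at minus half its width
neutral_start <- -pct[["Neither agree nor disagree"]] / 2
neutral_end   <- neutral_start + pct[["Neither agree nor disagree"]]

# Walk left: each bar ends where the bar to its right starts
disagree_end   <- neutral_start
disagree_start <- disagree_end - pct[["Disagree"]]
sdis_end       <- disagree_start
sdis_start     <- sdis_end - pct[["Strongly disagree"]]

# Walk right: each bar starts where the bar to its left ends
agree_start  <- neutral_end
agree_end    <- agree_start + pct[["Agree"]]
sagree_start <- agree_end
sagree_end   <- sagree_start + pct[["Strongly agree"]]

starts <- c(sdis_start, disagree_start, neutral_start, agree_start, sagree_start)
ends   <- c(sdis_end, disagree_end, neutral_end, agree_end, sagree_end)

data.frame(type = names(pct), percentage = unname(pct), starts, ends)
```

The resulting starts and ends can be compared with the Question 3 rows of percentage_start and percentage_end in the first post.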

Here's an attempt to reproduce it.

library(tidyverse)
source <- data.frame(
  question = c(
    "Question 1", "Question 1", "Question 1", "Question 1", "Question 1",
    "Question 2", "Question 2", "Question 2", "Question 2", "Question 2",
    "Question 3", "Question 3", "Question 3", "Question 3", "Question 3",
    "Question 4", "Question 4", "Question 4", "Question 4", "Question 4",
    "Question 5", "Question 5", "Question 5", "Question 5", "Question 5",
    "Question 6", "Question 6", "Question 6", "Question 6", "Question 6",
    "Question 7", "Question 7", "Question 7", "Question 7", "Question 7",
    "Question 8", "Question 8", "Question 8", "Question 8", "Question 8"
  ),
  type = c(
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree"
  ),
  value = c(
    24, 294, 594, 1927, 376,
    2, 2, 0, 7, 11,
    2, 0, 2, 4, 2,
    0, 2, 1, 7, 6,
    0, 1, 3, 16, 4,
    1, 1, 2, 9, 3,
    0, 0, 1, 4, 0,
    0, 0, 0, 0, 2
  ),
  percentage = c(
    0.7, 9.1, 18.5, 59.9, 11.7,
    18.2, 18.2, 0, 63.6, 0,
    20, 0, 20, 40, 20,
    0, 12.5, 6.3, 43.8, 37.5,
    0, 4.2, 12.5, 66.7, 16.7,
    6.3, 6.3, 12.5, 56.3, 18.8,
    0, 0, 20, 80, 0,
    0, 0, 0, 0, 100
  ),
  pstart_orig = c(
    -19.1, -18.4, -9.2, 9.2, 69.2,
    -36.4, -18.2, 0, 0, 63.6,
    -30, -10, -10, 10, 50,
    -15.6, -15.6, -3.1, 3.1, 46.9,
    -10.4, -10.4, -6.3, 6.3, 72.9,
    -18.8, -12.5, -6.3, 6.3, 62.5,
    -10, -10, -10, 10, 90,
    0, 0, 0, 0, 0
  ),
  pend_orig = c(
    -18.4, -9.2, 9.2, 69.2, 80.9,
    -18.2, 0, 0, 63.6, 63.6,
    -10, -10, 10, 50, 70,
    -15.6, -3.1, 3.1, 46.9, 84.4,
    -10.4, -6.3, 6.3, 72.9, 89.6,
    -12.5, -6.3, 6.3, 62.5, 81.3,
    -10, -10, 10, 90, 90,
    0, 0, 0, 0, 100
  )
)

source$type <- factor(source$type,
       levels = c("Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree"),
       ordered = TRUE)


(s2 <- source |> mutate(midpoint = if_else(type=="Neither agree nor disagree",round(-percentage/2,1),NA)))

(before <- s2 |> filter(type %in% c("Strongly disagree", "Disagree", "Neither agree nor disagree" )) |> arrange(
  question,desc(type)))


do_befores <- function(dfb){
  dfb$perc_start <- NA
  dfb$perc_end <- NA
  for(i in seq_len(nrow(dfb))){
    midpoint <- dfb[i,]$midpoint 
    if(!is.na(midpoint)){
      dfb[i,]$perc_start <- midpoint
      dfb[i,]$perc_end <- midpoint + dfb[i,]$percentage 
    } else {
      dfb[i,]$perc_end <-     dfb[i-1,]$perc_start  
      dfb[i,]$perc_start <-   dfb[i,]$perc_end - dfb[i,]$percentage 
      
    }
  }
  dfb
}

bef2 <- do_befores(before)


(after <- s2 |> filter(type %in% c("Neither agree nor disagree",
                                   "Agree",
                                   "Strongly agree" )) |> arrange(
  question,type))


do_after <- function(dfb){
  dfb$perc_start <- NA
  dfb$perc_end <- NA
  for(i in seq_len(nrow(dfb))){
    midpoint <- dfb[i,]$midpoint 
    if(!is.na(midpoint)){
      dfb[i,]$perc_start <- midpoint
      dfb[i,]$perc_end <- midpoint + dfb[i,]$percentage 
    } else {
      dfb[i,]$perc_start <-     dfb[i-1,]$perc_end  
      dfb[i,]$perc_end <-   dfb[i,]$perc_start + dfb[i,]$percentage 
      
    }
  }
  dfb
}

after2 <- do_after(after)

fin_1 <- bind_rows(bef2,after2) |> distinct() |> arrange(question,type)

#check differences
fin_1$pstart_orig - fin_1$perc_start
fin_1$pend_orig - fin_1$perc_end

Andrzej · July 10, 2023, 4:39pm

Thank you very much indeed Nir,

If you have time, could you please explain what the code in the do_befores() and do_after() functions does? I would like to understand it.
In particular, what is going on inside the for loop in both functions?

I am very grateful for your solution and time you spent to do this.

nirgrahamuk · July 10, 2023, 4:41pm

Look at the before frame and notice how, within a given question, each row shares a value with the row above it. In do_befores() we create that link with dfb[i,]$perc_end <- dfb[i-1,]$perc_start. The next line then sets the other bound: the start is the end minus the width of the bar, dfb[i,]$perc_start <- dfb[i,]$perc_end - dfb[i,]$percentage.

do_after() is similar but runs in the other direction: dfb[i,]$perc_start <- dfb[i-1,]$perc_end, and then the end is the start plus the width of the bar, dfb[i,]$perc_end <- dfb[i,]$perc_start + dfb[i,]$percentage.
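
The same bookkeeping can also be written without a loop. The sketch below is one possible alternative to the loops, under two assumptions: the rows are already in Likert order within each question, and every question has exactly one neutral row. The `offset` helper column and the `likert` and `bounds` names are illustrative.

```r
library(dplyr)

# Two questions from the first post, rows in Likert order
likert <- data.frame(
  question = rep(c("Question 3", "Question 7"), each = 5),
  type = rep(c("Strongly disagree", "Disagree", "Neither agree nor disagree",
               "Agree", "Strongly agree"), times = 2),
  percentage = c(20, 0, 20, 40, 20,
                 0, 0, 20, 80, 0)
)

bounds <- likert |>
  group_by(question) |>
  mutate(
    # distance from the left edge of the stack to zero:
    # everything on the disagree side plus half of the neutral bar
    offset = sum(percentage[type %in% c("Strongly disagree", "Disagree")]) +
      percentage[type == "Neither agree nor disagree"] / 2,
    # running total gives each bar's right edge, shifted left by the offset
    perc_end = cumsum(percentage) - offset,
    perc_start = perc_end - percentage
  ) |>
  ungroup()

bounds
```

The running total only works when the rows are sorted, so with unsorted data an arrange(question, type) on the ordered factor would be needed first.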

Andrzej · July 10, 2023, 4:43pm

And this code is just to check how far the final results differ from the originals? They are almost the same.

nirgrahamuk · July 10, 2023, 4:52pm

Yes, that's right. But it was only close, partly because I 'trusted' the original percentage calculation, which seems flawed.
Look at Question 2: the biggest category, with 11 entries, is "Strongly agree", so it should have a 50% share, not zero.

Here I reran it with my own calculation.

library(tidyverse)
source <- data.frame(
  question = c(
    "Question 1", "Question 1", "Question 1", "Question 1", "Question 1",
    "Question 2", "Question 2", "Question 2", "Question 2", "Question 2",
    "Question 3", "Question 3", "Question 3", "Question 3", "Question 3",
    "Question 4", "Question 4", "Question 4", "Question 4", "Question 4",
    "Question 5", "Question 5", "Question 5", "Question 5", "Question 5",
    "Question 6", "Question 6", "Question 6", "Question 6", "Question 6",
    "Question 7", "Question 7", "Question 7", "Question 7", "Question 7",
    "Question 8", "Question 8", "Question 8", "Question 8", "Question 8"
  ),
  type = c(
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree",
    "Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree"
  ),
  value = c(
    24, 294, 594, 1927, 376,
    2, 2, 0, 7, 11,
    2, 0, 2, 4, 2,
    0, 2, 1, 7, 6,
    0, 1, 3, 16, 4,
    1, 1, 2, 9, 3,
    0, 0, 1, 4, 0,
    0, 0, 0, 0, 2
  ),
  porig = c(
    0.7, 9.1, 18.5, 59.9, 11.7,
    18.2, 18.2, 0, 63.6, 0,
    20, 0, 20, 40, 20,
    0, 12.5, 6.3, 43.8, 37.5,
    0, 4.2, 12.5, 66.7, 16.7,
    6.3, 6.3, 12.5, 56.3, 18.8,
    0, 0, 20, 80, 0,
    0, 0, 0, 0, 100
  ),
  pstart_orig = c(
    -19.1, -18.4, -9.2, 9.2, 69.2,
    -36.4, -18.2, 0, 0, 63.6,
    -30, -10, -10, 10, 50,
    -15.6, -15.6, -3.1, 3.1, 46.9,
    -10.4, -10.4, -6.3, 6.3, 72.9,
    -18.8, -12.5, -6.3, 6.3, 62.5,
    -10, -10, -10, 10, 90,
    0, 0, 0, 0, 0
  ),
  pend_orig = c(
    -18.4, -9.2, 9.2, 69.2, 80.9,
    -18.2, 0, 0, 63.6, 63.6,
    -10, -10, 10, 50, 70,
    -15.6, -3.1, 3.1, 46.9, 84.4,
    -10.4, -6.3, 6.3, 72.9, 89.6,
    -12.5, -6.3, 6.3, 62.5, 81.3,
    -10, -10, 10, 90, 90,
    0, 0, 0, 0, 100
  )
)

source$type <- factor(source$type,
       levels = c("Strongly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Strongly agree"),
       ordered = TRUE)

source <- group_by(source,question) |> mutate(my_percent = 100*value/sum(value))

(s2 <- source |> mutate(midpoint = if_else(type=="Neither agree nor disagree",-my_percent/2,NA)))

(before <- s2 |> filter(type %in% c("Strongly disagree", "Disagree", "Neither agree nor disagree" )) |> arrange(
  question,desc(type)))


do_befores <- function(dfb){
  dfb$perc_start <- NA
  dfb$perc_end <- NA
  for(i in seq_len(nrow(dfb))){
    midpoint <- dfb[i,]$midpoint 
    if(!is.na(midpoint)){
      dfb[i,]$perc_start <- midpoint
      dfb[i,]$perc_end <- midpoint + dfb[i,]$my_percent 
    } else {
      dfb[i,]$perc_end <-     dfb[i-1,]$perc_start  
      dfb[i,]$perc_start <-   dfb[i,]$perc_end - dfb[i,]$my_percent 
      
    }
  }
  dfb
}

bef2 <- do_befores(before)


(after <- s2 |> filter(type %in% c("Neither agree nor disagree",
                                   "Agree",
                                   "Strongly agree" )) |> arrange(
  question,type))


do_after <- function(dfb){
  dfb$perc_start <- NA
  dfb$perc_end <- NA
  for(i in seq_len(nrow(dfb))){
    midpoint <- dfb[i,]$midpoint 
    if(!is.na(midpoint)){
      dfb[i,]$perc_start <- midpoint
      dfb[i,]$perc_end <- midpoint + dfb[i,]$my_percent 
    } else {
      dfb[i,]$perc_start <-     dfb[i-1,]$perc_end  
      dfb[i,]$perc_end <-   dfb[i,]$perc_start + dfb[i,]$my_percent 
      
    }
  }
  dfb
}

after2 <- do_after(after)

fin_1 <- bind_rows(bef2,after2) |> distinct() |> arrange(question,type)

#check differences
fin_1$pstart_orig - fin_1$perc_start
fin_1$pend_orig - fin_1$perc_end
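
As a quick check of the Question 2 point above, the percentages can be recomputed from the counts alone. This is a small sketch; the comparison with a denominator of 11 is only a guess at how the original column might have been produced.

```r
# Counts for "Question 2" from the value column
value <- c("Strongly disagree" = 2, "Disagree" = 2,
           "Neither agree nor disagree" = 0,
           "Agree" = 7, "Strongly agree" = 11)

# Share of each response out of all 22 answers
my_percent <- 100 * value / sum(value)
round(my_percent, 1)

# Guess: dividing by 11 instead of 22 gives 18.2 and 63.6,
# the non-zero figures in the original percentage column
guess <- round(100 * value / 11, 1)
guess
```

With the recomputed shares, "Strongly agree" accounts for half of the answers to Question 2.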

Andrzej · July 10, 2023, 5:08pm

Thank you very much indeed, I can't thank you enough.

I have a lot to learn and study today from your code.

best wishes,
Andrzej

Andrzej · July 10, 2023, 6:09pm

I have plugged your calculations in, @nirgrahamuk, and here is the result:

type_order <- c(
  "Strongly disagree", "Disagree", "Neither agree nor disagree",
  "Agree", "Strongly agree"
)
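# color_palette is not defined in the post; the named vector below is an
# assumed stand-in (any five colours keyed by response type will do)
color_palette <- c(
  "Strongly disagree" = "#c30d24",
  "Disagree" = "#f3a583",
  "Neither agree nor disagree" = "#cccccc",
  "Agree" = "#94c6da",
  "Strongly agree" = "#1770ab"
)
# Create the plot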
ggplot(fin_1) +
  geom_rect(
    aes(xmin = perc_start, xmax = perc_end, ymin = as.numeric(factor(question)), ymax = as.numeric(factor(question)) + 0.9,
 fill = type),
    color = "white"
  ) +
  geom_text(
    aes(x = (perc_start + perc_end) / 2, y = as.numeric(factor(question)) + 0.45, label = paste0(round(my_percent, digits = 2),
 "%")),
    hjust = 0.5,
    color = "black",
    size = 5
  ) +
  geom_text(
    aes(x = -28.6, y = as.numeric(factor(question)) + 0.45, label = question),
    hjust = 1,
    color = "black",
    size = 5
  ) +
  scale_fill_manual(values = color_palette, breaks = type_order) +
  labs(x = "Percentage", y = "Question", fill = "Response") +
  theme_minimal() +
  theme(
    axis.title.y = element_blank(),
    axis.text.y = element_text(size = 1),
    axis.text.x = element_text(size = 10)
  )

Thank you.
