# ggplot and group_by and percentage plot

I have a data set with two time columns (each gives a time in hours before the deadline) and one grade column.

Here's what the data looks like (sorry, for some reason I cannot upload my test file):

0.6 0.2 A
0.8 0.5 B-
1.0 0.75 A
1.2 0.8 C+
1.8 1.0 B-
1.8 1.2 B
2.6 2.2 C-
3.0 2.8 B+
3.2 2.5 B
5.0 3.7 A

What I would like to do is first sort the data by column 1 and count the grades within each 1-hour time interval. For example, for 0 to 1 hr the number of A's is 5, the number of B+'s is 2, and so on. Then I would like to do the same with respect to column 2, and thirdly do it with group_by on both columns 1 and 2. After that, I want to find the percentage of each grade within each hour range and draw a bar plot by hour, with each bar showing the percentage distribution of grades for that hour.

I hope I have been able to explain this properly. The biggest hurdle I have had is finding the number of A's and B's within a given range of time.

Appreciate all the help.

~SD

Here is a start to what I think you want to do.

``````
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
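# Space-separated file with the columns First, Last and Grade (see the output below)
DF <- read.csv("~/R/Play/Dummy.csv", sep = " ")
# Bin each time column into 1-hour intervals
DF <- DF %>% mutate(FirstFactor = cut(First, breaks = 0:5, include.lowest = TRUE),
                    LastFactor = cut(Last, breaks = 0:5, include.lowest = TRUE))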
DF
#>    First Last Grade FirstFactor LastFactor
#> 1    0.6 0.20     A       [0,1]      [0,1]
#> 2    0.8 0.50    B-       [0,1]      [0,1]
#> 3    1.0 0.75     A       [0,1]      [0,1]
#> 4    1.2 0.80    C+       (1,2]      [0,1]
#> 5    1.8 1.00    B-       (1,2]      [0,1]
#> 6    1.8 1.20     B       (1,2]      (1,2]
#> 7    2.6 2.20    C-       (2,3]      (2,3]
#> 8    3.0 2.80    B+       (2,3]      (2,3]
#> 9    3.2 2.50     B       (3,4]      (2,3]
#> 10   5.0 3.70     A       (4,5]      (3,4]
# Count each grade within each 1-hour bin of the first column
COUNTS_First <- DF %>% group_by(FirstFactor, Grade) %>%
  summarize(N = n())
#> `summarise()` regrouping output by 'FirstFactor' (override with `.groups` argument)
COUNTS_First
#> # A tibble: 9 x 3
#> # Groups:   FirstFactor [5]
#>   FirstFactor Grade     N
#>   <fct>       <chr> <int>
#> 1 [0,1]       A         2
#> 2 [0,1]       B-        1
#> 3 (1,2]       B         1
#> 4 (1,2]       B-        1
#> 5 (1,2]       C+        1
#> 6 (2,3]       B+        1
#> 7 (2,3]       C-        1
#> 8 (3,4]       B         1
#> 9 (4,5]       A         1

TotalByHour <- COUNTS_First %>% group_by(FirstFactor) %>% summarize(Total = sum(N))
#> `summarise()` ungrouping output (override with `.groups` argument)
TotalByHour
#> # A tibble: 5 x 2
#>   FirstFactor Total
#>   <fct>       <int>
#> 1 [0,1]           3
#> 2 (1,2]           3
#> 3 (2,3]           2
#> 4 (3,4]           1
#> 5 (4,5]           1

# Attach the per-bin totals and turn the counts into fractions of each bin
COUNTS_First <- inner_join(COUNTS_First, TotalByHour, by = "FirstFactor") %>%
  mutate(Fraction = N/Total)
COUNTS_First
#> # A tibble: 9 x 5
#> # Groups:   FirstFactor [5]
#>   FirstFactor Grade     N Total Fraction
#>   <fct>       <chr> <int> <int>    <dbl>
#> 1 [0,1]       A         2     3    0.667
#> 2 [0,1]       B-        1     3    0.333
#> 3 (1,2]       B         1     3    0.333
#> 4 (1,2]       B-        1     3    0.333
#> 5 (1,2]       C+        1     3    0.333
#> 6 (2,3]       B+        1     2    0.5
#> 7 (2,3]       C-        1     2    0.5
#> 8 (3,4]       B         1     1    1
#> 9 (4,5]       A         1     1    1

ggplot(COUNTS_First, aes(FirstFactor, Fraction, fill = Grade)) + geom_col()
``````

Created on 2021-03-23 by the reprex package (v0.3.0)
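
The reprex above covers the first column only. A possible continuation for the second column, and for both columns grouped together, might look like the sketch below. It assumes the same column names as above (First, Last, Grade) and types the question's ten rows inline instead of reading a file. The names PCT_Last and PCT_Both are made up for this example. It uses `count()` followed by a grouped `mutate()`, which is a shorter alternative to the summarize-then-join steps above and not a change to that approach.

``````
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# The ten rows from the question, typed inline (assumed column names:
# First, Last, Grade, as in the reprex above).
DF <- data.frame(
  First = c(0.6, 0.8, 1.0, 1.2, 1.8, 1.8, 2.6, 3.0, 3.2, 5.0),
  Last  = c(0.2, 0.5, 0.75, 0.8, 1.0, 1.2, 2.2, 2.8, 2.5, 3.7),
  Grade = c("A", "B-", "A", "C+", "B-", "B", "C-", "B+", "B", "A")
)

# Bin both time columns into 1-hour intervals.
DF <- DF %>%
  mutate(
    FirstFactor = cut(First, breaks = 0:5, include.lowest = TRUE),
    LastFactor  = cut(Last,  breaks = 0:5, include.lowest = TRUE)
  )

# Second column: count grades per hour bin, then express each count as a
# percentage of that bin's total. count() returns an ungrouped result here,
# so group by the bin again before dividing.
PCT_Last <- DF %>%
  count(LastFactor, Grade, name = "N") %>%
  group_by(LastFactor) %>%
  mutate(Percent = 100 * N / sum(N)) %>%
  ungroup()

# Both columns: the same idea, with the pair of bins as the group.
PCT_Both <- DF %>%
  count(FirstFactor, LastFactor, Grade, name = "N") %>%
  group_by(FirstFactor, LastFactor) %>%
  mutate(Percent = 100 * N / sum(N)) %>%
  ungroup()

# Stacked bars by hour bin of the second column; each bar adds up to 100.
p <- ggplot(PCT_Last, aes(LastFactor, Percent, fill = Grade)) +
  geom_col() +
  labs(x = "Hours before deadline (second column)", y = "Percent of grades")
print(p)
``````

Hour bins that contain no rows, such as (4,5] for the second column, are dropped by `count()` by default, so they do not appear as empty bars. For the two-column version, something like `facet_wrap(~ FirstFactor)` on a plot of PCT_Both may be a reasonable way to show one panel per bin of the first column.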
