Create "Order" Column based on column values

rj88 · December 29, 2022, 2:16pm

Hello.

I'm new to R, so please forgive any knowledge gaps.

In the data frame returned from the below, each row represents an operation done in a theatre, where "Location" is the operating theatre name, "Date" is the date of the operation, "Type" is whether the operating session was scheduled for full day/AM/PM, and "Order" specifies the order in which each patient was scheduled onto the operating list.

I'm trying to create a stacked bar graph to illustrate the order of patients booked onto each operating list, which I've currently accomplished with the x-axis being "Mins" and y-axis being "Type". This results in my y-axis being three bars wide (one bar for AM, PM and Full-Day).

I would like to consolidate the current y-axis into a single bar, which I've so far only managed to do in Excel in and am now trying to achieve the same thing in R where I'm much less competent! To do it, I need to change the "Order" column so that cases in the "PM" continue on the count from any cases completed in the "AM" provided that the Date/Location both match.

You can see that rows 4-9 of the below data frame represent the operations completed on "2023-01-03" in "Loc2", with the "Order" of AM cases in rows 4-7 increasing by one incrementally. Then, the the count resets for the 2 cases in the PM seen in rows 8-9. Instead of this happening, I would like the PM count to continue so that these are seen as cases 5 and 6 respectively (as shown in the "Correct" column... which contains the solution I'm trying to arrive at).

Any help really appreciated.
Thanks.


Date <- as.Date(c("2023-01-06","2023-01-06","2023-01-06","2023-01-03","2023-01-03","2023-01-03","2023-01-03",
          "2023-01-03","2023-01-03","2023-01-04","2023-01-04","2023-01-04"))
Location <- c("Loc1","Loc1","Loc1","Loc2","Loc2","Loc2","Loc2","Loc2","Loc2","Loc2","Loc2","Loc2")
Type <- c("Full_Day","Full_Day","Full_Day","AM","AM","AM","AM","PM","PM","AM","PM","PM")
Order <- c("1","2","3","1","2","3","4","1","2","1","1","2")
Correct <- c("1","2","3","1","2","3","4","5","6","1","2","3")
Mins <- as.integer(c("10","20","30","40","50","60","70","80","90","100","110","120"))

df <- data.frame(Date, Location, Type, Order, Correct, Mins)

FJCC · December 29, 2022, 4:39pm

I don't understand exactly what you want to graph, but you can label the order of the rows using group_by() and row_number().

Date <- as.Date(c("2023-01-06","2023-01-06","2023-01-06","2023-01-03","2023-01-03","2023-01-03","2023-01-03",
                  "2023-01-03","2023-01-03","2023-01-04","2023-01-04","2023-01-04"))
Location <- c("Loc1","Loc1","Loc1","Loc2","Loc2","Loc2","Loc2","Loc2","Loc2","Loc2","Loc2","Loc2")
Type <- c("Full_Day","Full_Day","Full_Day","AM","AM","AM","AM","PM","PM","AM","PM","PM")
Order <- c("1","2","3","1","2","3","4","1","2","1","1","2")
Correct <- c("1","2","3","1","2","3","4","5","6","1","2","3")
Mins <- as.integer(c("10","20","30","40","50","60","70","80","90","100","110","120"))

df <- data.frame(Date, Location, Type, Order, Correct, Mins)
library(dplyr)
df
#>          Date Location     Type Order Correct Mins
#> 1  2023-01-06     Loc1 Full_Day     1       1   10
#> 2  2023-01-06     Loc1 Full_Day     2       2   20
#> 3  2023-01-06     Loc1 Full_Day     3       3   30
#> 4  2023-01-03     Loc2       AM     1       1   40
#> 5  2023-01-03     Loc2       AM     2       2   50
#> 6  2023-01-03     Loc2       AM     3       3   60
#> 7  2023-01-03     Loc2       AM     4       4   70
#> 8  2023-01-03     Loc2       PM     1       5   80
#> 9  2023-01-03     Loc2       PM     2       6   90
#> 10 2023-01-04     Loc2       AM     1       1  100
#> 11 2023-01-04     Loc2       PM     1       2  110
#> 12 2023-01-04     Loc2       PM     2       3  120
df |> group_by(Date, Location) |> 
  mutate(Correct2 = row_number())
#> # A tibble: 12 × 7
#> # Groups:   Date, Location [3]
#>    Date       Location Type     Order Correct  Mins Correct2
#>    <date>     <chr>    <chr>    <chr> <chr>   <int>    <int>
#>  1 2023-01-06 Loc1     Full_Day 1     1          10        1
#>  2 2023-01-06 Loc1     Full_Day 2     2          20        2
#>  3 2023-01-06 Loc1     Full_Day 3     3          30        3
#>  4 2023-01-03 Loc2     AM       1     1          40        1
#>  5 2023-01-03 Loc2     AM       2     2          50        2
#>  6 2023-01-03 Loc2     AM       3     3          60        3
#>  7 2023-01-03 Loc2     AM       4     4          70        4
#>  8 2023-01-03 Loc2     PM       1     5          80        5
#>  9 2023-01-03 Loc2     PM       2     6          90        6
#> 10 2023-01-04 Loc2     AM       1     1         100        1
#> 11 2023-01-04 Loc2     PM       1     2         110        2
#> 12 2023-01-04 Loc2     PM       2     3         120        3

^{Created on 2022-12-29 with reprex v2.0.2}

jrkrideau · December 29, 2022, 4:50pm

I am with FJCC. I do not understand what you are doing. I do not see how you can get a single bar.

You say * which I've currently accomplished with the x-axis being "Mins" and y-axis being "Type"*

Can you paste the code here. Perhaps a screenshot of the Excel plot too?
Thanks.

Wait a minute, is this something arong the lines of what you want? Note I have renamed your data.frame and changed var-names to lowercase.

library(tidyverse)
dat1  <- structure(list(dat = structure(c(19363, 19363, 19363, 19360,  19360, 19360, 19360, 19360, 19360, 19361, 19361, 19361), class = "Date"),      
                        location = c("Loc1", "Loc1", "Loc1", "Loc2", "Loc2", "Loc2", "Loc2", "Loc2", "Loc2", "Loc2", "Loc2", "Loc2"), 
                        type = c("Full_Day",  "Full_Day", "Full_Day", "AM", "AM", "AM", "AM", "PM", "PM", "AM", "PM", "PM"), 
                        order = c("1", "2", "3", "1", "2", "3", "4", "1", "2", "1", "1", "2"), 
                        correct = c("1", "2", "3", "1", "2", "3", "4", "5", "6", "1", "2", "3"), 
                        mins = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 100L, 110L, 120L)), class = "data.frame", row.names = c(NA,  -12L))

p  <- ggplot(dat1, aes(type, mins))
p + geom_col(aes(fill = as.factor(order)))

bars

rj88 · December 29, 2022, 5:35pm

Thanks both for the help, FJCC's group_by mutate is doing a lot of what I want. i.e., it's allowing me to create a single stacked plot which is giving me the order of all patients booked onto an operating list (as below) rather than having the three bars. This allows me to see how much of the "available" theatre session time has been booked versus the total time available (red cross). What I'm trying to do is illustrate the "booked time" of cases booked onto a list (see "Mins" in previous code) versus total time available (which I didn't supply in previous example but is shown as the red cross below).

I have a number of follow-up questions. If someone is happy to spare me 30 mins in a Teams call to run through, we can agree a price & I can pay half before/after regardless of outcome? I'm not sure if the forum allows this from a quick Google though? If not, I'll try and better explain when back at desk.

FJCC · December 29, 2022, 6:43pm

I encourage you to ask your questions here. And to post your ggplot code.

jrkrideau · December 29, 2022, 8:15pm

I second FJCC suggestion and I never thought about that kind of layout but now it is obvious.

rj88 · December 29, 2022, 8:22pm

OK - I'm pulling data directly from a SQL connection though, so I've added an anonymised copy of what the SQL output looks like and put it on github (link below). I feel like I'm asking for a lot here, so thanks again for patience/support.

rollingstone09/help (github.com)

The table contains records for patients (PID) booked into sessions (SID) for next week, where a session is effectively an "open block" of time that patients can be booked into. For example, rows 2-6 represent 5x patients (PIDs) relating to the same session (same SID) . The "all day" session in question is scheduled to run for 540 minutes, and each row tells you how long each individual person is scheduled to take. The "order" column details in which order each patient will enter the session (where order = 1 is first in, and order = 5 is last in).

The below gets you to where I currently am where the attached is imported as "a":

a <- read_excel("a.xlsx")
a$Date <- as.Date(as.character(a$Date))
a$Week <- as.Date(as.character(a$Week))
a$Day <- factor(a$Day,
                 levels = c("Monday", "Tuesday", 
                            "Wednesday", "Thursday", 
                            "Friday", "Saturday", "Sunday"))

a$Session <- factor(a$Session,
                     levels = c("PM", "AM", 
                                "All Day"))

a$Priority <- as.character(as.character(a$Priority))

a %>% 
  ggplot() +
  geom_bar(aes(Session, Available), stat = "identity", position = "identity", colour = "red", fill = "white", alpha = 0, linetype = 2) +
  geom_col(aes(Session, Booked, fill = Priority), colour = "black") +
  geom_point(aes(Session, Available), size = 1, colour = "red", shape = 4, stroke = 2) +
  coord_flip() +
  facet_grid(Theatre~Day) +
  theme_bw()

My problems are:

a) I don't like the fact that the AM/PM/All Day are broken out across 3x bars on the y-axis, and I want AM/PM sessions to be treated just like the All-Day sessions are (so each facet has a single bar representing what each theatre looks like on each upcoming weekday). How I've solved this in Excel before is to re-do the "Order" column so that provided the theatre/date match, PM cases continue the count from AM cases (so N41:N44 of the attached would read 1,2,3,4 rather than 1,2,1,2). I'm just testing to see whether the solution FJCC works for this.

b) The purpose of the plot is to review which upcoming sessions have available time (red dotted boxes/red x marker) that could be potentially utilised. The "Available" minutes field is being joined from another table in a SQL query, hence why rows 1-6 refer to the same SID but are each counts the available minutes as 540. When solving (a) above, I need to ensure that the 540 minutes are not being summed for each of the 5x patients. For example, rows 2-6 on the attached represent the Theatre 03 Wednesday facet, where 540 minutes is being labelled as available. I need to avoid this being summed, and thus becoming 5x540 = 2700 minutes.

c) I want to introduce a small level of interactivity to the plot before adding it to an RMarkdown .html report, so that information relating to each patient can be viewed on hover. This information would relate to the following fields: "Operation", "Specialty", "WardTo" and "Surgeon". I have a very small amount of using experience with ggplotly, ggiraph and highcharter -- but having no success due to:

ggplotly squeezes the facets, and even through Googling I've been unable to tidy
ggiraph seems to completely change the scale of everything, and I haven't yet been able to understand either why or how to remedy.
highcharter seems a bit too complicated for me at this stage for this type of plot.

It may just be that the solutions to each of the above are obvious, and I'm sure I'll get there eventually - but any help on any of them really appreciated.

rj88 · December 29, 2022, 8:24pm

ggiraph (having to do this due to new users being limited on embedding things)

rj88 · December 29, 2022, 8:24pm

ggplotly (having to do this due to new users being limited on embedding things)

system · January 19, 2023, 8:24pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.