Sorting out timebased data in Excel from multiple rows into a single row (into a new table)

Dear You who are reading this,

I'm a medical student and I'm not really good with Excel. For my thesis, I have a big database and now I reached a problem where there could be an easy way but on my own, I can only copy&paste some thousands of rows of data.

I have a database that is a list of selected patients with all their ED visits. Every single diagnosis (text format and BNO code) is written in a new row in Excel. Every patient has one row (at least) where they have a COVID-19 related BNO (three different letter&number value covers this, right now it's formatted so for manual copy&paste, they are easy to spot).

My job now is to get the row of their COVID-19 related visit into a new Excel table - but to the same row. I need to put all the other diagnoses that they might have had during the same visit (on the 1st picture, this is the red dotted area). They may have no additional rows with diagnoses or they may have 2-5 more rows even (hence the free space on the 2nd picture, some patients has that area filled up).

Then still in the same row in the new table, comes the part of their 2nd visit (the green dotted area on the 1st picture). This can be one diagnosis aka one row or once again, multiple ones from the first table that needs to be copied.

The 2nd picture shows what the singular row would look like in the new Excel sheet. (The blurred area is the patient's name (1st column in both tables, the 2nd table only contains them once obviously) or their unique identifying numbers).

Some patients have e.g. 20 rows in the first table, and out of these I only need 1.) the rows related to the COVID-19 BNO (which is on the same date - date is in the 4th column of the copied area in the 1st picture) then 2.) their next visit (so all diagnoses that they have with the next possible date. Some patients have 2 different dates with COVID-19 BNO, in this case, I need the one where there is another, non-COVID-19 BNO ED visit date following the COVID-19 related visit (if there are multiple like this, then the first one).

The example pictures:


(as I'm a new user, I can only upload one picture, the upper part would be the first and the down part the 2nd one.)

My counsellor advised me to ask this community, I hope my question is written in the correct way. I'm new here, I'm sorry if I wrote something badly. I can do this manually but if there could be an easier way then I'd like to try, thank you to anyone who spares a thought on this matter!

Are you looking for an answer using R? It is certainly possible to load excel data into R, perform calculations and reshaping in R, and save the output back into Excel, but I'm not sure if that's what you want.

There are a few steps involved here, each of which has many tutorials online.

Here's some fake data:

df <- data.frame(patient = c(1, 1, 1, 2, 3),
                 date = seq.Date(as.Date("2020-12-01"), by = -1, length.out = 5),
                 diagnosis = LETTERS[1:5])

  patient       date diagnosis
1       1 2020-12-05         A
2       1 2020-12-04         B
3       1 2020-12-03         C
4       2 2020-12-02         D
5       3 2020-12-01         E

Assuming you first install R, install RStudio, install the tidyverse package, and load the excel data in, you could run this R code to reshape the table. This says that, "start from the data frame we've saved called df, then (ie %>%) group the data by patient, then arrange the data by patient and date, then add a variable called visit_number that records which row we're on within each patient, then ungroup the data, then reshape it to a wider shape using the visit_number as the column name, prefixed with visit_, and using the diagnosis and date as the data.

library(tidyverse)
df %>%
  group_by(patient) %>%
  arrange(patient, date) %>%
  mutate(visit_number = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = visit_number,
              names_prefix = "visit_",
              values_from = c(diagnosis, date))

That produces, which I think is getting towards what you're looking for.


# A tibble: 3 x 7
  patient diagnosis_visit_1 diagnosis_visit_2 diagnosis_visit_3 date_visit_1 date_visit_2 date_visit_3
    <dbl> <chr>             <chr>             <chr>             <date>       <date>       <date>      
1       1 C                 B                 A                 2020-12-03   2020-12-04   2020-12-05  
2       2 D                 NA                NA                2020-12-02   NA           NA          
3       3 E                 NA                NA                2020-12-01   NA           NA

Then you could save to excel.

It's a bunch of steps, but all of them individually are well documented online.

Thank you so much for the detailed answer and help!! I will try to work with this!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.