Joining Datasets - what am I doing wrong?

marathoneer · November 18, 2021, 5:42pm

Hi all,

I'm trying to join two datasets from Public Health Scotland

There's a large dataset of hospital admissions here
and a smaller dataset of patient deaths here. I want to join the two so I can compare deaths as a proportion of admissions over time. However, I can't work out how to do this without causing lots of duplication of rows. As all deaths are a subset of admissions, I thought left joining deaths to admissions would be the most logical but that doesn't seem to work. I'd really welcome any assistance. Very many thanks.

Code is

library(tidyverse)

admissions <- read_csv("https://www.opendata.nhs.scot/dataset/b0135993-3d8a-4f3b-afcf-e01f4d52137c/resource/aee43295-2a13-48f6-bf05-92769ca7c6cf/download/ui_admissions_2021.csv")
deaths <- read_csv("https://www.opendata.nhs.scot/dataset/b0135993-3d8a-4f3b-afcf-e01f4d52137c/resource/89807e07-fc5f-4b5e-a077-e4cf59491139/download/ui_deaths_2021.csv")

combined <- admissions %>% left_join(deaths)

nirgrahamuk · November 18, 2021, 5:48pm

knowing the contents of the data is generally going to be important.
you are using left_join without specificying the join varaibles to use.
There are many common variable names across the two.

 [1] "HBR"              "HBRQF"            "CA"               "CAQF"             "AgeGroup"        
 [6] "AgeGroupQF"       "Sex"              "SexQF"            "InjuryLocation"   "InjuryLocationQF"
[11] "InjuryType"       "InjuryTypeQF"

Is there a variable that represents a unique patient ?

system · December 9, 2021, 5:49pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.