I have the following example data set from a survey about injuries and which side they occurred on. Each injury has its own column (Lat_Ankle_Sprain, Knee_Sprain, Shoulder_Sprain), named with an abbreviation for easier coding, and takes the value "Left", "Right", "Both", or "NA". Respondents then report which of these injuries caused restricted sport participation (Injury_RestrictedParticipation). One, several, or none of the injuries may be listed there. The injuries listed in Injury_RestrictedParticipation are not abbreviated and are separated by commas, although some injury names contain commas themselves (see the code chunk below):
library(tidyverse)
library(stringr)
df<- structure(list(Lat_Ankle_Sprain = c("Left", "Right", "Left",
"Left", "NA", "NA"), Knee_Sprain = c("Right", "Right", "Left",
"Both", "NA", "Left"), Shoulder_Sprain = c("Right", "Left", "NA",
"Right", "Right", "Right"), Injury_RestrictedParticipation = c("Lateral Ankle Sprain (i.e., inversion sprain),Knee Sprain (or ligament tear)",
"Knee Sprain (or ligament tear),Lateral Ankle Sprain (i.e., inversion sprain)",
"Knee Sprain (or ligament tear)", "Knee Sprain (or ligament tear),Shoulder or Arm Strain (e.g., rotator cuff, biceps),Lateral Ankle Sprain (i.e., inversion sprain)",
"NA", "Shoulder or Arm Strain (e.g., rotator cuff, biceps)")), class = "data.frame", row.names = c(NA,
-6L))
abbreviations <- c("Lateral Ankle Sprain (i.e., inversion sprain)" = "Lat_Ankle_Sprain",
"Knee Sprain (or ligament tear)" = "Knee_Sprain",
"Shoulder or Arm Strain (e.g., rotator cuff, biceps)" = "Shoulder_Sprain")
The goal is to determine which side the primary injury occurred on and store it in a column called "Primary_Injury_Side".
To do this, I want to take the list of injuries in Injury_RestrictedParticipation, find the corresponding columns, and look up which side each injury occurred on. If all the listed injuries occurred on the same side (e.g., Left), then that side is the Primary_Injury_Side (e.g., Left). If multiple injuries are listed and they occurred on different sides, Primary_Injury_Side should be "Both". If no injuries are listed, Primary_Injury_Side should be "None".
My current attempt, which is not quite working, is below.
get_primary_side <- function(injury_names) {
  # Splits the injury list by comma, but is meant to ignore cases where a comma
  # is used after e.g. or i.e. within the injury name
  injuries <- strsplit(as.character(injury_names), "(?<=\\.(?:i\\.e\\.|e\\.g\\.)),", perl = TRUE)[[1]]
  side_values <- sapply(injuries, function(injury) get_side_for_injury(injury))
  unique_sides <- unique(side_values[!is.na(side_values)])
  if (length(unique_sides) == 1) {
    return(as.character(unique_sides))
  } else {
    return("Both")
  }
}
get_side_for_injury <- function(injury) {
  side_value <- get_abbrev_column_names(injury)
  if (!is.na(side_value)) {
    return(side_value)
  } else {
    return(NA)
  }
}
get_abbrev_column_names <- function(injury_name) {
  return(abbreviations[injury_name])
}
Injury_df <- df %>%
  mutate(Primary_Injury_Side = case_when(
    is.na(Injury_RestrictedParticipation) ~ "None",
    TRUE ~ get_primary_side(Injury_RestrictedParticipation)))
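To make the intended rule concrete, here is a minimal base R sketch of the row-by-row logic I am after. It is only an illustration under some assumptions: the helper name primary_side_for_row and the small demo data frame are made up for this example, the string "NA" is treated as a missing answer, and each full injury name is detected with fixed-string matching rather than by splitting on commas. Unlike my attempt above, it looks up the side value stored in the matching column (not just the column name) and processes one row at a time.

```r
# Lookup from the full survey wording to the column name
# (same mapping as `abbreviations` in the question)
lookup <- c(
  "Lateral Ankle Sprain (i.e., inversion sprain)" = "Lat_Ankle_Sprain",
  "Knee Sprain (or ligament tear)" = "Knee_Sprain",
  "Shoulder or Arm Strain (e.g., rotator cuff, biceps)" = "Shoulder_Sprain"
)

# Made-up toy rows in the same shape as df; the string "NA" marks a missing answer
demo <- data.frame(
  Lat_Ankle_Sprain = c("Left", "Right", "NA"),
  Knee_Sprain = c("Right", "Right", "NA"),
  Shoulder_Sprain = c("Right", "Left", "Right"),
  Injury_RestrictedParticipation = c(
    "Lateral Ankle Sprain (i.e., inversion sprain),Knee Sprain (or ligament tear)",
    "Knee Sprain (or ligament tear),Lateral Ankle Sprain (i.e., inversion sprain)",
    "NA"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: computes the side for ONE row index
primary_side_for_row <- function(i, data, lookup) {
  text <- data$Injury_RestrictedParticipation[i]
  # Which full injury names appear in this row's text? (fixed = TRUE: no regex)
  listed <- vapply(names(lookup), function(nm) grepl(nm, text, fixed = TRUE), logical(1))
  cols <- unname(lookup[listed])
  # Side values recorded for the listed injuries in this row
  sides <- unlist(data[i, cols, drop = FALSE])
  sides <- unique(sides[!is.na(sides) & sides != "NA"])
  if (length(sides) == 0) "None" else if (length(sides) == 1) sides else "Both"
}

demo$Primary_Injury_Side <- vapply(
  seq_len(nrow(demo)), primary_side_for_row, character(1),
  data = demo, lookup = lookup
)
print(demo$Primary_Injury_Side)
```

With these toy rows, the first row lists injuries on different sides, the second lists two injuries on the same side, and the third lists no injury.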
A second aim is to create a column called LowerBody_Injury_Side that applies the same logic to Injury_RestrictedParticipation but ignores injuries associated with the upper body (here, Shoulder_Sprain).
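As an illustration of what I mean by ignoring the upper body, here is a small self-contained sketch. The names lower_body, toy, and lower_body_side are hypothetical and exist only for this example. It assumes that commas inside parentheses always belong to an injury name, so it splits only on commas outside parentheses, and it assumes a lookup restricted to the ankle and knee columns is an acceptable definition of "lower body".

```r
# Hypothetical lookup limited to lower-body injuries (full name -> column name)
lower_body <- c(
  "Lateral Ankle Sprain (i.e., inversion sprain)" = "Lat_Ankle_Sprain",
  "Knee Sprain (or ligament tear)" = "Knee_Sprain"
)

# Made-up toy rows in the same shape as df
toy <- data.frame(
  Lat_Ankle_Sprain = c("Left", "Left", "NA"),
  Knee_Sprain = c("Right", "Both", "Left"),
  Shoulder_Sprain = c("Right", "Right", "Right"),
  Injury_RestrictedParticipation = c(
    "Lateral Ankle Sprain (i.e., inversion sprain),Knee Sprain (or ligament tear)",
    "Shoulder or Arm Strain (e.g., rotator cuff, biceps),Lateral Ankle Sprain (i.e., inversion sprain)",
    "Shoulder or Arm Strain (e.g., rotator cuff, biceps)"
  ),
  stringsAsFactors = FALSE
)

lower_body_side <- function(i) {
  # Split on commas that are NOT followed by a closing parenthesis before any
  # other parenthesis, i.e. commas outside "( ... )"
  listed <- strsplit(toy$Injury_RestrictedParticipation[i],
                     ",(?![^()]*\\))", perl = TRUE)[[1]]
  # Names missing from the lookup (upper-body injuries, "NA") map to NA and are dropped
  cols <- lower_body[listed]
  cols <- unname(cols[!is.na(cols)])
  sides <- unlist(toy[i, cols, drop = FALSE])
  sides <- unique(sides[!is.na(sides) & sides != "NA"])
  if (length(sides) == 0) "None" else if (length(sides) == 1) sides else "Both"
}

toy$LowerBody_Injury_Side <- vapply(seq_len(nrow(toy)), lower_body_side, character(1))
print(toy$LowerBody_Injury_Side)
```

In the second toy row the shoulder injury is skipped, so only the ankle side counts. In the third row only a shoulder injury is listed, so the result is "None".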
Thank you in advance for your help on this! I am happy to provide any additional information that may be helpful.