I am trying to merge sub-topic grouping (see below) with a data set containing observations with distinct ICD-10 codes, each of which falls within the range that is indicates in the sub-topic groupings. I could certainly manually merge but if I can avoid it... would much rather do that. Any guidance on the right commands merge the below sub-topic groupings to the data set with distinct ICD-10 codes (that data set is ~65,000 observations)
Thank you in advance!
It's not totally clear to me what you're trying to do, it would help if you could provide a reproducible example. Especially for people like me who have no idea what ICD-10 is.
My best guess is that you have a reference table with ICD codes and their descriptions. Let's create one:
library(tidyverse)
icd_reference <- tribble(~column,
"V80-V89 Other land",
"V90-V94 Water transport")
And that you want to process this table to have one row per ICD number in the range. First, we need to separate the codes from their description:
icd_reference |>
separate(col = column,
into = c("start_cat","end_cat", "description"),
sep = "[- ]",
extra = "merge")
#> # A tibble: 2 × 3
#> start_cat end_cat description
#> <chr> <chr> <chr>
#> 1 V80 V89 Other land
#> 2 V90 V94 Water transport
If you're loading that table from a file, it's easier to make sure the columns are read properly in the first place so the previous step is unnecessary.
We now have the start and end of each category range as a character, we need to make it numerical so we can do range computations. For this I define an accessory function:
name_to_number <- function(names){
names |>
str_replace("V", "") |>
as.integer()
}
icd_reference |>
separate(col = column,
into = c("start_cat","end_cat", "description"),
sep = "[- ]",
extra = "merge") |>
mutate(start_as_num = name_to_number(start_cat),
end_as_num = name_to_number(end_cat))
#> # A tibble: 2 × 5
#> start_cat end_cat description start_as_num end_as_num
#> <chr> <chr> <chr> <int> <int>
#> 1 V80 V89 Other land 80 89
#> 2 V90 V94 Water transport 90 94
Now that we have the start and end as numerical, we can use them in seq()
to generate the whole range. I'll use map2()
, which can take both vectors in parallel, and return a list:
icd_reference |>
separate(col = column,
into = c("start_cat","end_cat", "description"),
sep = "[- ]",
extra = "merge") |>
mutate(start_as_num = name_to_number(start_cat),
end_as_num = name_to_number(end_cat),
icd_category = map2(start_as_num, end_as_num,
~ seq(from = .x, to = .y)))
#> # A tibble: 2 × 6
#> start_cat end_cat description start_as_num end_as_num icd_category
#> <chr> <chr> <chr> <int> <int> <list>
#> 1 V80 V89 Other land 80 89 <int [10]>
#> 2 V90 V94 Water transport 90 94 <int [5]>
So we have this slightly awkward list-column, and several columns that are not useful anymore. I will select()
the columns we still need, and use unnest()
to expand the list-column into a much longer column:
icd_reference |>
separate(col = column,
into = c("start_cat","end_cat", "description"),
sep = "[- ]",
extra = "merge") |>
mutate(start_as_num = name_to_number(start_cat),
end_as_num = name_to_number(end_cat),
icd_category = map2(start_as_num, end_as_num,
~ seq(from = .x, to = .y))) |>
select(icd_category, description) |>
unnest(icd_category)
#> # A tibble: 15 × 2
#> icd_category description
#> <int> <chr>
#> 1 80 Other land
#> 2 81 Other land
#> 3 82 Other land
#> 4 83 Other land
#> 5 84 Other land
#> 6 85 Other land
#> 7 86 Other land
#> 8 87 Other land
#> 9 88 Other land
#> 10 89 Other land
#> 11 90 Water transport
#> 12 91 Water transport
#> 13 92 Water transport
#> 14 93 Water transport
#> 15 94 Water transport
Done! We can save that in a variable, so we can reuse it later:
table_icd_reference <- icd_reference |>
separate(col = column,
into = c("start_cat","end_cat", "description"),
sep = "[- ]",
extra = "merge") |>
mutate(start_as_num = name_to_number(start_cat),
end_as_num = name_to_number(end_cat),
icd_category = map2(start_as_num, end_as_num,
~ seq(from = .x, to = .y))) |>
select(icd_category, description) |>
unnest(icd_category)
Now if you have another dataframe that contains categories in its own column, you can use a left_join()
to combine them:
my_other_data <- tibble(subject_id = LETTERS[1:3],
icd_category = c(83, 89,91))
my_other_data
#> # A tibble: 3 × 2
#> subject_id icd_category
#> <chr> <dbl>
#> 1 A 83
#> 2 B 89
#> 3 C 91
left_join(my_other_data,
table_icd_reference,
by = "icd_category")
#> # A tibble: 3 × 3
#> subject_id icd_category description
#> <chr> <dbl> <chr>
#> 1 A 83 Other land
#> 2 B 89 Other land
#> 3 C 91 Water transport
Created on 2023-01-16 by the reprex package (v2.0.1)
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.