Merge sub-topic groupings with distinct codes?

amelio · January 16, 2023, 4:05pm

I am trying to merge sub-topic grouping (see below) with a data set containing observations with distinct ICD-10 codes, each of which falls within the range that is indicates in the sub-topic groupings. I could certainly manually merge but if I can avoid it... would much rather do that. Any guidance on the right commands merge the below sub-topic groupings to the data set with distinct ICD-10 codes (that data set is ~65,000 observations)
Thank you in advance!

AlexisW · January 17, 2023, 3:53am

It's not totally clear to me what you're trying to do, it would help if you could provide a reproducible example. Especially for people like me who have no idea what ICD-10 is.

My best guess is that you have a reference table with ICD codes and their descriptions. Let's create one:

library(tidyverse)

icd_reference <- tribble(~column,
  "V80-V89 Other land",
  "V90-V94 Water transport")

And that you want to process this table to have one row per ICD number in the range. First, we need to separate the codes from their description:

icd_reference |>
  separate(col = column,
           into = c("start_cat","end_cat", "description"),
           sep = "[- ]",
           extra = "merge")
#> # A tibble: 2 × 3
#>   start_cat end_cat description    
#>   <chr>     <chr>   <chr>          
#> 1 V80       V89     Other land     
#> 2 V90       V94     Water transport

If you're loading that table from a file, it's easier to make sure the columns are read properly in the first place so the previous step is unnecessary.

We now have the start and end of each category range as a character, we need to make it numerical so we can do range computations. For this I define an accessory function:

name_to_number <- function(names){
  names |>
    str_replace("V", "") |>
    as.integer()
}


icd_reference |>
  separate(col = column,
           into = c("start_cat","end_cat", "description"),
           sep = "[- ]",
           extra = "merge") |>
  mutate(start_as_num = name_to_number(start_cat),
         end_as_num = name_to_number(end_cat))
#> # A tibble: 2 × 5
#>   start_cat end_cat description     start_as_num end_as_num
#>   <chr>     <chr>   <chr>                  <int>      <int>
#> 1 V80       V89     Other land                80         89
#> 2 V90       V94     Water transport           90         94

Now that we have the start and end as numerical, we can use them in seq() to generate the whole range. I'll use map2(), which can take both vectors in parallel, and return a list:

icd_reference |>
  separate(col = column,
           into = c("start_cat","end_cat", "description"),
           sep = "[- ]",
           extra = "merge") |>
  mutate(start_as_num = name_to_number(start_cat),
         end_as_num = name_to_number(end_cat),
         icd_category = map2(start_as_num, end_as_num,
                               ~ seq(from = .x, to = .y)))
#> # A tibble: 2 × 6
#>   start_cat end_cat description     start_as_num end_as_num icd_category
#>   <chr>     <chr>   <chr>                  <int>      <int> <list>      
#> 1 V80       V89     Other land                80         89 <int [10]>  
#> 2 V90       V94     Water transport           90         94 <int [5]>

So we have this slightly awkward list-column, and several columns that are not useful anymore. I will select() the columns we still need, and use unnest() to expand the list-column into a much longer column:

icd_reference |>
  separate(col = column,
           into = c("start_cat","end_cat", "description"),
           sep = "[- ]",
           extra = "merge") |>
  mutate(start_as_num = name_to_number(start_cat),
         end_as_num = name_to_number(end_cat),
         icd_category = map2(start_as_num, end_as_num,
                               ~ seq(from = .x, to = .y))) |>
  select(icd_category, description) |>
  unnest(icd_category)
#> # A tibble: 15 × 2
#>    icd_category description    
#>           <int> <chr>          
#>  1           80 Other land     
#>  2           81 Other land     
#>  3           82 Other land     
#>  4           83 Other land     
#>  5           84 Other land     
#>  6           85 Other land     
#>  7           86 Other land     
#>  8           87 Other land     
#>  9           88 Other land     
#> 10           89 Other land     
#> 11           90 Water transport
#> 12           91 Water transport
#> 13           92 Water transport
#> 14           93 Water transport
#> 15           94 Water transport

Done! We can save that in a variable, so we can reuse it later:

table_icd_reference <- icd_reference |>
  separate(col = column,
           into = c("start_cat","end_cat", "description"),
           sep = "[- ]",
           extra = "merge") |>
  mutate(start_as_num = name_to_number(start_cat),
         end_as_num = name_to_number(end_cat),
         icd_category = map2(start_as_num, end_as_num,
                               ~ seq(from = .x, to = .y))) |>
  select(icd_category, description) |>
  unnest(icd_category)

Now if you have another dataframe that contains categories in its own column, you can use a left_join() to combine them:

my_other_data <- tibble(subject_id = LETTERS[1:3],
                        icd_category = c(83, 89,91))
my_other_data
#> # A tibble: 3 × 2
#>   subject_id icd_category
#>   <chr>             <dbl>
#> 1 A                    83
#> 2 B                    89
#> 3 C                    91

left_join(my_other_data,
          table_icd_reference,
          by = "icd_category")
#> # A tibble: 3 × 3
#>   subject_id icd_category description    
#>   <chr>             <dbl> <chr>          
#> 1 A                    83 Other land     
#> 2 B                    89 Other land     
#> 3 C                    91 Water transport

^{Created on 2023-01-16 by the reprex package (v2.0.1)}

system · February 7, 2023, 3:54am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.