Assign number for each subset of alphabetical rows

sgrplm · February 5, 2024, 3:59pm

I have a data frame with an empty column (e.g., X) and a full character column (e.g., Y).

I would like to add a number to the empty column's rows according to the order of the subset of alphabetical values in the character column.

For example, the first few rows in column Y start off with the letter A and the last row's string begins with the letter P. The next row in column Y begins with the letter C and the last row of that set begins with the letter M. The following row in column Y begins with the letter B and the set ends with a string that starts with the letter X. The dataframe has thousands of rows.

How can I add to column X the number 1 for the first set of rows (starting with A - P), 2 for the second set (starting with C-M), 3 for the third set (starting with B-X), and so forth every time the next row breaks the alphabetical order pattern?

example <- data.frame(
column_x <- rep("NA", 20),
column_y <- c("A ward", "A word", "B word", "C word", "D word", "P word", "C word", "K word", "L word", "M word","B word", "C word", "D ward", "D word", "K word", "P word", "X word", "A word", "K word", "X word" ))

print(example)

Thanks!

scottyd22 · February 5, 2024, 4:32pm

Hi @sgrplm. Below is one approach using the tidyverse.

library(tidyverse)

example |>
  mutate(letter = tolower(substr(column_y,1,1))) |>
  left_join(
    tibble(letter = letters, row = 1:26)
  ) |>
  mutate(column_x = case_when(
    row_number() == 1 | row < lag(row) ~ 1,
    TRUE ~ 0
  )) |>
  mutate(column_x = cumsum(column_x)) |>
  select(column_x, column_y)

#> Joining with `by = join_by(letter)`
#>    column_x column_y
#> 1         1   A ward
#> 2         1   A word
#> 3         1   B word
#> 4         1   C word
#> 5         1   D word
#> 6         1   P word
#> 7         2   C word
#> 8         2   K word
#> 9         2   L word
#> 10        2   M word
#> 11        3   B word
#> 12        3   C word
#> 13        3   D ward
#> 14        3   D word
#> 15        3   K word
#> 16        3   P word
#> 17        3   X word
#> 18        4   A word
#> 19        4   K word
#> 20        4   X word

Created on 2024-02-05 with reprex v2.0.2

system · February 21, 2024, 4:16pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.