I have some data that shows numerical values for a few groups, all in 1 long column. Below is a small sample of what it looks like:
library(tibble)
df <- tribble(
~value,
"group a",
1,
12,
56,
17,
24,
"group b",
23,
1,
5,
"group c",
76,
55,
89,
2,
20,
"group d",
50,
23,
44,
39
)
What i want to end up with is as follows:
tribble(
~value, ~name,
1, "group a",
12, "group a",
56, "group a",
17, "group a",
24, "group a",
23, "group b",
1, "group b",
5, "group b",
76, "group c",
55, "group c"
# and so on....
)
Below are my attempts to tidy this data:
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)
library(purrr)
df_rows <- df %>% rownames_to_column()
groups <- df_rows %>% select(value) %>% str_extract_all("group [a-z]") %>% as_vector()
# find the rows where data for certain group begins and ends:
df_length <- df_rows %>%
filter(value %in% groups) %>%
mutate(
rowpos = as.numeric(rowname)) %>%
select(-rowname) %>%
rownames_to_column() %>%
mutate(
group_start = rowpos + 1,
group_end = lead(rowpos) - 1,
group_end = case_when(
is.na(group_end) ~ 21, # manually fill in the 'end row position' for the last group...
TRUE ~ group_end
)) %>%
select(-rowpos, -rowname)
# Now, I have the row number for where the data for a specific group starts and ends. Along with the original # df with row numbers as well.
# Next, slice() by the group_start and group_end that I just created, then mutate() the group name
df_rows %>%
slice(df_length$group_start[1]:df_length$group_end[1]) %>%
mutate(groupname = "Group A")
df_rows %>%
slice(df_length$group_start[2]:df_length$group_end[2]) %>%
mutate(groupname = "Group B")
# do the same for all the other groups
and then I can just rbind() them all to get the solution that I showed above but it's not very efficient...
I was thinking of using a for loop or map2 but I haven't been successful. Any help with simplifying the above or getting the result in a different way would be appreciated!