Add rows based on a condition

dsgeek · May 17, 2019, 4:07pm

I am trying to add rows to my data set based on certain conditions. Let us say we have two variables person and person_count, where:

person <- c('a','b','c')
person_count <- c(2,3,5)
data <- data.frame(person,person_count)

I want to duplicate the rows for every person based on the person count.
If person_count == 5, then add no duplicate rows for c
If person_count == 3, then add two rows for b
If person_count == 2, then add three rows for a
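
The rule above (repeat each row until the person would have five in total) can be sketched in base R with `rep()`. This is only an illustration of the stated logic, not necessarily what the replies suggested; the names `target`, `extra` and `expanded` are made up for the example.

```r
person <- c("a", "b", "c")
person_count <- c(2, 3, 5)
data <- data.frame(person, person_count)

# Assumed goal: five records per person, so the number of
# extra copies of each row is 5 minus the current count.
target <- 5
extra <- target - data$person_count   # 3, 2, 0

# Repeat each row index once for the original plus once per extra copy.
expanded <- data[rep(seq_len(nrow(data)), times = 1 + extra), ]
rownames(expanded) <- NULL
```

With the sample data this keeps one row for c, three for b and four for a.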

dsgeek · May 17, 2019, 5:03pm

I appreciate your reply and thank you for your welcome! I also apologize if I wasn't clear enough in my first post, but please bear with me.

I am trying to apply what you have given to my data set, which is more complicated, so I am getting an error:
Column 'temp' must be length 1 (the group size), not 8

So, I'll make my question more relevant to my data set if you don't mind.

In my data set I have columns: person, id, day, person_count. The ultimate goal is to have every person have 5 rows -- one for every weekday (mon,tue,...,fri)

So my logic is: if someone has only one record, say (mon), he/she will have a person_count of 1, so add 4 more rows, one for each of (tue,wed,thrs,frid). But if a person has 2 records (one for mon and one for wed), add 3 more records for the remaining days of the week (i.e. tue, thrs, frid). If someone has a person_count of 5, then there is no problem there, don't add anything. Last but not least, there are cases where a person has a record, but no day is shown. It simply shows as an NA.

In this case, add 5 more rows, one for each day (I'll delete the 6th row with NA later), since again the ultimate goal is to have every person have 5 rows, one for every day. Or simply substitute the NA with one of the days and add four more rows for the remaining days.

Does that help? If not, is there anything I can do to make things easier?

mrblobby · May 17, 2019, 5:21pm

Hi @dsgeek, based on my interpretation of your description, maybe you could not bother with person_count at all, and instead use complete (https://tidyr.tidyverse.org/reference/complete.html):

library(tidyverse)
data <-
tibble::tribble(
    ~name, ~id,    ~day,
    "Tom",   1,   "Mon",
    "Tom",   1,   "Tue",
    "Tom",   1,  "Weds",
    "Tom",   1, "Thurs",
    "Tom",   1,   "Fri",
   "Pick",   2,   "Mon",
   "Pick",   2,   "Tue",
   "Pick",   2,  "Weds",
   "Pick",   2, "Thurs",
  "Harry",   3,   "Mon"
  )


data %>%
  complete(day, nesting(name)) %>%
  arrange(name) 
#> # A tibble: 15 x 3
#>    day   name     id
#>    <chr> <chr> <dbl>
#>  1 Fri   Harry    NA
#>  2 Mon   Harry     3
#>  3 Thurs Harry    NA
#>  4 Tue   Harry    NA
#>  5 Weds  Harry    NA
#>  6 Fri   Pick     NA
#>  7 Mon   Pick      2
#>  8 Thurs Pick      2
#>  9 Tue   Pick      2
#> 10 Weds  Pick      2
#> 11 Fri   Tom       1
#> 12 Mon   Tom       1
#> 13 Thurs Tom       1
#> 14 Tue   Tom       1
#> 15 Weds  Tom       1

Created on 2019-05-17 by the reprex package (v0.2.1)

Edit: You can use the following if you don't want NAs for other columns (like id) when you complete:

data %>%
  complete(day, nesting(name, id)) %>%
  arrange(name)
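
For the case mentioned earlier where a person's only record has NA for day, one possible variation is to pass the weekdays to complete() explicitly, so days that never occur in the data are still filled in, and then drop the leftover NA row. This is a sketch under assumptions: the sample rows and the `all_days` vector are made up for illustration, and the day labels follow the example above.

```r
library(dplyr)
library(tidyr)

# Assumed full set of weekdays, in calendar order.
all_days <- c("Mon", "Tue", "Weds", "Thurs", "Fri")

# Hypothetical sample: Harry has a record but no day.
data <- tibble::tribble(
   ~name, ~id,          ~day,
   "Tom",   1,         "Mon",
   "Tom",   1,        "Weds",
  "Harry",  3, NA_character_
)

result <- data %>%
  # One row per (name, id) and weekday; the original NA-day row is kept too.
  complete(nesting(name, id), day = all_days) %>%
  # Drop the leftover record that has no day.
  filter(!is.na(day)) %>%
  # Sort days in calendar order rather than alphabetically.
  mutate(day = factor(day, levels = all_days)) %>%
  arrange(name, day)
```

Each person should end up with exactly five rows here. Any other columns on the newly added rows will be NA unless they are included in nesting().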

dsgeek · May 17, 2019, 7:35pm

Very helpful. That works. Thank you very much, mrblobby!

dsgeek · May 17, 2019, 7:36pm

Thank you for your help!
