Combine two rows into 1 row

data current have

id<-c(234, 234, 256, 256, 264, 264, 290, 290)

event_date<-c("2023-11-01", "2023-11-01", "2023-11-01", "2023-11-01", "2023-11-03", "2023-11-03","2023-11-03","2023-11-03")

bp_codes <- c("001", "002", "001", "002", "001", "002", "001", "002")

code_desc <- c("systolic_bp", "diastolic_bp", "systolic_bp", "diastolic_bp", "systolic_bp", "diastolic_bp", "systolic_bp", "diastolic_bp")

BP_values1 <- c(170, "NA", 135, "NA", 150, "NA", 145, "NA")

BP_Values2 <- c("NA", "100", "NA", "85", "NA", "95", "NA", "80")

bp_data <- data.frame(id, event_date,bp_codes,code_desc,BP_values1, BP_Values2)

rm(bp_data2)

desired results is to combine roles with same ID and identical event_date creating 2 new columns sbp (BP_values labelled systolic_bp) and dbp (BP_values labelled diastolic_bp)

id2<-c(234, 256, 264, 290)

event_date2<-c("2023-11-01", "2023-11-01", "2023-11-03","2023-11-03")

bp_codes2 <- c("001-002", "001-002", "001-002", "001-002")

code_desc2 <- c("bp_reading", "bp_reading", "bp_reading","bp_reading")

sbp <- c(170, 135, 150, 145)

dbp <- c("100", "85", "95", "80")

bp_data2 <- data.frame(id2, event_date2,bp_codes2,code_desc2,sbp,dbp)

hi @oseiasibey.ow,

Is this what you are looking for You can rename the columns

bp_data <- bp_data |>
        dplyr::mutate(bpv = ifelse(bp_codes == "001", BP_values1, BP_Values2)) |>
        dplyr::select(id, event_date, code_desc, bpv) |>
        tidyr::pivot_wider(names_from = "code_desc", values_from = "bpv")

> # A tibble: 4 × 4
>      id event_date systolic_bp diastolic_bp
>   <dbl> <chr>      <chr>       <chr>       
> 1   234 2023-11-01 170         100         
> 2   256 2023-11-01 135         85          
> 3   264 2023-11-03 150         95          
> 4   290 2023-11-03 145         80   

JW

1 Like

So I had a warning message which makes sense because there were wee some event_date with same ID with multiple entries. So abt 2 or 3 bp recordings for for same ID on same event date.

Warning message:
Values from bpv are not uniquely identified; output will contain list-cols.
• Use values_fn = list to suppress this warning.
• Use values_fn = {summary_fun} to summarise duplicates.
• Use the following dplyr code to identify duplicates.
{data} %>%
dplyr::group_by(n_eid_14631, event_dt, TERMV2_DESC) %>%
dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
dplyr::filter(n > 1L)

How do I go about it, thank you

id2<-c(234, 256, 264, 290)

event_date2<-c("2023-11-01", "2023-11-01", "2023-11-03","2023-11-03")

bp_codes2 <- c("001-002", "001-002", "001-002", "001-002")

code_desc2 <- c("bp_reading", "bp_reading", "bp_reading","bp_reading")

sbp <- c(170, "135" "140" "142", 150, 145)

dbp <- c("100", "85" "90" "87", "95", "80")

bp_data2 <- data.frame(id2, event_date2,bp_codes2,code_desc2,sbp,dbp)

Your code raises errors when I try to run it — could you correct it and paste it back here, between a pair of triple backticks like this?

```r
[<-- paste here]
```

##This is a similar data I#m working with
id<-c(234, 234,234, 234, 256, 256, 264, 264, 290, 290)
event_date<-c("2023-11-01", "2023-11-01", "2023-11-01", "2023-11-01", "2023-11-01", "2023-11-01", "2023-11-03", "2023-11-03","2023-11-03","2023-11-03")
bp_codes <- c("001", "002", "001", "002","001", "002", "001", "002", "001", "002")
code_desc <- c("systolic_bp", "diastolic_bp","systolic_bp", "diastolic_bp", "systolic_bp", "diastolic_bp", "systolic_bp", "diastolic_bp", "systolic_bp", "diastolic_bp")
BP_values1 <- c(170, "NA", 170, "NA", 135, "NA", 150, "NA", 145, "NA")
BP_Values2 <- c("NA", "100", "NA", "100", "NA", "85", "NA", "95", "NA", "80")
bp_data <- data.frame(id, event_date,bp_codes,code_desc,BP_values1, BP_Values2)

I had this script to combine the rows with same id and same event_date removing the NA with systolic_bp (codes 001) and replacing with the diastolic_bp values

bp_data2 <- bp_data |>
dplyr::mutate(bpv = ifelse(bp_codes == "001", BP_values1, BP_Values2)) |>
dplyr::select(id, event_date, code_desc, bpv) |>
tidyr::pivot_wider(names_from = "code_desc", values_from = "bpv")

This script worked very well until hee wee multiple bps checked same id on the same event_date having this outcome c("170", "170") c("100", "100")

I want to either maintain the rows for those incidence or maintain one or find the average of them. In some cases there 3 records

I have another question: What format does your original data come in? And how do the NA values appear there? It may change the suggested solution.

Is a longitudinal data of a cohort studies with patients visiting multiple times to the hospital. The systolic bp and diastolic bps were like the bp_codes were all recorded in a single columns identified by their codes. I separated them into wo different columns and in so doing the NAs appeared.

So did it originally look like this?

#> # A tibble: 10 × 4
#>       id event_date code_desc    value
#>    <dbl> <chr>      <chr>        <chr>
#>  1   234 2023-11-01 systolic_bp  170  
#>  2   234 2023-11-01 diastolic_bp 100  
#>  3   234 2023-11-01 systolic_bp  170  
#>  4   234 2023-11-01 diastolic_bp 100  
#>  5   256 2023-11-01 systolic_bp  135  
#>  6   256 2023-11-01 diastolic_bp 85   
#>  7   264 2023-11-03 systolic_bp  150  
#>  8   264 2023-11-03 diastolic_bp 95   
#>  9   290 2023-11-03 systolic_bp  145  
#> 10   290 2023-11-03 diastolic_bp 80

Created on 2023-11-05 with reprex v2.0.2

And how did you import the data into RStudio? Which command did you use?

The data Ive showed is a hypothetical data, the actual data is abt a million plus. Its on our server and I used the read_dta to import it to RSudio

Yes, but does the imported data yield a table with the same shape and columns immediately after being imported? If you could perform the following steps exactly as indicated below, that would be very helpful:

  1. Use read_dta() to import your data, and assign the result to bp_data.
  2. Execute the following lines of code, exactly as written:
# direct output to new file called "for_posit.txt"
sink("for_posit.txt")
# write the contents of the first 10 rows of bp_data to "for_posit.txt"
bp_data |>            
    dplyr::slice(1:10) |> 
    dput()
# redirect output to RStudio
sink()
  1. Open the file "for_posit.txt" and copy its contents.
  2. Paste its contents here, between a pair of triple backticks (```) like this:
```r
[<-- paste here]
```

Using this method for sharing a sample of your data makes it much easier for folks on this site to help you.

Hi @oseiasibey.ow, I thought I'd check again a last time to see if you're still interested in a solution that works immediately on the output of read_dta() rather than require that you create NA's that you later have to remove. If so, please supply a sample of the output of read_dta() in the manner I described above.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.