How to combine like dates in the same column

barobbie · November 19, 2021, 7:06pm

Hi everyone I am fairly new here. I m having trouble figuring out how to combine the same dates in the same column.

df_merge <-data.frame (
Id=c(123,124,125,124,123),
date=c(2016-04-27, 2016-04-12 ,2016-04-20, 2016-04-12, 2016-04-27),
wp= c(n/a ,n/a, 15,22,13),
tr=c(1,2,1,2,1)
tm=c(25,14, n/a ,n/a ,n/a))

What I am trying to do is combine the dates and the Id's that are the same in hopes it will project as the following

df_merge (
Id=c(123,124,125),
date=c(2016-04-27, 2016-04-12 ,2016-04-20)
wp= c(13,22,15)
tr=c(1,2,1)
tm=c(25,14, n/a))

***Sorry for the mistake I made the correction for tm from 12 to n/a
Is there a way for me to achieve this? Any help would be appreciated.

FJCC · November 19, 2021, 7:57pm

Here is a solution that relies on wp, tr, and tm having one correct value and all other values being NA. I substituted NA where you had written n/a. Note that the value for tm on 2016-04-20 is not 12 as you have it in your example. I don't see how you got that 12.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df_merge <-data.frame (
  Id=c(123,124,125,124,123),
  date=c("2016-04-27", "2016-04-12" ,"2016-04-20", "2016-04-12", "2016-04-27"),
  wp= c(NA ,NA, 15,22,13),
  tr=c(1,2,1,2,1),
  tm=c(25,14, NA ,NA ,NA))
df_merge
#>    Id       date wp tr tm
#> 1 123 2016-04-27 NA  1 25
#> 2 124 2016-04-12 NA  2 14
#> 3 125 2016-04-20 15  1 NA
#> 4 124 2016-04-12 22  2 NA
#> 5 123 2016-04-27 13  1 NA
df_merge <- df_merge |> group_by(date,Id) |> 
  summarize(wp=mean(wp,na.rm=TRUE),
            tr=mean(tr,na.rm=TRUE),
            tm=mean(tm,na.rm=TRUE))
#> `summarise()` has grouped output by 'date'. You can override using the `.groups` argument.
df_merge
#> # A tibble: 3 x 5
#> # Groups:   date [3]
#>   date          Id    wp    tr    tm
#>   <chr>      <dbl> <dbl> <dbl> <dbl>
#> 1 2016-04-12   124    22     2    14
#> 2 2016-04-20   125    15     1   NaN
#> 3 2016-04-27   123    13     1    25

^{Created on 2021-11-19 by the reprex package (v2.0.1)}

barobbie · November 24, 2021, 3:28am

Hi thanks for your reply, I really do appreciate it. Sorry for the confusion on the value of tm. I corrected it.
I want to understand something, using the summarize as you stated here would not provide me with means for wp,tr ,tm but in fact, allow me to combine like dates and Ids using group_by.
Excuse me if this sounds like basic stuff. I am learning R for the first time. The only reason I asked is that when I tried this just like you provided above my results were quite different. My results were of the mean of the columns and not the merging of the same Ids and dates.

FJCC · November 24, 2021, 4:57am

It is very difficult to say what is going on with your code without seeing it. Are you working with the data frame you showed in your first post? If not, please post that data. Also post your code. Both the code and the data will display well if you put a line with three back ticks just before and after them, like this
```
Your code here.
```

barobbie · November 24, 2021, 6:50am

They are the same. The original has about 408 lines give or take.
Some Ids contain both the same dates and different ones. My goal was to only combine the ones with the same dates. Here are the first 12 lines.




      Id             date              WP                   TR           TM
       <dbl>        <date>            <dbl>               <dbl>        <dbl>
1     0366        2016-04-12           96                    Na          NA
2     0366         2016-04-12           NA                   2           34
3     3960         2016-04-15           86                   Na          NA
4     3960         2016-04-15           NA                   2          40
5    1503         2016-04-17           NA                    1           70
6    1503         2016-04-17          25                     NA           NA 
7    1503          2016-05-12          NA                     2           70
8    1503          2016-05-17          92                     NA           NA
 9    0366         2016-05-16           81                    NA           NA
10   0366         2016-05-11           NA                      1         28    
11    3960          2016-05- 08          NA                    2         35   
12   3960          2016-05-08           75                     NA         NA

notimesw   %>%  group_by(date,Id)    %>% 
  summarize(WP=mean(WP,na.rm=TRUE),
            TR=mean(TR,na.rm=TRUE),
            TM=mean(TM,na.rm=TRUE))

Which provided me with

       WP                TR                       TM
1     58.8118          1.118644                 41.4673

I also tried.

aggregate(notimesw[-1], list(notimesw$Id), FUN = mean, na.rm = TRUE)

Which did combine them correctly but the other dates are missing

Group.1            date                WP               TR                 TM
1  10366        2016-04-27             96.31          1.080000           36.2800
2 3960          2016-05-02              85.2          1.000000            94.0000
3 1503          2016-04-25              NaN             1.000000           52.0000

Hopefully, I explained it a little better. Thanks again for your help. It is greatly appreciated.

FJCC · November 24, 2021, 2:51pm

Here is what I get running the data and code you posted. The only change I made to the data was to change two Na value to NA in the TR column. Note that the Id 0366 was read as a number and it displays as 366. I did not bother to change that since it is harmless for the purpose of this example. You should be able to copy the code below and run it, getting the same result.

library(dplyr)
notimesw <- structure(list(Id = c(366L, 366L, 3960L, 3960L, 1503L, 1503L, 
                                  1503L, 1503L, 366L, 366L, 3960L, 3960L), 
                           date = c("2016-04-12", 
                                    "2016-04-12", "2016-04-15", "2016-04-15", "2016-04-17", "2016-04-17", 
                                    "2016-05-12", "2016-05-17", "2016-05-16", "2016-05-11", "2016-05-08", 
                                    "2016-05-08"), 
                           WP = c(96L, NA, 86L, NA, NA, 25L, NA, 92L, 81L, 
                                  NA, NA, 75L), 
                           TR = c(NA, 2L, NA, 2L, 1L, NA, 2L, NA, NA, 1L, 
                                  2L, NA), 
                           TM = c(NA, 34L, NA, 40L, 70L, NA, 70L, NA, NA, 28L, 
                                  35L, NA)), 
                      class = "data.frame", row.names = c(NA, -12L))
notimesw
#>      Id       date WP TR TM
#> 1   366 2016-04-12 96 NA NA
#> 2   366 2016-04-12 NA  2 34
#> 3  3960 2016-04-15 86 NA NA
#> 4  3960 2016-04-15 NA  2 40
#> 5  1503 2016-04-17 NA  1 70
#> 6  1503 2016-04-17 25 NA NA
#> 7  1503 2016-05-12 NA  2 70
#> 8  1503 2016-05-17 92 NA NA
#> 9   366 2016-05-16 81 NA NA
#> 10  366 2016-05-11 NA  1 28
#> 11 3960 2016-05-08 NA  2 35
#> 12 3960 2016-05-08 75 NA NA
notimesw  %>%  group_by(date,Id)    %>% 
  summarize(WP=mean(WP,na.rm=TRUE),
            TR=mean(TR,na.rm=TRUE),
            TM=mean(TM,na.rm=TRUE))
#> `summarise()` has grouped output by 'date'. You can override using the `.groups` argument.
#> # A tibble: 8 x 5
#> # Groups:   date [8]
#>   date          Id    WP    TR    TM
#>   <chr>      <int> <dbl> <dbl> <dbl>
#> 1 2016-04-12   366    96     2    34
#> 2 2016-04-15  3960    86     2    40
#> 3 2016-04-17  1503    25     1    70
#> 4 2016-05-08  3960    75     2    35
#> 5 2016-05-11   366   NaN     1    28
#> 6 2016-05-12  1503   NaN     2    70
#> 7 2016-05-16   366    81   NaN   NaN
#> 8 2016-05-17  1503    92   NaN   NaN

^{Created on 2021-11-24 by the reprex package (v2.0.1)}

barobbie · November 24, 2021, 9:41pm

@FJCC Thank you . The code you provided works it's just that I needed to switch out summarize for dplyr::summarize.
I loaded the old plyr package after loading dplyr and ignored the warnings about how this can cause problems.

system · December 1, 2021, 9:42pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.