Adjust date order in output table

The order of the output table is incorrect, how to adjust?

df1<- structure(
  list(date2= c("01-01-2022","01-01-2022","03-01-2021","03-01-2021","01-02-2021","01-02-2021"),
       Category= c("ABC","CDE","ABC","CDE","ABC","CDE"),
       coef= c(5,4,0,2,4,5)),
  class = "data.frame", row.names = c(NA, -6L))

x<-df1 %>%
  group_by(date2) %>%
  summarize(across("coef", sum),.groups = 'drop')%>% 
  arrange(date2 = as.Date(date2, format = "%d-%m-%y"))

> x
# A tibble: 3 x 2
  date2       coef
  <chr>      <dbl>
1 01-01-2022     9
2 03-01-2021     2
3 01-02-2021     9

This order is incorrect as 01-01-2022 would have to be last not first.

Using a separate mutate() step does the job:

library(tidyverse)

df1<- structure(
  list(date2= c("01-01-2022","01-01-2022","03-01-2021","03-01-2021","01-02-2021","01-02-2021"),
       Category= c("ABC","CDE","ABC","CDE","ABC","CDE"),
       coef= c(5,4,0,2,4,5)),
  class = "data.frame", row.names = c(NA, -6L))

x<-df1 %>%
  group_by(date2) %>%
  summarize(across("coef", sum),.groups = 'drop')%>% 
  mutate(date2 = as.Date(date2, format = "%d-%m-%y")) %>%
  arrange(date2)

x
#> # A tibble: 3 x 2
#>   date2       coef
#>   <date>     <dbl>
#> 1 2020-01-01     9
#> 2 2020-01-03     2
#> 3 2020-02-01     9

Or if you want to retain your dates as characters for whatever reason, lubridate seems to do the job on my end:

df1 %>%
  group_by(date2) %>%
  summarize(across("coef", sum),.groups = 'drop') %>% 
  arrange(desc(lubridate::dmy(date2)))
# A tibble: 3 x 2
  date2       coef
  <chr>      <dbl>
1 01-01-2022     9
2 01-02-2021     9
3 03-01-2021     2
1 Like

Hi,

The mistake is on the arrange call:

x<-df1 %>%
  group_by(date2) %>%
  summarize(across("coef", sum),.groups = 'drop')%>% 
  ungroup() %>% 
  arrange(as.Date(date2, format = "%d-%m-%Y")) # <- MIND THE UPPER Y!

In your original call you were using a lower y "%y" this makes R interpret that you are using a two digit year and that is the source of the error.

date2 <- c("01-01-2022","01-01-2022","03-01-2021","03-01-2021","01-02-2021","01-02-2021")

as.Date(date2, format = "%d-%m-%y")
## [1] "2020-01-01" "2020-01-01" "2020-01-03" "2020-01-03" "2020-02-01" "2020-02-01"

as.Date(date2, format = "%d-%m-%Y")
## [1] "2022-01-01" "2022-01-01" "2021-01-03" "2021-01-03" "2021-02-01" "2021-02-01"

As you can see in the first call all years are interpreted as 2020. This is hidden in your call because you transform the chr to date directly in the arrange call, therefore it is never really stored in the dataframe.

Best!

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.