The order of the output table is incorrect, how to adjust?
df1<- structure(
list(date2= c("01-01-2022","01-01-2022","03-01-2021","03-01-2021","01-02-2021","01-02-2021"),
Category= c("ABC","CDE","ABC","CDE","ABC","CDE"),
coef= c(5,4,0,2,4,5)),
class = "data.frame", row.names = c(NA, -6L))
x<-df1 %>%
group_by(date2) %>%
summarize(across("coef", sum),.groups = 'drop')%>%
arrange(date2 = as.Date(date2, format = "%d-%m-%y"))
> x
# A tibble: 3 x 2
date2 coef
<chr> <dbl>
1 01-01-2022 9
2 03-01-2021 2
3 01-02-2021 9
This order is incorrect as 01-01-2022
would have to be last not first.
Using a separate mutate()
step does the job:
library(tidyverse)
df1<- structure(
list(date2= c("01-01-2022","01-01-2022","03-01-2021","03-01-2021","01-02-2021","01-02-2021"),
Category= c("ABC","CDE","ABC","CDE","ABC","CDE"),
coef= c(5,4,0,2,4,5)),
class = "data.frame", row.names = c(NA, -6L))
x<-df1 %>%
group_by(date2) %>%
summarize(across("coef", sum),.groups = 'drop')%>%
mutate(date2 = as.Date(date2, format = "%d-%m-%y")) %>%
arrange(date2)
x
#> # A tibble: 3 x 2
#> date2 coef
#> <date> <dbl>
#> 1 2020-01-01 9
#> 2 2020-01-03 2
#> 3 2020-02-01 9
Or if you want to retain your dates as characters for whatever reason, lubridate
seems to do the job on my end:
df1 %>%
group_by(date2) %>%
summarize(across("coef", sum),.groups = 'drop') %>%
arrange(desc(lubridate::dmy(date2)))
# A tibble: 3 x 2
date2 coef
<chr> <dbl>
1 01-01-2022 9
2 01-02-2021 9
3 03-01-2021 2
1 Like
Hi,
The mistake is on the arrange call:
x<-df1 %>%
group_by(date2) %>%
summarize(across("coef", sum),.groups = 'drop')%>%
ungroup() %>%
arrange(as.Date(date2, format = "%d-%m-%Y")) # <- MIND THE UPPER Y!
In your original call you were using a lower y "%y" this makes R interpret that you are using a two digit year and that is the source of the error.
date2 <- c("01-01-2022","01-01-2022","03-01-2021","03-01-2021","01-02-2021","01-02-2021")
as.Date(date2, format = "%d-%m-%y")
## [1] "2020-01-01" "2020-01-01" "2020-01-03" "2020-01-03" "2020-02-01" "2020-02-01"
as.Date(date2, format = "%d-%m-%Y")
## [1] "2022-01-01" "2022-01-01" "2021-01-03" "2021-01-03" "2021-02-01" "2021-02-01"
As you can see in the first call all years are interpreted as 2020. This is hidden in your call because you transform the chr to date directly in the arrange call, therefore it is never really stored in the dataframe.
Best!
1 Like
system
Closed
February 4, 2022, 4:51pm
4
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.