Summing anomalies category with forcats fct_count creates error

August · January 4, 2021, 4:44pm

Hello there, happy new year!

As I have been off for too long I am not functioning optimally; I feel like this issue should be simple, but it is eluding me. Essentially, I have taken some time series (tsibble) data and applied anomaly detection from the anomalize package. This gives a dataframe structured as shown at the bottom of the question. My goal is to analyze the proportion of anomalies. To do that, I do the following...

#Summarize proportion of anomalies
anomaly_sum <- function(ti) {
  ti %>%
    #dplyr::count(anomaly)
    dplyr::mutate(anomaly = factor(anomaly, levels = anom_levels))%>%
    forcats::fct_count(anomaly) %>%
    tidyr::pivot_wider(names_from = f, values_from = n) %>% # f replaced anomaly when changing from dplyr to forcats count
    tidyr::replace_na(list(No = 0, Yes = 0)) %>%
    dplyr::mutate(total_obs = No + Yes) %>%
    dplyr::mutate(anomaly_prop = round(Yes/total_obs,2))
}
anomaly_prop_list <- purrr::map(list_ts_anom, anomaly_sum)

You can see I originally used dplyr's count; however, if there was only one class of anomaly, the dataframe created by anomaly_sum would be missing a column, which created issues with later wrangling and analysis. The best option seemed to be to convert the anomaly column into a factor with fixed levels (anom_levels <- c("No", "Yes")). However, due to the structure of the dataframe, specifically the anomaly column, fct_count fails to recognise the column as a factor, resulting in the following error...

Error: f must be a factor (or character vector).

Any help and advice would be greatly appreciated. I believe I have come across similar issues before, but I am not certain how to get the data without the additional embedded information in the dataframe.

structure(list(
  Item = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                   .Label = c("ITEMA", "ITEMB"), class = "factor"),
  Promotion = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                        .Label = c("0", "1"), class = "factor"),
  rowname = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
  ds = structure(c(1546300800, 1546387200, 1546473600, 1546560000,
                   1546646400, 1546732800, 1546819200, 1546905600,
                   1546992000, 1547078400),
                 tzone = "UTC", class = c("POSIXct", "POSIXt")),
  y = c(69L, NA, NA, NA, NA, NA, NA, NA, 113L, 206L),
  observed = c(69, 602, 744, 636, 571, 433, 477, 418, 113, 206),
  season = c(6.63653583610467, 6.57240306518325, 6.17844103251103,
             0.483939210043006, -11.0456162239029, -10.5906330094403,
             1.76493071431692, 6.63653583610467, 6.57240306518325,
             6.17844103251103),
  trend = c(61.3755580917438, 61.5646297583724, 61.753701425001,
            61.9427730916297, 62.1318447582583, 62.3209164248869,
            62.5099880915156, 62.6990597581442, 62.8881314247728,
            63.0772030914015),
  remainder = c(0.987906072151574, 533.862967176444, 676.067857542488,
                573.573287698327, 519.913771465644, 381.269716584553,
                412.725081194168, 348.664404405751, 43.5394655100439,
                136.744355876088),
  remainder_l1 = c(-159.962634406091, -159.962634406091, -159.962634406091,
                   -159.962634406091, -159.962634406091, -159.962634406091,
                   -159.962634406091, -159.962634406091, -159.962634406091,
                   -159.962634406091),
  remainder_l2 = c(175.37808486154, 175.37808486154, 175.37808486154,
                   175.37808486154, 175.37808486154, 175.37808486154,
                   175.37808486154, 175.37808486154, 175.37808486154,
                   175.37808486154),
  anomaly = c(`25%` = "No", `25%` = "Yes", `25%` = "Yes", `25%` = "Yes",
              `25%` = "Yes", `25%` = "Yes", `25%` = "Yes", `25%` = "Yes",
              `25%` = "No", `25%` = "No"),
  recomposed_l1 = c(-91.9505404782421, -91.8256015825349, -92.0304919485785,
                    -97.5359221044179, -108.876405871735, -108.232350990644,
                    -95.6877156002581, -90.6270388118417, -90.5020999161345,
                    -90.7069902821781),
  recomposed_l2 = c(243.390178789389, 243.515117685096, 243.310227319052,
                    237.804797163213, 226.464313395896, 227.108368276987,
                    239.653003667373, 244.713680455789, 244.838619351496,
                    244.633728985453)
), row.names = c(NA, -10L), index_quo = ~ds, index_time_zone = "UTC",
class = c("tbl_time", "tbl_df", "tbl", "data.frame"))

nirgrahamuk · January 4, 2021, 6:05pm

forcats::fct_count takes f, a factor (or character vector), as its argument, but you are piping a whole data.frame into it.
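To illustrate the difference, here is a minimal sketch on made-up data (the toy data frame is hypothetical and only stands in for one element of list_ts_anom). It pulls the column out as a vector before calling fct_count, which is the kind of input that function expects:

```r
# Hypothetical toy data: a series in which no anomaly was flagged
anom_levels <- c("No", "Yes")
toy <- data.frame(anomaly = c("No", "No", "No"), stringsAsFactors = FALSE)

# fct_count() works on the vector itself, so extract the column first
f <- factor(toy$anomaly, levels = anom_levels)
counts <- forcats::fct_count(f)

# counts has one row per level (columns f and n), including unused levels
counts
```

Passing the whole data frame instead of the extracted column is what triggers the "f must be a factor" error.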

I think you are intending to do

anomaly_sum <- function(ti) {
  ti %>%
    dplyr::mutate(anomaly = factor(anomaly, levels = anom_levels)) %>%
    # .drop = FALSE keeps factor levels that have no rows, so "No" and "Yes" always appear
    dplyr::group_by(anomaly, .drop = FALSE) %>%
    # namespaced n() so the function also works when dplyr is not attached
    dplyr::summarise(n = dplyr::n()) %>%
    tidyr::pivot_wider(names_from = anomaly, values_from = n) %>%
    tidyr::replace_na(list(No = 0, Yes = 0)) %>%
    dplyr::mutate(total_obs = No + Yes) %>%
    dplyr::mutate(anomaly_prop = round(Yes / total_obs, 2))
}
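As a quick check of the idea, the sketch below runs a shorter variant on a hypothetical one-class series. It assumes dplyr::count with .drop = FALSE can stand in for the group_by/summarise pair; the name anomaly_sum_count and the toy tibble are made up for illustration, and it has not been tried on the original tbl_time data.

```r
library(dplyr)  # attaches %>% along with mutate() and count()

anom_levels <- c("No", "Yes")

# Hypothetical series in which every observation is labelled "No"
toy <- tibble::tibble(anomaly = c("No", "No", "No", "No"))

# Illustrative variant: count() with .drop = FALSE keeps the empty "Yes" level
anomaly_sum_count <- function(ti) {
  ti %>%
    mutate(anomaly = factor(anomaly, levels = anom_levels)) %>%
    count(anomaly, .drop = FALSE) %>%
    tidyr::pivot_wider(names_from = anomaly, values_from = n) %>%
    mutate(total_obs = No + Yes,
           anomaly_prop = round(Yes / total_obs, 2))
}

res <- anomaly_sum_count(toy)
res
```

Because the zero-count level is kept, both the No and Yes columns exist even when only one class occurs, so the later steps do not need a special case.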
