Hello there, happy new year!
As I have been off for too long I am not functioning optimally. I feel like this issue should be simple, but it is eluding me. Essentially, I have taken some time series (tsibble) data and applied anomaly detection from the anomalize package. This gives a data frame structured as shown at the bottom of the question. My goal is to analyze the proportion of anomalies; to do that I do the following...
# Summarize proportion of anomalies
anomaly_sum <- function(ti) {
  ti %>%
    # dplyr::count(anomaly)
    dplyr::mutate(anomaly = factor(anomaly, levels = anom_levels)) %>%
    forcats::fct_count(anomaly) %>%
    tidyr::pivot_wider(names_from = f, values_from = n) %>% # f replaced anomaly when changing from dplyr to forcats count
    tidyr::replace_na(list(No = 0, Yes = 0)) %>%
    dplyr::mutate(total_obs = No + Yes) %>%
    dplyr::mutate(anomaly_prop = round(Yes / total_obs, 2))
}
anomaly_prop_list <- purrr::map(list_ts_anom, anomaly_sum)
You can see I originally used dplyr's count; however, if there was only one class of anomaly, the data frame created by anomaly_sum would miss a column, which created issues with later wrangling and analysis. The best option seemed to be to convert the anomaly column into a factor with fixed levels (anom_levels <- c("No", "Yes")). However, due to the structure of the data frame, specifically the anomaly column, fct_count fails to recognise the column as a factor, resulting in the following error...
Error: `f` must be a factor (or character vector).
Any help and advice would be greatly appreciated. I believe I have come across similar issues before, but I am not certain how to get the data without the additional embedded information in the data frame.
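One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: inside a pipe, forcats::fct_count(anomaly) receives the whole data frame as its first argument f, so the check would fail regardless of how the anomaly column is stored. Assuming that is the cause, dplyr::count() with .drop = FALSE keeps empty factor levels and may avoid the missing column. The names toy and anomaly_sum_sketch below are hypothetical and used only for illustration; the toy data mimics the named character anomaly column but is not the real anomalize output.

```r
library(dplyr)

anom_levels <- c("No", "Yes")

# Hypothetical toy input: every observation is flagged, so "No" never occurs.
# The names on the vector mimic the `25%` names seen in the dput output.
toy <- tibble::tibble(anomaly = c(`25%` = "Yes", `25%` = "Yes", `25%` = "Yes"))

anomaly_sum_sketch <- function(ti) {
  ti %>%
    # Drop the vector names and fix the levels so both classes are always known
    dplyr::mutate(anomaly = factor(unname(anomaly), levels = anom_levels)) %>%
    # .drop = FALSE keeps levels with zero observations as rows with n = 0
    dplyr::count(anomaly, .drop = FALSE) %>%
    tidyr::pivot_wider(names_from = anomaly, values_from = n) %>%
    dplyr::mutate(total_obs = No + Yes,
                  anomaly_prop = round(Yes / total_obs, 2))
}

anomaly_sum_sketch(toy)
```

Because the zero-count level is already present after count(), the replace_na() step is not needed in this variant. Whether this behaves the same on a tbl_time object has not been checked here.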
structure(list(Item = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("ITEMA", "ITEMB"), class = "factor"),
Promotion = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("0", "1"), class = "factor"), rowname = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"), ds = structure(c(1546300800,
1546387200, 1546473600, 1546560000, 1546646400, 1546732800,
1546819200, 1546905600, 1546992000, 1547078400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), y = c(69L, NA, NA, NA, NA, NA, NA, NA, 113L,
206L), observed = c(69, 602, 744, 636, 571, 433, 477, 418,
113, 206), season = c(6.63653583610467, 6.57240306518325,
6.17844103251103, 0.483939210043006, -11.0456162239029, -10.5906330094403,
1.76493071431692, 6.63653583610467, 6.57240306518325, 6.17844103251103
), trend = c(61.3755580917438, 61.5646297583724, 61.753701425001,
61.9427730916297, 62.1318447582583, 62.3209164248869, 62.5099880915156,
62.6990597581442, 62.8881314247728, 63.0772030914015), remainder = c(0.987906072151574,
533.862967176444, 676.067857542488, 573.573287698327, 519.913771465644,
381.269716584553, 412.725081194168, 348.664404405751, 43.5394655100439,
136.744355876088), remainder_l1 = c(-159.962634406091, -159.962634406091,
-159.962634406091, -159.962634406091, -159.962634406091,
-159.962634406091, -159.962634406091, -159.962634406091,
-159.962634406091, -159.962634406091), remainder_l2 = c(175.37808486154,
175.37808486154, 175.37808486154, 175.37808486154, 175.37808486154,
175.37808486154, 175.37808486154, 175.37808486154, 175.37808486154,
175.37808486154), anomaly = c(`25%` = "No", `25%` = "Yes",
`25%` = "Yes", `25%` = "Yes", `25%` = "Yes", `25%` = "Yes",
`25%` = "Yes", `25%` = "Yes", `25%` = "No", `25%` = "No"),
recomposed_l1 = c(-91.9505404782421, -91.8256015825349, -92.0304919485785,
-97.5359221044179, -108.876405871735, -108.232350990644,
-95.6877156002581, -90.6270388118417, -90.5020999161345,
-90.7069902821781), recomposed_l2 = c(243.390178789389, 243.515117685096,
243.310227319052, 237.804797163213, 226.464313395896, 227.108368276987,
239.653003667373, 244.713680455789, 244.838619351496, 244.633728985453
)), row.names = c(NA, -10L), index_quo = ~ds, index_time_zone = "UTC", class = c("tbl_time",
"tbl_df", "tbl", "data.frame"))