How do I use percent_rank()?

pathos · September 28, 2021, 5:43pm

I have the following generated data frame:

library(dplyr)
library(lubridate)

df = data.frame(yearr = sample(2015:2021, 2000, replace = TRUE),
                monthh = sample(1:12, 2000, replace = TRUE),
                dayy = sample(1:29, 2000, replace = TRUE)) |>
  # impossible dates (Feb 29 in non-leap years) parse to NA with a warning
  mutate(datee = ymd(paste(yearr, monthh, dayy)),
         yy = sample(0:100, 2000, replace = TRUE) + (130 * yearr) + (2 * monthh)) |>
  filter(!is.na(datee)) |>
  arrange(datee) |>  # ascending by date
  mutate(ii = row_number()) |>
  distinct(datee, .keep_all = TRUE)

I would like to find the proportion of rows that fall on or before a certain date.

certain_date = ymd('2017-05-15')
percent_rank(df[['datee']] > certain_date) # or df$datee

It returns a vector instead of a single value. How can I make it return a single value?
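For context: `percent_rank()` is a window function. It returns one value per element of its input (each element's rank rescaled to [0, 1]), so on its own it never collapses to a single number. If the goal is just the share of rows on or before a date, one possible alternative is to take the mean of a logical comparison, because `TRUE` counts as 1 and `FALSE` as 0. The sketch below uses a small hypothetical vector `dates`, not the data above.

```r
library(dplyr)
library(lubridate)

# Hypothetical example vector with distinct, sorted dates
dates <- ymd(c("2016-01-01", "2017-01-01", "2017-05-15", "2018-01-01", "2019-01-01"))
certain_date <- ymd("2017-05-15")

# Window function: one rank per input element
ranks <- percent_rank(dates)
print(ranks)        # 0.00 0.25 0.50 0.75 1.00

# Single number: the mean of a logical vector is the share of TRUE values
prop_before <- mean(dates <= certain_date)
print(prop_before)  # 0.6
```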

williaml · September 29, 2021, 6:24am

Is this what you are after?

df %>% 
  mutate(rank = percent_rank(datee)) %>% 
  filter(datee > '2017-05-15')
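This works because `mutate()` runs before `filter()`, so the ranks are computed over the whole data frame and only then subset. As a rough illustration of what those rank values mean, `percent_rank()` rescales `min_rank()` to [0, 1], with missing values excluded from the denominator. The hypothetical vector `x` below includes a tie and an `NA` to show both cases.

```r
library(dplyr)

# Hypothetical vector with a tie and a missing value
x <- c(10, 20, 20, 30, NA)

# Manual version: (min_rank - 1) / (number of non-missing values - 1)
manual <- (min_rank(x) - 1) / (sum(!is.na(x)) - 1)

print(percent_rank(x))  # 0, 1/3, 1/3, 1, NA
print(manual)           # same values
```

Tied values share the lower rank, which is why both 20s get 1/3.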

By the way, there is no `weekk` column in your reprex.

pathos · September 29, 2021, 6:58am

Oops, thanks. I need a single value for the `prop` argument of `initial_time_split()`.

I got a single value with the following code:

library(rsample)  # provides initial_time_split()

propp = df |>
  mutate(rank = percent_rank(datee)) |>
  filter(datee > certain_date) |>
  summarise(min(rank)) |>
  pull()  # extract the 1x1 result as a plain number

initial_time_split(df, prop = propp)
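A note on what this value is: with distinct dates, the smallest percent rank among rows after `certain_date` equals k / (n - 1), where k is the number of rows on or before the date and n is the total row count. `mean(df$datee <= certain_date)` gives k / n directly, without the ranking step. Since `initial_time_split()` takes the first `prop` share of rows for training, the data must already be sorted by date (as `df` is). The sketch below compares both values on a hypothetical 10-row tibble `toy`; the rounding-down rule used to turn `prop` into a row count is an assumption for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical sorted data with distinct dates: 2017-05-10 ... 2017-05-19
toy <- tibble(datee = ymd("2017-05-10") + 0:9)
certain_date <- ymd("2017-05-15")

# Approach from the thread: smallest percent rank among rows after the date
prop_rank <- toy |>
  mutate(rank = percent_rank(datee)) |>
  filter(datee > certain_date) |>
  summarise(p = min(rank)) |>
  pull(p)

# Direct alternative: share of rows on or before the date
prop_mean <- mean(toy$datee <= certain_date)

k <- sum(toy$datee <= certain_date)  # rows on or before the date
n <- nrow(toy)
print(c(rank = prop_rank, mean = prop_mean))  # k / (n - 1) versus k / n

# Leading rows selected if n * prop is rounded down (assumed rule)
print(floor(n * c(prop_rank, prop_mean)))
```

Here both proportions select the same six leading rows, so either value ends the training set at `certain_date`.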
