Calling user defined function to calculate column values

mathminded · November 20, 2022, 3:19pm

I am missing something fundamental about how R works and I just can't put my finger on it.

I tried posting about this yesterday but it got marked as potential spam (and therefore hidden) after I edited an error in the sample code, so I'm posting again but with what I hope is a much more clear example of what I'm trying to do.

Thanks in advance to anyone who can help me through this mental road block!

David

# SAMPLE DATA

bt_df <- tibble(
  co = 34:40,
  yr = 15:21,
  gr = -2,
  ct = c(602, 603, 570, 554, 574, NA, NA),
  p_ct = c(NA, NA,NA, NA, NA, 569.22, 567.05)
)

en_ct_df <- tibble(
    co = 34:40,
    gr = -1,
    ct = c(17, 24, 26, NA, NA, NA, NA)
  ) %>%
  add_row(
    co = 34:40,
    gr = 0,
    ct = c(17, 19, NA, NA, NA, NA, NA)
  ) %>%
  add_row(
    co = 34:40,
    gr = 1,
    ct = c(16, NA, NA, NA, NA, NA, NA)
  ) %>%
  add_row(
    co = 34:40,
    gr = 2,
    ct = c(NA, NA, NA, NA, NA, NA, NA)
  ) %>%
  add_row(
    co = 34:40,
    gr = 3,
    ct = c(NA, NA, NA, NA, NA, NA, NA)
  )

rt_df <- tibble(
  gr = -2:3,
  rt = c(NA, 0.03983, 0.03997, 1.03846, 0.95652, 0.96774)
)


# HELPER FUNCTIONS

get_rt <- function(to_g) {
  return(as.numeric(rt_df %>% filter(gr == to_g) %>% select(rt)))
}

get_bt_ct <- function(y) {
  return(as.numeric(bt_df %>% filter(yr == y) %>% select(p_ct)))
}

compute_ct <- function(lag_ct, r) {
  return(lag_ct * r)
}

get_ct <- function(c, g) {
  ct = as.numeric(en_ct_df %>% filter(co == c, gr == g) %>% select(ct))
  
  if (is.na(ct)) {
    ct = ifelse(g == -2, # g == -2 is a special case so select from alternate df
                get_bt_ct(c - 12 + g),
                compute_ct(
                  as.numeric(
                    en_ct_df %>%
                      arrange(co, gr) %>%
                      group_by(co) %>%
                      mutate(lag_ct = lag(ct)) %>%
                      filter(co == c, gr == g) %>%
                      select(lag_ct)
                  ),
                  get_rt(gr)
                )
        )
  }
  
  return(ct)
}

p_cts_df <- en_ct_df # Make a copy where missing values can be replaced

# FAILED ATTEMPTS

# ATTEMPT 1
p_cts_df$ct <- p_cts_df %>% map2(co, gr, get_value)

# ATTEMPT 2
args1 <- list(co = p_cts_df$co, gr = p_cts_df$gr)
p_cts_df$ct <- args1 %>% pmap(get_value)

# ATTEMPT 3
p_cts_df$ct <- with(p_cts_df, get_value(co, gr))

mathminded · November 20, 2022, 10:07pm

I noticed an error in my code (where I put -2 but meant -1) but instead of editing it and getting marked as spam by the bot, I'm just providing the correction here.

Instead of:

it should say:

ct = ifelse(g == -1, # g == -1 is a special case so select from alternate df

mathminded · November 20, 2022, 10:34pm

I decided to start over and I have a working example that does exactly what I want it to do. However, I'm using a for-loop and nested if-else statements as kludges where I am sure there is a better way. In order to make this work, I have to execute row-by-row, which is why I put in the for-loop. Any ideas on improving this?

# SAMPLE DATA

bt_df <- tibble(
  co = 34:40,
  yr = 20:26,
  gr = -2,
  ct = c(602, 603, 570, 554, 574, NA, NA),
  p_ct = c(NA, NA,NA, NA, NA, 569.22, 567.05)
)

en_ct_df <- tibble(
    co = 34:40,
    gr = -1,
    yr = co - 12 + gr,
    ct = c(17, 24, 26, NA, NA, NA, NA),
    p_ct = c(NA, NA, NA, NA, NA, NA, NA)
  ) %>%
  add_row(
    co = 34:40,
    gr = 0,
    yr = co - 12 + gr,
    ct = c(17, 19, NA, NA, NA, NA, NA),
    p_ct = c(NA, NA, NA, NA, NA, NA, NA)
  ) %>%
  add_row(
    co = 34:40,
    gr = 1,
    yr = co - 12 + gr,
    ct = c(16, NA, NA, NA, NA, NA, NA),
    p_ct = c(NA, NA, NA, NA, NA, NA, NA)
  ) %>%
  add_row(
    co = 34:40,
    gr = 2,
    yr = co - 12 + gr,
    ct = c(NA, NA, NA, NA, NA, NA, NA),
    p_ct = c(NA, NA, NA, NA, NA, NA, NA)
  ) %>%
  add_row(
    co = 34:40,
    gr = 3,
    yr = co - 12 + gr,
    ct = c(NA, NA, NA, NA, NA, NA, NA),
    p_ct = c(NA, NA, NA, NA, NA, NA, NA)
  )

rt_df <- tibble(
  gr = -2:3,
  rt = c(NA, 0.03983, 1.10346, 1.03846, 0.95652, 0.96774)
)


p_cts_df <- en_ct_df
  
p_cts_df <- rbind(p_cts_df, bt_df)

p_cts_df <- left_join(p_cts_df, rt_df, by="gr")

for (i in 1:10) {
  p_cts_df <- p_cts_df %>%
    arrange(co, gr) %>%
    mutate(
      p_ct = ifelse(is.na(ct) & is.na(p_ct), 
                    ifelse(
                      is.na(lag(p_ct)), 
                      round(lag(ct) * rt, 2), 
                      round((lag(p_ct) * rt), 2)
                      ), 
                    p_ct)
    )
}

technocrat · November 21, 2022, 8:18am

Could you briefly describe your intent with get_value, which is missing?

system · January 2, 2023, 8:19am

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.