Hello! Let's say I have a data set of someone's ATM visits in order. I want to keep a running count that goes up by one each time the person visits a new ATM. If it's an ATM they've visited before, though, I don't want to count it.
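For a single client, the counting rule above can be written without a counter at all. Here is a minimal sketch, assuming the visits are already in order. `duplicated()` flags repeat visits, so negating it marks first visits, and `cumsum()` turns those flags into a running total. The helper name `count_new` is hypothetical.

```r
# Hypothetical helper: running count of distinct values seen so far
count_new <- function(x) {
  # duplicated() is FALSE the first time a value appears and TRUE on repeats,
  # so !duplicated(x) is TRUE exactly on first visits
  first_visit <- !duplicated(x)
  # adding up the first-visit flags gives the running number of distinct ATMs
  cumsum(first_visit)
}

count_new(c("Bank", "Elgin", "Broadview", "Broadview"))
#> [1] 1 2 3 3
count_new(c("Bank", "Bank", "Bank", "Bank"))
#> [1] 1 1 1 1
```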
Below is something I've cobbled together that works for one person at a time, but I'm not sure how to turn it into something that can be applied across tens of thousands of people. In the example, Client A visits several different ATMs, while Client B visits the same one every time.
I'd appreciate any help or tips; I'm a little lost on this one.
library(tidyverse)
transactions <- tribble(
~client, ~day, ~ATM_location,
#---------#----#-----#
"A", 1L, "Bank",
"A", 4L, "Elgin",
"A", 10L, "Broadview",
"A", 11L, "Broadview",
"B", 1L, "Bank",
"B", 3L, "Bank",
"B", 5L, "Bank",
"B", 6L, "Bank"
)
# count for one client
one_client <- transactions %>%
filter( client == "A" )
# create a function that increments the counter each time the client visits a new ATM
# (i and been_to live in the global environment, so they must be reset for each client)
i <- 0
been_to <- c()
check_new <- function(x) {
if (!x %in% been_to) {
i <<- i + 1
been_to <<- c(x, been_to)
}
i
}
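Because `i` and `been_to` are globals, they have to be reset by hand before each client. A closure can keep that state private instead, so every client gets a fresh counter. This is a sketch of that idea, reusing the same logic as `check_new`; the factory name `make_check_new` is hypothetical, and `vapply()` stands in for `map()` so the example needs only base R.

```r
# Hypothetical factory: each call returns a new counter with its own state
make_check_new <- function() {
  i <- 0
  been_to <- character(0)
  function(x) {
    if (!x %in% been_to) {
      # <<- now updates i and been_to in the factory's environment,
      # not in the global environment
      i <<- i + 1
      been_to <<- c(x, been_to)
    }
    i
  }
}

# one fresh counter per client, so nothing needs resetting in between
res_a <- vapply(c("Bank", "Elgin", "Broadview", "Broadview"),
                make_check_new(), numeric(1), USE.NAMES = FALSE)
res_b <- vapply(c("Bank", "Bank", "Bank", "Bank"),
                make_check_new(), numeric(1), USE.NAMES = FALSE)
res_a
#> [1] 1 2 3 3
res_b
#> [1] 1 1 1 1
```

In a grouped dplyr pipeline, the expression in `mutate()` is evaluated once per group, so something like `group_by(client) %>% mutate(unique_atms = purrr::map_dbl(ATM_location, make_check_new()))` should create a separate counter for each client.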
# count new ATM visits for client "A"
res <- map( one_client$ATM_location, check_new )
# wrangle result
res %>%
enframe( value = "unique_atms" ) %>%
select(-name) %>%
unnest(unique_atms) %>%
bind_cols( one_client )
#> # A tibble: 4 x 4
#> unique_atms client day ATM_location
#> <dbl> <chr> <int> <chr>
#> 1 1.00 A 1 Bank
#> 2 2.00 A 4 Elgin
#> 3 3.00 A 10 Broadview
#> 4 3.00 A 11 Broadview
# count new ATM visits for client "B"
one_client <- transactions %>%
filter( client == "B" )
# reset the global counter and history before processing the next client
i <- 0
been_to <- c()
# map function over one client
res <- map( one_client$ATM_location, check_new )
# wrangle result
res %>%
enframe( value = "unique_atms") %>%
select(-name) %>%
unnest(unique_atms) %>%
bind_cols( one_client )
#> # A tibble: 4 x 4
#> unique_atms client day ATM_location
#> <dbl> <chr> <int> <chr>
#> 1 1.00 B 1 Bank
#> 2 1.00 B 3 Bank
#> 3 1.00 B 5 Bank
#> 4 1.00 B 6 Bank
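One way to avoid repeating the filter, reset and map steps for every client might be to apply the same first-visit idea inside `group_by()`, so all clients are handled in a single pass. This is a sketch, assuming dplyr is available and that `day` gives the visit order within each client; the column name `unique_atms` mirrors the output above.

```r
library(dplyr)

# Same example data as above: one row per ATM visit
transactions <- tibble::tribble(
  ~client, ~day, ~ATM_location,
  "A",      1L,  "Bank",
  "A",      4L,  "Elgin",
  "A",     10L,  "Broadview",
  "A",     11L,  "Broadview",
  "B",      1L,  "Bank",
  "B",      3L,  "Bank",
  "B",      5L,  "Bank",
  "B",      6L,  "Bank"
)

unique_counts <- transactions %>%
  # the running count depends on visit order, so sort within each client first
  arrange(client, day) %>%
  # duplicated() and cumsum() then operate on each client's visits separately
  group_by(client) %>%
  # TRUE on each first visit to an ATM; cumsum() turns that into a running total
  mutate(unique_atms = cumsum(!duplicated(ATM_location))) %>%
  ungroup()

unique_counts
```

Since `duplicated()` and `cumsum()` are vectorised and need no global state, nothing has to be reset between clients, which should make this easier to apply to tens of thousands of them than the per-client loop.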