I have a piece of hobby code which has thrown up a problem I can't solve.
It relates to fantasy football and manipulating data about which (fantasy) managers have transferred players
The naïve assumption would be that a given player (element)does not get transferred more than once in a week (event) - but that proves not to be the case
So if there is a sequence of trades in and out in the same week, what is the net transfer?
transfersnet below has a partial solution in that in tells me which players per event have been traded in more than out, or out more than in. So I know the net trades. What I don't know is who was traded for whom.
In order to get who traded for whom, I reckon I need to group transfers by event, and then somehow treat each of the rows by storing the first in/out pair. And then for each subsequent pair:
if the in is already in a stored pair as an out, replace the out of the stored pair with the new out.
Likewise: if the out is already in a stored pair as an in, replace the in of the stored pair with the new in.
else add the new pair
I am guessing I can implement functions to the above operating on a list
But I have no idea how to manipulate groups and condense the rows without using summarise
Sorry if this is not well-explained. Writing this up has helped my thinking about it, but I am not sure I know what to do with a group_by if not using summarise!
Input data looks like
> transfers
# A tibble: 40 x 4
element_in element_out event time
<int> <int> <int> <chr>
1 491 67 3 2021-08-22T21:27:08.843876Z
2 196 254 3 2021-08-27T23:40:43.015581Z
3 399 277 4 2021-09-11T09:58:39.486116Z
4 579 348 4 2021-09-11T09:58:39.493032Z
5 430 78 5 2021-09-14T20:48:56.023986Z
6 559 69 7 2021-09-28T13:18:44.050136Z
7 30 448 7 2021-09-28T13:18:44.054258Z
8 121 185 7 2021-09-28T13:18:44.054805Z
9 256 237 7 2021-09-28T13:18:44.055313Z
10 127 275 7 2021-09-28T13:18:44.055828Z
# ... with 30 more rows
library(plyr)
library(dplyr)
transfers =
structure(list(element_in = c(399L, 579L, 430L, 559L, 30L, 121L,
256L, 127L, 44L, 262L, 482L, 173L, 154L, 22L, 125L, 205L, 307L,
78L, 475L, 270L, 249L, 169L, 23L, 432L, 122L, 399L, 64L, 261L,
419L, 21L, 4L, 63L, 579L, 559L, 30L, 121L, 256L, 127L, 44L, 262L
), element_out = c(277L, 348L, 78L, 69L, 448L, 185L, 237L, 275L,
290L, 491L, 196L, 233L, 240L, 399L, 419L, 413L, 430L, 579L, 30L,
559L, 44L, 121L, 127L, 256L, 262L, 22L, 125L, 154L, 173L, 482L,
78L, 205L, 307L, 270L, 475L, 23L, 122L, 169L, 249L, 432L),
event = c(4L,
4L, 5L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L),
time = c("2021-09-11T09:58:39.486116Z",
"2021-09-11T09:58:39.493032Z", "2021-09-14T20:48:56.023986Z",
"2021-09-28T13:18:44.050136Z", "2021-09-28T13:18:44.054258Z",
"2021-09-28T13:18:44.054805Z", "2021-09-28T13:18:44.055313Z",
"2021-09-28T13:18:44.055828Z", "2021-09-28T13:18:44.056324Z",
"2021-09-28T13:18:44.056840Z", "2021-09-28T13:18:44.057337Z",
"2021-09-28T13:18:44.057836Z", "2021-09-28T13:18:44.058331Z",
"2021-09-28T13:18:44.058824Z", "2021-09-28T13:18:44.059313Z",
"2021-09-28T13:18:44.059819Z", "2021-09-28T13:18:44.060318Z",
"2021-09-28T13:18:44.060872Z", "2021-09-28T13:19:49.644860Z",
"2021-09-28T13:19:49.650382Z", "2021-09-28T13:19:49.651567Z",
"2021-09-28T13:19:49.652626Z", "2021-09-28T13:19:49.653605Z",
"2021-09-28T13:19:49.654344Z", "2021-09-28T13:19:49.655445Z",
"2021-09-28T13:19:49.656552Z", "2021-09-28T13:19:49.657677Z",
"2021-09-28T13:19:49.658723Z", "2021-09-28T13:19:49.659701Z",
"2021-09-28T13:19:49.660526Z", "2021-09-28T13:19:49.661612Z",
"2021-09-28T13:19:49.662672Z", "2021-09-28T13:19:49.663489Z",
"2021-09-28T13:20:23.170741Z", "2021-09-28T13:20:23.179240Z",
"2021-09-28T13:20:23.180441Z", "2021-09-28T13:20:23.181475Z",
"2021-09-28T13:20:23.182479Z", "2021-09-28T13:20:23.183104Z",
"2021-09-28T13:20:23.183661Z")), row.names = c(NA, -40L), class = c("tbl_df",
"tbl", "data.frame"))
transfersin = transfers %>%
group_by(id = element_in, event) %>%
summarise(transfersin = n(),timein = max(time), .groups = "drop")
transfersout = transfers %>%
group_by(id = element_out, event) %>%
summarise(transfersout = n(),timeout = max(time), .groups = "drop")
transfersall = transfersin %>%
full_join(transfersout, by=c("id","event")) %>%
mutate(nettransfer = coalesce(transfersin,0) - coalesce(transfersout,0)) %>%
arrange(event)
transfersnet = transfersall %>%
filter(nettransfer != 0)