I'm using this data:
df = df |>
mutate(customer_id = str_replace(customer_id, '.0$', '')) |>
filter(customer_id != 'no customer id')
My focus is on two columns -- customer_id
and stock_code
.
df |>
distinct(customer_id, stock_code) |>
arrange(customer_id, stock_code)
# A tidytable: 261,019 × 2
customer_id stock_code
<chr> <chr>
1 12369 23166AP
2 12370 16008AP
3 12370 17021AP
4 12370 20665AP
5 12370 20719AP
6 12370 20780AP
7 12370 20782AP
8 12370 20966AP
9 12370 21035AP
10 12370 21041AP
# ℹ 261,009 more rows
Above is products per customer ID.
You can see that customer_id
12370 has multiple stock_codes
.
The end result I would like is two columns of products.
A combination of two products per row, but the pairs exist as a purchase per customer_id
.
However, because it's in alphabetical order, there doesn't need to be a reversal (i.e. X -> Y and Y -> X)
End result I would like, just from above 10 rows of data, would be something like this:
column1|column2|customer_id
16008AP|17021AP|12370
16008AP|20665AP|12370
16008AP|20719AP|12370
16008AP|20780AP|12370
16008AP|20782AP|12370
16008AP|20966AP|12370
16008AP|21035AP|12370
16008AP|21041AP|12370
20665AP|20719AP|12370
20665AP|20780AP|12370
20665AP|20782AP|12370
20665AP|20966AP|12370
20665AP|21035AP|12370
20665AP|21041AP|12370
20719AP|20780AP|12370
20719AP|20782AP|12370
20719AP|20966AP|12370
20719AP|21035AP|12370
20719AP|21041AP|12370
20780AP|20782AP|12370
20780AP|20782AP|12370
20780AP|20782AP|12370
20780AP|20782AP|12370
20780AP|20782AP|12370
20966AP|21035AP|12370
20966AP|21041AP|12370
21035AP|21041AP|12370
Would it be just a for loop within a for loop?