Sum of duplicates by column

Stephan · October 23, 2019, 1:20pm

Hi,
I'm having fun with the famous birthday paradox: the probability that at least two people in a group (36 in this case) share a birthday. Here's how far I've got:
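For reference, the exact value this simulation approximates follows from the complement, the chance that all n birthdays differ. This assumes 365 equally likely days and ignores leap years:

```latex
P(\text{at least one shared birthday}) = 1 - \prod_{k=0}^{n-1} \frac{365 - k}{365},
\qquad P\big|_{n = 36} \approx 0.832
```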

# A vector of the days in a year
yeardays <- 1:365

# Number of repeated samplings
B <- 100

# A matrix with 36 rows and 100 columns: each column is one sample of 36 birthdays
S <- replicate(B, sample(yeardays, 36, replace = TRUE))

# Count the duplicates in the first column
sum(duplicated(S[, 1]))
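One detail worth knowing when reading these counts: duplicated() flags only the repeats after the first occurrence, so a birthday shared by three people adds 2 to sum(duplicated(...)), not 3. A small illustration on a made-up vector `x` (hypothetical data, not from the simulation):

```r
# Toy example (hypothetical data): day 5 appears three times, day 12 once
x <- c(5, 5, 5, 12)

duplicated(x)                 # FALSE TRUE TRUE FALSE: the first 5 is not flagged
sum(duplicated(x))            # 2 repeats beyond the first occurrence
sum(x %in% x[duplicated(x)])  # 3 people share a birthday with someone else
any(duplicated(x))            # TRUE: at least one shared birthday
```

Which of these counts you want depends on the question; for the birthday paradox itself, any() is the relevant one.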

Now, what I can't figure out is how to get the number of duplicates for each of the 100 columns. A loop or perhaps apply() should do it, but I can't get it to return the 100 sums I need. Any clue would be great! Thanks!
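Both routes mentioned here can work. A minimal sketch, regenerating `S` the same way with an arbitrary seed added for reproducibility; `dup_loop`, `dup_vapply` and `dup_apply` are illustrative names, not part of the original code. The loop preallocates its result with `integer(B)` instead of growing a vector, and `vapply()` does the same job while checking that each call returns exactly one integer.

```r
set.seed(2019)  # arbitrary seed, only so the results are reproducible
B <- 100
S <- replicate(B, sample(1:365, 36, replace = TRUE))

# Option 1: preallocate one slot per column, then fill it in a for loop
dup_loop <- integer(B)
for (j in seq_len(ncol(S))) {
  dup_loop[j] <- sum(duplicated(S[, j]))
}

# Option 2: vapply over column indices, declaring that each call returns one integer
dup_vapply <- vapply(seq_len(ncol(S)), function(j) sum(duplicated(S[, j])), integer(1))

# Option 3: apply() over columns (MARGIN = 2) with the same per-column function
dup_apply <- apply(S, MARGIN = 2, function(col) sum(duplicated(col)))

all(dup_loop == dup_vapply, dup_loop == dup_apply)  # TRUE: all three agree
```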

nwerth · October 23, 2019, 1:47pm

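You're right that the apply function is a good fit. With MARGIN = 2, apply() runs duplicated() on each column and returns a logical matrix of the same shape as S. You can then feed its result to colSums, which counts the TRUE values in each column: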

column_duped <- apply(S, MARGIN = 2, duplicated)
colSums(column_duped)
#  [1] 1 1 3 1 3 1 1 4 0 1 2 0 1 2 1 2 2 3
# [19] 1 3 4 4 2 0 1 3 1 2 3 3 2 3 1 3 1 0
# [37] 1 3 5 0 2 1 4 0 2 2 2 1 2 1 1 3 0 0
# [55] 3 2 1 2 5 1 1 1 2 3 2 0 3 2 4 2 3 1
# [73] 1 2 0 1 3 0 0 2 2 2 4 2 2 1 4 1 1 1
# [91] 2 1 0 0 3 0 0 1 3 0
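With per-column results in hand, one way to get at the birthday-paradox probability itself is to check whether each simulated group has any repeat at all and take the proportion of such groups. The sketch below assumes the same setup as the question (36 people, 365 equally likely days, no leap years) but uses 10,000 replicates and an arbitrary seed; `has_match`, `estimate` and `exact` are illustrative names. It compares the simulated proportion with the closed-form value 1 - prod((365 - 0:35) / 365).

```r
set.seed(1)  # arbitrary seed, only for reproducibility
B <- 10000   # assumption: more replicates than the 100 above, for a steadier estimate
n <- 36
S <- replicate(B, sample(1:365, n, replace = TRUE))

# TRUE for each column (simulated group) that contains at least one shared birthday
has_match <- apply(S, MARGIN = 2, function(col) any(duplicated(col)))

estimate <- mean(has_match)                      # simulated probability
exact <- 1 - prod((365 - 0:(n - 1)) / 365)       # closed form, about 0.832 for n = 36

c(estimate = estimate, exact = exact)
```

With the original 100 columns, mean(colSums(column_duped) > 0) gives the same kind of estimate, just noisier.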

Stephan · October 23, 2019, 2:11pm

Thank you so much!!! It works wonders. I just added a vector for the output, and now I can get on with plots and other fun stuff!

valeri · October 24, 2019, 6:15pm

Hi @Stephan,

Please mark a solution if your problem has been solved.
