# Sum of duplicates by column

Hi,
I am enjoying myself with the famous birthday paradox, looking at the probability that some people in a group (36 in this case) share a birthday. I've got this far:

```r
# A vector of the days in a year
yeardays <- 1:365
```

```r
# The number of repeated samplings (100)
B <- 100
```

```r
# A matrix with 36 rows and 100 columns: 100 repeated samplings of 36 random days each
S <- replicate(B, sample(yeardays, 36, replace = TRUE))
```

```r
# Sum the number of duplicates in one column
sum(duplicated(S[, 1]))
```

Now, what I don't get is how to get the sum of duplicates for each of the 100 columns. A loop or perhaps "apply" should do it, but I just don't get the output of 100 sums I would need. Any clue would be so great! Thanks!

You're right that the `apply` function is a good fit. With `MARGIN = 2` it runs `duplicated` on each column and returns a logical matrix, which you can then feed to `colSums`:

```r
column_duped <- apply(S, MARGIN = 2, duplicated)
colSums(column_duped)
#  [1] 1 1 3 1 3 1 1 4 0 1 2 0 1 2 1 2 2 3
# [19] 1 3 4 4 2 0 1 3 1 2 3 3 2 3 1 3 1 0
# [37] 1 3 5 0 2 1 4 0 2 2 2 1 2 1 1 3 0 0
# [55] 3 2 1 2 5 1 1 1 2 3 2 0 3 2 4 2 3 1
# [73] 1 2 0 1 3 0 0 2 2 2 4 2 2 1 4 1 1 1
# [91] 2 1 0 0 3 0 0 1 3 0
```
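If you prefer a single step, one possible variant is to count the duplicates inside the function passed to `apply`. This is only a sketch: it rebuilds `S` the same way as in the question, and the `set.seed(1)` call and the name `dupes_per_column` are my own additions for illustration.

```r
set.seed(1)  # assumption: seed chosen only to make this sketch reproducible

yeardays <- 1:365
B <- 100
S <- replicate(B, sample(yeardays, 36, replace = TRUE))

# Count the duplicated birthdays in each column in one pass
dupes_per_column <- apply(S, MARGIN = 2, function(x) sum(duplicated(x)))

# Check against the two-step version (apply, then colSums)
all(dupes_per_column == colSums(apply(S, MARGIN = 2, duplicated)))
```

Both versions give one count per column, so which one to use is mostly a matter of taste.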

Thank you so much!!! It works wonders. I just added a vector for the output and now I can get on with plots and other fun!
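In case it helps with the next steps, here is a rough sketch of how the stored counts could be turned into an estimated probability and a quick plot. The names `dupes`, `p_sim` and `p_exact`, the seed, and the larger `B` are assumptions of mine, not part of the original code. `pbirthday()` from the `stats` package is used for comparison; it assumes all 365 days are equally likely.

```r
set.seed(42)   # assumption: seed only for reproducibility
n <- 36        # group size from the question
B <- 10000     # assumption: more replications than the original 100, for a steadier estimate

S <- replicate(B, sample(1:365, n, replace = TRUE))

# Store the number of duplicated birthdays per simulated group
dupes <- colSums(apply(S, MARGIN = 2, duplicated))

# Share of simulated groups with at least one shared birthday
p_sim <- mean(dupes > 0)

# Theoretical probability of at least one shared birthday among n people
p_exact <- pbirthday(n)

c(simulated = p_sim, exact = p_exact)

# Distribution of the number of duplicates across simulations
barplot(table(dupes), xlab = "Duplicated birthdays per group", ylab = "Simulations")
```

With only 100 replications the simulated share can wander noticeably from the theoretical value, which is why the sketch uses a larger `B`.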

Hi @Stephan,