# Generate dataset following Benford distribution

Hey there,

I was wondering how I can generate a data set (10,000 elements with 4 digits each) following the Benford distribution.
Are there any additional packages I need to install to get started? What does the code look like?

Thank you very much in advance!

This generates 10000 random integers up to 9999 (i.e. up to 4 digits):

sample.int(9999, 10000, replace = TRUE)


If you need exactly 4 digits, then this should work:

sample.int(9000, 10000, replace = TRUE) + 999


This is just to get you started. To get a Benford distribution, you could impose a random maximum value, but there may well be better methods.
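
One possible method, sketched here on the assumption that "Benford distribution" means the leading digits should follow Benford's law: if log10 of a number is uniform on [3, 4), the number lies in [1000, 10000) and its leading digits follow Benford's law. Only base R is needed, so no extra packages. The function name `rbenford4` is made up for illustration.

```r
# Hypothetical helper (not from any package): draw n four-digit integers
# whose leading digits follow Benford's law.
rbenford4 <- function(n) {
  # 10^U with U ~ Uniform(3, 4) is log-uniform on [1000, 10000);
  # floor() truncates to an integer without changing the leading digits.
  floor(10^runif(n, min = 3, max = 4))
}

set.seed(1)  # only for reproducibility
x <- rbenford4(10000)

# Empirical first-digit frequencies next to the Benford probabilities
first_digit <- x %/% 1000
observed <- as.vector(table(factor(first_digit, levels = 1:9))) / length(x)
round(rbind(observed = observed,
            benford  = log10(1 + 1 / (1:9))), 3)
```

The two rows of the printed matrix should be close to each other; how close depends on the sample size.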

I don't know what Benford's distribution is. Is it the discrete distribution with the following pmf?

$$P(X = d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d \in \{1, 2, \dots, 9\}$$

(I'm just guessing from this)

If so, you can generate the first digits as follows. I don't know the distributions for the other positions, but you can generate those digits similarly and then combine them with paste.

sample(x = 1:9,
       size = 10000, # required number of observations
       replace = TRUE,
       prob = log10(1 + 1 / (1:9)))
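
For the other positions, one option is to skip digit-by-digit generation altogether. The generalised form of Benford's law gives the probability that a number starts with the digit block n as log10(1 + 1/n). A minimal sketch, assuming that is the distribution wanted over the 4-digit numbers 1000 to 9999 (the variable names are only illustrative):

```r
# All 4-digit values and their probabilities under the generalised
# Benford's law: P(leading digits = n) = log10(1 + 1/n)
values <- 1000:9999
probs  <- log10(1 + 1 / values)  # telescopes to log10(10000 / 1000) = 1

set.seed(42)  # only for reproducibility
benford_numbers <- sample(x = values,
                          size = 10000,
                          replace = TRUE,
                          prob = probs)

# Sanity check: share of numbers starting with each digit
round(table(benford_numbers %/% 1000) / length(benford_numbers), 3)
```

Summing these probabilities over all numbers that start with a given digit d gives log10(1 + 1/d), so the first digit still follows the pmf above.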


Hope this helps.
