# sample and replace numbers

Hello everyone,
I have a large dataset like this:
```
ID sire Dam
1 2 3
4 1 3
5 1 4
6 5 4
7 5 6
8 7 6
...
```
I would like to replace 10 percent of the sire values with wrong sires.
For example, for ID = 1 I would like to change the sire (currently 2) to a different sire such as 1, 5 or 7,
so that 10 percent of the IDs end up with a wrong sire.
How can I do this?
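
The task can be made concrete with the six rows shown above. A minimal base-R sketch (the names `ped` and `n_wrong` are illustrative, and rounding 10 percent up with `ceiling()` is an assumption about how a fractional number of rows should be handled):

```r
# The example pedigree from the question, entered as a data frame
# (only the six rows shown are used; the "..." rows are unknown)
ped <- data.frame(
  ID   = c(1, 4, 5, 6, 7, 8),
  sire = c(2, 1, 1, 5, 5, 7),
  Dam  = c(3, 3, 4, 4, 6, 6)
)

# Number of rows to corrupt: 10 percent, rounded up so at least one row changes
n_wrong <- ceiling(nrow(ped) * 10 / 100)
n_wrong
#> [1] 1

# Candidate wrong sires for ID = 1: every other sire that occurs in the data
setdiff(unique(ped$sire), ped$sire[ped$ID == 1])
#> [1] 1 5 7
```

The candidate list matches the example in the question: for ID = 1 (true sire 2), the wrong sire is drawn from 1, 5 or 7.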

Hi,

Welcome to the RStudio community!

Here is an example of a function I wrote that does this for any categorical variable:

``````r
set.seed(1)

# Dummy data
df = data.frame(ID = 1:100, sire = sample(c(1,2,5,7), 100, replace = TRUE),
                Dam = sample(c(3,4,6), 100, replace = TRUE))
head(df)
#>   ID sire Dam
#> 1  1    1   3
#> 2  2    7   3
#> 3  3    5   3
#> 4  4    1   6
#> 5  5    2   4
#> 6  6    1   3

# Function to replace a given percentage of values with wrong values
wrongVal = function(x, perc){

  # Get the unique values
  uniqueVal = unique(x)
  # Pick a random set of positions to replace (vector indices)
  toReplace = sample(seq_along(x), ceiling(length(x) * perc / 100))

  # Replace each picked value with a different value from the other unique values.
  # Indexing with sample.int() avoids sample(others, 1) drawing from 1:others
  # when only one alternative value is left
  x[toReplace] = sapply(x[toReplace], function(y){
    others = uniqueVal[uniqueVal != y]
    others[sample.int(length(others), 1)]
  })

  return(x)
}

# Run the function on your data
df$sire2 = wrongVal(df$sire, 10)
head(df)
#>   ID sire Dam sire2
#> 1  1    1   3     1
#> 2  2    7   3     2
#> 3  3    5   3     5
#> 4  4    1   6     1
#> 5  5    2   4     2
#> 6  6    1   3     1

# Sanity check: percentage of values that differ from the original
sum(df$sire != df$sire2) / length(df$sire) * 100
#> [1] 10
``````

Created on 2023-02-23 by the reprex package (v2.0.1)
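
One R pitfall worth knowing when drawing a single value from a vector of candidates: `sample(x, 1)` treats a single positive number `x` as the range `1:x`, so if only one alternative value remains it can return an unrelated number. The sketch below is a hypothetical variant (`wrongSire()` is an illustrative name, not from any package) that indexes with `sample.int()` instead and also returns a flag marking which rows were altered, run on the six pedigree rows from the question:

```r
# Hypothetical helper (not from any package): replace `perc` percent of the
# values in `sire` with a different sire that occurs elsewhere in the data,
# and report which positions were changed.
wrongSire <- function(sire, perc) {
  pool <- unique(sire)                          # sires present in the data
  n <- ceiling(length(sire) * perc / 100)       # positions to corrupt, rounded up
  rows <- sample.int(length(sire), n)           # positions chosen at random
  new_sire <- sire
  for (i in rows) {
    cand <- pool[pool != sire[i]]               # every sire except the true one
    if (length(cand) == 0) stop("only one distinct sire; nothing to swap in")
    # Index with sample.int() rather than calling sample(cand, 1), which
    # would draw from 1:cand if cand were a single number
    new_sire[i] <- cand[sample.int(length(cand), 1)]
  }
  list(sire = new_sire, changed = seq_along(sire) %in% rows)
}

# The six pedigree rows from the question
ped <- data.frame(ID   = c(1, 4, 5, 6, 7, 8),
                  sire = c(2, 1, 1, 5, 5, 7),
                  Dam  = c(3, 3, 4, 4, 6, 6))

set.seed(1)
res <- wrongSire(ped$sire, perc = 10)   # 10 percent of 6 rows rounds up to 1 row
ped$sire2   <- res$sire
ped$changed <- res$changed
sum(ped$sire != ped$sire2)
#> [1] 1
```

Keeping the `changed` column alongside `sire2` makes it easy to check later which animals were given a wrong sire, for example when testing whether a pedigree-checking method detects them.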

Hope this helps,
PJ

Thank you! It works.
