sample and replace numbers

Hello everyone,
I have a large dataset like this:
ID sire Dam
1 2 3
4 1 3
5 1 4
6 5 4
7 5 6
8 7 6
...
I would like to replace 10 percent of the sire numbers with wrong sire numbers.
For example, for ID=1 I would like to change the sire number from 2 to 1, 5 or 7 (in the end, 10 percent
of the IDs should have a wrong sire number).
How can I do this?

Hi,

Welcome to the RStudio community!

Here is an example of a function I created that does this for any categorical variable:

set.seed(1)

#Dummy data
df = data.frame(ID = 1:100, sire = sample(c(1,2,5,7), 100, replace = TRUE), 
                Dam = sample(c(3,4,6), 100, replace = TRUE))

head(df)
#>   ID sire Dam
#> 1  1    1   3
#> 2  2    7   3
#> 3  3    5   3
#> 4  4    1   6
#> 5  5    2   4
#> 6  6    1   3

#Function to replace with wrong values
wrongVal = function(x, perc){
  
  #Get the unique values
  uniqueVal = unique(x)
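  #Pick random positions to replace (perc percent of the vector, rounded up)
  toReplace = sample(seq_along(x), ceiling(length(x) * perc / 100))
  
  #Replace each picked value with a different value drawn from the other unique values
  x[toReplace] = sapply(x[toReplace], function(y){
    others = uniqueVal[uniqueVal != y]
    #Pick by index: sample(others, 1) would draw from 1:others if only one value is left
    others[sample.int(length(others), 1)]
  })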
  
  return(x)
}

#Run the function on your data
df$sire2 = wrongVal(df$sire, 10)

head(df)
#>   ID sire Dam sire2
#> 1  1    1   3     1
#> 2  2    7   3     2
#> 3  3    5   3     5
#> 4  4    1   6     1
#> 5  5    2   4     2
#> 6  6    1   3     1

#Sanity check: percent of incorrect values from original
sum(df$sire != df$sire2) / length(df$sire) * 100
#> [1] 10

Created on 2023-02-23 by the reprex package (v2.0.1)
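If you later want to check whether a method detects the introduced errors, it can help to know which rows were changed. Below is a minimal sketch, in R like the rest of the thread, of a hypothetical variant `wrongValIdx()` (the name is illustrative) that returns the changed positions along with the new values. It picks the replacement by index with `sample.int()`, because `sample(v, 1)` on a length-one numeric `v` such as `5` draws from `1:5` instead of returning `5`, which would matter when a column has only two distinct sires. It assumes numeric sire codes, as in the example data.

```r
# Hypothetical variant of wrongVal(): corrupts ceiling(perc % of x) random
# positions and also returns which positions were changed.
wrongValIdx <- function(x, perc) {
  uniqueVal <- unique(x)
  n <- ceiling(length(x) * perc / 100)     # number of values to make wrong
  toReplace <- sample(seq_along(x), n)     # random positions, no repeats
  x[toReplace] <- vapply(x[toReplace], function(y) {
    others <- uniqueVal[uniqueVal != y]
    # sample(others, 1) would draw from 1:others when only one value is left,
    # so pick by index instead
    others[sample.int(length(others), 1)]
  }, FUN.VALUE = numeric(1))               # assumes numeric codes
  list(values = x, changed = sort(toReplace))
}

# Usage on dummy sire codes
set.seed(1)
sire <- sample(c(1, 2, 5, 7), 200, replace = TRUE)
res <- wrongValIdx(sire, 10)
length(res$changed)                                      # 20, i.e. ceiling(200 * 10 / 100)
all(res$values[res$changed] != sire[res$changed])        # every reported position differs
identical(res$values[-res$changed], sire[-res$changed])  # all other positions untouched
```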

Hope this helps,
PJ

Thank you! It works :+1:
