Creating a data frame with random numbers within a specific range

saeedraeisi · July 3, 2021, 10:29am

I have a questionnaire with 60 questions and I want to generate fake data to fill it in. The answers come in four different types (nominal, ordinal, numeric, descriptive), and some questions are constrained by the answers to other questions.

For example:
Q1: What do you do? (restricted answer: 1-Student, 2-Engineer, 3-Doctor)
Q2: How many courses do you have this semester? (1,2,3,4,5,6,7,8,9,10)
#Q2 depends on the answer to Q1 (only students can answer Q2; for others it is N/A)

Could you help me with this?

pieterjanvc · July 3, 2021, 12:29pm

Hi,

With the sample() function and some conditional logic you should be able to do this:

library(dplyr)

set.seed(20) #Just here for reproducibility
nParticipants = 8

mySurvey = data.frame(
  Q1 = sample(c("Student", "Engineer", "Doctor"), nParticipants, 
              replace = TRUE, prob = c(0.6,0.2,0.2))
)

# A value is drawn for every row; ifelse() keeps it only for students
mySurvey = mySurvey %>% mutate(
  Q2 = ifelse(Q1 == "Student", sample(1:10, nParticipants, replace = TRUE), NA)
)

mySurvey
#>         Q1 Q2
#> 1 Engineer NA
#> 2   Doctor NA
#> 3  Student  9
#> 4  Student  8
#> 5 Engineer NA
#> 6 Engineer NA
#> 7  Student  9
#> 8  Student  5

Created on 2021-07-03 by the reprex package (v2.0.0)
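
The same pattern can cover the other answer types mentioned in the question. Below is a minimal sketch, assuming a hypothetical ordinal question Q3 (a three-level satisfaction scale) that is not part of the original post. It stores the ordinal answers as an ordered factor and fills the dependent question Q2 by drawing for everyone and then blanking out the non-students:

```r
set.seed(20) # for reproducibility
nParticipants <- 8

# Q3 and its levels are hypothetical, used only for illustration
satisfactionLevels <- c("Low", "Medium", "High")

mySurvey <- data.frame(
  # Nominal: unordered categories
  Q1 = sample(c("Student", "Engineer", "Doctor"), nParticipants,
              replace = TRUE, prob = c(0.6, 0.2, 0.2)),
  # Ordinal: an ordered factor keeps the ranking Low < Medium < High
  Q3 = factor(sample(satisfactionLevels, nParticipants, replace = TRUE),
              levels = satisfactionLevels, ordered = TRUE)
)

# Dependent question: draw values for everyone, then set non-students to NA
mySurvey$Q2 <- sample(1:10, nParticipants, replace = TRUE)
mySurvey$Q2[mySurvey$Q1 != "Student"] <- NA

str(mySurvey)
```

Using an ordered factor means comparisons such as Q3 >= "Medium" work as expected, which can be handy when a later question depends on an ordinal answer.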

For continuous values (sample() draws from a discrete set), R has many well-known distributions built in that you can sample from:

set.seed(20)
nParticipants = 8

#Uniform distribution
runif(nParticipants, 100000, 250000)
#> [1] 231628.2 215280.0 141844.5 179374.6 244436.1 247053.2 113699.9 110612.4

#Normal distribution
rnorm(nParticipants, 5, 1)
#> [1] 4.553433 5.569606 2.110282 4.130982 4.538297 4.444459 4.979865 4.849618

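#Others (note that several of these, e.g. rpois and rbinom, are discrete)
# rpois, rmvnorm, rnbinom, rbinom, rbeta, rchisq, rexp, rgamma, 
# rlogis, rstab, rt, rgeom, rhyper, rwilcox, rweibull
# rmvnorm and rstab are not in base R; rmvnorm comes from the mvtnorm package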
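
Since the topic title asks for a specific range: runif() takes the bounds directly, while rnorm() is unbounded and needs an extra step. Here is a rough sketch, assuming a hypothetical numeric question (age in whole years, 18 to 65) that is not in the original post:

```r
set.seed(20) # for reproducibility
nParticipants <- 8

# Hypothetical numeric question: age in whole years between 18 and 65.
# After rounding, the two endpoints are slightly less likely than the rest.
ageUniform <- round(runif(nParticipants, min = 18, max = 65))

# Normal draws are unbounded, so clamp them to the allowed range.
# Clamping piles probability mass on the bounds; it is a simple
# approximation, not a true truncated normal distribution.
ageNormal <- rnorm(nParticipants, mean = 35, sd = 12)
ageNormal <- round(pmin(pmax(ageNormal, 18), 65))

# Whole-number answers with equal probability can use sample() directly
ageDiscrete <- sample(18:65, nParticipants, replace = TRUE)
```

If the shape of the distribution near the bounds matters, redrawing out-of-range values is an alternative to clamping.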

Hope this helps,
PJ

HanOostdijk · July 3, 2021, 12:44pm

In addition to @pieterjanvc's response:

When the ifelse logic gets more complicated than it is here, case_when can be easier to read:

mySurvey = mySurvey %>%  
    mutate( 
      Q2= case_when(
        # 1:10 matches the answer range given for Q2 in the question
        Q1 == "Student" ~ sample(1:10,nParticipants,replace=TRUE),
        # ... more lines in more complicated case
        TRUE ~ NA_integer_)
      )
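
To show what the "more lines" might look like, here is a self-contained sketch with one branch per profession. The question Q4 (weekly working hours) and its ranges are hypothetical and only there for illustration:

```r
library(dplyr)

set.seed(20) # for reproducibility
nParticipants <- 8

mySurvey <- data.frame(
  Q1 = sample(c("Student", "Engineer", "Doctor"), nParticipants,
              replace = TRUE, prob = c(0.6, 0.2, 0.2))
)

# Q4 and its per-profession ranges are hypothetical.
# Each right-hand side is a full-length integer vector; case_when()
# picks, row by row, the value from the first condition that is TRUE.
mySurvey <- mySurvey %>%
  mutate(
    Q4 = case_when(
      Q1 == "Student"  ~ sample(0:20,  nParticipants, replace = TRUE),
      Q1 == "Engineer" ~ sample(35:45, nParticipants, replace = TRUE),
      Q1 == "Doctor"   ~ sample(40:60, nParticipants, replace = TRUE),
      TRUE ~ NA_integer_
    )
  )

mySurvey
```

All branches must return the same type, which is why the fallback is NA_integer_ rather than a plain NA.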
