Sampling from the population with a standard normal distribution

Hi,
I want to sample from my population following a standard normal distribution [N(0,1)]. The sample() function did not work because I want to specify the mean and standard deviation.

Can you help me?

If I understand you correctly, the rnorm() function does what you want. Take a look at

?rnorm

I'm sorry that my English is bad. The rnorm() function generates data, but I don't want to generate new data; I want to sample data (N(0,1)) from my population data.

I am very sorry if you felt I was criticizing your English. I truly was not. I know how hard it is to communicate in a foreign language, and I would not criticize someone else's efforts to do so.

What I do not understand is what it means to sample data using an N(0,1) distribution. If you have a set of data {x_1, x_2, ..., x_i}, what do you want to do with it? Do you want to select data so that the samples approximate an N(0,1) distribution?

I was only worried about not being able to describe my problem accurately. Thanks for your sensitivity, but I really didn't feel bad :) And I think you understand me. Yes, I have a set of data (n = 2500), and I want to select a subset (n = 50) whose values approximate an N(0,1) distribution.

I'm also still puzzled about what exactly you want. You can draw 50 values from an N(0,1) distribution using the rnorm function:

rnorm(50)

or you can draw 50 values from a dataset with the sample function:

data <- 1:2500
sample(data,50)

But what can I do if I want to sample data from the dataset following a normal distribution with a different mean and standard deviation (e.g., mean = 5 and SD = 2)?
As far as I know, the sample function selects data uniformly at random. I want to sample the dataset so that the selected values follow a normal distribution with the mean and standard deviation I choose.

OK, this is pretty wacky code, but it is all I could think of. I use the rnorm function to make an example of 50 samples from N(0,1), and then I extract from the data the points that are closest to the values given by rnorm. I think this only works decently if the data have many points in the region typical of N(0,1). Out of curiosity, why do you want to do this?

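set.seed(1)
DAT <- runif(2500, min = -5, max = 5)  # example "population": 2500 uniform values
TRGT <- rnorm(50)                      # 50 target values drawn from N(0,1)
mean(TRGT)
#> [1] 0.3986448
sd(TRGT)
#> [1] 1.053185

# Return the element of D closest to x (the first one in case of ties)
GetNearest <- function(x, D) {
  D[which.min(abs(x - D))]
}

# For each target value, pick the nearest data point.
# Note: the same data point can be picked more than once.
SMPLS <- vapply(TRGT, GetNearest, numeric(1), D = DAT)
mean(SMPLS)
#> [1] 0.3990898
sd(SMPLS)
#> [1] 1.053693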

Created on 2020-02-05 by the reprex package (v0.3.0)
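The matching above can select the same data point for two different target values. If each record should appear at most once, a minimal variation (not part of the original reply) removes each matched value from the pool before the next search. This is a greedy sketch using the same example data; the name `pool` is illustrative.

```r
# Nearest-value matching without replacement: each data point is used at most once.
set.seed(1)
DAT  <- runif(2500, min = -5, max = 5)  # example population, as above
TRGT <- rnorm(50)                       # target values from N(0,1)

pool  <- DAT
SMPLS <- numeric(length(TRGT))
for (i in seq_along(TRGT)) {
  j <- which.min(abs(TRGT[i] - pool))   # index of the closest remaining value
  SMPLS[i] <- pool[j]
  pool <- pool[-j]                      # drop it so it cannot be picked again
}
mean(SMPLS)
sd(SMPLS)
```

Because the search is greedy, the order of the targets can change which points are picked, but with 2500 densely spaced values the effect on the mean and SD should be small.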

rnorm(50, mean = 5, sd = 2) will do the trick. See ?rnorm for more information.
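rnorm() generates new values, though. If the goal is to select existing values from the data so that they roughly follow N(5, 2), as asked above, one option is to pass weights to sample() through its prob argument. The sketch below is an importance-weighting idea that was not proposed in the thread: each value is weighted by the target normal density divided by a kernel density estimate of the population, so crowded regions of the population are not over-represented. The vector `pop` is a made-up stand-in for the real data, and because sampling is without replacement the result is only approximate.

```r
# Weighted sampling so the selected values roughly follow N(target_mean, target_sd).
set.seed(1)
pop <- runif(2500, min = -5, max = 15)  # hypothetical population for illustration

target_mean <- 5
target_sd   <- 2

# Kernel density estimate of the population, evaluated at each data point
dens     <- density(pop)
pop_dens <- approx(dens$x, dens$y, xout = pop)$y

# Weight = target density / population density
w <- dnorm(pop, mean = target_mean, sd = target_sd) / pop_dens

idx  <- sample(seq_along(pop), size = 50, replace = FALSE, prob = w)
smpl <- pop[idx]
mean(smpl)
sd(smpl)
```

If the population is sparse far from the target mean, the selected values cannot match the target tails, however the weights are chosen.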

Hi,
Thank you so much for the code. I have a real dataset (a data.frame) that includes theta estimates and the responses of examinees. I have also estimated the item parameters. I want to do some analyses on a smaller subset, so I want to generate a subset based on the first column.
For example, my real dataset has 2500 rows, and its first column has mean 0.9 and standard deviation 1.5. I want to create a subset from it with, for example, 100 rows, where the first column has mean 0.2 and standard deviation 1.0.
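The nearest-value matching from the earlier reply can be applied to whole rows of a data frame by matching on the first column. A minimal sketch, assuming the matching column is numeric; the data frame `dat` and the helper `select_rows_nearest()` are hypothetical stand-ins, not part of the original thread. Rows are chosen without replacement, so each examinee appears at most once.

```r
# Pick n rows whose first column approximately follows N(target_mean, target_sd).
set.seed(1)
dat <- data.frame(theta = rnorm(2500, mean = 0.9, sd = 1.5),  # made-up theta estimates
                  item1 = rbinom(2500, 1, 0.5),               # made-up item responses
                  item2 = rbinom(2500, 1, 0.5))

select_rows_nearest <- function(x, n, target_mean, target_sd) {
  targets   <- rnorm(n, mean = target_mean, sd = target_sd)
  available <- rep(TRUE, length(x))
  rows      <- integer(n)
  for (i in seq_len(n)) {
    d <- abs(x - targets[i])
    d[!available] <- Inf        # rows already chosen cannot be reused
    rows[i] <- which.min(d)
    available[rows[i]] <- FALSE
  }
  rows
}

rows <- select_rows_nearest(dat[[1]], n = 100, target_mean = 0.2, target_sd = 1.0)
sub  <- dat[rows, ]
mean(sub[[1]])
sd(sub[[1]])
```

The other columns are carried along unchanged, so the item responses stay paired with their theta estimates.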

I know about rnorm(), but it does not serve my purpose.

Is it possible that you want to do two separate actions, one after the other?

  1. perform uniform random sampling with replacement, to get 100 representative records
  2. normalize / standardise one or more of the variables in the dataset

?
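The two steps in the list above can be sketched in base R as follows, using a made-up data frame `dat` in place of the real one and the target mean 0.2 and SD 1.0 from the question. Step 2 uses scale() to standardise and then shifts and stretches the result, so the subset hits the target mean and SD exactly.

```r
# Two-step approach: (1) uniform random sampling of rows, (2) rescaling a column.
set.seed(1)
dat <- data.frame(theta = rnorm(2500, mean = 0.9, sd = 1.5),  # made-up theta estimates
                  item1 = rbinom(2500, 1, 0.5))               # made-up item responses

# Step 1: 100 rows drawn uniformly at random, with replacement
idx <- sample(nrow(dat), size = 100, replace = TRUE)
sub <- dat[idx, ]

# Step 2: standardise to mean 0 and SD 1, then rescale to the target mean and SD
target_mean <- 0.2
target_sd   <- 1.0
sub$theta <- target_mean + target_sd * as.numeric(scale(sub$theta))
mean(sub$theta)  # 0.2 up to floating-point error
sd(sub$theta)    # 1.0 up to floating-point error
```

Unlike the nearest-value matching approach, this changes the theta values themselves, so the rescaled values are no longer the original estimates for those examinees.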
