Generate observations from a table of probabilities.

Hello!
I'm stuck on one task. I have to generate 1000 observations from a table of probabilities for a distribution that I already have, and I don't know how to do this.

Thanks a lot !


p_df <- data.frame(a=.2,
                   b=.8)

(elements <- names(p_df))
(probabilities <- as.numeric(p_df))

# one draw per observation; the question asks for 1000
(generated_output <- sample(x = elements,
                            size = 1000,
                            replace = TRUE,
                            prob = probabilities))

table(generated_output)
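
As a quick sanity check on the approach above, the observed proportions can be compared with the requested probabilities. This is only a sketch: the seed value and the `observed` name are arbitrary choices, not part of the original answer.

```r
set.seed(42)  # arbitrary seed, only so the draw is repeatable

p_df <- data.frame(a = 0.2, b = 0.8)
elements <- names(p_df)
probabilities <- as.numeric(p_df)

# sample() needs exactly one probability per element
stopifnot(length(elements) == length(probabilities))

generated_output <- sample(x = elements,
                           size = 1000,
                           replace = TRUE,
                           prob = probabilities)

# Observed relative frequencies; they should land close to 0.2 and 0.8
observed <- prop.table(table(generated_output))
print(observed)
```

With 1000 draws the proportions will not match the probabilities exactly, but they should be close.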

Thanks a lot for your help! When I apply it to my problem, where the tables of probabilities are already built, I get an error saying that the number of probabilities is incorrect. I also think that in my case I don't need to convert to a data frame, as R does not let me. I attach my code in case it helps; I really don't know where the problem is.

Error in sample.int(length(x), size, replace, prob) :
incorrect number of probabilities

MY CODE:

library(gRbase)
library(gRain)
library(Rgraphviz)
cad.dag <- dag(~CAD:Smoker:Inherit:Hyperchol+AngPec:CAD +
Heartfail:CAD + QWave:CAD)
plot(cad.dag)

data(cad1)
summary(cad1)
help(extractCPT)
cad.cpt <- extractCPT(cad1, cad.dag, smooth=0.1)
cad.cpt

(elements <- names(cad.cpt))
(probabilities <- as.numeric(unlist(cad.cpt)))
(generated_output <- sample(x=elements,
size = 1000,
replace = TRUE,
prob=probabilities))

table(generated_output)

What does cad.cpt end up being?
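
One plausible reading of the error, assuming cad.cpt is a list with one table per node: `names()` returns one name per table, while `unlist()` returns every cell of every table, so `x` and `prob` have different lengths. The sketch below reproduces that with a made-up list (`toy_cpt` and its numbers are hypothetical, not the real cad.cpt).

```r
# Hypothetical stand-in for a list of conditional probability tables.
# The numbers are invented for illustration only.
toy_cpt <- list(
  Smoker = c(No = 0.6, Yes = 0.4),
  CAD = matrix(c(0.9, 0.1, 0.7, 0.3), nrow = 2,
               dimnames = list(CAD = c("No", "Yes"), Smoker = c("No", "Yes")))
)

elements <- names(toy_cpt)                    # one name per table
probabilities <- as.numeric(unlist(toy_cpt))  # every cell of every table

length(elements)       # 2
length(probabilities)  # 6

# The lengths differ, so sample() stops with an error; capture its message
result <- tryCatch(
  sample(x = elements, size = 10, replace = TRUE, prob = probabilities),
  error = function(e) conditionMessage(e)
)
print(result)
```

If that is what is happening, the fix is to sample from one table (or one column of a table) at a time rather than from the flattened list.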


To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:


Short Version

You can share your data in a forum friendly way by passing the data to share to the dput() function.
If your data is too large you can use standard methods to reduce it before sending to dput().
When you come to share the dput() text that represents your data, please be sure to format your post with triple backticks on the line before your code begins to format it appropriately.

```
( example_df <- structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6, 
5, 4.4, 4.9), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4, 
3.4, 2.9, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 
1.4, 1.5, 1.4, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 
0.4, 0.3, 0.2, 0.2, 0.1), Species = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"
), class = "factor")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame")))
```
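
For reference, text like the block above can be produced along these lines. This is a sketch using the built-in `iris` data; `small_df` and `rebuilt` are illustrative names.

```r
# Keep only the first 10 rows so the dput() text stays short
small_df <- head(iris, 10)

# Prints a structure(...) call that can be pasted into a post
dput(small_df)

# Round trip: evaluating the dput() text rebuilds the data frame
rebuilt <- eval(parse(text = capture.output(dput(small_df))))
all.equal(rebuilt, small_df)
```

Anyone reading the post can then paste the `structure(...)` text into their own session and work with the same data.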

Hello! I will try to explain it. The output of cad.cpt (CAD is cardiovascular disease) is a set of tables of probabilities, for example the probability that someone has CAD given that they smoke or do not smoke. The tables also take into account whether their relatives had CAD and whether they have another disease that could affect it. I attach one of the tables as an example. I hope this is sufficient information. I'm trying to solve it but have had no luck... Thanks in advance!

$CAD
, , Inherit = No, Hyperchol = No

     Smoker
CAD           No        Yes
  No  0.95634921 0.76182432
  Yes 0.04365079 0.23817568

It looks like your probabilities would add up to more than 1. I'm not sure what to make of that...
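
A sketch of why the total exceeds 1, using only the slice posted above: each Smoker column is its own conditional distribution for CAD, so each column sums to 1 and the slice as a whole sums to 2. Sampling then works if a single column is passed as `prob`. The names `cad_slice` and `cad_given_nonsmoker` are illustrative, and the seed is arbitrary.

```r
# Slice copied from the post: P(CAD | Smoker, Inherit = No, Hyperchol = No)
cad_slice <- matrix(
  c(0.95634921, 0.04365079,   # Smoker = No column
    0.76182432, 0.23817568),  # Smoker = Yes column
  nrow = 2,
  dimnames = list(CAD = c("No", "Yes"), Smoker = c("No", "Yes"))
)

colSums(cad_slice)  # each column is a separate distribution summing to 1
sum(cad_slice)      # the flattened slice sums to 2, not 1

set.seed(1)  # arbitrary seed for repeatability

# Draw 1000 CAD values for non-smokers only, using that single column
cad_given_nonsmoker <- sample(
  x = rownames(cad_slice),
  size = 1000,
  replace = TRUE,
  prob = cad_slice[, "No"]
)
table(cad_given_nonsmoker)
```

Generating full observations would additionally need values for Smoker, Inherit and Hyperchol to pick the right column for each draw; that part is not shown here.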

If you want my help, then I really do ask that you do me the courtesy of taking on board the material I provide you.
I'll repost it so you can't miss it.

You can share your data in a forum friendly way by passing the data to share to the dput() function.
If your data is too large you can use standard methods to reduce it before sending to dput().
When you come to share the dput() text that represents your data, please be sure to format your post with triple backticks on the line before your code begins to format it appropriately.

```
( example_df <- structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6, 
5, 4.4, 4.9), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4, 
3.4, 2.9, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 
1.4, 1.5, 1.4, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 
0.4, 0.3, 0.2, 0.2, 0.1), Species = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"
), class = "factor")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame")))
```