Generate observations from a table of probabilities.

ssarabatres · May 15, 2021, 8:19pm

Hello!
I'm stuck on a task. I have to generate 1000 observations from a table of probabilities for a distribution that I already have, and I don't know how to do this.

Thanks a lot!

nirgrahamuk · May 17, 2021, 3:18pm


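# Table of probabilities: one column per outcome
p_df <- data.frame(a = .2,
                   b = .8)

# Outcome labels and their probabilities (the outer parentheses print each result)
(elements <- names(p_df))
(probabilities <- as.numeric(p_df))

# Draw with replacement according to the probabilities
(generated_output <- sample(x = elements,
                            size = 100,
                            replace = TRUE,
                            prob = probabilities))

# Count how often each outcome was drawn
table(generated_output)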
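
A small variation on the same idea, in case it helps: storing the probabilities as a named vector avoids the data frame conversion, and `set.seed()` makes the draw repeatable. The names `probs` and `draws` and the seed value are illustrative assumptions, not part of the answer above.

```r
# Named probability vector: names are the outcomes, values the probabilities
probs <- c(a = 0.2, b = 0.8)

set.seed(42)  # arbitrary seed, only so the draw can be repeated
draws <- sample(x = names(probs),
                size = 1000,
                replace = TRUE,
                prob = probs)

# Compare the empirical proportions with the target probabilities
prop.table(table(draws))
```

With 1000 draws the proportions should land close to 0.2 and 0.8, though not exactly on them.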

ssarabatres · May 19, 2021, 9:46am

Thanks a lot for your help! When I apply it to my problem, where the tables of probabilities are already built, I get an error saying that the number of probabilities is incorrect. I also think that in my case I don't need to convert it to a data frame, as it does not let me. I attach my code in case it helps; I really don't know where the problem is.

Error in sample.int(length(x), size, replace, prob) :
incorrect number of probabilities

MY CODE:

library(gRbase)
library(gRain)
library(Rgraphviz)
cad.dag <- dag(~CAD:Smoker:Inherit:Hyperchol + AngPec:CAD +
                 Heartfail:CAD + QWave:CAD)
plot(cad.dag)

data(cad1)
summary(cad1)
help(extractCPT)
cad.cpt <- extractCPT(cad1, cad.dag, smooth = 0.1)
cad.cpt

(elements <- names(cad.cpt))
(probabilities <- as.numeric(unlist(cad.cpt)))
(generated_output <- sample(x = elements,
                            size = 1000,
                            replace = TRUE,
                            prob = probabilities))

table(generated_output)

nirgrahamuk · May 19, 2021, 10:05am

What does cad.cpt end up being?
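
One quick check, sketched here with a made-up stand-in for `cad.cpt` (a named list of probability arrays, which is an assumption about what `extractCPT()` returns): `names()` gives one entry per table, while `unlist()` gives one entry per cell, so `x` and `prob` end up with different lengths, and that is what `sample()` complains about.

```r
# Hypothetical stand-in for cad.cpt: a named list of probability arrays
mock_cpt <- list(
  Smoker = array(c(0.6, 0.4), dim = 2,
                 dimnames = list(Smoker = c("No", "Yes"))),
  CAD    = array(c(0.95, 0.05, 0.76, 0.24), dim = c(2, 2),
                 dimnames = list(CAD = c("No", "Yes"),
                                 Smoker = c("No", "Yes")))
)

elements      <- names(mock_cpt)               # one name per table
probabilities <- as.numeric(unlist(mock_cpt))  # one value per table cell

length(elements)       # number of tables
length(probabilities)  # number of cells across all tables

# sample() needs length(prob) == length(x); capture the error instead of stopping
result <- tryCatch(
  sample(x = elements, size = 10, replace = TRUE, prob = probabilities),
  error = function(e) conditionMessage(e)
)
result
```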

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners

A minimal reproducible example consists of the following items:

A minimal dataset, necessary to reproduce the issue.
The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.…

Short Version

You can share your data in a forum friendly way by passing the data to share to the dput() function.
If your data is too large you can use standard methods to reduce it before sending to dput().
When you share the dput() text that represents your data, please put triple backticks on the line before your code begins so that it is formatted appropriately.

```
( example_df <- structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6, 
5, 4.4, 4.9), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4, 
3.4, 2.9, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 
1.4, 1.5, 1.4, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 
0.4, 0.3, 0.2, 0.2, 0.1), Species = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"
), class = "factor")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame")))
```
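
As a rough illustration of how such text is produced and used (the object names `small_df`, `dput_text` and `rebuilt` are just placeholders): `dput()` prints R code that rebuilds the object, so whoever reads the post can paste it straight into their own session.

```r
# Reduce a built-in dataset to a few rows so it is small enough to post
small_df <- head(iris, 3)

# dput() writes R code that recreates the object; capture that code as text
dput_text <- paste(capture.output(dput(small_df)), collapse = "\n")
cat(dput_text)

# A reader can rebuild the object by evaluating the posted text
rebuilt <- eval(parse(text = dput_text))
all.equal(rebuilt, small_df)
```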

ssarabatres · May 19, 2021, 10:22am

Hello! I will try to explain it. The output I get for cad.cpt (CAD is cardiovascular disease) is a set of tables of probabilities: for example, whether someone has CAD given that they smoke, or given that they don't. The tables also take into account whether their relatives had CAD and whether they have another disease that could affect it. I attach one of the tables as an example. I hope this is sufficient information. I'm trying to solve it, but with no luck so far... Thanks in advance!

$CAD
, , Inherit = No, Hyperchol = No

     Smoker
CAD           No        Yes
  No  0.95634921 0.76182432
  Yes 0.04365079 0.23817568

nirgrahamuk · May 19, 2021, 10:37am

Looks like your probabilities would add up to more than 1. Not sure what to make of that...
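
For what it's worth, a table like this is usually read column by column: each column is the distribution of CAD given one value of Smoker (for fixed Inherit and Hyperchol), so each column sums to 1 on its own even though the whole table sums to more than 1. A minimal sketch, with the numbers copied from the posted table and the object names (`cpt_slice`, `cad_given_smoker`) made up for illustration:

```r
# Slice of the CAD table for Inherit = No, Hyperchol = No (values from the post)
cpt_slice <- matrix(c(0.95634921, 0.04365079,
                      0.76182432, 0.23817568),
                    nrow = 2,
                    dimnames = list(CAD = c("No", "Yes"),
                                    Smoker = c("No", "Yes")))

colSums(cpt_slice)  # each column is a complete distribution over CAD

# Sample CAD for 1000 smokers using only the Smoker = "Yes" column
set.seed(1)  # arbitrary seed
cad_given_smoker <- sample(x = rownames(cpt_slice),
                           size = 1000,
                           replace = TRUE,
                           prob = cpt_slice[, "Yes"])
table(cad_given_smoker)
```

Generating whole observations would presumably mean sampling the parent variables (Smoker, Inherit, Hyperchol) first and then picking the matching column for each row, but that depends on how the full set of tables is structured.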

If you want my help, then I really do ask that you do me the courtesy of taking on board the material I provide you.
I'll repost it so you can't miss it.

You can share your data in a forum friendly way by passing the data to share to the dput() function.
If your data is too large you can use standard methods to reduce it before sending to dput().
When you share the dput() text that represents your data, please put triple backticks on the line before your code begins so that it is formatted appropriately.

```
( example_df <- structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6, 
5, 4.4, 4.9), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4, 
3.4, 2.9, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 
1.4, 1.5, 1.4, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 
0.4, 0.3, 0.2, 0.2, 0.1), Species = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"
), class = "factor")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame")))
```