Having issues with plot in R

fox002 · September 1, 2020, 5:45pm

I still have one error that I can't figure out, it doesn't want to plot the "headsPercentage" points.
Error: unexpected numeric constant in:
"}
plot(headsPercentage, type = "1"
This is what I have:
<CoinFlip <- function(){
sides <- 1:2
heads = sample(1:2, 1)
flip = sample(1:2, 1)
if (flip == heads) return(1)
else return(0)
}
set.seed(20)
heads <- 1
headsPercentage <- numeric(500)
for(counter in 1:500){
heads <- heads + CoinFlip()
headsPercentage[counter] <- heads/counter
}
plot(headsPercentage, type = "1", xlab = "Flips Simulated”, ylab = ”Heads vs Tails Percentage”, ylim=c(0,1))
headsPercentage[500]>

FJCC · September 1, 2020, 6:21pm

The code below runs for me. The only changes I made were in the plot() call: I set the type to "l" (the lowercase letter L) instead of the number one, and I changed the curly quotation marks to plain quotation marks at the end of Flips Simulated and on both ends of Heads vs Tails Percentage.

CoinFlip <- function(){
  sides <- 1:2
  heads = sample(1:2, 1)
  flip = sample(1:2, 1)
  if (flip == heads) return(1)
  else return(0)
}
set.seed(20)
heads <- 1
headsPercentage <- numeric(500)
for(counter in 1:500){
  heads <- heads + CoinFlip()
  headsPercentage[counter] <- heads/counter
}
plot(headsPercentage, type = "l", xlab = "Flips Simulated", ylab = "Heads vs Tails Percentage", ylim=c(0,1))
headsPercentage[500]
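
For anyone who hits the same message later: R only treats straight quotes as string delimiters, so a curly quote leaves the string open and the parser trips over whatever comes next. Here is a small sketch that reproduces this without drawing anything. The two code strings are made up for illustration, and the \u201d escape stands in for a pasted curly quote.

```r
# A made-up call with a curly closing quote (U+201D) where a straight
# quote belongs, and the same call with the quote corrected.
bad_code  <- "plot(x, xlab = \"Flips\u201d, ylim = c(0, 1))"
good_code <- "plot(x, xlab = \"Flips\", ylim = c(0, 1))"

# parse() only checks syntax; it does not run the code or need `x` to exist.
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

parses(bad_code)
#> [1] FALSE
parses(good_code)
#> [1] TRUE
```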

fox002 · September 2, 2020, 12:58pm

Thanks a lot, it solved my problem

elmstedt · September 2, 2020, 8:22pm

I wanted to take a second and offer some advice on your code.

Warning: I wound up going a little overboard. I hope though you can get some use out of this.

In general, cleaner code is easier to understand, debug, and maintain. You've got at least 4 unnecessary things going on in this small code snippet.

You define a variable which is never used (sides). Either use it or don't create it.
You're randomizing two things when one will do. You don't need to randomly assign which side of the coin is head and then randomly flip the coin.
Very often, when dealing with a binary result, you can forgo using an if statement. I see this frequently with intro R students. Imagine we have a logical variable test.
If we are just going to return TRUE or FALSE based on the value of test:

if (test) {
  TRUE
} else {
  FALSE
}

we can instead just output test. A corollary to this I see a lot is:

if (test == TRUE) {
  TRUE
} else {
  FALSE
}

We don't need test == TRUE as it will always be logically identical to just test.
So, you should never use if (test == TRUE), rather just use if (test)

You should reserve explicit return() calls for cases where you exit a function someplace other than the end. R automatically returns the value of the last expression it evaluates.
You use both <- and =, pick one and use it consistently (you should pick <-).
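
To make the if/else point concrete, here is a minimal sketch comparing the two styles on the same comparison your CoinFlip() does. The helper names match_verbose() and match_concise() are made up for illustration and are not part of your code.

```r
# The original style: branch on the comparison, then return 1 or 0.
match_verbose <- function(flip, heads) {
  if (flip == heads) {
    return(1)
  } else {
    return(0)
  }
}

# The comparison already is the answer; as.integer() turns TRUE/FALSE into 1/0.
match_concise <- function(flip, heads) {
  as.integer(flip == heads)
}

# Check every combination of the two sides.
grid <- expand.grid(flip = 1:2, heads = 1:2)
verbose <- mapply(match_verbose, grid$flip, grid$heads)
concise <- mapply(match_concise, grid$flip, grid$heads)
all(verbose == concise)
#> [1] TRUE
```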

Bonus note: In R it's preferable to use a snake_case object naming scheme rather than camelCase or BigCamelCase, so I'll do that here.
So, with those four notes in mind, your CoinFlip() code can be simplified to:

coin_flip <- function() {
  sample(0:1, 1)
}

Now, you further continue on to use a for() loop to populate your vector. We can simplify that by updating your new coin_flip() function to accept an argument n which dictates how many coins to flip.

coin_flip <- function(n = 1) { # we'll set a default value of 1
  sample(0:1, n, TRUE) # we need `replace = TRUE` for when we're flipping multiple coins.
}

Now, your working code can be something like,

coin_flip <- function(n = 1) { # we'll set a default value of 1
  # we need `replace = TRUE` for when we're flipping multiple coins.
  sample(0:1, n, TRUE) 
}
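
As a quick illustration of why the replace argument matters: sample() draws without replacement by default, so asking a two-sided coin for more than two flips fails. A small sketch:

```r
# Without replacement, a population of two supports at most two draws,
# so this call throws an error, which we catch and report.
res <- tryCatch(sample(0:1, 5), error = function(e) "error")
res
#> [1] "error"

# With replacement, any number of draws is fine.
set.seed(1)
flips <- sample(0:1, 5, replace = TRUE)
length(flips)
#> [1] 5
```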

Now, "coin_flip" is not a great name for the function. It is probably slightly (significantly?) better to reverse it and call it "flip_coins." Different people might have different perspectives on this, but function names should be verbs and sound active. "coin_flip" or even "coin_flips" sound more like results than functions to me, so going forward we're going to use the function name "flip_coins."

flip_coins <- function(n = 1) { # we'll set a default value of 1
  # we need `replace = TRUE` for when we're flipping multiple coins.
  sample(0:1, n, TRUE) 
}

set.seed(20)
x <- flip_coins(500)
head(x)
#> [1] 1 0 0 1 1 0

Next we can turn our attention to what you are actually doing with these flips. You want to calculate the running proportion of heads over time. So we'll write a function to do that for us in a more "R" way.

So how do we do that? Well, each point is equal to the number of heads we've seen at time i, divided by the number of flips we've done at time i. In your original code the number of flips was the variable counter. Since our new function produces a vector of length n consisting of values coded 1 for heads and 0 for tails, the number of heads we have seen at any point is equal to the sum of the vector up to that point (the cumulative sum), which in R (as in many other languages) can be computed with the function cumsum().

Then, we need some way to produce a vector of values which represents the number of flips we have done at each point. This vector looks like 1, 2, 3, 4, ..., n. We could generate it with seq_len(n), but this would require explicitly finding n every time. We can do that with seq_len(length(x)), of course, but that type of construction is common enough to have its own function: seq_along(x).

Finally, since R is a natively vectorized language, we can just divide our count vector by the number of flips vector to get a vector of the running proportions.

run_prop <- function(x) {
  cumsum(x) / seq_along(x)
}

plot(run_prop(x), type = "l")
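
If it helps to see the pieces, here is a small worked example on six made-up flips, along with a check that the vectorized version matches a for() loop like yours. Note that the loop's running count starts at 0 here.

```r
run_prop <- function(x) {
  cumsum(x) / seq_along(x)
}

x <- c(1, 0, 0, 1, 1, 0)  # made-up flips: 1 = heads, 0 = tails
cumsum(x)                 # heads seen so far
#> [1] 1 1 1 2 3 3
seq_along(x)              # flips done so far
#> [1] 1 2 3 4 5 6

# The same calculation written as a loop, for comparison.
loop_prop <- numeric(length(x))
heads <- 0
for (i in seq_along(x)) {
  heads <- heads + x[i]
  loop_prop[i] <- heads / i
}
all.equal(run_prop(x), loop_prop)
#> [1] TRUE
```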

Once here we can start to look at ways to enhance and extend the capabilities of our code. A good place to start is to ask, what might we want to be able to do differently? Well, what if we wanted to use a weighted coin? We would need to adjust the sampling probabilities, and our flip_coins() function will need to accept an argument which represents the probability of getting heads. Let's call it p and give it a default value of 0.5 representing a fair coin.


flip_coins <- function(n = 1, p = 0.5) {
  sample(0:1, n, TRUE, c(1 - p, p)) 
}

set.seed(147)
x <- flip_coins(500, 0.8)
head(x)
#> [1] 1 1 1 0 1 1
plot(run_prop(x), type = "l")
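
A quick sanity check on the weighting, as a sketch: with enough flips the overall proportion of heads should land close to p. As an aside, rbinom() with size = 1 is another base R way to draw weighted 0/1 values. The name flip_coins_binom() is made up here just for the comparison.

```r
flip_coins <- function(n = 1, p = 0.5) {
  sample(0:1, n, TRUE, c(1 - p, p))
}

# Hypothetical alternative: n binomial draws of size 1 (Bernoulli trials).
flip_coins_binom <- function(n = 1, p = 0.5) {
  rbinom(n, size = 1, prob = p)
}

set.seed(42)
a <- flip_coins(1e5, 0.8)
b <- flip_coins_binom(1e5, 0.8)

# Both proportions should be close to 0.8.
c(mean(a), mean(b))
```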

Now, I would never allow a student to turn in a plot like this... there's no title, the axis labels are all wrong, our y-axis starts somewhere in the high 0.6's, and we don't have any idea what the expected value of the coin is supposed to be. So, let's fix our plot...

plot(x    = seq_along(x),
     y    = run_prop(x),
     type = "l",
     main = "Running Proportion of Heads",
     xlab = "Number of Flips",
     ylab = "Proportion",
     ylim = c(0, 1))
abline(h = 0.8, col = "blue", lty = 3)
text(length(x) / 2, 0.8 + 0.05, "Probability of Heads = 0.8", col = "blue")

Wow! There's a lot going on there... I'd hate to have to copy and paste this code every time I wanted to do this plot. So, let's turn it into a function called coin_plot()!

coin_plot <- function(x, p) {
  plot(x    = seq_along(x),
       y    = run_prop(x),
       type = "l",
       main = "Running Proportion of Heads",
       xlab = "Number of Flips",
       ylab = "Proportion",
       ylim = c(0, 1))
  abline(h = p, col = "blue", lty = 3)
  text(length(x) / 2, p + 0.05, paste("Probability of Heads =", p), col = "blue")
}

Now we're cooking! But there's a little something annoying here... We need to supply the true coin probability again when we plot, even though we already gave it when we flipped the coins... Frustrating! If only there were a way to keep that information associated with the coin flip result itself... There is! We can give it an attribute! So... let's go back to our flip_coins() function and store the p value in the result as an attribute.

There are two ways to do this,

The "Simple" Way
Save the result as an object, assign an attribute to that object with the attr() function, then return that object. (attr() is safer to use than attributes(), since assigning with attributes() clears all existing attributes first. That's not a problem here, but I consider it a better practice to use attr().)

flip_coins <- function(n = 1, p = 0.5) {
  result <- sample(0:1, n, TRUE, c(1 - p, p))
  attr(result, "p") <- p
  result
}
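
To see the difference between attr() and attributes() in action, here is a tiny sketch on a made-up named vector. Assigning a single attribute with attr() leaves the names alone, while assigning with attributes() wipes them.

```r
x <- c(heads = 1, tails = 0)
attr(x, "p") <- 0.8
names(x)   # names survive
#> [1] "heads" "tails"

y <- c(heads = 1, tails = 0)
attributes(y) <- list(p = 0.8)
names(y)   # names are gone
#> NULL
```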

The Better Way
Whenever possible, I prefer my functions to be succinct, and it seems weird to me to need to create an object and then add attributes to it: if the attributes are an integral part of the object, it should never be able to exist without them. Thankfully, someone developing R in the before-times agreed with me and we have the structure() function, which allows us to create an object, assign it attributes, and return the object in one fell swoop.

flip_coins <- function(n = 1, p = 0.5) {
  structure(sample(0:1, n, TRUE, c(1 - p, p)), p = p)
}
set.seed(2357)
x <- flip_coins(500, 0.8)
str(x)
#>  int [1:500] 1 1 1 1 1 1 1 1 1 0 ...
#>  - attr(*, "p")= num 0.8
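
If you want to convince yourself the two approaches give the same thing, here is a quick check. The names flip_simple() and flip_structure() are made up so both versions can exist side by side.

```r
# "Simple" way: build the result, attach the attribute, return it.
flip_simple <- function(n = 1, p = 0.5) {
  result <- sample(0:1, n, TRUE, c(1 - p, p))
  attr(result, "p") <- p
  result
}

# "Better" way: build and annotate in one call.
flip_structure <- function(n = 1, p = 0.5) {
  structure(sample(0:1, n, TRUE, c(1 - p, p)), p = p)
}

# Same seed, same arguments, so the random draws are the same.
set.seed(1)
a <- flip_simple(10, 0.8)
set.seed(1)
b <- flip_structure(10, 0.8)
identical(a, b)
#> [1] TRUE
```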

Now we can return to our coin_plot() function and remove the pesky requirement for us to manually keep track of the true probability.

coin_plot <- function(x) {
  p <- attr(x, "p")
  plot(x    = seq_along(x),
       y    = run_prop(x),
       type = "l",
       main = "Running Proportion of Heads",
       xlab = "Number of Flips",
       ylab = "Proportion",
       ylim = c(0, 1))
  abline(h = p, col = "blue", lty = 3)
  text(length(x) / 2, p + 0.05, paste("Probability of Heads =", p), col = "blue")
}

Excellent! And, if we were normal people we'd be done... But we're just getting started! How infuriating is it that we need to remember a whole other function name just to plot our coins? Who has that kind of mental bandwidth? Why can't we just plot our coins as they are and let R figure it out? Well, if you think about the R plot() function, how does it do so much? Let's give ourselves some different types of data and plot them with plot().

set.seed(1123)
my_scores <- sample(50:100, 40, TRUE)
head(my_scores)
#> [1] 97 87 92 53 65 82
my_grades <- cut(my_scores,
                 breaks = c(0, 60, 70, 80, 90, 100),
                 labels = c("F", "D", "C", "B", "A"),
                 include.lowest = TRUE)
head(my_grades)
#> [1] A B A F D B
#> Levels: F D C B A

plot(my_scores)

plot(table(my_grades))

R can do this because plot() in R is a "generic" function, meaning it does different things depending on the data it gets. It identifies what it is plotting by looking at the object's class attribute, then calls the matching plot.<class of the object> method (or plot.default() if there is no method defined for the class of the data object). This is one reason why you should never use "." in your object names in R: experienced programmers (and sometimes R itself) will confuse them for function methods.
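
Here is a stripped-down sketch of the same mechanism with a toy generic, so you can see the dispatch without any plotting involved. The generic describe() and its methods are made up for illustration.

```r
# A toy generic: UseMethod() picks a method based on the class of x.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "no special method, using the default"
describe.factor  <- function(x, ...) "a factor"
describe.table   <- function(x, ...) "a table"

grades <- factor(c("A", "B", "A"))
describe(grades)
#> [1] "a factor"
describe(table(grades))
#> [1] "a table"
describe(c(97, 87, 92))
#> [1] "no special method, using the default"
```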

So, if we want to be able to invoke a specific plotting function for our coin flip data, we need to be able to identify it as such, so we must update flip_coins() to return an object with a class attribute which we will call "coins."

flip_coins <- function(n = 1, p = 0.5) {
  structure(sample(0:1, n, TRUE, c(1 - p, p)),
            p = p,
            class = "coins")
}
set.seed(8675309)
x <- flip_coins(500, 0.8)
str(x)
#>  'coins' int [1:500] 1 1 1 1 1 1 0 0 0 1 ...
#>  - attr(*, "p")= num 0.8

Then, we just need to change our coin_plot() function into the "coins" method for the S3 generic plot() function. At this time, we'll also update the function to accept the three dots. This will allow us to pass any additional optional graphical parameters to the underlying graphical functions. We will also put a copy of our run_prop() function into the body of our plot.coins() function so we don't have to worry about whether or not it exists elsewhere.

plot.coins <- function(x, type = "l",
                       xlim = NULL, ylim = c(0, 1),
                       main = "Running Proportion of Heads",
                       xlab = "Number of Flips",
                       ylab = "Proportion",
                       lty = c(1, 3),
                       lwd = c(1, 1),
                       col = c("black", "blue"),
                       ex_lab = paste("Probability of Heads =", p),
                       ...) {
  run_prop <- function(x) {
    cumsum(x) / seq_along(x)
  }
  p <- attr(x, "p")
  x <- run_prop(x)
  
  plot.default(x = x, type = type,
               xlim = xlim, ylim = ylim, main = main,
               xlab = xlab, ylab = ylab, lty = lty[[1]],
               lwd = lwd[[1]], col = col[[1]], ...)

  abline(a = p, b = 0, lty = tail(lty, 1),
         lwd = tail(lwd, 1), col = tail(col, 1))
  text(x = length(x) / 2, y = p + 0.03,
       labels = ex_lab, col = tail(col, 1))
}

plot(x, lwd = c(2, 4), col = c("blue", "gold"))
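
One thing that may look odd in plot.coins() is that the default for ex_lab refers to p, which only gets created inside the function body. That works because R evaluates default arguments lazily, at the point they are first used. A minimal sketch of just that behaviour, with a made-up function label_demo():

```r
label_demo <- function(x, label = paste("p =", p)) {
  # p does not exist when the function is called...
  p <- attr(x, "p")
  # ...but it does by the time label is first evaluated here.
  label
}

label_demo(structure(1:3, p = 0.8))
#> [1] "p = 0.8"

# A caller can still override the default entirely.
label_demo(structure(1:3, p = 0.8), label = "custom")
#> [1] "custom"
```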

Now (for even more fun) we can do things like add the ability to plot a rolling mean over n values to smooth out the plot a bit.

plot.coins <- function(x, n = 0, type = "l",
                       xlim = NULL, ylim = c(0, 1),
                       main = "Running Proportion of Heads",
                       xlab = "Number of Flips",
                       ylab = "Proportion",
                       lty = c(1, 3),
                       lwd = c(1, 1),
                       col = c("black", "blue"),
                       ex_lab = paste("Probability of Heads =", p),
                       ...) {
  run_prop <- function(x) {
    cumsum(x) / seq_along(x)
  }
  p <- attr(x, "p")
  roll <- function(x, n = 3, i = seq_along(x)) {
    if (n <= 1) {
      x
    } else {
      k <- (n - 1) %/% 2
      roll_one <- Vectorize(function(i) {
        idx <- pmax(seq.int(i - k, length.out = n), 0)
        mean(x[idx], na.rm = TRUE)
      })
      roll_one(i)  
    }
  }
  x <- roll(run_prop(x), n)
  
  plot.default(x = x, type = type,
               xlim = xlim, ylim = ylim, main = main,
               xlab = xlab, ylab = ylab, lty = lty[[1]],
               lwd = lwd[[1]], col = col[[1]], ...)
  
  abline(a = p, b = 0, lty = tail(lty, 1),
         lwd = tail(lwd, 1), col = tail(col, 1))
  text(x = length(x) / 2, y = p + 0.03,
       labels = ex_lab, col = tail(col, 1))
}

par(mfrow = c(1, 2))
plot(x, lwd = c(2, 4), col = c("blue", "gold"))
plot(x, n = 10, lwd = c(2, 4), col = c("blue", "gold"))
par(mfrow = c(1, 1))
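
To see what the rolling mean is doing, here is the roll() helper pulled out on its own and run on a small made-up vector. Indices that fall before the start are clamped to 0 (which R drops), and indices past the end give NA (which na.rm = TRUE drops), so the windows at the edges are simply shorter. For even n the window is not symmetric around i.

```r
roll <- function(x, n = 3, i = seq_along(x)) {
  if (n <= 1) {
    x
  } else {
    k <- (n - 1) %/% 2
    roll_one <- Vectorize(function(i) {
      # window of n positions starting k before i
      idx <- pmax(seq.int(i - k, length.out = n), 0)
      mean(x[idx], na.rm = TRUE)
    })
    roll_one(i)
  }
}

roll(c(1, 2, 3, 4, 5), n = 3)
#> [1] 1.5 2.0 3.0 4.0 4.5

# n = 0 (the default in plot.coins) or n = 1 leaves the data untouched.
roll(c(1, 2, 3, 4, 5), n = 0)
#> [1] 1 2 3 4 5
```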

Of course, once we reach this level of customization we would almost certainly be better off not calling plot.default() in our plot method. Instead we could simply modify its contents rather than having a function which just wraps it.

Created on 2020-09-02 by the reprex package (v0.3.0)
