Need help making a personal R package

package

#1

Good day!

I want to make my own personal R package so that I can automate a task. I am currently learning how to make a package, and I have no prior knowledge of the subject.

Here is the graph that I want the package to produce, if that is feasible:

froese(x, lm = 10.2, bw = 0.5)

I know that the code I have in mind will produce an error, but I wrote it anyway as a first step. This is the code:

froese <- function(x, lm, bw){
  # x = the data frame
  # lm = length at first maturity
  # bw = the specified binwidth (possible choices are 0.5 or 1.0)
  
  # Load the required packages
  library(magrittr)
  library(dplyr)
  library(ggplot2)
  # Find the minimum and maximum value in the data
  min_lf <- floor(min(x))
  max_lf <- ceiling(max(x))
  # Compute for the value of megaspawners
  mega <- max_lf * 0.7
  # Sort the data frame from lowest to highest.
  # Needed for the computation for the percentage of juveniles, adults,
  # and megaspawners individual
  x <- sort(x, decreasing = FALSE)
  x <- as.data.frame(x)
  # Filter the lengths that are considered as juveniles
  juveniles <- x %>% 
    dplyr::filter(x < lm)
  # Filter the lengths that are considered as adults
  adults <- x %>% 
    dplyr::filter(x >= lm & x < mega)
  # Filter the lengths that are considered as megaspawners
  megaspawners <- x %>% 
    dplyr::filter(x >= mega)
  # Compute for the percent contribution of juveniles to the whole collected data
  prcnt_juveniles <- round((nrow(juveniles) / nrow(x)) * 100, digits = 2)
  # Compute for the percent contribution of adults to the whole collected data
  prcnt_adults <- round((nrow(adults) / nrow(x)) * 100, digits = 2)
  # Compute for the percent contribution of megaspawners to the whole collected data
  prcnt_megaspawners <- round((nrow(megaspawners) / nrow(x)) * 100, digits = 2)
  # Save the result of the histogram to a variable, so to extract the
  # mid-length with the highest frequency. This is needed for adding annotation in the graph
  res_hist <- hist(x, 
                   breaks = seq(from = min_lf,
                                to = max_lf, 
                                by =  bw))
  # Extract which index has the highest frequency
  max_val <- which.max(res_hist$counts)
  # Find the corresponding value
  max_val1 <- as.numeric(res_hist$counts[max_val])
  # Make the plot
  p1 <-  
    ggplot(aes(x = x)) +
    geom_histogram(binwidth = bw, colour = "#555555",
                   fill = "#23272A") +
    scale_x_continuous(breaks = seq(from = min_lf, 
                                    to = max_lf, 
                                    by =  bw),
                       limits = c(min_lf, max_lf)) +
    geom_vline(xintercept = lm, color = "#D9534F", linetype = "dashed") +
    geom_vline(xintercept = mega, color = "#D9534F", linetype = "dashed") +
    annotate("rect", xmin = -Inf, xmax = lm, ymin = 0, ymax = Inf, alpha = 0.2, fill = "yellow") +
    annotate("rect", xmin = lm, xmax = mega, ymin = 0, ymax = Inf, alpha = 0.2, fill = "red") +
    annotate("rect", xmin = mega, xmax = Inf, ymin = 0, ymax = Inf, alpha = 0.2, fill = "blue") +
    annotate("text", x = ((max(juveniles) + min(juveniles)) / 2), y = max_val1 + 100, label = paste0(prcnt_juveniles, "%\nJuveniles"), size = 4) +
    annotate("text", x = ((max(adults) + min(adults)) / 2), y = max_val1 + 100, label = paste0(prcnt_adults, "%\nAdults"), size = 4) +
    annotate("text", x = ((max(megaspawners) + min(megaspawners)) / 2), y = max_val1 + 100, label = paste0(prcnt_megaspawners, "%\nMegaspawners"), size = 4)
  # Print the result to the screen
  print(p1)
}

I am also wondering how to apply it to a data frame or a vector containing lengths, and how to facet the graph.

Hoping for your kind consideration of this matter.

Edit: This is the sample data. sample-length


#2

Sounds like an interesting project. This is a good article to read to get you started: https://support.rstudio.com/hc/en-us/articles/200486488-Developing-Packages-with-RStudio

From your comments above I'm not really sure if you have package development questions or questions about your function. Maybe you could clarify what you're looking for.


#3

Thanks, I will read that article. Actually, both. When I run the function, it throws an error. Honestly, I am not yet well versed in debugging errors and editing code. I would also like to make it a package. Right now I am reading the online version of R Packages.


#4

Definitely check out that resource that @jdlong pointed you to. I'm also a big fan of Hadley Wickham's book on package creation.

I'd also highly recommend checking out the source code for your favourite R package after you've read through the resources here; it'll help put things into context.

One small thing I noticed: you definitely shouldn't be calling library() from within a function. In the context of packages, you'd specify these dependencies in the DESCRIPTION file (check out this chapter of the Hadley book for more information). Even if you're just looking to use this as a standalone function (i.e., not in a package), you should avoid calling library() within the actual function.
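A minimal sketch of that alternative, assuming ggplot2 is installed: call package functions with the `pkg::fun()` form so nothing gets attached inside the function. The name `length_hist()` and its arguments are made up for this illustration and are not from the original post. In a real package, ggplot2 would be listed under Imports in DESCRIPTION and the `requireNamespace()` check would not be needed.

```r
# Hypothetical helper: draws a histogram of lengths without calling library().
length_hist <- function(lengths, bw) {
  # For a standalone function, fail with a clear message if ggplot2 is missing.
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop("Package 'ggplot2' is required for length_hist().", call. = FALSE)
  }

  df <- data.frame(len = lengths)

  # Every ggplot2 function is namespace-qualified, so nothing is attached.
  ggplot2::ggplot(df, ggplot2::aes(x = len)) +
    ggplot2::geom_histogram(binwidth = bw)
}

# Return the plot object and let the caller decide when to print it.
p <- length_hist(c(8.5, 9.0, 10.5, 12.0, 15.5), bw = 0.5)
```

Returning the plot rather than printing it inside the function also makes it easier to add layers or facets afterwards.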

Also: just noticed that you're calling filter incorrectly. You should filter on the name of a column in the data frame rather than on the data frame itself, e.g.:

library(dplyr)  # provides %>% and filter()

my_data <- data.frame(a = c(1, 2, 3), b = c(5, 6, 7))

# Keep the rows where column a is greater than 1
my_data %>%
  filter(a > 1)
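Applied to length data, the same idea might look like the sketch below. It assumes dplyr is installed; the data frame `lengths_df`, its column `length_cm`, and the sample values are invented for illustration, while `lm` and `mega` follow the definitions used in froese().

```r
# Hypothetical sample lengths; the thread's real data is not reproduced here.
lengths_df <- data.frame(length_cm = c(8.5, 9.0, 10.5, 12.0, 15.5))

lm   <- 10.2                                       # length at first maturity
mega <- ceiling(max(lengths_df$length_cm)) * 0.7   # same rule as in froese()

# Filter on the column name, not on the data frame itself.
juveniles    <- dplyr::filter(lengths_df, length_cm < lm)
adults       <- dplyr::filter(lengths_df, length_cm >= lm, length_cm < mega)
megaspawners <- dplyr::filter(lengths_df, length_cm >= mega)

# filter() returns data frames, so nrow() gives the group sizes.
pct_juveniles    <- round(nrow(juveniles) / nrow(lengths_df) * 100, 2)
pct_adults       <- round(nrow(adults) / nrow(lengths_df) * 100, 2)
pct_megaspawners <- round(nrow(megaspawners) / nrow(lengths_df) * 100, 2)
```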


#5

Thank you for pointing that out. I managed to edit my code. This is my modification:

froese <- function(x, lm, bw) {
  library(ggplot2)
  lm <- lm
  bw <- bw
  min_lf <- floor(min(x))
  max_lf <- ceiling(max(x))
  mega <- max_lf * 0.7
  
  x <- as.data.frame(x)
  x <- x[order(x), ]
  x <- as.data.frame(x)
  
  juveniles <- x[x[1] < lm, ]
  adults <- x[x[1] >= lm & x[1] < mega, ]
  megaspawners <- x[x[1] >= mega, ]
  
  prcnt_juveniles <- round((nrow(juveniles) / nrow(x)) * 100, digits = 2)
  prcnt_adults <- round((nrow(adults) / nrow(x)) * 100, digits = 2)
  prcnt_megaspawners <- round((nrow(megaspawners) / nrow(x)) * 100, digits = 2)
  
  z <- unlist(x)
  
  res_hist <- hist(z, 
                   breaks = seq(from = min_lf,
                                to = max_lf, 
                                by = bw))
  
  max_val <- which.max(res_hist$counts)
  max_val1 <- as.numeric(res_hist$counts[max_val])
  
  p1 <-  
    ggplot(data = x, aes(x = x[1])) +
    geom_histogram(binwidth = bw, colour = "#555555",
                   fill = "#23272A") +
    scale_x_continuous(breaks = seq(from = min_lf, 
                                    to = max_lf, 
                                    by =  bw),
                       limits = c(min_lf, max_lf)) +
    geom_vline(xintercept = lm, color = "#D9534F", linetype = "dashed") +
    geom_vline(xintercept = mega, color = "#D9534F", linetype = "dashed") +
    annotate("rect", xmin = -Inf, xmax = lm, ymin = 0, ymax = max_val1 + 150, alpha = 0.2, fill = "yellow") +
    annotate("rect", xmin = lm, xmax = mega, ymin = 0, ymax = max_val1 + 150, alpha = 0.2, fill = "red") +
    annotate("rect", xmin = mega, xmax = Inf, ymin = 0, ymax = max_val1 + 150, alpha = 0.2, fill = "blue") +
    annotate("text", x = ((max(juveniles) + min(juveniles)) / 2), y = max_val1 + 100, label = paste0(prcnt_juveniles, "%\nJuveniles"), size = 4) +
    annotate("text", x = ((max(adults) + min(adults)) / 2), y = max_val1 + 100, label = paste0(prcnt_adults, "%\nAdults"), size = 4) +
    annotate("text", x = ((max(megaspawners) + min(megaspawners)) / 2), y = max_val1 + 100, label = paste0(prcnt_megaspawners, "%\nMegaspawners"), size = 4)
  
  print(p1)

}

But this is the plot produced when I run froese(x, lm = 10.2, bw = 0.5):

What is wrong here?

P.S. After I finalize the function, I will remove library(ggplot2) from it and follow what is described in the link you provided, that is, declare the dependency in the DESCRIPTION file.


#6

I finally made it work. Now I will read up on how to turn it into a package.

Here's the working code:

froese <- function(x, lm, bw) {
  library(ggplot2)
  lm <- lm
  bw <- bw
  min_lf <- floor(min(x))
  max_lf <- ceiling(max(x))
  mega <- max_lf * 0.7
  
  x <- as.data.frame(x)
  x <- x[order(x), ]
  x <- as.data.frame(x)
  
  juveniles <- x[x[1] < lm, ]
  adults <- x[x[1] >= lm & x[1] < mega, ]
  megaspawners <- x[x[1] >= mega, ]
  
  prcnt_juveniles <- round((length(juveniles) / nrow(x)) * 100, digits = 2)
  prcnt_adults <- round((length(adults) / nrow(x)) * 100, digits = 2)
  prcnt_megaspawners <- round((length(megaspawners) / nrow(x)) * 100, digits = 2)
  
  z <- c(t(x))
  
  res_hist <- hist(z, 
                   breaks = seq(from = min_lf,
                                to = max_lf, 
                                by = bw),
                   plot = FALSE)
  
  max_val <- which.max(res_hist$counts)
  max_val1 <- as.numeric(res_hist$counts[max_val])
  
  p <-  ggplot(data = x, aes(x = x))
  
  p1 <- p + geom_histogram(binwidth = bw, colour = "#555555",
                           fill = "#23272A") +
    scale_x_continuous(breaks = seq(from = min_lf, 
                                    to = max_lf, 
                                    by =  bw),
                       limits = c(min_lf, max_lf))
  
  p2 <- p1 +
    geom_vline(xintercept = lm, color = "#D9534F", linetype = "dashed") +
    geom_vline(xintercept = mega, color = "#D9534F", linetype = "dashed") +
    annotate("rect", xmin = -Inf, xmax = lm, ymin = 0, ymax = Inf, 
             alpha = 0.2, fill = "yellow") +
    annotate("rect", xmin = lm, xmax = mega, ymin = 0, ymax = Inf, 
             alpha = 0.2, fill = "red") +
    annotate("rect", xmin = mega, xmax = Inf, ymin = 0, ymax = Inf, 
             alpha = 0.2, fill = "blue") +
    annotate("text", x = ((max(juveniles) + min(juveniles)) / 2), y = max_val1 + 100, 
             label = paste0(prcnt_juveniles, "%\nJuveniles"), size = 4) +
    annotate("text", x = ((max(adults) + min(adults)) / 2), y = max_val1 + 100, 
             label = paste0(prcnt_adults, "%\nAdults"), size = 4) +
    annotate("text", x = ((max(megaspawners) + min(megaspawners)) / 2), y = max_val1 + 100, 
             label = paste0(prcnt_megaspawners, "%\nMegaspawners"), size = 4)
  
  plot(p2)

}
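
For readers comparing the code in #5 and #6: one of the changes is that the percentages use length() instead of nrow(). The sketch below, with made-up sample values, shows why that matters. Subsetting a one-column data frame with `[rows, ]` drops the result to a plain vector, so nrow() returns NULL and the percentage becomes an empty value. The `drop = FALSE` variant at the end is an alternative, not something the thread itself uses.

```r
# Hypothetical sample lengths in a one-column data frame, as in froese().
lengths_df <- data.frame(x = c(8.5, 9.0, 10.5, 12.0, 15.5))
lm <- 10.2

# Same subsetting style as in the posts above.
juveniles <- lengths_df[lengths_df[1] < lm, ]

is.data.frame(juveniles)   # FALSE: the single column was dropped to a vector
is.null(nrow(juveniles))   # TRUE: nrow() is NULL for a plain vector

# With nrow() the percentage silently becomes a zero-length value ...
pct_bad <- round(nrow(juveniles) / nrow(lengths_df) * 100, digits = 2)
length(pct_bad)            # 0

# ... while length() counts the elements of the vector.
pct_ok <- round(length(juveniles) / nrow(lengths_df) * 100, digits = 2)

# Alternative: keep the data frame structure so that nrow() works.
kept <- lengths_df[lengths_df[1] < lm, , drop = FALSE]
nrow(kept)                 # 2
```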


#7

You can also use this gitbook in conjunction with Hadley’s book referenced above to make the package building process less painful.

This section in particular is relevant to your use case.