FAQ: How to do a minimal reproducible example (reprex) for beginners

reprex

#1

A minimal reproducible example consists of the following items:

  • A minimal dataset, necessary to reproduce the error
  • The minimal runnable code necessary to reproduce the error, which can be run
    on the given dataset and includes the calls that load the packages it uses.

Let's quickly go over each of these with an example:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted into a post, but big enough to reproduce your issue.

Let's say, as an example, that you are working with the iris data frame:

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

Note: In this example we are using the built-in dataset iris as a stand-in for your actual data; you should use your own dataset instead of iris. If your problem can be reproduced with any dataset, you can use iris directly (or any other built-in dataset, e.g. mtcars, ToothGrowth, PlantGrowth, USArrests, etc.) and skip this step.

Suppose you are having issues while trying to make a scatter plot of Sepal.Length and Sepal.Width. A good minimal sample dataset for this would be just the first 5 rows of those two variables:

head(iris, 5)[, c('Sepal.Length', 'Sepal.Width')]
#>   Sepal.Length Sepal.Width
#> 1          5.1         3.5
#> 2          4.9         3.0
#> 3          4.7         3.2
#> 4          4.6         3.1
#> 5          5.0         3.6

Now you just need to put this into a copy/paste-friendly format for posting in the forum, and you can easily do that with the datapasta package.

# If you haven't done so already, install datapasta first with
# install.packages("datapasta")
datapasta::df_paste(head(iris, 5)[, c('Sepal.Length', 'Sepal.Width')])
# This is the sample data that you have to use in your reprex.
data.frame(
      Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
       Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
   )
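
If you can't install datapasta, base R's dput() is a common alternative. It is not part of the workflow described in this FAQ, so take this as a sketch: dput() prints R code that rebuilds an object, and you can paste that code into your post. The names sample_data and pasted are only illustrative. The last lines check that a pasted data frame holds the same values as the original subset.

```r
# Base R alternative (no extra package needed)
sample_data <- head(iris, 5)[, c("Sepal.Length", "Sepal.Width")]

# Prints R code that rebuilds sample_data; copy that output into your post
dput(sample_data)

# Quick check: the data frame you paste should hold the same values
pasted <- data.frame(
  Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
  Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
)
# check.attributes = FALSE compares the values only, not row names or classes
same <- isTRUE(all.equal(pasted, sample_data, check.attributes = FALSE))
same
```

If same is not TRUE, the pasted data differs from what you are actually working with, and helpers may not be able to reproduce your issue.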

Minimal Runnable Code

The next step is to put together an example of the code that is causing you trouble, along with the libraries that you are using for that code.

library(ggplot2) # Make sure to include the calls for all the libraries that you are using in your example

# Remember to include the sample data that you have generated in the previous step.
df <- data.frame(stringsAsFactors = FALSE,
                 Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
                 Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
)
# Narrow down your code to just the problematic part.
ggplot(data = df, x = Sepal.Length, y = Sepal.Width) +
    geom_point()
#> Error: geom_point requires the following missing aesthetics: x, y

Your Final reprex

Now that you have a minimal reproducible example that shows your error, it's time to put it into a proper format for posting in the community forum. This is very easy to do with the reprex package: just copy your code with Ctrl + C and run the reprex() function in your console pane.

# If you haven't done so already, install reprex first with
# install.packages("reprex")
reprex::reprex()

Now just press Ctrl + V in your forum post and voilà! You have a properly formatted reprex like this:

```r
library(ggplot2)

df <- data.frame(stringsAsFactors = FALSE,
                 Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
                 Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
)
ggplot(data = df, x = Sepal.Length, y = Sepal.Width) +
    geom_point()
#> Error: geom_point requires the following missing aesthetics: x, y
```

![](https://i.imgur.com/rAYQlnn.png)

Note: The previous approach works if you are using a desktop version of RStudio, but if you are using a server version (and you don't have access to your clipboard), you will have to put your code inside the reprex() function like this:

reprex::reprex({
library(ggplot2)

df <- data.frame(stringsAsFactors = FALSE,
                 Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
                 Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
                 )

ggplot(data = df, x = Sepal.Length, y = Sepal.Width) +
    geom_point()
})
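
Helpers sometimes also need to know which R and package versions you are running. A minimal base R sketch for reporting them is below; "stats" is used only because it ships with R, so replace it with the package you actually use (ggplot2 in this FAQ's example). To my knowledge, reprex() also accepts a session_info = TRUE argument that appends this kind of information to the output; please check the reprex documentation for your installed version.

```r
# Base R only: report the R version and the version of one package.
# "stats" is a stand-in; use the package from your own example instead.
r_version <- R.version.string
pkg_version <- as.character(packageVersion("stats"))

cat(r_version, "\n")
cat("stats", pkg_version, "\n")
```

You can paste the printed lines at the end of your post, below the reprex itself.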

The Answer

If you follow all these steps, most likely someone will copy your code into their own RStudio session, figure out that you forgot to put your variables inside the aes() function, and reply with a working solution like this:

library(ggplot2)

df <- data.frame(stringsAsFactors = FALSE,
                 Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
                 Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
                 )

ggplot(data = df, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()


#2

Please feel free to improve this FAQ; just keep in mind the general goal of making it friendly for R beginners (and, if possible, for non-native English speakers as well).

