FAQ: How to do a minimal reproducible example (reprex) for beginners

A minimal reproducible example consists of the following items:

  • A minimal dataset, necessary to reproduce the issue
  • The minimal runnable code necessary to reproduce the issue, which can be run
    on the given dataset and includes the necessary information about the packages used.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted into a post, but big enough to reproduce your issue.

Let's say, as an example, that you are working with the iris data frame:

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

Note: In this example we use the built-in dataset iris as a stand-in for your actual data. You should use your own dataset instead of iris. If your problem can be reproduced with any dataset, you can use iris directly (or any other built-in dataset, e.g. mtcars, ToothGrowth, PlantGrowth, USArrests, etc.) and skip this step.

Suppose you are having issues while trying to make a scatter plot of Sepal.Length and Sepal.Width. A good minimal sample for this case would be just the first 5 rows of those two variables. This doesn't mean that you have to do exactly the same; use your best judgment to decide the minimal amount of sample data needed to illustrate your specific problem.

head(iris, 5)[, c('Sepal.Length', 'Sepal.Width')]
#>   Sepal.Length Sepal.Width
#> 1          5.1         3.5
#> 2          4.9         3.0
#> 3          4.7         3.2
#> 4          4.6         3.1
#> 5          5.0         3.6
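
The first 5 rows may not be enough when the problem depends on a grouping variable or on specific cases. As a rough sketch of one alternative, assuming the issue involves Species, you could take a couple of rows from each group using base R only. The name sample_df and the choice of 2 rows per group are illustrative assumptions, not a rule:

```r
# Split iris by Species and keep the first 2 rows of each group.
rows_per_group <- lapply(split(iris, iris$Species), head, n = 2)

# Bind the small pieces back into a single data frame.
sample_df <- do.call(rbind, rows_per_group)

# Keep only the columns relevant to the (hypothetical) problem.
sample_df <- sample_df[, c("Sepal.Length", "Species")]

# Drop the row names generated by rbind() so the pasted data stays tidy.
rownames(sample_df) <- NULL
sample_df
```

The result has 6 rows that cover all three species, which is still small enough to paste into a post.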

Now you just need to put this into a copy/paste-friendly format so it can be posted on the forum. You can easily do this with the datapasta package.

# If you haven't done so already, install datapasta first with
# install.packages("datapasta")
datapasta::df_paste(head(iris, 5)[, c('Sepal.Length', 'Sepal.Width')])
# This is the sample data that you have to use in your reprex.
data.frame(
      Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
       Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
   )

A nice guide about datapasta can be found here:

You can also use dput(), which is part of base R. It is as simple as this:

dput(head(iris, 5)[c("Sepal.Length", "Sepal.Width")])
#> structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5), Sepal.Width = c(3.5, 
#> 3, 3.2, 3.1, 3.6)), row.names = c(NA, 5L), class = "data.frame")

This output may seem awkward compared to the output of datapasta, but it's much more general in the sense that it supports many more types of R objects.
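
Before posting, you may want to check that the dput() text really rebuilds your data. The following is a minimal sketch of such a check using base R only; the names sample_df, dput_text and rebuilt are illustrative:

```r
# The same small sample used above.
sample_df <- head(iris, 5)[c("Sepal.Length", "Sepal.Width")]

# Capture what dput() prints as a single string of text.
dput_text <- paste(capture.output(dput(sample_df)), collapse = "\n")

# Rebuild the object from that text, as a reader of your post would.
rebuilt <- eval(parse(text = dput_text))

# Compare the rebuilt data with the original.
round_trip_ok <- isTRUE(all.equal(sample_df, rebuilt))
round_trip_ok
```

If the comparison is not TRUE, the pasted text would give readers different data from what you have locally.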

Minimal Runnable Code

The next step is to put together an example of the code that is causing you trouble, along with the libraries that the code uses. Please narrow down your code to just the relevant and essential part needed to reproduce your issue.

library(ggplot2) # Make sure to include library calls for all the libraries that you are using in your example

# Remember to include the sample data that you have generated in the previous step.
df <- data.frame(stringsAsFactors = FALSE,
                 Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
                 Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
)
# Narrow down your code to just the problematic part.
ggplot(data = df, x = Sepal.Length, y = Sepal.Width) +
    geom_point()
#> Error: geom_point requires the following missing aesthetics: x, y
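
Besides the library() calls, version numbers sometimes matter when someone tries to reproduce a problem. As a hedged sketch using base R only, you could collect them as shown here. The package "stats" is queried only because it ships with R; in your own post you would ask for the packages you actually load, such as ggplot2:

```r
# R version string, suitable for pasting at the end of a post.
r_version <- R.version.string

# Version of one package, converted to plain text.
# "stats" is a placeholder; replace it with the packages used in your example.
pkg_version <- as.character(utils::packageVersion("stats"))

cat(r_version, "\n")
cat("stats", pkg_version, "\n")
```

Calling sessionInfo() gives a fuller summary if the short version above is not enough.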

Your Final reprex

Now that you have a minimal reproducible example that shows your problem, it's time to put it into a proper format for posting on the community forum. This is very easy to do with the reprex package: just copy your code with Ctrl + C and run the reprex() function in your console pane.

# If you haven't done so already, install reprex first with
# install.packages("reprex")
reprex::reprex()

Now you can just press Ctrl + V in your forum post and voilà! You have a properly formatted reprex like this:

```r
library(ggplot2)

df <- data.frame(stringsAsFactors = FALSE,
                 Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
                 Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
)
ggplot(data = df, x = Sepal.Length, y = Sepal.Width) +
    geom_point()
#> Error: geom_point requires the following missing aesthetics: x, y
```

![](https://i.imgur.com/rAYQlnn.png)

Note: The previous approach works if you are using a desktop version of RStudio. If you are using a server version (or you don't have access to your clipboard), you will have to select the code in the "source" pane and run the reprex() function without arguments in the "console" pane.

reprex::reprex()

reprex will automatically detect an empty clipboard and use the current selection instead. A .md file will then be created in the "source" pane with your reproducible example properly formatted and ready to be copied and pasted into your forum post.

Another point to note is that reprex does not run in your working directory by default. It creates a temporary working directory, so if you read files from your working directory using relative paths, you'll get a lot of errors. The preferred way to avoid this is to share the data in a copy/paste-friendly format using datapasta or dput (as discussed above). If you can't do that, you can force reprex to use the current working directory with the outfile = NA argument (in reprex 2.0.0 and later, outfile is deprecated and wd = "." serves this purpose). Keep in mind that the result won't be reproducible for other people, because they don't have access to your local files, so you will also have to share your dataset through a cloud storage service such as Dropbox or Google Drive.
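
The effect of a different working directory can be imitated without reprex itself. The sketch below uses base R only: a second, empty folder stands in for the temporary directory that reprex creates, and the file name my_data.csv and the folder names are hypothetical:

```r
# A folder that plays the role of your project, with one data file in it.
project_dir <- file.path(tempdir(), "my_project")
dir.create(project_dir, showWarnings = FALSE)
write.csv(head(iris, 5), file.path(project_dir, "my_data.csv"), row.names = FALSE)

# A separate, empty folder that stands in for a temporary working directory.
scratch_dir <- file.path(tempdir(), "scratch")
dir.create(scratch_dir, showWarnings = FALSE)

# From the project folder, the relative path resolves.
old_wd <- setwd(project_dir)
found_in_project <- file.exists("my_data.csv")

# From the other folder, the same relative path no longer resolves.
setwd(scratch_dir)
found_in_scratch <- file.exists("my_data.csv")

# Restore the original working directory.
setwd(old_wd)

c(project = found_in_project, scratch = found_in_scratch)
```

This is why code that reads a local file by a relative path can work in your session and still fail inside a reprex.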

The Answer

If you follow all these steps, most likely someone will copy your code into their R session and figure out what the problem is. In this example, you forgot to put your variables inside the aes() function. They can then reply with a working solution like this:

library(ggplot2)

df <- data.frame(stringsAsFactors = FALSE,
                 Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5),
                 Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6)
                 )

ggplot(data = df, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()

Please feel free to improve this FAQ. Just keep in mind the general goal of making it friendly for R beginners (and, if possible, for non-native English speakers as well).

EDIT: Translations to other languages are also welcome (please try to keep the translation as close as possible to the original English text).
