Working with large, temporary, or private datasets with blogdown

I am relatively familiar with R, but I only recently started using R Markdown, so this might be an obvious question. I would like to start keeping a public notebook of the different R tasks I perform, and the blogdown package looks like a great solution for that. However, I am not sure how well it will age. Specifically, I am concerned about having to re-knit all the posts: it sounds like blogdown re-knits every post each time the site is built. If you are working with large datasets, that may take a while, and if you are downloading data, it may not even be available in the future. You can cache R Markdown chunks, but that would make the directory very large, and the cache wouldn't be in the repo, so it would need to be backed up separately. Additionally, if I deploy on Netlify (the official recommendation), I have no idea how the compiling issue would work there. I doubt I am the first person to think about this. Is there an optimal solution or workflow?
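For concreteness, this is the kind of chunk-level caching I have in mind (a minimal sketch; the URL and object name are just placeholders for a large remote dataset):

````
```{r download-data, cache=TRUE}
# This chunk runs once and is then re-used from the knitr cache on
# subsequent knits, as long as the chunk's code doesn't change.
# "https://example.com/big-dataset.csv" is only a placeholder URL.
raw_data <- readr::read_csv("https://example.com/big-dataset.csv")
```
````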


Hi Igor,

Not everything is re-rendered every time you add a post. Only new or modified posts will be rendered when you run blogdown::serve_site(). See Yihui's response here.
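In practice, the day-to-day workflow is just something like this (a minimal sketch):

```r
# Preview the site locally; as Yihui notes, serve_site() only re-renders
# posts that are new or whose .Rmd source files have changed.
blogdown::serve_site()
```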

You can also customize caching per post or per chunk, as described here (n.b. this is a link to the Different building methods section; the metadata just doesn't show up that way).
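For example, a setup chunk at the top of a single post can turn caching on for every chunk in that post. These are standard knitr options; the cache directory name below is just an example:

```r
# In a setup chunk at the top of one post: cache all chunks in this
# post by default, writing cached results to a per-post directory.
knitr::opts_chunk$set(cache = TRUE, cache.path = "cache/")
```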

I'm not sure that these solutions necessarily fit your needs, but I'd recommend checking out the thread below, especially @andrewheiss' response.


Hi Igor, I'm also working with big data files and I was curious whether you had success with your blogdown site. I'm running into long load times and RStudio issues (unable to connect to the R session).