Working with large, temporary, or private datasets with blogdown

I am relatively familiar with R, but I only recently started using R Markdown, so this might be an obvious question. I would like to start keeping a public notebook of the different R tasks I perform, and the blogdown package looks like a great solution for that. However, I am not sure how well it will age. Specifically, I am concerned about having to re-knit all the posts: it sounds like blogdown re-knits every post each time the site is built. If you are working with large datasets, that may take a while, and if you are downloading data, it may not even be available in the future. You can cache R Markdown chunks, but that would make the directory very large, and the cache wouldn't be in the repo, so it would need to be backed up separately. Additionally, if I deploy on Netlify (the official recommendation), I have no idea how the compiling issue would work there. I doubt I am the first person to think about this. Is there an optimal solution or workflow?
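For concreteness, this is the kind of chunk-level caching I have in mind (a minimal sketch; the URL and object name are just placeholders for a large remote dataset):

````
```{r download-data, cache=TRUE}
# This chunk runs once and is then re-used from the knitr cache on
# subsequent knits, as long as the chunk's code doesn't change.
# "https://example.com/big-dataset.csv" is only a placeholder URL.
raw_data <- readr::read_csv("https://example.com/big-dataset.csv")
```
````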


Hi Igor,

Not everything is re-rendered every time you add a post. Only new or modified posts will be rendered when you run blogdown::serve_site(). See Yihui's response here.
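In practice, the day-to-day workflow is just something like this (a minimal sketch):

```r
# Preview the site locally; as Yihui notes, serve_site() only re-renders
# posts that are new or whose .Rmd source files have changed.
blogdown::serve_site()
```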

You can also customize caching per post or per chunk, as described here (n.b. this is a link to the Different building methods section; the metadata just doesn't show up that way).
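For example, a setup chunk at the top of a single post can turn caching on for every chunk in that post. These are standard knitr options; the cache directory name below is just an example:

```r
# In a setup chunk at the top of one post: cache all chunks in this
# post by default, writing cached results to a per-post directory.
knitr::opts_chunk$set(cache = TRUE, cache.path = "cache/")
```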

I'm not sure that these solutions necessarily fit your needs, but I'd recommend checking out the thread below, especially @andrewheiss' response.


Hi Igor, I'm also working with big data files and I was curious whether you had success with your blogdown site. I'm running into long load times and RStudio issues (unable to connect to the R session).