Numbering files in logical order

I was reading Jenny Bryan's slides on naming files. In theory I really like the idea of numbering your files in logical order, e.g. 01_load-data.R, 02_plot.R and so on. In practice, though, how do we deal with the possibility that the order of operations might change, especially when there are a lot of files? For example, if a data cleaning step 02_clean-data.R needs to be added, every filename downstream of it has to change as well.

I imagine there are ways to automate the renaming/renumbering process through the command line. What about people who aren't as experienced with the command line?
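
To make the renumbering idea concrete, here is a minimal Python sketch of one way it could be scripted without touching the shell. The function names plan_insert and apply_plan are made up for illustration, and it assumes every numbered file follows the NN_name pattern from the example above.

```python
import re
from pathlib import Path

# Matches names like "02_plot.R" and captures ("02", "plot.R").
PREFIX = re.compile(r"^(\d+)_(.+)$")

def plan_insert(filenames, position):
    """Return (old, new) name pairs that free up slot `position`.

    Files numbered `position` or higher are bumped by one. Files with
    lower numbers, or with no numeric prefix, are left out of the plan.
    The zero-padding width of each name is preserved.
    """
    plan = []
    for name in filenames:
        match = PREFIX.match(name)
        if not match:
            continue  # not a numbered file, leave it alone
        digits, rest = match.groups()
        number = int(digits)
        if number >= position:
            new_name = f"{number + 1:0{len(digits)}d}_{rest}"
            plan.append((name, new_name))
    # Rename the highest numbers first so no file overwrites another.
    plan.sort(key=lambda pair: int(PREFIX.match(pair[0]).group(1)), reverse=True)
    return plan

def apply_plan(directory, plan):
    """Carry out the renames on disk, in the order given by the plan."""
    for old, new in plan:
        Path(directory, old).rename(Path(directory, new))

# Make room for a new 02_clean-data.R without renaming anything yet.
plan = plan_insert(["01_load-data.R", "02_plot.R", "03_report.R"], position=2)
print(plan)
```

Printing the plan before calling apply_plan gives you a chance to check the renames first, which seems safer than renaming in one step.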

2 Likes

I can’t speak for Jenny Bryan, of course, but I’ve understood her sequential file numbering advice to be an intentionally simple (and therefore accessible) solution meant to cover relatively simple cases (which is a lot of cases!).

Once your process has enough steps that you’re worried about inserting data cleaning step 3a in the middle of the chain, I think it’s time to consider more formal pipeline management tools, whether that’s old-school make, shiny new drake or remake, or something else.
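
The core idea behind those tools is that you declare which step depends on which, and the run order is worked out from that rather than from the filenames. As a rough illustration only (this is plain Python using the standard library's graphlib, not how make, drake or remake are actually configured, and the script names are hypothetical):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each script maps to the scripts it depends on.
# No numeric prefixes are needed because the order comes from the graph.
depends_on = {
    "load-data.R": set(),
    "clean-data.R": {"load-data.R"},
    "plot.R": {"clean-data.R"},
}

# static_order() yields every step after all of its dependencies.
run_order = list(TopologicalSorter(depends_on).static_order())
print(run_order)  # ['load-data.R', 'clean-data.R', 'plot.R']
```

Adding a step in the middle then means adding one entry and updating one dependency, with no renaming.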

5 Likes

Yes, this is a potential pain point.

I think of it two ways:

  • In a small project, the renumbering is a relatively small hassle and I kind of like the regular grooming / revisiting of things this forces on you.
  • In a large project, I intentionally leave blank space in my file numbering. So I might use the "0x"s for early inspection and diagnostics, the "1x"s for ingest and cleaning, ..., the "4x"s for modelling, ..., the "7x"s for figure-making, ..., the "9x"s for helper functions sourced from more than one file. This tends to limit the amount of renumbering you need to do when you add something (see first point).
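
As a small sketch of how the blocks-of-ten scheme plays out, the following Python helper finds the next unused number inside a block. The name next_free_number and the example filenames are hypothetical, and it assumes two-digit prefixes followed by an underscore.

```python
import re

def next_free_number(filenames, block):
    """Return the lowest unused two-digit prefix in a block of ten.

    Block 1 covers 10-19, block 4 covers 40-49, and so on.
    Returns None when every number in the block is taken.
    """
    used = set()
    for name in filenames:
        match = re.match(r"^(\d{2})_", name)
        if match:
            used.add(int(match.group(1)))
    for candidate in range(block * 10, block * 10 + 10):
        if candidate not in used:
            return f"{candidate:02d}"
    return None

existing = ["01_inspect.R", "10_ingest.R", "11_clean.R", "40_model.R"]
# A new cleaning script goes in the "1x" block without touching the rest.
print(next_free_number(existing, block=1))  # 12
```

A None result is the signal that a block is full and some renumbering is due.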
6 Likes