Formal Documentation On Column Name Syntax


#1

This may seem like an impertinent question but I really have done some googling to try to find the BNF or any rigorous documentation on column name syntax.

The reason I need rigor is that I have 700+ CSV files whose names I drew from patchworknation, and those names had all kinds of special characters and whatnot. Some of the names are very long. I intend to import these files, one per column, into a huge R data file, and I want to keep the correspondence between the file names and the column names. So I’ve renamed them all to snake case, lowercased them, and eliminated every character outside of [a-z0-9_]. That should be well within most sane constraints on names, but I can’t even be certain that the length of the names is ok, and I’m a little nervous about underscore given this ancient query for help that no one answered.


#2

I believe the underscore issue in your link was dealt with many, many years ago, so it should not be a problem nowadays (assuming you don’t have an ancient version of R). You may have noticed that the tidyverse examples use _ in column names.
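
If you want to convince yourself, here is a minimal sketch in base R. The column names are made up for illustration and are not taken from your files. It builds a small data.frame with underscored names and checks that R keeps them exactly as given:

```r
# Hypothetical snake_case names standing in for the real file names
cols <- c("median_income", "pct_under_18", "x2010_population")

# Build a small placeholder data.frame and assign the names
df <- data.frame(matrix(0, nrow = 2, ncol = length(cols)))
names(df) <- cols

# Underscored names can be used with `$` without backticks
df$median_income

# make.names() returns a name unchanged when it is already syntactically valid
identical(make.names(cols), cols)
```

If the last line returns TRUE on your own vector of names, R treats all of them as ordinary syntactic names.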


#3

I don’t know of anything quite as formal as a BNF for R. I’ve looked before and was never able to find one.

But there is a language definition.

https://cran.r-project.org/doc/manuals/r-release/R-lang.html

The parser part (section 10) of the document may be of some help to you since column names will be parsed by R before they can be used to make a tibble or data.frame.
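
The identifier rule in that section says a name is made of letters, digits, periods and underscores, and must not start with a digit or an underscore (or with a period followed by a digit). So names restricted to [a-z0-9_] can still be non-syntactic if they begin with a digit or an underscore, or if they collide with a reserved word. A rough way to flag such names, sketched with base R's make.names() (the helper name is_syntactic and the sample names are my own, purely illustrative):

```r
# Hypothetical helper: a name is treated as syntactic if make.names()
# leaves it unchanged
is_syntactic <- function(x) {
  x == make.names(x)
}

# Made-up candidates; the middle three break the identifier rule
candidates <- c("median_income", "2010_population", "_hidden", "if",
                "pct_under_18")

ok <- is_syntactic(candidates)

# Names that would need fixing (or backticks) to be used as plain identifiers
candidates[!ok]

# Column names should also be unique; anyDuplicated() returns 0 if they are
anyDuplicated(candidates)
```

Names flagged this way can still be stored as column names (for example with check.names = FALSE in data.frame()), but they have to be quoted with backticks whenever they appear in code, so prefixing them with a letter is usually less trouble.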