Why does head() show 6 rows by default?

There's no real motivation behind this question other than pure curiosity. Something like 5 or 10 seems like a more natural choice, so I'm just wondering if anyone knows why it's 6 rows.
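For reference, here is a minimal sketch of the behaviour being asked about. It assumes a standard R session, where `head()` and `tail()` come from utils and `mtcars` from datasets, both available by default. The negative-`n` case corresponds to the extension credited to Vincent Goulet in the source comment.

```r
# head() and tail() live in the utils package, which R attaches by default.
mt <- datasets::mtcars

nrow(head(mt))         # default n gives 6 rows
nrow(head(mt, n = 5))  # an explicit n overrides the default
nrow(tail(mt))         # tail() shares the same default of 6

# Negative n keeps everything except the last (head) or first (tail) |n| elements.
head(1:10, -3)  # 1 2 3 4 5 6 7
tail(1:10, -3)  # 4 5 6 7 8 9 10
```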

To answer a question like this, I first started looking through the S books I have on hand (e.g. The New S Language). I don't see head() mentioned in the index, which suggests it's a function introduced by R.

Since it's an R function, I can next search @winston's GitHub mirror of the R sources (https://github.com/wch/r-source), where the source lives at https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/library/utils/R/head.R

This includes a comment which suggests we should ask Patrick Burns:

### placed in the public domain 2002
### Patrick Burns patrick@burns-stat.com
###
### Adapted for negative arguments by Vincent Goulet
### <vincent.goulet@act.ulaval.ca>, 2006

But it's worth checking that it has always used 6. Clicking on History, I find the first version: https://github.com/wch/r-source/commit/37271cdbdcd7e5d82c79bdb536ef305d93b644ad#diff-941bf47bf09f67538338535bd512d521, so it has been six from the very beginning.

So next step, I'll email Patrick and see if he recollects...

This is a fantastic first response: it not only addresses the question asked, but also shows your thought process, which helps answer future questions before they're asked. Thanks for the genuine answer; I'm looking forward to hearing what Patrick says.

Just wait until you hear the endgame! I'll leave it to @hadley to recount; it's a :gem:! (Then I'll offer my very important, intellectual analogy!)

From Pat (via email):

I came upon 'head' and 'tail' at one of my clients. That implementation had n = 5. I didn't think there would ever be an issue regarding ownership of the code, but I changed to 6 just to help if there were a conflict.

@Bryan, my super important analogy:

n = 6 : R :: brown M&Ms : Van Halen

For those of you not from the U.S. and/or not familiar with the weird standardized-testing analogy notation:

colon (:) means "is to" and a double colon (::) means "as"

src: http://www.wordmasterschallenge.com/listtag/analogy

Because 6x9=42

I'm mildly disappointed that the answer didn't come down to something about a six-fingered man, maybe one that killed someone's father.

Still, interesting reason, and very informative walk-through of the process!

As you wish…

Thanks again for digging into this. Curiosity satisfied.

What a case of "We've always done it this way" and one person challenging the assumption to find the reason why.

I'm also curious why View() is seemingly the only function I've run into that requires a capital letter as the first character.

There are quite a few capitalised functions, including some commonly used in functional programming, some statistical tests, and all the Sys. functions. Here's a (not exhaustive) list of some that might get used semi-regularly:

AIC, BIC, C, Find, Filter, HoltWinters, I, a bunch starting with Kalman, Map, Negate, Position, Reduce, Sys.Date, Sys.time, Sys.info etc, Vectorize, X11.
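A quick illustration of the functional-programming ones from that list. All of these are in base R, so nothing needs to be loaded:

```r
# Base R's capitalised functional-programming helpers.
evens     <- Filter(function(x) x %% 2 == 0, 1:10)   # keep elements where the predicate is TRUE
sums      <- Map(`+`, 1:3, 4:6)                      # apply element-wise over several vectors; returns a list
total     <- Reduce(`+`, 1:4)                        # left fold: ((1 + 2) + 3) + 4
first_big <- Find(function(x) x > 3, c(1, 5, 2, 7))      # first element matching the predicate
where_big <- Position(function(x) x > 3, c(1, 5, 2, 7))  # index of that first match
is_odd    <- Negate(function(x) x %% 2 == 0)         # logical complement of a predicate

evens      # 2 4 6 8 10
total      # 10
first_big  # 5
where_big  # 2
is_odd(3)  # TRUE
```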

@dylanjm there are a few, and seemingly without (much) rhyme or reason:

grep('^[A-Z]', ls(envir = as.environment('package:base')), value = TRUE)
#  [1] "Arg"                     "Conj"                    "Cstack_info"             "Encoding"                "Encoding<-"              "F"                      
#  [7] "Filter"                  "Find"                    "I"                       "ISOdate"                 "ISOdatetime"             "Im"                     
# [13] "LETTERS"                 "La.svd"                  "La_library"              "La_version"              "Map"                     "Math.Date"              
# [19] "Math.POSIXt"             "Math.data.frame"         "Math.difftime"           "Math.factor"             "Mod"                     "NCOL"                   
# [25] "NROW"                    "Negate"                  "NextMethod"              "OlsonNames"              "Ops.Date"                "Ops.POSIXt"             
# [31] "Ops.data.frame"          "Ops.difftime"            "Ops.factor"              "Ops.numeric_version"     "Ops.ordered"             "Position"               
# [37] "R.Version"               "R.home"                  "R.version"               "R.version.string"        "RNGkind"                 "RNGversion"             
# [43] "R_system_version"        "Re"                      "Recall"                  "Reduce"                  "Summary.Date"            "Summary.POSIXct"        
# [49] "Summary.POSIXlt"         "Summary.data.frame"      "Summary.difftime"        "Summary.factor"          "Summary.numeric_version" "Summary.ordered"        
# [55] "Sys.Date"                "Sys.chmod"               "Sys.getenv"              "Sys.getlocale"           "Sys.getpid"              "Sys.glob"               
# [61] "Sys.info"                "Sys.localeconv"          "Sys.readlink"            "Sys.setFileTime"         "Sys.setenv"              "Sys.setlocale"          
# [67] "Sys.sleep"               "Sys.time"                "Sys.timezone"            "Sys.umask"               "Sys.unsetenv"            "Sys.which"              
# [73] "T"                       "UseMethod"               "Vectorize"              

(with more in methods, utils, stats...)
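The same `grep()` can be pointed at those other packages. This is a sketch: `capitalised()` is a hypothetical helper, and it assumes stats and utils are attached, as they are in a default session. The exact lists depend on your R version, so no output is shown; among other things it turns up `View()`, which lives in utils.

```r
# Hypothetical helper: capitalised object names in an attached package.
capitalised <- function(pkg) {
  grep("^[A-Z]", ls(envir = as.environment(pkg)), value = TRUE)
}

stats_caps <- capitalised("package:stats")  # includes AIC, BIC, HoltWinters, ...
utils_caps <- capitalised("package:utils")  # includes View, ...

length(stats_caps)
head(utils_caps)
```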

Not from the US and not familiar with the notation, but I sure am familiar with the Van Halen M&Ms thing and the reasons behind it! :smile:

I forget that uppercase versions of nrow() and ncol() exist. I assume this is also relatively arbitrary/historical? From the documentation:

nrow and ncol return the number of rows or columns present in x. NCOL and NROW do the same treating a vector as 1-column matrix

NROW will work on objects where nrow does not, e.g., on lists:

NROW = function(x) {
  if (length(d <- dim(x))) d[1L] else length(x)
}
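A small comparison, using only base R, of where the lowercase and uppercase versions diverge: they agree on anything with a `dim` attribute and differ on plain vectors and lists.

```r
lst <- list(1, "a", TRUE)
vec <- 1:5
m   <- matrix(1:6, nrow = 2)

nrow(lst)   # NULL: a list has no dim attribute
NROW(lst)   # 3: falls back to length()
nrow(vec)   # NULL
NROW(vec)   # 5
NCOL(vec)   # 1: a vector is treated as a one-column matrix
c(nrow(m), NROW(m))  # 2 2: identical for objects that do have dimensions
```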

:thinking: It seems sub-optimal to use the same function name with variation in capitalisation -- NCOL() != ncol() -- this could lead to some mix-ups if people aren't paying attention, are beginners, etc.

No doubt. But my favorite is sample's surprise.
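Presumably this refers to the gotcha documented in `?sample`: if `x` is a single number of at least 1, `sample(x)` draws from `1:x` instead of returning `x`. A sketch of the trap, plus a `resample()` wrapper adapted from the examples on that help page:

```r
set.seed(1)  # fix the RNG so the draws are reproducible

x <- c(10, 20)
sample(x)            # a permutation of 10 and 20, as expected

x <- c(10)           # same code, but now x has length 1...
sample(x)            # ...so this is a permutation of 1:10, not just 10

# Safer wrapper: always samples from the elements of x itself.
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(c(10))      # always 10
```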