There's no real motivation behind this question other than pure curiosity. Something like 5 or 10 seems like a more natural choice, so I'm just wondering if anyone knows why it's 6 rows.
To answer a question like this, I first started looking through the S books I have on hand (e.g. The New S Language). I don't see head() mentioned in the index, so that suggests it's a function introduced by R.
Since it's an R function, I can next search @winston's GitHub mirror of the R sources: https://github.com/wch/r-source, finding the source at https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/library/utils/R/head.R
This includes a comment which suggests we should ask Patrick Burns:
### placed in the public domain 2002
### Patrick Burns patrick@burns-stat.com
###
### Adapted for negative arguments by Vincent Goulet
### <vincent.goulet@act.ulaval.ca>, 2006
But it's worth checking to make sure it has always used 6. I click on history, and then find the first version: https://github.com/wch/r-source/commit/37271cdbdcd7e5d82c79bdb536ef305d93b644ad#diff-941bf47bf09f67538338535bd512d521 - so it has been six from the very beginning.
So next step, I'll email Patrick and see if he recollects...
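For anyone who wants to confirm the default locally, here is a minimal sketch. It assumes a stock R installation with the built-in mtcars data set (32 rows); the variable name first_six is only for illustration.

```r
# Default: head() returns the first 6 elements or rows.
first_six <- head(mtcars)
nrow(first_six)            # 6

# Any other length can be requested explicitly.
nrow(head(mtcars, n = 5))  # 5

# Negative n (the 2006 adaptation noted in the source comment) drops
# that many elements from the end instead.
head(1:10, n = -2)         # 1 2 3 4 5 6 7 8

# The default is recorded in the method's formal arguments.
formals(utils:::head.default)$n
```

The last line reads the default straight from the function definition, which is a quick way to check any argument default without opening the source.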
This is a fantastic first response, as it not only addresses the question asked, but shows your thought process to help answer future questions before they're asked. Thanks for the genuine response, and I'm looking forward to hearing the response from Patrick.
Just wait until you hear the endgame! I'll leave it to @hadley to recount— it's a ! (Then, I'll offer my very-important, intellectual analogy)!
From Pat (via email):
I came upon 'head' and 'tail' at one of my clients. That implementation had n = 5. I didn't think there would ever be an issue regarding ownership of the code, but I changed to 6 just to help if there were a conflict.
@Bryan, my super important analogy:
n = 6 : R :: brown M&Ms : Van Halen
For those of you not from the U.S. and/or not familiar with the weird standardized testing analogy notation: a colon (:) means "is to" and a double colon (::) means "as".
I'm mildly disappointed that the answer didn't come down to something about a six-fingered man, maybe one that killed someone's father.
Still, interesting reason, and very informative walk-through of the process!
As you wish…
Thanks again for digging into this. Curiosity satisfied.
What a case of "We've always done it this way" and one person challenging the assumption to find the reason why.
I'm also curious as to why View() is seemingly the only function I've run into that requires a capital letter as the first character.
There are quite a few capitalised functions, including some pretty commonly used in functional programming, some statistical tests, and all the Sys. functions. Here's a (not exhaustive) list of some that might get used semi-regularly:
AIC, BIC, C, Find, Filter, HoltWinters, I, a bunch starting with Kalman, Map, Negate, Position, Reduce, Sys.Date, Sys.time, Sys.info etc, Vectorize, X11.
@dylanjm there are a few, and seemingly without (much) rhyme or reason:
grep('^[A-Z]', ls(envir = as.environment('package:base')), value = TRUE)
# [1] "Arg" "Conj" "Cstack_info" "Encoding" "Encoding<-" "F"
# [7] "Filter" "Find" "I" "ISOdate" "ISOdatetime" "Im"
# [13] "LETTERS" "La.svd" "La_library" "La_version" "Map" "Math.Date"
# [19] "Math.POSIXt" "Math.data.frame" "Math.difftime" "Math.factor" "Mod" "NCOL"
# [25] "NROW" "Negate" "NextMethod" "OlsonNames" "Ops.Date" "Ops.POSIXt"
# [31] "Ops.data.frame" "Ops.difftime" "Ops.factor" "Ops.numeric_version" "Ops.ordered" "Position"
# [37] "R.Version" "R.home" "R.version" "R.version.string" "RNGkind" "RNGversion"
# [43] "R_system_version" "Re" "Recall" "Reduce" "Summary.Date" "Summary.POSIXct"
# [49] "Summary.POSIXlt" "Summary.data.frame" "Summary.difftime" "Summary.factor" "Summary.numeric_version" "Summary.ordered"
# [55] "Sys.Date" "Sys.chmod" "Sys.getenv" "Sys.getlocale" "Sys.getpid" "Sys.glob"
# [61] "Sys.info" "Sys.localeconv" "Sys.readlink" "Sys.setFileTime" "Sys.setenv" "Sys.setlocale"
# [67] "Sys.sleep" "Sys.time" "Sys.timezone" "Sys.umask" "Sys.unsetenv" "Sys.which"
# [73] "T" "UseMethod" "Vectorize"
(with more in methods, utils, stats...)
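The same search can be repeated across several packages at once. This is only a sketch: the package list and the variable names pkgs and capitalised are illustrative, and it uses getNamespaceExports() rather than ls() so that it does not depend on which packages happen to be attached.

```r
pkgs <- c("base", "utils", "stats", "methods")

# For each package, keep the exported names that start with a capital letter.
# sapply(..., simplify = FALSE) returns a list named after the packages.
capitalised <- sapply(
  pkgs,
  function(pkg) grep("^[A-Z]", getNamespaceExports(pkg), value = TRUE),
  simplify = FALSE
)

lengths(capitalised)            # counts differ between R versions
"View" %in% capitalised$utils   # TRUE
"AIC" %in% capitalised$stats    # TRUE
```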
Not from the US and not familiar with the notation, but I sure am familiar with the Van Halen M&Ms thing and the reasons behind it!
I forget that uppercase versions of nrow() and ncol() exist. I assume this is also relatively arbitrary/historical? From the documentation:
nrow and ncol return the number of rows or columns present in x. NCOL and NROW do the same treating a vector as 1-column matrix
NROW will work on objects where nrow does not, e.g., on lists:
NROW = function(x) {
  if (length(d <- dim(x))) d[1L] else length(x)
}
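A short sketch of the difference in practice; the objects v, l and m are made up for illustration.

```r
v <- 1:3
nrow(v)   # NULL: a plain vector has no dim attribute
NROW(v)   # 3: treated as a 1-column matrix
NCOL(v)   # 1

l <- list(1, "a", TRUE)
nrow(l)   # NULL
NROW(l)   # 3

# For objects that do have dimensions, the two agree.
m <- matrix(1:6, nrow = 2)
nrow(m) == NROW(m)   # TRUE
```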
It seems sub-optimal to use the same function name with variation in capitalisation -- NCOL() != ncol() -- this could lead to some mix-ups if people aren't paying attention, are beginners, etc.
No doubt. But my favorite is sample's surprise.
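The post does not spell the surprise out. Presumably it is the behaviour described in ?sample, where a single number x >= 1 is treated as 1:x rather than as a one-element vector. A minimal sketch, using the resample() helper suggested in the examples of that help page (it is not part of base R):

```r
set.seed(1)                 # only so the run is repeatable

x <- c(4, 7, 10)
sort(sample(x))             # 4 7 10: a permutation of x, as expected

y <- 10                     # a vector that happens to have length 1
length(sample(y))           # 10: sample() drew from 1:10, not from y

# Helper from the examples in ?sample; always samples from x itself.
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(y)                 # 10
```

This matters mostly when the vector being sampled is computed, for example the result of a filter that sometimes leaves a single value.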