I thought something like attributes(date) would show the seconds. I would expect that the date object would store the date as seconds from an origin. How does the date object store its assigned date?
To get really technical about it, date <- as.numeric(date) reassigns the name "date" to the new numeric object created by as.numeric(). For lots more on how this works in R, see chapter 3 of Advanced R, 2nd ed.
# devtools::install_github("r-lib/lobstr")
date <- as.Date("2000-01-01", "%Y-%m-%d")
# date and as.numeric(date) are stored at different
# locations in memory
lobstr::obj_addr(date)
#> [1] "0x7f8ca43a90c8"
lobstr::obj_addr(as.numeric(date))
#> [1] "0x7f8ca46fe718"
date <- as.numeric(date)
# copy-on-modify occurred:
# date now points to a new memory location
lobstr::obj_addr(date)
#> [1] "0x7f8ca4986d28"
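The same rebinding can be illustrated in base R, without lobstr. This is only a sketch: `original` is a hypothetical second name, introduced here just to keep the Date object reachable after `date` is rebound. It shows that the old object is left untouched, not where anything lives in memory.

```r
date <- as.Date("2000-01-01", "%Y-%m-%d")
original <- date   # hypothetical second name bound to the same Date object

# as.numeric() returns a new object; `<-` then rebinds the name `date` to it
date <- as.numeric(date)

class(date)      # "numeric"
class(original)  # "Date": the original object was not altered
```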
Ack, sorry, that was meant as a general reply to the thread, not specifically directed "at" you! I always get bamboozled by how Discourse's reply buttons work (I should have hit the blue Reply underneath the last post, instead of the white Reply that's technically attached to the last post).
Note: I can't test it because I cannot run tracemem() on my system: I am on Linux and did not compile R with memory profiling enabled (see an issue I opened here for more info on this).
The class determines how the object behaves when passed to generic methods, like print. Because date has class Date, print knows to treat it differently and display the text representation of the date.
But the storage mode, or type, of the date object is still double, which is the same thing as numeric.
In other words, the Date class is just a mask over a numeric value to tell R that the object behaves a certain way when passed to generic functions.
So there is no separate numeric property to access, because the object itself is the number of days (counted from 1970-01-01). as.numeric just strips off the Date class. You could also use unclass(date) to get the same effect.
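A short base R sketch of the class-versus-type distinction described above. It also touches on the original question about seconds: a Date counts days, while a date-time (POSIXct) counts seconds. The variable name and the choice of UTC are just for illustration.

```r
date <- as.Date("2000-01-01")

class(date)       # "Date": controls dispatch for print(), format(), ...
typeof(date)      # "double": the underlying storage type
attributes(date)  # only a class attribute; no hidden "seconds" field

unclass(date)     # 10957, i.e. days since 1970-01-01
as.numeric(date)  # 10957 as well

# Dates count days; date-times (POSIXct) count seconds since the same origin
as.numeric(as.POSIXct("2000-01-01", tz = "UTC"))  # 946684800
```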
You can see in the example I posted that date has not been modified in place (the new date is at 0x7f8ca4986d28, while the old one was at 0x7f8ca43a90c8). Here's tracemem (new session, so new memory locations):
date <- as.Date("2000-01-01", "%Y-%m-%d")
cat(tracemem(date), "\n")
#> <0x7fcd53a17408>
date <- as.numeric(date)
#> tracemem[0x7fcd53a17408 -> 0x7fcd53ad5798]:
It’s challenging to predict exactly when R applies this optimisation because of two complications:
When it comes to bindings, R can currently only count 0, 1, and many. That means if an object has two bindings, and one goes away, the reference count does not get decremented (one less than many is still many).
Whenever you call any regular function, it will make a reference to the object. The only exceptions are specially written C functions, which occur mostly in the base package.
Subtraction will get you the difference between any two dates.
Sys.Date() - as.Date("1970-01-01")
# Time difference of 17666 days
If you want to know the difference from 1970-01-01 because that's how dates are implemented, then you're stuck with as.numeric().
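As a sketch of both routes, using a fixed date instead of Sys.Date() so the result is reproducible (the names `origin`, `d`, and `diff_days` are only illustrative):

```r
origin <- as.Date("1970-01-01")
d <- as.Date("2000-01-01")

diff_days <- d - origin                 # a difftime object
diff_days                               # Time difference of 10957 days
as.numeric(diff_days, units = "days")   # 10957 as a plain number

# Going the other way: rebuild the Date from the day count
as.Date(10957, origin = "1970-01-01")   # "2000-01-01"
```

Subtraction keeps the units attached, which is usually safer; as.numeric() is the shortcut when only the raw day count is needed.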
Also, it's good to remember there are two painful things shared by all (relevant) programming languages: string encoding and dates/times. Mostly because of locale information (time zones, daylight savings, formatting, etc.). So try to keep dates in Date objects as much as possible, and let R handle the mind-wracking minutiae.
The addresses have changed indeed (as given by lobstr::obj_addr()). But I don't think that the addresses and the object labels represent the same thing. I was confused about whether they did or not and opened this issue actually because I thought the chapter was very unclear on this.
I think our conversation today has allowed me to test that they represent different things:
Let's run Hadley's example of section 3.5:
The labels remain the same according to him. But the addresses do change:
I guess this is why he suggests using tracemem() to see whether objects have been copied or changed-in-place, rather than lobstr::obj_addr().
I am no expert with tracemem() (I have actually never used it myself since I would have to recompile R to be able to use it!). But this might actually mean that the object was changed-in-place. Compare this with the tracemem() results he is presenting in the chapter... but I could be totally wrong (yet again!).
This confirms that what labels are is not very clear in the chapter.
Now I think you may be right! I think I have been confused about the same thing you opened an issue about, without realizing it. I actually had to delete a bunch of stuff from the end of the tracemem output that gets added on due to running the code with reprex (apparently reprex isn't tracemem-friendly), and I confess that in my haste I didn't really look at what was left!
(This is also the problem with posting about complicated subjects in between doing several other things... You'd think I'd know better by now!)
(I also feel like this conversation has wandered really far from the original topic, and apologize to everybody else for that)
Object labels are only a thing in the diagrams in that chapter, not in R. R objects have names, which are bindings to locations in memory (the physical stuff, though still slightly indirectly) identified by addresses, which is what lobstr::obj_addr and tracemem are showing.
Thus, @jcblum is correct: the addresses are changing even though the name is not:
x <- Sys.Date()
tracemem(x)
#> [1] "<0x7fd63243fc98>"
x <- as.double(x)
#> tracemem[0x7fd63243fc98 -> 0x7fd6362378d8]
untracemem(x)
That does not mean it's necessarily a new object, just that R had to rearrange underneath to fit its new structure. From an R perspective, whether x gets overwritten or changed above is a bit unclear (and doesn't really matter). From a hardware perspective, it is a new object, presumably because your computer needs to reallocate the memory previously claimed by the class attribute.
Even when we would talk about an object in R definitely being changed, not overwritten, the R story may still not correspond to the memory story, e.g.
In this case, your computer needs to allocate new space for the names attribute—the previous space in memory was too small! Even though the R object is still the same, in your computer's memory, it is now in a new location.
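The example this post refers to is not shown above, so the following is only a hypothetical stand-in: a plain vector that gains a names attribute. It prints the addresses before and after if lobstr happens to be installed, and makes no claim about what they will be. Results can differ between R versions and front ends; Advanced R notes, for instance, that RStudio's environment pane holds a reference to each top-level object.

```r
x <- c(1, 2, 3)
has_lobstr <- requireNamespace("lobstr", quietly = TRUE)

if (has_lobstr) before <- lobstr::obj_addr(x)

names(x) <- c("a", "b", "c")   # adds a names attribute to x

if (has_lobstr) {
  after <- lobstr::obj_addr(x)
  print(c(before = before, after = after))
  print(identical(before, after))  # did the address change on this system?
}

attributes(x)  # the values are unchanged; only a names attribute was added
```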
Ultimately, the awesome part about R is that we don't have to care about memory and pointers and such most of the time. There's a reason that coding moved to higher-level languages: Managing memory yourself is a pain and error-prone. You can still do it in C if you need the control, but it is in no way necessary to become an awesome programmer/data scientist.
Thank you for clarifying these enigmatic labels in the book. All of this is very useful.
No attack on jcblum, obviously! This has been a very informative and interesting conversation. But I disagree with this:
She said that "copy-on-modify occurred" and disagreed that it could be a case of modify-in-place:
The way she "demonstrates" this (date has different addresses before and after) is flawed, as shown by running lobstr::obj_addr on Hadley's own example of modify-in-place, which also gives different addresses (see my post earlier).
Right. And that's exactly what jcblum and I were fighting (in a friendly way!!) about.
In short,
I never argued against this. lobstr::obj_addr is clear enough. I was just doubting that copy-on-modify had occurred as she also said.