Are the seconds of a date object stored in the object?

adpatter · May 15, 2018, 4:20pm

With this:

date <- as.Date("2000-01-01", "%Y-%m-%d")

I thought something like attributes(date) would show the seconds. I would expect that the date object would store the date as seconds from an origin. How does the date object store its assigned date?

nutterb · May 15, 2018, 4:26pm

From ?Date

Dates are represented as the number of days since 1970-01-01

The object that counts the number of seconds since an origin is POSIXct.

From ?DateTimeClasses

Class "POSIXct" represents the (signed) number of seconds since the beginning of 1970

To get the number of seconds from a POSIXct object, you may call as.numeric

x <- Sys.time()
class(x)

as.numeric(x)
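To make the contrast concrete, here is a small sketch that uses a fixed date instead of Sys.time(), so the numbers are reproducible. The names d and p and the choice of the UTC time zone are assumptions made only for illustration:

```r
# The same calendar day stored two ways
d <- as.Date("2000-01-01")                 # Date: days since 1970-01-01
p <- as.POSIXct("2000-01-01", tz = "UTC")  # POSIXct: seconds since 1970-01-01 UTC

as.numeric(d)
#> [1] 10957
as.numeric(p)
#> [1] 946684800

# 86400 seconds per day links the two representations
as.numeric(p) / 86400 == as.numeric(d)
#> [1] TRUE
```

So a Date never holds seconds at all; only the date-time class does.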

adpatter · May 15, 2018, 4:30pm

as.numeric(date)
[1] 11342

Is there a way to access the property in the date object that stores the days since 1970-01-01, without converting the object to numeric?

prosoitos · May 15, 2018, 5:42pm

Side note:

My output is different from yours. Are you running it on something different than what you gave us as an example?

date <- as.Date("2000-01-01", "%Y-%m-%d")

as.numeric(date)
#> [1] 10957

prosoitos · May 15, 2018, 6:13pm

as.numeric() does not convert your object in place; it returns a new numeric value and leaves the original untouched. To convert it, you would have to reassign the output to your object, for instance with:

date <- as.numeric(date) or

date <- date %>% as.numeric() or

library(magrittr)

date %<>% as.numeric()

or with a mutate() or map() function.

After running as.numeric() on it, your date is unchanged (and so still in the same class as it was before):

date <- as.Date("2000-01-01", "%Y-%m-%d")

str(date)
#>  Date[1:1], format: "2000-01-01"

as.numeric(date)
#> [1] 10957

str(date)
#>  Date[1:1], format: "2000-01-01"

jcblum · May 15, 2018, 6:19pm

To get really technical about it, date <- as.numeric(date) reassigns the name "date" to the new numeric object created by as.numeric(). For lots more on how this works in R, see chapter 3 of Advanced R, 2nd ed.

# devtools::install_github("r-lib/lobstr")
date <- as.Date("2000-01-01", "%Y-%m-%d")

# date and as.numeric(date) are stored at different
# locations in memory
lobstr::obj_addr(date)
#> [1] "0x7f8ca43a90c8"
lobstr::obj_addr(as.numeric(date))
#> [1] "0x7f8ca46fe718"

date <- as.numeric(date)

# copy-on-modify occurred: 
# date now points to a new memory location
lobstr::obj_addr(date)
#> [1] "0x7f8ca4986d28"

prosoitos · May 15, 2018, 6:21pm

True. And I read that last week

jcblum · May 15, 2018, 6:25pm

Ack, sorry, that was meant as a general reply to the thread, not specifically directed "at" you! I always get bamboozled by how Discourse's reply buttons work (I should have hit the blue Reply underneath the last post, instead of the white Reply that's technically attached to the last post).

prosoitos · May 15, 2018, 6:26pm

Oh, no worries!! And you are perfectly right and I appreciate being corrected! This is the best way to learn!

prosoitos · May 15, 2018, 6:34pm

Actually, I am not sure you are correct here: I feel we might be in the exception of an object with a single binding where modify-in-place occurs.

See section 3.5 of Advanced R ("Names and values"). No??

Note: I can't test it because I cannot run tracemem() on my system: I am on Linux and I did not compile R with memory profiling enabled (see an issue I opened here for more info on this).

prosoitos · May 15, 2018, 6:43pm

Either way, for practical purposes and to answer @adpatter, running as.numeric() on date will not convert it, nor affect it in any way.

nutterb · May 15, 2018, 6:45pm

No, there isn't. But there doesn't need to be. Each R object has two aspects to it, its "type" and its "class".

Consider:

date <- Sys.Date()

class(date)
# [1] "Date"

typeof(date)
# [1] "double"

The class determines how the object behaves when passed to generic methods, like print. Because date has class Date, print knows to treat it differently and display the text representation of the date.

But the storage mode, or type, of the date object is still double which is the same thing as numeric.

In other words, the Date class is just a mask over a numeric value to tell R that the object behaves a certain way when passed to generic functions.

So there is no way to access the numeric property of the number of days because the object is the numeric number of days. as.numeric just strips off the Date class. You could also use unclass(date) to get the same effect.
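A minimal sketch of that idea, using the fixed date from earlier in the thread so the output is reproducible. It only uses base R, and the round trip through structure() is added here for illustration:

```r
date <- as.Date("2000-01-01")

# The class attribute is the only thing layered on top of the number
attributes(date)
#> $class
#> [1] "Date"

# Stripping the class reveals the underlying double: days since 1970-01-01
unclass(date)
#> [1] 10957

# Going the other way: attach the class to a bare number
structure(10957, class = "Date")
#> [1] "2000-01-01"
```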

jcblum · May 15, 2018, 6:49pm

You can see in the example I posted that date has not been modified in place (the new date is at 0x7f8ca4986d28, while the old one was at 0x7f8ca43a90c8). Here's the tracemem output (new session, so new memory locations):

date <- as.Date("2000-01-01", "%Y-%m-%d")
cat(tracemem(date), "\n")
#> <0x7fcd53a17408>
date <- as.numeric(date)
#> tracemem[0x7fcd53a17408 -> 0x7fcd53ad5798]:

I think the reason is this bit from Ch 3.5:

It’s challenging to predict exactly when R applies this optimisation because of two complications:

When it comes to bindings, R can currently only count 0, 1, and many. That means if an object has two bindings, and one goes away, the reference count does not get decremented (one less than many is still many).

Whenever you call any regular function, it will make a reference to the object. The only exception are specially written C functions, which occur mostly in the base package.

(emphasis mine)

prosoitos · May 15, 2018, 6:50pm

I was actually just trying to sort that out because your lobstr outputs were directly contradicting me. I think you are right about the cause for this.

So basically, this was one of the exceptions of the exception

nwerth · May 15, 2018, 6:53pm

Subtraction will get you the difference between any two dates.

Sys.Date() - as.Date("1970-01-01")
# Time difference of 17666 days

If you want to know the difference from 1970-01-01 because that's how dates are implemented, then you're stuck with as.numeric().
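As a quick check with a fixed date (the names origin, d and gap are just for this sketch), the subtraction and as.numeric() agree, and the difftime result can itself be turned into a plain number:

```r
origin <- as.Date("1970-01-01")
d <- as.Date("2000-01-01")

# Subtracting two Dates gives a difftime object
gap <- d - origin
gap
#> Time difference of 10957 days

# Both routes yield the same day count
as.numeric(gap, units = "days")
#> [1] 10957
as.numeric(d)
#> [1] 10957
```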

Also, it's good to remember there are two painful things shared by all (relevant) programming languages: string encoding and dates/times. Mostly because of locale information (time zones, daylight savings, formatting, etc.). So try to keep dates in Date objects as much as possible, and let R handle the mind-wracking minutiae.

prosoitos · May 15, 2018, 7:21pm

Here is a way to confirm your point:

date <- as.Date("2000-01-01", "%Y-%m-%d")

date
#> [1] "2000-01-01"

date[[1]] <- 10957

date
#> [1] "2000-01-01"

date[[1]] <- 10958

date
#> [1] "2000-01-02"

prosoitos · May 15, 2018, 7:43pm

I am again doubting that you are correct:

The addresses have changed indeed (as given by lobstr::obj_addr()). But I don't think that the addresses and the object labels represent the same thing. I was confused about whether they did or not and opened this issue actually because I thought the chapter was very unclear on this.

I think our conversation today has allowed me to test that they represent different things:

Let's run Hadley's example of section 3.5:

The labels remain the same according to him. But the addresses do change:

v <- 1:3

lobstr::obj_addr(v)
#> [1] "0x55b7c4cacbe0"

v[[3]] <- 4L

lobstr::obj_addr(v)
#> [1] "0x55b7c3d5b748"

I guess this is why he suggests using tracemem() to see whether objects have been copied or changed in place, rather than lobstr::obj_addr().

I am no expert with tracemem() (I have actually never used it myself since I would have to recompile R to be able to use it!). But this might actually mean that the object was changed in place. Compare this with the tracemem() results he is presenting in the chapter... but I could be totally wrong (yet again!).

This confirms that what labels are is not very clear in the chapter.

jcblum · May 15, 2018, 8:06pm

Now I think you may be right! I think I have been confused about the same thing you opened an issue about, without realizing it. I actually had to delete a bunch of stuff from the end of the tracemem output that gets added on due to running the code with reprex (apparently reprex isn't tracemem-friendly), and I confess that in my haste I didn't really look at what was left!

(This is also the problem with posting about complicated subjects in between doing several other things... You'd think I'd know better by now!)

(I also feel like this conversation has wandered really far from the original topic, and apologize to everybody else for that)

alistaire · May 15, 2018, 8:23pm

Object labels are only a thing in the diagrams in that chapter, not in R. R objects have names, which are bindings to locations in memory (the physical stuff, though still slightly indirectly) identified by addresses, which is what lobstr::obj_addr and tracemem are showing.

Thus, @jcblum is correct: the addresses are changing even though the name is not:

x <- Sys.Date()
tracemem(x)
#> [1] "<0x7fd63243fc98>"
x <- as.double(x)
#> tracemem[0x7fd63243fc98 -> 0x7fd6362378d8]
untracemem(x)

That does not mean it's necessarily a new object, just that R had to rearrange underneath to fit its new structure. From an R perspective, whether x gets overwritten or changed above is a bit unclear (and doesn't really matter). From a hardware perspective, it is a new object, presumably because your computer needs to reallocate the memory previously claimed by the class attribute.

Even when we would talk about an object in R definitely being changed, not overwritten, the R story may still not correspond to the memory story, e.g.

y <- 1:3
tracemem(y)
#> [1] "<0x7fd632e33d88>"
names(y) <- letters[1:3]
#> tracemem[0x7fd632e33d88 -> 0x7fd633507d48]
untracemem(y)

In this case, your computer needs to allocate new space for the names attribute—the previous space in memory was too small! Even though the R object is still the same, in your computer's memory, it is now in a new location.

Ultimately, the awesome part about R is that we don't have to care about memory and pointers and such most of the time. There's a reason that coding moved to higher-level languages: Managing memory yourself is a pain and error-prone. You can still do it in C if you need the control, but it is in no way necessary to become an awesome programmer/data scientist.

prosoitos · May 15, 2018, 8:54pm

Thank you for clarifying these enigmatic labels in the book. All of this is very useful.

No attack on jcblum obviously! this has been a very informative and interesting conversation. But I disagree with this:

She said that "copy-on-modify occurred" and disagreed that it could be a case of modify-in-place:

The way she "demonstrates" this (date has different addresses before and after) is flawed, as shown by running lobstr::obj_addr on Hadley's very example of modify-in-place, which also gives different addresses (see my post earlier).

Right. And that's exactly what jcblum and I were fighting (in a friendly way!!) about.

In short,

I never argued against this. lobstr::obj_addr is clear enough. I was just doubting that copy-on-modify had occurred as she also said.