Note that modify-in-place may actually not have occurred. I am too ignorant about tracemem() to know, and I will look into this later today. But if that were the case, her demonstration would still be flawed, because having different addresses is not informative on this question (as shown with Hadley's example).
Modify-in-place means the memory address remains the same (in a memory context, anyway—it can also refer to changing an R object without assigning). Since it is changing, this is copy-on-modify. (The "copy" part refers to copying the object to a new place in memory.)
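Whatever the addresses do, the visible guarantee of copy-on-modify is that a second binding never sees the change. A minimal base R sketch (the names x and y are only illustrative):

```r
x <- c(1L, 2L, 3L)
y <- x          # x and y are now bound to the same vector
x[[1]] <- 10L   # y still refers to the old values, so R copies before modifying

x               # 10 2 3
y               # 1 2 3
```

This only shows the semantics. It says nothing about whether a copy happens when there is a single binding, which is the harder question in this thread.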
This would be a contradiction, and I can't find the example you mention.
Copy-pasted from section 3.5 Modify-in-place:
Now have a look at this reprex:
v <- 1:3
lobstr::obj_addr(v)
#> [1] "0x55b7c4cacbe0"
v[] <- 4L
lobstr::obj_addr(v)
#> [1] "0x55b7c3d5b748"
Yeah, predicting modify-in-place is hard (it's pretty rare); that example may be flawed given current R internals.
A more consistent example is in data.table, which works really hard to make sure it modifies in-place as much as possible (so as to be fast). Note the difference between base R and data.table:
df <- data.frame(x = 1:3)
lobstr::obj_addr(df)
#> [1] "0x7f8652344bf0"
lobstr::obj_addr(df$x)
#> [1] "0x7f8653d834b8"
df$x[] <- 4L
lobstr::obj_addr(df)
#> [1] "0x7f8653959880"
lobstr::obj_addr(df$x)
#> [1] "0x7f8652fae988"

library(data.table)
dt <- data.table(x = 1:3)
lobstr::obj_addr(dt)
#> [1] "0x7f8652596400"
lobstr::obj_addr(dt$x)
#> [1] "0x7f8652e380c8"
dt[3, x := 4L]
lobstr::obj_addr(dt)
#> [1] "0x7f8652596400"
lobstr::obj_addr(dt$x)
#> [1] "0x7f8652e380c8"
OK. This makes a lot more sense. My confusion came from Hadley's example. So do you think that there is a problem with the example itself or is the contradictory result only when running his example in my system?
What happens if you run my reprex on your system?
My whole conversation with jcblum on that subject was fed by that book chapter (which I think needs a bit of clarity around what the labels are). I was confused about what labels vs addresses were (hence the issue I opened on that some time ago). And running my reprex pushed me to the wrong conclusion since I am getting different addresses when a modify-in-place is supposed to happen.
Do you mean that the example might not work with the current R version for instance?
I would be curious to see what you get when running my reprex. If you get different addresses too, I'll add this to my issue or open a new one.
More generally, you can get the number of days between any two Date objects just by subtraction:
x = as.Date('2018-03-14')
# "2018-03-14"
x - as.Date('2017-10-04')
# Time difference of 161 days
as.numeric(x - as.Date('2017-10-04'))
# 161
That said, the lubridate package, which is part of the tidyverse, makes a lot of date-time arithmetic easier!
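If you need the gap in something other than days, base R can do that too. A small sketch, assuming the same two dates as above (the variable names are just for illustration):

```r
from <- as.Date("2017-10-04")
to   <- as.Date("2018-03-14")

gap <- to - from                             # a difftime object
units(gap)                                   # "days"
as.numeric(gap)                              # 161

wks <- difftime(to, from, units = "weeks")   # same gap, expressed in weeks
as.numeric(wks)                              # 23
```

Subtraction picks the units for you, while difftime() lets you state them explicitly.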
I tested your base vs data.table comparison and get results totally consistent with yours. So I suspect that Hadley's example doesn't work anymore due to the new R release or other changes in R, not because of my system. Out of curiosity, I added tibble to the mix and it is similar to base R for that.
df <- data.frame(x = 1:3)
lobstr::obj_addr(df)
#> [1] "0x5649b30793a0"
lobstr::obj_addr(df$x)
#> [1] "0x5649b3214330"
df$x[] <- 4L
lobstr::obj_addr(df)
#> [1] "0x5649b4edb8c0"
lobstr::obj_addr(df$x)
#> [1] "0x5649b56127d8"

library(data.table)
dt <- data.table(x = 1:3)
lobstr::obj_addr(dt)
#> [1] "0x5649b65f8420"
lobstr::obj_addr(dt$x)
#> [1] "0x5649b403eb48"
dt[3, x := 4L]
lobstr::obj_addr(dt)
#> [1] "0x5649b65f8420"
lobstr::obj_addr(dt$x)
#> [1] "0x5649b403eb48"

library(tibble)
tb <- tibble(x = 1:3)
lobstr::obj_addr(tb)
#> [1] "0x5649b7cfe008"
lobstr::obj_addr(tb$x)
#> [1] "0x5649b7ca6108"
tb$x[] <- 4L
lobstr::obj_addr(tb)
#> [1] "0x5649b7e60828"
lobstr::obj_addr(tb$x)
#> [1] "0x5649b7e28e38"
This example was extremely helpful. Thank you @alistaire!
I posted an issue about Hadley's example and his reply brought back my initial understanding that @alistaire then flipped around. I asked him if he could post something here to clarify this and I hope he will find the time to do so as I am feeling very confused about all this right now.
Here is his reply:
Because obj_addr() makes a reference to v
Which would explain why different addresses are no proof that a modify-in-place hasn't happened (which was what I had initially understood and was arguing).
But then, what about the data.table example that alistaire posted???
I don't know how to explain the data.table example, but Hadley's reply seems to me to confirm what I was arguing that:
is not valid. And that what I had understood here:
might actually be correct.
Unless I am missing something else? (very possible!)
tracemem shows the same thing:
v <- 1:3
tracemem(v)
#> [1] "<0x7ff3b93c8518>"
v[] <- 4L
#> tracemem[0x7ff3b93c8518 -> 0x7ff3bcdb7ec8]:
As far as I can tell, it's not modifying in-place, though I can't say why. I thought it may be an ALTREP thing, but I get the same thing with non-numeric vectors and on an old version of R. Maybe the tracemem call itself counts as a reference? ...but if so, that would stop you from ever witnessing in-place modification.
I believe data.table takes over some of the memory allocation from base R (if you call dput on a data.table, you can see it tracks its pointer), which may explain its behavior.
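Setting the internals aside, the by-reference behaviour is easy to see without comparing addresses at all. A minimal sketch, assuming the data.table package is installed (alias and snapshot are just illustrative names):

```r
library(data.table)

dt <- data.table(x = c(1L, 2L, 3L))
alias    <- dt         # a second name for the same data.table, no copy made
snapshot <- copy(dt)   # an explicit, independent copy

dt[3, x := 4L]         # := updates column x by reference

alias$x                # 1 2 4: the alias sees the update
snapshot$x             # 1 2 3: the copy does not
```

This is the opposite of what a base data.frame would do with a second binding, and it is why data.table provides copy() for the cases where you do want an independent object.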
I don't know. I am confused. What has confused me all along with this chapter is what those labels are (hence my first issue on that). I thought they were something in R. Thank you for explaining to me that it is just a "name id" given by Hadley to objects for the purpose of illustration. But then I still don't understand how he determines that labels remain the same in the 3.5 section example (the v example). He tells us that they do and gives us some diagrams, but without explaining how he gets that information.
That's why I hope he will jump in this thread to clarify all this...
Hadley seems to disagree, both in the book and in his reply to my issue a few hours ago, which rules out the possibility that it might be something that changed in R.
I think you've missed the point of that chapter: R objects emphatically do not have names, but do have addresses. Unfortunately that misunderstanding means that most of your explanation of what's happening isn't quite right (or at least you're using very non-standard terminology).
The problem with using obj_addr() to assess whether or not modify-in-place is happening is that it itself takes a reference to x, so the modify-in-place optimisation no longer occurs.
You are probably confused here because you're running inside RStudio where the environment pane takes a reference. You should always run this sort of code in the console.
Arg, yeah, in retrospect I was too free in talking about bindings as they appear in practice, not as they exist. I'd edit, but that would risk making the whole thread incomprehensible.
Ah! This makes a lot of sense and fixes [most of] the problem! The remaining confusion is that ALTREP means if the vector is a sequence, it still has to be copied in R 3.5.0:
> v <- 1:3
> tracemem(v)
[1] "<0x7fe184cff3b8>"
> v[] <- 4L
tracemem[0x7fe184cff3b8 -> 0x7fe18470f588]:
> untracemem(v)
>
> x <- c(1L, 2L, 3L)
> tracemem(x)
[1] "<0x7fe187376208>"
> x[] <- 4L
> untracemem(x)
An old copy of R 3.4.2 on another computer shows the pre-ALTREP behavior:
> v <- 1:3
> tracemem(v)
[1] "<00000000130C24D8>"
> v[] <- 4L
> untracemem(v)
@hadley, could you explain @alistaire's data.table example then? Without knowing why (I didn't know that obj_addr() was taking a reference), I had mostly understood things thanks to your chapter. But it is this example that made me doubt it all and open the issue about not being able to replicate your modify-in-place example:
library(data.table)
dt <- data.table(x = 1:3)
lobstr::obj_addr(dt)
#> [1] "0x7f8652596400"
lobstr::obj_addr(dt$x)
#> [1] "0x7f8652e380c8"
dt[3, x := 4L]
lobstr::obj_addr(dt)
#> [1] "0x7f8652596400"
lobstr::obj_addr(dt$x)
#> [1] "0x7f8652e380c8"
Edit: Full post (comparison between data.table and base R) here.
Suggestion since most people use RStudio these days: add this to section "3.3.1 tracemem()" of the book?
And maybe add this info in the chapter too?
@jcblum, would you mind running:
date <- as.Date("2000-01-01", "%Y-%m-%d")
cat(tracemem(date), "\n")
date <- as.numeric(date)
untracemem(date)
directly in R (not in RStudio) please? (again, since I cannot run tracemem() myself). So that we finally get the answer to our question! And thank you so much for getting this started. It turned out to be very informative!! (and sorry @adpatter, we totally hijacked your initial post with a side question...)
Thank you. This answers the question I was trying to get at.
It makes sense, as a "class" is in fact an attribute of the date object, which, as you explained, is how the correct print function is selected i.e., print.Date.
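For anyone who wants to see that for themselves, here is a small base R sketch (d and n are just illustrative names):

```r
d <- as.Date("2000-01-01")

attributes(d)        # a list with a single element: class = "Date"
unclass(d)           # 10957, the number of days since 1970-01-01
inherits(d, "Date")  # TRUE, which is why print() dispatches to print.Date

n <- as.numeric(d)   # same underlying number, class attribute dropped
attributes(n)        # NULL
```

So the Date and the plain number hold the same value, and only the class attribute changes how it is printed.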