interface design - allowing special function behavior related to dplyr::mutate()

mrmallironmaker · May 7, 2020, 9:37pm

I use a lot of the tidyverse with three-dimensional motion data, specifically virtual reality tracking data. This data comes in the form position_x, position_y, and position_z, or quaternion_w, quaternion_x, quaternion_y, and quaternion_z, which I usually put into three or four separate columns. The trouble comes when I want to do vector operations on these columns. For example, say I want to plot the position of an object over time and draw a little line segment showing its rotation. With vanilla tidyverse, the code would be something like this:

df %>%
  mutate(
    # set the direction of the vector in the local space
    direction_local_x = 0,
    direction_local_y = 0,
    direction_local_z = 1,
    # here we have messy code that rotates a vector by a quaternion,
    # picture eight lines like this one
    direction_worldspace_x = direction_local_x*quaternion_w-direction_local_y*quaternion_z+direction_local_z*quaternion_y,
    # then add the direction vector to the position vector
    endpoint_x = position_x + direction_worldspace_x,
    endpoint_y = position_y + direction_worldspace_y,
    endpoint_z = position_z + direction_worldspace_z
  ) %>%
  ggplot(...) # do ggplot magic here
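The rotation that the comments above leave out can be written once as a plain R function and reused. The following is only a sketch, not the code from the original post: rotate_by_quaternion is a hypothetical name, and it assumes a unit quaternion in the Hamilton convention with components ordered (w, x, y, z). Tracking systems differ in handedness and component order, so check the convention of your data before relying on it.

```r
# Hypothetical helper (not from the original post): rotate the vector
# (vx, vy, vz) by the unit quaternion (qw, qx, qy, qz).
# Uses v' = v + w * t + u x t, where u = (qx, qy, qz) and t = 2 * (u x v).
# All arguments may be vectors, so this also works on whole columns.
rotate_by_quaternion <- function(vx, vy, vz, qw, qx, qy, qz) {
  # t = 2 * (u x v)
  tx <- 2 * (qy * vz - qz * vy)
  ty <- 2 * (qz * vx - qx * vz)
  tz <- 2 * (qx * vy - qy * vx)
  # v' = v + w * t + u x t
  list(
    x = vx + qw * tx + (qy * tz - qz * ty),
    y = vy + qw * ty + (qz * tx - qx * tz),
    z = vz + qw * tz + (qx * ty - qy * tx)
  )
}

# Example: a 90 degree rotation about the z axis applied to the x unit vector
rotated <- rotate_by_quaternion(1, 0, 0, cos(pi / 4), 0, 0, sin(pi / 4))
```

Inside mutate() each component would still need its own line (for example direction_worldspace_x = rotate_by_quaternion(...)$x), which is the repetition the rest of this post tries to avoid.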

Instead of that mess, I'd like to have the same exact behavior, but with much better syntactic sugar. It would be something like:

df %>%
  mutate(
    direction_local = make_3d(x=0, y=0, z=1),
    direction_worldspace = rotate_3d(direction_local, by=quaternion),
    endpoint = add_3d(position, direction_worldspace)
  ) %>%
  ggplot(...) # do ggplot magic here.

There are a few approaches in my mind, but I don't know which one to take.

Ideally, I'd like to use some premade functionality of mutate so that my function make_3d not only knows its own arguments (x, y, and z) but also that it's assigned the variable direction_local and it's in the mutate function coming from the tibble df. Then the function make_3d can create the columns direction_local_x, direction_local_y, and direction_local_z using rlang stuff, as long as make_3d has the original tibble and the name "direction_local". I don't know if that's possible, and that's the main question in this post.
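One piece of built-in mutate() behavior comes close to this. As far as I know, a function called inside mutate() has no supported way to learn the name it is being assigned to, but in dplyr 1.0.0 and later an unnamed expression that returns a data frame is unpacked into several columns. The sketch below assumes that behavior. This make_3d is a hypothetical variant that takes the name as an explicit prefix argument, not the function proposed above.

```r
library(dplyr)

# Hypothetical helper: returns a tibble with columns
# <prefix>_x, <prefix>_y, and <prefix>_z.
make_3d <- function(prefix, x, y, z) {
  out <- tibble(x = x, y = y, z = z)
  names(out) <- paste(prefix, names(out), sep = "_")
  out
}

# Made-up example data
df <- tibble(
  position_x = c(0, 1, 2),
  position_y = c(0, 0, 0),
  position_z = c(5, 5, 5)
)

# Because the call is unnamed, mutate() adds the three returned columns
out <- df %>%
  mutate(make_3d("endpoint", x = position_x, y = position_y, z = position_z + 1))
```

The name has to be passed as a string, so this is less elegant than the syntax above, but it needs no changes to mutate() itself.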

Another option is to override mutate to detect when the outermost function on the right-hand side is one of my special functions, and then call that function with the extra arguments I need. I'm uncomfortable overriding such a ubiquitous method, but that may be the right approach here.

A third option is to make my own function, e.g., mutate_3d, that works like option #2 but avoids overriding dplyr::mutate. Then there would be some inelegant switching back and forth between mutate and mutate_3d that feels unnecessary.

Which one would be your choice?

mrmallironmaker · May 24, 2020, 5:43pm

For those interested (maybe @jdblischak, thanks for the like), this was eventually handled by treating each 3D vector as a single column, rather than using nonstandard evaluation on column names. I found vctrs, a developer-focused package for building new R vector types, and used it to implement 3D vectors. Then I didn't need to modify mutate at all. If a group of columns naturally fits together, it may make more sense to store them as a single vector type.
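To illustrate the idea, here is a minimal sketch of a 3D vector type built on a vctrs record. It is not taken from the package mentioned in this thread: vec3, new_vec3, and this version of add_3d are hypothetical names, and a real implementation would likely add casting, arithmetic, and quaternion methods.

```r
library(vctrs)
library(dplyr)

# Low-level constructor: a record with one field per coordinate
new_vec3 <- function(x = double(), y = double(), z = double()) {
  new_rcrd(list(x = x, y = y, z = z), class = "vec3")
}

# User-facing helper: recycle inputs to a common size and cast to double
vec3 <- function(x, y, z) {
  args <- vec_recycle_common(x = x, y = y, z = z)
  args <- lapply(args, vec_cast, to = double())
  new_vec3(args$x, args$y, args$z)
}

# Record types need a format() method so they can be printed
format.vec3 <- function(x, ...) {
  sprintf("(%g, %g, %g)", field(x, "x"), field(x, "y"), field(x, "z"))
}

# Component-wise addition of two vec3 vectors
add_3d <- function(a, b) {
  vec3(
    field(a, "x") + field(b, "x"),
    field(a, "y") + field(b, "y"),
    field(a, "z") + field(b, "z")
  )
}

# Each 3D vector is a single column, so plain mutate() is enough
df <- tibble(position = vec3(1:3, 0, 0))
out <- df %>%
  mutate(endpoint = add_3d(position, vec3(0, 0, 1)))
```

Because the record has the same size as the number of rows, dplyr treats it like any other column, and no name-based trickery is needed.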

For reference, the package I'm developing is on GitHub.

jdblischak · May 24, 2020, 6:50pm

Very cool! Thanks for sharing your solution. I thought your question was an interesting one, but I felt I couldn't offer any advice since it was such a domain-specific question.

Since you solved your question, could you please choose your response as the solution?

FAQ: How do I mark a solution?

system · May 31, 2020, 6:50pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.