dplyr 1.0.0 - all_equal() lifecycle questioning. Alternative?

josiah · July 24, 2020, 12:22pm

From time to time I need to compare two tibbles and check whether they are identical. I most often use dplyr::all_equal() for this. When I need to see where they are not equal, I use compareDF::compare_df().

Today, as I was going through the new documentation for all_equal(), I noticed that its manual page states:

all_equal() allows you to compare data frames, optionally ignoring row and column names. It is questioning as of dplyr 1.0.0, because it seems to solve a problem that no longer seems that important.

Is there a discussion around why this problem no longer seems important? Is there an alternative solution that has been found to be more robust? Or does the statement mean that comparing two tibbles is not a priority for the dev team, and that the effort involved in solving it could be better directed elsewhere?

nirgrahamuk · July 24, 2020, 12:38pm

It may have something to do with this:

dplyr no longer provides a all.equal.tbl_df() method. It never should have done so in the first place because it owns neither the generic nor the class. It also provided a problematic implementation because, by default, it ignored the order of the rows and the columns which is usually important. This is likely to cause new test failures in downstream packages; but on the whole we believe those failures to either reflect unexpected behaviour or tests that need to be strengthened (#2751).
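
To illustrate the difference between an order-sensitive and an order-insensitive comparison, here is a minimal sketch in base R only, so it does not depend on any particular dplyr version. The helper name normalise() and the sample data are hypothetical, made up for this example. It is not how dplyr implements anything, just one way to get a comparison that ignores row and column order when that is really what you want.

```r
# Two data frames with the same rows and columns, but in different orders.
a <- data.frame(id = c(1, 2, 3), value = c("x", "y", "z"))
b <- data.frame(value = c("z", "x", "y"), id = c(3, 1, 2))

# Hypothetical helper: put columns and rows into a canonical order
# so that ordering differences no longer matter.
normalise <- function(df) {
  # Sort columns alphabetically by name.
  df <- df[, sort(names(df)), drop = FALSE]
  # Sort rows by every column, left to right.
  df <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
  # Drop the old row names so they do not count as a difference.
  rownames(df) <- NULL
  df
}

# Strict comparison: row and column order matter.
strict <- isTRUE(all.equal(a, b))

# Loose comparison: order is ignored after normalising both sides.
loose <- isTRUE(all.equal(normalise(a), normalise(b)))
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a character vector describing the differences, not FALSE, when the inputs differ.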

josiah · July 24, 2020, 12:49pm

That is sufficient for me! Thank you, @nirgrahamuk
