Hi,
some of you may remember the tidylog package. Right now, I'm working on improving the output for join operations such as left_join, inner_join, etc., and would welcome feedback on what the package should report.
This is a first draft, loosely modeled on what Stata reports for merges:
> tidylog::left_join(flights[1:10000, ], airlines[1:10, ], by = "carrier")
#> left_join: added one column (name)
#>            rows only in x     2,783
#>            rows only in y (       0)
#>            matched rows       7,217
#>                            ========
#>            rows total        10,000
(Any time a number is printed in parentheses, it means that those rows are not included in the result.)
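To make the categories concrete, here is a minimal sketch of how such counts could be derived with plain dplyr verbs. It is not tidylog's actual implementation: the helper join_counts and the toy tables are hypothetical and made up for illustration only.

```r
library(dplyr)

# Toy data (hypothetical; not the flights/airlines tables used above)
x <- tibble(carrier = c("AA", "AA", "UA", "ZZ"), flight = 1:4)
y <- tibble(carrier = c("AA", "UA", "DL"),
            name = c("American", "United", "Delta"))

# Hypothetical helper, not part of tidylog: counts the three row categories
join_counts <- function(x, y, by) {
  c(
    only_in_x = nrow(anti_join(x, y, by = by)),  # x rows without a match in y
    only_in_y = nrow(anti_join(y, x, by = by)),  # y rows without a match in x
    matched   = nrow(inner_join(x, y, by = by))  # rows produced by matches
  )
}

counts <- join_counts(x, y, by = "carrier")
result <- left_join(x, y, by = "carrier")

# For a left join, "rows only in y" are dropped, so the total is
# only_in_x + matched
total <- counts[["only_in_x"]] + counts[["matched"]]
```

In this toy case, the row for "DL" is the one that would be printed in parentheses, because a left join does not keep it.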
Because joins are complicated and cover a lot of different use cases, I would welcome additional input on this. It's also possible to test the current implementation (which surely still has some bugs). See the GitHub issue here for more information: https://github.com/elbersb/tidylog/issues/25
Another interesting thing to report would be the number of rows that were duplicated, but I'm not sure yet how to approach this.
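One possible way to think about it, as a rough sketch only: count how many rows of y share each key, and treat an x row as "duplicated" when its key matches more than one row in y. The toy tables and the column name n_matches are assumptions for illustration, and other definitions of "duplicated" are just as plausible.

```r
library(dplyr)

# Toy data (hypothetical): "AA" appears twice in y
x <- tibble(carrier = c("AA", "UA", "ZZ"), flight = 1:3)
y <- tibble(carrier = c("AA", "AA", "UA"),
            name = c("American", "American Airlines", "United"))

# Number of rows in y for each key value
y_matches <- count(y, carrier, name = "n_matches")

# x rows whose key matches more than one row in y
x_dup <- x %>%
  left_join(y_matches, by = "carrier") %>%
  filter(!is.na(n_matches), n_matches > 1)

n_duplicated_x <- nrow(x_dup)                 # x rows that get duplicated
n_extra_rows   <- sum(x_dup$n_matches - 1)    # additional rows they create

# Sanity check against the actual join
result <- left_join(x, y, by = "carrier")
```

Under this definition, the result has nrow(x) + n_extra_rows rows, which might be a natural quantity to report.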
Ben