base::duplicated() unexpectedly returns a matrix

Dear Community,

When looking for duplicate rows in a matrix, base::duplicated() returns an n × 1 matrix if the input itself has a single column (i.e., is n × 1). Is this a bug or intended behavior?

Here is a minimal reproducible example:

duplicated(
  cbind(
    c(1, 2, 3, 1)
  )
)
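The difference is easiest to see by inspecting dim() of the results. A minimal check in base R, using the vector from the example plus a made-up second column for comparison:

```r
# One-column matrix: 4 rows, 1 column
one_col <- cbind(c(1, 2, 3, 1))
# Made-up two-column matrix: same rows, duplicated into a second column
two_col <- cbind(c(1, 2, 3, 1), c(1, 2, 3, 1))

res_one <- duplicated(one_col)   # rows are compared (default MARGIN = 1L)
res_two <- duplicated(two_col)

dim(res_one)  # c(4L, 1L): the result is a 4 x 1 logical matrix
dim(res_two)  # at most one dimension, so it prints like a plain vector

# The rows flagged as duplicates are the same in both cases
identical(as.vector(res_one), as.vector(res_two))
```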

Background: I am working on a function that detects duplicate records in a dataset. For reasons beyond the scope of this post, callers provide the data as atomic vectors via the dots (...) argument. At runtime, my function validates the input and then calls cbind() internally to construct a matrix from it.

Whenever cbind(...) evaluates to a matrix with at least two columns, duplicated() returns a logical vector, which, from my reading of ?duplicated, seems to be the intended behavior. However, when given a single vector, as in the example above, duplicated() returns an n × 1 matrix, which violates my function's design contract.

There are of course simple fixes (e.g., as.vector()), but I want to understand the problem before I fix it.
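For reference, here is a minimal sketch of the kind of wrapper described above, with the as.vector() fix applied. The name find_duplicates() and its validation checks are hypothetical; only cbind(), duplicated() and as.vector() are base R functions used as documented.

```r
# Hypothetical wrapper: callers pass one or more atomic vectors via `...`
find_duplicates <- function(...) {
  cols <- list(...)
  # Assumed validation: at least one input, all atomic, all of equal length
  stopifnot(length(cols) > 0L,
            all(vapply(cols, is.atomic, logical(1))),
            length(unique(lengths(cols))) == 1L)
  m <- cbind(...)
  # duplicated() compares rows (MARGIN = 1L); as.vector() drops any dim
  # attribute, so the result is a plain logical vector for any column count
  as.vector(duplicated(m))
}

find_duplicates(c(1, 2, 3, 1))
find_duplicates(c(1, 2, 3, 1), c(1, 2, 4, 1))
```

Flattening at the very end keeps the contract in one place, regardless of how many vectors the caller passes.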

Best,

Dag

Edit: I did not change the default value for duplicated's MARGIN argument, i.e., my function compares rows.
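As a quick illustration of the MARGIN argument: with the default MARGIN = 1L, duplicated() compares rows, while MARGIN = 2L compares columns. The matrix below is made up for the example.

```r
# Made-up 4 x 2 matrix whose two columns are identical
m <- cbind(c(1, 2, 3, 1), c(1, 2, 3, 1))

# Default MARGIN = 1L: one flag per row; row 4 repeats row 1
rows_dup <- as.vector(duplicated(m))
# MARGIN = 2L: one flag per column; column 2 repeats column 1
cols_dup <- as.vector(duplicated(m, MARGIN = 2L))
```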

I do not know whether this is intentional, but in your case you could make the behaviour consistent by replacing the relevant function, duplicated.array(), with the following (the commented lines are the original code; the line below them is the replacement):

# Modified copy of base::duplicated.array()
duplicated.dag <- function(x, incomparables = FALSE, MARGIN = 1L,
                           fromLast = FALSE, ...)
{
    if (!isFALSE(incomparables)) 
        .NotYetUsed("incomparables != FALSE")
    dx <- dim(x)
    ndim <- length(dx)
    if (any(MARGIN > ndim)) 
        stop(gettextf("MARGIN = %s is invalid for dim = %s", 
            paste(MARGIN, collapse = ","), paste(dx, collapse = ",")), 
            domain = NA)
    # temp <- if ((ndim > 1L) && (prod(dx[-MARGIN]) > 1L)) 
    #     asplit(x, MARGIN)
    # else x
    temp <- asplit(x, MARGIN)  # always split, even for a single column
    res <- duplicated.default(temp, fromLast = fromLast, ...)
    dim(res) <- dim(temp)
    dimnames(res) <- dimnames(temp)
    res
}

duplicated.dag(
  cbind(c(1, 2, 3, 1))
  )
#> [1] FALSE FALSE FALSE  TRUE

duplicated.dag(
  cbind(c(1, 2, 3, 1),
        c(1, 2, 3, 1))
  )
#> [1] FALSE FALSE FALSE  TRUE

Created on 2020-06-21 by the reprex package (v0.3.0)
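Why the single-column case differs can be read off the commented-out lines above: when prod(dx[-MARGIN]) is 1 (one column and MARGIN = 1L), the original code skips asplit() and passes x itself to duplicated.default(), so dim(res) <- dim(temp) copies the n × 1 dimensions of the input back onto the result. The sketch below replays those steps by hand; it mirrors the quoted code rather than calling into base R's own implementation.

```r
x <- cbind(c(1, 2, 3, 1))
MARGIN <- 1L
dx <- dim(x)                      # c(4L, 1L)

# Condition from the quoted original: split only if each slice
# along MARGIN has more than one element
split_rows <- (length(dx) > 1L) && (prod(dx[-MARGIN]) > 1L)   # FALSE here

temp <- if (split_rows) asplit(x, MARGIN) else x   # temp is the matrix itself
res <- duplicated.default(temp)   # compares the 4 values, returns a plain vector
dim(res) <- dim(temp)             # restores c(4L, 1L): res becomes a matrix
```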

This is very helpful, @HanOostdijk! However, since I would rather not interfere with well-tested base R code, I'll just flatten my result.

Thank you very much!

Best,

Dag
