Does dplyr::row_number() calculate the row number for each observation? If so, how?

On the tidyverse reference site, I saw two usages: mutate(mtcars, row_number() == 1L) and mtcars %>% filter(between(row_number(), 1, 10)). From these, it would be straightforward to conclude that row_number() returns the row number of each observation in the data frame.

However, the documentation emphasizes that the function is a window function, similar to sortperm in other languages, as in this example:

x <- c(5, 1, 3, 2, 2, NA)
row_number(x)

which yields: 5 1 4 2 3 NA

May I ask whether this function is intended to report the row number of each observation? If it is, what is the logic behind the function call?

Thanks!

You can check the code for a function like so:

> row_number
function (x) 
rank(x, ties.method = "first", na.last = "keep")
<environment: namespace:dplyr>
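To see what that definition does with the vector from the question, here is a minimal sketch using only base R. It assumes the printed definition above (newer dplyr releases may print something different), and the variable name r is just for illustration:

```r
# Base R only: call rank() with the same arguments as the printed definition.
x <- c(5, 1, 3, 2, 2, NA)

# ties.method = "first": tied values are ranked in order of appearance,
#                        so the two 2s get consecutive, distinct ranks.
# na.last = "keep":      an NA input stays NA in the result.
r <- rank(x, ties.method = "first", na.last = "keep")
print(r)
```

The result matches the 5 1 4 2 3 NA reported in the question: each element is the rank of the corresponding value of x, not its position in the vector.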

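So yes, it's a dplyr function, but it is simply a wrapper around the base function rank(). Called without an argument inside a dplyr verb, as in row_number() == 1L, it returns each row's position (within its group, if the data are grouped). Given a column, it ranks that column's values, which lets you filter on ranks, e.g.: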

mtcars %>% group_by(cyl) %>% filter(row_number(hp) == 1)

Which is equivalent to, but more robust than:

mtcars %>% group_by(cyl) %>% arrange(hp) %>% slice(1)
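If you want to check for yourself that the two pipelines pick the same rows, a rough sketch follows. The names by_rank and by_sort are made up for this example, and the two results are put in the same order before comparing because the pipelines can return their rows in different orders:

```r
library(dplyr)

# Approach 1: keep the row whose hp has rank 1 within each cyl group.
by_rank <- mtcars %>%
  group_by(cyl) %>%
  filter(row_number(hp) == 1) %>%
  ungroup()

# Approach 2: sort by hp, then take the first row of each cyl group.
by_sort <- mtcars %>%
  group_by(cyl) %>%
  arrange(hp) %>%
  slice(1) %>%
  ungroup()

# Put both results in the same order, then compare them.
same_rows <- all.equal(arrange(by_rank, cyl), arrange(by_sort, cyl))
print(same_rows)
```

Both results should contain one row per value of cyl, namely the car with the lowest hp in that group.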

If you look at the help, you can see that the row_number functions 'are provided mainly as a convenience when converting between R and SQL'
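Coming back to the original question, it may help to put the two call styles side by side. This is only a sketch: the tibble df and the column names position and rank_of_x are invented for illustration.

```r
library(dplyr)

# Toy data: the vector from the question, as a one-column tibble.
df <- tibble(x = c(5, 1, 3, 2, 2, NA))

out <- df %>%
  mutate(
    position  = row_number(),   # no argument: 1, 2, 3, ... down the rows
    rank_of_x = row_number(x)   # with a column: rank of each value of x
  )
print(out)
```

So row_number() with no argument does report the row number of each observation, while row_number(x) reports where each value of x falls in sorted order.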

Hope it helps,
Bw. Leon
