when a function modifies a dataframe; return dataframe or new vectors

kwbyron · December 16, 2020, 10:42pm

I'm looking for a discussion on some programming best practices. Suppose you have a function that performs an operation on a dataframe, say calculating velocity. What are the benefits of:

Returning the full dataframe, modified with the new column(s), vs.
Returning only the new column and relying on the user to assign it

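# Option 1: add the new column and return the whole dataframe
calcVelocityDF <- function(df){
    df$velocity <- df$speed * df$dist
    return(df)
}

# Option 2: return only the new column and let the caller assign it
calcVelocityVector <- function(df){
    return(df$speed * df$dist)
}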

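# Option 1: the function hands back the full dataframe
newDF <- calcVelocityDF(cars)
# Option 2: the function hands back a vector and the caller assigns it
newDF$velocity <- calcVelocityVector(cars)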

I'm using the cars data as a stand-in. For something this simple I would probably just pass the two vectors directly, so imagine the calculation is one that genuinely needs the full dataset.

It seems like returning a vector makes it clearer to the user what's being done; however, I've always enjoyed the security blanket that comes with working within a dataframe and being confident with the row integrity.
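
To make the row-integrity concern concrete, here is a minimal sketch using the built-in cars data. The snake_case function names are hypothetical stand-ins for the two options in the question, and the sorting step is an assumed example of how a caller might accidentally break alignment; it is not meant as a definitive argument for either style.

```r
# Hypothetical helpers mirroring the two options in the question.
# `cars` is R's built-in dataset with columns `speed` and `dist`.
calc_velocity_df <- function(df) {
  df$velocity <- df$speed * df$dist
  df
}

calc_velocity_vec <- function(df) {
  df$speed * df$dist
}

# When rows are untouched between computing and assigning,
# both approaches produce the same dataframe.
via_df <- calc_velocity_df(cars)
via_vec <- cars
via_vec$velocity <- calc_velocity_vec(cars)
same_result <- identical(via_df, via_vec)

# Assumed failure mode: the caller reorders rows, then assigns a vector
# that was computed from the original order. R raises no error because
# the lengths match, but the values no longer belong to their rows.
shuffled <- cars[order(cars$dist, decreasing = TRUE), ]
shuffled$velocity <- calc_velocity_vec(cars)
still_aligned <- all(shuffled$velocity == shuffled$speed * shuffled$dist)

print(same_result)
print(still_aligned)
```

With the dataframe-returning version, the new column is attached inside the function, so this kind of mismatch cannot occur between the calculation and the assignment. With the vector-returning version, keeping rows aligned is the caller's responsibility.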

williaml · December 16, 2020, 10:48pm

It would depend on what you were trying to do, wouldn't it? I would assume that a dataframe would be better given that you can see what is going on.
