I'm looking for a discussion on some programming best practices. Suppose you have a function that performs an operation on a dataframe, say calculating velocity. What are the benefits of:
- Returning the full dataframe, modified with the new column(s), vs.
- Returning only the new column and relying on the user to assign it
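# Option 1: return the whole dataframe with the new column attached
calcVelocityDF <- function(df) {
  df$velocity <- df$speed * df$dist
  return(df)
}

# Option 2: return only the computed column; the caller assigns it
calcVelocityVector <- function(df) {
  return(df$speed * df$dist)
}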
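# Option 1 usage: the function hands back the full dataframe
newDF <- calcVelocityDF(cars)
# Option 2 usage: the caller assigns the returned vector
newDF$velocity <- calcVelocityVector(cars)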
I'm using the cars data with the understanding that, for something this simple, I would probably just pass vector1 and vector2. The idea is that the real calculation might be something that needs the full dataset.
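For reference, the two-vector version I have in mind would look roughly like the sketch below. The name calcVelocityFromVectors is hypothetical, and the length check is my own assumption about what a sensible guard would be.

```r
# Hypothetical helper: takes the two columns directly instead of the dataframe.
calcVelocityFromVectors <- function(speed, dist) {
  # Assumed guard: refuse inputs that cannot line up row by row
  stopifnot(length(speed) == length(dist))
  speed * dist
}

carsCopy <- cars  # built-in dataset with columns speed and dist
carsCopy$velocity <- calcVelocityFromVectors(carsCopy$speed, carsCopy$dist)
```

Here the caller is fully responsible for passing columns that come from the same rows.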
It seems like returning a vector makes it clearer to the user what's being done; however, I've always enjoyed the security blanket that comes with working within a dataframe and being confident in its row integrity.
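To make the row-integrity worry concrete, here is a minimal sketch. The function calcVelocitySorted is hypothetical: I'm assuming a calculation that happens to reorder rows internally, which my toy functions above don't do.

```r
# Hypothetical function that sorts the rows internally before computing.
calcVelocitySorted <- function(df) {
  df <- df[order(df$dist), ]  # internal reordering, invisible to the caller
  df$speed * df$dist          # values now follow the sorted row order
}

v <- calcVelocitySorted(cars)
expected <- cars$speed * cars$dist

# Same length as the original data, so assigning it back would not error
sameLength <- length(v) == nrow(cars)
# But the values no longer line up with the original rows
aligned <- identical(v, expected)
```

Because the vector has the right length, assigning it to a column of cars would succeed without complaint even though the values belong to different rows. Returning the dataframe would have kept each value attached to its own row.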