When a function modifies a dataframe: return the dataframe or new vectors?

I'm looking for a discussion of programming best practices. Suppose you have a function that performs an operation on a dataframe, say calculating velocity. What are the benefits of:

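  1. Returning the full dataframe, modified with the new column(s), vs.
  2. Returning only the new column and relying on the user to assign it

# Option 1: return the whole dataframe with the new column added
calcVelocityDF <- function(df){
    df$velocity <- df$speed * df$dist
    return(df)
}

# Option 2: return only the computed column as a vector
calcVelocityVector <- function(df){
    df$velocity <- df$speed * df$dist
    return(df$velocity)
}

# Usage: option 1 hands back a complete dataframe,
# option 2 leaves the assignment to the caller
newDF <- calcVelocityDF(cars)
newDF$velocity <- calcVelocityVector(cars)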

I'm using the cars data as a stand-in. For something this simple I would probably just pass the two vectors (vector1 and vector2) directly, so assume the real calculation is one that needs the full dataset.
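For reference, the "just pass the vectors" variant might look roughly like the sketch below. The name `calcVelocity` is made up for illustration; the caller can either assign the result themselves or let base R's `transform()` do the assignment.

```r
# Hypothetical vector-in, vector-out version of the calculation
calcVelocity <- function(speed, dist) {
    speed * dist
}

# Caller assigns the result to a new column
newDF <- cars
newDF$velocity <- calcVelocity(newDF$speed, newDF$dist)

# Same calculation, with transform() evaluating the column names
# inside the dataframe and adding the new column
newDF2 <- transform(cars, velocity = calcVelocity(speed, dist))
```

Both approaches leave `cars` itself untouched and produce a copy with a `velocity` column.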

It seems like returning a vector makes it clearer to the user what's being done; however, I've always enjoyed the security blanket that comes with working within a dataframe and being confident in its row integrity.
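To make the row-integrity worry concrete, here is a small sketch. It assumes a hypothetical function that reorders rows internally (sorting by `dist` here, purely for illustration) before computing the result. The returned vector has the right length, so assigning it back would succeed, but the values would no longer line up with the original rows.

```r
# Hypothetical function that sorts internally, then returns a vector
calcVelocitySortedVector <- function(df) {
    df <- df[order(df$dist), ]
    df$speed * df$dist
}

# Same calculation, but returning the (sorted) dataframe
calcVelocitySortedDF <- function(df) {
    df <- df[order(df$dist), ]
    df$velocity <- df$speed * df$dist
    df
}

v <- calcVelocitySortedVector(cars)

# Length matches, so cars$velocity <- v would run without complaint
sameLength <- length(v) == nrow(cars)

# But the values are in a different order than the rows of cars
sameValues <- identical(v, cars$speed * cars$dist)

# The dataframe version keeps each velocity next to its own speed and dist
out <- calcVelocitySortedDF(cars)
rowsConsistent <- all(out$velocity == out$speed * out$dist)
```

Here `sameLength` is TRUE while `sameValues` is FALSE, which is the silent-misalignment case; `rowsConsistent` is TRUE because the dataframe carries its rows along with the new column.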

It would depend on what you were trying to do, wouldn't it? I would assume that a dataframe would be better given that you can see what is going on.
