I have the following dataframe:
df = data.frame(
  a = c(1, 2, 3),
  b = c(4, 5, 6),
  c = c(7, 8, 9)
)
which looks like:
##   a b c
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9
Then I have the following function, which expects a dataframe with exactly 1 row. It returns a new 1-row dataframe that is the same as its input plus an extra column holding the sum of the existing columns.
process_row = function(row) {
  row = row[1,]
  new_col_name = paste("[", paste(colnames(row), collapse = "+"), "]", sep = "")
  row[[new_col_name]] = sum(row[1,])
  return (row)
}
Please assume that the function cannot be changed and that we don’t know how it works internally (treat it as a black box). This is a simplification of another problem, so this is a hard requirement.
Let’s check how that function works:
row = df[1,]
row
##   a b c
## 1 1 4 7
row_processed = process_row(row)
row_processed
##   a b c [a+b+c]
## 1 1 4 7      12
Now, my goal is to apply that black-box function to every row of a dataframe with multiple rows, getting the same output as the following chunk of code:
# BEGIN OF BLOCK TO OPTIMIZE
row_processed = process_row(df[1,])
result = row_processed
for (i in 2:nrow(df)) {
  row_processed = process_row(df[i,])
  result = rbind(result, row_processed)
}
# END OF BLOCK TO OPTIMIZE
Let’s try it:
df
##   a b c
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9
result
##   a b c [a+b+c]
## 1 1 4 7      12
## 2 2 5 8      15
## 3 3 6 9      18
I’m pretty sure this can be written more clearly, probably with some function from the apply family.
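To illustrate the kind of rewrite I have in mind, here is a rough sketch using lapply plus do.call(rbind, ...). It is only a sketch under assumptions: `rows` and `result_apply` are names made up for this example, and it assumes process_row always returns a 1-row dataframe with the same columns for every input row. The sample data and the black-box function are repeated unchanged so the snippet runs on its own.

```r
# Sample data and the black-box function, copied unchanged from above
df = data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
process_row = function(row) {
  row = row[1,]
  new_col_name = paste("[", paste(colnames(row), collapse = "+"), "]", sep = "")
  row[[new_col_name]] = sum(row[1,])
  return (row)
}

# Call the black box once per row index; each call gets a 1-row dataframe
# (drop = FALSE keeps it a dataframe even if df had a single column)
rows = lapply(seq_len(nrow(df)), function(i) process_row(df[i, , drop = FALSE]))

# Bind all 1-row results together in a single rbind call
# (hypothetical name, chosen so it does not overwrite `result`)
result_apply = do.call(rbind, rows)
```

Unlike the loop, seq_len(nrow(df)) also handles a 1-row dataframe, where 2:nrow(df) would count backwards. For a dataframe with 0 rows, do.call(rbind, list()) returns NULL, so that case would need separate handling.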
Thanks!