Hi,
Welcome to the RStudio community!
Let me start by saying that I have not heard or used PowerBI, so my approach below is just the solution as if it were performed in a regular R script.
I'd also like to point you to the Reprex guide. A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:
The reason for this is that the way you shared your dataset here is not easy to transfer to R just by copy-paste. Reading the guide, you can learn how to share the data like this in future:
data.frame(
stringsAsFactors = FALSE,
Sales = c(20L, 15L, 25L, 10L, 19L, 11L, 10L),
Profit = c(10L, 12L, 18L, 10L, 15L, 20L, 30L),
Category = c("FISH", "FISH", "FISH", "MEAT", "MEAT", "MEAT", "VEG"),
Region = c("UK", "US", "AUS", "UK", "US", "AUS", "US")
)
OK, now for the solution. Here is my approach, assuming you have the dataset as a variable in R and the column of interest as one too:
library(dplyr) #make sure you have the latest version, dplyr 1.0.0+
library(tidyr)
> myData
Sales Profit Category Region
1 20 10 FISH UK
2 15 12 FISH US
3 25 18 FISH AUS
4 10 10 MEAT UK
5 19 15 MEAT US
6 11 20 MEAT AUS
7 10 30 VEG US
colOfInterest = "Region"
myData = myData %>% group_by(across(all_of(colOfInterest))) %>%
summarise(Sales = sum(Sales), Profit = sum(Profit), .groups = "drop")
> myData
# A tibble: 3 x 3
Region Sales Profit
<chr> <int> <int>
1 AUS 36 38
2 UK 30 20
3 US 44 57
EXPLANATION
I used the tidyverse implementation here. If you are unfamiliar with this, please read some more here
The group_by and summarise function are basic tidyverse functions (part of dplyr package). The first groups the data by a certain column, the second allows to summarise the other columns. In this case, I assumed that Sales and Profit are always to be summarised.
The more difficult to grasp part is the across(all_of(colOfInterest))
bit. Normally, you would just type the column name in the group_by function like this: myData %>% group_by(Region)
. However, you don't know beforehand which column to group by, so you need to convert a string given by the user's selection (from PowerBI) to a column name that the tidyverse language accepts.
To do this, you use the new across function (dplyr 1.0.0) combined with the all_of function. Here you can have a vector of one or more strings that convert to column names.
I know this all sounds a bit confusing, and I still struggle with it myself as it's all new in dplyr 1.0.0, so maybe someone else comes up with even a more elegant solution
Hope this helps,
PJ