Dynamic grid which summarizes

pieterjanvc · July 16, 2020, 1:47am

Hi,

Welcome to the RStudio community!

Let me start by saying that I have not heard or used PowerBI, so my approach below is just the solution as if it were performed in a regular R script.

I'd also like to point you to the Reprex guide. A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

The reason for this is that the way you shared your dataset here is not easy to transfer to R just by copy-paste. Reading the guide, you can learn how to share the data like this in future:

data.frame(
  stringsAsFactors = FALSE,
             Sales = c(20L, 15L, 25L, 10L, 19L, 11L, 10L),
            Profit = c(10L, 12L, 18L, 10L, 15L, 20L, 30L),
          Category = c("FISH", "FISH", "FISH", "MEAT", "MEAT", "MEAT", "VEG"),
            Region = c("UK", "US", "AUS", "UK", "US", "AUS", "US")
)

OK, now for the solution. Here is my approach, assuming you have the dataset as a variable in R and the column of interest as one too:

library(dplyr) #make sure you have the latest version, dplyr 1.0.0+
library(tidyr)

> myData
  Sales Profit Category Region
1    20     10     FISH     UK
2    15     12     FISH     US
3    25     18     FISH    AUS
4    10     10     MEAT     UK
5    19     15     MEAT     US
6    11     20     MEAT    AUS
7    10     30      VEG     US

colOfInterest = "Region"

myData = myData %>% group_by(across(all_of(colOfInterest))) %>% 
  summarise(Sales = sum(Sales), Profit = sum(Profit), .groups = "drop")

> myData
# A tibble: 3 x 3
  Region Sales Profit
  <chr>  <int>  <int>
1 AUS       36     38
2 UK        30     20
3 US        44     57

EXPLANATION
I used the tidyverse implementation here. If you are unfamiliar with this, please read some more here

The group_by and summarise function are basic tidyverse functions (part of dplyr package). The first groups the data by a certain column, the second allows to summarise the other columns. In this case, I assumed that Sales and Profit are always to be summarised.

The more difficult to grasp part is the across(all_of(colOfInterest)) bit. Normally, you would just type the column name in the group_by function like this: myData %>% group_by(Region). However, you don't know beforehand which column to group by, so you need to convert a string given by the user's selection (from PowerBI) to a column name that the tidyverse language accepts.

To do this, you use the new across function (dplyr 1.0.0) combined with the all_of function. Here you can have a vector of one or more strings that convert to column names.

I know this all sounds a bit confusing, and I still struggle with it myself as it's all new in dplyr 1.0.0, so maybe someone else comes up with even a more elegant solution

Hope this helps,
PJ