Handle different aggregations over time and group (contextual aggregations)

proinsias · September 5, 2019, 9:19am

I am not sure if R is the right tool to handle this, as I am a total newbie who is just starting with R. And I only start with R in the hope it can solve this problem.

The problem to resolve is as follow:

I am receiving each 15 minutes a report from each of my 5 tills in each of my shops
Each till report contains 1 value: the average cash in the till
So if I want to know how much cash a till has on average per hour, or per day, I just need to show the average value
But if I want to know the cash in the shop at 9:45, I do a sum of these 5 values values (1 per till)
So I cannot rely on just an average or just a sum of my counter, to create tables showing the various combinations of time and group aggregations (per hour per shop, per hour per till, per 15min per shop)

Can R help me create a calculation that will be behave:

as an average when I only aggregate over time
as an sum when I only aggregate over group,
AND provdie the same value (provided I am not missing any report):
- as a sum of the TIME [2 hours] average when I mix group and time aggregation
- as average of the GROUP [5 tills] sum when I mix time and group aggregation

So to be clear I need one formula that will be the same for all the tables, if they show days, hours, minutes and tills, shops, countries, or if they show each hours or not, or if they show each till or not, or if they show neither time nor group aggregation (=all the tills for ever).
The same formula should cater for all calculations.

I know how to do it in Business Objects, using contextual aggregations, if that can help...
=Average(Sum(Cash) ForEach ([Day];[Hour];[Minute]))
or
=Sum(Average(Cash) ForEach ([Country];[Shop];[Till]))

[And I know that if I am missing reports, my sum of average will not be the same as my average of sum...]

I think the dummy data would look like this, for 2 tills on the 1st and 2nd of the month, for 3pm and 4pm, for the counter "Cash":

# Create dummy data`

tillsdata <- tribble(
  ~Country,~Shop,~Till,~Day,~Hour,~Min,~Cash,
  France,Paris,Till A,1,3,0,36,
  France,Paris,Till A,1,3,15,12,
  France,Paris,Till A,1,3,30,14,
  France,Paris,Till A,1,3,45,40,
  France,Paris,Till A,1,4,0,9,
  France,Paris,Till A,1,4,15,9,
  France,Paris,Till A,1,4,30,12,
  France,Paris,Till A,1,4,45,24,
  France,Paris,Till A,2,3,0,56,
  France,Paris,Till A,2,3,15,40,
  France,Paris,Till A,2,3,30,8,
  France,Paris,Till A,2,3,45,15,
  France,Paris,Till A,2,4,0,60,
  France,Paris,Till A,2,4,15,9,
  France,Paris,Till A,2,4,30,14,
  France,Paris,Till A,2,4,45,20,
  France,Paris,Till B,1,3,0,0,
  France,Paris,Till B,1,3,15,36,
  France,Paris,Till B,1,3,30,8,
  France,Paris,Till B,1,3,45,9,
  France,Paris,Till B,1,4,0,50,
  France,Paris,Till B,1,4,15,0,
  France,Paris,Till B,1,4,30,15,
  France,Paris,Till B,1,4,45,12,
  France,Paris,Till B,2,3,0,56,
  France,Paris,Till B,2,3,15,16,
  France,Paris,Till B,2,3,30,6,
  France,Paris,Till B,2,3,45,32,
  France,Paris,Till B,2,4,0,9,
  France,Paris,Till B,2,4,15,20,
  France,Paris,Till B,2,4,30,56,
  France,Paris,Till B,2,4,45,15

)

And if I remove any column from the report, I want the data to aggregate correctly (sum, or average, or sum of averages).

And a bonus would be if there is a way to declare that a certain column is TIME and the other is GROUP so that removing any GROUP or any TIME column would return the right result (the order of removing them does not matter)

andresrcs · September 5, 2019, 4:50pm

R doesn't work that way, it is not a spreadsheet like excel, you can accomplish all your requests by coding with R but there is more than one way to do it, so we would need a little more information on what kind of output you have in mind (an static html or pdf report, a dynamic web app, a real time dashboard, etc.)

Regardless of the output, you would have to learn how to code all the filtering and data wrangling you are describing, and you can do so by reading this free ebook.

proinsias · September 6, 2019, 1:27pm

Thanks.

The output would be a webpages showing 5 table:

One with no aggregations: Date, Hour, Minute, Country, Shop, Till (each 15min for each hour for each day for each till/shop/country: as many rows as input values)
One with just the overall aggregation (all days, for ever: only one row)
One with just the minutes aggregated to the Hour: each Day/Hour for each till/shop/country (a quarter of the unsegregated rows)
One with just the tills aggregated to the Shop: each Day/Hour/Min for each shop/country
One with just the tills and shops aggregated to the Country and minutes aggregated to the Day: each Day/for each Country

Ideally, the calculation should not have to be recomputed based on knowing what table will be displayed. Instead it would be outputting the correct calculation of the Cash value based on what Inputs are available.

A second use case for output would be:
A Tibco Spotfire cross table, where the table can be modified on the fly using hierarchies:

There is a slider to select if the table is showing the MIN or HOUR or DAY, and another one for TILL/SHOP/COUNTRY.
So based on the slider, the number of columns in the output may vary
And in that table, R would be used to calculate the correct value, based on the dimensions selected (time or group)
Adding or removing TIME dimensions will result in an AVG and doing it to the GROUP would impact the SIM calculation. [For the reasons I explained: the hourly value is the average of the minutes, but the shop value is the sum of the tills]

system · September 27, 2019, 1:27pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.