I am not sure if R is the right tool to handle this, as I am a total newbie who is just starting with R. And I only start with R in the hope it can solve this problem.
The problem to resolve is as follow:
- I am receiving each 15 minutes a report from each of my 5 tills in each of my shops
- Each till report contains 1 value: the average cash in the till
- So if I want to know how much cash a till has on average per hour, or per day, I just need to show the average value
- But if I want to know the cash in the shop at 9:45, I do a sum of these 5 values values (1 per till)
- So I cannot rely on just an average or just a sum of my counter, to create tables showing the various combinations of time and group aggregations (per hour per shop, per hour per till, per 15min per shop)
Can R help me create a calculation that will be behave:
- as an average when I only aggregate over time
- as an sum when I only aggregate over group,
- AND provdie the same value (provided I am not missing any report):
- as a sum of the TIME [2 hours] average when I mix group and time aggregation
- as average of the GROUP [5 tills] sum when I mix time and group aggregation
So to be clear I need one formula that will be the same for all the tables, if they show days, hours, minutes and tills, shops, countries, or if they show each hours or not, or if they show each till or not, or if they show neither time nor group aggregation (=all the tills for ever).
The same formula should cater for all calculations.
I know how to do it in Business Objects, using contextual aggregations, if that can help...
=Average(Sum(Cash) ForEach ([Day];[Hour];[Minute]))
or
=Sum(Average(Cash) ForEach ([Country];[Shop];[Till]))
[And I know that if I am missing reports, my sum of average will not be the same as my average of sum...]
I think the dummy data would look like this, for 2 tills on the 1st and 2nd of the month, for 3pm and 4pm, for the counter "Cash":
# Create dummy data`
tillsdata <- tribble(
~Country,~Shop,~Till,~Day,~Hour,~Min,~Cash,
France,Paris,Till A,1,3,0,36,
France,Paris,Till A,1,3,15,12,
France,Paris,Till A,1,3,30,14,
France,Paris,Till A,1,3,45,40,
France,Paris,Till A,1,4,0,9,
France,Paris,Till A,1,4,15,9,
France,Paris,Till A,1,4,30,12,
France,Paris,Till A,1,4,45,24,
France,Paris,Till A,2,3,0,56,
France,Paris,Till A,2,3,15,40,
France,Paris,Till A,2,3,30,8,
France,Paris,Till A,2,3,45,15,
France,Paris,Till A,2,4,0,60,
France,Paris,Till A,2,4,15,9,
France,Paris,Till A,2,4,30,14,
France,Paris,Till A,2,4,45,20,
France,Paris,Till B,1,3,0,0,
France,Paris,Till B,1,3,15,36,
France,Paris,Till B,1,3,30,8,
France,Paris,Till B,1,3,45,9,
France,Paris,Till B,1,4,0,50,
France,Paris,Till B,1,4,15,0,
France,Paris,Till B,1,4,30,15,
France,Paris,Till B,1,4,45,12,
France,Paris,Till B,2,3,0,56,
France,Paris,Till B,2,3,15,16,
France,Paris,Till B,2,3,30,6,
France,Paris,Till B,2,3,45,32,
France,Paris,Till B,2,4,0,9,
France,Paris,Till B,2,4,15,20,
France,Paris,Till B,2,4,30,56,
France,Paris,Till B,2,4,45,15
)
And if I remove any column from the report, I want the data to aggregate correctly (sum, or average, or sum of averages).
And a bonus would be if there is a way to declare that a certain column is TIME and the other is GROUP so that removing any GROUP or any TIME column would return the right result (the order of removing them does not matter)