I have a data frame that needs to be sorted by these columns. It also contains a column of values that is not shown below.
|Area name|Area code|Area type|date|
|---|---|---|---|
|England|E92000001|Nation|01/07/2020|
|South West|E12000009|Region|01/07/2020|
|South East|E12000008|Region|01/07/2020|
|London|E12000007|Region|01/07/2020|
|East of England|E12000006|Region|01/07/2020|
|West Midlands|E12000005|Region|01/07/2020|
|East Midlands|E12000004|Region|01/07/2020|
|Yorkshire and The Humber|E12000003|Region|01/07/2020|
|North West|E12000002|Region|01/07/2020|
|North East|E12000001|Region|01/07/2020|
|Worcestershire|E10000034|Upper tier local authority|01/07/2020|
|West Sussex|E10000032|Upper tier local authority|01/07/2020|
|Warwickshire|E10000031|Upper tier local authority|01/07/2020|
|Surrey|E10000030|Upper tier local authority|01/07/2020|
|Suffolk|E10000029|Upper tier local authority|01/07/2020|
|Staffordshire|E10000028|Upper tier local authority|01/07/2020|
|Somerset|E10000027|Upper tier local authority|01/07/2020|
|Oxfordshire|E10000025|Upper tier local authority|01/07/2020|
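Because the dates are stored as day/month/year text (assuming UK-style `01/07/2020` means 1 July 2020), they need converting to R `Date` values before sorting; sorting the raw strings puts them in the wrong order. A minimal base R sketch with made-up dates:

```r
# Made-up day/month/year strings, in the same style as the table above.
dates_chr <- c("02/06/2020", "01/07/2020", "15/01/2020")

# Character sorting compares the strings left to right, so day-first
# dates do not come out in calendar order.
sort(dates_chr)

# Parsing with an explicit day/month/year format gives real Date values,
# which sort chronologically.
dates <- as.Date(dates_chr, format = "%d/%m/%Y")
sort(dates)
```

Here `sort(dates_chr)` gives `"01/07/2020" "02/06/2020" "15/01/2020"`, while `sort(dates)` gives 15 January, 2 June, then 1 July 2020.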
The data covers every day back to January and contains over 4,000 rows.
One column holds the values, and I want to find the difference in that column between consecutive dates for the same area.
I need to:
- sort the data, first by area name and then by area type, because some areas appear with different area types but are the same place.
- order it by date so that each day is consecutive.
- calculate the difference between consecutive dates for the same area, without calculating differences between different areas, as that would be incorrect.
- put the result into a new column whose length matches the data frame, even though the first row of each area has no previous row to take a difference from.
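One way to do the steps above in base R is sketched below. It assumes the data frame is called `df`, keeps the column names from the table (`check.names = FALSE`), and uses a hypothetical numeric `value` column with invented numbers in place of the real data column. `order()` sorts by several columns at once, and `ave()` applies a function within each combination of area name and area type, returning a vector in the original row order with the same length as the data frame.

```r
# Small made-up data frame with columns like the table above, plus a
# hypothetical numeric `value` column standing in for the real data.
df <- data.frame(
  `Area name` = c("Surrey", "London", "Surrey", "London", "Surrey", "London"),
  `Area type` = c("Upper tier local authority", "Region",
                  "Upper tier local authority", "Region",
                  "Upper tier local authority", "Region"),
  date  = c("02/07/2020", "01/07/2020", "01/07/2020",
            "03/07/2020", "03/07/2020", "02/07/2020"),
  value = c(15, 100, 10, 130, 25, 110),
  check.names = FALSE
)

# Convert the day/month/year strings to Date so they sort chronologically.
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# Sort by area name, then area type, then date (all ascending).
df <- df[order(df$`Area name`, df$`Area type`, df$date), ]
rownames(df) <- NULL  # optional: renumber rows after sorting

# Day-on-day difference computed separately for each name/type combination.
# The leading NA fills the first row of each area, which has no previous
# day, so the new column has exactly one value per row.
df$difference <- ave(df$value, df$`Area name`, df$`Area type`,
                     FUN = function(v) c(NA, diff(v)))

df
```

With this sample data, `difference` comes out as `NA, 10, 20` for London and `NA, 5, 10` for Surrey. Grouping on both name and type keeps a place that appears under more than one area type as separate series.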
How should I do this? Could you please provide some example code that I can work with?
The tricky bits are:
- sorting the table by multiple columns.
- making sure that the difference is not calculated between different areas.
- making sure that a value is assigned to the first row of each area, because those rows have no previous day to take a difference from, and I am worried that R would not be able to plot them.
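On the first-row question, one option is to fill that row with `NA` (there is no previous day) and another is `0`. The hypothetical helper `diff_within_group` below, a sketch rather than an established function, makes the fill value a parameter and contrasts it with a plain `diff()`, which runs straight across the boundary between two areas. It assumes the values are already in date order within each group.

```r
# Hypothetical helper: difference between consecutive values within each
# group, returning a vector as long as the input. `fill` is the value the
# first row of each group receives (NA by default).
diff_within_group <- function(x, group, fill = NA) {
  ave(x, group, FUN = function(v) c(fill, diff(v)))
}

# Made-up values for two areas, already in date order within each area.
values <- c(10, 15, 25, 100, 110, 130)
areas  <- c("Surrey", "Surrey", "Surrey", "London", "London", "London")

diff_within_group(values, areas)            # NA  5 10 NA 10 20
diff_within_group(values, areas, fill = 0)  #  0  5 10  0 10 20

# A plain diff() is one element shorter and subtracts across the boundary
# between areas (the 75 compares Surrey's last value with London's first).
diff(values)                                #  5 10 75 10 20
```

As far as I know, ggplot2 drops `NA` rows from a line plot with a warning rather than failing, so `NA` is usually safe for plotting; `0` instead draws a point at zero on each area's first day.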
Finally, I want to chart this new column of differences over time, with the differences on the y axis, the dates on the x axis, and the data coloured by area name.
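For the chart, here is a sketch using the ggplot2 package (assumed to be installed) with invented differences for two areas. The `group` aesthetic combines name and type so that an area appearing under two area types is not joined into a single zigzagging line, and rows whose difference is `NA` are dropped first.

```r
library(ggplot2)

# Made-up, already-differenced data in the shape produced by the steps above.
plot_df <- data.frame(
  `Area name` = rep(c("London", "Surrey"), each = 3),
  `Area type` = rep(c("Region", "Upper tier local authority"), each = 3),
  date = rep(as.Date(c("2020-07-01", "2020-07-02", "2020-07-03")), times = 2),
  difference = c(NA, 10, 20, NA, 5, 10),
  check.names = FALSE
)

# Drop the first day of each area (NA difference) so nothing is silently
# removed by ggplot2 and no warning is raised.
plot_df <- plot_df[!is.na(plot_df$difference), ]

# Dates on the x axis, differences on the y axis, one coloured line per area.
p <- ggplot(plot_df, aes(x = date, y = difference,
                         colour = `Area name`,
                         group = interaction(`Area name`, `Area type`))) +
  geom_line() +
  labs(x = "Date", y = "Day-on-day difference", colour = "Area name")

print(p)
```

With over 4,000 rows covering many areas the legend gets crowded, so plotting a subset of areas, or splitting the chart with `facet_wrap()`, may be easier to read.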