I have a dataset in the following form
library(data.table)
dt <- data.table("category" = c("G", "G", "G", "G", "G", "G", "G", "G45", "G45",
"G45", "G45", "G45", "G45", "G45", "G46", "G46",
"G46", "G46", "G46", "G46", "G46", "G47", "G47",
"G47", "G47", "G47", "G47", "G47"),
"indicator" = c("V11110", "V11110", "V11110", "V11110",
"V11110", "V11110", "V11110", "V11110",
"V11110", "V11110", "V11110", "V11110",
"V11110", "V11110", "V11110", "V11110",
"V11110", "V11110", "V11110", "V11110",
"V11110", "V11110", "V11110", "V11110",
"V11110", "V11110", "V11110", "V11110"),
"size_class" = c("sc_1", "sc_3", "sc_2", "sc_4", "sc_5",
"sc_6", "sc_TOTAL", "sc_1", "sc_3", "sc_2",
"sc_4", "sc_5", "sc_6", "sc_TOTAL", "sc_1",
"sc_3", "sc_2", "sc_4", "sc_5", "sc_6",
"sc_TOTAL", "sc_1", "sc_3", "sc_2", "sc_4",
"sc_5", "sc_6", "sc_TOTAL"),
"geo" = c("IE", "IE", "IE", "IE", "IE", "IE", "IE", "IE",
"IE", "IE", "IE", "IE", "IE", "IE", "IE", "IE",
"IE", "IE", "IE", "IE", "IE", "IE", "IE", "IE",
"IE", "IE", "IE", "IE"),
"year" = c(2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L),
"values" = c(19996, 3890, 19476, 2035, NA, 104, 46234, 4077,
456, 2851, 280, NA, NA, 7736, 5864, 1159, 4935,
715, NA, NA, 12991, NA, 2275, 11690, 1040, 378,
69, NA),
"grp_id" = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 16L, 16L,
16L, 16L, 16L, 16L, 16L))
Which gives us the following table:
category indicator size_class geo year values grp_id
1: G V11110 sc_1 IE 2008 19996 1
2: G V11110 sc_3 IE 2008 3890 1
3: G V11110 sc_2 IE 2008 19476 1
4: G V11110 sc_4 IE 2008 2035 1
5: G V11110 sc_5 IE 2008 NA 1
6: G V11110 sc_6 IE 2008 104 1
7: G V11110 sc_TOTAL IE 2008 46234 1
8: G45 V11110 sc_1 IE 2008 4077 2
9: G45 V11110 sc_3 IE 2008 456 2
10: G45 V11110 sc_2 IE 2008 2851 2
11: G45 V11110 sc_4 IE 2008 280 2
12: G45 V11110 sc_5 IE 2008 NA 2
13: G45 V11110 sc_6 IE 2008 NA 2
14: G45 V11110 sc_TOTAL IE 2008 7736 2
15: G46 V11110 sc_1 IE 2008 5864 7
16: G46 V11110 sc_3 IE 2008 1159 7
17: G46 V11110 sc_2 IE 2008 4935 7
18: G46 V11110 sc_4 IE 2008 715 7
19: G46 V11110 sc_5 IE 2008 NA 7
20: G46 V11110 sc_6 IE 2008 NA 7
21: G46 V11110 sc_TOTAL IE 2008 12991 7
22: G47 V11110 sc_1 IE 2008 NA 16
23: G47 V11110 sc_3 IE 2008 2275 16
24: G47 V11110 sc_2 IE 2008 11690 16
25: G47 V11110 sc_4 IE 2008 1040 16
26: G47 V11110 sc_5 IE 2008 378 16
27: G47 V11110 sc_6 IE 2008 69 16
28: G47 V11110 sc_TOTAL IE 2008 NA 16
category indicator size_class geo year values grp_id
Now there are some NA
which can be calculated by substracting the sum of available values from the total value for each group. For example, the missing value in row 5 can be calculated as sc_TOTAL - sc_1 - sc_2 - sc_3 - sc_4 - sc_6. But this can only be done if there is one missing value, in case there are more than one like in row 12 and 13 it is not possible.
In addition to that there is also a column with the category and the category has sub-categories. That enables us to calculate the missing value in row 28 for example. By substracting the sc_TOTAL from G46 and G45 from the sc_TOTAL from G. This in turn enables us to calculate the missing value in row 22 like we did it before.
I would like to create a function for it that does this automatically, preferrably with data.table since the whole dataset is large. I made some attempts but they were not succesful.
In the end my table should look like this:
category indicator size_class geo year values grp_id
1: G V11110 sc_1 IE 2008 19996 1
2: G V11110 sc_3 IE 2008 3890 1
3: G V11110 sc_2 IE 2008 19476 1
4: G V11110 sc_4 IE 2008 2035 1
5: G V11110 sc_5 IE 2008 733 1
6: G V11110 sc_6 IE 2008 104 1
7: G V11110 sc_TOTAL IE 2008 46234 1
8: G45 V11110 sc_1 IE 2008 4077 2
9: G45 V11110 sc_3 IE 2008 456 2
10: G45 V11110 sc_2 IE 2008 2851 2
11: G45 V11110 sc_4 IE 2008 280 2
12: G45 V11110 sc_5 IE 2008 NA 2
13: G45 V11110 sc_6 IE 2008 NA 2
14: G45 V11110 sc_TOTAL IE 2008 7736 2
15: G46 V11110 sc_1 IE 2008 5864 7
16: G46 V11110 sc_3 IE 2008 1159 7
17: G46 V11110 sc_2 IE 2008 4935 7
18: G46 V11110 sc_4 IE 2008 715 7
19: G46 V11110 sc_5 IE 2008 NA 7
20: G46 V11110 sc_6 IE 2008 NA 7
21: G46 V11110 sc_TOTAL IE 2008 12991 7
22: G47 V11110 sc_1 IE 2008 10055 16
23: G47 V11110 sc_3 IE 2008 2275 16
24: G47 V11110 sc_2 IE 2008 11690 16
25: G47 V11110 sc_4 IE 2008 1040 16
26: G47 V11110 sc_5 IE 2008 378 16
27: G47 V11110 sc_6 IE 2008 69 16
28: G47 V11110 sc_TOTAL IE 2008 25507 16
category indicator size_class geo year values grp_id