I need to compare pairs of rows in a data frame and classify each pair according to several conditions.
An extract of the data I have to process is below (the real data set has more than 2,600,000 rows):
```r
data.frame(stringsAsFactors=FALSE,
  Start.ID = c("%%EDH", "%DIPA", "%DIPI", "%DRSI", "*%200", "**EG1",
               "**TNT", "*01RD"),
  End.ID = c("WSN", "PITES", "LADAT", "SITET", "BAKER", "MID",
             "IBUGO", "IVLUT"),
  Lon_i = c(9.23027777777778, 6.39722222222222, 7.35638888888889,
            -0.0458333333333333, 0.516666666666667,
            -0.922777777777778, 11.1397222222222, 4.78694444444444),
  Lat_i = c(53.5088888888889, 49.7688888888889, 49.4375,
            50.1633333333333, 51.35, 51.2816666666667,
            46.0202777777778, 52.1555555555556),
  Lon_f = c(8.87472222222222, 6.51944444444444, 7.83944444444444, 0,
            0.298055555555556, -0.625, 11.3766666666667,
            5.25694444444444),
  Lat_f = c(53.3472222222222, 49.7286111111111, 49.2652777777778,
            50.1, 51.495, 51.0538888888889, 45.6411111111111,
            52.2441666666667),
  Rumbo_circular = c(53, 297, 299, 335, 137, 321, 336, 253)
)
```
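Before any of the questions can be asked, every unordered pair of rows has to be formed. The sketch below does this for the sample with base R's `combn()` and computes the absolute coordinate differences column by column; the names `pairs`, `d_Lat_i`, `same_rumbo` and so on are illustrative only. Note that the full data set gives `choose(2600000, 2)`, roughly 3.4 trillion pairs, which cannot be held in memory, so on the real data the pairs would have to be generated in chunks or restricted to rows that are already close to each other.

```r
# Sample rows from the question
df <- data.frame(stringsAsFactors = FALSE,
  Start.ID = c("%%EDH", "%DIPA", "%DIPI", "%DRSI", "*%200", "**EG1", "**TNT", "*01RD"),
  End.ID   = c("WSN", "PITES", "LADAT", "SITET", "BAKER", "MID", "IBUGO", "IVLUT"),
  Lon_i = c(9.23027777777778, 6.39722222222222, 7.35638888888889,
            -0.0458333333333333, 0.516666666666667,
            -0.922777777777778, 11.1397222222222, 4.78694444444444),
  Lat_i = c(53.5088888888889, 49.7688888888889, 49.4375,
            50.1633333333333, 51.35, 51.2816666666667,
            46.0202777777778, 52.1555555555556),
  Lon_f = c(8.87472222222222, 6.51944444444444, 7.83944444444444, 0,
            0.298055555555556, -0.625, 11.3766666666667, 5.25694444444444),
  Lat_f = c(53.3472222222222, 49.7286111111111, 49.2652777777778,
            50.1, 51.495, 51.0538888888889, 45.6411111111111, 52.2441666666667),
  Rumbo_circular = c(53, 297, 299, 335, 137, 321, 336, 253)
)

# Every unordered pair of row indices (i < j); combn() returns a 2-row matrix
idx <- combn(nrow(df), 2)
pairs <- data.frame(i = idx[1, ], j = idx[2, ])
pairs$pair <- paste(pairs$i, pairs$j, sep = "_")   # e.g. "1_2", as in the question

# Absolute differences for each pair, one vectorised column per coordinate
for (v in c("Lat_i", "Lon_i", "Lat_f", "Lon_f")) {
  pairs[[paste0("d_", v)]] <- abs(df[[v]][pairs$i] - df[[v]][pairs$j])
}
# Question 1): do both rows share the same Rumbo_circular value?
pairs$same_rumbo <- df$Rumbo_circular[pairs$i] == df$Rumbo_circular[pairs$j]

nrow(pairs)          # 28 pairs for the 8 sample rows
choose(2600000, 2)   # about 3.38e12 pairs for the full data set
```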
What I need to do is:
For each pair of rows, I first need to check whether their "Rumbo_circular" values are different. If they are, I compare their Lon_i values; if the difference is less than 3 units, a second condition is checked on Lat_i. If the Lat_i difference is also less than 3 units, a result is reached: that combination is identified as "Possible_Inicial_Conflict", and the process moves on to the next pair. In short, R needs to answer these questions for every pair:
1) Do they have different "Rumbo_circular" values?
   - If yes: go to question 2) and question 4).
   - If not: the combination is identified as "Non_Possible_Conflict", because they have the same "Rumbo_circular" value.
2) Is the difference in their "Lat_i" values under 3 units?
   - If yes: go to question 3).
   - If not: the combination is identified as "Non_Possible_Inicial_Conflict".
3) Is the difference in their "Lon_i" values under 3 units?
   - If yes: the combination is identified as "Possible_Inicial_Conflict".
   - If not: the combination is identified as "Non_Possible_Inicial_Conflict".
4) Is the difference in their "Lat_f" values under 3 units?
   - If yes: go to question 5).
   - If not: the combination is identified as "Non_Possible_End_Conflict".
5) Is the difference in their "Lon_f" values under 3 units?
   - If yes: the combination is identified as "Possible_End_Conflict".
   - If not: the combination is identified as "Non_Possible_End_Conflict".
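The five questions map directly onto nested conditions. Below is a minimal sketch of a hypothetical helper, `classify_pair()`, that answers them for a single pair of rows and returns the identifications as a character vector. One assumption: the comparisons use `<=` rather than `<`, because the worked example counts a difference of exactly 3 (Lat_i 58 vs 55) as "under 3 units"; replace `<=` with `<` if the threshold should be strict.

```r
# Hypothetical helper: classifies one pair of rows a and b (lists or one-row
# data frames with the question's column names) following questions 1) to 5).
# tol is the "3 units" threshold; <= is assumed, to match the worked example.
classify_pair <- function(a, b, tol = 3) {
  # Question 1): the same Rumbo_circular value means no conflict at all
  if (a$Rumbo_circular == b$Rumbo_circular) {
    return("Non_Possible_Conflict")
  }
  # Questions 2) and 3): initial positions; && only checks Lon_i if Lat_i passes
  initial <- if (abs(a$Lat_i - b$Lat_i) <= tol && abs(a$Lon_i - b$Lon_i) <= tol) {
    "Possible_Inicial_Conflict"
  } else {
    "Non_Possible_Inicial_Conflict"
  }
  # Questions 4) and 5): final positions; Lon_f only checked if Lat_f passes
  end <- if (abs(a$Lat_f - b$Lat_f) <= tol && abs(a$Lon_f - b$Lon_f) <= tol) {
    "Possible_End_Conflict"
  } else {
    "Non_Possible_End_Conflict"
  }
  c(initial, end)
}

# Rows 1 and 3 of the numeric example
r1 <- list(Lat_i = 58, Lon_i = 27, Lat_f = 60, Lon_f = 65, Rumbo_circular = 60)
r3 <- list(Lat_i = 55, Lon_i = 29, Lat_f = 55, Lon_f = 65, Rumbo_circular = 63)
classify_pair(r1, r3)
# [1] "Possible_Inicial_Conflict" "Non_Possible_End_Conflict"
```

The function returns one label for pairs that share a "Rumbo_circular" value and two labels otherwise. Calling it in a loop over millions of pairs would be slow in R, so it is mainly useful for checking the logic on a few pairs.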
Here is a numeric example of what I expect R to do for me:
- Row 1: Lat_i: 58; Lon_i: 27; Lat_f: 60; Lon_f: 65; Rumbo_circular: 60
- Row 2: Lat_i: 59; Lon_i: 27; Lat_f: 60; Lon_f: 70; Rumbo_circular: 60
- Row 3: Lat_i: 55; Lon_i: 29; Lat_f: 55; Lon_f: 65; Rumbo_circular: 63
- Row 4: Lat_i: 55; Lon_i: 29; Lat_f: 57; Lon_f: 65; Rumbo_circular: 65
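The absolute differences behind the comparisons of these rows can be checked quickly in R; the vectors `r1`, `r3` and `r4` below are simply rows 1, 3 and 4 typed in by hand, with "Rumbo_circular" left out.

```r
# Rows 1, 3 and 4 of the numeric example (coordinates only)
r1 <- c(Lat_i = 58, Lon_i = 27, Lat_f = 60, Lon_f = 65)
r3 <- c(Lat_i = 55, Lon_i = 29, Lat_f = 55, Lon_f = 65)
r4 <- c(Lat_i = 55, Lon_i = 29, Lat_f = 57, Lon_f = 65)

# Element-wise differences keep the names of r1
abs(r1 - r3)  # Lat_i 3, Lon_i 2, Lat_f 5, Lon_f 0
abs(r1 - r4)  # Lat_i 3, Lon_i 2, Lat_f 3, Lon_f 0
```

Several of these differences are exactly 3, and the comparisons that follow answer "yes" for them, so in practice the threshold behaves as "3 units or less".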
- Comparing 1&2: for question 1), they do NOT have different "Rumbo_circular" values, so combination 1_2 is categorized as "Non_Possible_Conflict".
- Comparing 1&3: the answers to questions 1), 2) and 3) are yes, so combination 1_3 is identified as "Possible_Inicial_Conflict". Because the answer to question 1) is yes, question 4) must also be answered. The answer to question 4) is no, so the combination is also identified as "Non_Possible_End_Conflict". Combination 1_3 therefore has two identifications: "Possible_Inicial_Conflict" and "Non_Possible_End_Conflict".
- Comparing 1&4: the answers to questions 1), 2) and 3) are yes, so combination 1_4 is categorized as "Possible_Inicial_Conflict". As question 1) was answered yes, question 4) must also be answered. The answers to questions 4) and 5) are also yes, so the combination is categorized as "Possible_End_Conflict" as well. This combination therefore has two identifications: "Possible_Inicial_Conflict" and "Possible_End_Conflict".

I think these identifications could be shown as a table: one column each for "Non_Possible_Conflict", "Non_Possible_Inicial_Conflict", "Possible_Inicial_Conflict", "Possible_End_Conflict" and "Non_Possible_End_Conflict". If a combination falls under one of those categories, the identification would appear in that category's column; otherwise the cell would be blank.
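The table layout described above (one column per category, blank when the category does not apply) can be built for all pairs at once with vectorised comparisons instead of a loop. This is a sketch, not a definitive implementation: `conflict_table()` and the `pair` column are hypothetical names, the data are the four example rows, and `<=` is assumed because the worked example treats a difference of exactly 3 as "under 3 units".

```r
# The four rows of the numeric example
ex <- data.frame(
  Lat_i = c(58, 59, 55, 55),
  Lon_i = c(27, 27, 29, 29),
  Lat_f = c(60, 60, 55, 57),
  Lon_f = c(65, 70, 65, 65),
  Rumbo_circular = c(60, 60, 63, 65)
)

# Hypothetical function: one row per pair (i < j), one column per category.
# A cell holds the category name when it applies and "" otherwise.
conflict_table <- function(d, tol = 3) {
  idx <- combn(nrow(d), 2)
  i <- idx[1, ]
  j <- idx[2, ]
  # TRUE where the two rows differ by at most tol in column v (assumed <=)
  within_tol <- function(v) abs(d[[v]][i] - d[[v]][j]) <= tol
  differ <- d$Rumbo_circular[i] != d$Rumbo_circular[j]   # question 1)
  ini_ok <- within_tol("Lat_i") & within_tol("Lon_i")     # questions 2) and 3)
  end_ok <- within_tol("Lat_f") & within_tol("Lon_f")     # questions 4) and 5)
  label <- function(cond, name) ifelse(cond, name, "")
  data.frame(
    pair = paste(i, j, sep = "_"),
    Non_Possible_Conflict         = label(!differ, "Non_Possible_Conflict"),
    Non_Possible_Inicial_Conflict = label(differ & !ini_ok, "Non_Possible_Inicial_Conflict"),
    Possible_Inicial_Conflict     = label(differ & ini_ok, "Possible_Inicial_Conflict"),
    Possible_End_Conflict         = label(differ & end_ok, "Possible_End_Conflict"),
    Non_Possible_End_Conflict     = label(differ & !end_ok, "Non_Possible_End_Conflict"),
    stringsAsFactors = FALSE
  )
}

tab <- conflict_table(ex)
tab[tab$pair %in% c("1_2", "1_3", "1_4"), ]
```

For the example rows, the entries for 1_2, 1_3 and 1_4 reproduce the identifications listed above; the remaining pairs (2_3, 2_4, 3_4) are classified by the same rules.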
I hope somebody can help me write this in R.