I need to find all combinations of values within a specific sequence. But this data set needs a bit of explaining.
In the example df
you can see the nested grouping structure, with GROUP
nested within SAMPLE
and SITE
. My actual data set has ~1000 SAMPLE
across 30 SITE
. Note that each GROUP
is in sequence (1 to 3) and this sequence needs to be retained in the final result as this is a critical part of future analyses. In other words, combinations that put GROUP
2 or 3 before GROUP
1 are not needed.
Within each GROUP
are 2 or 3 text or numeric values in the ASSIGN_1
, ASSIGN_2
, and ASSIGN_3
columns. For example, F1 has 3 groups and each group has 2 possibilities: GROUP
1 has 4 and unk_palmer_trib; GROUP
2 has 6.1 and 5.2; and ‘GROUP’ 3 has 10.1 and 6.1. You can see SAMPLE
B2 also has three groups, with GROUP
1 and 3 having two possibilities and GROUP
2 having three possibilities.
So…I need to find all the possible combinations of the values in the ASSIGN
columns while retaining the sequence of the GROUP
. Note that OPTION
columns in the result
are the various combinations of values from the ASSIGN
columns in the df
and you will notice that these combinations retain the GROUP
sequence (i.e., 1 to 3). Also note that the number of OPTION
columns with data (i.e., no “NAs”) in the result
correspond to all the possible combinations of the ASSIGN
columns that retain the GROUP
sequence. SAMPLE
F1 has 3 groups each with 2 possibilities and thus there are 8 possible outcomes (i.e., 2 x 2 x 2; see OPTION
columns 1 to 8). SAMPLE
B2 has 3 groups (2 groups with two possibilities and 1 group with three possibilities) and thus there are 12 possible outcomes (i.e., 2 x 3 x 2; see OPTION
columns 1 to 12).
To really push this point home, lets look at SAMPLE
F1 in the df
and the result
to illustrate how these combinations need to be structured. OPTION_1
for F1 is just the ASSIGN_1
column from df
. OPTION_2
is row 1 from ASSIGN_1
(i.e., GROUP_1 = 4) and then rows 2 and 3 from ASSIGN_2 (i.e., GROUP_2 = 5.2 and GROUP_3 = 6.1). ‘OPTION_3’ is row 1 from ASSIGN_1
(i.e., GROUP_1 = 4), row 2 from ASSIGN_2
(i.e., GROUP_2 = 5.2) and row 3 from ASSIGN_1
(i.e., GROUP_3 = 10.1). Repeat at nauseum….
In my actual data set, some of the GROUP
can have up to 5 ASSIGN
columns…so the number of possible combinations can be large for some SAMPLE
.
I have tried using expand.grid()
within ave()
, but have not been able to get the code to work. I am open to all solutions, but prefer base package
solutions as I want to avoid having to load packages. I also suspect that the df
likely needs to be restructured in some way…which is perfectly fine so long as the GROUP
sequence is retained in the final product.
Let me know if I need to clarify.
Thanks in advance for your help.
df <- read.table(text = "SITE SAMPLE GROUP ASSIGN_1 ASSIGN_2 ASSIGN_3
A1 F1 1 4 unk_palmer_trib NA
A1 F1 2 6.1 5.2 NA
A1 F1 3 10.1 6.1 NA
M15 B2 1 6.2 6.4 NA
M15 B2 2 10.1 6.1 5.2
M15 B2 3 10.1 6.1 NA
", header = TRUE)
result <- read.table(text = "SITE SAMPLE GROUP OPTION_1 OPTION_2 OPTION_3 OPTION_4 OPTION_5 OPTION_6 OPTION_7 OPTION_8 OPTION_9 OPTION_10 OPTION_11 OPTION_12
A1 F1 1 4 4 4 4 unk_palmer_trib unk_palmer_trib unk_palmer_trib unk_palmer_trib NA NA NA NA
A1 F1 2 6.1 5.2 5.2 6.1 6.1 5.2 5.2 6.1 NA NA NA NA
A1 F1 3 10.1 6.1 10.1 6.1 10.1 6.1 10.1 6.1 NA NA NA NA
M15 B2 1 6.2 6.2 6.2 6.2 6.2 6.2 6.4 6.4 6.4 6.4 6.4 6.4
M15 B2 2 10.1 6.1 10.1 6.1 5.2 5.2 10.1 6.1 10.1 6.1 5.2 5.2
M15 B2 3 10.1 6.1 6.1 10.1 10.1 6.1 10.1 6.1 6.1 10.1 10.1 6.1
", header = TRUE)