All possible combinations in data set with nested groups that are ordered sequencially

vesuccio · May 6, 2021, 1:42am

I need to find all combinations of values within a specific sequence. But this data set needs a bit of explaining.

In the example df you can see the nested grouping structure, with GROUP nested within SAMPLE and SITE. My actual data set has ~1000 SAMPLE across 30 SITE. Note that each GROUP is in sequence (1 to 3) and this sequence needs to be retained in the final result as this is a critical part of future analyses. In other words, combinations that put GROUP 2 or 3 before GROUP 1 are not needed.

Within each GROUP are 2 or 3 text or numeric values in the ASSIGN_1, ASSIGN_2, and ASSIGN_3 columns. For example, F1 has 3 groups and each group has 2 possibilities: GROUP 1 has 4 and unk_palmer_trib; GROUP 2 has 6.1 and 5.2; and ‘GROUP’ 3 has 10.1 and 6.1. You can see SAMPLE B2 also has three groups, with GROUP 1 and 3 having two possibilities and GROUP 2 having three possibilities.

So…I need to find all the possible combinations of the values in the ASSIGN columns while retaining the sequence of the GROUP. Note that OPTION columns in the result are the various combinations of values from the ASSIGN columns in the df and you will notice that these combinations retain the GROUP sequence (i.e., 1 to 3). Also note that the number of OPTION columns with data (i.e., no “NAs”) in the result correspond to all the possible combinations of the ASSIGN columns that retain the GROUP sequence. SAMPLE F1 has 3 groups each with 2 possibilities and thus there are 8 possible outcomes (i.e., 2 x 2 x 2; see OPTION columns 1 to 8). SAMPLE B2 has 3 groups (2 groups with two possibilities and 1 group with three possibilities) and thus there are 12 possible outcomes (i.e., 2 x 3 x 2; see OPTION columns 1 to 12).

To really push this point home, lets look at SAMPLE F1 in the df and the result to illustrate how these combinations need to be structured. OPTION_1 for F1 is just the ASSIGN_1 column from df. OPTION_2 is row 1 from ASSIGN_1 (i.e., GROUP_1 = 4) and then rows 2 and 3 from ASSIGN_2 (i.e., GROUP_2 = 5.2 and GROUP_3 = 6.1). ‘OPTION_3’ is row 1 from ASSIGN_1 (i.e., GROUP_1 = 4), row 2 from ASSIGN_2 (i.e., GROUP_2 = 5.2) and row 3 from ASSIGN_1 (i.e., GROUP_3 = 10.1). Repeat at nauseum….

In my actual data set, some of the GROUP can have up to 5 ASSIGN columns…so the number of possible combinations can be large for some SAMPLE.

I have tried using expand.grid() within ave(), but have not been able to get the code to work. I am open to all solutions, but prefer base package solutions as I want to avoid having to load packages. I also suspect that the df likely needs to be restructured in some way…which is perfectly fine so long as the GROUP sequence is retained in the final product.

Let me know if I need to clarify.

Thanks in advance for your help.

df <- read.table(text = "SITE	SAMPLE	GROUP	ASSIGN_1	ASSIGN_2	ASSIGN_3
A1	F1	1	4	unk_palmer_trib	NA
A1	F1	2	6.1	5.2	NA
A1	F1	3	10.1	6.1	NA
M15	B2	1	6.2	6.4	NA
M15	B2	2	10.1	6.1	5.2
M15	B2	3	10.1	6.1	NA
", header = TRUE)

result <- read.table(text = "SITE	SAMPLE	GROUP	OPTION_1	OPTION_2	OPTION_3	OPTION_4	OPTION_5	OPTION_6	OPTION_7	OPTION_8	OPTION_9	OPTION_10	OPTION_11	OPTION_12
A1	F1	1	4	4	4	4	unk_palmer_trib	unk_palmer_trib	unk_palmer_trib	unk_palmer_trib	NA	NA	NA	NA
A1	F1	2	6.1	5.2	5.2	6.1	6.1	5.2	5.2	6.1	NA	NA	NA	NA
A1	F1	3	10.1	6.1	10.1	6.1	10.1	6.1	10.1	6.1	NA	NA	NA	NA
M15	B2	1	6.2	6.2	6.2	6.2	6.2	6.2	6.4	6.4	6.4	6.4	6.4	6.4
M15	B2	2	10.1	6.1	10.1	6.1	5.2	5.2	10.1	6.1	10.1	6.1	5.2	5.2
M15	B2	3	10.1	6.1	6.1	10.1	10.1	6.1	10.1	6.1	6.1	10.1	10.1	6.1
", header = TRUE)

system · May 27, 2021, 1:43am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.