How do I split a dataset into two groups, where Group 1 has the first 4 disease samples and the first 4 normal samples, and Group 2 has the remaining 3 disease and 3 normal samples? Group 1 has sample IDs '454', '3', '554', '202' as normal samples and '531', '18', '681', '423' as disease samples; Group 2 has the rest of the samples. The two samples with missing Disease status should be ignored. How do I go about it?
| Index | SampleID | Disease |
|---|---|---|
| 1 | 454 | N |
| 2 | 3 | N |
| 3 | 531 | Y |
| 4 | 18 | Y |
| 5 | 554 | N |
| 6 | 202 | N |
| 7 | 559 | N |
| 8 | 203 | N |
| 9 | 681 | Y |
| 10 | 423 | Y |
| 11 | 710 | Y |
| 12 | 768 | Y |
| 13 | A81 | ? |
| 14 | A82 | ? |
| 15 | A11 | Y |
| 16 | A101 | N |
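
To make the intended split concrete, here is a minimal base R sketch of the grouping rule described above, applied to the sample sheet. It assumes the sheet is held in a data frame with character `SampleID` and `Disease` columns; the object names `samples`, `known`, `rank_in_status`, `group1_ids` and `group2_ids` are made up for illustration.

```r
# Sample sheet copied from the table above; SampleID is kept as character
# because some IDs ("A81", "A11", ...) are not numeric.
samples <- data.frame(
  SampleID = c("454", "3", "531", "18", "554", "202", "559", "203",
               "681", "423", "710", "768", "A81", "A82", "A11", "A101"),
  Disease  = c("N", "N", "Y", "Y", "N", "N", "N", "N",
               "Y", "Y", "Y", "Y", "?", "?", "Y", "N"),
  stringsAsFactors = FALSE
)

# Drop the samples whose disease status is missing ("?").
known <- samples[samples$Disease %in% c("Y", "N"), ]

# Running count of samples within each disease status, in sheet order.
known$rank_in_status <- ave(seq_len(nrow(known)), known$Disease,
                            FUN = seq_along)

# The first 4 samples of each status go to group 1, the rest to group 2.
known$Group <- ifelse(known$rank_in_status <= 4, 1L, 2L)

group1_ids <- known$SampleID[known$Group == 1L]
group2_ids <- known$SampleID[known$Group == 2L]
```

With the sheet above, `group1_ids` should contain the eight IDs listed in the question and `group2_ids` the remaining six, with 'A81' and 'A82' left out of both.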
Let's say I have data that looks like this:
# A tibble: 27,578 x 202
Index SYMBOL `454.AVG_Beta` `454.Avg_NBEADS… `454.Avg_NBEADS… `454.BEAD_STDER…
<int> <chr> <dbl> <int> <int> <int>
1 1 ATP2A1 0.755 16 13 36
2 2 SLMAP 0.722 12 18 30
3 3 MEOX2 0.0975 20 25 111
4 4 HOXD3 0.146 20 19 122
5 5 ZNF398 0.102 16 20 181
6 6 PANX1 0.0626 12 13 543
7 7 COX8C 0.964 19 15 17
8 8 IMPA2 0.0240 15 25 494
9 9 TTC8 0.0109 23 20 562
10 10 FLJ35… 0.657 21 21 470
# ... with 27,568 more rows, and 196 more variables: `454.BEAD_STDERR_B` <int>,
# `454.Signal_A` <int>, `454.Signal_B` <int>, `454.Detection Pval` <dbl>,
# `454.Intensity` <int>, `3.AVG_Beta` <dbl>, `3.Avg_NBEADS_A` <int>,
# `3.Avg_NBEADS_B` <int>, `3.BEAD_STDERR_A` <int>, `3.BEAD_STDERR_B` <int>,
# `3.Signal_A` <int>, `3.Signal_B` <int>, `3.Detection Pval` <dbl>,
# `3.Intensity` <int>, `531.AVG_Beta` <dbl>, `531.Avg_NBEADS_A` <int>,
# `531.Avg_NBEADS_B` <int>, `531.BEAD_STDERR_A` <int>,
# `531.BEAD_STDERR_B` <int>, `531.Signal_A` <int>, `531.Signal_B` <int>,
# `531.Detection Pval` <dbl>, `531.Intensity` <int>, `18.AVG_Beta` <dbl>,
# `18.Avg_NBEADS_A` <int>, `18.Avg_NBEADS_B` <int>, `18.BEAD_STDERR_A` <int>,
# `18.BEAD_STDERR_B` <int>, `18.Signal_A` <int>, `18.Signal_B` <int>,
# `18.Detection Pval` <dbl>, `18.Intensity` <int>, `554.AVG_Beta` <dbl>,
# `554.Avg_NBEADS_A` <int>, `554.Avg_NBEADS_B` <int>,
# `554.BEAD_STDERR_A` <int>, `554.BEAD_STDERR_B` <int>, `554.Signal_A` <int>,
# `554.Signal_B` <int>, `554.Detection Pval` <dbl>, `554.Intensity` <int>,
# `202.AVG_Beta` <dbl>, `202.Avg_NBEADS_A` <int>, `202.Avg_NBEADS_B` <int>,
# `202.BEAD_STDERR_A` <int>, `202.BEAD_STDERR_B` <int>, `202.Signal_A` <int>,
# `202.Signal_B` <int>, `202.Detection Pval` <dbl>, `202.Intensity` <int>,
# `559.AVG_Beta` <dbl>, `559.Avg_NBEADS_A` <int>, `559.Avg_NBEADS_B` <int>,
# `559.BEAD_STDERR_A` <int>, `559.BEAD_STDERR_B` <int>, `559.Signal_A` <int>,
# `559.Signal_B` <int>, `559.Detection Pval` <dbl>, `559.Intensity` <int>,
# `203.AVG_Beta` <dbl>, `203.Avg_NBEADS_A` <int>, `203.Avg_NBEADS_B` <int>,
# `203.BEAD_STDERR_A` <int>, `203.BEAD_STDERR_B` <int>, `203.Signal_A` <int>,
# `203.Signal_B` <int>, `203.Detection Pval` <dbl>, `203.Intensity` <int>,
# `681.AVG_Beta` <dbl>, `681.Avg_NBEADS_A` <int>, `681.Avg_NBEADS_B` <int>,
# `681.BEAD_STDERR_A` <int>, `681.BEAD_STDERR_B` <int>, `681.Signal_A` <int>,
# `681.Signal_B` <int>, `681.Detection Pval` <dbl>, `681.Intensity` <int>,
# `423.AVG_Beta` <dbl>, `423.Avg_NBEADS_A` <int>, `423.Avg_NBEADS_B` <int>,
# `423.BEAD_STDERR_A` <int>, `423.BEAD_STDERR_B` <int>, `423.Signal_A` <int>,
# `423.Signal_B` <int>, `423.Detection Pval` <dbl>, `423.Intensity` <int>,
# `710.AVG_Beta` <dbl>, `710.Avg_NBEADS_A` <int>, `710.Avg_NBEADS_B` <int>,
# `710.BEAD_STDERR_A` <int>, `710.BEAD_STDERR_B` <int>, `710.Signal_A` <int>,
# `710.Signal_B` <int>, `710.Detection Pval` <dbl>, `710.Intensity` <int>,
# `768.AVG_Beta` <dbl>, `768.Avg_NBEADS_A` <int>, `768.Avg_NBEADS_B` <int>,
# `768.BEAD_STDERR_A` <int>, `768.BEAD_STDERR_B` <int>, …
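
One possible way to carry the split over to a wide table like this, sketched in base R: treat everything before the first dot in a column name as the sample ID, and keep the columns whose ID belongs to each group. The toy `dat` below is only a stand-in for the real tibble (most values are made up), and `col_sample`, `annot` and `pick()` are hypothetical names; the sketch also assumes `Index` and `SYMBOL` should be kept in both groups.

```r
# Toy stand-in for the wide table: two annotation columns plus
# "<SampleID>.<measure>" columns (values are for illustration only).
dat <- data.frame(
  Index = 1:2,
  SYMBOL = c("ATP2A1", "SLMAP"),
  `454.AVG_Beta` = c(0.755, 0.722),
  `454.Intensity` = c(100L, 200L),
  `559.AVG_Beta` = c(0.5, 0.6),
  `531.AVG_Beta` = c(0.1, 0.2),
  `710.AVG_Beta` = c(0.3, 0.4),
  `A81.AVG_Beta` = c(0.9, 0.8),
  check.names = FALSE  # keep names such as "454.AVG_Beta" unchanged
)

# Group membership as stated in the question.
group1_ids <- c("454", "3", "554", "202", "531", "18", "681", "423")
group2_ids <- c("559", "203", "A101", "710", "768", "A11")

# Sample ID of each column = everything before the first dot.
col_sample <- sub("\\..*$", "", names(dat))

# Annotation columns assumed to be kept in both groups.
annot <- c("Index", "SYMBOL")

# Keep annotation columns plus all columns belonging to the given samples.
pick <- function(ids) {
  dat[, names(dat) %in% annot | col_sample %in% ids, drop = FALSE]
}

group1 <- pick(group1_ids)
group2 <- pick(group2_ids)
```

Columns for samples that are in neither ID vector (here `A81`) are dropped from both results, which is how the samples with missing Disease status get ignored. The same indexing should also work if `dat` is a tibble rather than a plain data frame.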