My first approach would be to use the tidyverse
package and convert the input "wide" dataset into an intermediate "long" output dataset. The slice_head()
can easily be adjusted to the requested top 5 values.
``` r
library(tidyverse)
set.seed(123456)
example <- dplyr::data_frame(
Site = 1:10,
Substance_A = sample(0:50, 10, rep = TRUE),
Substance_B = sample(0:50, 10, rep = TRUE),
Substance_C = sample(0:50, 10, rep = TRUE)
)
#> Warning: `data_frame()` is deprecated as of tibble 1.1.0.
#> Please use `tibble()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
output_long <-
example %>%
pivot_longer(cols = 2:NCOL(.),
names_to = "substance",
values_to = "value"
) %>%
arrange(Site, desc(value)) %>%
group_by(Site) %>%
slice_head(n = 2)
output_long
#> # A tibble: 20 x 3
#> # Groups: Site [10]
#> Site substance value
#> <int> <chr> <int>
#> 1 1 Substance_B 44
#> 2 1 Substance_A 41
#> 3 2 Substance_A 41
#> 4 2 Substance_B 29
#> 5 3 Substance_A 48
#> 6 3 Substance_B 46
#> 7 4 Substance_C 45
#> 8 4 Substance_B 22
#> 9 5 Substance_C 46
#> 10 5 Substance_A 44
#> 11 6 Substance_A 35
#> 12 6 Substance_B 27
#> 13 7 Substance_A 37
#> 14 7 Substance_B 37
#> 15 8 Substance_C 34
#> 16 8 Substance_A 9
#> 17 9 Substance_B 15
#> 18 9 Substance_C 15
#> 19 10 Substance_C 22
#> 20 10 Substance_B 20
Created on 2021-02-03 by the reprex package (v1.0.0)
The above output dataset would be easier to work with.
However, to give it a try to put it back into a wider format, I've added the top_n
and top_substance
variables and then needed to pivot_wider()
and pivot_longer()
a couple of times to get the result you're looking for.
output_long %>%
mutate(top_n = str_c("Top_", row_number()),
top_substance = str_c(top_n, "_Name")
) %>%
pivot_wider(names_from = top_n,
values_from = value
) %>%
pivot_wider(names_from = top_substance,
values_from = substance
) %>%
select(Site, sort(names(.))) %>%
## pivot_longer of multiple columns require that all values are of same class()
mutate(across(.cols = everything(), .fns = as.character)) %>%
## get rid of NA values
pivot_longer(cols = 2:NCOL(.),
names_to = "col_name",
values_to = "col_value",
values_drop_na = TRUE
) %>%
pivot_wider(names_from = col_name,
values_from = col_value
)
#> # A tibble: 10 x 5
#> # Groups: Site [10]
#> Site Top_1 Top_1_Name Top_2 Top_2_Name
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 44 Substance_B 41 Substance_A
#> 2 2 41 Substance_A 29 Substance_B
#> 3 3 48 Substance_A 46 Substance_B
#> 4 4 45 Substance_C 22 Substance_B
#> 5 5 46 Substance_C 44 Substance_A
#> 6 6 35 Substance_A 27 Substance_B
#> 7 7 37 Substance_A 37 Substance_B
#> 8 8 34 Substance_C 9 Substance_A
#> 9 9 15 Substance_B 15 Substance_C
#> 10 10 22 Substance_C 20 Substance_B
If there's an easier / quicker way to achieve the requested results then I'm curious what it might look like 