# Pivoting in Data.Table : Percentages Greater Than 100?

I have the following dataset:

``````set.seed(123)
gender <- c("Male","Female")
gender <- sample(gender, 5000, replace=TRUE, prob=c(0.45, 0.55))
gender <- as.factor(gender)

status <- c("Immigrant","Citizen")
status <- sample(status, 5000, replace=TRUE, prob=c(0.3, 0.7))
status  <- as.factor(status )

country <- c("A", "B", "C", "D")
country <- sample(country, 5000, replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25))
country  <- as.factor(country)

################

disease <- c("Yes","No")
disease <- sample(disease, 5000, replace=TRUE, prob=c(0.4, 0.6))
disease <- as.factor(disease)

###################
my_data = data.frame(gender, status, disease, country)
``````

I learned how to make a summary table that shows the (relative) disease rates for each unique combination of factors:

``````library(tidyverse)
my_data %>% group_by (gender, status, country, disease) %>%
summarise (n=n()) %>%
mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) -> step_1
#> `summarise()` has grouped output by 'gender', 'status', 'country'. You can
#> override using the `.groups` argument.

step_1 |>group_by(country) |>
pivot_wider(names_from = disease,
values_from = c(n:rel.freq),
names_prefix = "disease_") |>
mutate(overallPerc = (n_disease_No + n_disease_Yes)/sum(step_1\$n))

gender status    country n_disease_No n_disease_Yes rel.freq_disease_No rel.freq_disease_Yes overallPerc
<fct>  <fct>     <fct>          <int>         <int> <chr>               <chr>                      <dbl>
1 Female Citizen   A                308           200 61%                 39%                       0.102
2 Female Citizen   B                291           169 63%                 37%                       0.092
3 Female Citizen   C                301           228 57%                 43%                       0.106
4 Female Citizen   D                245           189 56%                 44%
``````

Now, I am trying to convert this above code into "data.table" format to increase the efficiency of this code.

Here is my attempt:

``````library(data.table)

my_data <- as.data.table(my_data)

step_1 <- my_data[, .(n = .N), by = c("gender", "status", "country", "disease")][, rel.freq := paste0(round(100 * n/.N, 0), "%")]

step_1 <- dcast(step_1, gender + status + country ~ disease, value.var = c("n", "rel.freq"))[, overallPerc := (n_No + n_Yes)/.N]

gender    status country n_No n_Yes rel.freq_No rel.freq_Yes overallPerc
1: Female   Citizen       A  308   200        962%         625%     31.7500
2: Female   Citizen       B  291   169        909%         528%     28.7500
3: Female   Citizen       C  301   228        941%         712%     33.0625
4: Female   Citizen       D  245   189        766%         591%     27.1250
``````

However, many of these percentages are greater than 100 - can someone please show me how I can resolve this problem?

Thanks!

Is this what you want?

``````a <- step_1 |>group_by(country) |>
pivot_wider(names_from = disease,
values_from = c(n:rel.freq),
names_prefix = "disease_") |>
mutate(overallPerc = (n_disease_No + n_disease_Yes)/sum(step_1\$n))

a %>%
group_by(gender, status, country) %>%
summarise(nno = sum(n_disease_No),
nyes = sum(n_disease_Yes),
ovperc = sum(overallPerc)*100,
.groups = "drop"
)
``````
``````# A tibble: 16 × 6
gender status    country   nno  nyes ovperc
<fct>  <fct>     <fct>   <int> <int>  <dbl>
1 Female Citizen   A         308   200  10.2
2 Female Citizen   B         291   169   9.2
3 Female Citizen   C         301   228  10.6
4 Female Citizen   D         245   189   8.68
5 Female Immigrant A         107    95   4.04
6 Female Immigrant B         126    76   4.04
7 Female Immigrant C         137    70   4.14
8 Female Immigrant D         129    74   4.06
9 Male   Citizen   A         237   167   8.08
10 Male   Citizen   B         247   163   8.2
11 Male   Citizen   C         250   171   8.42
12 Male   Citizen   D         230   139   7.38
13 Male   Immigrant A         103    68   3.42
14 Male   Immigrant B         117    63   3.6
15 Male   Immigrant C          93    53   2.92
16 Male   Immigrant D         102    52   3.08
``````

Maybe

``step_1 <- my_data[, .(n = .N), bys = c("gender", "status", "country", "disease")][, rel.freq :=  n / nrow(my_data)]``

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.