Creating summary with recoding variables

shoaibali · July 24, 2020, 2:11pm

Hi all, I have a data frame.

The first four columns are categories; the last four columns are calculated variables.
The calculated variables can take the values 1, 2, 3, 4 or 1-9.

I am trying to create a dynamic function like
function(data, calculation_var, grouping_var)

I also want dynamic recoding to create new groups, for example:
db$new_var <- dplyr::case_when(db$region %in% c(1, 2) ~ "A", db$region %in% c(4, 5) ~ "B", db$region %in% c(6, 7, 8) ~ "C")

The new variable can be A, B, C, ..., N,
but I am stuck on where and how to start.
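
One way to make the recoding step dynamic is to keep the group definitions in a named list and look each value up. This is only a minimal base R sketch: the helper name `recode_groups()` and the sample `region` vector are hypothetical, and codes that appear in no group are left as NA.

```r
# Hypothetical helper: map numeric codes to group labels via a named list.
recode_groups <- function(x, groups) {
  out <- rep(NA_character_, length(x))
  for (label in names(groups)) {
    # Assign the label to every element whose code belongs to this group
    out[x %in% groups[[label]]] <- label
  }
  out
}

# Group definitions taken from the post; the region values are made up
groups <- list(A = c(1, 2), B = c(4, 5), C = c(6, 7, 8))
region <- c(1, 4, 8, 3, 2)
recode_groups(region, groups)
```

Adding or removing a group then only means editing the `groups` list, not the function.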

The required output for Col2 should look like this:

      A     B     C
1     12%   41%   23%
2     7%    10%   6%
3     34%   16%   9%
4     47%   33%   62%
N     53    56    119

The % values are the percentage of occurrences of each value within each category; N is the total number of responses.
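
To make the arithmetic concrete: each percentage is a count divided by its column total, and N is that column total. A small base R sketch with made-up data (the data frame `df` and its column names are assumptions, not the real data):

```r
# Made-up data: 'grp' is the recoded group, 'col2' the calculated variable
df <- data.frame(
  grp  = c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C"),
  col2 = c(1, 1, 2, 4, 3, 4, 1, 2, 2, 2)
)

counts <- table(df$col2, df$grp)                     # rows: col2 values, columns: groups
pct    <- round(100 * prop.table(counts, margin = 2))  # percentage within each column
n      <- colSums(counts)                            # total responses per group

# Turn the percentages into text and append the N row
pct_chr <- matrix(paste0(pct, "%"), nrow = nrow(pct), dimnames = dimnames(pct))
out <- rbind(pct_chr, N = n)
out
```

Here `margin = 2` is what makes the percentages add up within each group column rather than across the whole table.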

Note: Please provide the simplest possible solution, as I am new to R, so that I can modify it or add a theme going forward.
Please let me know if any more explanation is required.

nirgrahamuk · July 24, 2020, 2:13pm

Are you familiar at all with tidyverse / dplyr?
You might be in danger of reinventing the wheel here, since the point of these hugely popular packages is to make summarisations easy with simple syntax.

shoaibali · July 24, 2020, 2:20pm

Yes, I am pretty familiar with both, but I need an approach to get started; I can do the rest of the modifications myself.

nirgrahamuk · July 24, 2020, 3:30pm

library(tidyverse)
set.seed(42)
(input_df <- tibble(
  id = 1:20,
  region = sample.int(5, 20, replace = TRUE),
  gender = sample.int(2, 20, replace = TRUE),
  sector = sample.int(3, 20, replace = TRUE),
  col1 = sample.int(6, 20, replace = TRUE),
  col2 = sample.int(7, 20, replace = TRUE),
  col3 = sample.int(8, 20, replace = TRUE),
  col4 = sample.int(16, 20, replace = TRUE)
) %>% mutate(across(starts_with("col"), ~ ifelse(. > 4, NA, .))) %>%
  mutate(across(starts_with("col"), forcats::as_factor)))


(recoded_df <- mutate(input_df,
  newvar = case_when(
    between(region, 1, 2) ~ "A",
    between(region, 4, 5) ~ "B",
    between(region, 6, 7) ~ "not seen",
    TRUE ~ "region3"
  )
))

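# Count each col1 value within each recoded group
(long_counts <- recoded_df %>% group_by(col1, newvar) %>%
  summarise(n = n()))
# Total responses per group (the denominator for the percentages)
(total_col_counts <- group_by(long_counts, newvar) %>% summarise(sum_n = sum(n)))
# Attach the group totals and compute the column percentages
(long_counts_x <- left_join(
  long_counts,
  total_col_counts,
  by = "newvar"
) %>% mutate(col_pcnt = paste0(round(100 * n / sum_n, digits = 2), "%")))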



(tidied_df <- pivot_wider(long_counts_x, id_cols = col1, names_from = newvar, values_from = col_pcnt))

(summary_row <- pivot_wider(total_col_counts, names_from = newvar, values_from = sum_n, values_fn = as.character))

(collated_df <- bind_rows(tidied_df, cbind(col1 = "Totals:", summary_row)))
 
# Replace NA cells with empty strings for display
(cleaned_df <- mutate(collated_df,
                      across(everything(), ~ if_else(is.na(.x), "", .x))))

shoaibali · July 25, 2020, 9:40am

I am getting this error:
Error in across(starts_with("col"), ~ifelse(. > 4, NA, .)) :
could not find function "across"

I also tried installing dplyr and tidyverse via devtools, but I am still getting the error.

nirgrahamuk · July 25, 2020, 9:59am

You could use mutate_at() instead of mutate() with across().
To use across() you need dplyr 1.0.0 or later, for example the dev version of dplyr from GitHub.
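
With mutate_at(), the selection helper has to be wrapped in vars(). A minimal sketch of that form, assuming dplyr is installed; the small tibble is made up, and base as.factor() stands in for forcats::as_factor() only to keep dependencies minimal:

```r
library(dplyr)

# Made-up data with two "col" variables
df <- tibble(
  id   = 1:4,
  col1 = c(1, 5, 3, 9),
  col2 = c(2, 4, 7, 1)
)

# Values above 4 become NA, then each "col" column is converted to a factor.
# vars() is what lets starts_with() be used inside mutate_at().
df_old_style <- df %>%
  mutate_at(vars(starts_with("col")), ~ ifelse(. > 4, NA, .)) %>%
  mutate_at(vars(starts_with("col")), as.factor)

df_old_style
```

The `id` column is untouched because it does not match `starts_with("col")`.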

shoaibali · July 25, 2020, 10:12am

Thanks for your consistent replies.

I am still getting an error: Error: starts_with() must be used within a selecting function.

For a single variable I have created the code below. Is there a solution where I can update something in my current function?

  # Drop rows where the variable is missing, then tabulate it
  data <- data[!is.na(data[[var]]), ]
  T1 <- as.data.frame(table(data[[var]]))
  all <- sum(T1[, 2]) # total number of responses
  T1 <- T1 %>% mutate(
    !!Name_of_variable := as.character(Var1),
    "Percent" = format(round(Freq * 100 / all, 1), nsmall = 1),
    "N" = as.numeric(Freq)
  ) %>%
    select(!!Name_of_variable, "Percent", "N")
  names(T1)[2] <- "  " # update the name of Header in double quotes
  # mask_m() is defined elsewhere (not shown); "--" results are kept as-is
  T1[, 2] <- sapply(T1[, 2], function(x) ifelse(mask_m(x, all) == "--", "--", paste0(mask_m(x, all), "%")))

  T1 <- T1 %>% select(-N)

  # Put the total as the first row
  T1 <- rbind(c("N", all), T1)
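
One possible way to extend the single-variable code to a grouping variable, sketched in base R. The function name `summary_by_group()`, the choice to pass column names as strings, and the demo data are all assumptions; the masking step with mask_m() is left out because that helper is not shown.

```r
# Hypothetical helper: percentage table of calc_var within each level of group_var
summary_by_group <- function(data, calc_var, group_var) {
  # Keep only rows where both columns are present
  keep <- !is.na(data[[calc_var]]) & !is.na(data[[group_var]])
  counts <- table(data[[calc_var]][keep], data[[group_var]][keep])

  # Column percentages, rounded to one decimal, as text
  pct <- round(100 * prop.table(counts, margin = 2), 1)
  pct_chr <- matrix(paste0(pct, "%"), nrow = nrow(pct), dimnames = dimnames(pct))

  # N (responses per group) goes first, as in the single-variable version
  out <- rbind(N = colSums(counts), pct_chr)
  res <- data.frame(value = rownames(out), out,
                    row.names = NULL, check.names = FALSE,
                    stringsAsFactors = FALSE)
  names(res)[1] <- calc_var
  res
}

# Made-up demo data
demo <- data.frame(
  region_grp = c("A", "A", "B", "B", "B", NA),
  col2       = c(1, 2, 2, 2, NA, 1)
)
res <- summary_by_group(demo, "col2", "region_grp")
res
```

Because the columns are looked up with `data[[...]]`, the same function can be called in a loop over several calculated variables.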
