My usual project workflow is to read in a bunch of files as a list using map and readr, then do as much cleaning as possible before saving a single minimal table for analysis. I am slowly getting the hang of purrr functions for helping with this approach, but could do with specific advice about:
- How to iterate through tibbles in a list using mutate, filter, select and other dplyr functions (see example below)?
- Any other hints/tips/suggestions to improve this workflow using tidy principles?
library(tidyverse)
set.seed(4321)
#Make up some data in four separate tibbles
table1 <- tibble(
  id = 1:10,
  age = floor(runif(min=18, max=100, n=10)),
  sex = sample(c("Male", "Female"), 10, replace = TRUE),
  nonsense = sample(letters, 10)
)
table2 <- tibble(
  id = 1:4,
  weight = c("50 kg", "45", "65kg", "67"),
  height = c("141", "133cm", NA, "177 cm")
)
table3 <- tibble(
  id = 1:10,
  outcome = sample(c("Alive", "Dead"), 10, replace = TRUE)
)
useless_table <- tibble(
  no_use = sample(LETTERS, 10)
)
#Add the tables to a list
list_tables <- list(table1, table2, table3, useless_table)
names(list_tables) <- c("table1", "table2", "table3", "useless_table")
#Keep only the tables that we are interested in
kept_tables <- list_tables %>%
  keep(names(.) %in% c("table1", "table2", "table3"))
#Iterate through tables, selecting only the variables we wish to keep
keep_vars <- list("id", "age", "sex", "weight", "height", "outcome")
names(keep_vars) <- keep_vars
kept_tables <- map(kept_tables, ~ select(.x, one_of(names(keep_vars))))
#> Warning: Unknown columns: `weight`, `height`, `outcome`
#> Warning: Unknown columns: `age`, `sex`, `outcome`
#> Warning: Unknown columns: `age`, `sex`, `weight`, `height`
#Now how to iterate through tables, mutating to tidy-up some variables?
#For example, parse height and weight as numeric
#Doesn't work...
kept_tables <- kept_tables %>%
  map(.x, ~ mutate_at(.vars = vars(weight, height), .funs = funs(parse_number)))
#> Error in as_mapper(.f, ...): object '.x' not found
#As a final step, would reduce to a single table e.g. for modelling
#This works
out <- kept_tables %>%
  reduce(left_join, by=c("id"))
Created on 2018-12-06 by the reprex package (v0.2.1)
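
For reference, here is a minimal sketch of how the select and mutate steps above might be written so that they run. It assumes dplyr 1.0.0 or later, since across() and any_of() are newer than the package versions used in the reprex, and it uses small hypothetical stand-in tables rather than the full example data. The key point is that the piped list is already the first argument to map(), so the lambda is passed directly and .x is only used inside it.

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical stand-ins for two of the tables above
tables <- list(
  table1 = tibble(id = 1:4, age = c(30, 41, 52, 63), nonsense = letters[1:4]),
  table2 = tibble(
    id = 1:4,
    weight = c("50 kg", "45", "65kg", "67"),
    height = c("141", "133cm", NA, "177 cm")
  )
)

keep_vars <- c("id", "age", "sex", "weight", "height", "outcome")

cleaned <- tables %>%
  # any_of() skips names a table lacks, so no "Unknown columns" warnings
  map(~ select(.x, any_of(keep_vars))) %>%
  # The piped list is already map()'s first argument; the lambda comes next,
  # and .x inside it is the current tibble
  map(~ mutate(.x, across(any_of(c("weight", "height")), parse_number)))

# Join everything into a single table by id
out <- reduce(cleaned, left_join, by = "id")
```

Tables without weight or height pass through the mutate step unchanged, because across() over an empty selection does nothing.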