Is there a best practice for automatically generating tidy names for a dataset? My learning management system grade export creates the most horrific variable names (e.g. `Homework 1 (167963)`) and I would like to automatically rename them to something tidier (e.g. `Homework1(167963)` at the very least). I know about the base R function `make.names()`, but I wonder if there is something tidier.
`janitor::clean_names()` is super useful:
The janitor package has a function called `clean_names()`, which is great, and comes with several options for controlling the output.
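As a rough sketch of what that looks like in practice, the snippet below compares base R's `make.names()` with `clean_names()`. The `grades` data frame and its `Student Name` column are made up for illustration; only the `Homework 1 (167963)` name comes from the question. The `case` argument is shown as one example of the options mentioned above.

```r
library(janitor)

# Hypothetical stand-in for an LMS grade export (illustration only)
grades <- data.frame(
  `Student Name` = c("A", "B"),
  `Homework 1 (167963)` = c(10, 8),
  check.names = FALSE  # keep the messy names exactly as exported
)

# Base R repair, for comparison:
# "Homework 1 (167963)" becomes "Homework.1..167963."
base_names <- make.names(names(grades))

# Default janitor cleaning produces snake_case names
cleaned <- clean_names(grades)
names(cleaned)

# `case` is one of the options that controls the output style
names(clean_names(grades, case = "upper_camel"))
```

The default snake_case output is usually the easiest to type, so the `case` argument is only needed if a different convention is already in use elsewhere in the project.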
I think `janitor::clean_names()` is a good option for you right now.
tibble / the tidyverse itself will offer more support for name repair in the medium term. Some basic repair will be requestable and there will be a way to say "repair names with `janitor::make_clean_names()`" (or what have you). So stay tuned.
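For readers whose installed tibble already has a `.name_repair` argument (an assumption about your version; I believe it needs tibble 2.0.0 or later), the idea described above might look roughly like this. Treat it as a sketch, not as the final interface the post is promising.

```r
library(tibble)

# Assumption: this tibble version supports `.name_repair`.
# Passing a function asks tibble to repair names with it.
df <- tibble(
  `Homework 1 (167963)` = "abc",
  .name_repair = janitor::make_clean_names
)
names(df)

# A built-in basic repair can be requested by name instead
df_universal <- tibble(
  `Homework 1 (167963)` = "abc",
  .name_repair = "universal"
)
names(df_universal)
```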
In addition to `janitor::clean_names()` or `janitor::make_clean_names()`, there is also the underlying snakecase package, which is definitely worth a look.
If you want the specific formatting suggested in the first post, you could go for, e.g.:
library(tibble)
library(magrittr)
library(snakecase)
df <- tibble(`Homework 1 (167963)` = "abc")
names(df) %>% to_parsed_case(sep_in = NULL, numerals = "tight")
#> [1] "Homework1(167963)"
Created on 2018-10-30 by the reprex package (v0.2.0).
I just routinely feed everything through `janitor::clean_names()`; it is so nice to use. Also check out the very useful `janitor::remove_empty()`.
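A minimal sketch of that routine, assuming a made-up export with one all-`NA` column and one all-`NA` row (the `Notes` column and the values are hypothetical):

```r
library(janitor)

# Hypothetical export: `Notes` is entirely empty, and so is the second row
grades <- data.frame(
  `Homework 1 (167963)` = c(10, NA, 8),
  `Notes` = c(NA, NA, NA),
  check.names = FALSE
)

# Clean the names first, then drop rows and columns that are entirely NA
tidy <- remove_empty(clean_names(grades), which = c("rows", "cols"))
dim(tidy)
```

Only rows and columns that are empty throughout are dropped, so a student with a single missing score is kept.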