Making tidy names

AmeliaMN · October 29, 2018, 8:36pm

Is there a best practice for automatically generating tidy names for a dataset? My learning management system grade export creates the most horrific variable names (e.g. `Homework 1 (167963)`) and I would like to automatically rename them to something tidier (e.g. `Homework1(167963)` at the very least). I know about the base R function make.names() but I wonder if there is something tidier.
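
For reference, here is a quick sketch of what base R does with that name, using only the example column name above. make.names() swaps each invalid character for a dot, which gives a syntactically valid name that is not very readable:

```r
# The example column name from the grade export
raw_name <- "Homework 1 (167963)"

# make.names() replaces each invalid character (space, parentheses) with "."
base_name <- make.names(raw_name)
print(base_name)
#> [1] "Homework.1..167963."
```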

mara · October 29, 2018, 8:44pm

janitor::clean_names() is super useful.
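
A minimal sketch of the idea, assuming the janitor package is installed. The one-column data frame `grades` is a hypothetical stand-in built from the column name in the question, not real export data:

```r
library(janitor)

# Hypothetical one-column data frame; check.names = FALSE keeps the messy name
grades <- data.frame(`Homework 1 (167963)` = 95, check.names = FALSE)

# clean_names() returns the data frame with snake_case column names by default
cleaned <- clean_names(grades)
names(cleaned)
#> [1] "homework_1_167963"
```

Because clean_names() takes and returns a data frame, it slots directly into a pipeline after the import step.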

jake · October 29, 2018, 8:45pm

The janitor package has a function called clean_names(), which is great, and comes with several options for controlling the output.
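
One of those options is the `case` argument. The sketch below assumes a janitor version that supports `case` and reuses a hypothetical one-column data frame (`grades`) based on the name in the question. The value "screaming_snake" is just one of several cases that can be requested:

```r
library(janitor)

# Hypothetical data frame with the messy name from the question
grades <- data.frame(`Homework 1 (167963)` = 95, check.names = FALSE)

# Ask for upper-case snake_case instead of the default lower-case snake_case
shouty <- clean_names(grades, case = "screaming_snake")
names(shouty)
#> [1] "HOMEWORK_1_167963"
```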

jake · October 29, 2018, 8:46pm

Ha, @mara you beat me to it!

AmeliaMN · October 29, 2018, 9:13pm

Yes! Thanks @jake and @mara. Don't know why my googling wasn't turning up janitor.

jennybryan · October 29, 2018, 9:23pm

I think janitor::clean_names() is a good option for you right now.

tibble / the tidyverse itself will offer more support for name repair in the medium term. Some basic repair will be request-able and there will be a way to say "repair names with janitor::make_clean_names()" (or what have you). So stay tuned.
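
As a rough sketch of what that could look like: the code below assumes a tibble version that has the `.name_repair` argument (not named anywhere in this thread, so treat it as an assumption about your installed version) and a janitor version that exports make_clean_names().

```r
library(tibble)

# Basic, built-in repair: request syntactic names by keyword
df_basic <- tibble(`Homework 1 (167963)` = "abc", .name_repair = "universal")
names(df_basic)

# Custom repair: pass a function that maps old names to new names
df <- tibble(
  `Homework 1 (167963)` = "abc",
  .name_repair = janitor::make_clean_names
)
names(df)
#> [1] "homework_1_167963"
```

Any function that takes a character vector of names and returns a vector of the same length should work in place of make_clean_names() here.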

Tazinho · October 29, 2018, 11:07pm

In addition to janitor::clean_names() and janitor::make_clean_names(), there is the underlying snakecase package, which is definitely worth a look.

If you want the specific formatting suggested in the first post, you could use, for example:

library(tibble)
library(magrittr)
library(snakecase)

df <- tibble(`Homework 1 (167963)` = "abc")
names(df) %>% to_parsed_case(sep_in = NULL, numerals = "tight")
#> [1] "Homework1(167963)"

Created on 2018-10-30 by the reprex package (v0.2.0).

benmoretti · October 29, 2018, 11:34pm

I routinely feed everything through janitor::clean_names(); it is so nice to use. Also check out the very useful janitor::remove_empty().
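
A small sketch of remove_empty(), assuming janitor is installed. The `grades` data frame and its values are made up for illustration:

```r
library(janitor)

# Hypothetical export with one all-NA row and one all-NA column
grades <- data.frame(
  homework_1 = c(90, NA, 85),
  notes      = c(NA, NA, NA)
)

# Drop rows and columns that contain nothing but NA
trimmed <- remove_empty(grades, which = c("rows", "cols"))
dim(trimmed)
#> [1] 2 1
```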