Making tidy names

Is there a best practice for automatically generating tidy names for a dataset? My learning management system's grade export creates the most horrific variable names (e.g. `Homework 1 (167963)`), and I would like to automatically rename them to something tidier (e.g. `Homework1(167963)` at the very least). I know about the base R function `make.names()`, but I wonder if there is something tidier.

janitor::clean_names() is super useful.
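
A minimal sketch of what that looks like on the column name from the first post, assuming the janitor and tibble packages are installed (the `grades` tibble and its value are made up for illustration):

```r
library(tibble)
library(janitor)

# Hypothetical one-column export using the name from the first post
grades <- tibble(`Homework 1 (167963)` = "abc")

# clean_names() converts the column names to snake_case by default
cleaned <- clean_names(grades)
names(cleaned)
#> [1] "homework_1_167963"
```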

The janitor package has a function called clean_names(), which is great, and comes with several options for controlling the output.
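
One of those options is the `case` argument. A rough sketch, assuming a reasonably recent janitor release (the `grades` tibble is a made-up example, and "screaming_snake" is just one of several case names you can pass):

```r
library(tibble)
library(janitor)

# Hypothetical one-column export using the name from the first post
grades <- tibble(`Homework 1 (167963)` = "abc")

# `case` selects the naming style; "screaming_snake" asks for
# upper-case words joined by underscores instead of the default snake_case
shouty <- clean_names(grades, case = "screaming_snake")
names(shouty)
```

See `?janitor::clean_names` for the full list of accepted `case` values.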

Ha, @mara you beat me to it!

Yes! Thanks @jake and @mara. Don't know why my googling wasn't turning up janitor.

I think janitor::clean_names() is a good option for you right now.

tibble / the tidyverse itself will offer more support for name repair in the medium term. Some basic repair will be request-able and there will be a way to say "repair names with janitor::make_clean_names()" (or what have you). So stay tuned.
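
For anyone reading this later: tibble releases from 2.0.0 onward accept a `.name_repair` argument, and it can take a function. A minimal sketch under that assumption, with janitor installed (the `grades` tibble is a made-up example):

```r
library(tibble)

# .name_repair accepts a function that maps the original names
# to repaired ones; here janitor::make_clean_names does the work
grades <- tibble(
  `Homework 1 (167963)` = "abc",
  .name_repair = janitor::make_clean_names
)
names(grades)
#> [1] "homework_1_167963"
```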

In addition to janitor::clean_names() and janitor::make_clean_names(), there is also the underlying snakecase package, which is definitely worth a look.

If you want the specific formatting suggested in the first post, you could use, for example:

library(tibble)
library(magrittr)
library(snakecase)

df <- tibble(`Homework 1 (167963)` = "abc")
names(df) %>% to_parsed_case(sep_in = NULL, numerals = "tight")
#> [1] "Homework1(167963)"

Created on 2018-10-30 by the reprex package (v0.2.0).

I routinely feed everything through janitor::clean_names(); it is so nice to use. Also check out the very useful janitor::remove_empty().
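
A small sketch of remove_empty(), assuming janitor is installed (the `raw` data frame is made up for illustration):

```r
library(janitor)

# Hypothetical data with one all-NA row and one all-NA column
raw <- data.frame(a = c(1, NA), b = c(NA, NA))

# Drop rows and columns that contain only NA values
trimmed <- remove_empty(raw, which = c("rows", "cols"))
dim(trimmed)
#> [1] 1 1
```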
