Correcting variable names using stringr

rcoffey1015 · October 28, 2019, 9:30pm

I am attempting to use stringr to correct a dataset that has multiple names for the same variable because of differences in capitalization (e.g., pepco, Pepco, PEPCO).

So far, I have determined the different names the variables are listed under using the following code:

library(readr)
library(dplyr)

desktop.path <- file.path("/Users/ryancoffey/Desktop/ElectricityDemand.txt")

ElectricityDemand <- read_tsv(desktop.path)

print(ElectricityDemand)

# List every distinct spelling of Subregion
ElectricityDemand %>%
  distinct(Subregion)

I am wondering if anyone can explain how to use stringr commands to combine variables that correspond to each other but have "different" names.

I have included a screenshot of the dataset to help (the data file is a .txt file, so I couldn't upload it with this post).

andresrcs · October 28, 2019, 11:51pm

Hi!

To help us help you, could you please prepare a proper reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners

A minimal reproducible example consists of the following items:

- A minimal dataset, necessary to reproduce the issue
- The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)

You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame…
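As a sketch of what a minimal dataset for this thread might look like: only the three Subregion spellings come from the question. The Demand column and its numbers are hypothetical placeholders. One common way to share such a sample is base R's dput(), which prints R code that recreates the object so it can be pasted into a post.

```r
# Hypothetical sample data: the Subregion spellings are from the question,
# the Demand column and its values are invented for illustration
sample_data <- data.frame(
  Subregion = c("pepco", "Pepco", "PEPCO"),
  Demand = c(100, 200, 300),
  stringsAsFactors = FALSE
)

# Print R code that rebuilds sample_data, ready to paste into a post
dput(sample_data)
```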

mattwarkentin · October 29, 2019, 12:30am

Hi @rcoffey1015,

It seems like you have values with similar names, not variables. One way to make sure the values of the Subregion variable are written consistently would be to use dplyr::mutate() and stringr:

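library(dplyr)
library(stringr)

# Add a column with every Subregion value converted to title case,
# so "pepco", "Pepco" and "PEPCO" all become "Pepco"
ElectricityDemand %>%
  mutate(subregion2 = str_to_title(Subregion))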

You could replace str_to_title() with str_to_lower(), str_to_upper(), or str_to_sentence(), depending on your preference.
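To show how the cleaned column lets the rows actually be combined, here is a minimal sketch on a small hypothetical stand-in for ElectricityDemand. The "bge" row and the Demand column are invented for illustration; only the pepco spellings come from the question. After normalizing the case, the rows can be grouped by the cleaned name and summarised.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for ElectricityDemand: "bge" and Demand are invented
ElectricityDemand <- data.frame(
  Subregion = c("pepco", "Pepco", "PEPCO", "bge"),
  Demand = c(100, 200, 300, 50),
  stringsAsFactors = FALSE
)

# Normalize capitalization so all pepco spellings share one value
cleaned <- ElectricityDemand %>%
  mutate(subregion2 = str_to_title(Subregion))

# Check that the variants have collapsed to a single spelling each
distinct(cleaned, subregion2)

# Combine rows per cleaned subregion, e.g. total demand
totals <- cleaned %>%
  group_by(subregion2) %>%
  summarise(total_demand = sum(Demand))

print(totals)
```

Title case turns the acronym into "Pepco". If you would rather keep it as "PEPCO", str_to_upper() gives the same grouping with upper-case labels.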

rcoffey1015 · October 29, 2019, 3:45pm

That worked. Thank you so much!

system · November 5, 2019, 3:45pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.