Is any str_replace_all for an entire dataframe?
Nowadays, I use
df <- data.frame(lapply(df, function(x) {
gsub("valueToChange", "newVar", x)
}))
Is any str_replace_all for an entire dataframe?
Nowadays, I use
df <- data.frame(lapply(df, function(x) {
gsub("valueToChange", "newVar", x)
}))
I'm assuming you want to find and replace strings. I've been digging into dplyr lately, so this is how I'd do it.
library(tidyverse)
df <- mutate_if(df,
is.character,
str_replace_all,
pattern = "valueToChange",
replacement = "newVar")
I think you need to add an as_tibble in there:
suppressPackageStartupMessages(library(tidyverse))
df <- data.frame(col1 = c("a", "b", "c"), col2 = c("d", "a", "e"), stringsAsFactors = F)
df <- dplyr::mutate_if(
tibble::as_tibble(df),
is.character,
stringr::str_replace_all, pattern = "a",
replacement = "x")
df
#> # A tibble: 3 x 2
#> col1 col2
#> <chr> <chr>
#> 1 x d
#> 2 b x
#> 3 c e
Right you are. I'm actually surprised by this. I thought it would work with a data.frame (or else coerce the data.frame to tibble) but here's what I found:
library(tidyverse)
df <- tibble(c1 = letters[1:5], c2 = 5:9, c3 = LETTERS[5:9])
df %>% mutate_if(is.character, str_replace_all, pattern = '[aeiouAEIOU]', replacement = '%%%%')
#> # A tibble: 5 x 3
#> c1 c2 c3
#> <chr> <int> <chr>
#> 1 %%%% 5 %%%%
#> 2 b 6 F
#> 3 c 7 G
#> 4 d 8 H
#> 5 %%%% 9 %%%%
df2 <- data.frame(c1 = letters[1:5], c2 = 5:9, c3 = LETTERS[5:9])
df2 %>% mutate_if(is.character, str_replace_all, pattern = '[aeiouAEIOU]', replacement = '%%%%')
#> c1 c2 c3
#> 1 a 5 E
#> 2 b 6 F
#> 3 c 7 G
#> 4 d 8 H
#> 5 e 9 I
You’re seeing that because data.frame automatically converts characters to factors.
For the most part the tidyverse works with tibble's not data.frames. However a tibble can be substituted for a data.frame because it inherits data.frame.
suppressPackageStartupMessages(library(tidyverse))
df <- data.frame(col1 = c("a", "b", "c"), col2 = c("d", "a", "e"))
class(df)
#> [1] "data.frame"
typeof(df)
#> [1] "list"
df <- dplyr::mutate_if(tibble::as_tibble(df),
is.character,
stringr::str_replace_all, pattern = "a", replacement = "x")
# order of inheritence is most derived to the left, base class
# all the way to the right
class(df)
#> [1] "tbl_df" "tbl" "data.frame"
typeof(df)
#> [1] "list"
I've grown so accustomed to using tibbles, I totally forgot about stringsAsFactors = false. Thanks @hadley for making that possible. Also, thanks @danr for your feedback.
For those watching at home:
library(tidyverse)
df3 <- data.frame(c1 = letters[1:5], c2 = 5:9, c3 = LETTERS[5:9], stringsAsFactors = FALSE)
df3 %>% mutate_if(is.character, str_replace_all, pattern = '[aeiouAEIOU]', replacement = '%%%%')
#> c1 c2 c3
#> 1 %%%% 5 %%%%
#> 2 b 6 F
#> 3 c 7 G
#> 4 d 8 H
#> 5 %%%% 9 %%%%
Missed that, works for data.frame too if stringsAsFactors = FALSE
suppressPackageStartupMessages(library(tidyverse))
df <- data.frame(col1 = c("a", "b", "c"), col2 = c("d", "a", "e"),
stringsAsFactors = F)
df <- dplyr::mutate_if(df,
is.character,
stringr::str_replace_all, pattern = "a", replacement = "x")
df
#> col1 col2
#> 1 x d
#> 2 b x
#> 3 c e