Here's a tidyverse solution, @ysc!
library(tidyverse) # loading tidyverse
# create the sample data frame
df <- data.frame(
  var1 = c("1", "2,3", "4", "5"),
  var2 = c("3", "4", "5,6", "7,8"),
  var3 = c("2", "5", "6", "9,10")
)
# create a new data frame with the commas and first numbers removed
df1 <- df %>%
  mutate(across(everything(), function(x) {
    # using `str_remove` to get rid of a number followed by a comma
    stringr::str_remove(x, "[0-9]*,")
  })) %>%
  # now that we've removed the numbers and commas, convert all columns to numeric
  mutate(across(everything(), as.numeric))
# Let's take a look at the new data frame:
df1
#>   var1 var2 var3
#> 1    1    3    2
#> 2    3    4    5
#> 3    4    6    6
#> 4    5    8   10
# awesome, looks like we got rid of those numbers and commas!
# Check the structure of the data frame to make sure all the cols are numeric:
str(df1)
#> 'data.frame': 4 obs. of 3 variables:
#>  $ var1: num 1 3 4 5
#>  $ var2: num 3 4 6 8
#>  $ var3: num 2 5 6 10
# looks good!
This solution relies on the regular expression in the line `stringr::str_remove(x, "[0-9]*,")`. Here's what it's saying:
`stringr::str_remove`: I'm using the `str_remove` function in the `stringr` package to remove a specific pattern from all strings where it occurs. It only removes the first match in each string. For any strings where the pattern doesn't occur, nothing will happen: the pattern isn't found, so nothing is removed.
`"[0-9]*,"`: this means "any digits any number of times, followed by a comma". `[0-9]` looks for any digit; `*` means "any number of times" (zero or more), and `,` is a literal comma.
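To see the pattern on its own, here is a small sketch that runs `str_remove` on a plain character vector. The vector `x` is made up for illustration; its last element has two commas, which the toy data above never does, to show that only the first match is removed.

```r
library(stringr)

# Illustrative values only; "1,2,3" is not in the original toy data
x <- c("1", "2,3", "9,10", "1,2,3")

# Strings without a comma come back unchanged; otherwise the first
# "digits followed by a comma" match is dropped
removed <- str_remove(x, "[0-9]*,")
removed
# returns "1", "3", "10", "2,3"
```

Note that the last element still contains a comma, so `as.numeric()` would turn it into `NA` (with a warning).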
Note: if your real data has a different format than the toy example (i.e. if the thing you're trying to remove isn't always digits followed by a comma) then you may need to re-write the regular expression to fit your use case. If you need help with that, let me know! Here is a regex cheatsheet (second page).
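As one example of adapting the pattern, here is a sketch that assumes you always want the value after the last comma, even when a cell has several commas or a space after the comma. The vector `y` and the pattern `".*,\\s*"` are my own illustration, not something taken from your data.

```r
library(stringr)

# Hypothetical messier values: several commas, and a space after a comma
y <- c("7", "1,2,3", "2, 3")

# ".*," is greedy, so it matches everything up to and including the LAST comma;
# "\\s*" then also drops any whitespace that follows it
last_value <- str_remove(y, ".*,\\s*")
as.numeric(last_value)
# returns 7, 3, 3
```

If you would rather keep the first value instead, the pattern needs to be anchored differently, so it is worth checking the result against a few real rows.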
The `mutate(across(everything()))` syntax is also a little confusing. `across()` is a relatively new function in `dplyr`, and it can be hard to get used to. But basically this code is just saying "for each column, apply this function", and then you define the function, which in our case is the `str_remove` section.
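If it helps to see what `across()` is saving you from, here is a minimal sketch that writes the same step out one column at a time and then again with `across()`. The data frame `toy` and its column names are made up for the comparison.

```r
library(dplyr)
library(stringr)

# Small made-up data frame, just for the comparison
toy <- data.frame(a = c("1", "2,3"), b = c("4,5", "6"))

# One column at a time, spelled out by hand
by_hand <- toy %>%
  mutate(
    a = str_remove(a, "[0-9]*,"),
    b = str_remove(b, "[0-9]*,")
  )

# The same step with across(): apply the function to every column
with_across <- toy %>%
  mutate(across(everything(), function(x) str_remove(x, "[0-9]*,")))

# Compare the two results
all.equal(by_hand, with_across)
```

With only two columns the by-hand version is manageable, but `across()` keeps the code the same length no matter how many columns you have.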
I hope that's helpful! Let me know if you need any more help here.