I am trying to split the second column in two. I can do this when there is a separating character, with separate(df, into = c("df1", "df2"), sep = " "), but in this case, with the () characters, I always get some kind of error.
Any idea how I can create the two columns, City and Country?
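Here are four examples of using separate to split the second column. With the default sep value, which splits on any run of non-alphanumeric characters, the trailing ) produces an empty third piece and a warning; you can suppress that warning with extra = "drop". Alternatively, you can set sep = "\\(" and then remove the trailing ).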
library(tidyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
cities <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Month = c("September", "October", "November", "December"),
  `City.(Country)` = c("Paris(France)",
                       "Madrid(Spain)","London(UK)","Berlin(Germany)")
)
separate(cities, `City.(Country)`, into = c("City", "Country")) #warning due to trailing )
#> Warning: Expected 2 pieces. Additional pieces discarded in 4 rows [1, 2, 3, 4].
#> Month City Country
#> 1 September Paris France
#> 2 October Madrid Spain
#> 3 November London UK
#> 4 December Berlin Germany
separate(cities, `City.(Country)`, into = c("City", "Country"), extra = "drop")
#> Month City Country
#> 1 September Paris France
#> 2 October Madrid Spain
#> 3 November London UK
#> 4 December Berlin Germany
separate(cities, `City.(Country)`, into = c("City", "Country"), sep = "\\(") #leaves trailing )
#> Month City Country
#> 1 September Paris France)
#> 2 October Madrid Spain)
#> 3 November London UK)
#> 4 December Berlin Germany)
separate(cities, `City.(Country)`, into = c("City", "Country"), sep = "\\(") |>
  mutate(Country = sub("\\)", "", Country))
#> Month City Country
#> 1 September Paris France
#> 2 October Madrid Spain
#> 3 November London UK
#> 4 December Berlin Germany
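As a possible fifth option, not part of the examples above, tidyr's extract() can pull both pieces out with a single regular expression, so there is no leftover ) to clean up. This is a minimal sketch that assumes every value has exactly the form City(Country); the regex and the result variable name are my own choices for illustration.

```r
library(tidyr)

# Same data as in the examples above
cities <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Month = c("September", "October", "November", "December"),
  `City.(Country)` = c("Paris(France)", "Madrid(Spain)",
                       "London(UK)", "Berlin(Germany)")
)

# Assumed pattern: text before "(" is the city,
# text between "(" and the final ")" is the country.
result <- tidyr::extract(
  cities,
  `City.(Country)`,
  into = c("City", "Country"),
  regex = "^(.*)\\((.*)\\)$"
)
result
```

Rows that do not match the pattern would come back as NA in both new columns, which makes malformed values easier to spot than with a split on a separator.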