Separate column with () characters

jynusmac · October 31, 2021, 6:24pm

Hi all.
In this dataframe

cities <- data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
             Month = c("September", "October", "November", "December"),
  `City.(Country)` = c("Paris(France)",
                       "Madrid(Spain)","London(UK)","Berlin(Germany)")
)

I am trying to split the second column in two. I can do this when there is a separating character, using separate(df, into = c("df1", "df2"), sep = " "), but in this case, with the () characters, I always get some kind of error.

Any idea how I can create the two columns, City and Country?

Regards.

FJCC · October 31, 2021, 6:57pm

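Here are four examples of using separate() to split the second column. The default sep value splits on any run of non-alphanumeric characters, so the trailing ) produces an extra empty piece and a warning; you can suppress that warning with the extra argument. Alternatively, you can set sep = "\\(" and then remove the trailing ).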

library(tidyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
cities <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Month = c("September", "October", "November", "December"),
  `City.(Country)` = c("Paris(France)",
                       "Madrid(Spain)","London(UK)","Berlin(Germany)")
)
separate(cities, `City.(Country)`, into = c("City", "Country")) #warning due to trailing )
#> Warning: Expected 2 pieces. Additional pieces discarded in 4 rows [1, 2, 3, 4].
#>       Month   City Country
#> 1 September  Paris  France
#> 2   October Madrid   Spain
#> 3  November London      UK
#> 4  December Berlin Germany
separate(cities, `City.(Country)`, into = c("City", "Country"), extra = "drop")
#>       Month   City Country
#> 1 September  Paris  France
#> 2   October Madrid   Spain
#> 3  November London      UK
#> 4  December Berlin Germany

separate(cities, `City.(Country)`, into = c("City", "Country"), sep = "\\(")  #leaves trailing )
#>       Month   City  Country
#> 1 September  Paris  France)
#> 2   October Madrid   Spain)
#> 3  November London      UK)
#> 4  December Berlin Germany)
separate(cities, `City.(Country)`, into = c("City", "Country"), sep = "\\(") |> 
  mutate(Country = sub("\\)", "", Country))
#>       Month   City Country
#> 1 September  Paris  France
#> 2   October Madrid   Spain
#> 3  November London      UK
#> 4  December Berlin Germany

Created on 2021-10-31 by the reprex package (v2.0.1)
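
One more option, not part of the reprex above and offered only as a sketch: tidyr::extract() takes a regular expression with capture groups instead of a separator, so both parentheses can be handled in one step. The regex below is my own assumption about the data: it expects exactly the City(Country) shape, and rows that do not match would come back as NA.

```r
library(tidyr)

cities <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Month = c("September", "October", "November", "December"),
  `City.(Country)` = c("Paris(France)",
                       "Madrid(Spain)", "London(UK)", "Berlin(Germany)")
)

# Each capture group in the regex becomes one output column.
# "\\(" and "\\)" match the literal parentheses; they must be escaped
# because ( and ) are regex metacharacters.
result <- tidyr::extract(
  cities,
  `City.(Country)`,
  into = c("City", "Country"),
  regex = "^(.*)\\((.*)\\)$"
)
result
```

Calling it as tidyr::extract() avoids a clash with magrittr::extract() if that package happens to be attached.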

jynusmac · October 31, 2021, 7:17pm

Thanks, FJCC, for all these possibilities; they all work fine.
