> mydata
> ID Country col1 col2 col3
> 1 1 Asia
> 2 2 Africa - Benin (Cotonou)
> 3 3 Europe - France (Paris)
> 4 4 Asia - China(Shanghai)
> 5 5 Europe - United Kingdom (London)
> 6 6 Europe - France (Orléans)
> 7 7 Afrique - Togo (Lomé)
> 8 8 Afrique - Sénégal (Dakar)
I tried the following but im having issue with the regular expression
> mydata[c("Col1", "col2", "col3")] <- str_split_fixed(mydata$Country, '-d' , 3)
I want the result to be like the following:
> mydata
> ID Country Col1 Col2 Col3
> 1 1 Asia Asia <NA> <NA>
> 2 2 Africa - Benin (Cotonou) Africa Benin Cotonou
> 3 3 Europe - France (Paris) Europe France Paris
> 4 4 Asia - China(Shanghai) Asia China Shanghai
> 5 5 Europe - United Kingdom (London) Europe United Kingdom London
> 6 6 Europe - France (Orléans) Europe France Orléans
> 7 7 Afrique - Togo (Lomé) Afrique Togo Lomé
> 8 8 Afrique - Sénégal (Dakar) Afrique Sénégal Dakar
Any suggestion on how to do this?
Thanks in advance
I would do this with the separate() function from the tidyr package. It throws a warning because of the trailing parenthesis in rows 2 and 3 but it gets the right result.
Country<-c("Asia","Africa - Benin (Cotonou)",
"Europe - France (Paris)","Asia - China(Shanghai")
ID<-c(1,2,3,4)
mydata<-data.frame(ID,Country)
library(tidyr)
mydata <- mydata %>% separate(col = "Country", into = c("Col1", "Col2", "Col3"), remove = FALSE, fill = "right")
#> Warning: Expected 3 pieces. Additional pieces discarded in 2 rows [2, 3].
mydata
#> ID Country Col1 Col2 Col3
#> 1 1 Asia Asia <NA> <NA>
#> 2 2 Africa - Benin (Cotonou) Africa Benin Cotonou
#> 3 3 Europe - France (Paris) Europe France Paris
#> 4 4 Asia - China(Shanghai Asia China Shanghai
@FJCC
It does not work properly in my original large data.
It looks like the following. Some pieces are missing. For example in column col3 row 5,6,7 and 8 som part are missing
ID Country Col1 Col2 Col3
1 Asia Asia <NA> <NA>
2 Africa - Benin (Cotonou) Africa Benin Cotonou
3 Europe - France (Paris) Europe France Paris
4 Asia - China(Shanghai) Asia China Shanghai
5 Europe - United Kingdom (London) Europe United Kingdom
6 Europe - France (Orléans) Europe France Orl
7 Afrique - Togo (Lomé) Afrique Togo L
8 Afrique - Sénégal (Dakar) Afrique S n
Country<-c("Asia","Africa - Benin (Cotonou)",
"Europe - France (Paris)","Asia - China(Shanghai)", "Europe - United Kingdom (London)")
ID<-c(1,2,3,4,5)
mydata<-data.frame(ID,Country)
library(tidyr)
library(stringr)
library(dplyr)
mydata <- mydata %>% mutate(col1 = str_extract(Country, "^[:alpha:]+"),
col2 = str_extract(Country, "(?<=- ).+(?=\\()"),
col3 = str_extract(Country, "(?<=\\().+(?=\\))")) %>%
mutate(across(col1:col3, str_trim))
mydata
#> ID Country col1 col2 col3
#> 1 1 Asia Asia <NA> <NA>
#> 2 2 Africa - Benin (Cotonou) Africa Benin Cotonou
#> 3 3 Europe - France (Paris) Europe France Paris
#> 4 4 Asia - China(Shanghai) Asia China Shanghai
#> 5 5 Europe - United Kingdom (London) Europe United Kingdom London