How to remove the third word within a column?

Hi

My data looks like this:

data.frame(
  stringsAsFactors = FALSE,
  species = c("Bubasis agape agape",
              "Bubasis agape","Bubasis agape","Bubasis agape",
              "Bubasis agape ruby","Bubasis agape ruby","Bubasis agape ruby",
              "Bubasis agape")
)

Some of these species names have a subspecies name attached as well. Is it possible to write code that removes the third word from each row of a particular column?

Thank you!!

Hi!

A tidyverse solution would be:

library(tidyverse)

test <- data.frame(
  stringsAsFactors = FALSE,
  species = c("Bubasis agape agape",
              "Bubasis agape","Bubasis agape","Bubasis agape",
              "Bubasis agape ruby","Bubasis agape ruby","Bubasis agape ruby",
              "Bubasis agape")
)

test %>%
  as_tibble() %>%
  mutate(
    species_clean = map_chr(
      str_split(species, pattern = "\\s+"),
      ~ str_flatten(.x[1:2], " ")))
#> # A tibble: 8 × 2
#>   species             species_clean
#>   <chr>               <chr>        
#> 1 Bubasis agape agape Bubasis agape
#> 2 Bubasis agape       Bubasis agape
#> 3 Bubasis agape       Bubasis agape
#> 4 Bubasis agape       Bubasis agape
#> 5 Bubasis agape ruby  Bubasis agape
#> 6 Bubasis agape ruby  Bubasis agape
#> 7 Bubasis agape ruby  Bubasis agape
#> 8 Bubasis agape       Bubasis agape

Created on 2022-03-01 by the reprex package (v2.0.1)
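
For comparison, the same cleanup can probably be done in base R with a single regular expression, without loading the tidyverse. This is only a sketch under the same assumption as the answer above, namely that everything after the second word should be dropped (not just the third word); the column name species_clean is reused from that answer.

```r
# Same example data as in the question
test <- data.frame(
  stringsAsFactors = FALSE,
  species = c("Bubasis agape agape",
              "Bubasis agape", "Bubasis agape", "Bubasis agape",
              "Bubasis agape ruby", "Bubasis agape ruby", "Bubasis agape ruby",
              "Bubasis agape")
)

# Capture the first two whitespace-separated words and discard the rest.
# Strings with fewer than two words do not match the pattern and are
# returned unchanged.
test$species_clean <- sub("^(\\S+\\s+\\S+).*$", "\\1", test$species, perl = TRUE)

unique(test$species_clean)
#> [1] "Bubasis agape"
```

Note that this differs from the str_split() approach for one-word names: here they are left as they are, whereas indexing .x[1:2] on a single word introduces an NA.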

Perfect, thank you. May I ask a follow-up question? If I instead wanted to replace the current column with the new one you created (species_clean), how would I do that?

You could name the new column species instead of species_clean in the mutate() call. This will overwrite the old column.
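
As a rough sketch of that suggestion, reusing the test data frame and the tidyverse code from the earlier answer, it could look like the following. Assigning the result back to test is an assumption about what is wanted; without the assignment the original data frame stays untouched.

```r
library(tidyverse)

test <- data.frame(
  stringsAsFactors = FALSE,
  species = c("Bubasis agape agape",
              "Bubasis agape", "Bubasis agape", "Bubasis agape",
              "Bubasis agape ruby", "Bubasis agape ruby", "Bubasis agape ruby",
              "Bubasis agape")
)

# Using the existing column name on the left-hand side of mutate()
# replaces that column in place instead of adding a new one.
test <- test %>%
  as_tibble() %>%
  mutate(
    species = map_chr(
      str_split(species, pattern = "\\s+"),
      ~ str_flatten(.x[1:2], " ")))
```

After this, test should contain a single species column holding only the first two words of each name.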
