Text concatenation

Slavek · October 3, 2018, 11:40am

Dear R coders,
I have a challenge. I am trying to replicate Excel concatenation function in R so merge some text columns for each record
Q1 Q2 Both (that is what I need)
URN1 aaa bbb aaabbb
URN2 ddd ddd
URN3 eee fff eeefff
URN4 ggg ggg

Can you help please?

Slavek · October 3, 2018, 11:46am

Concatenate

Slavek · October 3, 2018, 11:48am

Concatenate

As you can see, blanks should be ignored...

Slavek · October 3, 2018, 11:52am

the data I'm using was created from a database: Norway.thisYear.data and questions which need to be merged for each record (Norway.thisYear.data$URNName) are:
Norway.thisYear.data$BComm_10,
Norway.thisYear.data$BComm_9

tbradley · October 3, 2018, 11:58am

HI @Slavek, Welcome to RStudio Community!

What have you tried so far? It will be much easier to help you if you turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ, linked to below.

If you simply want to create a new column with the combined products you can do this:

library(tibble)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

my_df <- tribble(
  ~name, ~q1, ~q2,
  "URN1", "aaa", "bbb", 
  "URN2", "ddd", NA_character_,
  "URN3", "eee", "fff", 
  "URN4", NA_character_, "ggg"
)

my_df %>% 
  mutate_at(vars(q1, q2), funs(str_replace_na(., replacement = ""))) %>% 
  mutate(both = paste(q1, q2) %>% str_trim())
#> # A tibble: 4 x 4
#>   name  q1    q2    both   
#>   <chr> <chr> <chr> <chr>  
#> 1 URN1  aaa   bbb   aaa bbb
#> 2 URN2  ddd   ""    ddd    
#> 3 URN3  eee   fff   eee fff
#> 4 URN4  ""    ggg   ggg

Created on 2018-10-03 by the reprex package (v0.2.0).

This example assumes that the empty strings are NA values. If that is not that case then you can skip the mutate_at line

Slavek · October 3, 2018, 1:30pm

Thank you very much. Unfortunately, I used just a few URNs in this example but I would like to use it for all other URNs so whatever appears in the URNName. I think this command should be more generic as he lists will be much longer.
Thank you for an advise about reprex. I'll have a look...

Slavek · October 3, 2018, 1:33pm

I'm new to R and I used paste function. Unfortunately, it does not merge text columns for each URN but the first available comment in Norway.thisYear.data$BComm_10 with the first available in Norway.thisYear.data$BComm_9 etc.

I need this function to work exactly the same way as it works in Excel.

tbradley · October 3, 2018, 1:44pm

I am confused what you mean by this. If you mean that your data has more rows, than the approach I showed will work for as many rows as you have as it is not restricted only to the toy dataset - shown in your screenshots - that I used. However, without any idea what you have tried, it is hard to really guide you any further.

Also, paste is vecotrized, so if you pass the two column directly into it like this:

paste(Norway.thisYear.data$BComm_10, Norway.thisYear.data$BComm_9)

and they are the same length, you will get a vector of the combined values.

Slavek · October 4, 2018, 8:55am

Ok, the code is:
comm.b.sect <- paste(na.omit(Norway.thisYear.data$BComm_10),
na.omit(Norway.thisYear.data$BComm_9),
na.omit(Norway.thisYear.data$BComm_8),
na.omit(Norway.thisYear.data$BComm_7)
head(comm.b.sect)

and the result:
[1] "Super hyggelig, fri tilgang på nett, kaffemaskin, veldig hyggelige kundemottakere Lyst og stort lokale med mulighet for en kopp kaffe Nei Fikk ikke tilbud om forfriskninger."
[2] "Fine og lyse lokaler med bilutstilling. Romslig og greit med sitteplass, og de har kaffemaskin og toalett. Litt trang parkering. Små lokaler og lite plass til utstilling"
...

Slavek · October 4, 2018, 8:57am

Please look at red comments. They are merged but they belong to different respondents.
I would expect:
[1] Super hyggelig, fri tilgang på nett, kaffemaskin, veldig hyggelige kundemottakere
[2] Nei
[3] Lyst og stort lokale med mulighet for en kopp kaffe
...

tbradley · October 4, 2018, 12:01pm

You're issue is likely a result of using na.omit on the individual columns. I would suggest not using that. Also, I am not sure if that is just a result of copy-paste stuff but you are missing a parenthesis at the end of your paste call.

As a note, it is extremely hard to help debug code, data, and/or error messages when screenshots are posted. Posting screenshots, instead of actual code/data/error messages, is likely to decrease your chances of getting the help you are looking for in general. Instead of posting a screenshot, please copy and paste the code and error message and format the pasted code/error messages as shown below.

To share you data, I recommend taking a look at this thread on sharing example data

Put code that is inline (such as a function name, like mutate or filter ) inside of backticks ( mutate ) and chunks of code (including error messages and code copied from the console) can be put between sets of three backticks:

```
example <- foo %>%
  filter(a == 1)
```

This process can be done automatically by highlighting your code, either inline or in a chunk, and clicking the </> button on the toolbar of the reply window!

This will help keep our community tidy and help you get the help you are looking for!

For more information, please take a look at the community's FAQ on formating code