Syntax to Use Character Variable in Search

Can anyone help me with a character syntax issue? I have a vector with a list of numbers:

   [1] "2369-2960" "1532-2777" "1876-2026" "1872-7727" "1872-7123"
   [6] "1532-2777" "1542-7714" "1879-1190" "1090-2139" "1090-2139"

I need to convert them to the following syntax for them to work in my search function from the RISmed package. Don't worry about the details of the package; my issue is more with the basics. Here is the output when I type the search string out by hand:

>  EUtilsSummary('2369-2960 OR 1532-2777 OR 1876-2026 OR 1872-7727',
+     retmax=100, mindate= 2018, maxdate= 2021, datetype = "edat")
[1] "\"JMIR Public Health Surveill\"[Journal] OR \"Med Hypotheses\"[Journal] OR \"Asian J Psychiatr\"[Journal] OR \"Eur J Radiol\"[Journal] AND 2018[EDAT] : 2021[EDAT]"

So you can see that I need the search to be 'number OR number OR number'. I wrote a paste call to make that happen, which looks like it is working fine:

> issns <- paste0("\'", paste0(unique(df$issn), collapse = " OR "), "\'")
> issns
[1] "'2369-2960 OR 1532-2777 OR 1876-2026 OR 1872-7727 OR 1872-7123 OR 1542-7714 OR 1879-1190 OR 1090-2139 OR 1873-6513... 

It's like 800 of these numbers so I cut it off there. Anyway, when I run my function WITH the issns variable, I get an error:

> EUtilsSummary(issns,
+     retmax=100,  mindate= 2018, maxdate= 2021, datetype = "edat") 
Error in file(con, "r") : 
  cannot open the connection to ''2369-2960+OR+1532-2777+OR+1876-2026+OR+1872-7727+OR+1872-7123+OR+1542-7714+OR+1879-1190+OR+1090-2139+OR+1873-6513+OR+1523-6838+OR+1532-2742+OR+1474-4457+OR+1555

Why is this happening? Theoretically:

  1. Putting my issns variable should be the same as just typing them out, right?
  2. Is there some syntax / coding function I'm missing? Have tried a bunch of things with no benefit.

The only evident thing I can notice is that you are quoting the string twice. Try removing "\'" from your paste0() command.
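To illustrate the double quoting, here is a minimal sketch in base R. It assumes the sample ISSNs from the question stand in for df$issn; the names `issn`, `quoted`, and `unquoted` are made up for the illustration. In the hand-typed call, the single quotes only delimit the string for R and are not part of it, whereas the paste0() call writes literal quote characters into the string itself. That is probably where the extra ' at the start of the URL in the error message comes from, though I can't confirm it against the API.

```r
# Sample values copied from the question; stands in for df$issn (assumption)
issn <- c("2369-2960", "1532-2777", "1876-2026", "1872-7727", "1872-7123",
          "1532-2777", "1542-7714", "1879-1190", "1090-2139", "1090-2139")

# Original approach: the quote characters become part of the string's content
quoted <- paste0("'", paste0(unique(issn), collapse = " OR "), "'")

# Suggested approach: no extra quotes, just the terms joined by " OR "
unquoted <- paste(unique(issn), collapse = " OR ")

startsWith(quoted, "'")    # TRUE: the string itself begins with a quote
startsWith(unquoted, "'")  # FALSE: same content as typing the query by hand
```

Passing `unquoted` to EUtilsSummary() should then behave like the hand-typed query.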

@andresrcs didn't admonish you about a reprex (see the FAQ: What's a reproducible example (`reprex`) and how do I create one?). Using a reprex, complete with representative data, will attract more answers, more quickly.

I'll throw in a few suggestions to deal with the malformed issns. I have to assume, though, that '2369-2960' is a well-formed query. I can't check that with the API, which I don't have.

  1. Although the values in df$issn look like numerals separated by -, they are stored as character objects, not numeric objects. It's a fine point, but treating them purely as strings makes the string-handling functions easier to apply.

  2. With more than a handful of separate character patterns, it helps to isolate them in an object, then choose a function to extract them into an object that can be sent to a receiver function.


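library(stringr)  # str_extract_all(), str_c(), str_remove()
library(purrr)    # flatten_chr(); also provides the %>% pipe

# same as df$issn: replicated because df is not available here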
search_for <- c("2369-2960", "1532-2777", "1876-2026", "1872-7727", "1872-7123", "1532-2777", "1542-7714", "1879-1190", "1090-2139", "1090-2139", "2369-2960")

# string patterns
# lazy programmer kludge on next line
chopoff <- " OR $"
good_form <- "\\d{4}-\\d{4}"
insert <-  " OR "

unique(search_for) %>%
  str_extract_all(.,good_form) %>% 
  flatten_chr %>%       # _chr because  want an error if any numeric objects
  str_c(., insert, collapse = "") %>%
  str_remove(.,chopoff) ->  issns

issns
#> [1] "2369-2960 OR 1532-2777 OR 1876-2026 OR 1872-7727 OR 1872-7123 OR 1542-7714 OR 1879-1190 OR 1090-2139"

Created on 2020-04-05 by the reprex package (v0.3.0)

Now issns is ready to be passed to EUtilsSummary as a simple argument. This would be easy to wrap into a function. The virtue is being able to see more clearly into issns and make further adjustments.
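As a rough sketch of that wrapping, here is one way it might look in base R. The function name `build_issn_query` and its `sep` argument are hypothetical and not part of RISmed; the pattern is the same four-digits, hyphen, four-digits shape used above. Joining with `collapse` directly avoids the trailing " OR " that had to be chopped off in the pipeline.

```r
# Hypothetical helper (not from RISmed): build an "A OR B OR C" query string
build_issn_query <- function(x, sep = " OR ") {
  x <- unique(as.character(x))
  # keep only entries shaped like dddd-dddd, as in the pattern above
  well_formed <- grepl("^[0-9]{4}-[0-9]{4}$", x)
  paste(x[well_formed], collapse = sep)
}

search_for <- c("2369-2960", "1532-2777", "1876-2026", "1532-2777", "not-an-issn")
issns <- build_issn_query(search_for)
issns

# The result could then be passed on as in the question (needs RISmed and
# network access, so it is left commented out here):
# EUtilsSummary(issns, retmax = 100, mindate = 2018, maxdate = 2021, datetype = "edat")
```

Malformed entries are silently dropped in this sketch; depending on your data you may prefer a warning instead.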

