gtools::mixedsort() does what it's supposed to
library(gtools)
col1 <- c(
"[-2,-1)", "[-1,412.5)", "[412.5,1188)", "[1188,1244)", "[1244,1556)",
"[1556,1628)", "[1628,1631)", "[1631,1775)", "[1775,1834)", "[1834,1950)",
"[5438,5.729e+04)")
# help(mixedsort) examples show this is expected behavior
(with_e <- mixedsort(col1))
#> [1] "[-2,-1)" "[-1,412.5)" "[412.5,1188)" "[1188,1244)"
#> [5] "[1244,1556)" "[1556,1628)" "[1628,1631)" "[1631,1775)"
#> [9] "[1775,1834)" "[1834,1950)" "[5438,5.729e+04)"
Although mixedsort() has a scientific argument, documented as
scientific: logical. Should exponential notation be allowed for numeric values.
setting it doesn't change this output.
So col1 starts out as type character and ends up as type character, which means it can be modified with regex tools to convert the scientific notation. Here's an outline with some partial code. Come back if you need help implementing this.
- Determine which elements of col1 have scientific notation
has_e <- function(x) grepl("e", x, fixed = TRUE)
# grepl() is vectorized, so no loop is needed even if several elements match
col1[has_e(col1)]
- Save the delimiters
get_delims <- function(x){
delims = "^(.).*(.)$"
delims = unlist(regmatches(x,regexec(delims,x)))[2:3]
return(delims)
}
- Pick apart the string left over after the delimiters are removed
strip_delims <- function(x) {
return(substr(x, 2, nchar(x) - 1))
}
- There is now the string "5438,5.729e+04", which needs to be split on "," and each part tested for an e
x = strip_delims("[5438,5.729e+04)")
components = strsplit(x, ",")
left = components[[1]][1]
right = components[[1]][2]
split_exp = unlist(strsplit(right, "e"))
re_exp = as.numeric(paste0("1", "e", split_exp[2]))
#> 10000
- Convert the forepart (the digits before the e) to numeric and multiply
as.numeric(split_exp[1]) * re_exp
- This, in turn, needs to be converted to character and tested for scientific notation
- If so, use format(x, scientific = FALSE)
- That returns a string that can be reunited with left using paste()
- Finally, reunite that with the opening and closing delimiters
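The outline above can be collapsed into one small helper. This is only a sketch under some assumptions: expand_sci() is a made-up name, each label is assumed to have one opening delimiter, one closing delimiter, and exactly one comma between the two bounds, and it takes a shortcut by letting as.numeric() parse "5.729e+04" directly instead of splitting on the e and multiplying. Note that format() keeps 7 significant digits by default, so very long numbers could be rounded.

```r
# Hypothetical helper (not part of gtools): rewrite one interval label such as
# "[5438,5.729e+04)" so that both bounds are in fixed notation.
expand_sci <- function(x) {
  open  <- substr(x, 1, 1)                  # opening delimiter, e.g. "["
  close <- substr(x, nchar(x), nchar(x))    # closing delimiter, e.g. ")"
  inner <- substr(x, 2, nchar(x) - 1)       # e.g. "5438,5.729e+04"
  parts <- strsplit(inner, ",", fixed = TRUE)[[1]]
  sci   <- grepl("e", parts, fixed = TRUE)  # which bounds contain an exponent
  # as.numeric() understands exponent notation; format() writes it back out
  # without an exponent. trim = TRUE avoids padding with spaces.
  parts[sci] <- format(as.numeric(parts[sci]), scientific = FALSE, trim = TRUE)
  paste0(open, paste(parts, collapse = ","), close)
}

labels <- c("[1834,1950)", "[5438,5.729e+04)")
fixed  <- vapply(labels, expand_sci, character(1), USE.NAMES = FALSE)
fixed
```

Labels without an e pass through unchanged, so the helper can be applied to all of col1 before calling mixedsort().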
With
options(scipen=999)
it's possible to suppress scientific notation displays, but mixedsort() doesn't seem to respect that.
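A likely reason, shown here as a small base R check rather than anything taken from the gtools source: scipen only influences how numbers are turned into text, and the elements of col1 are already text by the time mixedsort() sees them. The variable names below are made up for the illustration.

```r
old <- options(scipen = 0)            # start from R's default penalty
default_fmt <- format(1e10)           # a number: formatted with an exponent
options(scipen = 999)
fixed_fmt <- format(1e10)             # the same number: now fixed notation
as_text <- format("5.729e+04")        # a string: scipen has nothing to act on
options(old)                          # restore the previous setting

default_fmt
fixed_fmt
as_text
```

So the exponent has to be removed from the strings themselves, as in the outline above.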