`gtools::mixedsort()` does what it's supposed to.
library(gtools)
col1 <- c(
"[-2,-1)", "[-1,412.5)", "[412.5,1188)", "[1188,1244)", "[1244,1556)",
"[1556,1628)", "[1628,1631)", "[1631,1775)", "[1775,1834)", "[1834,1950)",
"[5438,5.729e+04)")
# help(mixedsort) examples show this is expected behavior
(with_e <- mixedsort(col1))
#> [1] "[-2,-1)" "[-1,412.5)" "[412.5,1188)" "[1188,1244)"
#> [5] "[1244,1556)" "[1556,1628)" "[1628,1631)" "[1631,1775)"
#> [9] "[1775,1834)" "[1834,1950)" "[5438,5.729e+04)"
Although `mixedsort()` has a `scientific` argument ("logical. Should exponential notation be allowed for numeric values"), it doesn't change this output.
So `col1` starts out as type `character` and ends up as type `character`, and can therefore be modified with regex tools to convert the scientific notation. Here's an outline with some partial code. Come back if you need help implementing this.
- Determine which elements of `col1` have scientific notation
has_e <- function(x) grepl("e", x, fixed = TRUE)
# grepl() is vectorized, so no loop is needed when more than one element matches
col1[has_e(col1)]
- Save the opening and closing delimiters
get_delims <- function(x){
delims = "^(.).*(.)$"
delims = unlist(regmatches(x,regexec(delims,x)))[2:3]
return(delims)
}
- Strip the delimiters, leaving the inner string to pick apart
strip_delims <- function(x) {
return(substr(x, 2, nchar(x) - 1))
}
- There is now the string "5438,5.729e+04", which needs to be split on `,` and each part tested for an `e`
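x <- "5438,5.729e+04"  # the string left after stripping the delimiters
components <- strsplit(x, ",")
left <- components[[1]][1]
right <- components[[1]][2]
# split the right-hand bound into mantissa and exponent
split_exp <- unlist(strsplit(right, "e"))
(re_exp <- as.numeric(paste0("1", "e", split_exp[2])))
#> [1] 10000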
- Convert the forepart of `right` (the mantissa) to numeric and multiply
as.numeric(split_exp[1]) * re_exp
- This, in turn, needs to be converted to character and tested for scientific notation
- If so, `format(x, scientific = FALSE)`
- That returns a string that can be reunited with `left` with `paste()`
- Finally, reunite that with the opening and closing delimiters
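The outline can be put together roughly as follows. This is only a sketch under some assumptions: `expand_sci()` is a hypothetical helper name, every label is assumed to have one opening delimiter, comma-separated numeric bounds, and one closing delimiter, and it takes a shortcut by calling `as.numeric()` directly on a bound like "5.729e+04" instead of splitting on `e` and multiplying.

```r
# Hypothetical helper: expand scientific notation inside interval labels
# such as "[5438,5.729e+04)". Assumes the form <delim>a,b<delim>.
expand_sci <- function(x) {
  vapply(x, function(label) {
    # save the delimiters and strip them off
    open  <- substr(label, 1, 1)
    close <- substr(label, nchar(label), nchar(label))
    inner <- substr(label, 2, nchar(label) - 1)

    # split on the comma and test each part for an "e"
    bounds <- strsplit(inner, ",", fixed = TRUE)[[1]]
    has_e  <- grepl("e", bounds, fixed = TRUE)

    # rewrite only the parts in scientific notation, one at a time;
    # format() keeps its default of 7 significant digits
    for (i in which(has_e)) {
      bounds[i] <- format(as.numeric(bounds[i]), scientific = FALSE)
    }

    # reunite the bounds with the delimiters
    paste0(open, paste(bounds, collapse = ","), close)
  }, character(1), USE.NAMES = FALSE)
}

res <- expand_sci(c("[-2,-1)", "[5438,5.729e+04)"))
print(res)
```

Labels without an `e` pass through unchanged, so the helper can be applied to all of `col1` before calling `mixedsort()`.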
With `options(scipen=999)` it's possible to suppress scientific notation displays, but `mixedsort()` doesn't seem to respect that.
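One plausible explanation (an assumption, not checked against the gtools source) is that `scipen` only governs how numbers are converted to text, while the elements of `col1` are already character strings. A small base R illustration of that difference, using the made-up names `num_txt` and `chr_txt`:

```r
# scipen influences number-to-text conversion only;
# strings that already contain "e+04" are left alone.
old <- options(scipen = 999)
num_txt <- format(1e10)         # numeric input, affected by scipen
chr_txt <- format("5.729e+04")  # character input, unchanged
options(old)                    # restore the previous setting

print(num_txt)
print(chr_txt)
```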