A small number of columns in the data I need to work with are strings, the rest numbers. I'm using read_excel() from the readxl package to get the data ; right after it, the string columns are of type chr and the rest num. I'm tasked with finding out which columns are integers. From an advice, I tried saving the spreadsheet content into a CSV then loading that, which works like a charm ; the chr columns are the same but now a large portion of num is now instead int. Is there a way to skip writing and reading a CSV and get the same transformation? Perhaps some way to break the spreadsheet data (eg XLdata <- read_excel(...)), then put it back together without any writing to a file (eg (1) 2 steps: XLdataSplit <- split(XLdata); XLdataFused <- fuse(XLdataSplit, auto_type=TRUE) (2) 1 step: XLdataReformed <- reform(XLdata, auto_type=TRUE)) ?
In addition, from is.integer() documentation I ran
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
and I'm now trying to have it stop at the 1st decimal content of a column. Someone advised me to use break and I scripted
is_integer = TRUE
for (current_row in seq_along(data$column)) {
if (!is.wholenumber(data$column[current_row])) {
is_integer = FALSE
break()
}
}
but I'm wondering if there's something better to check if a column is entirely made of integers.
sure you can clean this up. your is.wholenumber, function is already vectorised, so you dont have a need to explicitly loop it yourself.
i.e.
is.wholenumber(1:4)
[1] TRUE TRUE TRUE TRUE
and you can test if they are all true with all()
Now there might be a performance reason to use your manual loop with break; but I would have thought an excel isnt going to be particularly large and the simpler code might be preferred if the speed is not an issue.
I'm not sure how these two goals are related for you, but if you just wanted to make sure that integer data is recognized as such after importing with read_excel(), then this is an approach you could apply:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
tibble(x = 1, y = 2.2, z = 3)
#> # A tibble: 1 × 3
#> x y z
#> <dbl> <dbl> <dbl>
#> 1 1 2.2 3
tibble(x = 1, y = 2.2, z = 3) |>
mutate(across(where(\(x) x == as.integer(x)), as.integer))
#> # A tibble: 1 × 3
#> x y z
#> <int> <dbl> <int>
#> 1 1 2.2 3
About the 2nd paragraph: I'm puzzled why someone would offer to check the whole column while I explicitly asked in its 1st sentence to stop at the 1st decimal.
If this is a meant as a reply to me: As I said in my reply, I'm not sure what you're trying to achieve since your original post seems contradictory, or at least unclear. Maybe you could provide "before" and "after" tables that illustrate the what you're hoping to achieve, or maybe someone can (has already?) correctly interpreted your request. Your link doesn't clarify things for me, but if it actually is the solution you were looking for, you could click the checkmark button to indicate that your question is solved.