Stop at the 1st decimal?

DynV · July 14, 2024, 7:01pm

A small number of columns in the data I need to work with are strings, the rest numbers. I'm using read_excel() from the readxl package to get the data ; right after it, the string columns are of type chr and the rest num. I'm tasked with finding out which columns are integers. From an advice, I tried saving the spreadsheet content into a CSV then loading that, which works like a charm ; the chr columns are the same but now a large portion of num is now instead int. Is there a way to skip writing and reading a CSV and get the same transformation? Perhaps some way to break the spreadsheet data (eg XLdata <- read_excel(...)), then put it back together without any writing to a file (eg (1) 2 steps: XLdataSplit <- split(XLdata); XLdataFused <- fuse(XLdataSplit, auto_type=TRUE) (2) 1 step: XLdataReformed <- reform(XLdata, auto_type=TRUE)) ?

In addition, from is.integer() documentation I ran

is.wholenumber <- function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x)) < tol

and I'm now trying to have it stop at the 1st decimal content of a column. Someone advised me to use break and I scripted

is_integer = TRUE
for (current_row in seq_along(data$column)) {
  if (!is.wholenumber(data$column[current_row])) {
	is_integer = FALSE
	break()
  }
}

but I'm wondering if there's something better to check if a column is entirely made of integers.

Thank you kindly for your help

nirgrahamuk · July 15, 2024, 12:08pm

sure you can clean this up. your is.wholenumber, function is already vectorised, so you dont have a need to explicitly loop it yourself.
i.e.

is.wholenumber(1:4)
[1] TRUE TRUE TRUE TRUE

and you can test if they are all true with all()

Now there might be a performance reason to use your manual loop with break; but I would have thought an excel isnt going to be particularly large and the simpler code might be preferred if the speed is not an issue.

jrkrideau · July 15, 2024, 12:23pm

I have the feeling I am missing something obvious but cannot we just do something like this

library(data.table)

# Set up data.table with one integer and one numeric column.
DT <- data.table(dd = c(1, 3, 5), bb =c(2.3, 4.5, 7))
DT[, dd := as.integer(dd)]
DT

DT[, lapply(.SD, is.integer)]

dromano · July 15, 2024, 2:41pm

I'm not sure how these two goals are related for you, but if you just wanted to make sure that integer data is recognized as such after importing with read_excel(), then this is an approach you could apply:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tibble(x = 1, y = 2.2, z = 3)
#> # A tibble: 1 × 3
#>       x     y     z
#>   <dbl> <dbl> <dbl>
#> 1     1   2.2     3

tibble(x = 1, y = 2.2, z = 3) |> 
  mutate(across(where(\(x) x == as.integer(x)), as.integer))
#> # A tibble: 1 × 3
#>       x     y     z
#>   <int> <dbl> <int>
#> 1     1   2.2     3

^{Created on 2024-07-15 with reprex v2.0.2}

In other words, this should work in your case:

my_table <- read_excel("my_file.xlsx")

my_table <-
  my_table |> 
  mutate(across(where(\(x) x == as.integer(x)), as.integer))

DynV · July 15, 2024, 6:01pm

The answer: https://statisticsglobe.com/change-classes-data-frame-columns-automatically-r

About the 2nd paragraph: I'm puzzled why someone would offer to check the whole column while I explicitly asked in its 1st sentence to stop at the 1st decimal.

edit: I only moved the answer to the top.

dromano · July 15, 2024, 6:59pm

If this is a meant as a reply to me: As I said in my reply, I'm not sure what you're trying to achieve since your original post seems contradictory, or at least unclear. Maybe you could provide "before" and "after" tables that illustrate the what you're hoping to achieve, or maybe someone can (has already?) correctly interpreted your request. Your link doesn't clarify things for me, but if it actually is the solution you were looking for, you could click the checkmark button to indicate that your question is solved.

system · July 22, 2024, 6:59pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.