A couple of columns come through with type int64 (according to glimpse) or integer64 (according to str) when I read parquet files (generated by Python code) using arrow::read_parquet().
How can I select and convert all of these columns at once? For example, using mutate_if().
The select_if(is.integer) doesn’t work here.
On the other hand, it would be better to read those columns as “normal” integers in the first place, but I can't find any option in arrow::read_parquet() to achieve this.
I tried something similar, but it was not working (and I was lost in the troubleshooting). Unfortunately the same applies to your code, too.
The following error message is presented:
Error in selected[[i]] <- eval_tidy(.p(column, ...)) :
more elements supplied than there are to replace
I think I know the cause: the dttm columns return more than one class, and thus more than one logical value:
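A small base R sketch of what I mean (the variable name stamp is just for illustration): a date-time value carries two classes, so comparing class(x) to a string gives two logical values, whereas inherits() always gives exactly one.

```r
# A POSIXct value has two classes: "POSIXct" and "POSIXt"
stamp <- as.POSIXct("2020-01-01 12:00:00", tz = "UTC")
print(class(stamp))

# Comparing the class vector to a string yields one logical per class
multi <- class(stamp) == "integer64"
print(length(multi))

# inherits() always returns a single TRUE/FALSE, however many classes there are
single <- inherits(stamp, "integer64")
print(single)
```

A predicate that returns two values is what mutate_if() and select_if() cannot handle, since they expect one logical per column.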
The library is called bit64, and it does indeed already have is.integer64 in it.
Also, this pattern (class(x)[[1]]=="integer64") should never be used, since it assumes that the object has only one class. The correct way is to use inherits().
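Here is a minimal sketch of the inherits()-based conversion, assuming dplyr and bit64 are installed. The data frame and its column names are made up for illustration, and is_int64 is a hypothetical helper (bit64::is.integer64 could be passed to mutate_if() in its place).

```r
library(dplyr)
library(bit64)

# Toy data standing in for the result of arrow::read_parquet()
df <- data.frame(
  id      = as.integer64(1:3),
  n       = as.integer64(c(10, 20, 30)),
  created = as.POSIXct("2020-01-01", tz = "UTC") + 0:2,
  name    = c("a", "b", "c")
)

# Predicate that returns exactly one logical per column,
# even for multi-class columns such as POSIXct
is_int64 <- function(x) inherits(x, "integer64")

# Convert every integer64 column to a plain integer, leave the rest alone
converted <- df %>% mutate_if(is_int64, as.integer)
str(converted)
```

This only makes sense when the values actually fit into a 32-bit integer.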
Maybe the next step would be to check whether the int64 numbers in my dataset can really be represented in 32 bits and converted using as.integer(). Or to delve a bit more into the bit64 library...
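A rough sketch of such a check, assuming bit64 is loaded. The function name fits_int32 is hypothetical, and the bounds come from base R's .Machine$integer.max.

```r
library(bit64)

# TRUE if every non-missing value lies within R's 32-bit integer range
fits_int32 <- function(x) {
  lim <- .Machine$integer.max
  all(is.na(x) | (x >= -lim & x <= lim))
}

small <- as.integer64(c(1, 2, 3))
big   <- as.integer64("3000000000")  # larger than .Machine$integer.max

print(fits_int32(small))
print(fits_int32(big))
```

As far as I know, as.integer() on an out-of-range integer64 value gives NA rather than an error, so it seems safer to run a check like this before converting.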
As for conversion: it depends (as usual). Most DBs store IDs as integer64, and for those I found it much safer to use strings instead. For any actual numbers, converting to 32-bit might be a good solution, given that few things normally have counts larger than 2 billion.
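To illustrate the string route for IDs, here is a small sketch assuming bit64 is loaded. The ID value is invented and deliberately too large for a 32-bit integer.

```r
library(bit64)

# An ID far outside the 32-bit range, kept exact as integer64
id <- as.integer64("9007199254740993")

# Store it as a string so no digits can be lost
id_chr <- as.character(id)
print(id_chr)
```

A string ID can still be used for joins and lookups; it just can't be used for arithmetic, which IDs normally don't need anyway.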