Unable to import Excel workbook

Using R it is easy, working with large files under limited computational resources it is not.

In windows you can go to task administrator and check how much free ram you have available

Thanks for the help @andresrcs

What should I do now

Try to close as much unnecessary programs and back processes as you can, before trying to import your excel file into R.

@afzalphatan As mentioned by @andresrcs this is definitely a memory problem. There is one more thing that you can try memory.limit(size=2500). This will allocate more memory to R - 2500 is in MB.
Although there is a cost to it. All the other processes will get stuck and there is a high chance that your system will hang for a while.

Just a couple of questions regarding the data - Is this a db extract? If yes, why don't you directly connect to the db and optimize the data before pulling it into R?

Thanks!
Heramb

2 Likes

That is the whole point ... Company doesn't allow directly to connect to database..

whatsoever it is .. one has to download the data and analyse ...

I previously had the same problem using readxl on a large Excel file, one that in my case was about 200MB that had many columns as well as rows. I monitored the use of memory as the file was read, and saw memory allocation climb to well above 1GB before crashing R. I was not able to resolve the problem directly, and ended up using a CSV export for the file concerned, read in successfully by the base-R read.csv function (and by readr's read_csv). There were no issues with memory allocation growing exceptionally quickly when reading the csv file as input, unlike when using readxl to read the file.

I found that other Excel packages had their own problems, such as Java dependencies which were not practicable given admin restrictions on my PC for instance. All in all, using a CSV export was the only solution which worked for me in the circumstances.

I continue to use readxl, but not on large Excel files. As the problem is difficult to replicate, and the data involved can't be anonymised, there is no simple way to generate a reprex to show others nor to report the problem for investigation by the package authors.

1 Like

Thanks for your kind help @stewart.ross

You mean to say that using read_csv() is resolving the issue ??? can you please elaborate how can i solve this issue ?

what is difference in "readr" and "readxl" package ??? any idea which one is better one

readr is a package for reading text-based files. An introduction to the latest version is available at https://blog.rstudio.com/2016/08/05/readr-1-0-0/, from which this quote is taken:

readr makes it easy to read many types of rectangular data, including csv, tsv and fixed width files. Compared to base equivalents like read.csv() , readr is much faster and gives more convenient output: it never converts strings to factors, can parse date/times, and it doesn’t munge the column names.

readxl, as its name suggests, is for Excel files. To quote from https://www.rstudio.com/resources/webinars/whats-new-with-readxl/,

readxl is the Tidyverse solution for reading data stored in the legacy xls format or the more modern xlsx format. It has no tricky external dependencies, is quite speedy, and is easy to install and use across all operating systems.

For Excel data I have found readxl to be excellent in general, with the exception of the fast-growing use of memory which as mentioned can in my experience cause a complete crash of RStudio.

It's not a matter of which is better - they do different things. readr is for text-based files, readxl for Excel files. These are not the same, so to ask which is better is to ask the wrong question.

As a work-round of readxl crashing I have exported the data concerned to csv and read that in as text. It is only a work-round - if readxl did not crash on large files I'd be using it for almost all of the export-source-data-then-import tasks I do with R, as it is very simple and easy to use.

3 Likes

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.