Strange characters appearing when a CSV file is imported

Hello all,

I am running RStudio on Windows 11 and trying to read a local CSV file with the read.csv() function, and I am running into an error that looks like this:

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
  line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
  line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
  incomplete final line found by readTableHeader

Opening the file in Google Sheets or Notepad++ shows random ASCII characters. I also found that the encoding of this file is "CP1252".

I added the argument encoding = "Windows-1252" to the read.csv() call, but the warnings did not change.
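A side note on the encoding attempt: in read.csv() (which passes its arguments to read.table()), the encoding argument only marks the strings that were read as Latin-1, UTF-8 or bytes; it does not re-encode the input. Re-encoding while reading is done with fileEncoding. A minimal sketch, assuming a small made-up temporary file whose bytes are CP1252 (in CP1252, "é" is the single byte 0xE9):

```r
# Hypothetical demo: a one-column CSV whose bytes are CP1252-encoded.
# In CP1252 the character "é" is stored as the single byte 0xE9.
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name\ncaf"), as.raw(0xE9), charToRaw("\n")), tmp)

# fileEncoding tells R to convert the input from CP1252 while reading;
# encoding = "..." would only mark strings and would not convert anything.
df <- read.csv(tmp, fileEncoding = "CP1252")
print(df$name)
```

In a UTF-8 R session this should print the word with its accent intact.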

Does anyone have any tips or suggestions? Any help would be appreciated.

Thank you.

Try adding the skipNul = TRUE argument to read.csv().
In the long term, it would be better to track down the source of the nulls. How was the CSV file created?
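To see how many NUL (0x00) bytes a file actually contains, and what skipNul = TRUE does with them, here is a minimal sketch. The helper count_nul_bytes() and the demo file are hypothetical, made up for illustration; a plain-text CSV normally contains no NUL bytes at all, so a large count is a hint that the file is binary or in a UTF-16 style encoding.

```r
# Hypothetical helper: count NUL (0x00) bytes in a file
count_nul_bytes <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  sum(bytes == as.raw(0x00))
}

# Hypothetical demo file: a header plus one row containing a stray NUL byte
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("a,b\n1,"), as.raw(0x00), charToRaw("2\n")), tmp)

n_nul <- count_nul_bytes(tmp)
print(n_nul)

# skipNul = TRUE drops NUL bytes while reading instead of warning about them
df <- read.csv(tmp, skipNul = TRUE)
print(df)
```

On the real file, count_nul_bytes("observations-July2023_July2024_v2.csv") would show whether NULs are scattered throughout or only appear in a few places.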

Hi. Thank you for your response.

After adding the skipNul argument, the first two warnings go away, but "incomplete final line found by readTableHeader" remains:

> obs_raw <- read.csv("...filepath/observations-July2023_July2024_v2.csv", encoding="Windows-1252", skipNul = TRUE)
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on '...filepath/observations-July2023_July2024_v2.csv'

I searched for this warning online, and the recommendation was to add a newline at the end of the file, but that did not change anything either. As for how the CSV was created, it was downloaded from the iNaturalist research site.
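The "incomplete final line" warning roughly means that the last line R read had no terminating newline. A minimal sketch for checking that directly, assuming a made-up demo file; ends_with_newline() is a hypothetical helper, not part of base R. If the check already returns TRUE on the real file and the warning persists, the problem is more likely in the file's content than in its last line.

```r
# Hypothetical helper: TRUE if the file's last byte is a line feed (0x0A)
ends_with_newline <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size == 0) return(FALSE)
  bytes <- readBin(path, what = "raw", n = size)
  bytes[size] == as.raw(0x0a)
}

# Hypothetical demo file whose last line has no terminating newline
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw("a,b\n1,2"), tmp)
before <- ends_with_newline(tmp)

# Append a newline (the usual advice) and check again
cat("\n", file = tmp, append = TRUE)
after <- ends_with_newline(tmp)
print(c(before = before, after = after))
```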

The last message is a Warning, not an Error. The function did complete reading the file. Have you looked at the last row of obs_raw to see if all the data are there?
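One quick way to check whether all the data made it in is to compare the number of lines in the file with the number of rows read.csv() returned (plus one for the header), and then look at the last row. A minimal sketch on a made-up demo file; note that quoted fields containing line breaks would legitimately make the two counts differ.

```r
# Hypothetical demo: a small CSV with a header and three data rows
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, species = c("a", "b", "c")), tmp, row.names = FALSE)

df <- read.csv(tmp)

# Lines in the file vs. header + data rows returned by read.csv()
n_lines <- length(readLines(tmp))
n_expected <- nrow(df) + 1
print(c(lines_in_file = n_lines, header_plus_rows = n_expected))

# The last row should match the last line of the file
print(tail(df, 1))
```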

Hello.
I can see that obs_raw was created, but it contains nonsense spread over 2 rows (instead of the expected ~100). Here is a screenshot of the table:

[Screenshot of the obs_raw table]

Maybe this result suggests an encoding issue?

What happens if you open the file in Excel?

Is it possible that this file was compressed (with zip or gzip) at some point? This is what a binary file looks like when it is read as text.

I don't know of a perfect way to check this without installing additional software, but you can try:

readBin("observations-July2023_July2024_v2.csv", what = "raw", n = 4)

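and see whether the bytes you get match one of the known file signatures (lists of these "magic numbers" usually write them with a 0x prefix). For example, a ZIP archive starts with 50 4b 03 04 and a gzip file with 1f 8b.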
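Note that readBin() has no "hex" mode: what = "raw" returns the bytes, which R prints in hex. The comparison can be automated with a small helper; sniff_format() below is hypothetical, and it only knows a handful of common signatures (ZIP local file header, gzip, bzip2, xz, 7-Zip). The demo writes the same tiny text through R's own gzfile(), bzfile() and xzfile() connections so each format can be checked.

```r
# Hypothetical helper: guess a compressed format from the file's leading bytes
sniff_format <- function(path) {
  head_bytes <- readBin(path, what = "raw", n = 6)
  starts_with <- function(sig) {
    length(head_bytes) >= length(sig) &&
      all(head_bytes[seq_along(sig)] == as.raw(sig))
  }
  if (starts_with(c(0x50, 0x4B, 0x03, 0x04))) return("zip")
  if (starts_with(c(0x1F, 0x8B))) return("gzip")
  if (starts_with(c(0x42, 0x5A, 0x68))) return("bzip2")
  if (starts_with(c(0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00))) return("xz")
  if (starts_with(c(0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C))) return("7z")
  "unknown (possibly plain text)"
}

# Demo: write the same two lines plain and through R's compressed connections
write_with <- function(open_con) {
  path <- tempfile(fileext = ".csv")
  con <- open_con(path, "w")
  writeLines(c("x", "1"), con)
  close(con)
  path
}
formats <- c(
  plain = sniff_format(write_with(file)),
  gzip  = sniff_format(write_with(gzfile)),
  bzip2 = sniff_format(write_with(bzfile)),
  xz    = sniff_format(write_with(xzfile))
)
print(formats)
```

On the downloaded file, sniff_format("observations-July2023_July2024_v2.csv") returning "zip" would match the reprex below.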

If you have WSL, you can use the file command in a Linux terminal. If you have installed Rtools, it comes with a file.exe (see the reprex below).

Or, more drastically, you can try renaming the file with a ".zip" or ".gz" extension and see whether you get a proper CSV when you open it.

Example

# create compressed file

write.csv(data.frame(first_col = letters[1:4],
                     secnd_col = LETTERS[5:8]),
          "test.csv")

utils::zip("test.csv.zip", "test.csv")

file.rename("test.csv.zip", "test.csv")
#> [1] TRUE


# try to open it as csv
read.csv("test.csv")
#> Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
#> line 1 appears to contain embedded nulls
#> Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
#> incomplete final line found by readTableHeader on 'test.csv'
#> [1] PK...
#> <0 rows> (or 0-length row.names)

# read the beginning as binary: 0x50 matches zip
readBin("test.csv", what = "raw")
#> [1] 50


# I have RTools installed
Sys.getenv()[ startsWith(names(Sys.getenv()), "RTOOLS") ]
#> RTOOLS43_HOME           C:\rtools43
#> RTOOLS44_HOME           C:\RBuildTools\4.4

rtools_path <- Sys.getenv("RTOOLS44_HOME")

# path to "file.exe" provided by Rtools
file_path <- file.path(rtools_path, "usr/bin/file.exe")

file.exists(file_path)
#> [1] TRUE

system2(file_path, normalizePath("test.csv"))
#> C:\Users\me\test.csv: Zip archive data, at least v2.0 to extract, compression method=deflate

Created on 2024-07-20 with reprex v2.1.0
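If the file does turn out to be compressed, R can usually read it without a separate unzip step. A minimal sketch, assuming the archive holds a single CSV; read_csv_any() is a hypothetical helper. For ZIP it lists the members with utils::unzip(list = TRUE) and reads the first one through an unz() connection; for gzip, bzip2 and xz it relies on file() (used by read.csv()) detecting the compression when reading. The demo uses gzip because creating a ZIP from R needs an external zip program, as utils::zip() does in the reprex above.

```r
# Hypothetical helper: read a "CSV" that may really be a ZIP or gzip/bzip2/xz file
read_csv_any <- function(path, ...) {
  sig <- readBin(path, what = "raw", n = 4)
  is_zip <- length(sig) == 4 && all(sig == as.raw(c(0x50, 0x4B, 0x03, 0x04)))
  if (is_zip) {
    # Take the first member of the archive and read it via an unz() connection
    member <- utils::unzip(path, list = TRUE)$Name[1]
    return(read.csv(unz(path, member), ...))
  }
  # file() decompresses gzip, bzip2 and xz transparently when reading
  read.csv(path, ...)
}

# Demo: the reprex data, written gzip-compressed under a .csv name
gz <- tempfile(fileext = ".csv")
con <- gzfile(gz, "w")
write.csv(data.frame(first_col = letters[1:4], secnd_col = LETTERS[5:8]),
          con, row.names = FALSE)
close(con)

df <- read_csv_any(gz)
print(df)
```

With the zipped test.csv from the reprex, read_csv_any("test.csv") should take the ZIP branch instead.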


I found the {archive} package on CRAN, which can help with decompressing a binary file of unknown format. From its help file: "Offers R connections and direct extraction for many archive formats including 'tar', 'ZIP', '7-zip', 'RAR', 'CAB' and compression formats including 'gzip', 'bzip2', 'compress', 'lzma' and 'xz'."
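A minimal sketch of that approach, assuming {archive} is installed from CRAN; read_first_in_archive() is a hypothetical wrapper. It relies on archive::archive_read(), which returns a connection to one file inside an archive (selected here by position, the first one), and that connection can be passed to read.csv(). archive::archive(path) can be used first to list what the archive contains.

```r
# Hypothetical wrapper, assuming the {archive} package is installed
read_first_in_archive <- function(path, ...) {
  if (!requireNamespace("archive", quietly = TRUE)) {
    stop("the 'archive' package is required: install.packages(\"archive\")")
  }
  # archive_read() opens one member (here the first) as a connection,
  # which read.csv() reads like any other connection
  read.csv(archive::archive_read(path, file = 1L), ...)
}

# Usage (not run here):
# archive::archive("observations-July2023_July2024_v2.csv")  # list members
# obs_raw <- read_first_in_archive("observations-July2023_July2024_v2.csv")
```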


Could you share the url of the csv file?

Hi, sure thing.

I used the export page on the site here (link) to create a query and have the download emailed to me.

Could you specify what query would produce the same csv file?

Hello,

The file was zipped when I downloaded it, but I was using the unzipped version (extracted with the built-in unzip function in Windows File Explorer).

I just re-downloaded the zipped file and unzipped it again, and it worked. I'm not sure what went wrong the first time, but this time the result was different.

Thank you for the links and support.

