Importing/reading a .csv file into R

Hey everyone, I was watching a YouTube video on how to import a csv file into the R environment, specifically with the read.csv() function. The instructor explained that after writing either the exact name of the csv file or its URL in read.csv(), we needed to include TRUE/FALSE and then a separator, specifically a comma, since we're importing a csv file. Question: are TRUE and the comma separator the default arguments for read.csv()? If they are, I wouldn't want to keep typing them in. Thank you.

The instructor's code:

tuna <- read.csv("brusers.csv", TRUE, " , ")


This is what you find in the help when searching for the function read.csv(): "the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table ) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns." It seems you have to specify the sep argument. Hope this helps. PS: if you type ?functionYouWantInfo in the console you can see all the defaults and options for your function.
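The quoted sentence describes read.table(), whose default sep is "". read.csv() is documented on the same help page but has its own defaults. A quick way to check them from the console, using only base R's args() and formals():

```r
# Print the full signature of read.csv, including every default value
args(read.csv)

# Pull out the two defaults the question asks about
formals(read.csv)$header  # TRUE: the first row holds the column names
formals(read.csv)$sep     # ",": fields are separated by commas

# For comparison, read.table's default separator is "" (any white space)
formals(read.table)$sep
```

So for read.csv() the comma is already the default; sep only needs to be given when the file uses something else.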


tuna <- read.csv("brusers.csv", TRUE, " , ")

I don't think that will work.
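One way to check this: read.csv() accepts only a single-character separator, so a three-character string like " , " (a comma with spaces around it) is rejected. A sketch using a small inline csv; csv_text is hypothetical toy data standing in for brusers.csv:

```r
# Hypothetical toy data standing in for brusers.csv
csv_text <- "name,age\nAnn,30\nBob,25"

# Try the instructor's separator; catch the error instead of stopping
result <- tryCatch(
  read.csv(text = csv_text, sep = " , "),
  error = function(e) conditionMessage(e)
)

# If the call failed, result holds the error message rather than a data frame
print(result)
```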

If you have a standard comma-delimited rectangular file, then

tuna <- read.csv("brusers.csv")

should be fine. As @Llama33 points out, if the separator is not a comma then you need to specify it. Let's say it is a tab rather than white space; you would use

tuna <- read.csv("brusers.csv", sep = "\t")
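A self-contained sketch of both cases. Inline text, passed through the text argument that read.csv() hands on to read.table(), stands in for real files; csv_text and tsv_text are hypothetical toy data, so in practice you would pass a path such as "brusers.csv" instead.

```r
# Hypothetical toy data standing in for a comma-separated file
csv_text <- "name,age\nAnn,30\nBob,25"

# Relying on the defaults (header = TRUE, sep = ",")
tuna_default <- read.csv(text = csv_text)
# Spelling the same defaults out explicitly
tuna_explicit <- read.csv(text = csv_text, header = TRUE, sep = ",")
identical(tuna_default, tuna_explicit)  # TRUE

# Hypothetical toy data standing in for a tab-separated file
tsv_text <- "name\tage\nAnn\t30\nBob\t25"

# read.csv with the separator overridden to a tab
tuna_tab <- read.csv(text = tsv_text, sep = "\t")
# read.delim() is the base-R reader whose default sep is already "\t"
tuna_delim <- read.delim(text = tsv_text)
identical(tuna_tab, tuna_delim)  # TRUE
```

If you read tab-separated files often, read.delim() saves typing sep = "\t" each time, in the same way read.csv() saves typing sep = ",".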

Using TRUE does not look like it will do any harm, but if you read

?read.csv

it looks like it comes from the fill = option, which is intended to deal with a specific problem and seems unneeded with a standard rectangular .csv file.

So basically a comma is the default separator, so you don't need to specify it every time. However, if the separator is anything other than a comma, then I need to include it. Also, TRUE is likely the default, which indicates that the header/column names are in the first row, which they usually are, so I don't need to include it every time. However, I need to include FALSE if my dataset's column names start after the first row. Correct?


So basically a comma is the default separator, so you don't need to specify it every time.

Correct.

However, I need to include FALSE if my dataset's column names start after the first row. Correct?

No. I see I was misinterpreting what that "TRUE" was for. I was thinking it was for fill = rather than header =.
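To see which argument an unnamed TRUE lands on, you can list read.csv()'s arguments in order: unnamed values are matched by position, so the second one goes to header and the third to sep. A sketch that writes a hypothetical temporary file in place of brusers.csv:

```r
# read.csv's first three arguments in positional order
names(formals(read.csv))[1:3]  # "file" "header" "sep"

# Hypothetical stand-in for brusers.csv, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ann,30", "Bob,25"), tmp)

# Positional call: TRUE is matched to header and "," to sep
by_position <- read.csv(tmp, TRUE, ",")
# The same call with the arguments named
by_name <- read.csv(tmp, header = TRUE, sep = ",")
identical(by_position, by_name)  # TRUE

unlink(tmp)  # remove the temporary file
```

Naming the arguments costs a few keystrokes but makes it obvious which option each value sets.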

header = FALSE 

is used when your data does not have any column names. R will automatically assign variable names, V1, V2, V3…, to the data.frame.
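A small sketch of the difference, using hypothetical inline data with no header row. With the default header = TRUE, the first record is wrongly taken as column names (and "30" is made syntactically valid as "X30"); with header = FALSE you get V1, V2, or you can supply names through read.table's col.names argument.

```r
# Hypothetical data with no header row: every line is a record
no_header_text <- "Ann,30\nBob,25"

# Default header = TRUE: the first record is consumed as column names
wrong <- read.csv(text = no_header_text)
names(wrong)  # "Ann" "X30"
nrow(wrong)   # 1

# header = FALSE: R assigns V1, V2, ... and keeps every row
dat <- read.csv(text = no_header_text, header = FALSE)
names(dat)  # "V1" "V2"
nrow(dat)   # 2

# Names can also be supplied while reading
dat_named <- read.csv(text = no_header_text, header = FALSE,
                      col.names = c("name", "age"))
names(dat_named)  # "name" "age"
```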

As far as I can tell, your column names, if they exist, must always be the first row of the data set. You can have comments, or whatever you have, before the start of the data, say in rows 1 and 2 of the file, and tell read.csv to skip them.

dat1 <- read.csv("mydata.csv", skip = 2)

but then your column names must be in row 3.
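A self-contained version of the skip idea, with hypothetical inline text standing in for mydata.csv. If the leading lines happen to start with a marker such as #, the comment.char argument (which read.csv sets to "" by default) is an alternative that drops them wherever they appear.

```r
# Hypothetical file contents: two notes lines, then the header in row 3
messy_text <- "Exported from a survey tool\nAges are in years\nname,age\nAnn,30\nBob,25"

# Skip the first two lines so that row 3 is read as the header
dat1 <- read.csv(text = messy_text, skip = 2)
names(dat1)  # "name" "age"
nrow(dat1)   # 2

# Alternative when the extra lines start with "#": treat them as comments
commented_text <- "# Exported from a survey tool\n# Ages are in years\nname,age\nAnn,30\nBob,25"
dat2 <- read.csv(text = commented_text, comment.char = "#")
identical(dat1, dat2)  # TRUE
```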
