Keep getting "object not found" error in eval?

Hi there! I am quite new to RStudio and R in general, and I have been trying to run a binary logistic regression on some variables in my dataset. I want to find out whether the variables colour, stylist and environment have an effect on whether or not customers went to a hairdresser in 2021 (this is recorded in the dataset as 1 for yes and 0 for no).
I am using the following code:

logit <- read.csv("Hairdresser.csv")
pleasework <- glm(Active2021 ~ Sat_Colour + Sat_Stylist + Sat_Enviro, data = logit, family = "binomial")
summary(pleasework)

But I always get the following error:
Error in eval(predvars, data, env) : object 'Active2021' not found

Can anyone help me out with what I am doing wrong?

It looks like there is no column named Active2021 in your data. Run

colnames(logit)

and see what the columns are named. Carefully note the case of the letters. If you cannot see the problem, please post the output of that command.
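The same check can also be built into the script, so it stops with a readable message before glm() runs. A minimal sketch: `check_columns()` is a hypothetical helper, and the small data frame `toy_data` stands in for `logit`.

```r
# Hypothetical helper: stop early if any model variable is missing from the data
check_columns <- function(df, needed) {
  missing <- setdiff(needed, colnames(df))
  if (length(missing) > 0) {
    stop("Columns not found in data: ", paste(missing, collapse = ", "),
         "\nAvailable columns: ", paste(colnames(df), collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data standing in for the real file; note there is no Active2021 column
toy_data <- data.frame(Sat_Colour = c(1, 0), Sat_Stylist = c(0, 1), Sat_Enviro = c(1, 1))
needed <- c("Active2021", "Sat_Colour", "Sat_Stylist", "Sat_Enviro")

# Capture the error message instead of halting, just to show what it says
result <- tryCatch(check_columns(toy_data, needed), error = conditionMessage)
cat(result, "\n")
```

With the real data you would call `check_columns(logit, needed)` right after read.csv().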

Hi! Thank you so much for replying! You were right, there was no column named Active2021 in the data. That's so weird, because I have the Excel file open right now and it clearly shows a column named Active2021. Instead, I have some weird columns named X.1, X.2 and X.3, and I can't see what they relate to.

The CSV file has something peculiar in its structure that prevents read.csv() from finding the column headers. Open the CSV file in a plain text editor (on Windows that would be Notepad) and look at the data structure. It should be something like

Sat_Colour,Sat_Stylist,Sat_Enviro,Active2021
1,0,0,1
0,0,1,0

Is there any content above the headers? If you cannot see the problem, copy the top few lines of the file and post them here. Place a line with three backticks just before and after the file content, like this
```
Pasted file content
```
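Another way to see the raw file, without any spreadsheet or word-processor formatting getting in the way, is to print its first lines from R with readLines(). A sketch: it writes a small stand-in file to a temporary path so it runs as-is; with the real data you would call `readLines("Hairdresser.csv", n = 5)` instead.

```r
# Write a small stand-in file so the example runs anywhere
path <- tempfile(fileext = ".csv")
writeLines(c("Sat_Colour,Sat_Stylist,Sat_Enviro,Active2021",
             "1,0,0,1",
             "0,0,1,0"), path)

# Print the first lines exactly as they are stored on disk,
# which is what read.csv() will see
first_lines <- readLines(path, n = 5)
print(first_lines)
```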

I copied it into Pages because I'm on a Mac, and this is what is showing up for me:

Gender_F,,Race,Race1,Race2,,PreferredStylist,Music,,Quiet,
F,1,,RACE1,1,1,,Y,1,,Y,1,,CityA,1,0,0,0,0,0
M,0,,RACE2,2,2,,N,0,,N,0,,CityB,0,1,0,0,0,0
,,,OTHER,0,0,,,,,,,,CityC,0,0,1,0,0,0
,,,,,,,,,,,,,CityD,0,0,0,1,0,0
,,,,,,,,,,,,,CityE,0,0,0,0,1,0
Active2020,Act_20,,,,,,,,,,,,CityF,0,0,0,0,0,1
Y,1,,,,,,,,,,,,CityG,0,0,0,0,0,0
N,0,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,
Active2021,Act_21,,,,,,,,,,,,,,,,,
Y,1,,,,,,,,,,,,,,,,,,
N,0,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,

The column names that RStudio returned were as follows:

[1] "Gender" "Gender_F" "X" "Race" "Race1" "Race2"
[7] "X.1" "PreferredStylist" "Music" "X.2" "Quiet"
[13] "X.3"

The column names returned by read.csv() make sense given the content of the CSV file. They are simply read from the first row. The X, X.1, X.2 and X.3 column names are there because there are four blank entries in the first row; read.csv() names a blank header "X" and numbers any repeats. None of the variables from your formula

Active2021 ~ Sat_Colour + Sat_Stylist + Sat_Enviro

are in that header row. I have to be on a call in two minutes, so all I can say at the moment is that your CSV file does not match the model you are trying to fit.
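To see where names like X and X.1 come from, here is a small reproduction using an inline header row with blank cells, loosely modelled on the posted file (the values are made up). read.csv() passes the header through make.names(unique = TRUE), which turns each blank into "X" and then de-duplicates with ".1", ".2", and so on.

```r
# A header row with blank entries (note the trailing comma), like the posted file
csv_text <- "Gender,Gender_F,,Race,,Quiet,
F,1,,RACE1,1,Y,1
M,0,,RACE2,2,N,0"
df <- read.csv(text = csv_text)
print(colnames(df))

# The same renaming done directly: blanks become "X", then "X.1", "X.2", ...
renamed <- make.names(c("Gender", "Gender_F", "", "Race", "", "Quiet", ""), unique = TRUE)
print(renamed)
```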

The implication is that your Excel file needs cleaning up:
- Make sure every table column has a title, or remove the column.
- Don't place one table directly above another on the same Excel sheet.
- If the Excel content is two tables, put each table on its own sheet.
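If some of the blank-header columns are genuinely empty, they can also be dropped in R after reading. A sketch on made-up inline data: it drops columns whose auto-generated name matches X, X.1, ... and that contain only NA or empty strings. It is only a partial fix; two tables stacked on one sheet still need the Excel clean-up described above.

```r
csv_text <- "Sat_Stylist,Sat_Colour,,Sat_Enviro,
1,0,,1,
0,1,,0,"
df <- read.csv(text = csv_text)

# Columns that read.csv() named automatically because the header cell was blank
auto_named <- grepl("^X(\\.[0-9]+)?$", colnames(df))

# Columns with no usable values (all NA or empty strings)
all_empty <- vapply(df, function(col) all(is.na(col) | col == ""), logical(1))

# Keep everything except auto-named columns that are also empty
cleaned <- df[, !(auto_named & all_empty), drop = FALSE]
print(colnames(cleaned))
```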

omg

Okay, this is going to sound so stupid, but my assignment actually has the variables under completely different names. I was a little scared at first that my school would consider posting them cheating, so I changed the column titles, and then I forgot to keep up the act when answering the replies. So let me rewrite this bit:

The column names that RStudio returned were as follows:

[1] "Gender" "Gender_F" "X" "Race" "Race1" "Race2"
[7] "X.1" "Sat_Stylist" "Sat_Colour" "X.2" "Sat_Enviro"
[13] "X.3"

(It is not considered cheating, by the way, just in case.)

I agree with @nirgrahamuk that it looks like the data need cleaning. You have 13 header columns but the first data row

F,1,,RACE1,1,1,,Y,1,,Y,1,,CityA,1,0,0,0,0,0

has many more columns. Is the part that starts with CityA a different table?
Also, Active2021 is not a column at all. Which column shows the outcome you are trying to model?
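On the outcome question: in the posted file, the Active2020 and Active2021 rows look like a legend mapping Y to 1 and N to 0. If an outcome column does turn out to hold Y/N text, here is a sketch of recoding it to 1/0 before fitting, assuming a hypothetical Active2021 column:

```r
# Toy data: outcome stored as "Y"/"N" text, with one missing value
df <- data.frame(Active2021 = c("Y", "N", "Y", "N", NA), stringsAsFactors = FALSE)

# Recode following the legend (Y = 1, N = 0); anything else becomes NA
df$Active2021 <- ifelse(df$Active2021 == "Y", 1L,
                        ifelse(df$Active2021 == "N", 0L, NA_integer_))
print(df$Active2021)
```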

Hi guys! Sorry for the really late reply, but I appreciate all of your replies! I have figured out the problem eheh

My university had provided the dataset as two files, one in Excel and one in CSV format, and I was using the CSV because my RStudio wouldn't open the Excel file. I had assumed the two would be identical, since it was the same dataset and the university provided both, but it turns out the CSV file was not identical (hence the column-name problem) and was formatted really weirdly. So I copied the data I needed from the Excel file into a CSV I made myself, which solved the problem.
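For anyone finding this thread later, here is a sketch of the working pipeline once the CSV has a single header row. The data are simulated (the coefficients used to generate Active2021 are arbitrary), and a temporary file stands in for the hand-made CSV; the variable names follow the original post.

```r
set.seed(42)
n <- 200

# Simulated satisfaction scores on a 1-5 scale
hair <- data.frame(
  Sat_Colour  = sample(1:5, n, replace = TRUE),
  Sat_Stylist = sample(1:5, n, replace = TRUE),
  Sat_Enviro  = sample(1:5, n, replace = TRUE)
)
# Simulated 0/1 outcome; these coefficients are made up for the example
p <- plogis(-2 + 0.3 * hair$Sat_Colour + 0.4 * hair$Sat_Stylist + 0.1 * hair$Sat_Enviro)
hair$Active2021 <- rbinom(n, size = 1, prob = p)

# Write and re-read a clean CSV with one header row
path <- tempfile(fileext = ".csv")
write.csv(hair, path, row.names = FALSE)
logit <- read.csv(path)

# Confirm the model variables exist before fitting
stopifnot(all(c("Active2021", "Sat_Colour", "Sat_Stylist", "Sat_Enviro") %in% colnames(logit)))

pleasework <- glm(Active2021 ~ Sat_Colour + Sat_Stylist + Sat_Enviro,
                  data = logit, family = "binomial")
print(summary(pleasework))
```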

Thank you for all the advice with the code for the column names and the advice for cleaning up the data. I really appreciate it so much!
