Error message when calling a Plot function

Hey,

I'm getting an error message when I call the plot function. R seems to think a variable is not correctly defined, but everything looks fine to me, so I don't understand why R says it is unknown or not specified.

Here is the reprex:

# To clean up the memory of your current R session run the following line
rm(list=ls(all=TRUE))

#install tidyverse and reprex
install.packages("tidyverse")
#> Installing package into 'C:/Users/walid/Documents/R/win-library/3.4'
#> (as 'lib' is unspecified)
#> package 'tidyverse' successfully unpacked and MD5 sums checked
#> 
#> The downloaded binary packages are in
#>  C:\Users\walid\AppData\Local\Temp\RtmpsN9QMt\downloaded_packages
library(reprex)

# Set your directory to the folder where you have downloaded the SKU dataset



# install readr
library(readr)

# Let's load our dataset

data <- read_delim('CO2_passenger_cars_v14.csv', "\t", escape_double = FALSE, trim_ws = TRUE)# The function read.table enables us to read flat files such as .csv files
#> Error: 'CO2_passenger_cars_v14.csv' does not exist in current working directory ('C:/Users/walid/AppData/Local/Temp/RtmpA70859').

View(data)
#> Error in as.data.frame.default(x): impossible de convertir automatiquement la classe  ""function"" en un tableau de données (data.frame)


# Now let's have a look at our variables and see some summary statistics
class(data)
#> [1] "function"
dim(data)
#> NULL
str(data) # The str() function shows the structure of your dataset and details the type of variables that it contains
#> function (..., list = character(), package = NULL, lib.loc = NULL, 
#>     verbose = getOption("verbose"), envir = .GlobalEnv)
summary(data) # The summary() function provides for each variable in your dataset the minimum, mean, maximum and quartiles
#> Error in object[[i]]: objet de type 'closure' non indiçable
names(data)
#> NULL

#Choice of the explanatory variables for the regression
data <- data[,c("id", "MS", "Mk", "Cn", "r", "e (g/km)", "m (kg)", "Ft", "ec (cm3)", "ep (KW)")]
#> Error in data[, c("id", "MS", "Mk", "Cn", "r", "e (g/km)", "m (kg)", "Ft", : objet de type 'closure' non indiçable
View(data)
#> Error in as.data.frame.default(x): impossible de convertir automatiquement la classe  ""function"" en un tableau de données (data.frame)

#Column names
names(data) <- c("Id", "MemberState", "Manufacturor", "BrandName", "TotalNewRegistration", "CO2Emission(g/km)", "Weight(kg)", "FuelType", "EngineCapacity(cm3)", "EnginePower(KW)")
#> Error in names(data) <- c("Id", "MemberState", "Manufacturor", "BrandName", : names() appliqué à un object autre qu'un vecteur


#Remove NA rows
data_NA_Free <- na.omit(data)

summary(data_NA_Free)
#> Error in object[[i]]: objet de type 'closure' non indiçable

# Let's plot our data to see if we can identify groups visually 
plot(data_NA_Free$MemberState, data_NA_Free$CO2Emission(g/km), main = "Emission per country", xlab = "Country Name", ylab = "CO2 Emission")
#> Error in data_NA_Free$MemberState: objet de type 'closure' non indiçable

Created on 2018-04-09 by the reprex package (v0.2.0).

You can find a sample of the data at
https://www.dropbox.com/s/err4xn5usrb6ngl/data_sample.csv?dl=0

thanks for your help
walid

Hi Walid --

Loading data

First, the data_sample you provided is a comma-separated values (CSV) file. If you open it in a text editor, you'll see that the values are all separated by commas.
Here's a small sample:

"","Id","MemberState","Manufacturor","BrandName","TotalNewRegistration","CO2Emission(g/km)","Weight(kg)","FuelType","EngineCapacity(cm3)","EnginePower(KW)"
"1",174754,"DK","Mercedes-Benz","V-Klasse",3,158,2145,"Diesel",2143,140
"2",174755,"DK","Mercedes-Benz","Vito Tourer",5,158,1985,"Diesel",2143,100
"3",174756,"DK","Mercedes-Benz","V-Klasse",2,158,2170,"Diesel",2143,140

Your read_delim call has delim = "\t", which is for a tab-delimited file.

With base-R, something like data <- read.csv('CO2_passenger_cars_v14.csv') should get your data loaded.
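
To see why the delimiter matters, here is a minimal sketch in base R. The inline text is made up to mimic the layout of the sample file, and read.delim stands in for readr's read_delim so that nothing needs to be installed; the variable names are only for illustration.

```r
# Made-up comma-separated text that mimics the layout of the sample file.
csv_text <- "Id,MemberState,CO2Emission\n174754,DK,158\n174755,DK,158"

# Reading it as tab-delimited finds no tabs, so each line becomes one field.
wrong <- read.delim(text = csv_text, sep = "\t")
ncol(wrong)   # 1

# Reading it as comma-separated splits the three fields as intended.
right <- read.csv(text = csv_text)
ncol(right)   # 3
names(right)
```

A single wide column after loading is usually a sign that the separator does not match the file.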

Also note the "Import Dataset" feature built into RStudio. For tasks like this, it's quite handy for getting your data loaded correctly, and it also gives you a copy of the code so you can rerun it in future.

Other errors

Your error messages around class, dim, str and so on come from the failed data load. Because the load failed, nothing was ever assigned to data, so R falls back on the built-in function data(), which loads specified data sets or lists the available data sets.
Type ?data into your console and press Enter to learn more, if you're interested.
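
A small sketch of what happens, assuming a fresh session in which no object called data has been created. The file name is made up and is expected not to exist.

```r
# Assumes no object called `data` exists yet in this session.
# "no_such_file_for_demo.csv" is a made-up name that should not exist.
load_attempt <- try(
  suppressWarnings(data <- read.csv("no_such_file_for_demo.csv")),
  silent = TRUE
)
inherits(load_attempt, "try-error")  # the read failed, so nothing was assigned

# `data` therefore still resolves to the built-in function from utils.
is.function(data)
class(data)

# Subsetting a function fails, which is what the 'closure' errors report.
subset_attempt <- try(data$MemberState, silent = TRUE)
inherits(subset_attempt, "try-error")
```

Picking a name that is not already a function (for example emissions) makes this kind of failure easier to spot, because R then reports that the object is not found.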

Your plot

Assuming you got your data loaded, I see an error in your plot call too.

plot(data_NA_Free$MemberState, data_NA_Free$CO2Emission**(g/km)**, main = "Emission per country", xlab = "Country Name", ylab = "CO2 Emission")

I added double asterisks around the parentheses in the variable name CO2Emission(g/km). Parentheses and slashes are not valid characters in R object names unless the name is wrapped in backticks.
read.csv converted these characters into periods when it loaded the file. So with the way I loaded your data, the following call creates a nice plot:

   plot(
     data_NA_Free$MemberState, 
     data_NA_Free$CO2Emission.g.km., 
     main = "Emission per country", xlab = "Country Name", ylab = "CO2 Emission")
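
If you want to check what a column will be called after loading, here is a minimal sketch. By default read.csv uses check.names = TRUE, which passes the header through make.names(); the one-row inline CSV below is made up for illustration.

```r
# make.names() shows how a header is rewritten into a syntactic name.
make.names("CO2Emission(g/km)")   # "CO2Emission.g.km."

# Made-up one-row CSV with the same awkward header.
csv_text <- "MemberState,CO2Emission(g/km)\nDK,158"

# Default behaviour: the header is converted.
converted <- read.csv(text = csv_text)
names(converted)

# check.names = FALSE keeps the header as written; use backticks to refer to it.
kept <- read.csv(text = csv_text, check.names = FALSE)
kept$`CO2Emission(g/km)`
```

Either approach works; the converted names are just easier to type.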

Hi Curtis,

Thanks a lot for your help, it is very useful. I should apologise, though, because I made your job more complicated: the original dataset was a mess and Mara helped me open it, and when I made a subset I saved it properly in CSV format. So I shouldn't have included the loading step in the program.

I still think the error came from the wrong name I gave to the CO2 variable, so I changed it accordingly, but when I tried to make the plot I got the following message. The dim, str, etc. calls all work fine for me, because I ran them on clean data, but I don't understand this message when plotting:
plot(data_NA_Free$MemberState, data_NA_Free$CO2Emission_g.km, main = "Emission per country", xlab = "Country Name", ylab = "CO2 Emission")
Error in plot.window(...) : 'xlim' nécessite des valeurs finies
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

cordially
walid

The French sentence, Error in plot.window(...) : 'xlim' nécessite des valeurs finies, means that 'xlim' needs finite values. Does that mean there may be no problem with the values in the small sample I provided, while there are problems in the full dataset?

I'm struggling to replicate your error. But (depending on how you loaded your data) you may have mislabeled your y-axis variable, CO2Emission_g.km.


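# stringsAsFactors = TRUE matches the read.csv default before R 4.0;
# plot() needs MemberState as a factor to draw one box per country
data <- read.csv("https://www.dropbox.com/s/err4xn5usrb6ngl/data_sample.csv?dl=1", stringsAsFactors = TRUE)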

data_NA_Free <- na.omit(data)

plot(data_NA_Free$MemberState, 
     data_NA_Free$CO2Emission.g.km., 
     main = "Emission per country", 
     xlab = "Country Name", ylab = "CO2 Emission")

This could definitely be the case. Based on a StackOverflow thread about this error, it usually has to do with NAs (or character values that somehow ended up in a numeric variable).
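
One way to check for this, as a sketch with made-up data (cars_df and its values are not from the real file). The warnings quoted earlier, "NAs introduced by coercion" from xy.coords, are also what plot() gives when the x variable is a character vector rather than a factor, which is how readr loads text columns, so the sketch covers that case too.

```r
# Made-up data: MemberState is character, and one emission value is text.
cars_df <- data.frame(
  MemberState = c("DK", "DK", "FR", "FR"),
  CO2Emission = c("158", "n/a", "120", "131"),
  stringsAsFactors = FALSE
)

pdf(NULL)  # null graphics device, so no plot file is written

# 1. A character x cannot be turned into numbers, so plot() fails.
plot_attempt <- try(
  suppressWarnings(plot(cars_df$MemberState, 1:4)),
  silent = TRUE
)
inherits(plot_attempt, "try-error")

# 2. Find values that are not NA but do not convert to a number.
as_number <- suppressWarnings(as.numeric(cars_df$CO2Emission))
is_bad <- is.na(as_number) & !is.na(cars_df$CO2Emission)
bad_rows <- which(is_bad)
cars_df[bad_rows, ]

# 3. Drop the bad rows, convert the types, and plot with a factor x.
clean <- cars_df[!is_bad, ]
clean$CO2Emission <- as.numeric(clean$CO2Emission)
plot(factor(clean$MemberState), clean$CO2Emission,
     main = "Emission per country",
     xlab = "Country Name", ylab = "CO2 Emission")

invisible(dev.off())
```

Note that is.na() does not flag strings such as "n/a" or empty cells, so sum(is.na(...)) can be 0 while the column still contains values that are not numbers.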

Hi Curtis

Do you mean applying your program to the full data to see if it still works? You can access it at
https://www.dropbox.com/s/ekc3fxc6ke76ics/CO2_passenger_cars_v14.csv?dl=0

If it doesn't work, how can we figure out which rows cause problems? In my program I removed all NA rows.

thanks

You bet!
You might also check out ggplot2's geom_boxplot.

That file has UTF-16LE encoding, which (I think) made it a pain to load.
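
In case it helps, here is a sketch of reading a UTF-16LE file in base R by passing fileEncoding. It writes a tiny made-up table to a temporary file first so that it runs on its own. That the real file is tab-separated is an assumption based on the "\t" in the original read_delim call.

```r
# Made-up tab-separated table standing in for the real file.
sample_text <- "MemberState\tCO2Emission\nDK\t158\nFR\t120\n"

# Write it to a temporary file as raw UTF-16LE bytes.
tmp <- tempfile(fileext = ".csv")
utf16_bytes <- iconv(sample_text, from = "UTF-8", to = "UTF-16LE",
                     toRaw = TRUE)[[1]]
writeBin(utf16_bytes, tmp)

# Telling R the encoding lets it decode the file while reading.
emissions <- read.delim(tmp, fileEncoding = "UTF-16LE")
str(emissions)

unlink(tmp)  # remove the temporary file
```

For the real file, the same fileEncoding argument would go on the read.delim or read.csv call that points at the downloaded path.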

thank you I will try

Thanks Mara, I read this, but I shouldn't have any NAs left in my final dataset because I removed all rows containing them. Even when I check a summary there are no NAs. Is there any other kind of "bad" data? How can I find it?

Dear All,

I tried to remove NAs and empty cells from the dataset, but I don't understand why I still get the same message, or why I can see NA in some cells when I sort the data while sum(is.na(data)) returns 0.

Also, I don't understand why the plot works on the small 100-row dataset but not when I tried 1000 rows. I noticed empty cells and removed them, but I still have the same problem.

Another problem: when I read the reprex, it says it cannot convert a "function" class into a data frame. But when I look at my variables I only have integers and characters, so how is that possible? Here is the reprex:

# To clean up the memory of your current R session run the following line
rm(list=ls(all=TRUE))

#install tidyverse and reprex
install.packages("tidyverse")
#> Installing package into 'C:/Users/walid/Documents/R/win-library/3.4'
#> (as 'lib' is unspecified)
#> package 'tidyverse' successfully unpacked and MD5 sums checked
#> 
#> The downloaded binary packages are in
#>  C:\Users\walid\AppData\Local\Temp\RtmpGqFm0s\downloaded_packages
library(reprex)

# Set your directory to the folder where you have downloaded the SKU dataset



# install readr
library(readr)

# Let's load our dataset

data <- read_delim('CO2_passenger_cars_v14.csv', "\t", escape_double = FALSE, trim_ws = TRUE)# The function read.table enables us to read flat files such as .csv files
#> Error: 'CO2_passenger_cars_v14.csv' does not exist in current working directory ('C:/Users/walid/AppData/Local/Temp/Rtmp2nteG9').

View(data)
#> Error in as.data.frame.default(x): impossible de convertir automatiquement la classe  ""function"" en un tableau de données (data.frame)


#write.csv(data_sample, "data_sample.csv")#

# Now let's have a look at our variables and see some summary statistics
class(data)
#> [1] "function"
dim(data)
#> NULL
str(data) # The str() function shows the structure of your dataset and details the type of variables that it contains
#> function (..., list = character(), package = NULL, lib.loc = NULL, 
#>     verbose = getOption("verbose"), envir = .GlobalEnv)
summary(data) # The summary() function provides for each variable in your dataset the minimum, mean, maximum and quartiles
#> Error in object[[i]]: objet de type 'closure' non indiçable
names(data)
#> NULL

#Choice of the explanatory variables for the regression
data <- data[,c("id", "MS", "Mk", "Cn", "r", "e (g/km)", "m (kg)", "Ft", "ec (cm3)", "ep (KW)")]
#> Error in data[, c("id", "MS", "Mk", "Cn", "r", "e (g/km)", "m (kg)", "Ft", : objet de type 'closure' non indiçable
View(data)
#> Error in as.data.frame.default(x): impossible de convertir automatiquement la classe  ""function"" en un tableau de données (data.frame)

#Column names
names(data) <- c("Id", "MemberState", "Manufacturor", "BrandName", "TotalNewRegistration", "CO2Emission.g.km", "Weight_kg", "FuelType", "EngineCapacity_cm3", "EnginePower_KW")
#> Error in names(data) <- c("Id", "MemberState", "Manufacturor", "BrandName", : names() appliqué à un object autre qu'un vecteur


#Remove NA rows
data <- data[!(data$CO2Emission.g.km == "" | is.na(data$CO2Emission.g.km)), ]
#> Error in data$CO2Emission.g.km: objet de type 'closure' non indiçable

data <- data[!(data$MemberState == "" | is.na(data$MemberState)), ]
#> Error in data$MemberState: objet de type 'closure' non indiçable

data<-data[complete.cases(data), ]
#> Error in complete.cases(data): 'type' (closure) de l'argument incorrect


# Let's plot our data to see if we can identify groups visually 
plot(data$MemberState, data$CO2Emission.g.km, main = "Emission per country", xlab = "Country Name", ylab = "CO2 Emission")
#> Error in data$MemberState: objet de type 'closure' non indiçable

Created on 2018-04-12 by the reprex package (v0.2.0).

cordially
walid

Something else strange: it says that the variables MemberState and CO2Emission are objects of type 'closure' that cannot be subset ("non indiçable"). What does this mean?

In your reprex, note there's an error message after you tried to load the data.
#> Error: 'CO2_passenger_cars_v14.csv' does not exist in current working directory ('C:/Users/walid/AppData/Local/Temp/Rtmp2nteG9').
Looks like the file path is wrong: the reprex ran in a temporary working directory, and the file isn't there.

You didn't have any similar error loading your small dataset.


As for the error objet de type 'closure' non indiçable (in English: object of type 'closure' is not subsettable), recall my reply above:

Because you didn't load your csv and didn't assign it to the name data, when you call data later on, R turns to the only data it knows, which is a function.
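
One way to catch this earlier is to check that the file exists before reading it, and to store the result under a name that is not already a function. A minimal sketch, where load_emissions is a hypothetical helper and the temporary file stands in for your real dataset:

```r
# Hypothetical helper: stop with a clear message if the file is missing.
load_emissions <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory: ", getwd(), ")")
  }
  read.csv(path)
}

# Temporary file with made-up contents standing in for the real dataset.
tmp <- tempfile(fileext = ".csv")
writeLines(c("MemberState,CO2Emission", "DK,158"), tmp)

emissions <- load_emissions(tmp)
class(emissions)   # "data.frame"

# A wrong path now fails immediately instead of several lines later.
failed <- try(load_emissions("no_such_file_for_demo.csv"), silent = TRUE)
inherits(failed, "try-error")

unlink(tmp)  # remove the temporary file
```

With this in place, the script stops at the load step, so the later class, summary and plot calls never run against the built-in data function.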


I found your full-sized data quite tricky to load because of its unusual encoding. Can you open it in spreadsheet software and convert it into a traditional CSV?

Thanks, Curtis, for your help. I think that if I want to finish the MOOC I am following, I have to work on another dataset or I will never finish on time. I will come back to this one afterwards to learn from my mistakes. Thanks for your valuable advice, and thanks to @Mara as well.