use of paste command is giving an error message

The following code runs for X1:

newdata <- subset(train, abs(normalizeZ(X1)) >= 3)
cat("There are", nrow(newdata) ,"potential X1 outliers >= |3| in train","\n")

My attempt to generalize the code to X1..X5 gives an error message:

#error Incomplete expression: for(i in 1:5){...
#note that nrow(newdata) is numeric
#note that potential Xi outliers is character

How do I fix the code?

for(i in 1:5){
v <- paste("X","i",sep="")
newdata <- subset(train, abs(scale(v)) >= 3)
cat("There are", nrow(newdata) ,"potential v outliers >= |3| in train","\n")
}

It would help in the future if you could provide a reproducible example, with a version of train.

So for example building an example data.frame train:

train <- data.frame(X1 = 1:50,
                    X2 = 50:1,
                    X3 = 201:250,
                    X4 = 250:201,
                    X5 = 251:300)

for(i in 1:5){
  v <- paste("X",i,sep="")
  newdata <- subset(train, abs(scale(train[[v]])) >= 3)
  cat("There are", nrow(newdata) ,"potential v outliers >= |3| in train","\n")
}
#> There are 0 potential v outliers >= |3| in train 
#> There are 0 potential v outliers >= |3| in train 
#> There are 0 potential v outliers >= |3| in train 
#> There are 0 potential v outliers >= |3| in train 
#> There are 0 potential v outliers >= |3| in train

Created on 2022-12-12 by the reprex package (v2.0.1)

There are two problems in your code.

First, in paste("X","i",sep=""), you have "i" in quotes, so R treats it as the literal character string "i". You mean the value of the variable i, so you should not use quotes:

v <- paste("X",i,sep="")

Second, after this command, v is a variable that contains the character string "X1" (for i=1). But what you want to scale is not "X1", it's the column of train that is named X1. So you can call it with train[[v]], which means "select the column of train which is named as the value of the variable v".
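As a small extension of the same idea, the sketch below also passes v to cat() so that the message names the actual column instead of the literal letter v. The data frame here is made up for illustration (X1 has one deliberately extreme value), and as.vector() is only there because scale() returns a one-column matrix.

```r
# Example data, assumed for illustration: X1 contains one extreme value
train <- data.frame(X1 = c(rep(0, 99), 50),
                    X2 = 1:100)

n_outliers <- integer(0)
for (i in 1:2) {
  v <- paste0("X", i)                  # "X1", "X2": i is used without quotes
  z <- as.vector(scale(train[[v]]))    # z-scores of the column named by v
  newdata <- train[abs(z) >= 3, , drop = FALSE]
  n_outliers[v] <- nrow(newdata)       # keep the count under the column name
  cat("There are", nrow(newdata), "potential", v, "outliers >= |3| in train\n")
}
```

Because v is an ordinary character variable, it can be used both to select the column (train[[v]]) and as text in the message.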


Thank you for illustrating how to create a reproducible question. That is much easier than my adjusting my datasets. I will study this tomorrow.

The following code correction is good.

bin<-hexbin(train[[v]], train$Y, xbins=50)

The next correction is not working. Xi is meant to be a column identifier (X1, ..., X5), not a column.
The double quotes

plot(bin, main="Hexagonal Binning",xlab="train[[v]]",ylab="Y")

cause train[[v]] to be output just as train[[v]]. I need to have what is inside double quotes to evaluate.
eval function does not do it. How do I do that?

Right now I have

library(hexbin)
for(i in 1:5){
v <- paste("X",i,sep="")  
bin<-hexbin(train[[v]], train$Y, xbins=50)
plot(bin, main="Hexagonal Binning",xlab="train[[v]]",ylab="Y")
par(mfrow = c(1,1))

I am trying to understand the concept here as well as the correct r code. Now for an actual example:

train <- data.frame(X1 = 1:50,
                    X2 = 50:1,
                    X3 = 201:250,
                    X4 = 250:201,
                    X5 = 251:300)
library(hexbin)
for(i in 1:5){
v <- paste("X",i,sep="")  
bin<-hexbin(train[[v]], train$Y, xbins=50)
plot(bin, main="Hexagonal Binning",xlab="train[[v]]",ylab="Y")
par(mfrow = c(1,1))
}

Error in cut.default(rcnt, colorcut, labels = FALSE) :
invalid number of intervals

Thanks.

Well in that case it's obvious: just remove the quotes :rofl:
But of course this is not what you want: if you remove the quotes, R will actually evaluate train[[v]] and get a (big) numeric vector, which is not at all informative. What you want to display, as far as I can tell, is the name of the variable being plotted, that is, the content of v.

library(hexbin)


train <- data.frame(X1 = rnorm(1000, mean = 1), 
                    X2 = rnorm(1000, mean = 5), 
                    Y = rnorm(1000, mean = 50))


for(i in 1:2){
  v <- paste("X",i,sep="")  
  bin<-hexbin(train[[v]], train$Y, xbins=50)
  plot(bin, main="Hexagonal Binning",xlab=v,ylab="Y")
}

Created on 2022-12-15 by the reprex package (v2.0.1)

To create this example, I changed 4 things:

  • added a Y column in the data frame, otherwise you can't use it
  • reduced the number of X from 5 to 2, just because 5 plots felt too much in this thread for an example
  • changed the definition of X1 and X2: trying to define 50 bins from 50 points might be a problem, giving this "invalid number of intervals" error. So I went to ?hexbin and copy-pasted from their example
  • removed the par(mfrow) which does not do what you might want it to do.

For that last point, it's actually a bit complicated. There are two systems of plots in R, the "base" system and the "grid" system. Base plots are controlled by par(), not grid plots. This hexbin plot is built using the grid system.

The base system was historically the first one; it's great, but not very flexible. The grid system is very powerful and most modern plotting in R is based on it, but it's quite complicated, and as a standard user you typically don't want to interact with it directly. One interesting feature of the grid system is that you can build a whole grid plot as an R object and draw it later, which is less straightforward with base graphics (recordPlot() can snapshot a base plot that has already been drawn, but there is no plot object to build up front).
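To make the "save now, draw later" idea concrete, here is a minimal sketch using the grid package directly. It has nothing to do with hexbin itself; the object name label and its text are made up for the example.

```r
library(grid)  # ships with R, no installation needed

# Build a graphical object (a "grob") without drawing anything yet
label <- textGrob("drawn later", gp = gpar(fontsize = 20))

# ... other work can happen here ...

# Draw the stored object on a fresh page when it is needed
grid.newpage()
grid.draw(label)
```

Packages built on grid, such as ggplot2, rely on the same principle: the plot is an object first, and drawing is a separate step.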

So, ideally, if you want to plot all your hexbins side-by-side, I would suggest you save them in a variable:

plots_saved <- list()

for(i in 1:2){
  v <- paste("X",i,sep="")  
  bin<-hexbin(train[[v]], train$Y, xbins=50)
  plots_saved[[v]] <- plot(bin, main="Hexagonal Binning",xlab=v,ylab="Y")
}

str(plots_saved,max.level = 2)
#> List of 2
#>  $ X1:List of 2
#>   ..$ plot.vp  :Formal class 'hexVP' [package "hexbin"] with 9 slots
#>   ..$ legend.vp:List of 33
#>   .. ..- attr(*, "class")= chr "viewport"
#>  $ X2:List of 2
#>   ..$ plot.vp  :Formal class 'hexVP' [package "hexbin"] with 9 slots
#>   ..$ legend.vp:List of 33
#>   .. ..- attr(*, "class")= chr "viewport"

and then plot the results with a function like grid.arrange().

But this is not possible here! I don't fully understand what the plot.hexbin() function is returning, it appears to be "viewports" from the grid system, I don't know enough to figure out how to plot them, and not fully sure it's possible.

I can suggest two solutions:

1. Save plots as you go

dir.create("export")

for(i in 1:2){
  v <- paste("X",i,sep="")  
  bin<-hexbin(train[[v]], train$Y, xbins=50)
  png(paste0("export/", v, ".png"))
    plot(bin, main="Hexagonal Binning",xlab=v,ylab="Y")
  dev.off()
}

With this, we create a directory "export" and save each plot as soon as we generate it.

2. Use a different package

Not saying that the {hexbin} package is necessarily bad, but I find it easier to work with {ggplot2}. It's a plotting package (developed using the grid system, but no need to interact directly with it) which has had a lot of success in recent years, and you can find a LOT of resources online. So that does require learning yet another approach, but seeing as it's ubiquitous nowadays, it's probably worth learning anyway. A good start is here.

Using it, you can pretty easily get what you want:

library(ggplot2)
library(tidyr)

train_long <- pivot_longer(train,
                           cols = starts_with("X"),
                           names_to = "X_index",
                           values_to = "X_value")

ggplot(train_long) +
  geom_hex(aes(x = X_value, y = Y)) +
  facet_wrap(~ X_index)
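If the reshaping step is unclear, the sketch below shows roughly what the long format looks like, on a tiny made-up data frame. It uses base R's stack() rather than pivot_longer(), so the row order differs from what tidyr would give, but the idea is the same: one row per (column name, value) pair, with Y repeated alongside.

```r
# Tiny example data, assumed for illustration
small <- data.frame(X1 = c(1, 2), X2 = c(10, 20), Y = c(5, 6))

# Stack the X columns on top of each other: one row per value
long <- stack(small[c("X1", "X2")])
names(long) <- c("X_value", "X_index")   # same column names as in train_long

# Repeat Y once per stacked column so each X value keeps its Y
long$Y <- rep(small$Y, times = 2)

print(long)
```

With the data in this shape, facet_wrap(~ X_index) can split the rows into one panel per original column.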

Hello,
I am sure I tried removing the double quotes. At any rate, the code gave an error. After 10 or more tries with web searches of error messages, I sometimes ask for help.

I need to do a better job of keeping track of what I have tried.

It was a great reply. I am saving the entire discussion to my hard drive. Thank you.
I will keep learning R.

MM

If you just removed the quotes and had plot(...,xlab=train[[v]], ...), that means the column v of the data.frame train, so it becomes equivalent to writing:

plot(...,xlab=c(4,1,54,7,2,7,86,32), ylab="Y")
which doesn't make much sense (you can't put a whole vector of numbers as an axis label), hence an error would be expected.
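The three candidates for the axis label can be compared side by side. This is only an illustrative sketch with a made-up four-row data frame and a base R scatter plot instead of hexbin; the variable names label_literal, label_values and label_name are invented for the example.

```r
# Small example data, assumed for illustration
train <- data.frame(X1 = c(4, 1, 54, 7), Y = c(1, 2, 3, 4))
v <- "X1"

label_literal <- "train[[v]]"  # quoted: just the text train[[v]]
label_values  <- train[[v]]    # unquoted: the numeric values of column X1
label_name    <- v             # the content of v: the string "X1"

length(label_values)           # one element per row, not a single label

# Only the single string in label_name is a sensible axis label
plot(train[[v]], train$Y, xlab = label_name, ylab = "Y")
```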

It can be overwhelming at first, it does get better once you become familiar with the types of data and structure of functions. Good luck!
