Merging objects via left_join: problem with 'tbl_vars' error message

Gluecksritter · March 15, 2020, 11:57am

Hi there,

I'm new to R and I'm having a problem merging two objects with the left_join command.

I want to add some information to a shapefile, but I always get the same error message, which I don't really understand.
I'm just trying to reproduce something that my academic advisor showed me, but somehow it's not working as it should.

My problem is the following:

install.packages("foreign")
install.packages("ggplot2")
install.packages("plyr")
install.packages("sp")
install.packages("shapefiles")
install.packages("raster")	
install.packages("zoom")
install.packages("gdata")	
install.packages("GISTools")
install.packages("dplyr")
install.packages("rworldmap")

library(foreign)
library(ggplot2)
library(plyr)
library(sp)
library(shapefiles)
library(raster)
library(zoom)
library(gdata)
library(GISTools)
library(dplyr) 
library(rworldmap)


shapefile <- readShapePoly("./Shapefile.zip/regbez_ex.shp")

clusterdata <- read.dbf("./Shapefile.zip/regbez_ex.dbf", header =FALSE)

shapefile@data[["BEZ_RBZ"]] %in% clusterdata[["dbf"]][["BEZ_RBZ"]]

shapefile@data <- left_join(shapefile@data, clusterdata, by =c('BEZ_RBZ'='[["dbf"]][["BEZ_RBZ"]]'))

I have no idea whether all the packages are necessary for this setup; I highly doubt it. But this way you can just take the code you need to reproduce the error.
The error I get is:
Fehler in UseMethod("tbl_vars") :
nicht anwendbare Methode für 'tbl_vars' auf Objekt der Klasse "list" angewendet

I know it's in German; the English version of the error message is:
Error in UseMethod("tbl_vars") :
no applicable method for 'tbl_vars' applied to an object of class "list"

In the actual data I am working with, there is information in the .dbf file that is not in the shapefile (hence the name 'clusterdata', as there are clusters of a city that we coded in a survey).
As I can't upload the files I used here, I hope the code is still helpful for understanding my problem.
Thanks a ton in advance, and if there is any problem with my question, please just ask me to edit anything that is incomprehensible or difficult to handle.
As I'm very new to R, I'm grateful for every piece of advice.

dromano · March 15, 2020, 1:37pm

Hi @Gluecksritter, a good way to get things started is to post the data you're working with, so folks can try to reproduce and understand your issue. Could you try this, using the triple backticks (```) as shown?

```
<---- paste output of dput(head(shapefile, 50)) here
<---- paste output of dput(head(clusterdata, 50)) here
```

In terms of the error, it looks like clusterdata may not be a table, so that may be where the work is.
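A quick way to check that suspicion is to look at the object's class before joining. The sketch below uses a hypothetical stand-in object (a list wrapping a data frame), not the actual data from the question:

```r
# Hypothetical stand-in: a list that wraps a data frame, which is one way an
# imported object can end up "not a table".
clusterdata <- list(
  dbf = data.frame(BEZ_RBZ = c("Oberbayern", "Schwaben")),
  header = FALSE
)

class(clusterdata)              # the class of the outer object
is.data.frame(clusterdata)      # FALSE: the outer object is not a table
is.data.frame(clusterdata$dbf)  # TRUE: the table sits one level down

# Show only the top level of the structure
str(clusterdata, max.level = 1)
```

If is.data.frame() returns FALSE, the object has to be unpacked or converted before it can be passed to a join.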

Gluecksritter · March 15, 2020, 4:48pm

Hi,
thanks for the advice, here is the output of dput():


structure(list(LAND = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "BY", class = "factor"), 
    MODELLART = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Basis-DLM#DTK25", class = "factor"), 
    OBJART = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "75006", class = "factor"), 
    OBJART_TXT = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "AX_Gebiet_Regierungsbezirk", class = "factor"), 
    OBJID = structure(1:7, .Label = c("DEBYBDLMjK0000na", "DEBYBDLMjK0000nb", 
    "DEBYBDLMjK0000nc", "DEBYBDLMjK0000nd", "DEBYBDLMjK0000ne", 
    "DEBYBDLMjK0000nY", "DEBYBDLMjK0000nZ"), class = "factor"), 
    HDU_X = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), BEGINN = structure(c(2L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = c("2018-10-30T20:15:28Z", 
    "2018-10-31T20:15:28Z"), class = "factor"), ENDE = structure(c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_), .Label = character(0), class = "factor"), ADM = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = "3001", class = "factor"), 
    AVG = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    BEZ_GEM = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    BEZ_KRS = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    BEZ_LAN = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Freistaat Bayern", class = "factor"), 
    BEZ_RBZ = structure(c(4L, 5L, 7L, 3L, 2L, 6L, 1L), .Label = c("Mittelfranken", 
    "Niederbayern", "Oberbayern", "Oberfranken", "Oberpfalz", 
    "Schwaben", "Unterfranken"), class = "factor"), SCH = structure(c(4L, 
    3L, 6L, 1L, 2L, 7L, 5L), .Label = c("091", "092", "093", 
    "094", "095", "096", "097"), class = "factor")), data_types = c("C", 
"C", "C", "C", "C", "N", "C", "C", "C", "C", "C", "C", "C", "C", 
"C"), row.names = c("0", "1", "2", "3", "4", "5", "6"), class = "data.frame")


list(dbf = structure(list(LAND = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "BY", class = "factor"), MODELLART = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "Basis-DLM#DTK25", class = "factor"), 
    OBJART = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "75006", class = "factor"), 
    OBJART_TXT = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "AX_Gebiet_Regierungsbezirk", class = "factor"), 
    OBJID = structure(1:7, .Label = c("DEBYBDLMjK0000na", "DEBYBDLMjK0000nb", 
    "DEBYBDLMjK0000nc", "DEBYBDLMjK0000nd", "DEBYBDLMjK0000ne", 
    "DEBYBDLMjK0000nY", "DEBYBDLMjK0000nZ"), class = "factor"), 
    HDU_X = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), BEGINN = structure(c(2L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = c("2018-10-30T20:15:28Z", 
    "2018-10-31T20:15:28Z"), class = "factor"), ENDE = structure(c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_), .Label = character(0), class = "factor"), ADM = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = "3001", class = "factor"), 
    AVG = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    BEZ_GEM = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    BEZ_KRS = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    BEZ_LAN = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Freistaat Bayern", class = "factor"), 
    BEZ_RBZ = structure(c(4L, 5L, 7L, 3L, 2L, 6L, 1L), .Label = c("Mittelfranken", 
    "Niederbayern", "Oberbayern", "Oberfranken", "Oberpfalz", 
    "Schwaben", "Unterfranken"), class = "factor"), SCH = structure(c(4L, 
    3L, 6L, 1L, 2L, 7L, 5L), .Label = c("091", "092", "093", 
    "094", "095", "096", "097"), class = "factor")), class = "data.frame", row.names = c(NA, 
-7L), data_types = c("C", "C", "C", "C", "C", "N", "C", "C", 
"C", "C", "C", "C", "C", "C", "C")), header = FALSE)

The first one is from the shapefile, the second one from the clusterdata (as you requested). I hope this helps!
As you already mentioned, the files are not tables, but how would I work around this? Does the left_join command only work with tables? Is there another command that could work here?
A huge thank you to everyone taking the time to answer this, I highly appreciate your help.

dromano · March 15, 2020, 5:20pm

If you run str(shapefile) and str(cluster_data), you can learn something about the structure (hence str()) of the objects. From this, you can see that shapefile is a data frame (the basic table format in R), and that cluster_data is a list (a generic kind of indexed object, or 'vector', as they're called in R), with the first element of cluster_data being a table itself:

load data

shapefile <- 
structure(list(LAND = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "BY", class = "factor"), 
               MODELLART = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Basis-DLM#DTK25", class = "factor"), 
               OBJART = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "75006", class = "factor"), 
               OBJART_TXT = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "AX_Gebiet_Regierungsbezirk", class = "factor"), 
               OBJID = structure(1:7, .Label = c("DEBYBDLMjK0000na", "DEBYBDLMjK0000nb", 
                                                 "DEBYBDLMjK0000nc", "DEBYBDLMjK0000nd", "DEBYBDLMjK0000ne", 
                                                 "DEBYBDLMjK0000nY", "DEBYBDLMjK0000nZ"), class = "factor"), 
               HDU_X = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), BEGINN = structure(c(2L, 
                                                                           1L, 1L, 1L, 1L, 1L, 1L), .Label = c("2018-10-30T20:15:28Z", 
                                                                                                               "2018-10-31T20:15:28Z"), class = "factor"), ENDE = structure(c(NA_integer_, 
                                                                                                                                                                              NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
                                                                                                                                                                              NA_integer_), .Label = character(0), class = "factor"), ADM = structure(c(1L, 
                                                                                                                                                                                                                                                        1L, 1L, 1L, 1L, 1L, 1L), .Label = "3001", class = "factor"), 
               AVG = structure(c(NA_integer_, NA_integer_, NA_integer_, 
                                 NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
               BEZ_GEM = structure(c(NA_integer_, NA_integer_, NA_integer_, 
                                     NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
               BEZ_KRS = structure(c(NA_integer_, NA_integer_, NA_integer_, 
                                     NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
               BEZ_LAN = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Freistaat Bayern", class = "factor"), 
               BEZ_RBZ = structure(c(4L, 5L, 7L, 3L, 2L, 6L, 1L), .Label = c("Mittelfranken", 
                                                                             "Niederbayern", "Oberbayern", "Oberfranken", "Oberpfalz", 
                                                                             "Schwaben", "Unterfranken"), class = "factor"), SCH = structure(c(4L, 
                                                                                                                                               3L, 6L, 1L, 2L, 7L, 5L), .Label = c("091", "092", "093", 
                                                                                                                                                                                   "094", "095", "096", "097"), class = "factor")), data_types = c("C", 
                                                                                                                                                                                                                                                   "C", "C", "C", "C", "N", "C", "C", "C", "C", "C", "C", "C", "C", 
                                                                                                                                                                                                                                                   "C"), row.names = c("0", "1", "2", "3", "4", "5", "6"), class = "data.frame")


## end of structure
cluster_data <- 
list(dbf = structure(list(LAND = structure(c(1L, 1L, 1L, 1L, 
                                             1L, 1L, 1L), .Label = "BY", class = "factor"), MODELLART = structure(c(1L, 
                                                                                                                    1L, 1L, 1L, 1L, 1L, 1L), .Label = "Basis-DLM#DTK25", class = "factor"), 
                          OBJART = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "75006", class = "factor"), 
                          OBJART_TXT = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "AX_Gebiet_Regierungsbezirk", class = "factor"), 
                          OBJID = structure(1:7, .Label = c("DEBYBDLMjK0000na", "DEBYBDLMjK0000nb", 
                                                            "DEBYBDLMjK0000nc", "DEBYBDLMjK0000nd", "DEBYBDLMjK0000ne", 
                                                            "DEBYBDLMjK0000nY", "DEBYBDLMjK0000nZ"), class = "factor"), 
                          HDU_X = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), BEGINN = structure(c(2L, 
                                                                                      1L, 1L, 1L, 1L, 1L, 1L), .Label = c("2018-10-30T20:15:28Z", 
                                                                                                                          "2018-10-31T20:15:28Z"), class = "factor"), ENDE = structure(c(NA_integer_, 
                                                                                                                                                                                         NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
                                                                                                                                                                                         NA_integer_), .Label = character(0), class = "factor"), ADM = structure(c(1L, 
                                                                                                                                                                                                                                                                   1L, 1L, 1L, 1L, 1L, 1L), .Label = "3001", class = "factor"), 
                          AVG = structure(c(NA_integer_, NA_integer_, NA_integer_, 
                                            NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
                          BEZ_GEM = structure(c(NA_integer_, NA_integer_, NA_integer_, 
                                                NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
                          BEZ_KRS = structure(c(NA_integer_, NA_integer_, NA_integer_, 
                                                NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
                          BEZ_LAN = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Freistaat Bayern", class = "factor"), 
                          BEZ_RBZ = structure(c(4L, 5L, 7L, 3L, 2L, 6L, 1L), .Label = c("Mittelfranken", 
                                                                                        "Niederbayern", "Oberbayern", "Oberfranken", "Oberpfalz", 
                                                                                        "Schwaben", "Unterfranken"), class = "factor"), SCH = structure(c(4L, 
                                                                                                                                                          3L, 6L, 1L, 2L, 7L, 5L), .Label = c("091", "092", "093", 
                                                                                                                                                                                              "094", "095", "096", "097"), class = "factor")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                   -7L), data_types = c("C", "C", "C", "C", "C", "N", "C", "C", 
                                                                                                                                                                                                                                                                                                        "C", "C", "C", "C", "C", "C", "C")), header = FALSE)

# end of structure

inspect structure of data

# load very useful package, defines '%>%' ('then apply') , or 'pipe', syntax
library(tidyverse)
shapefile %>% str()
#> 'data.frame':    7 obs. of  15 variables:
#>  $ LAND      : Factor w/ 1 level "BY": 1 1 1 1 1 1 1
#>  $ MODELLART : Factor w/ 1 level "Basis-DLM#DTK25": 1 1 1 1 1 1 1
#>  $ OBJART    : Factor w/ 1 level "75006": 1 1 1 1 1 1 1
#>  $ OBJART_TXT: Factor w/ 1 level "AX_Gebiet_Regierungsbezirk": 1 1 1 1 1 1 1
#>  $ OBJID     : Factor w/ 7 levels "DEBYBDLMjK0000na",..: 1 2 3 4 5 6 7
#>  $ HDU_X     : int  0 0 0 0 0 0 0
#>  $ BEGINN    : Factor w/ 2 levels "2018-10-30T20:15:28Z",..: 2 1 1 1 1 1 1
#>  $ ENDE      : Factor w/ 0 levels: NA NA NA NA NA NA NA
#>  $ ADM       : Factor w/ 1 level "3001": 1 1 1 1 1 1 1
#>  $ AVG       : Factor w/ 0 levels: NA NA NA NA NA NA NA
#>  $ BEZ_GEM   : Factor w/ 0 levels: NA NA NA NA NA NA NA
#>  $ BEZ_KRS   : Factor w/ 0 levels: NA NA NA NA NA NA NA
#>  $ BEZ_LAN   : Factor w/ 1 level "Freistaat Bayern": 1 1 1 1 1 1 1
#>  $ BEZ_RBZ   : Factor w/ 7 levels "Mittelfranken",..: 4 5 7 3 2 6 1
#>  $ SCH       : Factor w/ 7 levels "091","092","093",..: 4 3 6 1 2 7 5
#>  - attr(*, "data_types")= chr  "C" "C" "C" "C" ...
cluster_data %>% str()
#> List of 2
#>  $ dbf   :'data.frame':  7 obs. of  15 variables:
#>   ..$ LAND      : Factor w/ 1 level "BY": 1 1 1 1 1 1 1
#>   ..$ MODELLART : Factor w/ 1 level "Basis-DLM#DTK25": 1 1 1 1 1 1 1
#>   ..$ OBJART    : Factor w/ 1 level "75006": 1 1 1 1 1 1 1
#>   ..$ OBJART_TXT: Factor w/ 1 level "AX_Gebiet_Regierungsbezirk": 1 1 1 1 1 1 1
#>   ..$ OBJID     : Factor w/ 7 levels "DEBYBDLMjK0000na",..: 1 2 3 4 5 6 7
#>   ..$ HDU_X     : int [1:7] 0 0 0 0 0 0 0
#>   ..$ BEGINN    : Factor w/ 2 levels "2018-10-30T20:15:28Z",..: 2 1 1 1 1 1 1
#>   ..$ ENDE      : Factor w/ 0 levels: NA NA NA NA NA NA NA
#>   ..$ ADM       : Factor w/ 1 level "3001": 1 1 1 1 1 1 1
#>   ..$ AVG       : Factor w/ 0 levels: NA NA NA NA NA NA NA
#>   ..$ BEZ_GEM   : Factor w/ 0 levels: NA NA NA NA NA NA NA
#>   ..$ BEZ_KRS   : Factor w/ 0 levels: NA NA NA NA NA NA NA
#>   ..$ BEZ_LAN   : Factor w/ 1 level "Freistaat Bayern": 1 1 1 1 1 1 1
#>   ..$ BEZ_RBZ   : Factor w/ 7 levels "Mittelfranken",..: 4 5 7 3 2 6 1
#>   ..$ SCH       : Factor w/ 7 levels "091","092","093",..: 4 3 6 1 2 7 5
#>   ..- attr(*, "data_types")= chr [1:15] "C" "C" "C" "C" ...
#>  $ header: logi FALSE

# three equivalent ways to extract the data frame from cluster_data;
# run only one of them, since each overwrites cluster_data
cluster_data <- cluster_data$dbf
# cluster_data <- cluster_data[['dbf']]
# cluster_data <- cluster_data[[1]] # for 1st element of list

# compare contents of the two data frames
shapefile == cluster_data  # comparisons involving NA produce NAs

Created on 2020-03-15 by the reprex package (v0.3.0)

From this, you can see the two tables are likely identical, so it's not clear what the purpose of the left join in your code is -- do you know what the goal is?
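For reference, here is a minimal sketch of what the join could look like once the data frame has been pulled out of the list. The tables are toy stand-ins, and the `cluster` column and its values are hypothetical; only the key column name BEZ_RBZ comes from the thread:

```r
library(dplyr)

# Toy stand-in for the shapefile's attribute table
shape_attrs <- data.frame(
  BEZ_RBZ = c("Oberbayern", "Schwaben", "Oberpfalz"),
  SCH     = c("091", "097", "093"),
  stringsAsFactors = FALSE
)

# A list shaped like the object in the thread: the table sits in $dbf
cluster_list <- list(
  dbf = data.frame(
    BEZ_RBZ = c("Oberbayern", "Schwaben"),
    cluster = c(1L, 2L),  # hypothetical extra column from the survey
    stringsAsFactors = FALSE
  ),
  header = FALSE
)

# Extract the data frame first; the join needs tables on both sides
cluster_df <- cluster_list$dbf

# 'by' takes plain column names, not extraction expressions
joined <- left_join(shape_attrs, cluster_df, by = "BEZ_RBZ")
print(joined)
```

Rows of the left table without a match keep their place and get NA in the new column. left_join() also keeps the row order of the left table, which matters if the result is assigned back to shapefile@data.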

Gluecksritter · March 15, 2020, 6:07pm

So, this was just an example I put together to show what the problem is.
In the actual files I'm working on, there is information in the clusterdata that is not in the shapefile. In our survey we coded clusters of where people live, and these clusters (hence the name clusterdata) are not in the shapefile, because the shapefile came to us from another municipal office of statistics.
Maybe the example was not the best one, because here the clusterdata contains no extra information that is not in the shapefile. But freely accessible shapefiles are not easy to find.
My professor did the whole thing with clusterdata in .csv format (which makes things far easier, as this is always read in as a table). But I can't get that file, so I'm stuck with the .dbf file, and it has been causing problems all along.
I also haven't managed to convert the .dbf file into a .csv file. A lot of information gets lost whenever I try; mostly I just end up with a file with a single variable that holds all the information.
But converting a .dbf file into a .csv file is not a subject for the RStudio Community, so I wanted to know if there is a way of getting it done while sticking with the .dbf file.
Long story short: in the actual file there is extra info in the clusterdata, so that's what the left_join is for.
But as mentioned before, I'm just trying to wrap my head around the template my prof made, and I'm having a lot of trouble with it.
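On sticking with the .dbf file: the list with `dbf` and `header` elements, together with the `header = FALSE` argument, suggests that the read.dbf() in the original code came from the shapefiles package, which was attached after foreign and so masks its function of the same name. The sketch below assumes that foreign::read.dbf() is acceptable instead; it returns a data frame directly. The table contents and file paths are hypothetical and created in a temporary directory so the example is self-contained:

```r
library(foreign)

# Hypothetical small table, written to a temporary .dbf for demonstration
dbf_path <- tempfile(fileext = ".dbf")
write.dbf(
  data.frame(BEZ_RBZ = c("Oberbayern", "Schwaben"),
             cluster = c(1L, 2L)),
  dbf_path
)

# The explicit package prefix avoids picking up a same-named function
# from another attached package
cluster_df <- foreign::read.dbf(dbf_path, as.is = TRUE)
is.data.frame(cluster_df)

# Optional: export to CSV, one column per variable
csv_path <- tempfile(fileext = ".csv")
write.csv(cluster_df, csv_path, row.names = FALSE)
names(read.csv(csv_path))
```

With a data frame in hand, no CSV conversion is needed for the join; the export step is only there in case a .csv copy is wanted.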

dromano · March 15, 2020, 6:11pm

In that case, could you use dput() to post samples of your actual data? That would help in recreating your situation as well as in illustrating the issues and tools involved.

Gluecksritter · March 16, 2020, 10:04am

@dromano:
Thanks a lot for your help and for your willingness to understand what the actual problem is. That helped me find out that I don't really have a problem with RStudio. I seem to have a problem with my eyes; maybe I need glasses, or just a break from work...
I've been using the wrong file the whole time. It never occurred to me that I had given two files very similar names. I just checked everything I did again, and now I've found the source of all evil. It was me all along.
So, with the correct file everything is running smoothly, no problems whatsoever. Sorry for stealing your time, especially on a Sunday.
I have no idea how that slipped past me so many times, but thanks to your questions I discovered my mistake. Of course, the first thing I did was rename the other file.
