Error chaining two joins - tidyjson

Hello,

I am running into an issue that I cannot understand.

I am reorganizing some json data from an API into a tibble I can work with.
I created 3 tibbles: income, income_titles, header

> head(income)
# A tibble: 6 x 4
  ..JSON                   document.id array.id Value                              
  <chr>                          <int>    <int> <chr>                              
1 "{\"Value\":\"Sales..."            1        1 Sales - 1% benefit from com clients
2 "{\"Value\":\"0.00\"..."           1        2 0.00                               
3 "{\"Value\":\"14.93..."            1        3 14.93                              
4 "{\"Value\":\"15.91..."            1        4 15.91                              
5 "{\"Value\":\"6.41\"..."           1        5 6.41                               
6 "{\"Value\":\"39.84..."            1        6 39.84                              
> head(header)
# A tibble: 6 x 3
  ..JSON                  array.id Month    
  <chr>                      <int> <chr>    
1 "{\"Value\":\"30 Ju..."        2 30 Jun 21
2 "{\"Value\":\"30 Ma..."        3 30 May 21
3 "{\"Value\":\"30 Ap..."        4 30 Apr 21
4 "{\"Value\":\"30 Ma..."        5 30 Mar 21
5 "{\"Value\":\"28 Fe..."        6 28 Feb 21
6 "{\"Value\":\"30 Ja..."        7 30 Jan 21
> head(income_titles)
# A tibble: 6 x 3
  ..JSON                  document.id rowtitle                                 
  <chr>                         <int> <chr>                                    
1 "{\"Value\":\"Sales..."           1 Sales - 1% benefit from com clients      
2 "{\"Value\":\"Sales..."           2 Sales - Buying office                    
3 "{\"Value\":\"Sales..."           3 Sales - Commission from Suppliers        
4 "{\"Value\":\"Sales..."           4 Sales - Incubation - Hosting & management
5 "{\"Value\":\"Sales..."           5 Sales - Incubation - Operation expenses  
6 "{\"Value\":\"Sales..."           6 Sales - Incubation - Other projects      

I want to join income_titles and header to income.

> head(income %>% left_join(header))
Joining, by = "array.id"
# A tibble: 6 x 4
  document.id array.id Value                               Month    
        <int>    <int> <chr>                               <chr>    
1           1        1 Sales - 1% benefit from com clients NA       
2           1        2 0.00                                30 Jun 21
3           1        3 14.93                               30 May 21
4           1        4 15.91                               30 Apr 21
5           1        5 6.41                                30 Mar 21
6           1        6 39.84                               28 Feb 21
> head(income %>% left_join(income_titles))
Joining, by = "document.id"
# A tibble: 6 x 4
  document.id array.id Value                               rowtitle                           
        <int>    <int> <chr>                               <chr>                              
1           1        1 Sales - 1% benefit from com clients Sales - 1% benefit from com clients
2           1        2 0.00                                Sales - 1% benefit from com clients
3           1        3 14.93                               Sales - 1% benefit from com clients
4           1        4 15.91                               Sales - 1% benefit from com clients
5           1        5 6.41                                Sales - 1% benefit from com clients
6           1        6 39.84                               Sales - 1% benefit from com clients

But I can't join with both...

> income %>% left_join(income_titles) %>% left_join(header)
Joining, by = "document.id"
Joining, by = "array.id"
Error: Can't subset columns that don't exist.
x Location 3 doesn't exist.
ℹ There are only 2 columns.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x Location 3 doesn't exist.
ℹ There are only 2 columns.
Backtrace:
  1. income %>% left_join(income_titles) %>% left_join(header)
  3. dplyr:::left_join.data.frame(., header)
  4. dplyr:::join_mutate(...)
  7. tibble:::`[.tbl_df`(y_in, vars$y$out)
  8. tibble:::vectbl_as_col_location(...)
 11. vctrs::vec_as_location(j, n, names)
 13. vctrs:::stop_subscript_oob(...)
 14. vctrs:::stop_subscript(...)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x Location 3 doesn't exist.
ℹ There are only 2 columns.
Backtrace:
     █
  1. ├─income %>% left_join(income_titles) %>% left_join(header)
  2. ├─dplyr::left_join(., header)
  3. └─dplyr:::left_join.data.frame(., header)
  4.   └─dplyr:::join_mutate(...)
  5.     ├─rlang::set_names(y_in[vars$y$out], names(vars$y$out))
  6.     ├─y_in[vars$y$out]
  7.     └─tibble:::`[.tbl_df`(y_in, vars$y$out)
  8.       └─tibble:::vectbl_as_col_location(...)
  9.         ├─tibble:::subclass_col_index_errors(...)
 10.         │ └─base::withCallingHandlers(...)
 11.         └─vctrs::vec_as_location(j, n, names)
 12.           └─(function () ...
 13.             └─vctrs:::stop_subscript_oob(...)
 14.               └─vctrs:::stop_subscript(...)

I am stuck and can't figure this one out. Any ideas?

catkfr

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

I agree with nirgrahamuk that having more concrete examples will be much more helpful.

As I don't have more info about the data, one thing that comes to mind is how you join them without explicitly tell what to join. One thing that I'd try is to explicitly write what you want them to join on such as

final_data_frame <- left_data_frame %>%
left_join(
right_data_frame,
by = c('column_from_left_df' = 'column_from_right_df'))

Also, inspecting the intermediate process might also help: some column might change name during the intermediate join or whatever that might happen.

In any case, providing some sample data frames so that I can recreate would be helpful.

Dear both,

I created the following reprex based on my data.

library(tidyverse)

income_rep <-
  structure(
    list(
      document.id = c(1L, 1L, 1L, 1L, 1L, 1L),
      array.id = 1:6,
      Value = c(
        "Sales - 1% benefit from com clients",
        "0.00",
        "14.93",
        "15.91",
        "6.41",
        "39.84"
      ),
      ..JSON = list(
        list(Value = "Sales - 1% benefit from com clients",
             Attributes = list(
               list(Value = "c62dc423-f365-4acf-a594-5c82d9b8d272",
                    Id = "account")
             )),
        list(Value = "0.00", Attributes = list(
          list(Value = "c62dc423-f365-4acf-a594-5c82d9b8d272",
               Id = "account")
        )),
        list(Value = "14.93", Attributes = list(
          list(Value = "c62dc423-f365-4acf-a594-5c82d9b8d272",
               Id = "account")
        )),
        list(Value = "15.91", Attributes = list(
          list(Value = "c62dc423-f365-4acf-a594-5c82d9b8d272",
               Id = "account")
        )),
        list(Value = "6.41", Attributes = list(
          list(Value = "c62dc423-f365-4acf-a594-5c82d9b8d272",
               Id = "account")
        )),
        list(Value = "39.84", Attributes = list(
          list(Value = "c62dc423-f365-4acf-a594-5c82d9b8d272",
               Id = "account")
        ))
      )
    ),
    row.names = c(NA, 6L),
    class = c("tbl_json",
              "tbl_df", "tbl", "data.frame")
  )
income_titles_rep <-
  structure(
    list(
      document.id = 1:6,
      rowtitle = c(
        "Sales - 1% benefit from com clients",
        "Sales - Buying office",
        "Sales - Commission from Suppliers",
        "Sales - Incubation - Hosting & management",
        "Sales - Incubation - Operation expenses",
        "Sales - Incubation - Other projects"
      ),
      ..JSON = list(
        list(Value = "Sales - 1% benefit from com clients",
             Attributes = list(
               list(Value = "c62dc423-f365-4acf-a594-5c82d9b8d272",
                    Id = "account")
             )),
        list(Value = "Sales - Buying office",
             Attributes = list(
               list(Value = "0e15a197-cee1-43e3-948e-6f066cf96718",
                    Id = "account")
             )),
        list(Value = "Sales - Commission from Suppliers",
             Attributes = list(
               list(Value = "9554f275-d867-4142-9297-a0be6e3e5916",
                    Id = "account")
             )),
        list(Value = "Sales - Incubation - Hosting & management",
             Attributes = list(
               list(Value = "45a2030d-f64b-47c0-b576-4b3ca1ce298f",
                    Id = "account")
             )),
        list(Value = "Sales - Incubation - Operation expenses",
             Attributes = list(
               list(Value = "edc92a15-d1e2-42a0-b05e-3e4a27641502",
                    Id = "account")
             )),
        list(Value = "Sales - Incubation - Other projects",
             Attributes = list(
               list(Value = "eca48138-950b-4d73-ab3f-91b022f96b0f",
                    Id = "account")
             ))
      )
    ),
    row.names = c(NA, 6L),
    class = c("tbl_json",
              "tbl_df", "tbl", "data.frame")
  )

header_rep <-
  structure(
    list(
      array.id = 2:7,
      Month = c(
        "30 Jun 21",
        "30 May 21",
        "30 Apr 21",
        "30 Mar 21",
        "28 Feb 21",
        "30 Jan 21"
      ),
      ..JSON = list(
        list(Value = "30 Jun 21"),
        list(Value = "30 May 21"),
        list(Value = "30 Apr 21"),
        list(Value = "30 Mar 21"),
        list(Value = "28 Feb 21"),
        list(Value = "30 Jan 21")
      )
    ),
    row.names = c(NA,
                  6L),
    class = c("tbl_json", "tbl_df", "tbl", "data.frame")
  )

# 
income_rep %>% left_join(income_titles_rep, by = "document.id")
income_rep %>% left_join(income_titles_rep, by = "document.id") %>% left_join(header_rep, by = "array.id")
income_rep %>% left_join(header_rep, by = "array.id")
income_rep %>% left_join(header_rep, by = "array.id") %>% left_join(income_titles_rep, by = "document.id")

library(tidyjson)

income_rep %>% left_join(income_titles_rep, by = "document.id")
income_rep %>% left_join(income_titles_rep, by = "document.id") %>% left_join(header_rep, by = "array.id")
income_rep %>% left_join(header_rep, by = "array.id")
income_rep %>% left_join(header_rep, by = "array.id") %>% left_join(income_titles_rep, by = "document.id")

Building this reprex, it now seems as though the issue comes from a bug in tidyjson.

Before loading tidyjson, the pure tidyverse version works as expected. I just need to get rid of the the json columns which I haven't been able to do.

How could I drop the tbl_json format of these objects to create a simple tibble?

It's really tidyjson which is not working as I would expect.
I have found a solution using as_tibble() which is acceptable for what I want to do.

income_rep_tbl <- income_rep %>% as_tibble()
income_titles_rep_tbl <- income_titles_rep %>% as_tibble()
header_rep_tbl <- header_rep %>% as_tibble()

income_rep_tbl %>% left_join(income_titles_rep_tbl, by = "document.id")
income_rep_tbl %>% left_join(income_titles_rep_tbl, by = "document.id") %>% left_join(header_rep_tbl, by = "array.id")
income_rep_tbl %>% left_join(header_rep_tbl, by = "array.id")
income_rep_tbl %>% left_join(header_rep_tbl, by = "array.id") %>% left_join(income_titles_rep_tbl, by = "document.id")

income_rep_tbl %>% left_join(income_titles_rep_tbl)
income_rep_tbl %>% left_join(income_titles_rep_tbl) %>% left_join(header_rep_tbl)
income_rep_tbl %>% left_join(header_rep_tbl)
income_rep_tbl %>% left_join(header_rep_tbl) %>% left_join(income_titles_rep_tbl)

All of this works as expected.

1 Like

Issue filed tidyjson github (Error chaining two joins with tidyjson · Issue #63 · sailthru/tidyjson · GitHub) and linked back here

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.