Attempting to convert 16 column bike-share spreadsheet to 8 columns to make ready for Tableau

Have been working several hours on converting 16 column spreadsheet into 8 (doubling the number of rows) for bike-share routes to be on two adjacent rows rather than a single row. The first 8 column header names are identical to the last 8, so I wonder: what needs to be changed about the spreadsheet? Here's the last few code snippets and the resulting error messages: new_data <- bike_routes %>%

  • separate_rows(c(trip_id, subscriber_type, start_time, ride_length, station_name, latitude, longitude, origin_destination), sep = ",")
    

Error in separate_rows():
! Can't subset columns that don't exist.
:heavy_multiplication_x: Column trip_id doesn't exist.
Run rlang::last_trace() to see where the error occurred.

rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in separate_rows():
! Can't subset columns that don't exist.
:heavy_multiplication_x: Column trip_id doesn't exist.


Backtrace:

  1. ├─bike_routes %>% ...
  2. ├─tidyr::separate_rows(...)
  3. └─tidyr:::separate_rows.data.frame(...)
    Run rlang::last_trace(drop = FALSE) to see 21 hidden frames.

new_data <- bike_routes %>%

  • separate_rows(c("trip_id", "subscriber_type", "start_time", "ride_length", "station_name", "latitude", "longitude", "origin_destination"), sep = ",")
    

Error in separate_rows():
! Can't subset columns that don't exist.
:heavy_multiplication_x: Column trip_id doesn't exist.
Run rlang::last_trace() to see where the error occurred.

rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in separate_rows():
! Can't subset columns that don't exist.
:heavy_multiplication_x: Column trip_id doesn't exist.


Backtrace:

  1. ├─bike_routes %>% ...
  2. ├─tidyr::separate_rows(...)
  3. └─tidyr:::separate_rows.data.frame(...)
    Run rlang::last_trace(drop = FALSE) to see 21 hidden frames.

new_data <- bike_routes %>%

  • separate_rows(c(1:8), sep = ",")
    

Error in list_unchop():
! Can't combine x[[1]] and x[[649232]] .
Run rlang::last_trace() to see where the error occurred.

rlang::last_trace()
<error/vctrs_error_ptype2>
Error in list_unchop():
! Can't combine x[[1]] and x[[649232]] .


Backtrace:

  1. ├─bike_routes %>% separate_rows(c(1:8), sep = ",")
  2. ├─tidyr::separate_rows(., c(1:8), sep = ",")
  3. └─tidyr:::separate_rows.data.frame(., c(1:8), sep = ",")
  4. └─tidyr::unchop(as_tibble(out), any_of(vars), error_call = current_env())
  5. └─tidyr:::df_unchop(...)
    
  6.   └─vctrs::list_unchop(col, ptype = col_ptype)
    

Run rlang::last_trace(drop = FALSE) to see 9 hidden frames.

Convert columns to character type before separating rows

bike_routes[, 1:8] <- lapply(bike_routes[, 1:8], as.character)

Separate rows based on columns 1-8 (trip_id to origin_destination)

new_data <- bike_routes %>%

  • separate_rows(c(1:8), sep = ",")
    

Is that the issue? The column doesn't exist?

Since the column DOES exist, I don't know what the issue is.

Please post a few rows of your data. Run

dput(head(bike_routes))

and post the output between lines with three back ticks, like this
```
output of dput() goes here
```

Here's the first few lines of the output from running that "dput" command:

structure(list(trip_id...1 = c(9900082882, 12617682, 9075366, 
9900319298, 14468597, 9900153369), subscriber_type...2 = c("Walk Up", 
"Local365", "Local365", "24-Hour Kiosk (Austin B-cycle)", "Walk Up", 
"Local30"), start_time...3 = c("3/19/2015 19:12", "10/30/2016 2:06", 
"3/11/2016 16:28", "11/23/2014 15:12", "4/16/2017 15:39", "5/29/2015 15:12"
), ride_length...4 = c(41, 6, 13, 80, 25, 29), station_name...5 = c("Waller & 6th St.", 
"2nd & Congress", "Bullock Museum @ Congress & MLK", "Capitol Station / Congress & 11th", 
"Rainey St @ Cummings", "17th & Guadalupe"), latitude...6 = c(30.26461, 
30.26408, 30.28039, 30.2726, 30.25579, 30.27974), longitude...7 = c(-97.73049, 
-97.74355, -97.73809, -97.74127, -97.73982, -97.74254), origin_destination...8 = c("Origin", 
"Origin", "Origin", "Origin", "Origin", "Origin"), trip_id...9 = c(9900082882,

Please post all the output of dput(). The partial output() isn't useful.

> dput(head(austin_bikes))
structure(list(trip_id...1 = c(9900082882, 12617682, 9075366, 
9900319298, 14468597, 9900153369), subscriber_type...2 = c("Walk Up", 
"Local365", "Local365", "24-Hour Kiosk (Austin B-cycle)", "Walk Up", 
"Local30"), start_time...3 = c("3/19/2015 19:12", "10/30/2016 2:06", 
"3/11/2016 16:28", "11/23/2014 15:12", "4/16/2017 15:39", "5/29/2015 15:12"
), ride_length...4 = c(41, 6, 13, 80, 25, 29), station_name...5 = c("Waller & 6th St.", 
"2nd & Congress", "Bullock Museum @ Congress & MLK", "Capitol Station / Congress & 11th", 
"Rainey St @ Cummings", "17th & Guadalupe"), latitude...6 = c(30.26461, 
30.26408, 30.28039, 30.2726, 30.25579, 30.27974), longitude...7 = c(-97.73049, 
-97.74355, -97.73809, -97.74127, -97.73982, -97.74254), origin_destination...8 = c("Origin", 
"Origin", "Origin", "Origin", "Origin", "Origin"), trip_id...9 = c(9900082882, 
12617682, 9075366, 9900319298, 14468597, 9900153369), subscriber_type...10 = c("Walk Up", 
"Local365", "Local365", "24-Hour Kiosk (Austin B-cycle)", "Walk Up", 
"Local30"), start_time...11 = c("3/19/2015 19:12", "10/30/2016 2:06", 
"3/11/2016 16:28", "11/23/2014 15:12", "4/16/2017 15:39", "5/29/2015 15:12"
), ride_length...12 = c(41, 6, 13, 80, 25, 29), station_name...13 = c("Trinity & 6th Street", 
"South Congress & Academy", "Convention Center / 4th St. @ MetroRail", 
"Toomey Rd @ South Lamar", "MoPac Pedestrian Bridge @ Veterans Drive", 
"West & 6th St."), latitude...14 = c(30.26735, 30.25226, 30.26483, 
30.26304, 30.27466, 30.27041), longitude...15 = c(-97.73933, 
-97.74854, -97.739, -97.75824, -97.77028, -97.75046), origin_destination...16 = c("Destination", 
"Destination", "Destination", "Destination", "Destination", "Destination"
)), spec = structure(list(cols = list(trip_id...1 = structure(list(), class = c("collector_double", 
"collector")), subscriber_type...2 = structure(list(), class = c("collector_character", 
"collector")), start_time...3 = structure(list(), class = c("collector_character", 
"collector")), ride_length...4 = structure(list(), class = c("collector_double", 
"collector")), station_name...5 = structure(list(), class = c("collector_character", 
"collector")), latitude...6 = structure(list(), class = c("collector_double", 
"collector")), longitude...7 = structure(list(), class = c("collector_double", 
"collector")), origin_destination...8 = structure(list(), class = c("collector_character", 
"collector")), trip_id...9 = structure(list(), class = c("collector_double", 
"collector")), subscriber_type...10 = structure(list(), class = c("collector_character", 
"collector")), start_time...11 = structure(list(), class = c("collector_character", 
"collector")), ride_length...12 = structure(list(), class = c("collector_double", 
"collector")), station_name...13 = structure(list(), class = c("collector_character", 
"collector")), latitude...14 = structure(list(), class = c("collector_double", 
"collector")), longitude...15 = structure(list(), class = c("collector_double", 
"collector")), origin_destination...16 = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), problems = <pointer: (nil)>, row.names = c(NA, 
6L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))

The column trip_id does not exist. There are columns called trip_id...1 and a trip_id...9. I split the data base into two, storing columns 1 - 8 in one and columns 9 - 16 in another. I changed the column names to drop the numeric suffixes, I then combined the two and sorted by trip_id.

DF <- structure(list(trip_id...1 = c(9900082882, 12617682, 9075366, 
                               9900319298, 14468597, 9900153369), 
               subscriber_type...2 = c("Walk Up", "Local365", "Local365", 
                                       "24-Hour Kiosk (Austin B-cycle)", "Walk Up", "Local30"), 
               start_time...3 = c("3/19/2015 19:12", "10/30/2016 2:06", "3/11/2016 16:28", "11/23/2014 15:12", 
                                  "4/16/2017 15:39", "5/29/2015 15:12"), 
               ride_length...4 = c(41, 6, 13, 80, 25, 29), 
               station_name...5 = c("Waller & 6th St.", "2nd & Congress", "Bullock Museum @ Congress & MLK", 
                                    "Capitol Station / Congress & 11th", "Rainey St @ Cummings", "17th & Guadalupe"), 
               latitude...6 = c(30.26461, 30.26408, 30.28039, 30.2726, 30.25579, 30.27974), 
               longitude...7 = c(-97.73049, -97.74355, -97.73809, -97.74127, -97.73982, -97.74254), 
               origin_destination...8 = c("Origin", "Origin", "Origin", "Origin", "Origin", "Origin"), 
               trip_id...9 = c(9900082882, 12617682, 9075366, 9900319298, 14468597, 9900153369), 
               subscriber_type...10 = c("Walk Up", "Local365", "Local365", "24-Hour Kiosk (Austin B-cycle)", "Walk Up","Local30"), 
               start_time...11 = c("3/19/2015 19:12", "10/30/2016 2:06", "3/11/2016 16:28", "11/23/2014 15:12", "4/16/2017 15:39", "5/29/2015 15:12"), 
               ride_length...12 = c(41, 6, 13, 80, 25, 29), 
               station_name...13 = c("Trinity & 6th Street", "South Congress & Academy", "Convention Center / 4th St. @ MetroRail", 
                                     "Toomey Rd @ South Lamar", "MoPac Pedestrian Bridge @ Veterans Drive", "West & 6th St."), 
               latitude...14 = c(30.26735, 30.25226, 30.26483, 30.26304, 30.27466, 30.27041), 
               longitude...15 = c(-97.73933, -97.74854, -97.739, -97.75824, -97.77028, -97.75046), 
               origin_destination...16 = c("Destination", "Destination", "Destination", "Destination", "Destination", "Destination")), 
          spec = structure(list(cols = list(trip_id...1 = structure(list(), class = c("collector_double", "collector")), 
                                            subscriber_type...2 = structure(list(), class = c("collector_character", "collector")), 
                                            start_time...3 = structure(list(), class = c("collector_character", "collector")), 
                                            ride_length...4 = structure(list(), class = c("collector_double", "collector")), 
                                            station_name...5 = structure(list(), class = c("collector_character", "collector")), 
                                            latitude...6 = structure(list(), class = c("collector_double", "collector")), 
                                            longitude...7 = structure(list(), class = c("collector_double", "collector")), 
                                            origin_destination...8 = structure(list(), class = c("collector_character", "collector")), 
                                            trip_id...9 = structure(list(), class = c("collector_double", "collector")), 
                                            subscriber_type...10 = structure(list(), class = c("collector_character", "collector")), 
                                            start_time...11 = structure(list(), class = c("collector_character", "collector")), 
                                            ride_length...12 = structure(list(), class = c("collector_double", "collector")), 
                                            station_name...13 = structure(list(), class = c("collector_character", "collector")), 
                                            latitude...14 = structure(list(), class = c("collector_double", "collector")), 
                                            longitude...15 = structure(list(), class = c("collector_double", "collector")), 
                                            origin_destination...16 = structure(list(), class = c("collector_character", "collector"))), 
                                default = structure(list(), class = c("collector_guess", "collector")), 
                                delim = ","), class = "col_spec"),  row.names = c(NA, 6L), 
          class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
colnames(DF)
#>  [1] "trip_id...1"             "subscriber_type...2"    
#>  [3] "start_time...3"          "ride_length...4"        
#>  [5] "station_name...5"        "latitude...6"           
#>  [7] "longitude...7"           "origin_destination...8" 
#>  [9] "trip_id...9"             "subscriber_type...10"   
#> [11] "start_time...11"         "ride_length...12"       
#> [13] "station_name...13"       "latitude...14"          
#> [15] "longitude...15"          "origin_destination...16"
DF1 <- DF[, 1:8]
DF2 <- DF[, 9:16]
colnames(DF1) <- sub(pattern = "\\.{3}\\d{1,2}",replacement = "",colnames(DF1))
colnames(DF2) <- sub(pattern = "\\.{3}\\d{1,2}",replacement = "",colnames(DF2))
DFnew <- rbind(DF1, DF2)
DFnew <- DFnew[order(DFnew$trip_id), ]

DFnew
#>       trip_id                subscriber_type       start_time ride_length
#> 3     9075366                       Local365  3/11/2016 16:28          13
#> 9     9075366                       Local365  3/11/2016 16:28          13
#> 2    12617682                       Local365  10/30/2016 2:06           6
#> 8    12617682                       Local365  10/30/2016 2:06           6
#> 5    14468597                        Walk Up  4/16/2017 15:39          25
#> 11   14468597                        Walk Up  4/16/2017 15:39          25
#> 1  9900082882                        Walk Up  3/19/2015 19:12          41
#> 7  9900082882                        Walk Up  3/19/2015 19:12          41
#> 6  9900153369                        Local30  5/29/2015 15:12          29
#> 12 9900153369                        Local30  5/29/2015 15:12          29
#> 4  9900319298 24-Hour Kiosk (Austin B-cycle) 11/23/2014 15:12          80
#> 10 9900319298 24-Hour Kiosk (Austin B-cycle) 11/23/2014 15:12          80
#>                                station_name latitude longitude
#> 3           Bullock Museum @ Congress & MLK 30.28039 -97.73809
#> 9   Convention Center / 4th St. @ MetroRail 30.26483 -97.73900
#> 2                            2nd & Congress 30.26408 -97.74355
#> 8                  South Congress & Academy 30.25226 -97.74854
#> 5                      Rainey St @ Cummings 30.25579 -97.73982
#> 11 MoPac Pedestrian Bridge @ Veterans Drive 30.27466 -97.77028
#> 1                          Waller & 6th St. 30.26461 -97.73049
#> 7                      Trinity & 6th Street 30.26735 -97.73933
#> 6                          17th & Guadalupe 30.27974 -97.74254
#> 12                           West & 6th St. 30.27041 -97.75046
#> 4         Capitol Station / Congress & 11th 30.27260 -97.74127
#> 10                  Toomey Rd @ South Lamar 30.26304 -97.75824
#>    origin_destination
#> 3              Origin
#> 9         Destination
#> 2              Origin
#> 8         Destination
#> 5              Origin
#> 11        Destination
#> 1              Origin
#> 7         Destination
#> 6              Origin
#> 12        Destination
#> 4              Origin
#> 10        Destination

Created on 2024-02-29 with reprex v2.0.2

Hello - as a new R user, I've no understanding of what you did, though I know you explained - at your level. I copied and pasted your work into R Studio console and ran it, then viewed the 6 rows that resulted and found that the data is still in the wide format, rather than the long. I'd like to separate the origin and destination data into two adjacent rows to use it in Tableau. Furthermore, when I uploaded the spreadsheet before, it had 16 column names, all of which were in duplicate (i.e. trip_id, subscriber_type, etc were all listed twice, once each for origin and once each for destination). So when the return said "doesn't exist", I was trying to say that it existed when I ran the readr command. My question now is how to name the spreadsheet columns to get the data ready for use in Tableau - do the 16 column names need to be distinct (such as trip_id, trip_id1, subscriber_type, subscriber_type1) and do the data types need to formatted so as to be readable by R Studio?

I think you didn't run all of my code. There is the initial structure() function that simply makes the example data set of 6 rows. That is what you quote in your most recent post. After that in my previous post there are more lines, which I quoted below with additional comments.

colnames(DF)
#>  [1] "trip_id...1"             "subscriber_type...2"    
#>  [3] "start_time...3"          "ride_length...4"        
#>  [5] "station_name...5"        "latitude...6"           
#>  [7] "longitude...7"           "origin_destination...8" 
#>  [9] "trip_id...9"             "subscriber_type...10"   
#> [11] "start_time...11"         "ride_length...12"       
#> [13] "station_name...13"       "latitude...14"          
#> [15] "longitude...15"          "origin_destination...16"

The above just shows the column names. The duplicates have been eliminated by appending ...# to every name.

DF1 <- DF[, 1:8]
DF2 <- DF[, 9:16]

The above code makes two new data frames. DF1 contains all the rows and the first 8 columns and DF2 contains all the rows and the second 8 columns.

colnames(DF1) <- sub(pattern = "\\.{3}\\d{1,2}",replacement = "",colnames(DF1))
colnames(DF2) <- sub(pattern = "\\.{3}\\d{1,2}",replacement = "",colnames(DF2))

The above code changes the the column names of the two new data frames by eliminating the three dots followed by 1 or 2 numbers. If you are not familiar with "regular expressions", that code will look mysterious. If so, take it on faith that names like trip_id...1 become trip_id. DF1 and DF2 now have identical column names.

DFnew <- rbind(DF1, DF2)

The rbind() function stacks DF1 on top of DF2, so that DFnew has 12 rows.

DFnew <- DFnew[order(DFnew$trip_id), ]

The above line sorts DFnew by trip_id so rows with the same trip_id are now adjacent. The result is shown below. It's a mess to read, but each trip has two rows, one for the Origin and one for the Destination. Is that what you want?

DFnew
#>       trip_id                subscriber_type       start_time ride_length
#> 3     9075366                       Local365  3/11/2016 16:28          13
#> 9     9075366                       Local365  3/11/2016 16:28          13
#> 2    12617682                       Local365  10/30/2016 2:06           6
#> 8    12617682                       Local365  10/30/2016 2:06           6
#> 5    14468597                        Walk Up  4/16/2017 15:39          25
#> 11   14468597                        Walk Up  4/16/2017 15:39          25
#> 1  9900082882                        Walk Up  3/19/2015 19:12          41
#> 7  9900082882                        Walk Up  3/19/2015 19:12          41
#> 6  9900153369                        Local30  5/29/2015 15:12          29
#> 12 9900153369                        Local30  5/29/2015 15:12          29
#> 4  9900319298 24-Hour Kiosk (Austin B-cycle) 11/23/2014 15:12          80
#> 10 9900319298 24-Hour Kiosk (Austin B-cycle) 11/23/2014 15:12          80
#>                                station_name latitude longitude
#> 3           Bullock Museum @ Congress & MLK 30.28039 -97.73809
#> 9   Convention Center / 4th St. @ MetroRail 30.26483 -97.73900
#> 2                            2nd & Congress 30.26408 -97.74355
#> 8                  South Congress & Academy 30.25226 -97.74854
#> 5                      Rainey St @ Cummings 30.25579 -97.73982
#> 11 MoPac Pedestrian Bridge @ Veterans Drive 30.27466 -97.77028
#> 1                          Waller & 6th St. 30.26461 -97.73049
#> 7                      Trinity & 6th Street 30.26735 -97.73933
#> 6                          17th & Guadalupe 30.27974 -97.74254
#> 12                           West & 6th St. 30.27041 -97.75046
#> 4         Capitol Station / Congress & 11th 30.27260 -97.74127
#> 10                  Toomey Rd @ South Lamar 30.26304 -97.75824
#>    origin_destination
#> 3              Origin
#> 9         Destination
#> 2              Origin
#> 8         Destination
#> 5              Origin
#> 11        Destination
#> 1              Origin
#> 7         Destination
#> 6              Origin
#> 12        Destination
#> 4              Origin
#> 10        Destination

I believe that did the trick. I ran all the coding starting with the colnames(DF) function and ending with the DFnew <- DFnew[order(DFnew$trip_id), ] function. I then wrote the transformed file to my computer and viewed it. It appears to be in order though now I'm having issues creating a spider map in Tableau which probably doesn't have anything to do with the work in R Studio. Thank you for your help - it was wonderful to finally get this spreadsheet transformed the way it is apparently supposed to be for Tableau.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.