How to combine datasets into a single, usable data frame?

Hello! A question for my final Google Data Analytics Certificate project - how do I combine datasets for a separate, specific output type in the environment pane? I'm not sure what to label the output type I'm looking for, but have included a description.

I have bike data from 4 quarters (split into 4 data sets) that I believe I have correctly transposed to data frames. However, when I attempt to combine them, I can only get a printed output in console and they are still split up by each quarter and the format is not as usable for cleaning, analyzing, etc- I used method one from this article: How to Merge Multiple Data Frames in R (With Examples)

I was hoping to create a new "document" (perhaps a data frame?) in which they would all be grouped together by column (given they share the same column names) and sorted by date- basically that it would look like the original data sets, but with more rows. Even if not in date order, at least all together and I could re-order them myself.

I also saw an article that suggested using the data.table package for larger datasets, however it had no link with details as to the manner in which this could be utilized and I was unable to successfully locate any other usable information: https://www.datacamp.com/tutorial/merging-data-r

I keep finding different pieces of information for varying methods of combination, but I cannot seem to locate anything specific to what I'm attempting to accomplish. So, I'm hoping some super smart people here can help me.

I imported each quarter's data into r from a .csv file and converted them to data frames:

Quarter1 <- data.frame(Divvy_Trips_2019_Q1)
Quarter2 <- data.frame(Divvy_Trips_2019_Q2)
Quarter3 <- data.frame(Divvy_Trips_2019_Q3)
Quarter4 <- data.frame(Divvy_Trips_2019_Q4)

Then put them into a list:
Combined_2019_bike_data <- list(Quarter1, Quarter2, Quarter3, Quarter4)

And merged them accordingly:
Reduce(function(x, y) merge(x, y, all=TRUE), Combined_2019_bike_data)

This did create a "document" labeled Combined_2019_bike_data, but when going to view it, it displays all the combined data into console (as previously stated) and it's quite unreadable. In the original data frames, it's organized by columns from left to right:
trip_id; start_time; end_time; bikeid; tripduration; from_station_id; from_station_name; to_station_id; to_station_name; usertype
(yes, I plan to update these once I have everything combined in the manner I would like it to be).

I'll include output samples:

a) Sample from first section of output:

        start_time            end_time bikeid tripduration from_station_id

1 2019-01-01 00:04:37 2019-01-01 00:11:07 2167 390 199
2 2019-01-01 00:08:13 2019-01-01 00:15:34 4386 441 44
3 2019-01-01 00:13:23 2019-01-01 00:27:12 1524 829 15

b) Sample from second section output:

                 from_station_name to_station_id

1 Wabash Ave & Grand Ave 84
2 State St & Randolph St 624
3 Racine Ave & 18th St 644

c) Sample from third section of output (I've replaced the trip_id numbers with x's for any privacy concerns):

                           to_station_name   usertype  trip_id gender

1 Milwaukee Ave & Grand Ave Subscriber xxxxxxxx
2 Dearborn St & Van Buren St () Subscriber xxxxxxxx
3 Western Ave & Fillmore St (
) Subscriber xxxxxxxx

The difference I can see in the documents is that the original ones in the Environment pane show the title under Data with an example description of "3650069 obs. of 12 variables" and a grid icon to the far right that seems to indicate whatever document type that is. The new one I created instead displays "Large list (48 elements, 366.9 MB)" with no icons if that helps identify what I'm looking for.

If I understand you correctly, you want to make a single data frame combining all the data in the quarterly data frames. The final data frame would have the same number of columns as one of the original data frames and as many rows as the sum of rows in all the data frames.
The merge() function is not the right one to use to do that. A merge aligns data frames side by side, making more columns and matching rows based on a subset of the columns.
You want to row bind your data frames, stacking them on top of each other. Use the list containing the four data frames, Combined_2019_bike_data, and try this:

library(dplyr)
AllData <- bind_rows(Combined_2019_bike_data)
2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.