How to get the merge() function to work in RStudio

Hello, I've uploaded CSV files and used the read.csv() function correctly in preparation for merging spreadsheets. However, the merge() function isn't working, and the error message seems to say I'm using column names that aren't exactly the same. I've checked the spreadsheet column names several times and believe all eight (8) column names are identical in each of the three (3) spreadsheets I'm attempting to merge. After typing the exact column names into the call, I got an error message saying 'by' must specify one or more columns as numbers, names or logical. I took that to mean I should use the column identifiers (A, B, C, ...), so I tried that too, but that apparently wasn't what it meant. Again, I'm as sure as can be that the column names are identical. Thanks in advance for your advice.

Here is the URL: Posit Cloud
and here are the inputs/error messages:

df1 <- read.csv("cyclistic_july_2022.csv")
df2 <- read.csv("cyclistic_august_2022.csv")
df3 <- read.csv("cyclistic_september_2022.csv")
merged_df <- merge(df1, df2, df3, by = c("ride_id", "rideable_type", "ride_length", "day_of_week", "month_day_year", "start_time", "end_time", "member_casual"))
Error in fix.by(by.x, x) :
'by' must specify one or more columns as numbers, names or logical
merged_df <- merge(df1, df2, df3, by=c("ride_id", "rideable_type", "ride_length", "day_of_week", "month_day_year", "start_time", "end_time", "member_casual"))
Error in fix.by(by.x, x) :
'by' must specify one or more columns as numbers, names or logical
merged_df <- merge(df1, df2, df3, by = c("A", "B", "C", "D", "E", "F", "G", "H"))
Error in fix.by(by.x, x) :
'by' must specify one or more columns as numbers, names or logical
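The error text does not actually say that the names differ, but if you want to confirm that the column names really match rather than checking by eye, a minimal sketch is below. The three small data frames are hypothetical stand-ins for the monthly files; with real data you would run the same checks on the objects returned by read.csv().

```r
# Hypothetical stand-ins for the three monthly files; in practice these
# would come from read.csv("cyclistic_july_2022.csv") and so on.
df1 <- data.frame(ride_id = "a1", rideable_type = "classic_bike", member_casual = "member")
df2 <- data.frame(ride_id = "b1", rideable_type = "electric_bike", member_casual = "casual")
df3 <- data.frame(ride_id = "c1", rideable_type = "docked_bike", member_casual = "member")

# TRUE only if all three data frames have the same column names in the same order
same_names <- identical(names(df1), names(df2)) && identical(names(df1), names(df3))
print(same_names)

# List any names found in one data frame but not the other
# (read.csv() may, for example, have turned spaces in a header into dots)
print(setdiff(names(df1), names(df2)))
print(setdiff(names(df2), names(df1)))
```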

Here is the syntax for the merge function.

merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"), no.dups = TRUE,
      incomparables = NULL, ...)

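Your df1 is being passed to x, df2 is passed to y, and you explicitly assign by, so I guess df3 is being passed to by.x, the next unmatched argument. That would account for the error: by.x is expected to hold column names, numbers or logicals, not a data frame. merge() only joins two data frames at a time.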
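Because merge() takes exactly two data frames (x and y), joining three or more means chaining the calls. A minimal sketch, assuming a hypothetical shared key column ride_id and made-up per-month columns, uses Reduce() to apply merge() pairwise. Note that this is a join, which only makes sense when the same keys appear in every data frame; it is not a way to stack different months of trips.

```r
# Toy data frames sharing a hypothetical key column "ride_id"
july <- data.frame(ride_id = c("r1", "r2"), july_minutes = c(10, 20))
aug  <- data.frame(ride_id = c("r1", "r2"), august_minutes = c(5, 15))
sept <- data.frame(ride_id = c("r1", "r2"), september_minutes = c(7, 9))

# merge() joins exactly two data frames ...
july_aug <- merge(july, aug, by = "ride_id")

# ... so joining three or more means chaining merge() calls, here with Reduce()
joined <- Reduce(function(x, y) merge(x, y, by = "ride_id"), list(july, aug, sept))
print(joined)
```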
Also, you are trying to merge on every column, but the data come from different months, so start_time and end_time will never match across data frames and merged_df will have no rows. Are you trying to stack the data into a single data frame? You can do that with rbind().
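A minimal sketch of stacking with rbind(), assuming the monthly data frames share the same column names (the two-column stand-ins here are hypothetical). rbind() matches data frame columns by name and stops with an error if the names differ, so it doubles as a check on the headers.

```r
# Hypothetical two-column stand-ins for the monthly data frames
df1 <- data.frame(ride_id = c("a1", "a2"), member_casual = c("member", "casual"))
df2 <- data.frame(ride_id = "b1", member_casual = "casual")
df3 <- data.frame(ride_id = c("c1", "c2", "c3"), member_casual = c("member", "member", "casual"))

# Stack the rows of all three months into one data frame
all_trips <- rbind(df1, df2, df3)

# Equivalent form that scales to any number of months held in a list
all_trips2 <- do.call(rbind, list(df1, df2, df3))

print(nrow(all_trips))
```

The do.call() form is convenient once there are a dozen monthly files, because the data frames can be collected in a list instead of being named one by one.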

Maybe you can try the bind_rows() function instead of merge(). But I have another problem: Posit Cloud crashes when I try to upload 12 files to it. The Files pane works OK, but the problem arises when I try to bring the files into the Environment pane; after a couple of files it just crashes. What should I do?
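bind_rows() comes from the dplyr package. A minimal sketch is below, assuming hypothetical monthly data frames with identical columns; it falls back to base rbind() if dplyr is not installed. One practical difference: bind_rows() fills columns missing from some inputs with NA, whereas rbind() stops with an error.

```r
# Hypothetical monthly data frames with the same columns
jul <- data.frame(ride_id = c("a1", "a2"), member_casual = c("member", "casual"))
aug <- data.frame(ride_id = "b1", member_casual = "member")

if (requireNamespace("dplyr", quietly = TRUE)) {
  # dplyr::bind_rows() stacks the data frames, matching columns by name
  all_rides <- dplyr::bind_rows(jul, aug)
} else {
  # Base-R fallback when dplyr is not installed
  all_rides <- rbind(jul, aug)
}
print(all_rides)
```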

Is there any code involved in this process you describe?

Yes, I am trying the following code:
df1 <- read.csv(file.choose())

After you run that code, you are prompted to select a file; I assume you do so, and then df1 appears in your Environment pane.

I don't see anything controversial here. Presumably you are choosing new names for the new files you load, so that you are not making the mistake of naming every loaded object df1.
Do you know the size of these CSVs? It is possible that you are running out of memory on the cloud.
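To answer the size question from inside R, and to avoid clicking through file.choose() twelve times, a minimal sketch follows. It assumes all the CSVs sit in one folder; data_dir and the two tiny files written here are hypothetical, created only so the sketch runs on its own. Reading the files into one named list also keeps each month under its own name. This does not reduce memory use: twelve months of trips still need enough RAM once loaded.

```r
# Hypothetical folder: a temporary directory with two tiny CSVs is created
# so the sketch runs anywhere; point data_dir at your own project folder instead.
data_dir <- tempfile("trips")
dir.create(data_dir)
write.csv(data.frame(ride_id = c("a1", "a2")),
          file.path(data_dir, "cyclistic_july_2022.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "b1"),
          file.path(data_dir, "cyclistic_august_2022.csv"), row.names = FALSE)

# List the CSV files and report their sizes in megabytes before loading anything
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
sizes_mb <- file.size(csv_files) / 1024^2
print(data.frame(file = basename(csv_files), size_mb = round(sizes_mb, 3)))

# Read every file into one named list, so each month keeps its own name
# instead of every object being called df1
trips <- lapply(csv_files, read.csv)
names(trips) <- tools::file_path_sans_ext(basename(csv_files))

# Approximate memory used by the loaded data
print(format(object.size(trips), units = "MB"))
```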

If you are running out of memory, the new Cloud Basic plan is $25 per month and can access up to 8 GB of RAM. There must be a way to complete the Data Analytics capstone project with just 1 GB, but you are not the only person that has had problems. If memory is the problem, this would be an easy fix.

You could pay for one month today, cancel tomorrow (to avoid getting charged again next month), and you would still have the remaining 30 days of Cloud Basic. If you choose to assign 4 GB to the project, each hour working in a project will use 2.5 of the 150 compute hours in Cloud Basic. You can also revert to 1 GB after merging all of the data files.


After my own struggles on my first D.A. capstone project, I signed up for the $25 upgrade yesterday. Today, I see that a majority of the extra 7 GB is already taken up. Since it's my first project, I have to repeat some code and try new, alternative lines of code to get satisfactory, never mind successful, results. So the "free" side of the Google Data Analytics Certificate is questionable, since earning it depends on using other platforms (e.g., Coursera, Posit Cloud, et al.), which are also trying to upsell while not necessarily being invested in students gaining resume-worthy education and experience.
