And I wanted to merge the two data frames, but one data.frame has three rows and the other has four. I know this would leave some of the columns blank, which is fine and what I want. When I try to just combine them, I get an error because of the difference in the number of rows. Ideally I will be doing this on a much larger dataset imported from Excel, so I wanted to see if there was a quick way to do this.
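One possible way to combine data frames with different row counts is a keyed join with base R's merge(). This is only a sketch: the data below is made up, and the column names proteinid and protein_function, as well as every function label other than "Transcription", are assumptions for illustration.

```r
# Hypothetical example data; column names and most values are assumptions.
df1 <- data.frame(
  proteinid = c("Replication Factor", "Helicase", "Polymerase", "Ligase"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  proteinid = c("Replication Factor", "Helicase", "Polymerase"),
  protein_function = c("Transcription", "Unwinding", "Synthesis"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of df1 (a left join).
# Rows of df1 with no partner in df2 get NA in protein_function.
merged <- merge(df1, df2, by = "proteinid", all.x = TRUE)
print(merged)
```

Using all = TRUE instead of all.x = TRUE would keep unmatched rows from both data frames. Note that merge() sorts the result by the key column unless sort = FALSE is passed.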
That should work, thanks. But now let's say it's a little different. This should probably have been in my original note; I apologize for not including it. What if I want the assignments to match rows whose names contain phrases in proteinid2, but may not be a complete match, similar to the following:
For example, I want all rows that contain the phrase "Replication Factor" in proteinid to have the correct protein_function assignment, even if it is not an exact match. So both "Replication Factor" and "Replication Factor alpha" would have the assignment "Transcription".
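If the requirement really is just "contains the phrase", one option that needs no extra package is a literal substring test with grepl(). The sketch below assumes a lookup table with columns proteinid2 and protein_function; the data, and every value other than "Replication Factor" and "Transcription", are hypothetical.

```r
# Hypothetical lookup table of phrases and their assignments.
lookup <- data.frame(
  proteinid2 = c("Replication Factor", "Helicase"),
  protein_function = c("Transcription", "Unwinding"),
  stringsAsFactors = FALSE
)
# Hypothetical data to annotate.
df <- data.frame(
  proteinid = c("Replication Factor", "Replication Factor alpha",
                "Helicase B", "Ligase"),
  stringsAsFactors = FALSE
)

# Start with NA everywhere, then fill in rows whose name contains a phrase.
df$protein_function <- NA_character_
for (i in seq_len(nrow(lookup))) {
  # fixed = TRUE treats the phrase as a literal string, not a regex.
  hit <- grepl(lookup$proteinid2[i], df$proteinid, fixed = TRUE)
  df$protein_function[hit] <- lookup$protein_function[i]
}
print(df)
```

If a name contains more than one phrase, the later row of the lookup table wins in this sketch, so the order of the lookup table matters.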
Inexact string matching is an art, not a science, and what is best to do will depend on context; it will be up to you to define a cutoff criterion for similarity. But you can use the tools in the stringdist package to solve this.
I will define my own cutoffs for the matching; I just wanted to know how I would do it. I've never used the stringdist package. How would you set it up for this particular example?
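A minimal sketch of one way to set this up with stringdist, using amatch() to find the closest lookup entry for each name. The data is hypothetical, and the Jaro-Winkler method with maxDist = 0.15 is an arbitrary starting point, not a recommendation; the cutoff would need tuning on the real data.

```r
library(stringdist)  # install.packages("stringdist") if it is not installed

# Hypothetical lookup table of reference names and their assignments.
lookup <- data.frame(
  proteinid2 = c("Replication Factor", "Helicase"),
  protein_function = c("Transcription", "Unwinding"),
  stringsAsFactors = FALSE
)
# Hypothetical data to annotate.
df <- data.frame(
  proteinid = c("Replication Factor", "Replication Factor alpha",
                "Helicase B", "Ligase"),
  stringsAsFactors = FALSE
)

# amatch() returns, for each name, the index of the closest lookup entry,
# or NA when no entry is within maxDist.
# method = "jw" with p = 0.1 is Jaro-Winkler, which favors shared prefixes.
idx <- amatch(df$proteinid, lookup$proteinid2,
              method = "jw", p = 0.1, maxDist = 0.15)

# Indexing with NA yields NA, so unmatched rows stay blank.
df$protein_function <- lookup$protein_function[idx]
print(df)
```

To choose a cutoff, it can help to inspect the full distance matrix with stringdistmatrix(df$proteinid, lookup$proteinid2, method = "jw", p = 0.1) and see where true and false matches separate.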