Hi - I'm relatively new to R and hoping it can help me with a particularly onerous data cleanup project I'm working on. Basically, I have a data table with course names and course enrollments listed separately for 5 years, each year having its own listing of both the course names and the number of enrollments in each course for that year. I'd like the end result to be just one column of course names, with each year's enrollments listed in the columns after it. My problem is that the course names will not always match. They may have some similar words or phrases in their names, but there are no matching IDs or easy ways to merge the data that I can think of. Does anyone know of a way R could help me with this?
Consider using string distance measures to join the messy data with a predefined list of standard course names. The fuzzyjoin package can help you with that.
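As a rough sketch of that idea, the example below matches each year's course names to a standard list with `fuzzyjoin::stringdist_left_join()` and then spreads the enrollments into one column per year. All of the data here is made up for illustration (the `standard`, `y2019`, and `y2020` tables and their column names are assumptions, not your real data), and the Jaro-Winkler method with a `max_dist` of 0.3 is only a starting guess that you would need to tune.

```r
library(dplyr)
library(tidyr)
library(fuzzyjoin)

# Hypothetical list of standard course names (assumed, not from the real data)
standard <- tibble(course = c("Introduction to Biology",
                              "Calculus I",
                              "World History"))

# Hypothetical messy listings for two of the five years
y2019 <- tibble(name = c("Intro to Biology", "Calculus 1", "World Hist."),
                enrolled = c(120, 85, 60))
y2020 <- tibble(name = c("Introduction to Biol.", "Calculus I", "World History"),
                enrolled = c(130, 90, 55))

# Stack the years into one long table, keeping track of the year
long <- bind_rows(list(`2019` = y2019, `2020` = y2020), .id = "year")

# Fuzzy-join each messy name to the standard names within the distance
# threshold, then keep only the closest standard name for each messy name.
# Names with no match inside the threshold keep an NA course for manual review.
matched <- long %>%
  stringdist_left_join(standard,
                       by = c(name = "course"),
                       method = "jw",       # Jaro-Winkler distance, 0 to 1
                       max_dist = 0.3,      # assumed threshold, tune as needed
                       distance_col = "dist") %>%
  group_by(year, name) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  ungroup()

# One row per standard course, one enrollment column per year
wide <- matched %>%
  select(course, year, enrolled) %>%
  pivot_wider(names_from = year,
              values_from = enrolled,
              names_prefix = "enrolled_")

wide
```

It is worth inspecting `matched` (especially the `dist` column) before trusting the result, since a threshold that is too loose can pair unrelated courses. If two different messy names in the same year map to the same standard course, `pivot_wider()` will warn about duplicates, and you would need to decide whether to sum them or fix the match by hand.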
If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.