I have just started with R and need to merge two CSV files that share the column SciName. Below is how the two files are formatted:
CSV1
Kingdom,Phylum,Class,Order,Family,Genus,Species,,SciName
PLANTAE,TRACHEOPHYTA,MAGNOLIOPSIDA,FABALES,FABACEAE,Acacia,auriculiformis,19891902,Acacia auriculiformis
PLANTAE,TRACHEOPHYTA,MAGNOLIOPSIDA,FABALES,FABACEAE,Acacia,crassicarpa,38366,Acacia crassicarpa
PLANTAE,TRACHEOPHYTA,MAGNOLIOPSIDA,FABALES,FABACEAE,Acacia,decurrens,60757212,Acacia decurrens
PLANTAE,TRACHEOPHYTA,MAGNOLIOPSIDA,FABALES,FABACEAE,Acacia,koa,19891713,Acacia koa
PLANTAE,TRACHEOPHYTA,MAGNOLIOPSIDA,FABALES,FABACEAE,Acacia,mangium,18435820,Acacia mangium
CSV2
SciName,FAMILY,CONTINENT,PILOT NAME
Abarema jupunba Britton & Killip,Leguminosae (Fabaceae),AM,Huruasa
Acacia spp.,Leguminosae (Mimosaceae),AS,Acacia
Acacia auriculiformis A. Cunn.,Leguminosae (Mimosaceae),AS,Acacia
Acacia mangium Willd.,Leguminosae (Mimosaceae),AS,Acacia
Acanthopanax ricinifolius Seem. (cf. Kalopanax Araliaceae AS Senseptemlobus),Araliaceae,AS,Sen
Acrocarpus fraxinifolius Arn.,Leguminosae (Caesalpiniaceae),AS,Kuranjan
Actinodaphne spp.,Lauraceae,AS,Medang
What I'd like is to create a data frame with one row for each unique value of SciName, so that the values from both CSVs are combined. My problem is that the names in SciName do not match exactly even when they refer to the same species. For example, above you have Acacia mangium in CSV1 and Acacia mangium Willd. in CSV2. I would like to create a single row that keeps the longer version of the name (i.e. Acacia mangium Willd.) under SciName and combines the data in the rest of the columns for both names. This should only happen where the first two words of the name match exactly in both CSVs. Where there is no match, the cells in the additional columns should be null. Using the example above, the table would look like:
Kingdom,Phylum,Class,Order,Family,Genus,Species,,SciName,Family2,CONTINENT,PILOT NAME
,,,,,,,,Abarema jupunba Britton & Killip,Leguminosae (Fabaceae),AM,Huruasa
PLANTAE,TRACHEOPHYTA,MAGNOLIOPSIDA,FABALES,FABACEAE,Acacia,auriculiformis,19891902,Acacia auriculiformis A. Cunn.,Leguminosae (Mimosaceae),AS,Acacia
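To make the matching rule concrete, here is a rough base R sketch of what I have in mind. It is only an illustration under several assumptions: the data are read from inline text rather than the real files, the helper `first_two()` and the `key` column are names I made up, and I assume a full outer join (`all = TRUE`) so that unmatched rows from either file are kept with NA in the other file's columns.

```r
# A few rows copied from the two files above, inlined for illustration.
csv1 <- read.csv(text = "Kingdom,Phylum,Class,Order,Family,Genus,Species,,SciName
PLANTAE,TRACHEOPHYTA,MAGNOLIOPSIDA,FABALES,FABACEAE,Acacia,auriculiformis,19891902,Acacia auriculiformis
PLANTAE,TRACHEOPHYTA,MAGNOLIOPSIDA,FABALES,FABACEAE,Acacia,koa,19891713,Acacia koa
PLANTAE,TRACHEOPHYTA,MAGNOLIOPSIDA,FABALES,FABACEAE,Acacia,mangium,18435820,Acacia mangium",
  stringsAsFactors = FALSE)

csv2 <- read.csv(text = "SciName,FAMILY,CONTINENT,PILOT NAME
Abarema jupunba Britton & Killip,Leguminosae (Fabaceae),AM,Huruasa
Acacia spp.,Leguminosae (Mimosaceae),AS,Acacia
Acacia auriculiformis A. Cunn.,Leguminosae (Mimosaceae),AS,Acacia
Acacia mangium Willd.,Leguminosae (Mimosaceae),AS,Acacia",
  stringsAsFactors = FALSE)

# Hypothetical helper: reduce a name to its first two whitespace-separated words.
first_two <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  vapply(parts, function(p) paste(head(p, 2), collapse = " "), character(1))
}

# Build the join key in both data frames.
csv1$key <- first_two(csv1$SciName)
csv2$key <- first_two(csv2$SciName)

# Full outer join on the key; SciName exists in both, so it gets suffixes.
merged <- merge(csv1, csv2, by = "key", all = TRUE, suffixes = c(".1", ".2"))

# Keep the longer of the two names, falling back to whichever one is present.
n1 <- merged$SciName.1
n2 <- merged$SciName.2
use2 <- is.na(n1) | (!is.na(n2) & nchar(n2) > nchar(n1))
merged$SciName <- ifelse(use2, n2, n1)
merged$SciName.1 <- NULL
merged$SciName.2 <- NULL

print(merged)
```

One thing this sketch does not handle: if several rows in one file reduce to the same two-word key, `merge()` will return one row per combination, so duplicates would need to be resolved first.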
I hope this is clear. If R is not the best tool for this, pointers to other solutions that can handle relatively large datasets would also be welcome.
Thanks for any help