geom_point shape options/alternatives

Hi there,

I'm working on a PCA plot and using geom_point to plot the data. In one version I'm using a single shape and changing the fill according to a location variable.

I'm now attempting to change the shape according to location instead, because I have to draw a range of continuous colors from a palette, which at times makes it hard to discern the various locations. However, with this strategy I'm running into the limited number of shapes available for geom_point...

Is there anything beyond the options offered here for ggplot2 that is more of a geometric shape, aside from letters, numbers and other characters? I would be very happy to use those in my code. Below is what I'm doing at the moment to handle the 31 locations I need to plot.

Any help is greatly appreciated!

CODE

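# load the packages used below (ggplot2 for plotting, RColorBrewer for brewer.pal)
library(ggplot2)
library(RColorBrewer)

# plot the data
ggplot(pca_regions) +
  geom_point(mapping=aes(x=as.numeric(PC1), y=as.numeric(PC2), color=var$V1, shape=loc$V1), show.legend=TRUE, size=2) +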
  scale_colour_manual(values=c(brewer.pal(12, "Set3")[c(5, 1, 4)], "black")) + 
  scale_shape_manual(values=c(0:25,126,124,95,94,63)) + theme_bw() +
  guides(color=guide_legend(title='variety', title.position='top', title.hjust=.5, ncol=3, keywidth=1, override.aes=list(shape=15), position="inside"),
         shape=guide_legend(title='location', title.position='top', title.hjust=.5, ncol=16, keywidth=1, position="bottom")) +
  theme(legend.background=element_rect(fill="transparent"),
        legend.title=element_text(face='italic'), 
        legend.position.inside=c(0.9,0.9)) +
  xlab(paste0("PC1 (", signif(pve$pve[1], 4), "%)")) +
  ylab(paste0("PC2 (", signif(pve$pve[2], 4), "%)")) -> snp
snp

dput(pca_regions), first 50 rows only

structure(list(ind = c("INLUP00130", "INLUP00131", "INLUP00132", 
"INLUP00133", "INLUP00134", "INLUP00135", "INLUP00136", "INLUP00137", 
"INLUP00138", "INLUP00139", "INLUP00140", "INLUP00141", "INLUP00142", 
"INLUP00143", "INLUP00144", "INLUP00145", "INLUP00146", "INLUP00147", 
"INLUP00152", "INLUP00153", "INLUP00155", "INLUP00156", "INLUP00157", 
"INLUP00158", "INLUP00159", "INLUP00160", "INLUP00161", "INLUP00162", 
"INLUP00164", "INLUP00165", "INLUP00166", "INLUP00167", "INLUP00169", 
"INLUP00170", "INLUP00171", "INLUP00172", "INLUP00173", "INLUP00174", 
"INLUP00177", "INLUP00178", "INLUP00179", "INLUP00180", "INLUP00182", 
"INLUP00184", "INLUP00185", "INLUP00187", "INLUP00188", "INLUP00189", 
"INLUP00190", "INLUP00191"), loc = structure(list(V1 = structure(c(7L, 
7L, 7L, 7L, 7L, 7L, 21L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 21L, 14L, 7L, 7L, 7L, 12L, 7L, 7L, 
7L, 7L, 7L, 12L, 12L, 12L, 12L, 7L, 7L, 21L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L), levels = c("AUS", "CHL", "CZE", "DEU", "DZA", 
"EGY", "ESP", "ETH", "FRA", "GEO", "GBR", "GRC", "HUN", "ITA", 
"JOR", "MAR", "NDL", "LTU", "PAL", "POL", "PRT", "RUS", "SDN", 
"SUN", "SYR", "TUR", "UKR", "USA", "YUG", "ZAF", "UNK"), class = "factor")), row.names = c(NA, 
-50L), class = "data.frame"), var = structure(list(V1 = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
3L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L), levels = c("wt", "lr", "cv", "unk"), class = "factor")), row.names = c(NA, 
-50L), class = "data.frame"), PC1 = c("-0.0905818", "0.0290647", 
"-0.0940301", "-0.0766415", "-0.0719757", "-0.0207946", "0.00671521", 
"-0.00373625", "-0.00992566", "-0.0346908", "-0.0121009", "-0.0311462", 
"-0.00380394", "0.000922411", "-0.0748421", "-0.00210356", "-0.0714259", 
"-0.0298776", "-0.0778321", "-0.0474126", "-0.00784622", "0.0381459", 
"-0.0677863", "0.00763538", "-0.0110567", "-0.0636122", "-0.0970295", 
"-0.0647722", "0.0233859", "-0.00930208", "-0.06537", "-0.0470863", 
"0.013385", "-0.0703438", "-0.0252064", "-0.0392894", "0.0439079", 
"-0.036717", "-0.00620457", "-1.5535e-05", "-0.0119478", "0.00904671", 
"-0.0186496", "-0.0727634", "-0.0111302", "-0.0933997", "-0.101701", 
"-0.107218", "-0.0966413", "-0.0984431"), PC2 = c("-0.0795461", 
"-0.0290779", "-0.0941539", "-0.0443532", "-0.0331296", "-0.0407704", 
"0.00556518", "-0.0302595", "-0.033655", "0.00269964", "-0.0470112", 
"-0.0272876", "-0.0678772", "-0.0454818", "-0.0275835", "-0.0368061", 
"-0.0552505", "-0.011631", "-0.0580597", "-0.038941", "-0.0858756", 
"-0.0986832", "-0.0320659", "0.0850982", "0.0454862", "-0.0146859", 
"-0.103793", "-0.000762477", "0.0395774", "-0.00242237", "-0.00357609", 
"0.00485422", "-0.075234", "-0.00353877", "0.057966", "-0.0311769", 
"-0.00845331", "-0.0422088", "-0.0292406", "0.00351496", "0.0476253", 
"-0.0411485", "0.0180681", "-0.0215925", "-0.00268818", "-0.106607", 
"-0.111891", "-0.146831", "-0.103153", "-0.109329"), PC3 = c("-0.00138962", 
"-0.00748409", "-0.00176256", "0.00347965", "-0.000834205", "0.0344939", 
"0.0224754", "-0.0957929", "0.0617002", "0.0321519", "-0.0686412", 
"0.016716", "-0.0561151", "-0.0229615", "-0.00141877", "-0.00251361", 
"0.00633188", "0.0044354", "-0.00561195", "-0.019666", "0.0381118", 
"0.0595297", "-0.0114298", "-0.0082358", "-0.0583783", "-0.0222055", 
"-0.0096124", "0.0115042", "-0.135578", "-0.0153113", "-0.00396086", 
"0.0142641", "0.0446934", "0.00230603", "-0.000521044", "-0.00751414", 
"0.0420377", "-0.0810558", "0.00333866", "0.00587296", "-0.0159729", 
"-0.0435564", "0.00455461", "0.00189649", "-0.0517733", "-0.012175", 
"-0.00999359", "-0.0128122", "-0.0138699", "-0.011448"), PC4 = c("0.0248673", 
"-0.0151257", "0.0226235", "0.00159567", "-0.0038772", "0.0040877", 
"-0.00699734", "0.00282502", "-0.00276828", "0.01772", "0.00578124", 
"-0.000656035", "0.0318441", "0.00209398", "-0.0114628", "-0.0335925", 
"0.00111556", "-0.0196098", "-0.00187813", "0.0128646", "-0.0204046", 
"0.00973402", "0.0203487", "-0.0493093", "-0.0108793", "-0.014588", 
"0.0331097", "-0.00420252", "0.0143341", "-0.023391", "-0.0136439", 
"-0.0199432", "-0.0141095", "-0.0130491", "0.00773362", "0.0209273", 
"-0.0109627", "0.0102261", "-0.0364052", "-0.0112843", "-0.0121294", 
"-0.00216388", "-0.0129666", "-0.00374619", "-0.00613682", "0.0380435", 
"0.0394329", "0.0504558", "0.0382925", "0.0288498"), PC5 = c("0.0143375", 
"0.0407895", "0.0178046", "0.00140979", "0.00315903", "-0.0287447", 
"-0.017456", "0.032661", "-0.0292905", "-0.0790229", "0.0138123", 
"-0.0053016", "-0.0293139", "0.0224459", "0.00206252", "0.0456866", 
"0.016554", "0.034281", "0.0116048", "-0.00680423", "0.051829", 
"-0.0268403", "0.00990087", "0.00836678", "0.00135524", "0.0200928", 
"0.0311454", "-0.021187", "-0.00460853", "0.0232007", "-0.0132015", 
"0.0162109", "0.0202962", "-0.00724316", "-0.0677743", "-0.0142955", 
"0.0120461", "0.0439236", "0.0408373", "-0.0099259", "0.000135243", 
"0.013909", "-0.0086348", "-0.00795694", "0.00667088", "0.0143692", 
"0.0191462", "0.0371987", "0.0231081", "0.0212589"), PC6 = c("-0.0454171", 
"0.0331046", "-0.0528585", "0.0540075", "0.0799645", "0.0442909", 
"0.0236256", "0.00891521", "0.0419906", "0.00352862", "0.0204496", 
"0.0247615", "0.0552694", "0.0247261", "0.0805027", "0.0359086", 
"0.0615945", "0.0406206", "0.0591099", "0.0461959", "0.00208693", 
"-0.00722939", "-0.105282", "-0.0465947", "-0.0127702", "-0.015257", 
"-0.113017", "0.0917914", "-0.0118837", "0.0257378", "0.060403", 
"0.0330698", "0.0514057", "0.062395", "-0.0137664", "-0.0318695", 
"0.0155421", "-0.0707476", "0.0264675", "0.00260597", "0.00680551", 
"0.0779849", "0.0165352", "0.0835759", "0.0553146", "-0.1162", 
"-0.129661", "-0.244017", "-0.120825", "-0.105649"), PC7 = c("0.0125797", 
"0.0606499", "0.0056455", "0.00180765", "-0.0185536", "-0.0288014", 
"0.0151487", "-0.0139494", "-0.0163418", "0.0188601", "0.00246036", 
"-0.0112989", "0.0197205", "0.0281699", "-0.00361935", "-0.00180887", 
"-0.00810018", "-0.00671052", "0.00956915", "0.0177455", "0.0277215", 
"-0.0640255", "-0.00783096", "0.0942466", "-0.0306172", "-0.0418106", 
"0.0135646", "-0.0130877", "0.013239", "-0.0292879", "-0.0035463", 
"0.0187379", "0.0038582", "-0.0151556", "0.0189196", "0.000581421", 
"0.0244206", "-0.0282119", "0.0221617", "0.0424945", "-0.0121765", 
"0.0553591", "-0.0201943", "-0.0106955", "0.0327855", "-0.000271587", 
"0.0141329", "0.0391307", "0.00623153", "0.00708356"), PC8 = c("-0.00145489", 
"0.0350668", "0.0228418", "0.102345", "0.0912925", "0.073818", 
"0.037233", "0.0400621", "0.0665905", "0.0108318", "0.0555292", 
"0.0486614", "0.0663503", "0.0439738", "0.0520776", "0.0798453", 
"0.0812753", "0.0561098", "0.101766", "-0.0698916", "0.0432593", 
"0.00332399", "-0.00909821", "-0.0215231", "0.0135282", "0.021988", 
"-0.0188523", "0.0474033", "0.00637921", "0.0136662", "0.0596361", 
"0.049304", "0.06239", "0.0414392", "0.00936791", "-0.00719962", 
"0.0204329", "-0.0287971", "0.0628345", "0.033449", "0.0178963", 
"-0.00112953", "0.0349015", "-0.0118842", "-0.0208982", "-0.0331892", 
"-0.0453559", "-0.12837", "-0.0618356", "-0.0375705"), PC9 = c("-0.00713262", 
"0.0319015", "0.0160257", "0.0817545", "0.0681947", "0.0535609", 
"-0.000904084", "0.0594673", "0.0170689", "0.0720396", "0.0197094", 
"0.0373546", "-0.0169878", "0.0349624", "0.0268148", "0.0419015", 
"0.0715991", "0.0464332", "0.0783987", "-0.0334146", "0.0382053", 
"0.0277418", "-0.0801411", "0.0431939", "-0.0320536", "0.00687228", 
"-0.0100788", "0.00175451", "-0.0418586", "0.0182558", "0.0259619", 
"0.00472191", "0.0654636", "0.0188219", "0.0371982", "-0.03465", 
"-0.002146", "-0.00974917", "0.0413978", "0.073405", "-0.0373654", 
"0.0624965", "-0.037208", "0.019674", "-0.0313932", "-0.00971411", 
"-0.0162114", "-0.0330097", "-0.0261986", "-0.0235099"), PC10 = c("0.00652536", 
"-0.0547185", "0.0201075", "0.0262468", "0.0203448", "0.0548188", 
"0.0430478", "0.0497109", "0.0889271", "0.0787763", "0.0248434", 
"-0.0409447", "-0.0323839", "0.0523454", "0.0146931", "0.0504284", 
"0.0270329", "0.0126057", "0.026725", "-0.0620336", "0.00120752", 
"0.0217433", "-0.00756803", "0.131068", "0.0399678", "-0.014142", 
"-0.000872316", "-0.0151406", "0.0162078", "-0.0105141", "-0.0235251", 
"0.013206", "0.00386834", "-0.0117896", "-0.00658993", "-0.0449554", 
"-0.0344144", "0.0441983", "0.0589861", "0.000934144", "0.00505216", 
"-0.0327329", "-0.00319947", "-0.000865886", "0.0560204", "0.00301318", 
"0.00641874", "0.00812359", "0.00687964", "0.0106038")), row.names = c(NA, 
-50L), class = c("tbl_df", "tbl", "data.frame"))

More a comment than a definitive answer: with 31 different shapes, they will likely become impossible to distinguish on a single huge plot. I would even say anything above 4-5 shapes will strain the reader's eyes.

In this context I would suggest thinking about alternative representations: could it make sense to show the data on a map? Would you get something clearer if you used facets?
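As a rough illustration of the facet idea, here is a minimal sketch. It uses a made-up data frame `toy` with flat columns `PC1`, `PC2`, `loc` and `var` (all hypothetical stand-ins for the real `pca_regions`, which stores `loc` and `var` as nested data frames), so treat it as a starting point rather than a drop-in replacement.

```r
library(ggplot2)

# Hypothetical toy data standing in for pca_regions (flat columns, numeric PCs)
set.seed(1)
toy <- data.frame(
  PC1 = rnorm(60),
  PC2 = rnorm(60),
  loc = factor(rep(c("ESP", "GRC", "ITA", "PRT"), each = 15)),
  var = factor(sample(c("wt", "lr", "cv"), 60, replace = TRUE))
)

# One panel per location; color still encodes variety, so no shapes are needed
p_facet <- ggplot(toy, aes(x = PC1, y = PC2, color = var)) +
  geom_point(size = 2) +
  facet_wrap(~ loc) +
  theme_bw()
```

Printing `p_facet` draws the plot. With 31 locations the panels get small, so grouping rare locations together first may be worth considering.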

Or, if you really want to represent all these locations on a single plot, is a geometric shape really the best representation? Since the locations seem to correspond to countries, what about using the country flags as symbols (like ggimage::geom_flag())? Or writing the location explicitly (with geom_text(aes(label=loc$V1))) instead of a point shape? That way there is no need for an unreadable legend.
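A minimal sketch of the text-label option, again on a made-up data frame (`toy_lab` and its columns are hypothetical, not the real `pca_regions`). The location code is drawn in place of each point, and `key_glyph = "point"` is used so the variety legend shows dots rather than the default letter keys.

```r
library(ggplot2)

# Hypothetical toy data: flat columns, numeric PCs, three-letter location codes
set.seed(2)
toy_lab <- data.frame(
  PC1 = rnorm(30),
  PC2 = rnorm(30),
  loc = factor(rep(c("ESP", "GRC", "ITA"), each = 10)),
  var = factor(sample(c("wt", "lr", "cv"), 30, replace = TRUE))
)

# The label replaces the point shape, so no shape legend is required
p_text <- ggplot(toy_lab, aes(x = PC1, y = PC2, label = loc, color = var)) +
  geom_text(size = 3, key_glyph = "point") +
  theme_bw()
```

Printing `p_text` draws the plot. Where many samples overlap the labels will overprint; `check_overlap = TRUE` in `geom_text()` drops overlapping labels, at the cost of hiding some samples.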

@AlexisW great points, I really appreciate the different options you proposed. I will present them to my supervisor, as I was also quite skeptical about using shapes for the exact same reason you point out: the plot gets really messy...

On the other hand, I've seen PCA plots with only colored dots, which still don't convey the message well enough but at least don't hurt the reader's eyes. Your suggestions are really valuable alternatives to this, not only the labels but also geom_flag(), which I wasn't aware of!

Thanks again.

Hello,

For PCA and other factorial analyses, you should consider the package Factoshiny (which uses the package FactoMineR).
You can adjust the figures produced by the analysis and then export the R code to a Quarto notebook, if needed.

Mickaël