Multidimensional analysis of genotype in R

Hello! I am a university student who is doing an internship to complete my bachelor's.
In my project I have to examine some behavioral variables and, through multidimensional analysis, categorize them into three groups: control (wild type), overexpression (transgenic) and lack of a gene (knockout). I have to do this for around 100 variables, and each of them has 40 values that I need to separate into these 3 groups.
I am not at all familiar with these techniques and functions (I have never worked in R and don't know which functions are needed). I only have a basic knowledge of R that I learned in an online course, and I need to use one-way ANOVA for these variables. If anyone could explain to me in a really simple way how to do this, or show me a coding example, I would be very grateful.

Without a reprex (a reproducible example; see the FAQ), it's hard to offer much that will be of immediate use.

The big picture is that this is a problem in three parts:

  1. How does each variable relate to the three categories, and is there any overlap?
  2. What are the appropriate statistical tools to classify observations into the categories?
  3. How to code those tools in R?

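Here's an example of dimensionality reduction for the case where all variables are binary. It uses logistic PCA from the logisticPCA package on that package's house_votes84 data (435 rows by 16 binary columns, with each member's party stored in the row names).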

```r
# install.packages("rARPACK")
# devtools::install_github("andland/logisticPCA")
library(logisticPCA)
library(ggplot2)
data("house_votes84")

# Logistic SVD: a rank-2 low-dimensional fit for binary data
logsvd_model = logisticSVD(house_votes84, k = 2)
logsvd_model
#> 435 rows and 16 columns
#> Rank 2 solution
#> 
#> 63.6% of deviance explained
#> 549 iterations to converge

# Cross-validate the logistic PCA tuning parameter m over 1:10
logpca_cv = cv.lpca(house_votes84, ks = 2, ms = 1:10)
plot(logpca_cv) + theme_minimal()
#> Warning in type.convert.default(colnames(x)): 'as.is' should be specified by
#> the caller; using TRUE
#> Warning in type.convert.default(rownames(x)): 'as.is' should be specified by
#> the caller; using TRUE

# Refit with the m that has the lowest CV loss (since ms = 1:10, the index
# returned by which.min() equals m); also fit the convex relaxation
logpca_model = logisticPCA(house_votes84, k = 2, m = which.min(logpca_cv))
clogpca_model = convexLogisticPCA(house_votes84, k = 2, m = which.min(logpca_cv))

# Trace plots: loss over iterations, to check convergence
plot(clogpca_model, type = "trace") + theme_minimal()

plot(logsvd_model, type = "trace") + theme_minimal()

# Colour the 2-D scores by party (stored in the row names)
party = rownames(house_votes84)
plot(logsvd_model, type = "scores") + 
  geom_point(aes(colour = party)) + 
  ggtitle("Exponential Family PCA") + 
  scale_colour_manual(values = c("blue", "red")) +
  theme_minimal()
```

Created on 2023-06-29 with reprex v2.0.2
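Behavioral measurements are usually continuous rather than binary. For that case, one possible counterpart to the example above (a different technique from logistic PCA, not part of the original answer) is ordinary PCA with `prcomp()`, followed by linear discriminant analysis with `MASS::lda()` for part 2, classifying animals into the three groups. The sketch simulates hypothetical data with 40 animals and 100 variables, and keeping 5 components is an arbitrary assumption. LDA is not well defined on more variables than animals (the within-group covariance matrix is singular), so it is run on the PC scores, with leave-one-out cross-validation (`CV = TRUE`) so that each animal is classified by a model that did not see it.

```r
library(MASS)     # lda(); shipped with standard R installations
library(ggplot2)

# Hypothetical data: 40 animals x 100 continuous behavioral variables.
# Replace genotype and X with your own data.
set.seed(2)
n <- 40
p <- 100
genotype <- factor(rep(c("WT", "TG", "KO"), length.out = n),
                   levels = c("WT", "TG", "KO"))
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("var", seq_len(p))))
# Simulation only: shift the first 10 variables for TG and KO animals
X[genotype == "TG", 1:10] <- X[genotype == "TG", 1:10] + 1
X[genotype == "KO", 1:10] <- X[genotype == "KO", 1:10] - 1

# PCA on standardized variables (genotype labels are not used here)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
print(summary(pca)$importance[, 1:5])  # variance explained by PC1-PC5

# Score plot, analogous to the "scores" plot in the binary example
scores <- data.frame(pca$x[, 1:5], genotype = genotype)
plt <- ggplot(scores, aes(PC1, PC2, colour = genotype)) +
  geom_point() +
  ggtitle("PCA of behavioral variables") +
  theme_minimal()
print(plt)

# LDA on the first 5 PC scores; CV = TRUE returns leave-one-out predictions
fit_cv <- lda(genotype ~ PC1 + PC2 + PC3 + PC4 + PC5, data = scores, CV = TRUE)

# Confusion table: rows = true genotype, columns = predicted genotype
conf <- table(true = scores$genotype, predicted = fit_cv$class)
print(conf)
accuracy <- sum(diag(conf)) / sum(conf)
print(accuracy)
```

The diagonal of the confusion table counts correctly classified animals; the off-diagonal cells show which genotypes are confused with each other.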
