Hello! I am a university student who is doing an internship to complete my bachelor's.
In my project I have to look at some behavioral variables and, through multidimensional analysis, categorize them into three groups: control (wildtype), overexpression (transgenic), and lack of a gene (knock-out). I have to do this for around 100 variables, each of which has 40 values that I need to separate into these 3 groups.
I am not familiar at all with these techniques and functions (I have never worked in R and don't know the functions needed). I just have a basic knowledge of R that I learned with an online course, and I need to use one-way ANOVA for these variables. If anyone could explain to me in a really simple way how to do this, or show me any coding example, I would be so grateful.
Without a reprex (see the FAQ), it's hard to offer much that will be of immediate use.
The big picture is that this is a problem in three parts:
- How does each variable relate to the three categories, and is there any overlap?
- What are the appropriate statistical tools to classify observations into the categories?
- How to code those tools in R?
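Since one-way ANOVA was mentioned, here is a minimal sketch of how it could be run over many variables at once. It uses simulated data because no reprex was given: the data frame `behav`, the grouping column `genotype`, and the columns `var1` to `var5` are all hypothetical stand-ins for the real data, and it assumes one row per animal and one column per variable.

```r
# Simulated stand-in for the real data: 40 animals, 3 groups, 5 variables.
# All names here (behav, genotype, var1..var5) are hypothetical.
set.seed(42)
genotype <- factor(rep(c("wildtype", "transgenic", "knockout"), length.out = 40),
                   levels = c("wildtype", "transgenic", "knockout"))
behav <- data.frame(genotype = genotype)
for (i in 1:5) {
  behav[[paste0("var", i)]] <- rnorm(40)
}
# Give var1 a real group difference so there is something to detect
behav$var1 <- behav$var1 + ifelse(behav$genotype == "knockout", 3, 0)

# Names of the response columns (everything except the grouping column)
vars <- setdiff(names(behav), "genotype")

# One-way ANOVA per variable; keep the p-value of the genotype term
p_values <- sapply(vars, function(v) {
  fit <- aov(reformulate("genotype", response = v), data = behav)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# With ~100 tests, adjust the p-values for multiple testing
results <- data.frame(variable = vars,
                      p = p_values,
                      p_adj = p.adjust(p_values, method = "BH"))
results[order(results$p), ]
```

For a variable that comes out significant, `TukeyHSD(aov(var1 ~ genotype, data = behav))` would show which pairs of groups differ. Whether ANOVA's assumptions (roughly normal residuals, similar variances) hold for the real variables is something to check separately.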
Here's an example of dimensionality reduction for the case where all variables are binary.
# install.packages("rARPACK")
# devtools::install_github("andland/logisticPCA")
library(logisticPCA)
library(ggplot2)
data("house_votes84")
logsvd_model = logisticSVD(house_votes84, k = 2)
logsvd_model
#> 435 rows and 16 columns
#> Rank 2 solution
#>
#> 63.6% of deviance explained
#> 549 iterations to converge
logpca_cv = cv.lpca(house_votes84, ks = 2, ms = 1:10)
plot(logpca_cv) + theme_minimal()
#> Warning in type.convert.default(colnames(x)): 'as.is' should be specified by
#> the caller; using TRUE
#> Warning in type.convert.default(rownames(x)): 'as.is' should be specified by
#> the caller; using TRUE
logpca_model = logisticPCA(house_votes84, k = 2, m = which.min(logpca_cv))
clogpca_model = convexLogisticPCA(house_votes84, k = 2, m = which.min(logpca_cv))
plot(clogpca_model, type = "trace") + theme_minimal()
plot(logsvd_model, type = "trace") + theme_minimal()
party = rownames(house_votes84)
plot(logsvd_model, type = "scores") +
  geom_point(aes(colour = party)) +
  ggtitle("Exponential Family PCA") +
  scale_colour_manual(values = c("blue", "red")) +
  theme_minimal()
Created on 2023-06-29 with reprex v2.0.2
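The example above is for binary variables. If the behavioral variables are continuous (an assumption, since no data were posted), ordinary PCA with `prcomp()` may be a simpler starting point. The sketch below again uses simulated data; the matrix `x`, the factor `genotype`, and the size of the group shift are made up for illustration.

```r
# Hypothetical simulated data: 40 animals, 3 genotypes, 10 continuous variables
set.seed(1)
genotype <- factor(rep(c("wildtype", "transgenic", "knockout"), length.out = 40))
x <- matrix(rnorm(40 * 10), nrow = 40,
            dimnames = list(NULL, paste0("var", 1:10)))
# Shift the first three variables in the knockout group so the groups differ
x[genotype == "knockout", 1:3] <- x[genotype == "knockout", 1:3] + 2

# PCA on centred, scaled variables
pca <- prcomp(x, center = TRUE, scale. = TRUE)
summary(pca)$importance[, 1:2]  # variance explained by PC1 and PC2

# Scores on the first two components, one row per animal
scores <- data.frame(pca$x[, 1:2], genotype = genotype)

# Scatter plot of the scores, coloured by genotype
plot(scores$PC1, scores$PC2, col = as.integer(scores$genotype), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(scores$genotype),
       col = seq_along(levels(scores$genotype)), pch = 19)
```

If the three genotypes form separate clouds in this plot, that suggests the variables carry enough signal to tell the groups apart; if they overlap heavily, a classifier is unlikely to do well either.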