Hi, I've been assigned a project for one of my classes and I'm having trouble getting something to work.
So we've been told to analyze the genes that cause a particular disease (I've chosen Rhizomelic Chondrodysplasia Punctata, which I'll call RCP for simplicity) and determine at which stage of development during pregnancy the fetus is most susceptible to developing the disease, because certain genes are expressed at that time.
To do this, I've read in two CSV files: MouseHomolog (rows are all the genes collected during a mouse's pregnancy, and columns are the days on which they were collected) and RCP (rows are all the genes related to RCP, column 1 is the "stage of development" each gene was clustered into by k-means, and the other columns are the days on which the gene was collected). The RCP file is heatmap data downloaded from a SWOT clock website, if that helps.
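To make the two layouts concrete, here is a minimal sketch using made-up toy data. The gene names (`GeneA` to `GeneD`), day columns (`E10` to `E12`) and values are all hypothetical. It also assumes the RCP file has the cluster in its first column and the gene symbol in its second, as described above. The sketch shows what `row.names` does to each file when it is read in.

```r
# Hypothetical toy versions of the two files: gene names, day columns and
# values are made up and only illustrate the layout
mouse_lines <- c("gene,E10,E11,E12",
                 "GeneA,1.2,0.8,0.5",
                 "GeneB,0.3,0.9,1.4",
                 "GeneC,2.1,1.7,0.2",
                 "GeneD,5.0,5.1,4.9")

# Assumed RCP layout: cluster in column 1, gene symbol in column 2, days after
rcp_lines <- c("cluster,gene,E10,E11,E12",
               "1,GeneA,1.2,0.8,0.5",
               "2,GeneB,0.3,0.9,1.4")

# row.names = 1 (or 2) turns that column into the row names and drops it
# from the ordinary columns
mouse_df <- read.csv(text = mouse_lines, row.names = 1)
rcp_df   <- read.csv(text = rcp_lines, row.names = 2)

rownames(mouse_df)  # "GeneA" "GeneB" "GeneC" "GeneD"
rownames(rcp_df)    # "GeneA" "GeneB"
colnames(rcp_df)    # "cluster" "E10" "E11" "E12"
rcp_df[, 1]         # 1 2 (cluster labels, not gene names)
```

If the real RCP file matches this assumed layout, then after `row.names = 2` the gene symbols are in `rownames()` and column 1 holds the cluster labels.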
So I've been instructed to create a biplot showing the RCP data points plotted against the PCs of the MouseHomolog data. The biplot should end up looking like a SWOT clock, where each arrow drawn from the center is a different sample-collection date and the data points are the RCP genes. To do this, I have to intersect the list of genes in the RCP data with the genes in the MouseHomolog data to isolate those genes in the mouse data, calculate the PCs of the mouse data using prcomp, then cbind the PCs with the clusters the RCP genes were placed into. My code for all of this is below.
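The geometry of that kind of biplot can be sketched on a random toy matrix instead of the real data. Each gene's PC scores (`pca$x`) become a point, and each collection day's loadings (`pca$rotation`) become an arrow from the origin. The stretch factor `k` is an assumption added only so the arrows are visible next to the points. It is not part of any SWOT clock definition.

```r
# Toy expression matrix: rows = genes, columns = collection days (random values)
set.seed(1)
expr <- matrix(rnorm(40 * 4), nrow = 40,
               dimnames = list(paste0("Gene", 1:40), paste0("Day", 1:4)))

pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Scores: one row per gene (the points); rotation: one row per day (the arrows)
scores   <- pca$x[, 1:2]
loadings <- pca$rotation[, 1:2]

# Stretch the loadings so the arrows are on the same scale as the scores
k <- max(abs(scores)) / max(abs(loadings))
lim <- range(scores, k * loadings)

plot(scores, pch = 19, col = "grey40", asp = 1, xlim = lim, ylim = lim,
     xlab = "PC1", ylab = "PC2", main = "Toy biplot")
arrows(0, 0, k * loadings[, 1], k * loadings[, 2], col = "red", length = 0.08)
text(k * loadings, labels = rownames(loadings), col = "red", pos = 3)
```

Base R's `biplot()` draws this kind of plot directly from a `prcomp` result. Drawing the pieces by hand, as above, makes it easier to color the points by cluster afterwards.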
```r
# Read in the data and create a data frame. The genes are labeled in the first column, so set row.names to 1
Mouse.df <- read.csv("~/MATP-4400/data/MouseHomologData.csv", row.names = 1)
# Create a matrix for our analysis
Mouse.matrix <- as.matrix(Mouse.df)
# Calculate the PCA
my.pca <- prcomp(Mouse.matrix, retx=TRUE, center=TRUE, scale=TRUE)
# Read in the data and create a data frame. The genes are labeled in the 2nd column, so set row.names to 2
rcp.df <- read.csv("~/IDM_work/Lab5/Rhizomelic chondrodysplasia punctata_heat_map_data.csv", row.names=2)
rcp.matrix <- as.matrix(rcp.df)
rcp_symbols <- intersect(as.character(rcp.matrix[,1]),
                         as.character(rownames(Mouse.df)))
# Data frame containing the PCs and the kmeans cluster
rcp.plot.df <- cbind.data.frame(my.pca$x, cluster=as.factor(rcp.df$cluster))
```
But when I try this, I get the error:
```
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 8240, 19
```
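The same message can be reproduced with toy sizes. `my.pca$x` has one row per gene in the mouse data (8240 here), while `rcp.df$cluster` has one entry per RCP gene (19). `cbind.data.frame()` cannot line those up. The sizes and random values below are made up for illustration:

```r
# Toy stand-ins: 10 rows of PC scores (like my.pca$x for every mouse gene)
# and 3 cluster labels (like rcp.df$cluster for the RCP genes only)
scores  <- matrix(rnorm(10 * 2), nrow = 10,
                  dimnames = list(paste0("Gene", 1:10), c("PC1", "PC2")))
cluster <- factor(c(1, 2, 2))

# cbind.data.frame() calls data.frame(), which needs every argument to have
# the same number of rows; a shorter one is silently recycled only if it
# divides the longer one evenly. 3 does not divide 10, so this fails.
res <- tryCatch(cbind.data.frame(scores, cluster = cluster),
                error = function(e) conditionMessage(e))
res  # "arguments imply differing number of rows: 10, 3"
```

8240 is not a multiple of 19 either, which is why the real call stops with the same error instead of recycling.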
Can anybody explain what this means, and maybe help find a solution? I can provide more information about my code, if needed. Thanks so much in advance!
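One likely way around it, offered as a sketch under assumptions rather than a definitive fix, has three steps. First, compute the PCA on the full mouse matrix. Next, keep only the rows of the scores whose gene symbols also appear in the RCP data. Finally, look up each of those genes' clusters by name. The toy data below (gene names, values, and the missing `GeneX`) is hypothetical. The sketch also assumes the RCP gene symbols are the row names, as `row.names = 2` would make them, with the cluster in a column named `cluster`.

```r
# Hypothetical toy data (random values, made-up names): 12 "mouse" genes and
# 4 "RCP" genes, one of which (GeneX) is missing from the mouse data
set.seed(42)
mouse <- matrix(rnorm(12 * 4), nrow = 12,
                dimnames = list(paste0("Gene", 1:12), paste0("Day", 1:4)))
rcp <- data.frame(cluster = c(3, 1, 2, 1),
                  row.names = c("Gene5", "Gene2", "GeneX", "Gene9"))

# PCA on the full mouse matrix (genes as rows)
pca <- prcomp(mouse, center = TRUE, scale. = TRUE)

# Gene symbols present in both tables, taken from the row names
common <- intersect(rownames(rcp), rownames(mouse))

# Index the scores and the clusters by the same vector of names, so both
# pieces have the same length and the same gene order
plot_df <- cbind.data.frame(pca$x[common, , drop = FALSE],
                            cluster = as.factor(rcp[common, "cluster"]))
plot_df
```

In the question's terms this would be roughly `my.pca$x[rcp_symbols, ]` paired with `rcp.df[rcp_symbols, "cluster"]`, with `rcp_symbols` built from `rownames(rcp.df)` rather than `rcp.matrix[,1]`. Indexing both sides by the same names keeps the rows aligned. RCP genes with no match in the mouse data simply drop out.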