Having Issues Cbind-ing Data

SazLaine · February 28, 2020, 2:30am

Hi, I've been assigned a project for one of my classes and I'm having trouble getting something to work.

So we've been told to analyze the genes that cause a particular disease - I've chosen Rhizomelic Chondrodysplasia Punctata, gonna call it RCP for simplicity - and determine at which stage of development during pregnancy the fetus is most susceptible to developing the disease because certain genes are expressed at that time.

To do this, I've read in two csv files: MouseHomolog (rows are all genes collected in a mouse's pregnancy, and columns are the days when they were collected) and RCP (rows are all genes that are related to the RCP disease, column 1 is the "stage of development" the gene was clustered into by kmeans, and all other columns are the days when the gene was collected). The RCP file is heatmap data downloaded from a SWOT clock website, if that helps.

So, I've been instructed to create a biplot showing the RCP data points plotted against the PCs of the MouseHomolog data. The biplot is going to wind up looking like a SWOT clock, where each arrow drawn from the center is a different date of sample collection, and the data points are the RCP genes collected. To do this, I have to intersect the list of genes in the RCP data with the genes from the MouseHomolog data to isolate those genes from the Mouse data, calculate the PCs of the Mouse data using prcomp, then cbind the list of PCs with the list of which clusters the RCP genes were placed into. The code for all of this is described below.

# Read in the data and create a dataframe. The genes are labeled on the first column, so set row.names to 1

Mouse.df <- read.csv("~/MATP-4400/data/MouseHomologData.csv", row.names = 1)

# Create a matrix for our analysis

Mouse.matrix <- as.matrix(Mouse.df)

# Calculate the PCA

my.pca <- prcomp(Mouse.matrix, retx=TRUE, center=TRUE, scale=TRUE)

# Read in the data and create a dataframe. The genes are labeled on the 2nd column, so set row.names to 2

rcp.df <- read.csv("~/IDM_work/Lab5/Rhizomelic chondrodysplasia punctata_heat_map_data.csv", row.names=2)
rcp.matrix <- as.matrix(rcp.df)
rcp_symbols <- intersect(as.character(rcp.matrix[,1]),
                         as.character(rownames(Mouse.df)))

# Data frame containing the PCs and the kmeans cluster

rcp.plot.df <- cbind.data.frame(my.pca$x, cluster=as.factor(rcp.df$cluster))

But when I try this, I get the error:
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 8240, 19

Can anybody explain what this means, and maybe help find a solution? I can provide more information about my code, if needed. Thanks so much in advance!

nirgrahamuk · February 28, 2020, 2:34am

Hello,
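The error message means my.pca$x and rcp.df$cluster have different numbers of rows, so they can't be bound side by side.
I would guess that you have 19 cluster labels (one per row of rcp.df) and 8240 rows of scores from the PCA (one per row of Mouse.matrix)?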
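
The mismatch can be reproduced on a small scale. What follows is a minimal sketch, not the original data: toy.matrix, toy.pca and toy.cluster are made-up stand-ins, and the sizes (7 and 3) are chosen only so that one is not a whole multiple of the other, as with 8240 and 19.

```r
# Toy stand-ins (hypothetical): 7 "genes" x 4 "days" instead of the real Mouse data
set.seed(1)
toy.matrix <- matrix(rnorm(28), nrow = 7,
                     dimnames = list(paste0("gene", 1:7), paste0("day", 1:4)))
toy.pca <- prcomp(toy.matrix, retx = TRUE, center = TRUE, scale. = TRUE)

# Cluster labels for only 3 genes, playing the role of the 19 RCP genes
toy.cluster <- factor(c(1, 2, 1))

n_scores <- nrow(toy.pca$x)      # 7: prcomp returns one row of scores per input row
n_labels <- length(toy.cluster)  # 3

# Same kind of failure as in the question; caught so the script keeps running
result <- tryCatch(
  cbind.data.frame(toy.pca$x, cluster = toy.cluster),
  error = function(e) conditionMessage(e)
)
print(result)
```

One thing to watch for: data.frame() recycles a shorter vector or factor when the longer length is a whole multiple of it, so a mismatch like 6 rows against 2 labels would bind without any error and silently repeat the labels.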

SazLaine · February 28, 2020, 3:00am

Nvm, I figured it out. Long story short, prior to this, we had to cluster the MouseHomolog genes to see when they all occurred in time, then cbind that with the pca calculations of the Mouse data. All I had to do was use that and isolate the RCP genes from it, not cbind the RCP with the Mouse pca. Thanks anyway guys!
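
For anyone landing here later, the approach described above might look roughly like the sketch below. It uses toy data, and every name in it (toy.matrix, toy.km, toy.plot.df, disease_symbols) is a hypothetical stand-in rather than the actual course code; the number of clusters is also an arbitrary choice.

```r
# Toy stand-ins (hypothetical names and sizes) for the Mouse data
set.seed(1)
toy.matrix <- matrix(rnorm(40), nrow = 10,
                     dimnames = list(paste0("gene", 1:10), paste0("day", 1:4)))
toy.pca <- prcomp(toy.matrix, retx = TRUE, center = TRUE, scale. = TRUE)

# Cluster the same rows that went into the PCA, so both pieces have 10 rows
toy.km <- kmeans(scale(toy.matrix), centers = 3)
toy.plot.df <- cbind.data.frame(toy.pca$x, cluster = as.factor(toy.km$cluster))

# Hypothetical disease gene list; "geneX" is deliberately absent from the toy data
disease_symbols <- c("gene2", "gene5", "gene9", "geneX")
shared <- intersect(disease_symbols, rownames(toy.plot.df))

# Keep only the rows for the shared genes; the PCs and cluster labels come along
disease.plot.df <- toy.plot.df[shared, ]
print(disease.plot.df)
```

Because the cluster labels and the PC scores are both computed from the same matrix, they line up row for row, and the subsetting by gene symbol happens only after they have been bound together.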
