From RNA-seq data, I performed normalization which is represented by the values as 77.9 etc. In the matrix, genes are in rows, columns have samples with FPKM values for each gene across the samples. I want to run a linear mixed effect model to calculate p-vlaues for each gene across all the samples. I want to run a linear mixed model like:
Model = lmer(FPKM ~ (1|gene), data=X)
I am getting an error as there is no variable called FPKM. However, the FPKM values are represented in a matrix for each individual in the columns for each gene. I need help to run the model to generate p-values for each gene across the samples. I will appreciate any help. Thank you!
The matrix should be converted to a data frame, with individuals as rows and genes as columns. The gene name should be used to indicate the FPKM value. The gene is the fixed effect and the individual is the random effect.
Thank you! I have converted the data frame with individuals as rows and genes as columns. Can you please help in using gene name as FPKM value to run the model with gene as fixed effect and individual as random effect.
Thank you @technocrat! I have a question about basket variable. I understand it represents FPKM values. Can you please tell which order the values are put such as all values for Gene1 then Gene2 (row-wise)? My real dataset is much bigger and I am making a variable representing all FPKM values to use in above code.
basket holds values in 0.1 increments between in the closed interval [70.1,79.9] to be sampled 100 times with replacement to create the column FPKM. gene is constructed from the closed interval [1,10] with the string gene prepended. When bound as a data frame with FPKM the value recycles and the result is as shown.
Thank you! I ran the model and I have a question about Individuals. I want to also get the variance values to calculate the heritability's. The current model does not give any variance value. I tried to include Individual as another random factor in the model but it gives an error. Is there a way to include Individual as random factor in model and get variance values to calculate the heritability's further.
I tried running this model: Model <- lmer(FPKM ~ 1+ (1|gene) + (1|Individual), data)
But I get the error: boundary (singular) fit: see ?isSingular
Is the problem in my data or model. My data is pretty big with 500 individuals and 40,000 genes. I am not sure if that's what causing the error. I do have some genes with all zero values across all individuals. Is there a way to remove those to make the data manageable, if that's what causing the error. Thank you!
Thank you! I still got the same error. Here is the summary of model:
Linear mixed model fit by REML ['lmerMod']
Formula: FPKM ~ 1 + (1 | gene) + (1 | Individual)
Data: data
REML criterion at convergence: 116744602
Scaled residuals:
Min 1Q Median 3Q Max
-113.42 -0.01 0.00 0.00 405.23
Random effects:
Groups Name Variance Std.Dev.
gene (Intercept) 249092 499.1
Individual (Intercept) 0 0.0
Residual 164508 405.6
Number of obs: 7851855, groups: gene, 29997; Genotype, 500
Fixed effects:
Estimate Std. Error t value
(Intercept) 35.300 2.946 11.98
convergence code: 0
boundary (singular) fit: see ?isSingular
Here random effect for individual has zero value. Is it possible to fit separate models for each gene. Thank you!
If you are going the route of fitting a model for the expression of each separated gene you should take a look at DESeq2 or edgeR. Those packages were created to do exactly what you are seeking.
And they have extensive documentation and vignettes. E.g.: