Hi all,
As we know, a codon (a set of three nucleotides) codes for an amino acid. For example, ATG is the only codon for M (methionine), while ATC, ATA and ATT all code for I (isoleucine).
So the fraction of M encoded by ATG is always 1, whereas if synonymous codons were used equally, ATC, ATA and ATT would each account for 0.33 of I.
I want to write a function that counts the codons in a sequence and then calculates, for each codon, its share (as a percentage) among all the codons coding for the same amino acid.
codon <- list(ATA = "I", ATC = "I", ATT = "I", ATG = "M", ACA = "T",
ACC = "T", ACG = "T", ACT = "T", AAC = "N", AAT = "N", AAA = "K",
AAG = "K", AGC = "S", AGT = "S", AGA = "R", AGG = "R", CTA = "L",
CTC = "L", CTG = "L", CTT = "L", CCA = "P", CCC = "P", CCG = "P",
CCT = "P", CAC = "H", CAT = "H", CAA = "Q", CAG = "Q", CGA = "R",
CGC = "R", CGG = "R", CGT = "R", GTA = "V", GTC = "V", GTG = "V",
GTT = "V", GCA = "A", GCC = "A", GCG = "A", GCT = "A", GAC = "D",
GAT = "D", GAA = "E", GAG = "E", GGA = "G", GGC = "G", GGG = "G",
GGT = "G", TCA = "S", TCC = "S", TCG = "S", TCT = "S", TTC = "F",
TTT = "F", TTA = "L", TTG = "L", TAC = "Y", TAT = "Y", TAA = "stop",
TAG = "stop", TGC = "C", TGT = "C", TGA = "stop", TGG = "W")
( fracs <- 1/table(unlist(codon)) )
codonfracs <- setNames(lapply(codon, function(x) unname(fracs[x])), names(codon))
str(head(codonfracs))
s <- 'AAGGCCTGCGCAAATATTTCCACTCCTTCCCGGGTGCTCCTGAGTTGAACCCGC
TTAGAGACTCCGAAATCAACGACGACTTCCACCAGTGGGCCCAGTGACCGCCACACTGGA
CCCCATACCACTTCTTTTTGTTATTCTTAAATATGTT
'
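strsplit3 <- function(s, k=3) {
  # Strip the line breaks embedded in the string literal so they do not shift the reading frame
  s <- gsub("\\s", "", s)
  # Start and stop positions of each k-mer; a trailing partial codon, if any, is kept as-is
  starts <- seq.int(1, nchar(s), by=k)
  stops <- c(starts[-1] - 1, nchar(s))
  mapply(substr, s, starts, stops, USE.NAMES=FALSE)
}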
strsplit3(s)
I have split my sequence into frames of three. Please guide me in finding the count of each codon in the sequence and its percentage of occurrence. The output I am looking for is a table with four columns: the codon, the amino acid it codes for, the count of the codon, and the percentage of that amino acid's occurrences contributed by the codon.
Thank you.
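
One possible way to build the four-column table described above, offered as a minimal sketch rather than a definitive answer. The names `codon_usage`, `codon_demo` and `res` are hypothetical and not part of the original post. The demo uses a four-codon table to stay short; in practice the full `codon` list defined above could be passed instead. It assumes that codons missing from the table and any trailing partial codon should simply be ignored.

```r
# Sketch: count codons in reading frame 1 and express each count as a
# percentage of all codons that code for the same amino acid.
codon_usage <- function(s, codon, k = 3) {
  s <- gsub("\\s", "", s)              # drop line breaks and spaces
  n <- nchar(s) - nchar(s) %% k        # assumption: ignore a trailing partial codon
  cods <- if (n >= k) {
    starts <- seq.int(1, n, by = k)
    substring(s, starts, starts + k - 1)
  } else {
    character(0)
  }
  tab <- data.frame(codon = names(codon),
                    aa = unlist(codon, use.names = FALSE),
                    stringsAsFactors = FALSE)
  # factor() with fixed levels keeps zero counts and drops codons not in the table
  tab$count <- as.vector(table(factor(cods, levels = tab$codon)))
  # total number of codons observed for each amino acid, repeated per row
  aa_total <- ave(tab$count, tab$aa, FUN = sum)
  # NA when the amino acid does not occur at all in the sequence
  tab$percent <- ifelse(aa_total > 0, 100 * tab$count / aa_total, NA)
  tab <- tab[order(tab$aa, tab$codon), ]
  rownames(tab) <- NULL
  tab
}

# Hypothetical demo table and sequence (ATG ATA ATA ATC ATG)
codon_demo <- list(ATA = "I", ATC = "I", ATT = "I", ATG = "M")
res <- codon_usage("ATGATAATA\nATCATG", codon_demo)
print(res)
```

With the full table the call would be `codon_usage(s, codon)`. Returning `NA` for amino acids that never occur is a design choice; `0` would work as well if that suits the downstream analysis better.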