Error in strsplit(genotypes, "/") : non-character argument

Hi,
Good evening.
I am trying to convert my SNP genotyping data into 0, 1, 2 format. I am using the code below, but strsplit(genotypes, "/") fails with the error "non-character argument". Here is my reproducible code:
```r
# Example SNP genotyping data (replace this with your actual data)
snp_data <- data.frame(
  SNP1 = c('A/A', 'A/G', 'G/G'),
  SNP2 = c('C/C', 'C/T', 'T/T'),
  SNP3 = c('A/A', 'A/A', 'A/G')
)

# Function to convert SNP genotypes to number of copies of reference allele (0, 1, or 2)
convert_genotypes <- function(genotypes, reference_allele = NULL) {

  # If reference allele is not provided, use the most common allele as the reference
  if (is.null(reference_allele)) {
    reference_allele <- names(sort(table(unlist(strsplit(genotypes, '/'))), decreasing = TRUE)[1])
  }

  # Create a matrix to store the converted genotypes
  converted_genotypes <- matrix(NA, nrow = nrow(genotypes), ncol = ncol(genotypes))

  # Loop through each SNP
  for (i in 1:ncol(genotypes)) {
    # Split the genotypes into alleles
    alleles <- unlist(strsplit(genotypes[, i], '/'))

    # Count the number of reference alleles for each individual
    counts <- sapply(alleles, function(x) sum(x == reference_allele))

    # Store the counts in the converted genotypes matrix
    converted_genotypes[, i] <- counts
  }

  return(converted_genotypes)
}

# Convert SNP genotypes to number of copies of reference allele
converted_data <- convert_genotypes(snp_data)

# Print the original and converted data
cat("Original SNP Data:\n")
print(snp_data)

cat("\nConverted Genotypes (Number of Copies of Reference Allele):\n")
print(converted_data)
```
I tried converting the data with as.character before the loop, but I still get the error. Can anyone help me solve this issue? Any help would be highly appreciated.
Thanks in advance.

The error is caused by passing a data frame instead of a vector to strsplit when you determine the reference_allele. Try using this instead:

    if (is.null(reference_allele)) {
      reference_allele <- names(sort(table(unlist(strsplit(unlist(genotypes), "/"))), decreasing = TRUE)[1])
    }
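To see why the original call fails and why unlist() helps, a small check like the following may be useful. It is only a sketch: snp_small is a hypothetical, cut-down copy of the data frame from the question, and stringsAsFactors = FALSE is set explicitly on the assumption that the columns should be plain character vectors (with factor columns, unlist() would return a factor and strsplit would still fail).

```r
# Hypothetical cut-down copy of the question's data
snp_small <- data.frame(
  SNP1 = c("A/A", "A/G", "G/G"),
  SNP2 = c("C/C", "C/T", "T/T"),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, not a character vector,
# so strsplit() rejects it; capture the message instead of stopping
msg <- tryCatch(
  strsplit(snp_small, "/"),
  error = function(e) conditionMessage(e)
)
print(msg)  # "non-character argument"

# unlist() flattens the columns into one character vector of genotypes,
# which strsplit() accepts; the outer unlist() flattens the alleles
alleles <- unlist(strsplit(unlist(snp_small), "/"))
print(table(alleles))
```

Note that this pools alleles across all SNPs, so the resulting reference allele is the most common one in the whole table, not per SNP.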

The next problem is that the counts vector has a length of 6 and converted_genotypes has three rows, so converted_genotypes[, i] <- counts causes an error. Since I don't understand your goal, I'm not sure of the solution.
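If the goal is one count per individual per SNP (0, 1, or 2 copies of the reference allele), one possible approach is to keep the strsplit result as a list with one element per genotype instead of flattening it. The sketch below rests on assumptions that are not confirmed in the question: the function name count_ref_alleles and the reference_alleles argument are hypothetical, and the reference allele defaults to the most common allele within each SNP (ties resolved alphabetically) instead of one allele shared across all SNPs.

```r
# Hypothetical helper (not from the original post): counts copies of a
# reference allele in each genotype, one SNP column at a time.
count_ref_alleles <- function(genotypes, reference_alleles = NULL) {
  cols <- names(genotypes)
  converted <- matrix(NA_integer_, nrow = nrow(genotypes), ncol = length(cols),
                      dimnames = list(NULL, cols))
  for (snp in cols) {
    # One list element per genotype, e.g. "A/G" becomes c("A", "G")
    allele_list <- strsplit(as.character(genotypes[[snp]]), "/", fixed = TRUE)

    if (is.null(reference_alleles)) {
      # Assumption: default to the most common allele within this SNP
      ref <- names(which.max(table(unlist(allele_list))))
    } else {
      # Otherwise expect a named vector such as c(SNP1 = "A", SNP2 = "C")
      ref <- reference_alleles[[snp]]
    }

    # One count per genotype, so the length matches nrow(genotypes)
    converted[, snp] <- vapply(allele_list, function(a) sum(a == ref), integer(1))
  }
  converted
}

snp_data <- data.frame(
  SNP1 = c("A/A", "A/G", "G/G"),
  SNP2 = c("C/C", "C/T", "T/T"),
  SNP3 = c("A/A", "A/A", "A/G")
)

converted_data <- count_ref_alleles(snp_data)
print(converted_data)  # SNP1: 2 1 0, SNP2: 2 1 0, SNP3: 2 2 1
```

Passing reference_alleles explicitly, for example c(SNP1 = "G", SNP2 = "T", SNP3 = "G"), would count a chosen allele instead of the most common one.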
