Hi there,

I have some problems replacing elements in a dataframe.

I have a dataframe (10 columns and 10 rows) made up of letters (A, C, T, G), and I would like to replace each letter with a number:

A <- 1

C <- 2

G <- 3

T <- 4

I tried the following script but it did not work

key <- c('A','T','C','G')

val <- c('1','2','3','4')

lapply(1:11,FUN = function(i){x[x == key[i]] <<- val[i]})

Could anybody help me?

Thank you very much

All the best

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

If you have stored the data set in some R object, the `dput` function is very handy.

In case the data set is in a spreadsheet, check out the `datapasta` package. Take a look at this link.
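As a minimal sketch of the `dput` route: the data frame `df` below is a made-up stand-in (its column names and values are assumptions, not the real data). `dput` prints R code that recreates the object, and `head` keeps the pasted output short.

```r
# Hypothetical stand-in for the real data set
df <- data.frame(
  S1 = c("C", "T", "G"),
  S2 = c("A", "A", "C"),
  stringsAsFactors = FALSE
)

# Print R code that recreates the first two rows;
# copy the printed output into your post
dput(head(df, 2))
```

Whoever reads the post can then assign the pasted `structure(...)` call to a variable and work with the same object.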

Dear Andres,

thank you very much for your kind reply.
Below you can find a small part of my dataframe.

I tried datapasta and it worked well, but I could not use reprex because I received this message:

no function 'reprex_selection' found in package 'reprex'.

thank you very much for your help.

Best regards

Chiara

A tibble: 13 x 7
S1_261059 S1_330484 S1_623981 S1_656912 S1_658173 S1_686055 S1_717357
1 C G A T C C C
2 C G A C T T C
3 C G G T C C C
4 T G A C T T G
5 C G G T C C C
6 C G G T C C C
7 C G A C T T G
8 C G A C T T G
9 T G A C T T G
10 C G G T C C C
11 C G G T C C C
12 C A A C T T G
13 T G A T C C C

Here's a `tidyverse` solution. I've reproduced just the first 4 rows and columns of your data to illustrate.

```
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3
data <- tribble(~ S1_261059, ~ S1_330484, ~ S1_623981, ~ S1_656912,
                "C", "G", "A", "T",
                "C", "G", "A", "C",
                "C", "G", "G", "T",
                "T", "G", "A", "C")
data %>% mutate_all(~ case_when(. == "A" ~ 1,
                                . == "T" ~ 2,
                                . == "C" ~ 3,
                                . == "G" ~ 4))
#> # A tibble: 4 x 4
#>   S1_261059 S1_330484 S1_623981 S1_656912
#>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1         3         4         1         2
#> 2         3         4         1         3
#> 3         3         4         4         2
#> 4         2         4         1         3
```

Created on 2020-04-09 by the reprex package (v0.3.0)
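If `dplyr` is not available, roughly the same recoding can be sketched in base R with a named lookup vector. This is only an illustration under a few assumptions: the names `x`, `codes` and `x_num` are made up here, the toy data are the same 4 rows and columns as above, and the coding follows the `case_when()` call (A = 1, T = 2, C = 3, G = 4).

```r
# Toy data: first 4 rows and columns of the posted table
x <- data.frame(
  S1_261059 = c("C", "C", "C", "T"),
  S1_330484 = c("G", "G", "G", "G"),
  S1_623981 = c("A", "A", "G", "A"),
  S1_656912 = c("T", "C", "T", "C"),
  stringsAsFactors = FALSE
)

# Named lookup vector: names are the bases, values are the numeric codes
codes <- c(A = 1, T = 2, C = 3, G = 4)

# Recode every column by looking each letter up by name.
# as.character() guards against factor columns, and assigning
# into x_num[] keeps the data frame structure and column names.
x_num <- x
x_num[] <- lapply(x, function(col) unname(codes[as.character(col)]))

x_num
```

Any letter that is not in `codes` becomes `NA`, which makes unexpected symbols easy to spot. Because the codes are numeric rather than character strings, `as.matrix(x_num)` should give a numeric matrix if a later step needs one.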

By the way, do those "A", "C", "G" and "T" represent the 4 DNA bases?

Dear Siddharth,

thank you very much.

Yes, A, T, C and G are DNA bases. I would like to use the bpca package for a PCA, but it requires numeric values.

I tried to install dplyr but I could not download it.
I'll try again.

Thanks again
