Replace element in a dataframe

Hi there,

I am having trouble replacing elements in a data frame.

I have a data frame (10 columns and 10 rows) made up of letters (A, C, T, G), and I would like to replace each letter with a number:

A <- 1

C <- 2

G <- 3

T <- 4

I tried the following script, but it did not work:

key <- c('A','T','C','G')

val <- c('1','2','3','4')

lapply(1:11,FUN = function(i){x[x == key[i]] <<- val[i]})

Could anybody help me?

Thank you very much

All the best

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If the data set is stored in an R object, the dput() function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.
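As a minimal sketch of the first option, assuming the data frame is stored in an object named x (the name used in the question; the two-column content below is made up for illustration, and x_text and x_copy are hypothetical helper names):

```r
# Hypothetical two-column stand-in for the real data frame
x <- data.frame(
  S1_261059 = c("C", "T"),
  S1_330484 = c("G", "G"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; paste that text into the post
dput(x)

# Anyone reading the post can rebuild the object from the printed text
x_text <- paste(capture.output(dput(x)), collapse = "\n")
x_copy <- eval(parse(text = x_text))
```

For a large data set, dput(head(x)) keeps the pasted text short.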

Dear Andres,

Thank you very much for your kind reply.
Below is a small part of my data frame.

I tried datapasta and it worked well, but I could not use reprex because I received this message:

no function 'reprex_selection' found in package 'reprex'.

thank you very much for your help.

Best regards

Chiara

# A tibble: 13 x 7
   S1_261059 S1_330484 S1_623981 S1_656912 S1_658173 S1_686055 S1_717357
 1 C         G         A         T         C         C         C
 2 C         G         A         C         T         T         C
 3 C         G         G         T         C         C         C
 4 T         G         A         C         T         T         G
 5 C         G         G         T         C         C         C
 6 C         G         G         T         C         C         C
 7 C         G         A         C         T         T         G
 8 C         G         A         C         T         T         G
 9 T         G         A         C         T         T         G
10 C         G         G         T         C         C         C
11 C         G         G         T         C         C         C
12 C         A         A         C         T         T         G
13 T         G         A         T         C         C         C

Here's a tidyverse solution. I've reproduced just the first 4 rows and columns of your data to illustrate.

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3

data <- tribble(~ S1_261059, ~ S1_330484, ~ S1_623981, ~ S1_656912,  
                "C", "G", "A", "T", 
                "C", "G", "A", "C",
                "C", "G", "G", "T",
                "T", "G", "A", "C")

data %>% mutate_all(~ case_when(. == "A" ~ 1,
                                . == "T" ~ 2,
                                . == "C" ~ 3,
                                . == "G" ~ 4))
#> # A tibble: 4 x 4
#>   S1_261059 S1_330484 S1_623981 S1_656912
#>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1         3         4         1         2
#> 2         3         4         1         3
#> 3         3         4         4         2
#> 4         2         4         1         3

Created on 2020-04-09 by the reprex package (v0.3.0)
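For readers on dplyr 1.0.0 or later, where mutate_all() is superseded, the same recoding can be sketched with across(). This is an assumed equivalent rather than part of the original reprex; it uses a two-row subset of the data, and the name recoded is hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Two rows and two columns are enough to show the pattern
data <- tribble(~ S1_261059, ~ S1_330484,
                "C", "G",
                "T", "A")

# across(everything(), ...) applies the same recoding to every column
# (requires dplyr >= 1.0.0)
recoded <- data %>%
  mutate(across(everything(), ~ case_when(.x == "A" ~ 1,
                                          .x == "T" ~ 2,
                                          .x == "C" ~ 3,
                                          .x == "G" ~ 4)))
```

Any value other than the four letters becomes NA, because no case_when() condition matches it.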

By the way, do those "A", "C", "G" and "T" represent the 4 DNA bases?

Dear Siddharth,

thank you very much.

Yes, A, T, C and G are indeed DNA bases. I would like to use the bpca package for a PCA, but it requires numeric values.

I tried to install dplyr but could not download it.
I'll try again.

Thanks again.
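If dplyr cannot be installed, the same recoding can be sketched in base R with a named lookup vector. This is only a sketch under assumptions: the data frame is called x as in the question, the names codes, recoded and m are hypothetical, and the mapping follows the key and val vectors from the question (A = 1, T = 2, C = 3, G = 4). Change the values in codes to use the other mapping listed in the question (A = 1, C = 2, G = 3, T = 4).

```r
# Lookup vector: names are the bases, values are the numeric codes
# (assumed mapping, taken from key/val in the question)
codes <- c(A = 1, T = 2, C = 3, G = 4)

# First 4 rows and columns of the posted data
x <- data.frame(
  S1_261059 = c("C", "C", "C", "T"),
  S1_330484 = c("G", "G", "G", "G"),
  S1_623981 = c("A", "A", "G", "A"),
  S1_656912 = c("T", "C", "T", "C"),
  stringsAsFactors = FALSE
)

# Index the lookup by each column's letters; unname() drops the letter names.
# Assigning into recoded[] keeps the data frame shape and column names.
recoded <- x
recoded[] <- lapply(x, function(col) unname(codes[as.character(col)]))

# A numeric matrix, for functions that need numeric input
m <- as.matrix(recoded)
```

The as.character() call also covers columns stored as factors, and any letter missing from codes comes out as NA.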
