Replace element in a dataframe

Hi there,

I am having some problems replacing elements in a dataframe.

I have a dataframe (10 columns and 10 rows) made up of letters (A, C, T, G), and I would like to replace each letter with a number:

A <- 1

C <- 2

G <- 3

T <- 4

I tried the following script, but it did not work:

key <- c('A','T','C','G')

val <- c('1','2','3','4')

lapply(1:11,FUN = function(i){x[x == key[i]] <<- val[i]})

Could anybody help me?

Thank you very much

All the best

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in an R object, the dput function is very handy.

  2. If the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.
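As a rough illustration of the first option, here is a minimal sketch of how dput can be used. The data frame `df` and its column names are hypothetical stand-ins for the real data set, not taken from the original post.

```r
# Hypothetical toy data standing in for the real data set
df <- data.frame(
  S1 = c("C", "T"),
  S2 = c("G", "A"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object;
# that printed text is what you paste into the forum post
dput(df)

# Round trip: capture the printed text and evaluate it to rebuild the object
txt <- paste(capture.output(dput(df)), collapse = "\n")
df2 <- eval(parse(text = txt))
```

For a large data set, something like `dput(head(df, 10))` keeps the pasted text short.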

Dear Andres,

thank you very much for your kind reply.
Below you can find a small part of my dataframe.

I tried datapasta and it worked well, but I could not use reprex because I received this message:

no function 'reprex_selection' found in package 'reprex'.

thank you very much for your help.

Best regards

Chiara

A tibble: 13 x 7

   S1_261059 S1_330484 S1_623981 S1_656912 S1_658173 S1_686055 S1_717357
 1 C         G         A         T         C         C         C
 2 C         G         A         C         T         T         C
 3 C         G         G         T         C         C         C
 4 T         G         A         C         T         T         G
 5 C         G         G         T         C         C         C
 6 C         G         G         T         C         C         C
 7 C         G         A         C         T         T         G
 8 C         G         A         C         T         T         G
 9 T         G         A         C         T         T         G
10 C         G         G         T         C         C         C
11 C         G         G         T         C         C         C
12 C         A         A         C         T         T         G
13 T         G         A         T         C         C         C

Here's a tidyverse solution. I've reproduced just the first 4 rows and columns of your data to illustrate.

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3

data <- tribble(~ S1_261059, ~ S1_330484, ~ S1_623981, ~ S1_656912,  
                "C", "G", "A", "T", 
                "C", "G", "A", "C",
                "C", "G", "G", "T",
                "T", "G", "A", "C")

data %>% mutate_all(~ case_when(. == "A" ~ 1,
                                . == "T" ~ 2,
                                . == "C" ~ 3,
                                . == "G" ~ 4))
#> # A tibble: 4 x 4
#>   S1_261059 S1_330484 S1_623981 S1_656912
#>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1         3         4         1         2
#> 2         3         4         1         3
#> 3         3         4         4         2
#> 4         2         4         1         3

Created on 2020-04-09 by the reprex package (v0.3.0)
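For readers who cannot install dplyr, the same recoding can be sketched in base R with a named lookup vector. This is only one possible approach, not the method used in the reprex above. The names `codes`, `recoded` and `num_matrix` are made up for this sketch, and the mapping (A = 1, T = 2, C = 3, G = 4) follows the reprex; change the values in `codes` if the A, C, G, T order listed in the question is the one intended.

```r
# First four rows and columns of the data from the thread
data <- data.frame(
  S1_261059 = c("C", "C", "C", "T"),
  S1_330484 = c("G", "G", "G", "G"),
  S1_623981 = c("A", "A", "G", "A"),
  S1_656912 = c("T", "C", "T", "C"),
  stringsAsFactors = FALSE
)

# Named lookup vector: names are the bases, values are the numeric codes
# (assumed mapping, copied from the tidyverse answer)
codes <- c(A = 1, T = 2, C = 3, G = 4)

# Index the lookup vector by each column's letters;
# unname() drops the letter names from the result.
# Any letter missing from `codes` becomes NA.
recoded <- data
recoded[] <- lapply(data, function(col) unname(codes[col]))

# A numeric matrix, in case the downstream function expects one
num_matrix <- as.matrix(recoded)
```

Because the lookup vector holds numbers rather than strings, every column comes out numeric, so no separate conversion step is needed.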

By the way, do those "A", "C", "G" and "T" represent the 4 DNA bases?

Dear Siddharth,

thank you very much.

Yes, A, T, C and G are DNA bases. I would like to use the bpca package for a PCA, but it requires numeric values.

I tried to install dplyr but I could not download it.
I'll try again.

Thanks again
