Hi there,
I am having trouble replacing elements in a data frame.
I have a data frame (10 columns and 10 rows) made up of letters (A, C, T, G), and I would like to replace each letter with a number:
A <- 1
C <- 2
G <- 3
T <- 4
I tried the following script, but it did not work:
key <- c('A','T','C','G')
val <- c('1','2','3','4')
lapply(1:11,FUN = function(i){x[x == key[i]] <<- val[i]})
Could anybody help me?
Thank you very much
All the best
Can you please share a small part of the data set in a copy-paste friendly format?
In case you don't know how to do it, there are several options, including:
If you have stored the data set in an R object, the dput function is very handy.
If the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.
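As a rough illustration of the dput suggestion, here is a minimal sketch. The data frame `x` and its column names are hypothetical stand-ins, not the poster's real data.

```r
# Hypothetical small data frame standing in for the real data
x <- data.frame(S1 = c("C", "T"), S2 = c("G", "A"), stringsAsFactors = FALSE)

# dput() prints R code that recreates the object;
# head() limits the output to the first few rows
dput(head(x, 2))

# Pasting the printed code back into R rebuilds the object.
# Here that round trip is simulated by capturing and re-evaluating the text.
txt <- paste(capture.output(dput(x)), collapse = "\n")
y <- eval(parse(text = txt))
all.equal(x, y)
```

The printed `structure(...)` call is what gets pasted into the forum post, so that others can recreate the object with a single copy and paste.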
Dear Andres,
thank you very much for your kind reply.
Below is a small part of my data frame.
I tried datapasta and it worked well, but I could not use reprex, as I received this message:
no function 'reprex_selection' found in package 'reprex'.
Thank you very much for your help.
Best regards
Chiara
# A tibble: 13 x 7
S1_261059 S1_330484 S1_623981 S1_656912 S1_658173 S1_686055 S1_717357
1 C G A T C C C
2 C G A C T T C
3 C G G T C C C
4 T G A C T T G
5 C G G T C C C
6 C G G T C C C
7 C G A C T T G
8 C G A C T T G
9 T G A C T T G
10 C G G T C C C
11 C G G T C C C
12 C A A C T T G
13 T G A T C C C
Here's a tidyverse solution. I've reproduced just the first 4 rows and columns of your data to illustrate.
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3
data <- tribble(~ S1_261059, ~ S1_330484, ~ S1_623981, ~ S1_656912,
"C", "G", "A", "T",
"C", "G", "A", "C",
"C", "G", "G", "T",
"T", "G", "A", "C")
data %>% mutate_all(~ case_when(. == "A" ~ 1,
. == "T" ~ 2,
. == "C" ~ 3,
. == "G" ~ 4))
#> # A tibble: 4 x 4
#> S1_261059 S1_330484 S1_623981 S1_656912
#> <dbl> <dbl> <dbl> <dbl>
#> 1 3 4 1 2
#> 2 3 4 1 3
#> 3 3 4 4 2
#> 4 2 4 1 3
Created on 2020-04-09 by the reprex package (v0.3.0)
By the way, do those "A", "C", "G" and "T" represent the 4 DNA bases?
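For comparison, the same recoding can be sketched in base R with a named lookup vector, which avoids any package dependency. This is only a sketch under assumptions: the data frame `x` below is a hypothetical stand-in for the first four rows and columns of the posted data, and the codes follow the A = 1, T = 2, C = 3, G = 4 order used in the dplyr example above.

```r
# Hypothetical stand-in for the first four rows and columns of the posted data
x <- data.frame(
  S1_261059 = c("C", "C", "C", "T"),
  S1_330484 = c("G", "G", "G", "G"),
  S1_623981 = c("A", "A", "G", "A"),
  S1_656912 = c("T", "C", "T", "C"),
  stringsAsFactors = FALSE
)

# Named lookup vector: names are the bases, values are the numeric codes
lookup <- c(A = 1, T = 2, C = 3, G = 4)

# Recode every column by indexing the lookup vector with the letters;
# assigning into x_num[] keeps the data frame shape and column names
x_num <- x
x_num[] <- lapply(x, function(col) unname(lookup[as.character(col)]))
x_num
```

Any letter that is not a name in `lookup` comes back as `NA`, which makes unexpected symbols in the data easy to spot.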
Dear Siddharth,
thank you very much.
A, T, C and G are indeed DNA bases. I would like to use the bpca package for a PCA analysis, but it requires numeric values.
I tried to install dplyr, but I could not download it.
I'll try again.
Thanks again