Hello there !
I have to do a NMDS (Non-metric multidimensional scaling) on my bacterial datas which represents somr factors and their score for each strains, when there is a "-", that means the strain doesn't have the factor.
The problem is that my datas are not very organised but I don't know how to proceed without loosing a lot of my datas:
Here is the first lines of the file :
If you have any suggestions ? Thanks
your screenshot implies that you have the data in WPS/SAS , so probably it will be most convenient to use haven package in R to import the sas7bdat file, and skip any excel steps in your workflow.
Thank you for your answer,
here is a view on rstudio
I imported it with
"library("xlsx")
df <- read.xlsx("/home/raw.xlsx", 1)"
thanks,but no thanks. eh ?
1 Like
Leon
June 17, 2020, 11:08am
5
Hi @LarraFarron ,
Welcome! Take a look at this:
# Load libraries ----------------------------------------------------------
library("tidyverse")
# Set example data --------------------------------------------------------
set.seed(9481)
d <- tibble(ids = str_c("strain", 1:5),
`-` = c("0", rep("-", 4)),
F1 = sample(c(rnorm(2), "-"), size = 5, replace = TRUE),
F2 = sample(c(rnorm(2), "-"), size = 5, replace = TRUE),
V1 = sample(c(rnorm(2), "-"), size = 5, replace = TRUE),
V2 = sample(c(rnorm(2), "-"), size = 5, replace = TRUE),
X1 = sample(c(rnorm(2), "-"), size = 5, replace = TRUE),
X2 = sample(c(rnorm(2), "-"), size = 5, replace = TRUE))
# Wrangle data ------------------------------------------------------------
d_clean <- d %>%
pivot_longer(cols = -ids, names_to = "strain", values_to = "score") %>%
mutate(score = case_when(score == "-" ~ NA_character_,
score != "-" ~ score) %>% as.numeric)
Yielding:
> d
# A tibble: 5 x 8
ids `-` F1 F2 V1 V2 X1 X2
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 strain1 0 - -0.334358504586595 0.903817354962354 - 1.12019753417009 1.34203949772225
2 strain2 - - -0.290915459898793 - 0.701652663546783 1.12019753417009 -
3 strain3 - 0.301567369458915 - -0.199246032293474 0.701652663546783 1.12019753417009 -
4 strain4 - 0.301567369458915 -0.290915459898793 - 0.701652663546783 0.32104758856752 1.34203949772225
5 strain5 - 0.301567369458915 -0.334358504586595 0.903817354962354 0.701652663546783 1.12019753417009 1.34203949772225
> d_clean
# A tibble: 35 x 3
ids strain score
<chr> <chr> <dbl>
1 strain1 - 0
2 strain1 F1 NA
3 strain1 F2 -0.334
4 strain1 V1 0.904
5 strain1 V2 NA
6 strain1 X1 1.12
7 strain1 X2 1.34
8 strain2 - NA
9 strain2 F1 NA
10 strain2 F2 -0.291
# … with 25 more rows
...and then take a look at this excellent ressource R for Data Science , which I think will benefit you.
Hope it helps
Thank you very much Leon, it's way better I'll go take a look at this ressource
system
Closed
June 24, 2020, 11:54am
7
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.