I'm aiming to make a script that can be used to filter and tidy faunal abundance data to prepare it for multivariate analyses (as would be required for analysis in something like Primer) that is faster than sorting it manually through Excel.
I am dealing with a large faunal abundance data set which contains a mix of numeric values, and "P" (Present) values for where taxa were observed but could not be accurately counted. I'd like to learn how I can purge rows containing any "P" values without having to go column by column.
Here's a summary of the data I'm looking at. Only one "P" is visible in the first row, but a fair few more are scattered throughout the data:
# A tibble: 383 × 111 Code Phylum `Taxa ID` Qualifiers S001_F1 S001_F2 S001_F3 S002_F1 S002_F2 S002_F3 <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 1 A5025 Ciliophora Ciliophora NA - - - P - - 2 A5050 Ciliophora Folliculin… NA - - - - - - 3 C0001 Porifera Porifera NA - - - - - - 4 C0053 Porifera Leucosolen… NA - - - - - - 5 C0133 Porifera Sycon NA - - - - - - 6 D0158 Cnidaria Tubulariid… NA - - - - - - 7 D0218 Cnidaria Eudendrium NA - - - - - - 8 D0246 Cnidaria Bougainvil… NA - - - - - - 9 D0335 Cnidaria Lovenella … NA - - - - - - 10 D0343 Cnidaria Phialella … NA - - - - - - # … with 373 more rows, and 101 more variables:
And the script I've managed so far (not much unfortunately!). My attempt so far works for one column to remove rows, but ideally I don't want to have to go column by column!
#1. Keep only rows that contain "aggregate" or "female" or "NA" qualifiers AbundanceFilteredQualifiers2 <- SeaLinkFauna11_PRIMER[SeaLinkFauna11_PRIMER$Qualifiers %in% c('aggregate', 'female', NA),] #2. Drop rows with empty phylum AbundanceFilteredPhylum <- SeaLinkFauna11_PRIMER %>% filter_at(vars(Phylum), all_vars(!is.na(.))) #3. Remove rows containing "P" Presence <- subset(SeaLinkFauna11_PRIMER, S002_F1 != 'P')
Thanks in advance!