Help extracting values from a data frame


#1

Total newbie to R here. I realize this is a basic question, but I've been trying to figure it out for a long time. I have a data frame with 9 columns: 6 describe the specific sample and 3 hold the actual numerical data. There are three independent variables: "root.exclusion", "fertilization" and "species". I am trying to select the values of "P" for which fertilization = phosphorus, species = FERO and root.exclusion = parasitism, so I can compare them with another subset of values in which, for example, fertilization = no phosphorus.

I realize this is insanely simple, but I can't get those values out. Any help with the code to get me started? I've attached a screenshot of the .csv file I imported into R.


#2

This is a problem that the tidyverse was designed to make tractable.

```r
library(dplyr)
library(tibble)
agronomics <- as_tibble(read.csv("path_to_your_file.csv", stringsAsFactors = FALSE))
# Quote string values, match the spelling in your data exactly, and join conditions with a single &
subset1 <- agronomics %>%
  filter(fertilization == "phosphorus" & species == "FERO" & root.exclusion == "parasitism")
```

You don't strictly need tibble, but if you have more than a screenful of data, a tibble very politely prints only the first 10 rows and tells you how many more there are.
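Since the goal is to compare subsets, a sketch like the one below may help. It assumes a small made-up tibble with the column names from the thread (all values, and the "none" level, are hypothetical). `pull()` returns the selected P values as a plain vector, and `group_by()` plus `summarise()` puts the phosphorus and no-phosphorus groups side by side in one step. Inside `filter()`, comma-separated conditions are combined with AND, exactly like `&`.

```r
library(dplyr)

# Hypothetical toy data standing in for the real CSV
agronomics <- tibble(
  fertilization  = c("phosphorus", "phosphorus", "no phosphorus", "no phosphorus", "phosphorus", "no phosphorus"),
  species        = c("FERO", "FERO", "FERO", "FERO", "CALE", "FERO"),
  root.exclusion = c("parasitism", "parasitism", "parasitism", "parasitism", "parasitism", "none"),
  P              = c(4589, 2967, 2772, 3100, 5000, 4000)
)

# The P values for one subset, as a plain numeric vector
p_phos <- agronomics %>%
  filter(fertilization == "phosphorus", species == "FERO", root.exclusion == "parasitism") %>%
  pull(P)

# Keep FERO under parasitism, then summarise P for each fertilization level
p_by_fert <- agronomics %>%
  filter(species == "FERO", root.exclusion == "parasitism") %>%
  group_by(fertilization) %>%
  summarise(n = n(), mean_P = mean(P), .groups = "drop")

print(p_phos)
print(p_by_fert)
```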

Given that subset, if you're only interested in P, K and Ca, for example, you can tack on `%>% select(P, K, Ca)`, or if you want to get rid of HostSpecies, `%>% select(-HostSpecies)`.
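A minimal sketch of both `select()` forms, assuming a hypothetical two-row subset. K, Ca and HostSpecies are the names used above; the Mg column and all values are made up so that the two results differ.

```r
library(dplyr)

# Hypothetical subset; Mg and all values are invented for illustration
subset1 <- tibble(
  HostSpecies = c("FERO", "FERO"),
  P  = c(4589, 2967),
  K  = c(1200, 1350),
  Ca = c(800, 760),
  Mg = c(300, 310)
)

# Keep only the listed columns, in the listed order
nutrients <- subset1 %>% select(P, K, Ca)

# A minus sign drops a column and keeps everything else
no_host <- subset1 %>% select(-HostSpecies)

names(nutrients)
names(no_host)
```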


#3

I have approximately recreated the dataset from your table (don't worry about the details; it may not be accurate):
```r
data <- data.frame(
  fertilization  = rep(c("phosphorus", "no phosphorus"), times = c(12, 8)),
  root.exclusion = c("no p", "no p", "p", "p", "p", "p", "p", "p", "no p", "no p", "p", "p", "no p", "no p", "p", "p", "no p", "no p", "p", "p"),
  species        = c("CALE", "ACMI", "CALE", "FERO", "CALE", "ACMI", "CALE", "FERO", "CALE", "FERO", "CALE", "ACMI", "CALE", "FERO", "CALE", "FERO", "CALE", "ACMI", "CALE", "ACMI"),
  P              = c(4980, 5348, 7078, 4589, 6344, 4097, 5578, 2967, 3524, 4396, 5337, 4343, 3075, 3621, 4086, 2772, 2175, 4240, 4825, 2615)
)
```

The code I am using is:

```r
# Logical indexing: keep P only where all three conditions are TRUE
selected_P <- data$P[data$fertilization == "phosphorus" & data$species == "FERO" & data$root.exclusion == "p"]
selected_P
#> [1] 4589 2967
```

Is this what you were looking for?
Apologies if I didn't understand the problem correctly, as I am a newbie myself. 😉
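For comparison, the same selection can be written in base R with `subset()`, and `tapply()` computes the mean P for each fertilization level in one call. This sketch reuses the approximate data from the post above; with the real CSV, the `"p"` level would presumably be whatever the root.exclusion column actually contains (e.g. "parasitism").

```r
# Approximate data copied from the post above
data <- data.frame(
  fertilization  = rep(c("phosphorus", "no phosphorus"), times = c(12, 8)),
  root.exclusion = c("no p", "no p", "p", "p", "p", "p", "p", "p", "no p", "no p",
                     "p", "p", "no p", "no p", "p", "p", "no p", "no p", "p", "p"),
  species        = c("CALE", "ACMI", "CALE", "FERO", "CALE", "ACMI", "CALE", "FERO", "CALE", "FERO",
                     "CALE", "ACMI", "CALE", "FERO", "CALE", "FERO", "CALE", "ACMI", "CALE", "ACMI"),
  P              = c(4980, 5348, 7078, 4589, 6344, 4097, 5578, 2967, 3524, 4396,
                     5337, 4343, 3075, 3621, 4086, 2772, 2175, 4240, 4825, 2615)
)

# subset() evaluates the conditions inside the data frame; select = P keeps only that column
with_p    <- subset(data, fertilization == "phosphorus" & species == "FERO" & root.exclusion == "p", select = P)
without_p <- subset(data, fertilization == "no phosphorus" & species == "FERO" & root.exclusion == "p", select = P)

with_p$P      # 4589 2967
without_p$P   # 2772

# Mean P per fertilization level for FERO under "p"
fero_p <- data[data$species == "FERO" & data$root.exclusion == "p", ]
means  <- tapply(fero_p$P, fero_p$fertilization, mean)
means         # no phosphorus: 2772, phosphorus: 3778
```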


#4

The `&` is the logical AND operator: a row is selected only when all three conditions are TRUE.
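A tiny illustration of why a single `&` is used: it works element by element, producing one TRUE/FALSE per row, and that logical vector is what picks out the matching values. The three rows below are hypothetical.

```r
# Hypothetical P values for three rows, plus one condition vector per filter
P             <- c(4589, 5348, 2772)
is_phosphorus <- c(TRUE, TRUE, FALSE)
is_fero       <- c(TRUE, FALSE, TRUE)

# `&` compares element by element: one TRUE/FALSE per row
keep <- is_phosphorus & is_fero
keep       # TRUE FALSE FALSE

# Indexing with the logical vector returns only the rows where keep is TRUE
selected <- P[keep]
selected   # 4589

# `&&` is meant for single TRUE/FALSE values (e.g. inside if ()); recent
# versions of R raise an error if either side has more than one element,
# so it is not suitable for selecting rows:
# is_phosphorus && is_fero
```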


#5

Thank you this is very helpful!


#6

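This is a really good solution. I'd tweak it just a bit by using readr: `read_csv()` already returns a tibble and by default keeps text columns as character, so you don't need to set `stringsAsFactors` or call `as_tibble()`. The filter statement also needs quotes around the string values and a single `&` between conditions.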


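```r
library(dplyr)
library(readr)

# read_csv() returns a tibble; text columns are read as character
agronomics <- read_csv("path_to_your_file.csv")

# The quoted values must match the spelling used in your data exactly
subset1 <- agronomics %>% 
    filter(fertilization == "phosphorus" & 
    species == "FERO" & 
    root.exclusion == "parasitism")
```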
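To see what `read_csv()` gives you, this sketch writes a two-row CSV (hypothetical values) to a temporary file and reads it back. The result is a tibble, the text columns stay character, and `P` is parsed as a number. `col_types = cols()` just tells readr to guess every column type without printing its column-specification message.

```r
library(readr)

# Write a tiny hypothetical CSV so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c(
  "fertilization,species,root.exclusion,P",
  "phosphorus,FERO,parasitism,4589",
  "no phosphorus,FERO,parasitism,2772"
), path)

# Guess all column types quietly; no stringsAsFactors or as_tibble() needed
agronomics <- read_csv(path, col_types = cols())

class(agronomics)
sapply(agronomics, class)
```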