How to reference variables without the dataset modifier

Hi,

I am a new R user, coming from SPSS. I have no coding background but have done reasonably well teaching myself how to do simple things in R.

One basic thing that I can't seem to find (probably because I'm not asking the right question or using the correct terminology in my search) is how to reference specific variables in my code without having to reference the dataset each time.

For example, I have written the following:

ggplot(data, 
       aes(x=data$var1, y=data$var2)) +
  geom_point()+
  geom_smooth(method=lm)

Now, this does generate the scatterplot that I want, but it gets tedious having to write "data$..." for each instance of the variables. It would be so much easier for me if I were able to just write the variable name directly.

What am I missing? I know of the attach() function but have been told it is not good practice.

Suggestions are welcome.

Have you tried

library(ggplot2)
library(magrittr)  # provides the %>% pipe

data %>%
  ggplot() +
  aes(x = var1, y = var2) +
  geom_point() +
  geom_smooth(method = lm)

Tell us if it works.

Yes, that did work (and I just learned about piping as well, so thanks for that).

I also realized that the "data" in the first line of code is also passed to the aes() line. Thus, the inclusion of that data$ was not necessary in the first place.

Now I've tried to replicate this using cor.test and tried

data %>%
    cor.test(x = var1, y = var2, use = "complete.obs")

This did not run and returns the error: Error in match.arg(alternative) : object 'var2' not found

I also tried this, and this also didn't work:

cor.test(data = data,
         x = var1,
         y = var2,
         use = "complete.obs")

The error returned here is: object 'var1' not found

But this does run:

cor.test(x = data$var1, y = data$var2, use = "complete.obs")

R has many independent contributors, so each package has its own syntax variants, and you should always consult a function's documentation to know which syntax to use. cor.test() is not pipe-friendly: the pipe inserts the data frame as the first argument rather than as data, so you need to use a placeholder (.) with the pipe. And when you use the data argument, you also need to use the formula argument instead of x and y, which accept vectors only.

Take a look at these examples:

library(magrittr)

# Using vectors
cor.test(x = iris$Sepal.Length, y = iris$Sepal.Width, use = "complete.obs")
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  iris$Sepal.Length and iris$Sepal.Width
#> t = -1.4403, df = 148, p-value = 0.1519
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.27269325  0.04351158
#> sample estimates:
#>        cor 
#> -0.1175698

# Using the data plus formula arguments
cor.test(formula = ~ Sepal.Length + Sepal.Width, use = "complete.obs", data = iris)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  Sepal.Length and Sepal.Width
#> t = -1.4403, df = 148, p-value = 0.1519
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.27269325  0.04351158
#> sample estimates:
#>        cor 
#> -0.1175698

# Using the pipe with a placeholder (.) and named arguments
iris %>% 
    cor.test(formula = ~ Sepal.Length + Sepal.Width, use = "complete.obs", data = .)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  Sepal.Length and Sepal.Width
#> t = -1.4403, df = 148, p-value = 0.1519
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.27269325  0.04351158
#> sample estimates:
#>        cor 
#> -0.1175698

Created on 2022-08-04 by the reprex package (v2.0.1)
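
A side note on the use = "complete.obs" argument carried through the calls above: as far as I can tell it is an argument of cor(), not cor.test(), and cor.test() simply ignores it and drops incomplete pairs on its own. Here is a minimal sketch to check that, using a small made-up data frame (d, x and y are hypothetical names, not from the thread):

```r
# Made-up data with one missing value in each column
d <- data.frame(x = c(1, 2, 3, 4, NA, 6),
                y = c(2, 1, 4, 3, 5, NA))

# Vector interface, no 'use' argument
r_vec <- cor.test(d$x, d$y)

# Formula interface; rows with NA are dropped by the default na.action
r_form <- cor.test(~ x + y, data = d)

# Same test run by hand on the four complete rows only
r_sub <- cor.test(d$x[1:4], d$y[1:4])

# All three should report the same estimate and degrees of freedom
c(r_vec$estimate, r_form$estimate, r_sub$estimate)
c(r_vec$parameter, r_form$parameter, r_sub$parameter)
```

If the three results agree on your machine too, the use argument can probably be left out of cor.test() calls without changing anything.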


A couple of other examples


library(ggplot2)

dat1 <- data.frame(xx = 1:20, yy = 20:1)

with(dat1, cor.test(xx, yy, use = "complete.obs"))

ggplot(dat1, aes(xx, yy)) +
  geom_point() +
  geom_smooth(method = lm)
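
If you like the pipe style, magrittr also has an "exposition" operator, %$%, which (as I understand it) does roughly what with() does: it makes the columns of the left-hand data frame visible by name to the call on the right. A small sketch, assuming magrittr is installed and using a made-up data frame (dat2 is a hypothetical name):

```r
library(magrittr)  # provides %>% and %$%

# Made-up data; not perfectly correlated, so the test output is meaningful
dat2 <- data.frame(xx = 1:20, yy = c(20:2, 5))

# with() evaluates the call with dat2's columns available as variables
res_with <- with(dat2, cor.test(xx, yy))

# %$% exposes the same columns, written in pipe form
res_expo <- dat2 %$% cor.test(xx, yy)

# Both routes should give the same correlation estimate
all.equal(res_with$estimate, res_expo$estimate)
```

This only helps with functions that take plain vectors; for functions with a data argument (like ggplot() or the formula version of cor.test()) the ordinary pipe or the data argument is usually the simpler route.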

All of the solutions have been great and shown just how versatile R can be. Thanks to everyone who has taken the time to help!