Combining data frame names & variable names within a function to create new variables

Hello everyone,
I want to create a function that allows me to run the same code using different variables. I have a data set with one observation per individual and three variables (X, Y, and Z).
I want to perform the following analysis.

myFunc = function(dat,var1,var2){
paste(dat$var2) = ifelse(is.na(paste(dat$var1)), NA,(paste(dat$var2)))
print(paste(dat$var2))
}

myFunc(MyDat,X,Y)

It could also be something like
freq = function(dat,var){
table(dat$var)
}

freq(MyDat,X)

Of course, this is an oversimplification of my problem. As you may have noticed, my main goal is to learn how to combine data frame names & variable names within a function to create new variables (e.g. paste(dat$var2)). Once I learn how to do that, I can use a similar principle to run less basic analyses.

I tried paste(MyDat$X) outside the function and it works perfectly; that is, it shows the X variable from the MyDat data frame. I would appreciate it if you could show me how to do that. I posted a similar question in the past, but I did not get a proper answer. Maybe I did not do a good job of explaining my situation. Hopefully, this time I’ll get an answer.

Please, see below a simplified version of my data frame as well as some other information that you may need.

ID X Y Z
1 1 0 NA
2 0 NA NA
3 1 0 NA
4 1 0 NA
5 1 1 1
6 1 1 1
7 1 0 NA
8 1 0 NA
9 1 1 1
10 1 1 1
11 1 1 1
12 1 1 1
13 1 0 NA
14 1 1 0
15 1 0 NA
16 1 0 NA
17 1 1 1
18 1 0 NA
19 1 1 0

Additional relevant information:
OS: Windows 10 (64-bit)
R version: 3.6.2
R studio version: 3.5

Thanks a lot for your support.

A.G.

Is this what you mean?

# Sample data
sample_df <- data.frame(
          ID = c(1,2,3,4,5,6,7,8,9,10,11,12,13,
                 14,15,16,17,18,19),
           X = c(1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
           Y = c(0, NA, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1),
           Z = c(NA,NA,NA,NA,1,1,NA,NA,1,1,1,1,
                 NA,0,NA,NA,1,NA,0)
)

my_func <-  function(dat,var1,var2){
    dat[[var2]] <- ifelse(is.na(dat[[var1]]), NA, dat[[var2]])
    print(dat[[var2]])
}

my_func(sample_df, "Y", "Z")
#>  [1] NA NA NA NA  1  1 NA NA  1  1  1  1 NA  0 NA NA  1 NA  0

Created on 2021-12-02 by the reprex package (v2.0.1)

If this doesn't solve your problem, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

Hi andresrcs,
Thanks a lot for your quick reply. The code you suggest works perfectly. However, it does not answer my question on how to combine data frame names & variable names within a function to create new variables (e.g. paste(dat$var2)).
In my second example, I would like to run a simple frequency analysis by first putting together dat$var2 (or dat$var) and then running the analysis. That would be the starting point for other types of analysis such as regression, etc.
I hope this is clearer now.
Best,
A.G.

d$x and d[["x"]] are equivalent in that both pick out a named element of a list (for getting it out or setting it). The square-bracket syntax directly supports metaprogramming because the x within d[[x]] is evaluated first, so x can be a variable holding the column name, whereas with $ the name is taken literally.
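
A small sketch of that difference inside a function, in case it helps. The data frame toy_df and the helpers pick_dollar and pick_bracket are made-up names for illustration, not something from the original posts:

```r
# Hypothetical toy data; all names here are for illustration only
toy_df <- data.frame(X = c(1, 0, 1), Y = c(0, NA, 1))

# `$` takes the name literally: this looks for a column called "var"
pick_dollar <- function(dat, var) dat$var

# `[[ ]]` evaluates `var` first, so the column name can be passed as a string
pick_bracket <- function(dat, var) dat[[var]]

pick_dollar(toy_df, "X")   # NULL, because toy_df has no column named "var"
pick_bracket(toy_df, "X")  # the X column: 1 0 1

# The same idea applied to the frequency example from the question
freq <- function(dat, var) table(dat[[var]], useNA = "ifany")
freq(toy_df, "Y")
```

Note that the column name is passed as a quoted string ("X"), not as a bare name; passing bare names would need extra non-standard evaluation machinery, which is not shown here.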

Hi nirgrahamuk,
I think I can work with that. I'm trying to transfer my SAS programming habits into R, but it seems I need to make some adjustments.
As previously mentioned, that code runs well. When I run the code as below, the frequency analysis makes sense.

myFunc = function(dat,var1,var2){
dat[[var2]] = ifelse(is.na(dat[[var1]]), NA, dat[[var2]])
table(dat[[var2]])
}

However, when I run table(dat[[var2]]) again by itself outside (and after) the function, I get the original values from before running the function. It's as if the replacement in the function is not permanently saved. I have no clue what's going on. I'm fairly new to R, so I apologize if this is a silly question.

Thanks a lot.

Hello @alejandroglez,

In your function you changed a copy of dat, not the original one.
But of course you can return the changed copy as the result of your function:

myFunc = function(dat,var1,var2){
dat[[var2]] = ifelse(is.na(dat[[var1]]), NA, dat[[var2]])
dat
}

changed_dat <- myFunc(MyDat, "Y", "Z")
table(changed_dat[["Z"]])

There are ways to 'assign' the internal results of a function to a global variable,
but try to avoid these when you can.
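
To make the copy behaviour concrete, here is a minimal sketch. The data frame toy_df and the function name blank_if_na are assumptions made up for this example; the body of the function is the same ifelse() line used above:

```r
# Hypothetical toy data; all names here are for illustration only
toy_df <- data.frame(Y = c(0, NA, 1), Z = c(1, 1, 0))

# Sets var2 to NA wherever var1 is NA, then returns the modified copy
blank_if_na <- function(dat, var1, var2) {
  dat[[var2]] <- ifelse(is.na(dat[[var1]]), NA, dat[[var2]])
  dat
}

result <- blank_if_na(toy_df, "Y", "Z")

toy_df$Z   # original is unchanged: 1 1 0
result$Z   # returned copy has the replacement: 1 NA 0

# Overwrite the original only if that is really what you want
toy_df <- blank_if_na(toy_df, "Y", "Z")
table(toy_df[["Z"]], useNA = "ifany")
```

Assigning the result back to the same name is the usual way to make the change "stick" without reaching for global assignment.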

That works perfectly. Thanks a lot @HanOostdijk.
Thanks a lot as well to @andresrcs & @nirgrahamuk.
You all helped me solve a problem I was stuck on.
