Generate new variables using number from loop

theinzawoo · October 11, 2020, 5:57am

I want to generate variables based on the number of rows that come in; we cannot know in advance how many rows there will be. The following is the data frame:
v1<- c("caseid a3_enu_nm","caseid a3_sup_nm")
dup<-data.frame(v1)
The following generates a variable r for each row of v1:
r1<-list(dup$v1[[1]])
r1<-strsplit(unlist(r1), " ")
r1<-unlist(r1)

r2<-list(dup$v1[[2]])
r2<-strsplit(unlist(r2), " ")
r2<-unlist(r2)

The v1 of the current data frame has only 2 rows; in reality, users may fill in an unknown number of rows.

So I want to generate r(n) depending on the number of rows of v1. I used the following for loop, but it doesn't work. Please advise.
for (j in 1:NROW(dup$v1)) {
r,j<-list(dup$v1[[j]])
r,j<-strsplit(unlist(rj), " ")
r,j<-unlist(rj)
}

stefan1 · October 11, 2020, 8:20am

There are several issues with your for loop, e.g. length (or seq_along) is the more idiomatic choice than NROW for a vector, and r,j is not a valid variable name, so it will not work.

Either way, the preferred approach would be to store the results of your strsplit in a list instead of creating a variable for each row. Making use of lapply, the following code creates a list where the first list element corresponds to the first element of v1, and so on. Depending on what you are finally trying to achieve, it may be preferable to store the result as new columns in your df:

v1<- c("caseid a3_enu_nm","caseid a3_sup_nm")
dup<-data.frame(v1)

r <- lapply(dup$v1, strsplit, " ")
r <- lapply(r, unlist)
r
#> [[1]]
#> [1] "caseid"    "a3_enu_nm"
#> 
#> [[2]]
#> [1] "caseid"    "a3_sup_nm"

Created on 2020-10-11 by the reprex package (v0.3.0)
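
As a side note on the list approach above: strsplit() is already vectorised over its input, so the two lapply() calls can probably be collapsed into one call. The sketch below also names the list elements r1, r2, ... and shows one way to store the pieces as new columns. The column names first and second are made up for illustration, and the rbind() step assumes every row splits into the same number of pieces.

```r
# Input copied from the thread
v1 <- c("caseid a3_enu_nm", "caseid a3_sup_nm")
dup <- data.frame(v1, stringsAsFactors = FALSE)

# strsplit() works on the whole character vector at once and
# returns a list with one element per row
r <- strsplit(dup$v1, " ", fixed = TRUE)

# Optional: store the pieces as new columns.
# Assumption: every row splits into the same number of pieces.
parts <- do.call(rbind, r)
dup$first  <- parts[, 1]   # hypothetical column name
dup$second <- parts[, 2]   # hypothetical column name

# Name the elements so they can be looked up as r$r1, r$r2, ...
# without creating separate variables
names(r) <- paste0("r", seq_along(r))
r$r2
```

Looking elements up by name (r$r2 or r[["r2"]]) keeps everything in one object, however many rows arrive.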

theinzawoo · October 11, 2020, 8:46am

Thanks for your idea.

Actually, I would like to create the variables in a reproducible way, using the number from a loop.
Based on the current data frame, I could create the variables r1 and r2 because there are only two rows in v1. These r1, r2, ... would be used as character vectors for other tasks.

r1<-list(dup$v1[[1]])
r1<-strsplit(unlist(r1), " ")
r1<-unlist(r1)

r2<-list(dup$v1[[2]])
r2<-strsplit(unlist(r2), " ")
r2<-unlist(r2)

However, I would like to create the variable r in a reproducible way, depending on the number of rows of dup$v1, because I cannot know how many rows will come into dup$v1.
If n (n = 1, 2, 3, ...) is the row number in v1, I want to do something like the following:

rn<-list(dup$v1[[n]])
rn<-strsplit(unlist(rn), " ")
rn<-unlist(rn)

To make this happen, I use a for loop over the row numbers of dup$v1 and then treat the results as character vectors to be used for other tasks:
for (j in 1:NROW(dup$v1)) {
   rj<-list(dup$v1[[j]])
   rj<-strsplit(unlist(rj), " ")
   rj<-unlist(rj)
} 

I would use r1, r2, ... to extract the duplicated observations of another data frame, like below:

d1 = df[,c(r1)]           # select columns to check duplicates

dupid1<- df[duplicated(d1) | duplicated(d1, fromLast=TRUE),]

dupid1<-subset(dupid1,  select = c(k))

stefan1 · October 11, 2020, 9:05am

Okay. I see. You could create your variables from the list like so:

for (i in seq_along(r)) {
  assign(paste0("r", i), r[[i]])
}

However, instead of doing d1 = df[,c(r1)] you could also do d1 = df[ , r[[1]] ], i.e. without creating new variables. Additionally, keeping the list makes it easier to check for the duplicates, as you can simply loop over the list.
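
A minimal sketch of the two options just described. The data frame df below is a made-up stand-in, since the real df is not shown in the thread. It illustrates that the variables created by assign() hold exactly what is already in the list, so indexing the list directly gives the same selection.

```r
# Result of the strsplit step, written out by hand for this example
r <- list(c("caseid", "a3_enu_nm"), c("caseid", "a3_sup_nm"))

# Option 1: create r1, r2, ... with assign()
for (i in seq_along(r)) {
  assign(paste0("r", i), r[[i]])
}

# The new variables are copies of the list elements
same <- identical(r1, r[[1]]) && identical(r2, r[[2]])

# Hypothetical stand-in for the poster's df
df <- data.frame(
  caseid    = c(1, 1, 2, 3),
  a3_enu_nm = c("A", "A", "B", "C"),
  a3_sup_nm = c("X", "Y", "Y", "Z"),
  stringsAsFactors = FALSE
)

# Option 2: index with the list element, no extra variables needed
d1_from_var  <- df[, r1]
d1_from_list <- df[, r[[1]]]
```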

theinzawoo · October 11, 2020, 10:08am

Hi Stefan1,
Your suggestion is really close to what I want.
I did the following, but it still needs improvement:

r <- lapply(dup$variable, strsplit, " ")
r <- lapply(r, unlist)
r
for (i in seq_along(r)) {
assign(paste0("r", i), r[[i]])
assign(paste0("d", i), df[ , r[[i]] ])
assign(paste0("dupid", i), df[duplicated("d",i) | duplicated("d",i, fromLast=TRUE),])
}

After running the above, dupid1 and dupid2 have no observations.
However, when I run this outside the loop, it works well:
dupid1<- df[duplicated(d1) | duplicated(d1, fromLast=TRUE),]

I got 20 rows that have duplicated cases. Would you mind showing me a way to improve the loop so that it gives the same result as dupid1?

stefan1 · October 11, 2020, 10:28am

(: The issue is that you use "d", i to refer to your df d1. But that will not work: it passes the literal string "d" to duplicated(), not the data frame d1.

As far as I understand, r1, r2, ... and d1, d2, ... are just auxiliary variables. If you don't need them later on, you can get rid of them and simply use:

for (i in seq_along(r)) {
  # Instead of creating r1, ... and d1, ..., store the result in a temporary df d
  d <- df[ , r[[i]] ]
  # d can now be used to check for the duplicates
  assign(paste0("dupid", i), df[duplicated(d) | duplicated(d , fromLast=TRUE),])
}
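
To illustrate why the quoted name did not work, here is a small sketch with made-up data. A string such as "d1" is only a name; get() is the base R function that looks up the object with that name. This is shown only to explain the behaviour, and the temporary variable d used in the loop above is likely the simpler choice.

```r
# Hypothetical d1, standing in for the data frame created with assign()
d1 <- data.frame(
  caseid    = c(1, 1, 2),
  a3_enu_nm = c("A", "A", "B"),
  stringsAsFactors = FALSE
)
i <- 1

# paste0() builds the *name* of the object, which is just a string
nm <- paste0("d", i)

# A single string has nothing to be a duplicate of
flag_string <- duplicated(nm)

# get() fetches the object that the name refers to
d <- get(nm)
flag_rows <- duplicated(d) | duplicated(d, fromLast = TRUE)
```

Here flag_string has length one, so using it to index a data frame cannot pick out the duplicated rows, while flag_rows has one entry per row of d1.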

theinzawoo · October 11, 2020, 12:08pm

Hi Stefan1,
Thanks a lot, I got it working this way:

r <- lapply(dup$variable, strsplit, " ")
r <- lapply(r, unlist)
r
for (i in seq_along(r)) {
#assign(paste0("r", i), r[[i]])
d<-df[ , r[[i]] ]
dp<- df[duplicated(d) | duplicated(d, fromLast=TRUE),]
assign(paste0("dupid", i), subset(dp, select = c(k)))
}
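
For comparison, the same steps can be sketched without assign(), collecting the results in a named list. Everything in the toy data is an assumption: df is made up, the id column is invented, and k is taken to be a character vector of column names to keep, since its definition is not shown in the thread. The helper name find_dups is also made up.

```r
# Hypothetical stand-ins for the poster's objects
df <- data.frame(
  id        = 1:4,
  caseid    = c(1, 1, 2, 3),
  a3_enu_nm = c("A", "A", "B", "C"),
  a3_sup_nm = c("X", "Y", "Y", "Z"),
  stringsAsFactors = FALSE
)
k <- c("id", "caseid")   # assumed: columns to keep in the output
r <- list(c("caseid", "a3_enu_nm"), c("caseid", "a3_sup_nm"))

# Return the rows of `data` that are duplicated on `key_cols`,
# keeping only `keep_cols`
find_dups <- function(data, key_cols, keep_cols) {
  d <- data[, key_cols, drop = FALSE]
  flagged <- duplicated(d) | duplicated(d, fromLast = TRUE)
  data[flagged, keep_cols, drop = FALSE]
}

# One result per element of r, named dupid1, dupid2, ...
dupids <- lapply(r, find_dups, data = df, keep_cols = k)
names(dupids) <- paste0("dupid", seq_along(dupids))
```

With this layout, dupids$dupid1 plays the role of the dupid1 variable, and the number of results follows the length of r automatically.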
