Creating a custom R function in SPSS for a summary

I am trying to create a custom R function in SPSS that outputs the distinct count for dynamic variables.

dt <- data.frame(AA =c("a","h","d","f","d","s","j","s","d","f","g","g","d","f","g","s","d","f","a","d","f"),
                 BB = c("ab","ac","ab","cd","da","bb","da","ac","fg","fg","bb","cd","ac","ab","da","ab","ac","an","fk","an","fk"))



# these cuts are dynamic 
dt$tot<- 1
dt$CA<-ifelse(dt$AA %in% c("g","f"),1,NA)
dt$AM<-ifelse(dt$AA  %in%  c("a","h","s","d"), 1, NA)

library(expss)  # val_lab() comes from the expss package
val_lab(dt$tot)<-c("Total"=1)
val_lab(dt$CA)<-c("CA"=1)
val_lab(dt$AM)<-c("AM"=1)

varcut <- list("tot","CA","AM") # varcut can be any list of cuts


#dt = spssdata.GetDataFromSPSS()

#y <- spssdictionary.GetMultiResponseSet(mrsetName = "$varcut")
for (d in 1:length(dt)){
  cat('\n\n#### ', names(dt)[d], '\n\n')
  md<-dt[[d]]
  c<-cbind('\n\n#### ', length(unique(md$BB, incomparables = FALSE)))
  c<-as.data.frame(c)
  names(c)<-c(" ","")
  cat('\n\n')
}
#spsspivottable.Display(c,  title='Number of organizations', format=formatSpec.GeneralStat,hiderowdimlabel=TRUE)

The objective is to create a custom function that dynamically summarizes the distinct count of values in the BB column. I don't know whether there is a solution; please help.

I am trying to create output like this:
[image: desired output table]

It is not totally clear to me what you are trying to do, and I don't think your current code works. Would this help?

dt <- data.frame(AA =c("a","h","d","f","d","s","j","s","d","f","g","g","d","f","g","s","d","f","a","d","f"),
                 BB = c("ab","ac","ab","cd","da","bb","da","ac","fg","fg","bb","cd","ac","ab","da","ab","ac","an","fk","an","fk"))


# to count the distinct values in column BB
table(dt$BB)
#> 
#> ab ac an bb cd da fg fk 
#>  4  4  2  2  2  3  2  2

# to count the groups of values in column AA, as in your code
dt$AA_translated <- dplyr::case_when(dt$AA %in% c("g","f")         ~ "CA",
                                     dt$AA %in% c("a","h","s","d") ~ "AM")


table(dt$AA_translated, useNA = "ifany")
#> 
#>   AM   CA <NA> 
#>   12    8    1

Created on 2020-12-07 by the reprex package (v0.3.0)
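If the goal is the number of distinct BB values per group rather than the number of rows per group, `tapply()` can do it in one call. This is a minimal base-R sketch that rebuilds the same AA grouping as the `case_when()` call above (without dplyr, so it runs with base R only); it assumes nothing beyond the sample data from the question.

```r
dt <- data.frame(AA = c("a","h","d","f","d","s","j","s","d","f","g","g","d","f","g","s","d","f","a","d","f"),
                 BB = c("ab","ac","ab","cd","da","bb","da","ac","fg","fg","bb","cd","ac","ab","da","ab","ac","an","fk","an","fk"))

# Same grouping of AA as the case_when() call: rows matching neither group stay NA
dt$AA_translated <- NA_character_
dt$AA_translated[dt$AA %in% c("g", "f")] <- "CA"
dt$AA_translated[dt$AA %in% c("a", "h", "s", "d")] <- "AM"

# Number of distinct BB values within each group (tapply drops the NA group)
distinct_BB <- tapply(dt$BB, dt$AA_translated, function(x) length(unique(x)))
distinct_BB
# AM = 7, CA = 7
```

This groups by AA directly, so it only fits if the cuts can be derived from AA; with separate 0/1 (or NA/1) cut columns, the cuts have to be looped over instead.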

Actually, I am trying to create code that can dynamically select varcut and, for each varcut variable, dynamically give the count of unique values in column BB.

Previously I was calculating this for the overall data, as below, but now I want to calculate it for each varcut. It will be an R function that I can use in SPSS.

c<-cbind('\n\n#### ', length(unique(md$BB, incomparables = FALSE)))
c<-as.data.frame(c)
names(c)<-c(" ","")

In your code, md is a vector (one column of dt), so you can't use md$BB: a vector does not have columns. Perhaps you want to use dt$BB?
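To illustrate: `dt[[d]]` (like `dt[["BB"]]` below) returns a single column as an atomic vector, and `$` is not defined for atomic vectors, so `md$BB` raises an error. A minimal sketch with the sample data from the question:

```r
dt <- data.frame(AA = c("a","h","d","f","d","s","j","s","d","f","g","g","d","f","g","s","d","f","a","d","f"),
                 BB = c("ab","ac","ab","cd","da","bb","da","ac","fg","fg","bb","cd","ac","ab","da","ab","ac","an","fk","an","fk"))

md <- dt[["BB"]]   # one column of dt: a plain vector, not a data frame
is.atomic(md)      # TRUE

# md$BB fails because `$` only works on lists and data frames
err <- tryCatch(md$BB, error = function(e) conditionMessage(e))
err                # "$ operator is invalid for atomic vectors"

# Use the data frame column directly instead
length(unique(dt$BB))   # 8 distinct values
```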

This only gives you the number of different values. If you want to count how often each value occurs, you can use table() as in my example code above.

I don't understand: is varcut generated in SPSS and passed to the R code, or is varcut generated within the R code? Are the columns CA and AM provided from the SPSS code, or do you have to create them in R based on varcut?

In your example, you are looking at the values in column AA. Or are the cuts of AA provided from SPSS, and you want to use them in R to count the values in BB?

Actually, my code is not working, so I am looking for a new or different solution.

Then you need to state clearly what data you get from SPSS:

  • There is a data frame dt that you get with dt = spssdata.GetDataFromSPSS() and that does look like your example, with columns AA and BB
  • There is a variable varcut or y that you get from SPSS? What is the class and content of y after that command: y <- spssdictionary.GetMultiResponseSet(mrsetName = "$varcut")? Does y only contain list("tot","CA","AM"), or does it also contain c("g","f") and c("a","h","s","d")?
  • If it's not part of y, how do you determine c("g","f") and c("a","h","s","d")? Where do these values come from?

You also need to state clearly what result you want. If you just want to count the values in BB, you can get that with table(dt$BB). If you want to "cut" the values in BB and then count them, you have to explain what varcut or y looks like. If you want to use AA to count BB, you have to explain how.

You can find advice here on how to create a reproducible example.

Thanks for helping!

The data is the same as the sample I posted.
varcut holds the names of derived columns with values 1 or 0, so for CA the raw data is filtered with CA == 1.
There are three such columns: tot = "Total", CA = "Canada", AM = "America".
BB holds, for example, city names, so I want a dynamic count of cities for each varcut variable.
y is the same as varcut.

dt <- data.frame(AA =c("a","h","d","f","d","s","j","s","d","f","g","g","d","f","g","s","d","f","a","d","f"),
                 BB = c("ab","ac","ab","cd","da","bb","da","ac","fg","fg","bb","cd","ac","ab","da","ab","ac","an","fk","an","fk"))
# these cuts are dynamic 
dt$tot<- 1
dt$CA<-ifelse(dt$AA %in% c("g","f"),1,NA)
dt$AM<-ifelse(dt$AA  %in%  c("a","h","s","d"), 1, NA)

dt   # the data now looks like this


varcut is (tot, CA, AM), and I am defining it as y (a multiple response set):
https://www.ibm.com/support/knowledgecenter/SSLVMB_24.0.0/spss/programmability_option/r_package_spssdictionary_getmultiresponseset.html

I want a function that dynamically gives the distinct count of values of BB in SPSS for each variable in varcut, because varcut can be any set of variables.

For instance, after filtering CA == 1, I want the number of distinct values in the BB column.
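For the single-cut case described here (keep the rows where CA == 1, then count the distinct values of BB), a minimal base-R sketch on the sample data. `which()` matters: CA is NA rather than 0 for the other rows, and a plain logical subset keeps those NA comparisons as NA elements, which `unique()` then counts as an extra value.

```r
dt <- data.frame(AA = c("a","h","d","f","d","s","j","s","d","f","g","g","d","f","g","s","d","f","a","d","f"),
                 BB = c("ab","ac","ab","cd","da","bb","da","ac","fg","fg","bb","cd","ac","ab","da","ab","ac","an","fk","an","fk"))
dt$CA <- ifelse(dt$AA %in% c("g", "f"), 1, NA)

# Rows where CA == 1; which() drops the rows where CA is NA
bb_in_CA <- dt$BB[which(dt$CA == 1)]
length(unique(bb_in_CA))             # 7

# Without which(), NA elements are kept and NA is counted as a distinct value
length(unique(dt$BB[dt$CA == 1]))    # 8
```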


So something more like this?

# received from SPSS
dt <- data.frame(AA =c("a","h","d","f","d","s","j","s","d","f","g","g","d","f","g","s","d","f","a","d","f"),
                 BB = c("ab","ac","ab","cd","da","bb","da","ac","fg","fg","bb","cd","ac","ab","da","ab","ac","an","fk","an","fk"))
# these columns are provided from SPSS
dt$tot<- 1
dt$CA<-ifelse(dt$AA %in% c("g","f"),1,NA)
dt$AM<-ifelse(dt$AA  %in%  c("a","h","s","d"), 1, NA)

# This is provided from SPSS
y <- list("tot","CA","AM")


# to just sum the number of rows in each cut
count_values_in_cut <- function(varcut){
  sum(dt[[varcut]] == 1, na.rm = TRUE)
}
lapply(y, count_values_in_cut)
#> [[1]]
#> [1] 21
#> 
#> [[2]]
#> [1] 8
#> 
#> [[3]]
#> [1] 12

# To count the individual values in BB for each cut
count_BB_values <- function(varcut){
  table(dt$BB[dt[[varcut]] == 1])
}

lapply(y, count_BB_values)
#> [[1]]
#> 
#> ab ac an bb cd da fg fk 
#>  4  4  2  2  2  3  2  2 
#> 
#> [[2]]
#> 
#> ab an bb cd da fg fk 
#>  1  1  1  2  1  1  1 
#> 
#> [[3]]
#> 
#> ab ac an bb da fg fk 
#>  3  4  1  1  1  1  1

Created on 2020-12-08 by the reprex package (v0.3.0)
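Since the goal is the number of distinct BB values per cut rather than a frequency table, the same idea can return one row per cut. A minimal sketch, assuming `dt` and `y` look like the stand-ins above; `count_distinct_by_cut()` is a hypothetical helper name, and the data is passed in as an argument instead of being read from a global `dt`.

```r
dt <- data.frame(AA = c("a","h","d","f","d","s","j","s","d","f","g","g","d","f","g","s","d","f","a","d","f"),
                 BB = c("ab","ac","ab","cd","da","bb","da","ac","fg","fg","bb","cd","ac","ab","da","ab","ac","an","fk","an","fk"))
dt$tot <- 1
dt$CA  <- ifelse(dt$AA %in% c("g", "f"), 1, NA)
dt$AM  <- ifelse(dt$AA %in% c("a", "h", "s", "d"), 1, NA)
y <- list("tot", "CA", "AM")

# Hypothetical helper: distinct count of `target` among rows where each cut == 1
count_distinct_by_cut <- function(data, cuts, target = "BB") {
  cuts <- unlist(cuts)                       # accepts a list or a character vector
  n <- vapply(cuts, function(v) {
    keep <- which(data[[v]] == 1)            # which() drops rows where the cut is NA
    length(unique(data[[target]][keep]))
  }, integer(1), USE.NAMES = FALSE)
  data.frame(cut = cuts, n_distinct = n)
}

res <- count_distinct_by_cut(dt, y)
res
# n_distinct: tot = 8, CA = 7, AM = 7
```

The result is an ordinary data frame, so it could presumably be passed to `spsspivottable.Display()` as in the commented-out line in the question; that part has not been checked inside SPSS.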

Thanks a lot for your help.
I have done it the way shown below, but is there another solution for this process?
If there is a simpler way to do it, that would make it easier for me to modify in the future.

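dt <- spssdata.GetDataFromSPSS()                   # active SPSS dataset as a data frame
vardict <- spssdictionary.GetDictionaryFromSPSS()  # variable dictionary
varlist <- spssdictionary.GetMultiResponseSet(mrsetName = "$varcut")  # the $varcut set
mrlabel <- varlist$label          # label of the multiple response set
nam <- vardict["varName", ]       # variable names
labels <- vardict["varLabel", ]   # variable labels
vl <- varlist$vars                # variables in $varcut (tot, CA, AM)

# result table with columns v (label) and n (count); the NA first row is dropped at the end
tb <- data.frame(matrix(ncol = 2, nrow = 1, dimnames = list(NULL, c("v", "n"))))

for (i in 1:length(vl)) {
  # rows where this cut variable equals 1
  subs <- dt[which(dt[, vl[[i]]] == 1), ]
  # look up the label of this cut variable
  for (nm in 1:length(nam)) {
    if (nam[[nm]] == vl[[i]]) {
      lab <- as.character(labels[[nm]])
    }
  }
  # distinct count of Q5_2_TEXT in those rows
  nOrg <- as.numeric(length(unique(subs$Q5_2_TEXT, incomparables = FALSE)))
  rec <- list(lab, nOrg)
  tb <- rbind(tb, rec)
}

tb <- tb[-1, ]   # drop the initial NA row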
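A possibly simpler form of the loop above, as a sketch only: it replaces the inner name-lookup loop with `match()` and the `rbind()` loop with a single `vapply()` call. Because the SPSS functions are not available outside SPSS, the objects they would return are replaced by hypothetical stand-ins: the `nam`, `labels` and `vl` values are invented to match the sample data, and `BB` stands in for `Q5_2_TEXT`. The helper `as_chr()` is also hypothetical; it is there because `vardict["varName", ]` may come back as a one-row data frame rather than a plain vector.

```r
# Hypothetical stand-ins for the SPSS objects:
#   dt <- spssdata.GetDataFromSPSS(); nam / labels from vardict; vl <- varlist$vars
dt <- data.frame(AA = c("a","h","d","f","d","s","j","s","d","f","g","g","d","f","g","s","d","f","a","d","f"),
                 BB = c("ab","ac","ab","cd","da","bb","da","ac","fg","fg","bb","cd","ac","ab","da","ab","ac","an","fk","an","fk"))
dt$tot <- 1
dt$CA  <- ifelse(dt$AA %in% c("g", "f"), 1, NA)
dt$AM  <- ifelse(dt$AA %in% c("a", "h", "s", "d"), 1, NA)
nam    <- c("AA", "BB", "tot", "CA", "AM")
labels <- c("AA", "BB", "Total", "Canada", "America")
vl     <- list("tot", "CA", "AM")
target <- "BB"   # "Q5_2_TEXT" in the real data

# Turn a vector, a list, or a one-row data frame into a plain character vector
as_chr <- function(x) vapply(x, as.character, character(1), USE.NAMES = FALSE)

vars <- as_chr(vl)
# Label of each cut variable, looked up by name (replaces the inner loop)
lab <- as_chr(labels)[match(vars, as_chr(nam))]
# Distinct count of the target column among rows where each cut == 1
n <- vapply(vars, function(v) length(unique(dt[[target]][which(dt[[v]] == 1)])),
            integer(1), USE.NAMES = FALSE)

tb <- data.frame(v = lab, n = n)
tb
```

With the stand-ins this gives Total = 8, Canada = 7, America = 7; the real labels and counts would come from the SPSS dictionary and data.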

Sorry, I can't help: I don't understand what the code is doing (since I don't know what is in nam, labels and vl).
