non parametric ANCOVA with 2 factors and 2 covariates (npsm package)

Hallo everyone, I am new to R and to the forum and I hope I am doing this right.

I need to run a non parametric ANCOVA on my data (called "dataset") which has a response variable (Y), 2 factors (A with 2 levels and B with 3 levels) and 2 covariates (X, Z).

I have been looking through the R packages on non parametric ANCOVA and it seems to me that only the npsm (Kloke and McKean) package has this option (multiple factors with more than 2 grouping levels and multiple covariates).

I am following example 5.4.4 in Kloke and McKean's book "non parametrical statistical methods using R" (pages 138-141), but I keep getting an error message: "object xfull not found".

I think this is because xfull needs to be defined but I cannot work out how to do this.
I cannot find it in the book and I spent the last few days looking it up on the Internet with no success.

I have a dataset with 5 columns: first column Y, second column A, third column B, fourth column X and fifth column Z.

The code given in the book is the following

library (npsm)
fitfull = rfit (dataset [, "Y"] ~ xfull -1)

I have managed to work through all the other examples but I am definitely stuck on this. I apologize if the question is stupid (maybe the answer is really obvious), but I am a novice and I would really appreciate any help you can give me!


Hi, and welcome!

There are no stupid questions here. We're a community of learners from beginners through intermediate and advanced, with occasional visits from on high--some of the top luminaries in the R universe.

One of the things that allows us to be helpful is reproducible examples, called a reprex . In the case of the code snippet, dataset is not in the namespace by default. Representative data in the same form (even from a built-in data set) makes for more and better answers.

In this case, though, running the example from help(package = Rfit, rfit) illustrates the problem:

#> Call:
#> rfit.default(formula = weight ~ height, data = baseball)
#> Coefficients:
#>               Estimate Std. Error t.value   p.value    
#> (Intercept) -228.57143   56.14659 -4.0710  0.000146 ***
#> height         5.71429    0.76018  7.5171 4.373e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Multiple R-squared (Robust): 0.4418605 
#> Reduction in Dispersion Test: 45.125 p-value: 0

Created on 2020-01-03 by the reprex package (v0.3.0)

Then, take a look at the rfit function signature

Default S3 method:
rfit(formula, data, subset, yhat0 = NULL,
scores = Rfit::wscores, symmetric = FALSE, TAU = "F0", ...)

subset is optional, but it needs both formula and data. The error thrown by the code snippet reflects the missing data argument, which would tell rfit where to find xfull.

Dear technocrat, thank you very much for your answer ( and for being so understanding: such a relief!).

I am importing the data from SPSS, so I forgot to say that "dataset" was in fact defined by the following code:

library (foreign)
dataset=read.spss("C:\Users\mrk\Documents\ filename.sav",

If I use the code:

library (npsm)
model <- rfit (Y ~A+ B + X +Z, data=dataset)
summary (model)

I have no problems. From what I understand, by doing this I am running a nonparametric rank-based regression (as in Kang, Kloke et al., 2012. doi;10.1007%2Fs10286-012-0158-6)

However, I am still stuck on the nonparametric ANCOVA from the book example. I think that, since dataset is defined, I should be defining xfull, but I don't know how to do it. I have been again looking through all Internet references to J. Kloke, but this thing is beyond me.

PS: I am also working on the reprex. I have not yet managed to download the reprex package because avast is blocking it no matter how I try to enable it. I am also trying to understand how to show part of the data.


Hi, Paola,

This is good information!

Let's start with some basic checks. I'm going to use an example from the help(read.spss) page.

(sav <- system.file("files", "electric.sav", package = "foreign"))
#> [1] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library/foreign/files/electric.sav"
dat <- read.spss(file=sav, 
str(dat)   # now a data.frame
#> 'data.frame':    240 obs. of  13 variables:
#>  $ CASEID  : num  13 30 53 84 89 102 117 132 151 153 ...
#>  $ FIRSTCHD: Factor w/ 5 levels "NO CHD","SUDDEN  DEATH",..: 3 3 2 3 2 3 3 3 2 2 ...
#>  $ AGE     : num  40 49 43 50 43 50 45 47 53 49 ...
#>  $ DBP58   : num  70 87 89 105 110 88 70 79 102 99 ...
#>  $ EDUYR   : num  16 11 12 8 NA 8 NA 9 12 14 ...
#>  $ CHOL58  : num  321 246 262 275 301 261 212 372 216 251 ...
#>  $ CGT58   : num  0 60 0 15 25 30 0 30 0 10 ...
#>  $ HT58    : num  68.8 72.2 69 62.5 68 68 66.5 67 67 64.3 ...
#>  $ WT58    : num  190 204 162 152 148 142 196 193 172 162 ...
#>  $ DAYOFWK : Factor w/ 7 levels "SUNDAY","MONDAY",..: NA 5 7 4 2 1 NA 1 3 5 ...
#>  $ VITAL10 : Factor w/ 2 levels "ALIVE","DEAD": 1 1 2 1 2 2 1 1 2 2 ...
#>  $ FAMHXCVR: Factor w/ 2 levels "NO","YES": 2 1 1 2 1 1 1 1 1 2 ...
#>  $ CHD     : num  1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "names")= chr  "CASEID" "FIRSTCHD" "AGE" "DBP58" ...
#> [1] "dat" "sav"

Created on 2020-01-04 by the reprex package (v0.3.0)

This tells us two things:

The dat data frame (equivalent to your dataset) has no field xfull, so SPSS don't include that by default. Second, there is nothing in the global environment that ls() shows as anything other than the objects, sav and dat that we have specifically created. Try the same with dataset in a fresh session to confirm that is the case.

I also took a look at help(rfit). Interesting fact: Kloke is a co-author! There's no mention of an xfull parameter. A bit of Googling, however, suggests that xfull is actually a SPSS function to create a matrix. This is a term also used in some R packages.

So, I think that you have found the correct solution with

model <- rfit (Y ~A+ B + X +Z, data=dataset)

for two reasons:

  1. it gives you the expected result
  2. it's a well formed formula in R regression usage

Kloke has a new text: Nonparametric Statistical Methods Using R (Chapman & Hall/CRC The R Series) 1st Edition by John Kloke (Author), Joseph W. McKean (Author) specifically using R.

The best I can offer is that if the results are similar to the example from the help(rfit) page, you have probably understood the function.

The reprex problem could be that you are behind a firewall or your default repository is down. Try

install.packages("reprex", repos='', ask=FALSE, checkBuilt=TRUE)

Dear technocrat,

thank you so much. The example is very clear: if xfull is not defined it cannot be recognized!

I agree that xfull must be a matrix. The thing is that I have no idea how Kloke defines it. I have been using his book and the only clue I have is on p 39 "..the full model design matrix consists of the four dummy columns for the cell means and the 5 covariates. In the
following code segment this design matrix is in the R matrix xfull ...". I am using the 2014 edition and I could not find anything newer.

However, I guess that, as you say, if Rfit works I should not be worrying too much. In the end ANCOVA and multiple linear regression are fundamentally similar.

I really appreciate all the help you have been giving me, I could not have worked all this out on my own.

PS: with the command you gave me the reprex is now working .


1 Like

@paola, I read the sentence as meaning a matrix with, implicitly, the response variate, Y, four columns of means, and five covariates. In OLS in R this would be

fit <- lm(Y ~ A  + B + C + D + U + V + W + X  + Z, data = dataset)

I should disclosed that my grasp of statistics is limited by the small amount of formal training that I have received. However, I'm fairly adept at reading documentation. (A hard won skill!)

Good luck, and please return with questions about implementation of your work in R.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.