How to check for outliers after imputing missing values with the Hmisc package


#1

I am using the Hmisc package to impute missing values. First, I converted all my dummy variables to factors. Then I called the aregImpute function from Hmisc with the following code:

impute_miss <- aregImpute(~  MarketID + MarketSize + LocationID  + AgeOfStore  +  Promotion+  week+SalesInThousands , data =table.miss, n.impute = 5)

impute_miss

Then I completed the dataset with the impute.transcan function, using the following code:

completetable <- impute.transcan(impute_miss, imputation=1, data=table.miss, list.out=TRUE,pr=FALSE, check=FALSE) 

head(completetable)

Up to this point everything works without error.
But when I then try to check for outliers, I get an error.
I am using the following code to check for outliers:

boxplot(completetable$SalesInThousands)

It gives the following error:

Error in k[...] : incorrect number of dimensions

I can't understand why I am getting this error. Please help me solve this problem.

Any suggestion is appreciated.

The full code

library(boot) 
library(car)
library(QuantPsyc)
library(lmtest)
library(sandwich)
library(vars)
library(nortest)
library(MASS)

setwd("D:\\R\\New folder")

table <- read.csv("Fn-UseC_-Marketing-Campaign-Eff-UseC_-FastF.csv")
View(table)

str(table)



## Creating Missing Value
library(missForest)

table.miss <- prodNA(table, noNA = 0.1)
summary(table.miss)

##missing value Checking
sapply(table.miss,function(x)sum(is.na(x)))

## Load the Package

library(Hmisc)
impute_miss <- aregImpute(~  MarketID + MarketSize + LocationID  + AgeOfStore  +
                            Promotion+  week+SalesInThousands , data = table.miss, n.impute = 5)

impute_miss

impute_miss$imputed$week

## Combined  The data sets

completetable <- impute.transcan(impute_miss, imputation=1, data=table.miss, list.out=TRUE,pr=FALSE, check=FALSE) 

head(completetable)

str(completetable)




##outlier treatement
names(completetable)

summary(completetable)
boxplot(completetable$SalesInThousands)

I get the following error during the outlier treatment:

boxplot(completetable$SalesInThousands)
Error in k[...] : incorrect number of dimensions

Thanks,
snandy2011


#2

Hi, can you try putting this in reprex? It’ll help answer your question.


#3

Hey @mishabalyasin, thanks for your reply. I am trying to run this code through reprex, but it keeps showing this error:

No input provided and clipboard is not available.
Unable to put result on the clipboard. How to get it:

  • Capture what reprex() returns.
  • Use outfile = "foo" to request output in specific file.

I have posted the full code as text above. Please bear with me and help me solve this problem.


#4

I’m not sure what the exact issue with reprex is, but from the code above I can’t tell what the problem is, since it depends on data that I don’t have and I can’t replicate the steps you took. Can you also post the output of the commands above?

There is a command in the editor called “Preformatted text” that you should use for your code and output; otherwise it is difficult to read.
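One way to share data without attaching the CSV is base R's dput(), which prints an object as R code that others can paste into their console. A minimal sketch, assuming a small made-up data frame (df and its values are hypothetical, not the real marketing data):

```r
# Hypothetical three-row sample standing in for the real table
df <- data.frame(
  week = c(1L, 2L, 3L),
  SalesInThousands = c(33.73, NA, 29.03)
)

# Prints a structure(...) call that recreates df when pasted into a console
dput(df)

# On the real data, something like dput(head(table.miss, 20)) keeps the post short
```

Pasting the dput() output into the post, inside Preformatted text, lets anyone rebuild the same object and rerun the failing line.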

Preformatted text will make your code and output look like this:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Without it, it’ll look like this:
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa

As you can see, the preformatted version is easier to read.


#5

I have applied Preformatted text. Thank you very much.


#6

If you type the code above into the console, what is the output?

As a general strategy, it is easier to solve multiple small problems than one big problem, so try to delete lines in your code that have nothing to do with the problem (e.g., table$MarketID <- as.factor(table$MarketID) and such). This will make it easier to see where the problem is.


#7

completetable$SalesInThousands

If I run this code, it shows all the SalesInThousands values.

For example:

head(completetable$SalesInThousands)

and the output is:

 33.73  47.50* 29.03  39.25  27.81  34.67 


#8

Why is there a star next to 47.50?

Does this work: boxplot(c(1,2,3))?


#9

Why is there a star next to 47.50?

Ans: Because that value was missing. I imputed the missing values with the aregImpute function and then combined them with the original dataset using the impute.transcan function. Hmisc puts an asterisk next to each imputed value when printing.

Does this work: boxplot(c(1,2,3))?

Ans: Yes, it works. It shows the boxplot.
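On the asterisk: it can be checked directly which values were filled in. A minimal sketch, assuming Hmisc is installed; the sales vector is made up, and simple mean imputation via impute() stands in for the aregImpute/impute.transcan pipeline used above:

```r
# Assumes the Hmisc package is installed; the data is made up for illustration
library(Hmisc)

sales <- c(33.73, NA, 29.03, 39.25, 27.81, 34.67)

# Simple mean imputation, used here only as a stand-in for aregImpute
filled <- impute(sales, mean)

class(filled)              # carries the "impute" class, hence the asterisk when printed
is.imputed(filled)         # logical vector flagging the filled-in positions
which(is.imputed(filled))  # index of the value that was originally NA
```

The same is.imputed() check should apply to a column of completetable, on the assumption that impute.transcan marks its output the same way.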


#10

If there is a star next to it, then the class of the vector is probably not plain numeric.
What happens if you convert it to numeric explicitly, like this: boxplot(as.numeric(completetable$SalesInThousands))?
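The class check and the conversion can be sketched in base R alone. The vector below is a hypothetical stand-in for completetable$SalesInThousands, built by hand with an "impute" class and an "imputed" attribute on the assumption that this is roughly what Hmisc attaches:

```r
# Hypothetical stand-in for completetable$SalesInThousands
sales <- structure(
  c(33.73, 47.50, 29.03, 39.25, 27.81, 34.67),
  imputed = 2L,      # position of the imputed value (assumed layout)
  class = "impute"
)

class(sales)         # not a plain numeric vector

# as.numeric() drops the class and the "imputed" attribute
sales_num <- as.numeric(sales)
class(sales_num)

# boxplot.stats() returns the five-number summary the boxplot draws ($stats)
# and the points it would flag as outliers ($out)
bp <- boxplot.stats(sales_num)
bp$out

boxplot(sales_num)
```

Checking class() first shows whether a column needs the conversion at all, and boxplot.stats() gives the flagged outliers as values rather than only as points on a plot.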


#11

Oh my god, it works! You are a genius, sir. Thank you very, very much. Today I learned missing value imputation with the Hmisc package and ran into this error, and you solved it. It was a great learning experience for me.


#12

If you don't mind, can I ask you a question?
Can I have your Gmail ID, please? It would be great to learn over there rather than here. My mail ID is: sanalytics2018@gmail.com

I promise I won't trouble you too much :smile:

Please. Thanks once again for guiding and helping me today.