variable in a dataframe

I chose a dataset from Kaggle. The data frame has 11 columns. I tried to draw a correlation plot for it, but I received the error message "'x' must be numeric". When I ask for the class of id, the answer is "function".
I don't understand how this variable became a function. What can I do to fix it?

Please show the code that produced the objects id and train. Which kaggle data set are you using?

Data import from Kaggle

code:
library(PerformanceAnalytics)  # provides chart.Correlation()
train <- read.csv(file.choose(), header = TRUE, sep = ",")
chart.Correlation(train, histogram = TRUE, pch = 19)

Customer Segmentation Classification | Kaggle

I downloaded the train.csv file from the link you posted. When I read it in and look at its structure, most of the columns are not numeric. Is this the data set you are working with?

Train <- read.csv("Train.csv")
str(Train)
'data.frame':	8068 obs. of  11 variables:
 $ ID             : int  462809 462643 466315 461735 462669 461319 460156 464347 465015 465176 ...
 $ Gender         : chr  "Male" "Female" "Female" "Male" ...
 $ Ever_Married   : chr  "No" "Yes" "Yes" "Yes" ...
 $ Age            : int  22 38 67 67 40 56 32 33 61 55 ...
 $ Graduated      : chr  "No" "Yes" "Yes" "Yes" ...
 $ Profession     : chr  "Healthcare" "Engineer" "Engineer" "Lawyer" ...
 $ Work_Experience: num  1 NA 1 0 NA 0 1 1 0 1 ...
 $ Spending_Score : chr  "Low" "Average" "Low" "High" ...
 $ Family_Size    : num  4 3 1 2 6 2 3 3 3 4 ...
 $ Var_1          : chr  "Cat_4" "Cat_4" "Cat_6" "Cat_6" ...
 $ Segmentation   : chr  "D" "A" "B" "B" ...

Yes, this is the df. I converted some variables to factors and some to numeric (maybe that's not a good idea?).

The problem is the nature of the data in your imported dataset.
You must evaluate the type of each variable (feature) and also clean out the unnecessary information. These processes are called Feature Engineering and Data Wrangling.
That is why cor(train) tells you that 'x' must be numeric: a correlation needs numeric variables, and most of the columns in this data frame are character.
This dataset (with some arrangements) could be better suited to a logistic regression or even a support vector classification.
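
As a rough illustration of the point above, here is a minimal sketch that keeps only the numeric columns before computing correlations. The small data frame is a stand-in built from the first four values shown in the str(Train) output, not the full file, and the names is_num, num_train and cor_mat are just illustrative. Dropping ID and using pairwise-complete observations are assumptions about what is sensible here, not the only option.

```r
# Stand-in for the Kaggle file: first four values of selected columns from str(Train)
train <- data.frame(
  ID              = c(462809, 462643, 466315, 461735),
  Gender          = c("Male", "Female", "Female", "Male"),
  Age             = c(22, 38, 67, 67),
  Work_Experience = c(1, NA, 1, 0),
  Family_Size     = c(4, 3, 1, 2)
)

# Flag numeric columns; character columns such as Gender are what trigger the error
is_num <- vapply(train, is.numeric, logical(1))

# Keep numeric columns, excluding ID because it is an identifier rather than a measurement
num_train <- train[, is_num & names(train) != "ID", drop = FALSE]

# Use pairwise-complete observations so the NA in Work_Experience does not propagate
cor_mat <- cor(num_train, use = "pairwise.complete.obs")
print(round(cor_mat, 2))

# With the real data, the same numeric subset could be passed to the plot:
# PerformanceAnalytics::chart.Correlation(num_train, histogram = TRUE, pch = 19)
```

The character columns are left out rather than converted; whether to encode them as factors or numbers depends on the model you plan to fit.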


I really appreciate your help :pray: thank you!!
I will do what you suggested above

