I have a dataset with `28`

variables (`6`

continuos + `22`

categorical).

As you can see below, some of the categorical / factor variables have several levels.

```
> str(myds)
'data.frame': 841500 obs. of 28 variables:
$ score : num 0 0 0 0 0 0 0 0 0 0 ...
$ amount_sms_received : int 0 0 0 0 0 0 3 0 0 3 ...
$ amount_emails_received : int 3 36 3 12 0 63 9 6 6 3 ...
$ distance_from_server : int 17 17 7 7 7 14 10 7 34 10 ...
$ age : int 17 44 16 16 30 29 26 18 19 43 ...
$ points_earned : int 929 655 286 357 571 833 476 414 726 857 ...
$ registrationDateMM : Factor w/ 9 levels ...
$ registrationDateDD : Factor w/ 31 levels ...
$ registrationDateHH : Factor w/ 24 levels ...
$ registrationDateWeekDay : Factor w/ 7 levels ...
$ catVar_06 : Factor w/ 140 levels ...
$ catVar_07 : Factor w/ 21 levels ...
$ catVar_08 : Factor w/ 1582 levels ...
$ catVar_09 : Factor w/ 70 levels ...
$ catVar_10 : Factor w/ 755 levels ...
$ catVar_11 : Factor w/ 23 levels ...
$ catVar_12 : Factor w/ 129 levels ...
$ catVar_13 : Factor w/ 15 levels ...
$ city : Factor w/ 22750 levels ...
$ state : Factor w/ 55 levels ...
$ zip : Factor w/ 26659 levels ...
$ catVar_17 : Factor w/ 2 levels ...
$ catVar_18 : Factor w/ 2 levels ...
$ catVar_19 : Factor w/ 3 levels ...
$ catVar_20 : Factor w/ 6 levels ...
$ catVar_21 : Factor w/ 2 levels ...
$ catVar_22 : Factor w/ 4 levels ...
$ catVar_23 : Factor w/ 5 levels ...
```

**My question is:** Given the current situation, how can I do the data exploration with them in order to decide whether include them on the model or not (I will use Neural Networks).

When the number of level for categorical variables is low I can use tools like:

- DataExplorer::plot_correlation()
- Gower / cluster::daisy()

But my problem here is that the amount of data is huge.

Thanks!