Discretising a dataset - Solution

Also, I noticed you edited the reply I made. Were those corrections needed to make the code work?

Earlier you said cut() wouldn't work for you. Can you show us what you tried with it?

Since the example a couple of messages ago produces the output you want, it seems that cut(), which is in base R, is what you need.
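For reference, here is a minimal sketch of what cut() does with a plain numeric vector. The `income` values are toy data assumed for illustration; passing a single number as `breaks` asks for that many equal-width intervals.

```r
# Toy data, assumed for illustration only
income <- c(12, 13, 14, 16, 18, 22, 24, 33, 46, 53)

# A single number for `breaks` splits the range into that many
# equal-width intervals and returns a factor
income_binned <- cut(income, breaks = 4)

levels(income_binned)  # the four interval labels
table(income_binned)   # how many values fall in each interval
```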

Yeah, it appeared not to work for me, since R was just returning my dataset unchanged. The code I used was:

disc<-function(dataset,bins) {
    for(i in 1:ncol(dataset))
    {
        if(is.numeric(dataset[,i])=="TRUE")
        {
            maxval<-max(dataset[,i])
            minval<-min(dataset[,i])
            width<-(maxval-minval)/bins
            dataset[,i]<-cut(dataset[,i], breaks=seq(minval,maxval,bins))
        }
        return(dataset)
    }
}

I hope this helps
Ronnie

For future reference, your question would have been much easier and quicker for us to answer if you had done the following:

  1. Posted the code you were having a problem with in a reprex. When you said that cut() just didn't work for you, we assumed it didn't produce the results you wanted, and we had no chance to see that you were using it incorrectly.

  2. Posted data or toy data we could easily use to reproduce what you are trying to do.

  3. Clearly defined the input and output you are looking for.
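One common way to share toy data is dput(), which prints R code that rebuilds an object. A small sketch, with a made-up `loan` data frame standing in for the real one:

```r
# Made-up stand-in for the real dataset
loan <- data.frame(
  Income = c(12, 13, 14, 33),
  Loan   = c(TRUE, TRUE, FALSE, TRUE)
)

# Prints an R expression that recreates `loan`; paste that text into the post
dput(loan)

# A reader can rebuild the object by evaluating the pasted text
rebuilt <- eval(parse(text = capture.output(dput(loan))))
```

Anyone reading the post can then run the pasted expression and work with the same object.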

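If you are going to post code, always wrap it in a fenced block: put three backticks followed by {r} on the line before the code and three backticks on the line after it.

Otherwise, most of the time we won't be able to just copy and paste it to try it out.

Most importantly, learn to use reprex.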

There are two errors in your code, marked in the comments:

disc<-function(dataset,bins) {
	for(i in 1:ncol(dataset))
	{
		# with a tibble, dataset[,i] returns a one-column tibble
		# (a data.frame), not a vector; a data.frame is never numeric
		if(is.numeric(dataset[,i])=="TRUE")
		{
			print("numeric")
			maxval<-max(dataset[,i])
			minval<-min(dataset[,i])
			width<-(maxval-minval)/bins
			# same thing here: dataset[,i] is a data.frame, not
			# a numeric vector. cut() requires a numeric vector
			dataset[,i]<-cut(dataset[,i], breaks=bins)
		}
		return(dataset)
	}
}
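The difference can be seen in a small sketch. Tibbles keep the data frame shape when subset with [, i]; this is emulated here with `drop = FALSE` on a base data.frame (an assumption made so the example runs without extra packages), and the data are made up.

```r
# Made-up data for illustration
df <- data.frame(Income = c(12, 13, 14), Loan = c(TRUE, TRUE, FALSE))

# One-column data.frame, which is what a tibble returns for tbl[, 1]
one_col <- df[, 1, drop = FALSE]
is.data.frame(one_col)  # TRUE
is.numeric(one_col)     # FALSE: a data.frame is a list, not a numeric vector

# Two ways to get a plain numeric vector instead
flattened <- unlist(one_col)
extracted <- df[[1]]
is.numeric(flattened)   # TRUE
is.numeric(extracted)   # TRUE
```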

Check out unlist(), which flattens the one-column data.frame into a plain vector:

tbl <- tibble::tribble(
    ~Income,    ~Loan,
    12, T,
    13, T,
    14, T,
    12, F,
    14, T,
    16, T,
    18, F,
    33, T,
    22, F,
    24, F,
    46, F,
    53, F,
    24, F,
    19, F,
    25, F,
    32, T,
    33, T,
    37, F,
    21, F,
    25, T
)

disc<-function(dataset,bins) {
    for(i in 1:ncol(dataset))
    {
        # unlist to make numeric vector out 
        # of single column data.frame
        if(is.numeric(unlist(dataset[,i]))=="TRUE")
        {
            print("numeric")
            maxval<-max(dataset[,i])
            minval<-min(dataset[,i])
            width<-(maxval-minval)/bins
            # unlist to make numeric vector out 
            # of single column data.frame
            dataset[,i]<-cut(unlist(dataset[,i]), breaks=bins)
        }
    }
    # return after the loop so that every column is processed
    return(dataset)
}

disc(tbl, 4)
#> [1] "numeric"
#> # A tibble: 20 x 2
#>    Income      Loan 
#>    <fct>       <lgl>
#>  1 (12,22.2]   T    
#>  2 (12,22.2]   T    
#>  3 (12,22.2]   T    
#>  4 (12,22.2]   F    
#>  5 (12,22.2]   T    
#>  6 (12,22.2]   T    
#>  7 (12,22.2]   F    
#>  8 (32.5,42.8] T    
#>  9 (12,22.2]   F    
#> 10 (22.2,32.5] F    
#> 11 (42.8,53]   F    
#> 12 (42.8,53]   F    
#> 13 (22.2,32.5] F    
#> 14 (12,22.2]   F    
#> 15 (22.2,32.5] F    
#> 16 (22.2,32.5] T    
#> 17 (32.5,42.8] T    
#> 18 (32.5,42.8] F    
#> 19 (12,22.2]   F    
#> 20 (22.2,32.5] T
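As a variation, here is a sketch that pulls each column out with [[i]] (a plain vector for both data.frames and tibbles), builds the equal-width breaks explicitly, and returns only after the loop so every column is visited. `disc_all` is a hypothetical name, the data are made up, and the sketch assumes numeric columns have no missing values and at least two distinct values.

```r
# Hypothetical variant of disc(), for illustration
disc_all <- function(dataset, bins) {
  for (i in seq_along(dataset)) {
    column <- dataset[[i]]  # [[i]] always gives a vector
    if (is.numeric(column)) {
      # bins + 1 break points give `bins` equal-width intervals
      breaks <- seq(min(column), max(column), length.out = bins + 1)
      # include.lowest = TRUE keeps the minimum value in the first interval
      dataset[[i]] <- cut(column, breaks = breaks, include.lowest = TRUE)
    }
  }
  dataset  # returned once, after all columns have been checked
}

# Made-up data
loan <- data.frame(
  Income = c(12, 13, 14, 33, 46, 53),
  Loan   = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)
loan_binned <- disc_all(loan, 4)
```

In the original attempt, seq(minval, maxval, bins) treats `bins` as the step size rather than the number of intervals; `length.out = bins + 1` is one way to get the intended count.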

I'm still having the same problem with the first code you gave: it just returns the dataframe I pass in without discretising it.
With the second code you've provided, I have to type disc(tbl, 2) to get the solution, whereas what I want to be able to type is disc(loan, 2) and get the output.
Is it possible to modify either of them so that typing disc(loan, 2) returns the discretised dataset?

I should have made this clearer much earlier to save confusion and time. The output you got from discretising the dataset is exactly what I want, but I want the function to produce it when I type disc(loan, 4) instead of disc(tbl, 4).

The first example was your code, with comments marking where the errors were; that's why you got the same results. Did you look at the first example to see what the problems with it were?

I don't see what problem you are having. You can pass anything you want as the first argument in the call to disc().
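To illustrate: the name `dataset` inside the function is only a placeholder, so the same call works whatever the data frame is called. Below is a sketch with a hypothetical helper `bin_income()` and made-up data. Note also that R functions return a modified copy, so the result has to be assigned if you want to keep it.

```r
# Hypothetical helper, for illustration: bins the Income column only
bin_income <- function(dataset, bins) {
  dataset$Income <- cut(dataset$Income, breaks = bins)
  dataset
}

# Made-up data under two different variable names
tbl  <- data.frame(Income = c(12, 24, 33, 53))
loan <- tbl

# The argument name inside the function does not matter to the caller
same_result <- identical(bin_income(tbl, 2), bin_income(loan, 2))

# The function returns a copy; `loan` itself is unchanged until reassigned
loan_binned <- bin_income(loan, 2)
is.numeric(loan$Income)        # still numeric
is.factor(loan_binned$Income)  # now a factor
```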

You have to show the code you are having a problem with.

Sorry about that, I've just noticed what my mistakes were and have corrected them.
Many thanks for the help you've given; it's greatly appreciated.
Ronnie

You're welcome. Please mark it as the solution so that others who look at this topic can find it.