If else statements on R data frame

Hi,

I am trying to add two new columns to an existing R data.frame and fill them with character values using if else statements:

Here is what I am doing, along with the error messages:

  1. creating the dataframe with the extra columns (that are based on a specific column)

  2. Filling in values to meanSplit
    error message

  3. Filling in values to quantiles


    error message

I would like to keep doing these operations without any packages or other functions.

I can see the error says "the condition has length > 1", but I don't understand what this really means, so it's hard to fix.

Hope you can help, thank you!

Best, Charlie

Hello,

Your error occurs because you wrote if (myData[,1] ...). This compares the whole of column 1 of myData with a value. The result is a vector of TRUE/FALSE entries rather than a single TRUE or FALSE value, which is what if() needs in order to work. Regardless, here is a solution you could use in base R:

# create the data
myData <- iris[,1:5]
myData$meanSplit <- myData[,1] # these are replicates of column 1, not 2 (indexing starts at 1)
myData$Quantiles <- myData[,1]

mean_val <- mean(myData[,1])

# insert values in meanSplit with if - else
for (i in seq_len(nrow(myData))){
  if (myData[[i,1]] > mean_val){
    # mean is less than value
    myData$meanSplit[[i]] <- 'Less'
  } else {
    # mean is greater than (or equal to) value
    myData$meanSplit[[i]] <- 'More'
  }
}

# quantiles
myData$Quantiles <- cut(
  myData[,1],
  breaks = quantile(myData[,1])[c(1,2,4,5)],
  labels = c('Low','middle','High'))

head(myData, n = 10)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species meanSplit
#> 1           5.1         3.5          1.4         0.2  setosa      More
#> 2           4.9         3.0          1.4         0.2  setosa      More
#> 3           4.7         3.2          1.3         0.2  setosa      More
#> 4           4.6         3.1          1.5         0.2  setosa      More
#> 5           5.0         3.6          1.4         0.2  setosa      More
#> 6           5.4         3.9          1.7         0.4  setosa      More
#> 7           4.6         3.4          1.4         0.3  setosa      More
#> 8           5.0         3.4          1.5         0.2  setosa      More
#> 9           4.4         2.9          1.4         0.2  setosa      More
#> 10          4.9         3.1          1.5         0.1  setosa      More
#>    Quantiles
#> 1        Low
#> 2        Low
#> 3        Low
#> 4        Low
#> 5        Low
#> 6     middle
#> 7        Low
#> 8        Low
#> 9        Low
#> 10       Low

Created on 2022-09-01 by the reprex package (v2.0.1)

The second one uses base::cut(), which is more appropriate than chaining a bunch of if and else statements. If necessary, you can rewrite it with if and else statements yourself, following the pattern I used to fill the meanSplit column.
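For completeness, here is a minimal sketch of what such an if / else chain could look like. It assumes the same three groups as the cut() call above (up to the 25% quantile, up to the 75% quantile, and everything beyond that); the helper name q is only for illustration.

```r
myData <- iris[, 1:5]
q <- quantile(myData[, 1])            # 0%, 25%, 50%, 75%, 100%

# start with an empty character column and fill it row by row
myData$Quantiles <- character(nrow(myData))
for (i in seq_len(nrow(myData))) {
  x <- myData[[i, 1]]
  if (x <= q[[2]]) {
    # at or below the 25% quantile
    myData$Quantiles[[i]] <- 'Low'
  } else if (x <= q[[4]]) {
    # above the 25% quantile, at or below the 75% quantile
    myData$Quantiles[[i]] <- 'middle'
  } else {
    # above the 75% quantile
    myData$Quantiles[[i]] <- 'High'
  }
}

table(myData$Quantiles)
```

One difference to be aware of: this chain puts the smallest value into 'Low', whereas cut() with its default settings leaves a value equal to the lowest break as NA.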

You might consider changing the labels for meanSplit, however, since More is kind of confusing if More means "mean is greater than value".

I hope this answers a) the error message and b) how to solve your issue to get working code with the expected results.
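Regarding a), the message can be reproduced in isolation with a small sketch like the one below. The names cond and msg are only for illustration. As far as I know, R 4.2.0 and later raise an error here, while older versions only warn and silently use the first element, so the sketch catches both cases.

```r
myData <- iris[, 1:5]
mean_val <- mean(myData[, 1])

# comparing a whole column gives one TRUE/FALSE per row
cond <- myData[, 1] > mean_val
length(cond)
is.logical(cond)

# if() wants exactly one TRUE or FALSE, so passing the whole vector fails;
# tryCatch() captures the error (or the warning in older R versions)
msg <- tryCatch(
  if (cond) "Less" else "More",
  error   = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
msg
```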

Kind regards

Hi,

Thank you so much for your answer. That was very helpful!

Questions to clarify:

  1. In general, when using an if-else statement on a vector, do you first need to create a for loop that goes through each value in order for it to work?

  2. Within the for loop for the meanSplit column, you use [[i,1]] and [[i]]. I have only used them for extracting information from lists, so I don't understand them here, as it's not a list.

  3. For the Quantiles column using cut(), the breaks give 4 numbers, yet they fit with only 3 labels. How come?

Best, Charlie

Regarding 1:
It depends on what you want to do. If you want to replace the values one at a time, you have to tell R to go through every entry step by step. This can be achieved with a for loop or a while loop, for example. However, since your comparison (myData[,1] > mean_val) results in a logical vector of the correct length, you can use the following vectorised alternative as well:

myData$meanSplit <- ifelse(myData[,1] > mean_val, 'Less', 'More')
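To check that both approaches agree, here is a small sketch that fills one vector with the loop and one with ifelse() and then compares them. The names looped and vectorised are only for illustration.

```r
myData <- iris[, 1:5]
mean_val <- mean(myData[, 1])

# element by element, as in the for loop
looped <- character(nrow(myData))
for (i in seq_len(nrow(myData))) {
  if (myData[[i, 1]] > mean_val) {
    looped[[i]] <- 'Less'
  } else {
    looped[[i]] <- 'More'
  }
}

# all rows at once
vectorised <- ifelse(myData[, 1] > mean_val, 'Less', 'More')

identical(looped, vectorised)
```

The difference is that if() decides once, based on a single TRUE or FALSE, while ifelse() decides separately for every element of the logical vector.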

Regarding 2:
Using [[i]] means "take the ith element of the object", whereas [[i,1]] means "take the element in the ith row of the first column". Double square brackets extract exactly one element. Single square brackets can extract one element as well (e.g. both my_mat[[i]] and my_mat[i] take the ith element of a matrix), but also a complete row (my_mat[i,]) or a complete column (my_mat[,i]). Since you already use [[]] for lists, I think it is more convenient to use [[]] for single elements and [] for a complete row or column.
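Here is a short sketch of these indexing forms on the data from above; the row number 3 is picked arbitrarily. Note that a data frame is itself a list of columns, so [[1]] applied to the data frame directly returns the whole first column.

```r
myData <- iris[, 1:5]

# exactly one value: row 3, column 1
myData[[3, 1]]

# the same value: third element of the column vector
myData$Sepal.Length[[3]]

# a complete row (still a data frame with one row)
myData[3, ]

# a complete column (a plain numeric vector)
head(myData[, 1])

# on the data frame itself, [[1]] is the first column
identical(myData[[1]], myData[, 1])
```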

Regarding 3:
The given breaks define intervals of the form $(a,b]$. So if you give cut() a vector of break values, e.g. c(a,b,c,d) with $a,b,c,d \in \mathbb{R}$, you get the following set of intervals:

$$I = \left\{(a,b], (b,c], (c,d]\right\}$$

This is the default behaviour, which can of course be adjusted. In the end there are three intervals, so for four break points you only need three labels, one per interval.
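One consequence of the half-open intervals is worth checking: a value exactly equal to the lowest break falls outside the first interval and becomes NA. The sketch below counts the NA values with the default settings and with include.lowest = TRUE, which closes the first interval on the left. The names x, brks, default_cut and closed_cut are only for illustration.

```r
x <- iris$Sepal.Length
brks <- quantile(x)[c(1, 2, 4, 5)]    # four break points

# default: (a,b], (b,c], (c,d]
default_cut <- cut(x, breaks = brks, labels = c('Low', 'middle', 'High'))

# include.lowest = TRUE: [a,b], (b,c], (c,d]
closed_cut <- cut(x, breaks = brks, labels = c('Low', 'middle', 'High'),
                  include.lowest = TRUE)

length(brks)              # number of break points
nlevels(default_cut)      # number of intervals, one less than the breaks
sum(is.na(default_cut))   # values equal to min(x) are not covered
sum(is.na(closed_cut))    # none left out once the lowest break is included
```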

Kind regards
