I am trying to remove rows that have a certain value in a specific column but am getting an error.

I am trying to perform a simple task: Remove rows from my data frame that have a value of 0 in a column titled "complete". A working example is below, but the dataset I am working with is VERY large.

# Creating a sample data frame
data <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("John", "Jane", "David", "Michael", "Emily"),
  age = c(25, 30, 35, 40, 45)
)

# Removing rows where age is 30
data_filtered <- data[data$age != 30, ]

# Printing the resulting data frame
print(data_filtered)

It works when I run the above code, but when I try to do the same thing with my own data, I am getting the error message below. My data is called "prepost.reccee.data" and I want to keep rows where the column "complete" != 0.

Error in vec_equal():
! Can't combine ..1 <haven_labelled> and ..2 .
Backtrace:
 1. prepost.reccee.data[prepost.reccee.data$complete != 0, ]
 3. vctrs:::`!=.vctrs_vctr`(prepost.reccee.data$complete, 0)
 4. vctrs::vec_equal(e1, e2)

Thank you for any help!

The <haven_labelled> class in the error message is the immediate cause. Importing prepost.reccee.data with haven produced an intermediate object rather than a regular data frame, and it needs to be converted before you can do much with it using normal R functions.

The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate data structure that you can convert into a regular R data frame. You can do this by either converting to a factor or stripping the labels. See the haven vignette for details.
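As a rough sketch of those two conversions, the example below builds a small stand-in for the imported data and filters it both ways. The data frame `raw`, its values, and the labels "Incomplete"/"Complete" are made up for illustration; only `labelled()`, `zap_labels()` and `as_factor()` come from haven, and your own labels will differ.

```r
library(haven)

# Hypothetical stand-in for the SPSS import (values and labels are invented)
raw <- data.frame(id = 1:5)
raw$complete <- labelled(c(0, 1, 1, 0, 1),
                         labels = c(Incomplete = 0, Complete = 1))

# Option 1: strip the labels, leaving a plain numeric column
stripped <- raw
stripped$complete <- zap_labels(stripped$complete)
kept_stripped <- stripped[stripped$complete != 0, ]

# Option 2: convert to a factor, then compare against the label text
as_fct <- raw
as_fct$complete <- as_factor(as_fct$complete)
kept_factor <- as_fct[as_fct$complete != "Incomplete", ]
```

With labels stripped you keep comparing against the number 0; with a factor you compare against the label instead, so check `levels()` on your own column first.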

Once that step is complete, the error thrown will no longer appear. That is no guarantee that the rest of the code will work, but is a necessary start. Come back with a fresh post if you are still having difficulties after resolving the conversion.

Unless you want to use the following snippet.

# I never name objects after built-ins
# assumes that rownames have been converted to a variable
d <- data
# keep rows that are neither named "David" nor aged 30
filtrate <- d[which(d$name != "David" & d$age != 30), ]

Read: return the row indexes of d for which both conditions hold (so rows named "David" or aged 30 are dropped), keep all columns, and use the resulting index vector to subset d. Note that which() must not be negated with !, because that turns the indexes into all FALSE and returns no rows. Presumably the filtering feeds a further operation that fills in the column complete. I would initially set the value of all rows in column complete to FALSE, then use the surviving row indexes to change the value of those rows to TRUE.
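A minimal sketch of that last suggestion, assuming the intent is to keep rows whose name is not "David" and whose age is not 30. The sample data is the toy data frame from the question, and `keep` is just an illustrative name:

```r
# Toy data from the question, stored under a name that is not a built-in
d <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("John", "Jane", "David", "Michael", "Emily"),
  age = c(25, 30, 35, 40, 45)
)

# Row indexes that survive the filter
keep <- which(d$name != "David" & d$age != 30)

# Start with every row marked FALSE, then flag the survivors
d$complete <- FALSE
d$complete[keep] <- TRUE

# Subsetting on the flag gives the filtered data frame
filtrate <- d[d$complete, ]
```

Using which() here means any NA in name or age is treated as "do not keep" rather than producing NA rows in the result.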


Thank you for your help! I believe the issue was that the data was originally in SPSS format. I had to install the haven package and then convert the variable using as.factor(). Then it worked for me!
