deidentify and duplicate data

Welcome to RStudio community, @cwiggz! We can give you a bit of general guidance here, but I think we'll probably need you to make a reprex, or reproducible example, in order to properly help you.

A reprex should include things like:

  • The code you're using (not just the line you're stuck on or the error you're getting); and
  • A sample of the data you're using or, if you can't provide that, some simulated data with a similar shape (e.g. the same columns).

If you can prep something like this for us, it'll give us a whole lot more context that can help us get to the root of the problem :slight_smile:
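As a rough sketch of what simulated data could look like (the column names and values here are made up, since I don't know what your real data contains):

```r
# Simulated data: hypothetical columns and values, not the real data set
students <- data.frame(
  student_number = c("@1001", "@1002", "@1002", "@1003"),
  score = c(88, 92, 92, 75),
  stringsAsFactors = FALSE
)

# dput() prints code that recreates the data frame,
# so it can be pasted straight into a post
dput(students)
```

Pasting the dput() output along with the code you're running gives us something we can run ourselves.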

That said, it seems like there are a few things going on here that we can help with. I'm not familiar with a deidentify() function in R. Is this supplied by a package you're using? (This is one of the benefits of supplying a reprex: it can help us establish where things come from!)

If there are @ symbols in your student numbers, you can remove them using the str_replace() function in the stringr package (or str_replace_all() if a value can contain more than one @).
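Here's a minimal sketch of stripping the @ symbols with stringr, assuming the student numbers are stored as a character vector (the values below are invented for illustration):

```r
library(stringr)

# Hypothetical student numbers; the real format may differ
student_number <- c("@12345", "67@890", "24680")

# Replace every "@" with an empty string, leaving other characters untouched
cleaned <- str_replace_all(student_number, "@", "")
cleaned
```

If you'd rather not load a package, base R's gsub("@", "", student_number, fixed = TRUE) should do the same job.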

I'm not quite sure I understand your explanation of how you want duplicates to be handled. If you could give us an example of a correctly handled duplicate along with your reprex, we can probably help you work that out :slight_smile:

Thanks!
