How to clean Data in R?

Hi,
I am new to R and have a question.
When cleaning data in a spreadsheet, we check data integrity against known constraints. For example:

  1. If a column contains 10-digit IDs, we check whether every value in that column has exactly 10 digits. How do we do the same in R, and which packages and functions should we use?

It is unclear from your question what you want to do if any element in the relevant column violates the 10-character constraint, but here is a simple example that might help get you started:

library(tidyverse)

digits <- tibble(digits = c(1234567891, 123456789))

digits %>% 
  mutate(length = nchar(digits),       # number of characters in each value
         len_check = length == 10)     # TRUE when the value has exactly 10 characters
#> # A tibble: 2 × 3
#>       digits length len_check
#>        <dbl>  <int> <lgl>    
#> 1 1234567891     10 TRUE     
#> 2  123456789      9 FALSE

Created on 2021-10-05 by the reprex package (v2.0.1)
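
One caveat with the numeric approach: IDs stored as numbers lose leading zeros, and very large values may be converted to scientific notation before nchar() counts them. A minimal base R sketch, assuming the IDs are read in as character strings (the data frame and the column name id are hypothetical), could check the pattern directly:

```r
# Hypothetical example data: IDs kept as character so leading zeros survive
ids <- data.frame(
  id = c("1234567891", "123456789", "0123456789", "12345A7891"),
  stringsAsFactors = FALSE
)

# TRUE only when the value consists of exactly 10 digits and nothing else
ids$valid <- grepl("^[0-9]{10}$", ids$id)

# Rows that violate the constraint
invalid <- ids[!ids$valid, ]
print(invalid)
```

What to do with the rows in invalid (drop, fix, or report) depends on your data, so the sketch only isolates them.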


The validate package may be useful:
The Data Validation Cookbook (r-project.org)
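
As a rough sketch of how the 10-character check might look with validate (the data frame, the column name id, and the rule name ten_chars are assumptions for illustration, and the package must be installed first):

```r
library(validate)

# Hypothetical example data, IDs stored as character
ids <- data.frame(
  id = c("1234567891", "123456789", "0123456789"),
  stringsAsFactors = FALSE
)

# Declare the constraint as a named rule
rules <- validator(ten_chars = nchar(id) == 10)

# Check the data against the rule
result <- confront(ids, rules)

# Per-rule overview of passes and fails
summary(result)

# Per-row results, used here to pull out the violating records
failed <- ids[!values(result)[, "ten_chars"], , drop = FALSE]
print(failed)
```

Keeping the rules in a validator object separates the constraints from the data, so the same rule set can be reused on other data frames.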


Thanks E, this is exactly what I meant.
