Hi rdubai,
This is a case where the dplyr::case_when() function is really helpful.
See the reproducible example below, in which I create a toy dataset and read it in as character vectors.
Then I use mutate() and case_when() to identify cases with blank cells or a single space in cells and convert these to zero (character "0"). Then I can convert these to numeric, and the product_id into a factor.
library(tidyverse)
excel_data <- data.frame( # create data
product_id = c(rep('001',5), rep('002',5),
rep('003',5), rep('004',5)),
weekly_sales_qty = c(12,-5,""," ",12, "",6, 32, -2,3,
4,-1,"", 4, 5,7, 3, -3, "", 14),
stringsAsFactors = FALSE #read in as character vectors
)
head(excel_data) # take a look at data with blanks
#> product_id weekly_sales_qty
#> 1 001 12
#> 2 001 -5
#> 3 001
#> 4 001
#> 5 001 12
#> 6 002
excel_data %>%
mutate(weekly_sales_qty = case_when( # conversions
weekly_sales_qty == "" ~ "0", # truly blank to zero
weekly_sales_qty == " " ~ "0", # one space to zero
TRUE ~ weekly_sales_qty # the rest remain the same
)) %>% # now switch to numeric for qty
mutate(weekly_sales_qty = as.numeric(weekly_sales_qty)) %>% # now switch to factor for product_id
mutate(product_id = as.factor(product_id)) ->
clean_data
clean_data # show data
#> product_id weekly_sales_qty
#> 1 001 12
#> 2 001 -5
#> 3 001 0
#> 4 001 0
#> 5 001 12
#> 6 002 0
#> 7 002 6
#> 8 002 32
#> 9 002 -2
#> 10 002 3
#> 11 003 4
#> 12 003 -1
#> 13 003 0
#> 14 003 4
#> 15 003 5
#> 16 004 7
#> 17 004 3
#> 18 004 -3
#> 19 004 0
#> 20 004 14
Created on 2019-11-05 by the reprex package (v0.3.0)
for guidance on making your own reprex (which helps get questions answered), take a look at