Hi all,
I tried searching for similar cases, but somehow, I am getting the opposite responses.
Here goes my query:
I am trying to import large data set from excel to R, which consists sales data.
column consists weekly sales qty.
Row consists the product ID against which sales happened.
this raw data has many excel cells where sales value is negative (example: -5, because of return of sold product), some cells are blank (because of no sell on particular week.).
My requirement:
While doing read.csv function, I want to directly convert "blank cell" and "negative values" to numerical "0".
Please advice.
This is a case where the dplyr::case_when() function is really helpful.
See the reproducible example below, in which I create a toy dataset and read it in as character vectors.
Then I use mutate() and case_when() to identify cases with blank cells or a single space in cells and convert these to zero (character "0"). Then I can convert these to numeric, and the product_id into a factor.
library(tidyverse)
excel_data <- data.frame( # create data
product_id = c(rep('001',5), rep('002',5),
rep('003',5), rep('004',5)),
weekly_sales_qty = c(12,-5,""," ",12, "",6, 32, -2,3,
4,-1,"", 4, 5,7, 3, -3, "", 14),
stringsAsFactors = FALSE #read in as character vectors
)
head(excel_data) # take a look at data with blanks
#> product_id weekly_sales_qty
#> 1 001 12
#> 2 001 -5
#> 3 001
#> 4 001
#> 5 001 12
#> 6 002
excel_data %>%
mutate(weekly_sales_qty = case_when( # conversions
weekly_sales_qty == "" ~ "0", # truly blank to zero
weekly_sales_qty == " " ~ "0", # one space to zero
TRUE ~ weekly_sales_qty # the rest remain the same
)) %>% # now switch to numeric for qty
mutate(weekly_sales_qty = as.numeric(weekly_sales_qty)) %>% # now switch to factor for product_id
mutate(product_id = as.factor(product_id)) ->
clean_data
clean_data # show data
#> product_id weekly_sales_qty
#> 1 001 12
#> 2 001 -5
#> 3 001 0
#> 4 001 0
#> 5 001 12
#> 6 002 0
#> 7 002 6
#> 8 002 32
#> 9 002 -2
#> 10 002 3
#> 11 003 4
#> 12 003 -1
#> 13 003 0
#> 14 003 4
#> 15 003 5
#> 16 004 7
#> 17 004 3
#> 18 004 -3
#> 19 004 0
#> 20 004 14
Hello, great to see your response. I am just putting it in work. But I see this part will only identify blanks, spaces to zero. Another portion of my query is to convert any negative sales qty to 0. example: -15 to 0. How do I do it ? thanks a lot for response!
It would be easier to help you with your specific problem if you provide a proper REPRoducible EXample (reprex) or at least sample data on a copy/paste friendly format (like explained in the link).
Please read the guide more carefully, we don't need your whole dataset, we just need a small sample and just the part of your code that is relevant for the issue.