read.csv didn't return data in the correct form for plotting a scatter graph

earnest1209 · March 21, 2020, 1:15pm

I have a very simple question.
After using the function read.csv to read a CSV file in R,
I tried to use lm to get a linear regression line for estimation.
It failed because the values were not read in the correct form.
Even when I used as.numeric on the data and got a slope and intercept, the numbers looked wrong.

Here is my code:
data1=read.csv("....csv")
lm(as.numeric(Quantity.Sold)~as.numeric(Average.Price),data=data1)
ggplot(data1,aes(x=Average.Price,y=Quantity.Sold))+geom_point()

The values on the x axis are not in ascending order because of the "," character.
They appear as 10.1,10.6,...,12.7,9.7,9.9

I have no idea how to fix this problem. Can anyone help me? Thanks in advance.

andresrcs · March 21, 2020, 1:41pm

Can you share a sample of your csv file?

earnest1209 · March 21, 2020, 4:51pm

The CSV file consists of only these data. When I plot the graph, the item with sales=9715 appears at the top (R treats it as the highest value among sales), but it actually isn't. I am quite sure that the comma leads to the misinterpretation: R can compare the 4-digit and 5-digit values, but it always ranks every 4-digit value above the 5-digit ones.

FJCC · March 21, 2020, 5:59pm

You have only posted an image of your data, which is much less helpful than data in a copy-friendly format. I typed in the first two rows of data and wrote a function to remove the $ and commas from the numbers. See if this works for you.

DF <- read.csv("c:/users/fjcc/Documents/R/Play/Dummy.csv", stringsAsFactors = FALSE)
DF
#>   Year   Sales Population Advertising PreviosAdvertising
#> 1    1 $15,713    102,558     $20,000            $30,000
#> 2    2 $12,937    101,792     $15,000            $20,000
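library(dplyr)
# Remove "$" and "," from a column; the result is still character, not numeric
CleanFunc <- function(COL) gsub("\\$|,", "", COL)
# Apply the cleaner to every character column
DF <- mutate_if(DF, is.character, CleanFunc)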
DF
#>   Year Sales Population Advertising PreviosAdvertising
#> 1    1 15713     102558       20000              30000
#> 2    2 12937     101792       15000              20000

Created on 2020-03-21 by the reprex package (v0.3.0)
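
Note that gsub() returns character vectors, so the cleaned columns are still text and need one more conversion before lm() or ggplot() can treat them as numbers. Here is a minimal sketch of that extra step in base R. It assumes the same two rows typed in above, rebuilt inline (with one of the advertising columns left out for brevity) so the snippet does not depend on a file; `to_number` is a hypothetical helper name, not something from the original code.

```r
# Two rows copied from the output above, built inline instead of read from a file
DF <- data.frame(
  Year = c(1, 2),
  Sales = c("$15,713", "$12,937"),
  Population = c("102,558", "101,792"),
  Advertising = c("$20,000", "$15,000"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip "$" and "," and then convert the text to numeric
to_number <- function(col) as.numeric(gsub("\\$|,", "", col))

# Apply the helper to the character columns only
is_text <- vapply(DF, is.character, logical(1))
DF[is_text] <- lapply(DF[is_text], to_number)

# Every column should now be reported as num
str(DF)
```

Once the columns are numeric, the axis in a scatter plot is ordered by value rather than alphabetically.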

nirgrahamuk · March 21, 2020, 7:57pm

as.numeric() will only convert the most obvious text strings to numbers (i.e. digits 0-9 with at most a single full stop). For parsing anything more, like currencies and strings with comma separators between digit groups, you will need to rely on additional functionality, such as parse_number() from the readr package.
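
To make the difference concrete, here is a small sketch comparing the three behaviours. The sample strings are an assumption: they follow the "$15,713" style shown earlier in the thread, and "$9,715" is a guess at how the 9715 value is formatted in the file. It also assumes the readr package is installed.

```r
library(readr)

# Assumed sample strings in the same style as the data in this thread
sales_text <- c("$15,713", "$12,937", "$9,715")

# as.numeric() cannot parse "$" or ",", so every element becomes NA (with a warning)
as_num <- suppressWarnings(as.numeric(sales_text))

# If the column was read as a factor (the read.csv default before R 4.0.0),
# as.numeric() returns the integer level codes, not the values.
# Levels are sorted as text, so "$9,715" sorts last and gets the highest code.
codes <- as.numeric(factor(sales_text))

# parse_number() drops the currency symbol and the grouping commas
parsed <- parse_number(sales_text)

print(as_num)  # NA NA NA
print(codes)   # 2 1 3
print(parsed)  # 15713 12937 9715
```

The factor codes are probably why the slope and intercept looked weird in the original post, and the text ordering of the levels is why the 9715 item was drawn above the 5-digit values.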
