I have a very simple question.
After using read.csv to read a CSV file in R,
I tried to use lm to fit a linear regression line for estimation,
but it failed because the columns were not read in as numbers.
Even when I wrap the data in as.numeric and get a slope and intercept, the numbers look wrong.
Here is my code:
library(ggplot2)
data1 <- read.csv("....csv")
lm(as.numeric(Quantity.Sold) ~ as.numeric(Average.Price), data = data1)
ggplot(data1, aes(x = Average.Price, y = Quantity.Sold)) + geom_point()
The values on the x axis are not in ascending order, which I think is due to the "," character.
They appear as 10.1, 10.6, ..., 12.7, 9.7, 9.9.
I have no idea how to fix this problem. Can anyone help me? Thanks in advance.
The CSV file consists only of these data. When I plot the graph, the item with sales = 9715 appears at the top (R treats it as the highest value among sales), but it actually isn't. I am quite sure that the comma leads to the misinterpretation: R seems able to compare the 4-digit values and the 5-digit values, but it always treats every 4-digit value as higher than the 5-digit ones.
You have only posted an image of your data, which is much less helpful than data in a copy-friendly format. I typed in the first two rows of data and wrote a function to remove the $ and commas from the numbers. See if this works for you.
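The function itself is not reproduced here. A minimal sketch of that kind of cleanup, assuming the columns hold text such as "$10.10" and "9,715", could look like the following. The sample rows and the helper name clean_number are made up for illustration and are not the poster's data or code.

```r
# Hypothetical sample standing in for the CSV; these values are invented.
data1 <- data.frame(
  Average.Price = c("$10.10", "$9.70", "$12.70", "$10.60"),
  Quantity.Sold = c("9,715", "10,234", "8,950", "11,020"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip "$" and "," and then convert to numeric.
# as.character() makes it safe to call on factor columns as well.
clean_number <- function(x) {
  as.numeric(gsub("[$,]", "", as.character(x)))
}

data1$Average.Price <- clean_number(data1$Average.Price)
data1$Quantity.Sold <- clean_number(data1$Quantity.Sold)

# Both columns are now numeric, so lm() works without as.numeric().
fit <- lm(Quantity.Sold ~ Average.Price, data = data1)
coef(fit)
```

If read.csv returned these columns as factors (the default before R 4.0), calling as.numeric on them directly gives the integer level codes rather than the printed values, which would explain the odd slope and intercept.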
as.numeric() will only convert the most obvious text strings to numbers (essentially digits 0-9 with at most a single full stop). To parse anything more, such as currencies and strings with comma thousands separators, you will need additional functionality, like parse_number from the readr package.
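As a rough illustration of the difference, assuming the readr package is installed and using made-up strings in place of the real columns:

```r
library(readr)  # assumes readr is installed

# Invented example values with a currency symbol and grouping commas.
x <- c("$10.10", "9,715", "10,234")

# as.numeric() cannot parse any of these and returns NA for each
# (suppressWarnings() hides the coercion warning).
base_result <- suppressWarnings(as.numeric(x))
base_result

# parse_number() drops the "$" and treats "," as a grouping mark
# under the default locale, giving 10.1, 9715 and 10234.
parsed <- parse_number(x)
parsed
```

The same call can be applied to a data frame column, e.g. data1$Quantity.Sold <- parse_number(as.character(data1$Quantity.Sold)), before fitting with lm.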