read.csv didn't return data in the correct form for plotting a scatter graph

I have a very simple question.
After I use read.csv to read a CSV file in R,
I tried to use lm to get a linear regression line for estimation,
but it failed because the function didn't get the values in the correct form.
Even when I used as.numeric on the data and got a slope and intercept, the numbers seem weird.

Here is my code:

The values on the x axis are not in ascending order because of the "," character;
they run 10.1,10.6,...,12.7,9.7,9.9

I have no idea how to fix this problem. Can anyone help me? Thanks in advance.

Can you share a sample of your csv file?

You have only posted an image of your data which is much less helpful than data in a copy-friendly format. I typed in the first two rows of data and wrote a function to remove the $ and commas from the numbers. See if this works for you.

library(dplyr)  # provides mutate_if()

DF <- read.csv("c:/users/fjcc/Documents/R/Play/Dummy.csv", stringsAsFactors = FALSE)
DF
#>   Year   Sales Population Advertising PreviosAdvertising
#> 1    1 $15,713    102,558     $20,000            $30,000
#> 2    2 $12,937    101,792     $15,000            $20,000
# Strip "$" and "," from a column; the result is still text, so convert it
# with as.numeric() before modelling
CleanFunc <- function(COL) gsub("\\$|,", "", COL)
DF <- mutate_if(DF, is.character, CleanFunc)
DF
#>   Year Sales Population Advertising PreviosAdvertising
#> 1    1 15713     102558       20000              30000
#> 2    2 12937     101792       15000              20000

Created on 2020-03-21 by the reprex package (v0.3.0)
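
As a possible follow-up to the cleaning step above: gsub() returns text, so the cleaned columns still need converting before lm() can use them. The sketch below is base R only and rebuilds the two rows from the reprex by hand rather than reading the original file. The choice of Sales ~ Advertising is only an assumption for illustration, since the original post does not say which columns were modelled.

```r
# Two rows copied from the reprex output above, already stripped of "$" and ","
# but still stored as text
DF <- data.frame(
  Year = 1:2,
  Sales = c("15713", "12937"),
  Population = c("102558", "101792"),
  Advertising = c("20000", "15000"),
  PreviosAdvertising = c("30000", "20000"),
  stringsAsFactors = FALSE
)

# Convert every character column to numeric
is_text <- vapply(DF, is.character, logical(1))
DF[is_text] <- lapply(DF[is_text], as.numeric)
str(DF)

# Hypothetical model: the variables used here are an assumption
fit <- lm(Sales ~ Advertising, data = DF)
coef(fit)
```

With only two rows the line passes through both points exactly, so this only checks that the columns are now numeric; the real data would give a proper fit.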

as.numeric() only converts the most obvious text strings to numbers (i.e. digits 0-9 with at most a single full stop). For parsing anything more, like currencies or strings with thousands separators (ticks or commas), you will need to rely on additional functionality, like parse_number() from the readr package.
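
To illustrate the difference, here is a minimal sketch using the two Sales values from the reprex earlier in the thread. The readr call is guarded so the snippet still runs if readr is not installed; the variable names are made up for this example.

```r
raw <- c("$15,713", "$12,937")

# as.numeric() cannot parse the "$" or the "," and gives NA (with a warning)
plain <- suppressWarnings(as.numeric(raw))
print(plain)

# Base R alternative: strip "$" and "," first, then convert
cleaned <- as.numeric(gsub("[$,]", "", raw))
print(cleaned)

# readr::parse_number() drops the non-numeric characters around the number
# and treats "," as a grouping mark under the default locale
if (requireNamespace("readr", quietly = TRUE)) {
  parsed <- readr::parse_number(raw)
  print(parsed)
}
```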


The CSV file consists only of these data. When I plot the graph, the item with sales=9715 appears at the top (R treated it as the highest value among sales), but it actually isn't. I am quite sure that the comma leads to the misinterpretation (after read.csv the 4-digit and 5-digit values can be compared, but every 4-digit value is always treated as higher than the 5-digit ones).
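
A likely explanation, sketched below with three values mentioned in this thread: because of the commas, read.csv stores the column as text (or as a factor in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE). Text is ordered character by character, so anything starting with "9" sorts after anything starting with "1". Calling as.numeric() on a factor returns its internal level codes, which may be why the slope and intercept looked odd. Whether the original column was a factor is an assumption, since the code was not posted.

```r
# Illustrative values taken from the thread, written with thousands separators
sales_text <- c("15,713", "12,937", "9,715")

# Text ordering compares character by character, so "9,715" ends up last
sorted_text <- sort(sales_text)
print(sorted_text)

# On a factor, as.numeric() gives level codes, not the sales figures
sales_factor <- factor(sales_text)
codes <- as.numeric(sales_factor)
print(codes)

# Removing the commas before converting gives the real numbers
sales_num <- as.numeric(gsub(",", "", sales_text))
print(sort(sales_num))
```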