na.omit() function is not working


#1

Hi
I have used na.omit() to remove missing values from data read from a csv file, but it is not working.
Can anyone suggest a fix?

Thanks


#2

Hi @rock,

What have you tried already? Can you build an example of what you want to achieve, with some small data and the desired output?

Here is what I can say without more information:

  • You can use readr :package: to import your csv as a tibble in R
  • You can manipulate your tibble using dplyr :package:. mutate(tab, newcol = col2 - col1) will add a newcol column to the tab data.frame

With this you should be able to achieve what you want
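
The two steps above can be sketched roughly as follows. The file contents and the names csv_path, col1 and col2 are made up for illustration (a temporary file stands in for your real csv), so adapt them to your data:

```r
library(readr)   # read_csv()
library(dplyr)   # mutate()

# Hypothetical csv written to a temp file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("col1,col2", "10,15", "20,50"), csv_path)

tab <- read_csv(csv_path)                  # imported as a tibble
tab <- mutate(tab, newcol = col2 - col1)   # assign the result to keep the new column
tab$newcol
```

Note that mutate() returns a new table rather than changing tab in place, so the result has to be assigned back if you want to keep newcol.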


#3

Alternatively, try data.table: use fread() to load the data into R and then use
Df[,third_col:=first_col - sec_col]
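
A minimal sketch of that, assuming the data.table package is installed. The csv contents are made up and written to a temporary file (csv_path is just an illustrative name):

```r
library(data.table)

# Hypothetical csv written to a temp file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("first_col,sec_col", "15,10", "50,20"), csv_path)

Df <- fread(csv_path)                    # read the csv as a data.table
Df[, third_col := first_col - sec_col]   # := adds the column by reference
Df$third_col
```

Unlike mutate(), := modifies Df in place, so no assignment is needed.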


#4

Hi

Thanks for the reply. Unfortunately, mutate() did not work. Please see the following.

setwd("C:/INFT6201")

moviedata = read.csv("moviedata.csv", header=TRUE, sep=",", dec = ".", na.strings ="?")

install.packages("dplyr")

library(dplyr)

mutate(moviedata, moviedata$profit = gross - budget)

Warning message:

In Ops.factor(gross, budget) : ‘-’ not meaningful for factors

Can you suggest how to solve this?


#5

Hi
How do I omit missing values from a csv file?


#6

With dplyr you do not need the $ notation. Use the column name directly:
mutate(moviedata, profit = gross - budget)

Also, according to the warning you get, you should check the column types to ensure that both the gross and budget columns are numeric.

If you can provide a small example of your dataset it would be easier to show you an example.
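
In the meantime, here is a rough sketch with made-up data. The three rows and the name csv_path are invented for illustration. Only the column names gross and budget and the "?" missing-value marker come from your code. It checks the column types, drops rows with missing values using na.omit(), and then computes profit:

```r
library(dplyr)

# Made-up rows; "?" marks a missing value, as in your read.csv() call
csv_path <- tempfile(fileext = ".csv")
writeLines(c("title,gross,budget",
             "A,100,40",
             "B,?,30",
             "C,250,?"), csv_path)

moviedata <- read.csv(csv_path, header = TRUE, na.strings = "?")

# Check the column types before doing arithmetic on them
stopifnot(is.numeric(moviedata$gross), is.numeric(moviedata$budget))

# na.omit() returns a new data frame, so the result must be assigned
moviedata <- na.omit(moviedata)

# Same for mutate(): assign the result to keep the profit column
moviedata <- mutate(moviedata, profit = gross - budget)
moviedata$profit
```

A common reason na.omit() or mutate() seems to "not work" is that the result is printed but never assigned back to the data frame.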


#7

Thank you,
please see the following

x<-as.numeric(moviedata$gross)

y<-as.numeric(moviedata$budget)

mutate(moviedata, profit = x - y)

moviedata$sequelcat <- factor(moviedata$dummy_sequel, levels = c(0, 1),
                          labels = c("ORIGINAL", "SEQUEL"))

ggplot(moviedata, aes(x = sequelcat, y = profit,  fill = year)) + 
geom_violin() +                    # Make it a Violin Plot
theme_bw() +                       # Change Background Color
labs(title = "Weight of Cars by Origin") + # Add a Title     ylim(1500, 5200) +                 # Range for Y-Axis
xlab("Origin") +                   # Label for X-Axis     ylab("Weight (lbs)") +             # Label for Y-Axis
guides(fill=FALSE) +               # Remove the legend
 scale_fill_manual(values=c("#666666", "#999999", "#BBBBBB")) + # Change Fill Color
 geom_boxplot(width=0.2)            # Add a Box Plot on Top

Error: Discrete value supplied to continuous scale
Could you please point me to how to post questions in the forum?


#8

With ggplot2, this error message means that your geom (the function that creates a visualization from your data and statistical mappings) is expecting something that can fit on a continuous scale (think integer or real number), but is given a factor (think categorical variable).

It would be much easier to sort this out if we had a reprex.

Also, it would be quite helpful to get a sense of the data you're dealing with. Could you supply the structure of your data? You can do this with the str() function.
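
For instance, here is a small sketch of what str() reports, using a made-up data frame (df and its values are invented, and gross is deliberately stored as a factor to mimic the situation the earlier warning suggests):

```r
# Made-up data frame; 'gross' is a factor on purpose
df <- data.frame(
  gross  = factor(c("100", "250", "90")),
  budget = c(40, 120, 30)
)

str(df)   # lists each column with its type (Factor, num, ...)

# Caution: as.numeric() on a factor returns the internal level codes,
# not the numbers you see printed
codes  <- as.numeric(df$gross)

# Converting through character recovers the actual values
values <- as.numeric(as.character(df$gross))
values
```

If str() shows Factor for a column you expect to be numeric, converting via as.character() first is usually the safer route.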