For one of my courses on big data, we were asked to merge variables from several different datasets into a single dataset. We chose the topic of response time and used SPSS to merge the datasets into one. The literature points to a series of variables being responsible for response time, which is why we chose path analysis instead of ordinary regression. SPSS does not fully support path analysis (it lacks the chi-square test and model fit measures), so we moved to R.
This is how I read the SPSS file into R, created the covariance matrix, and determined the number of rows:
library(foreign)  # provides read.spss()
ResponseTimeData <- read.spss("file", to.data.frame = TRUE)
### Variables to exclude because they are not needed
myvars <- names(ResponseTimeData) %in% c("Fire_ID", "Hour_212223", "Borough_5", "Loc_Zipcode", "Structural_Fires")
newdata <- ResponseTimeData[!myvars]
### There are some missing data points in a single variable (is this the correct way to handle that?)
ResponseFire <- na.omit(newdata)
### Items for SEM
Num <- nrow(ResponseFire)  # number of rows (observations)
S <- var(ResponseFire)     # covariance matrix; this is the line that triggers the warning
Computing the covariance matrix returns the warning "In var(ResponseFire) : NAs introduced by coercion", and I do not know what to do about it.
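One way to check where a coercion warning like this comes from is to look at the class of every column. read.spss() converts variables that carry SPSS value labels into factors by default (use.value.labels = TRUE), so non-numeric columns are a plausible cause, though that is only a guess for this particular file. The sketch below uses a made-up toy data frame; its column names and values are hypothetical and not taken from the real data.

```r
# Hypothetical toy data standing in for ResponseFire; names and values are invented.
toy <- data.frame(
  Response_Time = c(4.2, 5.1, 3.8, 6.0),
  Distance_km   = c(1.5, 2.3, 1.1, 2.9),
  Borough       = factor(c("Bronx", "Queens", "Bronx", "Queens"))
)

# Class of every column; factor or character columns cannot enter var() as numbers.
col_classes <- sapply(toy, class)
print(col_classes)

# Names of the columns that are not numeric.
non_numeric <- names(toy)[!sapply(toy, is.numeric)]
print(non_numeric)

# Covariance matrix computed over the numeric columns only.
S_toy <- var(toy[sapply(toy, is.numeric)])
print(S_toy)
```

If non_numeric comes back non-empty on the real data, those columns would need to be recoded to numbers or left out of the covariance matrix before the SEM step.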
With missing values a path analysis can of course not be conducted (or can it?).
I hope someone can help me track down the missing values (preferably without dropping the variable that contains the NAs) or otherwise solve this.
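To make the "track the missing values" part concrete, here is a minimal sketch of counting NAs per column and locating the affected rows, again on an invented toy data frame (the names and values are hypothetical). It also shows the use argument of var(), which computes each covariance from the rows that are complete for that pair of variables, so no variable has to be dropped. Whether pairwise deletion is appropriate for a path analysis is a separate question that this sketch does not settle.

```r
# Hypothetical toy data with a few missing values; names and values are invented.
toy <- data.frame(
  Response_Time = c(4.2, NA, 3.8, 6.0, 5.0),
  Distance_km   = c(1.5, 2.3, NA, 2.9, 2.0)
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Row numbers that contain at least one missing value.
rows_with_na <- which(!complete.cases(toy))
print(rows_with_na)

# Covariance matrix using, for each pair of variables, only the rows
# where both values are present (pairwise deletion).
S_pairwise <- var(toy, use = "pairwise.complete.obs")
print(S_pairwise)
```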