Using Training Data to Predict a Column in Testing Data

I have two CSV files. One is a training set in which the column I want to predict is filled in; the other is the testing data, which lacks that column. How do I use the training data to predict that column for the testing data? Here is an example:

Married: 0 = no, 1 = yes
Training Data:
Age , Sex , Married , Salary
18 , M , 0 , $18000
29 , F , 0 , $76000
67 , F , 1 , $54000
34 , M , 0 , $120000
40 , F , 1 , $200000
25 , M , 1 , $340000
86 , F , 1 , $500000
19 , M , 0 , $120000

Testing Data:
Age , Sex , Married , Salary
31 , M , ? , $88000
54 , M , ? , $76000
27 , F , ? , $62000
22 , M , ? , $20000
48 , M , ? , $200000
65 , F , ? , $300000
71 , F , ? , $5000000
18 , M , ? , $10000

I want to use the training data to predict the probability that each person in the testing data is married. Any ideas? Thank you!

The outcome column doesn't need to be in both data sets. Fit a model on the data set where you have the outcome, then call predict() on the fitted model with the new data to score the new samples.
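
As a rough sketch of that workflow in R, assuming a logistic regression is an acceptable model for a 0/1 outcome: the data frames below are typed in from the example, the name Married_prob is made up for illustration, and the dollar signs are dropped so Salary is numeric. Salary is left out of the formula only because, in these eight rows, Age and Salary together separate the married and unmarried cases perfectly, which makes glm() emit warnings; with real data you would probably include it.

```r
# Training data from the question (Married: 0 = no, 1 = yes).
# With real files, something like read.csv("train.csv") would replace this;
# that file name is hypothetical.
train <- data.frame(
  Age     = c(18, 29, 67, 34, 40, 25, 86, 19),
  Sex     = factor(c("M", "F", "F", "M", "F", "M", "F", "M")),
  Married = c(0, 0, 1, 0, 1, 1, 1, 0),
  Salary  = c(18000, 76000, 54000, 120000, 200000, 340000, 500000, 120000)
)

# Testing data: same predictor columns, no Married column.
test <- data.frame(
  Age    = c(31, 54, 27, 22, 48, 65, 71, 18),
  Sex    = factor(c("M", "M", "F", "M", "M", "F", "F", "M"),
                  levels = levels(train$Sex)),
  Salary = c(88000, 76000, 62000, 20000, 200000, 300000, 5000000, 10000)
)

# Fit a logistic regression on the rows where the outcome is known.
fit <- glm(Married ~ Age + Sex, data = train, family = binomial)

# Score the new samples; type = "response" returns probabilities
# (between 0 and 1) instead of log-odds.
test$Married_prob <- predict(fit, newdata = test, type = "response")
print(test)
```

Multiply Married_prob by 100 if you want a percentage. With only eight training rows the estimates will be very rough, so treat this as an outline of the steps rather than a usable model.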
