@FJCC, thanks for your response, very helpful.
About Q1:
Question 1: at what point do I have to use as.factor()?
About Q3: I was also thinking about future new values (production).
Version 1:
[1] { dataset_training, dataset_testing } = split_dataset(dataset)
# training time
[2] { dataset_training_normalized, norm_config } = normalize(dataset_training)
[3] trained_model = train(dataset_training_normalized)
# testing time
[4] dataset_testing_normalized = normalize(dataset_testing, norm_config)
[5] test_result = test(trained_model, dataset_testing_normalized)
# production time
[6] input_normalized = normalize(input, norm_config)
[7] prediction = predict(trained_model, input_normalized)
I know the normalization in [4] could instead be done up front, before the split, as follows:
Version 2:
[1] { dataset_normalized, norm_config } = normalize(dataset)
[2] { dataset_training_normalized, dataset_testing_normalized } = split_dataset(dataset_normalized)
# training time
[3] trained_model = train(dataset_training_normalized)
# testing time
[4] test_result = test(trained_model, dataset_testing_normalized)
# production time
[5] input_normalized = normalize(input, norm_config)
[6] prediction = predict(trained_model, input_normalized)
But with Version 1, testing time looks more like production time, because it also includes the normalization step with the stored norm_config. That way I can test both things at once: the normalization and the model.
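To make Version 1 concrete, here is a minimal R sketch of what I have in mind. It is only an illustration under my own assumptions: `normalize()` is a hypothetical helper I wrote for this post (not a function from any package), I am guessing that `norm_config` would hold the per-column mean and SD of the training set, `mtcars` and `lm()` are just stand-ins for my real data and model, and the production `input` values are made up.

```r
# Hypothetical helper: learn the normalization from a data frame, or reuse
# a previously learned norm_config (assumed here: per-column mean and SD).
normalize <- function(df, norm_config = NULL) {
  if (is.null(norm_config)) {
    norm_config <- list(mean = colMeans(df), sd = sapply(df, sd))
  }
  scaled <- scale(df, center = norm_config$mean, scale = norm_config$sd)
  list(data = as.data.frame(scaled), norm_config = norm_config)
}

# [1] split the dataset (mtcars is only a stand-in for my data)
set.seed(42)
dataset <- mtcars[, c("mpg", "wt", "hp")]
predictors <- c("wt", "hp")
train_idx <- sample(nrow(dataset), size = 24)
dataset_training <- dataset[train_idx, ]
dataset_testing <- dataset[-train_idx, ]

# training time: [2] learn norm_config on the training set only, [3] train
fit <- normalize(dataset_training[predictors])
norm_config <- fit$norm_config
dataset_training_normalized <- cbind(mpg = dataset_training$mpg, fit$data)
trained_model <- lm(mpg ~ wt + hp, data = dataset_training_normalized)

# testing time: [4] reuse norm_config, [5] evaluate on the held-out rows
dataset_testing_normalized <- normalize(dataset_testing[predictors], norm_config)$data
test_pred <- predict(trained_model, newdata = dataset_testing_normalized)
test_rmse <- sqrt(mean((dataset_testing$mpg - test_pred)^2))

# production time: [6] same norm_config, [7] predict (input values are made up)
input <- data.frame(wt = 3.0, hp = 120)
input_normalized <- normalize(input, norm_config)$data
prediction <- predict(trained_model, newdata = input_normalized)
```

In this sketch the test rows and the production input go through exactly the same `normalize(..., norm_config)` call, which is the property I like about Version 1.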
Question 2: does this make sense? If not, or only partly, please let me know what you think.
Question 3: how do we get the values for { trainMean, trainSD }? Which function do I need to call, and how?
Question 4: by the way, could you suggest the model(s) that best fit the problem I described above? If you want, you can enlarge the embedded image.
Thanks!