Aim: To predict companies' credit ratings

Training Data: Internal data with financial figures and financial ratios spanning 3 years

Training Data Target Variable: Credit Rating with 20 discrete values

Training Data Remarks: Contains missing data

Scoring Data: External data from various data sources with financial figures and financial ratios spanning 3 years

Scoring Data Remarks: Contains more missing data than the training data; how much more depends on the data source
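
For what it's worth, this is roughly how the share of missing values per column could be compared between the training and scoring data. It is only a sketch: `missing_rate`, the column names and the toy rows are all made up for illustration, and it assumes each record is a dict with `None` (or an absent key) for a missing value.

```python
def missing_rate(rows, columns):
    """Return {column: fraction of rows where the value is None or absent}."""
    if not rows:
        raise ValueError("need at least one row")
    return {
        col: sum(1 for row in rows if row.get(col) is None) / len(rows)
        for col in columns
    }


# Hypothetical column names and toy records, for illustration only.
columns = ["revenue", "debt_to_equity"]
training_rows = [
    {"revenue": 120.0, "debt_to_equity": 0.8},
    {"revenue": 95.0, "debt_to_equity": None},
]
scoring_rows = [
    {"revenue": None, "debt_to_equity": None},
    {"revenue": 40.0},  # key absent counts as missing
]

train_missing = missing_rate(training_rows, columns)
score_missing = missing_rate(scoring_rows, columns)
for col in columns:
    print(col, train_missing[col], score_missing[col])
```

Running the same check per data source would show which sources drive the extra missing data.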

Which method should I use to predict the credit rating? Logistic regression comes to mind first. However, there are missing values in both the training and scoring data, so a lot of imputation would be needed and the model may not be accurate. I can accept predicting the credit rating as one of 3 groups:

Group 1: A to F

Group 2: G to L

Group 3: M to V

rather than predicting the 20 discrete values from A to V.
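
To make the grouping concrete, here is a minimal sketch of how the letter ratings could be collapsed into the 3 groups above before training. The names `GROUP_BOUNDS` and `rating_group` are made up for illustration, and it assumes ratings are stored as single letters.

```python
# Assumption: each rating is a single letter, grouped as A-F, G-L, M-V.
GROUP_BOUNDS = [("A", "F", 1), ("G", "L", 2), ("M", "V", 3)]


def rating_group(rating: str) -> int:
    """Map a letter rating to its coarse group number (1, 2 or 3)."""
    letter = rating.strip().upper()
    if len(letter) != 1:
        raise ValueError(f"expected a single letter, got {rating!r}")
    for low, high, group in GROUP_BOUNDS:
        if low <= letter <= high:
            return group
    raise ValueError(f"rating outside A-V: {rating!r}")


# Build the 3-class target from the original ratings.
ratings = ["A", "F", "G", "L", "M", "V"]
targets = [rating_group(r) for r in ratings]
```

The model would then be trained on `targets` instead of the original letters.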

I think it will be hard to get good accuracy from a model that predicts all 20 discrete values.

Can anyone advise me?