As a start, I'd frame the question as: how do diagnostic codes differ between a population of providers submitting data and the reference panel that performs its own analysis? Because the panel is right by definition, that could be formalized as Y \sim X_1 + \dots + X_n, where Y is the binary outcome of agreement/non-agreement between providers and the panel. That's a logistic regression problem, with the potential to provide odds ratios for the outcome given the suite of variables.
You might see, for example, an odds ratio of 1.0 for a given variable, meaning that agreement between the population and the panel is unaffected by it. Or, if Y is coded so that 1 means disagreement, OR > 1 means the variable is associated with higher odds of misdiagnosis, and OR < 1 with lower odds.
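To make the odds-ratio reading concrete, here is a minimal sketch in plain Python. It assumes a single hypothetical binary covariate `x` and codes `y = 1` as disagreement with the panel; the counts are invented for illustration, and `fit_logistic` is a small hand-rolled Newton-Raphson fit, not any particular library's routine. In practice you'd use a proper statistics package with many covariates.

```python
import math

# Hypothetical data: x = 1 if the provider has some characteristic of interest,
# y = 1 if the provider's code disagrees with the panel's. Counts are invented.
counts = {(0, 0): 40, (0, 1): 10, (1, 0): 30, (1, 1): 20}  # (x, y) -> n

xs, ys = [], []
for (x, y), n in counts.items():
    xs += [x] * n
    ys += [y] * n


def fit_logistic(xs, ys, iterations=25):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson (one covariate only)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of X'WX (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^{-1} * gradient, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)  # OR for x is exp of its coefficient

# With one binary covariate, the model's OR matches the 2x2 cross-product ratio.
cross_product = (counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)])

print(f"OR from the fitted model: {odds_ratio:.3f}")
print(f"OR from the 2x2 table:    {cross_product:.3f}")
```

Both figures come out at about 2.67 here, i.e. under these made-up counts the odds of disagreeing with the panel are roughly 2.7 times higher when `x = 1`.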
Of course, for diagnostics as a whole, the measure may not be so useful. A dermatologist with a busy Sunbelt practice serving a geriatric population might be doing scalp biopsies of careless golfers on a daily basis, while one in Fairbanks, Alaska might not see one from one year to the next. So, you'd want to stratify to make sure you're comparing like with like.
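As a rough illustration of why stratifying matters, the sketch below computes an odds ratio within each stratum and then a crude one from the pooled counts. Everything here is hypothetical: the stratum names, the "exposed"/"unexposed" split (some provider variable of interest), and the counts themselves are invented, and `table_odds_ratio` is just a helper I made up for the example.

```python
# Hypothetical 2x2 counts per stratum (e.g. practice type or case mix).
# "exp"/"unexp" = provider has / lacks some variable of interest;
# "dis"/"agr"   = provider's code disagrees / agrees with the panel.
strata = {
    "high_volume": {"exp_dis": 10, "exp_agr": 90, "unexp_dis": 5, "unexp_agr": 95},
    "low_volume": {"exp_dis": 30, "exp_agr": 20, "unexp_dis": 20, "unexp_agr": 30},
}


def table_odds_ratio(t):
    """Cross-product odds ratio of disagreement, exposed vs. unexposed."""
    return (t["exp_dis"] * t["unexp_agr"]) / (t["exp_agr"] * t["unexp_dis"])


# Odds ratio within each stratum: like compared with like.
by_stratum = {name: table_odds_ratio(t) for name, t in strata.items()}

# Crude odds ratio: ignore strata and add the tables together.
pooled = {
    key: sum(t[key] for t in strata.values())
    for key in ("exp_dis", "exp_agr", "unexp_dis", "unexp_agr")
}
crude = table_odds_ratio(pooled)

for name, value in by_stratum.items():
    print(f"{name}: OR = {value:.2f}")
print(f"crude (all strata pooled): OR = {crude:.2f}")
```

With these invented counts the crude figure comes out lower than either stratum-specific one, which is the kind of distortion stratifying is meant to expose.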
This strikes me as something akin to my experience as a senior lawyer for a major bank. A couple of my go-to outside lawyers had very high hourly rates. I wouldn't go to them for everyday stuff, but I never hesitated to call them on what seemed to me to be a novel issue. For one expensive hour, I'd get everything I needed in a timely, direct and useful way. On the other hand, if I went to a firm that did a lot of routine work, I might pay a blended hourly rate a third of what the superstar lawyers charged, but it would take longer and be nowhere near as direct and useful.
In other words, high-level expertise can save you a ton of money. It's easy for anyone's management to fall into the trap of "why should I pay money when we can get this done in-house for free?" Well, it's not really free, of course, and once a system is built, it's going to be around a long time and potentially be a tar pit of resources if it turns out to have been poorly thought out. We've all been there.
I can put you in touch with who I'd go to in your situation. He's a recently retired biostatistician with deep clinical experience whose package is in every serious R user's statistics toolkit. He has only a tenuous idea of who I am but I'm pretty sure I can find his details. DM me if you think that would be helpful.