Seeking suggestions for a shared model evaluation tool for a team

I work in a team that builds predictive models. We often disagree on approaches, which sometimes leads to lively discussions. I've seen situations where one team member claims to have a superior approach, but they are using their own logic and evaluation metrics, which differ from the standard the existing model was held to.

In the past I entered a Kaggle competition. Kaggle has a platform where everyone submits predictions on the same test data and a leaderboard is generated from the results.

I wanted to ask the community whether there is a way to mimic this setup in a work team environment: a server, package, or approach that would hold us all to the same standard and allow us to test new ideas objectively.
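To make the idea concrete, here is a rough sketch of the kind of thing I have in mind, not a real tool. It assumes one fixed set of held-out labels and one agreed metric (accuracy here, purely as an example), and every name in it (HELD_OUT_LABELS, build_leaderboard, the submission names and numbers) is made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the held-out labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("predictions must cover every held-out example")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def build_leaderboard(
    y_true: Sequence[int],
    submissions: Dict[str, Sequence[int]],
    metric: Callable[[Sequence[int], Sequence[int]], float] = accuracy,
) -> List[Tuple[str, float]]:
    """Score every submission with the same metric, best score first."""
    scores = [(name, metric(y_true, preds)) for name, preds in submissions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical held-out labels that only the scoring code can see.
HELD_OUT_LABELS = [1, 0, 1, 1, 0, 1, 0, 0]

# Hypothetical submissions: each team member supplies predictions only.
submissions = {
    "existing_model": [1, 0, 1, 0, 0, 1, 0, 1],
    "new_idea": [1, 0, 1, 1, 0, 1, 0, 1],
}

leaderboard = build_leaderboard(HELD_OUT_LABELS, submissions)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: {score:.3f}")
```

The point is that the metric and the test labels live in one place, so nobody can score their model with their own logic. What I don't know is whether something like this already exists as a proper server or package.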

Are there any recommended tools out there for this?