MLOps Maturity Model
Your organization gets more value from its data as its MLOps maturity increases. But maturing isn’t as simple as waiting for your team to figure it out, or increasing their budget until they scale out of their current issues. Additional capital can help, but if an ML practice isn’t configured to succeed, further investment will just exacerbate inefficiencies and may cause your operation to plateau.
Introduction
Every organization is unique, but there are some universal patterns in how ML practices tend to evolve within companies (these patterns generally transcend industries). Many companies are trying to solve very similar scaling problems, and operational bottlenecks emerge at predictable times. You can try to grow your organization with blinders on (and run into hurdle after hurdle), or you can learn from the mistakes of others to scale smarter, faster, and more efficiently.
The ML Op Shop Maturity Model (illustrated below) captures and generalizes the five most common stages an ML organization evolves through. This framework combines the most useful and actionable online information with over 16 years of professional experience (see the Further Reading section below for more details).
The five stages in this model are:
Stage 1: Early Analytics
Stage 2: Mature Analytics
Stage 3: Early Machine Learning
Stage 4: Mature Machine Learning
Stage 5: Transcendence
It’s important to note that some aspects of your ML operation may sit in different stages, but overall, you should be able to map your own operation onto one of the 5 stages by reading on below.
Stage 1: Early Analytics
In this stage your organization is starting its data-driven journey and it’s your job to demonstrate that data can be useful. This is the first zero-to-one stage, and the workflows you establish will pay compounding dividends going forward (as long as you can demonstrate enough value to merit further investment in ML).
Outgoing self-starters tend to do well in this stage. You want people who can build relationships, deal with ambiguity, mine for pain-points, and wear both Data Engineering and Business Analyst hats (Data Scientists and Project Managers are less important in this phase).
Main Focuses:
Data and opportunity discovery and ad-hoc reporting.
Questions you may be asking yourself:
What data do we have and where does it live? How do I gain access to it? What tools should I be using to create a dashboard/report? How do I get other people interested in working with me? How do I capture opportunities to add value?
Stage 2: Mature Analytics
In this stage you have a good handle on the available data and have established reports and dashboards that are regularly unlocking value. This is the first product-market-fit stage. It may feel like you have more work than time, and you may even struggle to keep up with the demand for your work. Your job increasingly becomes about making your workflows as operationally efficient as possible so that you can scale them without running into maintenance issues. By the end of this stage, you should have a mature analytics function in your organization, and this function should have a far-reaching presence on the ground that sets it up for success in adding value to every department.
Organized and process-driven people tend to do well in this stage. You need to simplify and standardize ad-hoc workflows with the right amount of process: enough to keep you stable, but not so much that you stop being lean enough to scale. The majority of your team will still consist of Data Engineers and Business Analysts. You may have a dedicated Project Manager for triaging the backlog from the increasing demand for your services, or you may have a Senior BI or Data Engineer fulfilling this role. We generally recommend going the organized Senior Engineering route (just make sure this person has effective stakeholder management skills, or you can run into a number of common issues).
Main Focuses:
Workflow and dashboard standardization and documentation, scaling into untapped opportunities, meeting deadlines.
Questions you may be asking yourself:
How can I spend less time on maintenance? Why does my backlog seem to grow faster than I can pull from it? How can I automate this regular task? Are we still using the best tools? How do I most effectively allocate this additional budget?
Stage 3: Early Machine Learning
In this stage you are starting the shift from reporting and presenting to predicting and forecasting. This is the second zero-to-one stage and needs to happen in a central team before it can be scaled and decentralized. You have built out a decent data analytics engine and can intuit patterns in the data that you would now like to formalize so your decision making can be more scientific.
ML generalists tend to do well leading initiatives in this stage, and you want them to be impact-enthusiasts rather than technology-enthusiasts. This is very important. They need to be able to identify and unlock low-hanging fruit by speaking with business stakeholders and understanding their needs. It is quite likely that your organization can identify many opportunities that don’t require extremely complex technical solutions. These opportunities should be prioritized, but they are often overlooked by technology-enthusiasts.
Main Focuses:
Opportunity discovery, proactive reporting, infrastructure build-out.
Questions you may be asking yourself:
How do I get other people interested in working with me? How do I capture opportunities for adding value with this expanded capability set? Does this count as data science? Will this really help us make better decisions? What new tools do we need to deploy these models?
Stage 4: Mature Machine Learning
In this stage you are reliably deploying models that unlock real business value. This is the second product-market-fit stage. You again may feel like you have more work than time, and you may struggle to keep up with demand. Your job increasingly becomes about achieving higher levels of efficiency and automation. By the end of this stage, you should have an ML function in your organization with AutoML capabilities, and you should be starting to decentralize ML across your organization so that every department is exposed to its fullest potential.
Organized, process-driven people and technology specialists tend to do well in this stage. You start to require dedicated MLOps specialists too: people who know how to coordinate people, processes, tools, and data efficiently, reliably, securely, and at scale. There are many moving parts to coordinate, and these specialists need to know how to do this while implementing lean principles.
Main Focuses:
Automation, Automation, Automation! Test automation, deployment automation, automatic validation, automatic model selection. Scaling to address untapped opportunities, meeting deadlines, expanding into new business units.
Questions you may be asking yourself:
How can my Data Scientists be more productive? Is our lead time on new and improved models as short as it could be? Is this really the simplest model we can use to help us make that decision? How can I automate this workflow? How can I spend less time on maintenance? Why does my backlog seem to grow faster than I can pull from it? Are we still using the best tools? How do I most effectively allocate this additional budget?
Stage 5: Transcendence
If you reach this stage, you have a well-oiled ML machine and it’s your obligation to ensure this expanded operational capacity penetrates as deeply as possible into the greater organization. Your goal in this stage is primarily cultural advocacy for deeper adoption, and coordination of efforts across departments.
The most popular working model in this stage is a hub-and-spoke ML Center of Excellence (which manages coordination and standardization) with applied SME ML practitioners spread across each department.
Main Focuses:
Advocacy/cultural cultivation, cross-departmental coordination, continuous improvement.
Questions you may be asking yourself:
Is our decentralization/centralization balance correct, and have we made the right trade-offs? Is everyone in the company data-aware? What are the future ML technology trends, and do we have the capability to capitalize on them? Are we considering the correct strategic acquisitions? Can we package any of our tools as external services?
Speed Running the Stages
The goal of any ML operation should be to move through these stages as quickly and efficiently as possible. The transitions into Stage 1, from Stage 2 to Stage 3, and from Stage 4 to Stage 5 require operational transformations to expand capabilities, while the transitions from Stage 1 to Stage 2 and from Stage 3 to Stage 4 are driven by scaling existing operations to expand into untapped opportunities. We have more information on transitioning between these stages in this post.
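For readers who like to see the pattern laid out, the alternation between transformation and scaling transitions can be sketched as a small lookup table. This is only an illustration: the names `TransitionKind`, `TRANSITIONS`, and `describe_next_step`, and the use of `0` to mean "no data practice yet", are our own assumptions and not part of the model itself.

```python
from enum import Enum


class TransitionKind(Enum):
    TRANSFORMATION = "operational transformation"
    SCALING = "scaling existing operations"


# Stage names as listed in this article; stage 0 is a hypothetical
# placeholder for an organization that has not started yet.
STAGES = {
    0: "No data practice yet",
    1: "Early Analytics",
    2: "Mature Analytics",
    3: "Early Machine Learning",
    4: "Mature Machine Learning",
    5: "Transcendence",
}

# (from_stage, to_stage) -> kind of change the transition demands.
TRANSITIONS = {
    (0, 1): TransitionKind.TRANSFORMATION,
    (1, 2): TransitionKind.SCALING,
    (2, 3): TransitionKind.TRANSFORMATION,
    (3, 4): TransitionKind.SCALING,
    (4, 5): TransitionKind.TRANSFORMATION,
}


def describe_next_step(current_stage: int) -> str:
    """Return a one-line description of the next transition."""
    if current_stage not in STAGES:
        raise ValueError(f"unknown stage: {current_stage}")
    if current_stage == 5:
        return "Stage 5 (Transcendence) is the final stage; focus on continuous improvement."
    next_stage = current_stage + 1
    kind = TRANSITIONS[(current_stage, next_stage)]
    return (
        f"Stage {current_stage} -> Stage {next_stage} "
        f"({STAGES[next_stage]}): {kind.value}"
    )


if __name__ == "__main__":
    for stage in STAGES:
        print(describe_next_step(stage))
```

Laying the transitions out this way makes the rhythm easier to see: each zero-to-one stage is entered through a transformation, and each product-market-fit stage is reached by scaling what already works.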
Further Reading:
There are many online resources depicting MLOps maturity models. While we have attempted to extract the most impactful aspects of each maturity model (combining online insights with our own experience), we encourage you to read and learn from these other resources too. Google’s AI Adoption Framework whitepaper is a particularly influential resource, and you will find traces of it in our work and also in Gartner’s AI maturity model (which is written for an executive audience and is perhaps the best presented for a non-technical audience, with Datatron’s 3M article getting an honorable mention for accessibility). You will find many references to Microsoft’s Machine Learning operations maturity model in other people’s models, but we feel that this model places a little too much emphasis on DevOps’ influence over MLOps to be useful as a generalized, industry-agnostic model. That said, Microsoft’s framework is well documented and will be the most appropriate maturity model for organizations in the software industry. Josh Poduska’s Seven Stages Model is a lovely read and introduces the concept of a “point of inflection”, which we roughly map onto Stage 5 of our model (although you may notice we include two other points of inflection during Stages 1 and 3). Here is an unordered list of other resources we found useful in curating this material:
Amazon's MLOps foundation roadmap for enterprises with Amazon SageMaker
Alex Strick van Linschoten's Everything you ever wanted to know about MLOps maturity models
Learn more about Navigating the MLOps Maturity Model here.