Refactor to go faster

If you’ve noticed your team is starting to take longer to build out solutions, new members are taking longer to onboard, existing models are often breaking, or you are running into scaling issues, it’s likely that you have a complexity problem.

Poorly maintained codebases and under- or over-engineered pipelines accumulate technical debt, lead to growing performance issues, and make collaboration more difficult over time. Sound familiar? If so, you should consider a refactoring intervention. Refactoring is a controlled technique for improving the design of an existing codebase, and a well-scoped intervention can simplify your codebase and alleviate pressure on your team.

While refactoring is an attractive tool for any scientific computing team, it also comes with its own set of risks and drawbacks. This blog post will guide you through the process of understanding when to invest in a refactoring intervention, and if so, how to de-risk that investment.

Step 1: Establish and Track Performance Metrics

Before you make any refactoring investment, you need to verify that you have a problem and quantify its size. If you are not already tracking workflow and codebase performance, now is the time to start. We recommend the following three-step approach: it achieves the key objective of assessing performance metrics, with the ancillary benefit of getting your team more engaged with MLOps best practices:

Establish and track performance metrics: Each performance metric should measure time, money, and/or reliability. We recommend all ML teams track: compute costs ($ per month), average time to deploy (days per model), average time to train (days per model), bug rates (monthly bug count/number of solutions in production), model drift (accuracy delta/month), model accuracy (%), business value created (dollars/month). Note that your team may also need to track more bespoke performance metrics specific to your industry.
Conduct code and pipeline reviews: Audit your codebase and pipelines to identify areas where MLOps best practices could be reinforced. Use Refactoring by Martin Fowler as your guide. If you are not already doing so, now is also a good time to establish a regular review process for your team so that any changes made to codebases and pipelines get vetted.
Establish a feedback loop: As you gather new insights, you should let your team know so that they are aware of their impact and feel empowered to take corrective action. You should incorporate regular check-ins, post-mortem analyses of incidents, and retrospective meetings to identify areas for improvement.
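
To make the metric definitions above concrete, here is a minimal sketch of how two of them (bug rate and model drift) could be computed from monthly snapshots. The `MonthlySnapshot` class, its field names, and the sample numbers are all hypothetical and made up for illustration; your own tracking tool or data source will look different.

```python
from dataclasses import dataclass


@dataclass
class MonthlySnapshot:
    """One month of raw numbers. Field names are illustrative assumptions."""
    month: str
    compute_cost_usd: float
    bug_count: int
    solutions_in_production: int
    model_accuracy_pct: float


def bug_rate(snapshot: MonthlySnapshot) -> float:
    """Monthly bug count divided by the number of solutions in production."""
    if snapshot.solutions_in_production == 0:
        return 0.0
    return snapshot.bug_count / snapshot.solutions_in_production


def model_drift(previous: MonthlySnapshot, current: MonthlySnapshot) -> float:
    """Accuracy delta per month in percentage points (negative means degradation)."""
    return current.model_accuracy_pct - previous.model_accuracy_pct


# Made-up sample data, only to show the calculation.
history = [
    MonthlySnapshot("2024-01", 12_000.0, 6, 4, 91.0),
    MonthlySnapshot("2024-02", 13_500.0, 9, 4, 89.5),
]

for prev, curr in zip(history, history[1:]):
    print(
        curr.month,
        f"bug rate={bug_rate(curr):.2f}",
        f"drift={model_drift(prev, curr):+.1f} pts",
    )
```

Keeping the raw counts next to the derived ratios makes it easier to explain a change in a metric during check-ins and retrospectives.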

Code and pipeline complexity often unwinds naturally (without the need for a formal refactoring intervention) once your team becomes more conscious of best practices. However, this is not always the case, especially if your team is new to best practices or the complexity is endemic. If the problem doesn’t resolve itself or you need to speed up the resolution rate, read on.

Step 2: Design a code refactoring intervention

Once you are tracking your team’s performance metrics, you are well positioned to design a refactoring intervention that will allow you to reach new performance targets. We like to apply Gojko Adzic’s Impact Mapping technique in this stage as it brings all relevant stakeholders together to focus on identifying the desired business outcomes and working backward to determine the necessary steps to achieve those outcomes. Impact Mapping ensures: the intervention will be aligned with the desired business outcomes, relevant stakeholders will have bought in, and that the intervention will exert the minimum energy needed to meet the business objectives. The Impact Mapping process is as follows:

Discover the desired business goals (and their measurements): Identify the desired business outcomes for the code refactoring intervention and define them as SMART objectives.
Identify the potential actors: Which stakeholders are in a position to take actions that would help achieve the desired business outcomes?
Identify the potential impacts: What behavior changes can those actors make to help achieve the business goals?
Identify the deliverables: What actions can we take to proactively support the desired impacts?
Quantify impact value: Quantify the impact of each deliverable along with the effort required to conduct it (tying it back to how the business goals are measured).
Define scope of intervention: Prune the list of deliverables such that the intervention only includes the minimum amount of effort needed to meet its business requirements.
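
The last two steps (quantifying impact and pruning the scope) can be sketched as a small ranking exercise. The example below is only an illustration under stated assumptions: the `Deliverable` class, the `prune_scope` helper, the deliverable names, and the numbers are all hypothetical, and the greedy impact-per-effort heuristic is one simple choice, not a guaranteed minimum-effort selection.

```python
from dataclasses import dataclass


@dataclass
class Deliverable:
    name: str
    impact: float  # estimated contribution toward the goal, in the goal's own unit
    effort: float  # estimated effort (e.g. person-days); must be greater than zero


def prune_scope(deliverables, target_impact):
    """Greedy heuristic: take deliverables by impact per unit of effort
    until the estimated total impact reaches the target."""
    ranked = sorted(deliverables, key=lambda d: d.impact / d.effort, reverse=True)
    chosen, total = [], 0.0
    for deliverable in ranked:
        if total >= target_impact:
            break
        chosen.append(deliverable)
        total += deliverable.impact
    return chosen, total


# Hypothetical candidates; impact is "days saved per model deployment".
candidates = [
    Deliverable("Migrate orchestration tool", impact=5.0, effort=30.0),
    Deliverable("Add pipeline unit tests", impact=3.0, effort=5.0),
    Deliverable("Split monolithic training script", impact=4.0, effort=10.0),
]

scope, expected_impact = prune_scope(candidates, target_impact=6.0)
print([d.name for d in scope], expected_impact)
```

Anything left out of the scope stays on the impact map as a candidate if the first deliverables turn out to be less effective than estimated.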

It’s important to note that steps 2-4 afford you the opportunity to leverage the diversity and creativity of your team. Impact Mapping is a collaborative process, and you should seek to encourage as many ideas as possible.

Step 3: Do a Cost/Benefit Analysis on the designed intervention

Now that you understand the scope of work needed to deliver the intervention, you need to understand how much value it will drive so that you can decide whether it is worth prioritizing. This can be achieved using a traditional cost-benefit analysis report, and you will find various cost-benefit analysis templates online. When conducting a cost-benefit analysis for a code refactoring project, you should consider the following factors:

Costs: Identify the costs associated with the code refactoring project, including the cost of developers, computing power, and any additional resources that may be needed. This should also include any potential costs associated with disruptions to ongoing projects and workflows.
Benefits: Identify the potential benefits of the code refactoring project, including improved performance, reduced technical debt, and streamlined workflows. This should also include any potential cost savings associated with improved efficiency, reduced errors, and reduced maintenance costs.
ROI: Calculate the potential ROI of the project by dividing the estimated net benefit (cost savings minus the cost of the project) by the cost of the project. This will help you determine whether the project is worth pursuing and can help you make a stronger case for getting buy-in from stakeholders.
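
As a rough worked example of the calculation above, the sketch below totals itemized costs and savings and reports both the plain savings-to-cost ratio and the net-of-cost return. The `cost_benefit` helper, the category names, and the dollar figures are hypothetical and made up for illustration.

```python
def cost_benefit(total_cost: float, estimated_savings: float) -> dict:
    """Return two common ratios for a project.

    benefit_cost_ratio: estimated savings divided by project cost.
    net_roi: (estimated savings minus project cost) divided by project cost.
    """
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return {
        "benefit_cost_ratio": estimated_savings / total_cost,
        "net_roi": (estimated_savings - total_cost) / total_cost,
    }


# Made-up figures in dollars, over the same evaluation period.
costs = {"developer_time": 40_000.0, "compute": 5_000.0, "disruption": 5_000.0}
savings = {"reduced_maintenance": 45_000.0, "fewer_incidents": 30_000.0}

result = cost_benefit(sum(costs.values()), sum(savings.values()))
print(result)
```

A benefit-cost ratio above 1 (equivalently, a net return above 0) means the estimated savings exceed the cost over the chosen period; whether that is enough to prioritize the project depends on what else is competing for your team's time.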

We recommend that all of your projects and initiatives (not just refactoring interventions) have their costs and benefits analyzed and regularly updated, to ensure you are prioritizing your limited resources as effectively as possible.

Step 4: Implement the intervention

After conducting the Impact Mapping in Step 2, you should have a refactoring plan that outlines the specific deliverables needed to refactor your codebase and workflows. Once you have the green light to begin the intervention, you will need to take the following steps to see it through:

Segment deliverables into sprints: Prioritize the refactoring tasks within these sprints. We suggest you front-load highest-impact and lowest-effort tasks to gain early momentum. This will keep team spirits high while also allowing you to measure and adjust to the accuracy of your planning.
Implement the refactoring plan: Execute the prioritized tasks and monitor their impact on your ML models and pipelines.
Establish a feedback loop: End each sprint with a review session that serves as a feedback loop, so that you can celebrate wins and adjust the plan as necessary.
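
One way to front-load high-impact, low-effort work is to order tasks by impact per unit of effort and then fill sprints up to a fixed capacity. The sketch below assumes exactly that; the `plan_sprints` helper, the task names, the estimates, and the capacity of 10 person-days are hypothetical.

```python
def plan_sprints(tasks, capacity):
    """Group tasks into sprints, best impact-per-effort first.

    tasks: list of (name, impact, effort) tuples, with effort greater than zero.
    capacity: effort available per sprint, in the same unit as effort.
    A task larger than the capacity still gets a sprint of its own.
    """
    ordered = sorted(tasks, key=lambda t: t[1] / t[2], reverse=True)
    sprints, current, used = [], [], 0.0
    for name, _impact, effort in ordered:
        # Start a new sprint when the next task would not fit.
        if current and used + effort > capacity:
            sprints.append(current)
            current, used = [], 0.0
        current.append(name)
        used += effort
    if current:
        sprints.append(current)
    return sprints


# Hypothetical tasks: (name, estimated impact, estimated effort in person-days).
tasks = [
    ("Add pipeline unit tests", 3.0, 5.0),
    ("Remove dead feature code", 2.0, 2.0),
    ("Split monolithic training script", 4.0, 10.0),
]

sprints = plan_sprints(tasks, capacity=10.0)
for number, names in enumerate(sprints, start=1):
    print(f"Sprint {number}: {names}")
```

Comparing the estimated effort with what each sprint actually took is a quick check on the accuracy of your planning.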

Plans will often need to be adjusted, and planning oversights should be embraced as learning opportunities rather than covered up. We encourage you to continuously evaluate the results of your refactoring intervention by comparing your performance metrics before and after each sprint. Be ready to end the intervention early if you meet the business objectives without needing to implement the entire refactoring plan, or if your cost/benefit analysis was off and you need to refocus your team on higher-impact issues.
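
The early-stopping check described above can be as simple as comparing the metrics recorded after each sprint against your targets. The sketch below is a minimal illustration: the `first_sprint_meeting_targets` helper, the metric names, and the values are hypothetical, and it assumes every metric is "lower is better".

```python
def first_sprint_meeting_targets(sprint_metrics, targets):
    """Return the 1-based number of the first sprint after which every
    target is met, or None if no sprint gets there.

    Assumes every metric is "lower is better" (e.g. days to deploy, bug rate).
    """
    for number, metrics in enumerate(sprint_metrics, start=1):
        if all(metrics[name] <= limit for name, limit in targets.items()):
            return number
    return None


# Made-up targets and measurements taken at the end of each sprint.
targets = {"days_to_deploy": 7.0, "bug_rate": 1.0}
after_each_sprint = [
    {"days_to_deploy": 10.0, "bug_rate": 1.5},
    {"days_to_deploy": 6.5, "bug_rate": 0.9},
    {"days_to_deploy": 6.0, "bug_rate": 0.8},
]

stop_after = first_sprint_meeting_targets(after_each_sprint, targets)
print(f"Targets first met after sprint {stop_after}")
```

If the targets are met before the plan is finished, the remaining deliverables become candidates for a later intervention rather than work that must be completed now.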

Refactoring interventions can be very profitable investments for machine learning teams, but only if they are used at the right time and place. Now that you have a better understanding of assessing when and how to do a refactoring intervention, we invite you to share your feedback or questions with us, and to share your experiences with code refactoring and its impact on your organization.
