Expert Tips for Smooth ML-SaaS Onboarding

Navigating the Unique Challenges of Integrating Machine Learning Software

If you are shipping an ML-based SaaS product, customer onboarding can become a bottleneck, especially if you need to integrate with each new customer’s bespoke set of data systems before you can add value. You don’t want onboarding to be hacky and labor-intensive, or it will slow you down and leave a bad taste in your new customers’ mouths. But the work still needs to be done, and it’s non-trivial. The good news is that ML-SaaS integrations don’t need to be labor-intensive to be effective. Although each integration scenario is unique, there are common issues that your developers, scientists, and engineers will need to resolve, and therefore patterns you can leverage.

When a customer buys a SaaS product, they will likely need to do some work before they can start using it. This work can range from low-effort SSO integration (so the customer can manage access to the product via user groups) all the way to high-effort two-way data system integrations (so that data in the customer’s internal systems can manipulate, and/or be manipulated by, the software they have just purchased). This work is a source of friction during SaaS onboarding, so naturally SaaS companies want to minimize it.

As such, the traditional SaaS world has already developed a comprehensive set of best practices to make data system integrations go more smoothly. These involve understanding the type of integration, answering questions such as “Do we need this connection to be in real time?” and “What formats do we need the data to be in?”, and building out APIs, pre-built connectors, and custom data-mapping tools where necessary.

When I shipped my first ML-SaaS product, I followed the traditional SaaS best practices I had used during Salesforce integrations. However, I quickly realized that data integration is a more complex process for ML-SaaS than for traditional SaaS. With traditional SaaS you only need to integrate your systems with your customers’ data systems; with ML-SaaS you also need to ensure your models will perform well on this new, potentially never-before-seen data. The traditional SaaS integration process only took me so far, and I needed to add an extra couple of steps for effective ML-SaaS customer onboarding.

This blog post describes strategies (in the form of patterns) that I found useful while shipping my own ML-SaaS product. Note that these patterns are supplementary to those you would use during a traditional SaaS integration (I may write a blog post on those in the future).

Once you have customer data in a standardized, workable format (following a traditional SaaS integration), you’ll often find that your customers’ data is not as feature-rich as the data you used to train your models, or that you need to do additional inference to get the data into the format your model needs. If this onboarding step is a major pain point, you need to standardize your workflows. Specifically, you can probably benefit from the following:

  1. Transforming your data inputs into a standard set of features

  2. Implementing schema bridges when new customers have incomplete/expanded feature sets

Transforming your data inputs into a standard set of features

Many black-box models use some combination of raw data and data features as their input. This approach can yield decent results with relatively low effort; however, it is not scalable for building out a high-performing model. The Transform design pattern solves this problem by introducing the concept of a “transformer,” which separates raw data from data features so that the machine learning model only uses a predictable set of data features as its input.

You may be familiar with connectors and data-mapping tools from traditional SaaS integrations; transformers function in a similar way, transforming/mapping data from one format into another. The name also appears in the Transformer architecture behind popular LLMs such as OpenAI’s ChatGPT, but the design pattern here doesn’t just apply to the use of embeddings in text classification. It is a useful generalizing pattern that makes your modeling pipelines more modular by disentangling data inputs, features, and transforms. This allows you to standardize model features while accommodating a dynamic range of data input types.

See Figure 1 for an example of this design pattern in action for two different customers. Customer 1 is served by Transform 1, which takes two data inputs (Location and DateTime) and transforms them into the three features the model needs as input (Country, Day of Week, and Time). Customer 2 is served by Transform 2, which takes three data inputs (two of which are new) and transforms them into the same target features. In this example, Customer 2 does not have DateTime, but does have Date and Time, so they can still leverage the model by plugging into a different transformer, and both customers benefit from improvements made to the same shared model.

Figure 1: Two customers getting served by different transformers to use the same ML model.
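A minimal sketch of the pattern in Figure 1, assuming hypothetical field names and a target feature schema of Country, Day of Week, and Time (all names here are illustrative, not from a real product):

```python
from datetime import datetime

# Target feature set every transformer must produce.
TARGET_FEATURES = ("country", "day_of_week", "time")

def transform_customer_1(record: dict) -> dict:
    """Transform 1: raw inputs are 'location' and 'datetime'."""
    dt = datetime.fromisoformat(record["datetime"])
    return {
        "country": record["location"].split(",")[-1].strip(),
        "day_of_week": dt.strftime("%A"),
        "time": dt.strftime("%H:%M"),
    }

def transform_customer_2(record: dict) -> dict:
    """Transform 2: raw inputs are 'location', 'date', and 'time'."""
    d = datetime.strptime(record["date"], "%Y-%m-%d")
    return {
        "country": record["location"].split(",")[-1].strip(),
        "day_of_week": d.strftime("%A"),
        "time": record["time"],
    }

# Both transformers emit the same feature schema, so the downstream
# model never sees customer-specific raw inputs.
f1 = transform_customer_1({"location": "Berlin, Germany",
                           "datetime": "2023-05-01T09:30:00"})
f2 = transform_customer_2({"location": "Lyon, France",
                           "date": "2023-05-01", "time": "09:30"})
print(f1)
print(f2)
```

The key design choice is that each transformer owns the mapping from one customer’s raw schema to the shared feature schema, so adding a customer means writing a transform, not retraining a model.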

Once you have standardized the set of features you are able to accommodate with your model, you may find that some of your customers only have subsets of this feature set available to them. Alternatively, they may have supersets of this feature set, which means they have additional features that were not considered in your original model. In these situations, you can leverage Schema Bridges.

Implementing schema bridges when new customers have incomplete/expanded feature sets

Consider two scenarios: one where a new customer has fewer features available in their data than your model requires (e.g., your model requires features A, B, and C, but the customer’s data only includes features A and B), and another where a new customer has an additional feature you want to test making available to your model (e.g., your model requires features A, B, and C, but the customer’s data includes features A, B, C, and D). Proceed with the following steps:

Step 1) Assess the new feature (for customers with additional features): Analyze the additional feature (Feature D) to determine if it can potentially provide valuable insights and improve the model’s performance. Perform exploratory data analysis, correlations, or other feature importance techniques to assess its potential impact.
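A simple first-pass assessment might look like the following sketch, using synthetic stand-in data (the features A–D and the target here are fabricated for illustration) and plain correlation as the cheapest feature-importance signal:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins: A, B, C are the existing model features;
# D is the candidate feature supplied only by the new customer.
A, B, C = rng.normal(size=(3, n))
D = rng.normal(size=n)
# In this toy setup, the target genuinely depends on D.
target = 2.0 * A - 1.0 * B + 0.5 * D + rng.normal(scale=0.1, size=n)

# Quick check: correlation of each feature with the target.
correlations = {}
for name, col in {"A": A, "B": B, "C": C, "D": D}.items():
    correlations[name] = np.corrcoef(col, target)[0, 1]
    print(f"{name}: r = {correlations[name]:+.2f}")
```

In practice you would follow a promising correlation with proper feature-importance techniques (e.g., permutation importance) before committing to a model change.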

Step 2) Extend your model (for customers with additional features): If the new feature is deemed valuable, update your model to include this additional feature. This may require retraining your model with an expanded dataset containing the new feature. Ensure that you validate the model’s performance with and without the additional feature to measure its impact.
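One way to measure the impact described in Step 2 is to fit the model with and without the candidate feature and compare held-out performance. A hedged sketch using ordinary least squares on synthetic data (all data and coefficients here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
A, B, C, D = rng.normal(size=(4, n))
# Toy target that depends strongly on the new feature D.
y = 1.5 * A + B - 0.5 * C + 2.0 * D + rng.normal(scale=0.2, size=n)

def fit_and_score(train_cols, y_train, test_cols, y_test):
    """Fit OLS with an intercept; return R^2 on held-out data."""
    Xtr = np.column_stack([np.ones(len(y_train)), *train_cols])
    Xte = np.column_stack([np.ones(len(y_test)), *test_cols])
    w, *_ = np.linalg.lstsq(Xtr, y_train, rcond=None)
    resid = y_test - Xte @ w
    return 1 - resid.var() / y_test.var()

split = n // 2
tr, te = slice(None, split), slice(split, None)
r2_without = fit_and_score([A[tr], B[tr], C[tr]], y[tr],
                           [A[te], B[te], C[te]], y[te])
r2_with = fit_and_score([A[tr], B[tr], C[tr], D[tr]], y[tr],
                        [A[te], B[te], C[te], D[te]], y[te])
print(f"R^2 without D: {r2_without:.2f}, with D: {r2_with:.2f}")
```

The same with/without comparison applies to any model class; the linear model here just keeps the sketch short.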

Step 3) Bridge the data: For customers with missing features, use statistical imputation to replace missing feature data in your model (either the original or the expanded version) with synthetic (but reasonable) values derived from the customers’ data. This allows customers with incomplete feature sets to benefit from the enhanced model.
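The simplest form of the bridging in Step 3 is mean imputation against statistics computed from your own training data. A minimal sketch (the feature names and mean values are hypothetical):

```python
# Training-time statistics for the model's required features,
# assumed to have been computed from your own training data.
feature_means = {"A": 0.12, "B": -0.03, "C": 1.07}

def bridge(record: dict) -> dict:
    """Fill any feature the customer cannot supply with a mean-imputed
    value, so the record matches the model's expected schema."""
    return {name: record.get(name, mean)
            for name, mean in feature_means.items()}

# A customer that only supplies features A and B:
bridged = bridge({"A": 0.4, "B": -1.2})
print(bridged)  # C falls back to its training mean
```

More sophisticated imputers (conditional means, model-based imputation) follow the same shape: the bridge sits between the customer’s schema and the model’s, so the model code never changes.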

Step 4) Split the data: Train and evaluate a customer-specific model using as much real data as possible, including the additional feature when available. Determine the number of evaluation examples needed to have confidence in your model. Techniques for doing this include k-fold cross-validation or train-validation-test split, and the choice will depend on your data.
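The k-fold option from Step 4 can be sketched as follows (a toy index-splitting helper, not a production implementation):

```python
import numpy as np

def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

# Every sample lands in exactly one validation fold, so each customer
# record is evaluated once on data the model did not train on.
for fold, (train, val) in enumerate(k_fold_indices(103, 5)):
    print(f"fold {fold}: train={len(train)}, val={len(val)}")
```

Because every example is held out exactly once, k-fold makes the most of the small, customer-specific datasets you often have at onboarding time.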

By following these steps, you can accommodate customers with both incomplete and expanded feature sets. As you bridge and split data over and over again for new customers, you will notice patterns in how the data needs to be bridged for different features and how the data needs to be augmented, allowing you to operationalize much of this workload and improve the integration process for your customers.

If you are developing an MLSaaS product and would like to learn more about integration patterns and practices, don’t hesitate to reach out to us. Our team of experts can provide guidance and insights to help you streamline your onboarding process and improve the overall user experience for your customers. Contact us today to learn more!
