Your AI Journey Starts After the First Model Deployment: A Product Leader’s Perspective on MLOps

Akash Bhate
7 min read · Apr 17, 2024

--

Image credits — ml-ops.org

In the dynamic and complex world of retail and e-commerce, the application of AI and ML has been transformative, particularly in streamlining supply chain logistics. However, it’s crucial to understand that the real journey of AI implementation begins after the initial deployment. Here are some common challenges that arise post-deployment, along with potential solutions.

Image credits — pexels.com Liza Summer, (Managing Unrealistic expectations)

Managing Unrealistic Expectations: Often, business stakeholders perceive AI and ML as magical solutions that can instantly solve problems such as inventory imbalances or inaccurate sales forecasts, without understanding how the underlying models actually work.

As Product Leaders entrusted with the task of integrating AI into business processes, our role is to manage these expectations effectively.

We need to make it clear that while AI and ML can significantly optimize supply chain logistics and enhance customer experience, they are not immediate miracle solutions. For instance, consider an ML model designed to manage inventory in a large retail chain. While it can drastically improve the accuracy of inventory predictions over time and help avoid stock-outs or excess stock, it won’t solve all inventory-related problems instantly. The model will require time to learn, adapt, and improve.

Image credits — pexels.com Lukas, (Choosing appropriate success metrics)

Choosing Appropriate Success Metrics: In the e-commerce context, it’s crucial to choose the right success metrics for an ML solution. If we measure the success of an ML model solely based on increased sales, we might overlook its impact on other significant areas such as customer satisfaction or product return rates.

As Product Leaders, we need to collaborate with both technical and non-technical stakeholders to define realistic and meaningful success metrics that align with broader business objectives.

For example, goals could include improving customer retention rates, reducing shipping times, or increasing the accuracy of product recommendations.
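To make this concrete, here is a minimal sketch (in Python with pandas) of how metrics beyond raw sales could be computed and reported side by side. The data file and column names are hypothetical placeholders for whatever order data the platform actually exposes.

```python
import pandas as pd

def recommendation_hit_rate(orders: pd.DataFrame) -> float:
    """Share of orders that contain at least one recommended item."""
    return orders["contains_recommended_item"].mean()

def return_rate(orders: pd.DataFrame) -> float:
    """Share of orders that were later returned."""
    return orders["was_returned"].mean()

def repeat_purchase_rate(orders: pd.DataFrame) -> float:
    """Share of customers who placed more than one order in the period."""
    orders_per_customer = orders.groupby("customer_id").size()
    return (orders_per_customer > 1).mean()

if __name__ == "__main__":
    # Hypothetical extract of last quarter's orders.
    orders = pd.read_parquet("orders_last_quarter.parquet")
    # Report the metrics together so a lift in sales is never read in isolation.
    print({
        "recommendation_hit_rate": recommendation_hit_rate(orders),
        "return_rate": return_rate(orders),
        "repeat_purchase_rate": repeat_purchase_rate(orders),
    })
```

Reviewing these numbers alongside revenue keeps the conversation with stakeholders anchored to the broader business objectives rather than a single headline figure.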

Image credits — Pexels.com Pavel Danilyuk, (Data pipeline complexity)

Addressing Data Pipeline Complexity: Retail and e-commerce is a data-rich environment, drawing on an assortment of data sources including real-time sales data, customer feedback, social media sentiment, competitor pricing, sales forecasts, and supply chain information. Accessing and harnessing this diverse data is an essential yet challenging task, primarily due to its dynamic nature.

Data in the retail and e-commerce sectors is in constant flux, continuously updated with new sales, customer feedback, market trends, and promotional events. This ever-changing data landscape presents its own set of challenges. For instance, the suitability of data can change over time, rendering a once valuable data source less relevant. Furthermore, new data sources can emerge that might be beneficial to include in the model.

The task of creating robust data pipelines to handle such diverse and dynamic data, preprocess it efficiently, and maintain data quality is substantial. For example, an ML model for predicting product demand would need to integrate data from past sales, current market trends, and upcoming promotional events. This requires a data-centric approach, where emphasis is placed on the quality, relevance, and timeliness of data, rather than solely focusing on the model.
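As an illustration, the sketch below shows one way such a pipeline step might join sales history, market trends, and promotion data while enforcing basic quality checks. The file paths, column names, and thresholds are hypothetical assumptions, and a production pipeline would typically run on an orchestration tool rather than a single script.

```python
import pandas as pd

def load_and_validate(path: str, required_cols: list[str]) -> pd.DataFrame:
    """Load one source and fail fast if expected columns are missing."""
    df = pd.read_parquet(path)
    missing = set(required_cols) - set(df.columns)
    if missing:
        raise ValueError(f"{path} is missing columns: {missing}")
    return df

def build_demand_features() -> pd.DataFrame:
    """Join the three sources on (sku, week) and surface data-quality issues."""
    sales = load_and_validate("sales_history.parquet", ["sku", "week", "units_sold"])
    trends = load_and_validate("market_trends.parquet", ["sku", "week", "trend_index"])
    promos = load_and_validate("promo_calendar.parquet", ["sku", "week", "promo_flag"])

    features = (
        sales.merge(trends, on=["sku", "week"], how="left")
             .merge(promos, on=["sku", "week"], how="left")
    )
    # Treat missing promo rows as "no promotion" rather than dropping the sale.
    features["promo_flag"] = features["promo_flag"].fillna(0)
    # Fail loudly instead of silently training on incomplete market data.
    if features["trend_index"].isna().mean() > 0.05:
        raise ValueError("More than 5% of rows lack a market trend index")
    return features
```

The point of the sketch is the data-centric mindset: validation and explicit handling of gaps live in the pipeline itself, not in the model.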

As Product Leaders, our role is to ensure that the engineering team dedicates sufficient resources to addressing the challenges of data compatibility and consistency when integrating these multiple, continuously changing data sources.

In doing so, we can create machine learning solutions that are adaptable, flexible, and able to provide accurate predictions and insights, even in the face of shifting retail and e-commerce landscapes.

Managing Model Versioning and Artifacts: Model versioning and artifact management present several challenges in machine learning operations. One of the main issues is tracking the changes and iterations in models over time. Without a proper versioning system, it becomes difficult to reproduce previous results or trace back the root cause of discrepancies in model predictions. This lack of traceability makes it hard to diagnose ‘hallucination effects’, where the model generates unexpected or inexplicable outputs whose source cannot easily be identified or corrected.

Unlike traditional version control for code, version control in MLOps also needs to track datasets, hyperparameters, and model performance metrics.

For example, in an e-commerce platform, an ML model used for sales forecasting might suddenly start predicting significant spikes or drops in sales. Without a robust model versioning system, it becomes challenging to understand what changes in the model or data led to these anomalies. As a result, the reliability and trust in the model’s predictions can be compromised, affecting strategic business decisions based on these forecasts.

Moreover, without proper artifact management, it becomes difficult to manage and utilize the various components associated with ML models, such as training scripts, data sets, parameters, and model configuration. This can lead to inefficiencies in model development, testing, and deployment processes. It can also make it challenging to share and collaborate on models across teams, hindering collective progress and learning.

Hence, implementing a robust system for model versioning and artifact management is crucial. It ensures the reliability and maintainability of ML models in production, facilitates transparency and collaboration, and helps prevent issues related to the ‘hallucination effects’ of models. By keeping track of each model iteration and associated artifacts, teams can better understand, troubleshoot, and improve their models over time.
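As one possible illustration, the sketch below uses MLflow (a widely used open-source experiment-tracking tool, not necessarily the one your team has chosen) to log hyperparameters, a dataset tag, a validation metric, and the trained model for a forecasting run. The synthetic data, experiment name, and snapshot tag are placeholders.

```python
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature pipeline output.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=1000) + 10
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

mlflow.set_experiment("sales-forecasting")  # hypothetical experiment name

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestRegressor(**params, random_state=42).fit(X_train, y_train)

    # Log everything needed to reproduce or audit this exact model version.
    mlflow.log_params(params)
    mlflow.log_param("training_data_snapshot", "sales_features_2024-04-01")  # hypothetical tag
    mlflow.log_metric("val_mape", mean_absolute_percentage_error(y_val, model.predict(X_val)))
    mlflow.sklearn.log_model(model, artifact_path="model")
```

With every run recorded this way, a sudden spike in forecasted sales can be traced back to the specific data snapshot and hyperparameters that produced it.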

Accounting for Deployment Variability: Different deployment strategies are required based on the application. In-store recommendation systems might use edge devices, while cloud-based servers might power inventory management systems. Not accounting for this variability can introduce complications in deployment pipelines.

As Product Leaders, we need to collaborate with the engineering team to ensure that the MLOps solution is flexible enough to handle these varying deployment scenarios.

It should provide scalable and reliable deployments capable of handling changing workloads and potential system failures.
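A minimal sketch of what that flexibility might look like: a single packaging entry point that branches on the deployment target. The target names, size limits, and packaging steps are illustrative assumptions, not a specific platform’s API.

```python
from dataclasses import dataclass

@dataclass
class DeploymentTarget:
    name: str
    runtime: str            # e.g. "onnx-edge" or "cloud-endpoint"
    max_model_size_mb: int  # constraint on the packaged artifact
    supports_gpu: bool

# Hypothetical targets; the same trained model is packaged differently per target.
TARGETS = {
    "in_store_recommender": DeploymentTarget("in_store_recommender", "onnx-edge", 50, False),
    "inventory_service": DeploymentTarget("inventory_service", "cloud-endpoint", 2048, True),
}

def package_model(model_path: str, target_name: str) -> str:
    """Choose a packaging strategy based on where the model will run."""
    target = TARGETS[target_name]
    if target.runtime == "onnx-edge":
        # Quantise/convert for constrained in-store hardware (details omitted).
        return f"{model_path}.quantized.onnx"
    # Cloud endpoints can serve the full-precision model behind an autoscaling API.
    return f"{model_path}.tar.gz"
```

Keeping the target-specific logic behind one interface means new deployment scenarios can be added without reworking the whole release pipeline.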

Implementing Monitoring and Governance: After deploying an ML model, it’s crucial to continuously track its performance and establish governance frameworks. This is critical for several reasons. Firstly, continuous monitoring can help detect any anomalies in model performance early, allowing for quick adjustments and improvements. For instance, an ML model used for dynamic pricing in an e-commerce platform needs to be monitored to ensure it’s providing fair prices. Continuous monitoring can help detect if the model starts deviating from its expected behavior — for example, if it starts predicting unusually high or low prices.

Secondly, monitoring provides valuable insights into the model’s performance over time, helping to assess its overall effectiveness. These insights can guide future improvements and enhancements to the model, ensuring it continues to deliver value.

Monitoring is crucial in MLOps because machine learning models can degrade in performance over time due to model and data drift.
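One simple illustration of such monitoring is a Population Stability Index (PSI) check that compares a training-time feature distribution against recent production traffic. The data below is synthetic, and the 0.2 alert threshold is a common rule of thumb rather than a universal standard.

```python
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a training-time distribution and recent production data."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid division by zero / log(0) for empty bins.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Illustrative check: flag the dynamic-pricing model when its inputs drift.
reference_prices = np.random.default_rng(0).normal(100, 15, 10_000)  # training snapshot
recent_prices = np.random.default_rng(1).normal(120, 20, 2_000)      # last week of traffic
psi = population_stability_index(reference_prices, recent_prices)
if psi > 0.2:  # commonly cited "significant drift" threshold
    print(f"Input drift detected (PSI={psi:.2f}); trigger review or retraining")
```

A check like this would typically run on a schedule and feed an alerting channel, so drift is caught before it shows up as degraded business metrics.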

Establishing governance frameworks is equally important. These frameworks define the rules and guidelines for the responsible and ethical use of ML models. In the case of the dynamic pricing model, a governance framework would ensure that the model is not inadvertently leading to price discrimination or price gouging. This is crucial for maintaining customer trust and regulatory compliance.

Furthermore, governance frameworks guide how the model should be updated and maintained over time, ensuring consistency and reliability. They can also outline how to handle potential issues or challenges that may arise, providing a roadmap for maintaining the model’s integrity.

Collaborating with the technical team to prioritize the implementation of comprehensive monitoring systems and governance frameworks is fundamental. This collaboration ensures that both the performance and ethical aspects of ML models are given the necessary attention, maintaining the integrity and reliability of the ML systems over time.

Image credits — pexels.com Katrin Bolovtsova, (Promoting effective collaboration and communication)

Promoting Effective Collaboration and Communication: Fostering effective collaboration and clear communication between data scientists, engineers, business analysts, and other stakeholders is vital in retail. Here, ML models impact multiple business areas, and insights derived from them can be valuable across departments. For example, a model predicting potential stock-outs is not just beneficial for the supply chain leadership but also for the sales and go-to-market teams, who can plan their strategies accordingly.

As Product Leaders, promoting cross-functional collaboration and ensuring clear communication channels are established is crucial to align all stakeholders on the project goals and progress.

By proactively addressing these common pitfalls through a collaborative, cross-functional approach, we can overcome the real-world challenges of implementing MLOps.

Our focus should be on delivering reliable, scalable, and responsible ML solutions that drive business value, enhance customer experiences, and optimize supply chain processes. This is where the real journey of our AI-powered products begins: after the initial deployment, as we continuously refine, optimize, and govern our ML systems, ensuring their long-term success and realizing the true potential of AI and ML.

--

Akash Bhate

Senior Product & Engineering Leader @ Amazon, ex @GE @Capgemini | Startup advisor - SalesTech AI, HealthTech AI