From 7551e7311eae90b0d93f07b68f6861bb53b14188 Mon Sep 17 00:00:00 2001 From: ben Date: Thu, 1 Feb 2024 10:49:24 -0800 Subject: [PATCH 01/20] added evolution of ml platform --- _posts/2024-02-01-evolution-of-mlplatform.md | 178 +++++++++++++++++++ 1 file changed, 178 insertions(+) create mode 100644 _posts/2024-02-01-evolution-of-mlplatform.md diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md new file mode 100644 index 0000000..fa11b2d --- /dev/null +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -0,0 +1,178 @@ +--- +layout: post title: "The Evolution of the Machine Learning Platform" +team: Machine Learning Platform +author: bshaw +tags: +- ml +- mlops +- devops +- platform +--- + +Technical Debt is not unique to Software Engineering and is a concept applicable to production Machine Learning (ML) at scale. Machine Learning Platforms (ML Platforms) have the potential to be a key component to achieving production ML at scale without large technical debt, yet ML Platforms are not often well understood. This document outlines the key concepts and paradigm shifts that led to the conceptualization of ML Platforms and how ML Platforms can act as a key to unlocking Development Velocity without Technical debt. 
+ +* 1 [Technical Debt and development velocity defined](#Technical-Debt-and-development-velocity-defined) + * 1.1 [Development Velocity](#Development-Velocity) + * 1.2 [Technical Debt](#Technical-Debt) + * 1.3 [Technical Debt in Machine Learning](#Technical-Debt-in-Machine-Learning) +* 2 [The Evolution Of ML Platforms](#The-Evolution-Of-ML-Platforms) + * 2.1 [DevOps -- The paradigm shift that led the way](#DevOps----The-paradigm-shift-that-led-the-way) + * 2.2 [Platforms -- Reducing Cognitive Load](#Platforms----Reducing-Cognitive-Load) + * 2.3 [ML Ops -- Reducing technical debt of machine learning](#ML-Ops----Reducing-technical-debt-of-machine-learning) +* 3 [The Rise of Machine Learning Platform](#The-Rise-of-Machine-Learning-Platform) + * 3.1 [Benefits to the Organization](#Benefits-to-the-Organization) +* 4 [References](#References) + +Technical Debt and development velocity defined +----------------------------------------------- + +### Development Velocity + +Machine learning development velocity refers to the speed and efficiency at which machine learning (ML) projects progress from the initial concept to deployment and maintenance. It encompasses the entire lifecycle of a machine learning project, from data collection and preprocessing to model training, evaluation, deployment, and ongoing optimization. In platform engineering, this is often referred to as the rate of change. + +### Technical Debt + +The term "technical debt" in software engineering was coined by Ward Cunningham, who used the metaphor of financial debt to describe the trade-off between implementing a quick and dirty solution to meet immediate needs (similar to taking on financial debt for short-term gain) versus taking the time to do it properly with a more sustainable and maintainable solution (akin to avoiding financial debt but requiring more upfront investment).
Just as financial debt accumulates interest over time, technical debt can accumulate and make future development more difficult and expensive. + +The idea behind technical debt is to highlight the consequences of prioritizing short-term gains over long-term maintainability and the need to address and pay off this "debt" through proper refactoring and improvements. The term has since become widely adopted in the software development community to describe the accrued cost of deferred work on a software project. + +### Technical Debt in Machine Learning + +Originally a software engineering concept, technical debt is also relevant to Machine Learning systems; in fact, the landmark Google paper [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) suggests that ML systems have a propensity to accumulate this technical debt easily. + +> Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems +> +> [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) + +> As the machine learning (ML) community continues to accumulate years of experience with live systems, a wide-spread and uncomfortable trend has emerged: developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive +> +> [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) + +Technical debt is important to consider, especially when trying to move fast.
Moving fast is easy; moving fast without acquiring technical debt is a lot more complicated. + +The Evolution Of ML Platforms +----------------------------- + +### DevOps -- The paradigm shift that led the way + +DevOps is a methodology in software development which advocates for teams owning the entire software development lifecycle. This paradigm shift from fragmented teams to end-to-end ownership enhances collaboration and accelerates delivery. DevOps has become standard practice in modern software development, and its adoption has been widespread across various industries, with many organizations considering it an essential part of their software development and delivery processes. Some of the principles of DevOps are: + +1. **Automation** + +2. **Continuous Testing** + +3. **Continuous Monitoring** + +4. **Collaboration and Communication** + +5. **Version Control** + +6. **Feedback Loops** + + +### Platforms -- Reducing Cognitive Load + +This shift to DevOps, with teams owning the entire development lifecycle, introduces a new challenge: additional cognitive load. Cognitive load can be defined as: + +> The total amount of mental effort a team uses to understand, operate and maintain their designated systems or tasks. +> +> — [https://teamtopologies.com/book](https://teamtopologies.com/book) + +As teams adopting DevOps grapple with the mental effort of understanding, operating, and maintaining their systems, cognitive load becomes a barrier to efficiency. The weight of this additional load can hinder productivity, prompting organizations to seek solutions. + +Platforms emerged as a strategic solution, delicately abstracting unnecessary details of the development lifecycle. This abstraction allows engineers to focus on critical tasks, mitigating cognitive load and fostering a more streamlined workflow.
+ +> The purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy. The stream-aligned team maintains full ownership of building, running, and fixing their application in production. The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services. +> +> — [https://teamtopologies.com/book](https://teamtopologies.com/book) + +> _Infrastructure Platform teams enable organisations to scale delivery by solving common product and non-functional requirements with resilient solutions. This allows other teams to focus on building their own things and releasing value for their users_ +> +> \- [https://martinfowler.com/articles/building-infrastructure-platform.html](https://martinfowler.com/articles/building-infrastructure-platform.html) + +### ML Ops -- Reducing technical debt of machine learning + +The ability of ML systems to rapidly accumulate technical debt has given rise to the concept of MLOps, a methodology that takes inspiration from and incorporates best practices of DevOps, tailoring them to address the distinctive challenges and workflows inherent in machine learning and controlling technical debt. MLOps seamlessly applies the established principles of DevOps to the intricate landscape of machine learning, recognizing that the actual ML code makes up only a small fraction of a real-world ML system. It serves as a crucial bridge between development and the ongoing intricacies of maintaining ML systems. + +Some examples of concepts of DevOps applied to ML (aka ML Ops) are: + +1. **Automation:** + + 1. Automation can be applied to many parts of the machine learning lifecycle. The incorporation of automation not only streamlines processes but also addresses technical debt by establishing a consistent, standardized and reproducible approach.
+ + 2. Model deployments can be automated by implementing DevOps CI/CD strategies. + + 3. Automation can also be applied to the retraining of machine learning models. + +2. **Continuous Testing:** + + * Continuous testing can be applied as part of a model deployment pipeline, removing the need for manual testing (increasing development velocity) and reducing technical debt by ensuring tests are performed consistently. + + * Model validation can be automated using tooling that provides consistency between training iterations. + +3. **Monitoring:** + + * Monitoring provides key insights and is a step towards creating vital feedback loops. + + * Monitoring can be applied to real-time inference infrastructure, revealing performance concerns just as in DevOps. + + * Monitoring can be applied to model performance to detect model drift in real time, providing insight into how a model is performing and when it may need to be retrained. + +4. **Collaboration and Communication:** + + * Utilize collaboration tools for effective communication and information sharing among team members. + + * Feature Store provides a platform for discovering, reusing and collaborating on ML features + + * Model Database provides platform for discovering, reusing and collaborating on ML Models + +5. **Version Control:** + + * Applying version control to experiments, machine learning models and features provides + + +MLOps is a methodology that provides a collection of concepts and workflows designed to promote efficiency, collaboration, and sustainability of the ML Lifecycle. MLOps plays a pivotal role in ensuring the efficiency, reliability, and scalability of machine learning implementations over time. + +The Rise of Machine Learning Platform +------------------------------------- + +The paradigm shifts of DevOps, MLOps and Platform Thinking led to the emergence of Machine Learning platforms.
ML platforms apply MLOps concepts and workflows and provide a curated developer experience for Machine Learning developers throughout the entire ML lifecycle. These platforms address the challenges of cognitive load, technical debt, quality and developer velocity, and increase efficiency, collaboration, and sustainability. As the ML team grows, the benefits amplify, creating a multiplier effect that allows organizations to scale whilst maintaining quality. + +### Benefits to the Organization + +The adoption of a Machine Learning Platform brings a range of benefits: + +**Increasing Flow of Change (aka developer velocity):** A faster pace of model development and deployment, enhancing overall efficiency. + +**Fostering Collaboration Amongst Teams:** Breaking down silos and promoting cross-functional collaboration. The platform becomes the silent foundation for collaboration, facilitating a harmonious working environment. + +**Enforcing Best Practices:** Standardizing and ensuring adherence to best practices across ML projects. + +**Reducing/Limiting Technical Debt:** Strategically mitigating the risk of accumulating technical debt, ensuring long-term sustainability. + +**Multiplier Effect:** As the ML team grows, the platform's benefits amplify, paying a dividend that multiplies with organizational growth.
+ +References +---------- + +[https://www.youtube.com/watch?v=Bfhl8kcSaEI&embeds\_referring\_euri=https%3A%2F%2Fplatformengineering.org%2F&feature=emb\_imp\_woyt](https://www.youtube.com/watch?v=Bfhl8kcSaEI&embeds_referring_euri=https%3A%2F%2Fplatformengineering.org%2F&feature=emb_imp_woyt) + +[https://www.atlassian.com/devops/frameworks/team-topologies](https://www.atlassian.com/devops/frameworks/team-topologies) + +[https://platformengineering.org/blog/what-is-platform-engineering](https://platformengineering.org/blog/what-is-platform-engineering) + +[https://www.thoughtworks.com/insights/blog/platforms/art-platform-thinking](https://www.thoughtworks.com/insights/blog/platforms/art-platform-thinking) + +[https://www.scribd.com/document/611845499/Whitepaper-State-of-Platform-Engineering-Report](https://www.scribd.com/document/611845499/Whitepaper-State-of-Platform-Engineering-Report) + +[https://martinfowler.com/bliki/ConwaysLaw.html](https://martinfowler.com/bliki/ConwaysLaw.html) + +[https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) + +[https://martinfowler.com/articles/building-infrastructure-platform.html](https://martinfowler.com/articles/building-infrastructure-platform.html) + +[https://martinfowler.com/articles/platform-teams-stuff-done.html](https://martinfowler.com/articles/platform-teams-stuff-done.html) + +[https://martinfowler.com/articles/talk-about-platforms.html](https://martinfowler.com/articles/talk-about-platforms.html) + +[https://www.techopedia.com/definition/27913/technical-debt](https://www.techopedia.com/definition/27913/technical-debt) From c52ab4558dbfe25fff7c36db28d3c9b2ce95f563 Mon Sep 17 00:00:00 2001 From: ben Date: Thu, 1 Feb 2024 10:52:30 -0800 Subject: [PATCH 02/20] remove title of contents --- _posts/2024-02-01-evolution-of-mlplatform.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git 
a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index fa11b2d..3d9e94e 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -11,17 +11,6 @@ tags: Technical Debt is not unique to Software Engineering and is a concept applicable to production Machine Learning (ML) at scale. Machine Learning Platforms (ML Platforms) have the potential to be a key component to achieving production ML at scale without large technical debt, yet ML Platforms are not often well understood. This document outlines the key concepts and paradigm shifts that led to the conceptualization of ML Platforms and how ML Platforms can act as a key to unlocking Development Velocity without Technical debt. -* 1 [Technical Debt and development velocity defined](#Technical-Debt-and-development-velocity-defined) - * 1.1 [Development Velocity](#Development-Velocity) - * 1.2 [Technical Debt](#Technical-Debt) - * 1.3 [Technical Debt in Machine Learning](#Technical-Debt-in-Machine-Learning) -* 2 [The Evolution Of ML Platforms](#The-Evolution-Of-ML-Platforms) - * 2.1 [DevOps -- The paradigm shift that led the way](#DevOps----The-paradigm-shift-that-led-the-way) - * 2.2 [Platforms -- Reducing Cognitive Load](#Platforms----Reducing-Cognitive-Load) - * 2.3 [ML Ops -- Reducing technical debt of machine learning](#ML-Ops----Reducing-technical-debt-of-machine-learning) -* 3 [The Rise of Machine Learning Platform](#The-Rise-of-Machine-Learning-Platform) - * 3.1 [Benefits to the Organization](#Benefits-to-the-Organization) -* 4 [References](#References) Technical Debt and development velocity defined ----------------------------------------------- From 6312feceb85d18fdf8dc3601fd2092f5cc2df9a0 Mon Sep 17 00:00:00 2001 From: ben Date: Thu, 1 Feb 2024 10:57:16 -0800 Subject: [PATCH 03/20] remove formatting paste --- _posts/2024-02-01-evolution-of-mlplatform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff 
--git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index 3d9e94e..0311732 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -27,7 +27,7 @@ The idea behind technical debt is to highlight the consequences of prioritizing ### Technical Debt in Machine Learning -Originally a software engineering concept, technical debt is also relevant to Machine Learning systems; in fact, the landmark Google paper [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) suggests that ML systems have a propensity to accumulate this technical debt easily. +Originally a software engineering concept, technical debt is also relevant to Machine Learning systems; in fact, the landmark[https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](google paper)suggests that ML systems have a propensity to accumulate this technical debt easily. > Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free.
Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems > From 6875996ff4a635f629fc052720526c457ef96226 Mon Sep 17 00:00:00 2001 From: ben Date: Thu, 1 Feb 2024 10:57:59 -0800 Subject: [PATCH 04/20] fix formatting --- _posts/2024-02-01-evolution-of-mlplatform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index 0311732..aab2420 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -27,7 +27,7 @@ The idea behind technical debt is to highlight the consequences of prioritizing ### Technical Debt in Machine Learning -Originally a software engineering concept, technical debt is also relevant to Machine Learning systems; in fact, the landmark[https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](google paper)suggests that ML systems have a propensity to accumulate this technical debt easily. +Originally a software engineering concept, technical debt is also relevant to Machine Learning systems; in fact, the landmark [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](google paper)suggests that ML systems have a propensity to accumulate this technical debt easily. > Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free.
Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems > From 91a5328a67e4ec90e1cfff57baaf9b524c2ede11 Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Thu, 1 Feb 2024 11:00:24 -0800 Subject: [PATCH 05/20] Update 2024-02-01-evolution-of-mlplatform.md fixing formatting --- _posts/2024-02-01-evolution-of-mlplatform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index aab2420..51e7438 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -27,7 +27,7 @@ The idea behind technical debt is to highlight the consequences of prioritizing ### Technical Debt in Machine Learning -Originally a software engineering concept, technical debt is also relevant to Machine Learning systems; in fact, the landmark [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](google paper)suggests that ML systems have a propensity to accumulate this technical debt easily. +Originally a software engineering concept, technical debt is also relevant to Machine Learning systems; in fact, the landmark Google paper "Hidden Technical Debt in Machine Learning Systems" suggests that ML systems have a propensity to accumulate this technical debt easily. > Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free.
Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems > From fa37acd9e3c0ff3d6759eee3a42257774cdb6e4f Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Fri, 2 Feb 2024 10:06:29 -0800 Subject: [PATCH 06/20] Update 2024-02-01-evolution-of-mlplatform.md --- _posts/2024-02-01-evolution-of-mlplatform.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index 51e7438..94847c3 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -1,5 +1,6 @@ --- -layout: post title: "The Evolution of the Machine Learning Platform" +layout: +post title: "The Evolution of the Machine Learning Platform" team: Machine Learning Platform author: bshaw tags: From 264713b08923ba2861c2961e88698e3fc8ed1549 Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Fri, 2 Feb 2024 10:07:04 -0800 Subject: [PATCH 07/20] Update 2024-02-01-evolution-of-mlplatform.md --- _posts/2024-02-01-evolution-of-mlplatform.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index 94847c3..6c65504 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -1,6 +1,6 @@ --- -layout: -post title: "The Evolution of the Machine Learning Platform" +layout:post +title: "The Evolution of the Machine Learning Platform" team: Machine Learning Platform author: bshaw tags: From 9937bf093749a914ec733b59dec51f4995f0fe1c Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Fri, 2 Feb 2024 10:07:54 -0800 Subject: [PATCH 08/20] Update 2024-02-01-evolution-of-mlplatform.md --- _posts/2024-02-01-evolution-of-mlplatform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md
b/_posts/2024-02-01-evolution-of-mlplatform.md index 6c65504..42eac4e 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -1,5 +1,5 @@ --- -layout:post +layout: post title: "The Evolution of the Machine Learning Platform" team: Machine Learning Platform author: bshaw From 21f6da921848ca8dd53b63fd287c19b7a4eb62d7 Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Fri, 2 Feb 2024 15:40:49 -0800 Subject: [PATCH 09/20] Update 2024-02-01-evolution-of-mlplatform.md --- _posts/2024-02-01-evolution-of-mlplatform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index 42eac4e..054145e 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -2,7 +2,7 @@ layout: post title: "The Evolution of the Machine Learning Platform" team: Machine Learning Platform -author: bshaw +author: benshaw tags: - ml - mlops From 62c25c6a5e962853264f811957a35e921f66e5b0 Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Fri, 2 Feb 2024 15:45:23 -0800 Subject: [PATCH 10/20] Update authors.yml --- _data/authors.yml | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/_data/authors.yml b/_data/authors.yml index 1ffec72..69d8120 100644 --- a/_data/authors.yml +++ b/_data/authors.yml @@ -3,6 +3,13 @@ # description, etc --- +bshaw: + name: Ben Shaw + github: benshaw + twitter: ben_a_shaw + about: | + Ben leads the ML Platform group, helping scale production Machine Learning at scribd. Other times you will find him outside playing in the mountains. 
+ alexjb: name: Alex Bernardin github: alexofmanytrades From d7b3e02599b6bef22ecc225798f8b88c6b2d06e5 Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Fri, 2 Feb 2024 15:47:03 -0800 Subject: [PATCH 11/20] Update 2024-02-01-evolution-of-mlplatform.md --- _posts/2024-02-01-evolution-of-mlplatform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index 054145e..42eac4e 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -2,7 +2,7 @@ layout: post title: "The Evolution of the Machine Learning Platform" team: Machine Learning Platform -author: benshaw +author: bshaw tags: - ml - mlops From 817346f143b9481667f48f29073090b172c542f5 Mon Sep 17 00:00:00 2001 From: ben Date: Fri, 2 Feb 2024 17:44:29 -0800 Subject: [PATCH 12/20] fixed tags --- _posts/2024-02-01-evolution-of-mlplatform.md | 5 ++--- tag/mlops/index.md | 6 ++++++ 2 files changed, 8 insertions(+), 3 deletions(-) create mode 100644 tag/mlops/index.md diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index 42eac4e..6829bf0 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -4,10 +4,9 @@ title: "The Evolution of the Machine Learning Platform" team: Machine Learning Platform author: bshaw tags: -- ml - mlops -- devops -- platform +- featured +- ml-platform-series --- Technical Debt is not unique to Software Engineering and is a concept applicable to production Machine Learning (ML) at scale. Machine Learning Platforms (ML Platforms) have the potential to be a key component to achieving production ML at scale without large technical debt, yet ML Platforms are not often well understood. 
This document outlines the key concepts and paradigm shifts that led to the conceptualization of ML Platforms and how ML Platforms can act as a key to unlocking Development Velocity without Technical debt. diff --git a/tag/mlops/index.md b/tag/mlops/index.md new file mode 100644 index 0000000..b51bead --- /dev/null +++ b/tag/mlops/index.md @@ -0,0 +1,6 @@ +--- +layout: tag_page +title: "Tag: mlops" +tag: mlops +robots: noindex +--- From 9ea9a10f9d04a138f7312f8c215d8c59224328ac Mon Sep 17 00:00:00 2001 From: ben Date: Fri, 2 Feb 2024 17:55:19 -0800 Subject: [PATCH 13/20] always fixing --- _posts/2024-02-01-evolution-of-mlplatform.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-01-evolution-of-mlplatform.md index 6829bf0..b17f0ca 100644 --- a/_posts/2024-02-01-evolution-of-mlplatform.md +++ b/_posts/2024-02-01-evolution-of-mlplatform.md @@ -9,7 +9,7 @@ tags: - ml-platform-series --- -Technical Debt is not unique to Software Engineering and is a concept applicable to production Machine Learning (ML) at scale. Machine Learning Platforms (ML Platforms) have the potential to be a key component to achieving production ML at scale without large technical debt, yet ML Platforms are not often well understood. This document outlines the key concepts and paradigm shifts that led to the conceptualization of ML Platforms and how ML Platforms can act as a key to unlocking Development Velocity without Technical debt. +Machine Learning Platforms (ML Platforms) have the potential to be a key component to achieving production ML at scale without large technical debt, yet ML Platforms are not often well understood. This document outlines the key concepts and paradigm shifts that led to the conceptualization of ML Platforms in an effort to increase an understanding of these and how they can best be applied to bring value. 
Technical Debt and development velocity defined @@ -81,7 +81,7 @@ Platforms emerged as a strategic solution, delicately abstracting unnecessary de ### ML Ops -- Reducing technical debt of machine learning -The ability of ML systems to rapidly accumulate technical debt has given rise to the concept of MLOps, a methodology that takes inspiration from and incorporates best practices of DevOps, tailoring them to address the distinctive challenges and workflows inherent in machine learning and controlling technical debt. MLOps seamlessly applies the established principles of DevOps to the intricate landscape of machine learning, recognizing that the actual ML code makes up only a small fraction of a real-world ML system. It serves as a crucial bridge between development and the ongoing intricacies of maintaining ML systems. +The ability of ML systems to rapidly accumulate technical debt has given rise to the concept of MLOps, a methodology that takes inspiration from and incorporates best practices of DevOps, tailoring them to address the distinctive challenges and workflows inherent in machine learning in an effort to control technical debt. MLOps applies the established principles of DevOps to machine learning, recognizing that the actual ML code makes up only a small fraction of a real-world ML system. It serves as a crucial bridge between development and the ongoing intricacies of maintaining ML systems. Some examples of concepts of DevOps applied to ML (aka ML Ops) are: @@ -111,16 +111,16 @@ Some examples of concepts of DevOps applied to ML (aka ML Ops) are: * Utilize collaboration tools for effective communication and information sharing among team members.
- * Feature Store provides a platform for discovering, re using and collaborating on ML features + * Feature Store's provides a platform for discovering, re using and collaborating on ML features - * Model Database provides platform for discovering, re using and collaborating on ML Models + * Model Database's provide a platform for discovering, re using and collaborating on ML Models 5. **Version Control:** - * Applying version control to experiments, machine learning models and features provides + * Applying version control to experiments, machine learning models and features provides better change management and auditing of these ML artifacts -MLOps is a methodology that provides a collection of concepts and workflows designed to promote efficiency, collaboration, and sustainability of the ML Lifecycle. MLOps plays a pivotal role in ensuring the efficiency, reliability, and scalability of machine learning implementations over time. +MLOps is a methodology that provides a collection of concepts and workflows designed to promote efficiency, collaboration, and sustainability of the ML Lifecycle. Correctly applied MLOps can play a pivotal role in ensuring the efficiency, reliability, and scalability of machine learning implementations over time. 
The Rise of Machine Learning Platform ------------------------------------- From e628fcaf7a38045bfbd0b8a78c2339d30a4b7d84 Mon Sep 17 00:00:00 2001 From: ben Date: Fri, 2 Feb 2024 17:58:24 -0800 Subject: [PATCH 14/20] update date till monday will release then --- ...ion-of-mlplatform.md => 2024-02-05-evolution-of-mlplatform.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename _posts/{2024-02-01-evolution-of-mlplatform.md => 2024-02-05-evolution-of-mlplatform.md} (100%) diff --git a/_posts/2024-02-01-evolution-of-mlplatform.md b/_posts/2024-02-05-evolution-of-mlplatform.md similarity index 100% rename from _posts/2024-02-01-evolution-of-mlplatform.md rename to _posts/2024-02-05-evolution-of-mlplatform.md From 48b6ec515f2606f463596fe1932ae1100d700b9f Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Tue, 6 Feb 2024 20:22:46 -0800 Subject: [PATCH 15/20] Update 2024-02-05-evolution-of-mlplatform.md --- _posts/2024-02-05-evolution-of-mlplatform.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-02-05-evolution-of-mlplatform.md b/_posts/2024-02-05-evolution-of-mlplatform.md index b17f0ca..dc2df23 100644 --- a/_posts/2024-02-05-evolution-of-mlplatform.md +++ b/_posts/2024-02-05-evolution-of-mlplatform.md @@ -9,7 +9,7 @@ tags: - ml-platform-series --- -Machine Learning Platforms (ML Platforms) have the potential to be a key component to achieving production ML at scale without large technical debt, yet ML Platforms are not often well understood. This document outlines the key concepts and paradigm shifts that led to the conceptualization of ML Platforms in an effort to increase an understanding of these and how they can best be applied to bring value. +Machine Learning Platforms (ML Platforms) have the potential to be a key component in achieving production ML at scale without large technical debt, yet ML Platforms are often poorly understood.
This document outlines the key concepts and paradigm shifts that led to the conceptualization of ML Platforms, in an effort to improve understanding of these platforms and how they can best be applied. Technical Debt and development velocity defined From 88cce8e236c0dcc1d0b671991b3ec58992b12c9e Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Thu, 8 Feb 2024 19:59:01 -0800 Subject: [PATCH 16/20] Update 2024-02-05-evolution-of-mlplatform.md [WIP] refactor links and move benefits to bottom with more specific examples --- _posts/2024-02-05-evolution-of-mlplatform.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/_posts/2024-02-05-evolution-of-mlplatform.md b/_posts/2024-02-05-evolution-of-mlplatform.md index dc2df23..10943bc 100644 --- a/_posts/2024-02-05-evolution-of-mlplatform.md +++ b/_posts/2024-02-05-evolution-of-mlplatform.md @@ -30,7 +30,7 @@ The idea behind technical debt is to highlight the consequences of prioritizing Originally a software engineering concept, Technical debt is also relevant to Machine Learning Systems infact the landmark google paper suggest that ML systems have the propensity to easily gain this technical debt. > Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free.
Using the software engineering framework of technical debt , we find it is common to incur massive ongoing maintenance costs in real-world ML systems -> +> /todo fix link > [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) > As the machine learning (ML) community continues to accumulate years of experience with livesystems, a wide-spread and uncomfortable trend has emerged: developing and deploying ML sys-tems is relatively fast and cheap, but maintaining them over time is difficult and expensive @@ -82,7 +82,15 @@ Platforms emerged as a strategic solution, delicately abstracting unnecessary de ### ML Ops -- Reducing technical debt of machine learning The ability of ML systems to rapidly accumulate technical debt has given rise to the concept of MLOps, a methodology that takes inspiration from and incorporates best practices of the DevOps , tailoring them to address the distinctive challenges and workflows inherent in machine learning in an effort to control technical debt. MLOps applies the established principles of DevOps to machine learning, recognizing that merely a fraction of real-world ML systems comprises the actual ML code. Serving as a crucial bridge between development and the ongoing intricacies of maintaining ML systems. +MLOps is a methodology that provides a collection of concepts and workflows designed to promote efficiency, collaboration, and sustainability of the ML Lifecycle. Correctly applied MLOps can play a pivotal role in ensuring the efficiency, reliability, and scalability of machine learning implementations over time. + +The Rise of Machine Learning Platform +------------------------------------- + +The paradigm shifts of DevOps, MLOps and Platform Thinking led to the emergence of Machine Learning platforms. 
ML platforms are the application of MLOps concepts and workflows and provide a curated developer experience for Machine Learning developers throughout the entire ML lifecycle. These platforms address the challenges of cognitive load, technical debt, quality and developer velocity and increase efficiency, collaboration, and sustainability. As the ML team grows, the benefits amplify, creating a multiplier effect that allows organizations to scale whilst maintaining quality. +### Scribd's ML Platform -- MLOps in Action +/todo Some examples of concepts of DevOps applied to ML (aka ML Ops) are: 1. **Automation:** @@ -120,13 +128,6 @@ Some examples of concepts of DevOps applied to ML (aka ML Ops) are: * Applying version control to experiments, machine learning models and features provides better change management and auditing of these ML artifacts -MLOps is a methodology that provides a collection of concepts and workflows designed to promote efficiency, collaboration, and sustainability of the ML Lifecycle. Correctly applied MLOps can play a pivotal role in ensuring the efficiency, reliability, and scalability of machine learning implementations over time. - -The Rise of Machine Learning Platform -------------------------------------- - -The paradigm shifts of DevOps, MLOps and Platform Thinking led to the emergence of Machine Learning platforms. ML platforms are the application of MLOps concepts and workflows and provide a curated developer experience for Machine Learning developers throughout the entire ML lifecycle. These platforms address the challenges of cognitive load, technical debt, quality and developer velocity and increase efficiency, collaboration, and sustainability. As the ML team grows, the benefits amplify, creating a multiplier effect that allows organizations to scale whilst maintaining quality. 
- ### Benefits to the Organization The adoption of a Machine Learning Platform unfolds a spectrum of benefits: From 65e33686d9292ec1842cc66e9d337a4c4e6729aa Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Fri, 9 Feb 2024 17:25:59 -0800 Subject: [PATCH 17/20] Update 2024-02-05-evolution-of-mlplatform.md fix links and add details about scribds ml platform --- _posts/2024-02-05-evolution-of-mlplatform.md | 69 ++++++-------------- 1 file changed, 21 insertions(+), 48 deletions(-) diff --git a/_posts/2024-02-05-evolution-of-mlplatform.md b/_posts/2024-02-05-evolution-of-mlplatform.md index 10943bc..8ae666d 100644 --- a/_posts/2024-02-05-evolution-of-mlplatform.md +++ b/_posts/2024-02-05-evolution-of-mlplatform.md @@ -30,12 +30,11 @@ The idea behind technical debt is to highlight the consequences of prioritizing Originally a software engineering concept, Technical debt is also relevant to Machine Learning Systems infact the landmark google paper suggest that ML systems have the propensity to easily gain this technical debt. > Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. 
Using the software engineering framework of technical debt , we find it is common to incur massive ongoing maintenance costs in real-world ML systems -> /todo fix link -> [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) +> [Sculley et al (2021) Hidden Technical Debt in Machine Learning Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) > As the machine learning (ML) community continues to accumulate years of experience with livesystems, a wide-spread and uncomfortable trend has emerged: developing and deploying ML sys-tems is relatively fast and cheap, but maintaining them over time is difficult and expensive > -> [https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) +> [Sculley et al (2021) Hidden Technical Debt in Machine Learning Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) Technical debt is important to consider especially when trying to move fast. Moving fast is easy, moving fast without acquiring technical debt is alot more complicated. @@ -65,7 +64,7 @@ This shift to DevOps and teams teams owning the entire development lifecycle int > The total amount of mental effort a team uses to understand, operate and maintain their designated systems or tasks. > -> — [](https://teamtopologies.com/book "https://teamtopologies.com/book")[https://teamtopologies.com/book](https://teamtopologies.com/book) +> [Skelton & Pais (2019) Team Topologies](https://teamtopologies.com/book) As teams grapple with the mental effort required by adopting DevOps of understanding, operating, and maintaining systems, cognitive load becomes a barrier to efficiency. 
The weight of this additional load can hinder productivity, prompting organizations to seek solutions. @@ -73,11 +72,11 @@ Platforms emerged as a strategic solution, delicately abstracting unnecessary de > The purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy. The stream-aligned team maintains full ownership of building, running, and fixing their application in production. The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services. > -> — [](https://teamtopologies.com/book "https://teamtopologies.com/book")[https://teamtopologies.com/book](https://teamtopologies.com/book) +> [Skelton & Pais (2019) Team Topologies](https://teamtopologies.com/book) -> _Infrastructure Platform teams enable organisations to scale delivery by solving common product and non-functional requirements with resilient solutions. This allows other teams to focus on building their own things and releasing value for their users_ +> Infrastructure Platform teams enable organisations to scale delivery by solving common product and non-functional requirements with resilient solutions. This allows other teams to focus on building their own things and releasing value for their users > -> \- [https://martinfowler.com/articles/building-infrastructure-platform.html](https://martinfowler.com/articles/building-infrastructure-platform.html) +> [Rowse & Shepherd (2022) Building Infrastructure Platforms](https://martinfowler.com/articles/building-infrastructure-platform.html) ### ML Ops -- Reducing technical debt of machine learning @@ -87,66 +86,40 @@ MLOps is a methodology that provides a collection of concepts and workflows desi The Rise of Machine Learning Platform ------------------------------------- -The paradigm shifts of DevOps, MLOps and Platform Thinking led to the emergence of Machine Learning platforms. 
ML platforms are the application of MLOps concepts and workflows and provide a curated developer experience for Machine Learning developers throughout the entire ML lifecycle. These platforms address the challenges of cognitive load, technical debt, quality and developer velocity and increase efficiency, collaboration, and sustainability. As the ML team grows, the benefits amplify, creating a multiplier effect that allows organizations to scale whilst maintaining quality. +The paradigm shifts of DevOps, MLOps and Platform Thinking led to the emergence of Machine Learning platforms. ML platforms are the application of MLOps concepts and workflows and provide a curated developer experience for Machine Learning developers throughout the entire ML lifecycle. As the ML team grows, the benefits of a platform amplify, creating a multiplier effect that allows organizations to scale whilst maintaining quality and not getting bogged down with technical debt. + ### Scribd's ML Platform -- MLOps in Action -/todo -Some examples of concepts of DevOps applied to ML (aka ML Ops) are: +At Scribd we have applied concepts from DevOps to our ML Operations in the following ways 1. **Automation:** - - 1. Automation can be applied to many parts of the machine learning lifecycle. The incorporation of automation not only streamlines processes but also addresses technical debt through the establishment of consistency and a standardized and reproducible approach. - - 2. Model deployments which can be automated by the implementation of DevOps CI/CD strategies. - - 3. Automation can also be applied to retraining of machine learning models + + * Applying CI/CD strategies to model deployments through the use of Jenkins pipelines which deploy models from the Model Registry to AWS based endpoints. + * Automating Model training throug the use of Airflow DAGS and allowing these DAGS to trigger the deployment pipelines to deploy a model once re-training has occured. 2. 
**Continuous** **Testing:** - * Continuous testing can be applied as part of a model deployment pipeline, removing the need for manual testing (increasing development velocity) and removing technical debt by ensuring tests are performed consistently - - * Model validation can be automated using tooling providing consistency between training iterations. + * Applying continuous testing as part of a model deployment pipeline, removing the need for manual testing. + * Increased tooling to support model validation testing. 3. **Monitoring:** - - * Monitoring provides key insights and a steps towards creating vital feedback loops. - - * Monitoring can be applied to real time inference infrastructure revealing performance concerns similar to dev ops. - * Monitoring can be applied to Model performance and monitor for model drift in realtime, providing realtime insight and analysis to model performance and when it may need to be retrained. + * Monitoring real time inference endpoints + * Monitoring training DAGS 4. **Collaboration and Communication:** - - * Utilize collaboration tools for effective communication and information sharing among team members. - - * Feature Store's provides a platform for discovering, re using and collaborating on ML features + + * Feature Store which provides feature discovery and re-use + * Model Database which provides model collaboration - * Model Database's provide a platform for discovering, re using and collaborating on ML Models - -5. **Version Control:** +6. 
**Version Control:** - * Applying version control to experiments, machine learning models and features provides better change management and auditing of these ML artifacts + * Applyied version control to experiments, machine learning models and features -### Benefits to the Organization - -The adoption of a Machine Learning Platform unfolds a spectrum of benefits: - -**Increasing Flow of Change (aka developer velocity):** A swift pace in model development and deployment, enhancing overall efficiency. - -**Fostering Collaboration Amongst Teams:** Breaking down silos and promoting cross-functional collaboration. The platform becomes the silent foundation for collaboration, facilitating a harmonious working environment. - -**Enforcing Best Practices:** Standardizing and ensuring adherence to best practices across ML projects. - -**Reducing/Limiting Technical Debt:** Strategically mitigating the risk of accumulating technical debt, ensuring long-term sustainability. - -**Multiplier Effect:** As the ML team grows, these benefits of the platform amplify—a dividend that multiplies with organizational growth. 
- References ---------- -[https://www.youtube.com/watch?v=Bfhl8kcSaEI&embeds\_referring\_euri=https%3A%2F%2Fplatformengineering.org%2F&feature=emb\_imp\_woyt](https://www.youtube.com/watch?v=Bfhl8kcSaEI&embeds_referring_euri=https%3A%2F%2Fplatformengineering.org%2F&feature=emb_imp_woyt) - [https://www.atlassian.com/devops/frameworks/team-topologies](https://www.atlassian.com/devops/frameworks/team-topologies) [https://platformengineering.org/blog/what-is-platform-engineering](https://platformengineering.org/blog/what-is-platform-engineering) From 698a85802a2d7dcba663078e7c6386f5e49bf2e5 Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Fri, 9 Feb 2024 17:43:05 -0800 Subject: [PATCH 18/20] Update 2024-02-05-evolution-of-mlplatform.md Fix references --- _posts/2024-02-05-evolution-of-mlplatform.md | 21 ++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/_posts/2024-02-05-evolution-of-mlplatform.md b/_posts/2024-02-05-evolution-of-mlplatform.md index 8ae666d..74655c1 100644 --- a/_posts/2024-02-05-evolution-of-mlplatform.md +++ b/_posts/2024-02-05-evolution-of-mlplatform.md @@ -30,6 +30,7 @@ The idea behind technical debt is to highlight the consequences of prioritizing Originally a software engineering concept, Technical debt is also relevant to Machine Learning Systems infact the landmark google paper suggest that ML systems have the propensity to easily gain this technical debt. > Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. 
Using the software engineering framework of technical debt , we find it is common to incur massive ongoing maintenance costs in real-world ML systems +> > [Sculley et al (2021) Hidden Technical Debt in Machine Learning Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) > As the machine learning (ML) community continues to accumulate years of experience with livesystems, a wide-spread and uncomfortable trend has emerged: developing and deploying ML sys-tems is relatively fast and cheap, but maintaining them over time is difficult and expensive @@ -120,22 +121,22 @@ At Scribd we have applied concepts from DevOps to our ML Operations in the follo References ---------- -[https://www.atlassian.com/devops/frameworks/team-topologies](https://www.atlassian.com/devops/frameworks/team-topologies) +[Bottcher (2018, March 05). What I Talk About When I Talk About Platforms. https://martinfowler.com/articles/talk-about-platforms.html](https://martinfowler.com/articles/talk-about-platforms.html) -[https://platformengineering.org/blog/what-is-platform-engineering](https://platformengineering.org/blog/what-is-platform-engineering) +[D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison (2021) Hidden Technical Debt in Machine Learning Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) -[https://www.thoughtworks.com/insights/blog/platforms/art-platform-thinking](https://www.thoughtworks.com/insights/blog/platforms/art-platform-thinking) +[Fowler (2022, October 20). Conway's Law.
https://martinfowler.com/bliki/ConwaysLaw.html](https://martinfowler.com/bliki/ConwaysLaw.html) -[https://www.scribd.com/document/611845499/Whitepaper-State-of-Platform-Engineering-Report](https://www.scribd.com/document/611845499/Whitepaper-State-of-Platform-Engineering-Report) +[Galante. What is platform engineering? https://platformengineering.org/blog/what-is-platform-engineering](https://platformengineering.org/blog/what-is-platform-engineering) -[https://martinfowler.com/bliki/ConwaysLaw.html](https://martinfowler.com/bliki/ConwaysLaw.html) +[Humanitec. State of Platform Engineering Report](https://www.scribd.com/document/611845499/Whitepaper-State-of-Platform-Engineering-Report) -[https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems](https://www.scribd.com/document/428241724/Hidden-Technical-Debt-in-Machine-Learning-Systems) +[Hodgson (2023, July 19). How platform teams get stuff done. https://martinfowler.com/articles/platform-teams-stuff-done.html](https://martinfowler.com/articles/platform-teams-stuff-done.html) -[https://martinfowler.com/articles/building-infrastructure-platform.html](https://martinfowler.com/articles/building-infrastructure-platform.html) +[Murray (2017, April 27). The Art of Platform Thinking. https://www.thoughtworks.com/insights/blog/platforms/art-platform-thinking](https://www.thoughtworks.com/insights/blog/platforms/art-platform-thinking) -[https://martinfowler.com/articles/platform-teams-stuff-done.html](https://martinfowler.com/articles/platform-teams-stuff-done.html) +[Rouse (2017, March 20). Technical Debt. https://www.techopedia.com/definition/27913/technical-debt](https://www.techopedia.com/definition/27913/technical-debt) -[https://martinfowler.com/articles/talk-about-platforms.html](https://martinfowler.com/articles/talk-about-platforms.html) +[Rowse & Shepherd (2022). Building Infrastructure Platforms.
https://martinfowler.com/articles/building-infrastructure-platform.html](https://martinfowler.com/articles/building-infrastructure-platform.html) -[https://www.techopedia.com/definition/27913/technical-debt](https://www.techopedia.com/definition/27913/technical-debt) +[Skelton & Pais (2019) Team Topologies](https://teamtopologies.com/book) From 9a88cbaaf3c5f56e625297c6ec96f66fc6a112ba Mon Sep 17 00:00:00 2001 From: Ben Shaw Date: Fri, 9 Feb 2024 17:53:45 -0800 Subject: [PATCH 19/20] Update 2024-02-05-evolution-of-mlplatform.md Reduce cruft --- _posts/2024-02-05-evolution-of-mlplatform.md | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/_posts/2024-02-05-evolution-of-mlplatform.md b/_posts/2024-02-05-evolution-of-mlplatform.md index 74655c1..089d8a1 100644 --- a/_posts/2024-02-05-evolution-of-mlplatform.md +++ b/_posts/2024-02-05-evolution-of-mlplatform.md @@ -17,7 +17,7 @@ Technical Debt and development velocity defined ### Development Velocity -Machine learning development velocity refers to the speed and efficiency at which machine learning (ML) projects progress from the initial concept to deployment and maintenance. It encompasses the entire lifecycle of a machine learning project, from data collection and preprocessing to model training, evaluation, deployment, and ongoing optimization. In platform engineering this is often referred to as rate of change. +Machine learning development velocity refers to the speed and efficiency at which machine learning (ML) projects progress from the initial concept to deployment in a production environment. It encompasses the entire lifecycle of a machine learning project, from data collection and preprocessing to model training, evaluation, validation deployment and testing for new models or for re-training, validation and deployment of existing models. 
### Technical Debt @@ -44,7 +44,7 @@ The Evolution Of ML Platforms ### DevOps -- The paradigm shift that led the way -DevOps is a methodology in software development which advocates for teams owning the entire software development lifecycle. This paradigm shift from fragmented teams to end-to-end ownership enhances collaboration and accelerates delivery. Dev ops has become standard practice in modern software development. The adoption of DevOps has been widespread across various industries, with many organizations considering it an essential part of their software development and delivery processes. Some of the principles of DevOps are: +DevOps is a methodology in software development which advocates for teams owning the entire software development lifecycle. This paradigm shift from fragmented teams to end-to-end ownership enhances collaboration and accelerates delivery. Dev ops has become standard practice in modern software development and the adoption of DevOps has been widespread, with many organizations considering it an essential part of their software development and delivery processes. Some of the principles of DevOps are: 1. **Automation** @@ -67,7 +67,7 @@ This shift to DevOps and teams teams owning the entire development lifecycle int > > [Skelton & Pais (2019) Team Topologies](https://teamtopologies.com/book) -As teams grapple with the mental effort required by adopting DevOps of understanding, operating, and maintaining systems, cognitive load becomes a barrier to efficiency. The weight of this additional load can hinder productivity, prompting organizations to seek solutions. +The weight of the additional load introduced in DevOps of teams owning the entire software development lifecycle can hinder productivity, prompting organizations to seek solutions. Platforms emerged as a strategic solution, delicately abstracting unnecessary details of the development lifecycle. 
This abstraction allows engineers to focus on critical tasks, mitigating cognitive load and fostering a more streamlined workflow. @@ -81,17 +81,12 @@ Platforms emerged as a strategic solution, delicately abstracting unnecessary de ### ML Ops -- Reducing technical debt of machine learning -The ability of ML systems to rapidly accumulate technical debt has given rise to the concept of MLOps, a methodology that takes inspiration from and incorporates best practices of the DevOps , tailoring them to address the distinctive challenges and workflows inherent in machine learning in an effort to control technical debt. MLOps applies the established principles of DevOps to machine learning, recognizing that merely a fraction of real-world ML systems comprises the actual ML code. Serving as a crucial bridge between development and the ongoing intricacies of maintaining ML systems. -MLOps is a methodology that provides a collection of concepts and workflows designed to promote efficiency, collaboration, and sustainability of the ML Lifecycle. Correctly applied MLOps can play a pivotal role in ensuring the efficiency, reliability, and scalability of machine learning implementations over time. +The ability of ML systems to rapidly accumulate technical debt has given rise to the concept of MLOps. MLOps is a methodology that takes inspiration from and incorporates best practices of DevOps, tailoring them to address the distinctive challenges inherent in machine learning. MLOps applies the established principles of DevOps to machine learning, recognizing that only a small fraction of a real-world ML system consists of actual ML code. It serves as a crucial bridge between development and the ongoing intricacies of maintaining ML systems. +MLOps is a methodology that provides a collection of concepts and workflows designed to promote efficiency, collaboration, and sustainability of the ML Lifecycle.
Correctly applied, MLOps can play a pivotal role in controlling technical debt and ensuring the efficiency, reliability, and scalability of the machine learning lifecycle over time. -The Rise of Machine Learning Platform +Scribd's ML Platform -- MLOps and Platforms in Action ------------------------------------- - -The paradigm shifts of DevOps, MLOps and Platform Thinking led to the emergence of Machine Learning platforms. ML platforms are the application of MLOps concepts and workflows and provide a curated developer experience for Machine Learning developers throughout the entire ML lifecycle. As the ML team grows, the benefits of a platform amplify, creating a multiplier effect that allows organizations to scale whilst maintaining quality and not getting bogged down with technical debt. - - -### Scribd's ML Platform -- MLOps in Action -At Scribd we have applied concepts from DevOps to our ML Operations in the following ways +At Scribd we have developed a machine learning platform which provides a curated developer experience for machine learning developers and applies the concepts of DevOps in the following ways 1.
**Automation:** From 441d41a44a3a230c51041429ccb969ac95d90ad5 Mon Sep 17 00:00:00 2001 From: ben Date: Thu, 15 Feb 2024 17:18:12 -0800 Subject: [PATCH 20/20] refined scribs platform section --- _posts/2024-02-05-evolution-of-mlplatform.md | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/_posts/2024-02-05-evolution-of-mlplatform.md b/_posts/2024-02-05-evolution-of-mlplatform.md index 089d8a1..37f22c2 100644 --- a/_posts/2024-02-05-evolution-of-mlplatform.md +++ b/_posts/2024-02-05-evolution-of-mlplatform.md @@ -86,31 +86,27 @@ MLOps is a methodology desi Scribd's ML Platform -- MLOps and Platforms in Action ------------------------------------- -At Scribd we have developed a machine learning platform which provides a curated developer experience for machine learning developers and applies the concepts of DevOps in the following ways +At Scribd we have developed a machine learning platform which provides a curated developer experience for machine learning developers. This platform has been built with MLOps in mind, as reflected in its use of common DevOps principles: -1. **Automation:** - +1. **Automation:** * Applying CI/CD strategies to model deployments through the use of Jenkins pipelines which deploy models from the Model Registry to AWS based endpoints. * Automating Model training throug the use of Airflow DAGS and allowing these DAGS to trigger the deployment pipelines to deploy a model once re-training has occured. 2.
**Collaboration and Communication:** - * Feature Store which provides feature discovery and re-use * Model Database which provides model collaboration 6. **Version Control:** - - * Applyied version control to experiments, machine learning models and features + * Applying version control to experiments, machine learning models and features References