Enterprises are betting big on machine learning (ML). According to IDC, 85% of the world’s largest organizations will be using artificial intelligence (AI), including machine learning, natural language processing (NLP) and pattern recognition, by 2026.
And a survey conducted by ESG found, “62% of organizations plan to increase their year-over-year spend on AI, including investments in people, process, and technology.”
But despite all the money flowing into ML projects, most organizations are struggling to get their ML models and applications working on production systems.
The market researchers at Gartner say that “Only half of AI projects make it from pilot into production, and those that do take an average of nine months to do so.”
IDC’s numbers look even worse, with only 31% of enterprises surveyed saying that they have AI functioning in production. In addition, “Of the 31% with AI in production, only one third claim to have reached a mature state of adoption wherein the entire organization benefits from an enterprise-wide AI strategy.” In other words, only about one in ten of the enterprises surveyed has reached that mature state.
And another recent survey has the worst numbers of all, finding that 90% of ML models are not deployed to production.
So what’s the problem? Why are so many enterprises finding it difficult to realize their ML goals?
The problem with ML
Industry watchers suggest that enterprise struggles with ML boil down to two key factors: processes and infrastructure.
On the process side, most ML projects require the integration of multiple teams and systems. An Omdia report notes, “Successful enterprise ML at scale demands the careful orchestration of a complex tapestry made up of people, processes, and platforms, an effort that does not end when an ML solution goes live but instead continues for the life of the solution.”
Many enterprises do not yet have repeatable processes in place to address these needs. As a result, data scientists often spend too much time on IT operations tasks, like figuring out how to allocate computing resources, rather than actually creating and training data science models.
These problems are exacerbated by a lack of hardware designed for ML use cases. Gartner reports, “86% of organizations identified at least one of the following areas as a weak link in their AI infrastructure stack: GPU processing, CPU processing, data storage, networking, resource sharing, or integrated development environments.”
IDC agrees. “IDC research consistently shows that inadequate or lack of purpose-built infrastructure capabilities are often the cause of AI projects failing,” said Peter Rutten, IDC research vice president and global research lead on Performance Intensive Computing Solutions.
The promise of MLOps
So how can enterprises overcome these challenges? A partial solution lies in the adoption of MLOps.
At its simplest, MLOps is defined as applying the principles of the DevOps movement to machine learning. Cnvrg.io, which has built ready-to-use open source ML pipelines that can run on any infrastructure, explains that MLOps “reduces friction and bottlenecks between ML development teams and engineering teams in order to operationalize models.” It adds, “It is a discipline that seeks to systematize the entire ML lifecycle.”
The approach works. Organizations that have implemented MLOps report up to a 10x increase in productivity, 5x faster model training, and up to a 50% increase in compute utilization, according to cnvrg.io research.
It is no wonder, then, that IDC predicts, “By 2024, 60% of enterprises will have operationalized their ML workflows through MLOps/ModelOps capabilities and AI-infused their IT Infrastructure operations through AIOps capabilities.”
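What "systematizing the entire ML lifecycle" means in practice can be shown with a small sketch. The Python below is an illustration only: it is not cnvrg.io's API or any vendor's implementation, and every name in it (Pipeline, prepare_data, train, evaluate, deploy, max_mae) is hypothetical. It assumes a toy dataset and a one-parameter model so that it runs with the standard library alone. The point is the shape: each lifecycle step is a named, repeatable stage, and a model is deployed only if it passes an evaluation gate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A stage takes the shared run context and returns it, possibly updated.
Stage = Callable[[Dict], Dict]


@dataclass
class Pipeline:
    """Hypothetical pipeline: runs stages in order and records what ran."""
    stages: List[Stage] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add(self, stage: Stage) -> "Pipeline":
        self.stages.append(stage)
        return self

    def run(self, context: Dict) -> Dict:
        for stage in self.stages:
            context = stage(context)
            self.log.append(stage.__name__)
            # A stage can stop the run, e.g. when a quality gate fails.
            if context.get("halt"):
                break
        return context


def prepare_data(ctx: Dict) -> Dict:
    # Toy dataset (an assumption for illustration): y = 2x.
    ctx["data"] = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
    return ctx


def train(ctx: Dict) -> Dict:
    # Least-squares slope for a line through the origin.
    num = sum(x * y for x, y in ctx["data"])
    den = sum(x * x for x, _ in ctx["data"])
    ctx["slope"] = num / den
    return ctx


def evaluate(ctx: Dict) -> Dict:
    # Mean absolute error on the toy data; halt if above the threshold.
    errors = [abs(ctx["slope"] * x - y) for x, y in ctx["data"]]
    ctx["mae"] = sum(errors) / len(errors)
    ctx["halt"] = ctx["mae"] > ctx["max_mae"]
    return ctx


def deploy(ctx: Dict) -> Dict:
    # Stand-in for a real deployment step.
    ctx["deployed"] = True
    return ctx


def build_pipeline() -> Pipeline:
    return Pipeline().add(prepare_data).add(train).add(evaluate).add(deploy)


if __name__ == "__main__":
    pipeline = build_pipeline()
    result = pipeline.run({"max_mae": 0.1})
    print(pipeline.log, result.get("deployed", False))
```

Because the stages are explicit and ordered, the same run can be repeated, audited from its log, or pointed at different compute without the data scientist rewriting the workflow by hand, which is the kind of friction MLOps aims to remove.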
Infrastructure designed for MLOps
But MLOps is only part of the answer. Enterprises also need infrastructure designed to meet ML needs and, more specifically, to meet the needs of MLOps. With that in mind, Dell Technologies recently rolled out its Dell Validated Design for AI, built in collaboration with cnvrg.io.
It addresses the need for fast compute with VxRail HCI V670 or PowerEdge R750a servers. The Dell design augments the CPUs with industry-leading NVIDIA A100 or A30 GPUs. For networking, PowerSwitch 25GbE S5248F‑ON or NVIDIA® Spectrum® SN3700 switches, along with an out‑of‑band PowerSwitch S4148T‑ON, provide the speed and bandwidth necessary for MLOps. And PowerScale F600 or H600 provides highly scalable storage. Tying it all together are cnvrg.io’s MLOps stack, VMware Tanzu, and NVIDIA AI Enterprise software.
Dell infrastructure is also part of Intel’s cnvrg.io Metacloud, giving AI developers the flexibility to run, test and deploy AI and ML workloads on mixed hardware within the same AI/ML workflow or pipeline. Metacloud leverages cloud-native technologies such as containers and Kubernetes, which enable developers to quickly select infrastructure, whether on-premises, co-located or in any public cloud, and run the workload there.
With the right processes and infrastructure, enterprises can overcome the challenges inherent in ML at scale and begin to accomplish the goals of their machine learning projects.