Jonathan Martin and Shimon Ben-David, WEKA | AI's Second Wave: What's Next for Enterprise AI in 2025
Estimated read time: 1:20
Summary
In a discussion hosted by SiliconANGLE's theCUBE, Jonathan Martin and Shimon Ben-David of WEKA explored the evolution and future of enterprise AI heading into 2025. They highlighted the shift from experimentation to practical implementation, focusing on the use of proprietary enterprise datasets and on preparing for AI at exascale, alongside the rise of small, secure, and sovereign language models. The conversation touched on key AI trends, including the importance of ROI, the challenges of exascale computing, and the sustainability concerns around power and cooling in data centers.
Highlights
AI's shift from experimentation to practical application is accelerating as we approach 2025. 🚀
The focus is on refining AI models for real-world ROI and tapping into proprietary data. 📈
WEKA's insights reveal the growing importance of small, sovereign language models in enterprise AI. 🧠
Power density and performance in data centers are as crucial as ever due to rising data demands. ⚡
The industry is preparing for an exascale future where data handling at massive scales becomes routine. 📈
Key Takeaways
Enterprises are moving from AI exploration to practical integration by 2025. 🤖
Proprietary data sets are key to gaining competitive advantage with AI. 📊
Exascale computing is becoming essential, shifting from niche to widespread necessity. 💾
Power and cooling are major challenges in AI data centers, making sustainability a critical issue. 🌱
AI could transform industries like the internet did in the '90s, emphasizing the importance of staying ahead of the curve. 🌐
Overview
As we approach 2025, enterprises are transitioning from merely exploring AI capabilities to fully integrating AI into their operations. Jonathan Martin and Shimon Ben-David of WEKA highlight this trend during a chat with SiliconANGLE's theCUBE, noting that proprietary datasets are becoming the new goldmine for achieving a competitive edge with AI. This shift is essential for driving meaningful ROI from AI investments and navigating the rapidly changing technological landscape.
The conversation took a deep dive into exascale computing—a burgeoning necessity rather than a niche—as organizations generate and process data at unprecedented levels. Martin and Ben-David emphasized that power performance density within data centers has become a vital consideration. The scalability and efficiency challenges associated with these large-scale environments demand innovative approaches and solutions that are both cost-effective and environmentally sustainable.
Moreover, the discourse touched on the transformative potential of AI, likening its impact to the dawn of the internet era. Companies that master AI integration today are poised to lead their industries tomorrow, much like digital pioneers of the '90s. Both speakers highlighted the dual challenge and opportunity of harnessing AI: it requires navigating immense technical complexities while also capturing groundbreaking benefits.
Jonathan Martin and Shimon Ben-David, WEKA | AI's Second Wave: What's Next for Enterprise AI in 2025 Transcription
00:00 - 00:30 Generative AI has captured the narrative and has catalyzed a lot of rapid shifts in technology and business priorities. Now, the initial wave of enterprise AI centered on experimentation, a lot of governance and training, and concerns about legal and compliance issues, but really the focus has been on training large language models. As 2025 approaches, enterprises are moving from exploration to implementation, and
00:30 - 01:00 they're refining their AI models for real-world applications to drive ROI. Importantly, they're getting ready to tap the potential of their proprietary enterprise data sets. In this conversation, WEKA's Jonathan Martin and Shimon Ben-David unpack the key AI trends poised to reshape enterprises in 2025, from AI inference and the rise of small, secure, and sovereign language models to the vital importance of power performance density and preparing for an exascale future. Now, together we're going
01:00 - 01:30 to dive into the challenges and opportunities that lie ahead for AI at scale. Welcome, gentlemen, to our pregame coverage of Supercomputing 2024. It's great to have you on the program. Great to be here. Great to be here, thank you. All right, let's get into it. So guys, training has been all the rage. It's driven billions of dollars in CapEx, tens of billions, hundreds of billions of dollars really, as the big five LLM vendors rush after what we call the holy grail of artificial general intelligence.
01:30 - 02:00 But for mainstream enterprises, inference is going to be the name of the game, and it's expected to dominate the future of AI workloads. Jonathan, let's start with you. How do you see the current state of gen AI, and what should our audience expect in the coming months and years ahead on this topic of training versus inference? So I think you're right on the money in your opening around 2024 being a big focus for the AI explosion, the first wave of the AI
02:00 - 02:30 explosion really being around large-scale training of models, buildout of GPU clouds and a significant amount of infrastructure, and large enterprises really focusing on governance, on compliance, on risk, on bias, on the policy end of AI. As we move into the back end of '24, we are absolutely seeing that the traditional large enterprise is beginning to move into
02:30 - 03:00 deploying their first AI projects at scale. So we're starting to see financial services, manufacturing, traditional large enterprises who typically in 2024 haven't been deploying at scale beginning to do so. But they're doing it in a very, very different way. They're taking existing pre-trained models, they're fine-tuning those models, they're augmenting those models with other sources of data, and they're beginning to run in production for
03:00 - 03:30 the very first time as we go into 2025. And I think that's the key: it's their sources of data. That data is not going to just seep into the public domain, and that's really where the competitive advantage is. Shimon, I wonder if you could discuss the critical elements of this second wave that Jonathan just talked about. How is this going to affect, in your view, the priorities, and what tech trends should we expect are going to unfold as a result?
03:30 - 04:00 Brilliant. I think, as Jonathan mentioned, the barrier of entry into a gen AI environment is actually lower than what it was for a traditional AI environment. If in the past you had to train your models and you had to have a whole practice of data scientists around it, today it's very easy to just take existing models, pre-trained LLMs, and run them in your environment. Now, saying "easy" is actually a bit of a lie; it's easier than what it was before. Eventually,
04:00 - 04:30 enterprises that implement it are looking for an outcome, and we're seeing that, going through 2025, they will actually look for a better ROI on their investment, because they need to benefit from these LLMs, and they need to do it in a way that actually allows them to get more revenue than not doing it. So talking about the overarching theme, it's: how do I as an enterprise manage to get a better ROI? The way to do it is enterprises now exploring whether to continue
04:30 - 05:00 using, or start using, inferencing services as a service in cloud environments, or maybe build their own enterprise inferencing environment in a GPU cloud or on-prem. If they're actually going to do it, how are they going to do it? Because this is a relatively new field; it's been out there for two years already, but that's not a lot in terms of creating blueprints. So there's actually a lot of exploration still: whether to do it
05:00 - 05:30 with different frameworks, which networking, which GPUs. Actually, we're seeing NVIDIA dominating that market, and we're seeing that trend increasing, with maybe additional players coming into play. And how do you augment the data that your LLMs are familiar with, and how do you benefit from that as well?
05:30 - 06:00 You know, I wanted to ask you guys about retrieval-augmented generation. We did a survey this summer, and I was surprised at what a low percentage of enterprises had actually embraced RAG and put it into deployment. When I talked to some of the folks that took the survey, they said, yeah, you know, it's not as simple as everybody thinks. Do you guys have a point of view on that? On the one hand there's the cost aspect of what you pay, but also just the time it takes and the skills. Do you see that accelerating and changing?
06:00 - 06:30 Definitely. Maybe I'll take a stab at it. So we're actually building these environments, and we're interacting with customers that are doing that at a POC level and at a production level. It's very easy to get a RAG pipeline POC environment; it's much more complicated to take that POC environment and deploy it in a way that is production-ready, secure, scalable, safe, and cost-effective. As I mentioned, there's a lot of know-how regarding the pipeline itself: which
06:30 - 07:00 utilities, which frameworks, and how to stitch all of these environments together. I have to say that NVIDIA is doing a really good job at providing scalable utilities, such as NIMs and NeMo, to simplify these environments. We're actually deploying a few of these environments as well, and we're seeing where it saves in complexity. Having said that, it's still something that's not trivial, it's still something that customers are exploring, there are new frameworks every day, and honestly, this is where customers are still looking for these blueprints on how to do it. What's the best outcome? How can we talk with organizations that have that knowledge and familiarity and can guide us through this?
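To make the POC-versus-production gap concrete, here is a minimal sketch of the retrieval step at the heart of a RAG pipeline. It is purely illustrative and assumes nothing about WEKA's or NVIDIA's tooling: the bag-of-words "embedding", in-memory index, and prompt template are toy stand-ins for a real embedding model, vector database, and LLM endpoint.

```python
# Minimal illustrative RAG pipeline: embed documents, retrieve the most
# relevant ones for a query, and build an augmented prompt for an LLM.
# A production system would swap these toy pieces for a real embedding
# model, a scalable vector database, and a hosted or on-prem LLM.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; stands in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    """In-memory index; production needs persistence, security, and scale."""
    def __init__(self):
        self.docs: list[tuple[str, Counter]] = []

    def add(self, doc: str):
        self.docs.append((doc, embed(doc)))

    def top_k(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

store = VectorStore()
store.add("WEKA customers now exceed an exabyte of storage capacity.")
store.add("Inference workloads are expected to dominate enterprise AI in 2025.")
store.add("Power density is a key constraint in modern data centers.")

question = "What will dominate enterprise AI workloads?"
context = "\n".join(store.top_k(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this augmented prompt would be sent to a pre-trained LLM
```

Everything that makes this production-grade, such as security, scale, persistence, and evaluation, is exactly what the speakers say is still missing from most POCs.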
07:00 - 07:30 Let's talk about exascale. We're talking massive scale; that's what you guys are all about. Just for the audience, exascale is a quintillion calculations per second,
07:30 - 08:00 think 10 to the power of 18. It's just mind-boggling, six orders of magnitude more than a trillion, so it's just incredible. Now, we always talk about how you can't have good AI without good data. GPT-4 was trained on half a petabyte of data, but organizations have far more. JPMorgan, for instance, was reported to have 150 petabytes of data, and I'm sure there are many organizations with a greater amount of data.
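As a quick sanity check on those figures, a few lines of arithmetic confirm the orders of magnitude, using the decimal convention for petabytes:

```python
# Exascale: 10**18 operations per second; a trillion is 10**12.
exa = 10**18
trillion = 10**12
print(exa // trillion)        # 1,000,000 -> six orders of magnitude more

petabyte = 10**15             # bytes, decimal convention
jpmorgan_bytes = 150 * petabyte
gpt4_training_bytes = 0.5 * petabyte  # "half a petabyte", as cited above
print(jpmorgan_bytes / gpt4_training_bytes)  # 300x the cited GPT-4 data set
```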
08:00 - 08:30 So as that data growth curve continues, it's bending sort of exponentially. So exascale computing, Jonathan, it's no longer a niche, it's becoming a necessity, isn't it? What are you seeing in your customer base with respect to exascale, and how real is it beyond those most isolated government and supercomputing labs? Yeah, so I would say this is the year where exascale has started to become very real. At the start of this year, we had no customers that were over
08:30 - 09:00 an exabyte of storage capacity; we'll end this year with five customers over an exabyte, and one customer at almost 10 exabytes. The realm of text-to-image and text-to-video in particular is burgeoning very, very quickly, and that generates absolutely massive volumes of data. Because of that, what you've found is that there are a whole bunch of new phrases and new metrics beginning to appear. You know, last year was very much just
09:00 - 09:30 AI at all costs; this year, increasingly, there is a cost, and that cost tends to be at times quite eye-watering. And so the finance people and the governance people are getting involved, and more and more organizations are beginning to get focused on a new set of metrics to measure the impact of the infrastructure they're deploying. So people are beginning to use phrases and metrics like measuring the number of tokens that they're generating for every dollar that they're spending on infrastructure, beginning to measure the number of tokens they're generating
09:30 - 10:00 for every watt that they're using, beginning to measure the number of tokens for every hour of processing time that they're running.
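As a back-of-the-envelope illustration of those metrics, here is a small sketch; every input number is hypothetical, not a WEKA benchmark:

```python
# Illustrative efficiency metrics for an AI deployment: tokens per dollar,
# tokens per watt-hour, and tokens per hour. All inputs are made-up numbers.
tokens_generated = 5_000_000_000      # tokens produced over the period
infra_cost_usd = 250_000              # infrastructure spend for the period
avg_power_draw_w = 120_000            # average power draw of the cluster
hours = 24 * 30                       # one month of processing time

tokens_per_dollar = tokens_generated / infra_cost_usd
tokens_per_watt_hour = tokens_generated / (avg_power_draw_w * hours)
tokens_per_hour = tokens_generated / hours

print(f"{tokens_per_dollar:,.0f} tokens per dollar")
print(f"{tokens_per_watt_hour:,.1f} tokens per watt-hour")
print(f"{tokens_per_hour:,.0f} tokens per hour")
```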
10:00 - 10:30 And a massive amount of that is getting focused on what's called the power density of the environment: for a rack of equipment, how dense can you make that equipment, how can you maximize the performance that it has at the lowest power consumption? As these environments get bigger, and you know last year big data centers were being measured in tens or maybe hundreds of petabytes, I think next year you're going to see a number of data centers at large corporations being measured in the tens of exabytes. Power density and performance density become incredibly important, and doing simplicity at scale is a brand new challenge that many organizations are taking on for the first time. Not only are vendors
10:30 - 11:00 selling exabytes of storage for the first time; organizations are deploying and implementing exabytes of storage for the first time. So it really is a new panacea for everybody that's involved in this space. Yeah, thank you, Jonathan. Shimon, I wonder if you could address the technical challenges of providing infrastructure in this type of exascale environment. Brilliant. I think there's a way to categorize it by a few levels. So first of all, there's the hardware complexity. Exascale means a
11:00 - 11:30 certain amount of footprint: racks, servers, controllers, cables, network switches, network topologies, heating, cooling. So first of all there's the massive amount of footprint that you need to handle, and that's a big problem. Data center capacity is a real challenge in modern environments, and powering all of these racks is a real challenge, so the more you can shrink these environments,
11:30 - 12:00 obviously, the better. And I would say even that this is almost the easy problem. The next problem is actually the logical problem of how you scale your data environment, your storage environment, to accommodate these exascale capacities. That's something that simply not a lot of storage solutions can do. We've seen customers that had a storage environment, a data environment, that simply could not scale anymore in terms of capacity, just the raw gigabytes, terabytes,
12:00 - 12:30 petabytes that it could handle. And there's another aspect to it that sometimes we fail to see, and that's just the number of objects, files, inodes that a system can handle. Eventually everything is represented as a logical entity, and there are tables in memory, there are data structures that need to be accommodated, there are algorithms that need to process those in an efficient way. So we usually see that data environments can scale in terms of capacity and performance, but then at some point there are diminishing returns,
12:30 - 13:00 where they either cannot scale anymore, or worse, they even decrease in performance and efficiency. So that's a massive challenge, actually, that logical challenge. I'll give an example: with WEKA, what we did is we created the environment in a way that all of the data structures and algorithms are compute-based, so there are almost no tables in memory. We have a very thin, effective, dense environment, and the data structures are built in a way that the environment can scale without increasing memory on the compute nodes and on the storage nodes.
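The principle of computing placement rather than storing it can be illustrated with hash-based placement. This is a generic sketch of the idea, not WEKA's actual data layout or algorithm:

```python
# Illustrative principle: derive an object's location by hashing its ID
# rather than keeping a per-object lookup table in memory. Memory use then
# stays constant as the object count grows. Not WEKA's actual algorithm.
import hashlib

NUM_NODES = 64  # hypothetical storage node count

def node_for(object_id: str, num_nodes: int = NUM_NODES) -> int:
    """Map an object to a node purely by computation; O(1) memory."""
    digest = hashlib.sha256(object_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

# A trillion objects would need a trillion-entry table; hashing needs none.
for oid in ("file-000001", "file-000002", "inode-99999999"):
    print(oid, "-> node", node_for(oid))
```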
13:00 - 13:30 Also, when we look at modern scale, there's the question of how you scale between multiple data centers, because we're seeing that in 2024, and definitely in 2025, organizations are not going to be single-geolocated anymore. Usually I can get compute and I can get GPUs in multiple environments. Maybe, as an organization, I'm already acquiring data in one
13:30 - 14:00 environment, and I'm processing it in another environment, and then I'm inferencing on it in a third environment. So there is all of this notion of how do I scale globally as well. So again, looking at WEKA, with the ability to move data seamlessly at petabyte and exabyte scale between these locations, that facilitates that. So the combination of a very dense footprint, with everything built to scale from day one, which is part of being a modern environment, and with the
14:00 - 14:30 data movement, actually accommodates these future scale challenges. You know, speaking of future scale, Shimon, the way you're describing this, I think about exponential growth, and sometimes it's hard to fathom what the future is going to look like, even in the near term, the next two, three, four, five years. How do you make sure that what your customers deploy today is going to actually accommodate their needs in the next couple of years? I think that's a great question, and
14:30 - 15:00 that's something that we're seeing as customers explore new environments, new frameworks, exploring GenAI. We're seeing that the way they're currently doing things is, as I think I mentioned, a day-one implementation. We're only scratching the surface of the art of the possible, of how their environment will look in the future as they grow, as they scale in terms of compute and capacity and between different data centers. WEKA was built to scale in all of these
15:00 - 15:30 dimensions: capacity, performance, footprint. It was built to be able to accommodate new hardware environments, new CPUs, new GPUs, new accelerators. It was built to accommodate cloud environments. It was built to also shrink and expand as needed, to accommodate better power utilization and better ROI on the environment, if it's a cloud environment, for example. So we're future-proofing the environment by looking at where customers are actually going to be in
15:30 - 16:00 the next two, three, four years and making sure that WEKA can actually accommodate that and more. And maybe I'll finish with this: we also find ourselves guiding customers, because we have a lot of accumulated knowledge from a lot of large-scale AI projects in production, and we're seeing how these large-scale AI projects in production, which are a year or two ahead of the market even, are doing things. We actually take that, and we help customers
16:00 - 16:30 accommodate their new environments using that experience. Yeah, thank you, Shimon. You know, Jonathan, the narrative today, and we're talking a lot about it here, is around ROI and cost, and I like to say that enterprises are kind of hitting singles today, but they're excited to really lean in. You've basically got five giant LLM vendors, two of which are open source, and the economics of that space are brutal. But when you think about getting beyond training, and taking that proprietary
16:30 - 17:00 data that we were talking about within organizations, and starting to drive inference and edge types of use cases from a business perspective, Jonathan, how do you think your customers are going to be leveraging inference in the future? What's the value that can bring to organizations? Maybe you could paint a picture for us. Yeah, I think the value is phenomenal. People ask me all the time, you know, are we done with this AI thing? Is the bubble over? And I keep going back to this: it kind of feels like the internet in 1994, where
17:00 - 17:30 your perception of the internet was that you got on your 28.8 modem and dialed into a bulletin board service, and that was the internet. And so it was very hard to think 20 or 30 years down the line that you could walk around with a bit of plastic in your pocket with some of the world's knowledge on it. And that's kind of where we are with AI. It's incredibly early, and the market
17:30 - 18:00 is evolving incredibly quickly. So last year, obviously, there was a big focus on the monolithic model. This year, I think more progressive customers are really thinking about how to use an ensemble of models. Some of those models are very general, some of them may leverage ChatGPT-like interfaces, but other models are incredibly specific, and it's a combination of large generic models and a slew of smaller, very specialist models that will work in an ensemble, in an orchestra together, to deliver an outcome. And I think for the more progressive customers that we're working with, that is the nut they're trying to crack.
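One minimal way to picture such an ensemble is a router that sends each request either to a small specialist model or to a large general one. The models below are stubs standing in for real LLM endpoints, and the keyword router is a deliberately naive placeholder for a real classifier:

```python
# Illustrative model ensemble: route each query to a small specialist model
# when one matches, otherwise fall back to a large general model.
# The "models" are stub functions standing in for real LLM endpoints.
from typing import Callable

def general_model(prompt: str) -> str:
    return f"[general LLM] answer to: {prompt}"

def contracts_model(prompt: str) -> str:
    return f"[contracts specialist] answer to: {prompt}"

def fraud_model(prompt: str) -> str:
    return f"[fraud-detection specialist] answer to: {prompt}"

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "contract": contracts_model,
    "fraud": fraud_model,
}

def route(prompt: str) -> str:
    """Naive keyword router; a production system would use a classifier."""
    for keyword, model in SPECIALISTS.items():
        if keyword in prompt.lower():
            return model(prompt)
    return general_model(prompt)

print(route("Summarize this contract clause"))
print(route("What is the weather like in Atlanta?"))
```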
18:00 - 18:30 If they can crack that, then I think you're going to find that in just a few years there are going to be two types of companies on the planet. There are going to be companies that are AI-native at the core, that have built a solid data pipeline and are industrializing the process of ingesting large volumes of data and
18:30 - 19:00 transforming that data, through an ensemble of models, into tokens and into insights. And then there are going to be other companies that are not. We saw the impact of the internet on transforming existing industries and creating brand new industries; I think we're going to see exactly the same with AI over the next couple of years. And it could actually be a double whammy. I've been saying that I think you're going to have both organizational, top-down, command-and-control implementations of AI, plus you're going to have citizen AI, where
19:00 - 19:30 personal productivity is going to be driven by people close to the action who are going to learn how to deploy AI. That, to me, is what an AI-native company looks like. So I think it's the internet plus the PC cycle combined; it's going to be an incredible productivity boom. But I want to talk about power and cooling, because that seems to be the big elephant in the room, if you will, or a potential blocker. I mean, these systems are just insanely dense, you talked about that earlier, the miles and miles of cabling. You walk into
19:30 - 20:00 a data center, and you put your hand near these things, and you could fry an egg on them. So what are you hearing from customers in terms of the sustainability challenge? People are moving data centers closer to where the power is. Jonathan, what are you hearing from customers and governments about this concern? And then maybe, Shimon, you could talk about how WEKA is architecting to address this issue.
20:00 - 20:30 Yep, so you're absolutely right on the money with that. Again, the numbers are eye-watering. ChatGPT 3.5 took about $4.6 million of power to train the model; just five months later, GPT-4 took $100 million of power to train the model. Every time you're on one of these new generative sites and you're typing in a prompt to create an image, every time that image gets created, it consumes the same power as a full charge of an
20:30 - 21:00 iPhone. So it's maybe no surprise that people are beginning to wake up to the fact that while AI may solve the sustainability and environmental challenges on the planet, it's probably also equally possible that it's going to be the thing that melts it. Because of that, you're seeing more and more organizations, and we're certainly seeing it in more and more RFPs, with a focus on this: a tokens-per-kilowatt, tokens-per-watt, tokens-per-
21:00 - 21:30 hour focus. What we also see, though, is that there's a bit of an impedance mismatch: the board or the executive team may have a sustainability target to hit, and yet the AI practitioners that are implementing these are kind of running and delivering AI at any cost at the moment. So it's what we'd call the sustainability conundrum, or the AI sustainability conundrum. It's an
21:30 - 22:00 initiative that we kicked off about two years ago here, and as you may or may not know, our Series D was led by Generation, by Al Gore's fund, the ex-Vice President of the United States. So there's a big focus here on how we deliver all of the benefits of AI, but do it in a way that is sustainable and is not going to melt the planet. Thank you. And Shimon, what can you do? Do you optimize through software? What are the tips and tricks you're applying to actually solve this problem technically?
22:00 - 22:30 Oh, loads. So first of all, WEKA is using off-the-shelf components. We are a software environment, so we didn't take any hardware shortcuts. When you look at the WEKA bill of materials, at how you deploy a WEKA environment, it's just servers, and we do everything through software: all of the data distribution, all of the data protection, everything is through software. By that, we actually removed a lot of hardware components that we simply don't need. We don't need JBODs,
22:30 - 23:00 JBOFs, storage-class memory, NVRAMs, NVDIMMs, RAID controllers. We consume much less network cabling and connectivity, no Fibre Channel. So by removing a lot of the unnecessary hardware from this petabyte- and exabyte-scale system, we actually consume significantly less power. More than that, if we're looking at rack performance density, a WEKA system compared at a certain capacity and performance to
23:00 - 23:30 alternatives, that could be a 10-15x difference; it consumes significantly less power, so it's significantly more efficient. I would also say that we have the ability, and we're seeing customers utilize it, as I mentioned, to run on different cloud environments, and we do know that some cloud environments are more power-efficient than others. We even had this idea of a sustainability API, where you can burst into a data center that is now powered by sustainable energy and then burst
23:30 - 24:00 back to another data center when needed. But even more than that, there's the ability to run on a cloud environment and scale according to what you need at that point in time. For example, if I need a certain capacity and performance, I'm utilizing WEKA at a certain size, and then as I need more, I scale that environment to additional instances and servers, and I'm obviously also scaling the compute. And as I scale down the compute, because I've completed the computations, I can scale down the storage environment again. So I'm using power in the most efficient way.
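The "sustainability API" idea of bursting toward whichever data center currently has the cleanest power can be sketched as a simple placement function. The site names and carbon intensities below are hypothetical; this is not a real WEKA API:

```python
# Illustrative sustainability-aware placement: pick the data center with
# the lowest current carbon intensity that still has capacity.
# Site names and numbers are hypothetical, not a real WEKA API.
sites = [
    {"name": "dc-east", "carbon_g_per_kwh": 420, "free_gpus": 512},
    {"name": "dc-west", "carbon_g_per_kwh": 90,  "free_gpus": 128},
    {"name": "dc-eu",   "carbon_g_per_kwh": 35,  "free_gpus": 0},
]

def pick_site(gpus_needed: int) -> str:
    """Choose the greenest site that can satisfy the GPU requirement."""
    candidates = [s for s in sites if s["free_gpus"] >= gpus_needed]
    if not candidates:
        raise RuntimeError("no site has enough free GPUs")
    best = min(candidates, key=lambda s: s["carbon_g_per_kwh"])
    return best["name"]

print(pick_site(64))   # dc-west: dc-eu is greener but has no free capacity
```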
24:00 - 24:30 And I would say that there's also another secret weapon that we're using, where it applies, and that's our converged mode. You can think about it as our zero-footprint storage. Again, where it applies, and we see customers using it, it's actually converging WEKA onto the same GPU servers. If you have a slew of GPU servers, 10, 100, a thousand or more, these GPU servers already contain components that are being used to compute and store data. Today we see a
24:30 - 25:00 lot of customers just moving data around between different storage environments, copying the data, wasting time and effort. Imagine that you could run WEKA on these environments, alongside, in a safe way, these GPU environments, just carving off a few slim slivers of resources from the same GPU servers. Now you have a zero-footprint environment that provides a highly scalable, performant, resilient file system, WEKA, at no additional footprint. We actually see customers converging on hundreds of GPU nodes, getting the performance that they need with no additional footprint. It can't be more efficient than that, even.
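To see why a converged, zero-footprint mode saves power and space, compare a hypothetical dedicated storage tier against reserving a sliver of each GPU server; all numbers below are made up for illustration:

```python
# Hypothetical footprint comparison: dedicated storage servers versus
# running the storage stack converged on existing GPU servers.
gpu_servers = 100
dedicated_storage_servers = 16      # hypothetical separate storage tier
watts_per_storage_server = 800      # hypothetical draw per storage server

# Converged: reserve a small sliver of each GPU server instead.
cores_per_gpu_server = 128
storage_cores_reserved = 8          # the "slim sliver" per server

dedicated_power_w = dedicated_storage_servers * watts_per_storage_server
converged_extra_servers = 0         # zero additional physical footprint
compute_given_up = storage_cores_reserved / cores_per_gpu_server

print(f"dedicated tier: {dedicated_storage_servers} servers, {dedicated_power_w} W extra")
print(f"converged across {gpu_servers} GPU servers: {converged_extra_servers} extra servers, "
      f"{compute_given_up:.1%} of cores reserved for storage on each")
```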
25:00 - 25:30 Awesome. You know, the supercomputing event used to be the exclusive domain, kind of a nichey domain, of the supercomputing-powered labs and governments, but it's become the Super Bowl of AI, and WEKA has a dominant presence there. Jonathan, tell the audience what's happening at
25:30 - 26:00 Supercomputing 2024 in Atlanta. What do you guys have going on? I know there's lots of action at the show. Lots and lots of action. So we have some really very exciting announcements coming up, a number of industry firsts, touching on a lot of the things that we talked about here: inferencing, doing things differently at exascale, helping people transform their power profile, building simplicity into doing these environments at scale. So we've got some really, really exciting announcements, some exciting industry
26:00 - 26:30 firsts. Obviously we have a massive booth presence; you won't be able to miss the purple on the show floor. And then, you know, we also like to socialize when we're at these things, so Wednesday evening, I'll do the gratuitous plug, we're running WekaFest. We've got about 1,200 people coming to that event, Jimmy Eat World is playing, we've got DJs [inaudible], and it's going to be certainly the party of the week at Supercomputing. So if you're interested in coming, check us out at the purple booth in the center of the
26:30 - 27:00 floor. Fantastic. Gentlemen, thanks so much for your time. We'll see you in Atlanta. We're going to be featuring WEKA on theCUBE; we've got tons of action, three days of wall-to-wall coverage. Appreciate your time, and have a great show next week. Thanks so much, safe travels. All right, you too. And thank you for watching. We'll see you next time. This is Dave Vellante for theCUBE. [Music]