Kubernetes performance optimization with Michael Levan and Eli Birger.
Estimated read time: 1:20
Summary
In a lively discussion on Kubernetes optimization, experts Michael Levan and Eli Birger share insights into optimizing Kubernetes for performance and cost-effectiveness. They explore the various challenges and nuances of Kubernetes day-two operations, highlighting the importance of proactive and prescriptive approaches over traditional reactive methods. With detailed discussions around efficient resource management, Michael and Eli emphasize the dual focus required: minimizing waste while ensuring resilience. They illustrate how PerfectScale offers innovative solutions in this complex field, emphasizing the need for continuous optimization aligned with business needs.
Highlights
Kubernetes optimization isn't just about saving money; it's about enhancing performance and resilience
Be proactive, not reactive, in managing your Kubernetes resources
PerfectScale automates and simplifies resource optimization for better efficiency
Resource optimization might sometimes mean spending more to ensure performance
Efficient Kubernetes management helps in reducing carbon footprints too
Key Takeaways
Optimizing Kubernetes is a balance between cost reduction and ensuring system resilience
Reactive approaches like monitoring can fall short; proactive strategies offer better results
PerfectScale's tools allow for continuous and prescriptive optimization by leveraging data effectively
Automated resource management helps prevent out-of-memory errors and under-utilization
Kubernetes resource management can aid in reducing both costs and environmental impact
Overview
Michael Levan and Eli Birger dive into the realm of Kubernetes optimization, discussing the complexities involved in day-two operations, which include ongoing performance and resource optimization. They emphasize the key distinctions between various optimization strategies, including cost, performance, and resource utilization.
The conversation highlights how PerfectScale stands out by providing a proactive, prescriptive approach to resource management. They detail the importance of understanding and appropriately setting resource requests and limits to avoid the pitfalls of over or under-provisioning, which can lead to increased costs or system failures.
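As a minimal sketch of what those knobs look like in a manifest (the names and values here are illustrative, not taken from the episode), requests tell the scheduler what to reserve for a container, while limits act as a safety valve; over-provisioning the requests wastes money, and under-provisioning them risks evictions and out-of-memory kills:

apiVersion: v1
kind: Pod
metadata:
  name: resource-knobs-demo            # hypothetical pod
spec:
  containers:
    - name: app
      image: example.registry/app:1.0  # placeholder image
      resources:
        requests:
          cpu: 500m     # reserved for scheduling; too high wastes nodes, too low risks contention
          memory: 1Gi   # same trade-off for memory
        limits:
          memory: 1Gi   # safety valve: the pod is OOM-killed before it can take down the node
          # a CPU limit is intentionally omitted here; the episode discusses when (not) to set one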
Throughout the session, the speakers stress the importance of not just focusing on cost savings but also on maintaining optimal system performance. They explain how PerfectScaleβs automated tools help achieve this balance, ensuring systems run efficiently without unnecessary spending, and even explore the environmental benefits of optimized resource management.
Chapters
00:00 - 01:00: Introduction and Overview of Kubernetes Performance Optimization The chapter, titled 'Introduction and Overview of Kubernetes Performance Optimization', features Michael Levan and his guest, Eli Birger from PerfectScale. They address topics related to performance optimization, cost optimization, and resource optimization within the specific context of Kubernetes. The chapter sets the stage for a deeper exploration into how Kubernetes environments can be fine-tuned for better efficiency. The speakers also touch upon the complexity and abundance of jargon in the field of Kubernetes optimization.
01:00 - 02:00: Explanation of Day Two Operations and Continuous Optimization The chapter discusses various terminologies related to optimization in operations, such as cost optimization, FinOps, resource optimization, and performance optimization. The conversation highlights these concepts as part of 'Day Two' operations. Day Zero is described as the planning phase, while Day One is the execution phase. 'Day Two' involves continuous optimization and management of operations post-implementation. The chapter aims to clarify where these optimizations fit into operational management and their significance in sustaining and improving performance.
02:00 - 04:00: Discussion on Different Tools and Value Proposition of PerfectScale The chapter discusses the ongoing journey of maintaining and optimizing a system once it goes live. It highlights the importance of continuous maintenance to ensure efficiency as the system grows and changes, while still providing the desired level of performance.
04:00 - 05:00: Importance of Safe Resource Reduction and Proactive Monitoring The chapter discusses the importance of performance and resource optimization through various tools and products available in the market. These range from startups to major companies like AWS and Microsoft, which offer different solutions for cost and resource optimization, such as autoscalers and other optimization tools. It emphasizes the significance of resiliency and meeting service level agreements (SLAs) to ensure customer satisfaction.
05:00 - 06:00: Challenges in Cost and Performance Optimization This chapter discusses the value proposition of new tools from a scalability perspective, addressing common concerns from engineers who may feel overwhelmed by the need to learn yet another tool. The importance of these tools in optimizing cost and performance is emphasized, aiming to give engineers a clear understanding of their benefits despite the initial learning curve.
07:00 - 09:00: Environmental Impact of Resource Waste The chapter discusses the environmental impact of resource waste, particularly in the context of managing Kubernetes systems. It highlights that many organizations focus on cost reduction as a main value proposition because operating Kubernetes can be costly. The suggested approach to address the financial issue is straightforward β reduce unused resources to stop wasting money. However, the chapter points out that while the financial solution seems simple, it becomes more complex when approached from an engineering perspective.
09:00 - 10:00: Demo Setup and PerfectScale Environment Overview The chapter titled 'Demo Setup and PerfectScale Environment Overview' discusses the key aspects of engineering tasks with a focus on safety and efficiency. The main concern is how to reduce resources without jeopardizing safety or functionality. The transcript emphasizes preserving operational integrity while reducing resources safely: the idea is to save resources without causing any damage or disruption, and as Eli quips, you could "save" 80 or 100% by simply turning Kubernetes off, but that is not the goal.
10:00 - 12:00: Cluster Analysis and Resource Utilization Insights In the chapter titled 'Cluster Analysis and Resource Utilization Insights,' the discussion centers around the common reliance on monitoring in Kubernetes environments. The reactive nature of monitoring is highlighted, where issues are only addressed after they have arisen, often necessitating prompt corrective actions. This approach, while effective in identifying problems, tends to induce high-stress scenarios requiring immediate attention and resolution, which many find thrilling.
13:00 - 16:00: Requests, Limits, and Automation in Kubernetes The chapter discusses the progression from reactive to proactive and prescriptive actions in DevOps, emphasizing the importance of continuous monitoring and the use of various data sources to improve scalability. Despite having access to an abundance of data from different monitoring solutions, simply possessing this data is not sufficient to optimize operations. The need for effective data management and analysis to drive automation and set requests and limits in Kubernetes is highlighted.
16:00 - 19:00: Handling Out of Memory Issues and Resource Recommendations This chapter focuses on addressing out of memory issues and providing resource recommendations for maintaining a healthy and efficient environment. The discussion highlights the importance of not just having data, such as logs, but actively using it to improve system performance. The historical reference to having logs for over 40 years underscores the underutilization of data in system management. The chapter emphasizes setting clear goals for system efficiency and the actionable steps needed to achieve these objectives.
19:00 - 22:00: Waste Detection and Recommendations for Efficiency This chapter focuses on the topic of waste detection and recommendations for improving efficiency, particularly in the context of resource and performance optimization. The speaker discusses a common scenario they encounter in consulting: clients wanting to optimize costs by reducing resources when workloads are low. However, simply dropping Kubernetes worker nodes to one is not necessarily the best approach, as there are more nuanced solutions for effective optimization.
22:00 - 24:00: Horizontal Pod and Autoscaler Optimization This chapter titled "Horizontal Pod and Autoscaler Optimization" discusses the importance of cost optimization in application performance. It highlights a common misconception that cost-saving equates to optimal performance, emphasizing that saving money does not benefit if the application performs poorly. The discussion stresses the necessity of following best practices in the overall environment to achieve true optimization.
25:00 - 31:00: Implementing Automation and Alert Management The chapter emphasizes that cost optimization is not solely about reducing expenses; it may also mean spending more to improve performance. It challenges the common misconception that resource optimization is synonymous with saving money, highlighting that improving a cluster or application stack's performance might lead to higher costs.
31:00 - 37:00: Understanding Cost and Waste Trends The chapter 'Understanding Cost and Waste Trends' focuses on performance optimization in clusters, highlighting that achieving optimal performance might require increased spending. The conversation acknowledges the potential need for additional resources to enhance performance effectively. A brief interruption occurs as they navigate a slide presentation.
37:00 - 41:00: Importance of Continuous Analysis in Optimization This chapter introduces its theme with an emphasis on the insights gained from monitoring many hundreds of clusters.
45:00 - 47:00: Upcoming Features and Machine Evaluation The chapter discusses the efficiency of computational resources, linking it to cost implications and environmental impact. It highlights that optimizing the use of memory and CPU not only reduces expenses but also minimizes carbon emissions from energy use. This is important as computer operations contribute to CO2 emissions, affecting global environmental health. Therefore, improving these processes not only benefits financial health but also supports global ecological efforts. The chapter also notes that around 20% of workloads continuously struggle with resource problems such as out-of-memory errors and CPU throttling.
47:00 - 49:00: Final Thoughts and Getting Started with PerfectScale The chapter discusses the persistent challenges faced in operational settings, particularly focusing on issues such as out-of-memory errors, CPU throttling, restarts, and latency. It highlights the burden on operational teams, platform engineering, DevOps, and R&D teams, who spend significant time and effort continuously adjusting Kubernetes settings to respond to these issues effectively.
Kubernetes performance optimization with Michael Levan and Eli Birger. Transcription
00:00 - 00:30 hello good morning good evening good afternoon wherever you are in the world I always open up like that hopefully everybody still enjoys it my name is Michael Levan and I am here with Eli from PerfectScale and today we're going to be diving into all things performance optimization cost optimization resource optimization uh everything and anything kubernetes specific and I feel like in this space, and I'm sure you feel this way too Eli, there's like there's so many uh words and so much verbiage around like
00:30 - 01:00 what this space actually is some people call it cost optimization some people call it FinOps some people call it resource optimization some people call it performance optimization where does it kind of sit for you hello Michael uh great question so uh from our point of view the problem is uh we call it a day two like a crucial part of day two operations Day Zero is when you plan everything day one is when
01:00 - 01:30 you build everything day two is when your system is actually live and serving your customers and this is where the journey of continuous maintenance of your environment starts and continuous optimization is basically making sure over time as your system grows as your system changes making sure that your system remains efficient on one hand but on the other side provides the desired level of
01:30 - 02:00 resiliency and serving your customers with the expected SLAs makes sense makes sense so in this space you know again whatever we want to call it we'll go ahead and call uh performance optimization in this space there are you know a lot of products a lot of tools a lot of vendors uh ranging from startups to you know AWS is doing cost optimization resource optimization Microsoft uh you know Cluster Autoscaler right just like by itself Karpenter all these things uh
02:00 - 02:30 in your in your opinion right what is the value proposition you know from a PerfectScale perspective and the reason why I ask that is because you know I'm sure there are a lot of engineers on here saying oh another tool another thing to learn etc right so we kind of want to give them the why behind the value proposition uh so I'll go ahead and let you handle that yeah absolutely so um there is a lot of different tools
02:30 - 03:00 calling for different value propositions as you correctly stated most of them focusing on the cost reduction because running kubernetes is kind of expensive but uh the actual problem or like if you look at it from the financial standpoint the problem is pretty easy yeah reduce the resources that you don't use and stop wasting money that's pretty simple but when you come to the
03:00 - 03:30 engineering with such a task the first and the most important thing is okay but how do I do it where are the resources that I can safely and the focus is on safely reduce without harming anything without breaking anything and making sure that I'm still running I'm still working focusing on this perspective and by the way uh that that's one part of the value proposition so yeah you can save uh 80 or 100% of
03:30 - 04:00 kubernetes if you turn your kubernetes off but that's not what you're trying to do here that's one part and the second part is being proactive uh most of the companies rely on the um monitoring and monitoring is a very reactive approach once the problem is there and you were smart enough to configure a proper alert then you will know that something went wrong and then you jump on and fix a lot of adrenaline this is why we love being
04:00 - 04:30 DevOps but uh eventually we can do better we can do proactive and even more we can do prescriptive and what PerfectScale brings is this continuous and it's not it's not only the data yeah we have all Prometheus Datadog whatever monitoring solution we have we have a lot of different FinOps solutions they have all the data there is an ocean of data but but having this data doesn't mean that you you have the
04:30 - 05:00 answer that you have clear prescription of what you need to do in order to make your environment healthy and efficient and this is what we set for us as a goal this is what we do yeah so two really really good points you know the the second thing that you talked about yeah having the data it doesn't mean much right uh We've we've had logs for 40 plus years and the majority of the time we don't do much with them so it's it's very true and then you know the first
05:00 - 05:30 part that you mentioned you hit the nail on the head and this is the same thing that I talk about all the time when it comes to Resource optimization when it comes to you know overall performance optimization because I've had clients in the past from from a Consulting perspective that say yeah we want to do cost optimization and I'm like awesome and they're like so what we're going to do is you know when when workloads are are super low when resources aren't being used we're going to drop our kubernetes uh you know worker nodes down to one worker node and like no no no
05:30 - 06:00 that's not cost optimization that is throwing everything that you worked on so far down in a bucket uh that is not a great thing or a good thing to do saving money doesn't help if your application is performing terribly right if your overall environment is not following best practices so you really hit the nail on the head there talking about cost optimization because a lot of people think oh it's going to save money but in reality
06:00 - 06:30 cost optimization isn't about just saving money cost optimization is also about maybe paying more because that means your uh cluster your environment your application stack wasn't performing the way that it was supposed to be performing so once you implement resource optimization guess what you may be paying more and I think that you know a lot of people don't think about it like that from a performance and a resource perspective because they think oh resource optimization equals saving money no sometimes that's not the case
06:30 - 07:00 performance optimization means making your cluster work as good as it possibly can which sometimes means spending more money yes that's uh that's absolutely true and if you will allow me I will share for a second um one slide and this slide is uh sorry no no problem take your time yeah the the you should see a present button at the bottom sometimes it's kind
07:00 - 07:30 of hiding all right so I guess it's visible now so this is what we see across all the clusters and we monitor hundreds of them uh many many hundreds and uh also not us only but Datadog did a similar research for 1.5 billion containers that they analyzed over a long period of time 70% of the resources are wasted this
07:30 - 08:00 is memory and CPU which eventually translates to uh to the bill that you are paying and not only this is also translates to the carbon footprint because eventually you know when when you are running computers you have also CO2 emissions and at the end of the day we're all sharing the same planet optimizing this will also help the planet but at the same exact time there is around 20% of the workloads which are
08:00 - 08:30 continuously struggling with problems like out of memory evictions CPU throttling restarts latency and so on and so forth and a lot of effort of those operational teams platform engineering DevOps teams R&D teams are wasted in continuous tuning of those knobs of kubernetes to react to and
08:30 - 09:00 to fix the problems makes sense yeah and you know exactly what you said about waste of productivity I mean that is that's arguably the biggest thing that engineers are trying to do right now right is optimize not only their environments but themselves uh and that's you know hopefully where platform engineering will come into play you know the whole idea of separation of concerns and reduced cognitive load and all these really cool words that sound great but uh let's actually see if they get
09:00 - 09:30 implemented right so that's that's the obvious goal of platform engineering so awesome now a couple weeks ago and by the way for everybody that's here please feel free to comment ask questions show uh ask us to show stuff all that jazz please feel free we can see all the comments that pop in here a couple weeks ago I did a live stream on just getting PerfectScale up and running right so I had one cluster got the installation going showed some uh you know
09:30 - 10:00 uh menu options showed hey here's what happens if I have standard pods running etc but of course I was just getting it up and running for the first time so the environment was small so what I wanted to do is I wanted to ask Eli to come on and say hey can we actually see a full environment with you know a bunch of workloads running a bunch of environments running etc uh and happily and luckily he said yes so here we are and that's exactly what we're going to be doing uh in this you know for the rest of this live stream is we're going to go over what uh a full I'll call it a full
10:00 - 10:30 environment looks like with a bunch of pods running a bunch of resources running etc because again the one that I put out a couple weeks ago and you can find the recording on YouTube was just the initial setup and the initial installation and getting everything prepared so Eli if it's okay with you what I'd like to do is I'd like to pass it over to you and uh we can go over the demo and we'll do a little player coach thing here so I'll ask a ton of questions sure absolutely
10:30 - 11:00 all right um so let's start by analyzing a cluster we have here a cluster running 10030 different workloads deployments StatefulSets jobs whatever things we're running in the
11:00 - 11:30 cluster we are currently looking and analyzing this cluster over a period of one month what we can learn from the big picture of this cluster so the 99th percentile of utilization of our cluster was only 17 cores and 43 gig of memory this is enough resources to run this cluster in 99% of the cases uh this is measured every 30 seconds but as you know in kubernetes
11:30 - 12:00 you have to define the resources for each and every container that you are running and those resources or the allocations or the requests and limits uh they are usually set by service owners so service owner expected to have this knowledge on how much resources I actually need to run my application and in most of the cases what happens in this situation is okay so I don't really know I'm
12:00 - 12:30 guesstimating and when I'm guesstimating I may think okay my application need let's say 8 gig of memory so most likely I will not put 8 gig I will put 16 I will put 32 why is that because I don't want to be waking up in the middle of the night in order to fix this problem by the way I like the waste aspect I feel like I've looked at a lot of different tools in the space and I think perfect scale is the first one that that
12:30 - 13:00 categorizes it as waste it's just it's a it's a fun word that really pops out at you and you're like wait what so that's awesome yeah so the total combined allocation for this cluster is 63 cores nearly four times more than we would need same goes for the memory right we we see here pretty much three times but this is not the end
13:00 - 13:30 because Cluster Autoscaler also takes uh an additional assumption and Cluster Autoscaler bases the assumptions on the request not on the actual utilization and this is a very very important part to understand when you are using tools like Horizontal Pod Autoscaler and setting the trigger to be 80% this is 80% of the request that you set same goes for the Cluster Autoscaler when Cluster Autoscaler scales
13:30 - 14:00 down the nodes it's uh at 50% this is 50% of allocation this has nothing to do with the actual utilization so eventually we have an additional buffer here and an additional buffer here so we are ending up paying for 72 cores 260 gig of memory while we need only 17 and 43 so our cluster is extremely over provisioned we have like six or seven times more resources than we would need
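To make Eli's point concrete, here is a minimal sketch (the workload name, image, and numbers are hypothetical, not from the demo cluster) of why over-sized requests quietly inflate everything downstream: the HPA's 80% target and the Cluster Autoscaler's scale-down decisions are both computed against the requested values, not against what the pods actually use.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api                              # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      containers:
        - name: app
          image: example.registry/demo-api:1.0   # placeholder image
          resources:
            requests:
              cpu: "2"            # the 80% HPA target below means 1.6 cores per pod,
              memory: 5Gi         # even if real usage is a fraction of this
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80  # percent of the CPU *request*, not of node capacity

The Cluster Autoscaler behaves the same way: a node looks "busy" when the sum of requests on it is high, so inflated requests keep nodes alive that are barely used.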
14:00 - 14:30 The expectation in this situation is that okay so we are spending a little bit more money but at least we would expect the resilience to be fine right we have more than enough resources everywhere in the cluster however we detected all the waste but we also detected 74 places where particular workloads are struggling to get enough resources despite the fact that the entire cluster is heavily over
14:30 - 15:00 provision so let's start looking into the workloads themselves so let me ask you this actually really quick before we move on when we're thinking about requests limits quotas a lot of the time you'll see requests set right because it's it's usually the the safest option so to speak right because you're not limited but at least you're going to get at least that amount of resources from a CPU and memory perspective for your
15:00 - 15:30 particular workloads but to your point there's also a lot of waste there as well because you can request you know x amount of resources and only use 30 to 40% of them and the rest are kind of just sitting there so in your opinion do you think that even utilizing things like requests aren't the best way to go yeah so uh uh for to answer this
15:30 - 16:00 question I will pause for a second the um the demo and uh I would like to present uh one slide just to explain what we are talking here about uh okay so we're uh so this is actually coming from my personal experience when I have built few few of my first kubernetes clusters I have implemented the cluster autoscaler I've
16:00 - 16:30 implemented the uh HPA and what I expected I expected a very good uptime I expected also my cost to fluctuate together with together with the load when I have during the working day high load then I pay more but everything is scaling down at the at the evening and at the weekends however when I evaluated what happened in reality this is something that I saw my uptime
16:30 - 17:00 was not as I expected moreover my cost constantly grew up so I tried to understand where the um worst is happening and eventually the four knobs that kubernetes gives you to address uh the the four main knobs this is where everything starts this is the requests and limits the requests this is what you uh guarantee to have for your
17:00 - 17:30 workload the limits you may think of them in the best way as the safety valves because we are sharing multiple containers on the same machine we want to avoid the situations of a busy neighbor when one particular container for some reason started to consume all the memory of the machine and killed the entire machine and the blast radius then is huge because we lost all the pods of this machine and to understand it even better we need to evaluate what actually
17:30 - 18:00 happens if we set those knobs incorrectly so when we are looking at the request if we over provision the request then the result is we are we're going to waste money we're going to reserve 32 gig out of which we will not use nearly anything if we underprovision them and set them too low then the results will be reliability and resiliency issues latency evictions out of memory etc if we do not
18:00 - 18:30 set them then we break the entire concept of kubernetes orchestration because kubernetes places the pods on the nodes based on the requests and if you don't set the request you are actually saying kubernetes I am not going to consume any resources and then you are arriving to the node and then you're trying to consume resources but there is no resources for you so you
18:30 - 19:00 or some other pod is evicted so this will break the entire concept of orchestration in kubernetes and looking into the limits so when we over provision them uh we are basically like not setting the cutoff switch if my pod can consume the entire memory of the node then there is no cutoff switch it can kill the node if we under provision them then again we
19:00 - 19:30 have throttling we have evictions we have out of memories and if we do not set them for memory it's again no cutoff switch for the CPU it's a little bit different story this is where we actually recommend not to set the CPU limit at all unless you know why you are doing this and what is the exact case why you need it cool so we we got a question that came in here and I think funny enough we will be answering this question with
19:30 - 20:00 the demo I don't know if you have any other comments in terms of you know this question overall or if you know again if it's just going to be answered more or less in the demo yeah so the question is horizontal pod and cluster autoscaler how to fine tune them to optimal performance or is it anything else yep that's it yep yeah uh so we will we will talk about this uh during the demo perfect and after
20:00 - 20:30 the demo okay so uh let me share the screen back uh and let's continue with the demo all right so uh so as we said this cluster is extremely over provisioned on one hand but have resiliency issues on the other hand and this very very typical story we
20:30 - 21:00 see it across all the clusters that are first onboarding to PerfectScale nearly without any exceptions okay so let's start with the resiliency let's start looking at how we can detect the problems and then we will address the waste so all the problems here can be categorized by the severity level or by particular indicators those indicators are dynamic meaning that at the moment that you see
21:00 - 21:30 this OOM indicator it means that at least one workload is actively having OOMs that you need to address right now let's see how does it look like and for that let's uh focus on this Moon Under workload so what do we know about this one uh so this is uh this is a workload which first was deployed on January 9th so it is running for uh for nearly a
21:30 - 22:00 month it has one container and in this container we have 1,186 times out of memory we're running here with a single replica which means that every time we are failing we are not serving customers at all why did this out of memory happen this out of memory happened because someone set the request to be 50 the limit to be 200 and that was the estimation however when we
22:00 - 22:30 look at the utilization we see that the utilization is very very close to this 200 and every time we are touching this 200 the OOM killer of kubernetes simply kills the pod and how do we fix that we fix that by increasing here are the recommendations of how much resources you actually need and as you can see here we have also CPU throttling so this is why we need to increase the CPU request and remove the CPU limit and this will fix the problem with this
22:30 - 23:00 workload all those recommendations are here rendered into the YAML file so you can just copy paste them into the values file of your Helm chart run your CI/CD and you fix the problem however in many situations we are not the developers we are not the owners of the service we are the platform team we are the engineering team we are the FinOps team we are driving optimization so we can go ahead and open a ticket to the relevant developer
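As a rough illustration of what "copy the recommendation into your Helm values" can look like in practice, here is a hypothetical values.yaml fragment; the key names depend entirely on your own chart, and the numbers below are made up to mirror the OOM example above rather than taken from PerfectScale's output.

# Hypothetical values.yaml fragment; key names depend on your own Helm chart.
# Before (commented out): the guesstimated settings behind the OOM kills and throttling.
#   resources:
#     requests: { cpu: 100m, memory: 50Mi }
#     limits:   { cpu: 200m, memory: 200Mi }
#
# After: the shape of change such a recommendation translates to: more memory
# headroom, a higher CPU request, and the CPU limit removed to stop throttling.
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    memory: 256Mi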
23:00 - 23:30 this ticket will contain all needed information about what exactly needs to be done this ticket will also contain a link leading the developer back to the right cluster to the right namespace to the right workload showing the exact same picture and from here the developer can also we have the integration with observability tools like Prometheus and Datadog so if this information is not enough and the developer would like to see how does it look
23:30 - 24:00 in um in terms of behavior of particular pods over time then we have those predefined dashboards that we are supplying and you can definitely see how everything behaves so eventually we found a problem we know how to fix this problem we know the impact of this problem we know also the financial impact of the change and we can fix and improve our cluster any questions so
24:00 - 24:30 far so I would say what is the in terms of the recommended changes right those are of course the things that we can copy and paste into our kubernetes manifest now is there any particular options for automating that process yeah absolutely and I will show it show it uh in a second yes and we did just get a
24:30 - 25:00 question that popped in here uh I will take a look at this Tool uh is it free and does it support Mac OS uh I think that there may be uh some slight confusion around what exactly the tool is uh so just to clarify here if you missed the beginning of the live stream this is a performance optimization tool or resource optimization cost optimization whatever you like to call it in whatever realm for kubernetes so this is something this is a tool that's
25:00 - 25:30 running on one or many kubernetes clusters and it's being managed via a GUI yeah that's correct so this is a SaaS solution you basically deploy an agent into the cluster this is how you do it and Michael showed it in the previous stream but here is the button you clone the repo you give your cluster a name and then you will see the Helm command appearing here to add your cluster starting from this moment uh all the information about the utilization will start flowing to our
25:30 - 26:00 platform and then you connect to our platform and run the optimization sessions okay so let's continue so we detected the resiliency problems and we learned how to fix them let's detect some waste problems and learn how to fix them so we have here a workload named PSC aggregator this is a deployment running in the namespace of apps running for 2016 hours this is the total combined uptime of all the replicas
26:00 - 26:30 that is running then we understand on top of which instances you are running what type of instances what type of reservations is it a spot instance or an on-demand instance and eventually we are coming to the price how much you actually spent on this workload and what we are saying here is out of $65 you are wasting $36 which is more than half let's see why
26:30 - 27:00 and I'm apologizing for slight latency that I have here with my internet today no problem happens to the best of us yeah so here we can see all the revisions all the code changes that happened to this workload and as you can see this workload is being actively developed and changed and why it is important because some changes might be small changes like you move the pixel left or right not really
27:00 - 27:30 affecting anything regarding the uh performance but in some situations you may introduce some heavy query or some internal cache and then you need much more resources so following not only the time but the changes in code is extremely important to understand how your workload behaves so what do we see here we see here somewhere between three to four replicas there is an HPA the HPA is triggered by a custom metric for
27:30 - 28:00 example the amount of requests or the length of the queue or whatever trigger it is and while we're running somewhere between three to four replicas each replica is provisioned with 5 gigabytes of memory and two cores of CPU however when we look at all those replicas across the month what do we learn we learn that we are utilizing somewhere around half a gig of the memory most of the time so this is
28:00 - 28:30 like a forecasting in a sense right or I don't I don't know if perfect scale calls it that but it's kind of like a forecasting or a history of what was utilized for that specific workload yeah so this is the history and this is the forecast okay so the historically the highest Spike across all the replicas was 1.7 gig most of the time we using somewhere below half a gig so why we are
28:30 - 29:00 wasting here around five gig multiply by four replicas it's easily three four machines full blown machines that we can save here and uh so PerfectScale again comes with recommendations of how much resources you actually need to set those knobs again the requests and the limits in order to run your workload and you can uh adjust the recommendations based on the desired SLA level so for
29:00 - 29:30 example if it's a development environment then you can set low and save much more money and expect an SLA around 95% but if it's a production workload and you need four nines then you can go to the higher one and uh and we will recommend much more but we will take into account the seasonality waves of your load we will take into account the overall trend of utilization we will take into account machines that you
29:30 - 30:00 are scaled on etc etc many many different code changes many different parameters that are being taken here into account when we are coming to the recommendations now are these looking at uh Deployments StatefulSets etc or are they looking directly at pods like from the history perspective because like if it's looking directly at pods like what if the pod exits or crashes and then a new one comes up right like does that history get lost or is the history
30:00 - 30:30 portion looking at the full Deployment or the StatefulSet or the DaemonSet versus the direct pods yeah so what happens here this is the aggregation of all the objects that came under the namespace apps PSC aggregator deployment so all the pods that were here and attached to this one this is the combined history of them together okay that said it's uh another important
30:30 - 31:00 feature if you're running ephemeral workloads like Spark jobs like Airflow like Flink like uh runners we have the custom grouping capabilities so instead of looking here at the enormous amount of uh random pods they will be aggregated based on the labels based on the container names based on the machines that they're running on into something meaningful into something where you can change the parameters of the
31:00 - 31:30 particular operator that you are using to create those Emeral workloads got it cool and for just just a a quick pause for everybody that's listening watching please feel free to ask questions I've seen a ton of comments coming already stuff's getting answered really appreciate it please feel free to ask any question that you'd like around performance and cost optimization or around you know perfect scale in general yeah and and addressing your last comment about the automation so
31:30 - 32:00 here is an example of the automation so obviously as we know how to uh recommend changes we can also implement those changes automatically how do you set uh so to enable the automation you basically set a label or annotation on the level of the workload or namespace saying something like perfectscale automation equals true and starting from this moment our automation will address all the needed changes uh
32:00 - 32:30 if we see that a particular workload is struggling to get enough resources we will step in and add more resources uh in most of the cases we will do it in advance way before it actually crashed for the cost reduction on the other side we will do it gradually in multiple iterations making sure that nothing is harmed as the result of this reduction here we can see the example of a workload working under the automation.
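For readers wondering what "set an annotation on the workload or namespace" looks like mechanically, here is an illustrative sketch; the annotation key and value below are hypothetical, paraphrased from the conversation, and should be taken from PerfectScale's documentation rather than from this example.

# Illustrative only: the annotation key below is hypothetical, paraphrased from the talk.
apiVersion: v1
kind: Namespace
metadata:
  name: apps
  annotations:
    perfectscale.io/automation: "true"   # opts workloads in this namespace into automation (hypothetical key)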
32:30 - 33:00 And uh we will see in a second how the automation increased our amount of CPU and also increased the amount of memory keeping this workload live without latency if we wouldn't do that this workload most likely would crash or create latency or create some alert someone would need to wake up in the middle of the night and fix
33:00 - 33:30 this so that's on the active optimization side since we have this benefit of knowing exactly where are your problems and you are managing multiple clusters we have customers with 40 60 and 100 clusters running in their environments it is very very hard to stay on top of all the alerts on top of all the problems and we invested a lot here in reducing the alert
33:30 - 34:00 fatigue why this is important let's say we had the workload with 1,100 out of memories most alerting systems will alert you 1,100 times creating an enormous amount of noise you cannot really operate with that PerfectScale will alert you one time but PerfectScale will keep a very convenient dashboard to show you what you actually need to do let's say I'm responsible for the monitoring namespace across multiple clusters here are
34:00 - 34:30 the set of my active problems they happened first some time ago they are still here and actively happening we notified you by these channels there might be Slack channels there might be Teams channels but we send we send you the notifications however they are still here let's look at this one for example uh
34:30 - 35:00 um so here's a problem we have 353 times out of memory we have a single replica boom problem how to fix that or what is the reason the resources are too low we set request equals limit which is the best practice for production grade workloads but it's not enough it might have been enough back then when it was set but since then someone deployed an additional microservice with a heavy query which affects us now and we are not serving the customers as we
35:00 - 35:30 expected uh so here is a very convenient workbench and I can acknowledge this problem and say hey I'm now taking care of this and it's fine I'm now responsible for it so this actually leads us into a really good question that just came in and this is a question that is probably popping up for a lot of people in their heads right now uh because the whole managed aspect of things right so for example this question is do I really need to worry if we use AKS I assume we pay for the
35:30 - 36:00 managed cluster with a number of node pools we won't request for number of cores or memory great question so unfortunately even though they are called managed Kubernetes services uh they are very much more or less not managed other than the control plane uh so one of the things that are not managed is resources right so memory CPU storage etc that is
36:00 - 36:30 all stuff that is on Demand right so in the cloud the one of the beautiful things about the cloud is that it's on demand and we can use whatever resources we want whenever we want one of the bad things about the cloud is that the resources are on demand and we can use them whenever we want so the the pro and the con is almost the same thing and point being is absolutely 100% I would actually argue that if you're in the cloud you have to be thinking about cost
36:30 - 37:00 and resource optimization more because you have the ability to literally just grab it whenever you want versus on-prem you don't there's a little bit more planning that goes into play there because you only have so many servers uh and there's only so much memory and CPU within those servers but in the cloud you can grab it whenever you want so yes 100% uh you will definitely definitely definitely want to think about cost and resource optimization even when you're using AKS GKE EKS etc Eli I don't know if you
37:00 - 37:30 have anything that you'd like to add there yeah so only one thing I wanted to add managed clusters are managing only the master nodes they have nothing to do with the workloads that you are running on the clusters so managed is okay you have your three master nodes and they are managed for you all the rest remains completely your headache or pain
37:30 - 38:00 or call it as you wish all right so with that let's uh continue the next capability is Trends okay we want to understand what's going on in our environments so here for example we can look at the let's switch to the monthly View and what we will see we will see the cost per cluster per month and uh this is pretty much what you're getting from your cloud provider you're getting
38:00 - 38:30 a bill you're paying for for this cluster if you put labels accordingly and you know to pull this information so you paid 1,130 1,300 dollar for this cluster but the big question is what actually happens here and let's focus on this particular cluster and now let's break down this cluster by workloads and then what we will see the
38:30 - 39:00 the blackish part here is the amount of extra resources that the cluster autoscaler scaled up and we haven't used and all the rest is the workloads so this is my most uh expensive workload it is here and I paid for it uh a lot $400 but eventually I optimized and now I'm paying only $78 for it now let's look uh let's break it down to
39:00 - 39:30 the Daily View and what we will see is how our system behaves day over day as we go deeper into the into the resolutions into the time frames we start seeing interesting things we start seeing anomalies for example our cluster cost here for day was only $38 but then it jumped to 45 which is 30%
39:30 - 40:00 more assume this is Thousands okay from three from 30,000 you started to pay 45,000 so your CFO is coming and asking hey guys what happened in most of the cases you don't have any answer your best answer is developers did something what exactly they did I don't know they did something here we can clearly see that the workload this workload started here it was deployed here and it run for some amount of days and then it reduced and
40:00 - 40:30 then it started again here so we have the Smoking Gun we understand now how the cluster reacted and reacted to what actually not only cost but we can evaluate the waste and we can clearly see how much waste it created we can evaluate the machines that we are using here and uh the machine types okay what are the machine types how they are
40:30 - 41:00 scaling what kind of machines we are using what is the mix we can evaluate okay so we implemented spot instances and we want to understand how effective we are yes so nearly 50% of our cost is already on the spot instances which means we are saving here a lot of money providing the same exact service we are good um so this is in a nutshell the
41:00 - 41:30 reporting capabilities and there's there's a lot of details here and this is actually a perfect time to bring up this statement that came in uh somebody said GKE features a resource recommendation engine yes 100% all the major cloud providers do uh I think even some of the smaller cloud providers do as well but the thing is is that they're not coming even as close to being
41:30 - 42:00 detailed as this tool as plenty of the other tools that are focusing on the resource optimization and cost optimization space uh the cloud provided tools aren't coming even close from a detail perspective yeah so I would like also to reflect on that um when it comes to tools like that there is a main uh tool and Google also relies on this tool this is the VPA the Vertical Pod Autoscaler and
42:00 - 42:30 uh there are two things to uh to think about the first one is you have to understand the impact because eventually you don't have all the time in the world to do the optimization you want to address the most critical issues to understand what is the most critical issue and I will give just one example okay assuming we have a heavily over provisioned workload 700% more resources but this is a job running once
42:30 - 43:00 in a month on a spot instance do we need to invest here no on the other hand we may have something slightly over provisioned 15% but we are running 200 replicas on a very expensive uh node with GPU then the impact of our action will be enormous and this is what is uh unique in what we are providing here we are providing here the ability to slice and dice and find immediately all those needles in the haystack where you
43:00 - 43:30 actually need to invest your time because your time is precious and you want to take the most important actions first yeah and you I'm sorry one of the other things that I'll say too is like those types of tools as well when they give the recommendations it's very much like a one-and-done thing right you know like a tool like that is like a consultant and funny because I'm a consultant too uh where you know you you the consultant
43:30 - 44:00 comes in they say yep do this and you'll be good to go and you say all right cool and then three months later or two months later or one month later you're like oh things are different right but that consultant's not around anymore and it's very similar with those types of tools where you're not getting history you're not getting forecast like you don't know how things are going to change and For Better or For Worse things are always changing right so like let's say you're an Ecommerce site uh and or an e-commerce company what what
44:00 - 44:30 are going to be your your your two most busy days Black Friday and Cyber Monday so year one it's going to look like you're using x amount of resources but year three hopefully if your business is growing you're using even more resources so that recommendation that you got a year ago three years ago six months ago three months ago a month ago may or may not be relevant most likely it won't be again for better or for worse you're either using more resources or you're using way less you're
44:30 - 45:00 never sitting like this ever and that's why resource and cost optimization exists in the first place yeah yeah and the last thing is the algorithms like uh the VPA one of the big problems of the VPA is the decay histogram algorithm which means that something that happened now has more weight than something that happened back then which means if you have the standard seasonality waves where you're consuming more resources
45:00 - 45:30 during the peak hours and you have less and you need less resources during the off peak hours then eventually you will find yourself at the beginning of Monday where you need to scale up with less resources than you would need and then your struggle will start because the system will not react the system will create latency you will have too low resources during the peak period so safety
45:30 - 46:00 first exactly all right uh last one that I will not go too much into the details uh this is a beta feature but will be very very soon um public is the option to evaluate machines so as I said everything in kubernetes is related to the resources and cluster autoscaler
46:00 - 46:30 Karpenter GKE autoscaler or Autopilot everything relies on the resources and let's look into this example of a machine so this machine uh we have two machines running with eight uh cores and 32 gig of memory so 16 and 64 combined and this is the request that they hold this is the request that prevents
46:30 - 47:00 the cluster autoscaler or Karpenter or any other tool from scaling down those machines but they are not utilized and we want to know what exactly is sitting here and let's remove it because if we reduce the resources here it will actually jump into one of those nodes and we will save two machines here and eight machines here and so on and so forth so evaluating the cluster autoscaler evaluating the types of machines that you are using evaluating
47:00 - 47:30 your autoscaling strategies this is something that we are building right now awesome so when will that be or do you have an ETA by chance on like when that will be like Q2 Q3 etc yeah somewhere by end of Q1 oh nice awesome very short period of time sweet perfect sounds good so I want to just ask everybody once more if you have a question or statement
47:30 - 48:00 whatever you want to call it uh if you have any questions please feel free to let us know I'm looking at the comments here as we speak we're popping them up on the screen um or if you'd prefer not to have them popped up on the screen just let me know when you write the comment and we will answer any questions that you may have about PerfectScale cost and resource optimization or you know like we just did before give our general opinion about um you know other implementations that you can kind of go after so one other thing I want to just pop into the comments here do you
48:00 - 48:30 by chance have a link uh for a specific link to a trial that everybody can test out absolutely uh so instead of sharing the link I will just show you you go to perfectscale.io and on the top right corner there is let me share the screen for a sec cool uh yeah here we go so you simply click here on the get started and uh then the only one thing you need to provide is your work
48:30 - 49:00 email address we do not accept Gmail or other things no other commitments are needed just your work email address a valid address and we provide a 30 day free trial no commitment during this 30 day free trial you are also getting a complimentary optimization session where we are going together over your environments and showing you all the findings and helping you to understand what exactly needs to be done
49:00 - 49:30 here perfect sounds good well Eli before we wrap up here is there anything else that you'd like to mention uh any shout outs that you'd like to do anything like that uh yeah so uh thank you very much for your time it was a pleasure talking to you and guys remember the main problem like many people complaining about the complexity of kubernetes but eventually the complexity of kubernetes is why
49:30 - 50:00 kubernetes is so compelling and why it's built to solve complex problems the problems of uh pets we call it the pets versus cattle paradigm and that's the main challenge right you have many many pods and you need to understand what exactly is going on you have multiple environments there and we are helping to close this gap first of all and second yes we are your trusted adviser on how to make everything better you can keep reacting on
50:00 - 50:30 the monitoring but you can also prevent many many different issues awesome very cool well thank you so much for joining me today really appreciate it and for everybody that tuned in thank you all so much hope that you enjoyed it and the recording will be available on YouTube thanks everybody thank you bye-bye