Lecture 45 "VM Migration - Basics Migration strategies"

Summary

In the lecture on VM migration basics and strategies by NPTEL IIT Kharagpur, the discussion revolves around the necessity and methods of moving virtual machines (VMs) from one server to another within a cloud computing environment. The need for VM migration arises due to load balancing, server maintenance, and fault tolerance. There are two primary migration types: cold (non-live) and hot (live). Cold migration involves shutting down the VM before moving, while hot migration allows the VM to run during the transfer. Various strategies and considerations, such as minimizing downtime and migration time, are crucial in ensuring seamless performance in cloud environments. The lecture also touches on pre-copy and post-copy strategies used in live migration.

Highlights

VM migration is essential for load balancing, server maintenance, and improving fault tolerance. 🎯
Cold migration involves stopping VMs before moving, while hot migration allows live state transfers. 💻
Challenges in live migration include managing CPU context and network settings while minimizing downtime. ⏳
Strategies like iterative pre-copy help in efficiently managing live migration tasks. 🔄
Ensuring minimal service disruption during migration is critical for maintaining Service Level Agreements. ⭐

Key Takeaways

VM migration helps manage load balancing and server maintenance efficiently in cloud computing. ⚙️
There are two main types of VM migration: cold (non-live) and hot (live), each with distinct processes. 🔄
Minimizing both downtime and total migration time is crucial for seamless cloud operations. ⏱️
Live migration is challenging but allows VMs to continue functioning during the process, requiring sophisticated techniques. 🚀
Strategies like pre-copy and post-copy are employed to optimize live migration efforts. 🛠️

Overview

Virtual Machine (VM) migration is a cornerstone strategy in cloud computing for optimizing resource allocation and ensuring system reliability. By transferring VMs across different servers, cloud providers can balance loads, perform maintenance without disrupting service, and improve system fault tolerance. The migration can be of two main types: cold and hot, each suited to different scenarios and demands.

Cold migration is a straightforward process where VMs are shut down and transferred, which may lead to some downtime but is easier to manage technically. In contrast, hot migration allows VMs to remain active during transfer, demanding more sophisticated techniques like maintaining CPU state and network configurations seamlessly. This latter method is crucial for time-sensitive applications where downtime must be minimized.

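Techniques like pre-copy and post-copy are integral to optimizing live migrations. Pre-copy transfers memory pages over several rounds while the VM keeps running at the source, re-sending pages that were modified in the meantime, and then briefly suspends the VM to copy the remaining dirty pages and the CPU state. Post-copy instead stops the source VM first, copies the CPU state, restarts the VM at the destination, and pulls memory pages from the source on demand. Both aim for minimal disruption and migration transparency, so that users experience continuous service even during migration.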
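The iterative pre-copy idea, together with the three stopping criteria mentioned in the lecture (round limit, total memory transmitted, dirty pages below a threshold), can be sketched as a small simulation. This is only an illustrative model under stated assumptions: the function name `simulate_precopy`, the constant page-dirtying rate, and all default thresholds are hypothetical and do not come from the lecture or from any hypervisor.

```python
def simulate_precopy(total_pages, bandwidth, dirty_rate,
                     max_rounds=30, max_sent_factor=3, dirty_threshold=100):
    """Toy model of iterative pre-copy (all parameters are integers).

    total_pages      -- memory size of the VM, in pages
    bandwidth        -- pages transferred per second (assumed constant)
    dirty_rate       -- pages the running VM dirties per second (assumed constant)
    The three limits are hypothetical thresholds for the stopping criteria.
    """
    to_send = total_pages   # round 1 copies every page
    sent = 0                # pages sent while the VM is still running
    rounds = 0
    while True:
        rounds += 1
        sent += to_send
        # Pages dirtied while this round was on the wire must be re-sent.
        to_send = min(total_pages, dirty_rate * to_send // bandwidth)
        if to_send < dirty_threshold:
            reason = "dirty pages below threshold"
            break
        if rounds >= max_rounds:
            reason = "round limit reached"
            break
        if sent > max_sent_factor * total_pages:
            reason = "transfer budget exceeded"
            break
    # Stop-and-copy: the VM is suspended while the last dirty pages move.
    return {
        "rounds": rounds,
        "precopy_pages": sent,
        "final_dirty_pages": to_send,
        "downtime_s": to_send / bandwidth,
        "reason": reason,
    }


if __name__ == "__main__":
    print(simulate_precopy(total_pages=100_000, bandwidth=10_000, dirty_rate=1_000))
```

In this model, a VM that dirties pages ten times slower than the link can carry them converges in four rounds and leaves only 10 pages for the stop-and-copy phase. A VM that dirties pages faster than the link can carry them never converges and is cut off by the transfer budget or the round limit.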

Chapters

00:00 - 04:00: Introduction to VM Migration The chapter serves as an introduction to the concept of Virtual Machine (VM) Migration within the broader context of cloud computing paradigms. It sets the stage for an in-depth discussion about the process of migrating virtual machines, highlighting its relevance and application in the cloud computing landscape. The section assures readers the subsequent content will explore different facets and technical considerations involved with VM migration.
04:00 - 08:00: Understanding VM Flavors and Subscription The chapter titled 'Understanding VM Flavors and Subscription' introduces the concept of virtual machines (VMs) as a key service offered by cloud providers, primarily under the Infrastructure as a Service (IaaS) category. It explains how users can configure VMs according to their specific requirements when making a request, emphasizing the customization possibilities in cloud computing.
08:00 - 12:00: Challenges and Necessities of VM Migration The chapter discusses the availability of different virtual machine (VM) configurations and flavors offered by both open-source platforms like OpenStack and commercial providers like Amazon. It highlights the flexibility in subscribing to various categories of VMs for specified durations to suit different user needs and requirements.
12:00 - 16:00: Basics of VM Migration - Live vs Cold This chapter introduces the basics of Virtual Machine (VM) migration, focusing on the differences between live and cold migration. It discusses scenarios where VMs are initialized on several servers that users access. A challenge mentioned is the management of VMs when they're not in use, pointing out the importance of releasing resources tied to VMs that are not currently needed. This situation is elaborated with scenarios involving particular servers where such challenges may arise.
16:00 - 24:00: Detailed Look at Live Migration The chapter titled 'Detailed Look at Live Migration' delves into the operations and management of virtual machines (VMs) in a cloud infrastructure. It discusses how each server can handle a specific capacity of VMs, for instance, 'm' number of machines, and explains the dynamic nature of VMs where they are allocated to servers, perform their tasks, and then release resources once their tasks are completed. This process creates gaps that cloud management tools, like cloud brokers or controller units, fill by reallocating resources efficiently. The discussion highlights the intricacies and mechanisms of live migration in virtual environments.
24:00 - 32:00: Memory Migration Steps in Live Migration This chapter discusses the concept of memory migration during the process of live migration in computing. It highlights a common scenario where servers experience an imbalance in workload, with some being heavily loaded and others less so. To address this, there is a need to migrate virtual machines (VMs) from a heavily loaded server (e.g., server one or server three) to a less loaded server (e.g., server six or server four) to ensure balanced distribution of workload across servers.
32:00 - 39:00: Iterative Pre-Copy and Post-Copy Migration The chapter discusses the concepts of iterative pre-copy and post-copy migration in virtual environments. It addresses the need for migration during system shutdowns or maintenance, highlighting the challenges and necessities of managing virtual machines in such scenarios.
39:00 - 45:00: Conclusion and Further Discussion The chapter discusses the perspectives of data centers and cloud providers in the context of VM migration. It introduces the basics and philosophies behind VM migration, as well as different strategies to facilitate this process.

Lecture 45 "VM Migration - Basics Migration strategies" Transcription

00:00 - 00:30 [Music] Hello. Let us continue our discussion on different aspects of the cloud computing paradigm. Today we will start a discussion on VM migration. So, a virtual machine, as all of you
00:30 - 01:00 know, is one of the major services that a cloud provider provides. It is primarily at the level of IaaS, or Infrastructure as a Service. You can have your own virtual machine, ideally configured based on your requirements when you request it. In reality, what happens is that there are
01:00 - 01:30 different flavors available, that is, different configurations. If you have used any open-source platform like OpenStack, or a commercial one from Amazon or any other source, you will see that different flavors of VMs are available, so there are different categories of virtual machines. You can subscribe to a machine for a period of time, work on that
01:30 - 02:00 machine, and then release it when you no longer require that VM. The challenge is this. There are several servers where the virtual machines are instantiated and where the consumers, or users, of the VMs are using them. And then there may be a situation like this: suppose on a particular server there
02:00 - 02:30 are, say, n virtual machines, and maybe every server has a capacity of, say, m machines; somewhere n1, somewhere n2, somewhere n3 VMs are running. What happens is that a VM is requisitioned when a request comes in, and once that work is over it is released, so a gap is created, and the cloud broker, so called, or the controller unit is going to allocate
02:30 - 03:00 resources again. In this type of scenario, one thing may happen: there may be an imbalance. Some of the servers are heavily loaded and some are less loaded, so ideally you may need to migrate some of the VMs working on, say, server 1 to, say, server 6, because server 1 is more loaded, or from server 3 to server 4 or
03:00 - 03:30 server 14, anything. So there I require a VM migration. Secondly, some of the servers may require a shutdown or may need maintenance. Once there is a need for maintenance, you need to shut down that system, so all the VMs need to be migrated elsewhere. So this migration is a major challenge, or one of the necessities, that we see in this
03:30 - 04:00 case when you think from the point of view of the data centers or the cloud providers. So in this talk, and in the next talk also, we will see the different ways this VM migration can occur and what the basic philosophy behind it is. So we will look at the VM migration basics and the migration strategies.
04:00 - 04:30 So, VM migration. It is a process to move running applications or VMs from one physical server, or host, to another physical server, as we were discussing. It was running on one server and I want to move it to another server. There may be several needs; I will come to that. Once a process is running, this is a big challenge. One way
04:30 - 05:00 is that you shut down everything, move it to the other server, and switch it on again. That is possible, and that may be the easiest way of migrating: shut down the process and re-invoke the process, or the VM, on the other machine. But the challenge is that it may not always be possible. The consumer or the
05:00 - 05:30 user may not allow that, or it may not be feasible, because my process is running for days together. I may be running a simulation or a critical job, and then you say, "No, you shut down." That may not always be feasible. So then you do a live migration: you migrate while everything is on. What we try to migrate is the processor state, storage, memory, and network
05:30 - 06:00 connections, which are moved from one host to another. So the processor state, whatever storage is there, the memory locations, that is, whatever is in the so-called RAM, and then whatever network connectivity is there: if some network-level application is running, its connections need to be re-established there. So the configuration in machine one,
06:00 - 06:30 or what we call the source machine, and the one in the destination machine should match, and that connectivity should be there, because if there are external connections doing some network operation, they will suddenly be in for a spin. So that is important. Why migrate? To distribute VM load efficiently across servers in a cloud. That is one of the major reasons, so-called load balancing. Because of the allocation and deallocation of VMs,
06:30 - 07:00 some of the servers will be more loaded than others, and you may want the load to be distributed more evenly. So that means distributing the VM load efficiently across the servers in a cloud environment. Another reason is system maintenance: I may want to do some maintenance on a server, and in that case I need to migrate those VMs.
07:00 - 07:30 So this is the typical picture. I think we have seen this earlier also, and it appears in several pieces of literature. There is the hardware, then the VMM, or hypervisor, over which different VMs run. If this is another instance, another server, I can migrate one VM to here. So ideally one stack of VMs is running on one server and I want to migrate it. There
07:30 - 08:00 either everything can be migrated, or some of the VMs can be migrated to other servers, and so on. Nevertheless, this is the virtualization we studied previously: I have virtual machines, each of which has some guest OS, in this case Linux, NetBSD, or Windows, and over that different applications are running.
08:00 - 08:30 OK, so let us look more deeply into the need for VM migration. One is definitely load balancing, as we discussed: a fair distribution of the workload among computing resources. There can be several computing resources, and I want a fair distribution of the workload among them. Next, maintenance: for server
08:30 - 09:00 maintenance, VMs can be migrated transparently from one server to another. The server is going down for maintenance, and the customer is not bothered about it; the customer wants the SLA to be respected and the applications to run as smoothly as before. So you migrate at that time, because you need to shut down the server hosting those VMs. Then, managing operational parameters: to reduce operational parameters like power consumption, VMs can be consolidated on a minimal number of
09:00 - 09:30 servers, and underutilized servers can be put in a low-power mode. For example, I have five servers; the other way of looking at load balancing is that five VMs are running on five servers but could have run on one or two servers. So I migrate them there and put the other servers in low-power mode when the demand is low. There may also be a QoS-violation scenario: when the service provider fails
09:30 - 10:00 to meet the desired quality of service, a user can migrate their VM to another service provider. So this is not only from one server to another within a service provider, or from one data center to another; it is inter-service-provider migration, from one provider to another. That can be another situation. Then fault tolerance: in case of failure, VMs can be migrated from one data center to another where they can be executed. So
10:00 - 10:30 there can be a fault-tolerance scenario where we need to do some migration when I expect the system to come down because of faults. So there are various reasons, as you see: load balancing, maintenance, managing operational parameters, the user wanting to migrate somewhere else, and exigencies like a fault-tolerance scenario.
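As a rough illustration of the load-balancing motivation above, the following Python sketch picks one VM to move from the most loaded server to the least loaded one. It is a deliberately naive heuristic under stated assumptions: load is measured only as the number of VMs per server, every server shares the same capacity `m` (the `capacity` argument), and the function name `pick_migration` and the server and VM names are hypothetical, not something the lecture defines.

```python
def pick_migration(servers, capacity):
    """Suggest one migration as (vm, source, destination), or None.

    servers  -- dict mapping a server name to the list of VM names it hosts
    capacity -- maximum number of VMs per server (the lecture's "m")
    """
    src = max(servers, key=lambda s: len(servers[s]))   # most loaded server
    dst = min(servers, key=lambda s: len(servers[s]))   # least loaded server
    gap = len(servers[src]) - len(servers[dst])
    # Moving a VM only helps if the gap is at least 2 and dst has room.
    if gap < 2 or len(servers[dst]) >= capacity:
        return None
    return servers[src][-1], src, dst


if __name__ == "__main__":
    cluster = {
        "server1": ["vm-a", "vm-b", "vm-c", "vm-d"],
        "server3": ["vm-e", "vm-f"],
        "server6": ["vm-g"],
    }
    print(pick_migration(cluster, capacity=4))
```

A real cloud controller would weigh CPU, memory, and network usage as well as the cost of the migration itself; counting VMs is only the simplest stand-in for "load".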
10:30 - 11:00 Now, if we look at migration, broadly there are two types: one is cold, or non-live, migration and the other is hot, or live, migration. These are the broad categories of VM migration. In the case of cold, or non-live, migration, the VM executing on the
11:00 - 11:30 source machine is turned off or suspended during the migration process. So during the migration process the VM is either shut down or suspended, then it is migrated, and then it is switched on again. In other words, it is non-live: it is not done on the fly while things are running. It incurs some downtime for the user, and the provider may have to
11:30 - 12:00 pay some penalty or give some cloud credit as a result, and the VM is migrated to the other server. The other type is hot, or live, migration. In hot, or live, migration, the VM executing on the source machine continues to provide service during the migration process. So it goes on providing service while the whole migration process is going on. In fact, the migrated VM
12:00 - 12:30 is not suspended during the migration process. Since it is not suspended, it is live; it keeps going. This is pretty tricky, and there are definitely a lot of considerations here, but at the same time it gives somewhat seamless performance for the user. The user practically does not know that the migration is going on: they were working, the VM was migrated to the other server, and the VM remains the same VM
12:30 - 13:00 for them. So that is hot, or live, migration. What we see is that there are two broad categories: one is a little simpler, easy to conceptualize and easy to implement, which is cold, or non-live, migration; the other, hot or live migration, is tricky and requires a well-defined process for how it will execute. So, if you look at live migration:
13:00 - 13:30 to migrate an entire VM from one physical host to another, all of the user processes and the kernel state need to be migrated without having to shut down the machine. So the VM is migrated from one physical host to another, and all the user processes that were running in the original VM before migration, and the kernel state, need to be
13:30 - 14:00 preserved and should go on running at the new location without any interruption, and you do not have to shut down the machine as such. Whereas in the case of non-live migration, the VM providing services remains suspended during the entire migration process. Hence, for large VMs, the service downtime might be very high. So there can be a large downtime,
14:00 - 14:30 and there can be a penalty for this downtime in the case of non-live, or cold, migration. For real-time applications, non-live migration can cause severe degradation in service quality, which is not tolerable. If there is a real-time application, like an online broadcast or some online computation going on, and you do a non-live migration, then you have a
14:30 - 15:00 severe degradation problem. There are two main approaches, pre-copy and post-copy, for this live migration process. So, when to migrate? To remove a physical machine from service, as discussed, and to relieve load on congested hosts. Now, VM migration cost-wise: there is the cost associated with
15:00 - 15:30 the communication overhead, since if you are migrating the whole state to the destination there is a communication overhead, and the cost associated with the migration time and the downtime. So this is the cost: how much migration time, how much downtime. It affects the overall performance of the cloud, and at times it may violate the SLA, for which the CSP has to pay
15:30 - 16:00 the penalty. Also, if there is a lot of downtime and migration time, it hits the provider's reputation. So these are the typical factors, which we can also see intuitively. Based on that, the major concerns are these. First, minimize the downtime. Downtime refers to the total amount of time
16:00 - 16:30 the services remain unavailable to the user. So downtime is the overall time for which the services are unavailable to the user, and minimizing it is one of the major challenges. Second, minimize the total migration time. Migration time refers to the total time taken to move a VM from the source host to the destination host. So I have a VM running, and then the full VM is
16:30 - 17:00 migrated to another host, another server, and it runs there faithfully, with nothing left behind: no memory, no state, and so on. The total time this requires, whether you shut down and re-invoke or do a live migration, is the migration time. It can be considered the total time taken for the entire migration process. That is why the lower the migration time, the
17:00 - 17:30 better your performance, because migration is always an overhead on the whole system. Third, migration should not unnecessarily disrupt active services through resource contention, for example for CPU or network bandwidth, with the migrating VM. So there can be resource contention for CPU, network bandwidth, and so on
17:30 - 18:00 at the source and the destination, and that needs to be taken care of. Now, what to migrate? As we said at the start of the discussion, one is the CPU context of the VM and another is the contents of the main memory. These two things need to be migrated because, as shown in the figure, the VM was running on a hypervisor, and it has a
18:00 - 18:30 CPU context and main memory contents for that particular VM, which have to be migrated because the VM will now be running on another hypervisor, and they have to be instantiated there. Then, disk: if there is network-attached storage (NAS) that is accessible from both hosts, that is one way of handling it, since both hosts can connect to that NAS storage on a server. So either NAS that is accessible from both hosts, or the local disk
18:30 - 19:00 is mirrored. Migrating disk data may not be critical: if it is on a NAS server, the disk stays where it is and you just need to reconnect. The connection was from this particular port or server, and now it is from the other one, which may not be very challenging. But still there are nitty-gritty details. Then, network: assume both hosts are on the same LAN. In that case you migrate the IP address and advertise the new
19:00 - 19:30 MAC-address-to-IP mapping via an ARP reply. So it is much easier if both are on the same LAN; otherwise I need to do other things, like proxying. Alternatively, migrate the MAC address and let the switches learn the new MAC location. Network packets are redirected to the new location, with transient losses. So there are challenges, but there are
19:30 - 20:00 ways out, in the sense that within the same service provider they can extend the LAN, and things like that are possible. Nevertheless, the network is an issue to be looked into. Then, I/O devices: virtual I/O devices are easier to migrate, while direct device assignment of physical devices to VMs may be difficult to migrate. So,
20:00 - 20:30 if there are virtual devices, it is easier to migrate, but if physical devices are directly assigned to the VMs, they may be difficult to migrate, and you have to take care of how that connectivity will be maintained, because the physical device now needs to be connected to the other host. Now let us look at the memory migration steps. One is push: the source VM
20:30 - 21:00 continues running while certain pages are pushed across the network to the new destination. To ensure consistency, pages modified during this process must be re-sent. So the source VM is running, and along with that its memory pages are pushed to the destination; whatever gets modified at the source in the meantime has to be retransmitted. So you go on retransmitting
21:00 - 21:30 it. Next, stop-and-copy: the source VM is stopped, pages are copied across to the destination VM, and then the new VM is started. So you push, then stop and copy. And then pull: the new VM executes, and if it accesses a page that has not yet been copied, the page is faulted in, that is, pulled across the network from the source. So that means there can be push, there can be stop and
21:30 - 22:00 copy, or pull: when the destination VM, that is, the migrated VM, is running and does not find some page, the page can be pulled across the network from the source VM. That is what happens when live migration is going on. Pure stop-and-copy is simple, but both downtime and total migration time are proportional to the amount of physical memory allocated to the VM.
22:00 - 22:30 Pure stop-and-copy means you just stop, copy, and then start. The migration time then depends on the amount of physical memory allocated to the VM, and this may lead to an unacceptable outage if the VM is running a live service. If services are running live, this outage may not be acceptable to the consumer or the user.
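The proportionality between memory size and outage in pure stop-and-copy can be made concrete with a little arithmetic. The sketch below assumes an idealized link with no protocol overhead and ignores CPU state and VM restart time; the function name `stop_and_copy_seconds` and the example figures (8 GiB of memory, a 1 Gbit/s link) are illustrative assumptions, not numbers from the lecture.

```python
def stop_and_copy_seconds(memory_bytes, link_bits_per_second):
    """Idealized time to copy all of a suspended VM's memory over one link.

    In pure stop-and-copy the VM is down for this whole transfer, so the
    same figure approximates both the downtime and the total migration time.
    """
    return memory_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    eight_gib = 8 * 2**30      # 8 GiB of VM memory (assumed example size)
    one_gbit = 1_000_000_000   # 1 Gbit/s link (assumed example speed)
    print(f"{stop_and_copy_seconds(eight_gib, one_gbit):.1f} s of downtime")
```

Under these assumptions the VM would be unavailable for roughly 69 seconds, and doubling the memory doubles the outage, which is why the approach is unattractive for VMs running live services.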
22:30 - 23:00 So these need to be taken care of. If some live service is being provided, this becomes a serious challenge. Now, the pre-copy phase. It is carried out over several rounds. In the pre-copy phase, the VM continues to execute at the source while its memory is copied to the destination. So this is the pre-copy phase.
23:00 - 23:30 So it continues to execute at the source while memory is copied to the destination. Then, the pre-copy termination phase: when should I stop? Because the VM is running, some pages keep getting modified. So the stopping criterion for the pre-copy phase takes one of the following forms: the number of rounds exceeds a threshold, so I say that there will be n rounds and then it is over; the total memory transmitted exceeds
23:30 - 24:00 a threshold, that is, how much memory has been transmitted is above a threshold; or the number of dirty pages in the previous round drops below a threshold. I need to retransmit a page only when it has been modified; if there is no modification, I do not need to transmit it to the destination again. So when the number of dirty pages is below a threshold, I can stop. These are some of the stopping criteria. Then, the stop-and-copy phase. In this phase,
24:00 - 24:30 execution of the VM to be migrated is suspended at the source. Then the remaining dirty pages and the state of the CPU are copied to the destination node, where the execution of the VM is resumed. So the VM to be migrated is suspended at the source, the remaining dirty pages and the CPU state are copied to the destination, and then the
24:30 - 25:00 execution of the VM is resumed at the destination. That is the stop-and-copy phase. Now, iterative pre-copy live migration. In iterative pre-copy, the pre-copy phase may be carried out over several rounds, as we discussed. The VM continues to execute at the source while its memory is copied to the destination. The active pages of the VM to
25:00 - 25:30 be migrated are copied iteratively, round by round. During the copying process, some active pages might be dirtied at the source host, that is, changed, as we discussed, and these are re-sent in subsequent rounds. So if a page is updated, that page will be retransmitted. Pre-copy termination we have already discussed: either the
25:30 - 26:00 number of rounds exceeds a threshold (this is the same thing we discussed, in a more organized way), or the total memory transmitted exceeds a threshold, or the number of dirty pages in the previous round drops below a threshold. Those can be the stopping criteria. Stop-and-copy we have already discussed, and then there is a restarting phase: restart the VM at the destination server. So what we see is that, when we try to do a live
26:00 - 26:30 migration, it is a pretty complex process. Also, we are assuming here that the network delay is minimal. If you are migrating far away, from one data center to another, the delay in the overall communication path will be a serious challenge. So these are the things we look at. Now, post-copy live migration. First the stop phase: stop the source VM and copy the
26:30 - 27:00 CPU state to the destination VM. Then restart the destination VM. Then on-demand copy: copy the VM memory according to demand. On demand means, as we discussed under pull, that when the destination finds that some page is missing, it is pulled, or copied on demand, from the source. So in the post-copy strategy, when the VM is restarted, the VM memory is empty. If the VM tries to access a memory page that
27:00 - 27:30 has not been copied, the memory page needs to be brought from the source VM. However, most of the time some memory pages will not be used, so we only need to copy the VM memory according to demand. That means not everything needs to be copied; in reality, you migrate whatever is requested on demand. With this, let us conclude today's discussion. We will continue
27:30 - 28:00 with migration in our subsequent talk. There are some nice references you can have a look at, primarily on VM migration and, more critically, on live migration of VMs. With this, let us stop today's discussion. Thank you.