#57 2D Geometric Transformations | Part 1 | Modern Computer Vision
Summary
In this lecture from NPTEL-NOC IITM, the focus is on 2D geometric transformations, starting from the similarity transform and moving through shear and affine transformations to the more general projective transformation (planar homography). These transformations matter in computer vision for tasks such as image stitching and data augmentation. The lecture shows how translation, rotation, scaling, and shearing are written as matrices, and why homogeneous coordinates let all of them be expressed as a single matrix multiplication. It also touches briefly on projective geometry and on how a homography depends on the camera's intrinsic parameters and on its extrinsic motion (rotation and translation).
Highlights
Similarity transform combines scaling, rotation, and translation ✨.
Shear transformation visually explained through a relatable car and zebra crossing scenario 🚗🦓.
Affine transformations preserve parallelism but not necessarily angles 🔄.
Homogeneous coordinates add a dimension to simplify transformation equations 📏.
Projective transformation or planar homography is the most general image transformation used for 2D mapping 🚀.
Key Takeaways
Understand the basics of 2D geometric transformations like shear, affine, and projective transformations 🚀.
Learn the importance of homogeneous coordinates in simplifying complex transformations 🔍.
Gain insights into how these transformations are used in computer vision tasks like image stitching and data augmentation 🖼️.
Explore the role of intrinsic and extrinsic camera parameters in transformations and image alignment 📷.
Planar homography (projective transformation) provides the most general form of transformation for image alignment 😎.
Overview
In the video, the speaker begins by recapping the similarity transform, which combines uniform scaling, rotation, and translation. This is followed by shear transformations, illustrated by how a zebra crossing appears from a moving car. The lecture then moves to affine transformations, which preserve parallel lines and ratios of lengths of parallel segments but generally not angles.
The lecture then introduces homogeneous coordinates. Adding one extra coordinate lets translation, which cannot be written as a 2x2 matrix product, be written as a 3x3 matrix acting on the point, so translation, rotation, scaling, and affine maps all take the same matrix form. Dividing the first two coordinates by the third returns to ordinary Cartesian coordinates.
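The round trip between Cartesian and homogeneous coordinates can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the lecture: the helper names `to_homogeneous`, `from_homogeneous`, `apply`, and `translation` are hypothetical, and the numbers are made up.

```python
def to_homogeneous(x, y, alpha=1.0):
    """Lift (x, y) to (alpha*x, alpha*y, alpha); alpha must be non-zero."""
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    return (alpha * x, alpha * y, alpha)


def from_homogeneous(p):
    """Divide the first two coordinates by the third to return to (x, y)."""
    u, v, w = p
    if w == 0:
        raise ValueError("third coordinate is 0: a point at infinity")
    return (u / w, v / w)


def apply(M, p):
    """Multiply a 3x3 matrix (list of rows) by a homogeneous 3-vector."""
    return tuple(sum(M[r][c] * p[c] for c in range(3)) for r in range(3))


def translation(tx, ty):
    """Translation written as a single 3x3 matrix."""
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


# Any non-zero alpha represents the same Cartesian point.
same_point = from_homogeneous(to_homogeneous(2, 3, alpha=5))

# Translating (1, 1) by (3, -2) is now one matrix-vector product.
moved = from_homogeneous(apply(translation(3, -2), to_homogeneous(1, 1)))
```

Here `same_point` recovers the original (2, 3) regardless of the scale used, and `moved` is the translated point obtained without a separate "add the offset" step.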
Towards the end, the focus shifts to projective transformations, or planar homographies, the most general transformations considered for image alignment. The instructor relates the homography to the camera's intrinsic matrix and to its extrinsic motion (rotation and translation), while stressing that applications like panorama stitching only need the homography itself, not the underlying camera motion.
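As a rough sketch of what applying a planar homography looks like, the plain-Python snippet below maps a point through a 3x3 matrix and divides by the third coordinate. The function name `apply_homography` and the entries of `H` are hypothetical, chosen only for illustration. It also shows why the matrix is only defined up to scale: multiplying every entry by the same non-zero factor gives the same mapped point.

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography H (list of rows)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    # The last coordinate is not guaranteed to be 1, so divide by it.
    return (u / w, v / w)


# Hypothetical homography; the last row is not [0, 0, 1], so this is not affine.
H = [
    [1.0, 0.2, 5.0],
    [0.1, 1.0, -3.0],
    [0.001, 0.002, 1.0],
]

# Scaling all nine entries by the same factor leaves the mapping unchanged.
H_scaled = [[2.0 * h for h in row] for row in H]

p = apply_homography(H, 100.0, 50.0)
q = apply_homography(H_scaled, 100.0, 50.0)
```

Because `p` and `q` coincide, only the ratios of the nine entries matter, which is where the count of eight unknowns comes from.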
Chapters
00:00 - 00:30: Introduction and Recap A brief recap of the last class, which covered the similarity transform: a combination of uniform scaling, rotation, and translation.
00:30 - 03:00: Shear Transformation This chapter introduces shear. An x-shear is written with the matrix [1 k; 0 1], giving x_t = x_s + k y_s and y_t = y_s; a y-shear uses [1 0; k 1], giving x_t = x_s and y_t = k x_s + y_s. The effect is motivated by how a zebra crossing appears from a moving car and illustrated by a square that slants along x or y depending on the kind of shear.
03:00 - 05:30: Affine Transformation The affine transformation is presented as the general form [x_t; y_t] = [a b; c d] [x_s; y_s] + [t_x; t_y], of which the similarity transform and shear are special cases. It keeps parallel lines parallel and preserves ratios of lengths of parallel segments, but generally does not preserve angles.
05:30 - 09:30: Homogeneous Coordinates This chapter borrows homogeneous coordinates from projective geometry. A Cartesian point (x, y) becomes (x, y, 1), or more generally (alpha x, alpha y, alpha) with alpha not equal to zero; dividing the first two coordinates by the third returns to Cartesian coordinates. Points at infinity can be written as (x, y, 0).
09:30 - 15:00: Translation and Rotation with Homogeneous Coordinates Translation, rotation, and scaling are rewritten as 3x3 matrices acting on [x_s; y_s; 1]. The chapter notes that these matrices do not commute, so the order of application matters, and writes the affine transformation as a 3x3 matrix with last row [0 0 1] and six unknowns.
15:00 - 17:00: Projective Transformation The projective transformation, or planar homography, is introduced as the most general case: a 3x3 matrix H that can be estimated only up to scale, so it has eight unknowns even though it has nine entries.
17:00 - 21:00: 6D Motion and Camera Motion Analysis The homography is related to a full 6D camera motion: three translations and three rotations, with rotation about the optical axis being in-plane and the others out-of-plane. The instructor stresses that for tasks like stitching, the goal is to relate the images through feature correspondences, not to recover the camera motion.
21:00 - 24:00: Homography and Camera Parameters For a planar scene the homography has the form K' (R + (1/d) t n^T) K^-1, involving the intrinsic camera matrix K, the extrinsic rotation R and translation t, and the plane normal n. Recovering the camera motion from H is described as a harder inverse problem that panorama stitching does not require.
#57 2D Geometric Transformations | Part 1 | Modern Computer Vision Transcription
00:00 - 00:30 [Music] So last class we did what is called the similarity transform, which is a combination of uniform scaling, rotation, and translation.
00:30 - 01:00 Another thing which is of interest is what is called shear. Let us see how this transformation looks. You have [x_t; y_t], and if it is an x-shear you will get [1 k; 0 1] [x_s; y_s]. What this means is that your x_t is going to look like x_s plus
01:00 - 01:30 k times y_s, and your y_t is equal to y_s. This is something that you can see when you are traveling in a car and you look at a zebra crossing. The scene sets itself up in a particular way, in the sense that the normal sets itself up in a particular way: you are traveling in this direction, the normal of the scene is like this, and your motion is along x,
01:30 - 02:00 so you will get something like this. What this means is that if you had a square like this, you can get a shear like that. Or, if it is the other kind of shear, it will look like this, depending upon whether the shear
02:00 - 02:30 is along x or y. These are called the shearing operations. For the other kind of shear, this becomes [1 0] and that becomes [k 1], and this is [x_s; y_s]. So in this case you get [x_t; y_t], where x_t is equal to x_s and y_t becomes
02:30 - 03:00 equal to k times x_s plus y_s. So which way the shear occurs depends upon how we are moving, but typically this is what will happen. This is all a special case of something more general, which is called an affine transformation, and there is something more general than affine, which we will see shortly.
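The two shears described above can be tried on the corners of a unit square. This is a minimal plain-Python sketch; the helper names `x_shear`, `y_shear`, and `apply2`, and the value k = 0.5, are illustrative assumptions rather than anything given in the lecture.

```python
def x_shear(k):
    """[1 k; 0 1]: x_t = x_s + k*y_s, y_t = y_s."""
    return [[1, k], [0, 1]]


def y_shear(k):
    """[1 0; k 1]: x_t = x_s, y_t = k*x_s + y_s."""
    return [[1, 0], [k, 1]]


def apply2(M, p):
    """Multiply a 2x2 matrix (list of rows) by a 2D point."""
    return (M[0][0] * p[0] + M[0][1] * p[1],
            M[1][0] * p[0] + M[1][1] * p[1])


# Corners of a unit square, listed counter-clockwise from the origin.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# The x-shear slides the top edge sideways; the bottom edge stays put.
sheared_x = [apply2(x_shear(0.5), p) for p in square]

# The y-shear slides the right edge upward; the left edge stays put.
sheared_y = [apply2(y_shear(0.5), p) for p in square]
```

In both cases the square becomes a parallelogram: opposite sides stay parallel, but the corners are no longer right angles.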
03:00 - 03:30 Okay, this center should be right there; the center does not move, it just tilts like that. So this is affine, which is more general, and it looks like [x_t; y_t]. By the end of this class you should be able to stitch images; for example, you should be able to take a camera, capture pictures, and stitch them. So [x_t; y_t] looks like [a b; c d]
03:30 - 04:00 [x_s; y_s] plus [t_x; t_y]. As you can see, all the others that you saw previously were special cases of this affine. If you take a, b, c, d to be of a particular kind, you can get the rest, but the general transformation is affine, and what it preserves is that parallel lines remain parallel.
04:00 - 04:30 Angles are generally not preserved, and it preserves ratios of lengths
04:30 - 05:00 of parallel segments. That is if you have a general affine; in some cases you may still end up keeping the angles intact, and that depends upon what a, b, c, d are, because the previous ones are all special cases of this affine. But in general, if you
05:00 - 05:30 have a general affine transformation, then all you can expect is that parallel lines remain parallel. Angles may not be preserved; in the earlier example with a shear, initially the angles were all right angles, but the moment the shear happened the angles changed. And it preserves ratios of lengths of parallel segments. There is something even higher
05:30 - 06:00 than affine, but prior to going to that we will look at what are called homogeneous coordinates. If you see the way I am writing this, in some cases I am able to write it as a transformation on (x_s, y_s), but in some cases I am not. For example, [a b; c d] into [x_s; y_s] plus [t_x; t_y] does not look like a direct transformation on (x_s, y_s); it looks like I have to do something and then add something, so it does not look very elegant.
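The affine properties mentioned above can be checked numerically in the "multiply, then add" form. The sketch below is plain Python with made-up values for a, b, c, d and the translation; the helper names are hypothetical. It transforms a unit square and tests whether opposite edges stay parallel and whether a corner stays at 90 degrees.

```python
import math


def affine(A, t, p):
    """Apply [a b; c d] to p and then add the translation t."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + t[0],
            A[1][0] * p[0] + A[1][1] * p[1] + t[1])


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def cross(u, v):
    """2D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


# Hypothetical affine parameters, chosen only for illustration.
A = [[2.0, 1.0], [0.5, 1.5]]
t = (3.0, -1.0)

corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
a, b, c, d = [affine(A, t, p) for p in corners]

bottom = sub(b, a)  # image of the bottom edge
top = sub(c, d)     # image of the top edge (parallel to bottom before mapping)
left = sub(d, a)    # image of the left edge

# Parallel edges remain parallel after the affine map.
parallel_kept = abs(cross(bottom, top)) < 1e-12

# The corner that was a right angle generally is not one afterwards.
corner_angle = math.degrees(math.atan2(cross(bottom, left), dot(bottom, left)))
```

With these particular numbers the top and bottom edges map to the same direction vector, while the corner angle moves well away from 90 degrees, matching the statement that parallelism is preserved but angles generally are not.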
06:00 - 06:30 There is something called a projective space; there is a whole area out there called projective geometry. The idea is not to go into the details of that, but just to take elements from there which are relevant to us, and one of the things taken from projective geometry is what are called homogeneous coordinates.
06:30 - 07:00 The coordinates that we normally deal with inside a Cartesian space are called heterogeneous coordinates, but in a projective space these are called homogeneous coordinates, and one can go from the homogeneous coordinates to the heterogeneous coordinates. The way to interpret it is: if you have a heterogeneous coordinate, what you normally use in a Cartesian
07:00 - 07:30 space, and it is, let us say, (x, y), then for the homogeneous coordinate you just have to add one more dimension to it: call it (x, y, 1) or, in general, it can be (alpha x, alpha y, alpha) with alpha not equal to zero. We will see a little bit more later
07:30 - 08:00 about what these coordinates are. The last one is alpha, so what this means is that if you scale (divide) the first two coordinates by the last one, you should be able to go to the heterogeneous coordinates. You can think of this as points on a line like this, and then a projection of these points. These are called homogeneous because all of them represent the same point: (x, y, 1), if you think about it, it is
08:00 - 08:30 like (x, y, 1); they are all going to look the same except for a scale factor. It is saying that if you travel on this line and project the points here, with your x and y axes like this, this as your z, and this point as (x, y, 1), then all the points on this line, if you map them onto that plane, would map to the same (x, y, 1). We will see this in more
08:30 - 09:00 detail later, but for the time being just remember that if you are using homogeneous coordinates and you want to go to heterogeneous coordinates, because finally when you do image transformations you have to come back to a Cartesian space, all that you need to do is take the third coordinate, whatever it is, and divide the first two by it; then you will come back to the original space where you are operating. For the time being this much is enough; we will make use of this later.
09:00 - 09:30 There are other things to this as well, such as points at infinity, which in heterogeneous coordinates are not really numbers, whereas in homogeneous coordinates you can write a point at infinity as (x, y, 0). So you can still use finite numbers to say that you have a point at infinity. In terms of geometry, it explains points at infinity and vanishing points as they occur, but right now we will not get into that. For our course,
09:30 - 10:00 for the time being, the purpose is to know how to write these homogeneous coordinates, because if you know this then I can go back and rewrite many of the things that I wrote, but now in a much more elegant way. For example, let me go back to translation, because we need this prior to going into a projective transformation. For translation, if you remember, we simply had [x_t; y_t] equal to [x_s; y_s] plus [t_x; t_y]. Now, instead of that, I am going to write this as [x_t; y_t; 1],
10:00 - 10:30 extending the dimension by one, then a 3x3 matrix, and then my source as [x_s; y_s; 1]. In this case I know that the last one is 1; in general it need not be, but in this case I know. So now how would you write this matrix if you have to represent a translation? It will be [1 0 t_x; 0 1 t_y; 0 0 1]. If you see, this
10:30 - 11:00 is like x_t = x_s + t_x, y_t = y_s + t_y, and 1 = 1. So you see that it is exactly what we had earlier, but now it looks like a direct transformation. Otherwise, if I had asked you to write [x_t; y_t], and you had [x_s; y_s] plus [t_x; t_y], as a 2x2 operation, can you write this
11:00 - 11:30 as a 2x2 matrix multiplying [x_s; y_s]? You cannot. So homogeneous coordinates allow you that flexibility, and the interpretation becomes much easier: you have the source coordinates and you just apply a matrix on them. And if it turns out that the last row is not something like [0 0 1], and in general it need not be, then whatever number you get here, you use that number to scale the top two coordinates, and then you are back to heterogeneous coordinates. You just have to remember that, because
11:30 - 12:00 if you want to go back to the original space, that is like (x, y, 1), and for that you should just scale if it so happens that the last number is not 1. In this case it turns out to be 1, therefore you do not have to do anything; you directly get your x_t, y_t. If that is not happening, then you should scale it. For example, if you had to do a 2D rotation, how would you write it now? You will again write [x_t; y_t; 1], then [x_s; y_s; 1], and then you will have cos theta,
12:00 - 12:30 sin theta, -sin theta, cos theta, no translation here, and 0 0 1. Of course, even as a 2x2 you could do this; you could write it as a 2x2 times [x_s; y_s]. But translations you could not write that way; now you can write everything in this form. That is rotation. Then scaling would be like [x_t; y_t; 1]
12:30 - 13:00 and then [x_s; y_s; 1], and if, let us say, your scale factor is a, you have [a 0 0; 0 a 0; 0 0 1]. That would be scaling. You can also have a combination of these transformations; it depends upon the order in which you apply them. For example, you can have [x_t; y_t; 1], and I can think of these 3x3 matrices, so
13:00 - 13:30 if you have a similarity transform I can have a scaling, a rotation, and a translation. But if you change the order, the entire thing will change; these matrix operations do not commute, so it depends on what you are applying first. For example, if this is your [x_s; y_s; 1], and this is translation, this is rotation, and this is scaling, then it means that you are first applying a translation on (x_s, y_s), followed by a rotation, followed by scaling.
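The point about ordering can be seen by composing the three 3x3 matrices in two different orders. This is a minimal plain-Python sketch under stated assumptions: the helper names are hypothetical, the numbers are made up, and the rotation uses the common counter-clockwise convention [cos -sin; sin cos], which may differ in sign from the matrix written on the board.

```python
import math


def matmul(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def rotation(theta):
    # Assumed convention: counter-clockwise rotation by theta radians.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def scaling(a):
    return [[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, 1.0]]


def apply(M, x, y):
    """Apply M to (x, y, 1) and divide by the last coordinate."""
    u, v, w = (M[r][0] * x + M[r][1] * y + M[r][2] for r in range(3))
    return (u / w, v / w)


T = translation(2.0, 0.0)
R = rotation(math.pi / 2)
S = scaling(3.0)

# The matrix closest to the point acts first.
translate_rotate_scale = matmul(S, matmul(R, T))
scale_rotate_translate = matmul(T, matmul(R, S))

p_trs = apply(translate_rotate_scale, 1.0, 0.0)
p_srt = apply(scale_rotate_translate, 1.0, 0.0)
```

Starting from the same point (1, 0), the first ordering ends near (0, 9) and the second near (2, 3), so swapping the order of the same three operations gives a different overall transformation.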
13:30 - 14:00 And this last coordinate you have to watch out for; it depends upon what is happening on the right-hand side. Now, for example, if you had to write affine, how would you write it? You have [x_t; y_t; 1] and [x_s; y_s; 1], and then you have something like [a b c; d e f; 0 0 1].
14:00 - 14:30 That would be affine; or you can write the c and f as t_x and t_y. What this really means is that in affine you have six unknowns, and all this goes back to how we are able to solve for an image transformation. Why are we doing all this? Simply because if you had to do data augmentation, then you should know how to generate or synthesize an image. If you are trying to align two images, as in a
14:30 - 15:00 panorama where you want to stitch, then you need to know what the transformation across them is. That means if you know a priori that it is affine, you need to solve for only six unknowns, but in general, if you do not know anything about it, it has to be higher than this. We do not know what motion happened, but if somebody gave you prior information that it is only affine, you can restrict yourself to six; if they say that it is only a pure rotation, you can restrict it to one, if it is an in-plane rotation. And the form of R will change depending upon whether
15:00 - 15:30 it is... okay, which is what I am going to come to next, which is a projective transformation. A projective transformation is also called a planar homography. There is also something called a rotational homography, which we will see at the end. But a projective transformation [Music] is the most general.
15:30 - 16:00 This projective transformation is again a 3x3 matrix, but one which can be estimated only up to a scale, because you know these homogeneous coordinates only up to a scale. If you think about it, you will have [x_t; y_t; 1], and you can write this using h; this is the homography matrix, and the standard notation is h11, h12, h13, h21, h22,
16:00 - 16:30 h23, h31, h32, h33, multiplying [x_s; y_s; 1]. In general I multiply the left-hand side by lambda, because there is no guarantee that this last number will be 1; it could come out to be anything, so I will just say lambda. That is also the reason why this matrix has only eight unknowns: the matrix can be found only up to a
16:30 - 17:00 scale factor. So you have eight unknowns even though there are nine entries in that matrix. This is the H matrix; this is a planar homography. Right now the way we are thinking about it is: if I had two images and I wanted to find out the relation between the two in general. And this
17:00 - 17:30 comes from a full 6D motion. What that means is: if I am at point C and I want to go to point C', then I can translate along x, say t_x, I can go t_y, I can go t_z. This is in the camera space. All the coordinates that we have been talking about are in the image space; x_t, y_t
17:30 - 18:00 and all are not in the 3D world; I am talking about what is happening on the image plane. But all of this is happening because the camera is moving in a certain way and the scene is oriented with respect to the camera in a certain way; it is an interplay between the two. What this means is that if I am here and I want to go to C', I can go through three translations and I can have three rotations. Rotations means I could rotate about the x-axis, or
18:00 - 18:30 I could rotate about the y-axis, or I could rotate about the z-axis. The z-axis is what aligns with the optical axis, so your in-plane rotation is typically the one about the optical axis; anything else that you do will cause an out-of-plane rotation. Something like this will cause an out-of-plane rotation, or the other one, this way. Let us say your image plane is fixed. So what this
18:30 - 19:00 means is: what kind of an image will get formed if, for example, your camera undergoes a rotation and you go here? The idea is how you relate the image which you get in this view to the one in that view. The projection, yes, but we need to be able to show what is going on. We know that all of these have a role to play, so how do they actually come into the picture?
19:00 - 19:30 You are doing a 3D rotation, you are doing 3D translations, and this homography helps you relate the two images. Eventually, what do you want? You want to be able to relate images. You may not even be interested in the camera motion; most of the time you may not be interested in what the camera motion is. For example, when you stitch images, do you think you will be very interested in knowing what motion the camera went through? You will only be interested in aligning the images. [unclear]
19:30 - 20:00 No, what I am saying is that the camera motion, like t_x, t_y, t_z in the 3D world and R_x, R_y, R_z, may not be specifically of interest to you. That is what I mean: those six unknowns that are sitting there, you may not be interested in them, and you may not even be interested in the scene normal. For example, if you take a unit normal at
20:00 - 20:30 the scene, you have two unknowns there, and the third thing is how far away the scene is. You may not be interested in any of this. All that I know is that anything like that can happen: the scene could be completely fronto-parallel to the optical axis, or it could be inclined at some angle, and I could be moving arbitrarily, having an arbitrary 6D motion. But at the end of the day I may not be interested in knowing how much I moved by; my interest is only in being able to relate
20:30 - 21:00 the images, because all that I have are feature points. I have an image here and an image here; if I can get my feature correspondences, I should be able to align the images. All of this is due to motion, by the way; everything that we have said is because of camera motion. This is not barrel distortion or any distortion of the lens; this is all pure camera motion.
21:00 - 21:30 So the idea is this. I think what you are trying to ask me is, should I not know how the camera moved? No, the whole idea is that you do not need to know. In fact this homography matrix itself has a form: it comes as K' (R + (1/d) t n^T) K^-1. I am not showing all this, but this is the homography if you have a planar situation. What this means
21:30 - 22:00 is that you have a normal for the plane, which is n; you have a translation of the camera, t, which is a 3D translation; if you take the outer product of these vectors, it becomes a 3x3 matrix. Then there is a rotation, which could be a combination of R_x, R_y, and R_z in some sequence, and then you have what is called an intrinsic camera matrix, which is K. So there are various factors at play here. There is something called the intrinsics, which is the camera matrix, which
22:00 - 22:30 is K. I will talk about it in the next class, how you arrive at the camera matrix and so on, but that is also there; the camera intrinsics also matter. Then there are R and t, the rotation and translation, which are the extrinsics. These are extrinsic parameters because they do not belong to the camera; they just depend on how you move. K is something intrinsic to the camera. It is like asking: where is the center of the camera, is it deviated, are the
22:30 - 23:00 axes of the camera exactly orthogonal, is there a skew; all of that is related to the camera per se. R and t are not related to the camera; that is why you call them extrinsics. But the idea is not at all to go into this. Without going into that, you want to be able to relate images; that is the idea. I do not want to
23:00 - 23:30 get into this, but it is more or less like this: if you want to think about the eight unknowns, it will be three for R, three for t, and two for n, because it is supposed to be a unit normal. So you have two unknowns for the normal, three for the translation, and three for R, if you want to think about it that way. But the whole idea is not even to get into that. It is not that we have got an H and therefore we want to go back and work out R and t; we are not interested in that. That is the inverse problem: there is something
23:30 - 24:00 called finding out the camera motion given the homography matrix, and we do not want to get into that. It is like saying, given this, can you arrive at this? That is a harder problem. Panorama stitching does not involve any of that. Now what this means is the following... I think we spent too much time on that; I was not planning to spend that much time. [Music]
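For reference, the homography relations stated in the lecture can be written out as follows. This is a transcription sketch under assumptions: K and K' are taken to be the intrinsic matrices of the source and target views, d the distance to the scene plane, and n its unit normal. The sign of the t n^T term depends on how the plane and the motion are parameterized, so other texts may write it with a minus sign.

```latex
\lambda \begin{bmatrix} x_t \\ y_t \\ 1 \end{bmatrix}
= \begin{bmatrix}
    h_{11} & h_{12} & h_{13} \\
    h_{21} & h_{22} & h_{23} \\
    h_{31} & h_{32} & h_{33}
  \end{bmatrix}
  \begin{bmatrix} x_s \\ y_s \\ 1 \end{bmatrix},
\qquad
H = K' \left( R + \frac{1}{d}\, t\, n^{\top} \right) K^{-1}

% Unknowns as counted in the lecture:
\underbrace{3}_{R} + \underbrace{3}_{t} + \underbrace{2}_{n} = 8 = 9 - 1
```

The right-hand count matches the nine entries of H minus the one overall scale factor that cannot be determined.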