Summary
In this episode of Computer Vision Decoded, the hosts explore the intricacies of 3D reconstruction using COLMAP, a popular open-source software package. The discussion begins with an introduction to COLMAP and its ability to turn images into 3D models, with expert input from computer vision specialist Jared Heinley. Listeners learn about the key stages of COLMAP's pipeline, including correspondence search, initialization, and incremental reconstruction. With a focus on practical application, the episode offers insights into efficient reconstruction techniques and the implications of using global reconstruction tools such as GLOMAP.
Highlights
Discover how COLMAP helps unravel the 'black box' of 3D reconstruction.
Learn why selecting the right camera model impacts 3D modeling accuracy.
Explore the magic of feature extraction and image matching within COLMAP.
Witness the intricate dance of camera poses and sparse point clouds.
GLOMAP's global reconstruction makes 3D modeling faster and more efficient.
Key Takeaways
COLMAP transforms ordinary photos into 3D models using cutting-edge techniques.
Understanding camera models and matching strategies is key to accurate 3D reconstructions.
Improving 3D modeling is all about experimenting with features and settings in COLMAP.
GLOMAP enhances processing speed by tackling 3D reconstruction globally instead of incrementally.
Trying 3D reconstruction with your own images offers valuable learning experience.
Overview
Today, we unlock the secrets of COLMAP, the open-source marvel that's taking 3D reconstruction by storm! Join us as we explore how this innovative tool can transform basic photos into precise 3D models. Our hosts team up with Jared Heinley, a computer vision guru, to delve into the nitty-gritty of structure from motion and capture the essence of 3D imaging.
Ever wondered how your camera's settings affect its ability to capture the world in 3D? COLMAP's process begins with smart feature extraction and strategic image matching. Understanding these steps ensures that whether you're collecting images with a smartphone or a drone, your reconstruction results are top-notch.
The conversation doesn't stop at COLMAP! Meet GLOMAP, the turbocharged global reconstruction solution that drastically reduces processing time. By taking a holistic approach to image data, GLOMAP offers a faster alternative to traditional methods without compromising accuracy. If you're serious about mastering 3D reconstruction, this episode is your playground!
Chapters
00:00 - 01:00: Introduction In the 'Introduction' chapter, the episode of 'Computer Vision Decoded' sets the stage for exploring structure from motion and 3D reconstruction. The focus is on addressing common questions and base-level tasks related to 3D reconstruction from images. The host is joined by Jared Heinley, a computer vision expert, to guide listeners through the processes involved in using software like COLMAP for these purposes.
09:00 - 21:00: Feature Extraction This chapter explains the process of understanding camera poses and 3D reconstruction techniques. It aims to demystify COLMAP and similar 3D reconstruction tools by breaking down their workflows. The episode's goal is to give listeners an in-depth understanding of these technologies, improving their ability to work with 3D reconstruction software effectively. The discussion opens with an introduction to the episode and an emphasis on learning about COLMAP, though not exclusively.
21:00 - 31:00: Feature Matching and Geometric Verification In this chapter, the discussion revolves around using the open-source software COLMAP for feature matching and geometric verification. The speaker emphasizes the advantages of open-source tools like COLMAP, highlighting that they are free and provide more learning opportunities than some paid third-party software. The chapter begins with Jared preparing to share his screen and presenting images intended for conversion into a 3D model.
31:00 - 40:00: Incremental Reconstruction The chapter 'Incremental Reconstruction' begins with the speaker discussing the importance of understanding the spatial relationship between cameras. They mention that they will be sharing their screen, and although the session is mainly audio, they will try to describe the visuals for those who cannot see them. The speaker introduces the subject by showing a picture they took on a sunny day of a fountain that used to work in front of the Oregon State Capitol. They walked around the fountain to take multiple pictures, indicating a technique of collecting various perspectives for a comprehensive reconstruction.
40:00 - 50:00: CPU vs GPU and GLOMAP Introduction The chapter discusses the process of turning videos into 3D models, highlighting the importance of knowing the positioning of cameras. Despite challenges such as sun glare, it is possible to extract usable images for modeling. The conversation emphasizes the initial steps required for 3D modeling.
50:00 - 57:00: Considerations and Tips The chapter discusses the interchangeability of the terms 'camera' and 'image' and the usage of a single camera, such as a phone or DSLR, to capture images. It highlights the process of extracting frames from videos or taking photos manually to gather multiple images. Additionally, it emphasizes how moving around with the camera results in capturing images from various 3D physical points in space.
57:00 - 62:00: Conclusion and Recommendations In the chapter 'Conclusion and Recommendations', the discussion centers around the natural human ability to perceive depth and spatial relationships from 3D images. The speaker illustrates this by referring to how we, as humans, interpret 3D perspectives, such as when viewing photos where our brain automatically understands the positioning of objects like fountains and trees with varying distances. This chapter emphasizes the innate human capability to process spatial information, which serves as a foundation for the concluding thoughts and suggestions presented.
Understanding 3D Reconstruction with COLMAP Transcription
00:00 - 00:30 Welcome to another episode of Computer Vision Decoded. I'm really excited about this episode because it's going to solve a lot of questions that we get about structure from motion and 3D reconstruction when it comes to COLMAP, and just figuring out how to do some of the basics of 3D reconstruction from imagery. And as always, I have Jared Heinley, our in-house computer vision expert, to walk us through what happens when you run software like COLMAP to
00:30 - 01:00 get camera poses and a 3D reconstruction, and to break down how that all works at a tangible level. So when you walk away from this episode, you should have a better understanding of this black box of COLMAP and other 3D reconstruction software that follows the same workflow. So, as always, Jared, thanks for joining me and welcome to the episode. Yeah, thank you. Let's just get to what we're all here for. Let's learn about COLMAP. And I don't want to say specifically COLMAP, but we're going
01:00 - 01:30 to use it as the basis for this episode so there's something for someone to follow along with. And since it's open source and free, they can download COLMAP and do this on their own PC without having to pay for some third-party software that they won't learn as much through. So Jared, let's start off with this: I'm going to share my screen. I have some images, and we want to turn these images into a 3D model, or at
01:30 - 02:00 least know where these cameras are in relation to each other. I'm going to be doing some screen shares. If you're listening to the audio only, I'll do my best to talk about what we have on the screen. If I start out here, I have a picture of a, well, it was a fountain that used to work in front of the Oregon State Capitol. I took this one sunny day last year. And if I flip through the images, I basically walked around this fountain and got a bunch of
02:00 - 02:30 good angles. In fact, I believe I used a video and extracted a bunch of images, and at some points there are some sun issues, things like that. But it was good enough for me to get a 3D model. So Jared, what's the first step someone would take to turn this into a 3D model, know where the cameras are, things like that? Yeah. Well, you hinted at it right there at the very end: know where the cameras are. And to try to refine some of my language, a lot of times when
02:30 - 03:00 I say camera, I mean image, and I'll use those words interchangeably sometimes. But you said that you walked around with a single camera, your phone or a DSLR or whatever it may be. And from that video, maybe you extracted frames, images, or you took photos yourself. And so you have multiple images taken by a single physical camera, but you were moving around that scene, moving around that object. And so that camera was occupying different physical 3D points in space,
03:00 - 03:30 and these images were captured from those different 3D points, from those different 3D perspectives. As humans, we just do this naturally. As you flipped through those photos and orbited around that fountain, our brains are immediately like, oh yeah, okay, I can see that the ground is a little bit closer, here's this foreground fountain, I see the trees in the background, I see some other structures in the background, and I can immediately see that you were moving to the
03:30 - 04:00 left in sort of this clockwise motion, this thing's near, those other things are far. Our brains are immediately doing all of that 3D reasoning. But in order to have software do this, in order to have a computer generate a 3D reconstruction or 3D representation of what's in these photos, it has to do all of that math, and it doesn't know how to do that reasoning by default. It has to figure out, well, where were you standing when that photo was taken? Where was the camera positioned? How was it angled?
04:00 - 04:30 What was the zoom level of the current lens? It has to figure out how everything was oriented. And that's typically one of the first processes: trying to figure out how things are related to each other, and once we know how they're related, figuring out what 3D geometry describes that relationship. And so it goes through... I don't have it on my screen, but I will pull it up in a second. COLMAP has in its tutorial documentation a good
04:30 - 05:00 diagram. Let me bring that up; it basically shows the workflow that it goes through. So, if I go to the actual website for COLMAP and you look at their tutorial, you can see that. Let's pull that up on my screen as well. While he's pulling that up, I'll just jump in with a little bit of personal history about COLMAP. I did my PhD back at UNC Chapel Hill. I was there from 2010
05:00 - 05:30 to 2015, and while I was there, Johannes Schönberger came to UNC for two years to do his master's. Johannes is the author of COLMAP, but at the time when he was there, COLMAP didn't exist. Johannes had worked on previous structure from motion software; he'd worked with, I believe it was drones, so aerial photography 3D reconstruction, and he had built a pipeline that he had called MAVMAP. I'll probably get this wrong, but I think it was something like mobile aerial vehicle mapper, or
05:30 - 06:00 mobile vehicle mapper. But he was looking to generalize that, to move beyond just aerial photography and do more general-purpose image collections. And so it was this idea of image collections where he came up with COLMAP, collection mapper: I want to take a collection of images and generate a 3D reconstruction from it. So he was working on that while he was at UNC. I may have been one of the first people to actually use COLMAP, in my final PhD project. I
06:00 - 06:30 had processed 100 million images on a single PC. I was doing the feature extraction and matching, but then I needed some way to reconstruct them. Our lab had some other software that could do 3D reconstruction, but Johannes had just written this first version of COLMAP, and so I said, great, let's use that. It was efficient, it was fast, and it did exactly what we needed, and that helped get my paper across the goal line there at the very end. Nice. And since
06:30 - 07:00 then Johannes has gone off to ETH Zürich and now to other companies, and continued to open-source COLMAP, and now it's used all over the world and has won him some awards. And interestingly, GLOMAP, which came out last year, he had his fingers in that as well. Yep. So it's not over. I still see COLMAP being updated on a semi-regular basis as well. So although it came out
07:00 - 07:30 several years ago, it's not static. No, because it is such an important step, the task that COLMAP solves, and similarly GLOMAP: figuring out the 3D pose of images, where pose is position plus orientation, is a key step in so many 3D pipelines. If you want to understand the world in 3D, you've got to figure out where these images were taken from, and that's the key task that COLMAP
07:30 - 08:00 solves for a lot of people. Okay, that makes sense. I also had no idea that COLMAP stood for collection mapper. I mean, it makes sense, but I thought maybe it was a longer acronym. So, okay. Well, I have this diagram up then. If you're watching, you can see it on the screen, but if you're listening, it's basically a workflow of how images go from just a collection of images to a 3D reconstruction. And you get camera
08:00 - 08:30 poses. I'm going to show this in COLMAP on my screen as well. But this diagram just shows the different phases, or steps, that you go through to get from pictures to 3D. It starts out with feature extraction. And if you actually go to the tutorial as well, so if I share just the tutorial page, that diagram makes sense. But the minute you start diving into it, you have a wall of text that most people won't make it through very well unless they are perhaps a computer
08:30 - 09:00 science major, someone like Jared who does this academically or for a job. I look at this and I'm like, okay, some of this makes sense, but a lot of this is beyond me. So we're going to break that down. Okay, starting out with feature extraction: what is that step? We're taking the images, and it sounds like something's happening there with features. Yeah, absolutely. And just to take a step back here too: like you said, this is sort of a workflow, a sequence of steps that goes into generating a reconstruction. So you have those images as input. There's
09:00 - 09:30 a first block of steps that's labeled correspondence search. After that, we have incremental reconstruction, and then finally we end up with a final reconstruction. Within correspondence search, our goal is to figure out the 2D relationships between that collection of images. So we're not even talking about real 3D yet. There might be some hints at 3D in these steps, but we haven't done any reasoning to really understand which photos are where in 3D space.
09:30 - 10:00 It's just about 2D understanding, 2D matching, 2D correspondence between this collection of images. With that in mind, the first step is feature extraction. The goal there is to identify unique landmarks within a photograph. The intent is that if I can identify a unique landmark, a 2D point in one photo, hopefully I can identify that same point in
10:00 - 10:30 another photo and another photo and another photo. And if I can identify and follow, or track, that 2D point between multiple images, now I can use that as a constraint later on when I do the 3D reconstruction. I can say, "Hey, however these images are positioned, that point that they saw, that pixel, should converge to a common 3D position in space." So it's adding a sort of viewing constraint saying each image saw a 2D
10:30 - 11:00 point. I don't know the depth of that point, so all it gives me is a viewing ray: along this direction out into the scene, I saw this unique landmark. Now, I've seen that same landmark in many other photos. I want to identify that and add it as a constraint, because that is most likely a 3D point. So feature extraction is the automatic identification of typically thousands or tens of thousands of these unique landmarks in an image. There are different flavors of feature detection. The one used in
11:00 - 11:30 COLMAP is SIFT, the scale-invariant feature transform. What it does is, I call it a blob-style detector, it looks for a patch of pixels that has high contrast to its background. So it could be something light-colored surrounded by dark, or vice versa, something dark surrounded by light. It's going to look for these at multiple scales, that's why it's scale-invariant, so multiple resolutions. This could be something that's very small or something that's larger in
11:30 - 12:00 the image. Mhm. But once it's found that high-contrast landmark, it will then extract some representation of the appearance of the area around that landmark. So it'll say, "Hey, I found something interesting." Maybe it's a doorknob on a door. It'll say, "Hey, that doorknob is a different color than the background, the rest of the door, and now I want to describe that doorknob."
12:00 - 12:30 I don't want to look just at the doorknob itself; I'm going to look around it and say, here's my doorknob, and then, oh, there's this wood pattern on the door around it. And so it's going to come up with a representation for that. What SIFT actually does, or what the different feature representations are, could be a whole podcast in and of itself. But at a conceptual level, it summarizes what that landmark looks like at a rough level. It says, "Okay, I saw something dark in the middle and then there was this rough pattern around its vicinity."
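To make that concrete, here's a minimal sketch of the same idea using OpenCV's SIFT rather than COLMAP's own implementation: each detected keypoint comes with a 128-dimensional descriptor that summarizes the pattern around it. The filename and the feature cap are assumptions for illustration only.

```python
# Illustrative only: COLMAP ships its own SIFT implementation, but OpenCV's SIFT
# shows the same idea of keypoints plus descriptors.
import cv2

image = cv2.imread("fountain_001.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical image

sift = cv2.SIFT_create(nfeatures=10000)        # cap the number of keypoints (arbitrary for this sketch)
keypoints, descriptors = sift.detectAndCompute(image, None)

print(len(keypoints))        # number of detected landmarks
print(descriptors.shape)     # (num_keypoints, 128): one 128-D appearance vector per landmark
```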
12:30 - 13:00 Mhm. Okay. So then I'm bringing up COLMAP. Unfortunately, I've already run the project because I didn't want us to have to sit and watch things go, and a lot of these things run really fast. SIFT is fast if you can run it on the GPU. I can't necessarily show every feature; I think it maxes out at around 10,000 per image by default. But if you have COLMAP and you want to follow along, the first thing you do is set up a new project, and that part's pretty easy, but then you just go
13:00 - 13:30 to Processing and hit Feature extraction, and you get to pick a camera model. Why is that important? Why is picking a camera model important for this? Well, this ends up being really important later on when we start thinking about the geometry of these images and what kind of camera and lens was used, because these camera models define the geometry of that camera. So right now you have a simple radial camera selected, and
13:30 - 14:00 underneath it, sort of in grayscale, are some parameters listed. It says simple radial has f, cx, cy, and k. Mhm. And so you kind of have to know from the computer vision literature that f is your focal length. cx and cy, that's the principal point. That defines where the center of my image is, or where the optical axis of my lens is and how it's aligned with the image center. A lot of times I just hand-wave and say it's the center of my image. And
14:00 - 14:30 then that k is a single radial distortion term. Lenses often introduce a little bit of curvature distortion, and so we're going to use a single mathematical term, a single polynomial term, to represent the distortion in that lens. This is great for a lot of general cameras. But if you know that your lens has a little bit more distortion, maybe you're
14:30 - 15:00 using a wide-angle camera, a GoPro, or a drone that has a wider field of view and some distortion. If you have a really wide-angle camera, something where you can see a lot of distortion, then you might want one of the fisheye versions; they have simple radial fisheye or the normal fisheye. There's even one at the very bottom of the list called FOV; that's one that's really great for super wide angles. Mhm. A lot of times, for a normal camera like the iPhone in your pocket or your DSLR or your point-and-shoot or whatever
15:00 - 15:30 it ends up being, the simple radial or radial models are nice because they assume that you've got a single focal length, your pixels are square, so I don't need more than one f term, and you model your principal point with cx and cy. The radial model adds an extra lens distortion term, so now instead of just k, we have k1 and k2, two radial distortion terms, so we can do a little better job of estimating the distortion of our lens.
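As a rough sketch of what those parameters mean geometrically, here is the SIMPLE_RADIAL projection written out in plain Python. It follows the model description above; treat the exact distortion convention as an approximation of COLMAP's implementation, and the numbers as made-up values.

```python
# Sketch of the SIMPLE_RADIAL camera model: parameters (f, cx, cy, k).
import numpy as np

def simple_radial_project(point_cam: np.ndarray, f: float, cx: float, cy: float, k: float):
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    x, y, z = point_cam
    u, v = x / z, y / z                  # perspective division onto the normalized image plane
    r2 = u * u + v * v
    factor = 1.0 + k * r2                # single radial distortion term
    return np.array([f * u * factor + cx, f * v * factor + cy])

# A RADIAL model would instead use factor = 1 + k1*r2 + k2*r2**2 (two distortion terms).
print(simple_radial_project(np.array([0.2, -0.1, 2.0]), f=1600.0, cx=960.0, cy=540.0, k=0.05))
```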
15:30 - 16:00 Okay. And COLMAP asks for this right away because part of that project creation process is creating a database, a collection of data stored on disk. This feature extraction step is when COLMAP goes through all of your images, extracts features, but also creates those image entries in the database, and so it needs to know what style of camera is going to be
16:00 - 16:30 associated with each image. Mhm. And we could go deep on a bunch of buttons here. If you just run this with the defaults and simple radial, using a smartphone or something, you'll be okay. But there are options: for instance, by default it's thinking I have all these different cameras, and there's an option to say it's always one camera, so it assumes every image came from the same camera, which is great. Yeah, that's good. There are options for masks. I'll just bring up a mask on my screen. This is me
16:30 - 17:00 masked. This is a mask, not necessarily the exact mask you would use, but basically there's a picture of me, and I have the mask as a separate file, and if you combine the two, you end up with me masked out. That's a way to say you don't want me in the result. You can mask things out; specifically, if you want just an object to be reconstructed, you can mask out the background, things like that we could go deep into. But there are all these options to help get the right key points.
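If you want to try masking yourself, here's a quick sketch of building a binary mask with NumPy and Pillow. The naming convention and the zero-means-ignore rule follow my reading of the COLMAP documentation, so double-check them for your version; the filenames are placeholders.

```python
# Sketch of a COLMAP-style binary mask: the mask lives under ImageReader.mask_path,
# is named <image_name>.png (e.g. IMG_0001.jpg.png), and features are skipped
# wherever the mask pixel is 0 (black). Verify this convention for your COLMAP version.
import os
import numpy as np
from PIL import Image

image = Image.open("images/IMG_0001.jpg")                               # hypothetical input image
mask = np.full((image.height, image.width), 255, dtype=np.uint8)        # keep everything by default

mask[:, : image.width // 2] = 0                                         # example: ignore the left half

os.makedirs("masks", exist_ok=True)
Image.fromarray(mask).save("masks/IMG_0001.jpg.png")
# Then point feature extraction at it, e.g.:
#   colmap feature_extractor ... --ImageReader.mask_path masks/
```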
17:00 - 17:30 So, if I go to this database, I ran this already, and I have this database manager where I can jump into things. I'll pick one of these and just hit "show image." It's going to bring up the image, and I can make it nice and big on my screen. What we're seeing now is an image of the fountain; I'm on the back side of it right now, with all these red circles, which are key points. Not necessarily all the features, right? Just some of the ones that I think it matched on. Is that wrong, or am I off? I'm not entirely sure.
17:30 - 18:00 Yeah, in some software packages they may show you all of them, or they may show you just the ones that have been matched. I'm not sure with this specific viewer right now. And I'm not 100% clear either; I haven't read the documentation, all I know is the visualization. So this gives an idea of key points, where you'll notice there are no key points where you have a lot of low contrast, not a lot of visual variation. On my screen there's a part showing the street, and there's just not much going on there,
18:00 - 18:30 versus a lot of points on the fountain, which has all these ornate decorations on it. In the background there are trees and buildings that it's latching onto. So it makes sense that where you have less variation, you're going to have fewer features. Oh, the sky is another one, and this nice tree behind the fountain caught a lot of points. It doesn't mean it matched on all of those, because other images might not see them. So I'm going to close this, and then you can look at "show overlapping images." If I click here, you can look at the
18:30 - 19:00 matches. You're going to see these correspondence matches, where it's finding key points between two images, and it shows these green lines basically saying these two images have matching features that it believes are the same points. Right, is that what we're seeing? Exactly. So this has now moved to the second and third bubbles within that correspondence search block. So back to correspondence search: the first step was feature extraction, which was just the identification of these key
19:00 - 19:30 points in each of the images. It wasn't even trying to compare images yet; we're just saying, for each image, let me find those key points. And as Jonathan said, by default, if you've got a GPU-enabled version of COLMAP and you've got a nice GPU in your computer, it will use the GPU implementation, the graphics processor, which makes it go a lot faster. So once we've extracted those key points, or features (again, I use those terms interchangeably a lot), now we want to match images together, and that's to discover
19:30 - 20:00 which images show similar content. The result of that is going to be a set of correspondences, a set of features saying the features in this image matched to the features in that image. Those were the green lines that Jonathan had shown just prior: not all of the key points from one image matched to the other, there was some subset, but we're trying to discover what those matches are. In this diagram we said that we had feature extraction, matching, and then geometric
20:00 - 20:30 verification. Matching and geometric verification a lot of times go hand in hand, so you run matching and then you immediately run geometric verification after it. The intention there is that matching is just trying to figure out which features look similar between two images, but it's not trying to do any sort of 2D or 3D reasoning. So it may think that, oh, the top of the tree in one image looks like the top of another tree in
20:30 - 21:00 another image, but they're in completely different parts of the image and it doesn't even make sense. It may confuse things, especially if you have a building with some sort of repetitive pattern on it, the same brick repeated over and over again, but with some unique windows or unique artwork that appears on that wall. Feature matching may end up matching incorrect parts of the images to each other. So matching does its best to figure out what matches, but it might be wrong. It's geometric verification's job to come in and clean
21:00 - 21:30 those up: to figure out, now that I have this initial set of matching key points between my two images, which ones actually make sense based on our knowledge of geometry and how cameras move. That's where it can sometimes help to leverage knowing what kind of camera model you have, knowing if you expect a lot of distortion or if it's a fisheye lens. But some methods don't even try to use that information; they'll just look at the 2D-2D relationships. Mhm. And so there are
21:30 - 22:00 some keywords that you might see: estimating a homography, which is like a perspective transform, or an essential matrix, or a fundamental matrix. Each of these relationships, each of these matrices, is a way to describe how a point in one image maps to a location, or a set of locations, in another image. And so we're trying to estimate: is there a valid camera motion that we can imagine that gets a set of points in one image to move to the set of points in the other image? That's what geometric verification is doing: just figuring out those 2D relationships between images.
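As an illustrative example of that verification step (not COLMAP's own code), here is how you could match SIFT descriptors between two images with OpenCV and then keep only the matches consistent with a fundamental matrix estimated by RANSAC. Filenames and thresholds are assumptions.

```python
# Raw appearance matching followed by geometric verification with RANSAC.
import cv2
import numpy as np

img1 = cv2.imread("fountain_001.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical filenames
img2 = cv2.imread("fountain_002.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Appearance-only matching: this is the step that can latch onto repeated bricks, tree tops, etc.
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = matcher.match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Geometric verification: estimate a fundamental matrix with RANSAC and keep the inliers.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 4.0, 0.999)
inliers = [m for m, keep in zip(matches, inlier_mask.ravel()) if keep]
print(f"{len(matches)} raw matches -> {len(inliers)} geometrically verified matches")
```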
22:00 - 22:30 And somewhere in my logs you can see some hints of that. As this is running, it's showing all kinds of text on your screen; it's showing bundle adjustment on my screen right now, but at one point it talks about the matches and the different algorithms it's running
22:30 - 23:00 in the background to get that. And then if I click on one of these points that it created, it shows you where you have multiple matches on a specific point, and there are things you can do to get different views and get hints of what we're talking about here. But one thing we didn't really talk about when you're matching these images is that there are different options as well. So when I go through here, I'm processing, I've got my key points. It goes fast on a GPU because it's able to look at all the different images at once, right? They don't care about each
23:00 - 23:30 other when you're extracting features. But then you get to the point where you need to do your matching. This is where it's all CPU driven, because it's either sequential or exhaustive, and it's not able to look at every image all at once. But there are options here: if I go to this button here, it's not displaying on my screen correctly for some reason... oh, there we go. You can do exhaustive, sequential, vocab tree, spatial. There are these different styles, or I should say different algorithms, you can pick to
23:30 - 24:00 match these. Yep. My understanding has always been: if you have a random collection of images, like someone walked around and it's not necessarily the case that one image was taken and then for the next image you moved over and shot the same part of the scene, maybe you're just walking around taking pictures in all directions, exhaustive is what you want to use, because, and you can explain this, it's going to try to get every image to match to every image. Versus sequential, where you're saying, no, each image was taken in
24:00 - 24:30 sequence: I see the fountain from one spot, I moved a few feet, took another photo of it, so they should match sequentially to each other. Does that sound correct? Is that the right assumption? You're exactly right. So once you've extracted the key points from each image, now you want to figure out which pairs of images are related to each other. The simplest, most naive way is to say, let me match every single image to every single other one; let me look at all, order n squared, every single
24:30 - 25:00 combination of pairs of images that I can imagine. That's what exhaustive matching is doing. So exhaustive matching, like you said, is great when you have an unsorted, random collection of images, and it works especially well if you have on the order of a few hundred images, because, since it is matching every image to every other image, that quickly gets expensive in terms of time. It's going to take a lot of time to compute if you try to do this on thousands of images; you can still do it, you just have to wait a long time. But it's great because
25:00 - 25:30 it's going to try to discover every single pair of matching images that exists. Mhm. And that's where sequential is nice, if you have something like your fountain sequence, where these are frames from a video, or maybe I was taking photos but I took them in order: I started here, took a photo, took a few steps, took another photo, took a few more steps, took another photo. So there is some sequential information to those photos; images taken near each other in that list show similar content, and sequential matching leverages that information to make the matching more efficient.
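A toy illustration of why that matters: exhaustive matching considers every pair of images, order n squared, while sequential matching only considers neighbors within a small window (plus any loop closures found separately). The window size of 10 below mirrors the idea of a per-image overlap setting, not any exact COLMAP default.

```python
# Counting image pairs: exhaustive vs. sequential matching.
from itertools import combinations

def exhaustive_pairs(num_images: int):
    return list(combinations(range(num_images), 2))        # every possible pair

def sequential_pairs(num_images: int, overlap: int = 10):
    return [(i, j) for i in range(num_images)
            for j in range(i + 1, min(i + 1 + overlap, num_images))]   # nearby frames only

n = 800
print(len(exhaustive_pairs(n)))   # 319,600 image pairs to match
print(len(sequential_pairs(n)))   # 7,945 pairs: orders of magnitude less work
```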
25:30 - 26:00 And then I don't really understand vocab tree. I do know that if you want to do an exhaustive-style match, not sequential, but you have, let's say, 800 images, I've always heard: use a vocab tree. Yeah, that's exactly right. So the vocab tree, you might have heard it called a vocabulary tree or image-retrieval-style matching, what it's doing behind the scenes is using an image lookup data
26:00 - 26:30 structure. It takes all the images, comes up with a really compact summarization of the kinds of things that are in each image, and then provides a way for me to say, hey, for this given image, what other images in my dataset are likely to contain the same kinds of things? It's not a guarantee, but it says, if I have one image and I've got 10,000 other images I could match to, I can ask it, hey, I don't want to
26:30 - 27:00 look at all 10,000, can you at least give me a sorted list of the ones that are most likely to match? That's what the vocab tree option does for you: it returns that ranked list, and then, instead of matching all 10,000, I can choose to match the best 50 or the best 100 or whatever my threshold is. A speed-up. Yep, it's more efficient. So once you get beyond 300 to 400 images, exhaustive should not be your option; you should go to vocab tree, unless they're all sequentially taken, and then
27:00 - 27:30 always use sequential. Well, not always, but that's probably your default. So if I'm taking a video and then extracting images, sequential is always going to work? It's always going to be your first option if you want to be as fast as possible. And then in here, you can pick loop detection. We've talked about that before, right? It's trying to detect whether you have come back to an area. Correct, and it will do that using the vocab tree option. So under the sequential tab, if
27:30 - 28:00 I enable loop detection and then specify a vocab tree path there at the bottom, that will enable it to say: as I'm processing through all those video frames, every 10th frame or every 50th frame or every 100th frame, whatever you set it to, you can have it go and do a vocabulary tree retrieval, that image retrieval step, to try to discover loop closures within the sequence. Okay, so we have these options. And then there's spatial and transitive, which we
28:00 - 28:30 haven't talked about. Does spatial have to do with GPS? Exactly right. Assuming the images have embedded geotags, GPS data embedded in the EXIF, it will say, for each image, just find other images with similar GPS and match to those. Yes, I love that. A lot of people listening are probably taking drone images, and spatial is the one I always use. That's a great option, because a lot of times that drone is looking straight down, or at least it's not looking in completely random
28:30 - 29:00 directions; there is some order and structure to that drone data. And in fact, a lot of the drones people are using nowadays have really good GPS on them. I'm thinking of the enterprise versions of, say, a DJI drone, which are getting really good GPS. Even without an RTK attachment, it's not going to throw a bunch of error in there. And then what's transitive? That's the one I don't think I've ever touched; I don't even know what it means. Yeah, that's a way to densify a set of existing
29:00 - 29:30 matches. Suppose you had gone and run one of the existing modes, say, okay, maybe not exhaustive, but if you had run sequential, or spatial, or your vocab tree, but then you wanted to go back and create a more complete set of connections between images. What transitive will do is look at your database and say, "Hey, if image A matched to B and image B matched to image C, but I didn't try to match image A directly to C, let me
29:30 - 30:00 go ahead and do that now." So it goes back and finds these transitive links between images and attempts to do that matching. What that does is create a stronger set of connections between images, which will help COLMAP out during the reconstruction phase. Okay. I feel like this gives me, or the listener and viewer, a good idea. There are different options; pick the one that makes sense for the dataset you have. You might get the best results out of exhaustive as far as error, but you might be waiting a day.
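In script form, picking a strategy looks roughly like this with the pycolmap bindings. This is a hedged sketch: recent pycolmap builds expose entry points along these lines, but exact function names and option fields vary between versions, so check the bindings you have installed. The database path is a placeholder.

```python
# Choosing a matching strategy with pycolmap (names may differ across versions).
import pycolmap

database = "fountain/database.db"

# Unordered collection, a few hundred images at most:
pycolmap.match_exhaustive(database)

# Frames extracted from a video / photos taken in order (loop detection is an
# option on the sequential matcher that uses a vocabulary tree under the hood):
# pycolmap.match_sequential(database)

# Geotagged photos, e.g. drone imagery with GPS in the EXIF:
# pycolmap.match_spatial(database)

# Large unordered sets: rank likely partners first with a vocabulary tree.
# pycolmap.match_vocabtree(database)
```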
30:00 - 30:30 I've heard people say, "I set this and now it's telling me it'll be ready in 28 hours." Well, that's probably not the right mode; you should probably use a vocab tree. But I always say find the right one. Start with sequential if you have sequential images; you'll probably get a good result there. I also want to mention, back in the diagram under correspondence search, they do break it down into feature extraction, feature matching, and then geometric verification. That geometric verification,
30:30 - 31:00 those options show up on the matching settings screens we just saw. For each of those tabs, at the bottom there were the general settings, or general options, and a lot of those general options are related to geometric verification: when I'm matching these points and I want to verify them, what sort of pixel error do I expect, or what is the minimum number of inliers, or the inlier ratio. Those inliers are the number of
31:00 - 31:30 geometrically verified matches between a pair of images. And that's where geometric verification comes into play within this COLMAP workflow. Okay. To move this along, I do want to point out, and I'm going to show COLMAP one more time: at this point you've run both your feature extraction and feature matching, and you will still see nothing on your screen. Well, you will see logs, but you will not see these camera poses, which I have here. I have this sparse point cloud, I have these red camera positions around it, and none of
31:30 - 32:00 this shows up yet, because at this point we haven't created a point cloud; we haven't projected anything yet. So we're moving from correspondence search, if I bring up that diagram one more time, to incremental reconstruction, and that's where we start to see fun things happening on the COLMAP GUI screen. If you're running the GUI, you'll start to see camera poses show up. So the first step is initialization. What is that? Is that just starting? Yeah, that's what it is. It's
32:00 - 32:30 the starting process for this incremental reconstruction. Incremental reconstruction is just one style of attempting 3D reconstruction. The core idea here is that, like you said, we don't have any 3D information yet, so we're going to start with the minimum amount that we need, which is a pair of images. Let's start with a pair of images and figure out what their 3D relationship is, as well as what 3D points
32:30 - 33:00 they saw in the scene. So we're going to create this two-view reconstruction: take that pair of images, triangulate an initial set of 3D points, and then use that as the initialization for the rest of the reconstruction. Everything after that is going to figure out, well, based on these initial two images and some points, how can I add a third image, and how does it relate? And now that I have these three, how can I add a fourth, and then a fifth, and a sixth? You just keep adding images one at a time to grow a larger and larger reconstruction. But initialization is just: what is that initial pair? Which two images am I going to start with to build this entire reconstruction?
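To sketch what that two-view initialization amounts to, here is an illustrative OpenCV version (not COLMAP's implementation): estimate the relative pose of one pair from verified matches, then triangulate an initial set of points. pts1 and pts2 are assumed to be matched pixel coordinates, for example the inliers from the earlier verification sketch, and K is a made-up intrinsics matrix.

```python
# Two-view initialization: relative pose from matches, then triangulation.
import cv2
import numpy as np

K = np.array([[1600.0, 0.0, 960.0],
              [0.0, 1600.0, 540.0],
              [0.0, 0.0, 1.0]])

E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R, t, pose_mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# Camera 1 at the origin, camera 2 at the recovered pose; triangulate the matches.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
points_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)   # 4xN homogeneous points
points_3d = (points_h[:3] / points_h[3]).T                 # initial sparse point cloud
```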
33:00 - 33:30 Okay. And then it kind of goes into a circle. If you look at the diagram on the screen, it shows image registration, triangulation, bundle adjustment, and outlier filtering, and if you follow the lines, you notice you're really doing a loop. Yep. So it's looping through that process. And there's also this dashed line pointing to the reconstruction, so it's presumably
33:30 - 34:00 looping through that and adding to the reconstruction as it goes? Yep, exactly right. It's that initialization that picks the first pair of images, but once I have my pair of images, now I'm going to enter this loop that starts with image registration. Image registration is a fancy name for: how can I add a new image to my existing reconstruction? What it's going to look at is, based
34:00 - 34:30 on the 3D points that have already been triangulated, what's the best next image in my dataset that also saw those points? And I find that image via the set of feature matches. So if I've matched images one and two and triangulated them, and image two matched to image three, well, then image three is seeing the same points in the scene, so let me add image three. And there, it's a 2D-to-3D
34:30 - 35:00 registration process, a 2D-3D pose estimation process, where I take the 2D points in that third image and align them with the 3D points that have been triangulated. You might hear that called image registration, or the perspective-n-point problem, or pose estimation; there are a few different words for this process, but you're adding a new image to the reconstruction. And so that's the image registration step.
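Here's a runnable toy of that 2D-3D step, using OpenCV's PnP solver rather than anything from COLMAP: invent some already-triangulated 3D points, project them with a known pose, and let solvePnPRansac recover that pose, the way a new image gets registered against existing points. All values are synthetic.

```python
# The "image registration" step is a 2D-3D pose problem (perspective-n-point).
import cv2
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[1600.0, 0.0, 960.0], [0.0, 1600.0, 540.0], [0.0, 0.0, 1.0]])

points_3d = rng.uniform([-2, -2, 4], [2, 2, 8], size=(200, 3))        # already-triangulated points
rvec_true = np.array([0.05, -0.3, 0.02])                              # "true" pose of the new image
tvec_true = np.array([0.4, 0.0, 0.1])

projected, _ = cv2.projectPoints(points_3d, rvec_true, tvec_true, K, None)
points_2d = projected.reshape(-1, 2) + rng.normal(0, 0.5, (200, 2))   # noisy 2D observations

ok, rvec, tvec, inliers = cv2.solvePnPRansac(points_3d, points_2d, K, None)
print(ok, rvec.ravel(), tvec.ravel())   # should land close to the true pose
```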
35:00 - 35:30 I do know that when I ran this, I can always take a video and project it onto this in post, but when it's creating this reconstruction, instead of taking image one and then image two and then image three and building off that, I noticed it will pick... if you're watching this on video, you'll notice I took two loops, and some of the images are almost right above each other, where I held the phone above my head and then held it down at chest level. So I have two loops, and there are a lot of common key points, common features. So as it's building this up,
35:30 - 36:00 it started where I began walking around this fountain, but it's using images from further along in the video extraction, or sorry, the images I had. So it used, say, image one and image 180, because those are next to each other and had a lot of strong feature matches. So it's not necessarily using images in the sequence you took them; it's the ones that had strong correlation. That's a great point. Yeah, it isn't just going to go 1, 2, 3, 4, 5, 6;
36:00 - 36:30 it's not going to do them in order. It's going to start with that pair of images; it's going to look through all of the images in your collection and find the pair, and it might not be a consecutive pair, but the pair of images that maximizes some criteria. It's a pair of images that has strong connectivity, so there were a lot of feature matches, but I also want to make sure that pair of images has differences in viewpoint. I don't want two images that were taken at the exact same position in space, because that gives me no 3D information. I need, and we talked
36:30 - 37:00 about this in the last episode, this concept of a baseline. I need some sort of translation, some motion between two images. Or maybe it was in our depth map episode that we talked about needing motion between images in order to estimate depth. The initialization looks for the same thing: it wants lots of matches between the images, but it also wants a strong amount of motion between them. So it's going to pick whichever pair of images maximizes those criteria, and once it has that, it'll start
37:00 - 37:30 adding other images that are strongly connected to those initial ones. And yeah, it won't necessarily do it in the order that you captured those images; it can be in the order in which those connections are strongest. And I was seeing the first photo and then one from somewhere further along, where I came around and did a loop; I saw those two photos start together because, as we were talking about, the baseline was better. There was more parallax, because these are pretty closely
37:30 - 38:00 spaced images I took from picture to picture, so not a lot changes between them, versus the next loop, where I'm looking at the exact same part of the fountain but from a different elevation and angle. There's a lot of parallax movement between those images, so it was matching those better, image one to image 180 rather than image one to image two, because that baseline was probably better. So the fun thing is, when you run this in the COLMAP GUI, you get to watch those build, and you get to see the point cloud just start to generate in
38:00 - 38:30 front of you, and you get an understanding of what it's doing in these logs as it loops through this process over and over. You can see it iteratively add to the scene and build and refine. When it's doing this incremental reconstruction, is it refining the camera poses as it goes, or is it just saying, "Here are the camera poses, there's where they are"? No, there's refinement, and a lot of times that refinement is called bundle adjustment. That's a key word
38:30 - 39:00 used commonly in the literature. I remember the first time I heard the words bundle adjustment; I was a first-year grad student and I had no idea what the person was talking about. I was like, "What? A bundle of sticks? A bundle of what? Straw? What is going on?" But no, bundle adjustment is the idea of refining the 3D points as well as the camera positions. You end up with a bundle of constraints saying these 2D points in these images all triangulate to, all saw, the same 3D point in the
39:00 - 39:30 scene, but I've got a bunch of images and a bunch of points: how can I optimize the alignment of all of this data? That's what bundle adjustment is. So as COLMAP is running, it's doing that image registration process, it'll add a new image, it then runs triangulation, which creates new 3D points based on that new image and the other images that are already there, but then it'll do bundle adjustment, which says, how can I refine that? And there are two styles of bundle adjustment that I
39:30 - 40:00 believe COLMAP uses: one of them is local bundle adjustment, the other is global. A lot of times, suppose we had already reconstructed a thousand images and we're adding the thousand-and-first. When I add that thousand-and-first image, trying to do bundle adjustment using all thousand images takes a long time. We recognize that, well, that next image that I'm adding is off in one corner of the reconstruction; it's far away
40:00 - 40:30 from the other side of the reconstruction, and those things aren't really related to each other. So I can run a local bundle adjustment: let me optimize only those cameras and points that are near the new image I just added, or the new points I've triangulated. That's a way to do a local refinement, and I can do that every single time I add a new image. Then, periodically, COLMAP will run a global bundle adjustment. There are some settings there; I think once the reconstruction has increased in size by 10%, or you've added every, you
40:30 - 41:00 know, 500 images or something, there are certain criteria, and especially at the end of the reconstruction, COLMAP will run a global bundle adjustment, which says: let's optimize everything. Let's optimize the points, let's optimize the camera poses. And something we haven't mentioned is that it will also be optimizing the camera parameters. Back when we picked that camera model and said we're going to use a camera model that has a focal length term and a principal point, cx and cy, and maybe some radial distortion terms: during bundle adjustment, COLMAP will also be optimizing those parameters, to figure out what the field of view of my camera is, that's the focal length, or how much lens distortion there was, in order to make everything line up.
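As a hedged sketch of what is being minimized (COLMAP itself uses the Ceres solver with robust losses and careful parameterizations that are omitted here), bundle adjustment stacks one reprojection residual per observation over all cameras and points, and intrinsics such as f and k can be included among the variables being refined.

```python
# The residual behind bundle adjustment: predicted pixel minus observed pixel.
import numpy as np

def reprojection_residual(R, t, point_3d, observed_px, f, cx, cy, k):
    """Error between where a 3D point projects and where it was actually seen."""
    x, y, z = R @ point_3d + t                     # world point into camera coordinates
    u, v = x / z, y / z
    factor = 1.0 + k * (u * u + v * v)             # SIMPLE_RADIAL distortion, as sketched earlier
    predicted = np.array([f * u * factor + cx, f * v * factor + cy])
    return predicted - observed_px                 # a solver stacks one of these per
                                                   # (image, point) observation and minimizes
                                                   # the total (robustified) squared error
```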
41:00 - 41:30 Would it refine those if, and we didn't cover this earlier, but let's say you do have a camera calibration? So you're saying, I know this. I think DJI, again in their enterprise-level drones, will give you this information for their
41:30 - 42:00 lenses, because they've been calibrated, and it's in the EXIF data. Will that change? Does it do a refinement on top of that, or does it just say, no, you gave us that, we won't change it? That's an option. I think under either the reconstruction options or the bundle adjustment options, there are ways to say: do I want to refine my focal length, do I want to refine my distortion terms? You can enable or disable that setting. To that point, I do believe COLMAP will parse the EXIF data in those
42:00 - 42:30 images, and if it sees that there is a focal length, because a lot of times an image will record that it was taken with a 10 mm lens or a 24 mm lens, COLMAP can parse that data to take an initial guess at what it thinks the focal length, the field of view of the camera, is, and can use that as initialization. But a lot of times there is benefit to refining that, because it may get you close, but maybe not close enough to get a really sharp
42:30 - 43:00 reconstruction. Okay, so I've got a lot more appreciation for what's happening here. I tell people to run this on their own computer. You don't need the highest-spec computer to run a small dataset and learn how this works. I ran this on my older computer, which doesn't have 24 cores or anything, and it still ran fairly quickly. You gave me some notes, and I think we covered most of it. But then from here, you can do things. So, I've run this through; you can hit automatic reconstruction, and it'll
43:00 - 43:30 create all of this, but then you can hit bundle adjustment, which is that global one at the end. And then you can build a dense reconstruction, which we're not really going to cover in this episode. This is just: here's the workflow I showed to get the camera poses and the sparse point cloud, and from there, you can use it for more downstream tasks, right? I could use this for a dense 3D reconstruction, where I want to get millions of points in this scene, or I can use it as the basis for initializing 3D Gaussian splatting.
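Pulled together, the sparse pipeline we've walked through looks roughly like this with the pycolmap bindings. It's a hedged sketch: the function names track recent pycolmap releases but may differ in yours, and the paths are placeholders.

```python
# End-to-end sparse reconstruction with pycolmap (a sketch, not canonical usage).
from pathlib import Path
import pycolmap

image_dir = "fountain/images"
database = "fountain/database.db"
sparse_dir = "fountain/sparse"
Path(sparse_dir).mkdir(parents=True, exist_ok=True)

pycolmap.extract_features(database, image_dir)           # correspondence search, step 1
pycolmap.match_exhaustive(database)                      # steps 2-3: matching + verification
maps = pycolmap.incremental_mapping(database, image_dir, sparse_dir)
print(maps[0].summary())                                 # camera poses + sparse points, ready
                                                         # for dense MVS or Gaussian splatting
```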
43:30 - 44:00 There are just different things you can use once you've got camera positions and a sparse point cloud. I'm also showing on my screen, and I didn't talk about this, these magenta lines; they show which images matched. If I double-click on one, it'll show that information about the key points and which ones matched to it. You can just click around and learn things: double-click on different parts of the scene and it'll show you the point and which cameras made up that point. It's a good tool
44:00 - 44:30 for learning how this works, because it's very visual on the screen: lots of data, lots of options. You can even create animations in it if you really want to show off what you learned. There is one thing we didn't really talk about, well, a couple of things. So, incremental reconstruction. Everyone always complains: I've got the newest GPU, this should be really fast, why is this running so slowly? My GPU is not even being used and it says it's taking 5 hours to run my thousand-image dataset. Why is that? Why can't we use a GPU for
44:30 - 45:00 this incremental reconstruction? Or I know we can, but why can't we in COLMAP the way it's configured? Yeah, because a lot of these algorithms are not easy to parallelize on a GPU. A GPU works well when you're doing the exact same operation on millions of things, because that's what a GPU does: its job is to draw pixels to a screen, on your monitor, on your desktop. You've got millions of pixels on your screen, and that GPU is processing a million pixels at once and figuring out
45:00 - 45:30 what to draw. So for tasks like feature extraction, where I've got millions of pixels and I want to figure out which ones have features in them, the GPU is great. Or feature matching: I've got tens of thousands of features in one image and tens of thousands in the other, and I want to figure out which features match with each other. Again, that's great for a GPU. For incremental reconstruction, it's more like I'm operating on one image at a time and I have to solve a math problem, do some linear algebra, to
45:30 - 46:00 figure out the 3D position or pose of that image. That's not a very parallelizable task, and so it's not very easy to adapt some of these algorithms to the GPU. I will say another thing that contributes to it is that COLMAP is very flexible. There are a lot of algorithms, a lot of switches, a lot of different techniques you can use, and implementing all of those on the GPU would just take a lot of time. It's nice having software that's flexible. With COLMAP being open source and a
46:00 - 46:30 bunch of people contributing to it, it's nice having a flexible platform where people can easily dive in, make changes, add their own algorithm, plug it in, tweak things, and play with it. So having that more general-purpose, CPU-based implementation is helpful. But to get back to the core of it, it really is primarily about the algorithms: a lot of these algorithms are not parallelizable, or not well suited for processing on a GPU. That makes sense. Someone once explained it to me, or was trying to
46:30 - 47:00 explain it: your CPU is like a really good detective, solving clue by clue, one thing at a time, whereas the GPU can point out all the clues at once. Yeah. But here you really have that hard math problem, and you need really fast cores trying to solve those things one at a time, and it's incremental. So think about it: you can't solve all of these at once as-is. That's something people just have to keep in mind; don't get frustrated, it's just how this technology works today.
47:00 - 47:30 And then there's GloMAP. How does GloMAP suddenly make this magically fast? Yeah, GloMAP is a different style of reconstruction. GloMAP is a global mapper, so it's global reconstruction versus incremental reconstruction: in COLMAP, as we just discussed, incremental reconstruction adds one image at a time, whereas global reconstruction tries to figure out the 3D poses of all of the images at once. GloMAP still has that
47:30 - 48:00 same correspondence search step. To run GloMAP you still have to extract key points, extract features from your images, match them, and run geometric verification. But once you have that web of connectivity between your images, you can run global reconstruction techniques, and there are a few different steps there. In GloMAP, they run rotation averaging first. The idea is that you look at all of the feature matches between your pairs of images, and for each
48:00 - 48:30 pair you estimate how much rotation occurred between that pair of images. That gives you a constraint. Now, if I look at all of the rotations I estimated between all of the pairs, can I come up with a consistent orientation for all of my images that satisfies each of those pairwise constraints? Can I arrange the orientations of my images so that all of those pairwise rotations make sense? That's what rotation averaging does. It's not even looking at position yet; it's just trying to rotate all of the images consistently.
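As a toy picture of "reconcile all the pairwise constraints at once": real rotation averaging works with full 3D rotations and robust losses, but the same idea in one dimension, with made-up heading angles for four images, is just a small least-squares problem:

```python
import numpy as np

# Hypothetical pairwise measurements: (i, j, estimated rotation from image i
# to image j, in degrees), each a little noisy and slightly inconsistent.
pairwise = [(0, 1, 29.5), (1, 2, 30.4), (0, 2, 60.3), (2, 3, 45.2), (1, 3, 74.8)]
n_images = 4

# Each measurement says theta_j - theta_i ~= delta; one extra row pins image 0
# at 0 degrees so the whole solution isn't free to rotate as a group.
A = np.zeros((len(pairwise) + 1, n_images))
b = np.zeros(len(pairwise) + 1)
for row, (i, j, delta) in enumerate(pairwise):
    A[row, j], A[row, i], b[row] = 1.0, -1.0, delta
A[-1, 0] = 1.0

theta, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(theta, 1))   # consistent headings, roughly [0, 30, 60, 105]
```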
48:30 - 49:00 Once the images are rotated in 3D space, it does a global positioning step, which simultaneously solves for the camera positions as well as some of the 3D points. It essentially throws all of the cameras into a big soup, gives them a bunch of random initializations, and then defines constraints saying: these images saw these common points, so how can I rearrange all of these images so that they line up and see those common
49:00 - 49:30 points? It's similar to bundle adjustment, the idea of taking a bunch of images that see points and refining everything together, but it uses a different formulation, a different set of constraints, that is better suited to random, unknown camera positions. That's the global positioning problem they solve. Once you've run rotation averaging and global positioning, you get a reconstruction that's pretty close, and then you can run an actual high-quality refinement using bundle adjustment.
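Here is a minimal sketch of that joint refinement idea under heavy simplifying assumptions: rotations fixed to the identity, a focal length of 1, the first two camera centres held fixed, and noise-free synthetic observations. Real bundle adjustment also refines rotations and intrinsics and uses sparse, robust solvers, so treat this purely as an illustration of "move the cameras and points together until the reprojection error drops":

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

# Toy scene: 8 points seen by 3 cameras, all looking roughly down +z.
points_true = rng.uniform([-2, -2, 6], [2, 2, 10], size=(8, 3))
cams_true = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.2, 0.5]])

def project(cams, pts):
    # Simple pinhole projection: (x/z, y/z) of each point in each camera frame.
    rel = pts[None, :, :] - cams[:, None, :]      # shape (n_cams, n_pts, 3)
    return rel[..., :2] / rel[..., 2:3]           # shape (n_cams, n_pts, 2)

observed = project(cams_true, points_true)        # synthetic 2D observations

def residuals(x):
    cam2 = x[:3]                                  # third camera centre (unknown)
    pts = x[3:].reshape(-1, 3)                    # the 3D points (unknown)
    cams = np.vstack([cams_true[:2], cam2])       # first two cameras held fixed
    return (project(cams, pts) - observed).ravel()

# Start from a "pretty close" initial guess, as if global positioning already ran.
x0 = np.concatenate([cams_true[2], points_true.ravel()])
x0 = x0 + rng.normal(0.0, 0.2, size=x0.shape)
before = 0.5 * np.sum(residuals(x0) ** 2)
result = least_squares(residuals, x0)             # joint refinement of camera + points
print(f"reprojection cost: {before:.4f} -> {result.cost:.8f}")   # drops to ~0
```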
49:30 - 50:00 And then you have your 3D reconstruction. So it skips a lot of that slow incremental process that wasn't parallelizable. Rotation averaging and global positioning are a little better suited to parallelization and are more efficient, because you're not having to do everything one step after the other. Yeah, and I have the project page on my screen here, showing what you were just describing, with that last animation where it all happens at once and the scene just resolves in one go. I do want to say
50:00 - 50:30 that, to me, there's a low cost to giving it a shot. You're not going to waste a lot of your time seeing if it works well for your project, because you don't have to wait for an incremental reconstruction. It doesn't work well with all scenes, as I've found, but because you know within minutes whether it's going to work or not, it's worth a try, and you get to learn which scenes it handles well. You've done some tests as well, Jared.
50:30 - 51:00 It seems like you can't get too tied in on a bunch of little details; you need more of a global view. The example images have a lot of features and aren't zoomed in tight on small features in a scene. Yeah, in my experience with GloMAP and other global structure-from-motion, global reconstruction techniques, they work best when you have a lot of connections between your images. So it's not you just walking through a cave or
51:00 - 51:30 down a city street and never coming back. It likes a lot of loop closures, a lot of connectivity, lots of different vantage points, overlap, and diverse content. It takes the strength of those diverse and dense connections and very quickly figures out how to arrange them to produce the final reconstruction. And that's probably why, in my experience, these broader-view shots work well: I have a lot of connections and a lot of unique features. When you get too close in
51:30 - 52:00 on one little object, or indoors, where I've had some scenes not turn out because there are a lot of blank white walls without many features, it's just not able to do it. All right, this is something I had on my screen just to show some examples. If you're listening, I'll make sure to link both GloMAP and COLMAP in the show notes, but GloMAP is an interesting one to look at. It drops on top of COLMAP, so even getting it running isn't a large lift.
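For instance, a script for the whole pipeline might look roughly like this, assuming the colmap and glomap binaries are installed and on your PATH; the flags follow the two projects' documented command lines, but it's worth checking them against your installed versions:

```python
import subprocess

db, images, out = "project/database.db", "project/images", "project/sparse"

# Correspondence search still happens in COLMAP: extract features, then match
# them (matching also runs the geometric verification of those matches).
subprocess.run(["colmap", "feature_extractor",
                "--database_path", db, "--image_path", images], check=True)
subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)

# GloMAP replaces only the reconstruction step, reading the same database.
subprocess.run(["glomap", "mapper", "--database_path", db,
                "--image_path", images, "--output_path", out], check=True)
```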
52:00 - 52:30 And you'll see Johannes in the list of names, so he's still working on these things. I think this is interesting because it does make things go faster, and if you look at the results, they're in the same range of accuracy as you get with incremental reconstruction in COLMAP. So it's not a case of "fast but not nearly as good": it's fast and good when it succeeds, and you find out really quickly, because I've noticed the results are either absolutely all over the place or you get a really good sparse point cloud, so you know right away whether it worked. In
52:30 - 53:00 fact, you'll see cameras all over the place and everything looks like this weird cube, and that's how you know it didn't work. But you will know from your output. Yeah, I've gotten a few of what I call Borg cubes, that's what I think they look like, as my results. All right, I think we covered this all really well. I hope at the end of this people will go try COLMAP, and even if they use other software, it will follow
53:00 - 53:30 relatively the same sort of process. There may be other ways it's done, I'm sure there are, but this is the standard method that most tools at least follow in style. Now there's all this machine learning stuff that's different, but as far as classical 3D reconstruction from imagery goes, this is a very well-known and widely reused pipeline. Yeah, and like you said, just go and try it. I can't stress that enough: just try it.
53:30 - 54:00 Whether you're new to computer vision and want to understand how 3D reconstruction works, or you sort of understand it but want better insight into what happens behind the scenes, a tool like COLMAP is great: just throw some images at it, run a reconstruction, and then start poking around. There are a lot of neat visualizations, like the ones Jonathan showed, where you can look at a point and see which images saw it, or click a feature in an image and see what it matched to. There are other debug visualizations, like the match graph or the match matrix, where you can see the different
54:00 - 54:30 patterns in how images match to each other. It's a nice way to get your hands dirty and see the process of turning pixels into 2D information and then into final 3D results, that mapping from 2D to 3D and all the information that goes into it. It's a great way to build an intuition for how this all works behind the scenes. Yes, definitely. And I would say the most important part when you run this is picking the right matching strategy, because that can be the difference between waiting hours and waiting minutes.
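For a sense of what that choice looks like in practice, these are the kinds of matchers being referred to; the option names follow COLMAP's documented CLI, the vocabulary-tree path is just a placeholder, and you'd normally pick one of these rather than run all three:

```python
import subprocess

db = "project/database.db"

# Exhaustive: compare every image against every other image. Most thorough,
# but the number of pairs grows quadratically, so large sets can take hours.
subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)

# Sequential: match each image only against its neighbours in capture order,
# a good fit for video frames or an ordered walk-through.
subprocess.run(["colmap", "sequential_matcher", "--database_path", db], check=True)

# Vocabulary tree: retrieve likely-overlapping images first, then match only
# those pairs -- a common middle ground for large, unordered photo collections.
subprocess.run(["colmap", "vocab_tree_matcher", "--database_path", db,
                "--VocabTreeMatching.vocab_tree_path", "vocab_tree.bin"], check=True)
```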
54:30 - 55:00 So, thanks Jared for this episode and for covering all of this. I hope it was tangible enough for people to go try it, and having the visuals up helps, so if you're listening, go find this video on the EveryPoint YouTube channel. We have a playlist of all of our episodes. I haven't named this one yet, but I'm sure COLMAP will be in the name. It'll be
55:00 - 55:30 I can't remember exactly which episode we're on, but it's around 15 or 16. It'll be a great way to learn this if you're getting into the field, because every day I see questions, on my videos, on Reddit, on Discord. There are all these communities using projects that require COLMAP as a starting step, think 3D Gaussian splatting, and it's obvious that this is something people know they have to use but
55:30 - 56:00 have no idea what's happening. They just know they threw a bunch of images at it, something came out, and then they'll do something else with it, but they have no appreciation for the sausage-making inside COLMAP. If you know what each step is, you can get better results, in my opinion. Just play with it, see what works, and learn what the different options are. If you don't know what an option does, jump on our YouTube channel and ask a question. I'll be watching and trying to respond as intelligently as possible and give you a good
56:00 - 56:30 answer. So Jared, any other parting thoughts? You said go give it a try. Any other tips for people? Take good, sharp imagery, and just do it yourself: get out, take your own photos, and see how they turn out. Yeah, take your own photos. Don't just use the open-source datasets, because you already know those are going to work; they're great for testing but not great for learning on your own data. All right, well, thank you, and again, if you're listening, this will be on all major
56:30 - 57:00 podcast players. Please subscribe to our channel or to the podcast if you can; it means a lot to us and tells us we're making the right content and that you care about learning this material. As always, let us know in the comments on our YouTube channel if there's something here you'd like us to go deeper on. Maybe we can even get someone like Johannes on one of these episodes to go super deep, if you'd like that. Anyway, thanks Jared for being on this episode, and I'll see you