Summary
In this engaging episode featuring Varun Mohan, co-founder and CEO of Windsurf, insightful discussions unravel the complexities of modern software engineering. The conversation delves into how Windsurf leverages AI to revolutionize coding, with a focus on unique engineering challenges, the creation of low-latency models, and strategic decisions that drive innovation. Varun candidly shares Windsurf's journey, including the bold move to fork Visual Studio Code, and the commitment to building sustainable, high-performance coding tools that go beyond incremental changes. With a blend of technical depth and personal reflections, this episode is an enlightening exploration of the future of software development.
Highlights
Varun Mohan discusses the misconception that AI will drastically reduce the need for software engineers, arguing instead for enhanced productivity and innovation. 🤖
The creation and application of proprietary AI models at Windsurf is a testament to the company's dedication to cutting-edge technology. ⚙️
Varun shares the strategic decision to fork Visual Studio Code, enabling Windsurf to build a more flexible and powerful coding environment. 🌌
Insightful anecdotes reveal how the Windsurf team embraces failures as critical learning opportunities on their path to success. 💡
Conversations around MCP and security highlight the ongoing challenges and innovations in optimizing software development infrastructure. 🔒
Key Takeaways
Understanding the real potential of AI in coding can shift how software engineering is viewed and practiced. 🚀
Windsurf's journey illustrates the impact of building proprietary tools and models tailored specifically for developer needs. 🛠️
Forking Visual Studio Code opened new pathways for Windsurf to innovate while maintaining a developer-friendly environment. 🌐
Engineering at Windsurf is about balancing innovation with practical, user-centered design. 🤝
Varun Mohan emphasizes the importance of failure as a stepping stone to success. Fail fast, learn faster! 🚀
Overview
In an insightful dialogue, Varun Mohan, co-founder and CEO of Windsurf, takes us through the ambitious landscape of modern software engineering, highlighting Windsurf's pioneering use of AI to reshape the industry. With rich anecdotes, Varun elaborates on how Windsurf builds its own AI models, tackling unique challenges like latency, and the strategic foresight required to create tools that meet developers' expansive demands.
Windsurf’s approach to innovation is both grounded and groundbreaking—it involves reconstructing established platforms like Visual Studio Code while tailoring the experience through a developer-centric lens. Varun discusses the nuances of balancing effective user experience design with high-level technical capabilities, shedding light on the extensive work behind Windsurf's seamless integration with tools like JetBrains.
By emphasizing the cultural ethos at Windsurf, Varun sheds light on their approach to failure as a catalyst for growth: every unsuccessful sprint leads to actionable insights. This philosophy not only fuels internal development but also aligns with their broader vision of what AI-driven software development should be—more intuitive, more powerful, and ultimately, more human-centered.
Chapters
00:00 - 00:30: Introduction The chapter titled 'Introduction' discusses the evolving role of software engineers and the future of the profession. It challenges the pessimistic viewpoint that the number of software engineers will decrease, suggesting that this perspective may come from those who dislike software engineers. The chapter argues that with the cost of building software decreasing, companies are positioned to invest more in technology, leading to a higher return on investment (ROI) for software development. Consequently, instead of reducing the workforce, companies should aim to build more and improve their products, leveraging the increased ROI and potential for software advancements.
00:30 - 06:00: Windsurf and its challenges The chapter discusses the impact of technology on businesses, highlighting that a single developer can significantly lift a company's growth due to technological advancements. It focuses on Windsurf, a popular tool among software engineers that leverages AI coding capabilities. The chapter explores the engineering challenges involved in developing such tools and their potential to transform the field of software engineering. The discussion features insights from Varun Mohan, co-founder and CEO of Windsurf, who elaborates on the company's development of their own large language models (LLMs) and addresses the limitations of current text-based LLMs in coding applications.
06:00 - 11:00: Use and evaluation of AI models The chapter focuses on the use and evaluation of AI models, highlighting how Windsurf employs various techniques, such as a combination of embeddings and keyword-based searches, to tackle different problems like search. A key challenge they face is latency, emphasizing the impact of improperly balanced GPU compute and memory load on latency, particularly for real-time code suggestions. Additionally, Varun shares insights on the evolution of software engineering and expresses skepticism about predictions that a large portion of code will soon be generated by AI. This chapter is particularly insightful for those interested in understanding the engineering behind next-generation developer tools.
12:00 - 15:00: Code Rabbit sponsorship The chapter begins with a welcome and an invitation to subscribe to the podcast on various platforms. The guest is thanked for joining the podcast. The conversation brings up the recent launch of GPT-4.1 support in Windsurf, noting that it had been a few weeks since its release at the time of recording. Initial impressions and evaluations of new model introductions, particularly regarding their effectiveness in coding use cases, are discussed. The guest seems prepared to elaborate further on this topic.
15:00 - 21:00: The team behind Windsurf In the chapter titled 'The team behind Windsurf,' the speaker discusses the internal workings of models such as GPT-4.1, emphasizing their non-deterministic properties. These models can exhibit varied performance across different tasks, sometimes in unexpected ways. The speaker highlights that relying solely on competitive programming scores does not guarantee exceptional programming capabilities. Additionally, it is revealed that many team members have a background in the autonomous vehicle industry, which may provide valuable context for understanding their approach.
27:00 - 33:00: Engineering and infrastructure at Windsurf The chapter discusses the complexities and risks associated with testing autonomous vehicle software in the real world. It highlights the challenges posed by the modular nature of the software, where each piece is driven by machine learning and exhibits non-determinism, making real-world testing difficult. The potential dangers of releasing flawed software, which could harm the general public, are emphasized. Comparisons are made to the difficulties faced by Windsurf, underscoring the importance of thorough testing in engineering and infrastructure projects.
33:00 - 39:00: Challenges and approaches in Windsurf development The chapter discusses the challenges and methods involved in developing Windsurf. It emphasizes the importance of building robust simulation and evaluation infrastructure, drawing parallels to autonomous vehicle development. The chapter highlights the need for comprehensive evaluation suites that not only assess overall software performance but also test aspects such as task completion rates, retrieval accuracy, and edit precision. This ensures all components of the model perform well, identifying and correcting negative or redundant changes.
41:00 - 49:00: Internal use of Windsurf The chapter discusses the importance of efficient processes in product development, particularly in minimizing unnecessary steps to improve user experience. It highlights the use of metrics and a series of tests to evaluate models, ensuring they meet the standards for end-user application. This systematic testing approach is essential in determining the effectiveness of a model.
52:00 - 60:00: Impact of AI on software engineering The chapter explores the impact of AI on the field of software engineering. It highlights how traditional software engineering practices, like writing unit tests, integration tests, and end-to-end tests, might change or adapt with the use of AI. The speaker reflects on the possible similarities and differences in approaching these processes when AI is involved. It also touches upon how engineers might incorporate example codes and prompts to work with AI-driven technologies, offering a glimpse into the evolving practices and methodologies that might come into play in a future closely integrated with artificial intelligence.
60:00 - 62:00: Future of software development The chapter explores the future of software development by highlighting the importance of code functionality and testing. It suggests a method to improve code descriptions by using open source repositories to identify past pull requests or commits that simultaneously added tests and implementations. This would allow for the creation of more meaningful commit descriptions rather than relying solely on the initial commit description.
62:00 - 78:00: Product decisions and engineering insights The chapter discusses the process of making product decisions and gaining engineering insights by starting with a high-level intent. This intent becomes a programmatic problem, entailing finding the right files that require changes based on a 'ground truth'. The base code initially has a set of certain files where changes were made. Understanding the intent within these files is essential, and one can trace backward from the ground truth to comprehend the final changes made. This retrospective analysis helps generate the original intent, ultimately contributing to better product decisions and engineering insights.
87:00 - 88:00: Closing remarks The 'Closing remarks' chapter discusses the multiple layers of testing necessary to ensure the correctness of an edit with respect to the intended outcome. It emphasizes the importance of validating: 1) if the correct items were retrieved, 2) if the intent was accurately discerned, and 3) if the edits performed well. Suggesting a more comprehensive approach, the transcript points out that beyond these measures, one can also verify effectiveness by executing the code itself. This multifaceted approach enhances reliability and accuracy in content validation.
Building Windsurf with Varun Mohan Transcription
00:00 - 00:30 Like a lot of people talk about how we're going to have way fewer software engineers in the near future. I think it feels like it's people that hate software engineers largely speaking that say this. It feels pessimistic not only towards these people but I would say just in terms of what the ambitions for companies are. I think the ambitions for a lot of companies is to build a lot better product and if you now give the ability for companies to now have a better return on investment for building technology right because the cost of building software has gone down. What should you be doing? You should be building more because now the ROI for software and developers is even higher
00:30 - 01:00 because a singular developer can do more for your business. So technology actually increases the ceiling of your company much faster. Windsurf is one of the popular IDEs software engineers use thanks to AI coding capabilities. But what are the unique engineering challenges that go into building it, and how could tools like Windsurf change software engineering? Today I sat down with Varun Mohan, co-founder and CEO of Windsurf. We talk about why the Windsurf team built their own LLMs and how LLMs for text are missing capabilities necessary for coding, like fill in the
01:00 - 01:30 middle. How Windsurf uses a mix of techniques for many cases, like to solve for search, how they use a combination of embeddings and keyword-based searches. Why latency is their number one challenge and how incorrectly balancing GPU compute load and memory load can lead to higher latency for code suggestions popping up. How Varun thinks the software engineering field will evolve and why he stopped worrying about predictions like "90% of code will be generated by AI in 6 months." If you want to understand the engineering that goes into these next-generation IDEs, then this episode is for you. If you enjoy
01:30 - 02:00 the show, please do subscribe on any podcast platform and on YouTube. Welcome to the podcast. Yeah, thanks for having me on. You've recently launched GPT-4.1 support in Windsurf, which, by the time this is out, it will have been a few weeks, but what are your initial impressions so far? And in general, when you introduce a new model, how do you evaluate how it's working for the coding use cases that we all use? Yeah, maybe I can talk about
02:00 - 02:30 the second part and then I can talk about, you know, GPT-4.1 and the other models afterwards. Basically, internally, you know, these models have these non-deterministic properties, right? They sometimes perform differently in different tasks in ways that are unexpected. You can't just look at a score on a competitive programming competition and decide, hey, it's going to be awesome for programming. And, you know, interestingly about the company, maybe this is going to be helpful context: a lot of us at the company previously worked in autonomous vehicles, and I think in
02:30 - 03:00 autonomous vehicles we had a similar type of behavior where you had a piece of software, the software was very modular, lots of different pieces. Each piece was machine learning driven, so there was some non-determinism, and it's very hard to test it in the real world, right? Actually, it's much harder than it is to test, I guess, Windsurf out in the real world. It's much harder to test autonomous vehicle software out in the real world because if you ship bad software, you have the chance of hurting a lot of people, right? Hurting a lot of people, hurting the, you know, I don't know, just the general public
03:00 - 03:30 infrastructure, right? So in that case we needed to build really good simulation and evaluation infrastructure in autonomous vehicles, and I guess we brought that over here as well, where hey, if you want to test out a new model, we have evaluation suites, and the evaluation suites not only test end-to-end software performance (which is to say, you give a high-level task, what is the pass rate of actually completing the high-level task on a bunch of unit tests), they also test retrieval accuracy, edit accuracy, redundant changes, all these different parts of a model that are like negative
03:30 - 04:00 behavior. Because for our product, it not only matters that you pass a test, it also matters that you didn't go out and make 10 steps that were unnecessary, because the human is going to be waiting on the other end for all of those changes. So we have metrics for all of these things and we're able to put each model through, I guess, a suite of tests that give us metrics, and that's the way we decide, hey, this is a good model for our end users, right? And that's like the high-level way that we go about testing. And these tests, you know, they sound great in theory, but in practice, what does it look like? Like I'm going to
04:00 - 04:30 assume you're going to have you know I we can imagine us engineers who've been writing you know code uh probably not autonomous vehicles but similar ones you know we know our unit test our integration tests if you do mobile you know your end to end test I'm assuming this will be a little bit different but with some similarities like do you actually like code some scenarios you have like example codes example prompts and and then so I assume you can do a bit of that but then what else and and you know how does this all come together and like how can I imagine in this test
04:30 - 05:00 suite, is it like one big giant blob that runs for I don't know how long? Yeah, one of the aspects of code that is really good is it can be run, right? So it's not a very, you know, touchy-feely kind of thing in the end; a test can be passed. So what we can do is take a bunch of open source repositories, find previous pull requests or commits that actually not only add tests but also add the implementations correspondingly. And what we can do is, instead of just taking the commit description, we can remake what the description of the commit should have
05:00 - 05:30 been, like a very high-level intent, and then from there it becomes a very, I guess, programmatic problem. Which is to say, hey, first of all, find the right files that you need to go and make changes to, right? Then there is a ground truth for that, because the base code actually has a set of five or ten files that changes were made to. Then after that, what is the intent on those files? You can actually go from the ground truth backwards (you know what the final change was from the actual code) and you can have the model generate that intent, and then after that you can see
05:30 - 06:00 if the edit, given that intent, is correct. So you now have three layers of tests, which is that, hey, did I retrieve the right things, did I have the high-level intent correct, and is the edit performance good, right? And then you can imagine doing much more than just that, but at a high level now, just from a pure commit or a pure actual ground-truth piece of code, you now have multiple metrics that you can go about. And then obviously the final thing you can actually do is run the code, right?
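To make the pipeline Varun describes a little more concrete, here is a minimal sketch of the commit-mining idea: build evaluation cases from real commits that add both tests and implementation, regenerate a high-level intent from the ground-truth diff, then score retrieval, edit quality, and whether the tests actually pass. The helpers `llm_generate_intent`, `retrieve_files`, `run_agent`, `run_tests`, and `count_unneeded_changes` are hypothetical placeholders, not Windsurf's actual internals.

```python
# Hedged sketch of the commit-mining evaluation idea described above.
import subprocess
from dataclasses import dataclass

@dataclass
class EvalCase:
    repo: str
    base_commit: str          # state of the codebase before the change
    gold_files: list[str]     # files the real commit touched (ground truth for retrieval)
    intent: str               # regenerated high-level description of the change
    gold_diff: str            # the actual edit, used to judge edit quality

def changed_files(repo: str, commit: str) -> list[str]:
    out = subprocess.run(
        ["git", "-C", repo, "diff-tree", "--no-commit-id", "--name-only", "-r", commit],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f]

def build_case(repo: str, commit: str) -> EvalCase | None:
    files = changed_files(repo, commit)
    impl = [f for f in files if "test" not in f]
    tests = [f for f in files if "test" in f]
    if not impl or not tests:          # keep only commits that add tests AND implementation
        return None
    diff = subprocess.run(["git", "-C", repo, "show", commit],
                          capture_output=True, text=True, check=True).stdout
    intent = llm_generate_intent(diff)  # hypothetical: work backwards from the ground-truth diff
    return EvalCase(repo, commit + "^", files, intent, diff)

def score(case: EvalCase) -> dict:
    retrieved = retrieve_files(case.repo, case.base_commit, case.intent)   # layer 1: retrieval
    edit = run_agent(case.repo, case.base_commit, case.intent, retrieved)  # layers 2-3: intent + edit
    return {
        "retrieval_recall": len(set(retrieved) & set(case.gold_files)) / len(case.gold_files),
        "tests_pass": run_tests(case.repo, edit),                 # final check: actually run the code
        "redundant_steps": count_unneeded_changes(edit, case.gold_diff),
    }
```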
06:00 - 06:30 It's not just, you know, like when you measure some of these chat products, where the evaluation is a little bit different, which is to say the evaluation is you give it to multiple humans in a blind test, in an A/B test, and you ask them which one did you like more. Obviously, for us to quickly evaluate, we can't be giving it to tens of thousands of humans in a second, and with this, now within minutes we can get answers to what the performance is on tens of thousands of repositories and tests, basically. This episode is brought to you by Modal, the cloud platform that makes AI development simple. Need GPUs without the headache? With Modal, just add one line of code to
06:30 - 07:00 any Python function and boom, it's running in the cloud on your choice of CPU or GPU. And the best part: you only pay for what you use. With sub-second container start and instant scaling to thousands of GPUs, it's no wonder companies like Suno, Ramp, and Substack already trust Modal for their AI applications. Getting an H100 is just a pip install away. Go to modal.com/pragmatic to get $30 in free credits every month. That is modal.com/pragmatic.
07:00 - 07:30 This episode is brought to you by Code Rabbit, the AI code review platform transforming how engineering teams ship faster without sacrificing code quality. Code reviews are critical but time-consuming. Code Rabbit acts as your AI co-pilot, providing instant code review comments and potential impacts of every pull request. Beyond just flagging issues, Code Rabbit provides one-click fix solutions and lets you define custom code quality rules using AST grep patterns, catching subtle issues that traditional static analysis tools might
07:30 - 08:00 miss. Code Rabbit has so far reviewed more than 5 million pull requests, is installed on 1 million repositories, and is used by 50,000 open source projects. Try Code Rabbit free for one month at coderabbit.ai using the code Pragmatic. That is coderabbit.ai, and use the code Pragmatic. I really like how much engineering you can bring in because it's code and because we have repositories, because you can use all these things; it feels
08:00 - 08:30 to me it gives it a bit of an edge over some of the other use cases, just as you mentioned. No, I think you're totally right. Like, we think about this a lot, what would have happened if we were to pick a different sort of category entirely. It's just, I think the ground truth is just very hard. You don't even know if the ground truth is great, right? In some cases, for all we know, the ground truth is not good. But in this case, I think it's a lot easier because of the verifiability; if you have a good test, it's a lot easier to verify software
08:30 - 09:00 And can you give us a sense of what the team behind Windsurf is, and also how complex this thing is and how it even came about? Because for all I know, you know, like a few months ago when this podcast started, there was no Windsurf. There was Codeium. We actually talked a bit about what Codeium was, a little bit different, and then out of nowhere, boom, Windsurf comes out. A week later, already, in The Pragmatic Engineer about 10% of people that we surveyed were already using it, which was, I think, the second-largest usage of tools, and people were enthusiastic about
09:00 - 09:30 it. But I assume there's more to this; it didn't just come out of, you know, nothing, right? Yeah, so happy to talk a little bit about our story and summarize it. So we started the company now close to four years ago, which is substantially before, I guess, the Copilot and ChatGPT sort of moment. Um, a lot of us at the company, as I mentioned previously, worked, I would say, on these hard tech problems, you know, AR/VR, autonomous vehicles, and I guess at that point what we started building out, and we had a
09:30 - 10:00 different company name at that point, it was called Exafunction, we started building out GPU virtualization systems. So we built out systems to make it very fast and efficient to run GPU-based workloads, and we would enable companies to run these GPU workloads on CPUs and we would transparently offload all GPU computations to remote machines. And that could be CUDA kernels all the way down to full-on model calls, right? It was a very low-level abstraction that we provided people, so much so that if the remote machine died, we would be
10:00 - 10:30 able to reconstruct the state of what was on that GPU on another GPU, right? And the main use case we targeted were these large-scale simulation workloads for the deep learning workloads that a lot of these robotics and autonomous vehicle companies had. And we thought, hey, the world was going to look like that in the future: a lot of companies would be running deep learning workloads. What ended up happening was, in the middle of 2022, I think text-davinci-003 sort of came out, which was, I guess, the GPT-3 sort of instruction model, and I guess that changed a lot of our priors, like both me and my
10:30 - 11:00 co-founder's priors, which is to say we thought that the set of models that would run was going to look a lot more homogeneous, right? If you were to imagine, in the past the number of different models that people would run was very diverse, right? People would run convolutional neural nets, recurrent neural nets, LSTMs, graph neural nets; there was a whole suite of different types of models. We thought in that case, hey, if we were an infrastructure company, we can make it a lot easier for these companies to run these workloads. But the thing is, with text-davinci-003, we actually thought that there would be a
11:00 - 11:30 simplification of the set of models that would run. Why go out and train a very custom BERT model if you could go out and just ask a very large general model, is this a positive or negative sentiment? And we thought that that was where the puck was going. I guess for us, we believe in scaling laws and all these things: if it's this good today, how good is a much smaller model going to be in two years? It's probably going to be way better. So what we decided to do was actually focus on the application layer, take the infrastructure that we had and actually build out an application. And that was what Codeium was. So we built out extensions in all the major IDEs, right?
11:30 - 12:00 And and very quickly we were able to get to get to that point. And we actually did train our own models and and and run them ourselves with our own inference stack. And the reason why we did that is at the time the models were not very good. the open models are not very good and also for the workload that we had which was autocomplete. It was a very weird workload. It's not very similar to the chat workload. Code is in a very incomplete state. You need to fill in code in the middle of a line. There's a bunch of reasons why this workload is not very similar and we thought we could do a much better job. So we provided that because of our infrastructure
12:00 - 12:30 background for free to basically every developer in the world. There was no way to pay for the product. And then very quickly enterprises started to reach out. we were able to handle the security requirements and personalization because the companies not only care about hey it's fast it's free but is this the best code for my company right and we were able to meet that workload and then fast forward to today and I I know that this is this is this is a long answer what we felt was agents in the beginning of last year would be very huge the problem was the the models were not there yet right
12:30 - 13:00 We had teams inside the company building these agent use cases and they were just not good enough. But by the middle of last year we were like, hey, it's actually going to be good enough, but the problem is the IDE is going to be a limitation for us, because VS Code is not evolving fast enough to enable us to provide the best experience for our end users. In a world in which agents were going to write 90% or 95% of software, developers would still be in the loop, but the way they would interact with their IDEs would look markedly different, and that's why we ended up building out Windsurf in the first
13:00 - 13:30 place. We thought that there was a much higher ceiling on what IDEs could provide, and with the agent product, which is Cascade, we were able to deliver what we felt was a premier experience right off the bat that we couldn't have with VS Code. How large is the team who's working on Windsurf, and how complex is Windsurf as a product? Like, I'm not sure how much we can quantify it. Yeah, you know, I try to be pretty modest with some of these things, but just to say, we're a pretty small team. So right now the engineering team is a bit
13:30 - 14:00 over 50 people. Um, maybe that's large compared to other startups, but if I were to compare it to other, you know, large engineering projects in the grand scheme of things: one of the books that I read a while ago was this book called Showstopper, right, and it's this book about how Microsoft built Windows NT, and it's a much larger team obviously, but operating systems are a very complex piece of software. My viewpoint on this is that this is a very, very complex piece of software in terms of where the
14:00 - 14:30 goalpost is, which is to say the goalpost is constantly moving, right? One of the goals that I give to the company is that we should be reducing the time it takes to build applications by 99%, right? And I would say pre-Windsurf it was probably 20 and post-Windsurf it was probably over 40, but we are very far from 99, right? We're still, you know, a 60x away from 99; if there are 60 units of time and we want to make it one, we're quite far. So in my head there's a lot of different engineering
14:30 - 15:00 projects that we have at the company in fact like I would say over maybe close to half of the engineering team is working on projects that have not seen the light of day right and and that's like an interesting decision that I guess we've made Because I think we cannot be embracing incremental, right? Like we're not going to win and be a a valuable company to our customers if all we're doing is changing the location of buttons. Like I think people will like us for great UI, but that cannot be the only reason why we win. No. And I love it. I mean this is, you know, when you're a startup, I think you need to aim really big. You cannot just do
15:00 - 15:30 incremental. You can do incremental later. Hopefully, you're going to get there. And what are some interesting numbers that you can share about the usage of Windsurf, or the load that you're handling? I'm assuming it's pretty easy to tell it will keep going up, right? That's an easy prediction. No, I think you're right. So, one of the interesting numbers, or a handful of numbers, is within a couple months of the product existing, we had well over a million developers try the product. So it's been growing quite quickly. With pricing coming
15:30 - 16:00 out, within a month we reached over sort of eight figures in ARR. Um, and I think all of those are kind of interesting metrics, but also on top of that, we still run our own models in a lot of places. Like, you can imagine the fast passive experience is completely our own model. A lot of the models to go out and retrieve parts of the codebase and find relevant snippets are our own models. And that system processes well over sort of 500 billion tokens of code every day right now. So that system itself is huge. It's
16:00 - 16:30 a huge workload, um, that we actually run. Yeah. And I guess the history of Windsurf is interesting; I understand that you've actually been building your own models for quite some time. You know, you've not just started here, because I think for most engineering teams that would be daunting and also it's just a lot of time, right? Like, it's harder to do from scratch, I'll say that, because nothing's impossible here. I totally agree with you. I think, you know, one of the weird things is
16:30 - 17:00 because of the time that we started and the fact that we were there in the very beginning. First of all, we had the infrastructure background, but we were first saying we need to go out and build an autocomplete model. The best model at the time that was open source, end of 2022, was Salesforce CodeGen, and I'm not saying it was a bad model, it was awesome that Salesforce did open source that model, but it was missing a lot of capabilities that we needed for our product, right? It was missing fill in the middle, which feels like a very, very obvious capability. Fill in the middle, what is that? So the
17:00 - 17:30 idea of fill in the middle is basically, if you look at the task of writing software, it's very different than chat. And maybe an example of what chat is: you're always appending something to the very end and maybe adding an instruction. But the problem for writing code is you're writing code in ways that are in the middle of a line, in the middle of a snippet of code. That kind of stuff. Yeah. In the middle of a function. And the problem there is actually there are a lot of issues that pop up, which is to say, actually, the tokenization. So these models, when they consume files, they actually tokenize the
17:30 - 18:00 files, right? Which is, they don't consume them byte by byte, they consume them token by token. But the fact is that the code, when you write it, at any given point doesn't tokenize into something that looks in-distribution. And I'll give you an example: how many times do you think the training data set for these models sees "retu" instead of "return", just without the "rn"? Probably never. It probably never sees that, so it's completely out of distribution, but we still need to, when we see "retu", predict that it's going to be "rn", space, a bunch of other stuff, right? It sounds like a very
18:00 - 18:30 small detail, but that is actually very important if you want to build a product, and that is a capability that cannot be slightly post-trained onto the models. It's actually something where you need to do a non-trivial amount of training on top of a model, or pre-train, to get that capability, and it was table stakes for us to provide that for our users. So that forced us very early on to actually build out our own models and figure out training recipes and make sure we could run them at massive scale ourselves for our end users.
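A minimal illustration of the fill-in-the-middle idea described here. The sentinel strings below are generic placeholders (real models define their own special tokens), and the example deliberately cuts the code mid-token, at "retu", exactly the out-of-distribution case Varun mentions.

```python
# Minimal illustration of fill-in-the-middle (FIM) formatting.
# The sentinel strings are placeholders; real models use their own special tokens.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

source = "def total(items):\n    retu<CURSOR>\n"
cursor = source.index("<CURSOR>")
prefix = source[:cursor]                       # ends mid-token: "...    retu"
suffix = source[cursor + len("<CURSOR>"):]     # everything after the cursor

# Training/inference sequence: the model sees prefix + suffix, then generates the middle.
prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
completion = "rn sum(items)"                   # what the model must produce: "rn ...", an
                                               # out-of-distribution continuation for a plain
                                               # left-to-right chat model, hence dedicated training
print(prompt)
print(completion)
```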
18:30 - 19:00 And what are other things that are unique in terms of building models for code as opposed to the usual text models? I can think of things like, you know, the brackets, for example, in some languages. Maybe this is just naive; you have all seen so many more. So what makes code interesting or worthwhile enough to build your own model for? Yeah, I think what you said is definitely one thing, the fill-in-the-middle capability. I would say another thing you can do is, code is like quite easy to, and
19:00 - 19:30 "quite easy" is maybe, you know, an overstatement, but quite easy to parse, right? You could actually AST-parse code, you can find relationships in the code. Because code is a system that has evolved over time, you could actually look at the commit history of the code to build a knowledge graph of the codebase, and you can start putting in details. Do you do that? Yep, yeah. We look at the previous commits, and one of the things that it enables us to do is build a probability distribution over the codebase, conditional on you modifying a piece
19:30 - 20:00 of code: what is the probability of you modifying another piece of code? So, you know, when you get into the weeds, code is very information dense, right? It's testable. Um, there's a way that it evolves, people write comments, which is also cool, which is to say, once a pull request gets created, people actually say, I didn't like this code. So there's a lot of signal on what good and bad looks like within a company. And you can use that actually as a way to automatically make the product much better for companies, right?
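As a rough sketch of what a "probability of modifying B given that A was modified" signal could look like when mined straight from git history, here is a file-level version; the real system presumably works at a finer AST or symbol level, and the plumbing below is illustrative only.

```python
# Hedged sketch: estimating "if file A changes, how likely is file B to change too"
# from commit history, at file granularity.
import subprocess
from collections import Counter, defaultdict
from itertools import permutations

def commits_with_files(repo: str, limit: int = 2000) -> list[list[str]]:
    log = subprocess.run(
        ["git", "-C", repo, "log", "-n", str(limit), "--name-only", "--pretty=format:@@@"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [[f for f in block.strip().splitlines() if f]
            for block in log.split("@@@") if block.strip()]

def co_change_probabilities(repo: str) -> dict[str, dict[str, float]]:
    file_counts: Counter[str] = Counter()
    pair_counts: Counter[tuple[str, str]] = Counter()
    for files in commits_with_files(repo):
        file_counts.update(set(files))
        pair_counts.update(permutations(set(files), 2))   # ordered pairs (A, B) per commit
    probs: dict[str, dict[str, float]] = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        probs[a][b] = n / file_counts[a]                   # empirical P(B modified | A modified)
    return probs
```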
20:00 - 20:30 You know, one of the things that all of us were talking about, I would say, a couple years ago (and I guess we've been here in the space quite a long time; I know a couple years is not a very long time in most categories, but in this category it's, you know, dinosaur years), one of the things that I think is kind of interesting is, in the beginning we were saying, hey, people would write all these guidelines and documentation on how best to use the product. But the interesting thing is code is such a treasure trove, you can go out and probably make a good first cut on what the best way to write software
20:30 - 21:00 is inside JPMorgan Chase, inside Dell. You can go out and do that by using the rich history inside the company. So there's a lot of things that you can start doing autonomously as well, if that makes sense. Yeah, one thing I'd love to get your take on is how it might have changed. A year or two ago, when Copilot started to become more popular (on earlier versions, companies like Sourcegraph and others had started to build out their capabilities), there was this debate of
21:00 - 21:30 would it be worth fine-tuning a model on my company's codebase, talking about large companies, let's say JPMorgan or those others. And there were two strands of thought: one said, oh, it's probably worth it because our code is so unique, and some other people were thinking it might not be worth it because it might be too resource-intensive, the models are too generic. Did you try this out, and where did you land on this? Because I never got an answer to, you know, what happened, what was worth it, what was not worth it in the
21:30 - 22:00 end. So, for what it's worth, we did try it out. Um, we built out some crazy infrastructure to go out and try it out; I guess this will be the first place where I talk about the actual infrastructure. We built out systems. So, transformers have these many layers, right? And we actually enabled companies to self-host at some point in the past; we were enabling companies to self-host the system and the fine-tuning system as well. So at that time you had self-hosting, that is, you built this out? We built out self-hosted not only deployment but also fine-tuning, and
22:00 - 22:30 the way that actually worked was actually quite crazy, which was to say, okay, where do you get the capacity to fine-tune a model if you're already running it for inference? Like, the company may not want to give you so many GPUs. So we just said, hey, why don't we use the preemptible time, which is to say, when the model is not running inference, what if we actually go out and do backprops on the transformer model while this is happening? And then what we found was, oh, the backprops take a long time and it might cause downtime on the inference side. So what we enabled was, we
22:30 - 23:00 enabled the backprop to be preemptable on every layer of the transformer. Which is to say, let's say you send an inference request and it's doing back propagation and it's on layer 10: it'll just stop at layer 10 and it will continue after your inference request completes. So we built a lot of crazy systems to actually go out and do this. I guess here's the thing we found: we found that fine-tuning was a bump, but it was a very modest bump compared to what great personalization and great retrieval could do. That's what we found.
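A toy sketch of the scheduling idea: backprop walks the transformer one layer at a time and yields to any pending inference request between layers, so inference latency isn't hurt. `Layer.backward`, `serve`, and the queue wiring are illustrative stand-ins under assumed names, not the actual system.

```python
# Toy sketch of layer-by-layer, preemptable backprop sharing a GPU with inference.
import queue

class Layer:
    def backward(self, grad):
        ...                                  # compute this layer's gradients (placeholder)
        return grad

def preemptable_backward(layers: list[Layer], loss_grad, inference_q: queue.Queue):
    grad = loss_grad
    for layer in reversed(layers):           # one transformer layer at a time
        grad = layer.backward(grad)
        while not inference_q.empty():       # preemption point between layers:
            request = inference_q.get_nowait()
            serve(request)                   # hypothetical: run the inference request first,
                                             # then resume backprop where it stopped
    return grad
```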
23:00 - 23:30 Now, does that mean fine-tuning in the future is not going to be valuable? I think actually per-person fine-tuning could work quite well. I think, though, maybe some of the techniques we use for it are going to need to change. And here's the way I like to look at it, right? Anytime you build a system, there are many ways to improve it. Some of them are much easier than other ways. And you can imagine there's a hill to climb for everything, and some hills are much easier. And the right strategy, when a hill is much easier and it provides a lot of value, is to climb that hill fully before you go out and do something that's a lot harder. Because
23:30 - 24:00 when you do the thing that's a lot harder, you are adding some amount of tech debt if that's not the right solution, right? What I described to you, the solution of doing backprop on a layer-by-layer basis, it's a cool idea, but you could imagine it added a lot of technical complexity to the software that might have been unnecessary if we thought that purely doing better retrieval was going to be much better. So there's this tightrope to balance on in how you decide these things. Now, I was asking around; I've been using Windsurf as well, but I'm not a very
24:00 - 24:30 heavy user, but I have been asking around more heavy users, and one of the biggest criticisms, both of Windsurf but also of every tool in this area, has been: look, I start off, it's good, it works well, I have a relatively small project; my project grows, either because Windsurf generates code or it's just a big project, and after a while it starts to struggle with the context, maybe it doesn't see, you know, part of it, it gets confused, etc. And clearly, as an engineer, I understand that it is going to be a problem of, like, you have a
24:30 - 25:00 growing context window, you still want to have similar quality. How do you tackle this challenge? What what progress have you made? And like I I I think this is a bit of a million-dollar question in the sense of like if if we could somehow have a solution for this, we would be better off. Uh where have you gotten on this? I'm assuming this is a pretty common challenge and difficult. I think I think it's a very hard problem. Um you're you're totally right. There's a lot of things that we can do which is to say obviously we need to work around the fact that the models
25:00 - 25:30 don't have infinite context and when they do have larger and larger context you are paying a lot more and you take a lot more time right and developers usually a lot of the time don't really want to wait and you know one of the things that we have for our products we hate waiting. Yeah. Exactly. But one of the things that we have for our products that we've learned is if you make a developer wait the answer better be 100% correct. And I don't think we're at a time right now where I can guarantee you with like a magic wand that all of our cascade responses are 100% correct, right? I don't think we're at that right now. So there's a lot of a lot of things
25:30 - 26:00 that we need to do that are almost approximations, right? Um how do we keep a very large context? But despite that, we have chats that are so long that how do you accurately checkpoint the past conversation? But that has some natural lossiness attached to it, right? And then similarly uh if the codebase gets very large how do we get very very confident that the retrieval is very good and we have evaluations for all of these things right this is not something which we're like shooting in the dark and being like hey yolo let's try a new approach and like give it to half of our users um but I think you're totally
26:00 - 26:30 right. There's no, I don't think there's a complete solution for it. What I think it's going to be is a mixture of a bunch of things, which is to say much better checkpointing, coupled with better usage of context length, much faster LLMs, and much better models. So it's not going to be, I think, a silver bullet. And by the way, that could be tied with, hey, you know, understanding the codebase much better: from the perspective of a codebase that already existed, being able to use the knowledge graph, being able to use a lot of the dependencies within the codebase a lot
26:30 - 27:00 better. So it's a bunch of things that I think are going to multiply together to solve the problem. I don't think there's going to be one silver bullet that makes it so you're going to be able to have amazingly coherent conversations that are very, very long, basically. To be fair, as an engineer, this might feel weird, but it makes me feel a bit better that we're actually back to talking about engineering step by step, as opposed to, okay, you know, having these... it feels like, not now, but early on, when we got a new model it was like, oh my gosh, this magic, and it took a while to understand how it works, how it's broken down, etc.
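One simple way to picture the "checkpointing with natural lossiness" idea: once a conversation exceeds a token budget, older turns get collapsed into an LLM-written summary and only recent turns stay verbatim. `count_tokens` and `llm_summarize` are hypothetical helpers, and real systems are certainly more involved than this sketch.

```python
# Illustrative sketch of checkpointing a long agent conversation under a token budget.
def checkpoint_conversation(turns: list[str], budget: int, keep_recent: int = 6) -> list[str]:
    if sum(count_tokens(t) for t in turns) <= budget:
        return turns                                  # still fits, nothing to do
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = llm_summarize("\n".join(old))           # the natural lossiness lives here
    return [f"[checkpoint summary] {summary}", *recent]
```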
27:00 - 27:30 You did mention your infrastructure. Can you talk a little bit about how we can imagine your hardware and backend stack if I were to join Windsurf as an engineer? Like, is it going to be a bunch of, you know, cloud deployments here and there? Do you self-host some of your GPUs? Because a lot of AI startups who are smaller or more modest, they just go to, you know, a platform as a service. It sounds like you might be at the scale where maybe you're outgrowing this as well. Yeah, I think we
27:30 - 28:00 might have just never done, kind of, you know, buying off-the-shelf stuff in the early part of the company. Maybe due to your background; I keep forgetting this. Yeah, but even more than the background, I think there were cases where we could have, and maybe should have. One of the reasons why we also didn't was, very quickly we got brought into working with very large enterprises, and I think the more dependencies you have in your software, it just makes it harder and harder for these larger companies to integrate the technology, right? Like, they don't want a ton of subprocessors
28:00 - 28:30 attached to it, right? We recently got FedRAMP compliant, FedRAMP High compliance. We're the only AI software assistant with FedRAMP High compliance, and the only reason why that's the case is we've kept our systems very tight, right? And for these compliances (I did some, but not specifically FedRAMP), what do you need to prove that you are compliant? Yeah, I think basically you need to map out the high levels of sort of all the interactions. You need to be very methodical about releases and how the releases make it
28:30 - 29:00 into the system. You need to be very methodical about where data is persisted, at a layer that is probably much deeper than SOC 2. I think going through SOC 2 versus FedRAMP... I did SOC 2 and that was already pretty long. So it was already a really long, yeah. It's impressive that you did this at a startup scale, congrats. Yeah, one of the reasons why was, I guess, one of our first customers that were a large enterprise was Dell, right? Which is not a usual first large enterprise. And I guess for startups, no. For a startup, definitely no.
29:00 - 29:30 So it forces down a path of how do we build very scalable infrastructure? How do we make sure our systems work at a codebase that is 100 plus million lines of code? What does our GPU provisioning need to look like for this large a team? It's just forced us to become a lot more I guess operationally sound um for these kinds of problems if that makes sense. Yeah. And how do you deal with with inference? You're you're you're serving the these systems that serve probably billions or hundreds of billions well actually hundreds of billions tokens per
29:30 - 30:00 day as just as you just said with low latency. What smart approaches do you do to to do this? What kind of optimizations have you looked into? Yeah, I mean like a lot as you can imagine. Um, one of the interesting things about some of the products that we have like the passive experience, latency matters a ton in a way that's like very different than some of these API providers. Like I think for the API providers, time to first token is important, but it doesn't matter that time to first token is 100 milliseconds. For us, that's the that's the bar we are trying to look for, right? Can we get it
30:00 - 30:30 to sub, you know, a couple hundred milliseconds and then hundreds of tokens a second um of for the generation time. So much faster than what all of the providers are providing in terms of throughput as well. um just because of how quickly we want this product to kind of run. And you can imagine there's a lot of things that we we want to do, right? How do we run how do we do things like speculative decoding? How do we do things like model parallelism, right? How do we make sure we can actually batch requests properly to get the maximum utilization of the GPU all the while not hurting latency, right? That's
30:30 - 31:00 an important thing. And one of the interesting things, just to give some of the listeners a mental model: GPUs are amazing, they have a lot of compute. If I were to draw an analogy to CPUs, GPUs have over sort of two orders of magnitude more compute than a CPU, right? It might actually be more on the more recent GPUs, but keep that in mind. But GPUs only have an order of magnitude more memory bandwidth than a CPU. So what that actually means is, if you do things that are not compute-intense, you will be memory-bound, right? So that necessarily means, to get
31:00 - 31:30 the most out of the compute of your processor, you need to be doing a lot of things in parallel. But if you need to wait to do a lot of things in parallel, you're going to be hurting the latency. So there are all of these different trade-offs that we need to make to ensure a quality of experience for our users that we think is high for the product. And we've obviously mapped out all of these. We've seen, hey, if we change the latency by this much, what does this change in terms of people's willingness to use the product? And it's very stark, right? Like, a 10-millisecond increase in latency affects people's willingness to use the product materially; it's percentage points that we're talking about.
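A back-of-the-envelope way to see the compute-versus-memory-bandwidth point: during token-by-token decoding, every weight has to be streamed from memory once per step, while the compute grows with the batch size, so small batches leave the GPU memory-bound. The hardware and model numbers below are rough illustrative placeholders, not any specific GPU's spec sheet or Windsurf's models.

```python
# Rough model of the compute-vs-memory-bandwidth trade-off described above.
PEAK_FLOPS = 1.0e15        # ~1 PFLOP/s of matmul throughput (illustrative)
MEM_BANDWIDTH = 3.0e12     # ~3 TB/s of HBM bandwidth (illustrative)
BYTES_PER_PARAM = 2        # fp16/bf16 weights

def decode_step_time(params: float, batch: int) -> tuple[float, float]:
    """Time per token-generation step if memory-bound vs compute-bound."""
    memory_s = params * BYTES_PER_PARAM / MEM_BANDWIDTH          # weights streamed once per step
    compute_s = 2 * params * batch / PEAK_FLOPS                  # ~2 FLOPs per param per sequence
    return memory_s, compute_s

for batch in (1, 64, 512):
    mem_s, comp_s = decode_step_time(params=10e9, batch=batch)   # hypothetical 10B-param model
    bound = "memory-bound" if mem_s > comp_s else "compute-bound"
    print(f"batch={batch:4d}  memory={mem_s*1e3:.2f} ms  compute={comp_s*1e3:.2f} ms  -> {bound}")
# Small batches leave the GPU memory-bound; batching more requests raises utilization,
# but waiting to assemble a big batch adds latency, exactly the trade-off being described.
```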
31:30 - 32:00 So these are all parts of the inference stack that we've needed to optimize. And is latency important enough that location factors into this, how physically close people using Windsurf are to wherever your servers and your GPUs are running? Do you need to worry about that as well? You do need to worry about that; speed of light starts mattering. Interestingly, you know, this is not something I would have expected, but we
32:00 - 32:30 do have users in India and interestingly, the speed of light is not actually what is bottlenecking their uh their performance. It's actually the local network. So, just the time it takes for the packet to get from maybe like from their home to the major ISP is actually somehow there's a lot of congestion there and that's the kind of stuff that we need to kind of deal with. Um, but by the way, that is something that we just cannot solve right now. So you're totally right. The data center placement matters. Like for instance, if you force a data center in Sydney and
32:30 - 33:00 you have people in Europe, they're not going to be happy about the latency. So we do think about where the location of our GPUs is to make sure that we do have good performance. But there are some places where there are some issues that even we can't get around, basically. No, the last time I heard this complaint, before Windsurf, because this came up: someone who's using Windsurf and other tools a lot said that specifically for one of the tools he can tell that the data centers are far away, because it's just slow. Cloud development environments had the exact same thing
33:00 - 33:30 because they were similar, right? Like, I'm not sure they're as popular right now, but there was a time where it looked like it might be the future: you just log on to your remote environment, which is running on CPUs or GPUs somewhere else. And again, I think it might have to do with what happens when you're typing: when I'm using it, I'm just used to sub-second, probably sub few-hundred-millisecond responses. I just notice, right? You feel it's slow and it just bothers you. No, I agree. I think if I
33:30 - 34:00 had to even see every time I typed a keystroke like a couple hundred milliseconds later the the key would show up like I would rage quit. I that would be a terrible experience for me. How do you deal with indexing of the code? So you're you're going to be indexing you know depends on the code base it'll be more or less but if you add it up I'm sure we're talking billions or or a lot more in in code and for for your enterprise customers you might actually have you know the hundreds of millions or or even more lines of code. Is is there anything like novel or interesting that you're using
34:00 - 34:30 or is it just kind of tried and proven things for example that search engines might might use? It's a little bit of both to be honest and and what I mean by that it's not a very clean answer. We do try approaches that are embedding based. We have approaches that are keyword based um on the on the indexing. Um, interestingly actually one of the approaches that we've taken that's very different than search and maybe actually systems like Google actually do this is we not only actually look at just the retrieval we do a lot of computation at
34:30 - 35:00 retrieval time. So what that means is, let's say you want to go out and ask a question; one of the things that you can go out and do is ask it to an embedding store and get a bunch of locations. What we found was the recall of that operation was quite low, and one of the reasons why that happens is embedding search is a little bit lossy. Like, let's say I were to go to a codebase and ask, hey, give me all cases where this function, this Spring Boot version XT type function, was there. I don't think
35:00 - 35:30 anyone would believe embedding search would be comprehensive, right? Because it's just not: you're taking something that is very high-dimensionality and reducing it to something very low-dimensionality without any knowledge of the question. That's the most important thing: it needs to somehow encode what could be relevant for all the possible questions. So instead, what we decided to do is take a variety of approaches to retrieve a large amount of data. That could include the knowledge graph, that could include the dependencies from the abstract syntax tree, that could include keyword search, that could
35:30 - 36:00 include embedding search, and you kind of fuse them all together. And then after that we throw compute at this and actually go out and process large chunks of the codebase at inference time and go out and say, hey, these are the most relevant snippets. And this gives us much higher precision and recall on the retrieval side. And by the way, that is very important for an agent, because imagine if an agent doesn't have access to and doesn't deeply understand the codebase, all the while the codebase is much larger than the context length of what an agent is able to take in, right?
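A hedged sketch of the "retrieve broadly with several cheap signals, then spend compute re-ranking against the actual question" approach: `keyword_search`, `embedding_search`, `graph_neighbors`, and `llm_rerank` are hypothetical placeholders, and reciprocal rank fusion is just one common way to fuse ranked lists, not necessarily what Windsurf uses.

```python
# Illustrative hybrid retrieval: fuse several cheap retrievers, then re-rank with compute.
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, snippet_id in enumerate(ranking):
            scores[snippet_id] += 1.0 / (k + rank + 1)     # reward items ranked high anywhere
    return sorted(scores, key=scores.get, reverse=True)

def retrieve(question: str, codebase) -> list[str]:
    candidates = reciprocal_rank_fusion([
        keyword_search(question, codebase),          # exact identifiers, grep-style
        embedding_search(question, codebase),        # semantic but lossy
        graph_neighbors(question, codebase),         # AST / dependency / co-change signals
    ])
    # Throw compute at the problem at query time: re-score a wide candidate set
    # against the concrete question instead of trusting any single retriever.
    return llm_rerank(question, candidates[:200])[:20]
```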
36:00 - 36:30 So optimizing the precision and recall of the system is actually something that we spent a lot of time on and built a lot of systems for. It's interesting, because it shows how, well, it's code, so you can more easily work with it, especially with certain keywords. For example, in some languages I can imagine that you can even just list all the keywords that are pretty common and decide if something is a keyword or if it's something special, and if it's a keyword, you can already just handle it. And it's interesting how you can combine the kind of old school, old school being before
36:30 - 37:00 LLMs, and then add the best parts of LLMs, but not forget about what worked before. That's right. I just wonder if there's any other industry that has this, where we have this lower-dimensionality space in terms of the grammar and all these things. We understand the usage pretty well, and the users are power users; the same people who use it could actually build, you know, this tool. Yeah. I, you know, I
37:00 - 37:30 feel I feel like Google's like Google's system is probably ridiculously complex and sophisticated for obvious reasons just because for for one they've been doing this for so long and they've been the they obviously they were they've been at the top for such a long time and then also on top of that they're the monetary value they get from delivering great search is so high given ads that they are incentivized to throw a lot of compute even at at the at the query time right to make sure that the quality of suggestions is is really good so I assume they're doing a lot of a lot of
37:30 - 38:00 tactics there. Obviously, I'm not privy to all the details of the system, so I don't know. Well, it's interesting, because I would have agreed with you until recently, but there are some search engines that are getting really good results. So I wonder if Google is less focused on the actual haystack and the needle and maybe more on revenue, or maybe they're doing it invisibly. I'm sure they're doing an amazing job under the hood, by the way, but I wonder if some of that knowledge has been commoditized. But, you know, we'll see. But moving on from indexing: in terms of databases, what kind of databases do you use and
38:00 - 38:30 what kind of databases do you use, and what challenges do they give you? Again, I'm assuming you're not just going to be happy with the usual "throw everything in Postgres" — or maybe you are; these days Postgres can apparently be used surprisingly well even for embeddings. Yeah, you know, we do a combination of things. We do some amount of local indexing and some remote indexing as well. Local indexing is on the user's machine, and in some
38:30 - 39:00 ways the benefit is that it helps you build up an understanding of the codebase. The problem is that understanding changes very quickly as the user changes code and checks out new branches, and you don't want to throw away all of your information about the codebase because of that. So it's good to have some information about the user's history and what they've done locally stored there. In terms of remote, I think it's a lot simpler than people would imagine. One of the complexities of our product — the reason the
39:00 - 39:30 product is very complex — is actually the fact that we need to run all of this GPU infrastructure. That's a large chunk of the complexity, because if you look at our QPS, it's high, but it's not tens of thousands of QPS. It doesn't need to be that high, because each query that happens is a really expensive query: it's doing trillions of operations remotely. So the real complexity of the problem is how you do that process optimally. We
39:30 - 40:00 can actually get away with things like Postgres. In fact, I like to keep things pretty simple where it's possible, and we should not be rolling our own database — databases are very complex pieces of technology. I think we're good engineers, but we're definitely not good enough to build our own database on the side. And then for local indexing, what database do you use? Yeah, we have our own combination: a local SQL database, and then some embedding stores that we keep locally as well.
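A rough sketch of what a local index along these lines could look like — code chunks plus embeddings in a single SQLite file, with a branch column so a checkout doesn't invalidate everything. The schema and names are illustrative assumptions, not Windsurf's actual design.

```python
# Hypothetical local index: chunks of code in SQLite, with an embedding
# stored alongside each chunk. Not a real product schema.
import json
import sqlite3

def create_index(db_path: str = "local_index.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chunks (
               id INTEGER PRIMARY KEY,
               path TEXT,          -- file the chunk came from
               branch TEXT,        -- keep per-branch state so a checkout
                                   -- does not throw everything away
               text TEXT,
               embedding TEXT      -- JSON-encoded float vector
           )"""
    )
    return conn

def add_chunk(conn, path, branch, text, embedding):
    conn.execute(
        "INSERT INTO chunks (path, branch, text, embedding) VALUES (?, ?, ?, ?)",
        (path, branch, text, json.dumps(embedding)),
    )
    conn.commit()
```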
40:00 - 40:30 What is your view on the value of embedding databases? This has been an ongoing debate ever since ChatGPT became big. There were two schools of thought. One is that we do need embedding databases because they give us vector search and all these other features that LLMs and embeddings need. The other school of thought is: let's just extend relational databases, add a few extra indexes, and boom, we're done. You're more of a user of this, but you're a heavy user at Windsurf and
40:30 - 41:00 Codeium. What pros and cons are you seeing? I'm just trying to get you to pick one direction or the other. It's a good question. Our viewpoint on embeddings is that they don't solve the problem by themselves — they just don't. So the answer is going to be mixed. Then the question is why we even use them in the first place, and I think it boils down to a recall problem. When you want to do good retrieval, you need the set of candidates you're willing to consider
41:00 - 41:30 to be large and high-recall. If you think about it, the problem is that if you only have something like keyword search on a very large codebase, what happens if the user typos something? Your recall is going to be bad. The way I like to think about it is that each of these approaches — keyword search, knowledge-graph-based retrieval, all of them — is a different circle, and what you're trying to do is get the union of those circles to give you the highest recall for the
41:30 - 42:00 retrieval query. And embeddings can give you good recall because they distill some of the semantic information about a chunk of code, a file, or a directory. So what I would say is that it's a tool in the toolkit. You cannot build our product entirely on an embedding system, but does the embedding system help? I think it does: it improves our recall metrics and our precision metrics.
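A toy illustration of the "union of circles" recall argument, with made-up files and hit sets:

```python
# Each retriever alone misses relevant files; their union has higher recall.
relevant = {"auth.py", "session.py", "tokens.py", "middleware.py"}

keyword_hits   = {"auth.py", "middleware.py"}             # hurt by a typo'd query term
embedding_hits = {"auth.py", "session.py", "tokens.py"}   # fuzzy, but not exhaustive

def recall(hits: set[str]) -> float:
    return len(hits & relevant) / len(relevant)

print(recall(keyword_hits))                    # 0.5
print(recall(embedding_hits))                  # 0.75
print(recall(keyword_hits | embedding_hits))   # 1.0
```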
42:00 - 42:30 So I talked with your head of research, Nicholas Moy, and he told me about a really interesting challenge you're facing, which he called the split-brain situation. He was basically saying that everyone on the team needs two brains: one that is aggressively in the present, shipping improvements as you go, and one that holds a long-term vision where you're building for the long run. How do you do this? How did you start doing it, and how do you keep doing it? You did mention earlier that half the team is working on other stuff — do you split people so that some
42:30 - 43:00 focus on the short term and some on the long term, or does everyone, including you, juggle these things day to day? It's an interesting one. I don't want to give myself that much credit here — our engineers should probably get most of the credit. But in terms of company strategic direction, both me and my co-founder, the CTO of the company, try to think a lot about how we disrupt ourselves. Because it's very easy to get into the state where,
43:00 - 43:30 hey, I added this cool button, I added this way to control X with a knob, and you keep going down this path. Yeah, your users get very happy, but what happens if tomorrow I told you users don't need to do that at all, and there's an amazing, better experience? Your users are going to ask why they ever needed to do it. So here's the thing: users are right up to a certain point. They will not be able to see — and by the way, if they can, then we should not be doing this — exactly what the future solution should
43:30 - 44:00 be. If our users can see the future solution better than we can, we should just pack up our bags and leave at that point — what are we actually doing here? So you have this tension: you need to build features that make the product more usable today, and our users are 100% right about those. They face pain along many axes that we don't, and we should listen to them. But at the same time, we might have an opinionated stance on where coding, these models, and this product can go, and we should
44:00 - 44:30 be building towards that and expending a large amount of our engineering capital on it. And can you talk about some of the bets you're making? Not necessarily giving everything away, but some promising directions that might or might not work out — or, in the past, promising directions that didn't work out. Yeah, I'll tell you a lot of them. We failed a lot, and I think failing is great. One of the things I tell our engineers is
44:30 - 45:00 that engineering is not like a factory: you have a hypothesis, you go in, and you shouldn't be penalized if you failed. I actually love the idea of "this sounds interesting, we tried it, and it didn't work" — because we at least learned something, and learning something is awesome. I'll give you an example. The agent work we did started even before the beginning of last year, and it was not working for many months. Nick Moy, who you probably spoke with, was the one working on some of this stuff. And
45:00 - 45:30 for a long time, a lot of what he was doing was just not working. He would come to us and we would say, okay, it doesn't seem like it's working, so we're definitely not going to ship this — but let's keep working on it, because we believe it's going to get better and better. It was failing for a long time. We also came out with a review product around the beginning of last year called Forge, for code reviews. We thought it was kind of useful internally at the company and that we could continue to improve it, but people did not find it that useful. It
45:30 - 46:00 was not actually that useful. We went in with the assumption that code reviews take a long time, so what if we could help people? The fact of the matter was that the way we thought we could help wasn't material enough for people to want to take on a new tool. And there are a lot of things we've tried in the past that just didn't work the way we thought they would. For me, I would be totally fine if 50% of the bets we make don't work. Yeah. A lot of startups
46:00 - 46:30 say that, and then after a while, what I notice is that as a company becomes bigger — I saw this at Uber — it's not really the case anymore. On paper failure is embraced, but actually it's not. There's this tricky thing: when it's actually meant, it's awesome; otherwise people start to polish things and make them look good when they're not, pretending something wasn't a failure but a success we're just walking away from. So it's nice to see that you're doing it. What was the thing that turned the agents around — which then,
46:30 - 47:00 I assume, became Cascade? Was it a breakthrough on your end, was it the models getting better, or was it a mix of something else? Yeah, I think it was a handful of things, so I'll walk through it. First of all, the models got better — 100%. Even with all the internal breakthroughs we had, if the models hadn't gotten better, we wouldn't have been able to release this, so I don't want to trivialize that; it was huge. The other two pieces that were quite important: our retrieval stack was also getting better and better, which enabled us to work much better on these larger codebases. I
47:00 - 47:30 guess table stakes is that it's quite good at zero-to-one programming, but the thing that was groundbreaking to us was that our developers on a complex codebase were getting a lot of value from it. And I'd say something quite interesting: ChatGPT by itself wasn't incredibly groundbreaking to our developers inside the company. That's not because ChatGPT isn't a very useful product — it's a ridiculously useful product — it's because you need to think about it from the
47:30 - 48:00 perspective of opportunity cost and how much more efficient you actually get. A lot of our developers have been developers for a long time — I think we have an exceptional engineering team — and they already knew how to use Stack Overflow and all these other tools to get what they wanted. But suddenly, when the model had the capability to understand your codebase and start to make larger and larger changes, it changed the behavior of the people inside the company. And it wasn't just making changes: we built systems to edit code very quickly —
48:00 - 48:30 we built the kind of models that take a high-level plan and make an edit to a piece of code very fast. All of these together made it a workflow our developers wanted to use: we had the speed covered, we had the fact that it understood the codebase well, and we also had massive model improvements to actually call these tools and make iterative changes. I don't want to diminish it — you have all of these, and suddenly you have a real product.
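A hypothetical sketch of that two-stage flow — a stronger model proposes a high-level plan, and a smaller, fast "apply" model turns the plan into a concrete edit. The model calls are stand-ins, not a real API.

```python
# Hypothetical plan-then-apply pipeline. Both model arguments are assumed
# callables (prompt in, text out); they do not correspond to a real SDK.
def propose_plan(strong_model, codebase_context: str, request: str) -> str:
    # The capable (slower) model reasons over retrieved context and
    # produces a step-by-step plan rather than the final code.
    return strong_model(
        f"Given this context:\n{codebase_context}\n"
        f"Describe, step by step, how to: {request}"
    )

def apply_plan(fast_edit_model, file_text: str, plan: str) -> str:
    # The fast model only has to rewrite one file according to the plan,
    # which keeps edit latency low even though the plan came from a
    # slower, more capable model.
    return fast_edit_model(
        f"Plan:\n{plan}\n\nRewrite this file accordingly:\n{file_text}"
    )
```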
48:30 - 49:00 I've been meaning to ask you this: how is the team using Windsurf to develop Windsurf? Because you're doing it, right — you just told me you are. I'm curious from two perspectives. One is technical feasibility: I'm assuming you're not working on the exact same codebase, you have a fork or a separate build or something like that. And on the other hand, do you force people to dogfood? Do people just do it? Do people get stuck on certain versions, or turn on features for themselves? So the
49:00 - 49:30 way we do it is that we have an insiders developer mode, which enables us to test new features. Anyone at the company can build a feature and deploy it to everyone internally — and we now have a large number of developers, so we'll get feedback. Our own developers dogfood releases; they can say, "I hate this thing, please don't ever do this," and it's nice because then we never have to ship it to anyone beyond our own developers. So we have a tiered system at the company: we have our own internal builds, we have
49:30 - 50:00 Next, which is for future-looking features that are a little more raw, and then we have the actual release that we give to developers. We're willing to A/B test things there, but not in a way where we give people a comically bad experience just to A/B test something — people are using this for their real work, and if you're using it for your real work, we don't want to be hurting you.
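A minimal sketch of how such a tiered rollout could be gated in code — feature names and tiers are made up for illustration, not Windsurf's actual mechanism.

```python
# Hypothetical tier gating: insiders see everything, "next" sees pre-release
# features, and the stable release only sees what has survived both.
TIER_ORDER = {"insiders": 0, "next": 1, "release": 2}

FEATURES = {
    # feature name -> the most stable tier it has reached so far
    "new_agent_panel": "insiders",
    "faster_apply":    "next",
    "tab_complete":    "release",
}

def is_enabled(feature: str, user_tier: str) -> bool:
    # A user sees a feature once it has reached a tier at least as stable
    # as the channel they are on.
    return TIER_ORDER[FEATURES[feature]] >= TIER_ORDER[user_tier]

assert is_enabled("new_agent_panel", "insiders")      # internal users see it
assert not is_enabled("new_agent_panel", "release")   # stable channel does not
assert is_enabled("tab_complete", "release")          # fully rolled out
```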
50:00 - 50:30 One thing that's quite valuable to us — you would think this is a failure mode for our company — is that we use Windsurf largely to modify large codebases, for obvious reasons: our developers aren't building toy apps over and over again. But crazily enough, one of our biggest power users inside the company is actually a non-developer. He leads partnerships, he's never written software before, and he routinely builds apps with Windsurf. He's one of our biggest users, we've used this to replace buying other SaaS tools, and he's even deployed some of these tools inside the company. What function is this person
50:30 - 51:00 in? It's partnerships. I'll give you an example. Some of these tools are not complex pieces of software, but you'd be surprised how much they cost — six figures — because it's bespoke software. Take a quoting tool: the idea is you have a customer, the customer has this size, they're in this vertical, they want this kind of deal, here's how it would look, here's the amount of discount we're willing to give them. And usually these are systems you would need to pay a lot of money for.
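A toy sketch of the kind of logic such a bespoke quoting tool encodes — customer size and vertical in, suggested discount out. All rules and numbers are invented for illustration.

```python
# Hypothetical quoting logic for an internal tool; every rule is made up.
def suggested_discount(seats: int, vertical: str, annual: bool) -> float:
    discount = 0.0
    if seats >= 500:
        discount += 0.15          # volume pricing
    elif seats >= 100:
        discount += 0.08
    if vertical in {"education", "non-profit"}:
        discount += 0.10          # segment pricing
    if annual:
        discount += 0.05          # commitment pricing
    return min(discount, 0.30)    # never exceed the approved ceiling

print(suggested_discount(seats=250, vertical="education", annual=True))  # 0.23
```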
51:00 - 51:30 And the reason is that there's no reason for our developers to go out and build this internally — it's a big distraction from building our product. But on the other hand, you have a domain expert in the person who actually runs partnerships. He doesn't know software, but he knows this really well, and because of that he's able to create these apps really quickly. Granted, we do have a person inside the company who looks at each app and makes sure it logistically makes sense, is secure, and can be deployed inside the
51:30 - 52:00 company. But these are more ephemeral apps — they're quite stateless. If you look at the input and output of such an app, it's not as complex as, say, the Windsurf project. But we now have this growing set of people inside the company who are not developers and are getting value from it, which we found a little surprising too. Yeah. Can you give some other examples of what you think it might replace? The reason I ask is that I'm really interested in this: I hear a lot of people, either on social media or CEOs, saying that SaaS
52:00 - 52:30 apps could be on their way out, and I've always been skeptical, because there are really two types of SaaS apps. Most of the SaaS apps I see — for example Workday, which is an HR platform — have hosting, business rules, they update to some extent with regulations, and all that. So they do a lot beyond the UI; I know we can trivialize it, but it's a lot more than that. And then there are a few simpler ones — I don't want to name names, but there's a polling app where you can run polls internally inside the company. It has state, but it's
52:30 - 53:00 relatively simple — you can see what's behind it. I could build it, but I just don't want to deal with authentication and hosting it inside the company, and it's already there. And then there are the ones you mentioned that are stateless. So what kinds of SaaS tools do you see yourselves replacing, and where might other companies use tools like this — maybe with one dedicated developer — to build them internally and bring them in-house? Yeah, you know, I think
53:00 - 53:30 it's hubris to believe that products like Workday and Salesforce are going to get replaced by this — I think you're totally right. These products have a lot of state, they encapsulate business workflows, and for a product like Workday there's probably compliance you need because of how business-critical the system is. So that isn't the kind of system this would replace. It probably falls in the latter two categories, and probably just the last one: stateless systems that don't do writes to the most business-critical parts of your databases. It's probably
53:30 - 54:00 those kinds of systems that can get replaced very quickly. And I would say there's a new category: think about the amount of software that would benefit a business but just isn't getting created — that now could get created. The reason that software couldn't get created before is that a company couldn't be created that would sustain itself, with a business model that justified its existence. But now that software is very easy to create, these pieces of software are going to proliferate. And one of the things I like to talk about for software is that there's a little bit of a
54:00 - 54:30 shift, because the cost of building software — of simple software — used to be a lot higher. Right now, we have to admit it's gotten a lot cheaper to build a basic front-end system, radically cheaper. So the way I look at it is: for these kinds of systems, what are you really paying for when you pay a SaaS vendor? You're not only paying for the product, you're paying for the maintenance. You're paying for the fact that, actually, you
54:30 - 55:00 know, this company is building a bunch of other features you don't need, because they need to support a bunch of customers — but you're still paying for that R&D. You're paying for their sales and marketing and a bunch of other stuff. So my viewpoint is that if you can build custom software for yourself that is not very complex but helps your own business process, that might proliferate inside companies, and that might cause a whole host of companies in that category — simple business software that feels largely stateless —
55:00 - 55:30 to have trouble unless they reinvent themselves. Yeah. And I guess one obvious follow-on: let's continue this thought. If companies are building a lot of internal software, they might start to have similar problems three to five years down the line — maintenance, storage, compliance, checking whether these tools still work, re-evaluating whether it makes sense to bring them into
55:30 - 56:00 something else. So this could create a lot of new opportunities for other software businesses, or software developers, or maybe a new job role in software engineering: I've built so many of these apps, and I can help you with them. Who knows? No, I think a lot of people talk about how we're going to have way fewer software engineers in the near future. It feels like it's largely people who hate software engineers that say this. It feels pessimistic,
56:00 - 56:30 not only towards those people but also in terms of what companies' ambitions are. I think the ambition for a lot of companies is to build a much better product, and if companies now get a better return on investment for building technology — because the cost of building software has gone down — what should you be doing? You should be building more, because the ROI for software and developers is even higher: a single developer can do more for your business. So technology increases the ceiling of your
56:30 - 57:00 company much faster. Yeah. I'm going to double-click on that, because you've been building Windsurf and these tools, but you've also worked with the same team since before these tools existed. Take one of your solid engineers today who was already a solid engineer four years ago: how has their work changed now that they have access to Windsurf, Cascade, and all these other tools, including ChatGPT and so on? What's changed — and not just
57:00 - 57:30 your engineering, but also the team you had four years ago that was doing the work — how has it changed? I don't want to point you in any direction; I'm just interested in what seems different in what they do, how they do it, or how much they do. Yeah, I think there are maybe a couple of things. First of all, the amount of code we have in the company is quite high — it now dominates what a single person can know. In the beginning of the company that wasn't the case, so this is something I couldn't have pointed to back then, because the company was quite
57:30 - 58:00 small. Right now, I would say there's more fearlessness about jumping into a new part of the codebase and starting to make changes. In the past, you would more often say, hey, this person has way more familiarity with this part of the code. That is still the case, but familiarity now means not just understanding the code — this person also knows where the dead bodies are, which is to say: when you did X and you got Y,
58:00 - 58:30 that happened, which means you should always do Z. There are still people like that at the company, and I'm not saying that isn't valuable, but I think engineers now feel more empowered to go out and make changes throughout the codebase, which is actually awesome. The second key piece is that our developers now go to the AI first, to see what value it would generate for them, before making a change. In the autocomplete days you would go out and type and get a lot of advantage from autocomplete and the passive AI, but now the active AI is
58:30 - 59:00 something developers reach for more and more, to actually make changes from the very beginning. Right. Yeah, I'm interested in how this will change software engineering as a whole, because I've noticed both things in myself. I still code and do my side projects, but I always drag my feet about getting back into the context of code that I wrote and have partly forgotten, and back into the language, because I use multiple languages across side projects. And AI does help me just jump into it. I no longer
59:00 - 59:30 have that friction, and sometimes I just prompt the AI: what would you do? I just want to know, and if it looks good I do it; if not, I scrap it, maybe re-prompt, or sometimes I decide, nah, I'm just going to do it myself — because I didn't give it the right instructions, or because, especially when you know the codebase and you've onboarded, you already know what you want to do. But I think it really helps me, at least, with the work that wouldn't
59:30 - 60:00 take much creativity but would just be a time drag — figuring out the right things, finding the right dependency, changing those things, that kind of stuff. I think you're exactly right. This reducing-friction piece is something whose value is kind of hard to quantify, because it makes you more excited to do more. I think software development is a very weird profession, and I'll give you an example of why it's
60:00 - 60:30 weird. A lot of people would think, oh, this is a very easy job, and I actually think it's quite hard on you mentally, and I'll walk you through what I mean. You're doing a hard project, and sometimes you go home with an incomplete idea. The code didn't pass a bunch of tests, and it bothers you when you sleep, and you need to go back and fix it — and this could go on for days. For other jobs, I don't think you feel that as much; it's potentially a lot more procedural for
60:30 - 61:00 other types of jobs — not every job; there are obviously jobs with a massive problem-solving component. But it means you do get fatigued: at some point, even being forced to do new easy things adds some amount of mental fatigue. And now you have a very powerful system that you trust, which should ideally reduce that fatigue and be able to do a lot of the things that used to have high activation energy, and do them really fast
61:00 - 61:30 for you. Yeah, this is really interesting, because I was just talking with a former colleague of mine who had a few months where he just wasn't producing much code — a really good, really solid engineer. At the time I didn't know why, and he didn't tell me, and then he snapped out of it. But we were just talking, and he said he was actually at a really bad point in his life — lots of stress in a relationship, at home, with family, all these things — and he said that
61:30 - 62:00 he just realized how much of a mental game software engineering is. At work he just couldn't get himself into the zone — we know how that is, especially before AI tools. And with what you said, I'm starting to get a bit more appreciation for it. I remember stressful stretches where I couldn't turn off: you go home, you're having dinner, and you're still thinking about how you would change that code or why it's not working. I don't think we can fully do it justice here, but for listeners it's worth
62:00 - 62:30 thinking about how weird it is. It's good to reflect on, because it is unique: for so many jobs you can just put down your work and leave the office, you can't continue and can't even really think about it because all your work is there — and also on how these tools might change it for the better in many ways, and maybe in weird ways we don't expect in others. No, I think you're totally right, and this is why finding amazing software engineers is rare.
62:30 - 63:00 It's rare because these are people who have gone through this and are willing to put themselves through it — taking all the learnings they have, from the lowest level to the highest level, and then being willing to go down into the weeds to make sure the problem gets solved. It's a rare skill. You would imagine this is something everyone would be able to do, but it takes a lot of dedication, and as you pointed out, it's all for an activity that is not a very normal activity.
63:00 - 63:30 Yeah. Well, going back to engineering challenges and decisions, one super interesting thing I've been dying to ask: you mentioned at the beginning that when you started Windsurf, you realized Visual Studio Code just wasn't where it should be. However, you started by forking Visual Studio Code, right — do I have that right? That's exactly right. Can you tell me the pros and cons of doing that, as opposed to building your own editor? I'm aware there
63:30 - 64:00 are some downsides — there are some licensing things — so that's one part of the question. The second part is why you thought forking was the right move to build something much better and much more capable than what VS Code was back in the day. Yeah. So maybe some clarification on terminology first: VS Code is a product that is built on top of Code OSS, which is basically the open source
64:00 - 64:30 project. I did not know that. Yeah — VS Code has proprietary pieces on top of the open source. I do know that, and a lot of people don't, actually. Exactly. So one of the things we did was make sure we did this right. What I mean is that when we built our product, we did fork Code OSS, but we did not support any of the proprietary pieces that Microsoft had, and we never provided support for those — not through a
64:30 - 65:00 marketplace or anything. We actually use an open marketplace, which is completely fine. And this, by the way, forced us to build out a lot of the extensions people needed and bake them into the product. I'll give you an example: for Python language servers, we now have our own version. For remote SSH, we have our own version. For dev containers, we have our own version. So this forced us to get a lot tighter on what we needed to do, and we never took the shortcut of, hey, let's go do something we shouldn't be doing — because we
65:00 - 65:30 work with real companies and real developers, and why should we be putting them in that position? So that was the position we took. Obviously there were some complexities, and it just meant more engineering effort before we launched the product — we did launch with the ability to connect over remote SSH and do all this other stuff, and we did put in internal engineering effort to make that happen. Now, the question might be why even fork VS Code
65:30 - 66:00 in the first place. I think it's because it's a very well-known IDE where people have workflows, and there are many extremely popular extensions there that people rely on. An IDE is not just the place where you write software — it's also the place where you attach a debugger and do all these other operations. We didn't want to reinvent the wheel on that, and we didn't think we were better than the entire open source community on
66:00 - 66:30 that, in terms of all the ways you can use the product. And I'll give you an example of how we try to be pragmatic here. We didn't go out and try to replace JetBrains with this product. We actually put all the capabilities of Windsurf into JetBrains in what's called the Windsurf plugin. Our goal is to meet developers where they are: meeting VS Code developers where they are means giving them a familiar experience, and meeting JetBrains developers where they are means giving them a familiar experience, which is to actually use JetBrains. Now, a question might be why we
66:30 - 67:00 didn't fork JetBrains, and the answer is two reasons. First of all, we can't — it's closed source. Second of all, JetBrains is actually a fantastic IDE for Java developers, and in a lot of cases C++ and Python developers, and PHP as well — PhpStorm, if you've ever used it. That's exactly right, they have one for almost every single language. And the reason is that they have great debuggers and great language servers that I think aren't even present in
67:00 - 67:30 VS Code right now. If you look at great Java developers, probably 80-plus percent of the market uses IntelliJ. So the point is that, as a company, our goal is not to be dogmatic. Our goal is to build the best technology, democratize it, and provide it to as many developers as possible. No, I love it. And this is actually something I was talking about with one of your software engineers, who mentioned an interesting challenge that comes from exactly this: you have a JetBrains plugin, you have the IDE, and apparently you're
67:30 - 68:00 sharing some binaries between the two. Can you talk a little bit about that engineering? Yeah, this was an engineering decision we needed to make a couple of months into working on Codeium: we're going to build a VS Code extension — that's what we started with — but very quickly the next step was, let's implement it in JetBrains. The problem is that if we need to duplicate all the code, it's going to be really annoying to support all of it. So what we decided to do was build almost a shared binary between both, which we call a language server, and it does the
68:00 - 68:30 heavy lifting. The goal is that we're not duplicating the work in a bunch of places, and this enables us to support many IDEs from an architecture standpoint. That's why we were able to support not just JetBrains but Eclipse, Vim, and all these other popular IDEs without much lift.
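A hypothetical sketch of the "shared binary" idea: each editor plugin stays a thin client, and one shared local process does the heavy lifting. The line-delimited JSON protocol and port below are made up for illustration; the real wire format isn't described in the conversation.

```python
# Hypothetical shared local server that several editor plugins connect to,
# so completion logic lives in one process instead of per-IDE code.
import json
import socketserver

class SharedLanguageServer(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:                      # one JSON request per line
            request = json.loads(line)
            if request.get("method") == "complete":
                result = {"items": ["...completions computed once, served to any editor..."]}
            else:
                result = {"error": f"unknown method {request.get('method')}"}
            self.wfile.write((json.dumps(result) + "\n").encode())

if __name__ == "__main__":
    # The VS Code extension, the JetBrains plugin, the Vim plugin, etc.
    # would all connect to this same local port (port number is arbitrary).
    with socketserver.TCPServer(("127.0.0.1", 9257), SharedLanguageServer) as server:
        server.serve_forever()
```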
68:30 - 69:00 Okay, I need to ask you about MCP. You have started to support it, which is really cool — I've played around with it and I think it's a good first step. What is your take on MCP, especially with the security worries, and where do you see it going? Right now it's a bit of an open book, but you're probably more exposed to this than most listeners will be. You know, I think it's very exciting. I have maybe one concern, but let me start with the exciting part. The exciting part is that it's democratizing access to everything inside a company, or everything a user would want, within their coding environment — for our product
69:00 - 69:30 in particular. Obviously there are other products — maybe it can help you buy goods and groceries and things like that; we're not that interested in that case, but it's nice. One of the other things it lets companies do is implement their own MCP servers with security guarantees, which is to say they can implement a battle-tested MCP server that talks to an internal service, does auth and all these other things for the end user, and they can own that implementation. So there's a way for companies to let our product
69:30 - 70:00 interact with their internal services in a secure way. But you're totally right that there could be a slippery slope, where this gives everyone immediate, write-level access to everything, and that could have negative consequences. The thing I'm maybe a little bit concerned about — and it's not worry so much as a question about the paradigm itself — is whether MCP is the right way of encapsulating how you talk to other systems,
70:00 - 70:30 or whether it's really about the actual workflows of developers going and interacting with these systems. I'll give you an example. One of the problems with MCP is that it forces you to hit a particular spec, and in some ways the best spec is flexibility. Yeah, right. If you ask these systems to integrate with something — if you ask an LLM like GPT-4.1 or Sonnet, hey, build an integration to Notion — it will do it zero-shot now. Yep. So you could build an MCP server that is
70:30 - 71:00 very particular and only gives you access to two things in Notion — or the models themselves are capable of doing a lot, and it becomes a question of how much you want to constrain versus allow freedom, and then there's the corresponding security issue too. Look, it's awesome that we have access to it. Is this the final version? I don't know. Yeah. I'm going to rephrase it, and let me know if you think I'm off. When you set up, for example — I'm building a web project, I'm using Node, and I
71:00 - 71:30 have my package.json that specifies what packages I'm going to use. On my machine I'll have a lot of packages installed, but for each specific project I'm going to be very clear about which ones I want to use — maybe a subset. Right now it feels to me that the current version of MCP just lets me connect everything, and I can't really say, for example, on this project I actually want you to only talk to this one table in my database. I don't want you to access all the other stuff, because
71:30 - 72:00 it's a prod database and I just have a test table there — that kind of stuff. Are we going to get that kind of granularity, figuring out what would actually help me as an engineer be productive? No, it's an interesting point, and you're totally right. You want these systems to have access to a lot of things so that you can be productive, while at the same time you want to be very explicit and instructive about which systems they should
72:00 - 72:30 have access to internally. But the problem is that people are — I'm not going to say lazy, but it's annoying if you have 50 services and you need to spell out: you can do this, you can do that, you can do this. What very quickly happens is people don't, and they get mixed results or negative consequences. So look, I think we're figuring this out; the whole industry is figuring out what the right model is. Maybe a lot of engineering needs to happen after the MCP server, which is to say the MCP server provides a very free-flowing
72:30 - 73:00 interface, but there's a lot of understanding layered on top — who the user is, what service they're trying to touch, what codebase they're in — and proper access controls implemented afterwards that help you do that.
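A hypothetical sketch of that "scoped server" idea: rather than handing the agent raw database access, an internal tool exposes one narrow, audited operation. The table name, token check, and helper functions are all stand-ins invented for illustration.

```python
# Hypothetical scoped tool: the agent can only read one allow-listed table,
# and every call goes through an auth check owned by the company.
ALLOWED_TABLES = {"test_orders"}          # the agent never sees production tables

def is_employee(token: str) -> bool:      # stand-in for the real auth check
    return token == "internal-demo-token"

def run_readonly_query(sql: str) -> list[dict]:   # stand-in for a read-only DB client
    return [{"id": 1, "status": "demo row"}]

def query_table(user_token: str, table: str, limit: int = 50) -> list[dict]:
    if not is_employee(user_token):
        raise PermissionError("not authorized")
    if table not in ALLOWED_TABLES:
        raise PermissionError(f"table {table!r} is not exposed to agents")
    return run_readonly_query(f"SELECT * FROM {table} LIMIT {int(limit)}")
```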
73:00 - 73:30 I'm thinking of languages that aren't as popular these days, but when I started programming I used C#, and for classes you had access modifier keywords: public classes that everyone could access, protected members that only subclasses could touch, internal classes visible only inside the module, and private members you couldn't reach from outside the class at all. These were just keywords governing which module could access which parts of the code inside the codebase, and back then — this was the 2000s — we spent a lot of care deciding who could access what and how. Technically everyone could have talked to everyone, but a few decades of evolution had taught us that wasn't a good idea. So I'm wondering whether we'll get there with MCP — we might re-add some parts of that, because it didn't
73:30 - 74:00 come about because someone just stuck a finger in the air; it came about because we needed to organize large amounts of code back then, when we didn't have the tools we have today. No, I think you're right. Some primitives are missing right now, for sure; it's too free-form. It's going to be super exciting though, because we are seeing it go somewhere — maybe MCP, maybe not — and we're in the middle of it. Who knows, some people listening to this might actually influence the direction of the
74:00 - 74:30 new thing we're all going to use five years from now. It's awesome. Yeah. What is your take on the 70/30 mental model for AI tools? This comes up every now and then, especially with folks who are less technical: today they can prompt AI tools, from Windsurf to Lovable and others, to generate an idea they have, and the tools do a good job of the one-shot first 70% and the tweaking — and then on the last 30%, especially when they're not experienced software engineers, they get a little stuck, or hopelessly stuck.
74:30 - 75:00 Do you observe this with Windsurf users, or is it not really a thing when people are pretty technical and are developers? Yeah, we do have non-developers who use the product, and I do think the level of frustration for them — and by the way, my viewpoint is not "just let them be frustrated," I would love to help them — the level of frustration when they hit a problem is much higher. And the reason is that for you and me, when we use this and it gets into a
75:00 - 75:30 degenerate state where it tries to make a change and does a series of changes that don't make sense, our first instinct is not to just run it ten more times when it didn't work the first five. It's to look at the code, see which step didn't work, and revert back to the step that did — basic debugging principles. And by the way, the reason we can do that is that we understand the code; we can go back into it and make sense of it. But you're right that for people who can't, they're kind of in a state of helplessness, and I deeply empathize with that.
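A small sketch of that "revert to the last step that worked" instinct, using git commits as checkpoints after each agent edit — the apply_edit and run_tests callables are assumed stand-ins, not part of any real product.

```python
# Hypothetical checkpoint-per-step loop: commit after each agent edit and
# roll back as soon as a step fails, instead of retrying on a broken state.
import subprocess

def run_step(description: str, apply_edit, run_tests) -> bool:
    apply_edit()                                              # agent makes its change
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"agent: {description}"], check=True)
    if run_tests():
        return True                                           # keep the checkpoint
    subprocess.run(["git", "reset", "--hard", "HEAD~1"], check=True)
    return False                                              # back to the last good state
```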
75:30 - 76:00 It's our job to figure out ways to make that a lot better. Now, does that mean we make our product completely catered to non-developers? No, that's not what we do. But are there principles from that which help both groups? Because we do want to get to a state where these systems are more and more autonomous. Today a real developer needs to go in and fix these issues all the time when they prompt it; making that better also means we're becoming more autonomous.
76:00 - 76:30 But I do think, as an industry — and there are the engineers who code and then the non-coders — there's a question that needs to be asked: do we eventually need to understand what the code does? Do you need to be able to read the code? For example, when I was at university we studied assembly. I never really programmed assembly beyond the class, but I have since come across assembly code and I'm not afraid to look at it. Again, I'm not saying I'm the expert, but you can go all the
76:30 - 77:00 way down the stack. And I think there's something to be said for the fact that we're now adding a new level of abstraction, and as a professional it will always be helpful to be able to look down through the stack — sometimes all the way to networking logs or the packet. Not often, but just knowing where to look and where to go. So this might be more of a philosophical question, because a lot of people just think, okay, we can use English for everything. But English translates into a level which is programming languages, which translates into the next level, and so on. I think you're
77:00 - 77:30 right. So here's my take on it. We are going to have a proliferation of software, and some of it will be built by people who don't know code. It feels simplistic to say that isn't going to happen — we're already seeing it play out in real time. But here's the thing. When you think about the best developer you know, even if they're a full-stack developer, they probably understand that when the product is slow, it's because of some issue in the way it interacts with the operating system, or some
77:30 - 78:00 issue in the way it interacts with the networking stack. It's this person's ability to peel back layers of abstraction to get to ground truth that makes a great developer a great developer. And these people are more powerful in any organization — you can take them and put them on any project, and it's just going to be a lot more successful. I think the same thing is going to happen here: for some set of projects, it's going to be fine if the level of abstraction you deal with is the final application plus English and a spec. For some other set of
78:00 - 78:30 applications, a developer will actually go in, but there are going to be some gnarly parts: it's going to interact with the database, it's going to have performance-related issues, and you're going to have an expectation that the AI and the human can go down the stack and the human can reason about it. I think these people are always going to be really valuable — similar to how our best engineers, if I ask them to go look at the object dump of a C++ program, can actually understand, hey, here's
78:30 - 79:00 a function, here's a place where we're seeing a massive amount of contention, and we need to go fix it. If a developer didn't understand those fundamentals, they would be much worse at our company because of it. Yeah. I wonder if an analogy might be car mechanics — they've evolved over time. My dad used to have these old-school cars where he would take the engine apart, take the whole thing apart, and put it back together over a weekend,
79:00 - 79:30 with all the parts laid out — I remember it. Of course, by the time I got to owning a car, I could change the oil, and now I have an electric car, where there aren't as many moving parts. However, someone who understands how cars work, how they're built, how they evolved, will always be more in demand for the special cases. For example, I just had the 12-volt battery die in my electric car. I had no idea there was a 12-volt battery, but I talked with someone who's into this, and they said, yeah, it's carried over from gas cars, this is why it's there, and this is
79:30 - 80:00 how newer versions will evolve. Clearly the majority of people might not need that knowledge eventually, but that expertise remains, and these are the people who understand everything and who will often take innovation forward, because they understand what came before and what needs to come next. You're totally right. Maybe one other thing I'd add to what you said: when you look at what great computer scientists and software engineers do, they are great problem solvers
80:00 - 80:30 given an understanding of a high-level business case, or of what the company really wants to do, and they can distill it down. I think that skill is what it boils down to when you meet great engineers. It's not just that you tell them about a feature — you tell them about an issue, a desired outcome, and they will find any way possible to get to it. That's what great engineers are: problem solvers. I think that's always going to be in demand. Now, the person who builds the most
80:30 - 81:00 boilerplate websites, and that is the only thing they are excited to do in the future — that person's skill set is going to depreciate with time. But that's a simplistic way of looking at it, because if they're a software engineer, they should know how to reason about systems and be a good problem solver. I think that's the hallmark of software engineering as a whole, and those people will always have a position out there, in my opinion. Now, since you started building Windsurf, or even Codeium, how has your view changed on the future of
81:00 - 81:30 software engineering? We've touched on a few things, but have there been before-and-after moments where you now think about things differently? You know, I think with timelines for a lot of things, I'm less scared of them, even though a lot of them are presented as very scary numbers. Recently Dario from Anthropic said 90% of all committed code is going to be AI-generated. I think the answer to that is going to be yes — and my question after that is, so what? So what if that's the case? Developers don't
81:30 - 82:00 only spend time writing code. There's this fear that comes from all of this, and I do think AI systems are going to get smarter and smarter very quickly. But when I think about what engineers love doing, they love solving problems; they love collaborating with their peers to figure out how to make solutions that work. When I look at the future, things are going to improve very quickly, but people are going to be able to focus on the things they really want to do as developers, not the nitty-gritty
82:00 - 82:30 details where, as you said, you go home thinking, I don't know why this doesn't compile. A lot of those small details, for most people, are going to be a relic of the past. Well, I'll give the other side of why people are stressed. Some listeners will say, well, you're in an easy position because you're in the middle of an AI company — you're building all these tools, which is the future, and you're going to be fine for the next few years. And they're thinking: I'm sitting at a B2B SaaS company where I'm building
82:30 - 83:00 this software, and my employer is thinking these tools make us 20% or 25% more efficient, so they're going to cut a quarter of the team. And I'm worried, a, whether it's going to be me, and b, the job market is not that great. I get that I can be more productive with these things, but I still need to find a job. Not everyone will verbalize this, but this is the thing that gets to people: when they hear Dario talk about the 90%, they're thinking, oh damn, my employer will say, okay Joe, we don't need you anymore. Yeah. The problem is I don't
83:00 - 83:30 know if this is a really good answer, but that feels like the employer being irrational. Let me provide the take here: if a B2B SaaS company that is not doing well needs to compete with other B2B SaaS companies, and they just reduce the number of engineers they have, they're basically saying their product is not going to improve as quickly as a competitor that is willing to hire engineers and improve their software much faster. And I do think consumers and businesses are going to have much higher expectations for
83:30 - 84:00 software. The bar for software that I buy is way higher. I don't know if you've noticed this, but I feel bad when I buy a piece of software that looks like it did a couple of years ago — this ugly procurement software. Yeah, and these days you don't have to accept that. I hear you. I see the short-term view: are there employers that look at this and think it's an opportunity to cut? I think those employers are being really shortsighted. Yeah. And I'm getting a little bit of hope from other industries, too. There was a
84:00 - 84:30 time when writers were being fired left and right — I'm not talking about software writers, just traditional writers. And now there's a big hiring spree across all sorts of companies for writers, because it turns out the AI is a bit bland, and a great writer with AI is way better than without. I think it's the same for software engineers. So that's also a bit of my message for anyone listening — but it's good to hear it from you. Exactly. When you have a competitive market and you add a lot of automation, automation is great, but what you actually need to compare against is automation with a human, and if that's
84:30 - 85:00 way more leveraged, then that's what you should compete with — that's the game-theoretically optimal thing to do. And actually, that's the tool you're building right now, which is one of the reasons I like to use it: it doesn't feel like it's trying to do anything instead of me; it does it with me, and makes me way more efficient as an engineer. So to wrap up, I have some rapid-fire questions — I'll just ask them and you can shoot back the answers. I've heard that you're really into endurance sports, long-distance running, cycling, and you
85:00 - 85:30 do a lot of it. Now, a lot of people think, well, I'm pretty busy with my job, with coding, and I don't have as much time for sports. How do you make time for it, and what would your advice be for someone who actually wants to get in much better shape while being a busy software engineer? So I will say this: since starting the company that has gone down drastically, but at my previous company I still worked a ton — I worked at an autonomous vehicle company — and I would bike over 150 miles a week, rigorously, probably closer to 160
85:30 - 86:00 or 170. Interestingly, for an activity like this, I actually got Zwift, a way to bike indoors, and I could knock out 20 to 25 miles in an hour at home. The benefit is that I could come back from work, do a ride quickly, and then on the weekends, on a Saturday, I would dedicate time to doing a 70-mile loop somewhere. One of the lucky things for me is that I'm in the Bay
86:00 - 86:30 Area, so there are a lot of amazing places to ride a bike, with hills and everything. So I think it's easy to carve out this time, but you need to make the friction for yourself a lot lower. I would never go to a gym rigorously — I'm not that type of person, I would just find a way not to do it — but if it's literally at home, right next to where I sleep, I'm going to find a way to do it. Sounds like: just make it work for you. Nice. And what's a book you would recommend, and why? You
86:30 - 87:00 know, there was a book I read a long time ago that I really enjoyed called The Idea Factory. It's about how Bell Labs innovated so much while being a very commercial entity, and it was very interesting to see some of the great scientists of our time working at this company and providing so much value. Information theory — Claude Shannon worked there; the invention of the transistor happened there, with Shockley and all these people. Just hearing how a company is able to
87:00 - 87:30 straddle the line between both was really exciting. Yeah, and I hear that OpenAI was quite inspired by Bell Labs — their titles are coming back — and I personally want to read more about that, so thanks for the recommendation. Well, thank you. This was great, super interesting, and I just loved all the insights you shared. Yeah, thanks a lot for having me. I hope you enjoyed this conversation with Varun and the challenges the Windsurf team is solving. One of the things I enjoyed discussing was when Varun shared how they have a bunch of features that just didn't work out, like
87:30 - 88:00 their review tool, and then they celebrate failure and just move on. I also found it fun to learn how any developer can roll out any feature they build to the whole company and get immediate feedback whether it's good or bad. For more deep dives on AI coding tools, check out the Pragmatic Engineer Deep Dives link in the show notes below. If you enjoy this podcast, please consider leaving a rating. This helps more listeners discover the podcast. Thanks and see you in the next one.