Windsurf SWE-1: Our First Frontier Models (windsurf.com)

190 points by arittr 42 days ago | 62 comments

resters 42 days ago

A few points that are getting overlooked:

- OpenAI is buying Windsurf and probably did diligence on these models before it decided to invest.

- Windsurf may have collected valuable data from its users that is helpful in training a coding-focused AI model. The data would give OpenAI a 6-month lead, which is probably worth the $3B.

- Even if Windsurf's frontier models are not better than other models for coding, if they excel in a few key areas it would justify significant investment in their methodology (see points above).

- There are still areas of coding where even the top frontier models falter that would seemingly be ripe for improvement via more careful training. Notably, making the model better at working within a particular framework and version, programming language version, etc. Also better support for more obscure languages and libraries/versions and the ability to "lock in" on the versions that the developer is using. I've wasted a lot of time trying to convince OpenAI models to use OpenAI's latest Python API -- even when given docs and explicit constraints to use the new API, OpenAI frontier models routinely (incorrectly) update my code to use old API conventions and even methods that have been removed!

Consider that the basic competency of doing a frontier coding model well is likely one of the biggest opportunities in AI right now (second to reasoning and in my opinion tied with image analysis and production). An LLM that can both reason and code accurately could read a chapter in a textbook and code a 3D animation illustrating all of the concepts as a one-shot exercise. We are far from that at present even in OpenAI's best stuff.
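One low-tech way to approximate the "lock in on the versions the developer is using" idea is to read the installed versions straight from the environment and put them at the top of every prompt. A minimal sketch, assuming a Python project; the function names and the preamble wording are hypothetical, and nothing here guarantees that a model will actually honor the constraint.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def version_preamble(packages):
    """Build a (hypothetical) prompt preamble pinning the model to local versions."""
    lines = [
        f"Target Python {platform.python_version()}.",
        "Use only APIs that exist in these exact library versions:",
    ]
    for name, ver in installed_versions(packages).items():
        if ver is None:
            lines.append(f"- {name}: not installed, do not import it")
        else:
            lines.append(f"- {name}=={ver}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(version_preamble(["openai", "requests"]))
```

Regenerating the preamble from the environment, rather than hand-writing it, keeps it in sync whenever dependencies are upgraded.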

libraryofbabel 42 days ago

Thanks - this does help contextualize the $3B acquisition. When the story first broke all they seemed to be paying for was a coding agent (of which there are sooo many out there) and the large windsurf user base (but with no moat). So a lot of us were rather skeptical. The valuation is still kinda insane, I think, but Windsurf’s ability to train a frontier model - and with a much smaller team than the big AI shops - is the key differentiator from the Clines, Cursors, Aiders etc.

It is a bit of a shame that we’ll never get to see what they could do on their own. But I hope their clearly very talented employees do very well out of this.

resters 41 days ago

> Thanks - this does help contextualize the $3B acquisition.

Agreed. My initial reaction to the $3B acquisition was similar to yours. Seeing this announcement made me rethink it a bit.

keeganpoppen 42 days ago

this is clearly the right take… it’s fun to semi-dunk on “how on earth is that the valuation”, but this is one of those rare cases where the tech and platform are genuinely more valuable in the hands of the acquirer than they ever could be in the hands of the acquiree. because i think windsurf has executed as well as one possibly could in the space, but openai is the SOTA model king, and i don’t see that changing anytime soon.

dghlsakjg 41 days ago

Minor nit: OpenAI is in a three-way tie for SOTA models with Google and Anthropic. They are the king of marketing attention, Studio Ghibli imitation, and consumer subscriptions, though.

paulddraper 39 days ago

First mover too

antirez 42 days ago

So because they need to have a better business model, they will try to move users to weaker models compared to the best available? This "AI inside the editor" thing makes less sense every day, in many dimensions: it makes you not really capable of escaping the accept, accept, accept trap. It makes the design interaction with the LLM too much about code and too little about the design itself. And you can't do what many of us do: have three subscriptions for the top LLMs available (it's $60 for 3, after all) and use each for its best. And by default, write your stuff without help if LLMs are not needed in a given moment.

infecto 42 days ago

You’ve got a couple of ideas colliding here; let me try to unpack them.

First, most of the major players already have their own models or have been developing them for some time. Your take feels a bit reductive. Take Windsurf pre-acquisition, for example: their risk was being too tightly coupled to third-party vendors. It’s only logical to assume that building task- or language-specific models will ultimately help reduce costs and offer more control.

As for the other point: in my experience, trying to fully leverage LLMs actually makes me more prescriptive in my designs. I spend more time thinking through architecture and making my code modular, more so than when I wasn’t using an LLM. I’m sure others may design less or take shortcuts, but for me it’s pushed the opposite behavior. Is it the “right” way? I’m not sure, but I’m enjoying it and staying productive.

phillipcarter 42 days ago

I think the point is that the UX favors accepting code changes as the primary action, rather than using the chat interface as an ideation tool. It's quite valid, because as a user of all these tools, I can say Windsurf and Cursor very much do try to make you slap the Accept button uncritically!

infecto 42 days ago

Does it, though? I use the chat option quite a bit in the tools. The only UX that favors the accept pattern is tab completion, which makes sense.

phillipcarter 41 days ago

It does. Defaults matter, and the defaults for these tools are agent mode with code changes meant to be accepted, rather than forcing you to read the code and manually apply those changes.

Note: I'm not saying that's a bad thing! It's significantly more convenient for many use cases, so I can see why it's a default. But the incentive being created is to accept first, analyze later.

ipnon 42 days ago

I don't think they are targeting software engineers as users. They are seeking those on the software engineering margins, users who know what Python and for-loops are but don't care to configure Aider and review each of the overwhelming number of models released daily. They want to tell the editor to add function foo to bar.py. I suspect this latter market segment is much larger than the former!

stevenally 36 days ago

When I got my first job in 1986, the company had a tool that allowed non engineers to write code. Of course it didn't work. They could write code, but it ended up as a buggy, unreliable, unmaintainable mess. It turned out it was a good sales tool, get our technology into the company, then we would get paid to write the programs.

Then there were the MS Access and Excel amateur efforts. I worked at a company that for years had a very profitable business replacing in-house MS Access spaghetti with our well-designed application.

Aaaand..... here we go.... deja vu all over again....

bluelightning2k 42 days ago

I don't like or agree with this take. You're basically saying - "something good exists, so why try to improve upon it".

Their stated goal is to improve on the frontier models. It's ambitious, but on the other hand they were a model company before they were an IDE company (IIRC) and they have a lot of data, and the scope is to make a model which is specialized for their specific case.

At the very least I would expect them to succeed in specializing a frontier model for their use-case by feeding in their pipeline of data (whether they should have that data to begin with is another question).

The blog post doesn't say much about the model itself, but there are a few candidates to fine-tune from.

vunderba 41 days ago

> they will try to move users to weaker models compared to the best available

> you can't do what many of us do: have three subscriptions and use each for its best

I don't think this has anything to do with whether or not AI is in the editor so much as with the difference between a subscription (Cursor) and a BYOK approach (VSCodium + Cline, Zed, etc). Most BYOK plug-ins will let you set up multiple profiles against various providers so that you can choose the best LLM for the given problem you're trying to solve.
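The multi-profile setup can be pictured as a small routing table from kind of task to provider and model. A minimal sketch; the profile names, provider labels, model identifiers, and environment-variable names are all placeholders, not the configuration format of Cline, Zed, or any other real plug-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    provider: str     # placeholder provider label
    model: str        # placeholder model identifier
    api_key_env: str  # env var holding the user's own key (BYOK)


# Hypothetical profiles: one per kind of work, each pointing at a different provider.
PROFILES = {
    "plan": Profile("provider-a", "reasoning-model", "PROVIDER_A_API_KEY"),
    "edit": Profile("provider-b", "coding-model", "PROVIDER_B_API_KEY"),
    "cheap": Profile("provider-c", "small-model", "PROVIDER_C_API_KEY"),
}


def pick_profile(task, default="edit"):
    """Return the profile configured for a task, falling back to the default."""
    return PROFILES.get(task, PROFILES[default])


if __name__ == "__main__":
    for task in ("plan", "edit", "summarize"):
        p = pick_profile(task)
        print(f"{task}: {p.provider}/{p.model} (key from ${p.api_key_env})")
```

The point of the BYOK arrangement is that the keys, and therefore the choice of provider per task, stay with the user rather than with the editor vendor.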

keeganpoppen 42 days ago

i think this comment is just a reflection of how the world has not caught up with the inevitable shift of “software engineering” up further into “idea space”. i completely agree that the tooling has not caught up with this new world order yet. personally, i think “true software engineering” is more valuable than ever in the AI era, but the tools for actually realizing this are woefully behind.

bhl 42 days ago

Slightly weaker, but cheaper models mostly good for Windsurf only. As a developer, I would rather have stronger models I can throw more money at.

visarga 42 days ago

> it makes you not really capable of escaping the accept, accept, accept trap

The definition of vibe coding - trust the process, let it make errors and recover

conartist6 42 days ago

"press pay to think for me button" "press pay to think for me button" "press pay to think for me button" "press pay to think for me button" "press pay to think for me button" I love it

DrBenCarson 42 days ago

“Hmm seems we’re very far off course but we have thousands of lines…I can’t figure all that out rn…press magic thinking button”

firejake308 42 days ago

I'm confused why they are working on their own frontier models if they are going to be bought by OpenAI anyway. I guess this is something they were working on before the announcement?

allenleein 42 days ago

It seems OpenAI acquired Windsurf but is letting it operate independently, keeping its own brand and developing its own coding models. That way, if Windsurf runs into technical problems, the backlash lands on Windsurf—not OpenAI. It’s a smart way to innovate while keeping the main brand safe.

riffraff 42 days ago

But doesn't this mean they have twice the costs in training? I was under the impression that was still the most expensive part of these companies' balance.

kcorbitt 42 days ago

It's very unlikely that they're doing their own pre-training, which is the longest and most expensive part of creating a frontier model (if they were, they'd likely brag about it).

Most likely they built this as a post-train of an open model that is already strong on coding like Qwen 2.5.
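To put rough numbers on why skipping pre-training matters: a widely used rule of thumb puts training compute at about 6 FLOPs per parameter per training token, C ≈ 6ND. For a fixed model size N, the ratio of pre-training to post-training cost is then just the ratio of token counts. The figures below (a 32B-parameter model, ~10^13 pre-training tokens, ~10^9 post-training tokens) are illustrative assumptions, not numbers from Windsurf or any model card, and the estimate ignores RL rollout inference and failed experiments.

```latex
C \approx 6ND, \qquad N = 3.2\times10^{10}

C_{\text{post}} \approx 6 \cdot (3.2\times10^{10}) \cdot 10^{9} = 1.92\times10^{20}\ \text{FLOPs}
C_{\text{pre}}  \approx 6 \cdot (3.2\times10^{10}) \cdot 10^{13} = 1.92\times10^{24}\ \text{FLOPs}

\frac{C_{\text{pre}}}{C_{\text{post}}} = \frac{D_{\text{pre}}}{D_{\text{post}}} = \frac{10^{13}}{10^{9}} = 10^{4}
```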

rfoo 42 days ago

mid/post training does not cost that much, except maybe large scale RL, but even this is more of an infra problem. If anything, the cost is mostly in running various experiments (i.e. the process of doing research).

It is very puzzling why "wrapper" companies don't (and religiously say they won't ever) do something on this front. The only barrier is talent.

anshumankmr 42 days ago

You might be underestimating the barrier to hiring the really smart people. OpenAI/Google etc. would be hiring and poaching people like crazy, offering cushy bonuses and TCs that would blow your mind (like, say, Noam Brown at OpenAI). And some of the more ambitious ones would start their own ventures (like, say, Ilya).

That being said I am sure a lot of the so called wrapper companies are paying insanely well too, but competing with FAANGMULA might be trickier for them.

whywhywhywhy 42 days ago

Any half decent and methodical software engineer can fine tune/repurpose a model if you have the data and the money to burn on compute and experiment runs, which they do.

anshumankmr 42 days ago

Fine tuning/distilling etc is fine. I was speaking to the original commenter's question about research, which is where things are trickier. Fine tuning is something I even managed and Unsloth has removed even barriers for training some of the more commonly used open source models.

brookst 42 days ago

They can absolutely do it, but they will get poorer results than someone who really understands LLMs. There is still a huge amount of taste and art in the sourcing and curation of data for fine tuning.

NitpickLawyer 42 days ago

FAANGMULA ... Microsoft, Uber?, L??, Anthropic? Who's the L?

riffraff 42 days ago

A is Airbnb, afair.

Archonical 42 days ago

Lyft.

sunshinekitty 42 days ago

This is an incredibly premature statement to make. The acquisition announcement is days old.

OtherShrezzing 42 days ago

This is effectively how Microsoft is treating OpenAI.

ActionHank 42 days ago

Windsurf is a hedge against MS + VSCode and GH + copilot.

OAI is trying frantically to build a moat without doing any digging.

jstummbillig 42 days ago

Why would OpenAI not let smart people work on models? That seems to be what they do. The point is: They are no longer "their own" models. They are now OpenAI models. If they suck, if they are redundant, if there is no idea there that makes sense, that effort will not continue indefinitely.

seunosewa 42 days ago

They were working on the model before the acquisition. It makes sense to test it and see how it does instead of throwing the work away. Their data will probably be used to improve gpt-4.1, o4 mini high, and other OpenAI coding models

kristopolous 42 days ago

Must have been. These things take months.

dyl000 42 days ago

openAI models have an issue where they are pretty good at everything but not incredible at anything. They're too well rounded.

for coding you use anthropic or google models, I haven't found anyone who swears by openAI models for coding... Their reasoning models are either too expensive or hallucinate massively to the point of being useless... I would assume the gpt 4.1 family will be popular for SWEs

Having a smaller-scope model (agentic coding only) allows for much cheaper inference and lets Windsurf build its own moat (so far agentic IDEs haven't had a moat)

jjani 42 days ago

> openAI models have an issue where they are pretty good at everything but not incredible at anything. They're too well rounded.

This suggests OpenAI models do have tasks they're better at than the "less rounded" competition, who have tasks they're weaker in. Could you name a single such task (except for image generation, which is an entirely different use case) that OpenAI models are better at than Gemini 2.5 and Claude 3.7 without costing at least 5x as much?

anshumankmr 42 days ago

Getting more money perhaps also, if they believed their model to be good, and had amassed some good training data Open AI can leverage, apart from the user base.

blixt 42 days ago

> Enabled from the insight from our heavily-used Windsurf Editor, we got to work building a completely new data model (the shared timeline) and a training recipe that encapsulates incomplete states, long-running tasks, and multiple surfaces.

This data is very valuable if you're trying to create fully automated SWEs, while most foundation model providers have probably been scraping together second hand data to simulate long horizon engineering work. Cursor probably has way more of this data, and I wonder how Microsoft's own Copilot is doing (and how they share this data with the foundation model providers)...

whywhywhywhy 42 days ago

There is a world where the wrapper makers surpass the current model makers in their area of focus. Cursor/Windsurf have all the data on when people got so frustrated with Claude they switched to Gemini/GPT and also all the data of when the problem was actually solved and when it wasn't.
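As a sketch of why that switching data is useful for training: if a session log records which model's attempts the user abandoned and which attempt they finally accepted, each rejected attempt can be paired with the accepted one as a (rejected, chosen) example of the kind used for preference tuning. The event schema, outcome labels, and model names here are hypothetical; neither Cursor nor Windsurf has published its log format in this thread.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    model: str    # model that produced the suggestion (placeholder names)
    output: str   # the suggested change, e.g. a diff
    outcome: str  # hypothetical label: "rejected" or "accepted"


def preference_pairs(session, prompt):
    """Pair each rejected attempt with the attempt the user finally accepted.

    Sessions with no accepted attempt yield nothing: the problem was never
    solved, so there is no known-good answer to prefer.
    """
    accepted = [a for a in session if a.outcome == "accepted"]
    if not accepted:
        return []
    winner = accepted[-1]
    return [
        {
            "prompt": prompt,
            "chosen": winner.output,
            "rejected": a.output,
            "chosen_model": winner.model,
            "rejected_model": a.model,
        }
        for a in session
        if a.outcome == "rejected"
    ]


if __name__ == "__main__":
    session = [
        Attempt("model-a", "patch v1", "rejected"),
        Attempt("model-a", "patch v2", "rejected"),
        Attempt("model-b", "patch v3", "accepted"),  # user switched models here
    ]
    for pair in preference_pairs(session, "fix the failing date parser"):
        print(pair["rejected_model"], "->", pair["chosen_model"])
```

A real pipeline would also have to filter out switches driven by cost or rate limits rather than quality, which this sketch does not attempt.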

lemming 42 days ago

The company that is best placed to collect tons of high quality data of this type is undoubtedly Google. They’ve had publications talking about how they capture data from their in house SWE tools and use it to improve their tooling.

blixt 42 days ago

They certainly can automate their own SWE but I wonder if that’s as good as getting full computer use logs (terminal, web browsing, code acceptance/rejection, etc etc — as claimed in the linked article) from millions of individuals and thousands of companies all with their quirky technology setups.

throwaway314155 41 days ago

This summarizes Google's approach to software engineering well; just pretend the outside world doesn't exist and the "Google way" is the only way.

figassis 42 days ago

And is probably why OpenAI paid $$$ to acquire

bluelightning2k 42 days ago

Two takes here. Cynical and optimistic.

Cynical take: describing yourself as a full-stack AI IDE company sounds very investable in a "what if they're right" kind of way. They could plausibly ask for higher valuations, etc.

Optimistic take: fine-tuning a model for their use-case (incomplete code snippets with a very specific data model of context) should work, or even already has, going by their claims. It certainly sounds plausible that fine-tuning a frontier model would make it better for their needs. Whether it's reasonable to go beyond fine-tuning and consider pre-training etc. I don't know. If I remember correctly they were a model company before Windsurf, so they have the skillset.

Bonus take: doesn't this mean they're basically training on large-scale gathered user data?

heymijo 42 days ago

FYI, OpenAI acquired Windsurf so valuation is not an issue.

I don’t know Varun (their founder/CEO) personally but I get highly competent vibes from him. I’d let my skeptical self lean on your optimistic take.

OkGoDoIt 42 days ago

I don’t think the acquisition has closed yet, so maybe this is still useful from a leverage/negotiating perspective. And it was almost certainly something they were working on before the acquisition anyway.

I do think that’s an overly cynical way to look at this though.

dyl000 42 days ago

it was only a matter of time, they have too much good data to not train their own models, not to mention that claude API calls were probably killing their profitability.

open source alternative: https://huggingface.co/SWE-bench/SWE-agent-LM-32B

though I haven't been able to find an MLX quant that wasn't completely broken.

aquir 42 days ago

It's a shame that my development work needs a specific VSCode extension (domain specific language for ERP systems) so my options are VSCode+Copilot or Cursor.

DrBenCarson 42 days ago

You can try Cline in VSCode as well; many engineers swear by it.

bicepjai 41 days ago

My favorite tools are Cline and Roo. In my experience Cline eats tokens like crazy and Roo eats fewer. I don’t try Aider since I do like to watch the mesmerizing diffs (for verification) :)

aitchnyu 42 days ago

Aider runs in your terminal, and you can make comments against your code in any editor and it will execute your requests. It can use any model. Cline, mentioned in a sibling comment, is in the same space.

tintor 41 days ago

Aider wastes tokens like crazy.

aitchnyu 39 days ago

In which cases?

albertot 42 days ago

you can use the codeium extension I believe no? Also I think that if the license of the extension that you are using permits it you could export that extension to the open source store

TiredOfLife 41 days ago

Windsurf is also a VS Code fork like Cursor

knes 41 days ago

Check augmentcode.com

tianshuo 39 days ago

Sorry, but as a paid Windsurf user, I think Windsurf should stop chasing shiny frontier models and focus on building better predictable & manageable workflows to build real-life products.

- How about providing a Jira/Trello-style dashboard with subtasks for our AI, instead of copy-pasting "Cline Memory Bank" to .windsurfrules?

- How about supporting TDD and regression-fixing by default?

- How about using git with branches instead of the current undo-redo system?

- How about a better way of syncing documentation vs real code?

We are paying for more "manageable" AI agents to get stuff done, not a chaotic "genius-hacker" to hack together quick prototypes.

infecto 42 days ago

Can we get ARM Linux builds? Would be really nice!
