Grok's models are the first I am boycotting on purely environmental grounds. They built their datacenter without sufficient local power supply and have been illegally powering it with unpermitted gas turbine generators until that capacity gets built, to the significant detriment of the local population.
Tested this yesterday with Cline. It's fast, works well with agentic flows, and produces decent code. No idea why this thread is so negative (also got flagged while I was typing this?) but it's a decent model. I'd say it's at or above gpt5-mini level, which is awesome in my book (I've been maining gpt5-mini for a few weeks now, does the job on a budget).
Things I noted:
- It's fast. I tested it in EU tz, so ymmv
- It does agentic in an interesting way. Instead of editing a file whole or in many places, it does many small passes.
- Had a feature take ~110k tokens (parsing html w/ bs4). Still finished the task. Didn't notice any problems at high context.
- When things didn't work first try, it created a new file to test, did all the mocking / testing there, and then once it worked edited the main module file. Nice. GPT5-mini would often edit working files, and then get confused and fail the task.
All in all, not bad. At the price point it's at, I could see it as a daily driver. Even agentic stuff w/ opus + gpt5 high as planners and this thing as an implementer. It's fast enough that it might be worth setting it up in parallel and basically replicate pass@x from research.
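For what it's worth, the parallel pass@x idea can be sketched in a few lines. This is a hedged sketch, not a real integration: `call_model` and `score` are stubs standing in for an actual API call to the fast model and for running your test suite against each candidate.

```python
import concurrent.futures

# Stub standing in for a sampled chat-completion call to the fast model
# (e.g. via an OpenAI-compatible endpoint); returns one candidate solution.
def call_model(prompt: str, seed: int) -> str:
    return f"candidate-{seed} for: {prompt}"

# Stub scorer; in practice, run the test suite against each candidate and
# return, say, the number of passing tests.
def score(candidate: str) -> int:
    return len(candidate)

def best_of_n(prompt: str, n: int = 5) -> str:
    """Fan out n sampled attempts in parallel and keep the best-scoring one."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: call_model(prompt, s), range(n)))
    return max(candidates, key=score)

result = best_of_n("implement the parser", n=5)
```

In practice you'd score candidates by running the project's tests and keep the highest-passing patch; with a fast, cheap model the extra samples cost little.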
IMO it's good to have options at every level. Having many providers fight for the market is good, it keeps them on their toes, and brings prices down. GPT5-mini is at $2/MTok, this is at $1.5/MTok. This is basically "free", in the grand scheme of things. I don't get the negativity.
I mean they are completely unrelated things serving different purposes. I get that this is a US-centric forum so most people commenting are in the great divide between two political parties, but geez.
Elon Musk chose to make his identity nakedly partisan in a context where doing so is deeply alienating to a lot of people. That is going to have brand consequences.
Out of all his brands, though, X and particularly xAI (and so Grok) have been particularly influenced by (indeed, he seems to see them as vehicles for) his personal political opinions and reckless ethics.
It's also a company headed by an individual who performed a nazi salute publicly. For some of us this matters a lot, and we choose not to have anything to do with any of the companies headed by that individual.
For others, the situation is reversed in the sense that the backlash that emerged then clearly revealed who the true victim was. Since I haven't considered Tesla and haven't used Grok, I offer Musk my support as a response to this backlash. I've even started paying for X. There should be a limit to slander and hatred and I believe someone should stand up to the crowd. In the name of a better tomorrow and to ensure history never repeats itself.
So in your view, the true victim of Elon's nazi salute was.. Elon?
How do you come to that conclusion? Because the backlash was "too much" ? He is still (one of) the richest people in the world, and controls several huuge companies. But he got his feelings hurt, I guess? And that was "too much" ?? Poor snowflake Elon.
The Anti-Defamation League stated it wasn't a salute and that they weren't offended.
Rabbi Ari Lamm wrote that Musk has repeatedly shown he's a friend to the Jewish community.
David Greenfield suggested people should focus on actual antisemitism instead.
Netanyahu highlighted the absurdity of the accusations and pointed to Musk's aid and engagement after the October 7th attacks.
And yes, Musk became a victim. I don't see what his current wealth has to do with it. It's hard to ignore the imbalance where one man drew the world's anger and became public enemy #1. If you call him a snowflake, I don't know what to call all those who might have been offended by his gesture
That seems like a pretty defamatory statement to make. Do you have proof he was doing the nazi salute, or have other famous people made similar gestures in the past?
Why does it matter to you when Netanyahu - one of the most important prime ministers and a representative of Jews - made a whole X post exonerating Elon over the salute?
> .@elonmusk is being falsely smeared.
Elon is a great friend of Israel. He visited Israel after the October 7 massacre in which Hamas terrorists committed the worst atrocity against the Jewish people since the Holocaust. He has since repeatedly and forcefully supported Israel's right to defend itself against genocidal terrorists and regimes who seek to annihilate the one and only Jewish state.
If it matters to you a lot, does it also matter that almost all other politically-oriented people have performed the same gesture? (you can find even better examples ..) https://imgur.com/a/wikg2zR
"My heart goes out to you", "Taxi!", or just an "I see you guys!" can all be accompanied by a bad arm angle in hindsight. As he obviously wasn't going for that, by his own words, maybe we should consider actions more important than interpreted hand movements. And Musk has been loud about AI safety since 2016, naming, cofounding, and funding OpenAI before Sam conducted a hostile takeover and made it profit-first instead of a gift for humanity.
The reality is that even if he didn't do it intentionally, or did it in such a way that it could only be ambiguous (which, I agree, it is) - he's 100% the type of person to lean into the controversy it creates. At that time, building favor with Trump voters was good for him. Further, any of your examples, shown with video and full context, would clearly not be misinterpreted. It's a full-motion gesture, and only video captures it unless there are swastikas and white-hooded men.
"Terminally tarnished"? Like Microsoft got "terminally tarnished" when trolls drove Tay crazy?
Grok was built separately from the competitors to avoid the kind of actually potentially harmful issues like considering misgendering worse than a global thermonuclear war, or depicting the founding fathers as black by default. The incident you're alluding to was caused purely by jokesters constantly trying to break it and managing to force a binary between exactly "Gigaj*" and "Mechah***".
After that incident, the system prompt was changed, and a line was removed: "The response should not shy away from making claims which are politically incorrect, as long as they are well substantiated"
We should be more concerned about the issues of that line being removed than what you get it to say when you try to drive it crazy for hours on end.
Your link is paywalled, but your claim of that "infowars" line being part of the system prompt is incorrect. It is only part of a made-up persona one can select if they really want to, among other equally crazy options intended for flavor, obviously not a part of the core product itself. I've managed to get ChatGPT to spout worse stuff when I tried just out of curiosity, and that's fine, because I explicitly requested it.
Remember: Musk was the ONLY one seriously talking about the risks of AI back in 2016. And he cofounded OpenAI, giving it its name, as it was founded on a goal of having all its research open to help against the risks. Then Sam enacted a hostile takeover. Now the cat's out of the bag, but it's clear who really cares about the issues instead of just money or power (Musk funded OpenAI for a long time, at a time when its core was explicitly not aiming for profit!)
I use Grok for stuff close to politics because I know it won't have manufactured mind fog there. I've been using GPT5 for code, tens of prompts a day, but this new model might be well worth looking at.
It's not just the model, it's Elon Musk's view of the world and business in general. Neither Microsoft nor Google nor their leadership--though admittedly imperfect--make a habit of trolling people, openly embroiling themselves in politics, and committing blatant legal and societal transgressions. You reap what you sow; and if you live for controversy, you can't expect people to want to do business with you.
Those incidents are a bit different, don't you think? The CEOs of Microsoft and Google didn't publicly do Nazi salutes, do insane damage to people in need around the world, or side with autocrats, and they aren't actively working to undermine developed democracies around the world.
Sure, the AI product might be interesting (let's not talk about how it was financed and how GPUs from a public company were diverted to a private venture), but ignoring all of the surrounding factors is an interesting approach.
That single incident was only the worst of the bunch. It sits on top of heaps of context which paint Grok, X, and Elon Musk in general as something any decent human being should not touch with a 10 foot pole.
This is a forum for tech-related discussions, not a venue for your virtue signaling. We are discussing technical merits of an LLM for programming.
And if you are such a party purist, delete the Twitter handle from your HN profile
While this point might be open to debate, the original claim, which I definitely stand by, was not that Musk is a Nazi, but rather that xAI have put out a product under the grok brand which manifestly promoted nazi ideas.
If Musk is not in favor of those ideas he might need to work a bit harder to make that clear, because he does tend to leave people with the impression he's okay with it.
Excuse me sir, this is a forum for yCombinator backed startups. The technical aspect is a historical novelty, if they could make money without it, they would
> terminally tarnished for you by the "mechahitler" incident
It is forgivable because there is no real understanding in an LLM.
And other LLMs can also be prompted to say ridiculous things, so what? If an LLM would accept the name of a Viking or a Khan of the steppes, it doesn't mean it wants to rape and pillage.
What was the alternative? This was clearly an oversight and this much was admitted.
Is your suggestion that an oversight like this is reason enough to not use the model?
I don't get the big problem over here. The model said some unsavoury things and the problem was admitted and fixed - why is this making people lose their minds? It has to be performative because I can't explain it in any other way.
Well, you'd also be forgiven for thinking "how on earth can a social website chatbot be a white supremacist?" And yet xAI managed to prove that is a legitimate concern.
xAI has a shocking track record of poor decisions when it comes to training and prompting their AIs. If anyone can make a partisan coding assistant, they can. Indeed, given their leadership and past performance, we might expect them to explicitly try.
Even on regular Grok, I've seen it disagree with fundamental consensus viewpoints of people on the right. You're reading a lot of comments from people who have never used Grok in any way.
I urge anyone who disagrees to use grok and get it to say something obviously untrue and right wing. I have used it many times and it is clearly balanced.
I am sure this is of course a good-faith argument and there's no need to once again teach the point that everything is, in a sense, political.
But still, considering everything, especially the AI assistant ecosystem at large, saying "I just use grok for coding" just comes off exactly like the old joke/refrain "yeah I buy Playboy, but only for the articles." Like yeah buddy, suuure.
It was a good faith question. I use perplexity for my searches/research today. I have NO intention of moving that to X (although grok may be one of the models I can use underneath, I haven't explicitly enabled it).
I don't use social media in general, maybe YouTube, but it's been a real challenge to get rid of all the political content - both left and right wing.
I have used both Grok and Perplexity, and I've recently decided to just use Perplexity, even when I tell it to use Grok under the covers, I like the way Perplexity organizes things.
The article you linked talks about the voice personality prompt for "unhinged mode", which is an entertainment mode. It has nothing to do with the code writing model.
The fact that that represents something the folks at xAI think would be entertaining can certainly be a basis for thinking twice about trusting their judgement in other matters, though, right?
Qwen3-Coder-480B hosted by Cerebras is $2/Mtok (both input and output) through OpenRouter.
OpenRouter claims Cerebras is providing at least 2000 tokens per second, which would be around 10x as fast, and the feedback I'm seeing from independent benchmarks indicates that Qwen3-Coder-480B is a better model.
As a bit of a side note, I want to like Cerebras, but using any of the models through OpenRouter that use them has led to too many throttling responses. You can't seem to make even a few calls per minute. I'm not sure if Cerebras is throttling OpenRouter or if they are throttling everybody.
If somebody from Cerebras is reading this, are you having capacity issues?
You can get your own key with Cerebras and then use it in OpenRouter. It's a little hidden, but for each provider you can explicitly provide your own key. Then it won't be throttled.
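As a concrete illustration, pinning an OpenRouter request to the Cerebras provider looks roughly like this. The field names follow OpenRouter's provider-routing options as I understand them (treat the exact names and the model slug as assumptions to verify against current docs); nothing here actually hits the network.

```python
import json

# Sketch of an OpenRouter /chat/completions request body pinned to Cerebras.
# The "provider" routing fields are per OpenRouter's docs (verify before
# relying on them); the model slug is illustrative.
payload = {
    "model": "qwen/qwen3-coder",
    "messages": [{"role": "user", "content": "Refactor this function..."}],
    "provider": {
        "order": ["Cerebras"],     # route to Cerebras first
        "allow_fallbacks": False,  # error out instead of silently rerouting
    },
}

body = json.dumps(payload)  # would be POSTed with your OpenRouter API key
```

Combined with a bring-your-own Cerebras key configured in the OpenRouter dashboard, this should sidestep the shared-pool throttling.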
There is a national superset of "NIH" bias that I think will impede adoption of Chinese-origin models for the foreseeable future. That's a shame because by many objective metrics they're a better value.
It does totally ridiculous things, very fast. That's not a good thing.
I imagine it might be good for something really tight and simple and specific like making some CRUD endpoints or i18n files or something, but otherwise..
Exactly, asked it to improve my Justfile a bit, it went wild, broke everything and got into an endless loop trying to fix it.
This is under Kilo Code so YMMV.
Anecdotally it's coming in under Sonnet 4 for me, but much quicker and, if I understand the costs correctly, vastly cheaper. No idea if it's subsidized or not but I am going to keep playing around with it. My example is it is definitely writing the code I want, but with more quirks than Sonnet. I only do things in chunks though.
> No idea why this thread is so negative (also got flagged while I was typing this?)
Grok is owned by Elon Musk. Anything positive that is even tangentially related to him will be treated negatively by certain people here. Additionally, it is an AI coding tool, which is seen as a threat to some people's livelihoods here. It's a double whammy, so I'm not surprised by the reaction to it at all.
Mostly by name association. The LLMs named Grok are good LLMs. The twitter bot of the same name, using those models and a custom prompt, has a habit of creating controversy. Usually after somebody modified the system prompt.
I use grok a lot on the web interface (grok.com) and never had any weird incidents. It's a run-of-the-mill SOTA model with good web search and less safety training
If we accept the "broken windows" theory, it'd seem that people love to pile onto a thread that already has negativity.
See also the Microsoft threads on HN where everyone threatens to switch to Linux, and by reading them you'd think Linux is finally about to have its infamous glory year on the desktop.
People really are trying to switch to Linux right now, but it won't really matter if it doesn't stick, and spoiler alert, for most people it probably won't stick as a daily driver. Still, it's an interesting sort of unplanned experiment to watch.
i'd love to switch to linux as a daily driver but the mac cmd shortcuts for text editing and the general well thought out text editing in the system make macos more compelling. i'd love to switch over for gaming on the windows computer but the lack of performance for comparable specs hurts it. drivers are very important.
ive seen some that change it for copy and paste but i don't think it works for cmd-left/right/up/down, or option with those.
The chords are different, but the only functionality missing from the other OSes are predictive text, text expansion, and skipping to the beginning/end of an entire text box. Are those the text editing features you're referring to or am I missing out on something more?
It's kind of fascinating that other "certain people" are so casually dismissive of what a piece of trash Musk is. There is no universe where any money or attention that I have available would go to him or any of his endeavors. I don't care if his latest model is giving away free blowjobs, it's still being boosted and financed by a morally bankrupt man-child who, in case you forgot, was complicit in doing serious damage to this country.
Calling another person "a piece of trash" is, in my country, a criminal offense. It is also the hallmark of what you call "moral bankruptcy", and of being a man-child.
> Calling another person "a piece of trash" is, in my country, a criminal offense. It is also the hallmark of what you call "moral bankruptcy", and of being a man-child.
Musk better not visit your country then since he routinely calls people worse, with no or contrary evidence
^ And this, ladies and gentlemen, is a stereotypical example of a user who is happy to downvote anything about Musk, and thus Grok. So for future reference, this kind of behaviour should not come as a surprise.
Also, watch the "Nazi salutes" clip in its entirety from a non-biased source. He is excited that Trump won and is awkwardly gesturing while saying "my heart goes out to you" in celebration and thanks to the voters. Even the ADL said it wasn't a Nazi salute.
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.
I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.
If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.
Ad absurdum: if it could ingest and work on an entire project in milliseconds, then it has much greater value to me than a process which might take a day to do the same, even if the likelihood of success is lower.
It simply enables a different method of interactive working.
Or it could supply 3 different suggestions in-line while working on something, rather than a process which needs to be explicitly prompted and waited on.
Latency can have critical impact on not just user experience but the very way tools are used.
Now, will I try Grok? Absolutely not, but that's a personal decision due to not wanting anything to do with X, rather than a purely rational decision.
>If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.
Before MoE was a thing, I built what I called the Dictator, which was one strong model working with many weaker ones to achieve a similar result as MoE, but all the Dictator ever got was Garbage In, so guess what came out?
why's this guy getting downvoted? SamA says we need a Dyson Sphere made of GPUs surrounding the solar system and people take it seriously but this guy takes a little piss out of that attitude and he's downvoted?
> If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.
Asking any model to do things in steps is usually better too, as opposed to feeding it three essays.
I don't know what other people are doing, I mostly use LLMs:
* Scaffolding
* Ask it what's wrong with the code
* Ask it for improvements I could make
* Ask it what the code does (amazing for old code you've never seen)
* Ask it to provide architect level insights into best practices
One area where they all seem to fail is lesser-known packages: they tend to reference old functionality that is not there anymore, or never was - they hallucinate. Which is part of why I don't ask for too much.
Junie did impress me, but it was very slow, so I would love to see a version of Junie using this version of Grok, it might be worthwhile.
That's phase 1; ask it to "think deeply" (Claude keyword, only works with the Anthropic models) while doing that. Then ask it to make a detailed plan for solving the issue, write that into current-fix.md, and add clearly testable criteria for when the issue is solved.
Now you manually check whether the criteria sound plausible; if not, its analysis failed and its output was worthless.
But if it sounds good, you can then start a new session and ask it to read the-markdown-file and implement the change.
Now you can plausibility check the diff and are likely done
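The plan-file loop above can be sketched like this; `run_agent` is a stub standing in for whatever agent you drive (Claude Code, an API, etc.), and the canned strings exist only to make the shape of the two-session flow visible:

```python
from pathlib import Path

# Stub standing in for calls to your coding agent (CLI or API). The canned
# returns exist only so this sketch runs; a real agent does the work.
def run_agent(prompt: str) -> str:
    if "detailed plan" in prompt:
        return ("Plan: fix the off-by-one in parse()\n"
                "Criteria:\n- unit tests in test_parse.py pass\n")
    return "Implemented the plan from current-fix.md"

plan_file = Path("current-fix.md")

# Session 1: analysis. The agent thinks, plans, and writes testable criteria.
plan = run_agent("Analyze the issue, think deeply, then write a detailed plan "
                 "with clearly testable criteria for when it is solved.")
plan_file.write_text(plan)

# Manual gate: open current-fix.md yourself and sanity-check the criteria.

# Session 2: a fresh context reads the plan file and implements it.
implementation = run_agent(f"Read {plan_file} and implement the change.")
```

The point of the two sessions is that the implementation pass starts from a clean context containing only the vetted plan, rather than the analysis session's accumulated noise.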
But as the sister comment pointed out, agentic coding really breaks apart with large files like you usually have in brownfield projects.
I hope in the future tooling and MCP will be better so agents can directly check what functionality exists in the installed package version instead of hallucinations.
If Apple keeps improving things, you can run the model locally. I'm able to run models on my Macbook with an M4 that I can't even run on my 3080 GPU (mostly due to VRAM constraints) but they run reasonably fast, would the 3080 be faster? Sure, but its also plenty fast to where I'm not sitting there waiting longer than I wait for a cloud model to "reason" and look things up.
I think the biggest thing for offline LLMs will have to be consistency for having them search the web with an API like Google's or some other search engines API, maybe Kagi could provide an API for people who self-host LLMs (not necessarily for free, but it would still be useful).
It's a fair trade-off for smaller companies where IP or the software is a necessary evil, not the main unique value added. It's hard to see what evil anyone would do with crappy legacy code.
The IP risks taken may be well worth the productivity boosts.
I don't think he was saying their release cadence is a direct metric on their model performance. Just that the team iterates and improves the app user experience much more quickly than on other teams.
He seems to be stating that app release cadence correlates with internal upgrades that correlate with model performance. There is no reason for this to be true. He does not seem to be talking about user experience.
It's a metric for showing you can move more quickly on product improvements. Anyone who has worked on a product team at a large tech company knows how much things get slowed down by process bloat.
After trying Cerebras' free API (not affiliated), which delivers Qwen Coder 480b and gpt-oss-120b at a mind-boggling ~3000 tps, that output speed is the first thing I checked when considering a model for speed. I just wish Cerebras had a better overall offering on their cloud; usage is capped at 70M tokens / day and people are reporting that it's easily hit and highly crippling for daily coding.
For autocompleting simple functions (string manipulation, function definitions, etc), the quality bar is pretty easy to hit, and speed is important.
If you're just vibe coding, then yeah, you want quality. But if you know what you're doing, I find having a dumber fast model is often nicer than a slow smart model that you still need to correct a bit, because it's easier to stay in flow state.
With the slow reasoning models, the workflow is more like working with another engineer, where you have to review their code in a PR
For agentic workflows, speed and good tool use are the most important thing. Agents should use tools for things by design, and that can include reasoning tools and oracles. The agent doesn't need to be smart, it just needs a line to someone who is that can give the agent a hyper-detailed plan to follow.
Speed absolutely matters. Of course if the quality is trash then it doesn't matter, but a model that's on par with Claude Sonnet 4 AND very speedy would be an absolute game changer in agentic coding. Right now you craft a prompt, hit send and then wait, and wait, and then wait some more, and after some time (anywhere from 30 seconds to minutes later) the agent finishes its job.
It's not long enough for you to context switch to something else, but long enough to be annoying and these wait times add up during the whole day.
It also discourages experimentation if you know that every prompt will potentially take multiple minutes to finish. If it instead finished in seconds then you could iterate faster. This would be especially valuable in the frontend world where you often tweak your UI code many times until you're satisfied with it.
Tbh I kind of disagree; there are certain use cases where speed would legitimately be much more interesting, such as generating a massive amount of HTML. Though I agree this makes it look like even more of a joke for anything serious.
To a point. If gpt5 takes 3 minutes to output and qwen3 does it in 10 seconds and the agent can iterate 5 times to finish before gpt5, why do I care if gpt5 one shot it and qwen took 5 iterations
Often all it takes is to reset to a checkpoint or undo and adjust the prompt a bit with additional context and even dumber models can get things right.
I've used grok code fast plenty this week alongside gpt 5 when I need to pull out the big guns and it's refreshing using a fast model for smaller changes or for tasks that are tedious but repetitive during things like refactoring.
Yes fast/dumb models are useful! But that's not what OP said - they said they can be as useful as the large models by iterating them.
Do you use them successfully in cases where you just had to re-run them 5 times to get a good answer, and was that a better experience than going straight to GPT 5?
Not everyone is solving complicated things every time they hit cmd-k in Cursor or use autocomplete, and they can easily switch to a different model when working harder stuff out via longer form chat.
I'm more curious if it's based on Grok 3 or what; I used to get reasonable answers from Grok 3. If that's the case, the trick that works for Grok and basically any model out there is to ask for things in order and piecemeal, not all at once. Some models will be decent at the 'all at once' approach, but when me and others have asked in steps it gave us much better output. I'm not yet sure how I feel about Grok 4, have not really been impressed by it.
Fast can buy you a little quality by getting more inference on the same task.
I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds. I will usually have eyeballed the code somewhere in the middle here but I'm not fully reviewing until this whole dance is done.
I mean, I obviously agree with you in that I've chosen the slowest models available at every turn here, but my point is I would be very excited if they also got faster because I am using a lot of extra inference to buy more quality before I'm touching the code myself.
> I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds.
This is a nice setup. I wonder how much it helps in practice? I suspect most of the problems opus has for me are more context related, and I'm not sure more models would help. Speculation on my part.
I've been testing Grok for a few days, and it feels like a major step backward. It randomly deleted some of my code - something I haven't had happen in a long time.
While the top coding models have become much more trustworthy lately, Grok isn't there yet. It doesn't matter if it's fast and/or free; if you can't trust a tool with your code, you can't use it.
Kilo Code has a free trial of Grok Code Fast 1 and I've had very poor results with it so far. Much less reliable than GPT 5 Mini, which was also faster, ironically.
To me, "full self driving" means you can hop in the back seat and have a nap. If you have to keep your hands near the wheel and maintain attention to the road then... shrugs not really the same. IMHO we're in the "uncanny valley" of vehicular automation.
everything a layman would call "AI" is in the "uncanny valley" at the moment!
- Boston Dynamics' Atlas does not move as gracefully as a human
- LLM writing and code is oh-so-easy to spot
- the output of diffusion models is indistinguishable from a photo... until you look at it for longer than 5 seconds and decide to zoom in because "something's wrong"
Is this the model that is the "Coding" version of Grok-4 promised when Grok-4 had awful coding benchmarks?
I guess if you cannot do well in benchmarks, you instead pick an easier one to pump up and run with that: speed. Looking online for benchmarks, the first thing that came up was a reddit post from an (obvious) spam account[1] gloating about how amazing it was on a bunch of subs.
I've actually seen really good outputs from the regular Grok 4. The issue seemed to be that it didn't explain anything and just made some changes, which, like I said, were pretty good. I never wanted a faster version, I just wanted a bit more feedback and explanations for suggested changes.
What I recently found much more valuable, and why I now prefer GPT-5 over Sonnet 4, is that if I start asking it to give me different architectural choices, it's really quite good at summarizing trade-offs and offering step-by-step navigation towards problem solving. I am liking this process a lot more than trying to "one-shot" or getting tons of code completely rewritten that's unrelated to what I am really asking for. This seems to be a really bad problem with Opus 4.1 Thinking or even Sonnet Thinking. I don't think it's accurate to rate models on "one-shotting" a problem. Rate it on how easy it is to work with, as an assistant.
I had that issue with gpt-5 where, when it wanted to do something in a way that was just plain wrong in this project, no matter what I said it just kept doing the same action.
It was completely unsteerable. I get why people are often upset by the "you're right" of Claude models, but that's what I usually want from a model.
I guess there are different expectations depending on the experience level of the developer, but I want to have the final say on what is the right way.
I have the same experience, except while I agree that GPT-5 is better than Sonnet 4 for architecture and deep thinking, Sonnet 4 still seems to be better for just banging out code when you have a well-defined and a very detailed plan.
I thought it was incredible - I asked it a question about a refactoring and it called a ton of tools very quickly to read the code and it had what seemed like solid reasoning - it found two bugs! Of course, neither were bugs at all. But it looked cool!
My experience with 'sonic' during the stealth phase had it do stuff plenty fast, but the quality was slightly off target for some things. It did create tests and then iterate on those tests. The tests it wrote don't actually verify intended behavior. It only verified that mocks were called with the intended inputs while missing the larger picture of how it is used.
Definitely fast, but initial use puts quality either comparable to or below gpt-5-nano. This might be a low-cost option for people who don't mind babysitting the output (or working in very small projects), but claude/gpt-5/gemini all seem to have significantly higher quality at marginally more cost/time.
By just emphasizing the speed here, I wonder if their workflows revolve more around the vibe practice of generating N solutions to a problem in parallel and selecting the "best". If so, it might still win out on speed (if it can reliably produce at least one higher-quality output, which remains to be seen), but also quickly loses any cost margin benefits.
Ah, so this is what the Sonic model that Cursor had was. I've been doing this personal bench where I ask each model to create a 3D render of a guy using a laptop on a desk. I haven't written up a post to show the different output from each model, yet, but it's been a fun way to test the capabilities. Opus was probably the best -- Sonic put the guy in the middle of the desk, and the laptop floating over his head. Sonic was very fast, though!
I noticed it pop up on copilot so gave it about two attempts. Neither were fast, and both were incredibly average. Gpt4.1 and 5-mini do a better job, and 5-mini was faster...but I find speed of response varies hugely and seemingly randomly throughout the day.
Maybe it's just me, but I wish models like this would also provide a normal chat interface.
The leap from taking advice and copy-pasting, almost as a shameful fallback, to it just directly driving your tools is a tough pill. Having recently adjusted to "micro-dosing" on LLMs (asking for no direct code output, smaller patches) so I can learn better, I don't know how I would integrate that with this.
Or do the agentic tools allow for this in some reasonable way and I just don't know?
Just a few days ago I spent some time to sign up for Groq (not Grok, not Musk!) to implement fast code suggestions with qwen3-32b and gpt-oss-20b. Works handily with Jetbrains integrated AI features. I still use Claude Code as my "main" engineer, but I use these fast models for quick, fast edits.
I support the X enterprise, its motives, and its agenda. I'm a happy paying customer. Question away as seriously as you please. But don't bother looping me into that dialog. I'm not interested.
And, yes, I do. I don't support the CCP. I do, however, support hardworking Chinese people fighting for freedom. So please don't try to twist it like I am making some racist statement.
Fast is cool! Totally has its place. But I use Claude Code in a way right now where it's not a huge issue and quality matters more.
Opus 4.1 is by far the best right now for most tasks. It's the first model I think will almost always pump out "good code". I do always plan first as a separate step, I always ask it for plans or alternatives first, and I always remind it to keep things simple and follow existing code patterns. Sometimes I just ask it to double-check before I look at it and it makes good tweaks. This works pretty well for me.
For me, I found Sonnet 3.5 to be a clear step up in coding. I thought 3.7 was worse, about 2.5 Pro equivalent, and Sonnet 4 equal to or maybe a tiny bit better than 3.5. Opus 4.1 is the first one that feels like a solid step up over Sonnet 3.5. This of course required me to jump to the Claude Code max plan, but it's the first model to be worth that (wouldn't pay that much for just Sonnet).
fast but not smart. Fine for non-critical "I need this query" or "summarize this" but it's pretty much worthless for real coding work (compared to gpt-5 thinking or sonnet 4)
I'd argue that even GPT-5 and Sonnet 4 at their highest reasoning budgets are not enough for "real coding work", because you still have to think about how an LLM should do it instead of what, and put that into a prompt. Some harnesses, such as JetBrains Junie or Gemini CLI, do a good job of letting me drift into declarative prompts, but that's still not enough.
totally agree - they all need a human in the loop at this point. I'm constantly stopping gpt-5/sonnet 4 and steering. Unfortunately with grok it completely misses the plot constantly
This will probably be an unpopular, wet-blanket opinion...
But anytime I hear of Grok or xAI, the only thing I can think about is how it's hoovering up water from the Memphis municipal water supply and running natural gas turbines, all to power a chat bot.
Looks like they are bringing even more natural gas turbines online...great!
It's not the water that is the big problem here. It is the gas turbines and the location.
They started operating the turbines without permits and they were not equipped with the pollution controls normally required under federal rules. Worse, they are in an area that already led the state in people having to get emergency treatment for breathing problems. In their first 11 months they became one of the largest polluters in an area already noted for high pollution.
They have since got a permit, and said that pollution controls will be added, but some outside monitors have found evidence that they are running more turbines than the permit allows.
Oh, and of course 90% of the people bearing the brunt of all this local pollution are poor and Black.
This is the model that was code named "Sonic" in Cursor last week. It received tons of praise. Then Cursor revealed it was a model from xAI. Then everyone hated it. :/ I miss the days where we just liked technology for advancement's sake.
*edit Case in point, downvotes in less than 30 seconds
I'm pretty sure everyone knew it was xAI last week. It's a great model. I'll never pay to use it, but I like it enough while it's free.
> I miss the days where we just liked technology for advancement's sake.
I think you haven't fully thought through such statements. They lead to bad places. If Bin Laden were selling research and inference to raise money for attacks, how many tokens would you buy?
Its focus seems to be on faster responses, which Grok 3 definitely is good at. I have a different approach to LLMs and coding: I want to understand their proposed solutions, not just paste garbled-up code (unless it's scaffolded). If you treat every LLM as a piecemeal thing when designing code (or really when trying to figure out anything) and go step by step, you get better results from most models.
Yeah, I tried it in Copilot and it's fast, but I'd rather have a 2x smarter model that takes 10x longer. The competition for "fast" is the existing autocomplete model, not the chat models.
I haven't used Copilot in a while but Cursor lets you easily switch the model depending on what you're trying to do.
Having options for thinking, normal, fast covers every sort of problem. GPT-5 doesn't let you choose which IMO is only helpful for non-IDE type integrations, although even in ChatGPT it can be annoying to get "thinking" constantly for simple questions.
I have the option for either, but it's an option I'll never choose. My issue with Copilot wasn't speed, it's quality. The only thing that has to be fast is the text-completion part, which Grok isn't replacing. The code chat/agent part needs to focus on actually being able to do things.
AI coding tools are amazing and if you don't use them, that's fine. But lots of people, myself included, are finding tremendous utility in these models.
I'm getting 30-50% larger code changes in per day now. Yesterday I plumbed six slightly mechanical, but still major changes through our schema, several microservice layers, API client libraries, and client code. I wrote down the change sites ahead of time to track progress: 54. All requiring individual business logic. This would have been tedious without tab complete.
And that's not the only thing I did yesterday.
I wouldn't trust these tools with non-developers, but in our hands they're an exoskeleton. I like them like I like my vim movements.
A similar analogy can be made for the AI graphics design and editing models. They're extremely good time saving tools, but they still require a human that knows what they're doing to pilot them.
I'm not an animator and I made that with a few simple tools.
It has a lot of errors and mistakes that I didn't take the time to correct since it was just a silly meme, but do you see how accessible all of this is?
When people with intention and taste use these tools, the results are powerful. I won't claim that the above videos demonstrate this, but I can certainly do good work with these tools.
I don't see how this is anything short of revolutionary.
Grok are the first models I am boycotting on purely environmental grounds. They built their datacenter without sufficient local power supply and have been illegally powering it with unpermitted gas turbine generators until that capacity gets built, to the significant detriment of the local population.
https://www.datacenterdynamics.com/en/news/elon-musk-xai-gas...
It would be nice if they could get more power online faster.
Tested this yesterday with Cline. It's fast, works well with agentic flows, and produces decent code. No idea why this thread is so negative (also got flagged while I was typing this?) but it's a decent model. I'd say it's at or above gpt5-mini level, which is awesome in my book (I've been maining gpt5-mini for a few weeks now, does the job on a budget).
Things I noted:
- It's fast. I tested it in EU tz, so ymmv
- It does agentic in an interesting way. Instead of editing a file whole or in many places, it does many small passes.
- Had a feature take ~110k tokens (parsing html w/ bs4). Still finished the task. Didn't notice any problems at high context.
- When things didn't work first try, it created a new file to test, did all the mocking / testing there, and then once it worked edited the main module file. Nice. GPT5-mini would often edit working files, and then get confused and fail the task.
All in all, not bad. At the price point it's at, I could see it as a daily driver. Even agentic stuff w/ opus + gpt5 high as planners and this thing as an implementer. It's fast enough that it might be worth setting it up in parallel and basically replicate pass@x from research.
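For reference, the pass@x idea mentioned above comes from the code-generation eval literature: sample n completions in parallel, count the c that pass your checks, and estimate the probability that a batch of k contains at least one pass. A minimal sketch of the standard unbiased estimator (pure math; no particular API or model is assumed):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn without replacement from n attempts, of which c
    passed, is a passing sample."""
    if n - c < k:
        return 1.0  # too few failures: every size-k subset contains a pass
    # 1 minus the probability that all k chosen samples are failures
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 3 parallel attempts from a cheap model, of which 1 passes:
# pass_at_k(3, 1, 1) -> 1/3, while pass_at_k(3, 1, 3) -> 1.0
```

This is why a fast, cheap implementer can be attractive: even a modest per-sample pass rate compounds quickly when you run several attempts in parallel and keep the best one.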
IMO it's good to have options at every level. Having many providers fight for the market is good; it keeps them on their toes and brings prices down. GPT5-mini is at $2/MTok, this is at $1.50/MTok. This is basically "free", in the grand scheme of things. I don't get the negativity.
If the Grok brand wasn't terminally tarnished for you by the "mechahitler" incident, I'm not sure what more it would take.
This is an offering produced by a company whose idea of responsible AI use involves prompting a chatbot that "You spend a lot of time on 4chan, watching InfoWars videos" - https://www.404media.co/grok-exposes-underlying-prompts-for-...
A lot of people rightly don't want any such thing anywhere near their code.
I don't use Twitter, I don't use X, I don't buy Tesla. It's not hard to understand why I don't use Grok either.
I mean they are completely unrelated things serving different purposes. I get that this is a US centric forum so most people commenting are in the great divide between two political parties, but geez.
> they are completely unrelated
I'm not going to engage into that... I don't see what the US has to do with this, I'm from Europe.
Elon Musk chose to make his identity nakedly partisan in a context where doing so is deeply alienating to a lot of people. That is going to have brand consequences.
Out of all his brands, though, X and particularly xAI (and so Grok) have been especially influenced by his personal political opinions and reckless ethics; indeed, he seems to see them as vehicles for those.
That 404media article is interesting. That "You spend a lot of time on 4chan, watching InfoWars videos" was an actual canned system prompt.
It's also a company headed by an individual who performed a nazi salute publicly. For some of us this matters a lot, and we choose not to have anything to do with any of the companies headed by that individual.
For others, the situation is reversed, in the sense that the backlash that emerged then clearly revealed who the true victim was. Since I haven't considered a Tesla and haven't used Grok, I offer Musk my support as a response to that backlash. I even started paying for X. There should be a limit to slander and hatred, and I believe someone should stand up to the crowd. In the name of a better tomorrow, and to ensure history never repeats itself.
So in your view, the true victim of Elon's nazi salute was.. Elon?
How do you come to that conclusion? Because the backlash was "too much" ? He is still (one of) the richest people in the world, and controls several huuge companies. But he got his feelings hurt, I guess? And that was "too much" ?? Poor snowflake Elon.
Let's make this clear:
The Anti-Defamation League stated it wasn't a salute and that they weren't offended. Rabbi Ari Lamm wrote that Musk has repeatedly shown he's a friend to the Jewish community. David Greenfield suggested people should focus on actual antisemitism instead. Netanyahu highlighted the absurdity of the accusations and pointed to Musk's aid and engagement after the October 7th attacks.
And yes, Musk became a victim. I don't see what his current wealth has to do with it. It's hard to ignore the imbalance where one man drew the world's anger and became public enemy #1. If you call him a snowflake, I don't know what to call all those who might have been offended by his gesture.
That seems like a pretty defamatory statement to make. Do you have proof he was doing the Nazi salute, or have other famous people made similar gestures in the past?
Why does it matter to you when Netanyahu - one of the most important prime ministers and a representative of Jews - made a whole X post exonerating Elon over the salute?
> .@elonmusk is being falsely smeared.
Elon is a great friend of Israel. He visited Israel after the October 7 massacre in which Hamas terrorists committed the worst atrocity against the Jewish people since the Holocaust. He has since repeatedly and forcefully supported Israel's right to defend itself against genocidal terrorists and regimes who seek to annihilate the one and only Jewish state.
I thank him for this.
Famously great guy that Netanyahu.
Whatever you think of him it is important that the elected representative of Jewish people exonerated him.
[delayed]
If it matters to you a lot, do all the other politically oriented people who also performed it matter a lot too? (you can find even better ones ..) https://imgur.com/a/wikg2zR
"My heart goes out to you" "Taxi!" or just a "I see you guys!" can all be accompanied by a bad arm angle in hindsight. As he obviously wasn't going for that by his own words, maybe we should consider actions more important than interpreted hand movements. And Musk has been loud about AI safety since 2016, giving name, cofounding and funding OpenAI before Sam conducted a hostile takeover and made it profit-first instead of a gift for humanity.
The reality is that even if he didn't do it intentionally, or did it in such a way that it could only be ambiguous (which I agree it is), he's 100% the type of person to lean into the controversy it creates. At that time, building favor with Trump voters was good for him. Further, any of your examples shown with a video and the full context would clearly not be misinterpreted. It's a full-motion gesture, and only video captures it unless there are swastikas and white-hooded men.
When you code with Grok, you code with Hitler.
"Terminally tarnished"? Like Microsoft got "terminally tarnished" when trolls drove Tay crazy?
Grok was built separate from the competitors to avoid the kind of actually potentially harmful issues like considering misgendering worse than global thermonuclear war, or depicting the founding fathers as black by default. The incident you're alluding to was caused purely by jokesters constantly trying to break it and managing to force a binary between exactly "Gigaj*" and "Mechah***".
After that incident, the system prompt was changed, and a line was removed: "The response should not shy away from making claims which are politically incorrect, as long as they are well substantiated"
We should be more concerned about the issues of that line being removed than what you get it to say when you try to drive it crazy for hours on end.
Your link is paywalled, but your claim that the "InfoWars" line was part of the system prompt is incorrect. It is only part of a made-up persona one can select if they really want to, among other equally crazy options intended for flavor, obviously not part of the core product itself. I've managed to get ChatGPT to spout worse stuff when I tried just out of curiosity, and that's fine, because I explicitly requested it.
Remember: Musk was the ONLY one seriously talking about the risks of AI back in 2016. He cofounded OpenAI, giving it its name, as it was founded on a goal of having all its research open to help against the risks. Then Sam enacted a hostile takeover. Now the cat's out of the bag, but it's clear who really cares about the issues instead of just money or power (Musk funded OpenAI for a long time, at a time when it was explicitly not aiming for profit!)
I use Grok for stuff close to politics because I know it won't have manufactured mind fog there. I've been using GPT5 for code, tens of prompts a day, but this new model might be well worth looking at.
Microsoft had Tay. Google Gemini had "Black George Washington".
I think that pinning your entire view of a model forever on a single incident is not a reasonable approach, but you do you.
It's not just the model, it's Elon Musk's view of the world and business in general. Neither Microsoft nor Google nor their leadership (though admittedly imperfect) makes it a habit of trolling people, openly embroiling themselves in politics, and committing blatant legal and societal transgressions. You reap what you sow; if you live for controversy, you can't expect people to want to do business with you.
What about promoting renewable energy, space exploration, frontier physics and advanced engineering makes you concerned?
Donating to orphanages after committing a genocide resets your karma only in videogames.
> and committing blatant legal and societal transgressions.
lol what.
Those incidents are a bit different, don't you think? The CEOs of Microsoft and Google didn't publicly do Nazi salutes, do insane damage to people in need around the world, side with autocrats, or actively work on undermining developed democracies.
Sure, the AI product might be interesting (let's not talk about how it was financed and how GPUs from a public company were diverted to a private venture), but ignoring all of the surrounding factors is an interesting approach.
But you do you.
Who did a nazi salute?
Elon Musk
That single incident was only the worst of the bunch. It sits on top of heaps of context which paint Grok, X, and Elon Musk in general as something any decent human being should not touch with a 10-foot pole.
This is a forum for tech-related discussions, not a venue for your virtue signaling. We are discussing technical merits of an LLM for programming. And if you are such a party purist, delete the Twitter handle from your HN profile
It's the equivalent of "voting with your wallet". Or "giving market share with your wallet".
Context matters, not just for LLMs themselves. And Grok/X/Twitter's context is tarnished indeed for a lot of us.
Yes, being anti-nazi, what a... virtue? I guess?
Elon Musk is not a nazi.
While this point might be open to debate, the original claim, which I definitely stand by, was not that Musk is a Nazi, but rather that xAI have put out a product under the grok brand which manifestly promoted nazi ideas.
If Musk is not in favor of those ideas, he might need to work a bit harder to make that clear, because he does tend to leave people with the impression he's okay with it.
A prompt was edited by an xAI staffer that caused Grok to ignore the politically-correct filter; how is Elon Musk responsible for this?
He's not very good at demonstrating that.
I don't know man. For like... the other 7 billion people on Earth it seems preeeetty easy for them not to be confused with a Nazi.
Seems to me just Elon has that issue. I've never had that issue. I don't know anyone who's has that issue. So, it makes you wonder.
Then why did he publicly do a Nazi salute?
He did not do a Nazi salute because otherwise ADL would've been all over him. ADL came out saying he didn't do a Nazi salute.
The ADL is not the final arbiter of what is a Nazi salute and what isn't.
Thanks for the reminder. I don't need to leave that there for discoverability any more.
Excuse me sir, this is a forum for yCombinator backed startups. The technical aspect is a historical novelty, if they could make money without it, they would
The forum for YC backed startups is not public. This is effectively an indirect tech recruitment board for those start-ups, sir.
You're right, with a dash of hype machine for their launches.
Oh fuck off.
> terminally tarnished for you by the "mechahitler" incident
It is forgivable because there is no real understanding in an LLM.
And other LLMs can also be prompted to say ridiculous things, so what? If an LLM would accept the name of a Viking or a Khan of the steppes, it doesn't mean it wants to rape and pillage.
It's not about the model, it's about the ethics of the company intentionally building the model, and what they might do in the future.
What was the alternative? This was clearly an oversight, and this much was admitted.
Is your suggestion that an oversight like this is reason enough not to use the model?
I don't get the big problem here. The model said some unsavoury things, and the problem was admitted and fixed - why is this making people lose their minds? It has to be performative, because I can't explain it any other way.
How exactly is a code assistant "partisan"? I don't use X, but I'm open to buying a Tesla and using Grok for code purposes.
Kinda weird to mix political sentiment with a coding technology.
Well, you'd also be forgiven for thinking "how on earth can a social website chatbot be a white supremacist?" And yet xAI managed to prove that is a legitimate concern.
xAI has a shocking track record of poor decisions when it comes to training and prompting their AIs. If anyone can make a partisan coding assistant, they can. Indeed, given their leadership and past performance, we might expect them to explicitly try.
What's their incentive to do this? What do they gain by making a partisan model instead of one that just works well?
Perhaps you've never heard of Tay?
Microsoft did pioneering work in the Nazi chatbot space.
Even on regular Grok, I've seen it disagree with fundamental consensus viewpoints of people on the right. You're reading a lot of comments from people who have never used Grok in any way.
I urge anyone who disagrees to use grok and get it to say something obviously untrue and right wing. I have used it many times and it is clearly balanced.
I am sure this is, of course, a good-faith argument and there is no need to once again teach the point that everything is, in a sense, political.
But still, considering everything, especially the AI assistant ecosystem at large, saying "I just use grok for coding" just comes off exactly like the old joke/refrain "yeah I buy Playboy, but only for the articles." Like yeah buddy, suuure.
It was a good-faith question. I use Perplexity for my searches/research today. I have NO intention of moving that to X (although Grok may be one of the models I can use underneath; I haven't explicitly enabled it).
I don't use social media in general, maybe YouTube, but it's been a real challenge to get rid of all the political content - both left- and right-wing.
I have used both Grok and Perplexity, and I've recently decided to just use Perplexity, even when I tell it to use Grok under the covers, I like the way Perplexity organizes things.
The article you linked talks about the voice personality prompt for "unhinged mode", which is an entertainment mode. It has nothing to do with the code writing model.
The fact that that represents something the folks at xAI think would be entertaining can certainly be a basis for thinking twice about trusting their judgement in other matters, though, right?
It's a comment about the company/brand behind the models, not the individual models themselves.
Qwen3-Coder-480B hosted by Cerebras is $2/Mtok (both input and output) through OpenRouter.
OpenRouter claims Cerebras is providing at least 2000 tokens per second, which would be around 10x as fast, and the feedback I'm seeing from independent benchmarks indicates that Qwen3-Coder-480B is a better model.
As a bit of a side note, I want to like Cerebras, but using any of the models through OpenRouter that use them has led to too many throttling responses - you can't seem to make even a few calls per minute. I'm not sure if Cerebras is throttling OpenRouter or if they are throttling everybody.
If somebody from Cerebras is reading this, are you having capacity issues?
You can get your own key with Cerebras and then use it in OpenRouter. It's a little hidden, but for each provider you can explicitly provide your own key. Then it won't be throttled.
There is a national superset of "NIH" bias that I think will impede adoption of Chinese-origin models for the foreseeable future. That's a shame, because by many objective metrics they're a better value.
In my case it's not NIH, but rather that I don't trust or wish to support my nation's largest geopolitical adversary.
It does totally ridiculous things, very fast. That's not a good thing.
I imagine it might be good for something really tight, simple, and specific, like making some CRUD endpoints or i18n files, but otherwise...
Exactly, asked it to improve my Justfile a bit, it went wild, broke everything and got into an endless loop trying to fix it. This is under Kilo Code so YMMV.
This is exactly what I use it for. It's my go-to "dumb tedious things" model. And it fills that role very well.
You don't need the smartest slow model for every task. I've used it all week for tedious things nobody wants to do and gotten a ton done in less time.
The only thing I've had issues with is that if you're not a level more specific than you would be with smarter models, it can go off the rails.
But give it a tedious task and a very clear example and it'll happily get the job done.
nice review, thanks! how does it compare to claude code?
Similar thoughts. I have been using this model and find it pretty good and extremely fast.
HN comments love to beat up on Elon Musk, and unfortunately there are a lot of biased negative reactions to LLMs here, where everything gets insta-downvoted.
Do you find it more or less matches what Sonnet 4 does?
Grok fast is enticing due to its cheap cost
Anecdotally it's coming in under Sonnet 4 for me, but much quicker and, if I understand the costs correctly, vastly cheaper. No idea if it's subsidized or not, but I am going to keep playing around with it. My example: it is definitely writing the code I want, but with more quirks than Sonnet. I only do things in chunks, though.
To add, if the costs in Cursor are accurate, it's highly competitive.
> No idea why this thread is so negative (also got flagged while I was typing this?)
Grok is owned by Elon Musk. Anything positive that is even tangentially related to him will be treated negatively by certain people here. Additionally, it is an AI coding tool, which is seen as a threat to some people's livelihoods here. It's a double whammy, so I'm not surprised by the reaction to it at all.
Elon aside, Grok has its own reputation issues.
And this model is arguably their least impressive model.
Mostly by name association. The LLMs named Grok are good LLMs. The twitter bot of the same name, using those models and a custom prompt, has a habit of creating controversy. Usually after somebody modified the system prompt.
I use grok a lot on the web interface (grok.com) and never had any weird incidents. It's a run-of-the-mill SOTA model with good web search and less safety training
How does somebody modify the system prompt over an x message to the chat bot?
I think somebody here refers to a very specific meddling CEO
Claude Code threads are full of excited people, so I'm not sure the second part is true.
If we accept the "broken windows" theory, it'd seem that people love to pile onto a thread that already has negativity.
See also the Microsoft threads on HN where everyone threatens to switch to Linux, and by reading them you'd think Linux is finally about to have its infamous glory year on the desktop.
People really are trying to switch to Linux right now, but it won't really matter if it doesn't stick, and spoiler alert, for most people it probably won't stick as a daily driver. Still, it's an interesting sort of unplanned experiment to watch.
i'd love to switch to linux as a daily driver but the mac cmd shortcuts for text editing and the general well thought out text editing in the system make macos more compelling. i'd love to switch over for gaming on the windows computer but the lack of performance for comparable specs hurts it. drivers are very important.
i've seen some that change it for copy and paste but i don't think it works for cmd-left right up down. or option those.
The chords are different, but the only functionality missing from the other OSes are predictive text, text expansion, and skipping to the beginning/end of an entire text box. Are those the text editing features you're referring to or am I missing out on something more?
Anthropic lives up to their name, so far
It's kind of fascinating that other "certain people" are so casually dismissive of what a piece of trash Musk is. There is no universe where any money or attention I have available would go to him or any of his endeavors. I don't care if his latest model is giving away free blowjobs, it's still being boosted and financed by a morally bankrupt man-child who, in case you forgot, was complicit in doing serious damage to this country.
Calling another person "a piece of trash" is, in my country, a criminal offense. It is also the hallmark of what you call "moral bankruptcy", and of being a man-child.
That makes your country a piece of trash then.
> Calling another person "a piece of trash" is, in my country, a criminal offense. It is also the hallmark of what you call "moral bankruptcy", and of being a man-child.
Musk had better not visit your country then, since he routinely calls people worse, with no or contrary evidence.
In what country is that a criminal offense lmao?
^ And this, ladies and gentlemen, is a stereotypical example of a user who is happy to downvote anything about Musk, and thus Grok. So for future reference, this kind of behaviour should not come as a surprise.
What âserious damageâ do you believe Elon did?
The US seems to no longer have an overseas aid program. The Institute of Peace got shut down. Data protections at the social security administration were breached. https://time.com/7312556/doge-social-security-data-whistlebl...
Oh, and some asshole threw a couple of Nazi salutes at the president's inauguration.
US AID and the Institute of Peace were CIA fronts. Good riddance to them.
https://www.theblaze.com/columns/opinion/cias-secret-grip-on...
Also, watch the "Nazi salutes" clip in its entirety from a non-biased source. He is excited that Trump won and is awkwardly gesturing while saying "my heart goes out to you" in celebration and thanks to the voters. Even the ADL said it wasn't a Nazi salute.
https://thehill.com/homenews/administration/5097676-elon-mus...
every part of the US government has been used as a front for clandestine activity. USAID also actually carried out an overseas aid mission.
It hasnât been replaced with a non-CIA-fronting overseas aid department.
Institute of Peace you say? It couldn't sound more like double speak could it? Similar to the DDR.
What kind of an argument is that? "That's just the sort of name they would choose if they were trying to cover up that they are evil!"
Yes - it's also the kind of name they would choose if they were an institute dedicated to diplomacy.
Is 50% of context length considered high performance? qwen3-coder seems to get confused at 65k/256k IME, and it's 50% more expensive than Grok.
Because democrats don't know how to cope.
> No idea why this thread is so negative
Because politics, as it always has been, is the mind-killer.
HN is incapable of separating the product from the man.
Human nature. It is what it is.
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.
I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.
It depends how fast.
If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.
Ad absurdum: if it could ingest and work on an entire project in milliseconds, then it has much greater value to me than a process which might take a day to do the same, even if the likelihood of success is also strongly affected.
It simply enables a different method of interactive working.
Or it could supply 3 different suggestions in-line while working on something, rather than a process which needs to be explicitly prompted and waited on.
Latency can have critical impact on not just user experience but the very way tools are used.
Now, will I try Grok? Absolutely not, but that's a personal decision due to not wanting anything to do with X, rather than a purely rational decision.
>If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.
Before MoE was a thing, I built what I called the Dictator, which was one strong model working with many weaker ones to achieve a similar result as MoE, but all the Dictator ever got was Garbage In, so guess what came out?
You just need to scale out more. As you approach infinite monkeys, sorry - models, you'll surely get the result you need.
why's this guy getting downvoted? SamA says we need a Dyson Sphere made of GPUs surrounding the solar system and people take it seriously but this guy takes a little piss out of that attitude and he's downvoted?
this site is the fucking worst
That doesn't seem similar to MoE at all.
Besides being a faster slot machine, to the extent that they're any good, a fast agentic LLM would be very nice to have for codebase analysis.
For 10% less time you can get 10% worse analysis? I don't understand the tradeoff.
> If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.
Asking any model to do things in steps is usually better too, as opposed to feeding it three essays.
I thought the current vibe was doing the former to produce the latter and then use the output as the task plan?
I don't know what other people are doing, I mostly use LLMs:
* Scaffolding
* Ask it what's wrong with the code
* Ask it for improvements I could make
* Ask it what the code does (amazing for old code you've never seen)
* Ask it to provide architect level insights into best practices
One area where they all seem to fail is lesser-known packages: they tend to either reference old functionality that is not there anymore, or functionality that never was; they hallucinate. Which is part of why I don't ask them for too much.
Junie did impress me, but it was very slow, so I would love to see a version of Junie using this version of Grok, it might be worthwhile.
> Ask it what's wrong with the code
That's phase 1: ask it to "think deeply" (a Claude keyword; only works with the Anthropic models) while doing that. Then ask it to make a detailed plan for solving the issue, write that into current-fix.md, and add clearly testable criteria for when the issue is solved.
Now you manually check whether the criteria sound plausible; if not, its analysis failed and its output was worthless.
But if it sounds good, you can then start a new session and ask it to read the markdown file and implement the change.
Now you can plausibility-check the diff and are likely done.
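For concreteness, a plan file along these lines might look something like the sketch below. The headings, the bug, and every name in it (parse_date, test_dates.py) are made up for illustration; nothing in the tooling requires this exact shape, the point is just that the criteria are checkable before any code is written:

```
# current-fix.md

## Problem
Date parsing fails for ISO-8601 strings that carry a timezone offset.

## Plan
1. Normalize the offset in parse_date() before handing the string to strptime.
2. Keep the existing fast path for naive timestamps unchanged.

## Testable criteria
- [ ] parse_date("2024-01-02T03:04:05+02:00") returns an aware datetime
- [ ] all existing tests in test_dates.py still pass
```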
But as the sister comment pointed out, agentic coding really breaks apart with large files like you usually have in brownfield projects.
I hope that in the future tooling and MCP will be better, so agents can directly check what functionality exists in the installed package version instead of hallucinating it.
> amazing for old code you've never seen
not if you have too much! a few hundred thousand lines of code and you can't ask shit!
plus, you just handed over your company's entire IP to whoever hosts your model
If Apple keeps improving things, you can run the model locally. I'm able to run models on my MacBook with an M4 that I can't even run on my 3080 GPU (mostly due to VRAM constraints), but they run reasonably fast. Would the 3080 be faster? Sure, but the M4 is also plenty fast, to where I'm not sitting there waiting longer than I wait for a cloud model to "reason" and look things up.
I think the biggest thing for offline LLMs will be consistent web search through an API like Google's or some other search engine's; maybe Kagi could provide an API for people who self-host LLMs (not necessarily for free, but it would still be useful).
It's a fair trade-off for smaller companies where the IP or the software is a necessary evil, not the main unique value added. It's hard to see what evil anyone would do with crappy legacy code.
The IP risks taken may well be worth the productivity boosts.
That's far from the worst metric that xAI has come up with...
https://xcancel.com/elonmusk/status/1958854561579638960
what's wrong with rapid updates to an app?
It's like measuring how fast your car can go by counting how often you clean the upholstery.
There's nothing wrong with doing it, but it's entirely unrelated to performance.
I don't think he was saying their release cadence is a direct metric on their model performance. Just that the team iterates and improves the app user experience much more quickly than on other teams.
He seems to be stating that app release cadence correlates with internal upgrades that correlate with model performance. There is no reason for this to be true. He does not seem to be talking about user experience.
It's a fucking chat. How many times a day do you need to ship an update?
See the reply, currently at #2 on that Twitter thread, from Jamie Voynow.
They aren't a metric for showing you are better than the competition.
It's a metric for showing you can move more quickly on product improvements. Anyone who has worked on a product team at a large tech company knows how much things get slowed down by process bloat.
After trying the Cerebras free API (not affiliated), which serves Qwen Coder 480b and gpt-oss-120b at a mind-boggling ~3000 tps, output speed is the first thing I check when considering a model. I just wish Cerebras had a better overall offering on their cloud: usage is capped at 70M tokens/day, and people are reporting that it's easily hit and highly crippling for daily coding.
depends for what.
For autocompleting simple functions (string manipulation, function definitions, etc), the quality bar is pretty easy to hit, and speed is important.
If you're just vibe coding, then yeah, you want quality. But if you know what you're doing, I find having a dumber fast model is often nicer than a slow smart model that you still need to correct a bit, because it's easier to stay in flow state.
With the slow reasoning models, the workflow is more like working with another engineer, where you have to review their code in a PR
For agentic workflows, speed and good tool use are the most important thing. Agents should use tools for things by design, and that can include reasoning tools and oracles. The agent doesn't need to be smart, it just needs a line to someone who is that can give the agent a hyper-detailed plan to follow.
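That planner/executor split can be sketched in a few lines of Python. Everything below is a hypothetical stub, not any real model API: `plan_oracle` stands in for one expensive call to a smart model, `execute` for many cheap calls to a fast one.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    plan_oracle: Callable[[str], List[str]]  # slow, smart: task -> step list
    execute: Callable[[str], str]            # fast, cheap: one step -> result

    def run(self, task: str) -> List[str]:
        steps = self.plan_oracle(task)           # one expensive planning call
        return [self.execute(s) for s in steps]  # many cheap execution calls

# Stubs standing in for real model calls
planner = lambda task: [f"step {i}: {task}" for i in range(3)]
executor = lambda step: f"done: {step}"

agent = Agent(planner, executor)
results = agent.run("refactor parser")
```

The design point is that only the planning call needs to be smart; the implementer just has to follow the hyper-detailed plan quickly.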
Speed absolutely matters. Of course if the quality is trash then it doesn't matter, but a model that's on par with Claude Sonnet 4 AND very speedy would be an absolute game changer in agentic coding. Right now you craft a prompt, hit send and then wait, and wait, and then wait some more, and after some time (anywhere from 30 seconds to minutes later) the agent finishes its job.
It's not long enough for you to context switch to something else, but long enough to be annoying and these wait times add up during the whole day.
It also discourages experimentation if you know that every prompt will potentially take multiple minutes to finish. If it instead finished in seconds then you could iterate faster. This would be especially valuable in the frontend world where you often tweak your UI code many times until you're satisfied with it.
> I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.
We already know that in most software domains, fast (as in, getting it done faster) is better than 100% correct.
Tbh I kind of disagree; there are certain use cases where speed would legitimately be much more interesting, such as generating a massive amount of HTML. Though I agree this makes it look like even more of a joke for anything serious.
They reduce the costs though!
To a point. If gpt5 takes 3 minutes to output and qwen3 does it in 10 seconds, and the agent can iterate 5 times to finish before gpt5, why do I care whether gpt5 one-shot it and qwen took 5 iterations?
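The arithmetic behind that bet is easy to sketch, under the generous assumption that retries are independent draws rather than correlated failures: if each attempt passes the tests with probability p, then k attempts succeed with probability 1 - (1 - p)^k. A quick simulation with purely illustrative numbers:

```python
import random

def any_pass(p: float, k: int, rng: random.Random) -> bool:
    """Did at least one of k independent attempts pass the tests?"""
    return any(rng.random() < p for _ in range(k))

rng = random.Random(0)
p, k, trials = 0.4, 5, 10_000  # a "dumb" model passing 40% of the time
rate = sum(any_pass(p, k, rng) for _ in range(trials)) / trials
# the estimate lands close to the analytic 1 - (1 - p)**k = 1 - 0.6**5 ≈ 0.92
```

Whether real retries against the same prompt are anywhere near independent is the weak point of the argument: a model that misunderstands the task tends to misunderstand it every time.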
It doesn't though. Fast but dumb models don't progressively get better with more iterations.
There are many ways to skin a cat.
Often all it takes is to reset to a checkpoint or undo and adjust the prompt a bit with additional context and even dumber models can get things right.
I've used grok code fast plenty this week alongside gpt 5 when I need to pull out the big guns and it's refreshing using a fast model for smaller changes or for tasks that are tedious but repetitive during things like refactoring.
Yes fast/dumb models are useful! But that's not what OP said - they said they can be as useful as the large models by iterating them.
Do you use them successfully in cases where you just had to re-run them 5 times to get a good answer, and was that a better experience than going straight to GPT 5?
That very much depends on the usecase
Different models for different things.
Not everyone is solving complicated things every time they hit cmd-k in Cursor or use autocomplete, and they can easily switch to a different model when working harder stuff out via longer form chat.
I'm more curious if its based on Grok 3 or what, I used to get reasonable answers from Grok 3. If that's the case, the trick that works for Grok and basically any model out there is to ask for things in order and piecemeal, not all at once. Some models will be decent at the 'all at once' approach, but when me and others have asked it in steps it gave us much better output. I'm not yet sure how I feel about Grok 4, have not really been impressed by it.
I agree. Coding faster than humans can review it is pointless. Between fast, good, and cheap, I'd prioritize good and cheap.
Fast is good for tool use and synthesizing the results.
Fast can buy you a little quality by getting more inference on the same task.
I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds. I will usually have eyeballed the code somewhere in the middle here but I'm not fully reviewing until this whole dance is done.
I mean, I obviously agree with you in that I've chosen the slowest models available at every turn here, but my point is I would be very excited if they also got faster because I am using a lot of extra inference to buy more quality before I'm touching the code myself.
This is a nice setup. I wonder how much it helps in practice? I suspect most of the problems opus has for me are more context related, and I'm not sure more models would help. Speculation on my part.
A a a a a a a a a a a a a a a.
At least this comment was written fast.
I've been testing Grok for a few days, and it feels like a major step backward. It randomly deleted some of my code - something I haven't had happen in a long time.
While the top coding models have become much more trustworthy lately, Grok isn't there yet. It doesn't matter if it's fast and/or free; if you can't trust a tool with your code, you can't use it.
Kilo Code has a free trial of Grok Code Fast 1 and I've had very poor results with it so far. Much less reliable than GPT 5 Mini, which was also faster, ironically.
Full Self Coding?
No, making edits to an existing codebase.
(If that's what you meant)
I think that was just a joke about "Full Self Driving" -- and how it still doesn't work.
Full Self Coding by next year at the latest
What do you mean by "it still doesn't work"? I never drive anymore because my Tesla does such a fine job of doing it for me.
To me, "full self driving" means you can hop in the back seat and have a nap. If you have to keep your hands near the wheel and maintain attention on the road then... *shrugs* not really the same. IMHO we're in the "uncanny valley" of vehicular automation.
> the "uncanny valley" of vehicular automation
I think this is a very good description of where autonomous vehicles are right now.
everything a layman would call "AI" is in the "uncanny valley" at the moment!
- Boston Dynamics' Atlas does not move as gracefully as a human
- LLM writing and code is oh-so-easy to spot
- the output of diffusion models is indistinguishable from a photo... until you look at it for longer than 5 seconds and decide to zoom in because "something's wrong"
- motion in AI-generated videos is very uncanny
I've had good long trips with my Model Y where I didn't need to intervene once. 4+ hour end-of-summer road trips.
Full Self Coding beta (supervised)
Is this the model that is the "Coding" version of Grok-4 promised when Grok-4 had awful coding benchmarks?
I guess if you cannot do well on benchmarks, you instead pick an easier one to pump up and run with that: speed. Looking online for benchmarks, the first thing that came up was a reddit post from an (obvious) spam account[1] gloating about how amazing it was on a bunch of subs.
[1] https://www.reddit.com/user/Suspicious_Store_137/
"On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness."
Let's see this harness, then, because third party reports rate it at 57.6%
https://www.vals.ai/models/grok_grok-code-fast-1
It does still compare well against the others: https://www.vals.ai/benchmarks/swebench-2025-08-27
I've actually seen really good outputs from the regular Grok 4. The issue seemed to be that it didn't explain anything and just made some changes, which, like I said, were pretty good. I never wanted a faster version, I just wanted a bit more feedback and explanation for suggested changes.
What I recently found much more valuable, and why I now prefer GPT-5 over Sonnet 4, is that if I start asking it for different architectural choices, it's really quite good at summarizing trade-offs and offering step-by-step navigation toward problem solving. I am liking this process a lot more than trying to "one shot" it, or getting tons of code completely rewritten that's unrelated to what I am really asking for. This seems to be a really bad problem with Opus 4.1 Thinking or even Sonnet Thinking. I don't think it's accurate to rate models on "one-shotting" a problem. Rate them on how easy they are to work with as an assistant.
I had that issue with gpt-5: when it wanted to do something in a way that was just plain wrong for this project, no matter what I said it just kept doing the same thing.
It was completely unsteerable. I get why people are often upset by the "you're right" of Claude models, but that's what I usually want from a model.
I guess there is a difference in expectations depending on the experience level of the developer, but I want to have the final say on what the right way is.
Sometimes it's obvious, but in this case, why are you downmodding my comment? I'm genuinely curious, what am I saying, that is so offensive or wrong?
I have the same experience, except while I agree that GPT-5 is better than Sonnet 4 for architecture and deep thinking, Sonnet 4 still seems to be better for just banging out code when you have a well-defined and a very detailed plan.
I thought it was incredible - I asked it a question about a refactoring and it called a ton of tools very quickly to read the code and it had what seemed like solid reasoning - it found two bugs! Of course, neither were bugs at all. But it looked cool!
My experience with 'sonic' during the stealth phase had it do stuff plenty fast, but the quality was slightly off target for some things. It did create tests and then iterate on those tests. The tests it wrote don't actually verify intended behavior. It only verified that mocks were called with the intended inputs while missing the larger picture of how it is used.
Sounds like it excels at tasks like generating boilerplate.
something GPT-4.1 and Gemini 1.5 Flash also did very well!
Definitely fast, but initial use puts quality either comparable to or below gpt-5-nano. This might be a low-cost option for people who don't mind babysitting the output (or working in very small projects), but claude/gpt-5/gemini all seem to have significantly higher quality at marginally more cost/time.
By just emphasizing the speed here, I wonder if their workflows revolve more around the vibe practice of generating N solutions to a problem in parallel and selecting the "best". If so, it might still win out on speed (if it can reliably produce at least one higher-quality output, which remains to be seen), but also quickly loses any cost margin benefits.
Benchmarked here: https://blog.brokk.ai/grok-code-fast-1-added-to-the-power-ra...
Ah, so this is what the Sonic model that Cursor had was. I've been doing this personal bench where I ask each model to create a 3D render of a guy using a laptop on a desk. I haven't written up a post to show the different output from each model, yet, but it's been a fun way to test the capabilities. Opus was probably the best -- Sonic put the guy in the middle of the desk, and the laptop floating over his head. Sonic was very fast, though!
I noticed it pop up on copilot so gave it about two attempts. Neither were fast, and both were incredibly average. Gpt4.1 and 5-mini do a better job, and 5-mini was faster...but I find speed of response varies hugely and seemingly randomly throughout the day.
Maybe it's just me, but I wish models like this would also provide a normal chat interface.
The leap from taking advice and copy-pasting (almost as a shameful fallback) to it just directly driving your tools is a tough pill. Having recently adjusted to "micro-dosing" LLMs when it comes to code (asking for no direct code output, smaller patches) so that I can learn better, I don't know how I would integrate that with this.
Or do the agentic tools allow for this in some reasonable way and I just don't know?
Just a few days ago I spent some time to sign up for Groq (not Grok, not Musk!) to implement fast code suggestions with qwen3-32b and gpt-oss-20b. Works handily with Jetbrains integrated AI features. I still use Claude Code as my "main" engineer, but I use these fast models for quick, fast edits.
I seriously question anyone supporting this enterprise with all the motives and agenda behind it.
I support the X enterprise, its motives, and its agenda. I'm a happy paying customer. Question away as seriously as you please. But don't bother looping me into that dialog. I'm not interested.
Do you similarly question the Chinese AI companies regarding their involvement with the CCP?
(Not a Trump supporter.)
And, yes, I do. I don't support the CCP. I do, however, support hardworking Chinese people fighting for freedom. So, please don't try to twist it like I am making some racist statement.
Fast is cool! Totally has its place. But I use Claude Code in a way right now where it's not a huge issue and quality matters more.
Opus 4.1 is by far the best right now for most tasks. It's the first model I think will almost always pump out "good code". I do always plan first as a separate step, and I always ask it for plans or alternatives first and always remind it to keep things simple and follow existing code patterns. Sometimes I just ask it to double check before I look at it and it makes good tweaks. This works pretty well for me.
For me, I found Sonnet 3.5 to be a clear step up in coding, I thought 3.7 was worse, 2.5 pro equivalent, and 4 sonnet equal or maybe a tiny bit better than 3.5. Opus 4.1 is the first one that feels like a solid step up over Sonnet 3.5. This of course required me to jump to the Claude Code max plan, but it's the first model to be worth that (wouldn't pay that much for just Sonnet).
Pretty sure this was the "stealth" model behind Roo Code Sonic (I saw the name Grok Sonic floating around).
It's a good model for implementing instructions but don't let it try to architect anything. It makes terrible decisions.
Interesting. Available in VSCode Copilot for free.
https://i.imgur.com/qgBq6Vo.png
I'm going to test it. My bottleneck currently is waiting for agent to scan/think/apply changes.
I have been testing it since yesterday in VS Code and it seemed fine so far. But I am also happy with all the GPT-4 variants, so YMMV.
Trying this out now via OpenCode. Seems to be pretty good so far, certainly quick! Free for the next week as well which is a bonus
fast but not smart. Fine for non-critical "I need this query" or "summarize this" but it's pretty much worthless for real coding work (compared to gpt-5 thinking or sonnet 4)
I'd argue that even GPT-5 and Sonnet 4 at their highest reasoning budgets are not enough for "real coding work", because you still have to think about how an LLM should do something instead of just what, and put it into a prompt. Some harnesses, such as JetBrains Junie or Gemini CLI, do a good job of letting me drift into declarative prompts, but that's still not enough.
totally agree - they all need a human in the loop at this point. I'm constantly stopping gpt-5/sonnet 4 and steering. Unfortunately with grok it completely misses the plot constantly
Also, what's interesting is that Grok Code is not a general-purpose model: it knows coding only.
This will probably be an unpopular, wet-blanket opinion...
But anytime I hear of Grok or xAI, the only thing I can think about is how it's hoovering up water from the Memphis municipal water supply and running natural gas turbines, all to power a chatbot.
Looks like they are bringing even more natural gas turbines online...great!
https://netswire.usatoday.com/story/money/business/developme...
Where does OpenAI and Anthropic get their water?
It's not the water that is the big problem here. It is the gas turbines and the location.
They started operating the turbines without permits and they were not equipped with the pollution controls normally required under federal rules. Worse, they are in an area that already led the state in people having to get emergency treatment for breathing problems. In their first 11 months they became one of the largest polluters in an area already noted for high pollution.
They have since got a permit, and said that pollution controls will be added, but some outside monitors have found evidence that they are running more turbines than the permit allows.
Oh, and of course 90% of the people bearing the brunt of all this local pollution are poor and Black.
Why can't it suck up water right from the Mississippi and do Once-Through cooling? Isn't it close? There's definitely more than enough water
qwen coder is 3k tps on cerebras
This is the model that was code named "Sonic" in Cursor last week. It received tons of praise. Then Cursor revealed it was a model from xAI. Then everyone hated it. :/ I miss the days where we just liked technology for advancement's sake.
*edit Case in point, downvotes in less than 30 seconds
I'm pretty sure everyone knew it was xAI last week. It's a great model. I'll never pay to use it, but I like it enough while it's free.
> I miss the days where we just liked technology for advancement's sake.
I think you haven't fully thought through such statements. They lead to bad places. If Bin Laden were selling research and inference to raise money for attacks, how many tokens would you buy?
"I'm tired, boss" is the only response. I just stick to OpenAI, or one provider, as they leapfrog each other every other Sunday anyways.
According to the model card it is extremely fast, can be hijacked 25% of the time, has access to search tools, and has a propensity for dishonesty.
I also think it is optimistic to think the jailbreak percentage will stay at "0.00" after public use, but time will tell.
https://data.x.ai/2025-08-26-grok-code-fast-1-model-card.pdf
it's free in Cursor till Sept 2. My experience is subpar so far
Its focus seems to be on faster responses, which Grok 3 definitely is good at. I have a different approach to LLMs and coding: I want to understand their proposed solutions, not just paste garbled-up code (unless it's scaffolded). If you treat every LLM as a piecemeal thing when designing code (or really when trying to figure out anything) and go step by step, you get better results from most models.
Ah yes, just what everyone has been asking for.
AI with lower code quality, and lots of it, faster.
Thanks, xAI.
Yay more garbage code - faster
A hint to all AI companies: nobody wants quickly generated broken code.
Yeah, I tried it in Copilot and it's fast, but I'd rather have a 2x smarter model that takes 10x longer. The competition for "fast" is the existing autocomplete model, not the chat models.
Why wouldn't you want the option for both?
I haven't used Copilot in a while but Cursor lets you easily switch the model depending on what you're trying to do.
Having options for thinking, normal, fast covers every sort of problem. GPT-5 doesn't let you choose which IMO is only helpful for non-IDE type integrations, although even in ChatGPT it can be annoying to get "thinking" constantly for simple questions.
I have the option for either, but it's an option I'll never choose. My issue with Copilot wasn't speed, it's quality. The only thing that has to be fast is the text-completion part, which Grok isn't replacing. The code chat/agent part needs to focus on actually being able to do things.
I'm doing 1000 calculations per second and they're all wrong.
AI coding tools are amazing and if you don't use them, that's fine. But lots of people, myself included, are finding tremendous utility in these models.
I'm landing 30-50% larger code changes per day now. Yesterday I plumbed six slightly mechanical, but still major, changes through our schema, several microservice layers, API client libraries, and client code. I wrote down the change sites ahead of time to track progress: 54, all requiring individual business logic. This would have been tedious without tab complete.
And that's not the only thing I did yesterday.
I wouldn't trust these tools with non-developers, but in our hands they're an exoskeleton. I like them like I like my vim movements.
A similar analogy can be made for the AI graphics design and editing models. They're extremely good time saving tools, but they still require a human that knows what they're doing to pilot them.
This is a non-sequitur comment.
I provided anecdotal evidence, but if you want more I can "show, don't tell" it.
Here's YC's pg that I edited after this week's nano banana release:
https://imgur.com/a/internet-DWzJ26B
I'm not an animator and I made that with a few simple tools.
It has a lot of errors and mistakes that I didn't take the time to correct since it was just a silly meme, but do you see how accessible all of this is?
When people with intention and taste use these tools, the results are powerful. I won't claim that the above videos demonstrate this, but I can certainly do good work with these tools.
I don't see how this is anything short of revolutionary.
> anecdotal evidence
These are words that do not belong together in any argument attempting to be convincing.
Here's your last comment stating your own opinions from your own experience:
https://news.ycombinator.com/item?id=45063583
Your commentary is also anecdotal. Why even bother commenting if that's what you believe?