This seems to be an attempt to compete with people running local models on Apple hardware, even though those local Mac Mini setups aren't really powerful.
I expect we'll get there in a few years, so perhaps this is Nvidia taking an early step in that direction.
In that case, this goes against Anthropic and OpenAI's business models. Which is a double whammy after Jensen Huang's recent comment about how agentic coding will only increase demand for software engineers, not reduce it.
So it also feels like a part of a budding shift in the competitive tension between the various parts of the AI supply chain.
I don't believe Anthropic and OpenAI are any more fearful of local AI than Google or Microsoft are of people hosting their own email.
Local AI capabilities are growing at a rapid pace, but so is hosted AI. While you can do a surprising amount of useful work with a model occupying a few to a few hundred gigs of VRAM, the hosted models are going to be way ahead for a long time.
Local AI was/is bound to happen, eventually. It'd be smart of Nvidia to get ahead of it.
Non-techy consumers may never do it, but at some point businesses are going to start asking when do they stop paying per token and start running models themselves. Right now the hardware is cost prohibitive, but I doubt that'll always be the case. Eventually the hardware will get cheaper and more available, and Nvidia seems to be betting on that.
They don't care where inference happens, so long as it happens on Nvidia hardware.
IMO it's only a matter of time before "self-hosting local AI" is as complicated as installing an app and clicking a download button.
And when that happens, the pitch to non-techy users is "Free ChatGPT you can use offline with zero privacy risk". Once hardware accessibility and LLM efficiency advance to the point that this becomes feasible, I suspect it'll result in a much bigger hit to the cloud AI market than many expect.
Why is it only a matter of time? The AI-as-a-service companies are going to continue to improve their products by improving both the part that could be reproduced in a self-hosted setup and the "secret sauce" they put on top of that to make it a better product. There is no incentive for this "secret sauce" to be something that can be reproduced for self-hosting, is there?
I think a major incentive could be to sell hardware. If Apple is able to get their hands on a local LLM capable of covering a significant % of what people use ChatGPT for, the pitch they can offer is:
"Free, private, offline ChatGPT so long as your laptop has X GB of RAM"
Beyond that, I wouldn't underestimate the incentive of "because I can". The "secret sauce" you refer to is effectively just a DB & a while loop that feeds text to a bunch of tensors. If an indie dev decides they want to release something that dismantles the OpenAI & Anthropic moats, there really isn't all that big of a technical barrier stopping them.
What secret sauce? We already have open source tooling for tool use, web browsing, and code execution/computer use. Open weight models will win in the end.
AIaaS might keep an edge with multi-modal agentic workflows, but for 80% of general use cases, no "secret sauce" needed, the open weight models are already there, and tooling is constantly getting better.
The bottleneck is the cost of local hardware right now.
I'm from the times when you had to purchase a separate chip to perform floating point math. It was called a math co-processor. [1]
After a few generations (and over a decade) that was indistinguishable from the CPU chip itself.
It's a long hyperbole, I know, but I think local inference is inevitable; and the big fishes know it.
Will that be a complex technical setup? An appliance? An additional chip in your motherboard? So transparent it's burned right into the CPU? Those are just implementation details. We're probably just one generational breakthrough away from it.
That said, Apple's vertical integration is a massive competitive advantage here, IMO. Nvidia's reliance on Microsoft & Windows for software support likely makes competing w/ Apple an uphill battle.
If/when Local AI gets good enough to compete with Cloud AI on most inference workloads, Apple starts to look like Nvidia's biggest competitor.
While this is admittedly a dream scenario, the biggest downside would be Apple effectively having a monopoly in "Agent-ready" consumer electronics. Hopefully local AI both becomes the norm, and there is sufficient competition among the consumer platforms.
Side-note: I would love to see an "RTX Spark" Framework 13 mainboard at some point.
There are still a *lot* of sharp edges with the Spark: compatibility, overstated performance, power consumption/heat generation, etc. It's one thing to have that situation on a box explicitly aimed at developers and quite another with an actual consumer-focused laptop.
I don't think there's any incentive for Nvidia to make this a Windows-only device, so most likely it will be fully supported on Linux, just like their GPUs are.
Depends. It is the typical Nvidia problem. Everything is a black box but when it all works it is the best option available. But when it breaks, you hate them with a passion.
Honestly this looks like Microsoft must have thrown a pile of money at them to not mention it, as it's just too obviously the main question.
No one seriously cares about this running Windows. We want Steam and CUDA/Ollama, and Windows just gets in the way. nVidia are simply not that oblivious, but I have to admit in their position I'd have considered the Microsoft involvement more trouble than it's worth, which is among the many reasons I'm not a billionaire.
Maybe they think the RAM market is so terrible it will kill the whole initiative regardless.
I've read all the stuff about how llama.cpp is much faster and better than ollama, and I believe it - but good god llama.cpp isn't user friendly.
You'd think in an era where "code is free" there would be an easier story around running local AI than compiling llama.cpp by hand and then spending hours researching flags - only for it to crash from an OOM error every ten prompts or so.
You're supposed to use a cheap ChatGPT subscription to run optimization loops over llama.cpp flags with a self-contained reproducible benchmark script and just let it burn for hours/days until it is fully optimized ))))
Valve did that little more than a decade ago, the original Steam Machines. It didn't take, and despite the success of the Deck and current techy trends, Linux does not have the % to make the ROI worthwhile if it isn't simple for developers. Proton is a wedge in the door that will help Linux get there.
A potential change in Valve's culture/management aside, "let Valve do the work" is a feature, not a bug. Studios spend all their budget targeting one platform (which still has ~90+% of the PC gaming market), and get Linux support for free.
Windows' monopoly on game dev isn't just market share either, since game dev isn't just code. You still need Photoshop, Maya, etc., and in smaller studios there's typically a crossover where some devs are doing art as well. Visual Studio's C++ debugger is still one of the best, and the tooling elsewhere hasn't caught up yet (compared to DX + PIX).
Then you also have to solve distribution and handle the fragmented display & audio stack. It's gotten a lot better, but it's still a factor.
I'm fine with most of the work going into Wine/Proton. A stable ABI for Linux is a boon, if it happens to be Win32 then so be it.
In truth if AMD or nVidia put their mind to having decent profiling tooling on Linux, and the AI wave suggests they will have no option, then this could readily become a thing.
Doesn't it come with Nvidia's blend of Ubuntu with a custom kernel? Do other distros work as well as "DGX OS" or are nvidia's kernel changes pretty important to have?
I've not noticed much in it that is NVIDIA specific.
But I would say that as an Ubuntu and Debian user for decades I have no incentive to use anything else on it and I'm just pleased to have a Linux on Aarch64 machine that is well supported for a change.
afaict, they have their own package repo mirrors and a few dedicated packages for nvidia stuff
tbh, I was rather unimpressed with the out-of-box experience for an "ai" computer, you couldn't even run a model locally with the common tools people use (no llama-cpp, ollama, vllm, etc). No huggingface CLI either, like come on!
I need to update that because I have a nice vllm setup on there now with 4 models running, but should be able to get anyone else going without having to muddle about as I did.
There are two new things being announced here: the GB10 chip being put into laptops, and GB10 running Windows. GB10 running Linux is not news, it's a product that's been shipping since last fall.
Kinda underwhelming. I was hoping to see that they improved their memory bandwidth to move toward competing with the M5 Max. But this is more akin to the Strix Halo.
I think this is the first time an ARM windows device gets marketed for gaming. Would be interesting to see what kind of performance hit games have on the x86 to ARM translation layer.
Rosetta on Mac was obviously impressive. There was also impressive Arm->Intel translation in the mobile ecosystem at one time.
One reason it works surprisingly well on modern systems is how much is offloaded to the GPU. You aren't going to get great power optimization or anything without it being truly native though.
There are games which are CPU limited though, and it will be interesting how those do. Curiously those also tend to be in engines with Arm support already.
There was a presentation from Valve about their Dex compatibility layer. They did something that seems so obvious in retrospect.
When you lay out the software stack it is essentially OS > Game code > APIs. Both the OS and APIs are native code, it is only that middle point that needs the real work.
This is why x86 to ARM doesn't have such a heavy performance cost. So games can be CPU heavy, but if they are heavy at the API end, that isn't a huge issue.
For Apple, the use of Rosetta 2 was only temporary as they moved their whole lineup to ARM. MS won't abandon x64 anytime soon. So I'm guessing they will try hard to convince developers to release for both architectures.
Some competition for Apple in this space and competition for Intel and AMD is great.
But I really do question how well Windows on Arm is really going to work out long term.
For Apple it worked because they were able to force the issue. If you wanted a new Mac it was going to be Arm and we all knew eventually (this year or is it next year?) Intel support would drop. Over time we have seen M series exclusive features.
Developers were forced to update or abandon Mac which gave users a great experience (with some early growing pains).
This is something that Windows will never be able to do. They will always be stuck maintaining an emulator and a likely large subset of apps only supporting one over the other. (Also, does this work the other way around, with an Arm-only app working on x86?)
This seems like a repeat of when it was not uncommon for games to only support Intel or AMD, or only NVIDIA or AMD. But worse, since they are not both x86. Sure, at least we have emulation, but just like with Rosetta 2 it shouldn't ever be the long-term solution.
For Apple it worked because they waited until they had a really, really good ARM ISA CPU (combined with arguably sandbagging their x86 offering for a few years prior but I digress).
Qualcomm is also working on a really good ARM ISA CPU with their acquisition of NuVia and subsequent Oryon architecture.
Meanwhile this is just using off-the-shelf ARM CPUs in a MediaTek SoC with blackwell bolted to the side of it. ARM's CPUs so far have been subpar for laptop-class chips. Hence why neither Apple nor Qualcomm are using them.
That's surely one thing, Apple went all-in on ARM, for Microsoft it's still a kinda "reduced experience".
But the bigger problem in my opinion: How much of the Windows userbase actually sticks to Windows because of its backwards-compatibility?
--> What would happen if they break this model and the OS is only judged based on its user experience and available applications...?
I'm not sure it would stand any chance to compete in the B2C space. If I think about it, there's not a single new feature in Windows of the last ~20 years I particularly care about.
Without backwards compatibility, there's barely any ecosystem. MacOS on the other hand is full of ecosystem features, improving collaboration, connectivity, handoff across devices, etc.
> MacOS on the other hand is full of ecosystem features, improving collaboration, connectivity, handoff across devices, etc.
True, but if you're only in the ecosystem as a mac user, in many ways it's felt like a mixed bag. I still wildly prefer mac over other operating systems, but if upgrades had a price, I think those sales would mostly go to iPhone users. Even at free, I'm yet to find a compelling reason to install Tahoe, and will probably just continue waiting until the next one.
I am wary of those ARM-based Windows machines because I am unsure how good the ongoing driver support for those SoCs will be. Will they even outlive the Windows version they currently ship with?
Looking at devices like the NVIDIA Shield gives me some hope that NVIDIA will be better than Qualcomm here. I just hope this is not a case where the OEM has to purchase X years of driver support from the chip vendor beforehand, and that NVIDIA will provide support directly itself.
I really hope these take off and succeed and they support Linux. Qualcomm is seriously holding back the Linux ARM adoption with their continuous missteps.
For anyone curious to know how this will fare against Macbooks, at least in CPU perf: DGX Spark has the exact same GPU and CPU as the top RTX Spark laptops will, so you can just directly compare from that.
Of course, DGX Spark is a miniPC, so laptops will likely be slower due to power limits/throttling.
We'll need to wait for the benchmarks, but this looks great! Windows 11 ARM64 is already amazing, and if these really are an upgrade from the Qualcomm chips we're going to have even better laptops on the market.
+ battery too. I've wondered if a mini pc with battery would make for a good form factor. I often move between places where I have a desk with a screen but still use a laptop because I want to just suspend and resume. If a mini pc had a small battery just to hold its RAM while suspended I could move between places and just plug in a single USB-C cable and have my full workstation up and running. The thermals could be better than in a laptop and having a built-in UPS better than with a desktop. But last time I checked no one packaged things like that.
There's the Khadas Mind series of mini pcs. They have a proprietary docking interface though. Agree that it would be great if this form-factor was more common.
They didn't say that MediaTek made the CPU cores. Grace is Nvidia's own ARM CPU cores. I bet that MediaTek made the other parts of the SoC necessary for a notebook.
Well, MediaTek actually said they made most of the SoC in fact. But the actual CPU cores themselves are all but certainly off-the-shelf Cortex parts, since MediaTek doesn't have a custom core design at all afaik.
NVIDIA hasn't done custom CPU cores for anything they've yet branded "Grace". The original Grace data center CPU (paired with the Hopper data center GPU) used ARM Neoverse V2 cores. The "GB10" chip shipped in DGX Spark and announced here for RTX Spark uses Cortex X925 and Cortex A725 CPU cores.
Physically, NVIDIA did the GPU chiplet and Mediatek did the other chiplet that has the CPU, DRAM controller, and IO.
GB300 is nominally "available" in desktop form factor workstations priced around $100k. That's a few orders of magnitude away from the ordinary desktop PC market that consumers participate in.
No Thunderbolt is a big no for me. It's one of the greatest features of the MacBook Pro, making it dockable and expandable as a desktop with a good Thunderbolt dock.
can these do training or only inference? currently working on learning machine learning and I'd love to have a physical machine I could aim to build real workloads on in a few years.
It's possible (likely, even) to have a chip fast enough for inference, but not fast enough or with enough memory to do meaningful training runs. Like the current DGX Spark.
I didn't see this in the article but elsewhere I've seen the memory bandwidth quoted as 600GB/s [1]. For comparison:
- 5090/6000 Pro: 1792GB/s
- 5080: 960GB/s
- 5070Ti: 892GB/s
- M3 Ultra: 819GB/s
- DGX Spark: 273GB/s (less than an M5 Pro at 307GB/s)
Memory bandwidth isn't everything but it will cap inference rate pretty heavily. Also, the M3 Ultra figure is for an almost 2-year-old Mac Studio. It's widely expected that it'll be refreshed in Q3 with a likely M5 or M4 Ultra with >1000GB/s. I really hope Apple realizes what a market opportunity it has here.
The above shows just how good value the 5090 really is. It's basically an RTX 6000 Pro with less RAM (and ~12% fewer CUDA units), which is a ~$10k card, for 20-30% of the price. This also demonstrates how Nvidia uses VRAM for market segmentation. As an aside, the true data center cards (e.g. B100, H100) use HBM memory at ~3.2TB/s.
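To make the "bandwidth caps inference rate" point concrete, here's a rough back-of-the-envelope sketch. It assumes decoding is purely memory-bandwidth bound, so each generated token streams all the weights once, and it ignores KV cache traffic, batching and compute limits. The model size (a dense 27B model at ~4 bits per weight) is my assumption for illustration; the bandwidth numbers are the ones quoted in the list above.

```python
# Rough ceiling on decode speed when generation is memory-bandwidth bound:
# each generated token has to read (roughly) all model weights once.

# Bandwidth figures in GB/s, as quoted in the list above.
BANDWIDTH_GBPS = {
    "5090 / 6000 Pro": 1792,
    "5080": 960,
    "5070 Ti": 892,
    "M3 Ultra": 819,
    "quoted 600GB/s figure": 600,
    "DGX Spark": 273,
}


def decode_ceiling_tok_s(bandwidth_gbps: float, model_gb: float) -> float:
    """Upper bound on tokens/s: bandwidth divided by GB read per token."""
    return bandwidth_gbps / model_gb


# Assumption (not from the thread): dense 27B parameters at 0.5 bytes each.
MODEL_GB = 27 * 0.5  # 13.5 GB of weights

ceilings = {
    name: decode_ceiling_tok_s(bw, MODEL_GB)
    for name, bw in BANDWIDTH_GBPS.items()
}

for name, tok_s in ceilings.items():
    print(f"{name:24s} <= {tok_s:6.1f} tok/s")
```

Real numbers will land below these ceilings, and MoE models only read their active experts per token, so treat this as an ordering of the hardware rather than a benchmark.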
> tl;dr - For software development, Qwen3.6 27B, 5090 gives you ~3x speed over M5 Max, letting you plow through code, while M5 Max gives you ~4x memory, letting you use higher quantization and bigger context. Which would you choose and why?
I've read a number of things from which the consensus seems to be that yes, you can run a larger model and/or have more context with a 128GB+ Mac, but the performance gap is still massive, and with current hardware we're still talking about inference rates where the difference matters. By this I mean there's a big difference between 10tok/s and 30. Once we get to a point where it's 100 vs 300, it won't be as big of a deal, a bit like FPS in games.
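A quick way to see the 10 vs 30 and 100 vs 300 point is to turn the rates into wall-clock time per response. This is only a sketch: the 600-token response length is a number I picked for illustration, and it assumes a steady decode rate with no prefill time.

```python
def seconds_to_generate(tokens: int, tok_per_s: float) -> float:
    """Wall-clock seconds to stream `tokens` at a steady decode rate."""
    return tokens / tok_per_s


RESPONSE_TOKENS = 600  # hypothetical response length, not from the thread


def seconds_saved(slow_tok_s: float, fast_tok_s: float,
                  tokens: int = RESPONSE_TOKENS) -> float:
    """Seconds saved per response by moving from the slow to the fast rate."""
    return (seconds_to_generate(tokens, slow_tok_s)
            - seconds_to_generate(tokens, fast_tok_s))


for slow, fast in [(10, 30), (100, 300)]:
    print(f"{slow} -> {fast} tok/s: "
          f"{seconds_to_generate(RESPONSE_TOKENS, slow):.0f}s vs "
          f"{seconds_to_generate(RESPONSE_TOKENS, fast):.0f}s, "
          f"saves {seconds_saved(slow, fast):.0f}s")
```

With these assumed numbers the same 3x ratio saves 40 seconds per response in the first case and 4 seconds in the second, which is why the gap stops mattering as much at higher rates.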
Oh and there are similar concerns with the DGX Spark [2].
Unfortunately in the current market 32GB of ddr5 seems to run about $400 as 2x16gb DIMMS, and even more for 1x32GB DIMM (higher density chips are more expensive). So $600 really isn't much over market price, especially considering strix halo uses 8000MHz ram instead of the typical 6000 found in consumer dimms.
But probably worth clarifying it's not a typical "MediaTek CPU" some might assume by that. It has Nvidia's customized ARM CPU implementation + their GPU.
Looks like the MSI one might be a 2-in-1; if it has good stylus support I might have a good candidate for an upgrade, though my ~3-4 year old Galaxy Book is holding up alright for now.
What is this product anyway? Is it a general purpose CPU or is it specifically designed for MS Windows? Is Nvidia stepping back from open source?
"Introducing the NVIDIA RTX Spark™ Superchip. The fusion of NVIDIA AI and RTX graphics in a single chip redefines Windows PCs and delivers amazing creating, AI development, and gaming - on the slimmest, most beautiful RTX laptops ever and small, ultra-efficient desktops."
It's Nvidia's attempt to gain additional market share, and an expected one. If the whole ecosystem is built around Nvidia and it's the easiest way of running stuff, then offering more enterprise infrastructure lets companies just buy directly from Nvidia.
Nvidia is also very, very rich and pushes the boundaries of stuff. They stopped waiting for industry standards. You can see this in their networking stuff. All Nvidia.
The next logical step (at least now, not something I'd thought about) was their own CPU for their GPU racks/clusters/systems.
Now they have everything anyway, RTX Spark is just logical.
I don't think it's specifically targeted at Apple at all.
Apple has like 10-15% market share and just because some IT nerds buy themselves a mac mini doesn't mean much.
Plenty of them actually just run openclaw without local models. Something which surprised me quite a lot.
But I have two 4090s at home. They consume a lot of power, and I had to research the proper mainboard model and mod one 4090 to use water cooling because they run too hot.
Their Spark setup was at 3k, way too expensive for normal people. If they can get this down and sell more, great for their ecosystem (strengthening it) and for getting more money from people.
It does surprise me, though, that they have enough capacity for this chip and aren't just putting everything into Rubin, but perhaps the build-out has slowed down a little, or they're already starting to diversify for economic safety.
All the news articles in my feed mentioned Nvidia reinventing personal computing, which is laughable given the specs are worse than the M series. I'm guessing they saw how well Apple devices were selling and rushed to get something similar out so they can ride the hype train and have something to fall back on if AI DC spend slows down.
There's a lot of companies trying to support datacenter systems like GH and Rubin that don't have dev hardware remotely resembling it. M-series isn't a good option, speaking from the personal experience of currently using one for this exact purpose.
I wouldn't say it's Nvidia stepping back from open source... if anything this is doubling down on it, as one of the selling points of this is the 128GB of unified memory which will allow for hosting local models (i.e, nvidia's new open model they just released). I guess it's pretty cool, I'm a big supporter of local LLMs/open weight models so seems enticing to me, although I'm not sure this will be super applicable to a lot of regular consumers. Seems like a pretty niche product.
Unified RAM means it's soldered to the mainboard, right?
I'm not sure if I like this. Sure for a laptop this might be not a big problem but if this ARM ecosystem is a success it will spread to desktop computers and I fear we could lose the existing modularity.
I have no idea how powerful or power efficient these guys are, but this seems to be the first step in a bigger push towards Windows on ARM (without losing gaming).
I think more announcements will follow soon from other companies.
It's worth noting that Nvidia power management on Linux has been abysmal. There also aren't any of the usual power management options to see how much power things are using, which is quite atypical for a modern system.
Nvidia really threw stuff over the wall with the DGX Spark release. They don't seem to really care. I sort of think they'll spend a little more time on Windows, where there's no pesky upstreaming to do and they can just do whatever, but man, it's such typical hubris from Nvidia to build such an expensive box with good chips but make it basically unsupportable and roasty hot all the time.
You also generally have to run an ever more stale two year old Ubuntu derived DGX OS to get anywhere, with bespoke kernel and drivers all. None of it is well supported, none of it just works like a comparable PC or even well behaved arm system would.
As for other ARM, there were rumors AMD Sound Wave is/was going to be a ~10W arm APU, but there hasn't been much said about it lately. Honestly given the ram crunch, it's maybe just not worth trying to build a system with a cheap core, if the rest of your costs are going to stay so stratospheric.
https://www.techpowerup.com/341848/amd-sound-wave-arm-powere...
I really like this, but I think the reason Apple Silicon took off was that Apple sort of forced devs to support ARM. Not sure if Microsoft can do the same for Windows...
Developers weren't really "forced" to support ARM. They simply recognized that all future Macs would be ARM, whereas most new PCs would continue to run on x86. So the incentive to adopt ARM was much weaker on the PC side.
ARM64+GPU sure seems like the future. I'm still using my M1 and even that can handle models well, has decent graphics, M5 is a beast, and M6 must surely go even bigger on LLM compute. Now Microsoft has a compelling ARM64+GPU future too.
After Nvidia's many years of neglecting Linux, now paired with Microsoft's direct involvement? Are we going to trust them to allow installing Linux on these easily?
It ships with DGX OS 7, which includes Ubuntu's 24.04 repos. It is not using mainline Ubuntu, and if you want to run Ubuntu 26.04, you'll have to do some work.
Strix halo's 8060S gpu is very weak, and is roughly equivalent to a 4060 laptop GPU, whereas GB10's gpu is equivalent to a desktop 5070. For LLM throughput, tok/s is similar due to bottleneck by memory bandwidth, but the GB10 has 3x faster prefill. People have also been able to squeeze out much better performance on GB10 using NVFP4 and other improvements in the months after the DGX Spark launch, so don't be misled by early lackluster benchmarks. For the RTX Spark, which also targets gaming and creative applications, the 3x faster GPU is quite nice.
I feel like the shape of the market right now for "home lab" inference is:
The sparks are good if your ultimate plan is to spend even more on NVidia hardware in future to run your dev setups at usable speeds. Or, you're developing for a work cluster.
If you mainly want to run local models at acceptable speeds portably, buy a Mac with lots of RAM. If you're happy with non-portable / racked, buy 3090s (dense) or Mac Studios (MoEs). Buy newer cards if you are restricted on power or slots. If you are rich, buy A6000 Blackwells.
The only question is: is it worth suffering HIP and x86? I suspect a lot of folks might like a machine that mimics their GB300 but costs less than a DGX.
Also, I heard the tensor core instructions on the DGX are gimped and you're better off with an RTX Pro x000. Is that the same with these machines?
Is CUDA really a lead for long? Aren't all the latest competitive approaches avoiding all the standard software stacks and writing deeply customized software that is very directly tied to whatever hardware they use?
And is it really a way to lock in people? With AI coding tools, isn't it trivial to write software on top of CUDA and rewrite it to target some other hardware?
It all sounds good on paper. But I have trouble believing Windows can be a good platform for this. Microsoft has lost all trust after inserting ads into Windows, slowly removing power user features, and exploiting every dark pattern they can. And for years, the ARM-based Windows laptops have been useless due to app compatibility issues. Why would this change now? Is it priced to be a lot cheaper than Apple's laptops? Or is this a niche product for AI developers basically?
Anecdotally Windows ARM works fine for me, although to be honest most of my work is command line + browser anyway. WSL works like a treat. Steam installs and most lower end games also play fine on my ARM laptop too. Games that require kernel anticheat don't work.
I think they make a great "second device" where you have something meatier to fall back to if something doesn't quite work right. I'm not sure if it's ready to take on the "main device" role just yet. But it's a far far better experience than the Surface RT days.
The "gaming" take is a strange one indeed for an ARM platform. Hopefully they (Microsoft or Nvidia?) put some real effort into the translation layer. They claim modern AAA games, but it is possible they strongarmed the developers to make them an ARM build for a few select titles...
It's clear gaming was not a major concern, it's just "good enough" for someone running AI models and occasionally wants to play some games, not made to primarily play games.
Yep. I noticed the press releases talk about all the partners they have. It seems like a desperate attempt to manufacture a consensus to invest in this new hardware instead of leaving it sort of abandoned like the other Windows ARM stuff. But the problem is that these attempts end up having a few very visible apps working on the architecture and others not actually doing anything substantial.
Sure the graphics capabilities are probably very good. But if you're a game developer who has traditionally built on Windows on x86 chips, would you want to invest in this new chip or invest in making games for the Apple ecosystem? Aren't there more new customers to reach in the Apple world than this new Nvidia world?
> But if you're a game developer who has traditionally built on Windows on x86 chips, would you want to invest in this new chip or invest in making games for the Apple ecosystem?
Windows and the new chip. Higher developer productivity and higher chances of a substantial audience.
Who cares about Windows, the goal is to run local AI models similar to AMD Strix Halo and Apple Silicon machines. The OS is honestly a distant last concern as long as the models work well, as you could put Linux on these too, but not sure how well wake lock works.
I would never trust Microsoft. Their next drama is revoking Office 2019 perpetual licenses https://www.youtube.com/watch?v=KRnno9VIZx0. It never ends with them because they know they have you by the balls.
This may finally be the chip family ARM on Windows has always needed. Qualcomm's chips have always been dogs with slow off-the-shelf ARM CPU cores that have pathetic single-threaded performance compared to x86 AMD/Intel or ARM Apple Silicon designs.
For reference, this is just a single benchmark, but as an idea of each vendor's top mobile CPU single-threaded performance:
Geekbench Single Thread Score:
- DGX Spark (same CPU as RTX Spark): 3125
- Snapdragon X1 Elite: 2950
- Snapdragon X2 Elite Extreme: 4050
- AMD Ryzen 9 9955HX: 3225
- Intel Core Ultra 9 290HX Plus: 3175
- Apple M5 Max: 4350
I'm happy to be wrong about Qualcomm's latest X2 chip performance, even if it is shipping in only a single product so far. Their previous best was the lowest in this list.
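If it helps to see the spread, here's a small sketch that normalizes the scores in the list above against the DGX Spark. The numbers are copied from that list as-is; I haven't verified them independently, and a single Geekbench number is a rough proxy at best.

```python
# Geekbench single-thread scores as listed above (not independently verified).
SCORES = {
    "DGX Spark (same CPU as RTX Spark)": 3125,
    "Snapdragon X1 Elite": 2950,
    "Snapdragon X2 Elite Extreme": 4050,
    "AMD Ryzen 9 9955HX": 3225,
    "Intel Core Ultra 9 290HX Plus": 3175,
    "Apple M5 Max": 4350,
}

BASELINE = "DGX Spark (same CPU as RTX Spark)"

# Ratio of each score to the DGX Spark score (1.0 means identical).
relative = {name: score / SCORES[BASELINE] for name, score in SCORES.items()}

# Print fastest first, as a percentage difference from the baseline.
for name, ratio in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:36s} {ratio - 1:+.1%}")
```

By these figures the x86 parts sit within a few percent of the DGX Spark, while the M5 Max and X2 Elite Extreme are the two clear outliers.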
Qualcomm Snapdragon x1 and upcoming x2 use their Oryon core and have much faster single-thread performance than Intel/Amd and this nvidia soc that uses off-the-shelf arm cores
That wasn't true of the X1, but apparently the X2 (which is only in a single device so far) does appear to finally be fast. The first Windows ARM CPU to be faster than any of its x86 rivals. Competitive with Apple Silicon single-thread performance even.
I was disappointed to see that the RTX Spark has the ARM cores from the DGX Spark. I was hoping it had their new in-house developed cores that Nvidia is starting to use on their latest gen server parts. They look really fast. That said, if RTX Spark has CPU performance like the DGX Spark, it will be almost as fast as the top AMD/Intel parts.
It won't, the top tier RTX Spark has the same exact CPU and GPU as DGX Spark, so you can check DGX Spark CPU benchmarks to see how it fares. Spoiler: it's about M3 Max level. And they're only coming this fall.
This seems to be an attempt to compete with people running local models on Apple hardwareâeven though those local Mac Mini setups aren't really powerful.
I expect we'll get there in a few years, so perhaps this is Nvidia taking an early step in that direction.
In that case, this goes against Anthropic and OpenAI's business models. Which is a double whammy after Jensen Huang's recent comment about how agentic coding will only increase demand for software engineers, not reduce it.
So it also feels like a part of a budding shift in the competitive tension between the various parts of the AI supply chain.
I don't believe Anthropic and OpenAI are any more fearful of local AI than Google or Microsoft are of people hosting their own email.
Local AI capabilities are growing at a rapid pace, but so is hosted AI. While you can do a surprising amount of useful work with a model occupying a few to a few hundred gigs of VRAM, the hosted models are going to be way ahead for a long time.
Local AI was/is bound to happen, eventually. It'd be smart of Nvidia to get ahead of it.
Non-techy consumers may never do it, but at some point businesses are going to start asking when do they stop paying per token and start running models themselves. Right now the hardware is cost prohibitive, but I doubt that'll always be the case. Eventually the hardware will get cheaper and more available, and Nvidia seems to be betting on that.
They don't care where inference happens, so long as it happens on Nvidia hardware.
IMO it's only a matter of time before "self-hosting local AI" is as complicated as installing an app and clicking a download button.
And when that happens, the pitch to non-techy users is "Free ChatGPT you can use offline with zero privacy risk". Once hardware accessibility and LLM efficiency advance to the point that this becomes feasible, I suspect it'll result in a much bigger hit to the cloud AI market than many expect.
Why is it only a matter of time? The AI-as-a-service companies are going to continue to improve their products by improving both the part that could be reproduced in a self-hosted setup, but also the "secret sauce" they put on top of that to make it a better product. There is no incentive for this "secret sauce" to be something that can be reproduced for self-hosting, is there?
I think a major incentive could be to sell hardware. If Apple is able to get their hands on a local LLM capable of covering a significant % of what people use ChatGPT for, the pitch they can offer is:
"Free, private, offline ChatGPT so long as your laptop has X GB of RAM"
Beyond that, I wouldn't underestimate the incentive of "because I can". The "secret sauce" you refer to is effectively just a DB & a while loop that feeds text to a bunch of tensors. If an indie dev decides they want to release something that dismantles the OpenAI & Anthropic moats, there really isn't all that big of a technical barrier stopping them.
What secret sauce? We already have open source tooling for tool use, web browsing, and code execution/computer use. Open weight models will win in the end.
AIaaS might keep an edge with multi-modal agentic workflows, but for 80% of general use cases, no "secret sauce" needed, the open weight models are already there, and tooling is constantly getting better.
The bottleneck is the cost of local hardware right now.
I'm from the times when you had to purchase a separate chip to perform floating point math. It was called a math co-processor. [1]
After a few generations (and over a decade) that was indistinguishable from the CPU chip itself.
It's a long hyperbole, I know, but I think local inference is inevitable; and the big fishes know it.
Will that be a complex technical setup? An appliance? An additional chip in your motherboard? So transparent it's burned right into the CPU? Those are just implementation details. We're probably just one generational breakthrough away from it.
[1] https://en.wikipedia.org/wiki/X87
One can only hope.
That said, Apple's vertical integration is a massive competitive advantage here, IMO. Nvidia's reliance on Microsoft & Windows for software support likely makes competing w/ Apple an uphill battle.
If/when Local AI gets good enough to compete with Cloud AI on most inference workloads, Apple starts to look like Nvidia's biggest competitor.
While this is admittedly a dream scenario, the biggest downside would be Apple effectively having a monopoly in "Agent-ready" consumer electronics. Hopefully local AI both becomes the norm, and there is sufficient competition among the consumer platforms.
Side-note: I would love to see an "RTX Spark" Framework 13 mainboard at some point.
There are still a *lot* of sharp edges with the Spark: compatibility, overstated performance, power consumption/heat generation, etc. It's one thing to have that situation on a box explicitly aimed at developers and quite another with an actual consumer-focused laptop.
Can it work with Linux? That's all I care about.
I don't think there's any incentive for Nvidia to make this a Windows-only device, so most likely it will be fully supported on Linux, just like their GPUs are.
> just like their GPUs are
So with proprietary blobs that give you more trouble than they're worth?
Those blobs are worth $5T; show some respect.
Depends. It is the typical Nvidia problem. Everything is a black box but when it all works it is the best option available. But when it breaks, you hate them with a passion.
I wouldn't trust it to have good upstream support. It's Nvidia. So not really interested.
Honestly this looks like Microsoft must have thrown a pile of money at them to not mention it, as it's just too obviously the main question.
No one seriously cares about this running Windows. We want Steam and CUDA/Ollama, and Windows just gets in the way. nVidia are simply not that oblivious, but I have to admit in their position I'd have considered the Microsoft involvement more trouble than it's worth, which is among the many reasons I'm not a billionaire.
Maybe they think the RAM market is so terrible it will kill the whole initiative regardless.
You misspelled llama.cpp
I've read all the stuff about how llama.cpp is much faster and better than ollama, and I believe it - but good god llama.cpp isn't user friendly.
You'd think in an era where "code is free" there would be an easier story around running local AI than compiling llama.cpp by hand and then spending hours researching flags - only for it to crash from an OOM error every ten prompts or so.
You're supposed to use a cheap ChatGPT subscription to run optimization loops over llama.cpp flags with a self-contained reproducible benchmark script and just let it burn for hours/days until it is fully optimized ))))
WSL is the answer as far as most folks are concerned.
Has Steam finally started to push for native Linux games instead of translating Windows ones?
Valve did that little more than a decade ago, the original Steam Machines. It didn't take, and despite the success of the Deck and current techy trends, Linux does not have the % to make the ROI worthwhile if it isn't simple for developers. Proton is a wedge in the door that will help Linux get there.
It is simple, Android NDK has all the same APIs for 3D rendering and audio, as do all major middleware engines.
The failure of business, only reinforces Windows as the platform most studios reach for.
Buy Windows, buy Visual Studio, pay game engines licenses, let Valve do the work.
This ignores that Valve's current management won't be around forever, so who knows what happens afterwards.
A potential change in Valve's culture/management aside, "let valve do the work" is a feature, not a bug. Studio spends all their budget targeting one platform (which still has ~90+% of the PC gaming market), and get Linux support for free.
Windows' monopoly on game dev isn't just market share either, since game dev isn't just code. You still need Photoshop, Maya, etc. and in smaller studios there's typically a crossover where some devs are doing art as well. Visual Studio's C++ debugger is still one of the best, and the tooling elsewhere hasn't caught up yet (compared to DX + PIX).
Then you also have to solve distribution and handling the fragmented display & audio stack. It's gotten a lot better, but it's still a factor.
I'm fine with most of the work going into Wine/Proton. A stable ABI for Linux is a boon, if it happens to be Win32 then so be it.
At this point Valve look more capable of running a platform business than Microsoft do.
Microsoft have spent the whole Nadella era in "oooo cloud" inspired wonder and actively screwed up everything else.
If it runs faster than the windows ones, who cares?
The game developers that use Windows, with Visual Studio, to develop such games.
This is, admittedly, the great anomaly.
In truth if AMD or nVidia put their mind to having decent profiling tooling on Linux, and the AI wave suggests they will have no option, then this could readily become a thing.
Sort of. It's the same chipset as in the DGX Spark & DGX Station, which run Ubuntu (NVIDIA's flavor).
DGX Spark comes with linux out of the box, it would be hard to imagine this device is not also compatible
Doesn't it come with Nvidia's blend of Ubuntu with a custom kernel? Do other distros work as well as "DGX OS" or are nvidia's kernel changes pretty important to have?
Hopefully better than support on their Jetson or orin boards, where compiling anything is hard because of the outdated stack.
I've not noticed much in it that is NVIDIA specific.
But I would say that as an Ubuntu and Debian user for decades I have no incentive to use anything else on it and I'm just pleased to have a Linux on Aarch64 machine that is well supported for a change.
For some value of "well supported" - NVIDIA's own internal catalogs (libraries, NIMs, etc) are still spotty on aarch64 coverage.
afaict, they have their own package repo mirrors and a few dedicated packages for nvidia stuff
tbh, I was rather unimpressed with the out-of-box experience for an "ai" computer, you couldn't even run a model locally with the common tools people use (no llama-cpp, ollama, vllm, etc). No huggingface CLI either, like come on!
I did put together my eventual setup in a repo https://github.com/verdverm/sparky
I need to update that because I have a nice vllm setup on there now with 4 models running, but should be able to get anyone else going without having to muddle about as I did.
This is strangely absent from the news.
There are two new things being announced here: the GB10 chip being put into laptops, and GB10 running Windows. GB10 running Linux is not news, it's a product that's been shipping since last fall.
It's a collaboration with Microsoft so going to say no, probably not.
Kinda underwhelming. I was hoping to see that they improved their memory bandwidth to move toward competing with the M5 Max. But this is more akin to the Strix Halo.
I think this is the first time an ARM windows device gets marketed for gaming. Would be interesting to see what kind of performance hit games have on the x86 to ARM translation layer.
Rosetta on Mac was obviously impressive. There was also impressive Arm->Intel translation in the mobile ecosystem at one time.
One reason it works surprisingly well on modern systems is how much is offloaded to the GPU. You aren't going to get great power optimization or anything without it being truly native though.
There are games which are CPU limited though, and it will be interesting how those do. Curiously those also tend to be in engines with Arm support already.
There was a presentation from Valve about their Dex compatibility layer. They did something that seems so obvious in retrospect.
When you lay out the software stack it is essentially OS > Game code > APIs. Both the OS and APIs are native code, it is only that middle point that needs the real work.
This is why x86 to ARM doesn't have such a heavy performance cost. So games can be CPU heavy, but if they are heavy at the API end, that isn't a huge issue.
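A back-of-the-envelope way to see this is an Amdahl-style estimate: only the share of CPU time spent in translated game code pays the emulation penalty, while time spent in native OS and API code does not. A minimal Python sketch, where the function name and both example numbers (30% of CPU time in game code, 1.5x translation slowdown) are made-up assumptions, not measurements:

```python
def overall_slowdown(translated_fraction: float, translation_slowdown: float) -> float:
    """Amdahl-style estimate of total CPU-time slowdown under binary translation.

    translated_fraction: share of CPU time spent in translated (x86) game code.
    translation_slowdown: how much slower that translated code runs (1.0 = no penalty).
    The remaining time is assumed to run in native OS/API code at full speed.
    """
    native_fraction = 1.0 - translated_fraction
    return native_fraction + translated_fraction * translation_slowdown


# Hypothetical game: 30% of CPU time in game code, translated code runs 1.5x slower.
api_heavy = overall_slowdown(0.30, 1.5)

# Hypothetical CPU-bound game: 90% of CPU time in game code, same penalty.
game_code_heavy = overall_slowdown(0.90, 1.5)

print(f"API-heavy game:       {api_heavy:.2f}x CPU time")
print(f"Game-code-heavy game: {game_code_heavy:.2f}x CPU time")
```

The more of the frame time that lives in native drivers and APIs, the closer the total gets to 1.0x, which is the argument being made above.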
Very cool.
Apple Silicon has a special mode that modifies how the ARM chip handles memory transactions to be like x86. Does this Nvidia ARM chip have the same?
What would be interesting to me would be how quickly developers start targeting ARM64 directly.
For Apple use of Rosetta 2 was only temporary as they moved whole lineup to ARM. MS would not abandon x64 anytime soon. So I'm guessing they will try hard to convince developers to release for both architectures.
Some competition for Apple in this space and competition for Intel and AMD is great.
But I really do question how well Windows on Arm is really going to work out long term.
For Apple it worked because they were able to force the issue. If you wanted a new Mac it was going to be Arm and we all knew eventually (this year or is it next year?) Intel support would drop. Over time we have seen M series exclusive features.
Developers were forced to update or abandon Mac which gave users a great experience (with some early growing pains).
This is something that Windows will never be able to do. They will always be stuck maintaining an emulator and a likely large subset of apps only supporting one over the other. (Also, does this work the other way around, with an Arm-only app working on x86?)
This seems like a repeat of when it was not uncommon for games to only support Intel or AMD or NVIDIA or AMD. But worse since they are not both x86. Sure at least we have emulation but just like with Rosetta2 it shouldn't ever be the long term solution.
For Apple it worked because they waited until they had a really, really good ARM ISA CPU (combined with arguably sandbagging their x86 offering for a few years prior but I digress).
Qualcomm is also working on a really good ARM ISA CPU with their acquisition of NuVia and subsequent Oryon architecture.
Meanwhile this is just using off-the-shelf ARM CPUs in a MediaTek SoC with blackwell bolted to the side of it. ARM's CPUs so far have been subpar for laptop-class chips. Hence why neither Apple nor Qualcomm are using them.
> arguably sandbagging their x86 offering
tbh, I always read this as Intel doing some sales magic here.
Apple: "Hey, we're making a product that has a 15w thermal envelope, do you have anything?"
Intel: "Yes!"
(Unspoken: their products will throttle down to fit, in fact, they will try to run always at 99ºC so you always get the best performance! FEATURE!)
Apple: "uhhhh..."
Consumers: "HEH IS IT EVEN A PRO DEVICE IF IT DOESN'T HAVE <INTEL MARKETING BRAND TERM>?"
Apple: "UHHHH... Guess we'll do it ourselves"
That's surely one thing, Apple went all-in on ARM, for Microsoft it's still a kinda "reduced experience".
But the bigger problem in my opinion: How much of the Windows userbase actually sticks to Windows because of its backwards-compatibility?
--> What would happen if they break this model and the OS is only judged based on its user experience and available applications...?
I'm not sure it would stand any chance to compete in the B2C space. If I think about it, there's not a single new feature in Windows of the last ~20 years I particularly care about.
Without backwards compatibility, there's barely any ecosystem. MacOS on the other hand is full of ecosystem features, improving collaboration, connectivity, handoff across devices, etc.
> MacOS on the other hand is full of ecosystem features, improving collaboration, connectivity, handoff across devices, etc.
True, but if you're only in the ecosystem as a mac user, in many ways it's felt like a mixed bag. I still wildly prefer mac over other operating systems, but if upgrades had a price, I think those sales would mostly go to iPhone users. Even at free, I'm yet to find a compelling reason to install Tahoe, and will probably just continue waiting until the next one.
I feel like making universal binaries a thing, and pushing for it to be standard is one viable path.
They already kind of are with ARM64EC, however Windows ecosystem isn't macOS, unless there is market pressure, most shops will keep doing x86/x64.
Microslop doesn't want people to be able to run their binaries elsewhere, it's the only reason people buy their product.
They also buy it, because to this day most people cannot buy GNU/Linux powered laptops on the stores they usually buy their computers from.
They only know Apple, Windows and Chromebooks.
Love seeing AMD forcing Novideo to catch up for once rather than the other way around.
Oh, btw, we are only making 10 of these, the rest of our capacity has been sold off to the large AI firms.
I am wary of those ARM-based Windows machines because I am unsure how good the ongoing driver support for those SoCs will be. Will they even outlive the Windows version they currently ship with?
Looking at devices like the NVIDIA Shield gives me some hope that NVIDIA will be better than Qualcomm here. I just hope this is not a case where the OEM has to purchase X years of driver support from the chip vendor beforehand, and that NVIDIA will provide support directly itself.
I would love a RTX Spark Shield. ;p
I really hope these take off and succeed and they support Linux. Qualcomm is seriously holding back the Linux ARM adoption with their continuous missteps.
For anyone curious to know how this will fare against Macbooks, at least in CPU perf: DGX Spark has the exact same GPU and CPU as the top RTX Spark laptops will, so you can just directly compare from that.
Of course, DGX Spark is a miniPC, so laptops will likely be slower due to power limits/throttling.
UEFI, display panels, wifi, storage controllers, etc would be what I'm worried about. I doubt Microsoft is going to make it easy.
DGX Spark is also $4700
It's been almost 30 years, and a single letter changed. When will we get the Sparkstation, the UltraSpark and the SuperSpark?
Will we get enterprise ready open firmware too instead of this "we missed DOS so we invented UEFI" for boot firmware?
SuperSpark and then UltraSpark. And then we can get SparkCube, Sparkii, and SparkiiU.
I'm personally waiting for the OpenSpark.
https://en.wikipedia.org/wiki/OpenSPARC
I'm waiting for the AllSpark
And the top tier QuantumSpark?
With competition from the MegaSpark and SparkGenesis.
Awesome, won't be buying it all at current prices but once they calm down, I will very much like to get one.
Around 2-3K USD something with a good GPU + CPU + 128GB of integrated RAM is just going to be an awesome experience.
Considering Mac options are north of 5K+ even on a regular day.
DGX Spark is $4700, so I kind of doubt that RTX Spark's top configs will be cheaper than that.
The DGX also contains the 200 GbE networking and linux support.
The ConnectX 7 2x200 Gbps networking card in the DGX Spark alone is worth $700
To be fair the connectx-7 in the spark can't even push 2x200 Gbps since it is connected via 4 pcie lanes.
Technically it's connected via 8 PCIe gen 5 lanes (two 4x connections), allowing ~100Gbps per port.
Thanks for the correction. I should have looked it up; I only remembered it being somewhat odd.
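For anyone wanting to sanity-check the lane math above: PCIe 5.0 signals at 32 GT/s per lane with 128b/130b encoding, so the raw link ceiling is easy to compute. A rough Python sketch (the function name is illustrative, and it ignores packet/protocol overhead, which is why real throughput lands below these figures):

```python
# PCIe 5.0 line rate and encoding efficiency.
GEN5_GT_PER_S = 32.0          # gigatransfers per second, per lane
ENCODING_128B_130B = 128 / 130


def pcie_raw_gbps(lanes: int,
                  gt_per_s: float = GEN5_GT_PER_S,
                  encoding: float = ENCODING_128B_130B) -> float:
    """Raw one-direction link bandwidth in Gbps, before protocol overhead."""
    return lanes * gt_per_s * encoding


per_x4_link = pcie_raw_gbps(4)   # one of the two x4 connections
both_links = pcie_raw_gbps(8)    # two x4 connections combined
nic_nominal = 2 * 200            # 2x200 Gbps ports on the ConnectX-7

print(f"x4 Gen5 link:  {per_x4_link:.1f} Gbps raw")
print(f"2 x (x4 Gen5): {both_links:.1f} Gbps raw")
print(f"NIC nominal:   {nic_nominal} Gbps")
```

Either way you count it, the PCIe side sits well under the 400 Gbps the two ports could nominally carry.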
Laptops will also have to contain a much tighter configuration, display, keyboard, camera, etc ;)
there is desktop variant as well
Isn't DGX AI-first and RTX prosumer-first? I think it will be cheaper longer term, just not at the moment with component inflation.
We'll need to wait for the benchmarks, but this looks great! Windows 11 ARM64 is already amazing, and if these really are an upgrade from the Qualcomm chips we're going to have even better laptops on the market.
Is this just dgx spark, but a laptop?
yes, same chip
+ Windows
+ Screen
- ConnectX-7 Smart NIC
> - ConnectX-7 Smart NIC
Can the link type be toggled between Ethernet and Infiniband? (Don't think I've ever heard of a laptop with IB.)
+ battery too. I've wondered if a mini pc with battery would make for a good form factor. I often move between places where I have a desk with a screen but still use a laptop because I want to just suspend and resume. If a mini pc had a small battery just to hold its RAM while suspended I could move between places and just plug in a single USB-C cable and have my full workstation up and running. The thermals could be better than in a laptop and having a built-in UPS better than with a desktop. But last time I checked no one packaged things like that.
There's the Khadas Mind series of mini pcs. They have a proprietary docking interface though. Agree that it would be great if this form-factor was more common.
What about the desktop version? It seemed like it is not a DGX, since it has CPU cores done by MediaTek.
The DGX Spark/GB10 has CPU cores from Mediatek (in a pretty odd cluster configuration, too).
They didn't say that Mediatek made the CPU cores. Grace is NVidia's own ARM CPU cores. I bet that Mediatek made other parts of the SoC necessary for a notebook.
MediaTek said MediaTek made the CPU: https://www.mediatek.com/press-room/mediatek-collaborates-wi...
Well, MediaTek actually said they made most of the SoC in fact. But the actual CPU cores themselves are all but certainly off-the-shelf Cortex parts, since MediaTek doesn't have a custom core design at all afaik.
NVIDIA hasn't done custom CPU cores for anything they've yet branded "Grace". The original Grace data center CPU (paired with the Hopper data center GPU) used ARM Neoverse V2 cores. The "GB10" chip shipped in DGX Spark and announced here for RTX Spark uses Cortex X925 and Cortex A725 CPU cores.
Physically, NVIDIA did the GPU chiplet and Mediatek did the other chiplet that has the CPU, DRAM controller, and IO.
desktop is GB300, not GB10 like Spark
GB300 is nominally "available" in desktop form factor workstations priced around $100k. That's a few orders of magnitude away from the ordinary desktop PC market that consumers participate in.
they also announced a GB10/N1X windows desktop mini PC.
No Thunderbolt is a big no for me. It's one of the greatest features of the MacBook Pro, making it dockable and expandable as a desktop with a good Thunderbolt dock.
That's also possible with USB-C.
With some caveats, you wouldn't be able to connect two 4k monitors to a dock without TB5.
USB 4 v2 has the same display capabilities as TB5. In fact, TB5 gets its display capabilities from USB 4 v2
can these do training or only inference? currently working on learning machine learning and I'd love to have a physical machine I could aim to build real workloads on in a few years.
They're Turing complete. What else do you need?
There is a reason why Google has tpu8i and tpu8t
technically in order for something to be turing complete it needs infinite memory
It's possible (likely, even) to have a chip fast enough for inference, but not fast enough or with enough memory to do meaningful training runs. Like the current DGX Spark.
not for llm full training, but can do some finetuning for sure.
I believe training is way more processor intensive than inference.
I didn't see this in the article but elsewhere I've seen the memory bandwidth quoted as 600GB/s [1]. For comparison:
- 5090/6000 Pro: 1792GB/s
- 5080: 960GB/s
- 5070Ti: 892GB/s
- M3 Ultra: 819GB/s
- DGX Spark: 273GB/s (less than an M5 Pro at 307GB/s)
Memory bandwidth isn't everything but it will cap inference rate pretty heavily. Also, the M3 Ultra is for an almost 2 year old Mac Studio. It's widely expected that it'll be refreshed in Q3 with a likely M5 or M4 Ultra with >1000GB/s. I really hope Apple realizes what a market opportunity Apple has here.
The above shows just how good value the 5090 really is. It's basically an RTX 6000 Pro with less RAM (and ~12% fewer CUDA units), which is a ~$10k card, for 20-30% of the price. This also demonstrates how NVidia uses VRAM for market segmentation. As an aside, the true data center cards (e.g. B100, H100) use HBM memory at ~3.2TB/s.
[1]: https://wccftech.com/nvidia-enters-pc-space-with-rtx-spark/
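To make the "bandwidth caps inference rate" point concrete: during decode, each generated token has to stream roughly all of the model's active weights from memory, so bandwidth divided by weight size gives a ceiling on tokens per second. A minimal Python sketch using the bandwidth figures quoted above; the 16 GB model size is a hypothetical example, and the estimate ignores KV-cache reads, compute limits, and batching:

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when bandwidth-bound.

    Assumes every generated token reads all model weights from memory once.
    """
    return bandwidth_gb_s / model_size_gb


# Memory bandwidth figures as quoted in the comment above (GB/s).
BANDWIDTH_GB_S = {
    "5090 / 6000 Pro": 1792,
    "5080": 960,
    "5070 Ti": 892,
    "M3 Ultra": 819,
    "DGX Spark": 273,
}

# Assumption: a dense model whose quantized weights occupy 16 GB.
MODEL_SIZE_GB = 16.0

for name, bandwidth in BANDWIDTH_GB_S.items():
    ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name:>16}: at most ~{ceiling:.0f} tok/s")
```

Real numbers come in below these ceilings, but the ratios between devices tend to track the bandwidth ratios fairly closely for single-user decode.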
Spark memory bandwidth is ~300 GB/s. Internal bandwidth is 600 GB/s but that doesn't matter.
128 GB at 600 GB/s for this versus 32 GB at 1800 GB/s for 5090.
This is much better value than 5090, you can run much bigger models.
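The capacity side of this trade-off can be sketched the same way: weight footprint is roughly parameter count times bits per weight. A small Python example, where the 70B-parameter model and 4-bit quantization are hypothetical choices and KV cache plus runtime overhead are left as an optional parameter rather than modeled:

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, overhead_gb: float = 0.0) -> bool:
    """True if the weights plus a caller-supplied overhead fit in memory."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= memory_gb


# Hypothetical 70B-parameter model quantized to 4 bits per weight.
size = weights_gb(70, 4)
print(f"70B @ 4-bit: ~{size:.0f} GB of weights")
print("fits in 32 GB (5090):   ", fits(70, 4, 32))
print("fits in 128 GB (Spark): ", fits(70, 4, 128))
```

So the 128 GB part can hold models the 32 GB card cannot load at all, while the bandwidth gap determines how fast whatever does fit will actually run.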
Here's a pretty detailed breakdown of this [1]:
> tl;dr - For software development, Qwen3.6 27B, 5090 gives you ~3x speed over M5 Max, letting you plow through code, while M5 Max gives you ~4x memory, letting you use higher quantization and bigger context. Which would you choose and why?
I've read a number of things from which the consensus seems to be that yes, you can run a larger model and/or have more context with a 128GB+ Mac, but the performance gap is still massive, and with current hardware we're still talking about inference rates that matter. By this I mean there's a big difference between 10 tok/s vs 30. Once we get to a point where it's 100 vs 300, it won't be as big of a deal, a bit like FPS in games.
Oh and there are similar concerns with the DGX Spark [2].
[1]: https://www.reddit.com/r/LocalLLaMA/comments/1t5v2gr/need_ad...
[2]: https://www.reddit.com/r/LocalLLaMA/comments/1sqk333/dgx_spa...
Yeah and also the quoted 1 PF is only for sparse models (only half that for dense, if that), and the DGX had serious hardware issues: https://x.com/ID_AA_Carmack/status/1982831774850748825
Will NVIDIA get a monopoly on providing laptops and desktops with a lot of RAM going forward?
No. You can get a PowerBook today with 128 GB ram.
https://www.bhphotovideo.com/c/product/1957120-REG/apple_mbp...
I'm sorry to announce this to you, but the last PowerBook was released 21 years ago
Or get an AMD 395 laptop or mini PC for half the price of an equivalent mac device
https://www.bosgamepc.com/products/bosgame-m5-ai-mini-deskto...
Bosgame M5 AI Mini Desktop Ryzen AI Max+ 395 96GB variant €1.800,95 (sold out)
128GB+2TB variant €2.401,95 (in stock)
I have the latter, it's fantastic
$600 for 32GB ram seems bananas
Unfortunately in the current market 32GB of ddr5 seems to run about $400 as 2x16gb DIMMS, and even more for 1x32GB DIMM (higher density chips are more expensive). So $600 really isn't much over market price, especially considering strix halo uses 8000MHz ram instead of the typical 6000 found in consumer dimms.
https://onexplayerstore.com/products/onexplayer-super-x?vari...
$3649 with 128GB of ram
It was wintel (windows + intel) before. This will be what? Windia? Wintek?
Winvidia
Nvideous
Nvidiows
Nvindows
Yeah, there is zero chance I'm ever running Windows ROFL.
However, I'd jump from Mac in a Heartbeat if this supported Linux.
They made their own x86 CPU? Or was that part outsourced? Ok ARM MediaTek.
ARM cpu made by MediaTek.
But probably worth clarifying it's not a typical "MediaTek CPU" some might assume by that. It has Nvidia's customized ARM CPU implementation + their GPU.
This has off-the-shelf Arm cores.
I think that Nvidia made GPU and CPU, and Mediatek made other parts of SoC necessary for a notebook. Grace is Nvidia's own CPU ARM core
I believe Grace is an ARM designed core. Vera is the nVidia designed core.
Looks like the MSI one might be a 2-in-1, if it has good stylus support I might have a good candidate for an upgrade, thought my ~3-4 year old Galaxy Book is holding up alright for now.
hope nvidia supports drivers better than qualcomm. also hope they support linux soon.
Question is: "Can it run Doom?"
"Unified Memory" still means divided address space right? You have to pre-allocate system vs gpu and copy from one to the other?
Is this finally Macbook Chip Efficiency coming to Windows or will it just be shittier compatibility for slightly better battery life?
I heard leaked geekbench putting it behind the m3, which is couple years old now.
All I care about is if I can get one of these for significantly less than a dgx and get Linux on it for some cuda Blackwell kerneling.
Yeaaaah . But at what Cost though.
What is this product anyway? Is it a general purpose CPU or is it specifically designed for MS Windows? Nvidia stepping back from the open source?
"Introducing the NVIDIA RTX Spark™ Superchip. The fusion of NVIDIA AI and RTX graphics in a single chip redefines Windows PCs and delivers amazing creating, AI development, and gaming - on the slimmest, most beautiful RTX laptops ever and small, ultra-efficient desktops."
It's Nvidia attempting to compete with Apple's M-series
It's Nvidia's attempt to gain additional market share, and an expected one as well. If the whole ecosystem is built around Nvidia and it's the easiest way of running stuff, Nvidia offering more enterprise infrastructure allows companies to just buy directly from Nvidia.
Nvidia is also very, very rich and pushes the boundaries of stuff. They stopped waiting for industry standards. You can see this in their network stuff. All Nvidia.
The next logical step (at least now, not something I thought about before) was their own CPU for their GPU racks/clusters/systems.
Now they have everything anyway, RTX Spark is just logical.
I don't think it's specifically targeted at Apple at all.
Apple has like 10-15% market share and just because some IT nerds buy themselves a mac mini doesn't mean much.
Plenty of them actually just run openclaw without local models. Something which surprised me quite a lot.
But I have two 4090s at home. They consume a lot of power, and I had to research the proper mainboard model and mod one 4090 to use water cooling because they run too hot.
Their Spark setup was at 3k, way too expensive for normal people. If they can get this down and sell more, great for their ecosystem (strengthening it) and getting more money from people.
It does surprise me though that they have enough capacity for this chip and aren't just putting everything into Rubin, but perhaps the build-out has slowed down a little, or they are starting to diversify already for economic safety.
Their target competition is the AMD Strix Halo which is eating the Sparks lunch right now.
Also sounds like they are ditching the discrete GPU altogether.
All the news articles in my feed mentioned Nvidia reinventing personal computing, which is laughable given the specs are worse than the M series. I'm guessing they saw how well Apple devices were selling and rushed to get something similar out so they can ride the hype train and have something to fall back on if AI DC spend slows down.
There's a lot of companies trying to support datacenter systems like GH and Rubin that don't have dev hardware remotely resembling it. M-series isn't a good option, speaking from the personal experience of currently using one for this exact purpose.
I wouldn't say it's Nvidia stepping back from open source... if anything this is doubling down on it, as one of the selling points of this is the 128GB of unified memory which will allow for hosting local models (i.e, nvidia's new open model they just released). I guess it's pretty cool, I'm a big supporter of local LLMs/open weight models so seems enticing to me, although I'm not sure this will be super applicable to a lot of regular consumers. Seems like a pretty niche product.
Linux works but MS is just paying them not to mention it.
Unified RAM means its soldered to the mainboard, right?
I'm not sure if I like this. Sure for a laptop this might be not a big problem but if this ARM ecosystem is a success it will spread to desktop computers and I fear we could lose the existing modularity.
"Unified" means that it's shared between CPU and GPU, I believe.
But yes, it tends to be soldered on.
No, but LPDDR means soldered, there are no LPDDR dimms
There's LPCAMM2, but it's very recent. The Framework Pro laptop supports it, for example, although only on the Intel variant.
I think unified RAM means soldered to the SoC, which is in turn soldered to the mainboard
I have no idea how powerful or power efficient these guys are, but this seems to be the first step in a bigger push towards Windows on ARM (without losing gaming).
I think more announcements will follow soon from other companies.
My DGX Sparks are the first and only devices I have with 200W USB-C PD. Low power by AI workstation standards, but intolerable in a laptop.
Intolerable? Why?
Battery life
The comment I'm replying to appears to be talking about power DELIVERY, not consumption. Why would extra power-delivery capacity be intolerable?
The DGX Spark doesn't have a battery. If it comes with 200W delivery (actually 240W), it's because it plans on consuming close to that amount.
Although I'm kinda surprised the DGX Spark used USB-C at all for power instead of just like a DC jack or whatever. But whatever.
It's worth noting that Nvidia power management on Linux has been abysmal. There also aren't any of the usual power management options to see how much power things are using, which is quite atypical for a modern system.
Nvidia really threw stuff over the wall with the DGX Spark release. They don't seem to really care. I sort of think they'll spend a little more time on Windows, where there's no pesky upstreaming to do and they can just do whatever, but man, it's such typical hubris from Nvidia to build such an expensive box with good chips but make it basically unsupportable and roasty hot all the time.
You also generally have to run an ever more stale two year old Ubuntu derived DGX OS to get anywhere, with bespoke kernel and drivers all. None of it is well supported, none of it just works like a comparable PC or even well behaved arm system would.
As for other ARM, there were rumors AMD Sound Wave is/was going to be a ~10W arm APU, but there hasn't been much said about it lately. Honestly given the ram crunch, it's maybe just not worth trying to build a system with a cheap core, if the rest of your costs are going to stay so stratospheric. https://www.techpowerup.com/341848/amd-sound-wave-arm-powere...
I really like this, but I think the reason Apple Silicon took off was that Apple sort of forced devs to support ARM. Not sure if Microsoft can do the same for Windows…
Developers weren't really "forced" to support ARM. They simply recognized that all future Macs would be ARM, whereas most new PCs would continue to run on x86. So the incentive to adopt ARM was much weaker on the PC side.
They didn't though. Rosetta 2.
rosetta is a relatively short term solution. will be supported up to macOS 28
Microsoft can do the same for Windows, though they'd need their own equivalent of the fat bundle solution that Apple came up with.
ARM64+GPU sure seems like the future. I'm still using my M1 and even that can handle models well, has decent graphics, M5 is a beast, and M6 must surely go even bigger on LLM compute. Now Microsoft has a compelling ARM64+GPU future too.
What does AMD or Intel have here?
Don't know about intel, but AMD has Strix Halo with unified memory and really impressive performance.
I think the future will be 50/50 x64 vs arm64 for PCs.
Imagine a Beowulf cluster of these. /slashdot
After Nvidia's many years of neglecting Linux, paired with Microsoft's direct involvement? Are we going to trust them to allow installing Linux on these easily?
I don't think so.
This will most likely be a winmodem situation, again
DGX Spark has the same soc and ships with Ubuntu
Okay, but I'm still highly skeptical about trusting MS and NVIDIA.
It ships with DGX OS 7, which includes Ubuntu's 24.04 repos. It is not using mainline Ubuntu, and if you want to run Ubuntu 26.04, you'll have to do some work.
Related:
A powerful new chapter for Windows PCs, accelerated by Nvidia RTX Spark
https://news.ycombinator.com/item?id=48352693
Surface Laptop Ultra: Made for World Makers
https://news.ycombinator.com/item?id=48352627
competitor is already on the market and is x86: AMD AI 395+
benchmarks with DGX aren't spectacular for NVIDIA's software and CUDA lead.
wouldn't count on this being a price/compute challenger. especially with overpriced VRAM.
Strix halo's 8060S gpu is very weak, and is roughly equivalent to a 4060 laptop GPU, whereas GB10's gpu is equivalent to a desktop 5070. For LLM throughput, tok/s is similar due to bottleneck by memory bandwidth, but the GB10 has 3x faster prefill. People have also been able to squeeze out much better performance on GB10 using NVFP4 and other improvements in the months after the DGX Spark launch, so don't be misled by early lackluster benchmarks. For the RTX Spark, which also targets gaming and creative applications, the 3x faster GPU is quite nice.
Or like a m4 max? This thing has <300GB/s vs the max with 550GB/s
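A back-of-the-envelope way to see why bandwidth dominates decode speed: for a dense model, each generated token has to read roughly all of the weights once, so tok/s is capped near memory bandwidth divided by weight size. The sketch below is only an illustration. The 40 GB weight size is a made-up example, the bandwidth figures are the rough ones quoted above (300 GB/s is treated as an upper bound for the GB10), and the function name is hypothetical.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on decode speed when every token streams all weights once.

    Ignores KV-cache reads, compute time, and MoE sparsity, so real numbers
    will be lower (or, for MoE models, governed by the active weights only).
    """
    return bandwidth_gb_s / model_size_gb


# Hypothetical dense model whose quantized weights occupy 40 GB.
MODEL_SIZE_GB = 40.0

# Bandwidth figures as quoted in the thread, not measured values.
machines = [("GB10 (<300 GB/s)", 300.0), ("M4 Max (550 GB/s)", 550.0)]

for name, bandwidth in machines:
    ceiling = decode_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most {ceiling:.2f} tok/s")
```

Prefill is a different story: it processes many tokens per pass over the weights, so it tends to be limited by compute rather than bandwidth, which fits the faster-prefill claim above.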
All those CUDA cores in the sparks but they're starved for memory bandwidth.
I am still waiting for NVidia to release a system that legit beats 3090 maxxing for the home gamer...
I feel like the shape of the market right now for "home lab" inference is:
The sparks are good if your ultimate plan is to spend even more on NVidia hardware in future to run your dev setups at usable speeds. Or, you're developing for a work cluster.
If you mainly want to run local models at acceptable speeds portably, buy a mac with lots of RAM. If you're happy with non-portable / racked, buy 3090s (dense) or mac studios (MoEs). Buy newer cards if you are restricted on power or slots. If you are rich, buy a6000 blackwells.
The only question is: is it worth suffering HIP and x86? I suspect a lot of folks might like a machine that mimics their GB300 but costs less than a DGX.
Also I heard the tensor core instructions on the DGX are gimped and you're better off with an RTX Pro x000. Is that the same with these machines?
Is CUDA really a lead for long? Aren't all the latest competitive approaches avoiding all the standard software stacks and writing deeply customized software that is very directly tied to whatever hardware they use?
And is it really a way to lock in people? With AI coding tools, isn't it trivial to write software on top of CUDA and rewrite it to target some other hardware?
yes.
no.
Some other relevant discussions and sources …
NVIDIA and Microsoft Reinvent Windows PCs for the Age of Personal AI
https://news.ycombinator.com/item?id=48352705
NVIDIA DGX Station for Windows Puts a Trillion-Parameter AI Supercomputer on Every Enterprise Desk
https://news.ycombinator.com/item?id=48352691
Introducing Surface Laptop Ultra: Made for world makers
https://news.ycombinator.com/item?id=48352627
Introducing a powerful new chapter for Windows PCs, accelerated by NVIDIA RTX Spark
https://news.ycombinator.com/item?id=48352693
2 comments in total there
It all sounds good on paper. But I have trouble believing Windows can be a good platform for this. Microsoft has lost all trust after inserting ads into Windows, slowly removing power user features, and exploiting every dark pattern they can. And for years, the ARM based Windows laptops have been useless due to app compatibility issues. Why would this change now? Is it priced to be a lot cheaper than Apple's laptops? Or is this a niche product for AI developers basically?
Anecdotally Windows ARM works fine for me, although to be honest most of my work is command line + browser anyway. WSL works like a treat. Steam installs and most lower end games also play fine on my ARM laptop too. Games that require kernel anticheat don't work.
I think they make a great "second device" where you have something meatier to fall back to if something doesn't quite work right. I'm not sure if it's ready to take on the "main device" role just yet. But it's a far far better experience than the Surface RT days.
The "gaming" take is a strange one indeed for an ARM platform. Hopefully they (Microsoft or Nvidia?) put some real effort into the translation layer. They claim modern AAA games, but it is possible they strongarmed the developers to make them an ARM build for a few select titles...
It's clear gaming was not a major concern, it's just "good enough" for someone who runs AI models and occasionally wants to play some games, not made to primarily play games.
Yep. I noticed the press releases talk about all the partners they have. It seems like a desperate attempt to manufacture a consensus to invest in this new hardware instead of leaving it sort of abandoned like the other Windows ARM stuff. But the problem is that these attempts end up having a few very visible apps working on the architecture and others not actually doing anything substantial.
Sure the graphics capabilities are probably very good. But if you're a game developer who has traditionally built on Windows on x86 chips, would you want to invest in this new chip or invest in making games for the Apple ecosystem? Aren't there more new customers to reach in the Apple world than this new Nvidia world?
> But if you're a game developer who has traditionally built on Windows on x86 chips, would you want to invest in this new chip or invest in making games for the Apple ecosystem?
Windows and the new chip. Higher developer productivity and higher chances of a substantial audience.
Who cares about Windows, the goal is to run local AI models similar to AMD Strix Halo and Apple Silicon machines. The OS is honestly a distant last concern as long as the models work well, as you could put Linux on these too, but not sure how well wake lock works.
A lot of the app compatibility issues on current machines are down to Qualcomm's poor drivers - the actual core bits are mostly okay.
Hopefully MSFT would look at this as a do or die system, and go all in on improving the user and ownership experience. Will they? Not so sure.
Microsoft sees windows purely as a platform to sell AI products these days.
That's what they're working on, in theory, with Windows K2.
I would never trust Microsoft. Their next drama is revoking Office 2019 perpetual licenses https://www.youtube.com/watch?v=KRnno9VIZx0. It never ends with them because they know they have you by the balls.
I trust them on a daily basis. No issues thus far..
So basically Cerebras style?
Not at all. This is more like what Apple has been doing the past few years. A bunch of decent ARM cores paired with a beefy integrated GPU.
No.
This may finally be the chip family ARM on Windows has always needed. Qualcomm's chips have always been dogs with slow off-the-shelf ARM CPU cores that have pathetic single-threaded performance compared to x86 AMD/Intel or ARM Apple Silicon designs.
For reference, this is just a single benchmark, but as an idea of each vendor's top mobile CPU single-threaded performance:
Geekbench Single Thread Score:
- DGX Spark (same CPU as RTX Spark): 3125
- Snapdragon X1 Elite: 2950
- Snapdragon X2 Elite Extreme: 4050
- AMD Ryzen 9 9955HX: 3225
- Intel Core Ultra 9 290HX Plus: 3175
- Apple M5 Max: 4350
I'm happy to be wrong about Qualcomm's latest X2 chip performance, even if it is shipping in only a single product so far. Their previous best was the lowest in this list.
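To put the list above in relative terms, here is a small sketch that normalizes each score to the DGX Spark figure. It uses only the numbers quoted in this comment (a single benchmark, as noted), and the variable names are just for illustration.

```python
# Geekbench single-thread scores exactly as listed in the comment above.
scores = {
    "DGX Spark (same CPU as RTX Spark)": 3125,
    "Snapdragon X1 Elite": 2950,
    "Snapdragon X2 Elite Extreme": 4050,
    "AMD Ryzen 9 9955HX": 3225,
    "Intel Core Ultra 9 290HX Plus": 3175,
    "Apple M5 Max": 4350,
}

# Use the DGX Spark score as the baseline (1.00x).
baseline = scores["DGX Spark (same CPU as RTX Spark)"]
relative = {name: score / baseline for name, score in scores.items()}

# Print slowest to fastest relative to the baseline.
for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name:36s} {ratio:.2f}x")
```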
This will likely have worse single threaded performance than recent Qualcomm CPUs.
These chips also appear to be using off-the-shelf ARM cores.
Qualcomm Snapdragon x1 and upcoming x2 use their Oryon core and have much faster single-thread performance than Intel/Amd and this nvidia soc that uses off-the-shelf arm cores
That wasn't true of the X1, but apparently the X2 (which is only in a single device so far) does appear to finally be fast. The first Windows ARM CPU to be faster than any of its x86 rivals. Competitive with Apple Silicon single-thread performance even.
I was disappointed to see that the RTX Spark has the ARM cores from the DGX Spark. I was hoping it had their new in-house developed cores that Nvidia is starting to use on their latest gen server parts. They look really fast. That said, if RTX Spark has CPU performance like the DGX Spark, it will be almost as fast as the top AMD/Intel parts.
This will crush the M5 Max going by the numbers. I'm curious to see how much they end up costing
It won't, the top tier RTX Spark has the same exact CPU and GPU as DGX Spark, so you can check DGX Spark CPU benchmarks to see how it fares. Spoiler: it's about M3 Max level. And they're only coming this fall.
Nah, still ~300GB/s memory bandwidth. That will be slower than the M5 max, by a wide margin for LLM inference.
M5 max is 3x stronger and 50% more power efficient. nice try though.
... but you'll be rewriting inference for any model that isn't a well-known LLM. Yourself.
AI coding agents can do that pretty nicely already and it will only (slowly) improve over time.