
Microsoft’s “1‑bit” AI model runs on a CPU only, while matching larger systems

kac77


Microsoft’s “1‑bit” AI model runs on a CPU only, while matching larger systems

Future AI might not need supercomputers thanks to models like BitNet b1.58 2B4T.

Kyle Orland
Now, researchers at Microsoft's General Artificial Intelligence group have released a new neural network model that works with just three distinct weight values: -1, 0, or 1. Building on previous work Microsoft Research published in 2023, the new model's "ternary" architecture reduces overall complexity and offers "substantial advantages in computational efficiency," the researchers write, allowing it to run effectively on a simple desktop CPU.
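For a sense of what a ternary layer actually buys you, here's a rough Python sketch. It's not Microsoft's kernel; the absmean-style quantization and scale handling are my own simplification. The point is that with weights limited to -1, 0, or +1, a matrix-vector product reduces to additions, subtractions, and skips, with no multiplies needed.

```python
import numpy as np

def quantize_ternary(w: np.ndarray):
    """Round weights to {-1, 0, +1} with a per-tensor scale (absmean-style).

    This mirrors the general idea described for BitNet b1.58, but the exact
    recipe here is a simplification, not Microsoft's code.
    """
    scale = np.abs(w).mean() + 1e-8                       # per-tensor scale factor
    w_t = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_t, scale

def ternary_matvec(w_t: np.ndarray, scale: float, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product where every weight is -1, 0, or +1.

    Each output element is (sum of inputs with weight +1) minus
    (sum of inputs with weight -1); zero weights are skipped entirely.
    """
    pos = (w_t == 1) @ x                                  # add where weight is +1
    neg = (w_t == -1) @ x                                 # subtract where weight is -1
    return scale * (pos - neg)

# Toy usage: compare against the full-precision product.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
x = rng.normal(size=8).astype(np.float32)
w_t, s = quantize_ternary(w)
print(ternary_matvec(w_t, s, x))                          # ternary approximation
print(w @ x)                                              # full-precision reference
```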
 
This could have some awesome uses for games, especially on consoles.
No GPU that you can reasonably place into a console will ever be powerful enough. Upscaling tech obviously helps, but it does have a CPU thread requirement: FSR4 and DLSS absolutely love higher thread counts, which keeps frame pacing better and reduces jitter.
The actual load they place on those threads is relatively minor, so lots of powerful cores is overkill for a multitude of reasons; lots of moderate cores is much better. Enter something like Zen 4c: small, efficient, and powerful enough when paired with an APU and forced into a power envelope a console could reasonably deal with.
That still leaves a lot of highly underutilized threads where a simple AI solution like this could absolutely thrive.

I could reasonably see Microsoft doing a console with 10-12 Zen 4c cores using some sort of extended cache, as the 2MB the 4c cores have is too little. The current method used by the X3D chips isn't an option for the 4c cores due to the removal of the TSVs, but the Intel Adamantine solution or some TSMC equivalent would be near perfect, as it would also improve CPU-GPU communication.
 
I mean, this is great for mobile or in-game AI. I still think that eventually we will have AI-based MMOs; it would be amazing to have unique quests and storylines. Also, a huge timesink!
 
Carry=MAJ3(A,B,C)
Sum=MAJ5(A,B,C,/Carry,/Carry)
Or throw 11 NAND gates at the same addition problem.
Marble weighting ain't nothin new.
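If you want to sanity-check those two majority-gate formulas, a quick brute force over all input combinations does it. Nothing below comes from the post beyond the equations themselves; the maj() helper is just my own plain-Python majority vote.

```python
from itertools import product

def maj(*bits):
    """Majority vote: 1 if more than half of the inputs are 1."""
    return int(sum(bits) > len(bits) / 2)

# Check Carry = MAJ3(A, B, C) and Sum = MAJ5(A, B, C, /Carry, /Carry)
# against ordinary binary addition for every input combination.
for a, b, c in product((0, 1), repeat=3):
    carry = maj(a, b, c)
    s = maj(a, b, c, 1 - carry, 1 - carry)
    assert 2 * carry + s == a + b + c
print("MAJ-gate full adder matches binary addition for all 8 cases.")
```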

-1,0,+1 also good enough to multiply waveforms to test for phase correlation.
Plus * Plus = Plus, Neg * Neg also = Plus. If waving in unison, a sum goes up.
Waving in opposition, sum goes down. Waving randomly goes nowhere...
Test again with 90deg phase shift so nothing falls through the cracks.
Also to obtain a complete vector.

Magic sines might be better for this purpose than squarewaves.
https://www.tinaja.com/glib/msintro1.pdf
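And the sign-multiply correlation trick sketched out, assuming the waveforms are first reduced to -1/0/+1. The dead-zone threshold and the test signals here are illustrative choices of mine, not anything from the PDF above.

```python
import numpy as np

def ternarize(x: np.ndarray, dead_zone: float = 0.1) -> np.ndarray:
    """Map a waveform to -1/0/+1, with a small dead zone treated as 0."""
    return np.where(x > dead_zone, 1, np.where(x < -dead_zone, -1, 0))

def sign_correlate(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of elementwise products of ternary signals.

    In-phase signals push the sum up (+*+ and -*- are both +),
    anti-phase signals push it down, unrelated signals hover near zero.
    """
    return float(np.sum(ternarize(a) * ternarize(b)))

t = np.linspace(0, 1, 1000, endpoint=False)
ref = np.sin(2 * np.pi * 5 * t)
print(sign_correlate(ref, np.sin(2 * np.pi * 5 * t)))    # in phase: large positive
print(sign_correlate(ref, -np.sin(2 * np.pi * 5 * t)))   # opposed: large negative
print(sign_correlate(ref, np.cos(2 * np.pi * 5 * t)))    # 90 deg shift: near zero
print(sign_correlate(ref, np.random.default_rng(1).normal(size=t.size)))  # noise: small
```

The cosine test is the 90-degree quadrature check mentioned above: running both correlations gives you the two components of the vector.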
 
This could have some awesome uses for games, especially on consoles. No GPU that you can reasonably place into a console will ever be powerful enough. Upscaling tech obviously helps, but it does have a CPU thread requirement: FSR4 and DLSS absolutely love higher thread counts, which keeps frame pacing better and reduces jitter.
Not sure I get why a CPU here instead of, say, an NPU or GPU to run a small model, or why DLSS would love many lightly used CPU cores... An APU tends to have a fixed power budget, like a laptop: more cores running means less for the NPU/GPU or for hardware made specially for AI. And a console GPU/NPU will be quite a powerful inference machine (versus, say, watches, glasses, TVs, cheaper phones, etc.), over 1,000 TOPS in low-bit modes.

Especially, why would FSR 4 be more CPU-heavy than FSR 3? They are both fed the same motion vectors, depth maps, etc. I think from the program's side, 3.1 and 4 don't know the difference.
 
Not sure I get why a CPU here instead of, say, an NPU or GPU to run a small model, or why DLSS would love many lightly used CPU cores... An APU tends to have a fixed power budget, like a laptop: more cores running means less for the NPU/GPU or for hardware made specially for AI. And a console GPU/NPU will be quite a powerful inference machine (versus, say, watches, glasses, TVs, cheaper phones, etc.), over 1,000 TOPS in low-bit modes.

Especially, why would FSR 4 be more CPU-heavy than FSR 3? They are both fed the same motion vectors, depth maps, etc. I think from the program's side, 3.1 and 4 don't know the difference.
The existing Intel and AMD NPU offerings are slightly better than garbage; they have limited support, poor documentation, and only exist inside their mobile segments at the demands of Microsoft, which has bungled the implementation there as well. I doubt any developer is looking at either of their NPUs and thinking about how they should build a critical component of their game engine to be reliant on it.

The GPU in a console and most PCs is already the weak point in their gaming experience, so taking work away from it should be prioritized over putting work onto it.
DLSS and FSR use the CPU to handle scheduling and IO. The process is somewhat annoying for the system; it's just how the technology works. Neither tech produces a heavy CPU load, but it does generate a lot of very light threads and system interrupts for data IO, so it may only add a 3-4% increase in CPU load, but it's doing an extra 0.5-1% across every thread it can get its hands on. Practical tests routinely show that frame pacing improves and weird visual jitters decrease with an increased core count. Both AMD's and Nvidia's documentation on the tech covers this; it's just how it works. It's not a heavy CPU load, but it is a load, and the more threads it has access to, the happier it is.

AI is memory-intensive; even a small model can happily consume 4GB of VRAM. Very few systems out there currently could take what VRAM they do have and portion 4GB of it off for AI without severely impacting FPS, and that wouldn't even include any decrease brought about by the calculations or increased IO. Running the AI on the GPU while using system RAM isn't much better, because the massive increase in IO would negatively impact the GPU's ability to load textures.
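To put rough numbers on that (my own arithmetic, assuming a 2B-parameter model like the one in the article and counting weights only, not the KV cache or activations): the weight footprint shrinks dramatically going from 16-bit to ternary weights, which is exactly what makes a system-RAM/CPU deployment plausible.

```python
# Back-of-envelope weight storage for a 2B-parameter model (weights only).
params = 2e9

fp16_bytes    = params * 2          # 16 bits per weight
int4_bytes    = params * 0.5        # 4 bits per weight
ternary_bytes = params * 1.58 / 8   # ~1.58 bits per weight, ideally packed

for name, b in [("FP16", fp16_bytes), ("INT4", int4_bytes), ("ternary", ternary_bytes)]:
    print(f"{name:>8}: {b / 2**30:.2f} GiB")
# FP16   : ~3.7 GiB  -> roughly the "4GB" figure above
# INT4   : ~0.9 GiB
# ternary: ~0.4 GiB
```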

If we use a 9800X3D paired with an RTX 5090 gaming at 4K as an example of a system, that GPU is likely pinned at 100% while gaming; meanwhile, the CPU itself is likely not exceeding 40% utilization.

It is just easier to increase a game's RAM requirements and put the increased calculations on the CPU. Most PCs have more memory than they use and aren't pinning the CPU when gaming. NPUs just aren't an attractive target for developers right now; there aren't enough of them out on the platforms they target to be taken seriously as a resource.
 
The existing Intel and AMD NPU offerings are slightly better than garbage; they have limited support, poor documentation, and only exist inside their mobile segments at the demands of Microsoft, which has bungled the implementation there as well. I doubt any developer is looking at either of their NPUs and thinking about how they should build a critical component of their game engine to be reliant on it.
Yes, but one custom-made for a PS6, for example (either a dedicated NPU or a strong "tensor core" GPU), would have good support from Sony, like a better version of what the PS5 Pro offers, and devs making console games could go with it eyes closed.

The GPU in a console and most PCs is already the weak point in their gaming experience, so taking work away from it should be prioritized over putting work onto it.
Yes, but if they have tensor cores with 1/2/3/4-bit low precision, those would otherwise be underused in gaming, since DLSS already runs well enough.

AI is memory-intensive; even a small model can happily consume 4GB of VRAM
And CPUs tend to have the slowest memory ;) but at those bit levels and with the sub-agent model approach, it can be relatively OK. I doubt the next console will go for the dedicated VRAM model, so it is a bit of a moot point.

I wonder how much of this is people enabling DLSS, getting a higher frame rate, becoming less GPU-bound and thus able to use more CPU, and then overrating DLSS's impact on the CPU versus the natural impact of higher FPS. Same for the scheduling: on a console it can be placed largely where you want (the new Blackwell GPUs run a RISC CPU to do more and more of it on the GPU, for example).

If we use a 9800X3D paired with an RTX 5090 gaming at 4K as an example of a system, that GPU is likely pinned at 100% while gaming; meanwhile, the CPU itself is likely not exceeding 40% utilization.
Yes, but you are not a console running games made just for you; they can put in less CPU and more GPU as much as they want. I feel a lot of the points brought up don't apply particularly well in the console context.
 
Yes, but one custom-made for a PS6, for example (either a dedicated NPU or a strong "tensor core" GPU), would have good support from Sony, like a better version of what the PS5 Pro offers, and devs making console games could go with it eyes closed.
If Sony were happy limiting all their new releases to be PS6 exclusives, then this would work, but most developers are still doing PS4 versions of their modern games because there are still so many of them out there. When the PS6 does finally arrive, it's safe to assume that most games will get a PS5 launch as well, and designing a critical component like NPC AI and reactions around hardware exclusive to the PS6 would make doing multi-platform much more difficult.

Yes, but if they have tensor cores with 1/2/3/4-bit low precision, those would otherwise be underused in gaming, since DLSS already runs well enough.
Ray tracing as a requirement is only going to increase from here on in. Engines are designed for it, and developers have been shrinking their art departments accordingly. Unless we see a dramatic reversal in this trend, most systems just don't have the tensor cores to spare and will become increasingly reliant on DLSS/FSR in the future just to keep pace.

And CPUs tend to have the slowest memory ;) but at those bit levels and with the sub-agent model approach, it can be relatively OK. I doubt the next console will go for the dedicated VRAM model, so it is a bit of a moot point.

I wonder how much of this is people enabling DLSS, getting a higher frame rate, becoming less GPU-bound and thus able to use more CPU, and then overrating DLSS's impact on the CPU versus the natural impact of higher FPS. Same for the scheduling: on a console it can be placed largely where you want (the new Blackwell GPUs run a RISC CPU to do more and more of it on the GPU, for example).
DLSS/FSR is a requirement for most engines right now. For example, UE5's entire texture loading process runs under the assumption that it is defaulted to on, and you get pretty severe performance and image quality issues if you don't enable it. As more games transition to highly recommending or flat-out requiring ray tracing, DLSS and FSR only become more important to their processing, as they do the image scrubbing and all the anti-aliasing to keep things from being a fuzzy mess. There's lots of stuff out there about how this has resulted in a decrease in visual quality across the board, but it is significantly cheaper for developers to do it this way and reduces their need to optimize for a specific platform, as it puts that work back on the GPU driver team.
Yes, but you are not a console running games made just for you; they can put in less CPU and more GPU as much as they want. I feel a lot of the points brought up don't apply particularly well in the console context.
Yes and no. The PS5 CPU is underutilized in most cases, as is the Xbox Series X's; the S is just underpowered in every aspect. But console games are getting PC releases with increasing frequency, permanent console exclusives just aren't making the money they used to, and multi-platform is a financial requirement. It's more pertinent to consoles, though, because they can't be upgraded: the design team needs to make sure that everything going into one has multiple purposes, and specialized hardware only complicates things for the developers when it comes to porting outside that console, which is something they are all requiring now. So a CPU-based AI that is light on memory could work on the existing Series X and PS5 consoles, in addition to their replacements, and in the existing PC gaming space with little extra effort.
 
Carry=MAJ3(A,B,C)
Sum=MAJ5(A,B,C,/Carry,/Carry)
Or throw 11 NAND gates at the same addition problem.
Marble weighting ain't nothin new.

-1,0,+1 also good enough to multiply waveforms to test for phase correlation.
Plus * Plus = Plus, Neg * Neg also = Plus. If waving in unison, a sum goes up.
Waving in opposition, sum goes down. Waving randomly goes nowhere...
Test again with 90deg phase shift so nothing falls through the cracks.
Also to obtain a complete vector.

Magic sines might be better for this purpose than squarewaves.
https://www.tinaja.com/glib/msintro1.pdf
Recently a vid on yt keeps popping up with "what's plus * plus" as the tagline. I haven't watched it yet, figured it was another fluff vid.
 