Building High-Performance RISC-V Cores for Everything

erek · Apr 12, 2023

Pretty cool about RISC-V

"Wei-han Lien is Tenstorrent's Chief RISC-V Architect, and he's been lead designer on chips like Apple's M1. Now he heads up a team building the highest-performance RISC-V cores on the market. Here's why RISC-V is important."

Source:

pendragon1 · Apr 12, 2023

lol risc again? im not even gonna post the hackers pic this time...
https://hardforum.com/search/571702...1&c[nodes][0]=116&c[title_only]=1&o=relevance

erek · Apr 12, 2023

pendragon1 said:
lol risc again? im not even gonna post the hackers pic this time...
https://hardforum.com/search/571702...1&c[nodes][0]=116&c[title_only]=1&o=relevance

You’re not a fan of risc? Or what

pendragon1 · Apr 12, 2023

erek said:
You’re not a fan of risc? Or what

ive been hearing "risc is gonna change everything" for 30 years...
you excited?

Lakados · Apr 12, 2023

RISC has a place, it does specialized compute very well and it can be customized in a variety of ways.
As a general computing device it has a long long way to go, but for low power high speed single function it is a king.

staknhalo · Apr 12, 2023

[insert Hackers clip here]

erek · Apr 12, 2023

pendragon1 said:
ive been hearing "risc is gonna change everything" for 30 years...
you excited?

Lakados said:
RISC has a place, it does specialized compute very well and it can be customized in a variety of ways.
As a general computing device it has a long long way to go, but for low power high speed single function it is a king.

it's more than that, RISC can be seen within our GPUs even

"RISC instruction sets do not need microcode and are designed to simplify pipelining. VLIW instructions are like RISC instructions except that they are longer to allow them to specify multiple, independent simple operations. A VLIW instruction can be thought of as several RISC instructions joined together."

TeraScale is a VLIW SIMD architecture, while Tesla is a RISC SIMD architecture, similar to TeraScale's successor Graphics Core Next. TeraScale implements HyperZ.

Just look at the promises of the VLIW6 At Scale

all this stuff just needs massively scaled up with a high bit depth and width 3-D Ring Bus

https://hardforum.com/threads/r600-has-512-bit-external-memory-bus.1133476/page-2#post-1030407823

stop re-inventing the wheel and start engineering with the puzzle pieces we have at a greater sophistication

Sycraft · Apr 12, 2023

pendragon1 said:
ive been hearing "risc is gonna change everything" for 30 years...
you excited?

At this point there are a few things that should be evident about RISC:

1) It is NOT going to "change everything" We've had RISC chips not just around but widely used for a long time. Every hear of ARM? That means "Acorn RISC Machine". It is a RISC-based architecture. Same with MIPS, PPC, and several others. Notice that, while useful, it hasn't been something radical, mostly by virtue of the fact that it has been around for decades.

2) True RISC doesn't work for modern needs. RISC means "reduced instruction set computing" and the OG RISC designs take it seriously. It is about having as few instructions as possible, and having each instruction only do a single thing. Nice and simple, makes compiler optimization easy... and also means you don't get to have things like vector math, crypto and network accelerators, and all that other good shit that modern chips rely on to do the things we want fast and efficient. We rely on optimized systems to make things fast.

3) All of the original talking points about CISC vs RISC are 100% irrelevant these days, as processors work fundamentally different and all of them can be considered hybrids of some fashion.

4) With any new chip, it is "put up or shut up." Never believe hype, lots of companies want to claim they are going to just annihilate everything on the market, it is rare if ever they deliver.

5) Be suspicious of anything with super high core counts. Usually if a company throws way more cores than you normally see on a given size die, that means their cores suck and can't do much and they are trying to make up quality with quantity. While there are some limited applications where things truly do scale to an infinite amount of cores and you can trade execution speed for more in parallel, there are also plenty of things that do not scale as well and need higher core speed. Not only things like games, but even simulations. Ansys is one such example where part of it runs on as many cores as given, but other parts (the setup basically) needs less faster cores. If someone offered you a 128-core CPU that had about the same performance as another company's 16-core CPU, the 16-core is the better buy because it will perform better across a wider range of applications.

RISC-V is a neat idea, being an open standard and all, but I'm not going to get excited until someone actually delivers chips that are worth having. Talk is cheap, real world performance often doesn't meet claims. Anyone remember the Transmeta Crusoe? How about the Elbrus E2K? No? My point exactly.

erek · Apr 12, 2023

Sycraft said:
At this point there are a few things that should be evident about RISC:

1) It is NOT going to "change everything" We've had RISC chips not just around but widely used for a long time. Every hear of ARM? That means "Acorn RISC Machine". It is a RISC-based architecture. Same with MIPS, PPC, and several others. Notice that, while useful, it hasn't been something radical, mostly by virtue of the fact that it has been around for decades.

2) True RISC doesn't work for modern needs. RISC means "reduced instruction set computing" and the OG RISC designs take it seriously. It is about having as few instructions as possible, and having each instruction only do a single thing. Nice and simple, makes compiler optimization easy... and also means you don't get to have things like vector math, crypto and network accelerators, and all that other good shit that modern chips rely on to do the things we want fast and efficient. We rely on optimized systems to make things fast.

3) All of the original talking points about CISC vs RISC are 100% irrelevant these days, as processors work fundamentally different and all of them can be considered hybrids of some fashion.

4) With any new chip, it is "put up or shut up." Never believe hype, lots of companies want to claim they are going to just annihilate everything on the market, it is rare if ever they deliver.

5) Be suspicious of anything with super high core counts. Usually if a company throws way more cores than you normally see on a given size die, that means their cores suck and can't do much and they are trying to make up quality with quantity. While there are some limited applications where things truly do scale to an infinite amount of cores and you can trade execution speed for more in parallel, there are also plenty of things that do not scale as well and need higher core speed. Not only things like games, but even simulations. Ansys is one such example where part of it runs on as many cores as given, but other parts (the setup basically) needs less faster cores. If someone offered you a 128-core CPU that had about the same performance as another company's 16-core CPU, the 16-core is the better buy because it will perform better across a wider range of applications.

RISC-V is a neat idea, being an open standard and all, but I'm not going to get excited until someone actually delivers chips that are worth having. Talk is cheap, real world performance often doesn't meet claims. Anyone remember the Transmeta Crusoe? How about the Elbrus E2K? No? My point exactly.

Could we pipe the risc-v Standard through GPT-4 for optimization a few times?

Lakados · Apr 12, 2023

Sycraft said:
At this point there are a few things that should be evident about RISC:

1) It is NOT going to "change everything" We've had RISC chips not just around but widely used for a long time. Every hear of ARM? That means "Acorn RISC Machine". It is a RISC-based architecture. Same with MIPS, PPC, and several others. Notice that, while useful, it hasn't been something radical, mostly by virtue of the fact that it has been around for decades.

2) True RISC doesn't work for modern needs. RISC means "reduced instruction set computing" and the OG RISC designs take it seriously. It is about having as few instructions as possible, and having each instruction only do a single thing. Nice and simple, makes compiler optimization easy... and also means you don't get to have things like vector math, crypto and network accelerators, and all that other good shit that modern chips rely on to do the things we want fast and efficient. We rely on optimized systems to make things fast.

3) All of the original talking points about CISC vs RISC are 100% irrelevant these days, as processors work fundamentally different and all of them can be considered hybrids of some fashion.

4) With any new chip, it is "put up or shut up." Never believe hype, lots of companies want to claim they are going to just annihilate everything on the market, it is rare if ever they deliver.

5) Be suspicious of anything with super high core counts. Usually if a company throws way more cores than you normally see on a given size die, that means their cores suck and can't do much and they are trying to make up quality with quantity. While there are some limited applications where things truly do scale to an infinite amount of cores and you can trade execution speed for more in parallel, there are also plenty of things that do not scale as well and need higher core speed. Not only things like games, but even simulations. Ansys is one such example where part of it runs on as many cores as given, but other parts (the setup basically) needs less faster cores. If someone offered you a 128-core CPU that had about the same performance as another company's 16-core CPU, the 16-core is the better buy because it will perform better across a wider range of applications.

RISC-V is a neat idea, being an open standard and all, but I'm not going to get excited until someone actually delivers chips that are worth having. Talk is cheap, real world performance often doesn't meet claims. Anyone remember the Transmeta Crusoe? How about the Elbrus E2K? No? My point exactly.

Special purpose silicon is the path forward, Apple does it, Broadcom/Qualcomm are doing it, Intel and AMD have announced it. ARM, RISC, x86, they do their workloads and they handle them well enough, but you get into situations where you have some stupid simple task you have to do thousands of times over and over, and using general computation is too inefficient for it its either too hot, too hungry, or too slow. So in comes some accelerator on the chip that offloads that specific task, we are going to see more and more of those accelerators in the coming future, the days of throwing more transistors at the problem are gone, the scaling doesn't keep up, TSMC 5n to 3n is the best example of this a 1.3x increase in transistor density at 1.5x the cost increase, it will supposedly be even worse for the jump from 3-2. Chiplets on different nodes doing very specific tasks will be a thing sooner than not using the UCIe standards, many developers across the board are all talking about the end of hardware-defined software and the rise of software-defined hardware because saying well you just need a faster CPU isn't going to cut it for much longer because things are hitting a wall on what is reasonable.

Sycraft · Apr 12, 2023

Lakados said:
Special purpose silicon is the path forward, Apple does it, Broadcom/Qualcomm are doing it, Intel and AMD have announced it.

Intel already does a ton of it, it is just getting more press now. Two examples in your CPU right now:

1) AES encryption. Download Veracrypt, or something else that can do crypto benchmarks, sometime and compare speeds of AES vs something like Twofish. It is an order of magnitude faster or more. How is that possible? It is efficient but not THAT efficient of an algorithm. The reason is that all modern Intel chips have a little dedicated section of silicon to do AES encryption and decryption, in particular things like the row shifts, extremely fast.

2) Video decoding. You can play back H.264 or VP9 video and it takes almost no power and CPU. Yet if you fire up something that does it in pure software, it hits the CPU pretty hard. How? Again, dedicated silicon for each video algorithm. All it does is decode that but does it very fast.

For even more specialized, Xeons have some features on them that are company specific. It is all under NDA, and disabled if you aren't that org, so we don't know precisely what it does but some orgs have their own special parts of the CPUs.

As you say, it is just the way of things, and has been for some time. Even vector math processors are an example of that. While they are more general purpose than something like an AES decryption unit, they are still special purpose to, well, vector math. You can use them to perform scalar math, but it is inefficient. They get all their efficiency from big vector operations. There's a bunch of "DSP" type functions like that and FMA that have come to CPUs to speed up those use cases.

It would be nice if we could build a real simple CPU and just make it real fast, but we can't, so instead we have to build a complex CPU with lots of special bits to do all the things we need fast.

RISC-V could do that, of course, and maybe it will. However I'm not excited until I see a chip on the market that performs well across a wide range of workloads.

staknhalo · Apr 12, 2023

I just view it as distributed computing down at the computer level now lol

uOpt · Apr 12, 2023

I am skeptical about special purpose silicon.

To me it is a good way to carry around (and keep powered) useless chip baggage that I end up not using. Give me general-purpose computing every day.

In addition I don't see how all the specialized AI hardware in processors help with research when GPUs blow them out of the water. And GPUs and more universal for machine learning. For example, on Apple Silicon Macs Torch has hardware neural net acceleration for the M1 GPU, but ironically not for the Apple neural engine.

staknhalo · Apr 12, 2023

I dunno I think integrated graphics over time even from a video perspective all the way back with MPEG2 I'm sold the concept has proven itself

AI will be like that progression on it probably

Sycraft · Apr 12, 2023

uOpt said:
I am skeptical about special purpose silicon.

If you want speed, you don't have a choice. All these things that are being done on modern processors are because we have to, not because we want to. Even things like multiple cores. Much easier to have everything be one core and just not worry about all the threading issues. Thing is, we can't, we can't make a single core that performs as well as the multiple cores do, even with a larger amount of silicon.

Same shit with vector extensions. Would be way easier to not worry, do everything as scalar math, just do one multiplication, one add, one divide at a time over and over. But we can't get the same speed. Your CPU has scalar math hardware, it isn't nearly as fast. To get the speed, we need to specialize things to work on big vectors.

Again same shit with real specialized stuff like AES encryption. Develop the most optimized, vectorized, fastest algorithm you can for the general purpose part of the CPU, it won't hold a candle to the dedicated hardware even going full out. Plus even if it did, now you CPU is busy handle encryption/decryption, and not serving web pages or whatever else it would be doing with that encrypted data and you need a crypto accelerator card. With the dedicated silicon it can do all the crypto with just a fraction of its power, leaving the rest to work on the actual data.

So: Do you want CPUs to do all the nifty modern stuff, and keep getting faster, or do you want them to have no dedicated units? Because you can't have both. If you think you can engineer a chip that is extremely fast, and completely general purpose then by all means do so. You'd find it EXTREMELY popular. Developers would love something that was dead-ass simple and required effectively no optimization or special libraries, but it can't be done.

Mr. Bluntman · Apr 12, 2023

pendragon1 said:
ive been hearing "risc is gonna change everything" for 30 years...
you excited?

Nope. 30 years of hype, yet x86 refuses to die. Hell, all x86 CPUs are anymore are x86-64 (or EM64T if you're using Intel flavor) decoders with RISC backends, anyways. So in a roundabout way RISC did change everything? It could be worse, we could all be using IA-64 processors...

uOpt · Apr 12, 2023

Sycraft said:
If you want speed, you don't have a choice. All these things that are being done on modern processors are because we have to, not because we want to. Even things like multiple cores. Much easier to have everything be one core and just not worry about all the threading issues. Thing is, we can't, we can't make a single core that performs as well as the multiple cores do, even with a larger amount of silicon.

Same shit with vector extensions. Would be way easier to not worry, do everything as scalar math, just do one multiplication, one add, one divide at a time over and over. But we can't get the same speed. Your CPU has scalar math hardware, it isn't nearly as fast. To get the speed, we need to specialize things to work on big vectors.

Again same shit with real specialized stuff like AES encryption. Develop the most optimized, vectorized, fastest algorithm you can for the general purpose part of the CPU, it won't hold a candle to the dedicated hardware even going full out. Plus even if it did, now you CPU is busy handle encryption/decryption, and not serving web pages or whatever else it would be doing with that encrypted data and you need a crypto accelerator card. With the dedicated silicon it can do all the crypto with just a fraction of its power, leaving the rest to work on the actual data.

So: Do you want CPUs to do all the nifty modern stuff, and keep getting faster, or do you want them to have no dedicated units? Because you can't have both. If you think you can engineer a chip that is extremely fast, and completely general purpose then by all means do so. You'd find it EXTREMELY popular. Developers would love something that was dead-ass simple and required effectively no optimization or special libraries, but it can't be done.

Those examples are not really what I meant with specialized silicon. General vector extensions can do many things. AES is useful in a wide variety of applications.

I am more thinking of very specialized neural net units. As I gave the example, Apple's neural unit isn't even used by Torch, which is kind of a joke.

It also depends on how much silicon you spend. Video encoding and decoding are specialized, but they are rather trivial compared to a neural net unit.

Sycraft · Apr 13, 2023

uOpt said:
Those examples are not really what I meant with specialized silicon. General vector extensions can do many things. AES is useful in a wide variety of applications.

I am more thinking of very specialized neural net units. As I gave the example, Apple's neural unit isn't even used by Torch, which is kind of a joke.

It also depends on how much silicon you spend. Video encoding and decoding are specialized, but they are rather trivial compared to a neural net unit.

Neural net units are generally just like nVidia's tensor cores: Units that do lots of high speed small-size matrix math, often low precision. That crunches the kind of math neural nets need fast.

I'm not saying some companies won't do stupid ultra-specialized shit, that would be right up Apple's alley, but in general the specialized hardware has a reason and is useful and is why CPUs are as fast as they are. RISC, real RISC, is not about that. It is about very general purpose instructions. Each instruction does one thing and one thing only, and there are a minimal number of them.

Red Falcon · Apr 13, 2023

pendragon1 said:
lol risc again? im not even gonna post the hackers pic this time...
https://hardforum.com/search/571702...1&c[nodes][0]=116&c[title_only]=1&o=relevance

No RISC, no reward.
It's RISC-V business.

Mr. Bluntman · Apr 13, 2023

Red Falcon said:
No RISC, no reward.
It's RISC-V business.

I see what you did there...

erek · Apr 13, 2023

Mr. Bluntman said:
Nope. 30 years of hype, yet x86 refuses to die. Hell, all x86 CPUs are anymore are x86-64 (or EM64T if you're using Intel flavor) decoders with RISC backends, anyways. So in a roundabout way RISC did change everything? It could be worse, we could all be using IA-64 processors...

Or we could of been all running Intel’s iAPX 432 architecture based processors too,

Never forget that one:

https://en.wikipedia.org/wiki/Intel_iAPX_432

serpretetsky · Apr 13, 2023

I dont feel like the RISC-V ISA is leaving us anytime soon. It seems like everywhere I look small, low-power, embedded microcontrollers (including ARM) are rapidly being replaced with RISC-V cores. It's taking over, the same way ARM slowly took over from the low-cost low-power and up into higher and higher power.

And the huge leverage RISC-V has over all other ISAs has nothing to do with power or performance. It's simply control (and cost) of IP. Like many have mentioned previously, the ISA itself doesn't matter very much to performance anymore. The longer and longer RISC-V sticks around the higher and higher probability engineers will eventually design high performance RISC-V cores.

OutOfPhase · Apr 13, 2023

serpretetsky said:
And the huge leverage RISC-V has over all other ISAs has nothing to do with power or performance. It's simply control (and cost) of IP. Like many have mentioned previously, the ISA itself doesn't matter very much to performance anymore.

Agreed. The more complex ISAs like x86/x64 require more from the decoder, but that's a small and ever-decreasing part of the flow. When the decoder was a big part, there was a big push to get away from it. These days, meh.

I don't like the term RISC anymore really, because what is really meant is "Carefully Selected Instruction Set". Reduced implies you want to always minimize, but in reality, you just want to plan carefully, and avoid unnecessary complexity and redundancy. Adding instructions to do very important things efficiently is a Good thing.

SamirD · Apr 13, 2023

The Amiga nailed it back in the day--would be interesting to see an architecture like that come up on the PC side, but it's all being integrated into the cpu now.

I still remember the 'rise of RISC' type of articles 30 years ago. Pure RISC hasn't done that, although it has made its way into the computing landscape.

OutOfPhase · Apr 13, 2023

SamirD said:
The Amiga nailed it back in the day--would be interesting to see an architecture like that come up on the PC side, but it's all being integrated into the cpu now.

Oh man. The Amiga is a textbook example of "what can we do?" given to brilliant engineers.

SamirD said:
I still remember the 'rise of RISC' type of articles 30 years ago. Pure RISC hasn't done that, although it has made its way into the computing landscape.

I was there. 3000 years ago.

Everyone was mad about x86. VLIW and RISC were attempts to right those wrongs. And to be clear, the x86 instruction set is nuts.

But the proposed solutions ended up being polarized implementations. But- those ideas and concepts are being incorporated in all modern designs.

VLIW can be good - when it helps.
RISC is a good thought. Reduce ways of accomplishing the same goal. But - do not cut too deeply, or think that reducing the instructions is in itself a goal.

SamirD · Apr 13, 2023

OutOfPhase said:
Everyone was mad about x86.

lol, yep! I still remember seeing the full page IBM ad in the weekly client/server publication I got when Windows NT was released and IBM was touting OS/2. It was this big NT written as 'Nice Try' and then had the OS/2 Warp logo below it. IBM was trying to piggy back on some reviews and articles that touted NT as half-baked compared to OS/2 Warp. Looking back now that was just a last ditch shot by IBM against what would end up invading and completely dominating the data center over the next few decades--NT, NT Server, win2k, and modern Windows Server line. OS/2 was a great idea for a niche, and if IBM would have tried to captivate part of the market by doing that, they probably would have survived as more than just ReactOS.

OutOfPhase · Apr 13, 2023

SamirD said:
lol, yep! I still remember seeing the full page IBM ad in the weekly client/server publication I got when Windows NT was released and IBM was touting OS/2. It was this big NT written as 'Nice Try' and then had the OS/2 Warp logo below it. IBM was trying to piggy back on some reviews and articles that touted NT as half-baked compared to OS/2 Warp. Looking back now that was just a last ditch shot by IBM against what would end up invading and completely dominating the data center over the next few decades--NT, NT Server, win2k, and modern Windows Server line. OS/2 was a great idea for a niche, and if IBM would have tried to captivate part of the market by doing that, they probably would have survived as more than just ReactOS.

I may be one of 12 people who really loved OS/2 2.x and beyond. I wish I kept the pallet-load of floppies from the 2.0 beta, just for nostalgia.

next-Jin · Apr 13, 2023

I’m pretty sure this is the first thread on [H] I’ve read in like over a decade where I have no idea what the hell anyone is talking about.

/right over my head.

OutOfPhase · Apr 13, 2023

SamirD · Apr 14, 2023

OutOfPhase said:
I may be one of 12 people who really loved OS/2 2.x and beyond. I wish I kept the pallet-load of floppies from the 2.0 beta, just for nostalgia.

I though it was great because it could do DOS, win3.1 and OS/2 so you had it all. But at the end of the day it was a bit too hard to get used to installing, configuring, etc.--at least back then. By today's standards it was actually well done and pretty easy, especially when compared to how much of a mess it was to lock down anything NT and after. (I had actually written small batch files to lock down 95 and 98se and win3.1 could be locked down by just setting certain .ini files to read-only.)

erek · Apr 23, 2023

pendragon1 said:
lol risc again? im not even gonna post the hackers pic this time...
https://hardforum.com/search/571702...1&c[nodes][0]=116&c[title_only]=1&o=relevance

Who were the detractors on here about RISC-V on the Desktop? Scope out this processor beating out an Arm chip significantly in a GIMP filter benchmark!!

Exciting times for sure!!!!!

pendragon1 · Apr 23, 2023

erek said:
Who were the detractors on here about RISC-V on the Desktop? Scope out this processor beating out an Arm chip significantly in a GIMP filter benchmark!!

Exciting times for sure!!!!!

excited to switch your desktop to this?
lol beats a pi in gimp by 3sec, 50 vs 53...
make sure to stay focused on that and ignore how crap it actually is to use with a 1ghz single core and 2D only.
hardly "will change everything"...

Sycraft · Apr 23, 2023

pendragon1 said:
excited to switch your desktop to this?
lol beats a pi in gimp by 3sec, 50 vs 53...
make sure to stay focused on that and ignore how crap it actually is to use with a 1ghz single core and 2D only.
hardly "will change everything"...

I think he was being sarcastic.

pendragon1 · Apr 23, 2023

Sycraft said:
I think he was being sarcastic.

im pretty sure hes serious and if not, def needs to add a /s

1_rick · Apr 23, 2023

pendragon1 said:
how crap it actually is to use with a 1ghz single core and 2D only.

I mean, that's already the experience you get with a Pi Zero.

They're technically usable, and they're probably decent for IoT-type stuff (although personally I'd see if it runs better without X.)

OutOfPhase · Apr 23, 2023

1_rick said:
They're technically usable, and they're probably decent for IoT-type stuff (although personally I'd see if it runs better without X.)

At first, yes. RISC-V will make inroads in the very low cost / low performance items (embedded systems) without a doubt. No need for compatibility, high concern about every aspect of cost, including licensing. Performance concerns are generally super low, compared to other use cases.

It's a much harder battle beyond that realm. Possible to have a RISC-V based desktop? Oh, of course.
But - for more "open" devices, compatibility is of course a thing.
And competing on the very high end - unbelievably hard.

1_rick · Apr 23, 2023

OutOfPhase said:
At first, yes. RISC-V will make inroads in the very low cost / low performance items (embedded systems) without a doubt. No need for compatibility, high concern about every aspect of cost, including licensing. Performance concerns are generally super low, compared to other use cases.

It's a much harder battle beyond that realm. Possible to have a RISC-V based desktop? Oh, of course.
But - for more "open" devices, compatibility is of course a thing.
And competing on the very high end - unbelievably hard.

Sure, I don't disagree with anything you say. But if someone said 5 years from now, or more like 10, you might possibly see Windows on RISC-V, I wouldn't laugh at them. 10 years ago probably nobody thought (desktop) Windows on ARM would ever be a thing, either.

OutOfPhase · Apr 23, 2023

1_rick said:
Sure, I don't disagree with anything you say. But if someone said 5 years from now, or more like 10, you might possibly see Windows on RISC-V, I wouldn't laugh at them. 10 years ago probably nobody thought (desktop) Windows on ARM would ever be a thing, either.

Oh, I don't think I'd bet against you on those numbers, to be clear.
My point was intended to be that any new arch will likely "bubble up", so for a while, we'll see inroads in things where we don't actually know what the CPU / SoC is, because it doesn't matter.
It does matter a bit more as we move up the chain, but that of course doesn't mean anything is insurmountable, or pushed out 20 years.

lopoetve · Apr 23, 2023

Sycraft said:
At this point there are a few things that should be evident about RISC:

1) It is NOT going to "change everything" We've had RISC chips not just around but widely used for a long time. Every hear of ARM? That means "Acorn RISC Machine". It is a RISC-based architecture. Same with MIPS, PPC, and several others. Notice that, while useful, it hasn't been something radical, mostly by virtue of the fact that it has been around for decades.

2) True RISC doesn't work for modern needs. RISC means "reduced instruction set computing" and the OG RISC designs take it seriously. It is about having as few instructions as possible, and having each instruction only do a single thing. Nice and simple, makes compiler optimization easy... and also means you don't get to have things like vector math, crypto and network accelerators, and all that other good shit that modern chips rely on to do the things we want fast and efficient. We rely on optimized systems to make things fast.

3) All of the original talking points about CISC vs RISC are 100% irrelevant these days, as processors work fundamentally different and all of them can be considered hybrids of some fashion.

4) With any new chip, it is "put up or shut up." Never believe hype, lots of companies want to claim they are going to just annihilate everything on the market, it is rare if ever they deliver.

5) Be suspicious of anything with super high core counts. Usually if a company throws way more cores than you normally see on a given size die, that means their cores suck and can't do much and they are trying to make up quality with quantity. While there are some limited applications where things truly do scale to an infinite amount of cores and you can trade execution speed for more in parallel, there are also plenty of things that do not scale as well and need higher core speed. Not only things like games, but even simulations. Ansys is one such example where part of it runs on as many cores as given, but other parts (the setup basically) needs less faster cores. If someone offered you a 128-core CPU that had about the same performance as another company's 16-core CPU, the 16-core is the better buy because it will perform better across a wider range of applications.

RISC-V is a neat idea, being an open standard and all, but I'm not going to get excited until someone actually delivers chips that are worth having. Talk is cheap, real world performance often doesn't meet claims. Anyone remember the Transmeta Crusoe? How about the Elbrus E2K? No? My point exactly.

Lakados said:
Special purpose silicon is the path forward, Apple does it, Broadcom/Qualcomm are doing it, Intel and AMD have announced it. ARM, RISC, x86, they do their workloads and they handle them well enough, but you get into situations where you have some stupid simple task you have to do thousands of times over and over, and using general computation is too inefficient for it its either too hot, too hungry, or too slow. So in comes some accelerator on the chip that offloads that specific task, we are going to see more and more of those accelerators in the coming future, the days of throwing more transistors at the problem are gone, the scaling doesn't keep up, TSMC 5n to 3n is the best example of this a 1.3x increase in transistor density at 1.5x the cost increase, it will supposedly be even worse for the jump from 3-2. Chiplets on different nodes doing very specific tasks will be a thing sooner than not using the UCIe standards, many developers across the board are all talking about the end of hardware-defined software and the rise of software-defined hardware because saying well you just need a faster CPU isn't going to cut it for much longer because things are hitting a wall on what is reasonable.

Sycraft said:
Intel already does a ton of it, it is just getting more press now. Two examples in your CPU right now:

1) AES encryption. Download Veracrypt, or something else that can do crypto benchmarks, sometime and compare speeds of AES vs something like Twofish. It is an order of magnitude faster or more. How is that possible? It is efficient but not THAT efficient of an algorithm. The reason is that all modern Intel chips have a little dedicated section of silicon to do AES encryption and decryption, in particular things like the row shifts, extremely fast.

2) Video decoding. You can play back H.264 or VP9 video and it takes almost no power and CPU. Yet if you fire up something that does it in pure software, it hits the CPU pretty hard. How? Again, dedicated silicon for each video algorithm. All it does is decode that but does it very fast.

For even more specialized, Xeons have some features on them that are company specific. It is all under NDA, and disabled if you aren't that org, so we don't know precisely what it does but some orgs have their own special parts of the CPUs.

As you say, it is just the way of things, and has been for some time. Even vector math processors are an example of that. While they are more general purpose than something like an AES decryption unit, they are still special purpose to, well, vector math. You can use them to perform scalar math, but it is inefficient. They get all their efficiency from big vector operations. There's a bunch of "DSP" type functions like that and FMA that have come to CPUs to speed up those use cases.

It would be nice if we could build a real simple CPU and just make it real fast, but we can't, so instead we have to build a complex CPU with lots of special bits to do all the things we need fast.

RISC-V could do that, of course, and maybe it will. However I'm not excited until I see a chip on the market that performs well across a wide range of workloads.

Agreed with the above. RISC is fine - heck there are RISC parts inside of any x86 CPU now. And your GPU. And all over the place. Lemme know when they make a processor that competes head-to-head with full-fat desktop/HEDT/Server processors with a benefit to justify switching (and open standards isn't it, sorry), and have the software that isn't easily ported moved over as well. Then we'll talk.

Lakados · Apr 23, 2023

lopoetve said:
Agreed with the above. RISC is fine - heck there are RISC parts inside of any x86 CPU now. And your GPU. And all over the place. Lemme know when they make a processor that competes head-to-head with full-fat desktop/HEDT/Server processors with a benefit to justify switching (and open standards isn't it, sorry), and have the software that isn't easily ported moved over as well. Then we'll talk.

I have my NT4 accounting software working in Windows 11 ARM 64 running in Parallels on an M1Pro now. It’s getting easier every day to port things and even if it’s inefficient as all hell it just needs to run better than the P2’s it was developed for.

Building High-Performance RISC-V Cores for Everything

[H]F Junkie

Extremely [H]

[H]F Junkie

Extremely [H]

[H]F Junkie

Supreme [H]ardness

[H]F Junkie

Supreme [H]ardness

[H]F Junkie

[H]F Junkie

Supreme [H]ardness

Supreme [H]ardness

[H]ard|Gawd

Supreme [H]ardness

Supreme [H]ardness

Supreme [H]ardness

[H]ard|Gawd

Supreme [H]ardness

[H]ard DCOTM December 2023

Supreme [H]ardness

[H]F Junkie

2[H]4U

Supreme [H]ardness

Supreme [H]ardness

Supreme [H]ardness

Supreme [H]ardness

Supreme [H]ardness

Supreme [H]ardness

Supreme [H]ardness

Supreme [H]ardness

[H]F Junkie

Extremely [H]

Supreme [H]ardness

Extremely [H]

Supreme [H]ardness

Supreme [H]ardness

Supreme [H]ardness

Supreme [H]ardness

Extremely [H]

[H]F Junkie