AMD Fusion

yevaud

So AMD has finally come out and said they will integrate the GPU onto the CPU die. My biggest question is whether this will ever be a performance GPU or just the budget graphics crap, a.k.a. Intel "Extreme".

Press Release:
AMD plans to create a new class of x86 processor that integrates the central processing unit (CPU) and graphics processing unit (GPU) at the silicon level with a broad set of design initiatives collectively codenamed “Fusion.” AMD intends to design Fusion processors to provide step-function increases in performance-per-watt relative to today’s CPU-only architectures, and to provide the best customer experience in a world increasingly reliant upon 3D graphics, digital media and high-performance computing. With Fusion processors, AMD will continue to promote an open platform and encourage companies throughout the ecosystem to create innovative new co-processing solutions aimed at further optimizing specific workloads. AMD-powered Fusion platforms will continue to fully support high-end discrete graphics, physics accelerators, and other PCI Express-based solutions to meet the ever-increasing needs of the most demanding enthusiast end-users.
/press release
LINK
 
Here is some total speculation for you, with no proof to back it up, so please take it with more than just a grain of salt.

Here is what I think is actually going on. AMD mentioned a long time ago that they were interested in releasing an extension to AMD64 that would extend its streaming processing capability. That extension was slated for K10. The problem is that K10 has been cancelled, so what they are doing now is simply doubling the FP units in K8L. That should give a big boost, but it isn't anything like what they intended.

So with this in mind, and the awesome power of GPUs at streaming tasks.... A little modification in the front end, some new instructions, and BAM!! we get a new streaming co-processor on die. It could be used for graphics in embedded systems, laptops, or low-end desktops, but its express purpose would be accelerating streaming workloads.

Think of it as an AMD-style replacement for SSE.
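
For reference, here's roughly what SSE streaming looks like from the programmer's side today (just a plain C sketch for illustration, nothing AMD has announced). A wider on-die stream unit would be aimed at exactly this kind of loop:

Code:
#include <xmmintrin.h>  /* SSE intrinsics */

/* y = a*x + y over a float array, 4 elements per instruction.
   Assumes n is a multiple of 4 and the pointers are 16-byte aligned. */
void saxpy_sse(float *y, const float *x, float a, int n)
{
    __m128 va = _mm_set1_ps(a);              /* broadcast a into all 4 lanes */
    for (int i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(x + i);      /* load 4 floats */
        __m128 vy = _mm_load_ps(y + i);
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
        _mm_store_ps(y + i, vy);             /* store 4 results */
    }
}

Four floats per instruction is all SSE gives you; the whole point of pulling a GPU-style stream unit on die would be to widen that by an order of magnitude or two.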
 
x2100xtxPE on the hypertransport 3 bus please ;)

that would be one hella huge chip though...so yeah...I'd think along the lines of a streaming enhancement like duby mentioned. Although....since it would be from ATI...maybe it would be a DECENT onboard solution that has the performance of, say, an X700 or something. Take the core of an X700, change it a bit, shrink it to 65nm or less...and I think it might fit on the same die quite nicely :p

still poor performance for graphics...but at least it's not "3 month old rotten cloudy piss-poor performance" :p
 
I think having an integrated, low-power GPU on the die with the CPU might make it possible for the auxiliary 3D "power card" on a laptop to be powered down completely during normal web/email browsing, then powered on only when needed. Would be the best of both worlds.
I also expect the ringbus architecture to show up in the CPU/GPU combo, especially if we're talking multicore CPU + GPU.

The big question I have is where the hell are they going to put the GDDR4/5?? I'm thinking we're going to see a radical redesign of the cpu package, I just hope we're not looking at another slotket fiasco.
 
duby229 said:
If you think about it there already are HT graphics processors. The work has already been done

Here is ATi's onboard graphics processor, which runs directly off Hypertransport.
http://www.hypertransport.org/products/productdetail.cfm?RecordID=72&CFID=132949&CFTOKEN=68656171

As far as when it'll be integrated they say 2009. That puts it squarely in the 45nm timeframe.


That's wicked cool and all....but X300 performance???? For a proof of concept that's great...but I think it would be really wicked to have a "performance" card, whether it be ATI or Nvidia, attached directly to the HT. Bypass the PCI-E bus altogether and leave it for whatever else comes along (PPU, AIPU, sound, LAN, 512-bit 32-unit cell processor board designed for folding :D ). Have your 6 or 8 PCI and PCI-E slots + a couple of slots for plugging straight into the HT. Maybe a pipe dream, or I took too much migraine medicine...but it sounds cool at the moment :p
 
Here is the deal...

At this point in time the ATX standard only has 7 external expansion slots. If you have cards that need outside access, then you have to plug them into an external slot. It is possible to have internal slots, but where would you put them? So, more or less, we're stuck with a maximum of 7 slots.

As far as GPUs go, the bus controller has already been done; they can slap it on pretty much any model they want. Keep in mind that it was ATi who pioneered the whole "modular" approach with its R300 GPUs. Slap an HT bus controller on any model they want and bam, done. The hard part is designing the controller, and in this case the controller is already done.
 
They could make Fusion whatever they wanted. The biggest problem I still see is what they do about memory and memory bandwidth. If they could make GDDR4+ sticks that would go into the mobo and still run at near full speed, they'd be OK, but the regular old RAM we're using now is hardly geared towards graphics or stream processing.

I really see this going towards a basic RISC-style CPU with a giant stream processor attached to it. CPUs aren't made for math, they're made more for logic. A giant stream processor, on the other hand, could chew through a bunch of data-parallel paths much more efficiently.

Anyone know if the DX10 math specs are a subset of the IEEE specs as far as precision etc.? The next question is how different GPU processing is from stream processing. I see the triangles as being the stumbling block, then getting a framebuffer close enough to the processors to make texturing efficient.
 
A graphics processor is a stream processor.... The only catch is that they aren't really made for general-purpose processing...

You can kinda think of them as one-minded. They can do one thing, and do it really well. That's graphics, but because of that a GPU is a very powerful floating-point processor. A beast, really. The question is how to tap into that streaming power. It would require a bit of work, but you could literally design a programmable ISA..... which, by the way, has already started... Ever hear of shaders? Well, that's what they are. Shaders are more or less programmable instructions.

With a little jerry-riggin', it may be possible to create a general-purpose streaming ISA that would be usable by applications in the same way that SSE is today... The catch is that we are talking about something 100 times faster than SSE. No joke... literally 100 times faster.
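
To make that concrete, here's a little C sketch of the shader idea (purely conceptual, the names are made up by me, not any real AMD/ATI interface): one small per-element function with no dependence between elements, so the hardware is free to run it across however many pipelines it has.

Code:
/* Conceptual sketch only, not a real AMD/ATI interface. */
typedef struct { float r, g, b, a; } vec4;

/* The per-element program (essentially what a pixel shader is). */
static vec4 kernel(vec4 in, float gain)
{
    vec4 out = { in.r * gain, in.g * gain, in.b * gain, in.a };
    return out;
}

/* On a CPU this is just a loop; on a stream processor every iteration
   is an independent work item that can run on its own pipeline. */
void run_stream(vec4 *dst, const vec4 *src, float gain, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = kernel(src[i], gain);
}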

As far as memory bandwidth goes, that is yet to be seen. Listen, that has more to do with logic, and nobody is better at logic than AMD. "I need this, by that time": prefetching, baby. I suspect AMD is smart enough to figure out the logistics of moving data around before this gets released. After all, nobody is better at engineering traffic handling than AMD. Look at what they already have for bandwidth. Look at the wonders of HTT. It's amazing.
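
For anyone wondering what "I need this, by that time" looks like in practice, software prefetch is already exposed through SSE today. A minimal sketch (the distance of 16 elements ahead is just a number I picked; in reality you'd tune it):

Code:
#include <xmmintrin.h>

/* Ask the memory system for data a few iterations ahead, so it is
   already in cache by the time the math catches up to it. */
float sum_with_prefetch(const float *x, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++) {
        if (i + 16 < n)
            _mm_prefetch((const char *)(x + i + 16), _MM_HINT_T0);
        s += x[i];
    }
    return s;
}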

Also keep in mind that, according to AMD, this technology won't be released until 2009. That will be well into the HT3 era, and maybe into the HT4 era. We'll also be using DDR3 by then, and who knows how much bandwidth will actually be available? I'm certain AMD will have its bases covered. They haven't had any problem engineering adequate bandwidth since the EV6 bus, and I don't suspect it will be a problem any time soon.
 
A graphics processor isn't quite a stream processor. The texturing functions and geometry-based parts make it substantially more complex.

As for bandwidth, if it's 100x faster it chews through data 100x faster, so 100x the bandwidth is required. Bandwidth requirements of stream processors and CPUs differ greatly: CPUs like low latency, hence all the caches, while stream processors like tons of bandwidth but aren't as concerned about latency. They would almost need entirely separate memory buses, which may be possible. Hooking DDR2 up to one CPU/HTT link and GDDR4 up to another link might allow for multiple memory configurations and solve the problem, but it'd be interesting to see. Then just hope the HT link is fast enough to pull it off.
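
Quick back-of-envelope in C to put numbers on that. The 6.4 GB/s figure is single-channel DDR2-800 peak; the 100x just takes the earlier claim at face value, so treat the output as illustration, not prediction.

Code:
#include <stdio.h>

/* If the bytes moved per operation stay the same, bandwidth demand
   scales straight up with throughput. */
int main(void)
{
    double bw_today  = 6.4;    /* GB/s, single-channel DDR2-800 peak     */
    double speedup   = 100.0;  /* the "100x faster than SSE" claim above */
    double bw_needed = bw_today * speedup;

    printf("streaming at DDR2 pace today : %.1f GB/s\n", bw_today);
    printf("same workload at 100x speed  : %.1f GB/s\n", bw_needed);
    return 0;
}

640 GB/s is nowhere near what a shared DDR2/DDR3 controller will deliver, which is why a separate GDDR4 link makes sense to me.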
 