Anandtech: Intel to Support Hardware Ray Tracing Acceleration on Data Center Xe GPUs

Snowdog

More info indicating everyone is going to move to HW Ray Tracing support.

Though it looks like only Data Center parts may have it initially:
https://www.anandtech.com/show/1428...y-tracing-acceleration-on-data-center-xe-gpus

The announcement itself is truly not much longer, so rather than lead into it I’ll just repost it verbatim.

“I’m pleased to share today that the Intel® Xe architecture roadmap for data center optimized rendering includes ray tracing hardware acceleration support for the Intel® Rendering Framework family of API’s and libraries.”

So no idea if/when consumer GPUs will get HW Ray Tracing and DXR support.
 
The thing is, it's less risky to create this for the enterprise. I'm wondering how they plan on doing it.
 
I wonder what the usage scenario is for this capability in the datacenter.

Streamed rendered content?
 
I wonder what the usage scenario is for this capability in the datacenter.

Streamed rendered content?
I was wondering the same thing myself. That's a good supposition.
At the same time, remember that Nvidia's "RTX" isn't "Ray Tracing Hardware" but a specific application for Tensor cores in the consumer space. Could be same scenario here, just with some marketing spin on the titles.
 
I was wondering the same thing myself. That's a good supposition.
At the same time, remember that Nvidia's "RTX" isn't "Ray Tracing Hardware" but a specific application for Tensor cores in the consumer space. Could be same scenario here, just with some marketing spin on the titles.



Actually NVidia RTX cards have both specific RT cores, which are exclusively used to calculate ray paths in the most efficient manner possible, and more general purpose Tensor cores that can be used in image denoising.

RT cores on Page 30:
https://www.nvidia.com/content/dam/...ure/NVIDIA-Turing-Architecture-Whitepaper.pdf
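
If it helps to picture what "calculating ray paths" means in hardware terms: the hot inner loop is ray/box (and ray/triangle) intersection testing while walking a BVH. Below is a minimal sketch of the classic slab-method ray/AABB test, written as ordinary C++ purely for illustration; it is not Nvidia's implementation, just the kind of math the RT units are built around:

#include <algorithm>
#include <cfloat>

struct Ray  { float origin[3]; float invDir[3]; };  // invDir = 1/direction, precomputed
struct AABB { float lo[3];     float hi[3];     };  // a BVH node's bounding box

// Slab-method ray/AABB intersection: the kind of test an RT core
// evaluates over and over while traversing the acceleration structure.
bool rayHitsBox(const Ray& r, const AABB& b)
{
    float tmin = 0.0f, tmax = FLT_MAX;
    for (int axis = 0; axis < 3; ++axis) {
        float t0 = (b.lo[axis] - r.origin[axis]) * r.invDir[axis];
        float t1 = (b.hi[axis] - r.origin[axis]) * r.invDir[axis];
        if (t0 > t1) std::swap(t0, t1);   // ray pointing in the negative direction
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmax < tmin) return false;    // slabs never overlap: miss
    }
    return true;                          // overlap on all three axes: hit
}

A single ray-traced frame needs an enormous number of these tests (plus ray/triangle tests at the BVH leaves), which is why baking them into fixed-function hardware pays off.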
 
Most likely this is going to be used for VDI acceleration, and to potentially allow 3D programs for design and manufacturing to work with an accelerated interface without having to spend $5k per workstation just on video cards.

The savings come from bulk use via virtual delivery. Imagine you can give your designers and engineers moderately powerful laptops with monitors and such, but no high-end acceleration. That makes a $12k or even $24k video card deliver value in spades: instead of needing 20 or 30 $5k+ video cards, you can have 3 or 4 of these bigger beasts serve the users' VDI or app sessions.
 
I wonder what the usage scenario is for this capability in the datacenter.

Streamed rendered content?


AI and machine learning love Tensor cores, as do image manipulation and rendering tasks.

And it's specifically about rendering... I should really read the article first, but I know we are watching these cards to see if they will make Nvidia re-think some of their datacenter stance.
 
You guys are going to be so disappointed by the consumer version of their graphics cards. Intel will market to the data center because that is where the money is; gaming is the last thing they are worried about at this point. If you're lucky they can tweak it to at least game as well as a 580 video card.
 
Let's see... what does an entry-tier Nvidia card look like... a 4-chip, 32GB 1080 Ti... hmmm, how would that ever be converted into a consumer piece of hardware... it's a complete mystery....
 
AI and machine learning love Tensor cores, as do image manipulation and rendering tasks.

And it's specifically about rendering... I should really read the article first, but I know we are watching these cards to see if they will make Nvidia re-think some of their datacenter stance.

Right, people forget that today almost all CGI and animated movies are ray traced, because we can finally afford it, and consumers expect animated films to keep getting better as time goes on. Because the rest of the computer industry does :D

So yeah, raytracing is a huge demand to be filled, both for rendering (whole pipeline is used) and machine learning (tensors only).
 
You guys are going to be so disappointed by the consumer version of their graphics cards. Intel will market to the data center because that is where the money is; gaming is the last thing they are worried about at this point. If you're lucky they can tweak it to at least game as well as a 580 video card.

You think Kyle would have left this site, to join Intel, if he thought that was the case?
 
You think Kyle would have left this site, to join Intel, if he thought that was the case?

Yeah cause what he does is put on events and talk to the enthusiasts. Crap performance is not his department, will just make his life harder.
 
Yeah cause what he does is put on events and talk to the enthusiasts. Crap performance is not his department, will just make his life harder.

Performance would be an unknown when he took this job. In any kind of public relations job, you have to deal with favorable and unfavorable outcomes. It goes with the territory.
 
Actually NVidia RTX cards have both specific RT cores, which are exclusively used to calculate ray paths in the most efficient manner possible, and more general purpose Tensor cores that can be used in image denoising.

RT cores on Page 30:
https://www.nvidia.com/content/dam/...ure/NVIDIA-Turing-Architecture-Whitepaper.pdf
So that's why it works just fine on a Titan V with tensor cores? Not believing it at all; the cost to validate that 'change' for such tiny sales volumes in the consumer $1k+ card market just ain't going to work.
 
Performance would be an unknown when he took this job. In any kind of public relations job, you have to deal with favorable and unfavorable outcomes. It goes with the territory.
While the goals are probably similar, I'm pretty sure PR is a different department entirely from where the FrgMstr works.

I doubt there'll be much enthusiast engagement to be had with Xe GPUs, anyway, but if someone was going to prove me wrong, it'd be kyle.
 
So that's why it works just fine on a Titan V with tensor cores? Not believing it at all; the cost to validate that 'change' for such tiny sales volumes in the consumer $1k+ card market just ain't going to work.

Not believing what? Ray casting is a well known problem, and dedicated HW does it faster. It really isn't open for debate.
 
Yeah cause what he does is put on events and talk to the enthusiasts. Crap performance is not his department, will just make his life harder.
I have high hopes. Intel has a huge reputation and deeper pockets than AMD and Nvidia combined.
To think Xe consumer discrete graphics will perform on par with 5 year old cards is pessimistic at best.

They picked up some of the best designers.
Already have integrated graphics that perform well enough to game on low settings, sharing slower desktop memory.
Have outstanding video encoding with Quick Sync that already does better than Nvidia or AMD.

I will wait for reviews but I'll have money in hand to buy if it smears Nvidia.
I'm hoping Intel allows for a consumer-priced board that also allows for additional memory via an NVMe slot, like AMD did on its professional card.
This would allow them to sell more Optane and have huge memory upgrade paths.
 
While the goals are probably similar, I'm pretty sure PR is a different department entirely from where the FrgMstr works.

I doubt there'll be much enthusiast engagement to be had with Xe GPUs, anyway, but if someone was going to prove me wrong, it'd be kyle.

Enthusiast engagement is part of PR.

While the performance of Intel's initial parts is a big unknown, the hiring of industry tech talent like Raja Koduri and Jim Keller, along with a lot of GPU marketing people, and Kyle, signals one thing clearly:

Intel is serious about GPUs this time. This is not a half-assed effort like previous outings. It looks like Intel is going all in.

All I was originally indicating is that Kyle likely had no idea what kind of performance Intel would have before joining, in response to "Crap performance is not his department...".

But he could see that Intel was going all in, and being part of that probably has a lot of appeal.

I really can't wait to see what they come up with. If decent, a 3-way race in GPU cards should be great for consumers.

Also, there should be much bigger competition in integrated graphics as well, which will be great for low-end consumers.

Best industry shift in a long time.
 
Not believing what? Ray casting is a well known problem, and dedicated HW does it faster. It really isn't open for debate.

That the ray tracing cores on a niche, low volume and not particularly high margin product like a 2080ti are any different to the tensor cores on their compute cards. Titan-V running Ray tracing just fine speaks volumes to support that.
 
That the ray tracing cores on a niche, low volume and not particularly high margin product like a 2080ti are any different to the tensor cores on their compute cards. Titan-V running Ray tracing just fine speaks volumes to support that.

Ok. You can believe that, but you would be completely wrong.

Tensor cores are completely different from RT cores; each accelerates a completely different function, and they are in no way interchangeable.

RT cores accelerate intersection detection in path tracing.
Tensor cores accelerate denoising the image.

The Titan V does brute-force intersection detection on its massive count of shader cores, not Tensor cores.
It uses Tensor cores for the same purpose the 2080 Ti does: denoising the image.


The Titan V is the largest GPU chip ever made, IIRC. It has more shaders, more texture units, more ROPs, and more Tensor cores than the 2080 Ti, yet it underperforms the 2080 Ti significantly at ray tracing tasks because it lacks RT-specific HW.

With a truly monstrous chip like Titan-V you can brute force ray tracing at a somewhat acceptable level, but that doesn't make dedicated RT Hardware pointless. Dedicated HW is more efficient and faster.

Also bear in mind that most of what people were comparing is BFV, which was developed on Titan V, so it makes sense that it is rather optimized for Titan V, but Titan V was still ~30% slower.

Now on a generic DXR RT benchmark that has heavier Ray Tracing:
https://www.overclock3d.net/news/gpu_displays/titan_v_vs_3dmark_port_royal_-_rt_cores_matter/1
"overclocked Titan V offering similar levels of ray tracing performance to an RTX 2060"
 
I wonder what the usage scenario is for this capability in the datacenter.

Streamed rendered content?

id Software has already flat out said their upcoming game is going to look better streamed.

https://arstechnica.com/gaming/2019...-to-overcome-ids-stadia-streaming-skepticism/

"Land also teased that id was busy working on ways to differentiate the Stadia version of Doom Eternal in ways that aren't possible on other platforms. "That is all I'm allowed to say on the subject" for the time being, he added."

I have sort of thought that was the logical road for a while now... think of all the cool things talented game designers (not the licensed-engine-using masses), guys like Carmack, could do if you said: here, all the power of this data cluster with all these interesting ASICs and other cool things... what can you make them do?

For those that still don't understand how exciting streaming is... think of something like Ender's Game. The scene in that movie where the kid is playing a testing game that looks like a Pixar movie on a tablet, while Mr. Ford is creepily watching a copy of his stream. We are going to be exactly there by the end of the year. Yes, there are hurdles to jump with twitch-based games... but those are not the only games out there. Think of all the insane-looking puzzle-type games that are possible on a service like Stadia, with a Linux GPU cluster able to dedicate power equal to a couple of Radeon VIIs (or perhaps later Intel Xes) per thread thanks to proper load balancing of a few hundred server GPUs... all accessing load-balanced tensor hardware as well. Google doesn't have to worry about GPUs having tensor cores... they can simply install a few racks of their own tensor hardware and let the game developers access those APIs. Never mind that you can also have 100 instances all sharing the exact same GPU memory information... just imagine a game with 32 or 64GB of textures; it's very possible when 100 users can be accessing the same loaded textures. A lot of the work going into these systems right now is making stuff like that possible... not just busting lag.

It will all be seamless to a company like id... they build their game using as much ray tracing as they like... go INSANE. Because Google's gaming servers aren't going to have just Radeon MI60s (or perhaps Intel Xe down the road); they are also going to have racks of Google TPUs.
https://cloud.google.com/tpu/docs/tpus

Stream gaming perhaps won't be the obvious go to for FPS... but I imagine there are going to be some really interesting games aimed at streaming that are likely not going to be possible at all on local hardware.
 
Ok. You can believe that, but you would be completely wrong.

Tensor cores are completely different from RT cores; each accelerates a completely different function, and they are in no way interchangeable.

RT cores accelerate intersection detection in path tracing.
Tensor cores accelerate denoising the image.

The Titan V does brute-force intersection detection on its massive count of shader cores, not Tensor cores.
It uses Tensor cores for the same purpose the 2080 Ti does: denoising the image.


The Titan V is the largest GPU chip ever made, IIRC. It has more shaders, more texture units, more ROPs, and more Tensor cores than the 2080 Ti, yet it underperforms the 2080 Ti significantly at ray tracing tasks because it lacks RT-specific HW.

With a truly monstrous chip like Titan-V you can brute force ray tracing at a somewhat acceptable level, but that doesn't make dedicated RT Hardware pointless. Dedicated HW is more efficient and faster.

Also bear in mind that most of what people were comparing is BFV, which was developed on Titan V, so it makes sense that it is rather optimized for Titan V, but Titan V was still ~30% slower.

Now on a generic DXR RT benchmark that has heavier Ray Tracing:
https://www.overclock3d.net/news/gpu_displays/titan_v_vs_3dmark_port_royal_-_rt_cores_matter/1
"overclocked Titan V offering similar levels of ray tracing performance to an RTX 2060"

You have drunk the marketing Kool-Aid. There is no such thing as a "ray tracing core". When AMD and Intel have products with tensor hardware of their own to talk about, I'm sure the PR war will go full nuclear.

Nvidia's newest tensor engine allows for disparate floating point accuracy. It isn't just a trick of the math that their number of tensor cores and "RT cores" are linked. An "RT core" is simply the tensor unit being used to calculate rays at a different FP level than what they are using for denoise. They basically calculate rays using a couple of the clusters.... and denoise with a couple of the other clusters. I'm not saying it's not a smart and perhaps even very efficient way of performing RT... just that those same "RT cores" can be told to operate at a higher FP and denoise as well. It's all the same hardware.... all that is changing is how the software is talking to it. And yes, NV has improved their tensor cores generation to generation... being able to calculate more simple math at FP8 is a big deal... it's just not the specific purpose-built hardware their marketing dept would love you to believe.
 
You have drunk the marketing Kool-Aid. There is no such thing as a "ray tracing core". When AMD and Intel have products with tensor hardware of their own to talk about, I'm sure the PR war will go full nuclear.

Not marketing. The math functions are completely different.

You are off the deep end, claiming Nvidia is telling bald-faced lies about the HW and about how Volta (and Pascal) do ray tracing.

This is the kind of nonsense that leads people to disbelieve the obvious, true explanation and instead believe the earth is flat, contrary to all available evidence.
 
Not marketing. The math functions are completely different.

You are off the deep end, claiming Nvidia is telling bald-faced lies about the HW and about how Volta (and Pascal) do ray tracing.

This is the kind of nonsense that leads people to disbelieve the obvious, true explanation and instead believe the earth is flat, contrary to all available evidence.

The math functions are different, yes.... correct.

Turing chips and Volta chips both have tensor hardware. In Volta's case it was not designed for gamers, we know that. They never sold Volta gaming cards (I don't think the Titan really counts). Turing however did have a few major advancements in terms of tensor. The biggest achievement was being able to calculate at lower FP levels... AND allow the entire matrix to be clustered up so that software could calculate different bits of math at different FP levels at the same time, by segmenting the tensor clusters. Which again isn't a big deal, but if you use every "RT core" on your card you have nothing left to do denoise... their driver is splitting that work and reserving cores for both purposes, nothing wrong with that.

Look at a 2080 Ti... it has 544 Tensor cores. Nvidia claims 68 RT cores on that part... Or put another way, Nvidia has 544 tensor cores split into clusters of 8. If you are NV marketing, sure, you can call those clusters "RT cores" if you like. But they are still tensors. They are simply tensor clusters calculating math at FP8 >.<

2080ti 544 tensors / 8 = 68 "RT cores"
2080 368 tensors / 8 = 46 "RT cores"
2070 288 tensors / 8 = 36 "RT cores"
2060 240 tensors / 8 = 30 "RT cores"

I am NOT knocking Nvidia. They are first to market with a commercial, for-the-masses GPU + tensor on-a-package solution. That is huge. I'm not some AMD or Intel superfan throwing shade on Nvidia. I do believe Nvidia, like any other player in the market, knows that things are gearing up for RT at volume in streaming, and perhaps on a smaller scale in consoles. They have the hardware they built for non-gaming purposes.... and it lends itself to the math work at hand. No, they did not add FP8 to tensor for gamers... but yes, it IS usable by gamers. Volta made zero sense for gamers... and you can argue Nvidia didn't need anything faster than their 1080 cards anyway. Still, it's a bit early for tensor in games... but its day is coming, sure.

I'm sorry, but read Nvidia's own white papers on tensor... and ray tracing, and it's very clear how their hardware is working behind the driver. Tensor clusters are operating at FP8. Nothing wrong with that... it's just not purpose-built for gamers. That part is laughable. That doesn't make it useless... just silly marketing spin.
 
The math functions are different, yes.... correct.

Turing chips and Volta chips both have tensor hardware. In Volta's case it was not designed for gamers, we know that. They never sold Volta gaming cards (I don't think the Titan really counts). Turing however did have a few major advancements in terms of tensor. The biggest achievement was being able to calculate at lower FP levels... AND allow the entire matrix to be clustered up so that software could calculate different bits of math at different FP levels at the same time, by segmenting the tensor clusters. Which again isn't a big deal, but if you use every "RT core" on your card you have nothing left to do denoise... their driver is splitting that work and reserving cores for both purposes, nothing wrong with that.

Look at a 2080 Ti... it has 544 Tensor cores. Nvidia claims 68 RT cores on that part... Or put another way, Nvidia has 544 tensor cores split into clusters of 8. If you are NV marketing, sure, you can call those clusters "RT cores" if you like. But they are still tensors. They are simply tensor clusters calculating math at FP8 >.<

2080ti 544 tensors / 8 = 68 "RT cores"
2080 368 tensors / 8 = 46 "RT cores"
2070 288 tensors / 8 = 36 "RT cores"
2060 240 tensors / 8 = 30 "RT cores"

I am NOT knocking Nvidia. They are first to market with a commercial, for-the-masses GPU + tensor on-a-package solution. That is huge. I'm not some AMD or Intel superfan throwing shade on Nvidia. I do believe Nvidia, like any other player in the market, knows that things are gearing up for RT at volume in streaming, and perhaps on a smaller scale in consoles. They have the hardware they built for non-gaming purposes.... and it lends itself to the math work at hand. No, they did not add FP8 to tensor for gamers... but yes, it IS usable by gamers. Volta made zero sense for gamers... and you can argue Nvidia didn't need anything faster than their 1080 cards anyway. Still, it's a bit early for tensor in games... but its day is coming, sure.

I'm sorry, but read Nvidia's own white papers on tensor... and ray tracing, and it's very clear how their hardware is working behind the driver. Tensor clusters are operating at FP8. Nothing wrong with that... it's just not purpose-built for gamers. That part is laughable. That doesn't make it useless... just silly marketing spin.

There is no arguing with flat earthers. :rolleyes:

But for the reasonable people, there is a perfectly simple explanation for why RT and Tensor core counts have a constant relationship: GPUs are built out of blocks. Each block contains X shaders, Y Tensor cores, and Z RT cores, and higher-end cards use more blocks... Duh...
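
That constant relationship is easy to sanity-check against the Turing whitepaper linked earlier: each Turing SM carries 64 FP32 shaders, 8 Tensor cores and 1 RT core, so a card's RT core count is simply its SM count. A quick back-of-the-envelope check in C++ (the program is just illustrative arithmetic):

#include <cstdio>

// Per-SM Turing budget (per Nvidia's Turing whitepaper): 64 FP32 shaders,
// 8 Tensor cores and 1 RT core. Card totals are simply SM count x budget.
struct Card { const char* name; int sms; };

int main()
{
    const Card cards[] = {
        {"RTX 2080 Ti", 68},
        {"RTX 2080",    46},
        {"RTX 2070",    36},
        {"RTX 2060",    30},
    };
    for (const Card& c : cards) {
        std::printf("%-11s  %2d SMs -> %4d shaders, %3d Tensor cores, %2d RT cores\n",
                    c.name, c.sms, c.sms * 64, c.sms * 8, c.sms);
    }
    // Prints 4352/544/68, 2944/368/46, 2304/288/36 and 1920/240/30 --
    // matching the spec sheets, and explaining the fixed 8:1 Tensor:RT ratio
    // discussed above.
}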
 