GTX 480 SLI PCIe Bandwidth Perf. - x16/x16 vs. x8/x8 @ [H]

Any chance of a 4x/4x review for us poor folks that are still on 8x/8x PCI-e 1.x?

Brent is going to do this. Good idea. I think it is more academic than anything else for most of us, but it will be interesting to see.
 
Sorry if this has been addressed before, I didn't read all the posts.

You guys claim that your findings apply to both nvision and eyefinity, but I didn't see any AMD cards tested in crossfire.

I'm not sure AMD would perform worse in 8x 8x as it seems to me that xfire performance is not bandwidth limited.

Did I miss something?
 

Yes that is my educated guess, opinion, hypothesis, or claim however you want to read into it.

Please share your results when you have them. I am not going to spend the resources on it when I think the situation is very much spelled out.

Edit: But to address your concerns about my opinion on the subject, I have sent a mail to AMD asking if we are on track with our thoughts. I will let you know what AMD says.
 
I'll take your word on it, I just thought there was something missing.

To my knowledge AMD is not passing any more information across the PCIe bus than NVIDIA. In fact we had some discussions about this a year or so ago specifically when we were introduced to the GPU architecture. But as my edit notes above, I have asked AMD for its input on your behalf.
 
I'm kinda surprised by the results. I was expecting 8x/8x to be a lot worse!

Most probably were; that's marketing doing its job.

If I was typing this one up, I woulda selected "subtle emphasis" on this line:
Kyle Bennett said:
to create a x8/x8 PCIe 2.0 operating mode

Which means, if you're running some of the older LGA 775 or AM2 chipsets like 680i [hah, who am I kidding, all those boards died a long time ago], you might be in trouble, as, IIRC, dual x16 @ 1.0 gives the equivalent bandwidth of dual x8 @ 2.0.
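A quick back-of-the-envelope check of that equivalence, as a sketch in Python (assuming the usual 8b/10b encoding overhead, i.e. 250 MB/s per lane per direction for PCIe 1.x and 500 MB/s for 2.0):

```python
# Per-lane bandwidth in GB/s, one direction, after 8b/10b encoding overhead.
PER_LANE = {"1.x": 0.25, "2.0": 0.5}

def link_gb_s(gen, lanes):
    """One-direction bandwidth of a PCIe link."""
    return lanes * PER_LANE[gen]

print(link_gb_s("1.x", 16))  # 4.0 GB/s
print(link_gb_s("2.0", 8))   # 4.0 GB/s -- x16 gen 1 really does match x8 gen 2
```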

But, speaking shrewdly for a second here, this is absolutely the kind of thing that is going to pull more readers than a 50-page review of a board partner's variation on a well-tested SKU [I wanted to link to an X-bit review, but it seems they've stopped doing that kind of stuff; TweakTown is still testing the bejesus out of that type of thing]. Definitely well worth the read.

Thanks for confirming my suspicions!

Now if I could only get Paul Johnson to plug two GTX 480's into a 500W PSU...
 
On a side note, I don't recall having seen so many articles on same-architecture cards. I guess there's not much new stuff coming out soon.

It has nothing to do with what is or is not coming out; it has everything to do with legitimate questions that we have and that our readers have.
 
My point was that since xfire doesn't scale as well as SLI it wouldn't be bottlenecked by the lower pci-e bandwidth. Anyway as you clearly stated, maximum difference is less than 10% and it really doesn't affect gameplay.

Maybe CFX is already so driver limited that you would not see the 7% at 5760? Maybe. But I think the point is that it surely does not matter, except maybe to a benchmark monkey. See, less work that I have to finance, same results. All this shit does not happen for free.
 
Actually, this is pretty cool for one interesting reason.

Actually, your post is pretty darn cool, and it's somewhat refreshing to read comments by people who enjoy computing and have an open mind. Lots of times people are so full of the prejudices, half-truths, and even untruths they've read on this or that website or forum that if anything is said that runs counter to all the gobbledygook they've absorbed elsewhere, they won't listen no matter how nicely and courteously you couch your statements, and all they'll do is argue the issue into oblivion. It's obvious to me that you don't fit this category and therefore should consider yourself blessed...;) You'll probably have a long and happy life!

All of us PC enthusiasts spend a lot of time and money, and honestly, the last thing we want to hear is that we don't really need something, like the latest and greatest. For example, dual x16 PCIe slots: it was just proven we don't need them.

That's exactly right. This is something I thought that pretty much everybody knew and had known ever since the days of 3dfx, when the arguments of the day revolved around the AGP bus and a capability known as "AGP texturing." In fact, just as with today's PCIe bus, it turns out that what provides a discrete 3d card today with most of its measurable performance capabilities is how much ram is on the card and how fast the local ram on the 3d card is. Just as was true in the days of the AGP bus, texturing and doing other things out of system ram, whether we are talking about accessing system ram across the AGP bus or across today's PCIex16 bus, is far, far and away a slower process than when today's discrete 3d card accesses its textures and other data directly from the card's own local-bus pool of ram.

So that's why discrete 3d cards carry anywhere from a low of 512MB of ram onboard to a high of 2GB of local ram onboard. The more often a discrete 3d card's gpu can run wholly out of its local pool of ram, without having to interrupt the flow and fetch the textures or other data it needs from standard system ram across the much, much slower PCIex16 bus, the faster the 3d card's frame rate will be, and the more consistently fast it will be.

Q:How much faster?
A: A whole lot faster...! In fact, there's no real comparison.

I pointed this out in the last thread associated with the x16/x16 vs. x16/x8 [H] article the other day, but I'll do it briefly once again:

The memory bandwidth of a GTX480, that is, the speed at which the GTX 480 gpu can read and retrieve data from its pool of local ram on the GTX 480 card, is ~177.3 GBs/second. Man, that's blazing fast, isn't it?

OTOH, if there's a texture or textures, or other information that the game needs to run, that is *not* already loaded into the GTX 480's onboard pool of local ram, then essentially the whole game comes to a screeching halt--not literally mind you, but figuratively--and the gpu is left spinning its tires in place while it hunts and pecks in the much, much slower system ram of the computer for what it needs, and when it finds it, it must transfer it across the PCIex16 bus to the slot where the GTX 480 sits so that the GTX 480 can load that information into its much faster onboard local ram, and from there the textures and other info are executed and the game continues to flow. Here's the $64M question: How much faster can the local gpu ram operate than the local system ram across the PCIex16 bus? Short answer:

The memory bandwidth of the GTX 480's internal pool of local ram is...177.3GBs/second. The memory bandwidth of the main computer's system ram, which must be accessed through the PCIex16 bus, since there is no other route available to access it, is limited to the top speed of the PCIex16 bus. And that top speed is...drum roll...8GBs/second.

So, in the case of accessing the GTX 480's onboard local ram versus the GTX 480 having to access the main computer's system ram via the PCIex16 bus, the GTX 480's local bus is ~22.2x faster than the top speed of PCIex16. The bottleneck in this case is unquestionably the PCIex16 bus, because it offers less than 1/22 of the bandwidth locally available on the GTX 480 card itself.
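For the arithmetic-minded, that ratio in three lines of Python (a sketch using only the figures quoted above):

```python
local_vram = 177.3  # GTX 480 onboard memory bandwidth, GB/s (figure from above)
pcie_x16 = 8.0      # PCIe x16 top speed used above, GB/s

print(local_vram / pcie_x16)  # ~22.16 -- the card's local ram bus is ~22x faster
```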

(These principles apply equally to ATi's current discrete 3d-card line up, with there being only a very slight difference between the GTX480's onboard bandwidth and the HD 5870's. Both cards run rings around the top speed of the PCIex16 bus.)

When we get those seemingly inexplicably bi-i-i-i-g drops in frame rate when playing a game, drops that appear and disappear very fast--but stay around just long enough to get noticed, it's because the gpu is having to slow down and use the PCIex16 bus to access an element the game software requires, because that element has not been loaded into the gpu's local memory, but resides in the main memory of the computer--which again, is accessible *only* via the PCIex16 bus.

Six Foot Duo, you seem to have all of this under control just fine--I am using your post to hopefully communicate these very simple realities to a couple of folks in this thread who seem a tad bit confused by [H]'s results, because they were operating under some false assumptions about what the PCIex16 bus is and its performance role in today's 3d games, in terms of how an entire computer system works together to allow us to play our games. Thanks for bearing with me...;)

Certainly, no one out there is talking. It's honestly a question we bury and ignore. We lie to ourselves all the time. I know for a fact that I've been splitting hairs for years. Cutting edge costs a TON of money.

I've been building computers since 88, 89 and I am finally coming full circle.

Good point! I've been "rolling my own" systems since the 1980's, too, and I still do it--I just don't buy the most expensive and the most hyped components anymore, because I've learned better over the years...;) It really is true that the great majority of people running a 3d game would never know the difference between 150 fps and 125 fps--in a double-blind test--a test run without 3d benchmark frame rate counters. Games play just as great at 125fps as 150fps.

I just sold a top of the line 920 @ 4ghz, 6gig ram, 6 TB, Crossfire 5870's, Dual SSD's in raid, etc etc.

And guess what, my new MSI NF980-G65 AMD SLI board, my AMD 1055t running at 4.1Ghz, 4gig dual channel memory, my SSD 120gig HD, my Asus 460's in SLI are just as fast as what I had with a hell of a lot less money spent.

Not surprising at all! Good deal, and delighted it's working out for you! I also like AMD just fine, and reading years of other site's stilted benchmarks that try to convince me otherwise has never made a difference.

A lot of the time, the real-world performance difference people actually get between platforms and components just doesn't justify the cost, or isn't needed at all.

yep--absolutely!

I would love Kyle and company to flip the hardware-enthusiast sites on their ear and start debunking a lot of the so-called performance gains/specs we have been seeing for years.

Honestly, don't stop at 16x / 8x real world performance evaluations.

Kyle's really been doing that for a long time...;) Ever since he started doing 3d cards--and I recall HardOCP when the only thing he did was motherboards, because at the time that's just about all that computers were--[H]'s thrust has been to review 3d cards and other gaming-related products so that the reader gets as close to a real user experience as it is possible for Kyle and company to provide. I think that by and large [H] has done a good job of that over the years. [H] isn't perfect, but who is? I know I'm not...;) (Well, er, I'm only sort of not perfect, really. Just don't tell anybody!)

Anyway, moving on...most of the time, as I've said in both these threads, for some users the problem lies in the misconceptions they develop about how hardware works. For instance, people hear so much marketing goulash about PCIex16, and how much faster it is than PCIex8, etc., that they draw the completely erroneous conclusion that PCIex16 is going to make their expensive 3d card run faster than it would run if plugged into a PCIex8 slot. It's just not true, because that expensive 3d card is already so much faster in terms of local bus throughput than either PCIex16 or PCIex8 that the difference in bandwidth between PCIex16 and PCIex8 is too small to have any impact at all on game play--provided the game is running wholly out of local gpu ram on the 3d card, as ideally it should.

I wish they still made 256MB 3d cards these days, or rather 256MB PCIex16 cards, because it would make the following experiment much easier and more obvious in result. But as far as I know, 512MB is the minimum amount of ram you can buy on a PCIex16 3d card today. So find one of those somewhere, plug it in, and raise the resolution of your display to 2048x1536 and pour on the eye candy and attempt to play the game as you normally do with your 1-2GB local ram PCIex16 3d card...

What's going to happen quickly is that you will see very dramatic slowdowns at various places in the game, slowdowns that keep occurring more and more often as you play. In fact, these slowdowns will be so dramatic when they occur that you may well think your game is unplayable--and at these frame rates it *is* unplayable. And by "slowdown" I really mean slowdown! I'm talking about slowdowns that push the system into single-digit frame rates, so slow that what we get is not a fluid game, but more like a slide show that is really quite unplayable.

What's happening is that when the local 512MB on the 3d card fills up, the system is forced to fall back onto the PCIex16 bus so that main system ram can be accessed to find the elements the game needs to proceed. This brings to life just what a bottleneck PCIex16 is compared to a game running out of a 3d card's local onboard ram--it's that single-digit frame-rate performance that PCIex16 delivers! That's literally what you could expect if we somehow were forced to run all of our 3d games out of main system memory using the PCIex16 bus to access our system ram! In other words, compared to discrete 3d cards with their own large pools of very fast local ram, running out of a PCIex16 system bus is dog slow. Again, this is *why* modern 3d cards have their own dedicated pools of high-speed local ram. If not for that discrete 3d-card design there'd be no 3d games industry today as we know it.
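To see why high resolution plus eye candy fills a 512MB card so quickly, here is a very rough sketch of just the render-target footprint at 2048x1536 (all the constants below are illustrative assumptions, not measured numbers; real drivers pad, compress, and allocate differently):

```python
def render_target_mb(w, h, msaa=4, buffers=3, bytes_px=4):
    """Very rough floor for framebuffer memory: MSAA color + depth/stencil
    samples, plus resolved swap-chain buffers. Ignores padding/compression."""
    color = w * h * bytes_px * msaa       # multisampled color buffer
    depth = w * h * 4 * msaa              # multisampled depth/stencil
    resolve = w * h * bytes_px * buffers  # resolved front/back buffers
    return (color + depth + resolve) / 2**20

print(render_target_mb(2048, 1536))  # ~130 MB gone before a single texture loads
```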

PCIex16 and PCIex8 bandwidth really matters only for integrated and on-die graphics, which have no large local pool of ram; PCIex16 at 8GBs/second is a big improvement over AGP for integrated gpus. But as the sole memory path supporting a discrete 3d card, the low bandwidth of PCIex16 wouldn't be worth using at all.

Thanks for your indulgence...;)
 
Would running the highest FSAA and AF cause any saturation on the PCIe bandwidth or is that reflected on the card itself and doesn't bog down the PCIe bandwidth? For instance if you're running 2560x1600 with the highest settings, would 16x-16x / 16x-8x or 8x-8x show any severe fluctuations?


No.

I'm wondering where people come up with ideas like "the saturation of the PCIex16 bus"...? When you are running discrete 3d cards with lots of fast onboard ram, you don't need to worry at all about the state of the PCIex16/x8 bus, because your game is not running out of the PCIe bus; it is running out of your 3d card, which includes its own local gpu and its own local onboard ram, which is far, far faster than the maximum theoretical throughput of PCIex16--let alone PCIex8! If, compared to your 3d card's onboard local ram throughput, PCIex16 is dog slow, then PCIex8 is dog, dog slow...;) We don't *want* to *ever* run our games directly off the PCIe bus--we want to run them out of our discrete 3d cards with their powerful gpus and their exceedingly fast(er) ram buses. Doing so is the key to smooth and fast frame rates.

OK, FSAA and AF are features that run internally on the discrete 3d card's local bus. FSAA and AF have literally *nothing to do* with the PCIe bus--again, these features run out of the discrete 3d card's internal and local ram bus, as does the game itself, ideally.

Again, the only time you see a "severe fluctuation" in frame rates when playing a game is when, for some reason, the gpu must reach outside of its discrete local bus to find an element that is in system ram but is not in the gpu's dedicated ram pool. The PCIe bus serves only as a very low-speed (compared to the gpu's local ram bus) conduit to move textures and other information from the computer's comparatively slow system ram into the 3d gpu's far faster pool of local ram. That's how much slower PCIe is than a gpu's local ram bus--a discrete 3d card's local ram is wa-a-a-a-ay faster than PCIe.
 
I was asking about the impact of CPU speed on the PCIe bus for a x8/x8 configuration. Since the PCIe bus is the slowest link in the chain, GPUs are designed to access the bus as little as possible, right? But with a CPU-bound game there is a greater chance that the bus could be saturated in the x8/x8 configuration, since more data must be passed.

The PCIe bus always runs at the same speed, regardless of the software you are running or the cpu you have installed. At x16, the PCIe 2.0 bus runs bi-directionally at 8 GBs/s in and 8 GBs/s out, simultaneously, for an aggregate of 16 GBs/s. At x8, that is halved, so we have a bi-directional 4 GBs/s in and 4 GBs/s out, simultaneously, for an aggregate of 8 GBs/s. However the PCIe bus is set up, it will always run at x16 or x8, unconditionally and quite predictably.
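Put as a tiny calculation (a sketch, assuming PCIe 2.0's 500 MB/s per lane, per direction):

```python
PCIE2_LANE_GB_S = 0.5  # PCIe 2.0: 500 MB/s per lane, each direction

def pcie2_bw(lanes):
    each_way = lanes * PCIE2_LANE_GB_S
    return each_way, each_way * 2  # (per direction, aggregate both ways)

print(pcie2_bw(16))  # (8.0, 16.0) GB/s
print(pcie2_bw(8))   # (4.0, 8.0) GB/s -- exactly half, always
```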

There are basically two kinds of data that traverse the system when running a 3d game. The first type of data is instructional data, such as that which travels to the cpu from the gpu and vice-versa, and this type of data is very, very small in terms of its size-- kbs, usually. In terms of bus contention, this instructional data is too small to ever stress the PCIe bus, regardless of its frequency of transmission.

The second type of data is textures and other graphical elements. This is the data that takes up a lot of room and could, if it were allowed to, stress the PCIe bus to some degree. This is the data that mandates discrete 3d cards having upwards of 2 GBs per card of local ram. Ideally, the PCIe bus never becomes the bottleneck, because applications, games, and gpu drivers are written to anticipate, as closely as possible, which textures will be needed in the gpu's local ram pool and when. So while the discrete 3d gpu is rendering the game with textures pulled from its local ram, the game software is busy loading upcoming textures from system ram across the PCIe bus, even as used textures are discarded from the gpu's local ram pool. This should always provide for smooth game play with little if anything in the way of PCIe bus contention. But...sometimes a texture is missed, the game pauses momentarily, and things are not as seamless as they should be because of the slowdown incurred by having to wait on the PCIe bus.
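That anticipate-and-prefetch idea can be sketched in a few lines of Python (every name here is made up for illustration--real engines and drivers do this with DMA and far more sophistication):

```python
import threading, queue, time

# Illustrative sketch only -- not a real engine or driver API.
pcie_queue = queue.Queue()            # textures "in flight" across the PCIe bus
resident = {"level_geometry", "hud"}  # textures already in the gpu's local ram

def streamer():
    # Background loader: pulls upcoming textures into local vram ahead of use.
    while True:
        tex = pcie_queue.get()
        time.sleep(0.05)              # stand-in for the (comparatively slow) PCIe copy
        resident.add(tex)

threading.Thread(target=streamer, daemon=True).start()

def render_frame(needed):
    for tex in needed:
        while tex not in resident:    # a miss: the frame stalls -- the hitch you feel
            time.sleep(0.001)
    # ...draw using textures now guaranteed to be in local vram...

pcie_queue.put("cave_textures")          # the game anticipates the next area early
render_frame({"level_geometry", "hud"})  # smooth: everything already resident
render_frame({"cave_textures"})          # smooth only if the prefetch beat us here
```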
 
Given the results, does anyone think driver updates may address this? I mean, considering the findings, I wonder if there is anything that can be done via drivers to utilize the extra bandwidth of the x16 slot. You'd think more bandwidth would make it faster, but in this case I guess not, for now. I mean, every time NVIDIA makes driver updates they claim gains in different games, for example...Crysis 10% faster with update xxx.xx.
 
I saw a similar comparison on another hardware site a year ago, with different high end cards, with same result. That is when I bought my less expensive 8x/8x setup. Still good to see results holding even with gtx 480s.
 
Typo fixed. Kyle

There goes the motherboard makers' marketing about who has the best lane configuration with USB/SATA 3. Glad that P55's lack of lanes isn't terribly bad for gaming. As always, more isn't necessarily better. I patiently await the arrival of the x4/x4 article.
 
Thanks for the information. I wasn't sure if stressing the cards "features" would have anything to do with the PCIe bus... Now I know. I feel smarter now! lol ... Thanks for the 411 though. It's appreciated.
 
Not trying to be an ass, but what is the purpose of Crossfire while running 1920x1080? It seems like overkill.

120fps...

More than worth it if you want to hit 120fps on a 120hz screen for that silky smooth CRT like gameplay.

I try to do 120fps on Eyefinity and it's [H] with a single 5970.

See what I did there! :p
 
I've always wondered how much information is moved in and out of a graphics card once gameplay starts and the 3D rendering begins, when we should have all textures loaded into the graphics card's RAM. If it has to continuously swap textures in and out during rendering, surely that will cause performance to drop drastically and stutter, a situation we would want to avoid.

Can't say I'm really surprised by the result, but it's good to see the guys at [H] confirming it with a solid test.
 
Since Hard is so proactively "reader responsive", and in conjunction with a lot of what we're seeing from the test articles lately, I'd propose this:

How about a comprehensive "real world" benchmark test between two systems:

1. Old-generation quad core (e.g. Q6600) on a "pro" LGA 775 motherboard, with DDR2 ram, default settings.

2. Speed-Equivalent NEW quad core processor on new "pro" motherboard with DDR3 ram, default settings.

All else equal "best of" (GTX480 SLI video, fast hard drives, same amount of quality RAM, OS)

The reason I am asking is that I don't think we'll see much of a difference, based upon the in-depth expository articles that HardOCP has been submitting lately. Other than video cards, I don't see much of a reason to upgrade anytime soon, and even then, it would be better to wait for ATI's 6XXX series to be released. (Worst case, they'll bring NVIDIA's prices down a bit and/or be a cost-effective competitor.)

If there's not much difference at all (which is what I would suspect), then this would point to the bottleneck of the storage system. At that point, it would be interesting to see what SATA 6 and SSD would bring to the table. As well, we could be looking at "the way it's written" when it comes to the OS, the application, and/or drivers.

Whaddya all think? Have we hit a software and storage wall, more than hardware? Kyle and Brent seem to be on to something big here.
 
...isn't what you ask quite like this?

Hey, thanks for the link, but it really doesn't cover what I'd like to see as much as I'd like. What I'm looking for (and I am sure many others are as well) is whether the motherboard/ram/cpu platform makes much of a difference when the hard drive subsystem is the main bottleneck these days, combined with graphics card combos that take full advantage of what the motherboards have to offer.

As for SSD drives, we're seeing a jump in performance compared to the mechanical spinning disks, but even SSD drives could be hampered with the SATA interface. I would believe that a new parallel interface would help them, more than doubling the SATA standard does. SATA was good for mechanical drives because they're inherently slow, so you don't need a blazing fast interface for them. (RAID 5, for example, shows that SATA's got headroom for more speed...to a point, where the law of diminishing returns shows that after about 5-6 drives, you've pretty much hit the wall. I've done this myself to be sure.)

It seems to me that, much like the onboard RAM, SSD drives could shine with a better, faster parallel interface...but we're stuck with the standard until SSD drives become more affordable, bigger, and eventually put an end to the mechanical drives.

As far as graphic cards go, well, let's just say that these last few Hard articles have been eye-opening. The law of diminishing returns definitely applies to the lanes, as well as card performance. (Basically you see greater jumps in performance from the lowest end card to the next step up, and then relatively smaller gains as you move up the ladder.)

Thusly, it seems to make sense that IF the cards aren't taking more advantage of these extra lanes, and IF graphic and central processors are capable of handling more data than the graphics cards are getting, THEN something's amiss.

32-bit processing was the first time that hardware capabilities outpaced software's evolution (by a far distance), but software caught up fairly quickly. (I used to support 8 and 16 bit mainframes a long time ago, and writing apps that could "swamp" them was easy.) 64-bit seems to have increased the gap considerably now, and moreso when you consider that 32-bit apps are usable on 64-bit platforms, and that they're still practical for "backwards" compatibility on 32-bit platforms.

When 32-bit "hit", the only real obvious bottleneck became the storage subsystem. And hard drive technology hasn't really changed much, when you consider the physical aspects.

So...I'm wondering...are our CPUs and GPUs simply becoming "starved" for data, and, hence, we're not seeing jumps in performance we'd expect with new MB/CPU/GPU/RAM tech?


I guess one experiment would be to RAID a bunch of SSDs on SATA 6.0 (as compared to the equivalent number of mechanical drives), and run benchmarks. I'll be looking for something like that too, but that would be the next step beyond the "equivalent platform" test, in order to make sure the data was both accurate AND valid.
 
http://www.nextlevelhardware.com/storage/battleship/

Check this out. Old article, but relevant to this thread and the testing review. Especially notice 1) the Crysis benchmarks, and 2) what the article says about even the great separate RAID controller "dead-ending".

As-is, the performance we can get out of the SSD definitely appears to be interface-related, and absolutely affects "real world" play, even though it blisters hard drive performance.

(Of course, some benchmarks, like photoshop, didn't see a lot of gain, because they're so math-intensive, and that would be expected, so a photoshop-to-crysis comparison isn't an apples-to-apples. I wouldn't expect to see that RAID would help much in a case like that anyway. Fortunately, we don't see much of that from Hard.)
 

Wow. Thanks for posting that. What an awesome article.
 
So...KYLE and BRENT!!! With all those toys you have laying around, and considering you're looking to show how we can get the best bang for our buck, how about a comprehensive article showing that, to get the best performance, we should plow our money into a separate SATA/RAID controller and multiple SSD drives, rather than spending it on other hardware that won't give us the performance increases we'd otherwise hope for--and go GREEN doing it?

Pretty please?
 
Just a quick question. With a lot of boards only doing x16/x8 in SLI or CFX when another PCIe card (sound card, PhysX card, etc.) is installed, could we see a comparison at that setting? Or was this done, and the results across the board were exactly like x16/x16?
 
Frankly, after the last few performance articles regarding graphics comparisons, lanes, Xfire vs SLI, etc, combined with what I've dug up recently, all I am going to upgrade is my storage subsystem for now. THANK YOU, Kyle and Brent, for using lab-grade experimentology in a real-world-usage setting.

I mean, how expensive can it be to RAID 5 smaller-size SSDs with a separate controller, and move my old drives to a media server, compared to what I would spend on a new mobo, ram, and video card(s), only to NOT realize a significant gain from the latter?

Hope a lot of you looking to upgrade for performance have been following these wonderful articles and threads. Maybe we don't have a comprehensive article tying it all together in a nice, neat little package with a ribbon around it, but if you dig through all the relevant stuff here at [H]ard, and combine it with other information elsewhere...it becomes obvious...the bottleneck is still in the drives/controller.

Of course, mobo companies want to sell me new mobos, and the same goes with CPUs, GPUs, and RAM, but, in the end, where's my money best spent (given the system I have - LGA 775, quad core, 4GB RAM, and 9800 GT SLI) in order to get the best total performance boost, and one that will carry me through the next major system upgrade when the prices drop?

800MB/s (or better) sustained reads/writes is nothing to sneeze at! I wonder what SATA 6.0 can do? ;)
 
It's not the connection that makes the data rate fast. HDDs top out at 120-150MB/s, and that's sustained only (you'd need maybe 5-6 HDDs in RAID 0 or 5 to get 800MB/s, which is about the max an Intel chipset can do, as are most PCIe hardware RAID cards priced at a sane level, £200-ish). On random access, SSDs win outright: one SSD on its own is far better for any gaming system than a RAID 0 setup, unless you really need high sustained data rates and lots of space.

SATA 3Gb/s or 6Gb/s, there would be no difference in speed, as it's a controller limit, not a cable speed limit (most tests show SATA 6Gb/s to be slower when used with a SATA 6Gb/s HDD).
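The diminishing-returns point is easy to see with a toy calculation (the drive and controller numbers below are ballpark assumptions, not measurements):

```python
def array_read_mb_s(drives, per_drive=140, controller_cap=800):
    """Ideal RAID-0/5 sequential read: drives scale until the controller caps out."""
    return min(drives * per_drive, controller_cap)

for n in range(1, 9):
    print(n, array_read_mb_s(n), "MB/s")  # scaling flattens at ~6 drives: capped at 800
```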
 
That's exactly right. This is something I thought that pretty much everybody knew and had known ever since the days of 3dfx, when the arguments of the day revolved around the AGP bus and a capability known as "AGP texturing." In fact, just as with today's PCIe bus, it turns out that what provides a discrete 3d card today with most of its measurable performance capabilities is how much ram is on the card and how fast the local ram on the 3d card is. Just as was true in the days of the AGP bus, texturing and doing other things out of system ram, whether we are talking about accessing system ram across the AGP bus or across today's PCIex16 bus, is far, far and away a slower process than when today's discrete 3d card accesses its textures and other data directly from the card's own local-bus pool of ram.

...

The memory bandwidth of a GTX480, that is, the speed at which the GTX 480 gpu can read and retrieve data from its pool of local ram on the GTX 480 card, is ~177.3 GBs/second. Man, that's blazing fast, isn't it?

...

When we get those seemingly inexplicably bi-i-i-i-g drops in frame rate when playing a game, drops that appear and disappear very fast--but stay around just long enough to get noticed, it's because the gpu is having to slow down and use the PCIex16 bus to access an element the game software requires, because that element has not been loaded into the gpu's local memory, but resides in the main memory of the computer--which again, is accessible *only* via the PCIex16 bus.

...

Good point! I've been "rolling my own" systems since the 1980's, too, and I still do it--I just don't buy the most expensive and the most hyped components anymore, because I've learned better over the years...;) It really is true that the great majority of people running a 3d game would never know the difference between 150 fps and 125 fps--in a double-blind test--a test run without 3d benchmark frame rate counters. Games play just as great at 125fps as 150fps.

...

Yeah, I mean, the memory hierarchy is very simple: as you move away from the chip, the speed drops and the capacity increases. The GPU has its own on-die registers (literally half a dozen clock cycles from computation), it's got its own memory (as you stated, on the order of 100 GB/s on the enthusiast stuff at very low latency), it's got system ram (on the order of 10 GB/s at much higher latency), and then lastly it's got the hard drive (on the order of 10MB/s (= 0.01 GB/s) at 10 ms (= 10,000,000 ns) latency).

So, following those tiers, certainly the slowest thing is going to be your disk, and with some games you can't afford to move everything into main memory. So, and this depends entirely on the game, a big slowdown (into single-digit fps) could be caused by disk access (the texture isn't in the GPU's onboard memory or system ram, so it must be in swap space on disk), it could be caused by a sudden huge shader-intensive load, etc. But again, speaking from a GPGPU perspective, the PCI-E bus is stupidly slow.

In terms of the specifics, if you want to know the exact specs on the cards (as determined through some interpolation by the guys running the site), GPUreview.com is a great tool. The formula for memory bandwidth on a card is (mem bus width in bits) * (effective clock speed in MHz) / (8 bits per byte) / 1024 ("mega"s in a "giga"), or in the case of the 5870, ~150GB/s.
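That formula, typed out (the bus widths and effective memory clocks for the two cards mentioned are their published specs; note you land on the decimal ~177 GB/s figure quoted earlier if you divide by 1000 instead of 1024):

```python
def mem_bw(bus_bits, eff_clock_mhz, divisor=1024):
    # (bus width in bits) * (effective clock in MHz) / 8 / divisor, as given above
    return bus_bits * eff_clock_mhz / 8 / divisor

print(mem_bw(256, 4800))        # HD 5870 (256-bit, 4800 MHz effective): ~150
print(mem_bw(384, 3696, 1000))  # GTX 480 (384-bit, 3696 MHz effective): ~177 GB/s
```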

The truth is, annoyingly, for anyone without the source code, it's almost impossible to find out why your frame rate is dropping, and the only solution is to just brute-force your way through with faster buses, more compute power, and more cache.

And you would indeed be very hard pressed to find a viewer who could see the difference between 125FPS and 150FPS; there is no discernible difference, simply because the monitor is going to show you 60 frames/sec regardless (the panel only updates 60 times a second--its so-called "refresh rate").

If there's not much difference at all (which is what I would suspect), then this would point to the bottleneck of the storage system. At that point, it would be interesting to see what SATA 6 and SSD would bring to the table. As well, we could be looking at "the way it's written" when it comes to the OS, the application, and/or drivers.

As for SSD drives, we're seeing a jump in performance compared to the mechanical spinning disks, but even SSD drives could be hampered with the SATA interface. I would believe that a new parallel interface would help them, more than doubling the SATA standard does. SATA was good for mechanical drives because they're inherently slow, so you don't need a blazing fast interface for them. (RAID 5, for example, shows that SATA's got headroom for more speed...to a point, where the law of diminishing returns shows that after about 5-6 drives, you've pretty much hit the wall. I've done this myself to be sure.)

It seems to me that, much like the onboard RAM, SSD drives could shine with a better, faster parallel interface...but we're stuck with the standard until SSD drives become more affordable, bigger, and eventually put an end to the mechanical drives.

"the way its written" is exactly right. Much of how the OS scheduler...

Crash course in OS design: the OS runs at the same level as other applications, and it has a process that schedules which processes get to spend time being executed by the CPU. At the end of a process's allocated time, all of its data (including future instructions) is moved into memory, and the CPU is given a new process to begin munching on. After a number of processes have completed, the OS scheduling process is again put on the CPU, and it again tries to figure out who should be given some time to execute. The "set priority" drop-down you get from right-clicking a process in Task Manager directly affects this scheduling algorithm. One major factor the scheduler uses to determine when a process will be allocated time on the CPU is when it last asked for something from disk, because if that piece of data from disk is crucial to the process's continuation (meaning the process cannot do anything else without it), then there's no point in giving that process time on the CPU until the disk has moved that data into memory for the process to inspect.

...determines process priority depends on disk latency. Completely aside from the bandwidth a disk can provide, the latency a disk operates at is absolutely fundamental to modern OS design. It might even be the case that the CPU will schedule the system idle process for longer than it should--the disk will have successfully completed the read, but the CPU is still sitting on the idle process--but I don't actually know how that's done, so I won't state it as fact.
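A two-thread toy demonstrates the payoff (a sketch only; threads stand in for processes, and time.sleep stands in for a blocking disk read):

```python
import threading, time

def io_bound():
    # Simulates waiting on a disk read: the thread releases the CPU while blocked.
    time.sleep(0.5)  # stand-in for disk latency
    print("disk read complete")

def cpu_bound():
    # Meanwhile the scheduler hands the CPU to runnable work.
    total = sum(i * i for i in range(10**6))
    print("compute done:", total)

start = time.time()
t1 = threading.Thread(target=io_bound)
t2 = threading.Thread(target=cpu_bound)
t1.start(); t2.start(); t1.join(); t2.join()
# Wall time is ~0.5 s, not 0.5 s plus the compute time: the CPU was not left
# idle while the "disk" request was outstanding.
print(f"elapsed: {time.time() - start:.2f}s")
```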

But yeah, SSDs are a paradigm shift in computing because they are orders of magnitude faster than disks in terms of latency.

My main rig, as per my sig, is two G.Skill Falcon SSDs running in RAID-0. Having been forced to use my laptop more than I'd like because of my 8800GT dying, I've reacquainted myself with old Windows-on-spinning-media. I hate it. For me the switch to RAID-0 Falcons is a lot like the switch to HDTV: it's nice, not overtly but subtly, and it's very difficult to go back.
 
One thing that annoys me about some people is the way they cry about CPU bottlenecks when a slower CPU is used in SLI/XFire. This debunks that. Also, people say XFire/SLI is overkill at 1920x1080; it's not if you want high FPS or you are looking to future-proof for 5-6 years. I know I am going to do SLI or XFire in my next build with a monitor that does 1920x1080. I am having a hard time finding a monitor that does a higher resolution @ 24" (I live in Ireland, so don't post Newegg links--Newegg doesn't deliver to Ireland), and I don't have the space for more than one monitor on my desk.
 
Say, on the MSI GD80, which only does x8/x8, would overclocking your PCIe bus from 100 to, say, 115 or 125 help close this gap? Or would it cause other problems with the system?

Reason I am asking is that I am running an i7 860 at 4GHz with a BCLK of 200 and my ram at 1600 on the GD80, and I just got two GTX 480's to push three 23in LG LED LCDs. So I was hoping that pushing my PCIe clock higher would keep me from buying a new board.

Any thoughts? I've heard some stories of people getting HD corruption and graphics distortion, even cards dying. But I feel like if the card is rated for x16, then an OCed x8 port couldn't hurt, right?
 
I don't know if this has been addressed, but even though 8x doesn't really bottleneck today's cards, I've personally noticed that it introduces more microstutter with GTX 470 SLI and 5850 CFX.
 
Hi Kyle, first I'd like to say, as usual, great freakin' review! Love your video reviews too! Anyhow, I kinda figured it didn't make any real difference. I have two XFX 5850 Black Editions I upgraded to from my two HD 3870s last month, on my ole rig: MSI P45 Platinum, Intel QX9770, 8GB Viper ram. As you know, this mobo drops to x8 speed when crossfired. I have a 30-inch Samsung LED and mostly play BFBC2 at 1900x1200, and even against my friend's X48 mobo with two true x16 slots and a very similar hardware setup, we compared and saw no differences either. :p
 