Benchmarking the Benchmarks @ [H]

As flawed as real world gaming might be from a scientific approach...

I have a problem with the people who keep calling real-world testing unscientific, usually in contrast to the claim that canned benchmarking is more scientific because it's easily repeatable by anyone.

Repeatability alone does not make a scientific testing method.

The fact is the data you're testing is subject to change. Games are not some constant of nature that will act the same way no matter what's happening.

Every time a company tunes its hardware or drivers to perform better in a certain benchmark, it's laughable that anyone tries to draw conclusions from that contaminated testing. Is it repeatable? Certainly. You're repeating a flawed test that gives no significant data.

This article on [H] has shown how flawed that data can be. These canned benchmark numbers are meaningless to a real user. Sure there are some people who like big numbers, but for those of us looking for the best gaming hardware these numbers are completely useless.

Realistically, canned benchmarks are just AMD's/Nvidia's way of getting tech sites to tell the public whatever they want them to hear. Nobody believes performance numbers directly from the company. Why do you so blindly believe it when a tech site runs the same rigged test?

Is [H]'s method perfect? Of course not, it's an impossible task. People aren't machines after all.

But when it comes down to a decision, who do you trust? People who just do whatever test company X gives them, or someone who's played the game? I'll take [H]'s game play experience over a blind test any day.

Canned benching being scientific is BS. No 'scientist' would put any value in a blind test with tainted test data.
 
[H]ardOCP stepped up when again queried about its position and results, and used data and statements from the biggest hardware site in the United States for comparison. No attacks were made. I have the utmost respect for Anand Lal Shimpi and what he has built over the years, but I do not agree with some of their review techniques. Their editor also came out publicly and made some very plain statements that we do not in any way agree with.

I am not sure when it became a hate crime to have a disagreement in philosophy and method. AT is number 1 in the industry in North America and the obvious choice for comparison, and that is fully valid in my mind. I have known Anand for a very long time, and I can assure you that when he read that, he thought something along the lines of "Well, that is what Derek said in a public forum, he will have to deal with it." Anand knows that Derek's statement strikes directly at the heart of what we do here.

Some of you people make this all way too personal though. We have different outlooks and philosophies and we are contrasting and discussing them. That is hardly an attack, and those of you that are characterizing it as such are the ones that seem to need "drama."

I appreciate everyone that has come here to discuss this intelligently. Some of the others of you that have turned to questioning my personal integrity and motives show that you are not capable of expressing an argument that is actually on topic. And once again, those of you making those accusations have no proof whatsoever of those statements. You are laughable.

As I noted in the article, if, after you FULLY read it (and it is obvious that many people still have not, because they do not understand our method), you are still not on board with our methods of video card evaluation, so be it. Then HardOCP is not the website for you, and I would not suggest using it as a tool to help you make a decision on buying a video card. Some of you fail to understand that we can accept that. If you choose to base your video card purchasing decisions on canned benchmarks, more power to you. I applaud you for taking information you trust and applying it to what can be a very expensive purchase. As long as you are happy when you get it home, I am happy too. Satisfied customers in the enthusiast computer hardware industry are hard to earn, and many companies are trying hard to earn your money.

What is interesting is that this real world testing strikes chords with SO MANY people. It is a very polarizing subject to tackle. We have never expected everyone to accept it with open arms, as it is very much outside the box. We do know for a fact that many of our readers find value in it, and we personally believe that it is a better way of evaluating. It does not allow for cheating and useless optimizations that benefit only benchmark scores. It does characterize what sort of experience you should have at home when you buy the product.

You either find value in our content or you do not. This article was for those folks that did not understand exactly what we do. This article is for those that said, "Oh yeah, show me where it is better." And we just did. We did not have to hunt to find an example. We found it first try. That is all the proof we need to continue forward with our video card evaluation philosophy. If after you read the article, you still think our video card evaluations have no value, I can fully accept that and suggest again that HardOCP is not the website for you.

That is the part I do not understand. I see many of you proclaiming, "He did it for the press!" Well, if I did, and you just went to 20 other forums and griped about that, you just gave HardOCP the press you are griping about us seeking. I see a few proclaiming, "HardOCP is biased and they bend the results to prop up the green or the red team." That is simply not true. Ask the red or the green team about that or better yet look at our track record of evaluations over the last year. You will find no bias, only video card evaluations that have stood the test of time and been great indicators of sales patterns in the following months. You can gripe about our methods all you want, but our track record speaks for itself. I don’t recall the last time I got an irate mail because we made a bad video card purchasing suggestion.

We have one goal at HardOCP, and that is to give our readers enough information to make a good purchasing decision. And the fact is that millions and millions of hardware enthusiasts use our free content every year to make computer hardware purchasing decisions and are very satisfied with those purchases. You can sit at your keyboard and shoot holes in our motives, our integrity, and our methods all day long, but the fact of the matter is that millions of folks find value in what we do and we are a big asset to them when it comes to helping them spend their hardware dollars wisely. HardOCP editors are proud of that, and it is something that no detractor can take away.
 
This is a really confusing article, because it makes it sound like ALL canned timedemos are a bad thing.

You missed this part:

There is also no doubt that there are some games out there that benchmark perfectly in relation to their real world gameplay. We just don’t know what they are, and quite frankly we don’t care.
 
Nice one, Kyle... well done. I remember talking to you personally a few years ago, when you asked me if switching the format for video card testing was a good idea, and honestly I have backed you on this ever since.
 
Some of the others of you that have turned to questioning my personal integrity and motives show that you are not capable of expressing an argument that is actually on topic.

That sums up this thread pretty nicely.
 
This article really puts things into perspective: the other guys' "benchmarketing" is completely irrelevant, since timedemos don't reflect the actual experience you will have playing the game. If anything, it will make me take anyone else's opinion about hardware with a boulder-sized grain of salt. Great article guys. Bravo. :D
 
Thanks for chiming in, Kyle.
It's good to see that the [H] powers that be are willing to listen to their populace (and go into greater depth about the whys and hows of RW testing), but are not willing to compromise based on the rants of an ignorant (whether on purpose or otherwise) few.

My only suggestion is to, over the next few months, pick out a few more games and show a similar test setup. I expect we would see similar results, but it would be interesting to see the extent of how RW card usage differs from benchmarks.
 
My only suggestion is to, over the next few months, pick out a few more games and show a similar test setup. I expect we would see similar results, but it would be interesting to see the extent of how RW card usage differs from benchmarks.

I don't intend to spend any more resources on this topic at this time. We have hardware to cover. :)
 
Who cares about the absolute framerates right now? I'm not disputing that the in-game and built-in benchmark framerates will be different, I'm disputing the *relative* performance between the X2, Ultra, and GTX according to [H] vs. the relative perf between the cards according to other sites.

While it may be true that the settings are different and that can vary the gap a bit, the settings alone don't turn the X2 beating the Ultra by large margins (other reviews) into the X2 totally losing to the GTX ([H] review). That kind of drastic difference is coming from something else.

How can you just swallow up that reasoning as being sufficient? You don't want to see some proof that this "drastically-different-results-depending-on-settings" phenomenon is real?

Your reasoning does not follow. It is the absolute framerates, with regard to the settings used for best playable, that matter. The relative or percentage differences don't matter a damn bit. Lower performance is lower performance no matter how you cut it. Lower performance in real world gameplay, compared to what the timedemo indicates, will often mean the difference between a higher resolution or some type of eye candy. That is what this is about. The timedemo indicates real world gameplay at certain settings, whereas real world gameplay shows you can't play smoothly at those settings. The absolute numbers show what is really going on; the relative numbers are just a percentage that doesn't indicate whether those settings are playable or not.

The absolute numbers in the timedemo do not match the absolute numbers in real world gameplay. The timedemo numbers are higher than what you will actually experience in the game. That is the problem. As an example, if the timedemo's absolute numbers show you can play the game at the highest possible settings and resolution, but when you try to play at those settings it's not smooth, the timedemo is inaccurate. It is obviously a flawed tool as a performance indicator, since it does not show what you will truly experience.
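The absolute-vs-relative point above can be sketched with a toy calculation. All the numbers below are invented for illustration (not measured from any card), as is the 30 fps playability threshold: two test methods can agree on which card is faster, and by roughly what margin, while only the real-world numbers reveal that the settings actually stutter.

```python
# Invented numbers showing why absolute FPS matters: the timedemo and
# real gameplay agree on which card is faster and by roughly what
# percentage, yet only the gameplay numbers fall below the threshold.

PLAYABLE_FPS = 30  # assumed smoothness threshold for this sketch

avg_fps = {
    ("Card A", "timedemo"): 42, ("Card B", "timedemo"): 36,
    ("Card A", "gameplay"): 27, ("Card B", "gameplay"): 23,
}

for test in ("timedemo", "gameplay"):
    a = avg_fps[("Card A", test)]
    b = avg_fps[("Card B", test)]
    gap = (a - b) / b * 100  # relative gap between the two cards
    verdict = "playable" if a >= PLAYABLE_FPS else "choppy"
    print(f"{test}: A={a} fps ({verdict}), A leads B by {gap:.0f}%")
```

Both runs report Card A ahead by a similar margin, but only the gameplay run drops under the threshold, which is exactly the distinction being drawn between relative and absolute numbers.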

 
I'd love to see AnandTech back up their testing methods like this, but I'm guessing it might backfire if they did. :D
 
What do I want from a review/evaluation?
I want to know, in general, which video card on the market is best suited to my needs, so I can make an informed purchase and make sure I spend my money wisely.

What does [H] provide?
A set of results which are hand tailored to their own preference in visual quality and frame rate. If their goal is to evaluate the card for their own needs then I'm sure it's a great way of doing it. If they're trying to provide good coverage for a range of audiences then I believe it fails to provide anything especially helpful.

Is it useful to me (specifically) as a reader?
Well no, not really, my standards don't closely conform to those of the [H] reviews, and because their testing is so narrow I can't really take anything away from their reviews.

My opinion
I think what [H] are doing is admirable, trying to break the norm and provide something which has more attention to detail, however in doing so they provide a subset of results compared to other review sites who publish a larger array of benchmarks.

What this means is that people with a similar setup to [H], who prefer similar visual settings and who prefer to run their games at the same frame rate, are going to get a much clearer idea of what's going on. However, people who differ in any of these respects are going to find themselves saying something like:

"well I don't really care about shadows in Crysis, I prefer to crank the AA up"
"well I'm a pro gamer and when I play online I demand a steady 80-90fps"
"well I have a 30" monitor, and tend to sacrifice in game settings to enable me to use 2560x1600"

Do i think canned benchmarks (built in time demos) are bad?
Yes - Individual optimisations for these may skew real in game frame rates

Are [H] the only people to use custom timedemos?
Absolutely not, many review sites have been using their own custom time demos for years now.

Why do I think the typical review type is better?
Most other review sites provide a large array of data, with multiple resolutions, multiple levels of AA and AF, and/or multiple variations of in game settings. Their graphs give us an array of data which, to any one person, is too much information, but from which almost anyone can pick out the data that is most relevant to them.

With this you can pick the frame rate you prefer, find the graphs where this occurs, check out the different settings that were benchmarked to achieve it, then compare that to the other cards.

It's more information than is helpful to any one person, but it's enough information to cover a range of needs. There is more data and less hand holding (the explanation of what that data means).

In summary, [H] provide a subset of results. If you happen to fall into this subset and find it helpful, then that's fine and dandy; if you don't, you may find that their review doesn't give you any helpful information whatsoever. I prefer to see a large array of results and pick out of that array what is best suited to me, not have [H] tell me what settings and frame rates they think I should use.

I think the best example I can give of this is pro gamers, who expect very high frame rates when competing online. I don't play in tournaments or for money, but I still expect between 80-100 fps when I play online, and in fact I'd argue that the players who play to win are a large portion of the ones spending money on high end kit to ensure good frame rates. (People who expect to use vsync also fall into this category, since to have a min of 60 fps, you need an average much closer to 100.)
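The vsync point above can be illustrated with a quick frame-time calculation. The trace below is invented for illustration: because frame times vary, an average comfortably above 60 fps can still hide individual frames that dip far below it.

```python
# Invented frame-time trace (ms per frame) with a couple of spikes,
# showing why a 60 fps *minimum* demands a much higher *average*.
frame_times_ms = [10, 11, 10, 12, 24, 10, 11, 30, 10, 12]

# Mean frame rate over the whole trace vs. the rate of the slowest frame.
avg_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
min_fps = 1000 / max(frame_times_ms)

print(f"average: {avg_fps:.0f} fps, minimum: {min_fps:.0f} fps")
```

Here the average lands around 71 fps, yet the worst frame dips to about 33 fps, so someone who never wants to fall below 60 needs a noticeably higher average, just as the post argues.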
 
FYI, I'm not questioning the methodology. I'll rephrase one more time for those who still don't get the problem here.

----[H] shows GTX beating x2. Most other sites show the x2 beating the Ultra.

----[H] says the difference is because they tested in real world gameplay and not using canned benchmarks like other sites.

**The Kicker**
----[H] tests with canned benchmark and still comes out with the GTX on top of the x2. (?)

Having normalized the test to be on par with what other sites are doing, they should now see the X2 beating the Ultra, if their theory about in-game performance being totally different from canned benchmark performance is true. They didn't; they got the SAME result of the GTX beating the X2.

So now we have a problem. [H] tested *JUST* like other sites did, and *still* got totally different results than them for which card is faster. Why?

I completely agree, and my personal suspicion is that the reason is Vista 32 vs. Vista 64. [H] used Vista 64 with the 64-bit drivers, while nearly every other site used the 32-bit OS. Much of the code between these drivers is very different and it is likely that AMD spent more time optimizing the 32-bit drivers as that is still by far the more popular OS. Kyle has stated that he might go back and do a true apples-to-apples comparison using identical system configurations including the OS.

The issue raised in this latest article is largely academic. Everyone should already know that a timedemo isn't representative of real-world gameplay. After all, in a timedemo the user's keyboard and mouse input isn't being continuously polled, so not only does that free up a bit of CPU time, it also eliminates the USB system as a potential throughput bottleneck. And on top of that, FRAPS incurs its own significant CPU overhead. So duh, timedemos will be faster than real world FRAPS. Everyone should already know that, or at least be able to understand it logically.

But in a comparison between video cards, timedemos should still show the same relative differences that the real world gameplay scenarios do. And this latest [H] article simply proves it, as the percentage differences between the two video cards are about the same whether it's a timedemo or real world.

The real contentious issue since the original 3870 X2 review was published has always been why [H]'s testing showed 3870 X2 to be slower than GTX, while everyone else showed the opposite. And this latest article has shed absolutely NO LIGHT on that issue, as even in the timedemos the 3870 X2 is still shown to be slower than the GTX by about the same percentage as in the real world testing. This article, IMO, has simply raised a largely academic and pointless debate that has confused and misled everyone from the original argument.
 
I completely agree, and my personal suspicion is that the reason is Vista 32 vs. Vista 64. [H] used Vista 64 with the 64-bit drivers, while nearly every other site used the 32-bit OS. Much of the code between these drivers is very different and it is likely that AMD spent more time optimizing the 32-bit drivers as that is still by far the more popular OS. Kyle has stated that he might go back and do a true apples-to-apples comparison using identical system configurations including the OS.

The issue raised in this latest article is largely academic. Everyone should already know that a timedemo isn't representative of real-world gameplay. After all, in a timedemo the user's keyboard and mouse input isn't being continuously polled, so not only does that free up a bit of CPU time, it also eliminates the USB system as a potential throughput bottleneck. And on top of that, FRAPS incurs its own significant CPU overhead. So duh, timedemos will be faster than real world FRAPS. Everyone should already know that, or at least be able to understand it logically.

But in a comparison between video cards, timedemos should still show the same relative differences that the real world gameplay scenarios do. And this latest [H] article simply proves it, as the percentage differences between the two video cards are about the same whether it's a timedemo or real world.

The real contentious issue since the original 3870 X2 review was published has always been why [H]'s testing showed 3870 X2 to be slower than GTX. And this latest article has shed absolutely NO LIGHT on that issue, as even in the timedemos the 3870 X2 is still shown to be slower than the GTX by about the same percentage as in the real world testing. This article, IMO, has simply raised a largely academic and pointless debate that has confused and misled everyone from the original argument.

This is precisely what has been troubling me about the review.

There are several factors, such as a quad core being used on many other review sites, as well as the 64-bit vs 32-bit version of Windows.

There are a lot of variables involved, and I wish we could get a straighter answer.
 
What this means is that people with a similar setup to [H], who prefer similar visual settings and who prefer to run their games at the same frame rate, are going to get a much clearer idea of what's going on. However, people who differ in any of these respects are going to find themselves saying something like:

"well I don't really care about shadows in Crysis, I prefer to crank the AA up"
"well I'm a pro gamer and when I play online I demand a steady 80-90fps"
"well I have a 30" monitor, and tend to sacrifice in game settings to enable me to use 2560x1600"

This is not true, simply because, first of all, any site will try to get rid of any possible bottleneck other than the hardware being reviewed. [H] is no different. Also, when reading [H] reviews with a not-so-top-of-the-line system at home, one cannot expect that just by buying the graphics card being reviewed, he/she will get the exact same numbers. That's the ABSOLUTE wrong way to read reviews, whether they are real-world or timedemo based.

Then there's also the fact that NO site does what you question in quotes. People looking for all that info need to understand that if a graphics card runs a demanding game at 1600x1200 with all settings maxed, including AA, then you can always lower settings to get more framerate, or increase the resolution. No need to show it in the review; that's implied. This same logic applies to all types of graphics cards, whether they are low, mid or high-end. You can use an 8600 GT to play Oblivion at an insane resolution, but only by sacrificing most (or all) of the in-game settings. You just have to realize that you'll have to compromise much more as you go down the segment of the card (low, mid or high-end).

Bottom line, real-world gameplay is NOT 100% accurate. That was never the issue. The issue is that it's insanely more accurate than most (if not all) timedemos and/or built-in benchmarks presented by the majority of sites.

And it's not about picking the settings for you. [H] shows you what the highest playable settings are on that specific hardware. Many say this is subjective, but I really don't understand how that is subjective. When a game running on specific hardware is choppy, it's choppy, no matter how you polish your conclusion of the results. It's not about framerate, as was already said in the article. It's about the gameplay experience, which is what we care about at home, right?
 
What do I want from a review/evaluation?
I want to know, in general, which video card on the market is best suited to my needs, so I can make an informed purchase and make sure I spend my money wisely.

Your thoughts are again noted. Our content does not have value for you. I have no problem accepting that fact. You do not need to state it again. Thanks for your contribution.
 
I don't know. The article seems to have more than one purpose: one being your methodology, and the second just to bash Anandtech. You do know that other sites use the same canned benchmark methods as well, right? Were they chosen for your crosshairs due to the size of their fanbase, knowing it would start a war? The article would carry more weight and be more worth reading if it didn't come off as a "let's take them down" piece aimed solely at AT.


You are right, you don't know. And if you refuse the author's direct explanation there is little I can do to inform you better. There is no "war" here. Don't be a drama queen. :rolleyes:

Your opinions are noted.
 
I am so sick of these fracking ignorant kiddies talking about throwing shots at Anandtech. As a testament to their level of respect, [H] included the best HW review site in North America as an example of how great, intelligent people with good processes can still have some processes that [H] doesn't agree with. Instead of pointing out the hundreds of smaller sites that use the same testing method [H] is trying to do away with, they used one site, one of the best.

So no, they are not taking shots at Anandtech. They are just using them as an example of good people who might be a little misled. Why use a bunch of mom-and-pop sites, so people can waste time arguing over credibility, when you can just use Anandtech and avoid all that?

Good editorial, guys. Like a poster from page 2, I read your article on the 8800 GT, bought one for $229 off of [H]'s Hot Deals forum (or maybe it was the thread that followed the article), and I am currently playing all of my games at native res, maxed out. And my card is single-slot to boot, and my overall case temps dropped too.

This is a free service, guys, and I can't believe the number of "smart" people who can be so far off in their logic and reading comprehension and still feel justified posting premature opinions. I don't know what wig you guys have over your eyes when you buy hardware based off other sites' recommendations and have to play games with that hardware. Don't you realize that you are lacking performance-wise? Don't you "feel" it?
 
I don't know. The article seems to have more than one purpose: one being your methodology, and the second just to bash Anandtech. You do know that other sites use the same canned benchmark methods as well, right? Were they chosen for your crosshairs due to the size of their fanbase, knowing it would start a war? The article would carry more weight and be more worth reading if it didn't come off as a "let's take them down" piece aimed solely at AT.

AT was an example. Why do people keep insisting that Kyle chose AT to pick a fight? It's a simple example of a well known site that uses timedemos in its testing methods, which are not in line with what [H] does.

Someone posted a link to the AnandTech forums in this very same thread. When I read it, I mostly saw people trying to do exactly that: pick a fight, when it's clear in the article itself that AT is just an example of clashing methodologies, which is what this article intended to demonstrate.
What does that tell me? That these people did NOT read the article at all. They just saw references to AT in an [H] article, so [H] must be bashing AT. It is not, but the intention of the article was indeed to show the discrepancies that exist between both sites' methodologies, and that, ultimately, [H]'s is better.
 