Benchmarking the Benchmarks @ [H]

As Chris B points out here in post #91, even turning ever so slightly in a video game can alter the framerate during gameplay. This is a prime example of something a player would do. They may take the same path, but they will not face the exact same direction, and yet the framerate is significantly different. So how do we know that these canned demos aren't run on a semi-optimized pathway so that the numbers aren't skewed to begin with, with occasional rough spots to make it look like it wasn't completely BS'ed? We don't.


Actually, that post was really pointing out a bug with the 3870 X2. I've been in contact with ATI and they seem to think it's a driver issue. The performance just plunges on 5 of the 11 levels in the game for no reason, and RivaTuner seems to indicate the core clocking down to 2D levels. ATI told me they tried the game with PowerPlay disabled and still got the framerate drop, so they now think it's a driver issue.
 
You probably wouldn't. However, for those that play other games (or plan on using the card to play currently unreleased games in the future), "best playable settings" is significantly less useful than establishing the overall relative performance of the cards. That aside, it still ignores the issue of the subjectivity of "best playable settings," which we're made painfully aware of every time [H] reviews something.

You might also care that there's no way to scientifically verify the results.

But then it can be argued that the overall relative performance of the cards applies only to games that are currently on the market. Prime example: NO card can truly spank Supreme Commander, from what I've personally been reading. Things may have gotten better, but I don't see anyone playing at the highest res + maxed eye candy yet. If you're the type to go to the lowest, crappiest settings despite having a rig that's essentially budget-be-damned, then obviously you wouldn't care. Some people actually do play first-person shooters that way just so that system lag won't affect their gameplay or their accuracy. No card, maybe even no CPU, was ready to handle a game like that. But if you were going by overall relative performance, you may have gotten the idea that company X's card was superior to company Y's, and your only arguing point would have been a .000001 FPS difference. There is no telling what kind of system-crippling games will come out until they hit the shelves. So to me, overall performance means nothing in regard to future games. Not when some games are released on older engines/platforms and others are built totally from the ground up.
 
Science experiments belong in a lab, not in a home. Scientists typically go by doctor, not 3dMaRkS_Ownz_J00.

What Brent and Kyle have to say is what they intend to share. How you interpret that or how you consider its weight is purely up to you. It doesn't matter if you've proven it by science or by subjective performance. If you're purely going to rely on one method alone, let alone one source, you're already misleading yourself. Hell, how often do we see other reviews posted on the front page when new hardware comes out? To me that says that even the [H] wants you to not just rely on their information alone.

But as someone else pointed out, no one's explained why the results vary so much, or even investigated it. The scientists scream that their method is more accurate: prove it. Find those exact results in real gameplay. Post your proof that it's consistent with your findings. Show us that real-world gameplay performance is reflected in those canned benchmarks. That's all you'd have to do to prove Kyle's method wrong, once and for all.


QFT. These are video cards we're talking about... it's not like we're discussing new theories in nuclear physics that require the scientific method to test and evaluate. It seems to me that the biggest complaint is that Kyle's tests aren't reproducible and don't follow some sort of "controlled method." Has Kyle ever said he was a trained scientist? No. He's a gamer and hardware enthusiast like all of us here, so I wouldn't necessarily expect him to have a truly "scientific" approach to his testing methods. I think he does an extremely good job at what he does because his tests demonstrate what actual gameplay is like, and they are in fact quite reproducible. He reduces the human element (by having a well-thought-out path in game) and the software element (by eliminating timedemos because of optimizations in the engine and video card drivers) to a minimum so that the actual hardware performance can shine through.

I agree with SamuraiInBlack: until someone who favors a more "scientific" approach can prove how their method is superior and show how this will affect my gaming experience, I'm siding with [H] on this one.
 
But it's a subjective industry! Everything about gaming is based on personal preferences and individual hardware and software settings.
Rendering speed (which I think is primarily what we're concerned with) is not a subjective issue at all.

Rendering quality, software compatibility, etc. are probably subjective qualities, but certainly not rendering speed.

edit: the negativity towards the scientific method here is astounding.
 
Honestly, I would like to see a manual time test. Use FRAPS to record the frame rates. Set a save point in the game. Use a program like EZMacros (similar to the macro program on a G15, except it records the mouse too) that, from the load page, hits the load button, sits there for 10 seconds letting the FPS stabilize, starts FRAPS, holds down the forward button for 1 minute, then stops FRAPS. Sure, it won't completely reflect real-world play, BUT it will give a perfect apples-to-apples comparison and a bit of objectivity to quell the naysayers. You could get more complex with it, but in my years of writing macros for Ultima Online etc. (sometimes ones a few hours long), I always had to build in a lot of fail-safe redundancy, because replaying exact actions on the human interfaces alone doesn't always play out exactly when the hardware does something a tiny bit different. You might be able to do more complex runs here, though, since you don't have any network latency to add into the equation. It also wouldn't be "canned," since it's not a real "timedemo." It's just recording exactly what the keyboard and mouse do and replaying it.
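As a rough sketch of what such a macro replayer might look like, here is a minimal Python version of the idea. This is illustrative only: EZMacros itself works differently, and the event names and timings below are invented to mirror the load/wait/capture/walk sequence described above.

```python
import time

# A recorded "macro" as (seconds_from_start, action) pairs, mirroring the
# run described above: load the save, let the FPS settle for 10 s, then
# capture exactly one minute of walking forward.
MACRO = [
    (0.0,  "click load_button"),
    (10.0, "start fraps_capture"),   # FPS has had 10 s to stabilize
    (10.0, "keydown forward"),
    (70.0, "keyup forward"),         # forward key held for exactly 60 s
    (70.0, "stop fraps_capture"),
]

def replay(macro, send_input, clock=time.monotonic, sleep=time.sleep):
    """Replay recorded events at their original time offsets.

    send_input is whatever actually injects the keystroke/click; it is
    left abstract here so the scheduling logic itself can be tested.
    """
    start = clock()
    for offset, action in macro:
        delay = offset - (clock() - start)
        if delay > 0:
            sleep(delay)
        send_input(action)

log = []
replay(MACRO, log.append, sleep=lambda s: None)  # dry run, no real waiting
print(log)
```

The fail-safe redundancy the poster mentions would live inside `send_input` (e.g., re-sending a key event if the game missed it); the scheduler above only guarantees ordering and timing.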
 
Actually, that post was really pointing out a bug with the 3870 X2. I've been in contact with ATI and they seem to think it's a driver issue. The performance just plunges on 5 of the 11 levels in the game for no reason, and RivaTuner seems to indicate the core clocking down to 2D levels. ATI told me they tried the game with PowerPlay disabled and still got the framerate drop, so they now think it's a driver issue.

Even still, wouldn't you agree that it's an issue that needs to be brought to light, and taken into consideration when doing a review? Especially if it does turn out to be driver-related. Things like this in my opinion would affect gameplay and, quite obviously, can affect the results of evaluating the card's performance, no matter what your testing methodology would be.
 
Maybe I'm missing something, but your argument seems irrelevant. Yes, if you're comparing Anandtech's review with 4GB and [H]'s review with 2GB of RAM, there might be some FPS differences. However, when you look at a review and see a 3870 X2 not performing as well as a GTX, how is having 4GB of RAM going to make the X2 perform better? If you have a system with 2GB of RAM and you test an X2 and a GTX on the same system and the GTX performs better, then it shouldn't matter if you upgrade the RAM to 4GB and run the same tests again. You should still have the GTX performing better.

Edit: Thanks for the article! IMHO, even one broken benchmarking tool is one too many.

Yes, you are missing something. I wasn't making a reference to the relative performance between the two video cards (since that's not what the article is (supposed to be) about); I was comparing the difference between Kyle's roughly 18 FPS number and Anandtech's 31 FPS number. That comparison is completely irrelevant since the systems are not comparable.

The reason you don't see 4GB in the reviews is that if you look in most people's sigs on the forums, you see 2GB.

Simple concept: [H] tests stuff for what we use, not what we could have if we spent a crapton of money. I'm sure they could load up systems with 16GB and go do benches for us. I bet they have the resources, but why bother when the target base of [H] doesn't have it?

If [H] only tests "what we use", what's the point in benchmarking either of these cards? I guarantee the vast majority of [H] members don't use 3870 X2s and 8800 GTXs or ULTRAs. Most of them will opt for a 3850 or 3870 if they're ATI supporters, or an 8800GTS or GT if they're an NVIDIA fan.
 
Even still, wouldn't you agree that it's an issue that needs to be brought to light, and taken into consideration when doing a review? Especially if it does turn out to be driver-related. Things like this in my opinion would affect gameplay and, quite obviously, can affect the results of evaluating the card's performance, no matter what your testing methodology would be.

Well, it does affect the DX9 path for Crysis; it doesn't affect the DX10 settings, and that's what the article was done with.
 
Ok, Lost Planet, and that's just off the top of my head.

Also, are you going to tell me that you're going to see absolutely NO increase in performance when doubling the amount of RAM in the system? If so, then you're going to need to turn over that forum title, because you have no idea what you're talking about.

AFAIK doubling the RAM will only help if you have saturated the available memory. For example, playing BF2142, a game that pretty much requires at least a gig of RAM, will not see a benefit going from 2 gigs to 4 gigs unless you're running a ton of stuff in the background.
 
You will see some improvement, but not as much as a video card upgrade, by far. My experience has been that once you hit 2GB of RAM, you see a very quick series of diminishing returns with more RAM.
On the CPU side: when I upgraded my system from an E6300 to my current Q6700 (assuming stock clocks here, post-4GB upgrade), I saw very little improvement in most games. Crysis definitely improved. Source-based games MAY have snatched a 2-3 FPS gain.

And Lost Planet is a bad example, as it is a console port.

I have no clue where you're getting the idea that I said anything about comparing a RAM upgrade to a CPU upgrade.

Also, sure Lost Planet is a console port, but who cares? That has nothing to do with the fact that it scales well with more CPU cores.
 
The problem with your testing method is that it caters only to readers using your exact same system. My max playable settings will be much different from yours because I am running a quad core whereas you are running a dual core, and so on.

Look at this review with regard to CPU scaling: a 3870 X2 scales much better than an 8800 GTX. The review is in German, but you can see all the graphs here: http://www.pcgameshardware.de/?menu...id=630867&entity_id=-1&image_id=769646&page=1

Please do not post images here you do not have copyright to. - Kyle

Image of CPU Scaling:

http://www.pcgameshardware.de/?menu...ty_id=-1&image_id=769656&page=1&show=original

This can also be confirmed by reading and comparing all the different reviews out there. When tested with a quad core, the 3870 X2 outperforms the Ultra in over 85% of the games, some by very significant margins, but when compared to it on a dual core, the performance difference is not so noticeable and it loses in more games.

I can confirm this myself, now owning the 3870 X2 after upgrading from an 8800 GTS 512. The 3870 X2 smokes my GTS. Crysis is playable with everything on Very High at 1440x900, playable with very many settings on Very High and the rest High at 1680x1050, and also very much playable with all settings High at 1920x1200. In all of these scenarios I average 30 FPS in game, confirmed via FRAPS, sometimes jumping over 60 FPS.

Your review might be useful to those running an X6800, but for the vast majority of us running quad cores, your review is actually providing wrong information when it suggests an 8800 GTX is a better buy than a 3870 X2.

Regards
 
I have no clue where you're getting the idea that I said anything about comparing a RAM upgrade to a CPU upgrade.

Also, sure Lost Planet is a console port, but who cares? That has nothing to do with the fact that it scales well with more CPU cores.
My point is that most games are NOT CPU or RAM bottlenecked on a system like the [H] setup. The GPU/framebuffer will be an issue before the CPU and RAM.
As such, changing the setup wouldn't make much sense/is not necessary.
 
Hey,

In the spirit of contribution, I rarely post anymore, and only when I have something to say.

Kyle and I have discussed fake benchmarks to death, and you'd hope people would read and learn.

I am no stranger to testing, after a 30-year IT hardware/software background.

Regardless of what benchmark tool you may prefer and use, the Hard truth is that it's your individual experience that dictates quality. Most people are unaware of the real danger in R&D that benchmarking software creates. More on that after this...

True benchmarking software should contain no optimization code whatsoever. The field has to be level, or as level as possible, for any benchmarks to have any validity and accuracy (and critical thinkers know that these are two different things: validity = are we measuring what we're supposed to be measuring? Accuracy = if we ARE measuring what we're supposed to be measuring, are our results truthful?)

Consequently, if benchmarking software should not be "optimized," then device drivers shouldn't be either. I'm not saying that the drivers shouldn't be as efficient as they possibly can be, I'm just saying that they can't contain code written for specialized tests.

If these situations were to occur, and if all software-related platforms were equal, we STILL wouldn't have a level field. Why? Hardware. At best, we'd still be comparing apples to oranges when it comes to how software runs on different hardware platforms, unless the hardware was limited to only one manufacturer's line. For the most part, it is generally assumed that a manufacturer's hardware gets better and better, with a few exceptions; otherwise they aren't in business very long.

HOWEVER, if all things are as equal as they can be, then benchmarking software can give a RELATIVELY accurate measure of performance. But then, as the Bard said, "there's the rub," which brings us to the danger mentioned earlier in this post.

THE REAL DANGER is that benchmarking software companies can falsely dictate development of hardware down the wrong path, and into a dead end, because they can influence the market unfairly. R&D is dependent on money coming in, and it, by nature, spends, and never directly provides income. Money comes in the form of manufacturing sales, and sales come from product promotion. Fish love shiny fishing lures, so beware the hype. (My PhD was based upon the need for truth in advertising, but also that good products need just as much promotion as bad ones to compete. Shiny packaging gets your attention; good quality keeps it.)

I NEVER buy ANYTHING based upon benchmarks or hype. My criteria are simple. Is the product as future-proofed (expandable) as possible? Is it reliable? Will the company be around in the future? Do they give good service when and if needed? Will the product do what it is advertised to do? Does it fit my needs? What's the best solution in my price range? And, finally, does it give me the experience I want? IF you buy using these guidelines, you'll consistently get the best quality and performance you can afford.

In other (shorter) words, Kyle is absolutely right. (However, before you think I am kissing up here, I'd like to throw this in: KYLE, we need best-bang-for-buck performance analyses and product and "tweak" recommendations now more than ever. The majority of us can't spring for the top-end stuff. Hard and CPU magazine, and the like, as they gain more renown, become privy to better and better stuff, often for free, and while the toys are great, you are moving away from the audience that needs you and also the audience that helped make you great. "Come back, Shane...")

...but there's hope. Any review website that refers to a bad power supply as a "flaming piece of shit" still has unbiased honesty going for it. :) (But how does it rank among the "piece of shit" supplies?)

-Will
 
Thanks for explaining things so well. I knew there was a reason why I trust this site over any other site especially for Video Card Evaluations.

Keep up the great work!
 
I think your statement "Timedemo benchmarking of video cards is broken" says it all.

Word.

Drivers and programs all need optimizations, as long as they don't modify the user experience... now, optimizations for benchmarks only, that's another thing :(

Congrats, [H], another reason why your benchies are worth a lot more than "the canned reviews."
 
(However, before you think I am kissing up here, I'd like to throw this in: KYLE, we need best-bang-for-buck performance analyses and product and "tweak" recommendations now more than ever. The majority of us can't spring for the top-end stuff. Hard and CPU magazine, and the like, as they gain more renown, become privy to better and better stuff, often for free, and while the toys are great, you are moving away from the audience that needs you and also the audience that helped make you great. "Come back, Shane...")

-Will

Looking at their more recent reviews, I see the 8800GT/GTS, 3800 series, 8600 GT/GTS, and 2600 evaluations. They're on top of the mid-range offerings, maybe not directly on top, but hovering. :)

The most interesting part of the article, to me, was when Kyle explained how he was invited onto The Screen Savers and asked how the FX 5900 (IIRC) played. I didn't catch that episode, much less know Kyle was a guest (I wasn't even an [H] member back then), but I can almost imagine the dumbfounded look on Kyle's face (if I knew what he looked like) when Leo/Patrick dropped that bomb. What a wake-up call. :)
 
Well, I think it would be an obvious decision: if Card A performs better at the higher resolutions tested, why would it not perform equally well at your lower resolution?

Great article guys.

I love the final run-through with Anand's settings :D

Well, because it does not always translate that way. Things like memory throughput, interface width, etc. can make a video card that is faster at lower resolutions become slower at higher resolutions, or when you add in AA and such, compared to the other card.
 
The one thing that I did not already know before reading the article was that [H] plays through each game completely during the "evaluation" process. There is quite a difference in the amount of work involved here.

1. I could run a few canned demos, 3DMark, etc. while watching TV sipping on a rum and coke, slap some charts into a "review" and call it good.

OR

2. I could play through a game at multiple settings, try to replicate a run through a portion of a game exactly over and over again, stare at image captures trying to detail the differences, and write a 5-page article detailing my thoughts and experiences.

Let's see which one is the "easy" way out. Hmmmmmm.....
 
Not that I needed any convincing, but this just goes to show how BAD the canned timedemo scene is. I mean, shit, the whole 3DMark fiasco should have proved this a long time ago to those "other" sites. That "benchmark" tool is so useless to compare to REAL WORLD gaming....

Anyways, great read, great backup, and I am glad to be a longtime [H]ard member. Thanks Kyle!!!

If I want to buy a product, I check it out on the [H] first, either through an evaluation or via the hardcore forum members. If it doesn't perform well here, I probably don't want it:p
 
The one thing that I did not already know before reading the article was that [H] plays through each game completely during the "evaluation" process. There is quite a difference in the amount of work involved here.

1. I could run a few canned demos, 3DMark, etc. while watching TV sipping on a rum and coke, slap some charts into a "review" and call it good.

OR

2. I could play through a game at multiple settings, try to replicate a run through a particularly difficult portion of a game exactly over and over again, stare at image captures trying to detail the differences, and write a 5-page article detailing my thoughts and experiences.

Let's see which one is the "easy" way out. Hmmmmmm.....

It seems to me that other sites compare to the [H] about as much as a high school research paper compares to a graduate thesis.
 
Overall, a great article. However, one thing that wasn't addressed was the effect of FRAPS on gameplay. Obviously, FRAPS is designed to have minimal effect, but was it running during the canned benchmarks as well? Having a different number of programs running could account for some of the difference, though I doubt it had much of an effect.
 
I just finished writing an article about the awesomeness of this article.

http://www.gpureview.com/hardocp-smacks-anandtech-in-the-face-with-a-chunk-of-logic-article-630.html

Here is an excerpt:

My article said:
So the HD 3870 X2 saw an increase of 22% ((41.2/33.7) - 1) between playing the demo and running the demo. Using the same demo and settings, they tested an 8800 GTX. The GTX improved by only 14% ((45.3/39.6) - 1). 14% and 22% are not the same numbers. They are not equal. Think hard about what that means; I will give you some time...

If running timedemos on two cards really did tell you exactly how well both of the cards played the game, then those percentages would be exactly the same. Card A would get 22% more FPS when running the timedemo compared to playing it, and Card B would also get 22% more FPS when running the timedemo compared to playing it. That is not what is happening, though. The percentages are not the same. 22 does not equal 14, which means:

The difference in performance between two cards running a Timedemo does not equal the difference in performance between the two cards playing that same demo (or game). Therefore, Timedemo benchmarks are worthless to people who play games and are only helpful to people that like to run Timedemos as fast as they can.
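The arithmetic in that excerpt is easy to verify. Here is a throwaway Python check using only the FPS figures quoted above:

```python
# Figures quoted in the excerpt above.
x2_run,  x2_play  = 41.2, 33.7   # HD 3870 X2: running the timedemo vs. playing it
gtx_run, gtx_play = 45.3, 39.6   # 8800 GTX: same demo, same settings

def inflation(run_fps, play_fps):
    """Fractional FPS gain the card shows when running the demo vs. playing it."""
    return run_fps / play_fps - 1

x2_pct  = inflation(x2_run, x2_play) * 100    # ~22.3%
gtx_pct = inflation(gtx_run, gtx_play) * 100  # ~14.4%
print(f"X2: +{x2_pct:.1f}%  GTX: +{gtx_pct:.1f}%")
```

The point stands: the timedemo inflates the two cards' numbers by different amounts, so the timedemo ranking need not match the in-game ranking.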

Way to go Kyle! [H]ard|OCP 4 EVAR!

Also... how come [H]ard|OCP is spelled "HardOCP" throughout the article?
 
Good stuff!
I have been a fan of this type of "evaluation" ever since its inception. Keep it up! Eventually people will come around, once they get burned by crap hardware and fraudulent results.
 
There's just no perfect way to benchmark. The closest thing would be if every review site did a completely different set of games/benches; then you could compare them all as a whole.

The problem is that the more popular a timedemo, benchmark, or set of benchmarks is, the more likely they (NVIDIA or ATI) are to optimize for it. So what do you do? Benchmark obscure games that they don't have game profiles for yet?

Ever since the Quack/Quake benchmarking scandal, everyone knows that if you use a canned/popular benchmark of any sort, the video card companies are going to optimize for that application/benchmark.

Or worse than Quack/Quake: when "they" (I don't remember if it was red or green) optimized for 3DMark so it didn't render the non-viewable areas, so if you had the professional version, where you could change the camera angles, you'd see black space!!

Anyway, I'm with HardOCP on evaluating GAMES, not benchmarks! Though I still don't personally care for the apples-to-oranges comparisons, so I often skip ahead to the apples-to-apples and the conclusion. (Sorry!)

I wouldn't mind seeing a 3DMark number in there too. Even though I don't believe 3DMark is indicative of gaming performance, it is good for showing the improvements from one generation to the next. And it does provide a common ground: because so many people use it, you can see how your system compares over time to other systems.
 
The whole "reproducibility" argument has been blown out of the water.:D
Good Job [H].

If a timedemo is supposed to be better simply because the results will be the same each time, then how come the timedemos do not scale equally between different architectures, or scale the same as in-game play?
I don't see how that helps anyone at all.

I always felt that the manufacturers optimized for the built-in timedemos, and [H]'s reviews seem to prove that.

@Kyle or Brent
What if you guys record videos of the in-game run throughs, then host them somewhere? All the video would be for is to compare the Fraps counter, and to prove you are indeed trying to do the same run through each time. They don't even have to be in the native resolution.
There would be more to it than that obviously, but I'm sure you guys would have a better grasp of the concept than I would.
 
What if you guys record videos of the in-game run throughs, then host them somewhere? All the video would be for is to compare the Fraps counter, and to prove you are indeed trying to do the same run through each time. They don't even have to be in the native resolution.


MMMMM. Hardware reviews in WMV HD. That would rock.
 
Surprisingly, they took this "HardOCP doesn't like Anandtech" thread down on Anandtech :D.
Never mind, it's back online... I was getting a 102 error and it disappeared from that list for a while.
 
I'm really glad to see this article because it helps confirm my belief that timedemos have little or nothing to do with real-world experience. I made the decision to buy an 8800 GT after seeing a graph of framerates playing Oblivion on an 8800 GTS on this website. I tend to look at minimum FPS instead of peak, or even average, as it shows you where your system is going to leave you stranded, hung out to dry while it catches up.

It doesn't matter how many FPS your system can get while you're staring into a corner or up at the sky; it matters how playable it is when there are 12 players in view and 4 explosions going off at the same time, because that is when the guy who has a well-configured system with the right parts and settings is going to have the advantage.
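The gap between an average and the worst moment is easy to show with a toy calculation. Here is a short Python sketch using an invented frame-time trace (a mostly smooth run with a few big hitches, standing in for that explosion-filled moment):

```python
# Synthetic frame-time trace in milliseconds: 95 smooth frames (10 ms each,
# i.e. 100 FPS) plus 5 big hitches of 100 ms each. Numbers are invented.
frame_times_ms = [10.0] * 95 + [100.0] * 5

frames    = len(frame_times_ms)
total_sec = sum(frame_times_ms) / 1000.0

avg_fps = frames / total_sec            # what an average-FPS bar chart shows
min_fps = 1000.0 / max(frame_times_ms)  # what you feel during the worst hitch

print(f"avg: {avg_fps:.1f} fps, worst moment: {min_fps:.1f} fps")
```

The averaged number looks comfortably playable while the worst moments drop to a slideshow, which is exactly why minimum FPS (or a full frame-time trace) tells you more than the average.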
 
Real-world benchmarks are better, I agree, but I think Anand's point is that if everyone does canned benchmarks, then RELATIVELY SPEAKING you can use them to judge between cards, just not actual in-game performance. So yes, you can see which card is better than the other, but you will never see how it ACTUALLY performs unless you do real-world benchmarks.
 
Easy for me to answer: testing with accurate tools where the results can't be exactly reproduced. Why? Because no one plays any game the exact same way. The exact same steps in the exact same spots on the exact same map do not happen at the exact same time! You do not turn, run, and jump the exact same pathways. So with that, it would be exactly how a real person would play.

Thank you for seeing the point many others seem to have glossed over. Timedemos are inherently flawed tools in that they can be independently optimized for outside of the game, and they also leave out critical aspects of the game which come into play while actually playing it.

There is not one shred or semblance of scientific testing at HardOCP. You are merely taking Brent or Kyle at their word; it's that simple.

I'm OK with that, and so are many others, because as readers we trust them. But let's stop patting ourselves on the back on this one; their testing is in no way scientific because there simply is no definite control.

I think it really comes down to what it always has: read everyone's reviews and make your own choices.

Actually, there is scientific testing in [H]'s video card evaluations. They control the variables as much as possible without removing integral parts of the gameplay of the actual game. Timedemos remove parts of the gameplay and can easily be "optimized" for without actually helping performance or IQ in the game itself. Using tools flawed at the basic level is much more unscientific. The basic premises of the scientific method are being followed in [H]'s evaluations: the variables are controlled as much as possible while the results remain closely repeatable, and the limitations of those controls are stated. You can never have a "perfect" control group; to achieve one, you would have to create a situation you would never run into in the real world, which would make any real comparison or evaluation flawed.

Your assertion is correct, at least in my case. The essence of science is replication and verification (or falsification if you are a Popperian :) ) I've no idea what the reviewer thinks are acceptable scores, and I am unwilling to take their recommendations on faith. Problems with H methodology include lack of third-variable control, willful introduction of uncontrolled variables, and irreproducibility/unfalsifiability.

Kuhn (1962) would claim that this represents a failed attempt to create a new paradigm (of reviewing). Unfortunately, the HardOCP paradigm is flawed; it is inconsistent with any form of useful inquiry, and thus will not succeed the current paradigm.

I am not interested in benchmarks because of the numbers they provide in and of themselves, but because of the numbers they provide relative to other cards. If I get 100 FPS with Card A on Benchmark X, and 130 FPS with Card B on Benchmark X, then all else being equal, I can assume that Card B is faster. I am not interested in the bench scores in and of themselves, but in how they stack up against other scores. I use them to provide a rank order of scores; I don't use the raw scores themselves.

[H]'s testing methodology states what the acceptable "scores" are; how you can say you don't know what these are is absurd. As I stated before, the evaluations do have results which can be reproduced. Have you never taken a look at the FPS graphs? Are they perfect and spot-on every time? No, but they are really damn close in most cases. It's called margin of error, and the articles I have read seem to be well within an acceptable margin of error in regard to their run-throughs of the games.

Again, not all variables can be controlled unless it is an ideal situation. The fact that something becomes an ideal situation means that it's not feasible in the real world. Timedemos are a much more controlled environment but they are a small slice of the environment and do not include all features of the game. There is no AI and I doubt any physics because they are nothing more than a recording. It's the lack of possible variation which invalidates the results obtained from them.

I've played plenty of games where, by the FPS numbers, I should not have had a problem playing the game. I've had well over 100 FPS in a game before and it still felt choppy. I've had 30-40 FPS in the same game and it felt smooth as butter. A graph of supposed average or high or low framerates doesn't tell me jack shit. The average framerate figure has to be one of the most useless "measurements" out there. Someone playing the game with the card at those settings tells me whether it's any good or not. How many times now has [H] evaluated a card and had to change settings one way or another because something like AA or AF, or the type of AA or AF, was causing choppiness despite the framerate being fine? How about the number of times it has been pointed out that certain settings caused some type of graphical corruption or problems in the game? You're not going to find that out from a graph of timedemo "scores."

Scientific...?

You perceive timedemos as scientific? These are optimized benchmarks; don't you get that? The 'definite control' you speak of is FLAWED.

There is more 'control' in these real-world FRAPS tests than in anything demoed. This is real data, however controlled, which paints a much better picture of what one can expect from their video card's performance in a game.

You're exactly correct. Timedemos are not the games. They are static replays of game actions which the video card companies can optimize for in software to make their cards look better. In most cases, this doesn't affect the actual gameplay, and any advantage gained in the timedemo won't necessarily carry over to the game itself. The lack of possible variation is what makes the tool flawed in the first place.

 
HardOCP should from now on take benchmarking settings from other sites and play the game using those settings! Haha. Make a special page for it in the review called "Canned Benchmark vs Real World" and show the % between them :)
 
I personally like real-world results. I want to know how a game is actually going to perform as if I were actually going to sit there and play it, not watch a timedemo and say "that's cool... now what?" b/c timedemos give me no real meaningful result or correlation to actual performance.

Think about this the next time you shop for a car. Is real-world gas mileage meaningful to your wallet, or is what the manufacturer was able to get based on the loosest and most vague guidelines going to make a dent in your commute bill? Don't like the gas analogy? Try performance. The manufacturer claims the car has 600 hp and can speed through the 1/4 mile in 10 seconds, only for you to find out that the car really has just a pinch over 500 hp and is closer to 13 seconds in the 1/4 mile.

Point being: say what you mean, mean what you say. If you say graphics card X should average 35 FPS with specific settings, then it should get 35 FPS with those specific settings (with the proper supporting hardware, of course). I for one do not want to read a review and see hopes of getting 60 FPS with some settings, only to find that I can barely achieve 35 FPS.
 
Overall, a great article. However, one thing that wasn't addressed was the effect of FRAPS on gameplay. Obviously, FRAPS is designed to have minimal effect, but was it running during the canned benchmarks as well? Having a different number of programs running could account for some of the difference, though I doubt it had much of an effect.


Please check this, as I was thinking the exact same thing.
 
I don't see what the big deal is. I have a lot of problems with [H]'s testing methodology, especially the continued insistence that it's the "best"... but at the same time, I think it's invaluable to be able to compare multiple reviews with different perspectives to get the whole picture. I wouldn't trust any one site's review to give me a complete picture of a product, but if you take 2, 3, 4, etc. different reviews and compare the big picture, it's only going to help you make an informed decision, especially if they use different methods of testing and you understand the manner in which those tests are performed.
 
Mentok1982,

I, as a gaming enthusiast, thank you for posting that article in the hope of shedding some light on the deceitful benchmarks that have plagued us (and probably always will). With the help of people like you, with leverage in GPU hotspots, I hope the article Kyle presented will not be in vain.

With that said, I hope those of you with leverage (and even those without) take Mentok's lead. Spread the word. Hell, just tell a friend who may be a graphics enthusiast. Spark interest in making something along these lines a standard, rather than timedemos or some of the currently accepted ways of benchmarking.

Viva la revolution
 