Benchmarking the Benchmarks @ [H]

I'm getting a chuckle out of those spouting the scientific method. The point of the scientific method is to test a theory and prove it in the real world, which is exactly what the timedemos do NOT do. They neither represent real gameplay of the games they're trying to represent, nor do their results match what you see in the real world.
 
I am curious why the article's tests were run on a system that Crysis eats for breakfast with a smile on its face. Dual core? 2GB of RAM? Try eliminating the system as a bottleneck.
 
I'm getting a chuckle out of those spouting the scientific method. The point of the scientific method is to test a theory and prove it in the real world, which is exactly what the timedemos do NOT do. They neither represent real gameplay of the games they're trying to represent, nor do their results match what you see in the real world.

I was eating some skinless, dry chicken breasts and then I regurgitated all over my keyboard from laughter.

And yes, I regurgitate when I laugh.
 
I think you're exaggerating the "popping in the card" part, as the testing methodology is much more complex than that.

It's also a rather weak analogy to compare video cards to power supplies, as there's a lot of *important* information under the hood that goes undetected without the proper equipment.

I absolutely agree that what goes on under the hood is very important, especially for power supplies.

True, you can't compare video cards to power supplies in a simple manner. But if user experience playing games is the true barometer of hardware 'worthiness', as the article suggests, then a power supply that seems to run fine playing a game for a few hours over a couple of days, and is not operating outside of ATX specs, should be as good as a power supply that endures a [H] torture test with flying colors. However, as a user I appreciate the attention to detail that [H] puts into their power supply reviews and all the data/variables they supply... just like I like Anandtech's reviews for the same reason.

Also, the scientific method involves taking all the variables and being able to replicate them. Whether or not Anandtech uses a timedemo, as long as they give you all the variables they used (including the timedemo), another benchmarker should get the same results, within a respectable margin of error, using the same variables.

With [H] it would be impossible to do this. By using their experiences playing the game, we could not replicate which tree was knocked down, when a grenade was thrown, etc. All of these experiences in a game will affect the numbers in some way. This is what I mean by "popping" in the card.

Bottom line is:
If a purchaser just looks at a few graphs or just the "Conclusion/Award" page to make a buying decision, they risk making a bad one. You need to look at the setup variables (e.g. a benchmark or experience chart for a video card tells me little if you're using a vastly superior CPU/motherboard) and at what is being evaluated, and then make a decision. You should also look to other sources if you cannot confirm the numbers yourself.
 
Frankly, [H] was better when they didn't spend so much time thumping their chests. It's gotten old :rolleyes: "We're right and everyone else is stupid!"
 
Good article.

Curious - why do you guys use only 2GB of RAM with a 64-bit OS? Seems like a weird pairing to me.

As I see it, the only good reason to use a 64-bit OS is to be able to access more memory than the 32-bit version can.

As to the article, I agree, but please try to include "apples to apples" in each review. It would also be nice to see some more mainstream cards included in the comparisons, instead of exclusively the very recent high-end cards that [H] reviews almost always feature. Owners of older cards could then see how much of an improvement they'd likely get when stepping up to a newer card, whether the card being reviewed is a $100, $200, or $300 card.
 
Okay, a bit of trolling ;)

Cevat Yerli said something about a team of nVidia engineers helping with the development of CryENGINE 2. It's not surprising, then, that nVidia has an advantage here and a significant head start. That's not even considering that the 3870 X2 is a fairly new card - a CrossFire solution - which will definitely show improvement with the next driver releases. And yes, I realize that's mentioned in the original 3870 X2 review.

Optimizing for "canned" benchmarks is a cynical marketing policy of both vendors, and it deserves to be exposed. Thanks for that.

I've been reading [H] video card reviews for a while now, but I always had this feeling that something about the methodology wasn't right. I can't and won't suggest how to improve it, but reading this thread showed me that a huge number of readers like it, so keep it coming and don't worry about the haters. They only prove that even for them, [H]ard|OCP is important enough to express their views about.
 
THIS!
The article is a wake-up call for PC sites out there; relying on canned demos provided by the game just doesn't cut it. If I went to AT and saw the Crysis results there I would think I could game at 1920 resolution with HIGH settings. The fact is you can't. The review is therefore defunct as a review. It does not help one to buy the correct GPU to fit their monitor.
I hope that sites like AT have the balls to fess up that they are wrong and start putting the hard yards in to produce complete reviews. If they included both timedemos AND real-time play, we could make up our own minds!

The problem is that I don't look at those reviews expecting to know what card X can do in setup Y at settings Z, I look at those reviews expecting to see that card X is faster than card A and B in some certain price range. The point is relative performance, not absolute. They wouldn't even have to post numbers for me to get from those graphs what I want, all they would have to show is an accurate relative representation of the performance of the cards versus each other.
 
If I went to AT and saw the Crysis results there I would think I could game at 1920 resolution with HIGH settings. The fact is you can't. The review is therefore defunct as a review.

LOL, you absolutely can run Crysis at 1920x1200 with all settings on High with a 3870 X2. I have one and have been doing it for a week; FPS ranges between 35-50 in DX10 under Vista. Granted, it's 35 FPS the majority of the time, but it is very much playable. So exactly which review is defunct here? The one that tests on a current top-of-the-line quad core @ 3.8GHz, similar to what I am running, or the one that tests on an antiquated dual core @ 2.9GHz?
 
The quote from Derek Wilson alone is worth its weight in gold. Not to mention it sounds like he's been drinking Futuremark kool-aid. I wonder what that quote would look like if he was reviewing, say, automobiles - "Driving the car is not necessarily required."
 
I hope you realize that it's not the specific testing that Kyle did here that is the main point, but the methodology OF his testing.

Which is to say what?
That we test differently is more important than how we do it or whether it's internally or externally valid? No, that's not good enough.
Others did the 'playable' tests too, and it doesn't mean they are better or worse than anything else. Just another source of info.

Rightmark and 3DMark can tell you a lot, but not about how a card will play in Crysis; no more than [H]'s test guarantees anything for someone whose games, settings, and rig differ from theirs. Pretending otherwise is pretending there is no system influence, which should obviously not be the case.

Sure, we could switch settings around (for all those who want to debate which are 'most important'), change configurations, play the game differently. Yes, they could do all those things, but they won't. They can't. It would simply take far too long to try every single test for every single configuration, setting, and playstyle.

I don't think anyone expects it to test ALL things for ALL people. However, from a review that prides itself on playing the whole game to the end, and then criticizes others for not having the time, I don't want to hear about time issues as the excuse for not testing more than one setting and not doing more apples-to-apples. You can't have it both ways. Over-analyzing a single scenario by running it 10 times on one setup is just as bad as doing one run-through on multiple setups; neither gives you a more global picture that is internally valid. That's why I don't give them any more weight than any other site: it's only a SLICE of the overall picture, valid only for their tests.

With that said, again, the point is what they are doing is on the right track, and yields far more accurate results than running a timedemo for those that wish to see results that they may apply to a real gaming experience. It's much more acceptable than basing everything off of an optimized benchmark.

Actually, that's true as long as you're talking about timedemos that are public and that the IHVs are given the opportunity to optimize for (do beta drivers ever make you wonder about that?). However, create your own timedemo and you shouldn't have to worry about that. I would prefer that a computer do the strafing and running and looking rather than hoping the reviewer can roughly match the path he took last time. There's a flaw in both the canned demo and the 'race car driver's muscle memory demo' when trying to treat them as equal; neither can show the whole story.

Do you see?

Oh, I saw the benefit of non-standard testing when this first came up during the whole FX fiasco, but the real question is: can you see the benefits of any other methods that don't fit the [H] mould?

I'm looking for more information, not someone trying to tell me which is 'right' or 'wrong' information in their opinion, I'm capable of figuring that out myself thanks.
 
This "benchmarking the benchmarks" exercise only manages to reinforce that the timedemos Anandtech and other websites use are relevant.

In HardOCP's real-time testing, the Radeon 3870 X2 is 85% as fast as the 8800 GTX. In the built-in Island GPU timedemo comparison, the Radeon 3870 X2 is 88% as fast as the 8800 GTX. Finally, in their comparison with a custom timedemo, the Radeon 3870 X2 is 91% as fast as the 8800 GTX.

These results, I argue, are all well within a reasonable margin of error. In other words, while the framerates seen in the timedemos are not the framerates we will see while playing the game, timedemos still accurately show how one video card performs relative to another.

Basically, given the above numbers, Kyle just validated Anandtech's, and other websites', methods.
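The relative-performance argument above can be sketched as a quick calculation. The three ratios are the ones quoted in the post; the 10% tolerance is an illustrative assumption of mine, not a statistically derived margin of error:

```python
# Relative performance of the 3870 X2 vs. the 8800 GTX under three test
# methods, as quoted in the post. The 10% tolerance is assumed for
# illustration only.
methods = {
    "real-time gameplay": 0.85,
    "built-in Island timedemo": 0.88,
    "custom timedemo": 0.91,
}

mean_ratio = sum(methods.values()) / len(methods)
spread = max(methods.values()) - min(methods.values())

print(f"mean relative performance: {mean_ratio:.2f}")  # 0.88
print(f"spread across methods:     {spread:.2f}")      # 0.06

# If the spread is small, the three methods agree on *relative* ranking
# even though their absolute frame rates differ.
print("agree within tolerance:", spread <= 0.10)
```

Whether a 6-point spread counts as "well within" the margin of error is exactly the judgment call the thread is arguing about.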
 
The quote from Derek Wilson alone is worth its weight in gold. Not to mention it sounds like he's been drinking Futuremark kool-aid. I wonder what that quote would look like if he was reviewing, say, automobiles - "Driving the car is not necessarily required."
What quote? The one where he said that Anandtech didn't use cutscene testing, even though they mentioned it in two games?
 
I'm also looking for more information, which is why I'm here, and searching many other places. I, unlike you, feel this slice is more representative of a real gaming experience, and have acknowledged that it isn't completely polished, and that we can learn something from future methodologies. I am not saying anything is 'wrong' or 'right', but rather that this method is more representative of obtaining results of a real gaming experience (which is what I and many other gamers seek) than what I've seen from the majority of other sites that rely on optimized benchmarks.

Again, it's a step in the right direction. Stagnant benchmarking with the current method of timedemos and 3DMark results just isn't sufficient anymore. People are starting to wise up (like me, after I bought a card and was fooled) and expect results that apply to them, not a company's marketing campaign.

So, I guess we pretty much agree, however, I'm having trouble making myself clear.
 
What is old is new again. So many posts are reminiscent of this topic when Kyle switched to the new methodology five years ago. I'm half-way tempted to repost something I wrote on the subject then. Others have posted comments that cover what I wanted to say, but there were a few people with some questions about the methodology being used that can be answered.

Critique #1: Not having a pre-recorded demo introduces too many variances. A standard recording eliminates this.
Answer: It's a valid critique. Manually running through a script does produce variances. This isn't a problem as long as the test is rerun and the results are averaged out over time. That is perfectly acceptable methodology. The easiest way to demonstrate it is flipping a coin. Flip it once, it comes up heads. Flip it again, it could come up heads again. However, flip it enough times and you will see the results even out to about 50-50 between heads and tails. All [H] needs to do is ensure that enough tests are run to minimize the variances.
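The coin-flip point can be illustrated with a small simulation. Everything here is invented for the sketch: a hypothetical "true" average frame rate of 40 fps, and a ±4 fps run-to-run wobble standing in for the variance a human player introduces:

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_FPS = 40.0  # hypothetical "true" average frame rate of the card
NOISE = 4.0      # hypothetical run-to-run variation from manual play

def one_run():
    """One manual run-through: the true value plus human-induced noise."""
    return TRUE_FPS + random.uniform(-NOISE, NOISE)

def averaged(n_runs):
    """Average n repeated run-throughs, as the answer above suggests."""
    return sum(one_run() for _ in range(n_runs)) / n_runs

# More runs -> the average settles toward the true value, just as more
# coin flips settle toward 50/50.
for n in (1, 5, 25, 100):
    result = averaged(n)
    print(f"{n:3d} runs -> {result:5.2f} fps (error {abs(result - TRUE_FPS):.2f})")
```

The catch, of course, is "enough tests": a single manual run can be off by the full ±4 fps, so a review that averages only a couple of run-throughs still carries meaningful noise.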

Critique #2: I'll quote Zoson who actually had an insightful comment.

@Niceone
Who do you guys think designs these cards? Scientists and engineers.
How do you think they quantify the performance of the cards beforehand, for games that have yet to be released? How do you think companies can claim 'our new generation performs xx% better than the last!' before the silicon has even taped out?

With numbers and benchmarks.

How do I know? I did several years of VLSI (Very Large Scale Integration) design.

I sat there with a simulator that showed me my gated clocks. I crunched, by hand, how fast my devices would be on certain hard paths, and then determined their nominal clock speed.
Answer: Very true that scientists and engineers design these cards. Sadly, it's the marketing department that takes their benchmarks and uses the numbers without knowing their meaning, or, more often than not, knows what a number means and converts it into something that seems fantastic. It's that old adage about "lies, damned lies and statistics." The easiest to spot are bar graphs where one product seems vastly superior, but when you are able to determine the actual values, there isn't that much of a difference.

The second part of this critique involves using simulators. (Frankly, I'm surprised you brought this up as supporting evidence.) A simulator can only give you an approximation of real-world performance. It still isn't the same thing, nor can you be assured that the simulator is perfect. There's still a certain amount of guesswork involved. With enough practice, skill and knowledge it's very possible to get results that are extremely close to reality. This logic actually supports the [H] methodology as Kyle's article outlined.

Critique #3: FRAPS overhead.
Answer: As long as all variables are the same, with the exception of the video card being tested, the overhead is the same for all tests. Any difference in overhead would be caused by a conflict between FRAPS and a particular driver or hardware set. Isn't that something you would want to know about?
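The "same overhead for all tests" argument can be sketched with a toy calculation. The 0.5 ms per-frame cost standing in for the capture tool, and both cards' frame times, are invented numbers; the point is only that a constant overhead lowers both cards' absolute figures while barely moving the relative comparison:

```python
OVERHEAD_MS = 0.5  # hypothetical per-frame capture cost (invented figure)

def avg_fps(frame_times_ms, overhead_ms=0.0):
    """Average FPS when each frame costs its render time plus a fixed overhead."""
    total_ms = sum(t + overhead_ms for t in frame_times_ms)
    return 1000.0 * len(frame_times_ms) / total_ms

card_a = [20.0] * 100  # a steady 50 fps card (hypothetical)
card_b = [25.0] * 100  # a steady 40 fps card (hypothetical)

ratio_raw = avg_fps(card_b) / avg_fps(card_a)
ratio_capped = avg_fps(card_b, OVERHEAD_MS) / avg_fps(card_a, OVERHEAD_MS)

print(f"without capture overhead: B/A = {ratio_raw:.3f}")    # 0.800
print(f"with capture overhead:    B/A = {ratio_capped:.3f}")  # 0.804
```

Note the overhead does not cancel exactly in the ratio, but for a per-frame cost small relative to the frame time, the shift is well inside normal run-to-run noise.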

Critique #4: Hardware tested on doesn't compare to my hardware/hardware used for another review.
Answer: I've seen a few people mention this in the posts here. "My PC has more/less memory" or "This site used this chipset". You shouldn't compare raw data across review sites, or even compare it to your own computer. Comparing two sets of aggregate data lumped together is like comparing Texas to New York. Which is better? Well, what exactly are you comparing? Population? Climate? Resources?

The fact is that every reviewer has their own methodology. There are certain metrics that can be measured and compared, but there are also subjective viewpoints like picture quality. If you are bound and determined to compare metrics, at least try to compare equal metrics. Comparing results from one site to another isn't going to give you anything other than a headache.
 
I think the in-game testing is the way to go, but:

Just to play devil's advocate here: there is still the question of why, when you run the timedemo like everyone else, the X2 loses to the GTX. I understand that FRAPS and the engine-reported FPS may not agree; that's not really in question. The fact is you've now run the test the way other sites have, and you still have a totally different result/conclusion than they do.

So this raises the question: where does THIS discrepancy come from?
OK, so it's super late and I'm super tired and I think I'm missing something too, because I *get* the article and everything, but I must be reading the numbers backwards: it seems like the ATI card does worse regardless, as the guy I quoted points out. Shouldn't it be doing better, as the words in the article suggest?

and again
 
Again, it's a step in the right direction. Stagnant benchmarking with the current method of timedemos and 3DMark results just isn't sufficient anymore. People are starting to wise up (like me, after I bought a card and was fooled) and expect results that apply to them, not a company's marketing campaign.

I think it's one step in the right direction, but I also preferred it when there was less editorial content, which does seem to be there more to promote a difference than to actually look at the playability of the cards.
The problem with a review in [H]'s format is that it relies on conflict or differences. If the cards are indeed equals and equally playable (however improbable that may be), then the review has fewer salient points to draw upon. Saying "A and B are the same, and here are our 6-10 tests showing that" provides the reader with less information than histograms from a ton of non-standard games in timed demos at various resolutions, which would give more to those stuck at 1280x1024 or 1440x900 or 1920x1200, etc., whose major option is an increase in AA. Once again it comes down to what's more valuable: 1920x1200 with no AA, or 1280x1024 with 4-8x AA? And why?

So I don't trust [H]'s conflict-based reviews as much as when they compare two models from the same IHV. Some places like Beyond3D did this long ago to remove a lot of the Red vs. Green BS and focus more on the hardware itself, without questionable beta drivers too.

For [H] there are more page hits in saying A > B than in saying everything's pretty equal. And probably far more page hits in saying something that goes against commonly held beliefs than in just saying 'us too'.

So without something to back up the opinions, it's just another review with just another kind of dataset, this one qualitative rather than quantitative.
 
Great article! Benchmark inflating is definitely one of the worst things in the industry, and it's designed to prey on those who don't know any better.

To those who complain that real-world testing isn't scientific, all I can say is that neither are timedemos. The scientific method calls for fair comparisons without confounding factors; the state of industry timedemo benchmarking is rather like taking the temperature of a room while someone holds a lighter to the end of the thermometer. Sure, the thermometer works just fine and you get a reading, but the subject you're testing is throwing the results.

Like others, I think it would be nice to see charts of all the GPUs' speeds relative to one another. Unfortunately, real-world testing (RWT) just isn't a good way to do that. An apples-to-apples test of an 8500 GS vs. an 8800 Ultra in, say, Crysis or Oblivion is not a fair indicator of either. Maybe, though, apples-to-apples charts of high-end, mid-range, and value cards within the current product generation could work?

Also, I think [H] reflects the way people shop for their components. When I set out to buy my GTX, I had that price range in mind and wasn't interested in how a mid-range or value card stacked up against a GTX. I was interested in what was, at the time, the biggest, baddest card on the block, and wanted to see how that class of cards compared to each other. In that respect I think [H] does a good job of picking its review matchups.

Personally I read both sites. I mean, they're free, why the hell not! But I feel that RWT is definitely the more reliable indicator of my experience with the product.
 
I agree with [H]'s testing methodology. However, this article really preaches to the choir. A lot of words and phrases are repeated, and the article comes across as defensive at best, and trolling at worst.

I understand that from a scientific perspective you want to reduce the number of variables to produce a reliable test baseline. Reliable tests produce reliable results, meaning that repeating the tests produces the same results. The technical descriptions are necessarily repetitive, but you don't have to convey them that way, and doing so detracts from the focus of the article.

Would you consider condensing the graphs/charts/numbers onto one page? This would save you from having to copy and paste paragraphs that are essentially long-winded captions. I believe the goal of the article is to show that timedemos and synthetic benchmarks are not accurate measurements of real-world performance. The science and the numbers support you, but it gets lost in your writing.
 
I play games, not benchmarks. I trust Hard[OCP] to give me the information I need to make intelligent purchases. I read a multitude of sites but I always come here before I spend my money.
 
I really trust you, Kyle, I really do. I come to HardOCP to read the evaluations of products and see what would be good if I were to build a new system. I trust the evaluations and I discuss them with friends who are considering new systems. Don't listen to the assholes on Digg; they are just fanboys of other sites who don't understand your cause, or don't game themselves but sit there running timedemos for their 12-year-old friends to show how they get 60fps, even though they would get 20fps while actually gaming.
 
Kyle, I think you lost your focus while completing this article. Instead of sticking to your guns and ignoring the few whiners, you've produced an article with minimal actual content and fanned the flames of an inter-site feud.

As to the content of your article, are we supposed to draw a conclusion from under a half-dozen benchmark configurations? You can't draw trends from that. Yes, the scaling percentage between the X2 and GTX cards in canned vs real benchmarks varies with resolution and settings; that's been true for a long time, what with differences in the strengths of each card. What is not clear is HOW exactly it varies, and WHY we really should care.

Want to tell us something USEFUL? How about you CHARACTERIZE the performance trends between these two cards and two tests, for a variety of resolutions and settings. Then you can actually draw supported conclusions in your article, instead of just unsupported speculation.

YES, it looks like the 3870 X2 scales better at HIGH setting than the GTX. NO, it is not remotely playable at that setting. YES, moving between the benchmarks, the GTX seems to scale less than the X2 *at some settings*. But because your article is so light on tests and actual data, your message does not come through.
 
What quote? The one where he said that Anandtech didn't use cutscene testing, even though they mentioned it in two games?

"To measure performance when playing The Witcher we ran FRAPS during the game's first major cutscene at the start of play." - Anand Lal Shimpi, Anandtech - 01/28/08

"We started recording frame rates as the cutscene faded in..." - Anand Lal Shimpi, Anandtech - 01/28/08

"...Call of Duty 4 also lacks any sort of in-game benchmark so we benchmark the cut scene at the beginning of the first mission."

Anandtech doesn't use cutscene testing, but Anand does. Good catch! :eek:
 
Kyle, I think you lost your focus while completing this article. Instead of sticking to your guns and ignoring the few whiners, you've produced an article with minimal actual content and fanned the flames of an inter-site feud.

As to the content of your article, are we supposed to draw a conclusion from under a half-dozen benchmark configurations? You can't draw trends from that. Yes, the scaling percentage between the X2 and GTX cards in canned vs real benchmarks varies with resolution and settings; that's been true for a long time, what with differences in the strengths of each card. What is not clear is HOW exactly it varies, and WHY we really should care.

Want to tell us something USEFUL? How about you CHARACTERIZE the performance trends between these two cards and two tests, for a variety of resolutions and settings. Then you can actually draw supported conclusions in your article, instead of just unsupported speculation.

YES, it looks like the 3870 X2 scales better at HIGH setting than the GTX. NO, it is not remotely playable at that setting. YES, moving between the benchmarks, the GTX seems to scale less than the X2 *at some settings*. But because your article is so light on tests and actual data, your message does not come through.

I would suggest that message does not come through, because that is not what the article is about.

As specifically spelled out on page 1...

On the following pages we are going to do our best to prove to our readers why we think our way of evaluating the experiences that video cards provide is ultimately better than a simple “review.”

As to the depth of the testing, that is spoken to as well in the conclusion.

One thing I do think is funny about all of this is that I am watching people comment on our evaluation process that obviously have not taken the time to even read what we have written here today.

I have really said all I wanted to say in this article today and I would count on seeing it linked in quite a few of my posts. That is one of the reasons that I published it as an article. Now I don't have to keep repeating it here in the forums. :D
 
I know from first-hand experience that companies routinely tune software/firmware to optimize for canned benchmarks... to recognize when one is being run and do things to inflate its score.

That's all I have to say about it.
 
I chose the 3870 because I knew that, at the resolution I intended to use, it would guarantee me 60 fps and look great. I also chose based on the stability of the drivers under Vista, which is what I use. I'm going to make a political comparison. Obama is a fantastic orator. He can bring out emotions from his audiences and get them excited like no one has in nearly 40+ years. But oratory like his doesn't guarantee that, when the time comes, he will actually DO what he says he wants to do. Clinton states, using logic and information, what she intends to do, how she intends to do it, and why she has to.

Kyle's belief and stance remind me of Obama: great passion signifying nothing.
No one is going to play the game exactly the way these guys play it every single time. Kyle can't say that his playing method is how everyone does it. Just because he believes the average person who buys a $500 3D card must only use it with a 30-inch screen, that's how he has to test it. But the reality is that not everyone has the same reasons for buying a 3D card, or the same intended use for it, or the same games they intend to use it with.

There are people who just play WoW. Would his test method really show them which card is best for them? What about someone who plays Supreme Commander but doesn't play online at all, because they prefer skirmishes against two AIs at a time: is this testing method going to help those who play that way? Or someone who only buys flight sims: which 3D card is best for Flight Simulator X with the expansion pack? How is this testing method really going to help those readers?

How is attacking other websites, which have never attacked this site over its testing method, going to help anyone? I don't think this brings anything good to this website. What controversy is there, other than what you want others to believe in?
Educate, don't preach.
 
This reminds me of how Tom Pabst behaved just before his website became the joke of the enthusiast community. Attacking someone else's reviews doesn't prove anything but a childish mentality of 'I'm better than you are.'

Nah, it proves pretty well that this website was in desperate need of some page views. This is typical behavior for Kyle and the staff here: every once in a while they get behind on advertising dollars, or they get a bug in their shorts or something, and they decide to start some stupid controversy for NO OTHER REASON than to get page views.

This is the same crap that rappers do when their record sales aren't doing so well and they're not getting radio airtime.

It is blatant sensationalism and nothing more. This article didn't prove anything that anyone who's been paying attention didn't already know. All it did was call out another reviewer and another website to start some stupid forum war and increase page views.

And people fall for it every single time. Hook, line and sinker.
 
Just wanted to let you know that I appreciate the "Play Experience" style of review much more than any other I've ever read. I really enjoy reading your reviews and I agree with the methodology. I honestly didn't even realize until now how you actually use the framerate data.

Keep up the good work, you are how I pick out my next video card.
 
Nah, it proves pretty well that this website was in desperate need of some page views. This is typical behavior for Kyle and the staff here: every once in a while they get behind on advertising dollars, or they get a bug in their shorts or something, and they decide to start some stupid controversy for NO OTHER REASON than to get page views.

This is the same crap that rappers do when their record sales aren't doing so well and they're not getting radio airtime.

It is blatant sensationalism and nothing more. This article didn't prove anything that anyone who's been paying attention didn't already know. All it did was call out another reviewer and another website to start some stupid forum war and increase page views.

And people fall for it every single time. Hook, line and sinker.

It's a bit more than that, although I agree they are hoping to get more hits, and why not :)
They need to defend their position on why their results don't show the 3870 X2 beating the 8800 GTX like other sites' results do.
They can't really demonstrate the issue without talking about results from other sites, so it's to be expected that they are mentioned.
Kyle did state that this is not a dig at other sites, but he needs to illustrate the point and provide the evidence.
 
As much as I'd love to read the entire article and all 16 pages of this thread, instead I'm going to play some Witcher (at 1920x1200 at MAX settings) with the 8800GT I chose because of real-world [H] benchmarks, which I scored for $220 thanks to a post on [H]ard|Forum. Benchies on other sites would have had me dropping another $150 on an 8800GTS for a minimal fps advantage. Thanks [H]!

- Mixx
 
"To measure performance when playing The Witcher we ran FRAPS during the game's first major cutscene at the start of play." - Anand Lal Shimpi, Anandtech - 01/28/08

"We started recording frame rates as the cutscene faded in..." - Anand Lal Shimpi, Anandtech - 01/28/08

Anandtech doesn't use cutscene testing, but Anand does. Good catch! :eek:

You forgot this:
"A surprisingly successful FPS on the PC, Call of Duty 4 also lacks any sort of in-game benchmark so we benchmark the cut scene at the beginning of the first mission."
 
It's a bit more than that, although I agree they are hoping to get more hits, and why not :)
They need to defend their position on why their results don't show the 3870 X2 beating the 8800 GTX like other sites' results do.
They can't really demonstrate the issue without talking about results from other sites, so it's to be expected that they are mentioned.
Kyle did state that this is not a dig at other sites, but he needs to illustrate the point and provide the evidence.


This article did nothing to explain their differing results, nor was it necessary to take a dig at another site. In reality, if you read this article you'd see that the conclusion they reach actually validates AT's testing method: the performance difference via "canned benchmarks", even though it showed higher FPS, was consistent with their initial review showing the 8800GTX performing on par with or better than the 3870x2. So I'm not sure why they would take a dig at them, other than to generate traffic from that site as others have suggested.

The reason for the 3870x2 smoking the GTX over at AT and not here is simply that Anandtech used a much better processor for the tests than [H] did (quad core at 3.8GHz vs. dual core at 2.9GHz). If [H] ran their tests with a quad at 3.6GHz or better, they would see much different results. Somehow I doubt they ever will, though, as that would involve admitting they got this review wrong. :)

Cheers
 
It's a bit more than that, although I agree they are hoping to get more hits, and why not :)
They need to defend their position on why they got results that don't show the 3870x2 being better than the 8800GTX like other sites have.
They can't really demonstrate the issue without talking about results from other sites, so it's to be expected that they are mentioned.
Kyle did state that this is not a dig at other sites, but he needs to illustrate the point and provide the evidence.

That's just it. Anyone who comes here for their 'evaluations' already knows about their different testing method. And while I agree to some extent that their method is useful, I think it's a stretch to call it superior and essentially place it on a pedestal above others.

But, as I've said and as anyone who's been here for any length of time already knows, this is par for the course. Some might call it 'sticking it to the man' or whatever, but really it's just a ploy to grab attention. I don't mind it so much when they go after companies and make a big stink. That's standard reporting, and it shows integrity not to buckle under the pressure of the companies whose products you're reviewing.

Going after other websites in the community is in bad taste. Anandtech is every bit as important in this community as this website, and they deserve better from a peer than to be called out in an article. It might not be 'meant as a dig at other sites', but that's what it was. If there was some sort of disagreement, it could have been addressed in private, or perhaps in a joint venture between both websites. Instead, this article just aims to stir the pot and create controversy where there shouldn't be any. Doubt me?

Discuss

Oh man, this should be a good one. Join our HardForum thread here.

The only redeeming thing about this sensationalist garbage is that it probably also raised Anandtech's page views for a little while. So at least there is that. But it's still very juvenile.
 
This methodology isn't good enough. It isn't good enough at all.

1. You cannot cherry pick just one game. Gimme a dozen or more.
2. You cannot deny canned results. They will compare to previous/similar hardware reasonably enough.
3. You cannot use Fraps as that is already a debatable product for these tests.
4. You cannot use player experience over such a minimal time frame locked into particular settings. The settings used depend on what the consumer is able and wants to run.
5. Give us all a minimum 10 FPS overhead as in the real world we have to run anti-virus, firewalls etc. :-D
 
This methodology isn't good enough. It isn't good enough at all.

1. You cannot cherry pick just one game. Gimme a dozen or more.
2. You cannot deny canned results. They will compare to previous/similar hardware reasonably enough.
3. You cannot use Fraps as that is already a debatable product for these tests.
4. You cannot use player experience over such a minimal time frame locked into particular settings. The settings used depend on what the consumer is able and wants to run.
5. Give us all a minimum 10 FPS overhead as in the real world we have to run anti-virus, firewalls etc. :-D

With just your first post I know you missed the point of the article. It wasn't to compare these graphics cards; it was to compare the benchmarking methods used and how they pertain to consumer-relevant info.
 
wow 16 pages!

I thoroughly enjoyed the article and wholeheartedly agree.
rock on Kyle! :)
 
As a long time reader I, and probably most of your audience, already know how you evaluate video cards from reading many reviews, so who is this article for?
 
With just your first post I know you missed the point of the article. It wasn't to compare these graphics cards; it was to compare the benchmarking methods used and how they pertain to consumer-relevant info.

Och, that was hardly my first post here; do a search, please. But I don't believe I missed the point.

As a consumer, HardOCP's info had provided additional detail which, now that the testing methods have been revealed, I find less relevant than I did previously. This is saddening, as until the recent article it was something I had put some store by.

If it weren't for the methods employed, I'd feel more assured about future purchases.
 