If you play through a level with FRAPS running and generate those numbers, fine, but how do you do that consistently? You cannot, as the user and the game engine will never generate the same results twice. Therefore, this testing methodology is not accurate and is nothing more than an estimation of performance within a single gameplay setting. If we hang our results on card A being 2 FPS faster than card B with this method, the comparison is useless, as the results will vary too much to generate meaningful numbers.
Who is to say the levels they play through are the right ones? Do they test multiplayer? I would just like to see the "actual" gameplay sequences they are using to report the numbers. Can I recreate that gameplay to see how well my system performs, or do I continue to count on my seat-of-the-pants feeling about what my system is doing, or blindly trust the results? One good thing about built-in benchmarks is the ability to get 90% of the way there when it comes to setting up a system quickly, or having fun comparing numbers against my friends' systems, which usually results in a credit card being used a few minutes afterwards.
You don't make sense though.
I would agree with you if you had a way to get the same run-through each time. However, you don't.
I could understand using a timedemo that you guys created to run through a section of the game. That way it would be the same each time. However, since you don't, you're using two different sets of data to come to a flawed conclusion.
Thanks for the reply, Kyle. I do not think you actually answered my question. I fully understand that you play through the game, you then decide to take a particular level or section of the game that best represents overall gameplay, set up the system, (insert my question below here), chart that data, report it for those of us who like numbers, and then come to an objective/subjective conclusion based on how well the game played utilizing a variety of settings (resolutions, AA/AF, etc.). That is all fine and dandy, and the experiences commented on during actual gameplay are wonderful in my opinion.
However, my question is how do you capture the performance data to create those charts and FPS results? Do you create timedemos while playing the game, then replay them with the game engine while FRAPS is running to collect the performance data, or do you (your staff) play through a particular level with FRAPS running and collect that data for reporting the numbers? If it is the latter, how do you ensure consistency between test runs, what variability do you see between each run, and are your FPS results then averaged for a composite score? If you use timedemos, how are yours different from those of other sites that use the same testing methodology with FRAPS or the game engine to report results? I think answering these questions would clear up a lot of what has been said today.
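(For anyone wondering what the mechanics of the FRAPS approach look like, here is a minimal sketch of turning a per-frame capture into the numbers that end up on a chart. This is not [H]'s actual tooling: it assumes a log with one cumulative millisecond timestamp per frame, roughly what FRAPS' frametimes CSV contains, and the file names are made up.)

```python
# Minimal sketch: turn FRAPS-style frametime logs into avg/min FPS numbers,
# then average several manual runs into a composite score. The log format
# (cumulative ms timestamps, one per frame) and file names are assumptions.

def load_frametimes(path):
    """Read cumulative frame timestamps (ms) from a log, skipping headers."""
    times = []
    with open(path) as f:
        for line in f:
            field = line.strip().split(",")[-1]
            try:
                times.append(float(field))
            except ValueError:
                pass  # header or blank line
    return times

def fps_stats(timestamps_ms):
    """Average FPS over the run and worst instantaneous (per-frame) FPS."""
    deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    total_s = (timestamps_ms[-1] - timestamps_ms[0]) / 1000.0
    avg_fps = len(deltas) / total_s
    min_fps = 1000.0 / max(deltas)  # slowest single frame
    return avg_fps, min_fps

# Composite over three gameplay runs (hypothetical file names).
stats = [fps_stats(load_frametimes(p)) for p in ("run1.csv", "run2.csv", "run3.csv")]
print("composite avg FPS:", sum(a for a, _ in stats) / len(stats))
print("composite min FPS:", sum(m for _, m in stats) / len(stats))
```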
Click the link that says discuss this in our forums at the end of the article
Here is the whole post, and there is some more information in that thread as well as in the review. Each test is run three times per card and the results are averaged:
Thanks for the words of support today guys, much appreciated. Also be sure to check out the other reviews; we posted quite a few in the news today. I was a little disappointed with HardOCP and their stance today on their front page, stating that apparently everyone else is using "canned" benchmarks and they are the only ones who aren't.
We haven't been using premade benchmark scripts for a very long time (we also dropped 3DMark in reviews before HardOCP did); Stuart uses real gameplay. This is in fact how we noticed the minimum frames-per-second issue we noted (page 17 of the review). I don't think anyone else has even covered this in detail, take from that what you will! Not only that, but we managed to get a statement direct from AMD explaining why. We have spent the last week working with AMD on the drivers, so I think it's pretty fair to say we know what we are talking about.
That said, I'd rather not dwell on this. I hope our review has been informative, entertaining, and educational.
On every graph I've seen in these reviews, the FPS rises and dips in almost identical spots on all the video cards tested. That shows that the same areas, and very similar battles/whatever, are happening on both video cards.
As I said, I read the article. I saw the note that said they ran their tests three times and averaged the results. While that still is not specific as to how they test, it is the exact wording used to describe the canned benchmark method on most sites.
When he says, "oh no, we use real-world tests," does he mean they create their own time demos instead of using ones included with the game, or does he mean they FRAPS actual live gameplay like [H] does? There is a difference in the quality and accuracy of the results. For example, given how much higher the AA settings and framerates were on their tests for COD4 at the same resolution as [H] used, I would tend to believe that they used a testing method which did not capture all the elements of real gameplay.
ATI needs to realize that no matter what they do to it, the R600 will never beat the G80, and get to work on the R700. Maybe it'll be somewhat competitive with the GT200 or whatever the next-gen GeForce will be.
WOW, 30 pages... I can make this easy for you folks whining and crying. Read Hardocp, then Anandtech, then <insert your favorite review site>. After that, take all that you read and use that goblet of information for your needs.
People who complain heavily about H's methods are just fuckin lazy. Read, you shmucks. Read more than one review site to garner your opinion about these complex technical devices.
/end thread
ps: thanks H. Because of you I bought an 8800 GT for $230 and I have been happily gaming on my 24in and 22in LCDs at their native resolutions.
pps: It's very funny. I know I said it many times before, but it's funny to see AMD losing with AA/AF enabled. AMD is the reason [H] even started making a big deal about AA/AF performance. These young members that joined/lurked before the 9700 days are too young to be giving opinions, lol.
The only reason [H] would need to show canned benchmarks would be to quiet down those who think there must be something wrong with the hardware or config Kyle/Brent use.
The results [H] got speak for themselves.
It's even more work for them to include other benchmarks that don't show how it performs in games, so I understand why they don't include them.
They speak for themselves how?
All I see is Kyle defending himself and his testing method, which is getting drastically different results.
Why are they getting drastically different results, and is their way of testing really showing us a true example of how these cards will play games?
I don't think his way is any more or less valid than canned tests; however, with canned tests you can repeat them and they will be the same every time you run them, which gets rid of as many X factors as possible.
I mean, every time I hear about a new game it's always about some crazy new AI that will make the bad guys act more human and react better than the older ones, and each time you play the game you will experience something new.
So tell me, how can benchmarking that with no controls produce correct results?
http://www.theinquirer.net/gb/inquirer/news/2008/01/28/review
I don't know if this post has been mentioned yet, but it is basically the INQ showing off links to a whole bunch of 3870 X2 reviews, yet it lacks the HardOCP review, and I seriously have to wonder why. HardOCP is as big a name as any of the other twenty-something reviews on there, but it seems to be the only one using a testing methodology that works.
Crysis GPU test = high FPS
Crysis Assault Map = low FPS
Which do I want to see how well the video card in question does? Assault....
Call of Duty 4 "Opening Cutscene"= high FPS, who cares?
Call of Duty 4 "The Bog" level, one of the most demanding in the game = low FPS
I want to know how hard the video card is going to be hit.... not how fluid that bastard's smoking is going to look.
This keeps getting explained to you, but you refuse to accept the answer. Play through even five minutes of a level in almost any game. Even if you don't shoot someone as fast as you did the first run-through, you can still basically mimic the gameplay: where you go, what you see.
You use the advanced AI as an excuse that you can't replay a level the same way twice. Although that's what the developers want you to believe, how often is that actually the case? BioShock had incredible life-altering AI?
Yes, the new ATI card is better than the Ultra. It beats the Ultra in price/performance and in performance in general.
Sorry Kyle, I just don't agree with you. You've voiced your opinion many times.
Also, please take this last tidbit into account...
OVERCLOCKING!!!!
I'm glad you've heard me. I was responding to others that were quoting me.
As for cursing, it's not my cup of tea. You attract more flies with honey than vinegar.
As for something new to say, how about a question:
Can you run an X2 and a 3870 in a tri-CrossFire-type deal? I know the X2 has the slower RAM, but could they handle that in software?
I'm disagreeing, and you have yet to explain to me how you're able to get accurate results from something as random as playing through a level.
Another fact to remember is that gaining performance by adding GPUs is a process of ever-diminishing returns. I suspect that combining two of these cards in a CrossFire arrangement (effectively 4-way CrossFire) will not be as efficient as adding two single-GPU NVIDIA cards together in an SLI arrangement. Plus, there's nothing stopping NVIDIA doing exactly the same with one of their recent cores, such as the 8800 GT. However, despite having already dipped its toes into the waters of dual-GPU solutions, this approach does go against the flow where NVIDIA is concerned. Don't forget that NVIDIA absorbed 3dfx, which faltered in part from developing multi-GPU solutions back in 1999. Such solutions result in extremely complex and expensive boards. NVIDIA's instinct has been to move in the opposite direction and reduce board complexity while maintaining performance. The latest 8800 incarnations have been a prime example. Whether this consolidation of components is a prelude to squeezing two GPUs onto one board, only time and the rumor-mill will tell.
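(To put a rough number on those diminishing returns, here is a toy Amdahl-style scaling model. It is purely illustrative: the 85% parallel fraction is an assumption, not a measured figure for CrossFire or SLI.)

```python
# Toy Amdahl-style model of multi-GPU scaling. The 0.85 parallel fraction
# is an invented assumption for illustration, not a measured value.
def speedup(n_gpus, parallel_fraction=0.85):
    """Ideal speedup when only part of the frame workload scales with GPUs."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_gpus)

for n in (1, 2, 3, 4):
    print(f"{n} GPU(s): {speedup(n):.2f}x  ({speedup(n)/n:.0%} efficiency per GPU)")
```

Under those assumptions, 2 GPUs give about 1.74x and 4 GPUs only about 2.76x, so each extra GPU buys less than the last, which is the sense in which a 4-way arrangement is likely to be less efficient than 2-way SLI.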
Basic probability works rather nicely. Three "gameplay" runs through the game with Card 1. Three "gameplay" runs through the game with Card 2. There is no way for any of these runs to be identical to any other, but the reviewer is (presumably) attempting to make them similar. Now, if you presume that the reviewer is unbiased, anomalies will happen RANDOMLY throughout the runs. Run1/Card1 might have a couple guys popping grenades all at once thirty seconds in, but it is equally likely that Run3/Card2 will have a smoke grenade going off three minutes in. As these anomalies are random, they can be accounted for by taking enough samples. In [H]'s case, the samples appear to be several minutes of gameplay, several times. Law of large numbers at work here: you take enough samples, you get a meaningful answer that averages out anomalies. It is statistically possible to still end up with a meaningless result, but it is much less likely. I assume it is part of the reviewer's job to interpret the results and judge whether more testing is necessary.
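(A quick simulation makes the averaging argument concrete. This is a sketch with invented numbers: two hypothetical cards with true averages of 60 and 62 FPS and run-to-run noise of 4 FPS.)

```python
# Sketch: simulate noisy "gameplay runs" for two hypothetical cards and show
# that averaging more runs makes the true ranking come out more reliably.
# The FPS means (60 vs 62) and the noise level (4 FPS) are invented.
import random

random.seed(42)

def run_fps(true_mean, noise=4.0):
    """One gameplay run: true performance plus random in-game variation."""
    return random.gauss(true_mean, noise)

def trial(runs_per_card):
    """Benchmark both cards; return True if the genuinely faster card wins."""
    card1 = sum(run_fps(60.0) for _ in range(runs_per_card)) / runs_per_card
    card2 = sum(run_fps(62.0) for _ in range(runs_per_card)) / runs_per_card
    return card2 > card1

for runs in (1, 3, 10):
    wins = sum(trial(runs) for _ in range(10_000))
    print(f"{runs} run(s) per card: faster card ranked correctly "
          f"{wins / 10_000:.0%} of the time")
```

With these made-up numbers, one run per card picks the right winner only around two-thirds of the time, while three runs per card is noticeably better and ten better still, which is exactly the law-of-large-numbers effect described above.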
This methodology is ENTIRELY dependent on reviewer integrity, but it is also less susceptible to manufacturer tampering. How do you know Graphics Company #3 didn't spend man hours spanking their drivers to render water a little faster during one part of a canned flyby?
Frankly, this is one hell of a risk for Kyle to take. If the reviewers are EVER caught with their hands in the cookie jar, this site will probably pull a Gamespot.
It's in that thread, man. Here, I will quote it out for you:
I take it you aren't a regular, so I'll break it down. We use real gameplay to test graphics cards, and we have done so for a long time. If we used pre-built benchmarks or timedemos, we wouldn't have found the minimum frame rate issue, as the majority of sites who used these still haven't even found it. This is only seen with extended gameplay, or what I'd like to call "real life" gameplay. We play games for hours behind the scenes with new cards before we even start benchmarking. This is the best way to find or note anything unusual before we get into the in-depth review testing. This is how we found the issues that caused AMD to delay their NDA for a week, issues we helped solve.
If you would like anything else explained, please ask away.
Every single test in the review used real gameplay... I can confirm that, as I was the one who tested the cards and wrote the article. We/I haven't used timedemos or built-in benchmarks for years, and I have never benched a cutscene in my life.
And here's a link to the thread:
http://www.driverheaven.net/news/153658-dh-review-ati-hd-3870-x2-2.html
What's throwing me is that Driver Heaven claims to have run their tests the same way Kyle does... and in COD4 at similar settings the numbers are way off.
The 3870 X2 slaps up the Ultra on DH... but loses to the GTX at the same res on [H].