OK guys, I am in need of some third party advice on a system issue I've been dealing with for a couple of months now.
First the detailed system specs:
PSU: BFG LS680 (as reviewed here at the [H]), purchased late Nov 2008
http://hardocp.com/article/2008/06/09/bfg_ls_series_power_supplies/8
CPU: Intel Core i7 920 Rev C0. Currently not OC'd. Has been lightly in the past.
Heatsink: Cogage TRUE Spirit w/ stock fan.
Motherboard: Asus P6T Deluxe running current BIOS
RAM: Corsair TR3X6G1333C9 which is 3 x 2GB DIMMs.
http://www.corsair.com/_datasheets/tr3x6g1333c9.pdf - Corsair's PDF with specs on the kit as a whole
http://www.corsair.com/_datasheets/TW3X4G1333C9.pdf - the corresponding 4GB kit pdf that has more info on the individual modules (which are the exact same)
Other:
Creative X-Fi Titanium Fatal1ty
3 HDD
2 Optical drives
Dell Ultrasharp 2407 1920x1200 (main monitor, all gaming is on this one)
Dell Ultrasharp 2005 1680x1050
APC Back-UPS XS 1500 about 1 yr old
Win7 x64
Now for the video card details. (Note, I am not OC'ing these cards and heat does not appear to be an issue with any of them)
I bought an EVGA GTX260 (216) SSC Edition card in late December 2008. All was well, with no known issues, until around 2 months ago, when I decided to give Metro 2033 a try after seeing the talk about its great graphics. Previously I was a hardcore WoW raider and had never had any issues. I had also played Far Cry 2, Just Cause 2, Starcraft 2, and Civ5 without issue. Then I tried Metro and things changed, quickly. The crashes were very predictable. For those who have played it: at the very beginning, when you go to the surface with Miller, you put the gas mask on and go out into the hall. My system would crash coming up the stairs about 8 or 9 times out of 10. It was a hard lockup, and the screen would be a solid color. So to test my card's stability I ran Furmark, and it crashed about 30-90 seconds in almost every time. WoW, SC2, and Civ5 were still fine.
So my thought process was that Metro is a game known to push cards hard, and it was finding faults in my card that none of my other games could expose. I submitted a ticket with EVGA, we went through the standard stuff like reinstalling drivers, none of which helped, and they approved me for an RMA: another GTX 260. I got this card and still had some crashes in Metro and Furmark from the start, although less frequently than with the first card, so I emailed EVGA and let them know. Later that day I got a phone call from a friendly guy named Mike, who is an RMA manager at EVGA, to discuss my problem. We covered lots of things, including system specs, and he decided to send me another card, this time a standard-clock GTX280 that he himself would stress test first.
I got the GTX280 and put the pedal to the metal right away, and Metro, Furmark, and everything else ran great. Awesome, my faith in my system and EVGA was restored, for about a week. Then I started having issues again, but of a very different nature this time. Texture tearing was really bad in SC2, similar to how this screenshot looks under where it says "Lost Coast", only worse.
http://www.playtool.com/pages/artifacts/background010000.jpg
WoW had some tearing and a sprinkling of off pixels that appeared to be "stuck", most noticeable on the login screen. Civ5 had terrible tearing. Most games were unplayable. Sometimes it crashed when tearing was present, sometimes not. Very frustrating.
I informed Mike of these new developments, and we emailed back and forth about my system, specs, and testing. We checked my PSU's 12V rail, which seemed fine, and my RAM timings and voltage, which were fine. The problems grew worse over time, and soon it seemed anything that stressed the video card at all, including Flash 10.1 accelerated video, had a chance to crash it. I even did a fresh install of Win7 on a spare HDD, which made no difference. So he agreed to test and send me another GTX280.
This is the card I currently have, and I've had it for about 3 weeks. It was perfect at first as well, but it started messing up a week ago. These errors are different again: no texture tearing like before, and no crashes with solid color screens like the cards before that. When this one crashes it exhibits the signs of a video RAM issue, looking very much like this:
http://www.playtool.com/pages/artifacts/graphicscardmemory3.gif
Here are some things I have noticed about the issue, especially with the current card. Initially it was an odd crash here and there, but it became worse over time. It is by far most likely to crash when running a game either right after boot or right after coming out of sleep mode. Once I get a solid boot with no crashes, I can game away on it all day with no issues. However, if I put it into sleep mode when I go out or go to bed for the night, it is extremely likely to crash when I wake it. The same applies if I shut it down instead. One night I decided to perform a test: I had been gaming for a few hours without issue and let the machine run all night while I slept, and when I woke up I was able to run games fine with no crashing. I did try turning on ACPI 2.0 in the BIOS as well as Repost Video BIOS on S3 resume, just to see if it made a difference coming out of sleep mode. It has not changed anything.
Here is the testing I have done to qualify my system's stability.
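If it helps anyone reason about the cold-start pattern above, here is a minimal sketch of how a session log could be tallied to compare crash rates by how the session started. The category names and the `crash_rates` helper are made up for illustration, and the entries in `example_log` are placeholders, not my actual results.

```python
from collections import Counter


def crash_rates(sessions):
    """Return the fraction of crashed sessions per start type.

    sessions: iterable of (start_type, crashed) tuples, where start_type
    is a free-form label and crashed is True or False.
    """
    totals = Counter()
    crashes = Counter()
    for start_type, crashed in sessions:
        totals[start_type] += 1
        if crashed:
            crashes[start_type] += 1
    return {label: crashes[label] / totals[label] for label in totals}


# Placeholder entries only (hypothetical labels, NOT real measurements).
example_log = [
    ("cold_boot", True),
    ("resume_from_sleep", True),
    ("already_warm", False),
    ("cold_boot", False),
]

for label, rate in sorted(crash_rates(example_log).items()):
    print(f"{label}: {rate:.0%} of sessions crashed")
```

A few weeks of real entries would show whether the crashes really cluster on cold starts and wake-ups, or whether that is just my impression.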
CPU:
Prime95 is stable on all 3 standard torture tests (Small FFT, In-place large FFT, and Blend) for over an hour on each. This has been tested many times, and my CPU temps are good. I am quite sure my CPU is fine.
RAM:
Memtest86+ 4.10 reports no issues with my RAM. I have let several passes run on individual sticks, 2 at a time in alternating slots, and all 3 at a time.
My RAM is correctly programmed by SPD. It is running at 1.7V, which is plenty for triple channel. My current timings are 8-8-8-20 @ 1066MHz, which is more conservative than the SPD settings Corsair tests them at: 7-7-7-20 at 1066MHz (per the 2nd PDF link above for my RAM). I am quite certain my RAM is fine.
PSU:
In the BIOS my 12V rail reads 12.30V. At idle in Windows it is 12.21V. The lowest I've ever seen it go under extreme load, such as OCCT, Furmark, or Prime95 (max heat and power consumption) + Furmark, is 11.88V. I don't think that is any cause for concern. I know PSUs are common culprits when a system is suspected of killing video cards. I'm looking for opinions on this or suggestions for better testing.
Additional PSU note: I am able to see the load in Watts for my system on my UPS's display. The highest I've seen was while running Prime95 + Furmark, where it registered 520 Watts. My main monitor is on the UPS as well and was on at the time, but I have since measured it at 60-65 Watts on average, meaning my system's peak load on the UPS was 455-460 Watts, which should not be a problem for my 680W PSU.
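For reference, the ATX spec allows ±5% on the +12V rail, i.e. 11.40V to 12.60V. Here is a quick sketch of that check against the three readings above. The function names are my own, and keep in mind that BIOS and software sensor readings are only approximate, so a multimeter on the rail would be more trustworthy.

```python
NOMINAL_12V = 12.0
ATX_TOLERANCE = 0.05  # +/-5% allowed on the +12V rail


def deviation_pct(reading, nominal=NOMINAL_12V):
    """Deviation of a reading from nominal, in percent."""
    return (reading - nominal) / nominal * 100


def within_atx_tolerance(reading, nominal=NOMINAL_12V, tol=ATX_TOLERANCE):
    """True if the reading lies inside nominal +/- tol."""
    return abs(reading - nominal) <= nominal * tol


# The three readings quoted above.
readings = {
    "BIOS": 12.30,
    "Windows idle": 12.21,
    "lowest under load": 11.88,
}

for label, volts in readings.items():
    status = "OK" if within_atx_tolerance(volts) else "OUT OF SPEC"
    print(f"{label}: {volts:.2f}V ({deviation_pct(volts):+.1f}%) {status}")
```

All three readings sit well inside the window, but this only checks steady-state voltage as reported by the sensor. It says nothing about ripple or short transients, which a software reading cannot show.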
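The subtraction above, spelled out as a small sketch. One extra point: the UPS shows AC draw at the wall side of the PSU, while the 680W rating is DC output, so the load the PSU actually delivers is lower than the UPS figure. The 80% efficiency used below is purely an assumed number for illustration, not a measured value for the LS680.

```python
PSU_RATED_W = 680          # DC output rating of the PSU
ASSUMED_EFFICIENCY = 0.80  # ASSUMPTION for illustration only, not measured


def system_ac_draw(ups_total_w, monitor_min_w, monitor_max_w):
    """Range of system AC draw after removing the monitor's share."""
    return ups_total_w - monitor_max_w, ups_total_w - monitor_min_w


def estimated_dc_load(ac_draw_w, efficiency=ASSUMED_EFFICIENCY):
    """Approximate DC load delivered by the PSU for a given AC draw."""
    return ac_draw_w * efficiency


low_ac, high_ac = system_ac_draw(520, 60, 65)
print(f"System AC draw: {low_ac}-{high_ac} W")

worst_case_dc = estimated_dc_load(high_ac)
print(f"Estimated DC load at assumed efficiency: about {worst_case_dc:.0f} W")
print(f"Headroom against rating: about {PSU_RATED_W - worst_case_dc:.0f} W")
```

Even treating the full AC figure as DC load, the peak stays under the 680W rating. Total wattage alone does not show how that load is split across the individual rails.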
Motherboard & Other:
For the heck of it I have also run some other benchmarks stressing the subsystems, such as Sandra's benchmarks. No issues with any of this testing. I am unsure of good ways to specifically test my PCI-E slot, but my motherboard is the next most likely thing that could be causing issues (if it turns out to be my system).
With all the history out of the way I will say this. I have wavered back and forth between blaming their cards and my system. All the tests I know to run do not show any issues with my system. My initial GTX260 card was fine for over a year and a half on everything short of Metro and Furmark, and even then everything else I ran was fine up to the time when I sent it in. It showed none of the visible degradation over time that I've seen with their RMA cards. During the times when their RMA cards were making my system unusable, I often swapped them out for an old Geforce 7600 card. Over this whole ordeal it has spent a good deal of time in my system and has yet to exhibit any issues, even for the light gaming I have done on it (with much reduced settings of course). Furmark is 100% stable on the 7600 card btw.
I am also aware of where repaired/refurbished RMA cards come from. They are someone else's cards that failed and were repaired by the mfg. I know from these very HardForums that video RAM issues can often be caused by microfractures in the solder on RAM chips, and that the mfg will bake a card, similar to what some [H] posters have done, to re-liquefy the solder and fill in the microfractures, making the connection good again.
This leads me to 2 possible conclusions:
1. Either my PSU or motherboard is slowly killing cards, but with the testing and background above I have lots of doubts that this is the case. Speculation and further testing advice are VERY welcome.
OR
2. The cards they are sending me are at fault. Keep in mind that both EVGA's and my initial stress testing show a good card. However, in my use, which I imagine is like that of many of you, I either turn my PC off at night or put it into sleep mode. The point I am making is this: during their stress testing the card is subject to one long burn-in period. In my case it is powered up in the morning, heats up during play, and cools down at night. Their testing did not account for this at all. Over a period of weeks, is the process of heating and cooling causing the solder joints to refracture on these cards that they claim to have repaired? Speculation on this point is also very welcome.
This problem is driving me nuts, and conflicting evidence is making it hard to figure out.
Thanks in advance to all who read this. I'm hoping the collective knowledge and experience present in these forums can help me nail this problem for good.