Chasing memory problems (I need a hand!)

createcoms

Hi guys :)

A computer I built for a friend has started suffering from ill health (through no fault of the user, it would seem).

When I got to it, the computer would barely last 5 minutes in Windows Vista Home Premium x64 edition before a BSOD would appear, always with the same error: STOP 0x50, PAGE_FAULT_IN_NONPAGED_AREA. Because it's the peak of summer here, I couldn't ignore the timing of this issue, so I underclocked the CPU (an Athlon 64 X2 6000+) to 1GHz, to no avail. I googled, as you do, and upon learning that some programs can provoke this, I ran msconfig and disabled the startup items. However, it would seem that no program is causing this, as I can do different tasks in random sequences and the BSOD still appears. I have also tried Safe Mode in case the video driver is at fault, but again to no avail.
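
For reference, the numbers on a STOP screen can be pulled apart mechanically. Below is a minimal Python sketch, assuming the classic Vista-era "STOP: 0x... (p1, p2, p3, p4)" line format; the example values are made up. Per Microsoft's bug check documentation, 0x50 is PAGE_FAULT_IN_NONPAGED_AREA, whose first parameter is the referenced address and whose second says whether the access was a read (0) or a write (1).

```python
import re

# Names for the bug check codes discussed in this thread (a small subset;
# Microsoft's bug check reference lists the rest).
BUGCHECK_NAMES = {0x50: "PAGE_FAULT_IN_NONPAGED_AREA"}

# Matches a line such as:
# STOP: 0x00000050 (0xFFFFF88000000000, 0x00000000, 0xFFFFF80001A2B3C4, 0x00000000)
STOP_RE = re.compile(
    r"STOP:\s*0x([0-9A-Fa-f]+)\s*\(\s*"
    r"0x([0-9A-Fa-f]+)\s*,\s*0x([0-9A-Fa-f]+)\s*,\s*"
    r"0x([0-9A-Fa-f]+)\s*,\s*0x([0-9A-Fa-f]+)\s*\)"
)


def decode_stop(line: str) -> dict:
    """Split a STOP line into the bug check code and its four parameters."""
    m = STOP_RE.search(line)
    if m is None:
        raise ValueError("no STOP code found in line")
    code, *params = (int(g, 16) for g in m.groups())
    info = {"code": code, "name": BUGCHECK_NAMES.get(code, "UNKNOWN"), "params": params}
    if code == 0x50:
        # Parameter 1: the memory address that was referenced.
        info["address"] = params[0]
        # Parameter 2: 0 = read, 1 = write (newer Windows versions define more values).
        info["access"] = {0: "read", 1: "write"}.get(params[1], "other")
        # Parameter 3: the instruction address that referenced the memory (0 if unknown).
        info["faulting_ip"] = params[2]
    return info


print(decode_stop("STOP: 0x00000050 (0xFFFFF88000000000, 0x00000000, 0xFFFFF80001A2B3C4, 0x00000000)")["access"])
# prints "read"
```

If the referenced address changes from crash to crash, that points more towards flaky hardware than towards one bad driver.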

Learning that STOP 0x50 can also be the face of memory problems, I ran memtest 3.4 and discovered huge numbers of errors! This ruled out Vista and all software issues. I then tested each of the two Corsair sticks and found that one stick would lock the system within 30 seconds, whilst the other would start chucking out errors after about 1 hour 20 minutes of testing. So I ordered some more RAM (G.Skill) and installed that. It didn't come up with errors initially, but test 5 did get errors after 30+ minutes, with both sticks in dual-channel mode. So things have improved, but the problem isn't fixed! I have manually set the voltage to the G.Skill rating (2.0-2.1 V), and have also tried running very relaxed 6-6-6-18 timings. I am still getting the STOP 0x50 error (albeit not as quickly as with the previous Corsair pair).
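
To put the timing numbers in perspective: timings are counted in memory clock cycles, so how long they actually last depends on the data rate. A minimal sketch, assuming DDR2-800 (400 MHz I/O clock) for the relaxed 6-6-6-18 case and DDR2-1000 for a 5-5-5-15 kit; the helper names are hypothetical.

```python
def cycle_time_ns(data_rate_mts: float) -> float:
    """Length of one memory I/O clock cycle in nanoseconds.

    DDR transfers twice per clock, so the I/O clock in MHz is half the
    data rate in MT/s, and one cycle lasts 1000 / clock_mhz ns.
    """
    clock_mhz = data_rate_mts / 2
    return 1000.0 / clock_mhz


def timings_ns(data_rate_mts: float, timings: str) -> list:
    """Convert a CL-tRCD-tRP-tRAS string such as '6-6-6-18' into nanoseconds."""
    t = cycle_time_ns(data_rate_mts)
    return [int(cycles) * t for cycles in timings.split("-")]


print(timings_ns(800, "6-6-6-18"))   # [15.0, 15.0, 15.0, 45.0]
print(timings_ns(1000, "5-5-5-15"))  # [10.0, 10.0, 10.0, 30.0]
```

Relaxing the timings at a fixed data rate gives the chips more nanoseconds per operation, which is why it usually hides marginal RAM; errors that survive that point elsewhere.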

The motherboard is an ASUS CROSSHAIR nForce 590 SLI, powered by the excellent Corsair 620W, which everyone around here and abroad gives the thumbs up.

Looking at the motherboard, I theorised that the stock cooler was antagonising the modules by exhausting hot air directly onto them. I also noted that the CPU idled at 45-52 degrees Celsius, which seems quite hot!

I have ordered a Gigabyte Galaxy II water cooler, not for overclocking but so that the CPU heat is dumped elsewhere and the MOSFET fan it comes with can cool the big heatpipe setup and the modules too. But the question is: can the hot air coming from the CPU cooler really cause the modules to fail, and can it damage them permanently, as seems to have happened with the Corsair modules? There's no overclocking here, since my friend has no patience for such things, but it's still a high-end part....

Your thoughts are appreciated :)

cheers

-cc
 
Well... yes and no. Heat is the bane of all things electronic, but in that situation they should work just fine. It's odd that both kits have been giving you issues, which leads me to think that the issue lies with something other than the RAM: perhaps the motherboard, or the settings you are using?

How does the G.Skill fare with different amounts of voltage, either more or less than stock? ;)
 
I tend to agree that it sounds like a motherboard issue, be it hardware or settings.

First thing I would do is test all the RAM, individually, in a different system... using Memtest.
I have a strange suspicion that the Corsair will work fine in another system.

I don't expect that CPU HSF exhaust has caused the RAM to fail, certainly not permanently. Only way I would buy that is if the RAM was so highly overclocked that it was borderline over-temp already.

Does ASUS's site have a recommended RAM / compatibility chart? Many manufacturers do. I would also check the ASUS support site for any RAM-related FAQs.
 
I have tried lowering the RAM speed to DDR400 levels (way below spec), in addition to relaxing the timings as much as possible. For voltage, I've tried both the motherboard default and the upper limit of the RAM's official spec (2.1 V), to no avail.

As I mentioned, there's no overclocking, and I also set the CPU to 1GHz via the multiplier (since the BIOS won't let you drop the FSB below 200 MHz).
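
The arithmetic behind that: on AM2 the core clock is the 200 MHz reference clock times the multiplier, so 1GHz means a 5x multiplier (stock for the 6000+ is assumed here to be 15 x 200 MHz = 3.0GHz). Because the memory controller sits on the CPU, the DRAM clock is commonly described as the core clock divided by a whole-number divisor, so changing the multiplier also shifts the real memory speed. The sketch below models that assumed behaviour; the exact rules should be checked against AMD's BIOS and Kernel Developer's Guide.

```python
import math

REF_CLOCK_MHZ = 200  # HTT reference clock ("FSB") the BIOS won't go below


def core_clock_mhz(multiplier: int, ref_mhz: int = REF_CLOCK_MHZ) -> int:
    """CPU core clock = reference clock x multiplier."""
    return ref_mhz * multiplier


def dram_clock_mhz(core_mhz: float, target_dram_mhz: float) -> float:
    """Assumed K8-style behaviour: the DRAM clock is the core clock divided by
    the smallest whole-number divisor that keeps it at or below the target."""
    divisor = math.ceil(core_mhz / target_dram_mhz)
    return core_mhz / divisor


print(core_clock_mhz(5))             # 1000
print(dram_clock_mhz(3000, 400))     # 375.0, i.e. DDR2-750 at stock clocks
print(dram_clock_mhz(1000, 400))     # about 333.3 when underclocked to 1GHz
```

Under this model, the memory may not be running at exactly the speed the BIOS setting suggests, which is worth keeping in mind when comparing memtest results across CPU settings.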

Turned off all the performance "tweaks" in the BIOS so the system is slower than a turtle, yet the BSOD still appears.

If it's not the heat I cannot imagine what it could be......


-cc
 
Forget the BSOD. That is a given if the hardware is experiencing memory errors as you stated.

From years of PC repair experience, I can say with confidence that if 2 sets of different RAM give you similar issues... keep looking. Corsair and G.Skill are both reputable brands, more than capable of operating under oc'ing conditions, thus capable of withstanding a warm environment. If it's a heat issue, it's with the CPU or another board component.

Don't forget that AMD CPUs have the memory controller on-die, so the memory bus runs directly from the CPU rather than through the northbridge. You may want to make certain that your CPU isn't to blame. I'd also try re-seating the HSF. CPU temps can cause plenty of weirdness.

If you have the ability, Memtest the RAM in another system. It's easy peace-of-mind.
If you have the ability, I'd also test the CPU in another system... or test another CPU in this system. Keep in mind, the more data you can collect, the easier it will be to determine an absolute cause. :)
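
The "collect more data" approach is essentially elimination: if a single part is faulty, the culprit must appear in every failing setup and in none of the passing ones. A minimal Python sketch of that reasoning, assuming a single faulty component; the component names and the passing "other system" run are hypothetical, since that test hadn't been done.

```python
def suspects(results: dict) -> set:
    """Return components present in every failing configuration but in no passing one.

    `results` maps a frozenset of component names to True (passed) or False (failed).
    Assumes a single faulty component.
    """
    failing = [cfg for cfg, ok in results.items() if not ok]
    passing = [cfg for cfg, ok in results.items() if ok]
    if not failing:
        return set()
    # The culprit must be common to every failing setup...
    common = set.intersection(*(set(cfg) for cfg in failing))
    # ...and cannot be part of any setup that passed.
    for cfg in passing:
        common -= cfg
    return common


results = {
    frozenset({"Crosshair board", "6000+ CPU", "Corsair kit"}): False,
    frozenset({"Crosshair board", "6000+ CPU", "G.Skill kit"}): False,
    frozenset({"other board", "other CPU", "Corsair kit"}): True,  # hypothetical run
}
print(sorted(suspects(results)))  # ['6000+ CPU', 'Crosshair board']
```

Two different RAM kits failing in the same board already drops both kits from the suspect list; a CPU swap is what would separate the last two.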
 

Agreed. Once memtest (which runs from a bootable CD image) started dishing out errors, I stopped sifting through the memory dumps and chasing my tail over what in Windows could be causing the BSOD.

Yep, I knew it had to be another factor, but since the water cooler hadn't arrived yet, there was still the issue of all this heat being blown at the modules.

When I install the water cooler today I will re-seat the processor, and of course, as usual, I will take the greatest care installing the waterblock (AS5 paste, a BB-sized ball in the centre, squish-and-rotate action, etc.).

Therein lies the limit of what I can do: my system is DDR1, and the only DDR2 system I have access to is my brother's, and he lives 3+ hours away!

Thanks for jogging through this, though; it helps me think about how sure I really am about things :)

-cc
 
After you install the new cooling, can you list ALL of your current system specs?
 
Which are you missing?

Corsair 620W PSU
ASUS Crosshair AM2 motherboard (0904 BIOS)
Athlon 64 AM2 6000+ (not overclocked!)
G.Skill 2x2GB sticks (5-5-5-15 timings)
Nvidia 8800GTS 640MB
Creative X-Fi Fatal1ty sound card

plus two hard drives and a DVDRW....
 
Well, I still don't know which specific Corsair memory you tried. Also, are you setting the memory voltage to the rated setting after you have cleared CMOS? In some cases, you need to clear the CMOS when adding or changing memory.

I would suggest clearing the CMOS, rebooting with a single module (either the G.Skill or the Corsair), loading setup defaults, and then manually setting the memory voltage to the rated setting. Then test each module individually for 3-5 passes with Memtest86+ v1.70 from www.memtest.org.
 
The answer to your question is yes, I am setting the memory voltage to the rated setting after clearing the CMOS.

As others here have indicated, we've got two reputable pairs - Corsair and G.Skill giving off errors which points to another problem.


For your information, however, the Corsair pair is two sticks of "CM2X1024-6400C4D".

The G.Skill pair which I've replaced them with is http://www.gskill.com/en/f2-8000cl5d-pq.html

Interestingly enough, although the BSOD timing and consistency remain the same, the G.Skill sticks do not throw errors within a minute the way one of the Corsair sticks did. So memtest shows the G.Skill sticks to be less error-prone, but the actual real-world instability is the same with both pairs.


So now the watercooler is on and heat is totally ruled out (BSODs remain in full force).

I guess the next step is to get another processor since the memory controller is on-die and thus it's hard to fault the mobo......

????


-cc
 
Yes, sorry when I say "giving off errors" I'm referring to memtest errors. Apologies for being vague.....
 
After spending a decent amount of time searching Google and reading various forums, here's what I've found:

--A LOT of people have complained about memory issues with the Crosshair. This includes almost every major brand available. Many speculate that it has to do with the DRAM voltages not being applied properly (or not as set in the BIOS).

--Seems that ASUS has RMA'd a lot of these boards for this problem.

--There is a BIOS v.0905 available at ASUS. You may want to update, as many have resolved the RAM issue by simply reflashing their existing BIOS.
LINK TO 0905 (beta)
http://dlsvr03.asus.com/pub/ASUS/mb/socketAM2/Crosshair/0905beta.zip
LINK TO 0904
http://dlsvr03.asus.com/pub/ASUS/mb/socketAM2/Crosshair/Crosshair_0904.zip

Read the forum threads linked below (a few that I found most interesting). Especially read LINK 1 in its entirety.... it contains a lot of information, speculation, and a few possible fixes. After reading these, I highly doubt the CPU is your problem. If the solutions posted in the first link don't help, you may need to RMA your board.

LINK 1 (read entire thread, lots of info.)
http://www.tomshardware.com/forum/245482-30-asus-crosshair-post-errors-boot-issues

LINK 2
http://www.tomshardware.com/forum/209327-30-asus-crosshair-recommendation

LINK 3
http://www.tomshardware.com/forum/235206-30-best-asus-crosshair

There were also a lot of threads, pertaining to memory issues, in the ASUS forum, but sadly their forum script died while I was only 3 pages deep :rolleyes: After that, I couldn't get anything but page-faults from the ASUS forum site. But you should check it out. I'm sure there is more info that could help you.
 
wjogert, I want to thank you for the effort you put into researching the CROSSHAIR issues. Thanks also to everyone else, it's been important to cover all the bases.

After resetting and reconfiguring (and repeating that several times) - I'm now at the point where a memtest can run overnight (12 hours) without a single error!

This is a dramatic improvement in the situation. I will of course get the Corsair pair tested before RMA'ing them, in case that too was a messed-up config issue rather than bad sticks.



Of course, there is one big problem: the BSOD hasn't disappeared. I theorise that the memory corruption that was present previously caused damage to one or more system files.

Under general usage (say - boot up and then surf the internet) the system will BSOD in under 5 minutes.

However! I can actually make it BSOD immediately by running a Windows Update file (specifically http://support.microsoft.com/kb/943899; I chose this update because it would replace the core system files that I wanted to rule out).

The moment the .msu file gets to the "searching for updates" point, it BSODs (every time, at exactly the same spot).

Always the *same* STOP 50 Page fault error.



I have a minidump file that shows ntoskrnl.exe as the failing module, but I lack the WinDbg experience to probe beyond this.
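
One common way to get more out of a minidump without learning the WinDbg GUI is to run the console debugger kd.exe (part of Debugging Tools for Windows) against it with the !analyze -v extension. The sketch below only builds and, on Windows, runs that command; it assumes kd.exe is on PATH, that minidumps land in C:\Windows\Minidump, and a local symbol cache at C:\symbols, all of which may need adjusting.

```python
import glob
import os
import subprocess

# Microsoft public symbol server, cached locally in C:\symbols (assumed path).
SYMBOL_PATH = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"


def kd_command(dump_path: str, kd_exe: str = "kd.exe") -> list:
    """Build a kd.exe command line: open the dump (-z), set symbols (-y),
    run !analyze -v and then quit (-c)."""
    return [kd_exe, "-z", dump_path, "-y", SYMBOL_PATH, "-c", "!analyze -v; q"]


def newest_minidump(folder: str = r"C:\Windows\Minidump"):
    """Return the most recently written .dmp file in `folder`, or None."""
    dumps = glob.glob(os.path.join(folder, "*.dmp"))
    return max(dumps, key=os.path.getmtime) if dumps else None


if __name__ == "__main__" and os.name == "nt":
    dump = newest_minidump()
    if dump:
        out = subprocess.run(kd_command(dump), capture_output=True, text=True)
        print(out.stdout)
```

The !analyze -v output lists the bug check parameters and a "Probably caused by" line. When memory corruption is involved, that line often just names ntoskrnl.exe, which on its own says little about the root cause.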

Also, should I re-create this thread in the OS forum?

Issues to tackle at this point are:

1. Decipher anything further from the crashdump
2. Discover why the Vista boot CD cannot see the Windows drive and therefore cannot attempt a repair of any kind

Another way I tried to verify the system files was sfc /scannow, but this is almost as consistent as the Windows update in that it BSODs fairly quickly (much more quickly than web surfing!).
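
Even when sfc doesn't finish, whatever it did check is logged. On Vista, System File Checker writes its entries into %windir%\Logs\CBS\CBS.log tagged with "[SR]", so filtering on that tag shows how far it got and which files it flagged. A minimal Python sketch; the default path is an assumption based on a standard install.

```python
import os


def sfc_entries(log_path=None):
    """Yield the System File Checker lines (tagged "[SR]") from CBS.log."""
    if log_path is None:
        windir = os.environ.get("WINDIR", r"C:\Windows")
        log_path = os.path.join(windir, "Logs", "CBS", "CBS.log")
    # errors="replace" keeps odd bytes in the log from stopping the scan.
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if "[SR]" in line:
                yield line.rstrip("\n")


if __name__ == "__main__" and os.name == "nt":
    for entry in sfc_entries():
        print(entry)
```

If the last [SR] line is always the same component just before the crash, that narrows down which file (or which read) is triggering the fault.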

Thanks guys

-cc
 