Memtest86 failed?

dewhite · Feb 12, 2004

Hey guys, I've been trying to iron the wrinkles out of the system in my sig for quite some time. I think I've finally isolated the problem with a run of memtest86.

I'm wondering who can tell me what this means:

Code:

test   pass       Failing Address   good       bad        error bits   ct
2      1,2,3...   00000000410       00000410   00000414   00000004     1

This failure repeats itself on every successive pass of test number 2. I'm guessing bad module - but how do I know which of them is toast?
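One way to read the table, offered as an assumption rather than something confirmed in this thread: the "error bits" column looks like the XOR of the "good" and "bad" columns, so it marks which data bit flipped. A minimal Python sketch using the values posted above (the function name is made up for illustration):

```python
def flipped_bits(good_hex: str, bad_hex: str) -> list:
    """Return the bit positions (0 = least significant) where the two values differ."""
    diff = int(good_hex, 16) ^ int(bad_hex, 16)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Values copied from the report above; leading zeros do not change the result.
good, bad = "00000410", "000000414"
print(f"error bits: {int(good, 16) ^ int(bad, 16):08x}")
print("flipped bit positions:", flipped_bits(good, bad))
```

If that reading is right, a single bit stuck the same way on every pass would fit a repeatable fault better than random noise, though it does not say which stick is at fault.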

The Donut · Feb 12, 2004

Take one of the modules out and re-test. No errors means it's the one you removed; if you still get errors, it's the one still installed.

jmcmike · Feb 12, 2004

I think that the failing address is a hex address in memory that should count upwards from DIMM slot 0. What I don't know is what the block size is, which you could multiply by the address to get the actual position in memory. Armed with this, you could determine which stick has the problem, based on whether the number is below or above the amount of memory in your first stick.

Of course, you could just remove one DIMM and see if you get the same error.
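The address idea above can be sketched roughly like this. It is only a sketch under loud assumptions: the failing address is taken to be a byte address (block size of 1), modules are assumed to be mapped back to back in slot order, and there is no interleaving, which a dual-channel setup may well violate. The module sizes are hypothetical, since the thread never states them.

```python
def module_for_address(address: int, module_sizes_mb: list) -> int:
    """Return the index of the module that would contain a byte address.

    Assumes a plain linear layout: module 0 first, then module 1, and so on.
    """
    base = 0
    for index, size_mb in enumerate(module_sizes_mb):
        end = base + size_mb * 1024 * 1024
        if base <= address < end:
            return index
        base = end
    raise ValueError("address is beyond installed memory")


# Hypothetical layout: two 512 MB sticks.
print(module_for_address(0x410, [512, 512]))
```

Under those assumptions an address as low as 0x410 falls in the first populated module, but swapping sticks is still the more trustworthy test.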

8Complex · Feb 12, 2004

IIRC, I just read the instructions last week and they said that it really could be a number of things, not even necessarily a bad RAM chip. I would do as above: test with one, then with just the other.

dewhite · Feb 12, 2004

OK this is weird as HELL.

I removed both modules and tested each individually. The module from DIMM2 (I had modules in 2 and 3 for dual channel operation on my NF7) was the only one that reproduced the above error. The other module (from DIMM3) produced no errors.

HOWEVER, after removing the module which showed itself damaged my system is still producing the same problems. I didn't outline those symptoms because I didn't wanna make a super long post - but lemme copy/paste what I had written in the data storage forum previously (I had originally thought this was a data corruption problem).

Here follows my original description:

I have sorta a weird problem, so I'm gonna have to go into a little detail here. It seems like I'm corrupting data on the RAID 0 setup listed in my sig. I have a generally stable setup which can run Prime95 for 24 hours. I don't have any problems with crashing etc. However, when I try to run large compressed self-extracting files (read: Battlefield 1942 updates) they report themselves as being corrupted or damaged. I've tried downloading the files from several sources. I've also tried running them from network shares on my roommates' computers, on which they install flawlessly. Still, on my machine they report as corrupt or throw CRC errors out the wazoo.

This really makes me wanna tear my hair out because I have done everything I can think of to figure this out. I have tried running MD5 sums on large files like Linux ISOs etc. The MD5s come back clean, and when I try copying BF1942 updates from my system to the roommates' machines, they execute without trouble.
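For anyone wanting to repeat the MD5 check, here is a minimal Python sketch. The helper names are made up for illustration, and hashing the same file several times is an extra idea not in the original test: if the digests ever disagree, something between the disk and the CPU is flipping bits. One caveat is that the OS may serve repeat reads from its file cache, so a clean result does not clear the disk path.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large ISOs do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_agree(path: str, runs: int = 5) -> bool:
    """Hash the same file several times and report whether every run matched."""
    return len({md5_of_file(path) for _ in range(runs)}) == 1


# Example call with a hypothetical file name:
#   print(md5_of_file("some-linux.iso"), hashes_agree("some-linux.iso"))
```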

SO - I _think_ my data storage system can be eliminated from the troubleshooting loop. The question is HOW IN THE HELL can I be corrupting information at runtime while still passing Prime95 and memtest86 with flying colors?

I suppose it's also worth mentioning that I am now running my system at stock 2500+ speeds and have the memory set to SPD - 8,3,3,2.5.

I'm running out of patience - I might do something crazy like sell this piece of shit and buy a mac if I can't get it to do right...

jmcmike · Feb 12, 2004

Originally posted by dewhite
HOWEVER, after removing the module which showed itself damaged my system is still producing the same problems.

So you're saying that if the suspected DIMM is removed, and there is no longer a DIMM in that slot, you get errors on a DIMM from another slot?

dewhite · Feb 13, 2004

no...

I'm saying that once I remove the module which showed errors under memtest86, the remaining module clears that test without incident.

However, my stability issues persist, even without the module which memtest86 shows as damaged...

8Complex · Feb 13, 2004

I had some massive stability problems with my system that drove me nuts for a few weeks. Turns out that my overclock was randomly locking up the machine every now and then (and it was only a 12% overclock). If I read right, though, you're not overclocking it right now, but you used to be (reading your sig)?

jmcmike · Feb 13, 2004

If you can narrow the problem down to only certain types of executables or archives, then it may be a software issue. Do you have another partition to do a clean install on?

8Complex · Feb 13, 2004

Just thought of another thing... is it consistent by the chip or is it consistent by the dimm slot that you're putting it into?

dewhite · Feb 13, 2004

Originally posted by 8Complex
Just thought of another thing... is it consistent by the chip or is it consistent by the dimm slot that you're putting it into?

goooood question. I'll check that out this evening when I get back from my scuba course.

In answer to the other questions:

1. I used to overclock, but in the interest of solving this problem, I'm currently running stock 2500+ speeds.

2. If I can't find a hardware explanation for this problem soon, I'm going to reformat as a matter of the process of elimination.

3. Thanks for the replies guys, I appreciate the help/support...

dewhite · Feb 13, 2004

Originally posted by 8Complex
Just thought of another thing... is it consistent by the chip or is it consistent by the dimm slot that you're putting it into?

I went back and checked my notes - and then double checked the results. The module which was originally located in DIMM2 is the one throwing errors. It throws errors in either slot (DIMM2 or DIMM3), and the module which was originally located in DIMM3 shows clean - both in its original slot, and in DIMM2. -- So... The damaged slot theory can be eliminated.

Beyond that, I'm sorta at a loss for what to try next. I'm RMAing the module which I can prove bad, and I'm going to ship them the module which isn't throwing errors as well. We'll see if they'll help me out and replace them both so I can absolutely take that out of the troubleshooting loop.

Ideas guys?

EDIT: In the interest of taking suspicion off of the network adapter, I'm gonna have a roommate download and burn the UT2k4 demo self-extracting executable to a CD for me. If I get the same CRC errors with the CD that I've been experiencing on my system, then I can safely say that the network adapter and SATA controller aren't at fault, right?
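A complementary check, sketched here under assumptions: compare the CD copy byte for byte against a copy that reached the machine some other way, and see where (if anywhere) they first differ. The function name and the paths in the comment are hypothetical.

```python
from typing import Optional


def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 16) -> Optional[int]:
    """Return the byte offset of the first mismatch, or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # One file ended early; the mismatch starts where the shorter one stops.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None
            offset += len(chunk_a)


# Example call with hypothetical paths:
#   print(first_difference("D:/ut2004-demo.exe", "C:/downloads/ut2004-demo.exe"))
```

If both copies compare identical on this machine and the installer still reports CRC errors, the corruption is more likely happening while the file is being unpacked in memory than while it is being stored or transferred.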

computerpro3 · Feb 13, 2004

What is your RAID solution... do you have the proper drivers installed? What is your block size? Are your motherboard 4 in 1 drivers up to date? Windows updates up to date? Sorry if those are elementary questions, but this problem is weird.

dewhite · Feb 13, 2004

Not a problem - questions are what I'm looking for here. Something, anything to get me on track to a solution.

Here are the answers to your questions:

1. My RAID solution is the onboard Silicon Image 3112A SATA controller on my NF7 - the drivers have been updated to the current versions found on SI's website - (raid v1.0.0.40).

2. All of my other drivers and my BIOS have been updated via Abit's website - (forceware 3.13 w/o swIDE - and BIOS v21).

3. I'm using the default WinXP chunk size (32k, right?), and 32k block size.

4. NF2 mobos don't have 4 in 1 drivers, that I know of - I think that's only VIA boards.

5. I have every available windows update hotfix installed on top of WinXP SP1a.

6. Thanks for the ideas, keep'em coming guys...
