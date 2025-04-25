

memtest86+ errors, not much observed as far as other problems

5pips49

n00b
Joined
Mar 7, 2025
Messages
26
I ran memtest86+ on my computer 2 times.
I would have run it all in one shot, but I needed to use the computer. I would think that you can keep running memtest86+ for short periods and just look at the aggregate data.

Here are the results of the first run
(5 errors after 1.5 passes on 16GB of RAM)
Screenshot.png

The second one was 10.5 passes, 0 errors.
Third (4/27/25, 2 days after OP): computer 9 hours off, 1 hour on. Ran memtest86+ for 2 passes. 0 errors.
Fourth (4/27/25, 2 days after OP): computer has been powered on for previous 10 hours. Ran memtest86+ for 1.42 passes. 0 errors.
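
A quick tally of the four runs listed above, as a rough sketch of what the "aggregate" view looks like. The numbers are the ones reported in this post; the list layout and variable names are only for illustration. Keep in mind that adding up passes says how much testing was done in total, not whether the runs happened under the same conditions (cold boot vs. warm boot, for example).

```python
# Tally of the four memtest86+ runs reported above: (label, passes, errors).
runs = [
    ("first", 1.5, 5),
    ("second", 10.5, 0),
    ("third", 2.0, 0),
    ("fourth", 1.42, 0),
]

total_passes = sum(passes for _, passes, _ in runs)
total_errors = sum(errors for _, _, errors in runs)
runs_with_errors = [name for name, _, errors in runs if errors > 0]

print(f"total passes: {total_passes:.2f}")
print(f"total errors: {total_errors}")
print(f"runs with errors: {runs_with_errors}")
```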

I have not noticed much in the way of problems with the computer and I have been using it as my daily driver for over a year running Linux Mint 21.2 Cinnamon 64 bit. It's a Dell Optiplex 7010 SFF Desktop, Intel Core i5-3470 3.2GHz Quad Core, Intel HD Graphics 2500.

2 of the USB ports seem to not work (unless that's a software issue).
-
There was also some freezing that was happening with a USB mouse on other USB ports (aside from the 2 that I have already labeled as not working). Here is a further description. At first, I noticed the mouse pointer locking up regularly, maybe once every half hour. Each time that this would happen would only be for 0.9 seconds or so. After that, the computer would return to normal.

I plugged the USB mouse into a different USB port and the problem got worse. Here, the computer was completely freezing.

It was freezing a lot for a brief period, but when I replaced the mouse the problem stopped.
-
What do you think I should do?
 
pendragon1 said:
up the voltage a bit or lower the timings one notch(-2 on all) and test again.
Click to expand...
What he said. But also, reseat the ram (make sure it's in the right slots if you're not using all slots) and check if you missed any bios updates that say they improve ram stability; that chip is ancient, and I don't remember if it got many updates that helped with RAM.

Also, kind of weird that it tested fine for 10 passes but didn't on a lower number of passes. But those expected vs actual values are way off, and IMHO, nowhere near what should be in RAM during those test numbers. Something wrote to the wrong address or the read came from the wrong address or ?

It's going to be a pain to debug that though, if it doesn't happen consistently. If you remember, how did you start these tests... Cold boot or warm boot, was the system running for a while before hand. I've got a theory, but don't want to lead the witness.
 
All 4 RAM slots are in use.

the 10.5 passes was a cold boot. The computer had been powered off for at least 3 hours before that.
The 1.5 pass: I don't remember.
 
Do you have
5pips49 said:
All 4 RAM slots are in use.

the 10.5 passes was a cold boot. The computer had been powered off for at least 3 hours before that.
The 1.5 pass: I don't remember.
Click to expand...

If the 1.5 pass was a warm boot, my wild theory is one of your devices didn't get fully reset and was DMAing into that RAM during the test. But my usual suspect for that is a Realtek NIC (I've seen them do some weird stuff!) and from the dell specs, the Optiplex has intel LAN, which I've usually found to be rock solid (reports are that's not true for the 2.5/5gbps series, but I've only used their powers of ten equipment). Could be temperature changes if it shows up when you reboot after running hard though.
 
I am running the latest firmware. Granted, it is old, in terms of its age in years.

I don't know the term "DMAing."

"Could be temperature changes if it shows up when you reboot after running hard though."
As far as I understand, the computer has always been used for light tasks. The day that I did the 1.5 passes would have been max of a couple of web browsers and a Linux Mint VM with 1 web browser. Also, very little of the drives are encrypted (2 SSDs). It's possible that the computer could have been running for a while, though, like 12 hours.
 
Direct Memory Access, a way for hardware to access memory directly.

It's (mostly) the same core, and the same address every time. That could mean something, or could be coincidence. If you swap the memory and the address of the error changes (but it's always on the new address), then it could be a faulty memory module. If it only happens when it's warm, it might be a bad trace/solder joint, or there could be some metal that sometimes causes a short.
 
I have done some research on adjusting the timings. I never knew about such a thing until reading it in this thread. It doesn't seem as though that's possible in the BIOS for this model (Dell Optiplex 7010), though. I also learned about something called Intel XTU. That's a windows program, though, and I am running Linux. I assume that running Intel XTU on a Windows virtual machine doesn't work for our purposes?

Source: https://www.dell.com/community/en/c...ing-at-1333-not-1600/647f69ecf4ccf8a8de6fea0a

I don't see any way to adjust the voltage in the BIOS either. Some general Dell instructions said to look in the BIOS for "Overclocking" or "CPU config." But, I don't see either. I don't think my model offers the feature.
 
Nobu said:
Direct Memory Access, a way for hardware to access memory directly.

It's (mostly) the same core, and the same address every time. That could mean something, or could be coincidence. If you swap the memory and the address of the error changes (but it's always on the new address), then it could be a faulty memory module. If it only happens when it's warm, it might be a bad trace/solder joint, or there could be some metal that sometimes causes a short.
Click to expand...
Is the suspected bad memory in slot 0 (or maybe it's called slot 1)? So, I can move it to the adjacent slot?
 
I have a question about continuing to use this computer.
Is quiet corruption of files one of the things at stake?
 
Nobu said:
the same address every time. That could mean something, or could be coincidence.
Click to expand...

The addresses are similar, but not the same. For the first test, it should be all 1s except one bit, and it's some crazy value. For the second test that failed several times, it should be all zeros, but there's the same crazy value.

Those sections of ram must at least mostly work, but clearly something is weird. That's why I suspect a device doing wild DMA; like it's configured to write data in the wrong place. Most of the time, it won't hit the test, but sometimes you get lucky.
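
One way to tell a flipped bit apart from a wholesale overwrite is to XOR the expected and actual values and count how many bits differ. Here is a minimal sketch of that idea. The values are made up for illustration and are not the ones from the screenshot, and the "more than 2 bits" cutoff is an arbitrary assumption, not a rule.

```python
def differing_bits(expected: int, actual: int) -> int:
    """Count how many bit positions differ between two values."""
    return bin(expected ^ actual).count("1")


def classify(expected: int, actual: int) -> str:
    """Rough label based on how many bits differ (the cutoff of 2 is an assumption)."""
    n = differing_bits(expected, actual)
    if n == 0:
        return "match"
    if n <= 2:
        return "bit flip"
    return "possible overwrite"


# Hypothetical 32-bit values, NOT taken from the screenshot.
print(classify(0xFFFFFFFF, 0xFFFFFFFE))  # one bit differs
print(classify(0x00000000, 0x12345678))  # many bits differ
```

A weak memory cell would usually land in the first bucket; the same unrelated value showing up where two different patterns were expected lands in the second.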

5pips49 said:
I have a question about continuing to use this computer.
Is quiet corruption of files one of the things at stake?
Click to expand...

Yeah, that seems likely. Let's say you write a file, it doesn't get written to disk right away, it sits around in memory for a while. Then whatever happens and some of the data gets overwritten by this junk. When the OS finally writes it to disk, you've got a junky file. Same thing could happen to any data that sits in memory.
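
If silent corruption is the worry, one low-effort check is to record checksums of important files and compare them again later. This is a minimal sketch using Python's standard hashlib; the function names and the dict-based "manifest" are assumptions for illustration. Note that the hashing itself runs through the same RAM, so a mismatch only says that something changed, not where it went wrong.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Files present in both manifests whose checksum no longer matches."""
    return sorted(name for name in old if name in new and old[name] != new[name])
```

Build a manifest for files you don't expect to change (photos, archives), keep it somewhere safe, and compare against a fresh manifest after a few weeks of use.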

Dell SFF with no memory tuning options unfortunately sounds normal.

Do you have an option to turn on VT-d, aka IO-MMU in the firmware, and enable its use in Linux? If so, this could help prevent corruption... When using an IO-MMU, devices can't do DMA (read and write from system memory) to any address, they are constrained to the addresses the OS expects them to use. If they do try to write to an address they shouldn't, you'll get a kernel panic. Which isn't much fun, but maybe you can figure out more if it happens.
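
Once the option is on in the firmware and in Linux, one way to see whether the kernel is actually using an IOMMU is to look under /sys/kernel/iommu_groups, where each group lists the devices it contains. A minimal sketch; the function name is made up for illustration, and an empty result only suggests (it does not prove) that the IOMMU is not active.

```python
from pathlib import Path


def iommu_groups(sysfs_root: Path = Path("/sys/kernel/iommu_groups")) -> dict[str, list[str]]:
    """Return {group name: [device addresses]}; empty if no groups are exposed."""
    groups: dict[str, list[str]] = {}
    if not sysfs_root.is_dir():
        return groups
    for group in sorted(sysfs_root.iterdir(), key=lambda p: p.name):
        devices = group / "devices"
        if devices.is_dir():
            # Each entry is named after a device address such as 0000:00:02.0
            groups[group.name] = sorted(d.name for d in devices.iterdir())
    return groups
```

Calling iommu_groups() with no arguments reads the real sysfs tree; passing another path is only there to make the function easy to try out on a fake directory.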
 
toast0 said:
Dell SFF with no memory tuning options unfortunately sounds normal.
Click to expand...
completely normal
and i dont recall seeing that when i first responded, otherwise i wouldnt have suggested it. there has been at least one ninja edit in the op....
 
pendragon1 said:
completely normal
and i dont recall seeing that when i first responded, otherwise i wouldnt have suggested it. there has been at least one ninja edit in the op....
Click to expand...
Yes, I think I did edit and add in the specs of the computer, probably seconds after you read the OP.
 
toast0 said:
Do you have an option to turn on VT-d, aka IO-MMU in the firmware, and enable its use in Linux? If so, this could help prevent corruption... When using an IO-MMU, devices can't do DMA (read and write from system memory) to any address, they are constrained to the addresses the OS expects them to use. If they do try to write to an address they shouldn't, you'll get a kernel panic. Which isn't much fun, but maybe you can figure out more if it happens.
Click to expand...
I don't see VT-d in the BIOS
The closest that I can find is, under Virtualization Support, "Enable VT for Direct I/O." I think that has something to do with virtual machines having direct access to some hardware. That seems different, right?

I see this in the user manual for my computer:
Intel CoreTM i5 3470 / 3.20GHz, 6M, VT-x, VT-d, TXT (vProTM), 77W
(notice the "VT-d" in there)
 
5pips49 said:
I don't see VT-d in the BIOS
The closest that I can find is, under Virtualization Support, "Enable VT for Direct I/O." I think that has something to do with virtual machines having direct access to some hardware. That seems different, right?

I see this in the user manual for my processor:
Intel CoreTM i5 3470 / 3.20GHz, 6M, VT-x, VT-d, TXT (vProTM), 77W
(notice the "VT-d" in there)
Click to expand...

That's likely the one. The feature was developed for virtual machines, to protect the host from an evil guest. But here, the same feature should be able to protect your host from a bad device. Afaik, you won't be able to test that in memtest, since memtest wouldn't know how to program the IO-MMU.
 
Don't run with memory errors. It will slowly (or not so slow) scramble your harddrive contents.
 
toast0 said:
That's likely the one. The feature was developed for virtual machines, to protect the host from an evil guest. But here, the same feature should be able to protect your host from a bad device. Afaik, you won't be able to test that in memtest, since memtest wouldn't know how to program the IO-MMU.
Click to expand...
I would think that in order to make use of that setting, I would have to do my computing in a virtual machine or some kind of containerized environment. Is that what you had in mind?
 
5pips49 said:
I would think that in order to make use of that setting, I would have to do my computing in a virtual machine or some kind of containerized environment. Is that what you had in mind?
Click to expand...

No, enable it in the BIOS, and then in your OS. Try adding intel_iommu=on to your kernel command line and check for the messages about using it and the faults mentioned here.

https://www.kernel.org/doc/html/v6.2/x86/iommu.html

Windows has a Kernel DMA Protection mode which can do the same thing.
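
After editing the boot configuration and rebooting, it is worth confirming that the parameter actually reached the kernel, which Linux exposes in /proc/cmdline. Below is a minimal sketch that splits a command line into name/value pairs. The example command line is made up, the helper names are hypothetical, and the simple whitespace split ignores quoted values. The kernel's parameter documentation lists on as the value that enables the Intel IOMMU driver, so that is what the example uses.

```python
def kernel_params(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read().strip()


# Example command line, made up for illustration.
example = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro quiet splash intel_iommu=on"
print(kernel_params(example).get("intel_iommu", "(not set)"))
```

On the real machine, kernel_params(read_cmdline()) gives the same kind of dict for the running system.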
 
toast0 said:
Do you have


If the 1.5 pass was a warm boot, my wild theory is one of your devices didn't get fully reset and was DMAing into that RAM during the test. But my usual suspect for that is a Realtek NIC (I've seen them do some weird stuff!) and from the dell specs, the Optiplex has intel LAN, which I've usually found to be rock solid (reports are that's not true for the 2.5/5gbps series, but I've only used their powers of ten equipment). Could be temperature changes if it shows up when you reboot after running hard though.
Click to expand...
That actually sounds plausible given the pattern in the error in the screenshot OP posted. I'd expect single bit errors, or maybe double bit errors. I've had a kit of ram go bad twice in the last few months. The original set was fine for almost 4 years, then one day the machine started crashing occasionally. Ran memtest86, found single bit errors, isolated the bad stick (only one out of 4 was bad), RMA to G.Skill. The replacement did the same thing a few months later. Again, single bit errors. That pattern looks more like a complete overwrite, and it's repeating.

It's not uncommon for hardware to not be completely reset on a warm boot. Sometimes if a piece of hardware is in a really bad state the only way to get it back to normal is a power cycle - warm reboot may not do it. No idea what's causing it, but I'd suggest some more testing. If the machine's not showing signs of instability and it only fails after a warm reboot it seems likely that the memory is fine.
 
toast0 said:
No, enable it in the BIOS, and then in your OS. Try adding intel_iommu=on to your kernel command line and check for the messages about using it and the faults mentioned here.

https://www.kernel.org/doc/html/v6.2/x86/iommu.html

Windows has a Kernel DMA Protection mode which can do the same thing.
Click to expand...
the stuff about Linux is all Greek to me.
The only thing that I understand is that our hypothesis is that one or more device is writing to RAM. I don't know the definition of device in this context. Are "parts" that are built into the motherboard included? (Like chipset, processor, graphics.)
Call me crazy but software "work" seems daunting and makes moving to a new computer sound like an attractive option.
As far as software in Linux, single line commands in the command line that I have studied and used for 30+ hours still have daunting 'man pages.' (For non-Linux users, man pages are usually 20 or more pages explaining a single command. It's the official manual.)
Or am I just making a mountain out of a molehill?
 
5pips49 said:
the stuff about Linux is all Greek to me.
Click to expand...

Ok well, there's no better time than now to figure some of it out. :D

https://forums.linuxmint.com/viewtopic.php?t=349669 will tell you how to set kernel command line parameters, and https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html shows the parameters you might need to change. For now, try iommu=force and then run dmesg | grep DMAR (or sudo dmesg | grep DMAR). You would hopefully see something like PCI-DMA: Using DMAR IOMMU, which means it's being used. Then, if you have any weirdness, check dmesg again and look for something like DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000. If you run lspci -v, that will list all the PCI devices, and you should be able to find a matching device. If you're lucky, it's something you can disable and live without, and maybe it keeps working well enough.
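
If faults do show up, a small script can pull the device address out of the dmesg text so you know what to look for in lspci. This is a minimal sketch, assuming the fault lines look like the example from the kernel documentation quoted above; the exact wording differs between kernel versions, so the pattern may need adjusting. The timestamps in the sample are invented.

```python
import re
from collections import Counter

# Matches lines like:
#   DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
FAULT_RE = re.compile(
    r"DMAR:\s*\[DMA (Read|Write)[^\]]*\]\s*Request device \[([0-9a-fA-F:.]+)\]"
    r".*?fault addr (?:0x)?([0-9a-fA-F]+)"
)


def parse_faults(dmesg_text: str) -> list[tuple[str, str, int]]:
    """Return (direction, device, address) for each DMAR fault line found."""
    return [
        (m.group(1), m.group(2), int(m.group(3), 16))
        for m in FAULT_RE.finditer(dmesg_text)
    ]


# Sample text with invented timestamps; paste real dmesg output here instead.
sample = """\
[    0.512345] PCI-DMA: Using DMAR IOMMU
[ 1234.567890] DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
[ 1234.567899] DMAR:[fault reason 05] PTE Write access is not set
"""

faults = parse_faults(sample)
per_device = Counter(device for _, device, _ in faults)
for device, count in per_device.items():
    print(f"{device}: {count} fault(s); look it up with: lspci -s {device}")
```

A device that keeps turning up in that count is the one to try disabling first.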

5pips49 said:
The only thing that I understand is that our hypothesis is that one or more device is writing to RAM. I don't know the definition of device in this context. Are "parts" that are built into the motherboard included? (Like chipset, processor, graphics.)
Call me crazy but software "work" seems daunting and makes moving to a new computer sound like an attractive option.
As far as software in Linux, single line commands in the command line that I have studied and used for 30+ hours still have daunting 'man pages.' (For non-Linux users, man pages are usually 20 or more pages explaining a single command. It's the official manual.)
Or am I just making a mountain out of a molehill?
Click to expand...

Yeah, everything is a device, other than the part of the cpu that's running instructions. Stuff like the network card built into the motherboard, graphics (even if it's built into the cpu), even some other junk built into the cpu.

Moving to a new computer isn't a bad idea; that cpu launched 13 years ago, but this is a good opportunity to learn! Don't let it slip by. Unfortunately, I think it's probably a tricky one to confirm. I've got an issue with a flakey computer that doesn't have the decency to fail a memory test; I ended up doing a new build so I could work on it later, and I don't know if later is going to happen. :p
 
5pips49 said:
The only thing that I understand is that our hypothesis is that one or more device is writing to RAM. I don't know the definition of device in this context. Are "parts" that are built into the motherboard included? (Like chipset, processor, graphics.)
Click to expand...
Yes. In this context anything that connects to the PCI-e bus counts as a device. Chipset and its built-in functions, sound, NIC, WiFi, any extra USB or SATA controllers, anything in a PCI-e slot including M.2 SSDs, etc.

Memtest86 likely doesn't initialize all the devices in the system. Some are set up by the BIOS at boot time. Others are left for OS drivers to set up, especially if you have fast boot enabled. That's a thought. Maybe try turning fast boot off if you have it enabled and see if it makes a difference running memtest86 after a warm reboot. Fast boot skips non-essential devices and lets the OS initialize them.
 
zandor said:
Yes. In this context anything that connects to the PCI-e bus counts as a device. Chipset and its built-in functions, sound, NIC, WiFi, any extra USB or SATA controllers, anything in a PCI-e slot including M.2 SSDs, etc.

Memtest86 likely doesn't initialize all the devices in the system. Some are set up by the BIOS at boot time. Others are left for OS drivers to set up, especially if you have fast boot enabled. That's a thought. Maybe try turning fast boot off if you have it enabled and see if it makes a difference running memtest86 after a warm reboot. Fast boot skips non-essential devices and lets the OS initialize them.
Click to expand...
1. I guess I can turn off the USB controllers for the USB ports that don't work. Can't hurt, right?
2. I don't see any option for fastboot in the BIOS. I did a web search, and there is
a. mention of the option disappearing when you have secure boot turned off. I have secure boot turned off because virtual machines won't boot when it's on.
b. mention of it being a Windows feature. I don't run Windows.

Side rant: search engines produce utter trash over the last few months. I search for something about linux and the first 5 results are all talking about Windows, nothing about Linux.
 
5pips49 said:
1. I guess I can turn off the USB controllers for the USB ports that don't work. Can't hurt, right?
2. I don't see any option for fastboot in the BIOS. I did a web search, and there is
a. mention of the option disappearing when you have secure boot turned off. I have secure boot turned off because virtual machines won't boot when it's on.
b. mention of it being a Windows feature. I don't run Windows.
Side rant: search engines produce utter trash over the last few months. I search for something about linux and the first 5 results are all talking about Windows, nothing about Linux.
Click to expand...
Sounds like you don't have fast boot or it's off because of secure boot being off. It's a BIOS feature and a Windows feature. They're different things using the same name just to piss us all off. It was just something to try.

Does the machine have any problems other than the weird memtest86 failures? Some random device writing to a spot in ram memtest86 is testing isn't necessarily a problem except for memtest86 reporting failures. If you're not getting crashes it's probably fine. Just some device that misbehaves on a warm boot if the OS doesn't slap it into line.

Or you could just upgrade. Might be less hassle and you'll have a much faster system.
 
pendragon1 said:
test the first stick on its own, if it still errors, turf it.
Click to expand...
I don't know which ones to take out and which one to leave. The user manual doesn't seem to be a help either.
On the motherboard, next to the ram slots, there are stickers each with a number next to them. I think that the numbers are 1, 2, 3, 4. Remember, the 5 errors were at 3.24GB address location. The 1 is printed next to one of the middle 2 RAM slots.
 
zandor said:
Or you could just upgrade. Might be less hassle and you'll have a much faster system.
Click to expand...
The system seems fast enough. I don't know how close I get to full resource use. I have never paid much attention. I guess having Windows 11 in a VM will take some insane amount of resources. 6GB w/ pretty much no programs running? Insane!
the things that I don't like about the computer (pretty much in this order- #1 is the thing I hate the most):
1- It's a Dell. Then again, I don't like HP, Lenovo, and generic brands either. I probably wouldn't like the brands that I haven't mentioned but I just don't have any experience with those yet.
2- need an easy way to have some SSDs in there. Right now I have the 2 drives haphazardly positioned;
3- the USB ports piss me off, when they decide to not work. Lately, I have been unplugging my USB printer to plug in an external drive here and there. I probably don't have to do that but some of the other ports are too bunched together.
4- I don't like the choice of video outputs- 2 Display port, 1 VGA.
5- closed source motherboard firmware.
 
5pips49 said:
In the OP, I describe the USB port problems. Referring you back to that instead of cluttering up the thread downstream.
Click to expand...
That's unlikely to be caused by bad ram. USB ports failing and no other problems doesn't generally point to ram. Usually bad ram results in system crashes, application crashes and corrupted data. USB3 was brand spanking new around that time. IIRC Sandy Bridge didn't have it built in but a lot of Sandy Bridge boards used 3rd party chips to add it. Ivy Bridge, your CPU, was the next generation and the first to have it built into an associated Intel chipset.

I had a USB issue with my current board. It's an ASRock Z890 Steel Legend. It's got a couple of "low latency" USB ports for a gaming keyboard and mouse. They didn't get along well with a keyboard and mouse plugged into a monitor and routed through a cheap USB switch. I use the switch to swap the keyboard and mouse between my machine and my work laptop. They'd lose connection briefly, especially the mouse. Switching to a different USB port on the main board fixed the problem. I have another keyboard and mouse hooked up to this system through a different monitor without a switch for my gaming setup. Those seem fine going through a "low latency" USB port.
 
5pips49 said:
The system seems fast enough. I don't know how close I get to full resource use. I have never paid much attention. I guess having Windows 11 in a VM will take some insane amount of resources. 6GB w/ pretty much no programs running? Insane!
the things that I don't like about the computer (pretty much in this order- #1 is the thing I hate the most):
1- It's a Dell. Then again, I don't like HP, Lenovo, and generic brands either. I probably wouldn't like the brands that I haven't mentioned but I just don't have any experience with those yet.
2- need an easy way to have some SSDs in there. Right now I have the 2 drives haphazardly positioned;
3- the USB ports piss me off, when they decide to not work. Lately, I have been unplugging my USB printer to plug in an external drive here and there. I probably don't have to do that but some of the other ports are too bunched together.
4- I don't like the choice of video outputs- 2 Display port, 1 VGA.
5- closed source motherboard firmware.
Click to expand...

2. Standard way for SSDs is M.2 slots on the mainboard. No cables, just stick the thing in a slot and add one screw to hold it in place. Some boards have clips or something so you don't even need the screw. If you want to keep your current SSDs you'll probably just need a case with a couple of 2.5" mount points. I'm assuming they're SATA SSDs since they're haphazardly positioned. It's more or less impossible to haphazardly position M.2 SSDs.
3. USB is still irritating but I think it's gotten better.
4. What do you want for display outputs? New stuff tends to have a mix of HDMI + DisplayPort, and sometimes the DisplayPort gets replaced with USB-C on mainboards and usually on Laptops.
 
zandor said:
IIRC Sandy Bridge didn't have it built in but a lot of Sandy Bridge boards used 3rd party chips to add it. Ivy Bridge, your CPU, was the next generation and the first to have it built into an associated Intel chipset.
...
I had a USB issue with my current board.
Click to expand...
The computer has 2x USB 3 & 8x USB 2.0.
2 of the USB 2.0 I don't use because they don't work at all.
I haven't used the USB 3 because there is too much junk in the way (mainly papers on my desk and other misc personal items.)
 
I reseated 2 of the RAM chips. Then, I ran 2 more passes right after some light computer use. At that point, my computer had been running for about 1 hour and before that was powered off for 9 hours. But, only light computer use (a couple of web browser windows). 0 errors.

Edit: added this to the OP
Fourth (4/27/25, 2 days after OP): computer has been powered on for previous 10 hours. Ran memtest86+ for 1.42 passes. 0 errors.
 
It's entirely possible that failure is not reproducible. If you can't repeat it I wouldn't worry about it.
 