I have some problems with my new Xeon 4P rig and need some help.

My 3rd processor arrived today, but after I installed it, the BIOS doesn't see it. Does this mean that it is bad, or is there a jumper that enables sockets 3 and 4? Thanks for your replies.
 
Intel 4P 2011 only works in 1, 2 and 4-CPU configurations so... you'll either need to install
the new CPU in socket 1 or 2 to test it or get the fourth CPU...
 
I took the new CPU and installed it into socket1 and the original socket1 CPU into socket3, and the BIOS still doesn't see socket3. At least the CPU works. I have Kingston KVR16R11D4/16 1.5V DDR3 memory as per SM recommendations. As a final test before my reply, I moved the original socket1 CPU into socket4... and it is still not being seen by the BIOS.
 
3 CPUs won't work, no matter how you slice'm or dice'm.

I only suggested a possible course of action to verify operation of the most-recently-received CPU == trying it in dual-CPU config.
And it seems it does work :) Next step is getting 4th CPU.
 
From the way things are going, I kinda' knew someone would say that...! At any rate, I really appreciate your help. Thanks.
 
No idea what these connectors are for (I don't have access to my board now either) but I would expect
jumpers to be marked as JP (not plain J). Are there actual jumpers on those pins?

In any case I'd be very cautious about putting a jumper on either of those...
 
I have yet another question about my system. When I have Linux installed, the system runs very stable (system fans idle at a reasonable speed, memory managed very well, CPU utilization very low, etc). On Windows 2008 R2, the only issue with stability is the high speed of the system fans.

I installed Windows 2008 R2 a few days ago and everything installed perfectly. I loaded the SM MB drivers, installed my software without a hitch... everything installed great. After a few days, I noticed that my chassis intake and exhaust fans would ramp up to 3200 rpm and stay there until I opened the Windows Task Manager, at which point the fans would ramp down to the idle speeds that I would see in Linux. Has anyone had this problem, or know how to fix it? I contacted SM tech support and they told me to reset the IPMI settings back to the factory defaults, but that didn't help one bit. Thanks.
 
My rig does exactly the same, not every time but now and then, and it happens only with P8101. I asked in this forum for a possible solution, but there has not been any answer so far. Sometimes the TPF can be as high as 10:35, but after a reset it comes down to 9:0x and lower, and after a few percent it is usually down to around 8:50, give or take a few seconds.

The same can occur on my SR-2 Xeon 2P E5645 rig so I think this is a possible Intel CPU problem?

Just an observation from a C/C++ software developer, but if the TPF rises with successive work units, that implies to me there may be a memory leak in the folding core, resulting in fragmented memory. As the core needs to allocate memory, the system takes more and more time to find the available contiguous memory space. If no contiguous memory space is available, then the operating system has to do some administrative shuffling or outright reject the memory request. The rejection would likely cause the core to crash and terminate. (Think early Windows programs...) This could be a bug fix in a new core soon.
 
I would think it's probably a Windows issue, or yes, maybe, a Folding core issue that manifests itself only with Windows. I don't recall any multi-CPU/Linux users reporting such an issue with Folding.
 
It's a NUMA issue with multiple CPUs. The fix requires a kernel re-write, which the Linux team is working on, but there's no ETA. I don't know how Windows deals with it, or if it's an issue there.
 
This is a funny issue. I've never had this issue while using Linux (SLES, CentOS, Ubuntu, etc). The problem only rears its ugly head while using Windows. I've been on the phone with a SM technician and he has been very helpful, but the problem is still there. Basically, he assumed that the problem could be an issue with the power supplies, so he had me remove them one by one before each reboot. Then he had me reinitialize the IPMI to the factory default settings. I am now at the point where I am beginning to investigate the hardware components, i.e. graphics card, RAID card, sound card, etc, and their placement on the motherboard. He also said that SM ran tests (based on my configuration) to see if they could replicate my problems. Although I am not folding, I am running some very intense rendering tasks that could warrant winelovernm and Linden's conclusion. I'll keep you guys informed. Thanks.

BTW: The X9QRi-F+ must have either a 1P, 2P, or 4P setup. A 3P configuration will not work. This is directly from the SM tech rep. Thanks again.
 
Most of us are old and/or deaf around here -- we don't pay attention to minor inconveniences... :)
 
Ok, I got an email from the SM tech rep with a new version of the IPMI firmware (SMT_X9_226). The guy was gracious enough to walk me through the IPMI firmware installation, settings and relevant parameters. So far, everything is working great. I am now able to perform HD renders with the fan speeds staying pretty consistent and not revving up dramatically. I've also noticed that the CPU doesn't get as hot as before. I've gotta' say that the SM tech reps are great.

I guess covering your eyes would complete the handicap...? Lol.

Thanks for your help tear.
 
After weeks of trying to determine what was causing the issues with my fan speed, I am happy to announce that I found the problem. It turns out that the "Indexing Service" was the issue. I am not sure why the indexing service would cause my intake/exhaust fan speeds to ramp up to 4000RPM, but when I disabled it:

Start > Administrative Tools > System Configuration > Services > (Uncheck) Search Indexer

I then rebooted my system and the fan speed issues went away. I hope that this information is useful to anyone having high fan speeds.
 
The "indexing service" can be a true pain. It consumes CPU time and seems to be multi-threaded. Nice job on finding the fix.

Microsoft used to have an OpenGL screen saver that used 50% of the CPU cycles back around the Win 2000 Server days. It looked great, but server performance took huge hits. Amazing how the unintended consequences of some "services" can cause us major problems.
 
Today I put a new 4P E5-4650 rig into folding. The motherboard is from November 2012 and has a completely different BIOS compared to my previous X9QRi-F+ motherboard. NUMA is enabled, but I cannot find any option for node interleaving; maybe it does not exist anymore? Otherwise, the same TPF (8101) as on the previous rig also shows up on this rig.

I still have one problem, as follows. The memory I intended to use did not work, so I borrowed 8 pieces from my first E5 server. As a result, the TPF for the 8102 increased from 6:56 to 9 minutes. The 8102 WU was downloaded while the server was populated with 16 pieces, so maybe that is why the TPF increased so much? I'll see if it improves with the next WU.
 
HFM just needs frames to update the TPF, no worries there
 
NUMA enabled = node interleaving disabled - you really shouldn't have two settings for this. As long as NUMA is enabled, you are fine.
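
If you'd rather double-check from Linux than hunt through the BIOS, a rough sketch like the one below counts the NUMA nodes the kernel exposes under /sys/devices/system/node. The function name and the optional directory argument are made up here for illustration; nothing like this ships with the board or with the folding tools.

```shell
# Count the NUMA nodes the Linux kernel exposes through sysfs.
# The optional first argument (an alternative sysfs directory) is a
# hypothetical convenience for trying the function out; by default it
# looks in /sys/devices/system/node.
count_numa_nodes() {
    node_root="${1:-/sys/devices/system/node}"
    count=0
    for dir in "$node_root"/node[0-9]*; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        if [ -d "$dir" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Running `count_numa_nodes` with no argument on a 4P box with NUMA enabled should, I'd expect, report more than one node (typically one per socket). A result of 1 would suggest node interleaving is switched on.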
 
Thanks musky, maybe I just did not look hard enough in the BIOS, so I'll try again.
 
+1


You may also wish to run
Code:
fahdiag | pastebinit
(after folding has been running for a while) and share resulting URL.

This will gather add'l data from the machine.
 
I got the message: "You are trying to send an empty document, exiting." I think fahdiag is not installed! How do I install it?

Edit: Ok I did find it, the url is: http://paste.ubuntu.com/6178761/
 
Ok, so we see that:
1. NUMA is enabled [more than one node reported] -- good
2. Add'l applications are open (firefox is definitely adversely affecting your performance, not
  sure where wineserver or python came from) -- if you run them on a regular basis
  that's probably bad

Recommendations:
1. Ensure (with i7z) that your CPUs are running all-core turbo
2. See if getting rid of other apps (firefox/python/wineserver) helps your performance
3. Sometimes (for lack of better explanation) unit starts in an unfortunate fashion and
  performance is somewhat degraded.
  An indication of this is percentage of CPU time used by the system (see vmstat output,
  third column from the right; we'll add headings in next version of fahdiag btw); having
  %sy at or below 10 is preferred (you're at 12).
  Restarting the client may (or may not) improve performance.
4a. It doesn't seem you're running a [H] ramdisk setup or otherwise fahinstall-based
   setup -- not bad in principle but you sure could run "fahinstall" on top of your
   installation (will need to stop the client beforehand so ramdisk can be set up)
4b. Instead of 4a above, you could migrate to latest [H] folding appliance which
   is already optimized for high output machines
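
For point 3, a small sketch of how the %sy figure could be pulled out of vmstat without counting columns by hand. It assumes the usual vmstat layout with a header row containing a "sy" label; the function name vmstat_sy is made up for this example and is not part of fahdiag.

```shell
# Print the system-time percentage (the "sy" column) from the last sample
# of vmstat output read on stdin. The column is found by its header name
# rather than by position, because vmstat versions differ in how many
# columns follow it.
vmstat_sy() {
    awk '
        # Header row: remember which field is labelled "sy".
        { for (i = 1; i <= NF; i++) if ($i == "sy") { col = i; next } }
        # Data rows: keep the value from the most recent sample.
        col && $1 ~ /^[0-9]+$/ { sy = $col }
        END { if (sy == "") exit 1; print sy }
    '
}
```

Something like `vmstat 5 3 | vmstat_sy` would then print the system-time share of the last 5-second sample; per the above, a value at or below 10 is what you want to see.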
 
Thanks tear.
The programs you mention were locked to the Launcher. I unlocked them all, and I cannot see them anymore in fahdiag. I will install the ramdisk at the next change of WU. About 4b: do you mean V7 of some kind?

But I have a similar problem with my first E5-4650 server. After I went from 16 down to 8 pcs of memory, the TPF got about 2 min worse. On this server I am running f@h from a ramdisk. Here is the fahdiag result: http://paste.ubuntu.com/6179323/
 
4b -- I meant switching to this: http://hardforum.com/showthread.php?t=1769083

Re your other machine:
It's running an ancient fahdiag. Please upgrade by either pulling the new fahdiag or just running
"sudo fahinstall" (which "does everything"). Also, given the ramdisk is already set up, there
is no need to shut the client down when running fahinstall.
 
Alias, you appear to have a problem with CPU #1. Has it been running hot? Or maybe it is a memory problem.

Also, your vmstat is not so good: 84 16 is not the best. If this is an 8101 you were running, it should be more like 90 10, plus or minus 1 or 2. You would probably benefit from a restart of FAH when it is that low (84), but I am unsure why it is getting that far off. I normally never see anything below 88 on my rigs, and usually it is 90 to 92 depending on the WU; 8101 is usually the worst and normally runs around 90.

Last 20 FahCore crash messages:
/var/log/syslog:Oct 1 17:48:35 Server4650 kernel: [83320.789025] thekraken-FahCo/5518: potentially unexpected fatal signal 11.
/var/log/syslog:Oct 1 17:48:35 Server4650 kernel: [83320.789029] CPU 0
/var/log/syslog:Oct 1 17:48:35 Server4650 kernel: [83321.091188] FahCore_a5.exe/5516: potentially unexpected fatal signal 11.
/var/log/syslog:Oct 1 17:48:35 Server4650 kernel: [83321.091194] CPU 12
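
Signal 11 is a segmentation fault (SIGSEGV). If you want a quick tally of which processes crashed and with what signal, here is a rough sketch; it assumes the message format shown in the excerpt above, and the function name crash_summary is made up for illustration.

```shell
# Summarise kernel "fatal signal" lines by process name and signal number.
# Reads syslog-style text on stdin and prints one line per
# process/signal pair, prefixed by how many times it occurred.
crash_summary() {
    grep 'potentially unexpected fatal signal' |
        sed -E 's/.*\] ([^ ]+)\/[0-9]+: potentially unexpected fatal signal ([0-9]+).*/\1 signal \2/' |
        sort | uniq -c
}
```

For example, `crash_summary < /var/log/syslog` would list each crashing process once with a count in front, which makes it easier to see whether the crashes cluster around one moment or keep recurring.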
 
Thanks Grandpa_01
As far as I know there has been no problem with the temperature on CPU1 or any of the others. IPMI shows that they are all normal. The problem started when I had to borrow 8 pcs of memory from this rig to be able to put the new one into folding. This took place in the middle of an ongoing P8102 WU. I think the messages in syslog came from that moment. So I think your suggestion of a memory problem is correct.

I am waiting for 16 new pcs of memory to arrive from wherever, to get both servers in good shape again. Here in Norway, 16 pcs of memory for this rig cost $1000, which is quite expensive. For that reason I am looking on eBay for a cheaper deal. What does memory cost in the USA for 16 pcs of Samsung M393B5273CH0-YH9?

Here is the situation from today, and it looks better to me. http://paste.ubuntu.com/6182727/
TPF on P8101 is now down to 7:35 with 8 pcs of memory, that I think is about normal. With 16 pcs the TPF was around 6:53 - 7:08. In my experience the difference between 8 and 16 pcs of memory is 30 sec +/- some sec.
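
To put numbers on that, here is a small sketch that converts M:SS frame times to seconds and subtracts them. The helper names tpf_seconds and tpf_diff are made up for this example.

```shell
# Convert a TPF written as M:SS into seconds.
tpf_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Difference in seconds between two TPF values (first minus second).
tpf_diff() {
    echo $(( $(tpf_seconds "$1") - $(tpf_seconds "$2") ))
}
```

With the figures above, `tpf_diff 7:35 6:53` gives 42 and `tpf_diff 7:35 7:08` gives 27, so the 8-DIMM penalty on this unit is in the same ballpark as the roughly 30 seconds mentioned.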

How do I delete the syslog from within Ubuntu?
 
Thanks
And come to think of it, is it safe to delete syslog? Besides that, there are 2 of them: syslog and syslog.1
 
It is perfectly safe to remove both, other than that you lose the information inside them.
The reason that there are two of them is 'log rotation' or 'log rollover'; you can look that up if you are interested.
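
If you just want a clean slate, one option is to empty the live log in place rather than delete it, so the logging daemon keeps writing to the same file. A minimal sketch (the function name empty_log is made up, and you need write permission on the file, i.e. root for /var/log/syslog):

```shell
# Empty a log file in place instead of deleting it.
# The file itself stays, only its contents are discarded.
empty_log() {
    : > "$1"
}
```

On the real system that would be something like `sudo sh -c ': > /var/log/syslog'`, while the already-rotated copy can simply be removed with `sudo rm /var/log/syslog.1`.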
 
As long as you pay attention to timestamps (when interpreting) in the logs you will be fine keeping them...
 
Both E5-4650 servers are now back to normal with regard to the TPF they are expected to have. I do not know why they both started with a relatively high TPF and then came down to normal after folding 4 to 5 projects. Thanks to everyone who gave me good advice along the way.
 
One more question for all you experts:
What brand / kind of memory are you guys using in the Supermicro X9QRi-F+ motherboard?

I have been using Samsung M393B5273CH0-YH9, but this memory is very expensive. I ask because there are now 2 or 3 of us in our team building on this toy.

One of us has tried the Kingston KVR16R11S8/2, but for some reason we do not know, the BIOS does not recognize all of the chips; only 2 CPUs (1 and 2) are visible. This memory can be found in the Kingston list for the Supermicro X9QRi-F+ motherboard. http://www.ec.kingston.com/ecom/configurator_new/modelsinfo.asp?SysID=78545&mfr=Supermicro&model=X9QRi&search_type=&root=&LinkBack=&Sys=78545-Supermicro-X9QRi-F%2B+Motherboard&distributor=0&submit1=Search
 
Anything un-registered, un-buffered, and ECC should do you. The stuff you are looking at is registered, which is why it is expensive. It certainly isn't needed, and may actually not work, especially if you are mixing registered and unregistered.
 
Currently KVR1333D3E9SK2/4G (8 kits for total of 16 DIMMs).

I also successfully ran Samsung M391B2873GB0-YH9 (albeit it yielded tad lower performance -- I suspect due to being single-rank).

+1 to musky's point.
 
Thanks for answering, but I have tried a number of different memory types and the only thing that has worked for me so far with the Supermicro X9QRi-F+ in combination with ES editions of the E5-4650 has been Samsung M393B5273CH0-YH9 as ECC / Reg memory. For example, I tested Hynix 2GB 2RX8 PC3 10600R (HMT125R7AFP8C-H9) bought on eBay. These pieces worked neither on my G34 rigs nor on the motherboard mentioned above.

So what I'm really asking is: what kind of affordable memory do you use (which is guaranteed to work) in combination with the Supermicro X9QRi-F+ and ES editions of the E5-4650? I have been looking at this list from Kingston http://www.ec.kingston.com/ecom/configurator_new/modelsinfo.asp?SysID=78545&mfr=Supermicro&model=X9QRi&search_type=&root=&LinkBack=&Sys=78545-Supermicro-X9QRi-F%2B+Motherboard&distributor=0&submit1=Search but all these chips are very expensive. Thank you in advance.

Edit: Thanks tear, I will attempt to identify the type you refer to.
 