hi Folks,



i've been running a quad-socket SuperMicro H8QGi-F motherboard with Opteron 61xx ES chips since 2015 as my home PC: 48 K10 cores overclocked to 3.0 GHz. The thing has been a tank, and solid as a rock (as it should be, since the internals plus the SC748 case weigh about 80 lbs). The K10-based Opterons worked well, but lately i decided to upgrade to the Piledriver-based Opteron 63xx chips to tide me over until i upgrade yet again to EPYC Rome chips. Yes, i know Piledriver is a bit passé these days, but still....



Specifically, the ones i used were ZS258045TGG54_34/25/20_2/16. These are unlocked 63xx engineering samples, essentially an unlocked version of the Opteron 6380 (16 cores). i pulled all 4 61xx ES chips (and the Noctua NH-U9DO A3 coolers), and installed 2 of the 63xx chips.



i ran into a bunch of peculiarities in getting these things stable.



To keep heat down and performance up, i downcored them in the BIOS using the "Compute Unit" setting. So instead of a total of 32 cores, i get 16 (8 per 63xx). The "Compute Unit" setting essentially disables CMT, and clock gates the unused Piledriver core in the compute unit. The remaining core gets the L2 cache as well as the L1 instruction cache and shared decoder to itself.



My H8QGi-F also had Tear's OCNG BIOS flashed, and of course the TurionPowerControl utility to let me set P-states (voltages plus frequencies).



The strange thing was that the box was not stable. i could run Prime95 small FFTs (which run in the L1/L2 caches) just fine, but the moment i ran the Blend test (which hits memory) the box would black screen and reboot. Running the RightMark Multi-Thread Memory Test would also cause a reset. At first i thought it was related to HyperTransport, and dropping that down to HT 1.0 did help things run longer. Unfortunately, it did not fix the problem.



It turns out that setting node interleaving to disabled in the BIOS seems to break things. With node interleaving disabled, threads are exposed to the true memory latencies caused by the distance from the core running the thread to the memory its data was originally allocated on. So a thread running on socket 2 that accesses data allocated in socket 1's memory sees a lot higher latency than it would running on a core in socket 1. Setting node interleaving back to Auto (which enables it) stripes the memory across all nodes and evens out the memory latency seen by each socket. That brought complete stability back.
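To make the latency trade-off above concrete, here is a minimal sketch of a two-level model (local vs. remote access). The latency figures and the node count are made-up illustrative values, not measurements from this box, and the function name is hypothetical. Real G34 systems have more than one remote distance, so treat this as a rough picture only.

```python
def interleaved_latency_ns(local_ns, remote_ns, nodes):
    """Average latency when memory is striped evenly across all nodes.

    Simple model: 1/nodes of accesses land on the local node,
    the rest pay the (single, assumed) remote latency.
    """
    local_fraction = 1.0 / nodes
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Assumed, illustrative numbers (NOT measured on the H8QGi-F):
LOCAL_NS = 100.0   # access to memory attached to the thread's own node
REMOTE_NS = 160.0  # access to memory attached to another node
NODES = 4          # assumption: 2 packages, each presenting 2 nodes

best_case = LOCAL_NS    # interleaving off, data happens to be local
worst_case = REMOTE_NS  # interleaving off, data lives on another node
striped = interleaved_latency_ns(LOCAL_NS, REMOTE_NS, NODES)

print(f"interleaving off: {best_case:.0f} ns to {worst_case:.0f} ns per access")
print(f"interleaving on:  {striped:.0f} ns for every thread")
```

With interleaving on, every thread sees roughly the same average latency, instead of some threads getting the best case and others the worst.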



i have no idea why disabling node interleaving breaks things under heavy load. It's almost like when all the cores are hitting memory hard, the shared HyperTransport probe filter that maintains memory coherency across cores is simply not able to keep up.



As far as overclocking goes, the Piledriver cores on these things run 4.5 GHz just fine in pb0 (single core turbo). In pb1 (all core turbo) they run at 3.9 GHz just fine.



Frequencies and voltages for stability:

MHz    vcore      P-state
2800   1.1250 V   pb1 (all core turbo)
3500   1.1875 V   pb1 (all core turbo)
3600   1.1875 V   pb1 (all core turbo)
3700   1.2000 V   pb1 (all core turbo)
3800   1.2250 V   pb1 (all core turbo)
3900   1.2375 V   pb1 (all core turbo)
4000   1.2375 V   pb1 (all core turbo)
4500   1.3125 V   pb0 (single core turbo)



The above settings were with APM on. i will be disabling APM to see what frequencies/voltages these processors run when turbo core isn't managing things.
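For a rough feel of what those settings cost in heat, here is a small sketch that applies the usual rule of thumb that dynamic power scales with frequency times voltage squared. It only uses the frequency/voltage pairs listed above. It ignores leakage and everything else, so the results are relative estimates, not measurements. The function name is just for illustration.

```python
# (MHz, vcore in volts, P-state) copied from the list above
SETTINGS = [
    (2800, 1.1250, "pb1"),
    (3500, 1.1875, "pb1"),
    (3600, 1.1875, "pb1"),
    (3700, 1.2000, "pb1"),
    (3800, 1.2250, "pb1"),
    (3900, 1.2375, "pb1"),
    (4000, 1.2375, "pb1"),
    (4500, 1.3125, "pb0"),
]


def relative_dynamic_power(freq_mhz, vcore, base_freq_mhz=2800, base_vcore=1.1250):
    """Estimate dynamic power relative to a baseline using P ~ f * V^2."""
    return (freq_mhz / base_freq_mhz) * (vcore / base_vcore) ** 2


for freq_mhz, vcore, pstate in SETTINGS:
    ratio = relative_dynamic_power(freq_mhz, vcore)
    print(f"{freq_mhz} MHz @ {vcore:.4f} V ({pstate}): {ratio:.2f}x the 2800 MHz setting")
```

Note that the pb0 entry only applies to a single core in turbo, so its ratio is per core and not comparable to an all-core load.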



Compared to the K10s, the single thread performance is only marginally better (and it takes a lot more frequency to get there). The main benefit is the much improved memory performance.



I'd be curious how other people are doing with overclocking these things. Heat is not so bad with just 16 cores. i hit about 33C or so when running Prime95.