thoughts ???

lightp2

Consider a 32nm, 4-module processor with 2 int-cores + 1 shared 256-bit floating-point core per module:

1. Standard Mode : 8-int-core+4-shared-fp-core

2. Special Mode 1 : Re-program BIOS/Firmware
4-int-core+4-shared-fp-core. Meaning one int-core in every module is disabled, so each module is left with one int-core + one 256-bit fp-core.
Since half of the cores are disabled, the so-called cache contention issues are reduced. Overclock the remaining 4 int-cores + 4 fp-cores to the max.

3. Special Mode 2 : Re-program BIOS/Firmware. Hybrid mode
8-int-core+4-shared-fp-core.
Keep all 8 int-cores to take maximum advantage of multithreading.
Software/firmware coding ensures that scheduling or task switching happens only on special modules.

For example :
Config 0
4 module 8 int-core 4 shared-fp-core : standard BD mode

Config 1A : Core i5-2500K/ Core i7-2600K Mode
4 module 4-int-core 4 shared-fp-core : special mode 1 world-record edition
Config 1B : Core i5-2500K/ Core i7-2600K Mode
3 module 3-int-core 3 shared-fp-core : special mode 1 SuperPI BE Edition max overclock.
1 module 2-int-core 1 shared-fp-core : semi-hyper-threading

Config 2 : Core i5-2400/2500 Mode
2 module 4 int-core 2 shared-fp-core : standard BD mode
2 module 2-int-core 2 shared-fp-core : special mode 1

Config 3 : FOSS Mode : only FOSS people will spend time doing this.
2 module 4 int-core 2 shared-fp-core : standard BD mode
2 module 4-int-core 2 shared-fp-core : special mode 2
say int-cores [4,5], [6,7]
- 4 and 6 take normal tasks and are available for standard scheduling.
- 5 and 7 require special programming and custom scheduling. The intention is to reduce the fp issue and the cache issue, and so minimize the impact on cores 4 and 6.
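
As a rough sketch of the Config 3 split, the core sets could be computed like this. It assumes int-cores 2m and 2m+1 share module m (matching the [4,5], [6,7] pairs above), which may not be how a real OS numbers them. The helper names are made up for illustration, and the pinning call is the Linux-only os.sched_setaffinity.

```python
import os


def module_cores(module):
    """Return the two int-core IDs of a module.

    Assumption: cores 2m and 2m+1 share module m, as in the [4,5], [6,7] example.
    """
    return (2 * module, 2 * module + 1)


def config3_core_sets(standard_modules, hybrid_modules):
    """Split cores into a general pool and a custom-scheduled pool.

    Standard modules expose both cores to normal scheduling.
    Hybrid (special mode 2) modules expose only their first core; the second
    core is reserved for specially programmed tasks.
    """
    general, special = set(), set()
    for m in standard_modules:
        general.update(module_cores(m))
    for m in hybrid_modules:
        first, second = module_cores(m)
        general.add(first)
        special.add(second)
    return general, special


def pin_current_process(cores):
    """Pin the calling process to the given cores (Linux only).

    Returns the set actually applied, or None if pinning is unavailable
    or none of the requested cores exist on this machine.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    usable = set(cores) & os.sched_getaffinity(0)
    if not usable:
        return None
    os.sched_setaffinity(0, usable)
    return usable


# Modules 0 and 1 in standard BD mode, modules 2 and 3 in special mode 2.
general, special = config3_core_sets([0, 1], [2, 3])
```

Normal tasks would stay on the general set, and only the specially written tasks would be pinned to the special set with pin_current_process(special).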

I do not have anything concrete. This comes from reading www.xtremesystems.org and other recent postings...
 
Continuation of similar idea

1. Tegra3-4-core+1-low-clock-core

2. Similarly you can have BD

2 module 4 int-core 2 shared-fp-core : generic workload module edition
1 module 1 int-core 1 shared-fp-core : single-thread module edition : maximum clock. For example, if you run certain old games where CPU utilization is uneven, force the busiest process onto this module.
1 module 2 int-core 1 shared-fp-core : house-keeping-module edition : custom low-clock mod.

I am assuming you can raise and lower the clock individually per module/core. I do not know whether that is possible.
 
For example,

IBM Cell Processor Sony PS3 Emulation Mode

1 Module 2-int-core-1-fp-core : Generic OS /HouseKeeping Module
3 Module 6-int-core-3-fp-core : Special Mode 2 Hybrid
- [2,3],[4,5],[6,7]
- cores 2, 4, 6 have full access to the 256-bit fp unit in their module (SPU function).
- cores 3, 5, 7 take mostly integer workload, maybe run COBOL???
- cores 0, 1 can optionally issue OpenCL offload to the GPU. Is the FPU less of a concern down the road, when an abundance of GPU resources is available??
 
Maybe

When Tesla/FirePro GPGPU Processing nodes are detected

1 Module 2-int-core 1-shared-fp core : Generic OS/HouseKeeping module
3 Module 6-int-core 0-shared-fp core : disabled fp functions. Offload everything to Tesla/FirePro OpenCL or floating point processing.. ??
 
You guys are so smart...it blows me away...are you guys engineers?...
I read constantly and am knowledgeable, but you guys really are light years away from where I'm at...congrats
 
They're probably programmers :D
I wish I knew what they were talking about :confused:
 
Maybe

When Tesla/FirePro GPGPU Processing nodes are detected

1 Module 2-int-core 1-shared-fp core : Generic OS/HouseKeeping module
3 Module 6-int-core 0-shared-fp core : disabled fp functions. Offload everything to Tesla/FirePro OpenCL or floating point processing.. ??

I don't think this would work. There is probably too much latency in offloading to the Tesla, so having 0 FPU power would not pay off for the smaller workloads that the CPU can handle itself.

I'd say a better option is to wait until the FPU is fully loaded and there is a long enough queue of FPU-dependent instructions before offloading.

Or the program would have to be aware of the GPU and go there primarily for FPU-intensive operations.
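
To make the "wait for a long enough queue" idea concrete, here is a minimal sketch of the offload decision. The cost model (a fixed transfer latency plus a per-operation time) and every number in it are made-up assumptions, not measurements of any real CPU or GPU.

```python
def should_offload(queued_fp_ops, fpu_busy, transfer_latency_us,
                   cpu_us_per_op, gpu_us_per_op):
    """Decide whether a queue of FP work is worth sending to the GPU.

    Hypothetical model: the GPU costs a fixed transfer latency plus a
    per-op time; the CPU costs only a (larger) per-op time.
    """
    if not fpu_busy:
        # The local FPU has spare capacity, so keep the work on the CPU.
        return False
    cpu_time = queued_fp_ops * cpu_us_per_op
    gpu_time = transfer_latency_us + queued_fp_ops * gpu_us_per_op
    return gpu_time < cpu_time


# Illustrative numbers only: 50 us to reach the GPU, 1.0 us/op on the CPU,
# 0.1 us/op on the GPU.
small_queue = should_offload(10, True, 50.0, 1.0, 0.1)
large_queue = should_offload(1000, True, 50.0, 1.0, 0.1)
idle_fpu = should_offload(1000, False, 50.0, 1.0, 0.1)
```

With these numbers the small queue stays on the CPU, because the fixed latency dominates, and only the large queue is offloaded. That matches the latency concern raised above.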

I dunno, I'm a newb :)
 
FX-4100

1. Disable one int-core per module, and you have
2 module 2-int-core, 2-256bit-fp-core 32nm - overclock to 4.8GHz
Pentium Dual-Core mode (most programs use only 2 threads)

2. Disable one int-core in one module, and you have
1 module 1-int-core, 1-256bit-fp-core 32nm - overclock to 4.8GHz
1 module 2-int-core, 1-256bit-shared-fp-core 32nm
Core i3 semi-hyper-threading mode

There are so many permutations. It is up to the BIOS and the end users' willingness.
 
To verify this information concretely, you need

1. An FX-8120 or FX-8150
2. A motherboard
3. Disable one integer core per module, so you are left with 4 modules, 4 integer cores, and 4 256-bit fp-cores. Do this inside the BIOS so that you are certain it is one integer core per module and that it is set at the hardware level. Double-check with OverDrive.
4. Run the tests and compare the results to a normal Phenom II quad / Core ix quad, in both single-thread and multi-thread tests. This clarifies exactly how much is gained or lost per observable application test. In an example application scenario, say the per-core score is 1500 in 8-core mode and 2200 in 4-core non-shared mode. Then you have an incentive to run as a non-shared quad-core when the situation is favorable.

4.1 The main reason is to reduce the test scenario to a quad-core baseline, then understand how the current implementation relates to the average IPC of Phenom II/Core i3/i5/C2Q for optimization purposes.
4.2 To verify whether power is indeed reduced under these circumstances, and by how much, or whether there is no improvement. This verification is important if high-clocking issues come up later.

5. If (and it is a big if) power is much reduced when operating as a quad-core, and perf/consumption is comparable to Phenom II, or maybe slightly better under certain circumstances, then it is time to raise the clock to take advantage of 32nm.

6. With the above understanding, you can then build models that selectively activate cores for various scenarios.

Obviously software support is a consideration, but that is already understood and takes time to adapt.
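
Using the example scores from step 4 (1500 per core in 8-core mode, 2200 per core in 4-core non-shared mode), a back-of-envelope comparison might look like the sketch below. It assumes the score scales linearly with the number of busy cores and that the per-core score does not change with load. Both are simplifications, and the function names are made up.

```python
def total_score(per_core_score, active_cores, threads):
    """Aggregate score, assuming linear scaling over the busy cores."""
    return per_core_score * min(active_cores, threads)


def best_mode(threads, modes):
    """Pick the mode name with the highest aggregate score for a thread count."""
    return max(modes, key=lambda name: total_score(*modes[name], threads))


# (per-core score, active cores), taken from the example in step 4.
MODES = {
    "8-core shared": (1500, 8),
    "4-core non-shared": (2200, 4),
}

for threads in (2, 4, 6, 8):
    print(threads, best_mode(threads, MODES))
```

Under these assumptions the quad mode wins up to 5 threads (8800 against at most 7500), and the 8-core mode takes over from 6 threads (9000 against 8800). Real results would have to come from the tests in step 4.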
 
Yeah, I'm thinking about doing something crazy like this.

But I wouldn't want to spend the $245-$205 for FX-8150/FX-8120.

That's the only thing putting a damper on my plans.
 
For example, say

1. You have X units of such systems.
2. The systems run on battery power.
3. You have a pending workload with value 100, plus random interference estimated at 20.
4. You need to complete the workload before the UPS shuts down.
5. Is it better to operate as quad-core, 6-core, or 8-core? Should I instead deactivate 3 modules to conserve power and offload everything to the other units? Can you dynamically reconfigure to the best configuration before the battery runs out?
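
One way to frame that question is as a small selection problem. The sketch below is a toy model: the throughput and power figures for each configuration are invented purely for illustration, and only the workload value (100) and interference estimate (20) come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    throughput: float  # work units per hour (hypothetical figure)
    power_w: float     # average draw in watts (hypothetical figure)


def energy_needed_wh(cfg, workload, interference):
    """Watt-hours needed to finish the workload plus the interference."""
    hours = (workload + interference) / cfg.throughput
    return cfg.power_w * hours


def pick_config(configs, battery_wh, workload=100.0, interference=20.0):
    """Return the feasible config using the least energy, or None if none fits."""
    feasible = [c for c in configs
                if energy_needed_wh(c, workload, interference) <= battery_wh]
    if not feasible:
        return None
    return min(feasible,
               key=lambda c: energy_needed_wh(c, workload, interference))


# Invented numbers: more cores finish sooner but draw more power.
CONFIGS = [
    Config("one-module", 20.0, 30.0),
    Config("quad", 60.0, 70.0),
    Config("6-core", 80.0, 100.0),
    Config("8-core", 100.0, 130.0),
]

choice = pick_config(CONFIGS, battery_wh=145.0)
```

With these invented figures only the quad configuration finishes within a 145 Wh battery. With different measurements the answer could just as well be another mode, which is why the power verification matters.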

:) I admit this has nothing to do with FX-8150 as a product :) :) :) I just want to give some thoughts for some of the 1000+ AMD forum viewers for staying late... :)

Cheers
 
This sounds exciting

The other thing you have to take into consideration: if you want an OC worth a damn, you should probably get an ASUS Crosshair V Formula, which kinda negates the savings from not buying a 2500K/2600K. The Formula is known to have a BIOS that lets you enable/disable cores independently of Bulldozer modules. I'm sure there exists or will exist another mobo that lets you do that as well, but you're gonna want that 8 phase power, son.

And you should really get an aggressive air solution or water solution. Or a phase change solution. Go big or go home. :)

But it might be like putting lipstick on a pig...
 
The main purpose of this concluding post is to give a summary and answer questions. I think answering questions should not be a problem.

1. As noted, the 4-module, 8-int-core, 4-shared-256-bit-fp-core 32nm processor has a lot of flexibility, in addition to 8MB + 8MB cache.

2. Most of these scenarios are already included in the design philosophy.

3. The KEY POINT of this series is NOT to encourage unnecessary overclocking, which burns wattage without the necessary result.

4. The CORE POINT of this series is to understand how to assess a processor and its behavior under various scenarios, and then devise models for how best to operate the processor under various usage models.

5. For example, there are situations where LOW POWER is required for persistence. Here, you can run 8 cores at an extremely low clock so that power usage is reduced to a reasonable minimum.

6. Where the power source is stable, you can switch mode and reconfigure to ramp the clock where necessary for certain workloads. For example, if the workload is clearly limited to four threads, why ramp 8 cores when only 4 are really needed under a special config? So you dynamically deactivate cores where necessary. And you look at the demographics of such users. Some of them are clearly not running server workloads, so there is less need to worry about concurrency demand while the 4 cores are being ramped to 5GHz, because they are probably not doing other things.

7. Some users may ask "I thought that's what Turbo Core should do?". Here, the end user may be able to deduce a more favorable config with equal performance and lower power.

8. Most important, you can dynamically program OverDrive/BIOS/UEFI/firmware/user-space code to adapt where needed.

9. To give an example, and I say this honestly, it is not easy to do in Windows, but in other OSes you can perform software-based Processor Hot-Add and Processor Hot-Remove. This means that where a CPU core is not needed, you can dynamically remove it from the running OS, so that power consumption is reduced and the cache issue is minimized by not sharing the fp-core. Hot-add the processor core back to the running system when the load crosses a certain threshold.

Points 8 and 9 are a reply to Yossarian22.

10. In summary, to everyone,
By understanding the processor fully, you can deduce the best way to run a particular workload, and save power in the process. I think reducing power consumption is a fairly genuine goal that eventually benefits everybody.
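
As a sketch of the hot-remove/hot-add idea from point 9, Linux exposes a per-core "online" file under /sys/devices/system/cpu that can be written with 0 or 1 (root is required, and on many systems cpu0 cannot be taken offline). The load thresholds and function names below are arbitrary assumptions for illustration.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def set_core_online(core, online, root=SYSFS_CPU):
    """Write 1 or 0 to the core's sysfs 'online' file (needs root on a real system)."""
    (root / f"cpu{core}" / "online").write_text("1" if online else "0")


def desired_online(load, currently_online, add_above=0.75, remove_below=0.30):
    """Hysteresis policy: add the core under high load, remove it under low load.

    The 0.75 and 0.30 thresholds are arbitrary placeholders, not tuned values.
    Between the two thresholds the core keeps its current state, which avoids
    flapping when the load hovers around a single cut-off.
    """
    if load > add_above:
        return True
    if load < remove_below:
        return False
    return currently_online


def reconcile(core, load, currently_online, root=SYSFS_CPU):
    """Apply the policy to one core and return its new online state."""
    target = desired_online(load, currently_online)
    if target != currently_online:
        set_core_online(core, target, root=root)
    return target
```

For the shared-fp case, the cores to remove would be the second int-core of each module, for example calling reconcile(5, load, True) for core 5 in the earlier [4,5] pairing.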

This series purely explores various possible alterations of BD usage. Obviously I agree you cannot expect every user to do this kind of tuning, so it is purely theoretical at this time. The general situation for FX is therefore still as currently seen.

Cheers

note : for example,
1. Sometimes, just like in life, many people hope for great academic scores, a brilliant career, a wonderful family, to be young, rich and famous, to have good fortune, to run 100m in about 10 seconds, stuff like that... However, in reality many need to make do with what they have, and frequently it is not so clear cut. The key to life is to understand the situation, analyze it, appreciate it, and chart a reasonable path forward within your capability. If it is doable, try to build a plan; if not, navigate to reasonable ground, you know, stuff like that, ... :) Happy Now?

-----------------------
Oct 13, 2011
Tentatively, considering most generic "Desktop User" scenarios (no server workload consideration), the suggested mode is
1. Switch to 4 modules, 4 int-cores, 4 non-shared fp-cores. Set this at the hardware level in the BIOS and verify with OverDrive.
2. Undervolt the processor. Some websites were able to undervolt down to 1.11v.
3. Finally, with item 2 as the baseline, make minor adjustments everywhere to ramp the clock until a set power consumption target is reached.

In the end it really is diminishing returns, so there is no point in insane voltage and power consumption. A reasonable clock rate it is.
Obviously, where end users have a very specific understanding and application set, they can switch back to the 6-core or 8-core full operating mode. Even the 8-core mode is OK. However, if the system is mostly dealing with 2-thread applications, the higher-IPC mode is preferable.
 