AMD's newest Bulldozer architecture: FX-8120 8-core performance and OC to 5GHz

windwithme · Dec 12, 2011

In the second half of 2011, AMD released two new architectures. The first was the FM1 APU platform, released in early July.
This platform integrates the GPU into the CPU, a first for AMD's mainstream desktop line, and its 3D performance is much better than that of earlier integrated graphics.
Regrettably, FM1 is not pin-compatible with AMD's own AM3+ socket. I have already shared three articles testing the FM1 platform.

FM1 belongs to AMD's mid-range and entry-level product line,
while Bulldozer is AMD's new platform positioned above the mid-range, and a hot topic on the Internet.
Following the design of previous platforms, the Bulldozer architecture uses the AM3+ socket.
The CPUs come in three categories: the 4-core FX-4100, the 6-core FX-6100, and the 8-core FX-8120/FX-8150,
of which the two 8-core models are the most eye-catching. After all, they are the first 8-core products in the desktop PC market.

After several delays, AMD Bulldozer was finally launched in mid-October and has gradually become available in stores.
There is already plenty of related information on the Internet and in magazines.
This time I bought an AMD FX-8120 processor: 8 cores, 32nm process, and a TDP of 95W.
The core is codenamed Bulldozer; the clock is 3.2GHz with Turbo Core up to 4.0GHz, and it has 8MB of L2 and 8MB of L3 cache.

Front-side comparison of the two architectures' CPUs: the FM1 A8-3850 is at the upper left, the AM3+ FX-8120 at the lower right.

Rear-side comparison
FM1 is on the left and AM3+ on the right. The pin layouts clearly show that the two CPUs are not interchangeable.

Recently, some online friends mentioned that I share little information about the CPU itself. Personally, I think a CPU can only be described through its specifications and photos of its appearance.
Everything else about the CPU comes through in the software tests below, which make up the main part of this article;
the screenshots of actual tests present the CPU's performance fully.
Still, this time I have added front and rear comparisons of the CPUs for these two AMD sockets, hoping to share more CPU-related detail.

For chipsets, AMD offers three options: 970, 990X and 990FX. Major motherboard manufacturers began shipping 990FX boards in August,
but the FX-series CPUs were not released until mid-October, more than two months later, which is rare in the PC market.
The motherboard used here is the BIOSTAR TA990FXE, which comes in a red package and belongs to BIOSTAR's EXTREME EDITION line.

Full view of the BIOSTAR TA990FXE
As before, AMD uses a two-chip (north bridge + south bridge) design rather than a single chipset, pairing the 990FX north bridge with the SB950 south bridge.
This is AMD's most advanced chipset combination, and most motherboard brands only offer it in the ATX form factor.

The board still uses a black PCB, with red or white PCI-E and DIMM slots.
The white slots stand out when the board is installed in a case,
and give it a more refined look than other high-end boards that use only black and red.

Lower left of the motherboard
3*PCI-E 2.0 x16, ATI CrossFireX supported, bandwidth x16+x16+x4
1*PCI-E x1
2*PCI
Atheros AR8151 network chip
Realtek ALC892 audio chip, 8-channel HD Audio supported

Lower right of the motherboard
5*red SATA ports provided by the SB950, SATA3 specification, with RAID 0, RAID 1, RAID 5 and RAID 10 supported
The blue header is for front-panel USB 3.0; there are also simple Power and Reset buttons and a built-in debug LED

Upper right of the motherboard
4*DIMM DDR3, supporting 800/1066/1333/1600/1866/2000 (OC), with a maximum capacity of 16GB.
DDR3 2000 can only be reached by overclocking the CPU external frequency. Next to the slots is the 24-pin power input.

Upper left of the motherboard
The TA990FXE uses a 4+2 phase power design. At the upper left is the 8-pin power input.
It uses the black AM3+ socket reserved for AMD's high-end motherboards.

I/O
1*PS/2 keyboard
1*PS/2 mouse
1*S/PDIF Out
1*FireWire (IEEE 1394)
4*USB 2.0 (black/red)
2*USB 3.0 (blue)
1*eSATA2 (red)
1*RJ-45 network port
6*audio jacks

View of the heat-pipe cooling module
Its finish and contact area are decent, and the red aluminum fins make it look better.

The 990FX north bridge heatsink is a large extruded-aluminum block.
Its shape is also designed so that it does not get in the way of large graphics cards.

The SB950 south bridge heatsink is a little thin. Had it been designed like the north bridge heatsink, the two would look more consistent.

Test platform
CPU: AMD FX-8120 8-Core Processor
MB: BIOSTAR TA990FXE EXTREME EDITION
DRAM: CORSAIR CMZ8GX3M2A1866C9R
VGA: msi N560GTX-Ti Twin Frozr II
HD: CORSAIR FORCEGT 120GB
POWER: Thermaltake Toughpower Grand 1200W
Cooler: CORSAIR Hydro Series H60
OS: Windows7 Ultimate 64bit SP1

Over the last two years, more and more closed-loop water coolers have appeared on the market.
Their advantages are that the coolant never needs to be refilled, installation is easy, and performance is good.
AMD even offers its top FX-8150 with a bundled water-cooling solution,
and Intel has also started offering a water cooler for the Sandy Bridge-E series.
The cooler used in this test is CORSAIR's closed-loop water cooler, the Hydro Series H60, which is known for being quiet.
Below is the H60 retail box. The warranty is a full five years, and the box is much smaller than the H80's.

The included accessories
At the bottom are the installation manual and other product literature.
At the upper left is a 12*12*2.5 cm fan with a maximum speed of 1700 RPM, airflow of 74.4 CFM, and a noise level of 30.2 dBA.
At the upper right are the mounting brackets and screws for each platform. The H60 supports AMD AM2/AM3 and Intel LGA 775/1155/1156/1366.

On the left is the all-black radiator, measuring 120mm*152mm*27mm.
The H80 radiator is 38mm thick while the H60's is 27mm, so the two differ in size.
On the right is the water block, with a clear brand logo on top. In this picture it is fitted with the Intel mounting bracket.

From the rear you can see that the base of the water block is copper.
The densely stacked fins in the middle of the radiator provide a large heat-dissipation area.

CPU default performance test
CPU 200.0 X 16 => 3200MHz
DDR3 1600 CL7 9-8-24 1T

Hyper PI 32M X 8 => 30m 21.240s
CPUMARK 99 => 436

Nuclearus Multi Core => 16066
Fritz Chess Benchmark => 20.75/9958

CrystalMark 2004R3 => 220281

CINEBENCH R11.5
CPU => 5.04 pts
CPU(Single Core) => 0.95 pts

PCMark Vantage => 14105

PCMark7 => 3884

Windows experience index - CPU 7.8

In CINEBENCH R11.5, the FX-8120's single-core performance is almost the same as that of the A8-3850 or Athlon II X4 640,
and about 10% lower than that of the Phenom II X6 1090T, so the FX series needs better single-core performance.
Multithreading is still the FX-8120's main strength: with all 8 cores at full load, CINEBENCH scores about 5.04.
On closer inspection, however, the MP Ratio is only 5.31x. Compared with 3.89x for the X4 640 and 5.22x for the X6,
its best multithreaded scaling only reaches the level of the previous-generation X6.
This may be related to the design of the new architecture; some on the Internet say Windows 8 will improve performance.
In the Windows Experience Index, the FX-8120 gets a high CPU score of 7.8 at default settings.
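The MP Ratio figures above can be checked with a little arithmetic. The sketch below assumes (this is not stated in the article) that CINEBENCH's MP Ratio is simply the multi-core score divided by the single-core score, and adds a hypothetical `scaling_efficiency` helper that divides the ratio by the core count to show how close each chip gets to ideal linear scaling. All scores are the ones quoted above.

```python
# Sketch: CINEBENCH R11.5 MP Ratio and per-core scaling efficiency.
# Assumption: MP Ratio = multi-core score / single-core score.


def mp_ratio(multi_pts: float, single_pts: float) -> float:
    """Multi-core score divided by single-core score, rounded to 2 decimals."""
    return round(multi_pts / single_pts, 2)


def scaling_efficiency(ratio: float, cores: int) -> float:
    """Fraction of ideal linear scaling (1.0 means ratio == number of cores)."""
    return ratio / cores


# (MP Ratio, core count); the FX-8120 ratio is computed from its stock scores
chips = {
    "FX-8120 (stock)": (mp_ratio(5.04, 0.95), 8),
    "Athlon II X4 640": (3.89, 4),
    "Phenom II X6": (5.22, 6),
}

for name, (ratio, cores) in chips.items():
    eff = scaling_efficiency(ratio, cores)
    print(f"{name}: {ratio:.2f}x on {cores} cores = {eff:.0%} of ideal")
```

With these inputs the FX-8120 lands at roughly 66% of ideal scaling, against about 97% for the X4 640 and 87% for the X6, which is another way of seeing why eight cores only match the previous-generation six-core chip.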

DDR3 1600 CL7 9-8-24 1T
AIDA64 Memory Read - 13516 MB/s
Sandra Memory Bandwidth - 16554 MB/s
MaXXMEM Memory-Copy - 12839 MB/s

DDR3 performance is clearly improved on the AM3+ FX CPU: at DDR3 1600, it is over 30% higher than on the 1090T.
This is probably the FX series' biggest improvement. Across three memory generations, from DDR on K8 to DDR3 on the latest 1090T,
AMD's maximum memory Read bandwidth stayed at around 10000 MB/s. After eight years, AMD's memory bandwidth has finally increased.
Although it is a small step for the PC market, it is a big one for AMD. Hopefully AMD will catch up with its competitors' platforms in DDR3 performance in the future.

Temperature (ambient temperature 22°C)
System idle - 13~27°C

CPU at full load - 22~33°C
LinX 0.6.4

Judging by touching the radiator, the FX-8120 runs cooler under stress testing at default clocks than the 1100T, a big improvement in temperature.
In my earlier tests of the 32nm A8-3850, many software tools also always reported a low CPU temperature.
This stayed the same across A75 boards from more than three brands. However, the heatsink did not feel that cool to the touch; I estimated it at about 50°C.
The FX-8120 shows the same abnormal temperature readings. As some overseas media have noted, this may be because the FX CPU reports inaccurate temperature data to the motherboard.
The temperatures above, as reported by various software, are for reference only and may not reflect the FX-8120's actual temperature.

Power consumption test
System idle - 74W

CPU at full load - 157W
LinX 0.6.4

At default settings, the FX-8120's power consumption is quite good.
The difference between idle and full load is 83W.
The move to a 32nm process helps a lot for AMD's new-generation FM1 and AM3+ CPUs.

Next, I share my overclocking of the FX-8120.
BIOSTAR has adopted a new graphical UEFI interface in its 990FX BIOS.
Some power-saving features can be further adjusted on the CPU information page.

O.N.E tuning page

Voltage page
CPU Vcore +0.050~1.450V
CPU-NB Over Voltage +0.050~0.200V
Memory Over Voltage -0.250~+0.490V
NB Over Voltage +0.010~0.490V
CPU HT Over Voltage +0.010~0.300V
SB Over Voltage +0.010~0.300V
NB HT Over Voltage +0.010~0.300V

The CPU multiplier can also be adjusted here. To raise the external frequency, you first need to lower the NB FID.
To lower the CPU voltage, you can start with the Core VID option.

For the DRAM frequency and timing settings: in general, the smaller the timing values, the higher the performance.
Unlike previous platforms, the FX-8120 supports DDR3 1866 natively. Previously, CPUs up to the 1100T could be set to DDR3 1600 at most.
Below, it is set to DDR3 1866 CL8 10-9-27 1T.
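Because DDR3 timings are counted in clock cycles, a higher frequency with looser timings is not automatically faster. A common rule of thumb converts the CAS latency to nanoseconds: DDR3 transfers data twice per I/O clock, so one clock lasts 2000 / (data rate in MT/s) ns. The minimal sketch below applies that formula to the memory settings used in this article; it looks at CL only and ignores the other timings, so treat it as a rough comparison.

```python
# Sketch: convert DDR3 CAS latency (in clock cycles) to nanoseconds.
# DDR transfers twice per I/O clock, so one clock = 2000 / data_rate ns.


def cas_latency_ns(data_rate_mts: float, cl: int) -> float:
    """First-word CAS latency in ns for a given data rate (MT/s) and CL."""
    return cl * 2000.0 / data_rate_mts


# Settings used in this article (CL only; tRCD/tRP/tRAS are ignored here)
settings = [
    ("DDR3 1600 CL7", 1600, 7),
    ("DDR3 1866 CL8", 1866, 8),
    ("DDR3 2096 CL10", 2096, 10),
]

for label, rate, cl in settings:
    print(f"{label}: {cas_latency_ns(rate, cl):.2f} ns")
```

By this measure DDR3 1866 CL8 (about 8.57 ns) is marginally quicker than DDR3 1600 CL7 (8.75 ns), while the DDR3 2096 CL10 external-clock overclock (about 9.54 ns) trades some latency for bandwidth.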

PC Health Status
The CPU temperature reported here is higher than what some software in the OS reports, so it is probably closer to the actual temperature.

Above are the settings windwithme used to overclock the FX-8120 to 4.7GHz with DDR3 1866.
Overclocking is easy because the FX CPUs have unlocked multipliers: you only need to raise the CPU multiplier and then adjust the CPU voltage to an acceptable value.
Overclocking DDR3 works similarly: first select DDR3 1600/1866, then set the timings or raise the DDR3 voltage.
If the system runs stably at those voltages after overclocking the CPU/DDR3, the overclock is successful.
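Since the FX CPUs are multiplier-unlocked, the core clock is simply the external (reference) clock times the CPU multiplier, which is how 200MHz × 23.5 gives 4700MHz. The sketch below encodes that relationship; `multiplier_for` is a hypothetical helper, and its 0.5 multiplier step is an assumption based on the 23.5x setting used in this test.

```python
import math

# Sketch: core clock from reference (external) clock and CPU multiplier.


def core_clock_mhz(reference_mhz: float, multiplier: float) -> float:
    """Core clock = reference clock x CPU multiplier."""
    return reference_mhz * multiplier


def multiplier_for(target_mhz: float, reference_mhz: float = 200.0,
                   step: float = 0.5) -> float:
    """Smallest multiplier (in `step` increments) that reaches target_mhz.

    Hypothetical helper; the 0.5 step is an assumption based on the 23.5x
    multiplier used in this test.
    """
    return math.ceil(target_mhz / reference_mhz / step) * step


print(core_clock_mhz(200.0, 16))    # stock setting
print(core_clock_mhz(200.0, 23.5))  # 4.7GHz overclock
print(multiplier_for(5000))         # multiplier for 5GHz at 200MHz
```

For example, `multiplier_for(5000)` returns 25.0, the multiplier needed for 5GHz at the stock 200MHz reference clock.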

Overclocking performance test
CPU 200.0 X 23.5 => 4700MHz, 1.452V at full load
DDR3 1866.6 CL8 10-9-24 1T

Hyper PI 32M X 8 => 22m 29.028s
CPUMARK 99 => 520

Nuclearus Multi Core => 23265
Fritz Chess Benchmark => 30.51/14643

CrystalMark 2004R3 => 282679

CINEBENCH R11.5
CPU => 7.62 pts
CPU(Single Core) => 1.15 pts

PCMark Vantage => 16970

PCMark7 => 4432

Windows experience index - CPU 7.8

After overclocking the FX-8120 to 4.7GHz, CINEBENCH performance increases by about 20% single-core and by over 50% with all 8 cores at full load.
The MP Ratio also rises to 6.63x, which is likely an important reason for the large multi-core improvement.
The other benchmarks across the platform improve significantly as well.
Overclocked, the FX-8120 performs comparatively well.
However, the Windows Experience Index is still 7.8; the simple test built into Windows 7 clearly needs better accuracy.
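The percentages above can be reproduced from the raw results. This sketch computes the gain for each CINEBENCH score and, for comparison, for the core clock itself; Hyper PI reports a time, so it is compared as a speed-up factor instead. The `pi_seconds` parser is a hypothetical helper that assumes the `XXm YY.YYYs` format shown in this article.

```python
import re

# Sketch: stock vs. 4.7GHz gains, using the results reported in this article.


def pct_gain(before: float, after: float) -> float:
    """Percentage increase from `before` to `after` (higher is better)."""
    return (after / before - 1.0) * 100.0


def pi_seconds(text: str) -> float:
    """Parse a Hyper PI time such as '30m 21.240s' into seconds."""
    m = re.fullmatch(r"\s*(\d+)m\s*(\d+(?:\.\d+)?)s\s*", text)
    if m is None:
        raise ValueError(f"unrecognised time: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


results = {  # (stock, overclocked)
    "CINEBENCH single-core (pts)": (0.95, 1.15),
    "CINEBENCH 8-core (pts)": (5.04, 7.62),
    "Core clock (MHz)": (3200, 4700),
}
for name, (stock, oc) in results.items():
    print(f"{name}: +{pct_gain(stock, oc):.1f}%")

# Hyper PI reports a time (lower is better), so compare as a speed-up factor
speedup = pi_seconds("30m 21.240s") / pi_seconds("22m 29.028s")
print(f"Hyper PI 32M x 8 speed-up: {speedup:.2f}x")
```

With the article's numbers this gives about +21% single-core and +51% multi-core against a +47% clock increase, and a Hyper PI speed-up of about 1.35x. The multi-core gain outpacing the clock gain is consistent with the higher MP Ratio.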

DDR3 1866.6 CL8 10-9-24 1T
AIDA64 Memory Read - 14194 MB/s
Sandra Memory Bandwidth - 18627 MB/s
MaXXMEM Memory-Copy - 13985 MB/s

The FM1 APUs that AMD released in July support up to DDR3 1866 without overclocking.
The AM3+ Bulldozer architecture has the same specification, unlike previous platforms, which supported DDR3 1600 at most.
Higher memory frequency is another feature of this new platform, and bandwidth increases slightly at DDR3 1866.
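To put "slightly" into numbers, the sketch below compares each memory benchmark at DDR3 1600 CL7 and DDR3 1866 CL8 with the rise in memory clock, and with the theoretical peak bandwidth. The peak assumes dual-channel operation (the test kit is a two-module set) and 8 bytes per transfer on each 64-bit channel; MB vs. MiB differences between tools are ignored, so the percentage-of-peak figures are approximate.

```python
# Sketch: memory bandwidth gain from DDR3 1600 CL7 to DDR3 1866 CL8.
# Assumption: dual-channel DDR3, 8 bytes per transfer per 64-bit channel.


def peak_bandwidth_mbs(data_rate_mts: float, channels: int = 2,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in MB/s."""
    return data_rate_mts * bytes_per_transfer * channels


measured = {  # MB/s at (DDR3 1600 CL7, DDR3 1866 CL8), from this article
    "AIDA64 Memory Read": (13516, 14194),
    "Sandra Memory Bandwidth": (16554, 18627),
    "MaXXMEM Memory-Copy": (12839, 13985),
}

clock_gain = (1866.6 / 1600 - 1) * 100
print(f"Memory clock: +{clock_gain:.1f}%")
for name, (ddr1600, ddr1866) in measured.items():
    gain = (ddr1866 / ddr1600 - 1) * 100
    share = ddr1866 / peak_bandwidth_mbs(1866.6)
    print(f"{name}: +{gain:.1f}%, about {share:.0%} of theoretical peak")
```

Every benchmark gains less than the roughly 17% rise in memory clock, which matches the observation that bandwidth only increases slightly at DDR3 1866.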

Power consumption test
System idle - 79W

CPU at full load - 401W
CINEBENCH R11.5 CPU test running

At idle, power consumption rises by only 5W after overclocking,
probably because Cool'n'Quiet (C&Q) power saving is enabled. This part is fairly good.
Under load, however, power consumption increases a lot after overclocking.
Running LinX at 4.3GHz, power consumption reached around 380W.
At 4.7GHz the system could not run LinX at all, and the 8-core CINEBENCH R11.5 test still drew 401W.
The FX-8120's power consumption when overclocked is quite high, and is one area that needs improvement in the future.
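A rough way to see how efficiency changes is to divide the CINEBENCH multi-core score by the full-load power. The sketch below assumes the readings are whole-system power at the wall; note also that the stock full-load figure was measured with LinX while the 4.7GHz figure comes from CINEBENCH, so this is only a ballpark comparison. `PowerReading` is a hypothetical helper type.

```python
from dataclasses import dataclass

# Sketch: idle/load power and performance per watt, stock vs. 4.7GHz.
# Assumption: readings are whole-system power; stock load used LinX,
# the overclocked load used CINEBENCH, so this is only a rough comparison.


@dataclass
class PowerReading:
    label: str
    idle_w: float
    load_w: float
    cinebench_pts: float  # CINEBENCH R11.5 multi-core score at this setting

    @property
    def load_delta_w(self) -> float:
        """Extra power drawn at full load compared with idle."""
        return self.load_w - self.idle_w

    @property
    def pts_per_watt(self) -> float:
        """CINEBENCH points per watt of full-load power."""
        return self.cinebench_pts / self.load_w


stock = PowerReading("3.2GHz stock", idle_w=74, load_w=157, cinebench_pts=5.04)
oc = PowerReading("4.7GHz OC", idle_w=79, load_w=401, cinebench_pts=7.62)

for r in (stock, oc):
    print(f"{r.label}: idle to load +{r.load_delta_w:.0f}W, "
          f"{r.pts_per_watt:.3f} pts/W")
print(f"Idle increase after overclocking: {oc.idle_w - stock.idle_w:.0f}W")
```

It reproduces the 83W idle-to-load gap at stock and the 5W idle increase after overclocking, and shows CINEBENCH points per watt at full load dropping from about 0.032 to about 0.019.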

I do not share overclocked temperature results, for two reasons.
First, LinX only ran at 4.3~4.5GHz.
Switching to a high-end Thermaltake air cooler or a CORSAIR H80 for better cooling did not improve overclocking stability either.
Second, the reported temperature was only around 42~46°C, which is unlikely to be the actual temperature.
When I removed the tower air cooler after a stress test at 4.3GHz, the heat coming off the heatsink really shocked me.

3D test
msi N560GTX-Ti Twin Frozr II
3DMark Vantage CPU SCORE => 58318

FINAL FANTASY XIV
1920 X 1080 => 4021

StreetFighter IV Benchmark
1920 X 1080 => 270.63 FPS

Above are the results of three 3D benchmarks at 4.7GHz, which are also affected by CPU performance.
The 3D scores are about 10% lower than those of a 2500K overclocked to 5GHz, something future AMD architectures need to improve.
The 990FX provides x16+x16 bandwidth for ATI CrossFireX, which can give better dual-card performance than platforms that split the lanes between two cards of the same specification.

The FX-8120 also overclocks well on its external frequency. The most direct way to push DDR3 performance further is to raise the CPU external frequency;
this way, DDR3 2000~2200 is probably reachable.

CPU 262 X 18 => 4716MHz
DDR3 2096 CL10 11-10-27 1T

AIDA64 Memory Read - 14914 MB/s
Sandra Memory Bandwidth - 19369 MB/s
MaXXMEM Memory-Copy - 15013 MB/s

The FX-8120's external frequency can be stable at around 255MHz, and the CPU performs well within this overclocking range.
With the faster DDR3, AIDA64 Memory Read almost reaches 15000 MB/s.
Compared with previous AMD platforms, overclocked memory bandwidth is about 40~50% higher, which is one of the greatest features of this new platform.
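When the external (reference) clock is raised instead of the multiplier, both the core clock and the DDR3 data rate scale with it. The sketch below assumes the memory setting acts as a fixed ratio to the reference clock; at the stock 200MHz, the DDR3 1600 setting corresponds to a ratio of 8, which matches the 262MHz × 8 = DDR3 2096 result above. `derived_clocks` is a hypothetical helper.

```python
# Sketch: clocks derived from the external (reference) clock.
# Assumption: core clock and DDR3 data rate both scale linearly with the
# reference clock, via the CPU multiplier and a fixed memory ratio.

STOCK_REFERENCE_MHZ = 200
DDR3_1600_RATIO = 1600 / STOCK_REFERENCE_MHZ  # = 8.0 at the stock reference


def derived_clocks(reference_mhz: float, cpu_multiplier: float,
                   memory_ratio: float = DDR3_1600_RATIO) -> tuple[float, float]:
    """Return (core clock in MHz, DDR3 data rate in MT/s)."""
    return reference_mhz * cpu_multiplier, reference_mhz * memory_ratio


core, ddr = derived_clocks(262, 18)
print(f"262MHz reference: CPU {core:.0f}MHz, DDR3 {ddr:.0f}")

# Reference clock needed for the DDR3 2000~2200 range, keeping the 1600 ratio
for target in (2000, 2200):
    print(f"DDR3 {target}: about {target / DDR3_1600_RATIO:.0f}MHz reference")
```

It reproduces the 4716MHz / DDR3 2096 pair and suggests a reference clock of roughly 250~275MHz for the DDR3 2000~2200 range, assuming the 1600 ratio is kept (requires Python 3.9+ for the `tuple[...]` annotation).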

Reaching 5GHz
Super PI 32M X 1 => 16m 49.743s
CPUMARK 99 => 554

5GHz is not easy to reach on the FX-8120, even with better cooling.
The key is the quality of the individual chip. Generally, it can reach about 4.8~4.9GHz and still run the operating system (OS) smoothly.
Although it reached 5GHz in the OS, it was not stable there; it could only complete a single Super PI 32M run.
Running 8 PI instances simultaneously failed even at 1M.
That is why, when overclocking a multi-core CPU, you have to consider whether every core is of equally good quality.

The BIOSTAR TA990FXE uses AMD's most advanced chipset, the 990FX paired with the SB950.
It costs about US$115 in the U.S. (no more than NT$3,500), which is affordable for most people.
Performance is one of BIOSTAR's strengths, and the board did well in the overclocking tests.
Moreover, the platform offers dual x16 for CrossFireX, native SATA3 and onboard USB 3.0, a quite advanced hardware specification.

BIOSTAR TA990FXE
Advantages
1. The packaging and materials are both above average, and the price is fair among 990FX boards.
2. Its BIOS uses a UEFI interface with rich options and wide voltage ranges, and its overclocking capability is excellent.
3. It uses Japanese solid capacitors and has onboard POWER/RESET buttons and a simple debug LED.
4. It uses an Atheros AR8151 network chip, from a well-known network chip brand.
5. It has native SATA3 and onboard USB 3.0, and provides two PCI-E x16 slots plus conventional PCI slots for expansion.

Shortcomings
1. CPU voltage fluctuates a lot between idle and full load.
2. BIOSTAR has no distribution channel in Taiwan.

Performance rating ★★★★★★★★☆☆
Material rating ★★★★★★★★☆☆
Specification rating ★★★★★★★☆☆☆
Appearance rating ★★★★★★★★☆☆
Performance vs. Price ratio ★★★★★★★★★☆

With AMD's Bulldozer platform, we can see both the improvements the new architecture brings and its shortcomings.
On the plus side, AMD has moved to a 32nm process and brought an 8-core design to the mainstream consumer PC market for the first time.
At stock settings, the FX-8120's power consumption and temperatures are both good.
Overclocking also pushes AMD CPU and DDR3 clocks to new heights, and DDR3 performance is greatly improved as well.

On the minus side is price, which AMD has always been proud of: this time, the FX-8120/8150 will feel somewhat expensive to consumers.
Single-core performance is a little lower than the 1090T's, as is multithreaded performance in some software.
When the FX-8120 is overclocked to 4.3GHz or higher, temperature and power consumption rise a lot.
AMD should strengthen these three areas and improve multithreaded efficiency through architecture optimization, BIOS updates, or software support.

Many AMD fans have been waiting more than a year for the Bulldozer architecture,
and during that time could only learn about its 8-core performance from Internet rumors or test results published by overseas media.
For this article, windwithme spent nearly two weeks fine-tuning the overclock.
Initially, I hoped to reach stress-test stability at 4.6GHz or higher to show the FX-8120 at its best. However, it didn't make it, which is a pity.
Bulldozer brings both surprises and disappointment. Personally, I think the disappointment outweighs the surprises.
Writing this article gave me mixed feelings. Maybe it is because I still have the passion from the days of the great-value K7 and the high-performance K8.
When will we again see AMD's unsurpassed glory, as in the days when the K8 Athlon 64 3000+/3200+ dominated the market?
When will AMD make us crazy about it again with a new-era architecture that is truly excellent in every respect?
I still hope AMD can do better in the mid-range and high-end market in the future and return to its former glory.
I believe many consumers share this hope...

dajet24 · Dec 12, 2011

good review.

cageymaru · Dec 12, 2011

Does the Biostar motherboard have the same problem with Steam CEG games as the Asus Sabertooth 990FX and MSI motherboards? You can test it for free by making a character for the game Saints Row: The Third. Here is the link.

Great review.

pelo · Dec 12, 2011

Nice write-up.

One thing that stuck out was that Cinebench taxes all 8 cores on the BD chips and can handle far more (iirc it's 64?), so a scheduler wouldn't help in that regard. The current Windows scheduler doesn't help, with Windows generally spitting threads out in a rather random fashion, but a new scheduler only makes a difference with more than 1 thread (all cores perform the same at 1 thread) and under 8 (you can't fill all the cores in more than 1 way). Other than the little misunderstanding about Cinebench and the Windows 8/7 scheduler, it's a good read.

W.Feather · Dec 12, 2011

good review as always

ManofGod · Dec 12, 2011

Two things I noticed: 1) CPU-Z was showing the speed as 1400MHz. 2) The Vantage CPU score is skewed by the NVIDIA physics testing. To test the CPU correctly, the physics test would have to be turned off in the settings.

Otherwise, good review.

W.Feather · Dec 12, 2011

1400 was with a x7 multi, so CnQ was enabled

Mr Spocko · Dec 12, 2011

Sums things up pretty well. You have to OC the FX CPUs quite a lot to make them worthwhile, and when you do, the power consumption hits absurd levels.
We can only hope that AMD does some major revisions to the FX series and addresses the obvious problems it has. Much better power consumption is needed; they need to be more energy efficient. Performance per core needs improving quite a lot too.

Then, we might actually have a decent CPU

boushidosan · Dec 12, 2011

Where on Earth are the 95W 8120s? It would be a worthy upgrade from my Sempron. How does 5GHz compare to a maxed-out 1100T? I'm not worried about power.

buttons · Dec 12, 2011

Makes me want to run out and get ANOTHER biostar 990fxe

SonDa5 · Dec 12, 2011

Great MB for the money. The weakness and disappointment of 1st gen BD is further noted with many other reviews. Great review.

Zero82z · Dec 13, 2011

That motherboard looks like it was designed by a monkey. Two PCI-E 16x slots right next to each other and only five SATA ports? Bulldozer sucks enough already; shitty motherboard layouts aren't going to help the situation.

Aaron11 · Dec 13, 2011

good review. nice to hear someone blatantly speak their mind rather than how some sites tend to sugarcoat it.

Poisoner · Dec 13, 2011

Zero82z said:
That motherboard looks like it was designed by a monkey. Two PCI-E 16x slots right next to each other and only five SATA ports? Bulldozer sucks enough already; shitty motherboard layouts aren't going to help the situation.

A little far from GenMay, aren't we?

Zero82z · Dec 13, 2011

Poisoner said:
A little far from GenMay, aren't we?

I have made over 27 thousand posts on this forum. Most of those were outside Genmay.

Poisoner · Dec 13, 2011

Zero82z said:
I have made over 27 thousand posts on this forum. Most of those were outside Genmay.

I was being facetious.

Dan_D · Dec 13, 2011

Good god. Some of those benchmark results are absolutely pathetic.

noko · Dec 13, 2011

pelo said:
Nice write-up.

One thing that stuck out was that Cinebench taxes all 8 cores on the BD chips and can handle far more (iirc it's 64?), so a scheduler wouldn't help in that regard. The current Windows scheduler doesn't help, with Windows generally spitting threads out in a rather random fashion, but a new scheduler only makes a difference with more than 1 thread (all cores perform the same at 1 thread) and under 8 (you can't fill all the cores in more than 1 way). Other than the little misunderstanding about Cinebench and the Windows 8/7 scheduler, it's a good read.

Actually it can still help. Even if 8 cores are being used, the current scheduler in W7 will trash the caches with threads from other programs, almost randomly switching between the modules and cores. A scheduler that kept as many modules/cores as possible dedicated to a given program, while sharing as few as possible with other threads, would help. Keeping similar threads and data slotted to one module/core would also help, so some program awareness is probably in order.

Mr Spocko · Dec 13, 2011

Still doesn't explain why AMD designed the CPU knowing it was not optimal. I just think it's all smoke and mirrors; do they really expect a huge increase in performance in Win 8? I doubt it will happen in reality.

BTW let's all remind ourselves that you can't actually buy Win 8 yet either.

Bottom line is very clear: a 32nm revision of the older Ph/Ath II chips would have done better, and by quite a lot.

Mav451 · Dec 13, 2011

windwithme said:
Many AMD fans have been waiting more than a year for the Bulldozer architecture,
and during that time could only learn about its 8-core performance from Internet rumors or test results published by overseas media.
For this article, windwithme spent nearly two weeks fine-tuning the overclock.
Initially, I hoped to reach stress-test stability at 4.6GHz or higher to show the FX-8120 at its best. However, it didn't make it, which is a pity.
Bulldozer brings both surprises and disappointment. Personally, I think the disappointment outweighs the surprises.
Writing this article gave me mixed feelings. Maybe it is because I still have the passion from the days of the great-value K7 and the high-performance K8.
When will we again see AMD's unsurpassed glory, as in the days when the K8 Athlon 64 3000+/3200+ dominated the market?
When will AMD make us crazy about it again with a new-era architecture that is truly excellent in every respect?
I still hope AMD can do better in the mid-range and high-end market in the future and return to its former glory.
I believe many consumers share this hope...

Well said sir - a return to the glorious period would be really welcomed by enthusiasts. The A64 era is just so long ago now and only something to reminisce on

Thanks for the great review.

pelo · Dec 13, 2011

noko said:
Actually it can still help. Even if 8 cores are being used, the current scheduler in W7 will trash the caches with threads from other programs, almost randomly switching between the modules and cores. A scheduler that kept as many modules/cores as possible dedicated to a given program, while sharing as few as possible with other threads, would help. Keeping similar threads and data slotted to one module/core would also help, so some program awareness is probably in order.

That's true. Like threads would benefit being shared within module rather than being spit out in a random way as they currently are, but those benefits I'd imagine wouldn't be nearly as significant as the gains you'd see from 2 or more or under 8.

The problem AMD has, and subsequently Windows, is that the Turbo Core feature in the Bulldozers is actually pretty good, but there's a 20% performance decrease when threads are taxed within a module rather than between modules. So they've gotta weigh the positives and negatives of splitting a small number of threads (under 4) between modules before putting them within the same module, and of when doing the latter will likely net bigger performance gains, like similar tasks being slapped on the same module.

I guess it's actually far more complex than I originally thought it would be. I'd assume it's relatively easy to just ask the scheduler to tack in chronological order but then you have to worry about the -20% sharing tax. If you split them between modules then where would similar threads go after 4? Maybe similar workloads and threads within the same module and others get their own? Clearly they've got a lot of work cut out for them.

But we're digressing. In a benchmark like cinebench I don't think the scheduler would affect the scores. In other heavily threaded apps that may be the case though
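The placement trade-off discussed in this thread can be sketched in a few lines. The FX-8120 has four modules of two cores each, and the two cores in a module share resources such as the front end, FPU and L2 cache. The sketch below assumes a hypothetical core numbering in which cores 2m and 2m+1 belong to module m, and compares a "spread" policy (one thread per module first) with a "pack" policy (fill each module before the next) by counting how many modules end up running two threads. It only illustrates the idea; it is not how the Windows scheduler actually works.

```python
# Sketch: "spread" vs. "pack" thread placement on a 4-module Bulldozer CPU.
# Assumption (hypothetical numbering): cores 2m and 2m+1 share module m.

MODULES = 4           # FX-8120: four modules
CORES_PER_MODULE = 2  # two integer cores per module, sharing front end, FPU and L2


def module_of(core: int) -> int:
    return core // CORES_PER_MODULE


def spread(n_threads: int) -> list[int]:
    """Give each module one thread first, then use the second core of each."""
    order = [m * CORES_PER_MODULE + c
             for c in range(CORES_PER_MODULE) for m in range(MODULES)]
    return order[:n_threads]


def pack(n_threads: int) -> list[int]:
    """Fill both cores of a module before moving on to the next module."""
    return list(range(MODULES * CORES_PER_MODULE))[:n_threads]


def shared_modules(cores: list[int]) -> int:
    """Count modules running two threads, where the sharing penalty applies."""
    busy = [0] * MODULES
    for core in cores:
        busy[module_of(core)] += 1
    return sum(1 for count in busy if count == CORES_PER_MODULE)


for n in range(1, MODULES * CORES_PER_MODULE + 1):
    print(f"{n} threads: spread shares {shared_modules(spread(n))} modules, "
          f"pack shares {shared_modules(pack(n))}")
```

With 1 thread neither policy shares a module, and with 8 threads both use every core, so placement only matters in between, which is the point made above about a fully loaded benchmark like Cinebench (requires Python 3.9+ for the `list[int]` annotations).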

Dan_D · Dec 13, 2011

Mr Spocko said:
Still doesn't explain why AMD designed the CPU knowing it was not optimal. I just think it's all smoke and mirrors; do they really expect a huge increase in performance in Win 8? I doubt it will happen in reality.

BTW let's all remind ourselves that you can't actually buy Win 8 yet either.

Bottom line is very clear: a 32nm revision of the older Ph/Ath II chips would have done better, and by quite a lot.

You have to understand that designing a CPU takes a long time. Years in fact. All you can do is try to predict where technology and your competitors are going the best you can and make your products accordingly. Also, given that no one has done some of the things in Bulldozer before, it was probably unclear how they'd actually work in practice. I'm sure the theory behind every part of the CPU is fine, but the reality is quite different. I'd bet that by the time they realized what was wrong in the earlier working examples of the chip, it was probably too late to radically redesign the CPU. After that they probably optimized, tried for better yields and tweaked small things to try and mitigate the processor's weaknesses.

Unlike Intel, scrapping an entire architecture could be catastrophic for them. (If you'll recall, the Tejas architecture was completely abandoned.) If they started a new CPU, then all the money they spent up to that point on Bulldozer would have been wasted. It would also "reset" the clock on the development cycle. With their lineup already aging and Intel leaping further and further ahead, something had to be done.

So they salvaged what they could, made it work as well as they possibly could, then released it with a market position which would allow them to recoup most of the development costs of the CPU, if not actually turn a profit after some period of time. All they can hope to do is achieve better results next time. If they are lucky they can close the gap a little as they did with Phenom II. It won't make it the CPU everyone hoped for, but it will still allow them to make some cash.

pelo · Dec 13, 2011

Dan, though you're right, technically they already did die-shrink the Phenom II's: it's their best selling product, the Llano on 32nm. They share the same Stars core design, minus the L3 cache. Obviously the Llano is an APU so there's the graphical portion as well.

He is right, though. AMD plans on abandoning Stars cores in early 2012 with the Piledriver core Trinity APU on socket FM2. They've already (and had to, really) embrace their new Bulldozer/Piledriver architecture. We can only hope they've fixed the cache issues and tweaked whatever they could manage to get that IPC up

Dan_D · Dec 13, 2011

pelo said:
Dan, though you're right, technically they already did die-shrink the Phenom II's: it's their best selling product, the Llano on 32nm. They share the same Stars core design, minus the L3 cache. Obviously the Llano is an APU so there's the graphical portion as well.

He is right, though. AMD plans on abandoning Stars cores in early 2012 with the Piledriver core Trinity APU on socket FM2. They've already (and had to, really) embrace their new Bulldozer/Piledriver architecture. We can only hope they've fixed the cache issues and tweaked whatever they could manage to get that IPC up

Well considering they'll have to drop clocks in the mobile market they'll have to increase IPC. I'm betting on the mobile version getting a lot of revisions before launch. At least, we can only hope. But yes, I was aware of Llano being what it is. However it's not a product that will carry them very far into the future.

pelo · Dec 13, 2011

Dan_D said:
Well considering they'll have to drop clocks in the mobile market they'll have to increase IPC. I'm betting on the mobile version getting a lot of revisions before launch. At least, we can only hope. But yes, I was aware of Llano being what it is. However it's not a product that will carry them very far into the future.

Actually, I think the Llano and Fusion idea is their future. Look at how they approached the FPU in the BD design and you'll see they're likely to bring the GPU into the picture as well. In fact, they already have with the Llano, but there hasn't been a lot of software that takes advantage of it. Just look at CUDA =P Though CUDA is arguably far more successful than OpenCL, it still shows the direction. Llano is currently their only product that's impressive for the average consumer, and if they hadn't had GloFo yield issues you'd have seen new MacBooks with Llano APUs instead of Sandy/Ivy Intels.

And yeah, this design doesn't seem very fitting for the mobile market. High clocks, a long pipeline and too many transistors aren't a recipe for low-wattage mobile parts. I really do hope they've got the issues with L1 and L2 cache ironed out with Piledriver. High clocks on a gate-first approach were a disaster from the beginning.

SonDa5 · Dec 13, 2011

The memory speeds and timings look great in the tests. The memory benchmark performance seems a little low for those speeds and timings, though. Looks like memory performance is bottlenecked.

fps4ever · Dec 13, 2011

With some IPC tweaks it may have been a perfect architecture for the 22nm process, but as it stands it is just a failure overall using today's processes.

Dan_D · Dec 13, 2011

pelo said:
Actually, I think the Llano and Fusion idea is their future. Look at how they approached the FPU in the BD design and you'll see they're likely to bring the GPU into the picture as well. In fact, they already have with the Llano, but there hasn't been a lot of software that takes advantage of it. Just look at CUDA =P Though CUDA is arguably far more successful than OpenCL, it still shows the direction. Llano is currently their only product that's impressive for the average consumer, and if they hadn't had GloFo yield issues you'd have seen new MacBooks with Llano APUs instead of Sandy/Ivy Intels.

And yeah, this design doesn't seem very fitting for the mobile market. High clocks, a long pipeline and too many transistors aren't a recipe for low-wattage mobile parts. I really do hope they've got the issues with L1 and L2 cache ironed out with Piledriver. High clocks on a gate-first approach were a disaster from the beginning.

I'm talking about mobile Bulldozer, not Llano and Fusion. I know the mobile market is where the money is right now. I know that Llano has treated AMD very well. They just can't continue to work with the current architecture over the long haul. I'd wager it's pretty well tapped out as far as IPC is concerned. Bulldozer seems like a bad move in the mobile market, but we'll see.

pelo · Dec 13, 2011

Well, the new Trinity chips will feature Piledriver cores, so we'll get a chance to see whether they improved IPC, and by how much, even before the AM3+ Vishera FX models are released. The Trinity chips will lack L3 and will likely be popular for mobile parts, but that means they need to improve perf-per-watt and cut power draw by a very large amount. But yes, their current approach doesn't seem justified in either the enthusiast or the mobile space. They'll have to drastically decrease L2 latency and increase L1, and that should theoretically net them a decent IPC gain, but AMD isn't known for fast cache. The Bulldozer design was aimed at high clocks with a longer pipeline, and if Trinity shares that approach (and judging by their responses to our questions, that seems to be the case), then I wouldn't hold my breath.

Let's just hope they've got an ace up their sleeve that nobody knows about.

cageymaru · Dec 13, 2011

A company that can't even get their BIOS running properly doesn't give me much hope for their future. If they can't get the little things right, how can they get the big things right? Heading into January and still no fix for the Asus Sabertooth 990FX or MSI motherboards.

Garbage in and garbage out.

zxfreese · Dec 13, 2011

Shame, I was really hoping to upgrade some computer parts this Christmas/winter. With this being a disappointment, and my 6850s still killing it... this might be the first Christmas I don't strong-arm someone into buying me some sort of computer component!

Devilpup · Dec 13, 2011

I think this may be a better year for buffing auxiliary parts (monitor, speakers, desk, chair) rather than core components (CPU, GPU, HDDs).

The argument about scheduling issues is somewhat interesting. Is there any way to hack the BIOS to force it to recognize only 4 primary cores? I realize that would defeat some of the theory behind the CPU, but if it could be done, it might give us an idea of what performance would look like if the workload were distributed better.

pelo · Dec 13, 2011

Devilpup said:
The argument about scheduling issues is somewhat interesting. Is there any way to hack the BIOS to force it to recognize only 4 primary cores? I realize that would defeat some of the theory behind the CPU, but if it could be done, it might give us an idea of what performance would look like if the workload were distributed better.

Yeah, techreport said much the same. The problem is that turbo works better when threads are packed within a module rather than spread across modules, so spreading them costs you clock speed. The performance gains are generally better across modules, though, because you don't pay that -20% sharing tax.

"Trouble is, right now, Intel has much better OS and application support for Hyper-Threading than AMD does for Bulldozer. In fact, we're a little surprised AMD hasn't attempted to piggyback on Intel's Hyper-Threading infrastructure by making Bulldozer processors present themselves to the OS as four physical cores with eight logical threads. One would think that might be a nice BIOS menu option, at least. (Hmm. Mobo makers, are you listening?)"

http://techreport.com/articles.x/21865/2

The problem with presenting as 4 cores/8 threads is that you're no longer an 8-core processor. Granted, it's really a 4-core with extra ALUs and not an entirely separate 8-core structure, but whatever pays the bills.
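To put rough numbers on that sharing tax, here's a toy Python model. It's only a sketch: it assumes the flat ~20% penalty mentioned above applies whenever both cores of a module are busy, and it ignores turbo entirely (which, as noted, works the other way and favours packing).

```python
# Toy model (assumption): a thread sharing its module with another busy thread
# runs at 80% speed, per the "-20% sharing tax" quoted above. Turbo is ignored.
SHARING_TAX = 0.20

def relative_throughput(threads_per_module):
    """Sum of per-thread speeds, where 1.0 = one thread alone on a module."""
    total = 0.0
    for n in threads_per_module:
        if n == 1:
            total += 1.0                        # module to itself, no tax
        elif n == 2:
            total += 2 * (1.0 - SHARING_TAX)    # both cores busy, both taxed
        elif n != 0:
            raise ValueError("a module runs at most two threads")
    return total

# Two threads packed into one module vs. spread across two modules
packed = relative_throughput([2, 0, 0, 0])
spread = relative_throughput([1, 1, 0, 0])
print(f"packed: {packed:.1f}, spread: {spread:.1f}")  # packed: 1.6, spread: 2.0
```

Under these assumptions spreading wins by about 25% at equal clocks, so packing only pays off if the extra turbo headroom is bigger than that.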

Dan_D · Dec 13, 2011

pelo said:
Yeah, techreport said much the same. The problem is that turbo works better when threads are packed within a module rather than spread across modules, so spreading them costs you clock speed. The performance gains are generally better across modules, though, because you don't pay that -20% sharing tax.

"Trouble is, right now, Intel has much better OS and application support for Hyper-Threading than AMD does for Bulldozer. In fact, we're a little surprised AMD hasn't attempted to piggyback on Intel's Hyper-Threading infrastructure by making Bulldozer processors present themselves to the OS as four physical cores with eight logical threads. One would think that might be a nice BIOS menu option, at least. (Hmm. Mobo makers, are you listening?)"

http://techreport.com/articles.x/21865/2

The problem with presenting as 4 cores/8 threads is that you're no longer an 8-core processor. Granted, it's really a 4-core with extra ALUs and not an entirely separate 8-core structure, but whatever pays the bills.

It may not be up to the motherboard manufacturers. The OS would recognize two quad-core Xeons with Hyper-Threading as two quad cores, 8 physical cores in total, if you disabled HT on them. That setup would probably yield better performance than a single quad core with Hyper-Threading, assuming all other factors were equal. (At least with Nehalem and Sandy Bridge CPUs, which don't have an FSB architecture.) People are too quick to blame Microsoft for the Windows 7 scheduler sucking, but a more likely scenario is that with AMD's help, Microsoft has managed to make Windows 8 suck less on Bulldozer-based systems than Windows 7 did.

pelo · Dec 13, 2011

Yeah, it's not as easy as just having the scheduler recognize the cores in sequential order. A lot of thought has to go into a Win7 (or Win8) scheduler for Bulldozer. For AMD and Microsoft to release a BD-optimized scheduler, they have to take quite a lot into account.

Off the top of my head:

Both the current and future clock speed gains, and how turbo will impact the scheduler
Splitting out like threads within a module to optimize performance
Deciding where to send a lower number of threads
Accounting for the shared FPU (one 256-bit unit or two separate 128-bit units), plus the shared L2 and L3, in all of the above

It's gonna be difficult, and I'm not surprised there isn't one yet. I'd imagine a sequential-order scheduler would have been relatively easy to come up with, as would a "module first until over 4" scheduler, but those have obvious downsides. Making one that accounts for those weaknesses and actually helps the BD (and CMT) architecture won't be easy.
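For illustration, here's a hypothetical Python sketch of the simplest policy that avoids the sharing tax: give each module one thread before doubling up on any module's second core. The function name and the core numbering (module m owning cores 2m and 2m+1) are assumptions about how the OS enumerates an FX-8120, not anything AMD or Microsoft has published, and it ignores the turbo and FPU considerations listed above.

```python
def pick_cores(num_threads, num_modules=4, cores_per_module=2):
    """Return core IDs: one per module first, then the sibling cores.

    Assumes (hypothetically) that module m owns cores 2m and 2m+1.
    """
    max_threads = num_modules * cores_per_module
    if not 0 <= num_threads <= max_threads:
        raise ValueError(f"expected 0..{max_threads} threads")
    order = []
    for slot in range(cores_per_module):      # slot 0 = first core of each module
        for module in range(num_modules):
            order.append(module * cores_per_module + slot)
    return order[:num_threads]

print(pick_cores(3))  # [0, 2, 4]
print(pick_cores(6))  # [0, 2, 4, 6, 1, 3]
```

A real scheduler would also have to weigh the turbo penalty of waking extra modules and whether two threads share data (where packing them into one module's L2 could help), which is exactly why this naive version isn't enough.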

EDIT: You can set thread (well, core) affinity yourself anyway, but that requires knowing how many threads the program uses, and that isn't as easy as it seems. Then there's the hassle of constantly adjusting it depending on how many threads your CPU is being asked to handle.

Can you turn off specific cores in BD, or are you limited to turning off whole modules?
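Affinity tools take a bitmask where bit n means "core n allowed". Here's a small Python sketch that builds the mask for one core per module, again assuming module m maps to cores 2m and 2m+1. The pinning part uses `os.sched_setaffinity`, which only exists on Linux; on Windows 7 the same hex mask can be entered via Task Manager's "Set affinity" or `start /affinity`.

```python
import os

def affinity_mask(cores):
    """Build an affinity bitmask: bit n set means core n is allowed."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

# One core per module on an 8-core FX (assumed numbering: module m = cores 2m, 2m+1)
one_per_module = [0, 2, 4, 6]
print(hex(affinity_mask(one_per_module)))  # 0x55

# Linux only: pin the current process to those cores, if they exist on this machine
if hasattr(os, "sched_setaffinity"):
    wanted = set(one_per_module) & os.sched_getaffinity(0)
    if wanted:
        os.sched_setaffinity(0, wanted)
```

As noted above, this only makes sense if you know the program won't use more threads than the cores you leave it.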

AMDs newest Bulldozer architecture  FX-8120 8Cores performance and OC 5G

Limp Gawd

Limp Gawd

Limp Gawd

Limp Gawd

Limp Gawd

Limp Gawd

2[H]4U

Fully [H]

2[H]4U

[H]ard DCOTM x4 & [H]DCOTY x1

[H]F Junkie

[H]ard DCOTM x4 & [H]DCOTY x1

Limp Gawd

Gawd

2[H]4U

Supreme [H]ardness

Fully [H]

[H]ard|Gawd

Gawd

Fully [H]

Gawd

Extremely [H]

Supreme [H]ardness

Limp Gawd

Supreme [H]ardness

2[H]4U

Extremely [H]

2[H]4U

Extremely [H]

2[H]4U

Supreme [H]ardness

[H]ard|Gawd

Extremely [H]

2[H]4U

Fully [H]

n00b

[H]ard|Gawd

2[H]4U

Extremely [H]

2[H]4U

AMDs newest Bulldozer architecture FX-8120 8Cores performance and OC 5G