So what went wrong with BD?

How can it be less expensive to manufacture when it is 50% larger on the same process size?

Thuban is 346mm² and Bulldozer is 315mm². The wafer size is still the same, so you can fit more Bulldozers per wafer than Thubans. So it's cheaper.


Of course, dropping Thuban to 32nm would most likely have gotten you a good 30% reduction in die size.
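
For a rough sense of the numbers behind that claim, here's a quick dies-per-wafer sketch using the standard first-order estimate and assuming a 300mm wafer; it ignores yield and defect density, which matter at least as much as raw die count.

Code:
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    # First-order estimate: wafer area / die area, minus a correction
    # for the partial dies lost around the wafer edge.
    d = wafer_diameter_mm
    return int(math.pi * d**2 / (4 * die_area_mm2)
               - math.pi * d / math.sqrt(2 * die_area_mm2))

print(dies_per_wafer(346))  # Thuban, 45nm:    ~168 candidate dies
print(dies_per_wafer(315))  # Bulldozer, 32nm: ~186 candidate dies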
 
Robert Palmer, current member of AMD's Board of Directors and former Digital CEO purportedly remarked that

"Designing microprocessors is like playing Russian roulette. You put a gun to your head, pull the trigger, and find out four years later if you blew your brains out."
 
Robert Palmer, current member of AMD's Board of Directors and former Digital CEO purportedly remarked that

"Designing microprocessors is like playing Russian roulette. You put a gun to your head, pull the trigger, and find out four years later if you blew your brains out."

Haha, that's a very good saying.

I would be more than happy to support Bulldozer if it didn't pull so much damn power. If they drop their 8-core power usage by 30%, I'll buy an AMD setup.
 
The only thing wrong with FX is that it fails to keep up with its predecessors under certain circumstances; nobody expected that, and it really made it look bad.

The rest is exactly what AMD promised, and if they can cut the prices and come up with the rest of the line (especially the budget SKUs), FX will do exactly what AMD has been doing all these years: offering a good performance/price ratio. It even touches the top Sandy Bridge offerings in multi-threaded workloads; I didn't see that one coming.

That's the way it is. AMD banked heavily on the future; this technology is the foundation of their future roadmap, and proof of that is that it already performs better on Win 8. Not that it's worth much right now: AMD needed a champ, and badly, and so did the rest of the industry. A lack of competition will only make Intel sit on their lazy butts and keep charging a premium.
 
Thuban is 346mm² and Bulldozer is 315mm². The wafer size is still the same, so you can fit more Bulldozers per wafer than Thubans. So it's cheaper.

FSM forgive me.

[attached image: thubanandbulldozer.jpg]
 
One of the main problems is that their flagship CPU is slower than what most enthusiasts (their target audience) already have in their rigs.
Who would pay more money for a downgrade?
 
Honestly, I just think they're aiming too much for the future and not enough for today. Sure, the 8150 will probably beat the 2500k in most things 4 years from now when 6+ threads become the norm. Unfortunately, we'll all be buying new processors by then.

The shared-resource design also seems to rely too much on the OS being able to intelligently distribute threads. I'm sure it'll get improved over time, but it'll never be perfect and will always hurt the module design.
 
Thuban is 346mm² and Bulldozer is 315mm². The wafer size is still the same, so you can fit more Bulldozers per wafer than Thubans. So it's cheaper.


Of course, dropping Thuban to 32nm would most likely have gotten you a good 30% reduction in die size.

Oh, smaller than a Thuban. I mistakenly assumed the comparison was to a Sandy Bridge.
 
Pulling my post from another thread: http://hardforum.com/showthread.php?t=1642939

After seeing all the benchmarks, I'm starting to think: Could the shared resources and split nature of each Bulldozer module be at fault here?

From how I understand it, each module has its own L2 cache but all modules share a larger L3 cache. In each module, workload is sent through one thread or split into two threads depending on load. Now, the benchmarks show that Bulldozer is near a 2500K or around a 1090T in multithreaded applications but performs miserably in single threaded applications.

The way I see it is this (purely theoretical, and hopefully someone can provide a better explanation):
When a single module receives a single-threaded workload, only HALF the resources of that module are in use. The core clock probably increases automatically for single-threaded workloads, but still only half the module's resources are in use: half the integer units, half the FP, etc. The other modules are probably disabled automatically since it's only a single thread.

I'm thinking that a smarter approach to thread scheduling would be to automatically distribute a single thread across the entire module. The processor would then utilize the full complement of each module's resources, with the module as a whole working on the same single thread: twice the resources in use versus half a module.

At the very least, a single module wouldn't be wasting resources on a single-threaded application. Therefore, I'm thinking that AMD's Bulldozer isn't utilizing a full module for a single thread. The majority of the applications consumers use day-to-day are single- or dual-threaded; many games are multi-threaded, or starting to be.

A multi-threaded application of, let's say, two threads should use two modules instead of a single one. That way each thread gets to use the full resources of a module.

Four threads, four modules. Eight threads, eight modules, though that would make it look like a sixteen-core processor. I think the idea of using half the resources of a module per thread is more of a detriment than a benefit for Bulldozer, especially since each "core" of a module is literally half the resources of the entire module, with fewer execution and computational components to work with on a single thread.
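
Purely as an illustration of the placement policy I'm describing (fill empty modules before pairing threads on one), here's a toy sketch; the 4-module, 2-core-per-module layout and the numbering are just assumptions for the example, not how Windows or the hardware actually expose it.

Code:
# Toy model of the "one thread per module first" policy described above.
# Assumes 4 modules with 2 cores each; purely illustrative.
MODULES = 4
CORES_PER_MODULE = 2

def place_threads(num_threads):
    # Give each thread an empty module first, so it sees the module's full
    # shared resources; only start pairing threads once every module is busy.
    if num_threads > MODULES * CORES_PER_MODULE:
        raise ValueError("more threads than cores in this toy model")
    return [(t, t % MODULES, t // MODULES) for t in range(num_threads)]

for thread, module, core in place_threads(6):
    print(f"thread {thread} -> module {module}, core {core}")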

Comparing Nehalem's core to Bulldozer's "core," it really does seem like half the resources are used in a module, hence the poor single-threaded performance.

Bulldozer module: http://www.bit-tech.net/hardware/cpus/2011/10/12/amd-fx-8150-review/2
Nehalem core: http://www.anandtech.com/show/2594/3
Sandy Bridge core: http://www.tomshardware.com/reviews/sandy-bridge-core-i7-2600k-core-i5-2500k,2833-2.html

Now, I'm guessing AMD figured out somewhere during the design stage that Bulldozer would be a power hog, and thought the way to save power was to disable the other modules when not in use, or reduce their power when underutilized. I think that backfired, because the processor still uses quite a bit of power.

And would an improved Thuban architecture, in a more traditional sense, have been a better option? Something with higher per-thread/per-core performance and increased IPC, versus increasing clock speed to work on a single thread while half the module sits underutilized.

I get the feeling AMD should have stuck with the "If it ain't broke, why fix it?" mantra. I also get the feeling AMD wanted to try something new and implement something quasi-Hyper-Threading (without getting sued by Intel for a one-to-one copy of it).

But is doing something new and different the best idea? Then again, we can look at the core architecture from Nehalem to Sandy Bridge (and SB-E) and then to Ivy Bridge: Intel tries something new yet is more successful at implementing changes to its core architecture.

Is this because AMD hired the wrong engineers?
Is this a result of AMD having eight times less operating income than Intel? ($6.4 billion versus $800 million.)
Is this because Intel has hindered AMD's place in the market by making backdoor deals with computer manufacturers that reduced their revenue and income?

What could be the source of the issue(s) here for AMD?

Addition:

Another possibility is that AMD put too much forward-thinking into this processor design. As someone mentioned earlier in this thread, this is great for servers that run a lot of multi-threaded applications and for a future when programs are more multi-threaded than they are now. However, with the majority of applications still single-threaded, this processor design is probably too radical for most desktop users.

Add to that the fact that the design of the module is different from a traditional core, if one compares it to Thuban, Nehalem, or Sandy Bridge. From everything I've read, Intel seems to have only introduced changes that increase performance regardless of which application is used. The way I see it, each core in an Intel CPU is fully utilized, not halved like in a Bulldozer module.
 
I am sure AMD saw that option too... but they knew they could only compete with Intel this way: what AMD did with BD is create a new tech so radical and so new that NOBODY will dare to compete with it. Why?

In a few years there will be 20-core desktop CPUs with 4 cores sharing a BD module.. that makes things so cheap for AMD; all the software guys need to do is write apps for it.. how is Intel going to compete with 20 cores without charging $10k for one? I can't see how they could do that with OLD CPU tech..

'Cause Intel is just goofing along holding tech back, in my opinion.

http://nextbigfuture.com/2011/06/intel-will-introduce-50-core-processor.html
http://www.engadget.com/2010/12/28/researchers-create-ultra-fast-1-000-core-processor-intel-also/

I remember reading a couple years ago that Intel planned to have 60 core CPUs in 2011. Of course I can't find the article now - but it's apparent there's no real need for them to push hard through the tech if they have no competition and are enjoying the premier 1st place position as is. They are doling out old tech compared to what they are R&Ding. I don't think for one second AMD has any upper hand on Intel except in the joint CPU/GPU chips --- which Intel tried and failed at, and seems somewhat content with making onboard chips at this time. If AMD keeps moving in that direction and Intel keeps ignoring it, AMD will enjoy a brighter future. A single-chip, robust solution for video and CPU processing is a holy grail of sorts in this industry...
 
Fairly disappointing result.

Until AMD can tape out a couple of chip revisions on the current BD arch (combined with GF's 32nm node maturing), AMD needs to push Microsoft to release a patch for Win7 to take advantage of thread scheduling.
I don't know how big a deal that would be, but waiting for Win8 to get a bump in performance is ridiculous.
 
Robert Palmer, current member of AMD's Board of Directors and former Digital CEO purportedly remarked that

"Designing microprocessors is like playing Russian roulette. You put a gun to your head, pull the trigger, and find out four years later if you blew your brains out."

Sounds to me like they need to get better engineers as well as STOP USING STUPID PROGRAMS TO DESIGN THEIR CPUS.
 
I am surprised that nobody has brought up the lowering of the L1 cache as a possible factor.
 
I remember reading a couple years ago that Intel planned to have 60 core CPUs in 2011. Of course I can't find the article now - but it's apparent there's no real need for them to push hard through the tech if they have no competition and are enjoying the premier 1st place position as is.

Maybe they just realized there is hardly any real reason for 4 cores for most workloads, so having 60 would be pointless?
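
For what it's worth, Amdahl's law is the usual way to put a number on that; a quick sketch, where the parallel fractions are purely illustrative, not measurements of any particular workload:

Code:
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
# the workload that parallelizes and n is the core count. The fractions
# below are illustrative, not measurements of any real application.
def amdahl(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.50, 0.80, 0.95):
    print(f"p={p:.2f}: 4 cores -> {amdahl(p, 4):.2f}x, 60 cores -> {amdahl(p, 60):.2f}x")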
 
Sounds to me like they need to get better engineers as well as STOP USING STUPID PROGRAMS TO DESIGN THEIR CPUS.
lol, I remember that story from one of the A64 engineers :p
I am surprised that nobody has brought up the lowering of the L1 cache as a possible factor.

Maybe they ran into die size/transistor count cost issues? Like they did with Agena/Phenom I?
 
Well, [H] and AnandTech showed less-than-favorable reviews... Guru3D showed it in a bit better light... then Legit Reviews ended theirs with this:

Legit Bottom Line: The AMD FX-8150 offers solid performance and is competitive with the Intel 'Sandy Bridge' series of processors.

I still think the 8-core parts are overpriced, but the 6100 has me intrigued. The only stat that has been consistent among all four reviews is that single-core performance is dismal and the power consumption is downright scary.

I think BD has some potential in the mainstream market, but an enthusiast part it is not.

 
I am surprised that nobody has brought up the lowering of the L1 cache as a possible factor.
It was brought up and discussed to death ~6 months ago when someone noticed that the AMD K10 software guide was basically a plea that all programs henceforth include a decent TLB logic emulator since the CPU didn't have one. There is no doubt that this processor is exceptionally prone to TLB stalls from cache incoherence unless extraordinary care is taken. This is the result of not only using a very small L1 cache but also the decisions to eliminate physical cache addressing/hinting in favor of virtual addressing only, placing all load/store logic in one module at the very front of a very long pipeline, providing for very fast intra-module cache flushes but not between modules, and choosing not to include any memory disambiguation logic at the L2/L3 levels.

It almost seems like there were different teams and each team assumed it was someone else's problem.
 
In today's software state? Yes.. but the future will be more simple cores, each working on highly parallelized apps, just like relentless ants..

You mean like GPGPU? If highly parallelized, simple tasks become more popular, who would take 8 CPU cores over... say, 512 GPU cores? :p
 
AMD modules are the way of the future; Intel fanboys can't see this. Heck, I believe in the future we will have 3 cores sharing a module. Why? The future is highly parallel code, apps, and games..

But TROLLS... will be TROLLS... so I can't help you guys see the future. I, for one, am waiting on Interlagos...

But we are living now, and if AMD does not make money they will not have money to invest in the future. Maybe this product really is forward-thinking, but so what? There is such a thing as putting a product out too early.
 
It was brought up and discussed to death ~6 months ago when someone noticed that the AMD K10 software guide was basically a plea that all programs henceforth include a decent TLB logic emulator since the CPU didn't have one. There is no doubt that this processor is exceptionally prone to TLB stalls from cache incoherence unless extraordinary care is taken. This is the result of not only using a very small L1 cache but also the decisions to eliminate physical cache addressing/hinting in favor of virtual addressing only, placing all load/store logic in one module at the very front of a very long pipeline, providing for very fast intra-module cache flushes but not between modules, and choosing not to include any memory disambiguation logic at the L2/L3 levels.

It almost seems like there were different teams and each team assumed it was someone else's problem.

That is news to me... if that is the case, then they had better fix it in the next revision. Cache issues play havoc with performance. If the team(s) working on BD overlooked this, then the whole lot of them need to be canned.

I was thinking about getting an FX setup to play with.. but my current rig will tide me over just fine for quite a while.
 
The story, unfolded:
- the uarch was simplified: fewer INT execution units, a shared FPU, longer instruction latencies, increased cache latencies; losing ~5% IPC for 20% more frequency (this was the plan)
- the implementation suffered because it was done with synthesis tools (blocks ~20% larger and ~20% slower than hand-crafted)
- the process is borked: variability is high, yields are abysmal, and power is out of control (frequency is lower than expected and parts are more power hungry)

These things were known for a year. Only someone who did not want to see the truth could be surprised at the performance now.

They expected a slight loss in IPC for existing code, compensated by the increased frequency. The frequency gains were reduced in the implementation phase due to heavy use of automated placing and routing tools, the end result being a part 20% larger and 20% slower than it could have been had it been carefully optimized. On top of that, the process failed to live up to expectations. It's a whole chain of events that killed BD.

Had everything worked right, taking a base Thuban at 3.3GHz, a 32nm BD should have lost 5% on IPC, gained 20% frequency from the uarch (4GHz), and another 20% from the process (45nm -> 32nm). We would have had a 4.5-4.8GHz BD with slightly less IPC than a Thuban, in other words a part performing 30-100% better than Thuban (the upper end in AVX/FMA cases). What we ended up with is a part that is within +/-10% of a Thuban while being hotter.
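
Putting that plan into a quick back-of-the-envelope calculation (every factor below is an assumption from the plan above, not a measurement):

Code:
# Back-of-the-envelope version of the plan described above.
# All factors are assumptions from the post, not measured data.
thuban_clock  = 3.3    # GHz, baseline Thuban
ipc_factor    = 0.95   # planned ~5% IPC loss from the simplified uarch
uarch_clock   = 1.20   # ~20% more frequency from the longer pipeline
process_clock = 1.20   # ~20% more frequency expected from 45nm -> 32nm

planned_clock = thuban_clock * uarch_clock * process_clock
planned_perf  = ipc_factor * uarch_clock * process_clock  # per thread, vs Thuban

print(f"planned clock: {planned_clock:.2f} GHz")            # ~4.75 GHz
print(f"planned perf:  {planned_perf:.2f}x Thuban/thread")  # ~1.37x, before AVX/FMA gains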
 
I see nothing wrong with BD.. in fact I believe it's about the best piece of tech to come along. AMD has a champ here, but trolls can't grasp why it's NOT killing SB.. just give this tech time and you will see how it performs... more cores is the future...

There have been a few of these comedians in the forum lately.

As to whats wrong with BD.

The Short List:
It wasn't designed by Intel

The Long List:
Is it the architecture?
the shared cache?
the longer pipeline?
the manufacturing process?
the shared FP?
all of the above?

What happens when you take an architecture that's already behind the curve and degrade it to achieve higher clock speeds? You get BD. Crap performance and higher power consumption. This is not unlike Intel's P4 days when they went from Northwood to Prescott.

To the AMD diehards in here, I'm curious, how bad does BD need to perform before you'd actually admit it's a shit CPU? Does Phenom II need to outperform it in every test instead of just half like it does now?
 
You kinda get the feeling Bulldozer is actually a rough prototype.
 
But we are living now, and if AMD does not make money they will not have money to invest in the future. Maybe this product really is forward-thinking, but so what? There is such a thing as putting a product out too early.

This. And there's no law forbidding anyone who has Intel from buying AMD CPUs, so we can always jump onto BD when software starts really using it.
And by then we will be running something like a 22nm Piledriver OC'd to 6GHz with all the improvements to the architecture, while people who bought an FX-8150 today will be stuck at <5GHz and something like twice the power consumption :D
 
Is it the architecture?
the shared cache?
the longer pipeline?
the manufacturing process?
the shared FP?
all of the above?
Size inefficiency versus SB. Inability to hit the projected 30% higher clockspeeds than Thuban despite the longer pipeline. Sucky GloFo 32nm process.

I wonder how much they can rework it for the consumer market in Piledriver. Most likely cutting off most of the L3 cache is in the cards; for Fusion, for sure.
 
Right now people are looking at the performance characteristics when you disable one core per module in order to increase single-threaded performance. So far it looks very promising. We just need to see some game benchmarks when doing this.

http://www.xtremesystems.org/forums/showthread.php?275873-AMD-FX-quot-Bulldozer-quot-Review-%284%29-!exclusive!-Excuse-for-1-Threaded-Perf.

There might be a pretty big silver lining to having all of those modules if you disable one core per module and then overclock the snot out of it. If that performance is to be believed, we should see lower power consumption and greater performance across the board. Hmm, my HTPC is an AM3+ socket; I might get one just to test this.
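
If anyone wants to try the one-core-per-module experiment in software instead of the BIOS, here's a minimal sketch using psutil; it assumes logical CPUs 0/1, 2/3, 4/5 and 6/7 are the paired cores of each module, which may not hold on every board/BIOS, so verify the mapping first.

Code:
# Minimal sketch: restrict a process to one core per module instead of
# disabling cores in the BIOS. Assumes logical CPUs 0/1, 2/3, 4/5, 6/7 are
# the paired cores of each Bulldozer module -- verify on your own board.
import sys
import psutil

ONE_CORE_PER_MODULE = [0, 2, 4, 6]

def pin_one_core_per_module(pid):
    proc = psutil.Process(pid)
    proc.cpu_affinity(ONE_CORE_PER_MODULE)  # set the affinity mask
    return proc.cpu_affinity()

if __name__ == "__main__":
    target = int(sys.argv[1]) if len(sys.argv) > 1 else psutil.Process().pid
    print("affinity now:", pin_one_core_per_module(target))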
 
There might be a pretty big silver lining to having all of those modules if you disable one core per module and then overclock the snot out of it. If that performance is to be believed, we should see lower power consumption and greater performance across the board. Hmm, my HTPC is an AM3+ socket; I might get one just to test this.

So the solution to the problems with the 8-core Bulldozer is to make it a 4-core? Seems like a less-than-optimal fix.
 
All I can say right now is that I'm glad I went ahead and pulled the trigger on a PhII 1090T last week. For all the hype and buildup, Bulldozer seems quite anticlimactic.
 
Well, for low-threaded stuff like WoW, it may be better to core-juggle and turn it into a 2/4-core and OC it above 5GHz or something, lol.

What a crazy little autistic monkey AMD has given to us to play with...
 
Here's a review where the use of affinity improved performance considerably in some benchmarks while others remained the same, showing that Windows 7 scheduling isn't optimal for Turdozer. This is an IPC comparison between Turdozer and Deneb at similar clocks.

http://www./forum/hardware-canucks-reviews/47155-amd-bulldozer-fx-8150-processor-review-3.html

Another article states the same thing. According to AMD, Windows 7 has some scheduling issues; I don't know if a driver can fix it.

"AMD also shared with us that Windows 7 isn't really all that optimized for Bulldozer. Given AMD's unique multi-core module architecture, the OS scheduler needs to know when to place threads on a single module (with shared caches) vs. on separate modules with dedicated caches. Windows 7's scheduler isn't aware of Bulldozer's architecture and as a result sort of places threads wherever it sees fit, regardless of optimal placement. Windows 8 is expected to correct this, however given the short lead time on Bulldozer reviews we weren't able to do much experimenting with Windows 8 performance on the platform. There's also the fact that Windows 8 isn't expected out until the end of next year, at which point we'll likely see an upgraded successor to Bulldozer."

http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/11
 
Zarathustra[H] said:
I think he's trolling, but he may be on to something.

AMD screwed the pooch, big time. They tried to introduce a major new architecture AND a new process at the same time. There is a reason why Intel adopted its tick-tock strategy and changes only one of the two in any given year.

By all accounts there is nothing wrong with the BD arch itself. It is intentionally designed with a long pipeline which reduces IPC somewhat, and allows higher clocks.

The problem is the 32nm process. Global Foundries' yields and process maturity are likely preventing BD from reaching the clocks that AMD had planned by launch time, with each clock increase requiring too much voltage. This is a typical symptom of a process yield issue.

We see this even more in the fact that the low-clocked Opteron 6200 series parts have very good power usage and are competitive with Intel's offerings.

Code:
Model		Cores	Frequency	TDP		Pre-order price
Opteron 6204	4	3.3 GHz		115 Watt	$516.13
Opteron 6212	8	2.8 GHz		115 Watt	$303.17
Opteron 6220	8	3.0 GHz		115 Watt	$588.93
Opteron 6234	12	2.3 GHz		115 Watt	$430.00
Opteron 6238	12	2.5 GHz		115 Watt	$516.13
Opteron 6262 HE	16	1.6 GHz		85 Watt		$588.93
Opteron 6272	16	2.1 GHz		115 Watt	$588.93
Opteron 6274	16	2.2 GHz		115 Watt	$720.17
Opteron 6276	16	2.3 GHz		115 Watt	$881.22
Opteron 6282 SE	16	2.6 GHz		140 Watt	$1135.26

It is also notable that FX seems to use a TON of power when overclocked.

This means three things.

1.) AMD is getting poor yields from an immature process.

2.) The best parts (stable at the lowest voltages) are being binned as Opterons.

3.) Process yields mean that any clock increase comes with an exorbitant voltage and power penalty.
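
To put rough numbers on point 3: dynamic power scales roughly with C·V²·f, so a clock bump that also needs a big voltage bump gets expensive fast. The voltage/frequency pairs below are illustrative guesses, not measured FX-8150 figures.

Code:
# Rough dynamic-power arithmetic for point 3: P_dyn ~ C * V^2 * f.
# The voltage/frequency pairs are illustrative, not measured FX-8150 data.
def relative_dynamic_power(f0, v0, f1, v1):
    return (f1 / f0) * (v1 / v0) ** 2

stock = (3.6, 1.20)  # GHz, volts (assumed)
oced  = (4.6, 1.45)  # GHz, volts (assumed)

print(f"~{relative_dynamic_power(*stock, *oced):.2f}x dynamic power")  # ~1.87x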


In Q1 2009, when the Phenom II X4 first launched, the highest that cherry-picked samples would clock on extreme cooling was 4.2GHz. Fast forward less than two years, and by December 2010 it was the norm for people to hit 4.2 with X4s on air.

Seeing that current cherry-picked samples are hitting 8.5GHz with extreme cooling tells us something about where the arch may be by 2013. Add to this that a ~10% IPC gain ought to come each year from Piledriver through Excavator.

Suddenly it makes so much sense why BD has been delayed so many times, and why we heard rumors this summer that AMD engineers were unhappy with the clocks they were getting on test samples.

AMD was hoping that Global Foundries' process would mature more quickly than it did. We don't know how bad it was back in June, or how far it has come since then, but it is clear today that it didn't mature fast enough.

Hopefully if this speeds up though, we'll see a quick ramp-up in CPU speeds to the point where BD is a little more competitive.

Man, that post needed to be quoted in all its glory.

That's exactly it. Explains why the guy in charge of the relationship with GloFo got shafted in August IIRC.
 
Here's a review where the use of affinity improved performance considerably in some benchmarks while others remained the same, showing that Windows 7 scheduling isn't optimal for Turdozer. This is an IPC comparison between Turdozer and Deneb at similar clocks.

http://www./forum/hardware-canucks-reviews/47155-amd-bulldozer-fx-8150-processor-review-3.html

Another article states the same thing. According to AMD, Windows 7 has some scheduling issues; I don't know if a driver can fix it.

"AMD also shared with us that Windows 7 isn't really all that optimized for Bulldozer. Given AMD's unique multi-core module architecture, the OS scheduler needs to know when to place threads on a single module (with shared caches) vs. on separate modules with dedicated caches. Windows 7's scheduler isn't aware of Bulldozer's architecture and as a result sort of places threads wherever it sees fit, regardless of optimal placement. Windows 8 is expected to correct this, however given the short lead time on Bulldozer reviews we weren't able to do much experimenting with Windows 8 performance on the platform. There's also the fact that Windows 8 isn't expected out until the end of next year, at which point we'll likely see an upgraded successor to Bulldozer."

http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/11

So AMD designed a processor that does not work well on the most popular desktop OS around, the one sold with nearly 90% of computers today. These AMD engineers sure know how to get it done...

That would be like BF3 not being optimized for AMD and nVidia GPUs.
 
So AMD designed a processor that does not work well on the most popular desktop OS around, the one sold with nearly 90% of computers today. These AMD engineers sure know how to get it done...

That would be like BF3 not being optimized for AMD and nVidia GPUs.

Who knows, lol. Maybe AMD didn't know that Windows 7 scheduling was so ate up, but I also wonder which environment they use to test their processors' performance. Linux?
 
So AMD designed a processor that does not work well on the most popular desktop OS around, the one sold with nearly 90% of computers today. These AMD engineers sure know how to get it done...

That would be like BF3 not being optimized for AMD and nVidia GPUs.

It is far worse; it's not like any Macs run AMD CPUs. And since Apple's rise back into the mainstream, Linux has seen a lot less desktop use. Then you have netbooks, which had Linux but gave up on it. What I am trying to say is that, for AMD, Windows is probably 99% of their consumer machines.
 
You can always buy your BD now, run Win8 on it a year from now, and enjoy 3-10% more performance from the better scheduler.
 