New FX-8xxx MSRPs

To me the interesting part is that the 8370E is 3.3 GHz stock, so (as expected) the chart from a few weeks ago was wrong.
 
I am really interested in the 8320E at MSRP or less.

It should be a drop-in replacement for the Athlon II in my ASUS board with unbuffered ECC, which serves as my fileserver (FreeBSD/ZFS).

I have two zpools: one has two vdevs in raidz2, the other a single vdev in raidz3. On top of that, all the filesystems use either lz4 or gzip compression, which absolutely kills the Athlon II 255 that's in there now whenever the box is getting a lot of I/O.
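Just to put rough numbers on how lopsided the CPU cost is, here's a minimal Python sketch of my own (assuming Python 3 and the third-party lz4 package, `pip install lz4`; zlib stands in for ZFS's gzip). It measures userspace compression throughput, not ZFS itself, so only the relative gap matters:

```python
# Rough, relative comparison of lz4 vs. gzip-style compression cost in
# userspace Python -- NOT a ZFS benchmark, just a feel for the CPU gap.
# Assumes the third-party lz4 package (pip install lz4); zlib stands in
# for ZFS's gzip levels.
import time
import zlib

import lz4.frame  # third-party: pip install lz4

data = bytes(range(256)) * 4096  # ~1 MiB of mildly compressible input

def mib_per_sec(compress, payload, runs=50):
    """Time `runs` compressions of `payload` and return MiB/s."""
    start = time.perf_counter()
    for _ in range(runs):
        compress(payload)
    elapsed = time.perf_counter() - start
    return runs * len(payload) / elapsed / 2**20

print(f"lz4  : {mib_per_sec(lz4.frame.compress, data):8.1f} MiB/s")
print(f"gzip6: {mib_per_sec(lambda d: zlib.compress(d, 6), data):8.1f} MiB/s")
```

On a slow dual-core, that gap is exactly why heavy I/O into a gzip dataset pegs the CPU while lz4 barely registers.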
 
Why is AMD so allergic to introducing these multi-core monsters on FM2? It makes no sense to restrict them to such an ancient and power-inefficient platform (seriously, anything on 990FX draws about 20 W more just at idle) and prop it up just for the sake of product segmentation.

I'm sure that people buying 6- or 8-core behemoths won't be all that disappointed that they don't come with integrated graphics, and AMD has already broken that precedent with the 750K.

I'm sure that OEMs can make FM2 boards that handle high-current overclocking well beyond the 100 W spec (just like the original unofficial support for 220 W on AM3+), and it would help a whole lot if the platform they sold were up to date. The reasons I recommend Intel over AMD exclusively are:

1. There's no good value in the AMD world for under $80, because their dual-cores are overpriced and underperforming.
2. If you buy an FM2 quad, there's no upgrade path to more cores. Offering 95 W six- or eight-core options on FM2 would really fix this problem and make the platform easier to recommend.
 
I think you answered yourself. :) AM3+ is old tech, so why bring it to FM2+? Since Intel has been stagnant, AM3+ is still relevant. Even after Intel released those godly 6- and 8-cores (damn, I want 16 threads!), they are tied to ridiculously priced motherboards and overpriced DDR4. As for why they don't refresh AM3+: I suppose they could, but wouldn't it be better to incorporate those ideas into the next platform, to give a clean slate to build from?

With all of that said, I want an affordable replacement for my FX-9370 so bad. I feel your pain. :)
 
FM2+ is not intended for running more CPU cores. It made the choice easier for consumers by offering a solution with a few cores and good/decent graphics.
 
FM2+ is not intended for running more CPU cores. It made the choice easier for consumers by offering a solution with a few cores and good/decent graphics.

So when did AMD say that, exactly? And the choice isn't that easy either; Intel can do good-enough graphics as well. Come back when the APU can do 1080p at high settings and it's a true quad-core, not two modules passed off as four cores.
 
AMD doesn't bring more cores to the FMx platform because it doesn't feel like running a separate production line for a niche set of processors. Those FM2(+) APUs and CPUs all use the same die found in the mobile notebook products, which saves money on both R&D and production. Most reject chips simply get re-badged as top A-series SKUs or what have you for the desktop.

This is the same thing that Intel does with their mainstream platform, except that for them, money is no object. That's why, outside of LGA 2011 (i.e. re-branded Xeon rejects), everything is nothing but APU-style processors with four cores max.

Both companies feel that four threads are more than enough for mainstream mobile products, and, well, they're right. No casual or even power users on laptops should really have any need for more threads, and if they do, the work they plan on doing is better served by a desktop anyway, in which case they can get more cores provided they invest in the proper platform. Moreover, there would be no point in adding moar coarz to mobile when all that would do is force them to drop the clock speeds even further for very little gain.

So when did AMD say that, exactly? And the choice isn't that easy either; Intel can do good-enough graphics as well. Come back when the APU can do 1080p at high settings and it's a true quad-core, not two modules passed off as four cores.

Two modules already are a "true" quad-core. Whether or not people get oddly confused by the 15h architecture doesn't matter; it's not anything close to SMT.
 
I feel sorry for people who think that 1 module equals 2 "true" cores. If I thought like that, I would have given up on AMD a while ago.

I mean seriously, I would rather think of an 8350 as a 4c8t part (like how Windows thinks of it). That way, AMD's 'quad-core' parts actually perform kinda well compared to equally priced Intel quads.
 
And I feel sorry for people who, for some unknown reason, fail to grasp why they easily qualify as "real cores" and still perpetuate the myth that it's just silly advertising.

Windows reports it as a 4-core, 8-thread part because that's exactly what the scheduling hotfix did in the first place. It did this because that's the way the processor was meant to perform (i.e. sending processes to different modules rather than filling a single module; on the contrary, if a power-saving plan is set in Windows, it will prioritize filling a module up rather than the module-to-module approach). This is due to the flawed CMT implementation (CMT itself isn't the problem; issues in the uArch itself are).
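To make the two placement policies concrete, here's a minimal Linux-only sketch of my own (not AMD's hotfix; it assumes the usual 15h enumeration where logical CPUs 0/1 are the two cores of module 0 and 2/3 are module 1, so check /proc/cpuinfo before trusting the numbering):

```python
# Sketch of the two scheduling policies described above. Linux-only
# (os.sched_setaffinity), and the CPU numbering is an assumption: the
# usual 15h layout puts cores (0,1) in module 0 and (2,3) in module 1.
import os
import time
from multiprocessing import Process

def spin(cpu, seconds=3.0):
    """Pin this worker to one logical CPU and do trivial integer work."""
    os.sched_setaffinity(0, {cpu})
    count = 0
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        count += 1
    print(f"cpu {cpu}: {count:,} iterations")

def run_on(cpus):
    procs = [Process(target=spin, args=(c,)) for c in cpus]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

print("spread across modules (what the hotfix prefers):")
run_on([0, 2])  # one core in module 0, one in module 1
print("packed into one module (what the power-saving plan prefers):")
run_on([0, 1])  # both cores of module 0, sharing fetch/decode
```

On a BD/PD chip, the spread placement should post noticeably higher per-core counts than the packed one, since the packed pair contends for the shared front end.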

For my own sake, I will simply copy/paste a post I made on another site about this very stupid debate to save time:
Anyway, inside a "module" there exist two integer units. Each individual integer unit has its own instruction bus, data bus, control unit, and datapath. Each core has its own scheduler, which is basically the control unit; the control unit isn't shared between the cores. The datapath isn't shared either, as each core has its own ALUs and AGUs. Nor is the data bus shared, as each core has its own in/out for data. This is the antithesis of Hyper-Threading.

Meanwhile, with SMT, you have one core, and thus ONE control unit, datapath, data bus, and instruction bus, spawning two threads that compete for available resources. This is NOT what happens in the 15h processors. Not at all, not even close.

In AMD's implementation of CMT, the two cores in a module aren't competing for resources at all, as they have their own resources to use. As Blameless said, the shared FPU wasn't the most significant "problem" in the uArch at all. One of the real problems was the shared instruction decoder, which was mitigated with Steamroller.

Bulldozer didn't have poor performance because of CMT; CMT is nothing more than a concept. The Alpha 21264 was a CMT processor, and it was known for very high performance at the time. The reason Bulldozer's ST performance was so lacking is the uArch itself, not the fact that CMT was the concept.

It's no secret that AMD's octocores are targeted mainly against i5s. That's simply because those are the closest Intel chips they can compare against without being laughed out of the building. Even AMD themselves are aware of this: the latest FX release PR compared their processors on productivity-based tasks against the Haswell i5 chips rather than any i7s.
 
Each AMD FX module has two integer schedulers, or 'cores', but that does not equal two true cores, because they share a lot of resources, not as NaroonGTX is trying to make it seem. They aren't true, complete, independent cores: they share the instruction fetch and decode stages, the FPU, and the L2 cache. Each "core" is more like a 70%-complete core, and merely having its own integer execution pipeline doesn't qualify it as a true core.

[Image: Bulldozer die shot (bulldozer-die-2.jpg)]


Sharing those elements is the biggest hit to the IPC of the FX modules, and it's the main reason so much bandwidth and power gets wasted, making the design inefficient every time a "core" has to be flushed to share resources. To the operating system, the resulting module appears as a pair of cores, similar to how a Hyper-Threaded core would appear. AMD is naturally eager to dispel the idea that Bulldozer will behave anything like Hyper-Threading (or SMT), claiming that its design facilitates better scalability than two threads sharing one physical core. That makes sense, but a Bulldozer module really can't be characterized as a single core, because many of its resources are, in fact, duplicated.

[Image: Bulldozer module block diagram (block-diagram.jpg)]
 
The decode stage isn't shared anymore as of Steamroller, and the "it's not a real core" argument is nonsense because there never was a set-in-stone definition of what a "core" actually is. That definition has changed numerous times over the past couple of decades alone.

A core isn't defined by L2$, L1i and L1d caches, or any accelerators.

The instruction bus, data bus, control unit, and datapaths aren't shared between the cores. In BD/PD, the decoder alternates between the two cores' instruction paths every clock; in SR, instructions go down both paths every clock.

An integer scheduler doesn't necessarily equal a core, but there are two integer clusters physically present on the die.
[Image: die shot showing the two integer clusters (oLFheTH.jpg)]


Sharing those elements is the biggest hit to the IPC of the FX modules, and it's the main reason so much bandwidth and power gets wasted, making the design inefficient every time a "core" has to be flushed to share resources.

That has very little to do with the IPC. It's the actual design of the uArch itself (not CMT) that is the culprit behind the low ST performance. The fact that the two integer execution units share the FPU only impacts performance by ~20% tops. That's why, if you disable one core per module, you don't gain any actual ST performance; all that does is gimp the processor (an octocore, in this case) and make it perform like a quadcore. The cause of the multi-threaded penalty is the shared single decode stage, which was mitigated with Steamroller; SR's multi-threaded performance is no longer hampered by the shared decoder.

To the operating system, the resulting module appears as a pair of cores, similar to how a Hyper-Threaded core would appear.

As I said in my post, the reason this happens is the scheduling hotfix that AMD cooperated with Microsoft to release for Windows 7 and below (and which came as part of Windows 8 by default). It does this for scheduling purposes, not to tell you anything about the uArch. CPU-Z and other apps will report it properly as an octocore, or however many cores the chip has, under the 15h family. Due to the flawed CMT implementation, you get better performance if the scheduler prioritizes loading threads into a new module rather than filling one module up immediately.

That makes sense, but a Bulldozer module really can't be characterized as a single core, because many of its resources are, in fact, duplicated.

No, for the millionth time, the resources aren't mostly "duplicated". That's patently false, and the die shot I posted right there proves it. How are the resources duplicated when each integer unit mostly has its own? Outside of the shared fetch, decode, L2$, and FPU/front-end, nothing else is shared. And the FPU itself, while heavily flawed (a major contributor to the low perf/core, though because of other issues rather than its shared nature), isn't the singular cause either; neither is the shared decode, which, as I said, was fixed with SR.

AMD is naturally eager to dispel the idea that Bulldozer will behave anything like Hyper-Threading (or SMT), claiming that its design facilitates better scalability than two threads sharing one physical core.

AMD was never eager to do anything of the sort, because that's nonsense. AMD was always going on about how CMT would offer higher throughput at a lower area cost.
[Image: AMD CMT marketing slide (uEgx9gY.png)]

[Image: Hiroshige Goto's CMP/SMT/CMT throughput illustration (1QDwbne.jpg)]


AMD's comparison was to their older uArch (Istanbul), where Bulldozer could achieve higher theoretical throughput while simultaneously using less die space to do so. And that's precisely what happened. Was it by a huge amount? No, probably not. But that was their ideology at the time. By contrast, Intel uses a combination of CMP and SMT: each core has its own resources and shares nothing with anything else (the L3$ is obviously outside the core and shared among all cores, same as AMD, though Intel's caches aren't victim caches).

The purpose of SMT is to get higher throughput with resources that aren't being immediately used. It creates a virtual thread from hardware (i.e. completely independent of software) which shares the instruction bus, control unit, data bus, and datapaths with the "master thread". Family 15h does nothing of the sort, not at all. The only element of SMT in Family 15h is the FPU, and that is normal vertical multi-threading.

So while they "claimed" that their design facilitates higher throughput, they were absolutely correct in saying so. Their CMT approach scaled to ~80% for the second thread (prior to SR), whilst Intel's SMT scales to about 20-25% most of the time. Seems higher to me.
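Putting those quoted scaling figures side by side (nothing measured here, just the numbers from this thread turned into arithmetic):

```python
# The second-thread scaling figures quoted in this thread, as plain
# arithmetic -- nothing measured, purely illustrative.
figures = {
    "CMT (15h module, pre-SR)": 0.80,   # second core adds ~80%
    "SMT (Hyper-Threading)":    0.225,  # midpoint of the ~20-25% range
}

for name, gain in figures.items():
    print(f"{name}: two threads = {1.0 + gain:.2f}x one thread")
```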

The problem is that people fail to wrap their heads around the fact that there is not, and never has been, a singular, set-in-stone definition of what a "core" actually is. They take Intel's old late-90s definition and try to apply it to a modern landscape as if everything would remain static forever (which is funny, considering that Bulldozer's design was based on the older DEC Alpha 21264 chips, which is where the CMT concept originated... and guess who was one of the lead engineers on that? Dirk Meyer, who went to AMD after his stint at DEC was over).
 
Anybody with half a brain can tell AMD's definition of "4 true cores" is misleading. Test any of the latest AMD quads (dual module) against any of Intel's dual-core i3s with Hyper-Threading and you can see why. If you were allowed to OC the i3s, it would be no contest. I fail to see why people can't wrap their heads around that.
 
The problem with that marketing illustration is that the bottom two images on the left are very much not to scale, leading to wrong conclusions about throughput. Judging from performance, there is at least as much processing power (execution units) in an Intel SMT core as in a Bulldozer module. This is why an 8-core Bulldozer is at best competitive with a 4-core/8-thread i7. Remember, AMD said that they needed 125% of the transistors of a Phenom core to make a Bulldozer module, so either each core has fewer execution units than a Phenom II had, or the Phenom II was a very inefficient design. Yes, I know some of the savings came from the reduced cache.
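Working that transistor-budget point through with the numbers quoted in this thread (back-of-the-envelope only; the "one BD core matches one Phenom II core" line is exactly the assumption being disputed here):

```python
# Back-of-the-envelope math on the quoted claim that a Bulldozer module
# costs ~125% of the transistors of one Phenom II core. The throughput
# figure reuses the ~80% CMT scaling quoted earlier and assumes, purely
# for the arithmetic, that one BD core matches one Phenom II core per
# thread -- which is precisely the assumption this post disputes.
module_cost = 1.25        # transistors, relative to one Phenom II core
module_throughput = 1.80  # one module's two threads vs. one core's one
cmp_cost = 2.00           # plain CMP: just duplicate the Phenom II core
cmp_throughput = 2.00

print(f"CMT throughput/transistor: {module_throughput / module_cost:.2f}")  # 1.44
print(f"CMP throughput/transistor: {cmp_throughput / cmp_cost:.2f}")        # 1.00
```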
 
Anybody with half a brain can tell AMD's definition of "4 true cores" is misleading. Test any of the latest AMD quads (dual module) against any of Intel's dual-core i3s with Hyper-Threading and you can see why. If you were allowed to OC the i3s, it would be no contest. I fail to see why people can't wrap their heads around that.

Another fallacy. The reason the performance of 15h is so low is that it's pretty much just a shit design; CMT has nothing to do with it. A modern-day Core i core simply has much higher performance per core than a single 15h execution unit, so obviously a single core would be faster in many cases than a 15h module when Haswell generally has 2x (sometimes higher) perf/core. A 10h core would get smacked around just the same, even with its "traditional design", so I'm not sure why people try to use this as a counterpoint to the 15h "core" argument. I guess we're now at the point where a core isn't defined by its actual build and makeup, but by its performance? Lol, so if Company X made a radical uArch where one core somehow had 3x the throughput of a Haswell core, does that suddenly mean Haswell no longer has "real cores"?

No, because that's beyond stupid, and you guys know it.

The problem with that marketing illustration is that the bottom two images on the left are very much not to scale, leading to wrong conclusions about throughput.

If you're talking about the second image by Hiroshige Goto, it's not wrong at all. He was illustrating throughput of the technology itself, not the uArch. An SMT virtual thread offers nowhere near the throughput of a CMT thread or a CMP thread.
 
Another fallacy. The reason the performance of 15h is so low is that it's pretty much just a shit design; CMT has nothing to do with it. A modern-day Core i core simply has much higher performance per core than a single 15h execution unit, so obviously a single core would be faster in many cases than a 15h module when Haswell generally has 2x (sometimes higher) perf/core. A 10h core would get smacked around just the same, even with its "traditional design", so I'm not sure why people try to use this as a counterpoint to the 15h "core" argument. I guess we're now at the point where a core isn't defined by its actual build and makeup, but by its performance? Lol, so if Company X made a radical uArch where one core somehow had 3x the throughput of a Haswell core, does that suddenly mean Haswell no longer has "real cores"?

No, because that's beyond stupid, and you guys know it.

If you're talking about the second image by Hiroshige Goto, it's not wrong at all. He was illustrating throughput of the technology itself, not the uArch. An SMT virtual thread offers nowhere near the throughput of a CMT thread or a CMP thread.

Not a fallacy. Just because your and AMD's idea of what an actual core is doesn't agree with what others believe a true core is, like Intel's, doesn't make you any more right or us any more wrong. Like you said, it's all in what you consider a real core to be. You're right, Bulldozer was a shit design, so let's hope their next design uses cores that don't share anything but on-die L2/L3 cache.
 
If you're talking about the second image by Hiroshige Goto, it's not wrong at all.

I say it is absolutely wrong. The scale between the two images does not represent the actual difference in processing power between a Bulldozer module and an Intel SMT core. If it did, the Bulldozer module would be drawn at very nearly the same size as the Intel SMT core.

He was illustrating throughput of the technology itself, not the uArch. An SMT virtual thread offers nowhere near the throughput of a CMT thread or a CMP thread.

My point is throughput does depend on how much processing power you put in each core.
 
@pcjunkie: Agreed, I'll concede to that. ;)

I say it is absolutely wrong. The scale between the two images does not represent the actual difference in processing power between a Bulldozer module and an Intel SMT core. If it did, the Bulldozer module would be drawn at very nearly the same size as the Intel SMT core.

You say it's wrong, but only because, for some reason, you refuse to see it from the proper perspective. Goto knows his shit, trust me; he's not some random commoner who fails at journalism like 99.9% of the "tech journalists" we have today. The illustration makes perfect sense. The SMT section wasn't depicting the throughput of the entire core itself, but rather the general boost you get from SMT, which isn't very much (10-25% on average). An Intel core would fall under CMP, which, as you can see, has "full" throughput.

My point is throughput does depend on how much processing power you put in each core.

Yes, and that image doesn't dispute that in any shape or form.
 
Then make the Bulldozer module the same size as the Intel SMT core, scale the AMD module's throughput down by the same amount so that it's roughly the same as the Intel core's, and it would be a realistic illustration.
 
The illustration clearly went over your head. Not gonna bother trying to reason with you, lol.
 
The illustration clearly went over your head. Not gonna bother trying to reason with you, lol.

It absolutely did not go over my head at all. I am not going to poke insults at you, but we can just agree to disagree.
 