AMD processor speeds vs. Intel ...

Dude8383

Why are AMD's top processor speeds lower than Intel's? Isn't that bad for competition? Or are the newest AMD processors better even with their lower clock speeds?

Noob question, I guess.
 
Intel and AMD have different approaches to get top performance.

Intel's P4s, and the Netburst architecture on which they are based, have long pipelines. When full, these pipelines let the P4 keep many instructions in flight. The design also takes into account that silicon has physical limits on how much work you can push through it per clock: by splitting the work into more, shorter stages, each stage finishes sooner, so the CPU should theoretically clock much higher than a CPU with a short pipeline.
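To make the "longer pipeline, higher clock" idea concrete, here is a minimal sketch of the usual textbook model: the cycle time is the total logic delay divided by the number of stages, plus a fixed per-stage overhead. The function name and all the numbers are made up for illustration; they are not measurements of any real P4 or Athlon.

```python
def max_clock_ghz(total_logic_ns, stages, latch_overhead_ns):
    """Idealised upper bound on clock speed for a pipelined design.

    total_logic_ns    -- delay of all the logic if it ran as one stage
    stages            -- number of pipeline stages the logic is split into
    latch_overhead_ns -- fixed cost added to every stage (latches, skew)
    """
    cycle_ns = total_logic_ns / stages + latch_overhead_ns
    return 1.0 / cycle_ns  # 1 / nanoseconds gives GHz


# Hypothetical figures, chosen only to show the trend.
for stages in (10, 20, 31):
    print(stages, "stages ->", round(max_clock_ghz(5.0, stages, 0.1), 2), "GHz")
```

Note the diminishing returns: the fixed per-stage overhead does not shrink, so each extra stage buys less clock speed than the one before.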

Intel also chose to use two pipelines for ALU and FPU functions, respectively. They consider this to be a good compromise between transistor count, die size, heat dissipation, and performance.

Now, you may note that the ALUs on a P4 are double-clocked. Why is this, you might ask? Because unfortunately, although the P4 has excellent branch prediction, even a relatively low number of mispredictions causes the P4 to lose performance badly. Remember that long pipeline? Yes, well, if you mispredict and the wrong instructions are in the pipeline, you have to flush them out and reload the right ones. On a short pipeline, this happens more quickly than on a longer pipeline. Additionally, it takes more steps to get data down to the point of execution, so the longer the pipeline, the longer the CPU sits idle after a flush.

Thus, as you've surely noted when reading current benchmarks, the P4 excels at tasks with few hard-to-predict branches, where prefetch can be used to keep data in or close to the L2 cache. Tasks such as encoding or compression run quite efficiently on the Netburst architecture.

Tasks like compiling and, in general, games often have large numbers of branches, and thus the P4 falters slightly. Keep in mind that the P4 is hardly a "weak" CPU - we should give credit to Intel's electrical engineers, for it is still a good CPU.
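To put rough numbers on the misprediction cost, here is a back-of-the-envelope sketch. It assumes the simple model that every mispredicted branch costs a fixed number of flush cycles; the branch mix, misprediction rate, and penalties are hypothetical, not figures for any real CPU.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once pipeline flushes are counted.

    base_cpi             -- cycles per instruction with perfect prediction
    branch_fraction      -- share of instructions that are branches
    mispredict_rate      -- share of those branches that are mispredicted
    flush_penalty_cycles -- cycles lost per misprediction (grows with depth)
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical workload: 20% branches, 5% of them mispredicted.
short_pipe = effective_cpi(1.0, 0.20, 0.05, 10)  # assumed 10-cycle flush
long_pipe = effective_cpi(1.0, 0.20, 0.05, 20)   # assumed 20-cycle flush
print("short pipeline CPI:", round(short_pipe, 2))
print("long pipeline CPI: ", round(long_pipe, 2))
```

With the same predictor accuracy, doubling the flush penalty doubles the cycles lost to mispredictions, which is why branchy code hurts the longer pipeline more.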

The P4 also has an x87 FPU which features 'medium' performance, at least by comparison to an Athlon or Athlon 64. (No need to compare it to the Itanium or other architectures like the PowerPC or others, either...) The original FPU design on the P4 would've most likely shown much higher true IEEE x87 performance, but in the interests of saving die space, Intel chose to implement an 'FPU-lite' version in the Willamette. We have been stuck with the same FPU ever since. :(

To improve floating point performance, Intel instead chose to create SSE2, which complements the x87 FPU nicely. These new instructions added double-precision FPU support to the SSE feature set. Under ideal circumstances, SSE2 can literally improve the floating point performance of a given section of code by 300%. In the real world, the performance increase is normally tangible, but hardly 300%... The recent addition of SSE3 improved the efficiency of several key instructions normally required during encoding, and also featured some instructions that improve the amount of parallelism possible.
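One way to see why the real-world gain falls well short of the ideal figure is Amdahl's law: only the part of the run time spent in the vectorized code gets faster. A minimal sketch, assuming a "300% improvement" means that section runs 4x as fast, and using a made-up share of run time:

```python
def overall_speedup(fraction_accelerated, section_speedup):
    """Amdahl's law: whole-program speedup when only part of it is sped up.

    fraction_accelerated -- share of original run time in the sped-up section
    section_speedup      -- how many times faster that section becomes
    """
    remaining = (1.0 - fraction_accelerated) + fraction_accelerated / section_speedup
    return 1.0 / remaining


# Hypothetical program: 30% of its time is in floating point code that SSE2
# speeds up 4x (my reading of "300%"; both numbers are assumptions).
print(round(overall_speedup(0.30, 4.0), 2))
```

Under these assumed numbers the whole program ends up roughly 1.3x faster, even though the hot section itself runs 4x faster.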

To summarize the Netburst architecture, as others have much more eloquently pointed out, think of an engine where it is required to rev over 4500 RPM to get any torque. (Assuming torque is the only measure of performance.)

I will post the AMD CPU side in a second... the wife is bothering me.
 
The clock frequencies of AMD's processors are lower than Intel's; however, that does not mean they are slower (performance-wise).

Prescott has a rather inefficient 31-stage pipeline while A64s have a 10-stage pipeline.

A64s carry out more instructions per cycle than Prescotts do.

A64s have an integrated memory controller, so they can bypass the northbridge in that area.

Prescott's L2 cache latency is greater than that of A64s. (IIRC it's also asynchronous)

There's more but I'm tired.
 
Note that strictly speaking it's not AMD vs Intel.
Intel offers more than just the P4. While the P4 explores the limits of clock speed, using a very long pipeline, Intel also has the Itanium and Pentium-M architectures, which aim for maximum instructions per cycle. The Itanium and P-M can beat AMD at its own game, and match or exceed an Athlon 64's performance at equal clock speed.

Some might say the P4 was an experiment that went wrong. I prefer to see it as a CPU that is too far ahead of its time. The world is not ready for P4 yet. On the one hand, we have the manufacturing process that is giving problems. The 90 nm process turned out to have an unexpectedly high amount of leakage, which means that the P4 cannot reach the super-high clock speeds that it was designed for, not yet anyway.
And on the other hand we have the software. In order to make the P4 perform at its best, you have to eliminate as many branches as you can, and try to keep the cache filled.
This is not impossible to do, but it will require rewriting the software. Most software simply isn't optimized for P4, not yet anyway.
 
Whoa, awesome info there... keep it coming. A friend asked me to post the question up :) :)
 
DryFire said:
Prescott has a rather inefficient 31-stage pipeline while A64s have a 10-stage pipeline.

Prescott's L2 cache latency is greater than that of A64s. (IIRC it's also asynchronous)

Correct me if I'm wrong, but I am fairly certain the Athlon 64 architecture has 12/17 stages. (ALU and FPU, respectively.)

Also, I remember reading that the P4 had quite a low latency on the L1/L2, as compared to any of AMD's processors. Where did you find that information? (Honestly, I'd be interested to see some figures.)

To summarize my feelings about the Athlon 64, and finish my thought:

AMD chose to take another approach to achieving maximum performance - a short pipeline and a lower overall clock speed. High IPC has traditionally (here I go on a slippery slope...) been more common in RISC-like CPUs than in CISC. Take, for example, the four-stage G4. As a post-RISC CPU, it has an extremely short pipeline and relatively low clock speeds, yet it manages to attain similar levels of performance to many modern x86 CPUs clocked at much higher rates. Given that many of AMD's current engineers are ex-Alpha, it is hardly inconceivable that they would favor this type of efficient, parallel, and scalable architecture.

I believe AMD was quite smart to take this approach. AMD must have known that Moore's law was coming to an end, and that processor scaling could not continue at the same pre-P4 rate, given that, as I mentioned above, silicon does have some inherent limitations. As you shrink the die to smaller and smaller gate widths, current leakage increasingly becomes an issue, as does heat.

How would AMD counter heat and current leakage? Enter SOI (silicon on insulator). SOI offers a certain amount of protection against current leakage by placing a layer of insulator beneath the transistors. (Can someone who is more versed in this topic perhaps elaborate? I would love to hear more about this.) Intel has since admitted SOI will eventually be implemented on their CPUs, and in the meantime has begun using strained silicon, which mainly speeds up the transistors by improving carrier mobility rather than cutting leakage directly. To my knowledge, reducing leakage can help solve heat and power problems found on many modern CPUs. IBM Power CPUs and, by extension, the Apple G5 CPU series also make use of SOI.

The end result of AMD's planning - a cool running, extremely powerful CPU with a three pipeline design. This design was mainly borrowed from the K7 (Athlon), in the same way the Pentium M borrowed design elements from the PIII. AMD's planning must have started YEARS ago: they knew the K7 was only the beginning of good things to come. Someone at AMD has excellent vision...

The short-to-medium-length triple-pipeline ALU/FPU design of the Athlon allowed it to outperform the PIII and, for a long time, the P4 as well. AMD added two stages to each of the pipelines to allow the Athlon 64 to scale slightly better than the Athlon XP. This brought the ALU pipeline to 12 stages and the FPU pipeline to 17. Additionally, AMD increased IPC significantly by adding an on-die DDR PC3200 memory controller. As others have mentioned, with the memory controller built right into the CPU, there is no need to implement one in a northbridge on the motherboard. (The AGP GART is also on die - this should eliminate the source of MANY chipset issues, hopefully!) Not only did this reduce memory latencies beyond what any P4 could match, it also relieves chipset vendors of having to build a traditional northbridge. (If you are so inclined, AMD does state there are ways of bypassing the onboard memory controller. Future chipsets may choose to do this if there is a way of delivering similar or improved performance, although at this point, it is unlikely.)

AMD's implementations of the various incarnations of SIMD (MMX, SSE, SSE2, 3DNow!, and recently SSE3) are relatively strong, and do show marginal performance improvements in many areas. Because of the Athlon 64's powerful triple-pipeline FPU, program code does not need as many tweaks to achieve excellent levels of performance. Additionally, AMD improved the Athlon's TLBs and added 64-bit capabilities, which double the number of general-purpose registers and allow calculations on large numbers not easily handled by 32-bit IA32 CPUs. Although AMD lengthened the pipeline from the K7 design, they actually improved IPC by up to 25%.

In many respects, the P4 is a better, more forward-thinking design, but when most code is poorly optimized, if at all, the brute-force approach the Athlon 64 provides is sometimes more efficient. Besides, foundries just don't have the technology to crank out reliable 4GHz P4s yet. Intel will likely have to wait a few months for the process to catch up before this will happen, if at all. Until Intel does this, AMD will likely gain market share, as their CPUs offer roughly the same or more performance in most respects. (Don't get me wrong, however; one should always choose the right CPU for the job, whether it is a P4 or other.)

The Athlon 64 is like a muscle car engine - lots of torque at low RPM, with little need to run at the same RPM as a 'P4' engine to achieve the same or better performance.
 
Excellent reading, thanks
 
newls.. you didn't have to quote the entire thing ;)

Josh_B said:
Correct me if I'm wrong, but I am fairly certain the Athlon 64 architecture has 12/17 stages. (ALU and FPU, respectively.)

Also, I remember reading that the P4 had quite a low latency on the L1/L2, as compared to any of AMD's processors. Where did you find that information? (Honestly, I'd be interested to see some figures.)
Yeah, I don't know for sure on the FPU, but everything I've seen says 12 stages.
And as for the cache latency, I'm pretty sure the P4's L1 is faster, but the L2 is slower... and the Dothan (Pentium M) dominates both by a lot.
 
Josh_B said:
Correct me if I'm wrong, but I am fairly certain the Athlon 64 architecture has 12/17 stages. (ALU and FPU, respectively.)

Correct, K8 has two new stages over K7. One is for extra decode, for better vector / mixed-path instruction handling (SIMD, complex x86). Actually, that's more of a modification of the old 'scan' stage. The actual physically new stages are the 'pack' and 'pack decode' stages, which allow instructions to change lanes post-decode. In K7, once the three instructions are picked and the operands fetched, they're locked into the ALU they will be scheduled on. Pack-decode allows instructions to be locked to an ALU only right before they're shipped off to the scheduler.


Also, I remember reading that the P4 had quite a low latency on the L1/L2, as compared to any of AMD's processors. Where did you find that information? (Honestly, I'd be interested to see some figures.)

Depends on what you're doing. L1 has very fast integer lookups: 2 and 4 cycles for Northwood and Prescott respectively. But it has fairly poor load times for the FP units: 9 and 12 cycles respectively.
L2 was exceptional on Northwood, just 7 cycles, but has scaled, well, less than gracefully on Prescott: 18 cycles, now 27 with the 2 MiB cores.

K8 is about 17 cycles for L2, and typically 3 cycles for L1.

Though that's a bit misleading:
a 3.8 GHz Prescott 1 MiB L2 is 4.7 ns access time,
a 3.8 GHz Prescott 2 MiB L2 is 7.1 ns,
and a 2.6 GHz K8 is 6.5 ns.
So we really need to define 'slower'. Yes, access time in clock cycles is longer, but the front end of the P4 is also much longer. Northwood had 12 stages between fetch and schedule (when the data has to be in the register file and ready to go out for execution); Prescott, presumably, has about 16; K8 has only 9.
Compared with K8, the raw lookup time is fairly comparable for the Prescott 2 MiB L2, and much faster for the Prescott 1 MiB L2.
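For anyone who wants to check the nanosecond figures, the conversion is just cycles divided by clock speed in GHz. A quick sketch using the cycle counts and clocks quoted in this post (the function name is mine):

```python
def latency_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_ghz


# (label, L2 latency in cycles, clock in GHz), all taken from the post above.
chips = [
    ("Prescott 1 MiB L2 @ 3.8 GHz", 18, 3.8),
    ("Prescott 2 MiB L2 @ 3.8 GHz", 27, 3.8),
    ("K8 L2 @ 2.6 GHz", 17, 2.6),
]

for label, cycles, ghz in chips:
    print(label, "->", round(latency_ns(cycles, ghz), 1), "ns")
```

This is why cycle counts alone are misleading across CPUs with very different clocks: the chip with more cycles of latency can still answer sooner in wall-clock time.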
 
:( Looks like I mixed up K7 and K8 a little, and threw in some fallacies too.

Must sleep now before I go on misinforming people.
 
It's all good mate, I do the same thing sometimes. As long as you're [H]ard enough to admit that, nobody can hold anything against you :p
 
Dude8383 said:
Why are AMD's top processor speeds lower than Intel's? Isn't that bad for competition? Or are the newest AMD processors better even with their lower clock speeds?

Noob question, I guess.

Performance Rating:

AMD chips differ architecturally from Intel chips in a number of ways. Intel chips use longer, narrower pipelines, while AMD chips use shorter, wider pipelines. Intel chips can perform up to 6 instructions per clock cycle (IPC), while the Athlon XP line can perform up to 9. This means that an Athlon XP can theoretically run at a 33% lower clock speed, yet do the same amount of work as a P4 at the same rating.

Snoop Dogg would put it like this:

AMD chips shot calla architecturally F-R-to-tha-izzom Intel Chips in a numba of ways . Chill as I take you on a trip. Intel chips use longa, narrow pipelizzles while AMD chips use gangsta motherfucka pipelizzles. Intel chips perform 6 Instructions Per clizzock Cycle (IPC), while tha Athlon XP line performs 9 fo' sho'. This means tizzy an Athlon XP can theoretically be 33% brotha in overall speed, yet do tha same amount of wizzle as a P4 @ tha same rat'n.
 
MD_Willington said:
Performance Rating:

AMD chips differ architecturally from Intel Chips in a number of ways. Intel chips use longer, narrow pipelines, while AMD chips use shorter, wider pipelines. Intel chips perform 6 Instructions Per clock Cycle (IPC), while the Athlon XP line performs 9. This means that an Athlon XP can theoretically be 33% slower in overall speed, yet do the same amount of work as a P4 @ the same rating.

Snoop dog would put it lis this:

AMD chips shot calla architecturally F-R-to-tha-izzom Intel Chips in a numba of ways . Chill as I take you on a trip. Intel chips use longa, narrow pipelizzles while AMD chips use gangsta motherfucka pipelizzles. Intel chips perform 6 Instructions Per clizzock Cycle (IPC), while tha Athlon XP line performs 9 fo' sho'. This means tizzy an Athlon XP can theoretically be 33% brotha in overall speed, yet do tha same amount of wizzle as a P4 @ tha same rat'n.

LMAO :p
 
MD_Willington said:
AMD chips shot calla architecturally F-R-to-tha-izzom Intel Chips in a numba of ways . Chill as I take you on a trip. Intel chips use longa, narrow pipelizzles while AMD chips use gangsta motherfucka pipelizzles. Intel chips perform 6 Instructions Per clizzock Cycle (IPC), while tha Athlon XP line performs 9 fo' sho'. This means tizzy an Athlon XP can theoretically be 33% brotha in overall speed, yet do tha same amount of wizzle as a P4 @ tha same rat'n.

Word.
 
He's wrong though...
The maximum number of instructions per clk is determined by the narrowest stage in the architecture. In both cases it's the retirement stage, and in both cases it can retire at most 3 micro-ops per clk.
So that is the maximum IPC.
It's not that interesting that the Athlon can theoretically process 9 micro-ops per clk, because it can never decode fast enough to get 9 micro-ops in a single clk in the first place, and it could never retire them.

The real gain for the Athlon is that it has the memory controller on-die, so on a cache-miss there is less latency on the memory fetch... And because of the shorter pipelines, a flush is less costly, so mispredicted branches are cheaper.
And then there's the case of instruction latency... The P4 is designed to run at higher clock speeds, which stretches the pipeline over more stages... This means that the number of instructions that can run with 1 clk latency will be smaller. Other instructions should be replaced with sequences of low-latency instructions whenever this is faster.
Most software isn't yet optimized for these circumstances. And THAT is the real reason why the Athlon gets more IPC.
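The "narrowest stage" argument can be written down in a couple of lines: peak throughput is the minimum width over all pipeline stages. In this sketch the retire width of 3 and the execute width of 9 are the figures from this thread; the decode width is a placeholder I picked purely for illustration.

```python
def peak_ipc(stage_widths):
    """Peak micro-ops per clk is limited by the narrowest pipeline stage."""
    return min(stage_widths.values())


# Widths in micro-ops per clk. "decode" is an assumed placeholder value.
athlon = {"decode": 3, "execute": 9, "retire": 3}

print("peak micro-ops per clk:", peak_ipc(athlon))
```

However wide the execution core is, the result cannot exceed what the retirement stage lets through.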
 
Is it that the software isn't optimized, or that the tasks that need to be done cannot take advantage of the streaming nature of the Netburst architecture?

I'm thinking the latter ;)
 
(cf)Eclipse said:
is it that the software isn't optimized, or the tasks that need to be done cannot take advantage of the streaming nature of the netburst architecture?

i'm thinking the latter ;)

It depends a lot on the software at hand.
It's a bit of both I suppose. Some tasks won't suit the netburst architecture very well. But nearly everything can be adapted to perform better.
 
So how far would you go? Could you say a non-overclocked Athlon 64 3400+ would outperform an Intel 3.4C when it comes to things like gaming?
 
I love it when people argue over which CPU is better. Basically, it should boil down to one thing, where AMD holds the edge:

Price and performance.

An AMD 3400+ clocked at 2.4 GHz, costing $219,
http://www.newegg.com/app/viewproductdesc.asp?description=19-103-484&DEPA=1

will outperform the Intel P4 3.4 GHz, clocked at 3.4 GHz and costing $419.
http://www.newegg.com/app/viewproductdesc.asp?description=19-116-196&DEPA=1

In almost every test the AMD will perform about the same or better, for $200 less.

Now that should be enough for all of us to see that AMD has the better CPU.
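For the curious, here is the arithmetic behind that comparison as a rough sketch. It assumes the simplistic model that performance is IPC times clock speed, and uses only the clocks and prices quoted above; the function names are mine.

```python
def break_even_ipc_ratio(fast_clock_ghz, slow_clock_ghz):
    """How much more work per clock the lower-clocked chip needs to tie."""
    return fast_clock_ghz / slow_clock_ghz


def price_ratio(cheaper_usd, pricier_usd):
    """Price of the cheaper chip as a fraction of the pricier one."""
    return cheaper_usd / pricier_usd


# Clocks and prices as quoted in this post.
print("IPC advantage needed to tie:", round(break_even_ipc_ratio(3.4, 2.4), 2))
print("AMD price as a share of Intel's:", round(price_ratio(219, 419), 2))
print("Price difference in USD:", 419 - 219)
```

So at 2.4 GHz the AMD chip has to do about 1.42 times the work per clock to match a 3.4 GHz P4, and under this model anything beyond that is a win at roughly half the price.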
 
So an FX-55 on a 939 motherboard should be comparable to a top-line P4 at the moment? (3.0+ and up?)
I'm looking into this debate because I'm deciding whether I should upgrade or not. I currently have a P4 3.0 on a 478 motherboard and it's bogging down on a lot of the newer software coming out lately.
Since I can afford it at the moment, I was thinking of going to an FX-55 on a 939 motherboard. (I can figure out exactly the best one to buy later.) I just want to see if, and by how much, my performance would increase from what I have now. (I suspect it would be quite a bit.) Any info would be appreciated. I have a 9800 Pro and would be upgrading to an Ultra or 850XT, btw...
 
In every benchmark and performance measurement I have ever seen, the FX-55 outperforms any other CPU (with as many other factors as possible being the same) in most if not all gaming applications, especially the new ones. Obviously, this is excluding extremely overclocked situations.

I'm pretty sure it outperforms the 3.73EE, the P4 570 (3.8GHz) and the P4 660 (3.6GHz, but with newer tech etc., and Intel's fastest of the latest series of P4 chips). The FX-55 easily outperforms all 3.0GHz - 3.4GHz Intel processors.
 
Ah, thanks... My setup is decent at the moment but it really is starting to bog down on a lot of new stuff. Especially anything that needs raw CPU power...
 