I don't see how they can add HT to a Core 2 architecture based chip. As I understand it, Hyperthreading requires a deeper pipeline in order to be effective. The pipeline in the Core 2 is shallow compared to Prescott or Northwood based Pentium 4 CPUs.
Umh, you got it wrong.
You can do 2 things with SMT: make better use of available resources, and mask latency.
The P4 had a long, narrow pipeline and few execution units, while Core has a shallow pipeline, is very wide and has a lot of execution units.
Ironically, Core is better suited to implementing HT, because it has so many resources that it is unlikely a single thread will ever use them all.
POWER5 also has a short pipeline, and yet it saw bigger gains from SMT than the P4 did from HT.
HT needs unused execution units to be effective, not a deep pipeline.
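To put the "unused execution units" point in concrete terms, here is a minimal toy model in Python. Everything in it is an assumption made for illustration: the issue width of 4, the per-cycle demand lists, and the `slots_used` helper are invented, and a real scheduler is far more complicated than a per-cycle `min()`.

```python
def slots_used(width, *threads):
    """Issue slots filled over time when the given threads share one core.

    Each thread is a list of how many uops it could issue in each cycle.
    Per cycle, the core fills at most `width` slots in total.
    """
    return sum(min(width, sum(demand)) for demand in zip(*threads))


WIDTH = 4                              # hypothetical issue width
thread_a = [1, 2, 0, 3, 1, 0, 2, 1]    # a thread that leaves many slots idle
thread_b = [2, 0, 1, 1, 3, 2, 0, 1]    # a second thread with similar gaps
saturating = [4] * 8                   # a thread that already fills the core

total_slots = WIDTH * len(thread_a)

# One thread alone leaves most of the wide core idle.
print(slots_used(WIDTH, thread_a) / total_slots)              # 0.3125
# A second thread fills slots the first one could not use.
print(slots_used(WIDTH, thread_a, thread_b) / total_slots)    # 0.625
# If one thread already uses every slot, SMT has nothing left to fill.
print(slots_used(WIDTH, saturating, thread_b) / total_slots)  # 1.0
```

Pipeline depth does not appear anywhere in this model; the gain comes purely from slots that one thread leaves empty.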
Intel had mentioned before that HT would likely return to NGMA.
Simulated cores on the cheap? What's not to like... and it shouldn't cost much, since it's not exactly new tech.
Ok. I stand corrected. I seem to recall in earlier discussions that the longer pipeline was required. The discussion stemmed from someone asking about HT in the Pentium M processors.
You're probably thinking of the fact that HT requires the trace cache idea from the P4. Most CPU architectures decode x86 instructions "on the fly" and feed them directly into the pipeline... in order to implement HT, you need to decode them independently and store the decoded instructions somewhere while both threads are scheduled.
This IMHO is pretty dumb. You already have two execution cores, or even 4. What's the point?
Seriously? Did you just ask "what's the point?"
What's the point of making a faster processor? What's the point of dual core? What's the point of quad core? What was the point of ever implementing HT on the P4? Why don't we all just go back to our 286 PCs and have a ball?
At least come up with a reasonable explanation of why you think it's "pretty dumb". Do you not live in the land of America? The land of "super size me!" If I can double the number of my theoretical processors for another 20 bucks, then by all means, super size my Intel order please, Newegg.
You're probably thinking of the fact that HT requires the trace cache idea from the P4. Most CPU architectures decode x86 instructions "on the fly" and feed them directly into the pipeline... in order to implement HT, you need to decode them independently and store the decoded instructions somewhere while both threads are scheduled.
This isn't true.
All mainstream x86 processors since the PPro have out-of-order execution, which consists of a system that first decodes x86 instructions into uOps, then stores them in a buffer, and lets the out-of-order logic pick instructions when they can be executed by the backend. The result of each executed instruction is then stored in the reorder buffer, and finally the retirement logic writes the results back in program order to ensure correct in-order behaviour.
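As a rough illustration of "execute out of order, write back in order", here is a small Python sketch. It is a deliberate simplification: the uOps are assumed to be independent of each other, execution units are assumed unlimited, and the `simulate` function, the uOp names and the latencies are all invented for the example.

```python
def simulate(uops):
    """uops: (name, latency) pairs in program order, assumed independent."""
    # Out-of-order completion: each uop finishes after its own latency,
    # so short uops overtake long ones that came earlier in the program.
    completion_order = [name for name, _ in sorted(uops, key=lambda u: u[1])]

    # In-order retirement: a uop may only retire once it has finished
    # AND every earlier uop in program order has retired.
    retire_cycles = {}
    last_retire = 0
    for name, latency in uops:
        last_retire = max(last_retire, latency)
        retire_cycles[name] = last_retire
    return completion_order, retire_cycles


program = [("load", 3), ("add", 1), ("div", 5), ("sub", 1)]
completed, retired = simulate(program)
print(completed)  # ['add', 'sub', 'load', 'div']
print(retired)    # {'load': 3, 'add': 3, 'div': 5, 'sub': 5}
```

The `add` finishes first but cannot retire before the `load` ahead of it, which is the job the reorder buffer does in the description above.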
Trace cache is only different in that the code cache never actually stores the x86 instructions, but rather the decoded uOps.
This is not strictly required for HT, because HT only affects the out-of-order logic, really. It just has to keep track of 2 core states rather than 1.
Trace cache may improve performance if your x86-decoder itself isn't fast enough to decode the x86-code for 2 threads at the same time... but I'm not so sure if that is the case. After all, the efficiency of HT isn't *that* high, so there won't be all that many extra instructions that have to be decoded for the second thread.
Alternatively, you could just use a second x86-decoder instead of trace cache.
So it's not required (and I don't think IBM uses it in its version of HT/SMT). Nevertheless, tracecache was a nice idea, and can even benefit single-threaded systems. I've said it before, tracecache is one of the technologies of P4 that I think is most likely to be re-used in future processors.
Namely, there are all sorts of penalties when decoding x86-code. If you are running a loop (and most time of most programs is generally spent in loops), a regular CPU will get these penalties at every iteration, because it just redecodes the same instructions in the same way. With trace-cache only the first iteration will be slower, but after that, you feed the uOps directly, which have no decoding penalties.
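A back-of-the-envelope sketch of that loop argument, in Python. The numbers are made up, the `decode_cycles` function is hypothetical, and it assumes the whole loop body fits in the trace cache and that the decode penalty is a fixed cost per instruction, which real decoders (which overlap decoding with execution) do not strictly obey.

```python
def decode_cycles(body_len, iterations, penalty_per_insn, trace_cache):
    """Total cycles lost to x86 decode penalties for one loop."""
    # With a trace cache only the first iteration is decoded; afterwards
    # the stored uOps are replayed. Without one, every iteration pays again.
    decoded_iterations = min(1, iterations) if trace_cache else iterations
    return body_len * penalty_per_insn * decoded_iterations


# Hypothetical loop: 10 instructions, 1000 iterations, 2 penalty cycles each.
print(decode_cycles(10, 1000, 2, trace_cache=False))  # 20000
print(decode_cycles(10, 1000, 2, trace_cache=True))   # 20
```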
Thanks for clearing that up... at least I was semi-correct. I guess this explains why the P4's decoder can decode more instructions per clock than the scheduler can issue (IIRC it can decode 4 per clock and the scheduler can issue 6 per 2 clocks).
Anyway, I definitely agree that despite how often people want to bash the NetBurst architecture, it had some brilliant ideas, many of which were crucial to the current Core 2 architecture.
...
I'd even go as far as to say that we may see a Netburst-like architecture again in a few years, if and when silicon manufacturing has matured enough to make 5+ GHz speeds possible... which is what Netburst was originally designed for.
A Core2 with tracecache and HT would already be quite similar... all it would need is a longer pipeline... and I do think that the pipeline will get longer in the future.
The key is finding the sweet-spot between IPC and clockspeed, and the right pipeline-length is crucial to that.
Going from P3 to P4 was a huge leap, and Intel overshot the sweet-spot by miles... Currently they seem to play it safe and take it one step at a time... The current Core2 will probably get us past 4 GHz... and then Intel will probably want to see if 5+ GHz is possible, with a longer pipeline.
Don't forget that we're entering an age where efficiency matters. The new name of the game in the computer industry is "power efficiency," and hyperthreading is ideal for that purpose. Hyperthreading makes each individual core more efficient (5-20% depending on the app, in the P4's case), which is definitely a plus. Adding hyperthreading to Intel's existing architecture hardly requires any changes, so why not do it in the interest of efficiency? When Intel added the additional scheduler and associated infrastructure for HT to the P4, it only added about 5% additional die space, so why not?
I don't think getting more IPC than Core/K8L is worthwhile with x86. The complexity of the decoder and prefetch parts will become a real bottleneck sooner or later.
Netburst was brilliant in guessing the future: with only limited IPC to extract, better to have a simpler, narrower core that can run at huge frequencies. Too bad manufacturing couldn't keep up...
But you still have 4 cores so what's the point?
There is no point. There is never any point to having faster equipment for less money.
Nothing uses more than four cores anyway, and no multitasker can load 8. What % does HT help anyway? I remember it being somewhere between -5% and 10%.
This IMHO is pretty dumb. You already have two execution cores, or even 4. What's the point?
Well, HT can make the processor do more work as well... just not as much as an entire second core.
But still I've seen up to about 20% gain on some of my multithreading code with HT-machines... code that was actually meant for dualcores.
Run the program in this thread on a P4 with HT enabled and disabled... I don't guarantee that you'll get 20% gain, but I'm quite sure it will run noticeably faster with HT enabled, when you run it with 2 threads.
On a P4 3.0 HT I got 135 fps with the singlethreaded version, and 145 fps with HT enabled and 2 threads. So quite some improvement...
http://www.hardforum.com/showthread.php?t=1149750
So 7% is 20%? Wow. Great math there. Instead of making outlandish claims, you should focus on running the test first and then posting what the actual results are.
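For reference, the arithmetic behind the two figures quoted above (135 fps single-threaded, 145 fps with HT and 2 threads) can be checked in a couple of lines of Python. The `gain_percent` helper is just a name made up for this check.

```python
def gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1) * 100


# 135 fps without HT vs 145 fps with HT and 2 threads, as posted above.
print(round(gain_percent(135, 145), 1))  # 7.4
```

So that particular run works out to roughly 7.4%; the "up to about 20%" figure was for other multithreaded code, not for this benchmark.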
Well, frankly, HT sucked wang on the P4, and given that stigma I can't imagine it not sucking wang on a Core2. As I alluded to in my last post, the Core2 may indeed be something different, though, and maybe it will be a pleasant surprise. Until then I'm not holding my breath, and I'm more than satisfied with the blazing speed of my dual core chips.
Yeah, supersize all ya want, you'll be dead by 40 from heart disease
The same could be said for MMX and SSE by some folks too. The implementation of them is what sucked (IMO). They all hold/held great potential, but with faster silicon always on the horizon, just coding for muscle seems to be the method of choice. Timing, getting to market and other factors contribute just as much as anything else does (maybe more).
It is good to hear that the 45nm parts, with the new process using hafnium etc., are gonna be available for socket 775.
Here's a thread with some user benchmarks with HT.
http://forums.anandtech.com/messageview.aspx?catid=28&threadid=1180277&enterthread=y&arctab=y
20% is not unusual when two CPU intensive threads or apps are running simultaneously.
I think you haven't been keeping up for over a year. Core and Core 2 CPUs are native dual core. Even better, they share the L2 cache, and Core 2 cores can snoop on one another's L1 cache. That's a higher level of integration than AMD has in their "native" dual core CPUs.
I'm all for the new processors, and I hope there's more to them than just clock increases with extra cache. There is one thing about them I don't like, however. I want to see native dual core AND four core processors from Intel. I also want memory controllers on them as well. I really don't care for the "bolt two cores to a chip and call it dual core" approach. I really thought that after the Pentium D the Core 2 Duos would be native, and I really thought that the new 45nm parts would be, but alas they aren't. I'm sure they will be fun, but they could be much better. I'm looking forward to AMD's native 4 core processors, due out soon. Word is they are 40% stronger than current Intel 4 core processors. I'll sell my E6600 in a second if it holds true.