Reverse HyperThreading...?

CHAoS_NiNJA

Ok, so, I'm sad to say I've been out of the loop. Is Reverse HyperThreading real, or is it all a bluff? When's it coming? WTF is the deal?

I've had different people on the inside tell me both yes and no. I seriously have no idea.

Please clue me in.
 
With current CPU architectures, it is practically impossible to do at runtime.

Whatever instruction-level parallelism exists in a single thread is already being exploited, because all modern CPUs are superscalar designs.
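To put a number on that point, here is a minimal sketch (not from the thread). It models a single thread as a list of register-to-register instructions and asks how many could issue per cycle on an idealised, infinitely wide superscalar core. The instruction tuples and the `critical_path` and `ilp` helpers are made up for illustration, and one-cycle latency with perfect register renaming is an assumption. The ceiling comes from the longest dependency chain, not from how many execution units are available.

```python
def critical_path(instrs):
    """Length of the longest dependency chain, one cycle per instruction.

    Each instruction is (dest, src1, src2, ...). Only true (read-after-write)
    dependencies are modelled, as if registers were perfectly renamed.
    """
    ready = {}      # register -> cycle at which its value becomes available
    longest = 0
    for dest, *srcs in instrs:
        # An instruction can start once all of its source values are ready.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + 1
        longest = max(longest, ready[dest])
    return longest


def ilp(instrs):
    """Average instructions per cycle on an ideal, infinitely wide core."""
    return len(instrs) / critical_path(instrs)


# Four independent adds: all of them can issue in the same cycle.
independent = [("r1", "a", "b"), ("r2", "c", "d"), ("r3", "e", "f"), ("r4", "g", "h")]
# A serial chain: each add needs the previous result.
chain = [("r1", "a", "b"), ("r2", "r1", "c"), ("r3", "r2", "d"), ("r4", "r3", "e")]

print(ilp(independent))  # 4.0
print(ilp(chain))        # 1.0
```

In this toy model, adding more pipelines (or a second core's worth of them) does nothing for the serial chain.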
 
CHAoS_NiNJA said:
Is Reverse HyperThreading real,
No.

And the "lightweight profiling" extension specification AMD released this month is not in Barcelona. http://developer.amd.com/assets/HardwareExtensionsforLightweightProfilingPublic20070720.pdf and http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~118952,00.html It has nothing to do with "reverse hyperthreading" in any case.

That AMDZone poster is ridiculous. Intel had been talking about speculative multithreading for a lot longer than AMD has and something like RHT is not "AMD's idea." I didn't think Intel would ship a processor with it anyways until the "many core" era.
 
Like I said, I'm not entirely convinced. Different strokes for different folks, and all that jazz. RHT is still entirely possible. You can get better thread-level parallelism by splitting a process at its branches and then assigning each branch to a dedicated thread. Profiling is one needed step to accomplish this goal.

The problem with branch splitting is twofold:
1. Data dependencies
2. Thread scheduling

Right now all threads are scheduled by the OS. This is not ideal for RHT to work. It would be much better if the hardware itself scheduled threads for execution. Unfortunately, no matter how much we wish it, that will never happen. This is where profiling comes in. It gives the OS much finer information about the goings-on inside the CPU. The OS can then take this information and schedule threads more effectively. So we can partially solve the scheduling problem with profiling. Though in the end ATi's threading processor would be a far better solution.

Data dependencies... This is the bigger problem of the two. Let's say that the data needed was in Core 1's register set, and Core 0 needed it. It would need to be retired all the way to L3 cache, then promoted all the way back up to the other register set. The latency is huge on a relative basis... It's simply not worth doing. As such, RHT in its original form is not worthwhile. You can improve single-threaded performance by a good margin... But is it worth it?
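A back-of-the-envelope model of that trade-off, as a sketch only. The cycle counts are placeholder assumptions, not measurements of any real CPU, and `split_speedup` is a hypothetical helper. It also assumes both cores stall during every hand-off, which is the pessimistic case.

```python
def split_speedup(work_cycles, handoffs, handoff_latency):
    """Speedup from splitting one thread's work evenly across two cores.

    work_cycles     -- cycles the code takes on a single core
    handoffs        -- times a value must move from one core's registers
                       to the other's
    handoff_latency -- cycles per move (down through shared cache and back up)
    Assumes no useful work overlaps with a hand-off.
    """
    split_cycles = work_cycles / 2 + handoffs * handoff_latency
    return work_cycles / split_cycles


# Assumed numbers: 1000 cycles of work, 40 cycles per cross-core hand-off.
print(split_speedup(1000, 0, 40))    # 2.0 (no shared data: the ideal case)
print(split_speedup(1000, 5, 40))    # about 1.43
print(split_speedup(1000, 25, 40))   # about 0.67 (slower than not splitting)
```

With these assumed numbers the split stops paying off at around a dozen hand-offs per thousand cycles, which is why the register-to-register latency matters so much to the argument.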

I personally don't think it is. Instead of wasting time trying to parallelize single-threaded code, they should be using those resources to make it easier to write multithreaded code. Single-threaded performance should be improved on an entirely instruction-level basis... And that can be better done by reducing the complexity of the ISA. Adding new instructions helps in the short run, but in the long run it only adds complexity, which does nothing more than hurt the system as a whole.

To recap, it is my opinion that ATi got it right. Create a few general-purpose execution cores, and then create a shit ton of very simple special-purpose cores that are --instruction compatible-- with the GP cores and function as helpers. Then have a hardware scheduler issue threads to these execution cores. ATi got it right. I think it is in the industry's best interest to immediately stop pushing SMP architectures and to start pushing aSMP architectures in their place. Some people might be reminded of Cell... That is not what I have in mind, simply because the signal and stream processors it has tacked on the side are not instruction compatible...

That is my idea, and I still very much think this is the way forward. RHT is a move in the right direction, but with the wrong emphasis.
 
duby229 said:
Like I said I'm not entirely convinced. Different strokes for different folks, and all that jazz. RHT is still entirely possible.
The original RHT thread here from last year has detailed reasons why RHT is unlikely as described by Theo. http://www.theinquirer.net/?article=32589

In a nutshell, the communication overhead to keep registers synchronized on a single thread split between 2 cores makes it very unlikely that 6 pipelines could be combined as a single core.

Theo is probably ROFL about how he can lead people around with the most ridiculous claims -- with absolutely zero proof.

BTW, real-time profiling isn't "reverse hyperthreading" no matter how many times you repeat it. :p
 
No, but it is one method of reducing overhead by giving the OS finer-grained info about the goings-on inside the CPU. It's not the best solution, but it does help. As I said, I think ATi's threading processor is the best solution. But that may never happen, so profiling fills an immediate need...
 
These guys here reckon it could be implemented in Barcelona.... Personally I'm not convinced. I still think that ATi's threading processor is a better solution.

Here's a link...
http://www.amdzone.com/index.php?na...&t=11105&sid=aea3fd2af68e2499b53ce71b5495fb74

Those AMDZone people are retarded. LWP is just perfmon with feedback to software. Intel announced something similar months ago (RT, I believe).

Back to topic, big fat no. No matter how you cut it, LWP and RT do not allow one thread to execute on two or more cores.
 
No, but it does provide feedback to the OS so that it can schedule threads far more effectively... Which is a step in the right direction. The only thing they are missing now is a branch splitter, which I don't think is all that good an idea. They would be better off with a thread processor similar to the one ATi uses in its R600 and R580 GPUs...
 
duby229 said:
No, but it does provide feedback to the OS so that it can schedule threads far more effectively. Also keep in mind that AMD has a patent on this stuff dating back to '99.

That's not even what LWP does. Go read the spec.

As for "RHT", it is the exact opposite of power efficiency, so why even bother doing it, especially when the processor is rarely "wrong" to begin with.
 
It's not about being right or wrong. It's about scheduling threads. Which core will have free time in the near future? That is what profiling tells the OS. The OS can take this information and schedule threads far better than it can now. The problem is that right now threads have to be written into the code by the programmer. They can't be auto-generated, or interpreted at runtime. Threads are actually hard-coded into the application by its author...
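For anyone unsure what "hard-coded" means here, a small sketch contrasting the two styles. This is an illustration only and not what RHT or LWP do: `work`, `hardcoded_two_threads` and `pooled` are made-up names, and the pool is an ordinary software thread pool, not hardware scheduling. In CPython the global interpreter lock also means neither version runs faster, so the point is purely the structure.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def work(chunk):
    """Stand-in for a real task: sum of squares over a chunk of numbers."""
    return sum(x * x for x in chunk)


def hardcoded_two_threads(data):
    """The author fixes the thread count at two when writing the program."""
    mid = len(data) // 2
    results = [0, 0]

    def run(slot, chunk):
        results[slot] = work(chunk)

    threads = [
        threading.Thread(target=run, args=(0, data[:mid])),
        threading.Thread(target=run, args=(1, data[mid:])),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


def pooled(data, chunk_size=1000):
    """The program only describes tasks; the pool maps them onto workers."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Worker count follows the machine instead of being baked into the code.
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        return sum(pool.map(work, chunks))


data = list(range(10_000))
print(hardcoded_two_threads(data) == pooled(data))  # True
```

The first version will still use exactly two threads on a 64-core machine. The second still needs the programmer to cut the work into tasks, which is the part nobody has automated.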

In the multicore world of the future this is simply not going to work. Hard-coding threads is --not-- going to work past 4 or 8 threads. Anything past that will --need-- to be automated. There are two ways of doing this. One school of thought is that we can ignore threading altogether by simply parallelizing processes along their branches. The problem with this idea is twofold: 1. It will need data from other cores in a way that is impossible to predict, and therefore impossible to prefetch. 2. The OS will still need to schedule these threads. This is what RHT is.

The second school of thought, which is the one I follow, says that we need to simply stop pursuing the SMP architectures that we are using today before it becomes another GHz fiasco. We don't need another Prescott. Instead we need to implement about 4 GP x86 cores on die, and about 128 or so simplified special-purpose cores on die that ---share--- the same ISA as the GP cores. That way the special-purpose cores are entirely instruction compatible with the legacy x86 cores. This asymmetrical approach can be entirely transparent to software. However, it would almost certainly require a new x86 extension, and compilers and software would have to be rewritten to use it. In addition it would only require a few threads, and would scale incredibly well.

And the really cool thing about it is that we know it will work well, because ATi has already done it. It's done.
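A toy model of the "few GP cores plus many helper cores" idea, as a sketch under loud assumptions. It is a software simulation, not ATi's dispatcher or any real hardware scheduler. `makespan` and the `simple` flag are hypothetical, helper cores are assumed to run a simple task at the same speed as a GP core, and the workload is invented.

```python
import heapq


def makespan(tasks, gp_cores, helper_cores):
    """Greedy dispatch of tasks onto two classes of core; returns finish time.

    tasks -- list of (cycles, simple) pairs. A 'simple' task may run on a
             helper core; everything else must run on a general-purpose core.
    Each task goes to the earliest-free core of the class it is allowed on.
    """
    gp = [0] * gp_cores            # time at which each GP core becomes free
    helpers = [0] * helper_cores   # same for the helper cores
    heapq.heapify(gp)
    heapq.heapify(helpers)
    finish = 0
    for cycles, simple in tasks:
        # Simple tasks prefer helpers; fall back to GP cores if there are none.
        pool = helpers if simple and helpers else gp
        free_at = heapq.heappop(pool)
        heapq.heappush(pool, free_at + cycles)
        finish = max(finish, free_at + cycles)
    return finish


# Assumed workload: 4 complex tasks plus 128 small, uniform ones.
tasks = [(100, False)] * 4 + [(10, True)] * 128

print(makespan(tasks, gp_cores=4, helper_cores=0))    # 420
print(makespan(tasks, gp_cores=4, helper_cores=128))  # 100
```

The win only shows up because the assumed workload contains lots of small, independent tasks. Feed it nothing but complex tasks and the 128 helpers sit idle, so the result depends entirely on what the software looks like.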
 
That's not even what LWP does. Go read the spec.
Replace OS with userland app, and he is correct. The spec is not specific to threading, but there is no reason it could not be used to help tweak a threaded app.
 
Absolutely, this extension can be used to tweak anything. So the claim that this is especially beneficial for multithreaded apps seems like an AMD marketing pitch to me.
 
duby229 said:
The second school of thought, which is the one I follow, says that we need to simply stop pursuing the SMP architectures that we are using today before it becomes another GHz fiasco. We don't need another Prescott. Instead we need to implement about 4 GP x86 cores on die, and about 128 or so simplified special-purpose cores on die that ---share--- the same ISA as the GP cores. That way the special-purpose cores are entirely instruction compatible with the legacy x86 cores. This asymmetrical approach can be entirely transparent to software. However, it would almost certainly require a new x86 extension, and compilers and software would have to be rewritten to use it. In addition it would only require a few threads, and would scale incredibly well.

And the really cool thing about it is that we know it will work well, because ATi has already done it. It's done.

Who is "we"?

Good luck doing general purpose work with that kind of machine.
 
We includes you, or at the very least should.....

All I'll say is Fusion... CTM... Threading processor... Profiling... seems to add up to something a little more, don't it?
 
It doesn't add up to "RHT", that's for sure. LOL.
 
You haven't read anything I said, have you? Oh, you must be one of them "wigglers"... Let me quote myself for you...

duby229 said:
It's not about being right or wrong. It's about scheduling threads.

Notice how I said it's about scheduling threads... hmmm, I wonder why I said that...

duby229 said:
Hard-coding threads is --not-- going to work past 4 or 8 threads. Anything past that will --need-- to be automated. There are two ways of doing this. One school of thought is that we can ignore threading altogether by simply parallelizing processes along their branches. The problem with this idea is twofold: 1. It will need data from other cores in a way that is impossible to predict, and therefore impossible to prefetch. 2. The OS will still need to schedule these threads. This is what RHT is.

Oh yeah, that's right. It's because I said RHT isn't going to work... Then I said something about asymmetric processing...

duby229 said:
we need to simply stop pursuing the SMP architectures that we are using today before it becomes another GHz fiasco. We don't need another Prescott. Instead we need to implement about 4 GP x86 cores on die, and about 128 or so simplified special-purpose cores on die that ---share--- the same ISA as the GP cores. That way the special-purpose cores are entirely instruction compatible with the legacy x86 cores. This asymmetrical approach can be entirely transparent to software. However, it would almost certainly require a new x86 extension, and compilers and software would have to be rewritten to use it. In addition it would only require a few threads, and would scale incredibly well.

...How it is a much better solution than RHT could possibly be...

duby229 said:
No, but it does provide feedback to the OS so that it can schedule threads far more effectively... Which is a step in the right direction. The only thing they are missing now is a branch splitter, which I don't think is all that good an idea. They would be better off with a thread processor similar to the one ATi uses in its R600 and R580 GPUs...

That's a pretty good recap, I'd say. RHT sucks. aSMP is better. A threading processor is ideal. Also, Fusion, CTM, and profiling in combination with the threading processor seem to be more than meets the eye...
 
duby229 said:
It's because I said RHT isn't going to work
duby229 said:
Like I said I'm not entirely convinced. Different strokes for different folks, and all that jazz. RHT is still entirely possible... Profiling is one needed step to accomplish this goal.

Nice one.

I can hardly figure out what you're saying anyways, you've horribly mangled the concepts of thread level parallelism. Why were you talking about branches again:

duby229 said:
The only thing they are missing now is a branch splitter.

WTF? I can barely link your stringy line of conceptual ideas together... :rolleyes:


Moving on to asymmetric processing: yes, it's a great idea and it's already being worked on. No surprise there. I was just trying to make sense of your statement, copied below:

duby229 said:
Instead we need to implement about 4 GP x86 cores on die, and about 128 or so simplified special-purpose cores on die that ---share--- the same ISA as the GP cores. That way the special-purpose cores are entirely instruction compatible with the legacy x86 cores. This asymmetrical approach can be entirely transparent to software.

Yeah, "we" need to implement special purpose cores (vector processors?) "instruction compatible" with the "legacy x86 cores" that share the same ISA. So do they need decoders? "About 128" of those might be a bit bulky.

Also, there is absolutely no point creating heterogeneous cores without software involvement; the concept is too expensive otherwise, with lower ROI. Software transparency is often a good thing to have, but in this case it definitely is not.

One last thing,

duby229 said:
That's a pretty good recap, I'd say. RHT sucks.

Sucks relative to what?
 
Wiggly one, ain't it? Eh? Funny how you can twist and jig while completely ignoring what was said. Need I recap again?

Here's one more thing I was saving...
And the really cool thing about it is that we know it will work well, because ATi has already done it. It's done.

What's that? ATi has already done it? Are you sure? 100% sure? Yep. I certainly am...
http://www.pcper.com/images/reviews/406/2900XTarchitecturediag.jpg
Now, this particular implementation isn't x86 compatible, and it will require a massive overhaul to make it so. I believe that Fusion is the first step in that direction. However, in the interim there are several hurdles that need to be cleared. That is where this topic comes in.

Being as how I've made it perfectly clear, can you wiggle some more... hmm? Can you?
 
duby229 said:
Wiggly one, ain't it? Eh? Funny how you can twist and jig while completely ignoring what was said. Need I recap again?

Don't bother, you have no clue what you're talking about. I've been in the field long enough to know bullshit when I read it. You were confused by profiling and still don't know what it is, you are completely confused about the theoretical idea of "RHT", and your ideas about asymmetric processing are laughably non-workable. I guess working in the real world allows one to include reality when processing ideas.

duby229 said:
What's that? ATi has already done it? Are you sure? 100% sure? Yep. I certainly am...
http://www.pcper.com/images/reviews/...ecturediag.jpg
Now, this particular implementation isn't x86 compatible, and it will require a massive overhaul to make it so. I believe that Fusion is the first step in that direction. However, in the interim there are several hurdles that need to be cleared. That is where this topic comes in.

Look, a stream processor. That hasn't been done before, no sir. :rolleyes:

Yeah, and LWP and RT help integration of the two cores how again? Get a clue, go back to school.
 
[1] Don't bother, you have no clue what you're talking about. [2] I've been in the field long enough to know bullshit when I read it. [3] You were confused by profiling and still don't know what it is, you are completely confused about the theoretical idea of "RHT", and your ideas about asymmetric processing are laughably non-workable. [4] I guess working in the real world allows one to include reality when processing ideas. [5] Yeah, and LWP and RT help integration of the two cores how again? [6] Get a clue, go back to school.

Lots of bashing and flaming, and no substance. Sentences 1-4 and 6 contain no information and are simply schoolyard bashing. Sentence 5 actually asks a question, but again in a rude way that contributes nothing to the discussion. Care to make a post that says anything useful, or would you rather just bash Dubby to make your e-tool look bigger?
 
It is difficult to respond to a post that has no meaning (specifically, duby's posts). In regards to sentence 5, that was a rhetorical question.

To be honest, I can't really offer any more specifics from my point of view without violating NDA, and AMD's LWP specification is available for all to see. Even with the available public information it should be obvious that the technology has nothing to do with whatever duby was talking about, or the RHT crap the amdzone nutcases were raving about.
 
In the face of overwhelming evidence...... It's kinda funny actually.... :D
 