Google Disables SMT in ChromeOS due to Intel Bugs

Zarathustra[H] (Extremely [H]; joined Oct 29, 2000; 38,844 messages)
After OpenBSD disabled SMT in 6.4 last year, Google has now followed suit. Apple, Microsoft, Red Hat, and Xen have all, to one extent or another, recommended that users may want to disable the feature for security reasons in response to MDS.

Is this the beginning of the end of the road for SMT?
 
I doubt it. The current revisions of Intel chips already have hardware fixes for MDS; this is for the older chips. Also, these attacks haven't been successfully carried out on AMD processors yet, so it may not be an inherent flaw of SMT.
 
If it's the end for SMT, why is Ryzen not exposed to similar problems?
At the moment this just looks like Intel design issues, maybe made to hit performance targets...

Maybe there is a fundamental flaw with SMT, but AMD is OK for the moment.
 
Yet more news that makes me glad I "downgraded" my dual Ivy Bridge Xeons to Opteron 6366s, and if/when the time comes I'll go with EPYC after that.

I like my $15 Opterons
 
The security flaws are found in Intel's Hyper-Threading, not in SMT in general. AMD's implementation of SMT appears to be secure, based on what we know so far.
 
I would be really annoyed if I spent $100-$200 extra for a Hyperthreaded chip in my laptop just to have it disabled.

The performance loss is huge.
 
The security flaws are found in Intel's Hyper-Threading, not in SMT in general. AMD's implementation of SMT appears to be secure, based on what we know so far.

None have been found in AMD's implementation yet.

The OpenBSD and Google teams seem to think the whole concept is flawed and that it's only a matter of time...
 
I would much rather make these decisions on my own. Then again, ChromeOS is all about hand-holding.

Maybe one day they will simply stop allowing infected apps on the Play Store instead of nerfing people's computers?
 
SMT isn't at fault.

Intel's implementation of SMT is.

The MDS exploits (there is more than one) are directly related to S&M (Spectre and Meltdown), as they are all speculative-execution side-channel exploits.

Intel has been cheating >.< When moving data into the L1 cache, software uses a virtual memory setup... very little really fits in L1 on any chip. Intel has "optimized" its implementation, and it does a lot of speculation. Intel's main issue is that they DON'T clear the cache out between executions. So MDS is basically a form of advanced statistical analysis that rebuilds the data based on the bits and sizes Intel has left behind. It's pretty ingenious, really.

As Red Hat says about MDS:
"
The load port variant of MDS targets a processor structure used during the process of loading a single data value into a processor register. Registers are small memories that store data as it is being operated upon by the execution units. Data is loaded from a cache line into a register for processing, by means of the processor load ports. There are typically only a couple of load ports, and they are competitively shared by peer sibling threads of a Hyper-Threaded core. During the process of performing a load into a register, the load port needs to be able to handle the largest possible load that it may encounter - such as a 512-bit wide vector value - in addition to the smaller 8, 16, 32, and 64-bit loads performed routinely during the course of a program.

Intel’s implementation of load ports doesn’t completely zero out previous data that might have been loaded by older instructions within a program. Instead, it tracks the size of the load, and only forwards those bits that are supposed to be accessed by a load to the internal processor register "file". But as an optimization, loads can be forwarded (bypassed) for certain operations even while the data is in flight to the register file. This forwarded data may speculatively appear to be larger than its actual width, allowing an attacker to sample certain stale load port data. In the vector example, which is commonly used by cryptographic code, a reasonably large amount of data can be sampled, potentially allowing attackers to derive bits of cryptographic keys used by other applications. This is not in any way related to the earlier "LazyFPU" vulnerability.

"

Translation... Intel is cheating for performance, and it leaves behind just enough in the cache that one program can glean from it what another program is doing.
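The Red Hat description above can be sketched as a toy Python model. This is a minimal illustration under stated assumptions, not how real hardware is built: a single 512-bit buffer, narrower loads that overwrite only their own low bits, and a hypothetical `over_wide_forward` method standing in for the speculative over-wide bypass the quote describes. All names and values are made up for the example.

```python
# Toy model only: a 512-bit "load port" whose upper bits are not zeroed
# by narrower loads, loosely following the Red Hat description above.
PORT_WIDTH = 512  # widest load mentioned in the quote (a 512-bit vector)


class LoadPort:
    def __init__(self) -> None:
        self.bits = 0  # current port contents as a non-negative integer

    def load(self, value: int, width: int) -> int:
        """Architecturally visible load: only the low `width` bits are replaced."""
        mask = (1 << width) - 1
        # Bits above `width` keep whatever an earlier, wider load left behind.
        self.bits = (self.bits & ~mask) | (value & mask)
        return self.bits & mask  # what the program is allowed to see

    def over_wide_forward(self) -> int:
        """Hypothetical faulty bypass: forwards the whole port, stale bits included."""
        return self.bits


port = LoadPort()
victim_data = int.from_bytes(bytes(range(1, 65)), "big")  # made-up 512-bit value
port.load(victim_data, PORT_WIDTH)      # victim: one 512-bit vector load
seen = port.load(0x41, 8)               # attacker: a legitimate 8-bit load
leaked = port.over_wide_forward() >> 8  # the stale bits above that 8-bit load
```

In this model the attacker's own 8-bit load returns exactly what it should (`0x41`); the leak comes only from the over-wide forward, which is the part the quote attributes to Intel's optimization.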
 
You know what they say, cheaters never win.....

Unless you're Intel, and then cheating is always a winning proposition, whether it's cheating on processor performance or the anti-competitive cheating of strong-arming motherboard manufacturers who wanted to make boards for competitors' products back in the day. Hell, if Intel had a wife, maybe he'd be cheating on her too?

Is Mr. Intel getting busy with the maid behind Mrs. Intel's back? Is a cleaning lady love-child going to show up when he's 18 and kick our collective asses?

My question is: if they're doing this...

...then what else are they doing, or not doing?
 

Intel doesn't recommend disabling SMT. Google is doing it because it is one of those few customers "who cannot guarantee that trusted software is running on their system(s)". Moreover, modern CPUs have hardware mitigations:

https://www.intel.com/content/www/u...ngineering-new-protections-into-hardware.html
 
Did you read your link? lol

V1 = software mitigation
V2 = a mix of hardware and software (in other words, the same microcode fixes that cost older CPUs performance as well)
V3 = mostly hardware; a few new chips still need software mitigation
V3a = microcode fix for all but one line of Xeons
V4 = a mix of hardware, microcode, and software on all chips
Foreshadow = hardware fix for most models, unless you're unlucky enough to get a stepping 11 chip
RIDL = hardware, unless again you're unlucky enough to pick up a stepping 11 chip
Fallout = 50/50 on current chips; some have hardware fixes, others microcode fixes
MLPDS and MDSUM = mix of microcode and software

Reading this, it looks like for SOME of these exploits they are slowly working hardware fixes into their pipeline. For others, like V1, the fix is software; for V2 and V4 it's microcode. Those are not proper hardware fixes.

And of course the issue isn't the chips Intel is selling today (though they are still working through a ton of product that isn't properly fixed); the main issue is the millions of Intel chips already out there. It's going to take years for all those chips to go through their life cycle, meaning companies running Intel are going to be patching new side-channel exploits for YEARS. Expect a novel way to exploit side channels every 6 months until that hardware is retired in 4 or 5 years.
 
Intel is going to turn x86 into one big mitigation.

I doubt that. It's just better news for AMD. Customers investing in new hardware will now weigh the headache of buying Intel, whose performance shortcuts cause a significant performance loss when you disable Hyper-Threading...
 
Stepping 12 for 8th gen mobile chips and Stepping 13 for 9th gen desktop chips have full hardware mitigation for MDS attacks, which is the focus of this thread. I don't know why you're bringing the other exploits into this.
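Whether a particular chip carries the hardware MDS fix does not have to be inferred from the stepping number alone: Intel documents the MDS_NO bit (bit 5) of the IA32_ARCH_CAPABILITIES MSR (0x10A) as the way a processor reports that it is not affected by MDS. A minimal Python sketch for Linux, assuming root access and the `msr` kernel driver; the helper names are hypothetical, and only the low six bits are decoded. Older chips may not implement this MSR at all, so reading it can fail there.

```python
import os
import struct

IA32_ARCH_CAPABILITIES = 0x10A  # MSR address

# Low bits of IA32_ARCH_CAPABILITIES as documented by Intel
# (Linux's arch/x86 headers define ARCH_CAP_* equivalents).
ARCH_CAP_BITS = {
    "RDCL_NO": 0,             # not affected by Meltdown (rogue data cache load)
    "IBRS_ALL": 1,            # enhanced IBRS
    "RSBA": 2,                # RSB alternate behavior
    "SKIP_L1DFL_VMENTRY": 3,  # no L1D flush needed on VM entry
    "SSB_NO": 4,              # not affected by Speculative Store Bypass (V4)
    "MDS_NO": 5,              # not affected by MDS
}


def decode_arch_capabilities(value: int) -> dict:
    """Map each known capability name to True/False for a raw 64-bit MSR value."""
    return {name: bool((value >> bit) & 1) for name, bit in ARCH_CAP_BITS.items()}


def read_arch_capabilities(cpu: int = 0) -> int:
    """Read the raw MSR via /dev/cpu/N/msr (root + `modprobe msr` required)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, IA32_ARCH_CAPABILITIES)  # offset = MSR number
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]  # MSRs are 64-bit little-endian
```

On a suitable machine, `decode_arch_capabilities(read_arch_capabilities())["MDS_NO"]` answers the stepping question directly.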
 
Because they are all variations of the same attack vector. MDS isn't completely new; it's a variation. It's a side-channel attack.

I am glad they are working on getting this fixed in hardware, don't get me wrong... but understand that most of this stuff is still not fixed. And if you read closely, when they say "mitigation" for MDS they are talking about microcode, which means it's not fixed in hardware, not really. There isn't one chip they are selling right now that is 100% protected from side-channel attacks by hardware alone. To be fair, AMD, POWER and ARM are not 100% safe either... it's just that they are all protected from the easiest of these attacks, because they don't leave things in cache between executions and they check permissions prior to execution. Intel cheated on both scores, and it is costing them. You can use side-channel attacks on any chip running SMT, like POWER and AMD chips... but things like ZombieLoad, where you watch leftover data in the chip's cache and build patterns of use to rebuild crypto keys... that is exclusive to Intel chips, and so far the only fixes they have, even for current shipping chips, are microcode and software fixes.

Not to double post, but:
https://www.phoronix.com/scan.php?page=article&item=mds-zombieload-mit&num=1
Michael tested current-generation Xeon and 9900K performance, and it is impacted pretty heavily... and extremely so in workloads with heavy context switching (which is going to be every Intel server running VMs).

PS... I think this thread is about OpenBSD and now Google turning SMT off. OpenBSD disabled HT last year due to the first round of side-channel attacks. MS, Red Hat and others are, for now, just recommending that it might be wise to disable it for people running VMs, etc.
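For anyone weighing that recommendation on a Linux box, the kernel reports both its MDS mitigation state and the SMT state in sysfs. A minimal sketch, assuming a kernel with the May 2019 MDS patches (older kernels lack `vulnerabilities/mds`, which the code treats as "unknown"); the helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_mds_status(text: str) -> tuple[str, str | None]:
    """Split the kernel's MDS line into (mitigation state, SMT state).

    e.g. "Mitigation: Clear CPU buffers; SMT vulnerable"
         -> ("Mitigation: Clear CPU buffers", "SMT vulnerable")
    """
    state, _, smt = text.strip().partition("; ")
    return state, (smt or None)


def read_sysfs(path: Path) -> str | None:
    """Return the stripped file contents, or None if the kernel lacks the file."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def smt_report(root: Path = SYSFS_CPU) -> dict:
    mds = read_sysfs(root / "vulnerabilities" / "mds")
    return {
        # Typically one of: on, off, forceoff, notsupported, notimplemented
        "smt_control": read_sysfs(root / "smt" / "control"),
        "mds": parse_mds_status(mds) if mds is not None else None,
    }


if __name__ == "__main__":
    print(smt_report())
```

If the report says "SMT vulnerable" and you decide to act on it, SMT can be turned off at runtime by writing `off` to `/sys/devices/system/cpu/smt/control` as root, or at boot with the `nosmt` (or `mds=full,nosmt`) kernel parameter.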
 
So in the worst case, a program could access 512 - 8 = 504 bits of data that it shouldn't have access to? In a single request, that is.
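The arithmetic in that question generalizes to every load width mentioned in the Red Hat quote. A minimal sketch, assuming a 512-bit port last filled by a full-width load; it gives an upper bound on what one over-wide forward could expose, not a measured leak rate, and the function name is hypothetical.

```python
PORT_WIDTH = 512  # widest load in the Red Hat quote (a 512-bit vector)


def stale_bits(load_width: int, port_width: int = PORT_WIDTH) -> int:
    """Upper bound on stale bits sitting above a load of `load_width` bits."""
    if not 0 < load_width <= port_width:
        raise ValueError("load width must be between 1 and the port width")
    return port_width - load_width


worst_case = {w: stale_bits(w) for w in (8, 16, 32, 64, 512)}
# {8: 504, 16: 496, 32: 480, 64: 448, 512: 0}
```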
 
Chips make thousands of requests.

The cache system on a chip forms a hierarchy in which the innermost level runs at the core speed of the processor's execution units... while the outer levels of cache get slower. So L1 is fast but small... L2 is bigger and slower, but still fast... L3 is bigger yet, but slower. Then come RAM and the preload virtual channels.
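The speed/size trade-off described above is usually summarized as the average memory access time (AMAT): each level's hit latency plus its miss rate times the average cost of going one level further out. A minimal Python sketch; the latencies and hit rates are invented round numbers, not figures for any Intel or AMD part.

```python
def amat(levels, memory_latency):
    """Average memory access time (cycles) for a multi-level cache.

    levels: [(hit_latency, local_hit_rate), ...] ordered from L1 outward.
    A miss at one level pays the average cost of the next level; the last
    level's misses go to RAM at `memory_latency`.
    """
    cost = memory_latency
    for hit_latency, hit_rate in reversed(levels):
        cost = hit_latency + (1.0 - hit_rate) * cost
    return cost


# Made-up numbers purely for illustration (not measured on any CPU).
hierarchy = [(4, 0.90), (12, 0.80), (40, 0.50)]  # L1, L2, L3
print(amat(hierarchy, memory_latency=200))  # roughly 8 cycles
```

Even with RAM at 200 cycles, high hit rates in the small, fast levels keep the average close to L1 speed, which is why chips keep data close for as long as they can.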

Caches keep copies of data from external memory... an address and its value. At the innermost L1 level they are split further into data and instruction caches. So if you give your CPU something to do... it stores that in L1, does the work, then stores the result back in the L1 data cache. Now, in, say, Intel's implementation, that data is left in the cache until the space is needed again... because it might be reused. With that in mind, the processor can perform "speculation", where it uses internal hints to pre-pull stuff it MAY need from the memory preload channels. If the processor guesses wrong, it just ignores that data and overwrites it when it needs the space... and the way things were assumed to work, userland software should never know what is going on behind the curtain. A software programmer is not supposed to see all that speculation going on.
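The "left in the cache until the space is required again" behaviour can be sketched with a tiny least-recently-used (LRU) cache. This is a generic textbook model and an assumption for illustration: real L1 caches are set-associative with approximate LRU, and Intel's MDS write-ups locate the leaked data in fill buffers, store buffers and load ports rather than in cache lines.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> cached value, oldest first

    def access(self, address, load_from_memory):
        """Return (value, hit). Lines are only evicted when space is needed."""
        if address in self.lines:
            self.lines.move_to_end(address)  # hit: now most recently used
            return self.lines[address], True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        value = load_from_memory(address)
        self.lines[address] = value
        return value, False


memory = {addr: addr * 10 for addr in range(16)}
cache = LRUCache(capacity=4)
for addr in (0, 1, 2):  # "program A" touches three addresses and finishes
    cache.access(addr, memory.__getitem__)
leftover = list(cache.lines)  # [0, 1, 2]: still resident after A is done
for addr in (3, 4):  # "program B" needs space
    cache.access(addr, memory.__getitem__)
after = list(cache.lines)  # [1, 2, 3, 4]: only address 0 was evicted
```

Nothing "cleans up" after program A; its lines simply stay until something else needs the room.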

S&M were the first widely known speculative-execution side-channel exploits... and basically how they work is that programmers write code that checks whether cache space is used or not (and unpatched Intel chips don't mind you reading it even if you don't have permission, because they don't bother checking until those bits are executed)... with MDS they are simply sampling that data space over time (they don't need to read it outright) and rebuilding the data in full. For anyone who has used RAID or parity data for compressed archives... it's the same idea. By paying attention to which bits are used over time, we can rebuild the data without ever seeing all of it. That might not work for pulling, say, a text file... but if all you're looking for is the crypto key for the machine you're sniffing, you're golden. This works in much the same way a Wi-Fi password cracker does: by seeing a large sample of little bits of info, you can infer the common string, whether it's a crypto key or a Wi-Fi key.
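The "rebuild the data from many small observations" idea above can be illustrated with per-position majority voting. A minimal sketch under made-up assumptions: each sample leaks one randomly chosen bit of a 128-bit secret, and that bit is wrong 30% of the time. The function names, noise level and sample count are hypothetical and say nothing about real MDS leak rates; the point is only that many unreliable glimpses can add up to the whole key.

```python
import random


def leak_one_bit(secret, rng, noise):
    """One hypothetical sample: a random bit position, read correctly with
    probability 1 - noise and flipped otherwise."""
    pos = rng.randrange(len(secret))
    flipped = rng.random() < noise
    return pos, secret[pos] ^ int(flipped)


def reconstruct(samples, length):
    """Per-position majority vote over all samples collected so far."""
    votes = [0] * length  # > 0 means more 1s than 0s were observed
    for pos, bit in samples:
        votes[pos] += 1 if bit else -1
    return [1 if v > 0 else 0 for v in votes]


rng = random.Random(2019)
secret = [rng.randrange(2) for _ in range(128)]  # made-up 128-bit "key"
samples = [leak_one_bit(secret, rng, noise=0.3) for _ in range(50_000)]
recovered = reconstruct(samples, len(secret))
mismatches = sum(a != b for a, b in zip(recovered, secret))
print(f"{mismatches} of {len(secret)} bits wrong")
```

With a few hundred noisy samples per bit position, the majority vote is overwhelmingly likely to land on the true value even though no single sample can be trusted.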

The difference between Meltdown and MDS is basically this: Meltdown took advantage of Intel not checking whether a read had permission to access shared cache space. MDS takes advantage of the cache not being cleared... to perform a statistical analysis on the data in the cache. Simply getting a "you don't have permission" return tells you something is there, which is all it really needs to know, as the sizes of the bits of data going to the cache are somewhat known. Like a parity rebuild, it figures out what should be there by sampling what is and what isn't. (My understanding is that Intel's Hyper-Threading speculation routines leave a lot of data in the larger L2 and L3 caches, which makes this exploit even more efficient... there could be more to that; I haven't read a ton on it yet.)
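The parity analogy used in this post refers to schemes like RAID 5, where a lost block is recomputed by XOR-ing the parity block with the blocks that survive. A minimal sketch of that mechanism only; it illustrates the analogy and is not a model of how MDS itself works. The function names are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving, parity_block):
    """XOR-ing the parity with every surviving block yields the missing one."""
    return xor_parity(list(surviving) + [parity_block])


data = [b"SECR", b"ETKE", b"Y123"]
parity_block = xor_parity(data)
restored = rebuild([data[0], data[2]], parity_block)  # pretend data[1] was lost
print(restored)  # b'ETKE'
```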

So, long answer to basically say... the issue isn't a hacker using these attacks to pull out a specific bit of text or rebuild a JPG being resized or something. Its main use would be to decipher crypto keys, IMO. That isn't as difficult with this attack as some would think. Again, it works a lot like a Wi-Fi sniffer... it doesn't need to affect the data at all... just receive it long enough to see the pattern and rebuild the relevant data.
 
So AMD sold us the brick house that isn't as big or pretty, but came with a poured concrete basement and reinforced exterior doors with double deadbolts...

Whereas Intel sold us the beautiful wood-frame house with big square footage that didn't get proper compaction or grading before the foundation was poured, and used basic metal Home Depot exterior doors with single builder-grade locks and 24-inch stud spacing.
 
I would add that they built it in a flood zone without disclosing it, lol. Let's be honest: every business takes educated risks to stay on top. It paid off for Intel for a very long time and is probably one of the key contributors to where they are today. But the lack of real action taken is strange at best. Either they can't deliver a real hardware-patched version that can compete with the competition, or they still think gambling on the risky path is the best approach given their current market position. Losing the "somewhat" performance crowd may kill all the momentum gained through the years, and I say "somewhat" because at the same price point, they aren't really ahead...

Anyway, all this good PR (pun intended) may make them move faster toward better solutions (they already offer higher core counts without HT). AMD needs to grab this opportunity and run with it. Zen 2 can't come fast enough... I want to see Intel's damage control once those hit the market.
 
Well, Intel is wrestling with 7(?) different vulnerabilities at this point, some of which are apparently harder to mitigate than others. Not to mention the 10 nm issues.

The MDS (Hyper-Threading-focused) attacks were mitigated in hardware fairly quickly and required only a new stepping to do it. The big ones, like Spectre, probably require a new architecture.
 
You know... I've wondered about this for a long time...
Wouldn't these types of exploits be exactly the ones that other, you know, real actors would be looking for, have probably found, and keep deep, deep secret? In fact, deployment would be limited to situations of maximum espionage value, so to speak.
Meanwhile, we are confident in our advanced hardware and in our billion-dollar companies being sooo protective of security that we deploy their hardware with little review beyond it being 'approved'.
You think the Russians have been dumbasses, saying nope, it's an Intel processor, we can't do it, they're too smart, the hardware is impenetrable? Bull... it might have taken them a while, but they probably figured this stuff out a while ago, kept it as secret as possible, and used it as little as possible so as not to lose the tool too fast. If I were betting money, real actors are probably deploying these tools left and right before they lose them completely... My shitty guess, of course.
 
This thread is about MDS, isn't it? The table I brought shows that only "Family 6 Model 158 Stepping 11" has software protection; stepping 12 and 13, and the rest of the entries in that table, show hardware protection already included.

V1 and V2 are Spectre; it affects not only Intel but also non-Intel chips, and it probably will never be fully fixed. The process of fixing it in hardware is cumulative. That is why some steppings have only a software fix and the N+1 stepping has hardware mitigation.

Yes, the issue is on older chips, but it should also be stated that, to date, there is no known real-world attack based on those exploits, because a successful attack requires very sophisticated techniques.
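To see which row of that table a given Linux machine falls into, the family, model and stepping are listed in `/proc/cpuinfo`. A minimal parsing sketch; the one-entry lookup table holds only the signature quoted in the post above (Family 6 Model 158 Stepping 11) and is hypothetical scaffolding, not a complete or authoritative list. For the kernel's own verdict, `/sys/devices/system/cpu/vulnerabilities/mds` is the more reliable source.

```python
# Only the signature named in the post above; not a complete or official list.
MDS_FIX_PER_POST = {
    (6, 158, 11): "software/microcode mitigation only",
}


def cpu_signature(cpuinfo_text):
    """Return (family, model, stepping) of the first CPU listed in /proc/cpuinfo."""
    wanted = {"cpu family": None, "model": None, "stepping": None}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()  # "model name" stays distinct from "model"
        if sep and key in wanted and wanted[key] is None:
            wanted[key] = int(value)
        if None not in wanted.values():
            break  # stop after the first processor's three fields
    return wanted["cpu family"], wanted["model"], wanted["stepping"]


def mds_fix_for(signature):
    """Look the signature up in the (hypothetical) table above."""
    return MDS_FIX_PER_POST.get(signature, "not in the table")
```

Usage on Linux: `sig = cpu_signature(open("/proc/cpuinfo").read())`, then `mds_fix_for(sig)`.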
 