A Massive Intel Hardware Bug May Be on the Horizon

Megalith

There is mounting evidence that an Intel CPU bug, which could have lasting consequences for Amazon, Google, and other major cloud providers, is about to be disclosed. While a fix is in the pipeline, people say that it could impose performance penalties of as much as 35 percent. AMD chips are reportedly unaffected.

tl;dr: there is presently an embargoed security bug impacting apparently all contemporary CPU architectures that implement virtual memory, requiring hardware changes to fully resolve. Urgent development of a software mitigation is being done in the open and recently landed in the Linux kernel, and a similar mitigation began appearing in NT kernels in November. In the worst case, the software fix causes huge slowdowns in typical workloads.
 
This popped up on the Gentoo forums earlier today... seems everyone is keeping shtum about this until OS-level fixes are in place.
The nature of the Linux patches is what is giving people insight into where the flaw is.

https://news.ycombinator.com/item?id=16046636

The Linux fixes (and thus the Windows fix, because shock horror an OS is an OS...) hit paging hard, so this will impact performance on Intel-based systems.
 
'Kernel memory leaking' Intel processor design flaw forces Linux, Windows redesign

A fundamental design flaw in Intel's processor chips has forced a significant redesign of the Linux and Windows kernels to defang the chip-level security bug.

Programmers are scrambling to overhaul the open-source Linux kernel's virtual memory system. Meanwhile, Microsoft is expected to publicly introduce necessary changes to its Windows operating system in this month's Patch Tuesday: these changes were seeded to beta testers running fast-ring Windows Insider builds in December.

Crucially, these updates to both Linux and Windows will incur a performance hit on Intel products. The effects are still being benchmarked, however we're looking at a ballpark figure of five to 30 per cent slow down, depending on the task.
 
Saw this mentioned on other sites yesterday. Just another booboo along the path of progress; statistically it was bound to happen sooner or later. The biggest security aspect I read about in the details is that where virtual machines are in use there could be some cross-contamination, so to speak, between the memory used by each VM. In other words, one VM's active memory could be monitored/snooped on by another active VM, so yeah, that's a big fucking huge security issue.
 
Well, if that's true, there goes Intel's advantage over AMD. It will also hurt Intel's image.
 
Wow, if I actually see even a 10% slowdown I'm going to build a new rig. 5-10% for me means 45 mins to 1.5 hrs.
Well, it says typical workloads. Until I know which workloads those are, it is possible we won't even be affected at all. But I feel the problem, as someone who runs processes that run for days.
 
This flaw appears to be associated with virtual memory, so if your application has to page out a lot or uses a lot of virtual paging you are probably going to be affected more... this is a royal PITA... I use MATLAB daily and 24 GB isn't enough for what I simulate... a quad-core i7 slowing down to mitigate this flaw could potentially hit my productivity.
 
AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.

Tom Lendacky <thomas.lendacky@amd.com>

TL,DR OK, it's a security bug, but what's the symptom? What if I don't want it to be fixed? What if I want my 30% performance, security be damned?


:D
 
It's a kernel page table issue. The patches work by isolating the kernel's page tables from user space.
 
AMD is up 6% today, Intel is up 1% with the broad market. Easily could be a coincidence. Might be time to get some AMD stock though.
 
The codebase looks like it is affecting both Intel and AMD going by the comments?

Code:
+ * User space process size. This is the first address outside the user range.
+ * There are a few constraints that determine this:
+ *
+ * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
+ * address, then that syscall will enter the kernel with a
+ * non-canonical return address, and SYSRET will explode dangerously.
+ * We avoid this particular problem by preventing anything executable
+ * from being mapped at the maximum canonical address.
+ *
+ * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
+ * CPUs malfunction if they execute code from the highest canonical page.
+ * They'll speculate right off the end of the canonical space, and
+ * bad things happen. This is worked around in the same way as the
+ * Intel problem.
+ *
+ * With page table isolation enabled, we map the LDT in ... [stay tuned]
*/
#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
https://git.kernel.org/pub/scm/linu.../?id=5aa90a84589282b87666f92b6c3c917c8080a9bf
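
For anyone wondering what that TASK_SIZE_MAX line actually evaluates to, here is a minimal sketch of the arithmetic. The 47-bit shift and 4 KiB page size are assumptions for illustration (the kernel takes both from its own configuration), and the function names are made up for this example, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only: a 47-bit user address range and
 * 4 KiB pages. The real kernel derives both from its configuration. */
#define EXAMPLE_VIRTUAL_MASK_SHIFT 47
#define EXAMPLE_PAGE_SIZE          4096ULL

/* Same shape as the quoted macro: top of the lower canonical half,
 * minus one page that is never handed out to user space. */
static inline uint64_t example_task_size_max(unsigned shift, uint64_t page_size)
{
    assert(shift < 64);
    return (1ULL << shift) - page_size;
}

/* True if addr lies in the single page left unmapped at the very top
 * of the user range. */
static inline int example_in_guard_page(uint64_t addr, unsigned shift,
                                        uint64_t page_size)
{
    uint64_t limit = example_task_size_max(shift, page_size);
    return addr >= limit && addr < (1ULL << shift);
}
```

With those assumed values the limit works out to 0x00007ffffffff000, so the last page below 0x0000800000000000 is simply never mapped. That is the shared workaround the comment describes for both the Intel SYSRET problem and the Ryzen one.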

I guess you get more page hits by only going after Intel.
 
The codebase looks like it is affecting both Intel and AMD going by the comments?

No, two separate issues.

The AMD/Ryzen issue is associated with a memory hole causing hard lockups. The recommendation until now, for those affected by early Ryzen hardware issues, was to disable ASLR. This Gentoo wiki page has the confirmed issues and potential workarounds: https://wiki.gentoo.org/wiki/Ryzen#Troubleshooting

The Intel one is associated with this bug.

--edit--
This is the discussion of the linux fix

https://lkml.org/lkml/2017/12/27/2

Code:
+   if (c->x86_vendor != X86_VENDOR_AMD)
+        setup_force_cpu_bug(X86_BUG_CPU_INSECURE);


--edit--
This is the actual commit to kernel.git
https://git.kernel.org/pub/scm/linu...c?id=a89f040fa34ec9cd682aed98b8f04e3c47d998bd

Code:
+    /* Assume for now that ALL x86 CPUs are insecure */
+    setup_force_cpu_bug(X86_BUG_CPU_INSECURE);

The proposed commit targeted non-AMD CPUs; the actual commit erred on the safe side until it is confirmed that AMD is not affected.
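
To make the difference between the two snippets concrete, here is a rough sketch of the two decisions side by side. The enum and function names are hypothetical stand-ins, not the kernel's own identifiers; only the logic (skip AMD vs. flag everything) comes from the patches quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's vendor constants. */
enum example_vendor {
    EXAMPLE_VENDOR_INTEL,
    EXAMPLE_VENDOR_AMD,
    EXAMPLE_VENDOR_OTHER
};

/* Proposed patch: every vendor except AMD gets the "insecure" bug flag. */
static inline bool example_flagged_by_proposed_patch(enum example_vendor v)
{
    assert(v <= EXAMPLE_VENDOR_OTHER);
    return v != EXAMPLE_VENDOR_AMD;
}

/* Merged commit: all x86 CPUs get the flag, the vendor is ignored. */
static inline bool example_flagged_by_merged_commit(enum example_vendor v)
{
    assert(v <= EXAMPLE_VENDOR_OTHER);
    return true;
}
```

So under the proposed patch an AMD box would skip the mitigation, while under the merged commit it pays the same cost as everyone else until the vendor check lands.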

SHIT!! My Gentoo setup is going to be slowed down in 4.15... I might stick with 4.14 a bit longer.
 
The codebase looks like it is affecting both Intel and AMD going by the comments?


wrong bug / code

https://infozonic.com/2018/01/02/linux-page-table-isolation-is-not-needed-on-amd-processors/

x86_vendor != X86_VENDOR_AMD
 
TL,DR OK, it's a security bug, but what's the symptom? What if I don't want it to be fixed? What if I want my 30% performance, security be damned?

Exactly. That's a huge fricken trade-off for a security vulnerability that may not even affect you.
 
To improve security, Linux and Windows randomise where in memory applications are loaded. Previously a known boot sequence meant a given binary's memory sat at a predictable location... no more.
To further improve security, Linux and Windows then randomised where in memory the kernel lives. Previously every boot resulted in the kernel living at the same addresses... no more.

If you know where in memory a binary resides and it has an exploit, you can abuse it. This randomisation is a precautionary measure.

https://en.wikipedia.org/wiki/Kernel_page-table_isolation

Kernel page-table isolation (KPTI, previously called KAISER) is a hardening technique in the Linux kernel to improve security by better isolating user space and kernel space memory.[1][2] KPTI was merged into Linux kernel version 4.15,[3][4] to be released in early 2018, and backported into Linux Kernel 4.14.10. Windows implemented an identical feature in version 17035 (RS4)[5].

Prior to KPTI, whenever executing user space code (applications), Linux would also keep its entire kernel memory mapped in page tables, although protected from access. The advantage is that when the application makes a system call into the kernel or an interrupt is received, kernel page tables are always present, so most context switching-related overheads (TLB flush, page table swapping, etc) can be avoided.[1]

In 2014, the Linux kernel adopted kernel address space layout randomization (KASLR), which makes it more difficult to exploit kernel vulnerabilities,[6][7] and which relies on kernel addresses remaining hidden from user space. Despite prohibiting access to these kernel mappings, it turns out there are several side-channel attacks in current Intel x86 processors (as of December 2017) that can leak the location of this memory, making it possible to work around KASLR.[2][8][9][10] AMD processors are not affected by these attacks and don't need KPTI to mitigate them.[11]

KPTI fixes these leaks by separating user space and kernel space page tables entirely. On recent x86 processors, a TLB flush can be avoided using the process context identifiers (PCID) feature, but even then it comes at a significant performance cost particularly in syscall-heavy and interrupt-heavy workloads. The overhead was measured to be 0.28% according to KAISER's original authors,[2] but roughly 5% for most workloads by a Linux developer.[1]

KPTI can be disabled with the "nopti" kernel boot option. Also provisions were created to disable KPTI if newer processors fix the information leaks.[3]
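
If you want to check whether a box was booted with that option, one rough approach is to look for the `nopti` word in the kernel command line (on Linux that string is readable from /proc/cmdline). The helper below is only a sketch with a made-up name: it does a plain whole-word match on a string you pass in, assumes space-separated options, and ignores quoting.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `token` appears as a whole space-separated word in
 * `cmdline`. Hypothetical helper for illustration; no quoting support. */
static inline bool example_cmdline_has_token(const char *cmdline,
                                             const char *token)
{
    assert(cmdline != NULL && token != NULL);

    size_t tlen = strlen(token);
    if (tlen == 0)
        return false;

    const char *p = cmdline;
    while ((p = strstr(p, token)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        bool starts_word = (p == cmdline) || (p[-1] == ' ');
        /* ...and end at the end of the string, a space, or a newline. */
        char after = p[tlen];
        bool ends_word = (after == '\0') || (after == ' ') || (after == '\n');
        if (starts_word && ends_word)
            return true;
        p += 1; /* keep scanning past this partial match */
    }
    return false;
}
```

Calling `example_cmdline_has_token(line, "nopti")` on the contents of /proc/cmdline would tell you whether the option was passed at boot. It says nothing about whether the running kernel actually has KPTI compiled in.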
 

Welp,

It seems from my reading that it impacts virtual memory only, so maybe if you disable VT-x and VT-d in the BIOS, you are immune?

The real impact will be for people running virtualized workloads on Intel systems?

This is going to stink. My virtualized server does a lot of my home stuff, and it is a dual Westmere-EP.
 
This flaw affects everyone though, because it allows privilege escalation.

The article says it affects all "virtual memory" implementations.

I wonder if disabling VT-x and VT-d allows you to avoid the bug without the hard paging that results in a performance drop.
 
Virtual memory is not the same as virtualized memory (although the latter uses virtual memory to work). If you open Notepad, you're using virtual memory. It's been around since the 80386 (for x86).
 
If it comes anywhere near a 30% drop, Intel should give some very deep discounts on new CPUs, and if this affects the server side of things it could be a disaster.
 
So will Intel send me a 5.35ghz to 6.63ghz 8700k to offset the 5-30% performance loss on my brand new setup?

That chip doesn't exist.

However, what they can do is send you a chip with more cores that will give you 30% more multithreaded performance :)
 
Epyc might see a big gain out of this depending on how this all shakes out.
 
This flaw appears to be associated with virtual memory, so if your application has to page out a lot or uses a lot of virtual paging you are probably going to be affected more... this is a royal PITA... I use MATLAB daily and 24 GB isn't enough for what I simulate... a quad-core i7 slowing down to mitigate this flaw could potentially hit my productivity.
Thank god I disabled virtual memory.

I realize virtualization is not the same as virtual memory. But for virtualization to work, doesn't virtual memory have to be on? Or does it do address table side lookups even when you decide not to use virtualization?

I guess you'll be upgrading to 64 gigs 4x16 sticks
 
This isn't to do with swap, this is to do with how memory is mapped.
Your RAM is part of a larger concept called virtual memory, which is divided into pages and can easily exceed the amount of RAM (and addressable RAM) your system has or could have.
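
To show what "divided into pages" means in practice, here is a small sketch that splits an address into a page number and an offset, and counts how many pages a given amount of memory spans. It assumes 4 KiB pages (the common x86 default, but an assumption here), and all names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SHIFT 12                          /* assumed 4 KiB pages */
#define EXAMPLE_PAGE_BYTES (1ULL << EXAMPLE_PAGE_SHIFT)

struct example_vaddr {
    uint64_t page_number; /* which page the address falls in */
    uint64_t offset;      /* byte offset inside that page */
};

/* Split a virtual address into page number and in-page offset. */
static inline struct example_vaddr example_split_vaddr(uint64_t vaddr)
{
    struct example_vaddr out;
    out.page_number = vaddr >> EXAMPLE_PAGE_SHIFT;
    out.offset = vaddr & (EXAMPLE_PAGE_BYTES - 1);
    return out;
}

/* Number of pages needed to cover `bytes` bytes, rounding up. */
static inline uint64_t example_pages_needed(uint64_t bytes)
{
    assert(bytes <= UINT64_MAX - (EXAMPLE_PAGE_BYTES - 1));
    return (bytes + EXAMPLE_PAGE_BYTES - 1) >> EXAMPLE_PAGE_SHIFT;
}
```

The page tables are what map each of those page numbers to physical RAM (or to nothing at all), and that mapping is in use whether or not you have a swap file, which is why turning swap off doesn't sidestep the issue.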
 