Intel Xeon Phi x200 Knights Landing x86 Compatibility Test

Megalith · Dec 10, 2016

Okay, so nobody is going to run Windows regularly on these 64-core beasts, but it’s cool to see that Intel’s claim that Knights Landing can run legacy x86 code with little fuss was true. Thanks to Patriot for the link.

The run did not utilize any of the KNL features that give the platform its true power, such as AVX-512. That testing is coming. We did validate one of the major selling points of KNL: code made to run on other Intel x86 architectures will work without modification on the new Intel Xeon Phi x200 (KNL) generation. The impact of this is enormous and is a key reason we saw so many KNL supercomputer system wins at SC16 this year. Applications can access a highly parallelized architecture without needing a co-processor. The Intel Xeon Phi x200 has direct access to the six-channel system RAM as well as 16GB of high-bandwidth, low-latency MCDRAM without a performance penalty from traversing the PCIe bus.

atp1916 · Dec 10, 2016

That's pretty damn cool.

Cinebench R15 run =

Quartz-1 · Dec 10, 2016

Megalith said:
Okay, so nobody is going to run Windows regularly on these 64-core beasts

Why not? Microsoft have migrated Windows and Office to non-x86 platforms before. Windows Compute Server anyone?

pxc · Dec 10, 2016

64 Silvermont-based cores with 4 threads per core. OK.

As the guy explained in the video, that isn't the best test of the Xeon Phi's abilities, and I agree with atp1916 above that it's a cool demo.

Zarathustra[H] · Dec 10, 2016

I wonder what type of loads benefit from four threads per core...

Especially on weak-ass Atom-type cores.

I mean, I know there's lots of them, but still...

ir0nw0lf · Dec 10, 2016

When I first saw MCDRAM I thought of McDonalds.

pxc · Dec 11, 2016

Zarathustra[H] said:
I wonder what type of loads benefit from four threads per core...

Especially on weak-ass Atom-type cores.

I mean, I know there's lots of them, but still...

That's how GPU-type cores are made, in order to maximize utilization. The purpose of SMT, on both GPUs and CPUs, is to hide memory access latency (and other pipeline stalls) by switching to another thread while the stalled one waits for its load to complete. I'm not sure how efficient Intel's context switching is, but on AMD and Nvidia GPUs, at least among the wavefronts or warps resident on a compute unit (AMD's and Nvidia's names, respectively, for a group of threads that executes in lockstep), it's essentially free to switch out a stalled one and work on another that is ready.
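To put rough numbers on the latency-hiding idea, here's a toy model in Python. Everything in it is an assumption made for illustration, not a measurement of KNL or any GPU: a single-issue core, threads that compute for a fixed number of cycles and then stall on a load for a fixed number of cycles, and free thread switching. The function name `core_utilization` and its parameters are made up for this sketch.

```python
def core_utilization(num_threads, compute_cycles, stall_cycles, total_cycles=10_000):
    """Fraction of cycles a toy single-issue core spends doing useful work.

    Each hardware thread repeats: compute for `compute_cycles`, then wait
    `stall_cycles` for a memory load. Switching threads is assumed to be free.
    """
    work_left = [compute_cycles] * num_threads  # compute cycles left before the next load
    ready_at = [0] * num_threads                # first cycle each thread can run again
    busy = 0
    for cycle in range(total_cycles):
        # Run the first thread that is not waiting on memory; idle if none is ready.
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                busy += 1
                work_left[t] -= 1
                if work_left[t] == 0:
                    # The thread issues a load and stalls until it completes.
                    work_left[t] = compute_cycles
                    ready_at[t] = cycle + 1 + stall_cycles
                break
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "thread(s):", core_utilization(n, compute_cycles=10, stall_cycles=30))
```

With 10 cycles of work per 30-cycle stall, one thread keeps the toy core busy a quarter of the time, two threads half the time, and four threads all of the time. Real loads are nowhere near this regular, but it shows why four threads per core can pay off when the code spends most of its time waiting on memory.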

As "weak" as those Atom cores are at running full x86 code, they're far faster per core than Nvidia's or AMD's streaming processors for that style of code (essentially semi-random read patterns, branchy code). Those two things kill performance in classic GPGPU-style programming, and they also hurt the Xeon Phi when compared with running optimized AVX-512 code on the multiple SIMD units in each core.
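A quick sketch of why branchy code hurts lockstep hardware, again as a toy model with made-up names and costs (`lockstep_cycles`, `then_cost`, `else_cost` are assumptions for illustration only). The idea is that a group of SIMD lanes has to execute every side of a branch that at least one lane takes, with the other lanes masked off.

```python
def lockstep_cycles(takes_branch, then_cost, else_cost):
    """Cycles for one group of SIMD lanes to get through an if/else in lockstep.

    `takes_branch` holds one boolean per lane. The group pays for the "then"
    side if any lane takes it, and for the "else" side if any lane does not.
    """
    cycles = 0
    if any(takes_branch):
        cycles += then_cost
    if not all(takes_branch):
        cycles += else_cost
    return cycles


if __name__ == "__main__":
    uniform = [True] * 16                          # every lane takes the same path
    mixed = [i % 2 == 0 for i in range(16)]        # lanes disagree
    print("uniform:", lockstep_cycles(uniform, then_cost=20, else_cost=50))
    print("mixed:  ", lockstep_cycles(mixed, then_cost=20, else_cost=50))
```

In this model the uniform group pays 20 cycles and the mixed group pays 70, because it has to walk both paths. An independent scalar core only ever pays for the path it actually takes, which is the flexibility being described here.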

That Cinebench demo is not a typical task for a GPU, but it does show how flexible each core is in running general code fairly efficiently.
