Intel Xeon Phi x200 Knights Landing x86 Compatibility Test

Megalith

24-bit/48kHz
Staff member
Joined
Aug 20, 2006
Messages
13,004
Okay, so nobody is going to run Windows regularly on these 64-core beasts, but it’s cool to see that Intel’s claim that Knights Landing can run legacy x86 code with little fuss holds up. Thanks to Patriot for the link.

The run did not use any of the features that give KNL its true power, such as AVX-512; that testing is coming. We did validate one of KNL's major selling points: code built to run on other Intel x86 architectures works without modification on the new Intel Xeon Phi x200 (KNL) generation. The impact of this is enormous and is a key reason we saw so many KNL supercomputer system wins at SC16 this year. Applications can access a highly parallel architecture without needing a co-processor: the Intel Xeon Phi x200 has direct access to the six-channel system RAM as well as 16GB of high-bandwidth, low-latency MCDRAM, with no performance penalty from traversing the PCIe bus.
 

pxc

[H]ard as it Gets
Joined
Oct 22, 2000
Messages
33,064
64 Airmont-based cores with 4 threads per core. OK. :p

As the guy explained in the video, that isn't the best test of the Xeon Phi's abilities, but I agree with atp1916 above that it's a cool demo.
 

Zarathustra[H]

Official Forum Curmudgeon
Joined
Oct 29, 2000
Messages
29,470
I wonder what type of loads benefit from four threads per core...

Especially on weak-ass Atom-type cores.

I mean, I know there's lots of them, but still...
 

pxc

[H]ard as it Gets
Joined
Oct 22, 2000
Messages
33,064
I wonder what type of loads benefit from four threads per core...

Especially on weak-ass Atom-type cores.

I mean, I know there's lots of them, but still...
That's how GPU-type cores are designed, in order to maximize utilization. The purpose of SMT, on both GPUs and CPUs, is to hide memory access latency and other pipeline stalls by switching to another thread while one waits for a load to complete. I'm not sure how efficient Intel's context switching is, but on AMD and Nvidia GPUs, at least while working within a single wavefront or warp (their respective names for the group of threads, 64 and 32, executed in lockstep), it's essentially free to switch out a stalled thread and work on another that is ready.

As "weak" as those Atom cores are at running full x86 code, they're far faster per core than Nvidia's or AMD's streaming processors for that style of code (essentially semi-random read patterns and branchy code). Those two things kill performance in classic GPGPU-style programming, and they likewise hurt the Xeon Phi compared to running optimized AVX-512 code on the multiple SIMD units per core. :p That Cinebench demo is not a typical task for a GPU, but it does show how flexible each core is in running general code fairly efficiently.
 