Xeon Phi 7120P: compatible mobos

Discussion in 'Physics Processing' started by eudoxos, Nov 14, 2014.

  1. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    I bought a Xeon Phi 7120P and need to upgrade the mobo from an ASUS P9X79 PRO, which does not support the Xeon Phi (more precisely, it does not support "memory mapped I/O address ranges above 4GB"; there is a writeup about that here). I am a bit torn between two possibilities. The easier one is to just replace the mobo with an ASUS P9X79-E WS; the other is to upgrade the whole gear and pass my current machine on to someone else. That results in the questions:

    1. The ASUS P9X79-E WS officially supports the Xeon Phi 3100 series, but people (e.g. here) use the 5100 series as well. Is the 7120 somehow different, or is it just more expensive and hence not explicitly mentioned? Will it work with that board?

    2. If I am to buy new gear, I am thinking about AMD (dual 16-core Opterons or something similar). I understand Intel claims compatibility with Xeon CPUs only (though I don't like this kind of monopolization through unclear compatibility). Is this risky? Has someone gotten that combination working? (I know AMD may not be a good idea for HPC, but I have code which scales pretty well, and 2x16-core Opterons cost the same as a single 10-core Xeon...) What should I look at? Or are Opterons just a no-go for computing?

    3. If I go with a Xeon CPU (single or dual), will all mobos work? How do I check that? Intel is pretty reluctant to claim compatibility and recommends buying complete computers from OEMs. I don't want that: I want to pick my mobo and CPU and everything around them. (I looked at the ASUS ESC2000_G2 barebone, looks nice, that's what the people at openwall.info use; or perhaps just a single CPU with the ASUS ESC1000_G2.) Any recommendations? I don't want a rack, just a server box, silent if possible (it sits in my kitchen, behind the door, while working).

    4. I bought the passively cooled variant, thinking that I would cool it with just some fan in the chassis... not so sure about that now... :/ These guys put dual 12k RPM fans on it, but that will be noisy as hell. If I just buy e.g. an 80x80mm fan with a bit more than the required throughput (30 CFM) and make a funnel to push the air through the cooler, will that work?

    Thanks for all information and input.
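
    In case it helps anyone else checking their board first: here is a rough sketch, in C, of the test behind the "above 4GB" requirement. On Linux, each line of /sys/bus/pci/devices/&lt;dev&gt;/resource holds one BAR as "start end flags" in hex; a start address at or above 4 GiB is a hint that the firmware already does above-4G decoding. The sample BAR values in the code are made up for illustration, not taken from a real 7120P.

    ```c
    /* Sketch: decide whether a PCI BAR line from
     * /sys/bus/pci/devices/<dev>/resource sits above the 4 GiB boundary.
     * File format per BAR: "0x<start> 0x<end> 0x<flags>".
     * The sample line below is illustrative, not from real hardware. */
    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    static int bar_above_4g(const char *line) {
        uint64_t start, end;
        /* %x-style conversion accepts the leading "0x" prefix. */
        if (sscanf(line, "%" SCNx64 " %" SCNx64, &start, &end) != 2)
            return 0;
        return start >= ((uint64_t)1 << 32);   /* mapped above 4 GiB? */
    }

    int main(void) {
        /* Example: a large aperture placed way above 4 GiB (made-up values). */
        const char *sample =
            "0x0000380000000000 0x00003801ffffffff 0x000000000014220c";
        printf("BAR above 4 GiB: %s\n", bar_above_4g(sample) ? "yes" : "no");
        return 0;
    }
    ```

    On a real system you would loop over the lines of the device's resource file instead of the hard-coded sample.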
     
  2. Patriot

    Patriot [H]ard|DCer of the Month - March 2011/June 2013/De

    Messages:
    2,496
    Joined:
    Dec 15, 2010
    I guess an appropriate question would be: why do you want a Phi? They are not that great for HPC.
     
  3. Blue Fox

    Blue Fox [H]ardForum Junkie

    Messages:
    11,695
    Joined:
    Jun 9, 2004
    The thing really does need a ton of airflow and a single 80mm fan might not be adequate. Keep in mind that it produces almost 3 times the amount of heat that your CPU might. The passively cooled variants were designed for servers with precisely those kinds of 40mm fans that you'd rather not use.
     
  4. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    Particle sims, that's why I need it. Those suck on GPUs (unpredictable memory accesses) but parallelize very well on CPUs.
     
  5. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    Good point for comparison; it is true I have 2x80mm fans on the CPU, and it has a massive cooler.

    The card has a 3-pin fan socket; if I connect the fan there, will the card regulate it depending on temperature by itself, or do I need to do that from software?
     
  6. wirk

    wirk Gawd

    Messages:
    811
    Joined:
    Sep 2, 2014
    Why not buy the X99-E WS board? It is the newest, with the X99 chipset instead of the old X79, has 7 PCIe 3.0 x16 slots, and power delivery for heavy loads. Most importantly, it supports the new Xeon v3 processors and registered DIMM memory. Beyond what is written on the Asus page (that it supports all Xeon v3 processors), I can fully confirm it, since this message is written on a just-assembled system with an X99-E WS, a Xeon E5-1680 v3, and a 4x16 GB = 64 GB Crucial RDIMM set :cool:

    The second issue is whether the simulations could be done using high-end graphics cards like the Titan Black. Their floating point performance is close to, or better than, the Xeon Phi's, but obviously the architecture must fit.
     
    Last edited: Nov 15, 2014
  7. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    If I buy the older P9X79-E WS, I don't have to upgrade the CPU (an i7-3930K, a 32nm model) or the RAM (DDR3-1600). I assume those would not work with the X99? Replacing just the mobo is the cheapest solution.

    If I decide to upgrade the whole thing, then yes, the X99-E WS is a hot candidate (unless I go for dual CPUs).

    Congrats to your new machine :)
     
  8. wirk

    wirk Gawd

    Messages:
    811
    Joined:
    Sep 2, 2014
    No, the X99-E WS requires the Haswell architecture, the LGA2011-v3 socket, and DDR4, meaning a complete overhaul of the system. But methinks a heavy-caliber card like the Xeon Phi calls for a workstation-class build, meaning a Xeon processor and ECC memory. The X99-E WS supports Xeons and up to 128 GB of RAM.
     
  9. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    I confirm that the 7120P works with the ASUS P9X79-E WS (which officially supports only the Xeon Phi 3120 and friends).

    Now the cooling, that will be a bit of a hassle. Would someone owning the 7120A care to photograph how it is arranged internally? Where is the fan connected? I could perhaps use the same header to drive an external fan.

    As for a workstation-class build: for me that would be a waste of money. The Xeon Phi does the hard work; the CPU only dispatches jobs to it, so no big performance is needed there.
     
  10. wirk

    wirk Gawd

    Messages:
    811
    Joined:
    Sep 2, 2014
    Couldn't your jobs be done on graphics cards, with CUDA/OpenCL?
     
  11. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    No. And believe me, I am a good programmer; I spent 5 months full-time trying that, plus consulted expensive GPGPU pros. GPGPU is hype; it is useful only for very specific tasks (those with predictable memory access patterns). Most discretized continuum simulations (finite elements, finite volumes), like fluid flow or solid mechanics, lead to solving matrices, and that is where GPGPUs excel.

    For particle simulations, that's not the case, and there is just one code (which is very limited) simulating particle behavior on the GPU. nVidia published some "particle simulation" tutorials with CUDA/OpenCL, but those are jokes compared to the real thing (500k particles, each with different parameters, complex contact equations, collision detection, etc.).
     
  12. wirk

    wirk Gawd

    Messages:
    811
    Joined:
    Sep 2, 2014
    ^What you say is true, GPUs are good for specific stuff only. However, a detailed performance analysis could take into account the possibility of having 4 GPUs (like Titan Blacks) vs. a single Xeon Phi in one system. On paper, a single Titan Black's double-precision floating point performance is similar to the Xeon Phi's.
     
  13. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    @wirk raw FLOPS are meaningless, since for usual tasks most of the time is spent waiting for data from RAM (and that is true for CPUs too); and that is not a design problem of the code, it is just how it is. The GPUs we had were a FirePro and some high-end nVidia card, and we could not even match the performance of a very normal i7 processor (with a different implementation in C++, not in OpenCL). GPGPU was big marketing, and it still is; its usefulness is really limited to tasks with high data locality, and those are only some. The Xeon Phi is closer to a CPU in architecture; comparing the Xeon Phi to a GPU is mixing elephants and apples.
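
    To make the data-locality point concrete, here is a toy sketch (the array size and the shuffled index are arbitrary illustrations): the same number of additions, but one pass streams through memory while the other chases a scattered index, so on arrays far larger than cache the second pass is dominated by RAM latency; exact timings are machine-dependent.

    ```c
    /* Same arithmetic, different access patterns: sequential vs. scattered. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N ((size_t)1 << 23)   /* 8M doubles = 64 MB, well past any cache */

    /* One pass of additions over contiguous memory. */
    static double sum_seq(const double *x, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) s += x[i];
        return s;
    }

    /* The same additions through a scattered index: on large n,
     * nearly every access misses the cache. */
    static double sum_scattered(const double *x, const size_t *idx, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) s += x[idx[i]];
        return s;
    }

    int main(void) {
        double *x = malloc(N * sizeof *x);
        size_t *idx = malloc(N * sizeof *idx);
        if (!x || !idx) return 1;
        for (size_t i = 0; i < N; i++) { x[i] = 1.0; idx[i] = i; }
        srand(42);                        /* Fisher-Yates shuffle of idx */
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        clock_t t0 = clock();
        double a = sum_seq(x, N);
        clock_t t1 = clock();
        double b = sum_scattered(x, idx, N);
        clock_t t2 = clock();
        printf("sequential %.2fs, scattered %.2fs (sums %.0f and %.0f)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, a, b);
        free(x); free(idx);
        return 0;
    }
    ```

    Both sums are identical; only the order of the loads differs, which is exactly why FLOPS numbers alone say little about this kind of workload.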
     
  14. wirk

    wirk Gawd

    Messages:
    811
    Joined:
    Sep 2, 2014
    I agree with you that the Xeon Phi is more like a multicore architecture, which means it is more flexible. GPUs have fine-grained computing units that are rigid, suited to specific tasks, but then the performance gain can be impressive, as shown for the just-released K80, even for Linpack. There is also another fact: every big supercomputer is now a combination of CPU and GPU nodes.
     
  15. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    That's because most SCs solve matrices (Linpack) for huge PDEs discretized over an insane number of elements. But I need to solve my problems, not ones I don't have...
     
  16. Blue Fox

    Blue Fox [H]ardForum Junkie

    Messages:
    11,695
    Joined:
    Jun 9, 2004
    That's not exactly true; the #1 supercomputer, for example, uses Xeon Phis, not GPUs.
     
  17. wirk

    wirk Gawd

    Messages:
    811
    Joined:
    Sep 2, 2014
  18. Blue Fox

    Blue Fox [H]ardForum Junkie

    Messages:
    11,695
    Joined:
    Jun 9, 2004
    Years away and there's always something better around the corner. New Xeon Phis are increasing performance by considerable amounts.
     
  19. wirk

    wirk Gawd

    Messages:
    811
    Joined:
    Sep 2, 2014
    ...in their specific areas; there is a not-insignificant pool of cases where GPUs are better :cool:.
     
  20. Chris_Lonardo

    Chris_Lonardo [H]ard|OCP Storage Engineer & Editor

    Messages:
    1,726
    Joined:
    Feb 10, 2002
    The Phi sucks up PCIe bandwidth. To me, it makes the most sense to use it as intended: with a dual E5 board. I've successfully tested the Phi on the Supermicro X9DAi and the Intel S2600COE, but wasn't able to get it working on a single-proc Asus Z9PA-U8. Also, the Windows toolchain is not mature (or wasn't the last time I played with it a few months ago), so I wouldn't really bother unless you're using Linux.
     
  21. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    Of course I use Linux... As for PCIe, I don't care so much; the card will be used for running the entire program (loaded over NFS), not for offloading routines from programs otherwise running on the CPU. The card is working just fine, judging by all the diagnostic tools (micctrl etc.), though I've not yet gotten around to running serious code on it.
     
  22. major_foad

    major_foad n00b

    Messages:
    41
    Joined:
    Jun 14, 2004
    I've got one working great in an HP Z820, although that's probably too extreme a leap when you already have most of the PC parts.

    If you've got the ability to change your mind, the actively cooled one has somewhat higher performance... although it is far from silent (or even quiet).
     
  23. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    I just want to report that Xeon Phi works well with P9X79-E WS. A friend of mine converted the PC to a water-cooled one with 4 radiators inside the case, and it is almost silent. The cooling was not cheap, though.
     
  24. martinmsj

    martinmsj [H]ard|Gawd

    Messages:
    1,542
    Joined:
    Mar 3, 2005
    An update would be nice. I've had an eye on one of these. I don't have a particular problem to solve with them; that hasn't stopped my curiosity, however.

    I've wondered what it's like to work with these versus OpenCL on a Quadro K2000 or an AMD APP-supported GPU. Currently messing around with OpenCL on the Iris 6100, which hasn't been too friendly in regards to memory. (MacBook Pro 13" 2015, i7, Iris 6100)
     
  25. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    OpenCL and the Xeon Phi are not directly comparable; it is like comparing GPUs with CPUs. GPUs are very good at some things and suck really badly at others. The Xeon Phi is 240 virtual Pentium III (IIRC) cores plus extra vector math instructions. It is a lot of work to compile things (you need the Intel compiler, plus you have to recompile any libs), but easier than rewriting your code in a different language completely. I am not an expert here, I barely got it running; better ask somewhere else. Good luck.
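
    As a tiny illustration of the "recompile rather than rewrite" idea: plain C with an OpenMP pragma builds and runs with any host compiler (the pragma is simply ignored without OpenMP support), and the identical source could be built for native execution on the card with Intel's compiler, e.g. `icc -mmic -qopenmp`; those flag names come from Intel's toolchain of that era and should be treated as assumptions, not a recipe.

    ```c
    /* Same-source parallel loop. A host compiler without OpenMP runs it
     * serially; with OpenMP (host or, via Intel's -mmic, natively on the
     * Phi) the reduction is spread across threads with no source changes. */
    #include <stdio.h>

    static double harmonic_sum(int n) {
        double s = 0.0;
        #pragma omp parallel for reduction(+:s)
        for (int i = 1; i <= n; i++)
            s += 1.0 / i;
        return s;
    }

    int main(void) {
        printf("H(1000000) = %.4f\n", harmonic_sum(1000000));  /* ~14.3927 */
        return 0;
    }
    ```

    This is exactly the contrast with OpenCL: there, the kernel would have to be rewritten in a separate language and dispatched explicitly, whereas here only the compiler target changes.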
     
  26. Patriot

    Patriot [H]ard|DCer of the Month - March 2011/June 2013/De

    Messages:
    2,496
    Joined:
    Dec 15, 2010
    P54C (Pentium 1), and there is nothing virtual about them... They are shrunk, given some updated instruction sets, and wrapped up in a ring bus.
     
  27. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    I meant it has 60 physical cores, AFAIK, each with 4 hardware threads, which show up as 240 "virtual" cores.
     
  28. martinmsj

    martinmsj [H]ard|Gawd

    Messages:
    1,542
    Joined:
    Mar 3, 2005
    Hi there!

    There is some confusion here. The Xeon Phis support OpenCL as well as OpenMP, MPI, etc. Intel has specific OpenCL optimizations. I was curious about the experience of targeting the Xeon Phi in OpenCL versus the Nvidia/AMD/Intel GPUs, as far as working with the clustering and memory goes. Then again, it would probably not be appropriate for a forum post (and would most likely go over my head, since it's not what I do for a living, just a cool side hobby).

    What are you running on the XeonPhi hardware now?
     
  29. eudoxos

    eudoxos n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    No experience here; I did not try OpenCL.
    I am still in the process of compiling the http://woodem.org code for the Xeon Phi, to run it directly there, but I have not had much time lately. I want to try keeping the same code as for the CPU, perhaps with just a few optimizations. It needs a bunch of C++ libs, so the prospect is to compile those with the cross-compiler first, and so on.

    Unfortunately, Intel is not really supportive. They only provide a (on my machine, defunct) Yocto build, meant for embedded systems. A full distribution (a Debian port would be my favorite) would be much more appropriate and convenient. I am not sure I will ever get Yocto to compile Boost or VTK, for instance.
     
  30. martinmsj

    martinmsj [H]ard|Gawd

    Messages:
    1,542
    Joined:
    Mar 3, 2005
    Thanks for the reply! I appreciate it. These forums seem a bit dead, so thanks again. Who knows, I may contribute a thread or two. I've been looking into those heavily discounted Xeon Phi models. Unfortunately, I don't have the hardware to run these, as they're meant for rack servers in a data center.

    I've been playing around with the NVidia Jetson TK1 development board. Hopefully, I'll have a chance to play around with one of these Phis sometime, though I know I have zero chance of getting my job to purchase one.
     
  31. Chris_Lonardo

    Chris_Lonardo [H]ard|OCP Storage Engineer & Editor

    Messages:
    1,726
    Joined:
    Feb 10, 2002
    Are you talking about the cooling? I got a deal on a passively-cooled Phi a while back and was able to make it work in my desktop. You just need to be a bit creative on air flow.
     
  32. martinmsj

    martinmsj [H]ard|Gawd

    Messages:
    1,542
    Joined:
    Mar 3, 2005
    Precisely that. Also, I am not sure how to get it to work on my computer at home, since I would need a custom BIOS from Asus to enable 64-bit BAR support on the Gryphon Z87 motherboard.
     
  33. Chris_Lonardo

    Chris_Lonardo [H]ard|OCP Storage Engineer & Editor

    Messages:
    1,726
    Joined:
    Feb 10, 2002
    That would be the bigger issue, I'd expect. I was able to rig a spare Corsair H60 liquid cooling loop I had lying around onto my Phi pretty easily, after I got tired of having a crazy fan contraption hanging out of my case.

    The Phi really needs the PCIe lanes, so I have some doubts about it being worth the effort on a single CPU system, unfortunately.
     
  34. wirk

    wirk Gawd

    Messages:
    811
    Joined:
    Sep 2, 2014
    I don't quite catch what you mean. For a single card one can get PCIe 3.0 x16 lanes at most (though 4.0 is coming, it needs support). The 16 lanes are really needed, no doubt about it, and they are readily available. But how is this related to a single- vs. multi-CPU system? PCIe lanes are managed by a single CPU in any case, and one can buy a single-CPU high-end X99 mobo with support for 4 PCIe 3.0 x16 cards.
     
  35. Chris_Lonardo

    Chris_Lonardo [H]ard|OCP Storage Engineer & Editor

    Messages:
    1,726
    Joined:
    Feb 10, 2002
    Single Haswell-E CPUs provide a maximum of 40 lanes; dual-CPU motherboards support 80 lanes total. In the example you gave (4 PCIe x16 cards on an X99), they will not all run with 16 lanes, at least not without relying on a PLX multiplexer.
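
    The arithmetic behind that, as a sketch (numbers from this post; chipset/DMI lanes and PLX switches are ignored):

    ```c
    /* Back-of-the-envelope CPU lane budget for x16 cards. */
    #include <stdio.h>

    /* Positive result: not enough CPU lanes to run every card at x16. */
    static int lanes_short(int cpus, int x16_cards) {
        return x16_cards * 16 - cpus * 40;   /* 40 lanes per Haswell-E CPU */
    }

    int main(void) {
        printf("1 CPU, 4 cards: short by %d lanes\n", lanes_short(1, 4));
        printf("2 CPUs, 4 cards: short by %d lanes\n", lanes_short(2, 4));
        return 0;
    }
    ```

    So with one CPU, four x16 cards are 24 lanes over budget and some slots train down (or a PLX switch multiplexes them), while a dual-CPU board has 16 lanes to spare.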