Intel Has a Core Issue and It Stems From a Lack of a HEDT Plan

"Intel Has a Core Issue..."

 
The question then is, why does that latter task require tons of CPU horsepower?
It will of course depend on the application, but I find myself at a loss to imagine one that would do so.

Think of having every non-private corner of a residence (or small business) under 4K60+ surveillance; that footage has to be piped around and analyzed, at a basic level for changes, and those changes piped to inference engines. Dozens of cameras, which themselves could be far smaller than current surveillance cameras if they were based on current cell-phone camera designs, perhaps with stereo and IR capability as well.

And this function could be integrated into every other function, from environmentals to doors, windows, etc. It's a lot of data to constantly be dealing with, and a lot of decisions to make along the way.

[I'm also not saying that ML is absolutely the killer app for high-core-count CPUs, or even the integration scenario that I very roughly describe above; it's just the only one I can think of that might increase demand at the mass-market consumer level...]
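
Very roughly, the per-camera front end could be as dumb as frame differencing gating an ML model; a minimal sketch (the stream URL and the send_to_inference() hand-off are made-up placeholders), multiplied by dozens of streams, is where the CPU time goes:

Code:
# Rough sketch of a per-camera motion gate in front of an inference engine.
# The stream URL and send_to_inference() are hypothetical placeholders.
import cv2

def send_to_inference(frame):
    # Placeholder: in a real setup this would queue the frame for an ML model.
    pass

def watch_stream(url, motion_threshold=0.02):
    cap = cv2.VideoCapture(url)            # e.g. an RTSP stream from one camera
    ok, prev = cap.read()
    if not ok:
        return
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        delta = cv2.absdiff(gray, prev)     # per-pixel difference vs. previous frame
        changed = (delta > 25).mean()       # fraction of pixels that changed noticeably
        if changed > motion_threshold:
            send_to_inference(frame)        # only "interesting" frames hit the ML side
        prev = gray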
 
Intel and AMD both need the next Killer App, something highly parallelizable, requiring insane amounts of CPU, that has wide mass-market appeal.

It's more difficult than that. An app that is simply highly parallelizable would probably just be done on GPUs.

There needs to be something about it that requires a bigger core, like a CPU has.
 
An app that is simply highly parallelizable would probably just be done on GPUs.

If it were an app with branching code, and if its performance were tied to clock speeds that are more amenable to modern CPUs than GPUs, I think the case could be made. Of course, AMD/Nvidia could just throw ARM cores at the GPU if this workload were in high demand.
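
As a toy illustration of the kind of branch-heavy, divergent per-item work that maps poorly to a GPU's lockstep execution but spreads fine across big, high-clocked cores (the workload itself is made up, just something where every input takes a different path):

Code:
# Toy example: per-item work whose code path and cost diverge wildly per input.
# SIMT GPUs suffer under this divergence; a pile of fast CPU cores does not.
from concurrent.futures import ProcessPoolExecutor

def collatz_steps(n):
    steps = 0
    while n != 1:                                  # branch count differs per input
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:            # one worker per core by default
        longest = max(pool.map(collatz_steps, range(2, 200_000), chunksize=1000))
    print(longest)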
 
Think of having every non-private corner of a residence (or small business) under 4K60+ surveillance; that footage has to be piped around and analyzed, at a basic level for changes, and those changes piped to inference engines. Dozens of cameras, which themselves could be far smaller than current surveillance cameras if they were based on current cell-phone camera designs, perhaps with stereo and IR capability as well.
This has mass-market appeal? For business, maybe, but if I had a home like that, I'd be taking an ice pick to all those cameras.
 
This has mass-market appeal? For business, maybe, but if I had a home like that, I'd be taking an ice pick to all those cameras.

Yeah, there's a level of trust here that simply hasn't been established, which is what makes it a precarious suggestion.
 
The truth is that none of us NEED anything other than a basic Celeron computer, but we like to have nice things.

In truth, the OG Threadripper series was a big success for AMD, even though people said nobody needed it or could find a use-case for it. I can't find a good use-case for a Lamborghini, but damn do I want one!

So stop trying to justify a purchase based on NEEDS when a $1000+ CPU is not something purchased out of necessity...
 
The truth is that none of us NEED anything other than a basic Celeron computer, but we like to have nice things.

I agree with respect to 'needs', and for desktop computing I feel that you're absolutely right; but for stuff that's compute-intensive, be it software development or content creation of some sort, applications do exist for these many-cores-per-socket products. They're particularly great for work that's perhaps already well-threaded but not currently well-distributed between nodes; stuff that's latency sensitive really needs network technologies that do not exist at the workstation level, and more cores help.
 
Isn't the advantage of Threadripper 2 the fact that it offers more PCIe lanes for motherboard manufacturers to use for more features?
 
I agree with respect to 'needs', and for desktop computing I feel that you're absolutely right; but for stuff that's compute-intensive, be it software development or content creation of some sort, applications do exist for these many-cores-per-socket products. They're particularly great for work that's perhaps already well-threaded but not currently well-distributed between nodes; stuff that's latency sensitive really needs network technologies that do not exist at the workstation level, and more cores help.

Good point. I'll also add that there may not be ONE thing that can use all my cores, but I can run 10 different active applications AND game without slowdown. That in itself is a blessing.
 
Good point. I'll also add that there may not be ONE thing that can use all my cores, but I can run 10 different active applications AND game without slowdown. That in itself is a blessing.

I had a 6700K that was perfectly acceptable for gaming, even with a 1080 Ti; I only moved to an 8700K because I saw myself doing more 'other work' like VM labs, and I also wanted to be able to game at the same time.

Now, if that other work were actually CPU intensive, I'd likely have moved to Ryzen/TR; and I've looked closer at TR in the recent past because of:

Isn't the advantage of Threadripper 2 the fact that it offers more PCIe lanes for motherboard manufacturers to use for more features?

Basically I experimented with the idea of adding 10Gbase-T to a consumer board. The result was seeing that it's going to cost $100+, and if the controller isn't built in (on Intel or AMD sub-HEDT platforms), you're gonna lose GPU lanes. Most Intel HEDT boards are there as well. So for 10Gbit or other higher-bandwidth connection needs, AMD is certainly hitting a market point that Intel hasn't yet chosen to.
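
For anyone wanting to verify what a given slot actually negotiated after stuffing a NIC in, Linux exposes it under sysfs; a quick way to dump it for every device (this just reads the standard link attributes, nothing board-specific):

Code:
# Dump the negotiated vs. maximum PCIe link speed/width for every device (Linux only).
# Handy for spotting a GPU or NIC that got dropped to fewer lanes or an older gen.
from pathlib import Path

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    try:
        cur_speed = (dev / "current_link_speed").read_text().strip()
        cur_width = (dev / "current_link_width").read_text().strip()
        max_speed = (dev / "max_link_speed").read_text().strip()
        max_width = (dev / "max_link_width").read_text().strip()
    except OSError:
        continue  # not every device exposes link attributes
    print(f"{dev.name}: {cur_speed} x{cur_width} (max {max_speed} x{max_width})")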
 
Basically I experimented with the idea of adding 10Gbase-T to a consumer board. The result was seeing that it's going to cost $100+, and if the controller isn't built in (on Intel or AMD sub-HEDT platforms), you're gonna lose GPU lanes. Most Intel HEDT boards are there as well. So for 10Gbit or other higher-bandwidth connection needs, AMD is certainly hitting a market point that Intel hasn't yet chosen to.

I have an Intel 10GBase-T adapter in my Asus P9x79 WS workstation board and it works great.

One thing that is appealing to me about TR, which I may attempt in my next build, is to do a VM passthrough build, with Linux Mint as its underlying OS, and Win10 running in a VM with a passed-through GPU for games.

I've never done this, but as long as there are not significant performance penalties, I think it would be pretty awesome.
 
I have an Intel 10GBase-T adapter in my Asus P9x79 WS workstation board and it works great.

So long as you use the Aquantia NIC (or pay up for Intel PCIe 3.0), it should be fine, yeah. The main problem is with the CPUs that skimp on PCIe lanes, and how those boards are usually set up.
 
PCMag.com's encyclopedia defines it as "An Intel term for high-performance desktop computers." As others here have mentioned referencing the X58 chipset, Intel has been using the term for over 10 years, as seen on page 4 of this PDF dated April 2, 2008.

More of the Mandela Effect. Does anyone here actually remember using it 10 years ago? Even 5 years ago?

Mirror Mirror on the wall... What's the worst acronym of them all?
 
We got a better term?

Maybe you had to look it up, but HEDT seems to have stuck. Intel did introduce the idea, why not use it?
 
So long as you use the Aquantia NIC (or pay up for Intel PCIe 3.0), it should be fine, yeah. The main problem is with the CPUs that skimp on PCIe lanes, and how those boards are usually set up.

Mine are "Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter". Got a matching pair of single port adapters a couple of years ago. One went in my desktop, the other in my NAS server, with a direct line between them.

Best I can tell from the PCIe info, they are x8 PCIe gen 1.x devices. (2.5GT/s is gen 1, right? It's funny because the Intel spec page says they are gen 2 and two-port, but mine definitely aren't.) Both on the desktop and on the server they have the full x8 available to them, and I have had them hit damned near full 10Gbit speeds during file transfers.



Code:
02:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter (rev 01)
    Subsystem: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 26
    Region 0: Memory at fa640000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at fa600000 (32-bit, non-prefetchable) [size=256K]
    Region 2: I/O ports at d000 [disabled] [size=32]
    Region 3: Memory at fa660000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [60] MSI-X: Enable+ Count=18 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [a0] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap:    First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number xx-xx-xx-xx-xx-xx-xx-xx
    Kernel driver in use: ixgbe
    Kernel modules: ixgbe
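
For what it's worth, the math backs that up. Assuming 8b/10b encoding on PCIe 1.x (my understanding of the spec), an x8 gen-1 link still has headroom over line-rate 10GbE:

Code:
# Back-of-the-envelope: is x8 PCIe 1.x enough for line-rate 10GbE?
# PCIe 1.x signals at 2.5 GT/s per lane with 8b/10b encoding (80% efficient).
lane_gbit  = 2.5 * 0.8           # 2.0 Gbit/s usable per lane after encoding
link_gbit  = lane_gbit * 8       # eight lanes -> 16 Gbit/s
link_gbyte = link_gbit / 8       # -> 2.0 GB/s
nic_gbyte  = 10 / 8              # 10GbE line rate -> 1.25 GB/s
print(f"x8 gen1: {link_gbyte:.1f} GB/s vs 10GbE: {nic_gbyte:.2f} GB/s")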
 
Yup, that's part of the overall problem; if you want 10Gbase-T (and I do mean the RJ-45 version), and you also want it on PCIe 3.0 (so that it's not potentially downgrading anything else to PCIe 2.0, or even PCIe 1.0!), you can be a bit limited in terms of platform support. It's a further issue if you want to build a "hyperconverged" system that has a ton of local storage, as even 10Gbase-T is dog slow if you're not running some form of RDMA, and even then it's still slower than local storage in terms of latency, most especially if you're comparing to local NVMe.

However, with respect to your solution, I can't say that I disagree; I did spend up for a 10Gbase-T switch (that has two shared SFP+) from HP simply so that I can expand going forward, and I'm hoping to be able to trunk it over 10Gbit to an Aruba switch sourced from eBay.
 
Yup, that's part of the overall problem; if you want 10Gbase-T (and I do mean the RJ-45 version), and you also want it on PCIe 3.0 (so that it's not potentially downgrading anything else to PCIe 2.0, or even PCIe 1.0!), you can be a bit limited in terms of platform support. It's a further issue if you want to build a "hyperconverged" system that has a ton of local storage, as even 10Gbase-T is dog slow if you're not running some form of RDMA, and even then it's still slower than local storage in terms of latency, most especially if you're comparing to local NVMe.

However, with respect to your solution, I can't say that I disagree; I did spend up for a 10Gbase-T switch (that has two shared SFP+) from HP simply so that I can expand going forward, and I'm hoping to be able to trunk it over 10Gbit to an Aruba switch sourced from eBay.


I've never actually tested disk access latency of remote drives. How would you even do that? Put a drive image in a VM on it, and then do a Disk Bench?

I do all things that require low latency locally though. My remote NAS is just for file storage, and it does very well at that. I have, on occasion, seen it hit ~1.2GB/s across the adapter. This was probably for stuff that was already in RAM cache on the NAS server though.
 
It's why I made the Mandela reference. With the Mandela Effect, the past has changed historically, but our minds remember the 'old' reality. As if someone is traveling back in time and changing things, but our minds are powerful enough to remember how it really was. I think Star Trek explores similar concepts. Tetra something or another.

Just a joke.
 
I've never actually tested disk access latency of remote drives. How would you even do that? Put a drive image in a VM on it, and then do a Disk Bench?

Probably?

Mount a remote share as a local drive (map it), and then run say CrystalBench on it?
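
Or, if you want raw latency numbers rather than a throughput benchmark, something quick and dirty like timing small random reads against a big file on the mapped drive, and then against a local copy, would get you in the ballpark (the Z:\ path is just an example, and page caching on either end will skew a second pass):

Code:
# Quick-and-dirty latency probe: time small random reads against a file.
# Point it at a big file on the mapped network drive, then at a local copy.
import random, time

def probe(path, reads=200, block=4096):
    times = []
    with open(path, "rb", buffering=0) as f:          # unbuffered: each read hits the OS
        size = f.seek(0, 2)                           # seek to end to learn file size
        for _ in range(reads):
            f.seek(random.randrange(0, max(size - block, 1)))
            start = time.perf_counter()
            f.read(block)
            times.append(time.perf_counter() - start)
    times.sort()
    print(f"{path}: median {times[len(times) // 2] * 1e6:.0f} us, "
          f"p99 {times[int(len(times) * 0.99)] * 1e6:.0f} us")

probe(r"Z:\bigfile.bin")   # example path on the mapped share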

Mostly, RDMA (which is RoCE or iWARP in actual products) seems designed to reduce the access latencies that are inherent to connection-oriented network protocols (think TCP), even more than just dropping to UDP would; they're essentially dropping layer three completely and running data directly in Ethernet frames, more or less.

This has a bandwidth bonus for some workloads due to lower overhead, but the bigger bonus is that you're skipping protocol stacks along the way, so stuff that would be latency sensitive is less affected by running remotely in what is likely a highly distributed environment, something resembling for example a low-rent 'supercomputer'.

Applications for this stuff for homelabbers aren't really that broad, except that some of this equipment is starting to hit eBay at accessible prices, and it'd be both fun to learn and play with, and actually useful if you have a large enough lab and enough project work to realize a benefit from running apps remotely. Hell, at some point it should be financially feasible to, say, run your Steam directory on the NAS with absolutely no difference in performance versus local.

I do all things that require low latency locally though. My remote NAS is just for file storage, and it does very well at that. I have, on occasion, seen it hit ~1.2GB/s across the adapter. This was probably for stuff that was already in RAM cache on the NAS server though.

And for most of us, this is the right way to do it. It's how I'm doing it even as I'm aware that there are other possibilities that have just crossed the horizon :).

[I think that locally I'm going to be down to a 1TB SATA M.2 for the OS (might grab another to mirror when prices crash again), a 2TB 2.5" for games I'm actually playing and for Lightroom catalog storage, and a 3TB spinner as a games/download target. Currently, the critical stuff (OS, LR catalogs and libraries, documents) is backed up to an aging external 4TB over USB3. I'd had an aging 2TB Green dedicated to LR libraries that I just yanked; that, along with various media stored on disparate external drives, will be the first candidate for NAS storage, which I'm going to start with Storage Spaces on 2016 Datacenter. If that doesn't perform and/or doesn't provide domain integration benefits over say ZFS, then off to ZFS I go!]
 