after 32 cores?

Discussion in 'AMD Processors' started by Epyon, Jun 20, 2018.

  1. gamerk2

    gamerk2 [H]ard|Gawd

    Messages:
    1,173
    Joined:
    Jul 9, 2012
    Pretty much this.

    Case in point: There was a project I was working on that got ported to a "modern" (post-80's) processor. The software continued to run horribly. A massive effort was undertaken to make the program more threaded, which ended up costing the taxpayers (you) mid-7 figures.

    Performance dropped 50%. We found the software kept grinding to a halt due to threads constantly waiting on each other.

    After that debacle, an actual analysis of the code was done. One minor code change to the original (pre-threaded) code base resulted in a 500% increase in performance.

    Threading is not a magical salve for performance increases; the majority of workloads do not scale to an increased number of threads. All that adding more and more cores to the CPU is going to do is clamp down on clock speeds and maximum OC, resulting in lower performance and higher cost to consumers.
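
    To make that failure mode concrete, here's a minimal sketch (nothing to do with the actual project, and deliberately exaggerated): eight threads that serialize on a single lock, which on most machines finish slower than the plain single-threaded loop.

    // Minimal sketch (C++14): threads that serialize on one mutex do no useful
    // parallel work, so the threaded version can end up slower than the plain loop.
    #include <chrono>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    int main() {
        constexpr long kIterations = 10'000'000;
        volatile long counter = 0;                       // volatile only so the optimizer keeps both loops
        std::mutex m;

        auto t0 = std::chrono::steady_clock::now();
        for (long i = 0; i < kIterations; ++i) ++counter;    // single-threaded baseline
        auto t1 = std::chrono::steady_clock::now();

        counter = 0;
        std::vector<std::thread> workers;
        for (int t = 0; t < 8; ++t) {
            workers.emplace_back([&] {
                for (long i = 0; i < kIterations / 8; ++i) {
                    std::lock_guard<std::mutex> lock(m);     // every step waits on the same lock
                    ++counter;
                }
            });
        }
        for (auto& w : workers) w.join();
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::cout << "single thread:    " << std::chrono::duration_cast<ms>(t1 - t0).count() << " ms\n"
                  << "8 locked threads: " << std::chrono::duration_cast<ms>(t2 - t1).count() << " ms\n";
    }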
     
  2. M76

    M76 [H]ardness Supreme

    Messages:
    6,923
    Joined:
    Jun 12, 2012
    There are usages that do benefit from many threads. Or are you suggesting that all workloads are the same?
     
  3. Dick Johnson

    Dick Johnson n00bie

    Messages:
    41
    Joined:
    Sep 27, 2016
    Then get a mb with 2 sockets.

    128 cores FTW!
     
  4. Araxie

    Araxie [H]ardness Supreme

    Messages:
    6,070
    Joined:
    Feb 11, 2013
    there are "things" that simply can't be "splitted up" into several command threads, there's a difference between CPU processing threads and command execution threads which are the amount of parallelization a code can run at the same time without losing the logic, and then as mkrohn said above it will even run slower in many scenarios, in games there aren't many things that can be parallelized before running into IPC bottleneck, game logic, physics calculation, shadow logic, map logic, scenario positioning and even I/O operations (mouse and keyboard behaviors); let's say you can develop a game code to run EACH of those activities on one thread, there will be a point where more threads won't do anything to improve the game engine performance, you can't split shadow logic without adding tons of bugs or glitches, same with rendering pattern send/receive to GPU or worse in I/O, you can't split keyboard command lines or you may end with lot of input latency as an extra command thread will be needed to merge all the process that were splitted before into a single one in this scenario faster IPC will work always better than splitting that activitiy, the same can be said about other programs and applicationss..

    More threads work great for a lot of things, but more threads doesn't always mean better. More IPC, on the other hand, is always better. That has nothing to do with lazy coding; it's the nature of computation.
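
    The diminishing returns being described here are basically Amdahl's law. A back-of-the-envelope sketch (assuming, purely for illustration, that 70% of the work can run in parallel) shows the speedup barely passing 3x even with all 32 cores:

    // Amdahl's law sketch: speedup = 1 / ((1 - p) + p / n).
    // The 70% parallel fraction is an assumption for illustration only.
    #include <cstdio>

    int main() {
        const double p = 0.70;                           // hypothetical parallel fraction
        for (int n : {1, 2, 4, 8, 16, 32}) {
            double speedup = 1.0 / ((1.0 - p) + p / n);
            std::printf("%2d threads -> %.2fx\n", n, speedup);
        }
    }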
     
    drescherjm likes this.
  5. Araxie

    Araxie [H]ardness Supreme

    Messages:
    6,070
    Joined:
    Feb 11, 2013
    To make it short, "are all workloads the same?" Yes.

    Looking at a single program, you don't treat the code as a "workload"; usages and scenarios are transparent to the code. Modern programming languages don't care what kind of load they're put under, they just follow the language's instructions. More threads let more programs and applications run at the same time, but they don't make a single workload any faster; the only thing that runs a given piece of code faster is IPC, which is gamerk2's point. In the real world you can run a non-threaded program faster and more efficiently than the same program made fully thread-aware. There are tons of synthetic benchmarks built specifically to run a single task on as many threads as possible (Cinebench, for example) that scale with both thread count and per-thread IPC, but it doesn't go beyond that: a benchmark application. In real-world scenarios we live in a world where professional licenses for a lot of programs and tools are sold PER core, which is honestly pretty sad, so in those scenarios IPC and core speed will always win over anything else.
     
    Last edited: Jul 11, 2018 at 12:40 PM
  6. M76

    M76 [H]ardness Supreme

    Messages:
    6,923
    Joined:
    Jun 12, 2012
    No application uses per-core licensing anymore. At least not in a desktop environment that I know of. Everything is per-seat licensed. You can't make a single unit of work faster, but you can code your application to break the single workload into multiple smaller ones. Also, there are tasks which consist of thousands of individual chunks. Both are easy to optimize for a great number of cores.
    I don't know what you base this on, but it is simply not true. If the task is computation-intensive it is always beneficial to break it up into multiple threads. The only way breaking it up won't benefit you is if the bottleneck is IO performance rather than CPU time, but in that case the task won't run faster on better IPC either.

    Unless you mean a 1-core CPU that can do the same number of operations as another CPU does across 32 cores, in which case of course the single-threaded one would be faster. But we simply don't have a CPU like that, so it is completely theoretical, and meaningless in the real world.
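
    For the case that does break up cleanly, a rough sketch of the "thousands of individual chunks" situation (the per-chunk work below is just a placeholder): each worker gets its own non-overlapping range, so no thread ever waits on another.

    // Sketch of an embarrassingly divisible workload: the data is split into
    // disjoint ranges and each hardware thread processes its own range.
    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    void process_chunk(std::vector<double>& data, std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            data[i] = data[i] * data[i];                 // placeholder work; no shared state
    }

    int main() {
        std::vector<double> data(1'000'000, 1.5);
        unsigned n = std::max(1u, std::thread::hardware_concurrency());

        std::vector<std::thread> pool;
        std::size_t chunk = data.size() / n;
        for (unsigned t = 0; t < n; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end = (t + 1 == n) ? data.size() : begin + chunk;
            pool.emplace_back(process_chunk, std::ref(data), begin, end);
        }
        for (auto& th : pool) th.join();                 // ranges never overlap, so no locks needed
    }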
     
    Last edited: Jul 11, 2018 at 1:00 PM
  7. gamerk2

    gamerk2 [H]ard|Gawd

    Messages:
    1,173
    Joined:
    Jul 9, 2012
    Of course there are usages. But the overwhelming majority of tasks do not lend themselves to be broken up into smaller units.

    Hell, this is why GPUs were created in the first place.
     
  8. gamerk2

    gamerk2 [H]ard|Gawd

    Messages:
    1,173
    Joined:
    Jul 9, 2012
    Not necessarily true. Remember that you, the software engineer, have ZERO control over thread scheduling. Do not assume your application's threads have 100% uptime and will never be bumped for another application's thread. Never assume you are the only program running on the system.

    There's a really basic rule for threading: if you break a task up into smaller threads, is there a point where one thread will be forced to wait on another? If the answer is "yes", then you should seriously consider why you are threading in the first place. Threads should be independent units of work; if two threads need to touch, then you've likely threaded too much and will choke performance.
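
    A bare-bones illustration of the "forced to wait" case (the two stages and their trivial work are made up for the example): stage B can't start until stage A publishes its result, so two threads still do strictly serial work, just with synchronization overhead added on top.

    // Two threads, zero parallelism: B blocks on a condition variable until A
    // is finished, so the elapsed time is A + B plus the cost of the handoff.
    #include <condition_variable>
    #include <mutex>
    #include <thread>

    int main() {
        std::mutex m;
        std::condition_variable cv;
        bool a_done = false;
        int a_result = 0;

        std::thread stage_a([&] {
            int value = 42;                              // stand-in for stage A's real work
            {
                std::lock_guard<std::mutex> lock(m);
                a_result = value;
                a_done = true;
            }
            cv.notify_one();
        });

        std::thread stage_b([&] {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return a_done; });       // B just sits here while A runs
            int b_result = a_result * 2;                 // stand-in for stage B's real work
            (void)b_result;
        });

        stage_a.join();
        stage_b.join();
    }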
     
  9. M76

    M76 [H]ardness Supreme

    Messages:
    6,923
    Joined:
    Jun 12, 2012
    In my application there are three types of threads: one thread responsible for assigning work to the worker threads and doing all the IO, one thread updating progress and driving the UI, and a number of threads doing the work, whose count depends on how many resources the CPU has and how slow or fast the IO process is. It works pretty well by breaking the workload up into smaller chunks that it assigns to the workers. I could optimize it further but I just can't be bothered, because it is already much faster than the single-threaded version of the same task was. Yes, sometimes some threads have to wait their turn, but the efficiency of threading is still close to linear, meaning two threads do almost twice the work of one. Of course the more threads there are, the smaller the benefit, but that is because IO becomes a chokepoint after about 20 threads: I can't feed enough data to the workers, so they just wait while data loads from the disk. At that point more threads really offer no benefit.
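
    For reference, a rough sketch of that shape (the queue, names, and fake workload are illustrative, not M76's actual code): one IO/dispatcher thread feeds a queue, the workers drain it, and the shared counter is what a progress/UI thread would poll.

    // One IO thread produces work items, N workers consume them; `done` is the
    // progress counter a UI thread could read without blocking anyone.
    #include <algorithm>
    #include <atomic>
    #include <condition_variable>
    #include <mutex>
    #include <optional>
    #include <queue>
    #include <string>
    #include <thread>
    #include <vector>

    struct WorkQueue {
        std::mutex m;
        std::condition_variable cv;
        std::queue<std::string> items;
        bool closed = false;

        void push(std::string item) {
            { std::lock_guard<std::mutex> lock(m); items.push(std::move(item)); }
            cv.notify_one();
        }
        void close() {
            { std::lock_guard<std::mutex> lock(m); closed = true; }
            cv.notify_all();
        }
        std::optional<std::string> pop() {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return closed || !items.empty(); });
            if (items.empty()) return std::nullopt;      // closed and fully drained
            std::string item = std::move(items.front());
            items.pop();
            return item;
        }
    };

    int main() {
        WorkQueue queue;
        std::atomic<int> done{0};

        // Dispatcher/IO thread: the only one touching the disk.
        std::thread io([&] {
            for (int i = 0; i < 100; ++i)
                queue.push("chunk_" + std::to_string(i)); // stand-in for loading data
            queue.close();
        });

        // Worker pool sized to the machine; workers never do IO themselves.
        unsigned n = std::max(2u, std::thread::hardware_concurrency());
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < n; ++t)
            workers.emplace_back([&] {
                while (auto item = queue.pop()) {
                    // ... process *item here ...
                    ++done;
                }
            });

        io.join();
        for (auto& w : workers) w.join();
    }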
     
    Last edited: Jul 11, 2018 at 3:23 PM
  10. gamerk2

    gamerk2 [H]ard|Gawd

    Messages:
    1,173
    Joined:
    Jul 9, 2012
    And if the workload can be broken up, then that's the right way to do it. But most tasks don't break up that way; you have a clear order of things that need to get done, and the individual units of computation are generally small enough that you can't break them up. That's why most games still have one "master" thread that manages everything; that's the one driving CPU 0 to 60% usage.

    My other point is that when other applications are running, the extra threads needing to be serviced mean a more heavily threaded program is more likely to have its threads blocked from executing, simply because of the sheer number of them that need to run, so it ends up slower. You know all those posts by people wondering why their game's performance tanks when they use additional programs (FRAPS, streaming, and so on), despite the fact they run fine by themselves? This is why.

    There's also the long-term problem that artificial "you require this many cores to run this game" requirements are going to make an entire generation of games unplayable in a decade or two, when single-core quantum CPUs that are orders of magnitude faster than what we have today become the norm. Don't laugh; this happened when the first Core 2s came out, because CPUs weren't reaching the 3GHz clock speed requirement that some titles were enforcing in their installers. This is why I never enforce a core count for any program I write, even the ones that make use of many cores on modern systems.

    And yes, 20 threads is about the point where IO bottlenecks cause additional threads to reduce performance; this has been known since the 70's when MIT and other universities studied the subject. Granted, this was with multiple physical CPUs, but the underlying problems are the same. Anything that scales beyond that many cores should probably be run on GPUs anyways, since the scaling is better.
     
  11. Spaceninja

    Spaceninja [H]ard|Gawd

    Messages:
    1,664
    Joined:
    Sep 15, 2004
  12. M76

    M76 [H]ardness Supreme

    Messages:
    6,923
    Joined:
    Jun 12, 2012
    Games are unique in that they need everything to be synced. That's one scenario where more threads are hard to use and yield little benefit. That doesn't mean I won't take 32 cores if they offer it to me for a number of other uses.

    That's a problem with task scheduling and core assignments. Intel has an app that is supposed to help with that but I haven't checked if it yields any noticeable benefits when used. But again that is only a problem for gaming where latency is an issue. In uses like rendering, and processing of raw data, it doesn't matter when one or two threads get delayed indefinitely. They'll catch up whenever they catch up. Meanwhile a ton of work is getting done.
    I've never encountered or even heard about that problem, and I lived through that era as a gamer. They used to calculate clock speed on a per-core basis, which also led to a common misconception of the time that I was trying to fight tooth and nail:
    people assumed that a 2-core 1.5GHz CPU was equal to a 1-core 3GHz one in every task.

    20 threads is not set in stone; with a faster drive it could probably run more. Just because the data the work is performed on can be broken up into smaller pieces doesn't necessarily mean it can be done on a GPU. I'd need to look into that, and I'm not keen on getting involved with GPU computing. And why would I, if IO is the bottleneck already?
     
  13. gamerk2

    gamerk2 [H]ard|Gawd

    Messages:
    1,173
    Joined:
    Jul 9, 2012
    Games don't need to be perfectly synced; there are titles that run their graphics and physics engines at different rates. Granted, this can cause unexpected problems as you diverge from 60 FPS; Dead Space has a bug where if you get slightly more than 60 FPS, a key in-game cut-scene won't trigger properly, leaving the player unable to progress.

    In most cases though, games tend to use two threads for the majority of their work. The first is the main executable's thread, which contains the main game loop that runs each part of the game engine and processes all the data, generally sequentially. The second is the main rendering thread, which handles the communication between the game and the GPU. In newer APIs the render thread can more easily be broken up, since more than one thread can physically perform rendering. But the point is that while games are using 80+ threads (and have been for a good decade now), only a handful do any meaningful amount of work.
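
    A very stripped-down sketch of that two-thread split (all names here are illustrative): the main loop still runs its steps in a fixed order, and the render thread just consumes whatever frame state was last published.

    // Main/game thread updates the simulation sequentially and publishes a
    // snapshot; the render thread copies the latest snapshot and talks to the GPU.
    #include <atomic>
    #include <chrono>
    #include <mutex>
    #include <thread>

    struct FrameState { /* positions, animation state, draw lists, ... */ int tick = 0; };

    int main() {
        std::mutex frame_mutex;
        FrameState shared_frame;
        std::atomic<bool> running{true};

        std::thread render([&] {
            while (running) {
                FrameState frame;
                { std::lock_guard<std::mutex> lock(frame_mutex); frame = shared_frame; }
                // submit_to_gpu(frame);                 // stand-in for the real draw submission
                std::this_thread::sleep_for(std::chrono::milliseconds(16));
            }
        });

        for (int tick = 0; tick < 600; ++tick) {
            // read_input(); update_game_logic(); step_physics(); update_audio();
            std::lock_guard<std::mutex> lock(frame_mutex);
            shared_frame.tick = tick;                    // publish results for the renderer
        }

        running = false;
        render.join();
    }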

    Intel's solution has problems. Thread scheduling is the domain of the OS and the OS alone; any attempt to override what the OS is doing is going to cause problems in one circumstance or another.

    In the grand scheme of things, Windows' thread scheduler isn't bad. At its simplest level, it's purely priority based: whatever threads have the highest priority at any instant are the threads that get executed. Threads that are running get priority decrements; those that are waiting get priority bumps. Kernel threads get higher priority than user threads. This typically ensures most threads don't wait for excessive periods of time, but it leads to odd performance quirks. For example, this is why threads can jump cores: they can be bumped due to priority, then re-assigned to a completely different core later on. You can in theory lock threads to a specific core (Linux supports this), but that can have unintended consequences, especially if a higher-priority thread takes the one core you were previously running on.
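
    For reference, pinning a thread to a specific core on Linux looks roughly like this (pthread_setaffinity_np); the caveat above applies, since once pinned the scheduler can no longer move the thread off a busy core.

    // Linux-only sketch: restrict the calling thread to core 0.
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <pthread.h>
    #include <sched.h>
    #include <cstdio>

    int main() {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);                                // allow only core 0

        int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        if (rc != 0)
            std::fprintf(stderr, "pinning failed: %d\n", rc);
        return 0;
    }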

    My rule here is simple: I leave thread scheduling up to the OS. I NEVER micromanage in this area.

    Most games caught on very fast and did a quick math calculation rather than checking for raw clock speed (probably due to AMD's significantly lower clocks compared to Intel at the time), but there were a handful of titles that checked the CPU clock prior to install. I've got a handful of titles that fall into this category.

    GPUs are basically fully programmable, embarrassingly parallel floating-point co-processors. You've got core counts numbering in the thousands; any task that scales beyond a handful of cores is going to scale better on GPUs than on CPUs. Why do you think massively parallel workloads like AI are being run on GPU-like architectures? Because it's a far better architecture for scaling simple calculations across thousands of units.
     
  14. Mr. Bluntman

    Mr. Bluntman [H]ardness Supreme

    Messages:
    6,339
    Joined:
    Jun 25, 2007
    So, 34 cores on-die/package with one disabled? :ROFLMAO:
     
    gigaxtreme1 likes this.
  15. gigaxtreme1

    gigaxtreme1 2[H]4U

    Messages:
    3,283
    Joined:
    Oct 1, 2002
  16. Mr. Bluntman

    Mr. Bluntman [H]ardness Supreme

    Messages:
    6,339
    Joined:
    Jun 25, 2007
    With a name like Starship it had better have an Interstellar implementation with Out of This World performance. Seriously, if it doesn't make me readjust my Spaceballs because I just went to plaid, it's going to get thrown out into The Expanse.

    Wait, what are we talking about here? lol
     
  17. mkrohn

    mkrohn 2[H]4U

    Messages:
    2,186
    Joined:
    Apr 30, 2012
    I also said everything that threads nicely should just be moved to something else, like GPUs. Most of what does thread very well is gaming and video encoding and other cases that aren't typical. Intel, specifically in its mobile chips, is working some voodoo where the base clocks are very low but, when needed, one or two cores can run at something like 4x the base clock. This is the right direction. I'd like to see something more like what they do in cars, where the big V8 can shut half of itself down and basically become a 4-cylinder when the rest of the power isn't needed.
     
  18. gigaxtreme1

    gigaxtreme1 2[H]4U

    Messages:
    3,283
    Joined:
    Oct 1, 2002
    Starship is based on Zen2. There was some talk of increasing core count per CCX.
     
  19. M76

    M76 [H]ardness Supreme

    Messages:
    6,923
    Joined:
    Jun 12, 2012
    Yes, there were some games that indicated that by showing a failure in the InstallShield wizard. But there were also games that showed you had -32MB of RAM. I don't remember any that refused to run, though. Installers detecting newer hardware incorrectly was common, and not just related to CPU speed; that's why I don't think we should base HW development around that issue.

    I know, I'm not saying they are not efficient at what they do. I just can't be bothered when I already hit a wall with IO. I do a lot of photogrammetry, and most software only uses GPUs for a few simple tasks; most calculations are still done on the CPU. I doubt the devs of $10,000-category software simply couldn't be bothered with the GPU either; I assume they already do what they can on it.
     
  20. Rockenrooster

    Rockenrooster n00bie

    Messages:
    36
    Joined:
    Apr 11, 2017
    I'll take a 32-core 4GHz CPU over any 5-5.5GHz quad core any day. I do more on my PC than just game 100% of the time. Devs really need to optimize better rather than just counting on clock speed or cores to fix it.

    Look at ARK: Survival Evolved; it runs like utter crap even on a 5+GHz i7 and a 1080 Ti. Then look at Warframe: custom-built engine (DX11), looks better (in my opinion), has better physics all around, can also handle large maps, and I can run it maxed out at 60+ FPS on a GTX 850M and a 2.5GHz i7 quad.
    Yet ARK runs like a slideshow on medium settings (Okay, like 25 FPS) on the same machine.

    Software seems like it is getting worse as time goes on. Word 2003 takes like 1ms to open (not really, but you get the point) and Word 2016 takes waaaaay more time than that on the same machine. Programs are getting more bloated with useless crap and runtimes out the wazoo.
    You shouldn't need a quad core, an SSD, and 8GB of RAM just to run Outlook at an acceptable speed.

    If only we could just program everything in assembly, lol.
     
    {NG}Fidel likes this.
  21. tayunz

    tayunz [H]ard|Gawd

    Messages:
    1,079
    Joined:
    Sep 27, 2004
    I really think Zen 2 has 6-core CCXs, which means topping out at 48 cores for a 4-die part. Seems like the rumblings point to Zen 2 EPYC topping out at 48 cores.

    I really don't know anything though, just a guess. Feels like a Ryzen 2 3700/3800X might be a 12-core part as a result. But who knows.