after 32 cores?

M76 · Jul 11, 2018

gamerk2 said:
Pretty much this.

Case in point: There was I project I was working on that got ported to a "modern" (post-80's) processor. The software continued to run horribly. A massive effort was undertaken to make the program more threaded, which ended up costing the taxpayers (you) mid-7 figures.

Performance dropped 50%. We found the software kept grinding to a halt due to threads constantly waiting on eachother.

After that debacle, an actual analysis of the code was done. One minor code change to the original (pre-threaded) code base resulted in a 500% increase in performance.

Threading is not a magical salve for performance increases; the majority of workloads do not scale to an increased number of threads. All that adding more and more cores to the CPU is going to do is clamp down on clock speeds and maximum OC, resulting in lower performance and higher cost to consumers.

There are usages that do benefit from many threads. Or are you suggesting that all workloads are the same?

Dick Johnson · Jul 11, 2018

N4CR said:
Wait for 64 Core 7nm Epyc 2 it's out in '19

Then get a mb with 2 sockets.

128 cores FTW!

Araxie · Jul 11, 2018

sirmonkey1985 said:
no we need more programs/games to better use the threads available to them, clock speed means less and less once an application can scale across multiple cores, we need to stop supporting the half ass lazy coding that is still going on because "oh no people in Brazil or Russia are still playing games on a Pentium 3's so lets screw over everyone to make it fair for them".

there are "things" that simply can't be "splitted up" into several command threads, there's a difference between CPU processing threads and command execution threads which are the amount of parallelization a code can run at the same time without losing the logic, and then as mkrohn said above it will even run slower in many scenarios, in games there aren't many things that can be parallelized before running into IPC bottleneck, game logic, physics calculation, shadow logic, map logic, scenario positioning and even I/O operations (mouse and keyboard behaviors); let's say you can develop a game code to run EACH of those activities on one thread, there will be a point where more threads won't do anything to improve the game engine performance, you can't split shadow logic without adding tons of bugs or glitches, same with rendering pattern send/receive to GPU or worse in I/O, you can't split keyboard command lines or you may end with lot of input latency as an extra command thread will be needed to merge all the process that were splitted before into a single one in this scenario faster IPC will work always better than splitting that activitiy, the same can be said about other programs and applicationss..

more threads will always work excellent for lot of things, but not always more threads = better. but more IPC it's always better, that doesn't mean anything close to lazy coding, it's the nature of computation.

Araxie · Jul 11, 2018

M76 said:
There are usages that do benefit from many threads. Or are you suggesting that all workloads are the same?

to make it short "all workloads are the same?" yes..

Speaking as a single program you don't treat a programming code as workload, all usages and scenarios are transparent to the programming code, all modern programming language doesn't care about what kind of load they are put on, it just need to follow the specific language instructions, more threads allow more programs and applications to be run at the same time, but not make a "workload" anytime faster, the only thing able to run a code faster is IPC which it's the point of gamerk2. you can run a non-threaded program faster and more efficient than the same program fully thread aware at least in real world. there a tons of synthetic benchmarks made specifically to run on as much threads as possible doing a single task, (cinebench as example) and will scale both in thread account as thread IPC, but it don't go beyond that, a benchmark application, in real world scenarios we are in a world where professional licenses for a lot of programs and tools are sold PER core, which it's actually pretty much sad, so in that kind of scenarios IPC and core speed will always win over anything else.

M76 · Jul 11, 2018

Araxie said:
to make it short "all workloads are the same?" yes..

Speaking as a single program you don't treat a programming code as workload, all usages and scenarios are transparent to the programming code, all modern programming language doesn't care about what kind of load they are put on, it just need to follow the specific language instructions, more threads allow more programs and applications to be run at the same time, but not make a "workload" anytime faster, the only thing able to run a code faster is IPC which it's the point of gamerk2. you can run a non-threaded program faster and more efficient than the same program fully thread aware at least in real world. there a tons of synthetic benchmarks made specifically to run on as much threads as possible doing a single task, (cinebench as example) and will scale both in thread account as thread IPC, but it don't go beyond that, a benchmark application, in real world scenarios we are in a world where professional licenses for a lot of programs and tools are sold PER core, which it's actually pretty much sad, so in that kind of scenarios IPC and core speed will always win over anything else.

No application uses per core licensing anymore. At least not in a desktop environment that I know of. Everything is per seat licensed. You can't make a single workload faster, but you can code your application to break the single workload into multiple smaller ones. Also there are tasks which consist of thousands of individual chunks. Both are easy to optimize for a great number of cores.

you can run a non-threaded program faster and more efficient than the same program fully thread aware at least in real world.

I don't know what do you base this on, but it is simply not true. If the task is computation intensive it is always beneficial to break it up into multiple threads. The only way breaking it up won't benefit you is if the bottleneck is io performance, not cpu time, but in that case the task won't be faster running on better ipc either.

Unless you mean that a 1 core cpu that can do the same number of opreations as the other cpu can on 32 cores, then of course the single threaded will be faster. But we simply don't have a cpu like that. So it is completely theoretical, and meaningless in the real world.

gamerk2 · Jul 11, 2018

M76 said:
There are usages that do benefit from many threads. Or are you suggesting that all workloads are the same?

Of course there are usages. But the overwhelming majority of tasks do not lend themselves to be broken up into smaller units.

Hell, this is why GPUs were created in the first place.

gamerk2 · Jul 11, 2018

M76 said:
I don't know what do you base this on, but it is simply not true. If the task is computation intensive it is always beneficial to break it up into multiple threads. The only way breaking it up won't benefit you is if the bottleneck is io performance, not cpu time, but in that case the task won't be faster running on better ipc either.

Not necessarily true. Remember that you, the software engineer, have ZERO control over thread scheduling. Do not assume your applications threads have 100% uptime, and will never be bumped for another applications thread. Never make the assumption you are the only program running on a system.

There's a really basic rule for threading: Is there a point where if you break up a task into smaller threads, where one thread will be forced to wait on another? If the answer is "yes", then you should seriously consider why you are threading in the first place. Threads should be independent units of work; if two threads need to touch, then you've likely threaded too much and will choke performance.

M76 · Jul 11, 2018

gamerk2 said:
Not necessarily true. Remember that you, the software engineer, have ZERO control over thread scheduling. Do not assume your applications threads have 100% uptime, and will never be bumped for another applications thread. Never make the assumption you are the only program running on a system.

There's a really basic rule for threading: Is there a point where if you break up a task into smaller threads, where one thread will be forced to wait on another? If the answer is "yes", then you should seriously consider why you are threading in the first place. Threads should be independent units of work; if two threads need to touch, then you've likely threaded too much and will choke performance.

In my application there are 3 types of threads. One thread responsible for assigning work to work threads and doing all the IO, one thread updating progress and driving the ui. And a number of threads doing the work whose number is dependent on how much resources the cpu has, and how slow or fast is the io process. It works pretty well by breaking up the workload into smaller chunks that it assigns to the workers. I could optimize it further but I just can't be bothered because it is already much faster than the single threaded version was of the same task. Yes sometimes some threads have to wait their turn, but the efficiency of threading is still close to linear. Meaning two threads will do almost twice the work compared to one thread. Of course the more threads there are the less the benefit is, but that is because IO becomes a chokepoint after about 20 threads. Meaning I Can't feed enough data to the workers, so they'll just wait while the data loads from the disk. At which point more threads really offer no more benefit.

gamerk2 · Jul 11, 2018

M76 said:
In my application there are 3 types of threads. One thread responsible for assigning work to work threads and doing all the IO, one thread updating progress and driving the ui. And a number of threads doing the work whose number is dependent on how much resources the cpu has, and how slow or fast is the io process. It works pretty well by breaking up the workload into smaller chunks that it assigns to the workers. I could optimize it further but I just can't be bothered because it is already much faster than the single threaded version was of the same task. Yes sometimes some threads have to wait their turn, but the efficiency of threading is still close to linear. Meaning two threads will do almost twice the work compared to one thread. Of course the more threads there are the less the benefit is, but that is because IO becomes a chokepoint after about 20 threads. Meaning I Can't feed enough data to the workers, so they'll just wait while the data loads from the disk. At which point more threads really offer no more benefit.

And if the workload can be broken up then that's the right way to do it. But most tasks don't break up that way; you have a clear order of things that need to get done, and the individual units of computation are generally small enough where you can't break them up. That's why most games still have on "master" thread that manages everything; that's the one driving CPU 0 to 60% usage.

My other point is that when other applications are run, you run into the problem where the additional threads needing to be serviced results in a situation where a more heavily threaded program runs slower due to a higher likelihood of threads getting blocked from executing due to the sheer amount of them needing to be run. You know all those posts by people wondering why their games performance tanks when they use additional programs (FRAPS, streaming, and so on), despite the fact they run fine by themselves? This is why.

There's also the long term problem that artificial "you require this many cores to run this game" requirements are going to make an entire generation of games unplayable in a decade or two, when single core quantum CPUs that are orders of magnitude faster then what we have today become the norm. Don't laugh; this happened when the first Core 2's came out, because CPUs weren't reaching the 3GHz clock speed requirement that some titles were enforcing in their installers. This is why I never enforce any amount of cores for any program I write, even the ones that make use of many cores on modern systems.

And yes, 20 threads is about the point where IO bottlenecks cause additional threads to reduce performance; this has been known since the 70's when MIT and other universities studied the subject. Granted, this was with multiple physical CPUs, but the underlying problems are the same. Anything that scales beyond that many cores should probably be run on GPUs anyways, since the scaling is better.

Spaceninja · Jul 11, 2018

Neural-net CPU's......

M76 · Jul 12, 2018

gamerk2 said:
And if the workload can be broken up then that's the right way to do it. But most tasks don't break up that way; you have a clear order of things that need to get done, and the individual units of computation are generally small enough where you can't break them up. That's why most games still have on "master" thread that manages everything; that's the one driving CPU 0 to 60% usage.

Games are unique in that they need everything to be synced. That's one scenario where more threads are hard to use and yield little benefit. That doesn't mean I won't take 32 cores if they offer it to me for a number of other uses.

My other point is that when other applications are run, you run into the problem where the additional threads needing to be serviced results in a situation where a more heavily threaded program runs slower due to a higher likelihood of threads getting blocked from executing due to the sheer amount of them needing to be run. You know all those posts by people wondering why their games performance tanks when they use additional programs (FRAPS, streaming, and so on), despite the fact they run fine by themselves? This is why.

That's a problem with task scheduling and core assignments. Intel has an app that is supposed to help with that but I haven't checked if it yields any noticeable benefits when used. But again that is only a problem for gaming where latency is an issue. In uses like rendering, and processing of raw data, it doesn't matter when one or two threads get delayed indefinitely. They'll catch up whenever they catch up. Meanwhile a ton of work is getting done.

There's also the long term problem that artificial "you require this many cores to run this game" requirements are going to make an entire generation of games unplayable in a decade or two, when single core quantum CPUs that are orders of magnitude faster then what we have today become the norm. Don't laugh; this happened when the first Core 2's came out, because CPUs weren't reaching the 3GHz clock speed requirement that some titles were enforcing in their installers. This is why I never enforce any amount of cores for any program I write, even the ones that make use of many cores on modern systems.

I've never encountered or even heard about that problem, and I lived trough that era as a gamer. They used to calculate clock speed on a per core basis. Which also lead to a common misconception of the time that I was trying to fight tooth and nails.
People assumed that a 2 core 1.5GHz cpu was equal to a 1 core 3 GHz one in every task.

And yes, 20 threads is about the point where IO bottlenecks cause additional threads to reduce performance; this has been known since the 70's when MIT and other universities studied the subject. Granted, this was with multiple physical CPUs, but the underlying problems are the same. Anything that scales beyond that many cores should probably be run on GPUs anyways, since the scaling is better.

20 threads is not set in stone, with a faster drive it could probably run more threads. Just because the data the work needs to be performed on can be broken up into smaller pieces doesn't necessarily mean it can be done on a GPU. I'd need to look into that and I'm not keen on getting involved with gpu computing. And why would I if io is the bottleneck already?

gamerk2 · Jul 12, 2018

M76 said:
Games are unique in that they need everything to be synced. That's one scenario where more threads are hard to use and yield little benefit. That doesn't mean I won't take 32 cores if they offer it to me for a number of other uses.

Games don't need to be perfectly synced; there are titles that run it's graphics and physics engines at different rates. Granted, this can cause unexpected problems as you diverge from 60 FPS; Dead Space has a bug where if you get slightly more then 60 FPS a key in-game cut-scene won't happen properly, leading to the user not being able to progress.

In most cases though, games tend to use two threads for the majority of their work. The first thread is the main executable, which contains the main game loop that runs each part of the game engine and processes all the data, generally sequentially. The second is the main rendering thread, which handles the communication between the game and the GPU. In newer APIs the render thread can more easily be broken up, since more then one thread can physically perform rendering. But point being, while games are using 80+ threads (and have been for a good decade now), only a handful do any meaningful amount of work.

That's a problem with task scheduling and core assignments. Intel has an app that is supposed to help with that but I haven't checked if it yields any noticeable benefits when used. But again that is only a problem for gaming where latency is an issue. In uses like rendering, and processing of raw data, it doesn't matter when one or two threads get delayed indefinitely. They'll catch up whenever they catch up. Meanwhile a ton of work is getting done.

Intels solution has problems. Thread scheduling is the domain of the OS and the OS alone; any attempt to override what the OS is doing is going to cause problems in one circumstance or another.

In the grand scheme of things, Windows thread scheduler isn't bad. At it's simplest level, its purely priority based. Whatever thread(s) have the highest priority at any instance are the thread(s) that are executed. Threads that are running get priority decrements; those that are waiting get priority bumps. Kernel threads get higher priority then user threads. This typically ensures most threads don't wait for excessive periods of time, but leads to odd performance issues. For example: This is why threads can jump cores, since they can be bumped due to priority then re-assigned to a completely different core later on. You can in theory lock threads to a specific core (Linux does this), but this can cause unintended consequences, especially if a higher priority thread takes the one core you were previously running on.

My rule here is simple: I leave thread scheduling up to the OS. I NEVER micromanage in this area.

I've never encountered or even heard about that problem, and I lived trough that era as a gamer. They used to calculate clock speed on a per core basis. Which also lead to a common misconception of the time that I was trying to fight tooth and nails.
People assumed that a 2 core 1.5GHz cpu was equal to a 1 core 3 GHz one in every task.

Most games caught on very fest and did a quick math calculation rather then checking for raw clockspeed (probably due to AMDs significantly lower clocks compared to Intel at the time), but there were a handful of titles that checked for CPU clock prior to install. I've got a handful of titles that fall into this category.

20 threads is not set in stone, with a faster drive it could probably run more threads. Just because the data the work needs to be performed on can be broken up into smaller pieces doesn't necessarily mean it can be done on a GPU. I'd need to look into that and I'm not keen on getting involved with gpu computing. And why would I if io is the bottleneck already?

GPUs are basically fully programmable embarrassingly parallel floating point co-processors. You've got core counts numbering in the thousands; any task that scales beyond a handful of cores is going to scale better on GPUs then CPUs. Why do you think massively parallel operations like AI are being done on GPU like architectures? Because it's a far better architecture to scale simple calculations thousands of times.

gigaxtreme1 · Jul 12, 2018

Wait for "Starship".

Mr. Bluntman · Jul 12, 2018

gigaxtreme1 said:
Wait for "Starship".

With a name like Starship it had better have an Interstellar implementation with Out of This World performance. Seriously, if it doesn't make me readjust my Spaceballs because I just went to plaid, it's going to get thrown out into The Expanse.

Wait, what are we talking about here? lol

mkrohn · Jul 12, 2018

M76 said:
There are usages that do benefit from many threads. Or are you suggesting that all workloads are the same?

I also said everything that nicely threads should just be moved to something else like GPU's. Most of what does thread very well is gaming and video encoding and other cases that aren't typical. Intel specifically in the mobile chips is working some voodoo where the base clocks are very low but when needed 1 or two cores can be like 4x the base clock. This is the right direction. I'd like to see something more like what they do in cars where the big v8 can completely shut down and basically become a 4 cylinder when the rest of the power isn't needed.

gigaxtreme1 · Jul 12, 2018

Starship is based on Zen2. There was some talk of increasing core count per CCX.

M76 · Jul 12, 2018

gamerk2 said:
Most games caught on very fest and did a quick math calculation rather then checking for raw clockspeed (probably due to AMDs significantly lower clocks compared to Intel at the time), but there were a handful of titles that checked for CPU clock prior to install. I've got a handful of titles that fall into this category.

Yes there were some games that indicated that by showing a fail in the installshield wizard. But there were also games that showed you had -32MB of ram. But I don't remember any that refused to run. Installers detecting newer hardware incorrectly was common, and not just related to CPU speed, that's why I don't think we should base HW development around that issue.

GPUs are basically fully programmable embarrassingly parallel floating point co-processors. You've got core counts numbering in the thousands; any task that scales beyond a handful of cores is going to scale better on GPUs then CPUs. Why do you think massively parallel operations like AI are being done on GPU like architectures? Because it's a far better architecture to scale simple calculations thousands of times.

I know, I'm not saying they are not efficient at what they do. I just can't be bothered, when I already hit a wall with IO. I do a lot of photogrammetry, and most software only uses GPUs for a few simple tasks, most calculations are still done on the CPU. I doubt devs of $10.000 category software couldn't be bothered with the GPU either. I assume they do what they can on the GPU.

Rockenrooster · Jul 12, 2018

I'll take a 32 Core 4Ghz CPU over any 5-5.5 GHz quad core any day. I do more on my PC than just game 100% of the time. Devs really need to optimize better than just counting on clockspeed or cores to fix it.

Look at ARK Survival Evolved, it runs like utter crap even on a 5+ghz i7 and a 1080Ti. Then look at Warframe, Custom built engine (DX11), looks better (In my opinion), has better physics all around, can also handle large maps, and I can run it maxed out 60+FPS on a GTX 850m and a 2.5GHz i7 quad.
Yet ARK runs like a slideshow on medium settings (Okay, like 25 FPS) on the same machine.

Software seems like it is getting worse as time goes on. Word 2003 takes like 1ms to open (Not really, but you get the point) and word 2016 takes waaaaay more time than that on the same machine. Programs are getting more bloated with useless crap and runtimes out the wazoo.
You shouldn't need a quad core, a ssd, and 8GB of RAM just to run Outlook at an acceptable speed.

If we could just program everything in assembly lol.

tayunz · Jul 15, 2018

I really think Zen 2 has 6 core CCXs, which means topping out at 48 core for a 4 die part. Seems like rumblings point to Zen 2 EPYC topping out at 48 cores.

I really don't know anything though, just a guess. Feel like a Ryzen 2 3700/3800x might be a 12 core part as a result. But who knows.

EniGmA1987 · Jul 19, 2018

LuxTerra said:
Zen 2 is rumored to scale out to 12c dies, which will make 48c TR/EPYCs possible. There's some debate on how that will occur, another 4c CCX or 6c CCXs. Beyond that, Zen 2 will scale out to 16c dies/64c CPUs (unknown if that's quad 4c CCX or dual 8c CCX). At this moment, 64c is only confirmed for EYPC/Rome and not TR. Quad channel ram is going to be a bottleneck on a 64c machine for most tasks that can utilize that many cores. IMHO, we should see 48c TR3 next year, but won't see 64c TR4 until the socket is changed for at least six channel ram, but given Zens architecture eight channel makes the most sense.

This is one reason I would very much like to see the desktop and laptop world never move to DDR5 or even high channel DDR4. It would be better for everyone if the industry moved to GDDR6 as the next standard for all system memory computing. I asked this years ago and no one knew the answer or thought it was a good idea, but PS4 and XBox have proved GDDR5 works perfectly fine for general system memory of normal computing tasks of an OS. For the next generation processors they could have each stick of RAM be 8 channels by itself as each chip would be accessed individually like they do on GPUs. Then instead of having 1 channel per stick, they could do 8 channels a stick and each stock of RAM on the motherboard would be tied to an additional 8 channels in the CPU up to a certain extent. In this way server chips could run 64 channel memory and desktop PCs could run 16 channel memory on mainstream systems (2 sticks sharing 8 channels for total of 4 sticks like we currently have), and HEDT could run 32 channels with either 4 or 8 slots on the MB. Think about that, a 12 channel GPU with GDDR6 gets 768 GB/s of bandwidth. If server CPUs had 64 channel setups they would get 4TB/s bandwidth. Insane. We would be set on all memory bandwidth for higher core counts for years and years with this type of setup. Another cool thing you could do with a setup like that is instead of paralleling all the channels for max bandwidth of a highly threaded operation, you could open it up to a ton of individual single channels or maybe pack them in 2-4 channel groups so a ton of threads being processed could all have their own access to memory without waiting for other requests to finish, in a similar way AMD had ganged and un-ganged channel configs in the AM2 or AM3 days or whatever it was.

LuxTerra · Jul 28, 2018

EniGmA1987 said:
This is one reason I would very much like to see the desktop and laptop world never move to DDR5 or even high channel DDR4. It would be better for everyone if the industry moved to GDDR6 as the next standard for all system memory computing. I asked this years ago and no one knew the answer or thought it was a good idea, but PS4 and XBox have proved GDDR5 works perfectly fine for general system memory of normal computing tasks of an OS. For the next generation processors they could have each stick of RAM be 8 channels by itself as each chip would be accessed individually like they do on GPUs. Then instead of having 1 channel per stick, they could do 8 channels a stick and each stock of RAM on the motherboard would be tied to an additional 8 channels in the CPU up to a certain extent. In this way server chips could run 64 channel memory and desktop PCs could run 16 channel memory on mainstream systems (2 sticks sharing 8 channels for total of 4 sticks like we currently have), and HEDT could run 32 channels with either 4 or 8 slots on the MB. Think about that, a 12 channel GPU with GDDR6 gets 768 GB/s of bandwidth. If server CPUs had 64 channel setups they would get 4TB/s bandwidth. Insane. We would be set on all memory bandwidth for higher core counts for years and years with this type of setup. Another cool thing you could do with a setup like that is instead of paralleling all the channels for max bandwidth of a highly threaded operation, you could open it up to a ton of individual single channels or maybe pack them in 2-4 channel groups so a ton of threads being processed could all have their own access to memory without waiting for other requests to finish, in a similar way AMD had ganged and un-ganged channel configs in the AM2 or AM3 days or whatever it was.

GDDR and DDR are optimized for their purpose. There are differences in how the memory is accessed and the prefetch that make GDDR inefficient for general computing access patterns, but just fine for GPU access patterns. It's not that the computer engineers and industry doesn't know what they are doing, they've optimized each for the task at hand.

This has been discussed since the beginning of GDDR and Google will find you a lot of resources, some simple summaries, some really in depth. Here's a simple one that's relatively recent: https://www.gamersnexus.net/guides/2826-differences-between-ddr4-and-gddr5

As for GDDR use in consoles, it's a compromise. Again, it's not that it can't work, but rather that it's inefficient and sub-optimal. It can be worked around if you limit the tasks your console does and/or spend a lot of time mitigating the issues. Those CPU cores in consoles are really slow compared to your desktop. It's part of the reason console ports often are terrible and lack strategic depth to the AI, etc. There simply isn't enough compute to do great things with those cores. I.e. the inefficiencies of using GDDR for main memory don't matter if your CPU cores are slow enough and you don't plan to do hard or diverse computing with them. Consoles are about cutting costs everywhere, GDDR is required to get the graphics performance, but after that it's about saving money.

Factum · Jul 28, 2018

It's not crappy coding, it is called Amdahl's law:
https://en.wikipedia.org/wiki/Amdahl's_law

mkrohn · Jul 28, 2018

Factum said:
It's not crappy coding, it is called Amdahl's law:
https://en.wikipedia.org/wiki/Amdahl's_law

the people who say crappy coding don't understand programming at all. Yeah there's plenty of examples of bloatware available like MS office but even that actually highlights where improved per core performance does help a lot. No matter how great you thread something there will be a main thread waiting for everything to load in a specific order.

KarsusTG · Jul 29, 2018

Who doesn't play their games with discord open, a browser with 25 tabs and netflix going, a windows or Linux VM running, and boinc / fah idling in the background waiting for an once of unused cpu to lap up? This is the [H]ardforum after all. Moar cores is better for the multitasking addict. We are all at this point multi tasking addicts.

cyberguyz · Jul 29, 2018

Moar is better!!

mkrohn · Jul 29, 2018

KarsusTG said:
Who doesn't play their games with discord open, a browser with 25 tabs and netflix going, a windows or Linux VM running, and boinc / fah idling in the background waiting for an once of unused cpu to lap up? This is the [H]ardforum after all. Moar cores is better for the multitasking addict. We are all at this point multi tasking addicts.

4 cores and 8 threads handles leaving crap open like a champ. If you get into VM's you already have a proper server for keeping that off your main machine anyway... I love 8 cores/16 threads. I'm not saying thats not great to have. I want both 8 cores and 5ghz+ or just voodoo to get more out of every core. 4 cores is not enough and 6 does help a lot.

cyberguyz · Jul 30, 2018

mkrohn said:
4 cores and 8 threads handles leaving crap open like a champ. If you get into VM's you already have a proper server for keeping that off your main machine anyway... I love 8 cores/16 threads. I'm not saying thats not great to have. I want both 8 cores and 5ghz+ or just voodoo to get more out of every core. 4 cores is not enough and 6 does help a lot.

The common opinion back in the day was that dual cores didn't buy the gamer anything when they came out either. And at the time that was very true - for a little while. Then people that didn't have dedicated gaming pcs realized they liked to do a whole lot of things in the background (like downloading p0rn) while playing that game .

mkrohn · Jul 30, 2018

cyberguyz said:
The common opinion back in the day was that dual cores didn't buy the gamer anything when they came out either. And at the time that was very true - for a little while. Then people that didn't have dedicated gaming pcs realized they liked to do a whole lot of things in the background (like downloading p0rn) while playing that game .

Correct... I agree with you that threads help a ton to a certain extent. I think at 4-6 cores PRESENTLY I'm with you completely. I think we have far too much reliance on more cores and we're too far ahead of the curve on this while basically leaving proper performance per core behind. I was on the moar cores bandwagon with my 2670 quite a few years back or dual hex on 1366.

Meeho · Jul 30, 2018

tangoseal said:
I mean you gotta be able to use your cores whether its 4 or 64 cores. Do you have a usage scenario (s)?

I couldn't imagine anything smaller than Enterprise running big VM boxes needing even 32 cores much less 64.

This:

M76 said:
I do a lot of photogrammetry, and most software only uses GPUs for a few simple tasks, most calculations are still done on the CPU.

8700K@5 GHz is laughably slow for this. 64 cores would make a world of difference.

juanrga · Jul 30, 2018

tayunz said:
I really think Zen 2 has 6 core CCXs, which means topping out at 48 core for a 4 die part. Seems like rumblings point to Zen 2 EPYC topping out at 48 cores.

Isn't 64 core the last rumor for Rome?

juanrga · Jul 30, 2018

sirmonkey1985 said:
no we need more programs/games to better use the threads available to them, clock speed means less and less once an application can scale across multiple cores, we need to stop supporting the half ass lazy coding that is still going on because "oh no people in Brazil or Russia are still playing games on a Pentium 3's so lets screw over everyone to make it fair for them".

There are algorithms that are serial. So no matter how many non-lazy programmers you throw to the problem the code will be serial, because the problem is serial.

Several modern games have one/two master threads and then a couple of slave threads dealing with subtasks. You can parallelize those slave threads, but they need to be synchronized by the master threads. So the bottleneck is on the master threads which can only run on a core.

tayunz · Jul 30, 2018

juanrga said:
Isn't 64 core the last rumor for Rome?

Latest is that it's not clear either way. There will definitely be a 64 core part at some point, but Rome might be 48. I've seen it stated both ways though.

oblox · Jul 30, 2018

33 cores

schmide · Jul 30, 2018

mkrohn said:
Do you even code? there are tons of things that just don't thread very well. Everything that DOES thread well belongs on a GPU It has nothing to do with lazy coding. I've run into some things running SLOWER by adding threading. No matter what you have a main thread which all the other threads report back to. 90% of the time I'd gladly take 4 cores running @ 5ghz over 6 running at under 4Ghz. Adding cores leads to dropping clock speed and thats simply doing things wrong for the sake of some oddball benchmarks that don't indicate real world usage.

You lift bro? Yes there are things that don't serialize well, but games are certainly not one of them. They are in fact one of the truest examples of a pipelined producer consumer model.

A few years ago I would of taken the 5x4 now 4x6.

mkrohn · Jul 30, 2018

schmide said:
You lift bro? Yes there are things that don't serialize well, but games are certainly not one of them. They are in fact one of the truest examples of a pipelined producer consumer model.

A few years ago I would of taken the 5x4 now 4x6.

yes... games work great for threaded but when are games really CPU limited though? It is mostly GPU limited especially as we switch to 4k. Real world normal usage is mostly in need of more speed per core. The bulk of what makes up the workday really. At one point larger CPU cache was a miracle worker for this. I had hoped the edram from the iris pro would be able to translate to better application performance. I'm not really sure where the main bottleneck is right now with cache.

schmide · Jul 30, 2018

Well back in the day you were doing 100% on the cpu. Collision detection, hierarchy traversal, binning, rasterizing, sound and such. Then sound and rasterizing was pushed off, but you still had to do all your transformations on the CPU. Then the fixed pipeline gave way to the shader model and the GPU began to do more of the work. Then physics moved over and the GPU basically became saturated and thus it was now the bottleneck.

So we're at a point where the GPU is doing double duty while CPU cores remain unused. To put salt in this wound, doing environmental calculations on the GPU can incur a round trip further saturating the bus. It may be true that CPU cores are not the best at doing some of these tasks, but they can still get it done at some level, freeing up the GPU to do what GPUs do best GPU.

juanrga · Jul 31, 2018

tayunz said:
Latest is that it's not clear either way. There will definitely be a 64 core part at some point, but Rome might be 48. I've seen it stated both ways though.

The former rumor was 48 core for 7nm and 64 core for 7nm+. But the last rumor is that 64 core is already coming to 7nm.

FrgMstr · Jul 31, 2018

juanrga said:
The former rumor was 48 core for 7nm and 64 core for 7nm+. But the last rumor is that 64 core is already coming to 7nm.

Double your CCXs, double you fun.

Neapolitan6th · Jul 31, 2018

mkrohn said:
yes... games work great for threaded but when are games really CPU limited though? It is mostly GPU limited especially as we switch to 4k. Real world normal usage is mostly in need of more speed per core. The bulk of what makes up the workday really. At one point larger CPU cache was a miracle worker for this. I had hoped the edram from the iris pro would be able to translate to better application performance. I'm not really sure where the main bottleneck is right now with cache.

I may be a minority, but I will always answer:

Strategy Games. 12-32 AI players, either turn based, or "real time". I would love to see some creative ways to code these kinds of games for a variable number of threads.

Since the beginning of time it seems, they have been begging for a faster CPU. (Currently only clockspeed can help)

In strategy games I would say Turn Time is almost always more important than FPS above 30.

Factum · Jul 31, 2018

Neapolitan6th said:
I may be a minority, but I will always answer:

Strategy Games. 12-32 AI players, either turn based, or "real time". I would love to see some creative ways to code these kinds of games for a variable number of threads.

Since the beginning of time it seems, they have been begging for a faster CPU. (Currently only clockspeed can help)

In strategy games I would say Turn Time is almost always more important than FPS above 30.

Most strategy game are turn-based, moving one unit at the time...aka very serial.

after 32 cores?

[H]F Junkie

Weaksauce

Supreme [H]ardness

Supreme [H]ardness

[H]F Junkie

2[H]4U

2[H]4U

[H]F Junkie

2[H]4U

2[H]4U

[H]F Junkie

2[H]4U

2[H]4U

Supreme [H]ardness

2[H]4U

2[H]4U

[H]F Junkie

Gawd

[H]ard DCoTM October 2018

Limp Gawd

Limp Gawd

2[H]4U

2[H]4U

2[H]4U

Gawd

2[H]4U

Gawd

2[H]4U

Supreme [H]ardness

2[H]4U

2[H]4U

[H]ard DCoTM October 2018

[H]ard|Gawd

Limp Gawd

2[H]4U

Limp Gawd

2[H]4U

Just Plain Mean

[H]ard|Gawd

2[H]4U