Intel Core i9-9900KS Review: The Fastest Gaming CPU Bar None

Of course one core's pegged; you have to have a master thread. In comparison, here's Call of Duty MW2:

[screenshot: Call of Duty MW2 per-thread CPU utilization]


That Doom 2016 example is on a dual Xeon X5675 based system from 2010, and in that particular configuration - running 1920x1200 to stay fairly CPU-bound - it was pulling between 160 and 180 FPS under Linux using Proton. You need to consider that those are two 3.06 GHz processors, not even 4 GHz.
 
And that's actually the point: frame render speed is limited by the speed of that thread, while having more than ~8 cores does very little.

I wouldn't go that far. As you can see, each one of those threads is a render thread with a separate PID; the master thread simply schedules all the individual threads. Having one core perform the entire workload would definitely have a substantial impact on performance.

We need to shift away from single-threaded optimization, as we have reached the limits of current technology. The newer DX12 and Vulkan renderers are doing exactly that, as evidenced perfectly in my screenshot - that's the most efficient usage of multiple cores I've seen in a game to date, and it's reflected in the resulting FPS, considering the machine used.
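
The pattern those renderers use looks roughly like this - a hypothetical Python sketch, not any engine's actual code, with the worker count and job split invented purely for illustration:

```python
import queue
import threading

# Hypothetical sketch of a master/worker render setup: one scheduling
# thread feeds per-frame jobs to a pool of render workers. (CPython's GIL
# means these threads wouldn't truly run in parallel for CPU-bound work;
# a real engine uses native threads, one per core.)

NUM_WORKERS = 8
jobs: queue.Queue = queue.Queue()

def render_worker(worker_id: int) -> None:
    """Pull render jobs off the queue until a stop sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:            # sentinel: no more work this run
            break
        # ... rasterize/shade the chunk described by `job` here ...

workers = [threading.Thread(target=render_worker, args=(i,)) for i in range(NUM_WORKERS)]
for w in workers:
    w.start()

# The master thread stays busy cutting frames into jobs and scheduling
# them - which is exactly why it pegs a core while the workers spread
# across the rest.
for frame in range(3):
    for chunk in range(32):        # split each frame into independent chunks
        jobs.put((frame, chunk))

for _ in workers:
    jobs.put(None)                 # one stop sentinel per worker
for w in workers:
    w.join()
```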

I say give me cores, I want as many as possible.
 
As you can see, each one of those threads is a render thread with a separate PID, the master thread simply schedules all the individual threads. Having one core perform the entire workload would definitely have a substantial impact on performance.

I can see that one thread is loaded, and another nearly is. That's all you can logically derive from these screenshots.

You'd need significantly more reporting from the game's loops, at the least, to show that the single maxed thread is indeed just the scheduling thread and that the other threads are all just rendering threads.
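
For instance - a rough, Linux-only sketch with a made-up pid, not output from any actual game - you could sample per-thread CPU time from /proc and watch which threads actually accumulate the work:

```python
import os
import time

# Sample each thread's cumulative CPU time from /proc so you can see which
# thread does what, rather than inferring roles from a utilization bar.
# Linux-only; the pid below is hypothetical.

def thread_cpu_times(pid: int) -> dict[int, float]:
    """Return cumulative CPU seconds (user + system) per thread of `pid`."""
    ticks = os.sysconf("SC_CLK_TCK")
    times = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as f:
            stat = f.read()
        # The comm field may contain spaces; real fields resume after the last ')'.
        fields = stat[stat.rindex(")") + 2:].split()
        utime, stime = int(fields[11]), int(fields[12])   # stat fields 14 and 15
        times[int(tid)] = (utime + stime) / ticks
    return times

pid = 12345                        # hypothetical game process id
before = thread_cpu_times(pid)
time.sleep(5)
after = thread_cpu_times(pid)
for tid in sorted(after, key=lambda t: after[t] - before.get(t, 0.0), reverse=True):
    print(f"tid {tid}: {after[tid] - before.get(tid, 0.0):.2f}s CPU over 5s")
```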

We need to shift away from single-threaded optimization, as we have reached the limits of current technology

While parallelizing workloads should indeed be a goal of computer science research, we are far from reaching the limits of current technology. What we're going to see is more focus on compilers, architecture optimizations for general code, and architectures tuned for specialized code. As an example of all of the above, Apple tends to get more performance out of less hardware in their mobile processors (their GPUs, which are beasts, excepted), and in turn they tend to exceed the battery life of their competitors significantly. They do this by optimizing the whole stack, from code generation to the circuitry itself.

Another example, more loosely, is Windows: from Vista to Windows 10, Microsoft has steadily improved the performance of the operating system. It's not as apparent on new hardware, but they've actually made old systems more tolerable than before, and Windows 10 is (in general) the fastest gaming operating system available.

That's the most efficient usage of multiple cores I've seen in a game to date, and it's reflected in the resulting FPS, considering the machine used.

Really, that's just id. They've always been optimizers; however, their technology is almost never employed for other properties, let alone other types of games. I find this unfortunate myself, as I'd love to see games with such fluidity and responsiveness elsewhere, but those are the breaks.

I say give me cores, I want as many as possible.

I need enough cores and, of those, the fastest I can get. The new Ryzen and Threadripper CPUs are an example of a decent compromise, and in general I recommend them first - but for gaming they're not the fastest, and realistically eight cores is considerable overkill for most desktop uses, including gaming. Yes, some games can assign load to more cores, but they don't run any faster than they would on six or eight.
 

OK, this isn't a point worth arguing. I've done quite a bit of testing on this: multi-core improvements are a big part of the newer APIs, we can't stick to single-core rendering forever, and it's obvious the main thread is the master scheduling thread, which should see 100% utilization.

I've posted two screenshots highlighting these points; others are free to form their own opinions. Processor speed is not going to continue to scale; it's well known that multiple cores are the way of the future. In relation to Windows 10, it is well reported that it struggles with multi-core scheduling, especially in NUMA implementations - but I don't run Windows, so it's not something I care to offer a personal opinion on.

Perhaps someone else who runs Windows can test it?
 
I've posted two screenshots highlighting these points; others are free to form their own opinions.

They support your point; however, they do not rise to showing causation - only correlation. That's true just upon inspection of the argument. Further, since we can show six- and eight-core Zen and Core CPUs putting out the same or better framerates and frametimes as CPUs with significantly more cores, we disprove the causation argument directly.

Point being, even if a game can assign a workload to more than eight cores, doing so does not improve performance.
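
One way to test that directly - a sketch with a stand-in workload rather than a real game, assuming a Linux box with at least eight logical CPUs - is to pin the same CPU-bound job to a varying number of cores and compare throughput:

```python
import multiprocessing as mp
import os
import time

def spin(n: int) -> int:
    """Stand-in CPU-bound work; a real test would drive the game itself."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def throughput_on(cores: int, seconds: float = 2.0) -> float:
    """Pin this process to `cores` CPUs and measure work units per second."""
    os.sched_setaffinity(0, set(range(cores)))   # Linux-only; pool workers inherit this
    with mp.Pool(cores) as pool:
        deadline, done = time.monotonic() + seconds, 0
        while time.monotonic() < deadline:
            pool.map(spin, [200_000] * cores)    # one chunk of work per core
            done += cores
    return done / seconds

if __name__ == "__main__":
    # If throughput stops improving past 8 cores, extra cores aren't the cause.
    for n in (2, 4, 8):
        print(f"{n} cores: {throughput_on(n):.0f} work units/s")
```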

The part where I'm happy to agree with you is this: having more cores also doesn't regularly and significantly inhibit performance, and that hasn't always been the case!

Processor speed is not going to continue to scale; it's well known that multiple cores are the way of the future.

Clockspeeds may not, or they may - research is ongoing!

However, processor speed - and core speed - will most definitely continue to scale. The improvements just won't necessarily be due to increased clockspeeds.

In relation to Windows 10, it is well reported that it struggles with multi-core scheduling, especially in NUMA implementations

Well, yeah. Until Threadripper, there was no reason outside of academia to run Windows 10 on a CPU with NUMA. And then there's AMD's awkward CCX implementation...

Apparently that's a sore point that is being addressed. Obviously Microsoft and AMD both have their own stake in improving NUMA performance in Windows 10.
 

I don't see any evidence that multi-threaded optimization, which is quite obviously what this is, didn't improve performance. You're not going to improve Windows 10's NUMA implementation; if you could, it would have been rectified already. I'd expand further on that comment, but I can't, as that would be introducing another OS into the discussion, resulting in reported posts.

I've made my point; people can believe whatever tickles their fancy.
 

How is it obvious? What is your point of comparison?
 

When you have a master thread and many rendering threads, each with their own PID, what would you infer you're looking at? Do you honestly believe that's just thread jumping?

That'd have to be the most perfect thread jumping I've ever seen.
 

Without having insight into what the code is doing where, I wouldn't make any assumptions.
 
[attachment 198529]

Here you go. It's one thing to shill, but at least do it correctly. A lot of people bought them because they were efficient and walked all over other chips in their price range.

I don't know whether they bought them because they were efficient; I think people bought them because they had no idea what they were buying and thought they'd at least be capable. The reality is, while they were more capable than competing processors, they were still a frustratingly slow experience that, when coupled with not enough RAM and a mechanical HDD, should have been illegal to sell, as the device was barely capable of doing the job it was sold to do.
 
Well, if you don't give it enough RAM and pair it with a slow HDD, I would reckon that would be the problem. Those processors supported 16GB of RAM, which is more than what you get by default in many laptops sold today. A bad build is a bad build.

Workloads matter quite a bit. If you know what the hardware is good at and how to maximize performance, you won't have a bad experience. We are well past the era of the slow desktop experience.

Anyone who bought one, I assure you, knew what performance they were getting and what it would be good at. For regular desktop usage and HTPC use it was quite good.
 
I don't know whether they bought them because they were efficient; I think people bought them because they had no idea what they were buying and thought they'd at least be capable. The reality is, while they were more capable than competing processors, they were still a frustratingly slow experience that, when coupled with not enough RAM and a mechanical HDD, should have been illegal to sell, as the device was barely capable of doing the job it was sold to do.

Really? I do not seem to have any issues with my Athlon 5350 and 8GB of RAM. With an HDD it is slower, but then, HDDs are slow.
 
Well, if you don't give it enough RAM and pair it with a slow HDD, I would reckon that would be the problem. Those processors supported 16GB of RAM, which is more than what you get by default in many laptops sold today. A bad build is a bad build.

Workloads matter quite a bit. If you know what the hardware is good at and how to maximize performance, you won't have a bad experience. We are well past the era of the slow desktop experience.

Anyone who bought one, I assure you, knew what performance they were getting and what it would be good at. For regular desktop usage and HTPC use it was quite good.

The problem is, the masses buying these machines have no idea; they just believe whatever the salesman tells them and assume there's no way any brand-new OEM device could possibly perform so badly. The performance is shocking in most cases, and very few Windows users beyond enthusiasts have 16GB of RAM, let alone an SSD.

We are most definitely not beyond the era of a slow desktop experience.
 
We are most definitely not beyond the era of a slow desktop experience.
I'm not talking about application or game performance. I'm talking about just using Windows or Linux. Sure, an HDD is way slower than an SSD, but that's true regardless of what the processor is. Trust me, if you pair a 9900K with a mechanical HDD you will not have a fast experience opening anything. It is acceptable for most.

Back in the day, simply turning on your computer meant you had at least 10 minutes to walk away and do something else while it loaded... lol
 

Running Windows with a mechanical HDD and such a processor, you still need to walk away for 10 minutes while something loads, at least until Windows has finished booting - which itself can take a good 10 minutes. You can't even open the Start menu as soon as the desktop loads.

The entire experience is frustratingly slow and useless. Hence the rising popularity of mobile devices.

You can run a 9900K with a mechanical HDD and have a fast experience, but I can't discuss how here...
 
OK, this isn't a point worth arguing. I've done quite a bit of testing on this: multi-core improvements are a big part of the newer APIs, we can't stick to single-core rendering forever, and it's obvious the main thread is the master scheduling thread, which should see 100% utilization.

I've posted two screenshots highlighting these points; others are free to form their own opinions. Processor speed is not going to continue to scale; it's well known that multiple cores are the way of the future. In relation to Windows 10, it is well reported that it struggles with multi-core scheduling, especially in NUMA implementations - but I don't run Windows, so it's not something I care to offer a personal opinion on.

Perhaps someone else who runs Windows can test it?

At the end of the day, an OC'd 9900K/KS/KF - or sometimes a 9700K - gives the highest fps for high-Hz gaming.

While I would love for more cores to matter, I haven't seen a case in gaming where they do. I would love an excuse to wait and buy a 16-core 3950X, but it just doesn't exist.
 

And they give even higher FPS when a modern API is used, making the most efficient use of all available cores...
 
But the highest IPC / clockspeed combo still wins. Even the 8 core 8 thread 9700k.

Enthusiast-grade processors from either AMD or Intel have more than enough performance for a more than decent single-threaded experience; the fact is, the world is slowly shifting away from single-threaded applications.

Which is a good thing. The more cores, the better. Higher clock speeds just result in cooling issues.
 
$472 for a 9900k 8c/16t.
$499 for a 3900X 12c/24t.

The math has never been easier.
 
Single thread throughput will always be critically important.

And in both cases single-core performance is more than adequate. The hope is that, in time, single-core becomes less important, as silicon technology is quite obviously hitting a ceiling.
 

Sadly, it will never go away; Amdahl's law is what it is. Some workloads you can split endlessly, but a huge number never will, due to the nature of the problem sets.

And of course, increasing each core's speed helps multicore processors too - N times over!

Not disagreeing with your points, just reiterating that per-core performance is hugely important, even if it's difficult to improve, and even if we can go wide.
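
For anyone who wants numbers, Amdahl's law works out like this (a quick illustrative calculation; the parallel fractions are picked arbitrarily):

```python
# Amdahl's law: with parallel fraction p spread over n cores,
# speedup = 1 / ((1 - p) + p / n). The serial (1 - p) part caps the gain,
# so even at p = 95% you can never exceed 20x - only faster cores lift it.

def amdahl(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.50, 0.90, 0.95):
    cases = ", ".join(f"{n} cores: {amdahl(p, n):.1f}x" for n in (8, 16, 64))
    print(f"p = {p:.0%} -> {cases}")
```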
 
Everyone's raving about single-core performance and gaming... but isn't just about every CPU sold now multi-core? It stands to reason that more and more game engines, and software in general, are going to start embracing and truly leveraging multi-core tech. I can see overall multi-core performance, not single-core performance, eventually becoming the prime criterion used to gauge gaming performance.
 
Almost all the gains on the CPU end are I/O; on the single-core end, multicore communication has helped a LOT. Face it, Intel and AMD are just overclocking the shit out of their procs for performance gains, as single-core GHz has hit the wall. Single-core in and of itself hasn't really advanced lately. You can only take the x86/64 "building block" so far by itself. Going wide has been the game for years; I don't see that changing soon.
 
But the highest IPC / clockspeed combo still wins. Even the 8 core 8 thread 9700k.

RDR2 shows that with highly threaded games this is not the case: chasing high fps, the 9700K and 9600K end up screwing themselves up and creating a stuttering mess, forcing you to cap the framerate low enough for those non-HT CPUs to handle. (And this is only the latest game to show it - someone has already mentioned that FC5 had similar performance quirks/bias against these CPUs.)

I'll give you that they seemed to have higher averages, but their minimums were less than half what most other similar processors manage.
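
The cap itself is simple enough - a minimal sketch of a frame limiter (illustrative only, certainly not RDR2's actual code):

```python
import time

# Minimal frame-cap sketch: hold the main loop to a rate the CPU can
# sustain, trading peak fps for even frame pacing (fewer stutters).

TARGET_FPS = 60
FRAME_BUDGET = 1.0 / TARGET_FPS

def run_capped(frames: int = 300) -> None:
    next_deadline = time.perf_counter()
    for _ in range(frames):
        # ... simulate + render one frame here ...
        next_deadline += FRAME_BUDGET
        sleep_for = next_deadline - time.perf_counter()
        if sleep_for > 0:
            time.sleep(sleep_for)                # idle away the leftover budget
        else:
            next_deadline = time.perf_counter()  # missed the budget; resync

run_capped()
```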
 
You can run a 9900K with a mechanical HDD and have a fast experience, but I can't discuss how here...
No, even a supercomputer would be slow with a single HDD loading and storing data into memory.
A single HDD is far more than a bottleneck on even low-power and embedded systems at this point, let alone a workstation with a powerful CPU.

Also, you can totally discuss it here. :p
Once the data has been loaded from the HDD into RAM, the "experience" can be fast; it is the wait-time for the data to move from the HDD to RAM that is abysmal.

At 2TB and lower, HDDs are completely obsolete, even on price.
At 3TB and above, HDDs are still good for storage, but not for the OS and day-to-day usage, let alone databases or enterprise use outside of WORM media.
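
Some back-of-envelope numbers make the point (the throughput figures are typical values I'm assuming, not measurements):

```python
# Rough sequential-read times for faulting a 4 GiB working set into RAM.
# Seek latency on scattered reads makes the HDD case far worse still.

GiB = 1024 ** 3
payload = 4 * GiB
devices = {
    "HDD (~120 MB/s)": 120e6,
    "SATA SSD (~500 MB/s)": 500e6,
    "NVMe SSD (~3 GB/s)": 3e9,
}

for name, rate in devices.items():
    print(f"{name}: {payload / rate:.0f} s to read 4 GiB sequentially")
```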
 

Your biggest problem with mechanical HDDs and performance in most scenarios is NTFS and the NT kernel - both are overdue for retirement. It's also one of the biggest problems with multi-threaded applications for most users.

Remember, you said I could discuss it. ;)
 
HA! Definitely agree with both of those statements. :D
 
Sadly, it will never go away; Amdahl's law is what it is. Some workloads you can split endlessly, but a huge number never will, due to the nature of the problem sets.

And of course, increasing each core's speed helps multicore processors too - N times over!

Not disagreeing with your points, just reiterating that per-core performance is hugely important, even if it's difficult to improve, and even if we can go wide.

Let's use my Cinebench screenshot as an example of multi-threaded vs single-threaded application. If you were to render that image on a single core in the same time, you would need a processor running at a speed that is never going to happen on silicon technology. We're at the limits, and both makes of processor offer impressive single-threaded performance - in the real world there's very little between them.

We have no choice but to overcome the current scheduler issues and begin splitting applications into threads if we want speeds to keep ramping. As shown in my Doom 2016 example, game developers are already beginning to apply such logic using modern APIs/engines - something that really wasn't possible with older versions of DirectX/OpenGL.
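
The Cinebench-style split is the textbook case - roughly like this sketch, where the per-pixel "shading" is a trivial stand-in and the tile size and resolution are arbitrary:

```python
import multiprocessing as mp

# Cut the image into independent tiles and render them across all cores.
# The structure is embarrassingly parallel, which is why it scales so well.

WIDTH, HEIGHT, TILE = 1920, 1080, 120

def render_tile(origin: tuple[int, int]) -> tuple[tuple[int, int], list[int]]:
    x0, y0 = origin
    # ... real per-pixel shading would go here; fake it with a cheap function ...
    pixels = [(x * y) & 0xFF for y in range(y0, y0 + TILE) for x in range(x0, x0 + TILE)]
    return origin, pixels

if __name__ == "__main__":
    tiles = [(x, y) for y in range(0, HEIGHT, TILE) for x in range(0, WIDTH, TILE)]
    with mp.Pool() as pool:                 # one worker per core by default
        results = pool.map(render_tile, tiles)
    print(f"rendered {len(results)} tiles on {mp.cpu_count()} cores")
```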
 
I haven't had a ten minute bootup on a computer since the IMSAI 8080.

You've gotta get out more...

Try working on the average user's PC as a tech: most are still running 4GB of RAM and mechanical HDDs, and Windows 10 is not less resource-intensive than earlier versions of Windows. Only yesterday I did a major Windows update on a 4GB/Pentium/mechanical-HDD laptop - the kind of PC most people use - and it took a full four hours to finish.

The IMSAI running FDDs was faster than that. My Pentium 3 Tualatin with 384MB of RAM and Windows 2000 is waaay faster than that.
 