This is why I like y'all. You watch a video like the one in the op and it makes it seem like it gives you a massive performance increase in everything then [H] comes around and puts out numbers for real world applications and we can see what's up.

Any chance something like this will help in gaming? Also, the performance penalty from using this program shows it is doing something; any idea why it would make things that much slower here, versus the case in the video that picks up a lot of speed? I think this takes care of the Windows part of it, but now programs have to change a little bit also?
Thanks for the kind words. Wendell at L1T does some great work, and we talk over Twitter. I was really hoping this would be a fix for our 2990WX encoding issues. For the record, AMD has never let on over the last 6 months that it is aware of the exact issue, and we have spoken to them about it often.
 
What is actual CPU utilization during these Handbrake encodes? I don't use Handbrake, admittedly, but I thought it was known that it scales like crap with over 8 cores (physical or logical); I'm not sure.

https://hwbot.org/benchmark/hwbot_x265_benchmark_-_4k/rankings?cores=32#start=0#interval=20

The 2990WX seems to do pretty well in this x265 benchmark, which I assume is better at utilizing the CPU. I see all these Handbrake benchmarks, and I suppose that is what the majority of people use as far as open source goes. I personally have never cared for it, and there are better open source tools/GUIs for media encoding with x264/x265.

Run the benchmark linked above, or RipBot264 with the distributed encoding option (what I use with my 1950X/PowerEdge R820), and/or run a couple of VMs on the 2990WX host and run two instances of x265 to be sure to saturate the CPU.

Posting results without any CPU usage metrics during these tests seems pretty pointless when trying to articulate an issue.

If the thought was that an OS update was possibly the culprit, wouldn't you clean-install a Win 10 build from prior to that update and test for a baseline, then update and test again? I thought troubleshooting was about identifying and being concrete on the cause of an issue rather than assuming and testing a fix.

I won't comment about Premiere, but running Handbrake 4K to 1080p shouldn't really even be a test someone runs to measure the performance of a 32-core HEDT CPU. Handbrake should probably go in your recycle bin until it's written to better utilize these CPUs.
 
I think BF5 increased in performance or at least has more stable FPS or something after doing a re-test to eliminate the poor GPU utilization bug.

At least i'm doing better anyways now, lol.
 
Handbrake should probably go in your recycle bin until it's written to better utilize these CPUs.

I'm not trying to turn this into a Windows vs Linux argument, and perhaps I need to read the OP a little better. But as far as I'm aware this NUMA issue isn't present under Linux, it's only present under Windows? Which sort of makes sense, considering Linux is the OS of choice for running massively parallel supercomputers.

So why blame Handbrake when the issue isn't present under Linux and as far as I'm aware, isn't present under Windows running Intel hardware?

As stated, not trying to cause a ruckus, just stating what I believe to be the obvious, unless I'm missing something here. Not blaming AMD here either, as I believe the Windows NUMA implementation simply cannot handle AMD's multicore architecture.

My own testing on a platform that can switch between SMP and NUMA highlights that NUMA is faster in literally all cases running Linux. As can be assumed by my use of the term SMP, my system is an older dual socket system, and NUMA still performs better in literally all cases running modern software.
 
The other thing I question regarding the video is where the presenter states that we're specifically talking about single-socket systems. Yes, it looks like one socket, but as far as I'm aware that's literally two AMD CPUs 'joined at the hip'; you can even see this by looking at the bottom of the CPU package. So, technically speaking, while the distances between data paths are naturally substantially shorter between the two processors where NUMA is concerned, there are still two processor packages present, and therefore two individual sockets that simply look like one large socket?
 
But as far as I'm aware this NUMA issue isn't present under Linux
There are actually four distinct problem areas:
  1. Windows operating system scheduler sucks (this causes Windows performance drop in 7-zip, Indigo, GraphicsMagick, Stockfish, ...)
  2. Application software scales poorly (Handbrake, Adobe PP, SPECwpc CFD, ...)
  3. Software category is generally ill-suited for NUMA (databases, gaming)
  4. Limitation due to memory bandwidth, or lack of directly connected RAM on two dies (not aware of any example where this is severe)
We are talking here of an instance that less informed publications suspected to be in the fourth category, but now has been demonstrated to be in the first. Switching to Linux (or public shaming of Microsoft until they act) can address the first problem, but not the others.
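For anyone wondering what a fix for the first problem looks like at the API level, the workaround utilities in this space mostly boil down to constraining a process to the logical processors of a node that has local memory. A minimal Win32 sketch of that idea is below; it is illustrative only (not Coreprio's actual code), it assumes a single processor group (64 or fewer logical CPUs, which covers the 2990WX), and node 0 is just an example.

Code:
/* Minimal sketch (not any tool's actual code): restrict the current process to
 * the logical processors of NUMA node 0 so the Windows scheduler cannot park
 * its threads on a die without local memory. Assumes a single processor group
 * (<= 64 logical processors). */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    ULONG highest = 0;
    ULONGLONG nodeMask = 0;

    if (!GetNumaHighestNodeNumber(&highest)) {
        fprintf(stderr, "GetNumaHighestNodeNumber failed: %lu\n", GetLastError());
        return 1;
    }
    printf("NUMA nodes reported by Windows: %lu\n", highest + 1);

    /* Node 0 is just an example; a real tool would pick the node(s) with
     * directly attached memory. */
    if (!GetNumaNodeProcessorMask(0, &nodeMask)) {
        fprintf(stderr, "GetNumaNodeProcessorMask failed: %lu\n", GetLastError());
        return 1;
    }

    if (!SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)nodeMask)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Pinned to node 0, mask 0x%llx\n", (unsigned long long)nodeMask);
    return 0;
}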
 
There are actually four distinct problem areas:
  1. Windows operating system scheduler sucks (this causes Windows performance drop in 7-zip, Indigo, GraphicsMagick, Stockfish, ...)
  2. Application software scales poorly (Handbrake, Adobe PP, SPECwpc CFD, ...)
  3. Software category is generally ill-suited for NUMA (databases, gaming)
  4. Limitation due to memory bandwidth, or lack of directly connected RAM on two dies (not aware of any example where this is severe)
We are talking here of an instance that less informed publications suspected to be in the fourth category, but now has been demonstrated to be in the first. Switching to Linux (or public shaming of Microsoft until they act) can address the first problem, but not the others.

I've tested UMA vs NUMA under Linux and found that applications work well within assigned workspaces, ensuring that memory access is structured so that each core works within the confines of its own pool of memory, only occasionally having to access a pool of memory assigned to another core; htop is actually quite good for such testing. As stated by the presenter in the video, even when NUMA requires a processor to access the memory pool of another processor, the performance drop is quite negligible, certainly not 50%.

Phoronix did testing quite some time ago using an Epyc-equipped server that highlighted beyond all doubt that this is certainly not an application-level issue; these performance issues experienced under Windows and the NT kernel are not present under Linux at all.

Furthermore, memory bandwidth has been ruled out as the cause of the issue.
 
My own testing on a platform that can switch between SMP and NUMA highlights that NUMA is faster in literally all cases running Linux. As can be assumed by my use of the term SMP, my system is an older dual socket system, and NUMA still performs better in literally all cases running modern software.

That's probably because your older dual socket is not an SMP system (or, more accurately, not a UMA system, since SMP actually says nothing about memory access), unless you are basically testing on PPros. Pretty much any system with an IMC is a NUMA system in some regard, especially multi-socket systems.

The only real questions are how many NUMA levels a modern system has and how disparate the latency differentials between those levels are. Ryzen-based designs basically have 2 NUMA levels on chip, up to 2 additional levels per socket, and another level in dual-socket systems, all with significant steps. The Intel designs actually have more NUMA levels on chip, up to 28 on Xeons, but the stepwise latency increases are minimal enough that they can just be treated as UMA, with an additional step added at both 2 sockets and 4 sockets, which are fairly significant.
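One concrete way to see those levels and steps, at least under Linux, is to dump the SLIT distance table the firmware reports; libnuma exposes it directly. A quick sketch, assuming libnuma is installed (build with -lnuma); the values are relative, with 10 meaning local access and larger numbers marking each additional hop/level.

Code:
/* Quick sketch: print the ACPI SLIT distance matrix as Linux reports it.
 * Relative values: 10 = local access, larger = an additional NUMA "level".
 * Assumes libnuma is available (build: gcc numadist.c -lnuma). */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }

    int max = numa_max_node();
    printf("Nodes: %d\n", max + 1);

    for (int from = 0; from <= max; from++) {
        for (int to = 0; to <= max; to++)
            printf("%4d", numa_distance(from, to));
        printf("\n");
    }
    return 0;
}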
 
There are actually four distinct problem areas:
  1. Windows operating system scheduler sucks (this causes Windows performance drop in 7-zip, Indigo, GraphicsMagick, Stockfish, ...)
  2. Application software scales poorly (Handbrake, Adobe PP, SPECwpc CFD, ...)
  3. Software category is generally ill-suited for NUMA (databases, gaming)
  4. Limitation due to memory bandwidth, or lack of directly connected RAM on two dies (not aware of any example where this is severe)
We are talking here of an instance that less informed publications suspected to be in the fourth category, but now has been demonstrated to be in the first. Switching to Linux (or public shaming of Microsoft until they act) can address the first problem, but not the others.

3) Neither Intel nor AMD sells any systems that are UMA, and they haven't for a while. It's not NUMA that is the issue, but the stepwise latency increases inherent in the AMD design, combined with the asymmetrical configuration, that present the challenge.
 
That's probably because your older dual socket is not an SMP system (or, more accurately, not a UMA system, since SMP actually says nothing about memory access), unless you are basically testing on PPros. Pretty much any system with an IMC is a NUMA system in some regard, especially multi-socket systems.

The only real questions are how many NUMA levels a modern system has and how disparate the latency differentials between those levels are. Ryzen-based designs basically have 2 NUMA levels on chip, up to 2 additional levels per socket, and another level in dual-socket systems, all with significant steps. The Intel designs actually have more NUMA levels on chip, up to 28 on Xeons, but the stepwise latency increases are minimal enough that they can just be treated as UMA, with an additional step added at both 2 sockets and 4 sockets, which are fairly significant.

You are totally correct; all multiprocessor implementations are essentially SMP, with the only real differences being memory access as either UMA or NUMA. My particular system simply refers to UMA as SMP and NUMA as NUMA in the BIOS, hence my mention of older hardware. My system is far newer than Pentium Pros.

The fact remains that this issue regarding AMD processors appears to be limited to Windows only.

NUMA performance is important as it's the direction everything is headed.
 
Would you agree that Threadripper and Epyc are essentially individual dies in one package, like dual sockets placed exceptionally close together so as to, in effect, appear as one socket?
 
Sorry for the maybe stupid question.

But is this a problem that affects other AMD CPUs as well?
I myself have a 1950X in my system, but I'm also thinking of the 1000 and 2000 series Ryzen CPUs; or is it only on 16+ core parts?
 
As stated by the presenter in the video, even when NUMA requires a processor to access the memory pool of another processor, the performance drop is quite negligible, certainly not 50%.
It very much depends on the workload/benchmark. pgbench, for example, is extremely sensitive to such things, even on Linux, in the read/write test.

[attached chart: pgbench read/write results]

Source: Phoronix
So handbrake is super interesting, in a bad way.
I am not really surprised that the Handbrake situation is different. It has been known that launching multiple instances of Handbrake will give the expected (cumulative) performance, so obviously it is not the Windows scheduler eating the CPU time by shuffling things around.
 
Thanks. :) I try. I like your no-nonsense hardened grizzledness, too. So handbrake is super interesting, in a bad way. Check it out -- I need to send you/post the bitd utility. It's just bitd.exe bundled with Coreprio, or maybe not this build. Here's the output for me trying to convert to H.265 from a 1080p FLV OBS recording -- it was about 40-50 fps on Epyc (NUMA mode). This is just a dump, not Coreprio doing anything.

The CPU affinity mask? That's the program setting it itself. Also, there are like 600 threads for just 64 processes, so wwwwtttttfffffff is Handbrake doing? I thought Handbrake was basically a frontend for ffmpeg, which also scales really well.

I will keep digging. There is hope :D


You can see the capture from Handbrake is initially like 37 threads, then I hit the export button, and blam. Done. 600-thread spaghetti nightmare.

I am going to use the Big Buck Bunny movie in MP4 as the base/source file unless there is another file I should test with to compare apples to apples. Kyle, I can also provide you with TeamViewer if you'd like to fart around on this Epyc machine. I would have to reboot to change to UMA mode. I want to try UMA tomorrow, for sure.

thx. Moar puzzles. Nom.
Thanks for giving this a look. I am not too concerned with HandBrake, as there are many alternatives that we can use to get around its shortcomings. You get Premiere Pro to work and I will put a 2990WX in my own system. I am using a 2950X because of the Premiere Pro issues.

I am getting ready to head out to CES Monday, and will not have time to do any further testing with the 2990WX till I get back. But when I do get back, I will surely have my ears open and be more than happy to continue to help out with testing where I can!
 
Sorry for the maybe stupid question.

But is this a problem that affects other AMD CPUs as well?
I myself have a 1950X in my system, but I'm also thinking of the 1000 and 2000 series Ryzen CPUs; or is it only on 16+ core parts?
As fully explained in the video, this is a 2990WX and 2970WX issue. Single die Threadripper or Ryzen 7/5/3 CPUs do not show these issues with thread scaling.
 
Thanks. :) I try. I like your no-nonsense hardened grizzledness, too. So handbrake is super interesting, in a bad way. Check it out -- I need to send you/post the bitd utility. It's just bitd.exe bundled with Coreprio, or maybe not this build. Here's the output for me trying to convert to H.265 from a 1080p FLV OBS recording -- it was about 40-50 fps on Epyc (NUMA mode). This is just a dump, not Coreprio doing anything.

The CPU affinity mask? That's the program setting it itself. Also, there are like 600 threads for just 64 processes, so wwwwtttttfffffff is Handbrake doing? I thought Handbrake was basically a frontend for ffmpeg, which also scales really well.

I will keep digging. There is hope :D


You can see the capture from Handbrake is initially like 37 threads, then I hit the export button, and blam. Done. 600-thread spaghetti nightmare.

I am going to use the Big Buck Bunny movie in MP4 as the base/source file unless there is another file I should test with to compare apples to apples. Kyle, I can also provide you with TeamViewer if you'd like to fart around on this Epyc machine. I would have to reboot to change to UMA mode. I want to try UMA tomorrow, for sure.

thx. Moar puzzles. Nom.

Are those threads or fibers? That impacts scheduling in Win32.
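For what it's worth, the distinction is visible from outside the process: a Toolhelp snapshot only lists kernel-scheduled threads, and fibers (scheduled in user mode) never show up there, so if a snapshot really reports ~600 entries they are genuine schedulable threads. A rough Win32 sketch, with the PID as a placeholder you would replace with HandBrake's:

Code:
/* Rough sketch: count the kernel-visible threads of a process. Fibers are
 * user-scheduled and do not appear in this snapshot, so a high count here
 * really is that many schedulable threads. Replace TARGET_PID with the
 * actual HandBrake PID. */
#include <windows.h>
#include <tlhelp32.h>
#include <stdio.h>

int main(void)
{
    const DWORD TARGET_PID = 1234; /* placeholder PID */
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "snapshot failed: %lu\n", GetLastError());
        return 1;
    }

    THREADENTRY32 te;
    te.dwSize = sizeof(te);
    int count = 0;
    if (Thread32First(snap, &te)) {
        do {
            if (te.th32OwnerProcessID == TARGET_PID)
                count++;
        } while (Thread32Next(snap, &te));
    }
    CloseHandle(snap);

    printf("Kernel threads in PID %lu: %d\n", TARGET_PID, count);
    return 0;
}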
 
I'm not trying to turn this into a Windows vs Linux argument, and perhaps I need to read the OP a little better. But as far as I'm aware this NUMA issue isn't present under Linux, it's only present under Windows? Which sort of makes sense, considering Linux is the OS of choice for running massively parallel supercomputers.

So why blame Handbrake when the issue isn't present under Linux and as far as I'm aware, isn't present under Windows running Intel hardware?

As stated, not trying to cause a ruckus, just stating what I believe to be the obvious, unless I'm missing something here. Not blaming AMD here either, as I believe the Windows NUMA implementation simply cannot handle AMD's multicore architecture.

My own testing on a platform that can switch between SMP and NUMA highlights that NUMA is faster in literally all cases running Linux. As can be assumed by my use of the term SMP, my system is an older dual socket system, and NUMA still performs better in literally all cases running modern software.

Handbrake is just a GUI for ffmpeg, x265, etc. just like many of the other open source GUIs available that serve the same functions. It's pretty obvious other GUIs using the same backend tools don't have this performance issue in Windows running on AMD CPUs including the 2990wx as noted in the benchmark results link I included. The command line being fed to x265 in handbrake for video encoding is probably just not optimal for a four die/32 core CPU. Everyone acknowledges this already so not sure what we are debating here.

Use another GUI for x265 (RipBot264, which I already mentioned) and encode 4K source to 4K output, running multiple instances if necessary. 4K source to 1080p output in Handbrake without CPU utilization metrics during encoding is not a valid test.

For reference, since it seems you did not even look at the link I provided: a 2990WX at ~4GHz on all cores, running 4K source to 4K output with 4 simultaneous instances and properly utilizing the CPU in Windows with x265, gets ~30fps. Obviously there is no speed issue here. For reference, I ran this today and my 1950X at ~3.9GHz running overkill mode with 2 instances netted about 17fps 4K -> 4K, which is right in line.

https://hwbot.org/submission/402651...ark___4k_ryzen_threadripper_2990wx_29.274_fps

Overall I don't see how using a GUI for x265 that is widely known, and even confirmed here, to be poor for these CPUs is a valid test for even a 1950X, let alone a 2990WX. It pains me to see reviews of these 16/32 core CPUs using handbrake or encoding 1080p output in general and then wondering why your results are poor.

If you want a valid real world test of x265 on a 2990WX, like I said, use RipBot264 distributed encoding, 4K source -> 4K output, running four instances on four virtual machines (i.e. 4 instances on your physical 2990WX under a hypervisor). This is what I would do if I had a 2990WX. It might be overkill to do this, but it will surely utilize the CPU to its potential. If I had a 2990WX I would test it for you.
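If someone wants to try the multiple-instance route without spinning up VMs, the same idea can be done natively: start each encoder instance suspended, pin it to one NUMA node's processor mask, then resume it. A minimal sketch for two instances follows; the x265.exe command lines and file names are placeholders, and it assumes a single processor group (64 or fewer logical CPUs).

Code:
/* Minimal sketch: launch one encoder instance per NUMA node, each pinned to
 * that node's processors. Command lines and file names are placeholders.
 * Assumes one processor group (<= 64 logical CPUs). */
#include <windows.h>
#include <stdio.h>

static int launch_pinned(const char *cmdline, UCHAR node)
{
    ULONGLONG mask = 0;
    if (!GetNumaNodeProcessorMask(node, &mask) || mask == 0)
        return -1; /* node has no processors, or the call failed */

    char buf[512];
    snprintf(buf, sizeof(buf), "%s", cmdline); /* CreateProcess may modify the string */

    STARTUPINFOA si = {0};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi = {0};

    if (!CreateProcessA(NULL, buf, NULL, NULL, FALSE,
                        CREATE_SUSPENDED | CREATE_NEW_CONSOLE,
                        NULL, NULL, &si, &pi))
        return -1;

    SetProcessAffinityMask(pi.hProcess, (DWORD_PTR)mask); /* pin before it runs */
    ResumeThread(pi.hThread);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}

int main(void)
{
    /* Placeholder command lines: one chunk of the source per instance. */
    launch_pinned("x265.exe chunk0.y4m -o chunk0.hevc", 0);
    launch_pinned("x265.exe chunk1.y4m -o chunk1.hevc", 1);
    return 0;
}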

It's easy to simply run the hwbot x265 benchmark as well instead. I think a power user would work a little harder to utilize a $1,500 CPU.
 
I have an R820 PowerEdge, 4-socket with older v2 8-core Xeons, and for x265 I do exactly what I stated above to maximize encoding speeds on it. Depending on the 4K source material bitrate, etc., I can encode 4K source/output at near real time, 23fps, when combined with my 1950X system. A 2990WX used properly in Windows should achieve around the same performance, again dependent on x265 settings.
 
It pains me to see reviews of these 16/32 core CPUs using handbrake or encoding 1080p output in general and then wondering why your results are poor.

Reviewers are using tests they have standardized; that shouldn't pain you, that's a plus.
 
Reviewers are using tests they have standardized; that shouldn't pain you, that's a plus.

Sure, run standardized tests, that's fine, but don't stop there with some open source GUI that is obviously not optimized for the hardware you are reviewing. In that case the only people you are connecting with are people that buy a 32-core CPU and aren't going to utilize it properly. For paid software, sure, I will say that the software developer needs to address this performance issue with their software.

I basically agree with the guy that made the video: this isn't an AMD issue or a hardware issue, and it's quite easy to realize that. The issue mainly lies with the application software implementation. Linux is just distributing and maximizing the workload on these CPUs better for these software applications, but my point with x265 in Windows is that there is really no issue if you dig a little deeper.
 
Reviewers are using tests they have standardized; that shouldn't pain you, that's a plus.
I agree. Ripbot is cool with its Multi-computer Distributed Encoding, but people like the flexibility that Handbrake offers. Is it the perfect bench for new tech? Looking like maybe not, but still, it's popular and, as with all programs, the user wants the best performance they can see. Especially after an expensive processor purchase.
 
I agree. Ripbot is cool with its Multi-computer Distributed Encoding, but people like the flexibility that Handbrake offers. Is it the perfect bench for new tech? Looking like maybe not, but still, it's popular and, as with all programs, the user wants the best performance they can see. Especially after an expensive processor purchase.

It should just be the command line arguments for x265 that Handbrake is using; the dev responsible for that portion of the app should be able to figure it out pretty quickly. Handbrake is open source, so expecting some level of performance from a free GUI might not be an expectation you can cash your chips in on. Seeing multiple reviews of 4K to 1080p on these CPUs using one free tool makes no sense to me. Why buy a Ferrari and never take it over 60mph?

I personally haven't seen a review of 2990wx by any review site that represents a power user in regards to open source encoding.

I know if I am investing a chunk of change in hardware, I am going to figure out what I need to do to maximize its capabilities for the purposes I made the investment for in the first place.
 
It very much depends on the workload/benchmark. pgbench, for example, is extremely sensitive to such things, even on Linux, in the read/write test.

[attached chart: pgbench read/write results]
Source: Phoronix
I am not really surprised that the Handbrake situation is different. It has been known that launching multiple instances of Handbrake will give the expected (cumulative) performance, so obviously it is not the Windows scheduler eating the CPU time by shuffling things around.

That's not a 50% performance drop; that test was highlighting the difference in kernels with regard to Spectre/Meltdown.

Handbrake is just a GUI for ffmpeg, x265, etc. just like many of the other open source GUIs available that serve the same functions. It's pretty obvious other GUIs using the same backend tools don't have this performance issue in Windows running on AMD CPUs including the 2990wx as noted in the benchmark results link I included. The command line being fed to x265 in handbrake for video encoding is probably just not optimal for a four die/32 core CPU. Everyone acknowledges this already so not sure what we are debating here.

We're not debating anything.

We are discussing the fact that this is not a problem under Linux, yet it's a problem under Windows. So GUI front end or not, that immediately highlights that the problem is not with AMD's processors, the problem is not with Handbrake; the problem is with the NT kernel and its NUMA implementation in certain scenarios.

This issue was all highlighted back in August by Phoronix: compared to Linux, Windows flatly sucks in a vast number of situations running a Threadripper 2990WX. This can't be a simple GUI front end issue or an issue with the processors themselves.

https://www.phoronix.com/scan.php?page=article&item=2990wx-linux-windows&num=1
 
That's not a 50% performance drop; that test was highlighting the difference in kernels with regard to Spectre/Meltdown.



We're not debating anything.

We are discussing the fact that this is not a problem under Linux, yet it's a problem under Windows. So GUI front end or not, that immediately highlights that the problem is not with AMD's processors, the problem is not with Handbrake; the problem is with the NT kernel and its NUMA implementation in certain scenarios.

The problem is completely with Handbrake in regards to x265 in Windows. Performance is perfectly fine with other GUIs using the same open source software. Not sure what else I have to articulate that is plainly obvious; are real world results not enough?

I give up posting here already; it is not worthwhile if people will not consume the info supplied.

You guys have a good one.
 
The problem is completely with Handbrake in regards to x265 in Windows. Performance is perfectly fine with other GUIs using the same open source software. Not sure what else I have to articulate that is plainly obvious; are real world results not enough?

I give up posting here already; it is not worthwhile if people will not consume the info supplied.

You guys have a good one.

Don't get all defensive.

Handbrake is available under Linux and as far as I am aware this is not a problem under Linux? Am I wrong in saying that, or is it true? I'm not interested in bagging out Windows here, I simply find this quite fascinating, and I'm surprised that more people aren't mentioning the results Phoronix got back in August, as the variances between operating systems are quite simply massive.
 
In the FFMPEG source tree, libavfilter/pthread.c defines the default # of threads as # of cores + 1. The only win32 specific define I see is the call to w32thread_init(). Does Handbrake expose what settings it's applying when calling FFMPEG?
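To that point, a frontend that drives the encoder through the libavcodec API (rather than shelling out to a CLI) picks the threading in the codec context, so defaults can quietly differ between builds and frontends. Here is a hedged sketch of where those knobs live; the "pools=8,8,8,8" value is purely illustrative for a 4-node part, and I'm assuming the libx265 wrapper's "x265-params" private option here, not describing what HandBrake actually passes.

Code:
/* Hypothetical sketch: where an application embedding libavcodec chooses the
 * encoder threading. Assumes an FFmpeg build with libx265 and its
 * "x265-params" option. Build (assumption): gcc probe.c -lavcodec -lavutil */
#include <stdio.h>
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>

int main(void)
{
    const AVCodec *enc = avcodec_find_encoder_by_name("libx265");
    if (!enc) { fprintf(stderr, "libx265 encoder not built in\n"); return 1; }

    AVCodecContext *ctx = avcodec_alloc_context3(enc);
    if (!ctx) return 1;

    /* 0 = let the library auto-detect; a frontend could also hard-code a value
     * here, which is the kind of decision that is invisible to the end user. */
    ctx->thread_count = 0;

    /* Pass NUMA-aware pool hints straight through to x265; "pools=8,8,8,8" is
     * an illustrative value for a 4-node CPU, not a recommendation. */
    av_opt_set(ctx->priv_data, "x265-params", "pools=8,8,8,8", 0);

    printf("requested thread_count=%d\n", ctx->thread_count);
    avcodec_free_context(&ctx);
    return 0;
}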
 
Whatever the case may be, it's blatantly obvious that Handbrake is more than simply a GUI front end for the likes of ffmpeg, x265, etc. Because, logically speaking, if Handbrake was simply sending command line arguments via a fancy GUI front end this performance issue would not be a problem. Quite obviously Handbrake is a little more than a GUI front end, it's a software package making use of the above mentioned codecs.
 
Whatever the case may be, it's blatantly obvious that Handbrake is more than simply a GUI front end for the likes of ffmpeg, x265, etc. Because, logically speaking, if Handbrake was simply sending command line arguments via a fancy GUI front end this performance issue would not be a problem. Quite obviously Handbrake is a little more than a GUI front end, it's a software package making use of the above mentioned codecs.

This is exactly what Handbrake is, guy. I like how I can post freely consumable info from the internet, real results mind you, running Windows with this same hardware, and those results get ignored. You can't take any of the examples I've provided and realize it's the program feeding variables to the same open source software components.

How can you explain the results I posted? Give it a try.

Send me a damn 2990WX and I will show you results approximately double the performance I am getting with a 1950X, using x265. Go ahead and run the 1080p bench as well. Have people lost the ability to read in 2019?

Someone at HardOCP, run the damn hwbot x265 4K benchmark on a 2990WX. What is hard about this? The only people being defensive are the ones that can't duplicate results already shown by others with the same hardware, for some reason. Maybe you're stuck on using one piece of software, or you want to display bad scaling using software that the HB devs themselves state doesn't scale well past 8 cores. Idk. Seems pretty silly to me personally.

https://handbrake.fr/docs/en/latest/technical/video-encoding-performance.html

"The hardware you run on can have a large effect on performance. HandBrake can scale well up to 6 to 8 CPU cores with diminishing returns thereafter. So a 4 core CPU can be nearly twice as fast as a dual Core equivalent, however a 16 core may not be twice as fast as an 8 core but may still offer significant increases in performance. The CPU scaling curve does vary greatly by source and settings used."

Trust me, guy, it isn't x265 that has an issue scaling on more than 6-8 cores, and definitely not when run concurrently with multiple instances.

Please now just be quiet about Handbrake and poor scaling. It's an issue with the program.

The methodology being used to review a 32-core CPU meant for professionals is not an adequate representation of the performance it offers, nor of what hopefully some percentage of people buying such a CPU would do when faced with a performance delta short of expectations.

Just to break this down in the simplest of terms. I provided results here of x265 having no issues scaling in Windows on these CPUs.

I had a 1700x @ 3.8Ghz prior and it would average around 4-8fps 4K-4K (one instance of x265).

I have a 1950x @ 3.8GHz and it averages around 7-16fps 4K-4K (one instance of x265).

I ran the hwbot x265 bench 4K-4K, and with that test content I got about 17fps 4K-4K (one or two instances of x265; results were very close, I gained around 1fps running 2 instances).

I showed you an example of the same benchmark with the same 4K content source on a 2990WX, and results were right at 30fps, so around 185-190% (4 instances of x265, although I would guess 2 instances would get a slightly lower result).


Scaling looks pretty damn good to me. All of these tools use x265 and ffmpeg. Now do you want to keep telling me it's NOT Handbrake?

I even told people what you might need to do to get correct scaling with x265 in Windows on these CPUs, and it gets ignored. Are you a rock? You are ingesting valid, real info and results like a rock.

Bottom line is the hardware is fine. Windows is fine. Some software is not optimized to run on these CPUs in Windows. If you can't get the performance from real applications that you have PAID for in Windows, I would talk to the company you bought this software from.

If you can't get the expected and TESTED performance in open source applications that are using the same back end open source software, AND you can't at least recognize the issue, then you shouldn't own the hardware. End of story.
 
Handbrake is available under Linux and as far as I am aware this is not a problem under Linux?

You can get different versions, with different distros, with different presets of every app. Is it hard to believe that the defaults for Windows might be out of whack? Or that the issue might be a combination of a Windows quirk and particular software configurations?

How can you explain the results I posted? Give it a try.

He hasn't gotten his 'Microsoft Bad' poof from Kyle yet, don't go so hard on him ;)
 
This is exactly what Handbrake is, guy. I like how I can post freely consumable info from the internet, real results mind you, running Windows with this same hardware, and those results get ignored. You can't take any of the examples I've provided and realize it's the program feeding variables to the same open source software components.

How can you explain the results I posted? Give it a try.

Send me a damn 2990WX and I will show you results approximately double the performance I am getting with a 1950X, using x265. Go ahead and run the 1080p bench as well. Have people lost the ability to read in 2019?

Someone at HardOCP, run the damn hwbot x265 4K benchmark on a 2990WX. What is hard about this? The only people being defensive are the ones that can't duplicate results already shown by others with the same hardware, for some reason

https://handbrake.fr/docs/en/latest/technical/video-encoding-performance.html

"The hardware you run on can have a large effect on performance. HandBrake can scale well up to 6 to 8 CPU cores with diminishing returns thereafter. So a 4 core CPU can be nearly twice as fast as a dual Core equivalent, however a 16 core may not be twice as fast as an 8 core but may still offer significant increases in performance. The CPU scaling curve does vary greatly by source and settings used."

Trust me, guy, it isn't x265 that has an issue scaling on more than 6-8 cores, and definitely not when run concurrently with multiple instances.

Please now just be quiet about Handbrake and poor scaling. It's an issue with the program.

The methodology being used to review a 32-core CPU meant for professionals is not an adequate representation of the performance it offers, nor of what hopefully some percentage of people buying such a CPU would do when faced with a performance delta short of expectations.

Just to break this down in the simplest of terms. I provided results here of x265 having no issues scaling in Windows on these CPUs.

I had a 1700x @ 3.8Ghz prior and it would average around 4-8fps 4K-4K
I have a 1950x @ 3.8GHz and it averages around 7-16fps 4K-4K
I ran the hwbot x265 bench 4K-4K, and with that test content I got about 17fps 4K-4K
I showed you an example of the same benchmark with the same 4K content source on a 2990WX and results were right at 30fps (so around 185-190%).

Scaling looks pretty damn good to me. All of these tools use x265 and ffmpeg. Now do you want to keep telling me it's NOT Handbrake?

I even told people what you might need to do to get correct scaling with x265 in Windows on these CPUs, and it gets ignored. Are you a rock? You are ingesting good info like a rock.

Bottom line is the hardware is fine. Windows is fine. Some software is not optimized to run on these CPUs in Windows. If you can't get the performance from real applications that you have PAID for in Windows, I would talk to the company you bought this software from.

If you can't get the expected and TESTED performance in open source applications that are using the same back end open source software, AND you can't at least recognize the issue, then you shouldn't own the hardware. End of story.

First of all, calm the fuck down.

Second of all, if all Handbrake is, is a GUI front end for raw codec commands, then why is the issue present if all Handbrake does is exactly the same thing one can do by encoding directly via the command line? Your logic doesn't hold water at the most basic level. Furthermore, you continually ignore the fact that the issue is not present under Linux.

I'm happy to discuss, but reply in a friendly manner with a measure of respect. I don't use Windows, I use Linux and everything works fine here running 24C/12T.
 
First of all, calm the fuck down.

Second of all, if all Handbrake is, is a GUI front end for raw codec commands, then why is the issue present if all Handbrake does is exactly the same thing one can do by encoding directly via the command line? Your logic doesn't hold water at the most basic level. Furthermore, you continually ignore the fact that the issue is not present under Linux.

I'm happy to discuss, but reply in a friendly manner with a measure of respect. I don't use Windows, I use Linux and everything works fine here running 24C/12T.

Obviously the command line parameters being fed to x265/FFmpeg by Handbrake aren't optimal. I mean, like I've stated already about 10 times, you're using the same damn encoding back end. I'm calm, but when people can't connect a dot after you've given them straight-up results to connect it for them, it's pretty much a hand-to-forehead experience.

You have still not explained the results I have provided AND THAT YOU CAN FIND YOURSELF. I LINKED YOU TO THEM.

Lol, my logic doesn't hold water... then explain the results, like I've asked multiple times! You can't, other than to just say... yeah, Handbrake scales for shit, like the devs of the program STATE IT DOES.

Re-read my posts as many times as you need to... it's simple, straightforward, and logical, and repeatable results are provided. I do not know how else it can be more plainly straightforward or easily consumable.

FFS, Handbrake even states their software GUI doesn't scale well past 8 cores.

"calm the fuck down" ......READ
 
Overall I don't see how using a GUI for x265 that is widely known, and even confirmed here, to be poor for these CPUs is a valid test for even a 1950X, let alone a 2990WX. It pains me to see reviews of these 16/32 core CPUs using handbrake or encoding 1080p output in general and then wondering why your results are poor.

If you want a valid real world test of x265 on a 2990WX, like I said, use RipBot264 distributed encoding, 4K source -> 4K output, running four instances on four virtual machines (i.e. 4 instances on your physical 2990WX under a hypervisor). This is what I would do if I had a 2990WX. It might be overkill to do this, but it will surely utilize the CPU to its potential. If I had a 2990WX I would test it for you.
Well, testing hardware is not all about finding out what it is great at, but also finding out what it is not good at. That is one of the reasons we run 8 different content creation benchmarks. https://www.hardocp.com/article/2018/11/13/intel_core_i99980xe_vs_amd_ryzen_threadripper/3

Quite frankly, I couldn't care less if HandBrake is broken or not, as there are many ways around that problem; however, Premiere Pro is a big deal to me, seeing as how much of the industry uses it.
 
At the same time, IMHO, all I see online is how Handbrake scales like crap on these CPUs; I get no real world performance data other than benchmarks users have run using other open source apps for video encoding.
HandBrake has always scaled like crap on CPUs with more than 8 cores... it states it in the HandBrake docs. The issue with HandBrake here is that you are actually dealt a penalty, which is something in itself worth knowing.

So anyway, we got your thoughts on HandBrake. Can we stop talking about that now and get back to what this thread is actually about?
 
I think Microsoft is probably saying, hey, not a lot of people have this processor, so we can wait on this. It's truly amazing to me that Microsoft really neglects this. I am wondering, though, why AMD would not be going out of their way to get this done as well, if this issue is Windows-specific. Maybe AMD can only do so much on their own and is getting a cold shoulder from Microsoft.
 
Uploading a quick test video, hopefully it's not too crap. Sorry for the background noise: housemates playing COD and Geek Squad tech support, lol.

I have found a second issue: if you don't restart Indigo between start/stop on Coreprio, your results will not be consistent. It took me a few videos to figure out what was going on, and you can see it in this video as I restart Indigo once to get it to work properly.

EDIT - For what it's worth, I wouldn't put it past the issue being a BIOS problem with the MSI motherboard I'm using (X399 Gaming Pro Carbon AC), as I can't disable SMT. Doing so results in no POST.

Crap video

[attached screenshot: coreprioquicktest1.png]
 