CPU Utilization is Wrong

rgMekanic

In May of last year, Brendan Gregg, senior performance architect at Netflix, posted an interesting article about how the "%CPU" metric is wrong and is progressively getting worse. Now Brendan expands on his findings in a 5-minute video from the Southern California Linux Expo. The UpSCALE Lightning Talk from Opensource.com covers his original idea very well and also reaches an interesting conclusion.

Check out the video

In his Lightning Talk, "CPU Utilization is Wrong," Brendan explains what CPU utilization means—and doesn't mean—about performance and shares the open source tools he uses to identify reasons for bottlenecks and tune Netflix's systems. He also includes a mysterious case study that's relevant to everyone in 2018.
 
Sooo.. the whole point of the video was to show that the Meltdown and Spectre patches are causing a 26% slowdown because they flush the TLB cache thus causing a massive number of cache misses.

But instead of just getting to the point, the speaker just said that CPU Utilization is incorrect...... :rolleyes:
 
Sooo.. the whole point of the video was to show that the Meltdown and Spectre patches are causing a 26% slowdown because they flush the TLB cache thus causing a massive number of cache misses.

But instead of just getting to the point, the speaker just said that CPU Utilization is incorrect...... :rolleyes:

But it is incorrect.
 
Pretty straightforward. Cache misses inflate what the metric reports as CPU utilization, even though the CPU is stalled waiting rather than doing work. Good point. The case he uses is specific to Intel CPUs and the Meltdown patches, but it applies to CPUs in general waiting on main memory reads, which happens all the time.
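To put a number on "busy but stalled", one common rule of thumb is instructions per cycle (IPC): divide instructions retired by CPU cycles over the same interval. A rough sketch in Python follows. The counter values are made up for illustration, the function names are hypothetical, and the 1.0 threshold is only a starting assumption that varies by CPU and workload.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions retired per CPU cycle over one measurement interval."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def classify(ipc_value: float, threshold: float = 1.0) -> str:
    """Rough interpretation of an IPC value.

    The threshold is an assumed rule of thumb, not a hard limit.
    """
    if ipc_value < threshold:
        return "likely stalled on memory"
    return "likely instruction bound"


# Hypothetical counter readings, e.g. taken from a hardware counter tool.
sample = ipc(instructions=40_000_000_000, cycles=100_000_000_000)
print(f"IPC = {sample:.2f}: {classify(sample)}")
```

A process like this one can show 100% in top while retiring well under one instruction per cycle, which is the gap being described here.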
 
It's completely correct. You just have to understand what it is you are measuring and stop confusing it with something you believe it is.
 
I've been saying this for almost 10 years... Why did I not present on this so many years ago? This isn't smart stuff to me, it's duh stuff! This occurs with memory usage too, and I should have presented and written a white paper on this.
 
If you follow Brendan's work you'll know that he's been doing this kind of analysis for years. Usually it has to do with spending inordinate cycles executing a particular code block. Inefficient code produces results similar to an inefficient processor. Not so surprising.
 
For one thing anyone that is managing large scale server deployments knows about this. Second, if only we had a way to measure external delays like I/O wait... oh what's that, we already do?

I'm a dope and even I know this.
 
Wow. He wrote all those tools and still screwed the pooch on a process waiting on input causing erroneous CPU utilization. Yup, it's a Linux piece. No wonder. Yeah, these guys need to go back and read "Operating Systems" by the legendary Andrew S. Tanenbaum. Instead of saying that context switches get more expensive, he makes an absolutely wrong statement about misleading CPU utilization.
 
Well, the CPU utilization metric is correct if you plan on using it to throttle similarly bottlenecked code.
 
I skipped the Meltdown and Spectre patches and BIOS updates. I haven't seen any incident yet on my clients' PCs or any of mine. These patches do more harm than help, you can see that in benches.
 
4 in 10 Americans think the earth is less than 10,000 years old so when you apply that same level of tech ignorance then it becomes abundantly clear that not everyone knew this.
Well there is that :)
I guess what I meant was every Linux user knows (or should know) this. These are Linux tools being demonstrated, and any sysadmin needs to know how to track down I/O-bound tasks.


Windows is different as MS doesn't make this obvious.

Fundamentally this isn't CPU loading, this is task scheduler loading. A CPU is always working; parts may be clocked down to save power, but it is still in use.
An OS scheduler, however, is different.
 
4 in 10 Americans think the earth is less than 10,000 years old so when you apply that same level of tech ignorance then it becomes abundantly clear that not everyone knew this.
Have a source for that stat, or did you just make it up?
 
http://bfy.tw/HwPo

Sadly, it seems to be the case.
All I could find was a bullshit Gallup poll that gave people only three choices: evolution with God's help, evolution without God's help, and creation 10,000 years ago. That leaves out a lot of people that believe in creationism but have other ideas on the timeline, including the idea that the Earth is millions of years (or more) old. These people will not select the first two answers, so get lumped in with the "young earth" creationists. So really all that poll proves is that 40% of the people in this country believe in creationism. I'll bet that percentage is much higher in Muslim countries. Does that make them backwards and stupid too?
 
Again, this backs up any statement that more or less says: "Most people are idiots."

 
100% utilization doesn't mean 100% max load. It just means the kernel scheduler had something to run other than idle the cpu thread resource. It's always been a pita to quantify cpu and memory usage, everyone wants to know something slightly different.
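For anyone curious how that number is usually derived on Linux, here is a minimal sketch that computes "utilization" from two samples of the aggregate cpu line in /proc/stat. The sample lines are invented, the function names are hypothetical, and it assumes a kernel that reports at least the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal). Counting iowait as idle is a choice made here; some tools report it separately.

```python
def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected a line starting with 'cpu'")
    return [int(value) for value in fields[1:]]


def utilization(before: list[int], after: list[int]) -> float:
    """Fraction of elapsed CPU time that was not idle between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum only the first eight counters: guest time is already part of user/nice.
    total = sum(deltas[:8])
    idle = deltas[3] + deltas[4]  # idle + iowait
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return (total - idle) / total


# Invented samples; on a real system, read the first line of /proc/stat twice.
first = parse_cpu_line("cpu 100 0 50 800 50 0 0 0 0 0")
second = parse_cpu_line("cpu 600 0 150 1000 50 0 0 0 0 0")
print(f"utilization: {utilization(first, second):.1%}")  # utilization: 75.0%
```

Nothing in these counters says whether the non-idle time was spent retiring instructions or waiting on memory, so the figure only shows that the scheduler had something to run.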
 
All I could find was a bullshit Gallup poll that gave people only three choices: evolution with God's help, evolution without God's help, and creation 10,000 years ago. That leaves out a lot of people that believe in creationism but have other ideas on the timeline, including the idea that the Earth is millions of years (or more) old. These people will not select the first two answers, so get lumped in with the "young earth" creationists. So really all that poll proves is that 40% of the people in this country believe in creationism. I'll bet that percentage is much higher in Muslim countries. Does that make them backwards and stupid too?

No, you did not have to choose only one of those three, that's why the combined percentage is not 100%. The category you are describing would obviously choose "none of the above", which seems to be about 5%.
 