Unusual CPU usage on servers in one location, can [H] folks take any guesses?

dbwillis

OK, so my team built ~36 servers back in January, all Server 2019, all the same hardware spec, as they were all bought under one purchase order from HP.
We handed them over to the other team, who scheduled the rack install and the installation of the third-party vendor's application.
DL380 Gen10, (2) Xeon Gold 6226R (32 cores, 64 threads), 96 GB (12x 8 GB 2933 MHz)
The OS is installed to 480 GB NVMe in RAID 1, and there is a D drive of 100 TB (8x 16 TB SAS, I believe).

Half the servers went to Texas, the other half stayed here in CT.
I patched the servers after they were built, so all were fully patched as of then, with no issues in Device Manager and no issues in the Event logs.

The servers in Texas are using ~11% more CPU than the CT ones. Specifically, in Task Manager on the Processes tab, the 'Services and Controller app' is at ~11%; on the Details tab, it's 'services.exe' using the 11%.

HPE techs confirmed there are no hardware issues (it would be ironic if we had picked the specific ones with hardware issues and sent them to TX).
All have static IPs, the same domain, and the same OU/policies. The apps run as a service, so no user is logged in daily, and the vendor's app is fully up to date (latest version and patch). I only found 2 issues:
A - The TX ones had the CT DNS servers set statically on the teamed fiber connection (20 Gb). Although a DNS request shouldn't take long, I changed one server to the TX DNS servers, ran ipconfig /flushdns, and then rebooted it, with no change.
B - Windows Defender was still running, even though there was a GPO to stop it. I uninstalled Defender from Features and services.exe dropped back to about 1%. I rebooted the server and it was still at 1%, so I thought maybe that was it! I checked the server today (3 days later) and the 'Services and Controller app' is at ~11% again.
Anyone have any thoughts?
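
For anyone who wants to put a number on item A instead of eyeballing it, here is a rough Python sketch that times name lookups through whatever DNS servers the adapter is currently set to use. It does not pick a DNS server itself, and the function name and the host name in the usage note are made up for illustration. Treat it as a quick sanity check, not a proper DNS benchmark.

```python
import socket
import time


def time_lookup(hostname, resolve=socket.getaddrinfo, repeats=5):
    """Return the elapsed seconds for each of `repeats` lookups of hostname.

    `resolve` defaults to socket.getaddrinfo, which uses the DNS servers
    configured on the machine. It is a parameter only so the function can
    be exercised without touching the network.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            resolve(hostname, None)
        except socket.gaierror:
            # A failed lookup still costs time, so it is recorded as well.
            pass
        timings.append(time.perf_counter() - start)
    return timings
```

Running something like `time_lookup("fileserver.corp.example")` (a hypothetical name) on one TX box and one CT box would show whether lookups are noticeably slower in TX. The first timing is usually the interesting one, since later lookups are often answered from the local resolver cache.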

 
Do the Texas ones simply see more traffic?
Is, for example, one location getting some IPv6 addresses while the other isn't?
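
One quick way to compare the two sites on the IPv6 question is to ask each server which addresses its own host name resolves to and group them by family. This is only a sketch: the function names are mine, and it reports what the resolver returns for the host name, which may not be every address on every adapter.

```python
import socket


def address_families(addrinfo):
    """Group the addresses in socket.getaddrinfo() output by IP family."""
    found = {"IPv4": set(), "IPv6": set()}
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET:
            found["IPv4"].add(sockaddr[0])
        elif family == socket.AF_INET6:
            found["IPv6"].add(sockaddr[0])
    return {name: sorted(addrs) for name, addrs in found.items()}


def local_address_families():
    """Look up this machine's own host name and classify the results."""
    return address_families(socket.getaddrinfo(socket.gethostname(), None))
```

Calling `local_address_families()` on a TX server and a CT server and diffing the output would show whether only one site is picking up IPv6 addresses.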

How's the cooling between them? You'd be seeing higher percentages on systems that are running lower clocks more often due to throttling.
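
To make the throttling point concrete, here is a back-of-the-envelope sketch. It assumes a purely time-based percentage (busy time over wall-clock time); some counters normalize by clock frequency, in which case the effect would not show up the same way. The workload figure and the clock speeds are hypothetical numbers chosen for easy arithmetic.

```python
def busy_percent(cycles_needed_per_second, clock_hz):
    """Time-based utilization: share of each second a core must stay busy."""
    return min(100.0, 100.0 * cycles_needed_per_second / clock_hz)


# Hypothetical workload: needs 0.29 billion cycles every second.
work = 0.29e9

full_speed = busy_percent(work, 2.9e9)   # core running at 2.9 GHz
throttled = busy_percent(work, 1.45e9)   # same core throttled to half that

print(f"full speed: {full_speed:.1f}%  throttled: {throttled:.1f}%")
```

The same amount of work takes twice the share of the core's time once the clock is halved, so the reported percentage goes up even though nothing about the workload changed.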

^ The above poster's suggestion about using the Sysinternals programs (Process Explorer and Procmon) is a very good one. They are very good programs: easy to use, and they show you what's happening all the way down. I would start there.

Can you take the problematic ones offline for testing? Or are you forced to do everything live?
 
The traffic is the same; TX is the DR site for the CT office, although the traffic does take a longer path to TX.
I have one dev server I can work with, but the rest are all live, recording all the signals we broadcast, 24x7. It's an FCC thing.
I didn't think of the temps. I'll check the iLO temps Monday, but I think we recently increased the data center temps in CT.
 