GPU Crunching Efficiency

pututu

Not much has been discussed about GPU (BOINC) crunching power efficiency here, so I thought it might be useful to share a study I did over this Xmas holiday. What I found (no surprise at all) is that the optimum power-point efficiency (PPD/watt) and the least energy consumption (kWh) to reach a desired point target occur when the GPU is underclocked and undervolted. Note: YMMV depending on the PC system and GPU card. Even on the same card, a different GPU BIOS version may yield different results in how the GPU throttles when a "power limit" is imposed on the card.

Purpose: Take a GPU card and attempt to find a "power limit" that yields the best power-point efficiency and lowest energy consumption. In this study I used the PrimeGrid GFN-18 project as a test case, since it yields pretty consistent points per work unit (WU) and puts quite a load on the GPU card and less on the CPU and motherboard.

Test setup:
GPU card - MSI GTX 970 Gaming 4G
CPU - Intel Core i3-530 (running @ 140 BCLK, or 3.08GHz)
MB - Gigabyte GA-H55M-S2H
PSU - Seasonic G-550 (SSR-550RM, gold rated)
Memory - 2 x 2GB Corsair XMS3 DDR3 (running @ 1333MHz)
SSD - PNY CS1311 120GB SATA III
Case - Compaq Presario 5BW130 case (do I really need to mention this?)

Pictures of the test setup are below. This is not a beast machine; apart from the PSU and SSD, I have had the rest of the components for many years. The Compaq case I have had for 16 years!

upload_2016-12-31_11-13-50.png
upload_2016-12-31_11-14-19.png


Other tools needed: MSI Afterburner 4.3.0, Nvidia Inspector 1.9.7.8, a Kill-A-Watt meter and the nvidia-smi program that comes with the Nvidia driver installation. nvidia-smi is typically located in the C:\Program Files\NVIDIA Corporation\NVSMI folder, and there is even a manual for it. I used nvidia-smi rather than Nvidia Inspector to record GPU power consumption.
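For anyone who wants to record GPU_watt the same way, here is a minimal Python logging sketch (it just shells out to nvidia-smi, which needs to be on the PATH; the 5-second interval and 60-sample count are arbitrary placeholders, not what I used):

Code:
# Sample nvidia-smi power.draw repeatedly and report the average GPU_watt.
# Assumes nvidia-smi is on the PATH; interval/sample count are placeholders.
import subprocess, time

samples = []
for _ in range(60):                      # roughly 5 minutes at 5 s per sample
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"], text=True)
    samples.append(float(out.strip().splitlines()[0]))  # first GPU only
    time.sleep(5)

print("Average GPU_watt: %.2f W" % (sum(samples) / len(samples)))

Averaging over a whole WU run gives a more honest GPU_watt figure than a single reading, since the draw bounces around a bit.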

Baseline and GPU information: GPU BIOS version 84.04.36.00.F1 at stock settings, Nvidia driver 368.22.
upload_2016-12-31_11-26-48.png

At idle, PC power input is 57.20W (Kill-A-Watt) and GPU idle power is 15.82W (nvidia-smi). Let's call these PC_watt and GPU_watt respectively. That means the PC system minus the GPU consumes 57.20 - 15.82 = 41.38W at idle, which I think is quite decent for a non-beast system. The "power limit" is set with MSI Afterburner or Nvidia Inspector; either one sets the same "power limit". Below are the actual pictures at idle. Note that the lowest GPU voltage this card can reach is 0.856V. If the GPU could crunch at 0.856V, you would get the theoretical best possible efficiency; I think this floor is set by the BIOS and somewhat limited by the semiconductor material and process.

upload_2016-12-31_11-43-14.png
upload_2016-12-31_11-43-38.png


Test Results:

Results were recorded at power limits of 100%, 80%, 70%, 65%, 60%, 55% and 50%. I can't figure out if there is a way to reduce the power limit setting below 50% (I found this limit in nvidia-smi); if anyone knows, I would like to hear from you. In this testing, I let the GPU BIOS (whatever the card manufacturer decides) automatically determine the voltage and GPU clock. I'm not sure if there is a way to manually reduce the voltage (down to the 0.856V floor of this card). The results based on PrimeGrid GFN-18 work units are in the table below.
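As a side note, nvidia-smi itself can report the power-limit range the BIOS exposes and, on supported cards, set a limit in watts (needs admin rights). Whether it lets you go below the 50% floor is still up to the BIOS, so treat this as a sketch rather than a workaround; the 90 W target is just a made-up number:

Code:
# Report the min / current / max power limit the driver and BIOS expose,
# then try to set a new limit (admin rights and a supporting card required).
import subprocess

out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=power.min_limit,power.limit,power.max_limit",
     "--format=csv,noheader"], text=True)
print("min / current / max power limit:", out.strip())

# Hypothetical 90 W target; nvidia-smi refuses anything below power.min_limit.
subprocess.run(["nvidia-smi", "-pl", "90"], check=False)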
upload_2016-12-31_11-51-25.png


The optimum power-point efficiency and lowest PC energy consumption occur when the power limit is set to around 55%. At a 50% power limit, the GPU clocks down from 1088MHz to 861MHz, a drop of about 21% in clock speed, while maintaining a GPU voltage of 1.0V. Again, this behavior might be dictated by the GPU BIOS, perhaps in combination with other component limitations.

The first graph below shows the relative performance of various metrics with respect to (wrt) the optimum 55% power limit. The second graph shows PPD (points per day) per watt performance.
upload_2016-12-31_12-5-34.png

upload_2016-12-31_12-24-32.png




Below are the screenshots of the 55% power limit setup.
upload_2016-12-31_12-13-34.png

upload_2016-12-31_12-3-0.png


Discussion and Conclusions:
IIRC this may not apply to the FAH project, where points are awarded based on TPF (time per frame); there you need to increase both GPU clock speed and power to achieve the best possible PPD per watt. Because project points vary so much in FAH, PG GFN-18 is a better choice for this study. In general, YMMV; in my case, 55% appears to be the optimum for GFN-18. At this setting, reaching 1M BOINC points consumes about 51.6 kWh (PC_energy) or 34.2 kWh (GPU_energy). For states with high power bills, I hope this study provides some useful insight into GPU crunching efficiency.
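If anyone wants to reproduce the kWh-to-1M-points figures from their own numbers, the arithmetic is just average wall (or GPU) power multiplied by the time needed to earn the points. A quick sketch, with placeholder inputs rather than the measured values from the table:

Code:
# Energy needed to reach a point target from average power and PPD.
# Placeholder inputs; substitute your own measured PC_watt/GPU_watt and PPD.
def kwh_per_points(avg_watt, ppd, target_points=1_000_000):
    days = target_points / ppd              # days of crunching needed
    return avg_watt * 24.0 * days / 1000.0  # watts * hours -> kWh

# Placeholder example: 129 W average draw at 60,000 PPD -> ~51.6 kWh per 1M points.
print(kwh_per_points(129.0, 60_000))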
 
Huh. I was just messing around with nvidia-smi very late last night trying to figure out how to throttle down the twin M40s I've borrowed. These are fanless units and one of them gets within 3°C of the slow-down limit whilst crunching GFN-21s. Mind you, I had been at work for 14 hours at that point, so the best I could come up with was using nvidia-smi to query the temps and output them to a file, then running "find" on said output. If the card was at 81°C or more, boinccmd would trigger a GPU pause for 20 seconds to let the GPUs cool down. Rinse, lather, repeat. Crude, but it worked. :p Also pointed a big ass fan at the front of this server, which helps a lot. Never occurred to me to try something like Afterburner... but not sure it would work on these cards. Hmm.
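For what it's worth, here is a tidier Python sketch of that kind of watchdog (assuming nvidia-smi and boinccmd are both on the PATH; the 81°C threshold and 20-second pause just mirror the description above):

Code:
# Crude GPU temperature watchdog: pause BOINC GPU work when a card gets too hot.
import subprocess, time

while True:
    # Read the temperature of every GPU; nvidia-smi prints one value per line.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"], text=True)
    hottest = max(int(t) for t in out.split())
    if hottest >= 81:
        # Suspend GPU work for 20 seconds; BOINC resumes on its own afterwards.
        subprocess.run(["boinccmd", "--set_gpu_mode", "never", "20"], check=False)
    time.sleep(20)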
 
Keep in mind that GPUGrid also has a QRB bonus that other BOINC projects do not, so when calculating you need to factor in the QRB and non-QRB cases respectively, as downclocking may make you miss out on those bonuses. I think returning within 24 hours gives a 50% bonus and within 48 hours a 25% bonus, but I could be wrong as it has been a very long time since I read up on it.

And IIRC GFN work units require double precision, so performance may vary greatly from card to card, especially with Nvidia cards. You may want to try a single-precision work unit to verify similar results. GFN-20 gets a 10% bonus, GFN-21 gets a 20% bonus and GFN-22 gets a 50% bonus. So it would also be interesting to see a breakdown of each offered application to see which work units perform the best.
 
Yes, this study does not apply to projects that have something similar to QRB, such as GPUGrid mentioned above. FAH falls into a similar category, i.e. if you do the same amount of work but return the result faster you are rewarded extra points. If you look at the table above and focus on GFN-18, which has no QRB, the WU run time at the 55% power limit is only about 7.6% longer than at 100%, so that appears not too bad, at least for the GFN-18 project. YMMV for other projects, as you stated, and for different GPUs depending on how the BIOS is coded into the GPU.

The plus side is that running at the 100% power limit uses about 15.9 kWh (~30.8%) more energy to reach 1M points than the 55% setting, at least for the card I studied. I will have to wait for the next long holiday to test GFN-20 and above, which have no QRB.

I think the conclusion from this study is that you get better power-point efficiency if you underclock and undervolt the GPU card, assuming the same amount of work is being done (excluding time-related factors such as QRB). Again, YMMV depending on the type of GPU card and the project of interest. It is best to do a quick check to gauge the optimum power-point efficiency for each card-project combination, but I do think that underclocking and undervolting the GPU card in general has some power-point benefit; the only setback is return on assets (ROA). Since DC is all about volunteering, some folks might focus more on ROA, some on power-point efficiency, some on serious competition, etc.
 
I would also suggest doing a test with PPS Sieve, since it is single precision, to see if the results are consistent with the double-precision GFN work units.
 
How do you find out which PG sub-projects, or any other projects, use SP or DP?
 
I'll take a stab at PPS Sieve to see if SP matters on the same card, since it is a short run.
 
That list pretty much just states what crunches it the fastest based on their results. It does not, however, state what is most efficient based on power usage vs. point payout, nor does it show where the prime efficiency for each card lies with clock speeds and such. And that is what pututu is looking at.
 
EXPERIMENT #2

Using the same setup as in my first post, I managed to get some data on PPS Sieve to see if a single-precision WU yields similar results to a double-precision WU.
Result: The trend is the same, i.e. you get better GPU crunching efficiency with lower GPU voltage and GPU clock, BUT the gain is not as large as with the double-precision WU.

The key to better crunching efficiency (this does not apply to WUs that award points based on time rather than the amount of GFLOPs done) is really the GPU voltage, which is not surprising since power consumption is proportional to the voltage squared (assuming the Nvidia chip's resistance does not increase much over the range of operating temperatures studied). Since PPS Sieve has better stability than the GFN projects with regard to overclocking, overclocking the GPU card while maintaining the GPU voltage improves crunching efficiency. My GTX 970 maxes out at 1478 MHz without increasing the voltage. If you want to get above 1500 MHz, increasing the GPU voltage (and watching the temps) generally helps maintain stability, but you start losing crunching power efficiency. Again, YMMV depending on the card and project, but reducing the GPU voltage, either by power limiting the card and/or reducing the GPU clock, helps.

BTW, I ran two PPS Sieve WUs per card to load the GPU up to 100% utilization. Also, the overhead power needed to run one GPU card (PC motherboard, CPU, DRAM, SSD/HD, etc.) remains fairly constant for PPS Sieve, around 56.10 W to 63.14 W (PC_watt - GPU_watt), so one should see better efficiency with multiple GPU cards. Projects that hit the PCIe bus bandwidth/CPU/RAM/HD/SSD hard, such as GPUGrid, may increase the PC power overhead compared to PPS Sieve. The results of this study with the PPS Sieve project are in the pictures further below.
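To put the voltage-squared point in concrete terms, dynamic GPU power scales roughly with V² x f (leakage and other constants ignored). A back-of-the-envelope sketch, using made-up reference numbers rather than measured ones:

Code:
# Rough dynamic-power ratio assuming P ~ V^2 * f (leakage ignored).
def relative_dynamic_power(volts, mhz, volts_ref, mhz_ref):
    return (volts / volts_ref) ** 2 * (mhz / mhz_ref)

# Made-up example: dropping from 1.212 V / 1478 MHz to 1.000 V / 1300 MHz
# would cut dynamic power to roughly 60% of the reference point.
print(relative_dynamic_power(1.000, 1300.0, 1.212, 1478.0))

This is why a modest voltage drop buys more efficiency than a clock drop alone: the voltage term is squared while the clock term is linear.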


upload_2017-1-15_11-34-17.png



upload_2017-1-15_11-34-52.png


upload_2017-1-15_11-36-44.png


upload_2017-1-15_11-36-0.png


Pictures of the 50% power limit setting, which yields the best power-point crunching efficiency.

upload_2017-1-15_12-6-59.png



upload_2017-1-15_12-8-14.png
 

Thanks for the testing. Keep us posted if you test any other GPU work units/projects. PrimeGrid has the AP27 GPU work as well. I'm assuming it will show the same trend, but for science's sake you might as well do a run for it as well. :smuggrin:
 
I have an EVGA GTX 960 card that I may want to test with GFN (double precision) and PPS Sieve (single precision) to see how each GPU manufacturer optimizes their GPU BIOS, since there is no way to undervolt the GPU card other than power limiting it or downclocking it. I think I would get different results for different card manufacturers. It is a pity that power-point efficiency in GPU crunching is not well discussed or considered, since most folks chase more points in a given time. Yeah, moar power, moar GHz, moar voltage, etc. If you are in it for the long haul, having an efficient setup is beneficial, imo. I read somewhere that the GTX 750 Ti is a standard in regards to power-point efficiency for crunching, and it would be great to benchmark it in terms of PPD/PC_watt or PPD/GPU_watt. If you don't have a Kill-A-Watt, you can use the nvidia-smi command to read GPU_watt, and if someone can help publish the numbers here, that would be great. PC_watt may vary considerably with motherboard, CPU, etc., so maybe just focus on GPU_watt first (a quick sketch of the arithmetic is further below).

I probably won't be able to do this until sometime next month...
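If anyone wants to report numbers in a comparable way, PPD/GPU_watt only needs the credit per WU, the WU runtime, how many WUs run at once, and the average GPU power from nvidia-smi. A small sketch with placeholder inputs (none of these numbers are real measurements):

Code:
# Points per day per watt from credit, runtime and average GPU power.
def ppd_per_watt(credit_per_wu, runtime_s, avg_gpu_watt, concurrent_wus=1):
    ppd = credit_per_wu * (86_400.0 / runtime_s) * concurrent_wus
    return ppd / avg_gpu_watt

# Placeholder example: 900 credits/WU, 1,800 s runtime, 2 WUs at once, 110 W average.
print(ppd_per_watt(900.0, 1_800.0, 110.0, concurrent_wus=2))   # ~785 PPD per watt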
 
Bumping this thread with an update.

I found a nice article about GPU mining (not crunching) power efficiency on the GTX 1080 Ti card.

Peak mining power efficiency for the GTX 1080 Ti occurs at about 137W to 148W of actual power, i.e. a power limit of 137/250 (55%) to 148/250 (59%). I think mining uses double precision.
The actual power reported here is the GPU card power, excluding system power. My GTX 970 had its optimum (crunching) power efficiency at the 55% power limit setting. YMMV.
upload_2017-10-5_21-42-29.png


Note that Pascal cards allow voltage-frequency curve tweaking to further optimize power efficiency/overclocking. I have not investigated this, but I'm guessing you probably won't gain a lot from it. I think for most Pascal cards the sweet spot for power efficiency is probably somewhere between a 50% and 60% power limit.

BTW, I don't recommend using this during the GPUGRID FB sprint challenge currently underway. ;)
 