Titan Black Folding Throughput Trouble

plext0r ([H]ard DCOTM x3, joined Dec 1, 2009, 780 messages)
Hey guys, I removed the second GPU from this system, but the Titan Black is still performing like crap. I'm not sure of the PPD because HFM.net says 0 PPD. I'm running the NVIDIA 355.11 drivers and used set-gpu-fan to raise the GPU fan speed. nvidia-smi says the GPU is running at 99% utilization, but look at this snippet from log.txt:

Code:
13:45:39:WU02:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/beta/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 3992 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
13:45:39:WU02:FS02:Started FahCore on PID 5864
13:45:39:WU02:FS02:Core PID:5868
13:45:39:WU02:FS02:FahCore 0x21 started
13:45:40:WU02:FS02:0x21:*********************** Log Started 2015-09-25T13:45:39Z ***********************
13:45:40:WU02:FS02:0x21:Project: 9205 (Run 0, Clone 22, Gen 20)
13:45:40:WU02:FS02:0x21:Unit: 0x00000017664f2dd055d4c7ef955afa95
13:45:40:WU02:FS02:0x21:CPU: 0x00000000000000000000000000000000
13:45:40:WU02:FS02:0x21:Machine: 2
13:45:40:WU02:FS02:0x21:Reading tar file core.xml
13:45:40:WU02:FS02:0x21:Reading tar file system.xml
13:45:43:WU02:FS02:0x21:Reading tar file integrator.xml
13:45:43:WU02:FS02:0x21:Reading tar file state.xml
13:45:47:WU02:FS02:0x21:Digital signatures verified
13:45:47:WU02:FS02:0x21:Folding@home GPU Core21 Folding@home Core
13:45:47:WU02:FS02:0x21:Version 0.0.12
13:48:45:WU02:FS02:0x21:Completed 0 out of 2500000 steps (0%)
13:48:45:WU02:FS02:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
14:17:02:WU02:FS02:0x21:Completed 25000 out of 2500000 steps (1%)
14:44:39:WU02:FS02:0x21:Completed 50000 out of 2500000 steps (2%)
15:12:08:WU02:FS02:0x21:Completed 75000 out of 2500000 steps (3%)
15:39:41:WU02:FS02:0x21:Completed 100000 out of 2500000 steps (4%)
16:08:13:WU02:FS02:0x21:Completed 125000 out of 2500000 steps (5%)
16:35:59:WU02:FS02:0x21:Completed 150000 out of 2500000 steps (6%)
17:03:42:WU02:FS02:0x21:Completed 175000 out of 2500000 steps (7%)
17:31:20:WU02:FS02:0x21:Completed 200000 out of 2500000 steps (8%)
17:59:39:WU02:FS02:0x21:Completed 225000 out of 2500000 steps (9%)
18:27:18:WU02:FS02:0x21:Completed 250000 out of 2500000 steps (10%)
18:55:04:WU02:FS02:0x21:Completed 275000 out of 2500000 steps (11%)
19:22:53:WU02:FS02:0x21:Completed 300000 out of 2500000 steps (12%)
19:51:30:WU02:FS02:0x21:Completed 325000 out of 2500000 steps (13%)
20:19:09:WU02:FS02:0x21:Completed 350000 out of 2500000 steps (14%)
20:46:41:WU02:FS02:0x21:Completed 375000 out of 2500000 steps (15%)

Any ideas what I could do to get this thing moving? By the timestamps above it's taking roughly 27-28 minutes per 1% frame. What should the TPF be on Core 21?
 
I was getting between 5 and 6 minutes per frame on my 970 under Linux a while back (on the 346.46 drivers at the time, I think).

First thing that comes to mind: do you have a CPU core free for the GPU? What is your CPU slot setup? Reason: the new 0x21 cores write a checkpoint at every 2.5% of completion and perform CPU-driven sanity/consistency checks, so you really want to make sure not to overallocate the CPU.
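A quick way to check for overallocation, assuming the stock Linux install path for the client config:
Code:
nproc                                                # threads the machine actually has
grep -o "cpus v='[0-9]*'" /etc/fahclient/config.xml  # threads each CPU slot requests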

Next thing to try: a lower driver version ... 355.11 is a bit slower (but nothing like in your case).
 
I have the slots configured as follows:
Code:
  <slot id='0' type='CPU'>
    <client-type v='smp'/>
    <cpus v='32'/>
  </slot>
  <slot id='1' type='CPU'>
    <client-type v='smp'/>
    <cpus v='30'/>
  </slot>
  <slot id='2' type='GPU'>
    <client-type v='beta'/>
    <gpu-index v='0'/>
  </slot>

Here's a snippet of the log showing all slots:
Code:
20:02:56:WU03:FS00:0xa4:Completed 400000 out of 500000 steps  (80%)
20:04:51:WU00:FS01:0xa4:Completed 105000 out of 500000 steps  (21%)
20:08:35:WU03:FS00:0xa4:Completed 405000 out of 500000 steps  (81%)
20:11:05:WU00:FS01:0xa4:Completed 110000 out of 500000 steps  (22%)
20:14:15:WU03:FS00:0xa4:Completed 410000 out of 500000 steps  (82%)
20:17:25:WU00:FS01:0xa4:Completed 115000 out of 500000 steps  (23%)
20:19:09:WU02:FS02:0x21:Completed 350000 out of 2500000 steps (14%)
20:19:51:WU03:FS00:0xa4:Completed 415000 out of 500000 steps  (83%)
20:23:38:WU00:FS01:0xa4:Completed 120000 out of 500000 steps  (24%)
20:25:37:WU03:FS00:0xa4:Completed 420000 out of 500000 steps  (84%)
20:29:54:WU00:FS01:0xa4:Completed 125000 out of 500000 steps  (25%)
20:31:17:WU03:FS00:0xa4:Completed 425000 out of 500000 steps  (85%)
20:36:21:WU00:FS01:0xa4:Completed 130000 out of 500000 steps  (26%)
20:36:59:WU03:FS00:0xa4:Completed 430000 out of 500000 steps  (86%)
20:42:42:WU03:FS00:0xa4:Completed 435000 out of 500000 steps  (87%)
20:42:46:WU00:FS01:0xa4:Completed 135000 out of 500000 steps  (27%)
20:46:41:WU02:FS02:0x21:Completed 375000 out of 2500000 steps (15%)
20:48:30:WU03:FS00:0xa4:Completed 440000 out of 500000 steps  (88%)
20:48:58:WU00:FS01:0xa4:Completed 140000 out of 500000 steps  (28%)
20:54:17:WU03:FS00:0xa4:Completed 445000 out of 500000 steps  (89%)
20:55:22:WU00:FS01:0xa4:Completed 145000 out of 500000 steps  (29%)
21:00:04:WU03:FS00:0xa4:Completed 450000 out of 500000 steps  (90%)
21:01:50:WU00:FS01:0xa4:Completed 150000 out of 500000 steps  (30%)
21:05:53:WU03:FS00:0xa4:Completed 455000 out of 500000 steps  (91%)
21:08:11:WU00:FS01:0xa4:Completed 155000 out of 500000 steps  (31%)
21:11:39:WU03:FS00:0xa4:Completed 460000 out of 500000 steps  (92%)
21:14:11:WU02:FS02:0x21:Completed 400000 out of 2500000 steps (16%)
21:14:34:WU00:FS01:0xa4:Completed 160000 out of 500000 steps  (32%)

HFM.net says the CPU slots are getting 10K PPD and 13K PPD and the Titan is getting 0. Full log is at this location.
 
You would need to maintain HFM manually; it seems it's not getting the right project info:

9205 171.64.65.104 p9205 116800 7.00 10.00 24400 100 OPENMM_21 Description jadeshi 0.75
Preferred deadline: 7 days
Final deadline: 10 days
Base credit: 24400

Can you post a snapshot of nvidia-smi?
Your CPU config looks OK, but could you pause one CPU slot for a bit and see if the Titan gets more room to breathe?
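Pausing a single slot should also work from the CLI; a sketch, assuming a v7 client that supports --send-command and that slot 01 is one of the CPU slots:
Code:
FAHClient --send-command "pause 1"     # pause folding slot 01 only
FAHClient --send-command "unpause 1"   # resume it afterwards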
 
Just noticed your 0x18 is also slow: my 970 does 0:33 TPF ...

Is the card in a 4-lane PCIe slot?

Can't recall seeing numbers from other Titans ...
 
Next wild and uneducated guess: the GPU doesn't like its process jumping between CPUs.

Assuming you're running it in a 4P machine ... each PCIe slot is assigned to a dedicated CPU(?).
Any chance (with htop, for example) of locking the CPU process that drives the GPU onto the physical CPU that owns its PCIe slot? Not sure whether that shuffling causes enough overhead to explain the poor TPF (and I still don't have a multi-CPU rig myself).

Reducing or pausing one CPU slot might also help confirm this, since the system would shuffle processes around less. A rough pinning sketch is below.
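A minimal sketch of that pinning experiment (core IDs are illustrative; check which cores belong to which socket first):
Code:
numactl --hardware             # shows which cores belong to each socket/NUMA node
pid=$(pgrep -f FahCore_21)     # may also match the wrapper; pick the core's PID
taskset -cp 0-7 "$pid"         # pin it to cores 0-7 (the socket that owns the slot)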
 
Brilong, you definitely have a problem; below is what both your GPU and CPUs should be getting. I am running 14.04 and using v7 of FAH. As I mentioned earlier, v7 chooses 31 CPUs on the SMP slots, but I have no idea if that matters or not. I am averaging around 700K on the Intel 4650 boxen with a GTX 980; about an hour ago I hit 850K on it, but that is not the norm. Also below are some tools you may want to use to set voltage and clocks on the Titan.

I do not know what your problem is; you may want to try v7 and maybe 14.04 if you are not already using them.

sudo nvidia-xconfig --cool-bits=28

nvidia-smi

nvidia-settings -q all | grep -i voltage (shows the current voltage and the allowable OC voltage; on my 980 I am allowed 75000 max, and you can choose any 12500 increment up to the max)

sudo nvidia-settings -a GPUOverVoltageOffset=xxxxx (12500, 25000, etc.; sets the voltage offset on the GPU)


9205, GTX 980 @ 1505 MHz, 355.11 drivers
Code:
Project ID: 9205
 Core: OPENMM_21
 Credit: 24400
 Frames: 100


 Name: Grandpa Slot 00
 Path: 127.0.0.1-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:04:32 - 378,301.6 PPD
 Avg. Time / Frame : 00:04:52 - 340,108.3 PPD


 Name: Grandpa Slot 01
 Path: 127.0.0.1-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:04:28 - 386,802.5 PPD
 Avg. Time / Frame : 00:04:48 - 347,218.3 PPD


 Name: Grandpa Slot 02
 Path: 127.0.0.1-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:04:25 - 393,389.0 PPD
 Avg. Time / Frame : 00:04:41 - 360,272.5 PPD

9761, Intel 4650 at stock clocks ("Core32" is the 4650L box)
Code:
 Project ID: 9761
 Core: GRO_A4
 Credit: 537
 Frames: 100


 Name: Core32 Slot 00
 Path: 10.0.0.91-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:00:30 - 160,723.2 PPD
 Avg. Time / Frame : 00:00:37 - 117,343.2 PPD


 Name: Core32 Slot 02
 Path: 10.0.0.91-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:00:29 - 169,107.7 PPD
 Avg. Time / Frame : 00:00:37 - 117,343.2 PPD


 Name: Patriot Slot 00
 Path: 10.0.0.83-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:00:26 - 199,205.2 PPD
 Avg. Time / Frame : 00:00:33 - 139,312.3 PPD
 Cur. Time / Frame : 00:00:32 - 143,869.7 PPD
 R3F. Time / Frame : 00:00:32 - 143,869.7 PPD
 All  Time / Frame : 00:00:32 - 143,869.7 PPD
 Eff. Time / Frame : 00:00:33 - 138,462.0 PPD

9752 Intel 4650L
Code:
 Project ID: 9752
 Core: GRO_A4
 Credit: 1000
 Frames: 100


 Name: Core32 Slot 00
 Path: 10.0.0.91-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:00:22 - 213,140.9 PPD
 Avg. Time / Frame : 00:00:28 - 148,444.5 PPD
 Cur. Time / Frame : 00:00:28 - 146,164.7 PPD
 R3F. Time / Frame : 00:00:28 - 146,164.7 PPD
 All  Time / Frame : 00:00:28 - 146,164.7 PPD
 Eff. Time / Frame : 00:00:31 - 127,159.3 PPD


 Name: Core32 Slot 02
 Path: 10.0.0.91-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:00:23 - 199,392.8 PPD
 Avg. Time / Frame : 00:00:29 - 140,832.9 PPD
 Cur. Time / Frame : 00:00:29 - 139,562.8 PPD
 R3F. Time / Frame : 00:00:28 - 145,088.7 PPD
 All  Time / Frame : 00:00:28 - 145,088.7 PPD
 Eff. Time / Frame : 00:00:29 - 139,562.8 PPD
 
Quote: "HFM.net says the CPU slots are getting 10K PPD and 13K PPD and the Titan is getting 0."

Could you update HFM with the project data? That should give you the right PPD reading.

More importantly, out of curiosity and if you have the time: could you try to pin the 0x21 or 0x18 core processes to dedicated CPU cores, with a tool like htop under Linux?
Or, additionally, post over in FF to get more Titan background and comparisons.
 
I'm a dumbass. I had 4P on the brain, but this box is only 2P with 32 cores total. I removed the "cpus" line from each slot and restarted the client. It moved the a4 cores to 15 CPUs each; we'll see how the PPD improves.
 
Quote: "Could you update HFM with the project data? ... could you try to pin the 0x21 or 0x18 core processes to dedicated CPU cores?"

I'm not sure how to manually update HFM.net. I changed psummary.html back to psummaryC.html but the latest projects are still not listed.

I also removed the second CPU slot completely and the client changed to 31 cores running the a4.
 
@brilong: set the CPUs to 30; counts with bigger prime factors are not preferred (3, 5, 7 can be OK; from 11 onwards it gets risky and such counts are often disabled by the project managers).

Did the TPF improve on the Titan?

As for HFM, sorry, I'm not using it. I hear from others there is a function to update the data, but no idea how ... I'm on my iPad for monitoring. Maybe someone else can jump in on that topic.
 
brilong, in HFM go to the Tools tab and choose "Download Projects From Stanford" in the drop-down. Also, 31 works fine; all of the WUs I got from the server the big boxen are assigned to had no problem with 31.
 
My install of HFM would not update the projects properly until I tried http://fah-web.stanford.edu/new/psummary.html.

This was the ticket to get the latest project info. Thanks!

It shows my 2P E5-4640 is getting 112K PPD on 9752, but the accompanying Titan is getting 31K. I'm going to pause the CPU slot as someone suggested and see if the Titan improves. I can also move it to another slot on the motherboard.
 
It's strange to me that nvidia-smi shows the GPU is 99% utilized, yet the power usage and temperature are not escalating:
Code:
+------------------------------------------------------+
| NVIDIA-SMI 355.11     Driver Version: 355.11         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  On   | 0000:83:00.0     Off |                  N/A |
| 28%   44C    P8    48W / 250W |    131MiB /  6143MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     16204    C   .../NVIDIA/Fermi/beta/Core_17.fah/FahCore_17   114MiB |
+-----------------------------------------------------------------------------+
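One way to watch this over time, if this driver's nvidia-smi supports dmon:
Code:
nvidia-smi dmon -s puc    # 1 Hz samples of power/temp, utilization, and clocks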
 
I found out my "set-gpu-fans" script was not properly setting the fan target to 85% because the newer drivers changed the parameter from GPUCurrentFanSpeed to GPUTargetFanSpeed. I fixed that script and now the fan is pegged at 85%. Let's hope the TPF decreases.
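For reference, the working form is roughly this (GPUCurrentFanSpeed is read-only on the newer drivers; manual fan control also needs the Coolbits fan bit enabled):
Code:
nvidia-settings -a '[gpu:0]/GPUFanControlState=1' \
                -a '[fan:0]/GPUTargetFanSpeed=85'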
 
Think this might be a problem?

Attribute 'GPUCurrentClockFreqs' (bester:0[gpu:0]): 324,324.

What the heck? Investigating why it's running in P8 (according to nvidia-smi), which appears to be a low-performance idle state.
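A couple of CLI ways to confirm the pstate and live clocks:
Code:
nvidia-smi --query-gpu=pstate,clocks.sm,clocks.mem --format=csv
nvidia-settings -q '[gpu:0]/GPUCurrentClockFreqs'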
 
I run my headless machines without a GUI, so I use a script that generates a temporary xorg.conf and calls "xinit" with the temp file to run nvidia-settings (a sketch of that wrapper is after the snippet below). I found that if I change the xorg.conf to include the following, the card runs full-blast (though not overclocked):
Code:
    Option         "Coolbits" "28"
    Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x3322; PowerMizerDefaultAC=0x1"

I would like to know if there is a way to overclock this Titan Black from the CLI. When I search through the nvidia-settings -q all output, I see the max clock is 1202 MHz, but I'm running at 1058. Any ideas on how to push this card faster would be appreciated.
 
Sorry brilong, I missed your post. Run the commands below for voltage adjustments; sorry, I do not know the command for frequency, as I use the X server GUI for that. The first gives you the range (mine is listed below: I have the ability to add 0-37500 to the 980 Classified). The second sets the voltage once you know your range.

nvidia-settings -q all | grep -i voltage

sudo nvidia-settings -a GPUOverVoltageOffset=37500

Code:
  Attribute 'GPUCurrentCoreVoltage' (scotty:0.0): 1226250.
    'GPUCurrentCoreVoltage' is an integer attribute.
    'GPUCurrentCoreVoltage' is a read-only attribute.
    'GPUCurrentCoreVoltage' can use the following target types: X Screen, GPU.
  Attribute 'GPUOverVoltageOffset' (scotty:0.0): 13750.
    The valid values for 'GPUOverVoltageOffset' are in the range 0 - 37500
    'GPUOverVoltageOffset' can use the following target types: X Screen, GPU.
  Attribute 'GPUCurrentCoreVoltage' (scotty:0[gpu:0]): 1226250.
    'GPUCurrentCoreVoltage' is an integer attribute.
    'GPUCurrentCoreVoltage' is a read-only attribute.
    'GPUCurrentCoreVoltage' can use the following target types: X Screen, GPU.
  Attribute 'GPUOverVoltageOffset' (scotty:0[gpu:0]): 13750.
    The valid values for 'GPUOverVoltageOffset' are in the range 0 - 37500
    'GPUOverVoltageOffset' can use the following target types: X Screen, GPU.
 