Could GTX4## owners run a benchmark for me?

InvisiBill

I asked the HD5XXX owners to do the same thing when I was looking at switching to ATI. Now I'd like to compare my numbers to the new Nvidia cards.

  1. Go to the Pre-Release Clients page and download the [x86/CUDA-2.2] client.
  2. Unzip the file somewhere.
  3. Run dnetc.exe -runoffline -l benchmark.txt -cpuinfo
  4. Run dnetc.exe -runoffline -l benchmark.txt -bench
  5. Post your results here. "CODE" tags will keep it a manageable size.
That should run dnetc, keeping it completely offline and logging to benchmark.txt for easy copying of the results. The first command will output info about the GPU, while the second one will benchmark all the different cores.
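If you'd rather script steps 3 and 4 than type them, here is a minimal Python sketch. The flags are exactly the ones listed above; the function names (`build_commands`, `run_benchmark`) and the folder argument are made up for illustration. It assumes both runs end up in the same benchmark.txt, which is what the sample output below suggests but isn't something I've verified for every client version.

```python
import subprocess
from pathlib import Path


def build_commands(exe="dnetc.exe", log="benchmark.txt"):
    """Return the argv lists for steps 3 and 4 (-cpuinfo, then -bench)."""
    common = [exe, "-runoffline", "-l", log]
    return [common + ["-cpuinfo"], common + ["-bench"]]


def run_benchmark(folder, exe="dnetc.exe", log="benchmark.txt"):
    """Run both commands inside `folder` and return the log text.

    `folder` is wherever you unzipped the client (hypothetical argument).
    """
    folder = Path(folder).resolve()
    for argv in build_commands(str(folder / exe), log):
        # check=False: the client's exit codes aren't documented in this thread.
        subprocess.run(argv, cwd=folder, check=False)
    return (folder / log).read_text(errors="replace")


if __name__ == "__main__":
    # Only prints the commands; call run_benchmark(...) to actually run them.
    for argv in build_commands():
        print(" ".join(argv))
```

Running the file as-is only prints the two command lines, so it is safe to try without the client installed.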

You should end up with something like this:
Code:
dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[Mar 31 15:40:29 UTC] nvcuda.dll Version: 8.17.11.9716
Automatic processor identification tag: 8192
	name: Quadro NVS 160M (1 MPs)
Estimated processor clock speed (0 if unknown): 1450 MHz
Number of processors detected by this client: 1
Number of processors supported by this client: 128

dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[Mar 31 15:40:32 UTC] nvcuda.dll Version: 8.17.11.9716
[Mar 31 15:40:32 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[Mar 31 15:40:51 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:16.52 [6,410,666 keys/sec]
[Mar 31 15:40:51 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[Mar 31 15:41:10 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:16.38 [8,615,123 keys/sec]
[Mar 31 15:41:10 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[Mar 31 15:41:32 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:16.30 [9,823,934 keys/sec]
[Mar 31 15:41:32 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[Mar 31 15:41:52 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:17.75 [8,775,579 keys/sec]
[Mar 31 15:41:52 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[Mar 31 15:42:16 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:18.06 [8,881,913 keys/sec]
[Mar 31 15:42:16 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[Mar 31 15:42:40 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:19.26 [10,039,487 keys/sec]
[Mar 31 15:42:40 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[Mar 31 15:42:43 UTC] RC5 CUDA ERROR [0]: 'the launch timed out and was terminated' (cudaEventSynchronize)

[Mar 31 15:42:44 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #6 (CUDA 4-pipe 64-thd)
[Mar 31 15:42:44 UTC] Core #6 is significantly faster than the default core.
                      The GPU core selection has been made as a tradeoff between core speed
                      and responsiveness of the graphical desktop.
                      Please file a bug report along with the output of -cpuinfo
                      only if the the faster core selection does not degrade graphics performance.

For reference, the GTX285 is listed in their database as doing 325 Mkeys/sec, which is right around what mine did. The 9800GT does around 130 Mkeys/sec.
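To compare a pasted log against those database figures, something like the following Python sketch can pull the keys/sec numbers out of benchmark.txt. The regex is based only on the log layout shown in this thread (a "Benchmark for core #N (...)" line followed by an indented time and rate), so treat it as an assumption about the format rather than anything official. The names `parse_benchmarks` and `fastest` are just for illustration.

```python
import re

# A "Benchmark for core #N (name)" line, then the elapsed time and
# "[x,xxx,xxx keys/sec]" on the following line.
BENCH_RE = re.compile(
    r"Benchmark for core #(\d+) \(([^)]*)\)\s*\n"
    r"\s*[\d.:]+ \[([\d,]+) keys/sec\]"
)


def parse_benchmarks(log_text):
    """Return a list of (core_number, core_name, keys_per_sec) tuples."""
    return [
        (int(num), name, int(rate.replace(",", "")))
        for num, name, rate in BENCH_RE.findall(log_text)
    ]


def fastest(results):
    """Return the tuple with the highest keys/sec."""
    return max(results, key=lambda r: r[2])


# Two entries copied from the Quadro NVS 160M log above.
sample = """\
[Mar 31 15:41:32 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:16.30 [9,823,934 keys/sec]
[Mar 31 15:42:40 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:19.26 [10,039,487 keys/sec]
"""

results = parse_benchmarks(sample)
num, name, rate = fastest(results)
print(f"core #{num} ({name}): {rate / 1e6:.1f} Mkeys/sec")
```

Cores that die with a CUDA error (like core #7 in the log above) never print a rate line, so they simply don't show up in the results.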
 
I DON'T have a Fermi card, but I ran it anyway for shits and giggles.

Code:
dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 02:20:46 UTC] nvcuda.dll Version: 8.17.11.9745
Automatic processor identification tag: 16384
	name: GeForce GTX 260 (27 MPs)
Estimated processor clock speed (0 if unknown): 1242 MHz
Number of processors detected by this client: 2
Number of processors supported by this client: 128

dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 02:21:03 UTC] nvcuda.dll Version: 8.17.11.9745
[May 29 02:21:04 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 29 02:21:23 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:17.05 [140,478,632 keys/sec]
[May 29 02:21:23 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 29 02:21:42 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:16.45 [140,357,995 keys/sec]
[May 29 02:21:42 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 29 02:22:01 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:16.53 [135,783,479 keys/sec]
[May 29 02:22:01 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 29 02:22:20 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:16.50 [143,135,023 keys/sec]
[May 29 02:22:20 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 29 02:22:39 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:16.53 [134,430,805 keys/sec]
[May 29 02:22:39 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 29 02:23:00 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:16.80 [129,945,803 keys/sec]
[May 29 02:23:00 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 29 02:23:20 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:17.23 [133,516,792 keys/sec]
[May 29 02:23:20 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 29 02:23:40 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:16.78 [130,484,122 keys/sec]
[May 29 02:23:40 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 29 02:24:00 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:16.42 [145,868,957 keys/sec]
[May 29 02:24:00 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 29 02:24:20 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:17.31 [231,842,502 keys/sec]
[May 29 02:24:20 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 29 02:24:40 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:17.17 [205,565,287 keys/sec]
[May 29 02:24:40 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 29 02:24:58 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:16.50 [228,917,981 keys/sec]
[May 29 02:24:58 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #9 (CUDA 1-pipe 64-thd busy wait)
[May 29 02:24:58 UTC] Core #9 is significantly faster than the default core.
                      The GPU core selection has been made as a tradeoff between core speed
                      and responsiveness of the graphical desktop.
                      Please file a bug report along with the output of -cpuinfo
                      only if the the faster core selection does not degrade graphics performance.
 
EVGA GTX480 SC
Code:
dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.0).

[May 29 04:53:26 UTC] nvcuda.dll Version: 8.17.11.9775
Automatic processor identification tag: 32768
	name: GeForce GTX 480 (15 MPs)
Estimated processor clock speed (0 if unknown): 810 MHz
Number of processors detected by this client: 1
Number of processors supported by this client: 128

dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.0).

[May 29 04:53:32 UTC] nvcuda.dll Version: 8.17.11.9775
Automatic processor identification tag: 32768
	name: GeForce GTX 480 (15 MPs)
Estimated processor clock speed (0 if unknown): 810 MHz
Number of processors detected by this client: 1
Number of processors supported by this client: 128

dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.0).

[May 29 04:54:01 UTC] nvcuda.dll Version: 8.17.11.9775
[May 29 04:54:01 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 29 04:54:11 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:07.59 [569,377,244 keys/sec]
[May 29 04:54:11 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 29 04:54:21 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:07.37 [587,062,637 keys/sec]
[May 29 04:54:21 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 29 04:54:35 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:11.16 [386,106,977 keys/sec]
[May 29 04:54:35 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 29 04:54:45 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:07.52 [574,706,146 keys/sec]
[May 29 04:54:45 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 29 04:55:00 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:11.17 [385,839,927 keys/sec]
[May 29 04:55:00 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 29 04:55:16 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:13.02 [330,473,251 keys/sec]
[May 29 04:55:16 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 29 04:55:30 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:11.15 [387,182,936 keys/sec]
[May 29 04:55:30 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 29 04:55:46 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:12.91 [334,835,156 keys/sec]
[May 29 04:55:46 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 29 04:56:04 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:13.94 [310,451,244 keys/sec]
[May 29 04:56:04 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 29 04:56:14 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:07.59 [572,090,840 keys/sec]
[May 29 04:56:14 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 29 04:56:32 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:15.45 [278,980,371 keys/sec]
[May 29 04:56:32 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 29 04:56:42 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:07.59 [569,717,408 keys/sec]
[May 29 04:56:42 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #1 (CUDA 1-pipe 128-thd)
[May 29 04:56:42 UTC] Core #1 is significantly faster than the default core.
                      The GPU core selection has been made as a tradeoff between core speed
                      and responsiveness of the graphical desktop.
                      Please file a bug report along with the output of -cpuinfo
                      only if the the faster core selection does not degrade graphics performance.
 
Galaxy 470 GC

Code:
dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 05:30:42 UTC] nvcuda.dll Version: 8.17.12.5715
Automatic processor identification tag: 32768
	name: GeForce GTX 470 (14 MPs)
Estimated processor clock speed (0 if unknown): 810 MHz <- Detected my low-power clock speed instead of full.
Number of processors detected by this client: 1
Number of processors supported by this client: 128
[May 29 05:30:49 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #0 (CUDA 1-pipe 64-thd)

dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 05:30:59 UTC] nvcuda.dll Version: 8.17.12.5715
[May 29 05:30:59 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 29 05:31:11 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:09.45 [460,140,897 keys/sec]
[May 29 05:31:11 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 29 05:31:30 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:16.55 [262,198,240 keys/sec]
[May 29 05:31:30 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 29 05:31:50 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:16.55 [256,510,477 keys/sec]
[May 29 05:31:50 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 29 05:32:09 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:16.36 [260,453,956 keys/sec]
[May 29 05:32:09 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 29 05:32:28 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:16.84 [263,998,973 keys/sec]
[May 29 05:32:28 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 29 05:32:48 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:16.77 [256,450,606 keys/sec]
[May 29 05:32:48 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 29 05:33:07 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:16.72 [263,650,745 keys/sec]
[May 29 05:33:07 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 29 05:33:27 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:16.49 [263,562,658 keys/sec]
[May 29 05:33:27 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 29 05:33:47 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:16.52 [253,294,249 keys/sec]
[May 29 05:33:47 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 29 05:33:59 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:09.43 [458,676,424 keys/sec]
[May 29 05:33:59 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 29 05:34:17 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:15.47 [279,182,209 keys/sec]
[May 29 05:34:17 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 29 05:34:28 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:09.50 [455,211,969 keys/sec]
[May 29 05:34:28 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #0 (CUDA 1-pipe 64-thd)
 
Galaxy GTX 480 --> Xtreme Tuner HD (Galaxy's Tuning software) set to 850/1700/2000 @ 1125mv

Code:
dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 10:27:25 UTC] nvcuda.dll Version: 8.17.12.5715
Automatic processor identification tag: 32768
	name: GeForce GTX 480 (15 MPs)
Estimated processor clock speed (0 if unknown): 810 MHz
Number of processors detected by this client: 1
Number of processors supported by this client: 128

dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 10:27:34 UTC] nvcuda.dll Version: 8.17.12.5715
[May 29 10:27:34 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 29 10:27:43 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:06.45 [669,728,107 keys/sec]
[May 29 10:27:43 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 29 10:27:52 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:06.25 [692,912,543 keys/sec]
[May 29 10:27:52 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 29 10:28:05 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:10.01 [434,337,977 keys/sec]
[May 29 10:28:05 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 29 10:28:13 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:06.38 [682,335,883 keys/sec]
[May 29 10:28:13 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 29 10:28:26 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:10.07 [428,084,463 keys/sec]
[May 29 10:28:26 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 29 10:28:41 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:11.91 [362,115,985 keys/sec]
[May 29 10:28:41 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 29 10:28:54 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:09.99 [430,954,174 keys/sec]
[May 29 10:28:54 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 29 10:29:08 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:11.77 [368,394,953 keys/sec]
[May 29 10:29:08 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 29 10:29:23 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:11.82 [370,359,134 keys/sec]
[May 29 10:29:23 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 29 10:29:32 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:06.45 [671,678,414 keys/sec]
[May 29 10:29:32 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 29 10:29:50 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:15.46 [280,272,533 keys/sec]
[May 29 10:29:50 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 29 10:29:59 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:06.45 [672,841,067 keys/sec]
[May 29 10:29:59 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #1 (CUDA 1-pipe 128-thd)
[May 29 10:29:59 UTC] Core #1 is significantly faster than the default core.
                      The GPU core selection has been made as a tradeoff between core speed
                      and responsiveness of the graphical desktop.
                      Please file a bug report along with the output of -cpuinfo
                      only if the the faster core selection does not degrade graphics performance.

Next, I ran it at stock clocks 700/1401/1848 @ 1000mv
Code:
dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 10:34:38 UTC] nvcuda.dll Version: 8.17.12.5715
Automatic processor identification tag: 32768
	name: GeForce GTX 480 (15 MPs)
Estimated processor clock speed (0 if unknown): 810 MHz
Number of processors detected by this client: 1
Number of processors supported by this client: 128

dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 10:34:46 UTC] nvcuda.dll Version: 8.17.12.5715
[May 29 10:34:46 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 29 10:34:56 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:07.81 [552,871,329 keys/sec]
[May 29 10:34:56 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 29 10:35:06 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:07.59 [568,434,392 keys/sec]
[May 29 10:35:06 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 29 10:35:20 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:11.32 [381,215,302 keys/sec]
[May 29 10:35:20 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 29 10:35:30 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:07.73 [559,366,666 keys/sec]
[May 29 10:35:30 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 29 10:35:44 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:11.35 [380,507,709 keys/sec]
[May 29 10:35:44 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 29 10:36:00 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:13.22 [325,958,577 keys/sec]
[May 29 10:36:00 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 29 10:36:14 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:11.29 [381,593,579 keys/sec]
[May 29 10:36:14 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 29 10:36:30 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:13.05 [330,874,878 keys/sec]
[May 29 10:36:30 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 29 10:36:48 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:14.04 [311,582,660 keys/sec]
[May 29 10:36:48 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 29 10:36:58 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:07.81 [555,861,691 keys/sec]
[May 29 10:36:58 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 29 10:37:16 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:15.46 [280,509,341 keys/sec]
[May 29 10:37:16 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 29 10:37:26 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:07.81 [552,839,467 keys/sec]
[May 29 10:37:26 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #1 (CUDA 1-pipe 128-thd)
[May 29 10:37:26 UTC] Core #1 is marginally faster than the default core.
                      Testing variability might lead to pick one or the other.

The overclocked benchmark's core 1 is 21.9% faster than the stock clock's. :-D
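For anyone who wants to check the math, here is the calculation as a small Python sketch using the core #1 numbers from the two logs above. The comparison against the core clock (850 vs 700) is my own addition, and it only suggests the client scales roughly with core clock on this card; one pair of runs doesn't prove it.

```python
def percent_faster(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1) * 100


oc_core1 = 692_912_543      # core #1 at 850/1700/2000
stock_core1 = 568_434_392   # core #1 at 700/1401/1848

print(f"throughput: +{percent_faster(oc_core1, stock_core1):.1f}%")
print(f"core clock: +{percent_faster(850, 700):.1f}%")
```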
 
gtx 480 900/2050
Code:
dnetc v2.9107-516-GTR-09122713 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 14:18:27 UTC] nvcuda.dll Version: 8.17.11.9741
Automatic processor identification tag: 32768
	name: GeForce GTX 480 (15 MPs)
Estimated processor clock speed (0 if unknown): 810 MHz
Number of processors detected by this client: 1
Number of processors supported by this client: 128

dnetc v2.9107-516-GTR-09122713 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 14:18:31 UTC] nvcuda.dll Version: 8.17.11.9741
[May 29 14:18:31 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 29 14:18:40 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:06.17 [704,784,988 keys/sec]
[May 29 14:18:40 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 29 14:18:48 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:05.97 [736,033,746 keys/sec]
[May 29 14:18:48 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 29 14:19:01 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:10.42 [418,284,822 keys/sec]
[May 29 14:19:01 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 29 14:19:10 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:06.10 [718,379,520 keys/sec]
[May 29 14:19:10 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 29 14:19:23 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:10.54 [413,445,287 keys/sec]
[May 29 14:19:23 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 29 14:19:35 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:09.39 [461,951,928 keys/sec]
[May 29 14:19:35 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 29 14:19:48 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:10.45 [415,964,832 keys/sec]
[May 29 14:19:48 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 29 14:19:59 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:09.18 [474,886,780 keys/sec]
[May 29 14:19:59 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 29 14:20:12 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:10.49 [412,621,300 keys/sec]
[May 29 14:20:12 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 29 14:20:21 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:06.11 [717,066,677 keys/sec]
[May 29 14:20:21 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 29 14:20:40 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:16.23 [208,990,792 keys/sec]
[May 29 14:20:40 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 29 14:20:53 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:10.42 [419,241,273 keys/sec]
[May 29 14:20:53 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #1 (CUDA 1-pipe 128-thd)
[May 29 14:20:53 UTC] Core #1 is significantly faster than the default core.
                      The GPU core selection has been made as a tradeoff between core speed
                      and responsiveness of the graphical desktop.
                      Please file a bug report along with the output of -cpuinfo
                      only if the the faster core selection does not degrade graphics performance.
 
gtx 480 900/2050

Wow, nice OC. How are you cooling your 480? Is that 900 MHz stable in games and/or FurMark's Xtreme Burning Mode?

I've taken mine to 850, but that seems to be as far as I can go at 1125 mV. I don't think heat is my issue; the card never even hits 90 degrees while gaming. :(
 
Here is my GTX 470 at 800 core / 2000 memory / 1650 shader

Code:
dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 23:39:12 UTC] nvcuda.dll Version: 8.17.12.5715
Automatic processor identification tag: 32768
	name: GeForce GTX 470 (14 MPs)
Estimated processor clock speed (0 if unknown): 810 MHz
Number of processors detected by this client: 1
Number of processors supported by this client: 128

dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[May 29 23:39:17 UTC] nvcuda.dll Version: 8.17.12.5715
[May 29 23:39:17 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[May 29 23:39:27 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:07.50 [580,366,425 keys/sec]
[May 29 23:39:27 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[May 29 23:39:37 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:07.18 [604,136,353 keys/sec]
[May 29 23:39:37 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[May 29 23:39:51 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:11.14 [388,998,561 keys/sec]
[May 29 23:39:51 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[May 29 23:40:01 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:07.36 [590,123,681 keys/sec]
[May 29 23:40:01 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[May 29 23:40:15 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:11.20 [385,421,546 keys/sec]
[May 29 23:40:15 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[May 29 23:40:31 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:13.04 [330,186,102 keys/sec]
[May 29 23:40:31 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[May 29 23:40:45 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:11.10 [389,737,964 keys/sec]
[May 29 23:40:45 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[May 29 23:41:01 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:12.87 [336,263,302 keys/sec]
[May 29 23:41:01 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[May 29 23:41:18 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:13.71 [318,548,372 keys/sec]
[May 29 23:41:18 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[May 29 23:41:28 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:07.39 [586,708,995 keys/sec]
[May 29 23:41:28 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[May 29 23:41:47 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:16.10 [268,257,385 keys/sec]
[May 29 23:41:47 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[May 29 23:41:57 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:07.43 [583,035,324 keys/sec]
[May 29 23:41:57 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #1 (CUDA 1-pipe 128-thd)
[May 29 23:41:57 UTC] Core #1 is significantly faster than the default core.
                      The GPU core selection has been made as a tradeoff between core speed
                      and responsiveness of the graphical desktop.
                      Please file a bug report along with the output of -cpuinfo
                      only if the the faster core selection does not degrade graphics performance.
 
Thanks for the numbers, guys. Looks like a good overclock on a 480 can get you about twice the speed of a 285.
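As a rough check on the "about twice" figure, this Python sketch divides the fastest-core result from several of the logs in this thread by the 325 Mkeys/sec database number for the GTX285. The baseline is the approximate figure quoted in the first post, not a measured run, and the card labels are just shorthand for the posts the numbers came from.

```python
GTX285_KEYS = 325_000_000  # approximate database figure from the first post

# Fastest-core keys/sec copied from the logs posted in this thread.
fastest_core = {
    "GTX 260": 231_842_502,
    "GTX 470 (Galaxy GC)": 460_140_897,
    "GTX 480 (stock)": 568_434_392,
    "GTX 480 (EVGA SC)": 587_062_637,
    "GTX 470 (overclocked)": 604_136_353,
    "GTX 480 (850/1700/2000)": 692_912_543,
    "GTX 480 (900/2050)": 736_033_746,
}


def relative_to_285(keys_per_sec):
    """Throughput as a multiple of the GTX285 baseline."""
    return keys_per_sec / GTX285_KEYS


for card, keys in sorted(fastest_core.items(), key=lambda kv: kv[1]):
    print(f"{card:<24} {keys / 1e6:7.1f} Mkeys/sec  {relative_to_285(keys):.2f}x")
```

By this measure the 900/2050 card lands at a bit over 2x the 285, and a stock 480 at roughly 1.75x.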
 
Wow, nice OC. How are you cooling your 480? Is that 900 MHz stable in games and/or FurMark's Xtreme Burning Mode?

I've taken mine to 850, but that seems to be as far as I can go at 1125 mV. I don't think heat is my issue; the card never even hits 90 degrees while gaming. :(
Water, and I didn't test it in games, just ran this test and a couple of 3DMark runs (Vantage and 2006).
Thanks for the numbers, guys. Looks like a good overclock on a 480 can get you about twice the speed of a 285.
Pretty sure it could do better if the client were tailored for Fermi.
 
AFAIK it already is... you were asked to run the CUDA version of the client.

CUDA is a generic programming interface for most Nvidia cards and is not specific to Fermi. Pretty sure you can optimize for Fermi and gain additional performance out of it. Look at the date of the dnetc client: it was built in Dec 2009, when no one even knew what Fermi was going to be like.
 
I don't think it works that way. CUDA is supposed to work on ALL CUDA hardware, so you won't find people coding CUDA apps JUST for Fermi. The compilers are probably already set up to have the code executed in the most efficient way, and the entire CUDA architecture is probably also set up to take advantage of "better" hardware when it's present.

For example, PhysX/CUDA apps run better on a G92 9800GTX than on an 8800GTX with the same number of cores.
 
Possibly, but Fermi was touted as a general-computing GPU, as opposed to the simple GPUs of previous generations, so I assume Nvidia introduced some sort of CUDA extensions to utilize extra capabilities in the Fermi architecture that the programmers of the old version of dnetc may not have been aware of when they created that version of the client. On the other hand, it's only speculation :D
 
It's perfectly valid speculation. :) Considering that they are making an F@H client especially for the Fermi series, it's reasonable to think that's because the standard CUDA implementation doesn't work well enough with Fermi and performance isn't where it should be yet.
 
System in sig (non SLI)

Code:
dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[Jun 01 20:38:00 UTC] nvcuda.dll Version: 8.17.11.9775
Automatic processor identification tag: 32768
	name: GeForce GTX 480 (15 MPs)
Estimated processor clock speed (0 if unknown): 810 MHz
Number of processors detected by this client: 2
Number of processors supported by this client: 128

dnetc v2.9107-516-GTR-09122714 for CUDA 2.2 on Win32 (WindowsNT 6.1).

[Jun 01 20:39:09 UTC] nvcuda.dll Version: 8.17.11.9775
[Jun 01 20:39:09 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[Jun 01 20:39:19 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd)
                      0.00:00:07.56 [573,667,305 keys/sec]
[Jun 01 20:39:19 UTC] RC5-72: using core #1 (CUDA 1-pipe 128-thd).
[Jun 01 20:39:29 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd)
                      0.00:00:07.33 [590,755,852 keys/sec]
[Jun 01 20:39:29 UTC] RC5-72: using core #2 (CUDA 1-pipe 256-thd).
[Jun 01 20:39:43 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd)
                      0.00:00:11.12 [387,766,514 keys/sec]
[Jun 01 20:39:43 UTC] RC5-72: using core #3 (CUDA 2-pipe 64-thd).
[Jun 01 20:39:53 UTC] RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd)
                      0.00:00:07.44 [581,456,457 keys/sec]
[Jun 01 20:39:53 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[Jun 01 20:40:07 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
                      0.00:00:11.18 [385,647,125 keys/sec]
[Jun 01 20:40:07 UTC] RC5-72: using core #5 (CUDA 2-pipe 256-thd).
[Jun 01 20:40:23 UTC] RC5-72: Benchmark for core #5 (CUDA 2-pipe 256-thd)
                      0.00:00:13.08 [328,498,514 keys/sec]
[Jun 01 20:40:23 UTC] RC5-72: using core #6 (CUDA 4-pipe 64-thd).
[Jun 01 20:40:36 UTC] RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd)
                      0.00:00:11.09 [388,784,786 keys/sec]
[Jun 01 20:40:36 UTC] RC5-72: using core #7 (CUDA 4-pipe 128-thd).
[Jun 01 20:40:52 UTC] RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd)
                      0.00:00:12.91 [333,755,833 keys/sec]
[Jun 01 20:40:52 UTC] RC5-72: using core #8 (CUDA 4-pipe 256-thd).
[Jun 01 20:41:09 UTC] RC5-72: Benchmark for core #8 (CUDA 4-pipe 256-thd)
                      0.00:00:13.90 [314,877,652 keys/sec]
[Jun 01 20:41:09 UTC] RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
[Jun 01 20:41:20 UTC] RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd busy wait)
                      0.00:00:07.53 [574,176,260 keys/sec]
[Jun 01 20:41:20 UTC] RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
[Jun 01 20:41:38 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sleep 100us)
                      0.00:00:15.49 [278,749,729 keys/sec]
[Jun 01 20:41:38 UTC] RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dynamic).
[Jun 01 20:41:47 UTC] RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sleep dynamic)
                      0.00:00:07.55 [573,114,899 keys/sec]
[Jun 01 20:41:47 UTC] RC5-72 benchmark summary :
                      Default core : #0 (CUDA 1-pipe 64-thd)
                      Fastest core : #1 (CUDA 1-pipe 128-thd)
[Jun 01 20:41:47 UTC] Core #1 is marginally faster than the default core.
                      Testing variability might lead to pick one or the other.
 