Collatz

Yes, sorry... I was referring to the original Genefer Short WU's in my post above. I hadn't crunched any of the newer short ones yet. I have crunched some now, and they only take ~5 minutes each on a 7970/280X and ~5:45 on a 7950. Looks like they are worth about 589 points apiece.
 
The apps for CUDA and OpenCL were recently updated. Some have reported that it replaced the previous config file. So, anyone running these apps may want to check their config to make sure it is still correct. I have not looked at mine yet, but plan to do so this weekend. For those that haven't done it yet, please consult the quote in my very first post here.
 
The apps for CUDA and OpenCL were recently updated. Some have reported that it replaced the previous config file. So, anyone running these apps may want to check their config to make sure it is still correct. I have not looked at mine yet, but plan to do so this weekend. For those that haven't done it yet, please consult the quote in my very first post here.

It definitely replaces your config file, if you had one, as it is a new app version. There is also a new parameter that I highly recommend using, as it allowed the Large WU's to complete in about the same amount of time as they did with the old app. When I simply reused my old config parameters without the new parameter, the Large WU's took several hours longer than they had with the old app.

Here is what Slicker wrote when the new apps were released:
CUDA version 6.06 and OpenCL version 6.08 were released for Windows today. The CUDA version fixes (I hope) a "device not ready" bug seen by some fast GPUs. While I could not duplicate the error, the code now waits for the events to synchronize which should eliminate the error.

The OpenCL version, 6.08, now includes the bug fix where the GPU and CPU results were not always matching. It uses several of the optimizations Sosirus has provided and includes a new "lut_size" configuration option which now defaults to 12 (4,096 items). The previous version used a 2^20-entry lookup table which did not fit into the GPU's cache, causing it to be memory- rather than processor-bound. So you should see higher GPU utilization with the new version, and it should not be as dependent upon memory speed as the previous versions.

Linux and OS X versions with the same fixes will follow soon.

As usual, let me know if you have any issues with the new versions.
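To put the lut_size numbers in perspective (my own arithmetic, not Slicker's): the table has 2^lut_size entries, so the new default of 12 gives 4,096 entries, 16 gives 65,536, and 17 gives 131,072, while the old fixed table at 2^20 held 1,048,576 entries, which is why it spilled out of the GPU cache. That may also be why the tuned configs further down only push lut_size back up to 16 or 17.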

Here are my config file settings I'm using on my 7970's/280X's. As always, be sure to experiment with these values, as ymmv:
verbose=1
lut_size=16
items_per_kernel=22
kernels_per_reduction=9
threads=8
sleep=1
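A quick note on placement: these settings go in the app-specific .config file inside the Collatz project folder under the BOINC data directory (on Windows that's typically C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz, as mentioned further down in the thread). The exact file name depends on which app version you're running, so check the project folder; as far as I recall, the app reads the file when a task starts, so changes take effect from the next WU.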
 
I had not run CPU work units at this project outside of Android devices for quite a while and was surprised when I witnessed an MT work unit crunching away on my quad-core laptop. I asked Slicker if this was a bad work unit or if it was working as designed. He said that it does run on multiple CPU cores, but he recommended disabling that app in the preferences, as it is more efficient to run separate work units.

It does, but not as efficiently as running 4 separate WUs, one on each of the 4 cores. The overhead of multi-threading loses about 3% on quad-core and 5% on 8-core machines.
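If you would rather not turn the app off in the website preferences, BOINC's app_config.xml can also be used to pin the MT app to one core. This is only a sketch I have not tested against the Collatz MT app: the app name and plan class below are placeholders, so check client_state.xml (or the task properties in BOINC Manager) for the real values.

<app_config>
<app_version>
<app_name>collatz</app_name> <!-- placeholder: substitute the actual CPU/MT app name -->
<plan_class>mt</plan_class> <!-- placeholder: match the plan class BOINC shows for the task -->
<avg_ncpus>1</avg_ncpus> <!-- ask the client to schedule/run each task on a single core -->
</app_version>
</app_config>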
 
I forgot to post this back in September. It was in the front page News section:

If the sieve application fails immediately after it starts, you probably need to install the latest Microsoft Visual C++ runtime and, since BOINC will send your computer both 32- and 64-bit applications, you will need to install both the 32- and 64-bit C++ runtimes (see the main page for a link or just Google it).

I also think they may have deprecated the old CAL work units. I'm waiting to hear back from Slicker on this one. I know it doesn't affect that many people, but there aren't many projects out there that still support those old cards...
 
Yup... confirmed:
You likely aren't getting work because there are no CAL/Brook+ apps for the sieve app. AMD switched from CAL/Brook+ to OpenCL back in 2011, having done no new development on it since 2008 -- and not having fixed the bugs that had been present since 2005.

Unlike nVidia, which still provides OpenCL drivers for GPUs made since 2006, AMD did not bother with backwards compatibility when they switched to OpenCL. Continuing to use the old apps makes no sense because the new sieve app runs 70 times faster (or rather, it does only 1/70th of the work by eliminating numbers via the sieve, which allows it to check 70 times more numbers per WU), and the old app was producing invalid results because of a latent bug that only recently started showing up -- likely because the numbers being checked were greater than a given threshold.

If someone wants to fix the bugs in the existing Brook compiler and make it compatible with current development tools, I'd be willing to add a new Brook+ app. You can get the Brook+ source code at http://sourceforge.net/projects/brookplus/
http://boinc.thesonntags.com/collatz/forum_thread.php?id=1331&postid=21562#21562
 
Windows machines now require the Visual Studio 2012 C++ runtime.

Many machines are also throwing a lot of errored work units. Some people have fixed this by resetting the project. I'm just now looking into my machines.
 
And I can confirm that 64-bit versions of Windows will need both the x86 and x64 versions of the Visual Studio runtime to be installed.
 
Has anyone tried adjusting the config files to see what gives the best performance with the new Sieve apps? I was given some suggestions for a couple of cards from a member of another team that I plan on testing. It could mean a rather large bump in production if they meet the scoring he claims. I plan on doing some testing this weekend.
 
Can anyone confirm these numbers?

7970 = 1:40 - 1:50
verbose=1
kernels_per_reduction=64
threads=8
lut_size=16
sleep=1
reduce_cpu=0
sieve_size=30

270x = 2:20 - 2:40
verbose=1
kernels_per_reduction=64
threads=8
lut_size=16
sleep=1
reduce_cpu=0
sieve_size=29

750ti = 5:20 - 5:40
verbose=1
kernels_per_reduction=56
threads=8
lut_size=17
sleep=1
reduce_cpu=0
sieve_size=28

I am running the 750ti's right now. I saw my run time cut in half, but CPU usage increased. That was the first work unit. I have to finish up a few other projects' work units before I will have more to test on Collatz.
 
For some numbers, this is what I was told a few cards could make using these changes.

If the server and internet can stay stable, a Pitcairn can make ~2M a day, a Tahiti ~3.2M a day, a 750 Ti ~1M a day, and a GTX 560 can even make 570K a day on stock clocks/voltages with air cooling.
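As a rough cross-check on those figures (my arithmetic, not his): at the ~1:45 per WU quoted above for a 7970, that's about 86,400 / 105 ≈ 820 WUs a day, so ~3.2M a day works out to something like 3,900 credits per sieve WU. The 270X at ~2:30 per WU would do roughly 575 a day, which is in the same ballpark as the ~2M/day Pitcairn figure.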
 
A GT430 using the same settings as the 750ti above took about 1.8 hours. However, I didn't have anything in my records showing what it was before the settings change. Since the GT430 is in the same box as the two 750ti's, I can't remove the settings without dropping their performance.
 
Some additional settings pulled from EVGA
GTX 980
<app_config>
<app>
<max_concurrent>8</max_concurrent>
<name>collatz_sieve</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>

verbose=1
kernels_per_reduction=16
threads=8
lut_size=16
sleep=1
reduce_cpu=0
sieve_size=16

R9 280x
<app_config>
<app>
<max_concurrent>12</max_concurrent>
<name>collatz_sieve</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>


verbose=1
kernels_per_reduction=64
threads=8
lut_size=16
sleep=1
reduce_cpu=0
sieve_size=30
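For anyone copying these app_config blocks: gpu_usage 0.5 tells BOINC each sieve task needs half a GPU, so two tasks run per card; cpu_usage is just the CPU budget the scheduler reserves per task, not a throttle; and max_concurrent caps how many sieve tasks the client will run at once. The .config lines under each block are what actually tune the kernels.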
 
Thanks for these settings! Any chance you've come across optimized settings for Intel GPU's?
 
I have not seen any. Not a lot of people are pushing those... but I would be interested in some as well.
 
I will start with the 270X settings and see how those do on my Intel HD4600's.
 
Using the 270X settings for my Intel GPU's has cut the average Sieve WU completion time roughly in half, from ~1 hour down to ~30 minutes. The WU's use little to no CPU with these settings. I may tweak some more to see if they can do any better.

Edit: I can confirm the 270X settings also allow my 7870 (same GPU as a 270X) to complete the Sieve WU's in ~3 minutes each. I forget how long they took before the optimization, but I know it was pretty close to double that time with stock settings. I am not running Collatz on my other GPU's at this time, so I can't comment on the other tweaks.
 
Just passed 10M points a few hours ago. Got a gold badge in the process.

Thought I'd just give this a shot. Wow, the points you get are amazing. Probably the second-highest PPD behind Bitcoin Utopia (if you happen to have the ASIC hardware).

Could not find any real-life useful application for the Collatz conjecture via a Google search, but it was fun getting the points.

My first DC project other than FAH with 10M points :LOL:. Decided to stop at this mark.

 
Be sure to optimize Collatz crunching with a gpu.config file.
Is the gpu.config still applicable for the Collatz Sieve work units? When I set this up last week, I only played around with app_config.xml and the collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config file, but I didn't see any gain at all. So I ran without using those xml and config files.
 
pututu, post your results and which card you used, and I'm sure someone can compare.
 
Is the gpu.config still applicable for the Collatz Sieve work units? When I set this up last week, I only played around with app_config.xml and the collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config file, but I didn't see any gain at all. So I ran without using those xml and config files.

Oh yes, it greatly reduces the time for a WU to complete. My 1070 was around 60-62 s and my 970 was 1:40 (min:sec) if I recall. Before the config file, the 970 was over 3 min.
 
Card is an MSI GTX 970, running at 1.26GHz with a 2695v2 (see my sig below). PPD is about 1.5M, or about 241 seconds (4min 1sec) per WU, without using the app_config.xml and collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config files. I could overclock this baby up to 1.5GHz (the highest I've tried, during FAH last year), but the heat generated would be considerable in this rig since I'm also running 11 cores for WCG, leaving one core for Collatz. Also, I later ran 12 cores and more, and it seemed not to affect the PPD.

mmonnin, care to share the config files for the 970 that you used? I tried the config files posted by Gilthanis for the 980 above and some other variants from the EVGA website. I did the usual procedure of creating the files in the C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz folder and doing "read config files" after updating them. I've done this successfully on POEM running 2 WUs per card, so I'm sure I did the right procedure. The only thing that could have gone wrong is that I was impatient and didn't thoroughly record the changes I made, but I reckon getting 1.5M was good, though it seems not good enough compared to yours.
 
collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config
verbose=1
kernels_per_reduction=32
threads=8
lut_size=17
sieve_size=30
sleep=1
reduce_cpu=0

app_config
<app_config>
<app>
<name>collatz_sieve</name>
<max_concurrent>4</max_concurrent>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>0.2</cpu_usage>
</gpu_versions>
</app>
</app_config>

I ran 2 tasks at once, as there was a gap between WUs of around 2 seconds. At 62 s a pop, 2 seconds is a lot of wasted time. I've been told reduce_cpu=1 is supposed to change that, but I still ran 2 WUs faster when they were concurrent compared to 1 at a time.
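For scale, a 2-second gap on a ~62-second task is roughly 3% of the run sitting idle; with two tasks staggered on the card, one is presumably always crunching while the other is loading, which would be where the gain comes from.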
 
Thanks mmonnin. I'll try again later and will share the result.

BTW, what 970 card that you got there? GPU clock? If I can get more than double the output I can aim for 20M mark.:)
 
Thanks mmonnin. I'll try again later and will share the result.

BTW, what 970 card that you got there? GPU clock? If I can get more than double the output I can aim for 20M mark.:)

EVGA SC+. Not any of the special FTW editions or anything. It's been a bit since I ran collatz but I think it clocks up to 1440ish on that project.
 
EVGA SC+. Not any of the special FTW editions or anything. It's been a bit since I ran collatz but I think it clocks up to 1440ish on that project.
The config files got me faster throughput! The 970 running at 1200 completed 2 WUs in about 220 sec, or 110 sec per WU. Even if I overclock, your 62 sec per WU is still a lot faster than mine. Is yours on Linux or Windows?
 
The config files got me faster throughput! The 970 running at 1200 completed 2 WUs in about 220 sec, or 110 sec per WU. Even if I overclock, your 62 sec per WU is still a lot faster than mine. Is yours on Linux or Windows?

Sorry, the 62s example was on my 1070. When I got the 1070 I went to 2x WUs. I was getting the same 110s on my 970 at 1440ish running 1 WU at a time. So pretty comparable. All of this was in Windows.

You can play with some of the values. What they all mean is explained on the Collatz forums. These options can really make the screen have input lag and can make using the PC unbearable.
 
Some recent reading and playing around for 1080/1080 Ti has me currently at the following settings:


collatz_sieve_1.21_windows_x86_64__opencl_nvidia_gpu.config

verbose=1
kernels_per_reduction=48
sleep=1
threads=9
lut_size=17
reduce_cpu=0
sieve_size=30
cache_sieve=1

Watch out for heat at these settings as the load moves from 70% to 95-100%. Not really a problem if you only have one gpu per rig.


app_config

<app_config>
<app>
<name>collatz_sieve</name>
<gpu_versions>
<gpu_usage>1.0</gpu_usage>
<cpu_usage>0.5</cpu_usage>
</gpu_versions>
</app>
</app_config>

Unlike on Einstein, I haven’t tried running two tasks on each GPU yet, as I’m pretty much at 100% load and the consensus seems to be to leave it be.

The above seems to be working well (Free-DC project # 1 in points so far today:)) but I’d like to experiment some more when time allows or alternate suggestions arise.

Edit: Removed max_concurrent line from app_config as suggested.
 
Damn! I'm gonna try those settings. How many GPUs do you have?!
 
I would also remove the <max_concurrent>5</max_concurrent> line from the app_config, as it isn't really necessary unless you intentionally want to tell the client to NEVER run more than 5 work units at a time. There are scenarios for that, but typically not in the case of GPU work units.
 
Damn! I'm gonna try those settings. How many GPUs do you have?!

Currently got 1x 1080 and 4x 1080 Ti.

Stop slacking and change those settings mon ami ;)

You may need to adjust for your non-1080s.
 
I would also remove the <max_concurrent>5</max_concurrent> line from the app_config, as it isn't really necessary unless you intentionally want to tell the client to NEVER run more than 5 work units at a time. There are scenarios for that, but typically not in the case of GPU work units.

You're right, but I didn't want to screw with it. A bit of an 'if it ain't broke, don't fix it' mentality, I'm afraid, Gil :oops:
 
Not a problem. It won't improve anything unless there comes a time when you are actually trying to run more than 5 work units. If you are at near 100% with the one... that will probably never happen.
 
You guys need to hold my hand.

What settings should I use for my 980 Ti and 970? Changing over my 1080s right now.
 
You guys need to hold my hand.

What settings should I use for my 980 Ti and 970? Changing over my 1080s right now.

For the 980 Ti, try the same settings but change threads to 8 instead of 9.

I don't have a 980 Ti so I'm just taking other posts as a guide to this.
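Spelled out, that 980 Ti sketch would be the same file with only the threads line changed (untested on my end, same caveats as above):

verbose=1
kernels_per_reduction=48
sleep=1
threads=8
lut_size=17
reduce_cpu=0
sieve_size=30
cache_sieve=1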
 
I used the 980 settings that were posted in post 53. lol.

Back upstairs I go. Sigh.
 
I've got 6x980 Ti's, 2 970s and 2 1080s and I'm doing around 25M points per day.

You're running 4 1080 Tis and 1 1080 and are already at 28M points for today, and the day isn't even over yet.

Wow. Just wow. Either 1080 Ti's are that much faster or those settings are phenomenal.
 
I used the 980 settings that were posted in post 53. lol

A bit of supposition, but looking at the runtimes posted above, I think the work units must have grown considerably since last year.

Might be worth letting one run for comparison and the lolz.
 