WCG?

Toconator · Oct 5, 2025

Well it's finally cooled off a bit here after the driest Sept on record but will warm a bit next week so I'll prob go down to one F@H boxen for a few days. The WCG servers finally answered the comm so it was able to be installed in BOINC Mgr but it hasn't picked up any work for weeks. Are there no WUs? My Server is going to crunch now for the duration of cooler weather and I might add another GPU to it but it's a shame to not get some BOINC on the CPU side. I mentioned in my other thread I prefer Health Sciences or maybe Climate Study projects. Any word on WCG, or should I shop for an alternative?

pututu · Oct 5, 2025

Toconator said:
Well it's finally cooled off a bit here after the driest Sept on record but will warm a bit next week so I'll prob go down to one F@H boxen for a few days. The WCG servers finally answered the comm so it was able to be installed in BOINC Mgr but it hasn't picked up any work for weeks. Are there no WUs? My Server is going to crunch now for the duration of cooler weather and I might add another GPU to it but it's a shame to not get some BOINC on the CPU side. I mentioned in my other thread I prefer Health Sciences or maybe Climate Study projects. Any word on WCG, or should I shop for an alternative?

Visit the Jurisicalab website and click on "Operational Status". There have been issues over the past week or so. I've not crunched WCG since the Pentathlon in May.

Gilthanis · Oct 6, 2025

I've had work units waiting to upload for quite a while. WCG is certainly NOT the WCG we once loved. Sad to say, but DC is slim pickings for solid projects let alone medical/bio ones.

Toconator · Oct 7, 2025

Gilthanis said:
I've had work units waiting to upload for quite a while. WCG is certainly NOT the WCG we once loved. Sad to say, but DC is slim pickings for solid projects let alone medical/bio ones.

Sounds like they've fixed that as of today 10/7/2025 so you should be able to upload. Still fixing other stuff but hopefully WU's up soon.

AgrFan · Oct 8, 2025

I'm running FAH CPU units right now. Plenty of work there. Seriously considering retiring from WCG. It's too frustrating.

Gilthanis · Oct 8, 2025

AgrFan said:
I'm running FAH CPU units right now. Plenty of work there. Seriously considering retiring from WCG. It's too frustrating.

Yeah...since IBM let it go, it's just been too much of a pain to fully support. I still leave most things connected but my BOINC goals have superseded things for a while. But those projects are drying up quick too. DC is in a very sad state for volunteer computing.

Toconator · Oct 10, 2025

AgrFan said:
I'm running FAH CPU units right now. Plenty of work there. Seriously considering retiring from WCG. It's too frustrating.

Fair enuff but that would likely cause a heat issue for the HTPC so that's not an option for that boxen. Power bill efficiency also rears its ugly head for CPU WU's

AgrFan · Oct 16, 2025

AgrFan · Oct 19, 2025

October 18, 2025
- We are sending small batches of workunits out starting tonight, with batch IDs in the range 9999900+ for MCM1, to test the new distributed, partition-aware, batch-upserting, app-specific create_work daemons. The few volunteers who get these workunits before we start releasing larger batches (as we gain confidence that the new system is working as expected) may notice that they have a much smaller number of signatures and run much faster than normal. These are still meaningful workunits, but key parameters such as the number of signatures to test per workunit were reduced so we could get feedback quickly.
- Similar to ARP1, we have moved all workunit templating and preparation to WCG servers for MCM1. We did this for the MAM1 beta (beta30) already, but we were able to move the rendering of workunit templates per batch into the create_work daemon C++ code directly, where it consumes a protobuf schema from Kafka/Redpanda's schema registry that it then hydrates to produce all workunits for the batch according to the desired parameters it consumes from the "plan" topic via Kafka. Hence, "app-specific" above. Then, it updates the BOINC database in bulk instead of calling BOINC's create_work() function. Metadata is local, partitioned, and replicated in Kafka for durability. Each batch writes files to that node's 1/6th of the buckets from the BOINC dir_hier fanout directory and commits 1/6th of the batch records to the database in non-overlapping ranges per 10k workunits per batch.
- The new validators are working and deployed. In our new distributed, partitioned approach, validators process workunits local to their host ONLY. Uploads are partitioned according to the fanout directory assigned by BOINC and routed to the correct backend node by HAProxy, corresponding to the BOINC fanout buckets. We split the buckets between nodes: instead of using them to fan out across the filesystem and avoid massive numbers of files in a single BOINC upload path, we fan out across the cluster and read/write these buckets in tmpfs, so Apache serves downloads and accepts uploads in-memory and validators read in-memory. Kafka/Redpanda gets a copy of uploads into a disk-persisted, replicated topic for durability, so if a node goes down and we lose the in-memory cache of downloads and uploads, we can replay and recover.
- Each validator subscribes to a Kafka topic containing the count of uploads (a reduction on upload events emitted to Kafka topics from the new file_upload_handlers, for only the local buckets of that partition) along with the file locations pertaining to a pair of workunits, and emits success or failure to another queue for downstream "assimilation". We have written and are testing a batch applier that collects successful validation events on each partition and batch updates the BOINC database so that the transitioner and scheduler can work together to evaluate the state of those workunits. Once we are confident the batch updates work as expected from the applier, users should start seeing workunits pending validation clear to valid.
- We are not running file_deleter or db_purge at the moment. They need to be rearchitected to match the new setup, or at minimum assessed to make sure it makes sense to start them unchanged. We have no concerns about running out of space in the database or on disk at the moment, only about making mistakes, so we will get around to assessing what, if anything, needs to change about file_deleter and db_purge soon but not now. Likely, they will also take advantage of per-workunit event data from Redpanda/Kafka instead of just talking to the BOINC database, and will operate on local partitions across the cluster. As we are producing events for every workunit's full lifecycle to Kafka topics, we have a level of visibility and control we were never able to achieve with the legacy system. We were able to set up Prometheus node_exporter, tap into Docker stats endpoints per node across the cluster, and likewise for Redpanda/Kafka with the helpful https://github.com/redpanda-data/observability repo, to get a Grafana dashboard going that will let us do many things, such as serve up server status pages and improve the stats pages.
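
For anyone trying to picture the "1/6th of the buckets" idea in that update, here is a rough Python sketch. It is not WCG's code. The hashing is an assumption modeled on how BOINC's dir_hier fanout is generally understood to work (MD5 of the file name reduced modulo the fanout), the fanout of 1024 is an assumed config value, and the contiguous bucket-to-node split and the node naming are purely hypothetical.

```python
import hashlib

FANOUT = 1024  # assumed fanout value; the real project config may differ
NODES = 6      # the update mentions each node owning 1/6th of the buckets


def fanout_bucket(filename: str, fanout: int = FANOUT) -> str:
    """Assumed approximation of dir_hier hashing: MD5 the file name,
    reduce part of the digest modulo the fanout, name the bucket in hex."""
    digest = hashlib.md5(filename.encode()).hexdigest()
    return format(int(digest[1:8], 16) % fanout, "x")


def node_for_bucket(bucket: str, nodes: int = NODES, fanout: int = FANOUT) -> str:
    """Hypothetical split: each node owns one contiguous range of buckets."""
    index = int(bucket, 16) * nodes // fanout
    return f"science{index + 1:02d}"


def route(filename: str) -> tuple[str, str]:
    """Return (bucket, node) for a file, the way a proxy rule might route it."""
    bucket = fanout_bucket(filename)
    return bucket, node_for_bucket(bucket)


if __name__ == "__main__":
    # The file name below is made up for illustration.
    print(route("MCM1_example_result_0"))
```

The point is only that the same file name always lands in the same bucket, and each bucket belongs to exactly one node, so uploads, validation and the database commit for a given workunit can all stay on one machine.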

pututu · Oct 20, 2025

WCG was included in the BOINC Pent this year but many participants were not happy with its performance in handing out WUs esp over the weekend. From what I can find, next month's SG WCG birthday challenge is cancelled.

I only ran WCG once this year. I ran a lot more when gpu app was available (OPNG) back then. Hoping that things will get better from now on. Fingers crossed.

Gilthanis · Oct 20, 2025

pututu said:
WCG was included in the BOINC Pent this year but many participants were not happy with its performance in handing out WUs esp over the weekend. From what I can find, next month's SG WCG birthday challenge is cancelled.

I only ran WCG once this year. I ran a lot more when gpu app was available (OPNG) back then. Hoping that things will get better from now on. Fingers crossed.

View attachment 761105

WCG Birthday Party has been cancelled for multiple years in a row. Not sure why they bother keeping it on the calendar.

pututu · Oct 20, 2025

Gilthanis said:
WCG Birthday Party has been cancelled for multiple years in a row. Not sure why they bother keeping it on the calendar.

I'm guessing that SG already have the know-how on how to run this WCG birthday challenge if things happen to improve on WCG's end. Probably they put it on their event timetable as a constant reminder.

wareyore · Oct 20, 2025

Hope springs eternal.

AgrFan · Oct 21, 2025

October 21, 2025
- Finally stress testing rather than correctness testing.
- Sent a batch of 100,000 workunits (fast running, not full size, in case something crashed).
- Thank you for your patience and continued support.

Jherek · Oct 22, 2025

Can report that I seem to be getting a full complement of MCMs from World Community Grid, and some have successfully validated. Took longer than planned, but they do pretty well considering they are operating now without IBM running it as a tax dodge. Should bring most of my computers online in a few hours. WCG is back. Edit: just fired up another computer, and it is getting new tasks right away.

wareyore · Oct 22, 2025

I got nothing.

SmokeRngs · Oct 22, 2025

Just tried now and it said there was no MCM work.

Gilthanis · Oct 22, 2025

With the sheer number of users on WCG, the odds are if you didn't try loading up when released you aren't going to get any now. Everyone vacuumed up all the work relatively quickly.

pututu · Oct 22, 2025

Reminds me of the WCG Penta last May. Gotta load up tasks with multiple instances and run update scripts just to fill up the work cache. Probably some experienced crunchers are doing that.

bluestang · Oct 22, 2025

pututu said:
Reminds me of the WCG Penta last May. Gotta load up tasks with multiple instances and run update scripts just to fill up the work cache. Probably some experienced crunchers are doing that.

Don't blame me as I've only been running it sparingly since no OPNG work

Gilthanis · Oct 22, 2025

I just have it on autopilot. Either I pick up work or I don't...

Toconator · Oct 22, 2025

Gilthanis said:
I just have it on autopilot. Either I pick up work or I don't...

Me too. Guess I'm one of the lucky ones. Running MCM now...

AgrFan · Oct 22, 2025

I'm staying on FAH until the new WCG environment is stable and everything is working properly. Most likely another few weeks.

I recently picked up another Dell Inspiron desktop (i5-10400, 12GB RAM) off eBay. Threw in a WD Blue 80GB hard drive with Ubuntu already installed. Loaded the FAH Linux client with no issues once I figured out how to do it. It's getting 105K+ PPD with 5 cores enabled. Turned off hyperthreading and turbo boost (lower temps) in the BIOS. HT didn't seem to improve performance that much.

https://folding.extremeoverclocking.com/user_summary.php?s=&u=237271

Toconator · Oct 23, 2025

AgrFan said:
I'm staying on FAH until the new WCG environment is stable and everything is working properly. Most likely another few weeks.

I recently picked up another Dell Inspiron desktop (i5-10400, 12GB RAM) off eBay. Threw in a WD Blue 80GB hard drive with Ubuntu already installed. Loaded the FAH Linux client with no issues once I figured out how to do it. It's getting 105K+ PPD with 5 cores enabled. Turned off hyperthreading and turbo boost (lower temps) in the BIOS. HT didn't seem to improve performance that much.

https://folding.extremeoverclocking.com/user_summary.php?s=&u=237271

Yeah, once upon a time in the 4 core days it was estimated that Hyperthreading = 1/3 of a core under most conditions. Prob not worth the heat & power draw CPU folding nowadays.

SmokeRngs · Oct 23, 2025

I didn't leave the client trying to grab work because an hour after posting I was heading out of town until now. Didn't know how it was going to go and wanted to keep an eye on it.

As for hyperthreading I limit the number of cores even on my Ryzen 5800x to keep it away from running on the logical threads. It's not worth the extra power usage.

AgrFan · Oct 25, 2025

October 24, 2025
- We have paused uploads and release of test batches while we work on the validation throughput issue.
- We think we have identified the root cause of the low validation rate for the MCM1 test batches, and we will send a few more test batches to confirm the fix works.
- If this is the final fix and we see the expected validation rate for pairs of MCM1 uploads for new batches, we will replay the Kafka consumer on the upload events fired for test batches received earlier in the week, and this should idempotently allow the new batch assimilator to process those validations and assign credit.
- If the above goes well, we will schedule regular MCM1 batches to resume instead of the test batches.
- As volunteers have noted, we have not yet reconciled uploads of regular MCM1 results submitted before we began sending test batches, and before the migration, but we have those files and will be able to do this in a batch update once the path for new workunits is working as described above.
- Naturally, we will resume ARP and MAM only after these issues are fully resolved.
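
The "idempotently" part of that update is the key to why a replay is safe. A minimal sketch of the idea in Python follows, assuming a simple in-memory ledger keyed by workunit name. The class, the event fields and the workunit names are all made up for illustration, and no real Kafka client is involved.

```python
from dataclasses import dataclass, field


@dataclass
class CreditLedger:
    """Hypothetical stand-in for the credit state kept in the database."""
    credited: dict = field(default_factory=dict)  # workunit name -> credit

    def apply(self, event: dict) -> bool:
        """Apply one validation event; return True only if state changed."""
        name = event["workunit"]
        if name in self.credited:
            return False  # already processed, so a replayed event is a no-op
        self.credited[name] = event["credit"]
        return True


def replay(ledger: CreditLedger, events: list) -> int:
    """Feed every event to the ledger and count how many changed anything."""
    return sum(ledger.apply(event) for event in events)


# Made-up events standing in for the upload/validation topic.
events = [
    {"workunit": "MCM1_test_0001", "credit": 10.0},
    {"workunit": "MCM1_test_0002", "credit": 12.5},
]
ledger = CreditLedger()
first_pass = replay(ledger, events)   # both events are new
second_pass = replay(ledger, events)  # same events again, nothing changes
print(first_pass, second_pass, sum(ledger.credited.values()))
```

Running the same events through twice leaves the totals unchanged, which is what lets them rewind the consumer without double-crediting anyone.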

AgrFan · Oct 28, 2025

October 28, 2025
- We have fixed the main validation throughput issues with the new Kafka-based workflow, and reprocessed uploads from around the time we started sending out test batches. We are reviewing the Kafka topics and BOINC database to see whether the volunteer reports (both results for a test workunit uploaded, but no validation/assimilation occurred during the reprocess) point to another bug to fix, and if so, whether it is severe enough to block regular MCM1_024% batch distribution until resolved.
- In reviewing the transitioner implementation (which we intended to start yesterday to begin triggering resends for test batches), we found the new paradigm for storing configuration details that are required to populate resends in the result table needed to be incorporated into key functions. We are testing these relatively minor changes to the transitioner now.
- Our plan is to deploy the updated transitioner, verify resends work, verify it times out expired workunits, and depending on how that and the review of "missed validations" noted above goes we may then be ready to resume MCM1 batches in the normal range.
- Regarding uploads that span the downtime for migration, we will reconcile validation and credit for these workunits as soon as the production path for MCM1 described above is running. We should be able to use the new components to do that, after walking the filesystems where those uploads live, double-checking the list that need validation and crediting in the database, and pursuing a similar "reprocessing" path which worked well to re-attempt validation and crediting of the test MCM1 batches.
- Then, we will begin testing beta30/MAM1, and ARP1 using the new system, which we expect to progress much faster now that we have ironed out the logic with MCM1.
- Stats updates will be restarted as soon as the MCM1 workflow is stable. That will include the daily export to https://download.worldcommunitygrid.org/boinc/stats/

wareyore · Nov 6, 2025

Looks like I pulled some tasks today. We'll see if they make it through.

Toconator · Nov 7, 2025

wareyore said:
Looks like I pulled some tasks today. We'll see if they make it through.

They should. Been crunching steady for about a week. There was a brief time at the start where there was a long queue for upload but it cleared up after a couple days. Still downloaded and ran while the backlog built up tho. Been clear sailing ever since. I think they've finally got it figured out.

AgrFan · Nov 7, 2025

I switched my machines to FAH CPU work. WCG is not stable right now. Lots of scary posts in the WCG forums from the WCG Tech team (dylanht). I'm not confident they really know what they are doing. I'm staying away until my returned work is fully processed properly. I don't expect that to be any time soon.

wareyore · Nov 8, 2025

I switched to the BG Sprint and will switch back and reduce output Sunday.

Toconator · Nov 9, 2025

AgrFan said:
I switched my machines to FAH CPU work. WCG is not stable right now. Lots of scary posts in the WCG forums from the WCG Tech team (dylanht). I'm not confident they really know what they are doing. I'm staying away until my returned work is fully processed properly. I don't expect that to be any time soon.

Yes, looks like I called it too soon. Not long after my post I amassed a huge line-up of "waiting to report" with all crunching stopped. The queue is empty now so they were all sent but nothing happening atm.

wareyore · Nov 9, 2025

BG Sprint finished today. I added over 100M to SRBase this weekend. I'm going to freelance some BOINC projects off and on for a bit. No challenges or events until next year. I have WCG tasks sitting from earlier this week. We'll see how they do over the next couple of days.

Gilthanis · Nov 10, 2025

wareyore said:
BG Sprint finished today. I added over 100M to SRBase this weekend. I'm going to freelance some BOINC projects off and on for a bit. No challenges or events until next year. I have WCG tasks sitting from earlier this week. We'll see how they do over the next couple of days.

Should we bring back the challenges calendar? Or is it still pretty slim pickings in regards to challenges? Last we left off with PG challenges, BG challenges, annual FAH challenge, and Pentathlon.

wareyore · Nov 10, 2025

Gilthanis said:
Should we bring back the challenges calendar? Or is it still pretty slim pickings in regards to challenges? Last we left off with PG challenges, BG challenges, annual FAH challenge, and Pentathlon.

It can't hurt, but, yes, there isn't much out there and participation outside F@h is pretty small. Is there enough interest outside of F@H to warrant the effort? I look at other sites for their calendars and updates to plan my own activity.

Agreed that here F@H in the winter and the Pentathlon are the two challenges the team consistently participates in.

Gilthanis · Nov 10, 2025

wareyore said:
It can't hurt, but, yes, there isn't much out there and participation outside F@h is pretty small. Is there enough interest outside of F@H to warrant the effort? I look at other sites for their calendars and updates to plan my own activity.

Agreed that here F@H in the winter and the Pentathlon are the two challenges the team consistently participates in.

We sometimes take interest in the PG series but the team traditionally really focuses on medical. With so many projects ending or going down the toilet, it really is tough to find good ones to really compete in. Networking capabilities and storage systems really haven't kept up with the increase in computing capabilities over the years. Boincgames still seems like it is stuck in BETA every time I look into it. With how long it has taken to essentially clone formula-boinc, I've lost a lot of interest in it. However, if the team as a whole wanted to take it seriously, I would follow it a bit more. We just don't have enough people that want to DC these days. Especially when things aren't well ironed out.

Jherek · Nov 11, 2025

The latest:
(I'm getting plenty of MCM tasks, but only about 1 in 10 or so are validating.)
November 11, 2025

- Database maintenance over Friday/Saturday completed without issue. We have resolved an issue with the backup scripts, effectively increased memory used to service database queries and added some new indices. We expect better performance from the BOINC database going forward.
- However, the disk remains slower than initial benchmarking when we stood up the database. We will monitor and reach out to hosting about the Ceph placement group expansion (which caused the stuck blocks on that particular disk when the placement group the result table lives on got stuck in a "peering" state). We were informed that we should expect temporary, possibly intermittent slow IO during this Ceph maintenance window. If we can get faster disks for the BOINC database (which would require restoring the database to a new volume as we did to migrate) we will consider a maintenance window. Right now, we are optimistic the issues revealed in the new system by hanging database queries and database crashes can all be resolved with patches to the new BOINC daemons, and current performance will be sufficient.
- As mentioned, this event identified several issues with the new BOINC daemons.
- MCM1 workunit creation proceeds in the Kafka topic even though the database is down. The mcm1_create_work daemon for its Kafka partition on science01...science06 tries to commit its part of the batch; the database isn't there, so it doesn't do anything, but it does commit its offset/pointer into the batch plan topic and move on to consume the next batch plan. That means every 10-15 minutes while the database is down, a batch is effectively skipped. We were able to fix that, and have restarted MCM1 batch creation at roughly 5:00 p.m. EST, November 10th, 2025.
- We believe we have finally architected a fix for the pending validation backlog issue. This requires some non-trivial plumbing in the MCM1 batch assimilator, a Kafka connector deployed on the BOINC database node, and transitioner code changes.
- Workunit supply may remain artificially lower while we roll out the new batch assimilator builds and monitor the transitioner -> Kafka event consumption and result table interaction.
- We were able to resolve the issue with computing preferences not being updated from the website to BOINC client and vice versa. Generally, when the BOINC database goes down, so does the event listener that handles these messages on the webserver.
- We are still working on resolving the validation backlog from over the break. With the result table bricked during the Ceph maintenance, we architected a "trust the filesystem" solution, and we are hopeful that this issue will be resolved this week.
- MAM1 was initially planned to be resumed in beta30 last week, to see if 7.07 fairly schedules work and respects --nthreads, which is a blocking issue in promoting the beta application to production. Depending on the error rate and behaviour on BOINC clients, we would then consider the stable code paths for the first production batches. Given our increased control over batch parameters with the new Kafka topic that uses a protobuf schema to fill out the workunit and result table entries, we intend to run work in production on Linux as soon as the beta30 application is stable with an error rate lower than MCM1 excepting the GLIBC dependency, which is typically the only repeated error we see from clients on the current LibTorch code path. We will then rely on iterating the beta30 application to 7.08 and 7.09 to get GPU and Windows support, and Parquet IO for input and uploaded results.
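
The skipped-batch bug in that update (offset committed even though the database write never happened) is a classic one. Here is a small Python sketch of the usual fix, which is to advance the offset only after the write succeeds. Everything here is a made-up stand-in: no real Kafka or BOINC calls, and the class and function names are hypothetical, not taken from WCG's daemons.

```python
class DatabaseDown(Exception):
    """Raised by the fake database when it is unavailable."""


class FakeDatabase:
    """Hypothetical stand-in for the BOINC database."""

    def __init__(self) -> None:
        self.up = True
        self.rows = []

    def bulk_insert(self, batch_id: str) -> None:
        if not self.up:
            raise DatabaseDown(batch_id)
        self.rows.append(batch_id)


def consume(plans: list, offset: int, db: FakeDatabase) -> int:
    """Process batch plans starting at `offset`; return the new offset.

    The offset moves forward only after a successful insert, so a plan
    that hits a dead database is retried later instead of being skipped.
    """
    while offset < len(plans):
        try:
            db.bulk_insert(plans[offset])
        except DatabaseDown:
            break  # leave the offset where it is and try again next time
        offset += 1
    return offset


plans = ["batch-1", "batch-2", "batch-3"]  # made-up batch plan IDs
db = FakeDatabase()

db.up = False
offset_while_down = consume(plans, 0, db)  # nothing written, offset stays put

db.up = True
offset_after_recovery = consume(plans, offset_while_down, db)
print(offset_while_down, offset_after_recovery, db.rows)
```

With the buggy ordering, the first call would have returned 3 with nothing in the database. With this ordering, the outage costs time but no batches.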

AgrFan · Nov 11, 2025

WCG is a complete mess. They are in over their heads. It's obvious their hosting infrastructure is inadequate to run the Grid properly. It baffles me how everything was working for months and now there are major issues after a scheduled weekend data center cutover.

Is it really this difficult to support a BOINC project? I highly doubt other projects are this complicated.

Dr Jurisica is a "Visiting Scientist, IBM Centre for Advance Studies, IBM Toronto Lab" per the UHN Research website. I wonder if this has anything to do with the direction they are taking by continuing with the IBM architecture.

The fact they are performance tuning the BOINC database in production is a major red flag. Indices should have been designed and tested first in a development system. This is an indicator they are flying by the seat of their pants. This is not good.

Gilthanis · Nov 11, 2025

AgrFan said:
WCG is a complete mess. They are in over their heads. It's obvious their hosting infrastructure is inadequate to run the Grid properly. It baffles me how everything was working for months and now there are major issues after a scheduled weekend data center cutover.

Is it really this difficult to support a BOINC project? I highly doubt other projects are this complicated.

Dr Jurisica is a "Visiting Scientist, IBM Centre for Advance Studies, IBM Toronto Lab" per the UHN Research website. I wonder if this has anything to do with the direction they are taking by continuing with the IBM architecture.

The fact they are performance tuning the BOINC database in production is a major red flag. Indices should have been designed and tested first in a development system. This is an indicator they are flying by the seat of their pants. This is not good.

It probably has a lot to do with his affiliation with IBM. They certainly are under budget for a project of this scale. WCG has one of the largest user bases out there to support. Probably only rivaled by Folding@home which also probably has a much better support line. The fact that WCG continues to try and embrace all the changes IBM made is just crazy to me. Rip the bandaid off already.

Jherek · Nov 22, 2025

Looks like they are getting WCG sorted out finally. In the last 3 days or so, the number of validated results has increased greatly. I've got about 5600 MCM results pending validation. I look at it as unintended bunkering. Will let you know when it is running at full steam again, but they are a good 3/4 functional, it appears.
Latest news:
November 21, 2025

- We are testing required changes to the scheduler and feeder to resolve the corrupt/truncated "os_name" and "os_version" entries such as "W"/"W" for some hosts, as reported by users in the forums, and to resolve frequent "stuck" feeder states where "No tasks available for platform" is logically incorrect by hr_class, yet the tasks populating the feeder shared memory segment remain unassigned by the scheduler passes and manual intervention is required to get work flowing again.
- Passes through uploaded results that have not been credited by the new system will begin next week, to backfill missing credits. We have been performing dry runs to establish correctness. As a precaution, we will be running the program in multiple passes starting with the oldest uploads, to the most recent.
- Volunteers have reported that the API sometimes shows an invalid state for multiple results, where only one result is marked valid, which should be impossible. Preliminary investigation points to the new MCM1 assimilation procedure interacting with the transitioner. The new MCM1 assimilation procedure acts to validate and credit all in progress results for a workunit as soon as it has consumed any pair/quorum of files, whether original 0 and 1 results or resends 2 and up, that have passed validation. We will review this issue in full and report our findings, whether a bug in the assimilator, or poorly modeled interaction between assimilator transactions and the transitioner, which is where we expect to find an explanation.
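
To make the "only one result marked valid should be impossible" point concrete, here is a rough Python model of the quorum rule as I read it from that update. It is an interpretation, not WCG's assimilator. Comparing checksums is a made-up stand-in for real validation, and the function names and states are hypothetical.

```python
from itertools import combinations


def find_quorum(uploads: dict):
    """uploads maps result index -> output checksum (a stand-in for real
    validation). Return the first pair of indices that agree, or None."""
    for a, b in combinations(sorted(uploads), 2):
        if uploads[a] == uploads[b]:
            return a, b
    return None


def assimilate(uploads: dict, in_progress: list) -> dict:
    """Assumed behaviour: once any pair agrees, mark matching uploads valid,
    mismatches invalid, and credit results still in progress right away."""
    pair = find_quorum(uploads)
    if pair is None:
        return {}  # no quorum yet, nothing is decided
    agreed = uploads[pair[0]]
    states = {
        idx: ("valid" if checksum == agreed else "invalid")
        for idx, checksum in uploads.items()
    }
    for idx in in_progress:
        states[idx] = "valid"  # credited without waiting, per the update
    return states


def consistent(states: dict) -> bool:
    """A quorum of two means the count of valid results can never be 1."""
    return list(states.values()).count("valid") != 1


if __name__ == "__main__":
    print(assimilate({0: "abc", 1: "xyz", 2: "abc"}, in_progress=[3]))
```

Under this model a workunit either has no valid results or at least two, so a lone "valid" next to "invalid" siblings has to come from two processes writing state at different times, which fits their guess about the assimilator and transitioner stepping on each other.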
