World Community Grid

SmokeRngs

[H]ard|DCer of the Month - April 2008
Joined
Aug 9, 2001
Messages
17,469
Lol, a bird in the hand is worth two in the bush. Also, have you tried running more than two in Windows? I would keep increasing the number of tasks until you reach 100% GPU usage in Afterburner; you should get into that 150 range you're talking about and get some good points. I mean, I assume good points since I don't have an RX 570 lol. Best utilization, maybe.
In Windows, the power usage didn't change whether I was running a single WU or two. The only difference was that there was very little downtime on the GPU for processing. When it was processing on the GPU, the GPU was at 100% utilization, and the only reason I ran more than one work unit was so that when one was doing its CPU processing the GPU wasn't idle, or at least not very often. Two possible explanations for this: the Polaris architecture is more efficient at running these, or it's simply not powerful enough to avoid running at 100% utilization on these work units. It's exactly the same under Manjaro. That's why I haven't bothered to set it up to run a third work unit simultaneously. When crunching a work unit the GPU is at 100% use, and with two work units set to run, the GPU downtime during CPU crunching is kept to a minimum.

That said, the points look good enough. The GPU points are blowing away what my 5800X and 2600X have been doing combined on CPU. It will take a few days for the points output to stabilize, but so far today I've seen a 400k+ point increase at the halfway mark of the day's stats, and the GPU wasn't crunching anything for several hours of that half day. Not bad for an extra 50-55W of power over an idle GPU (idle is usually around 32-33W for this GPU, and it currently uses 85-89W while crunching).
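For anyone wanting to replicate the two-tasks-per-GPU setup, BOINC's app_config.xml is the usual mechanism. A minimal sketch (the app short name `opng` is an assumption on my part; check the actual name in your BOINC event log, and put the file in the WCG project directory):

```xml
<!-- projects/www.worldcommunitygrid.org/app_config.xml -->
<app_config>
  <app>
    <name>opng</name>            <!-- assumed short name; verify in the event log -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage> <!-- 0.5 GPU per task = two tasks per GPU -->
      <cpu_usage>1.0</cpu_usage> <!-- each task also keeps a CPU thread busy -->
    </gpu_versions>
  </app>
</app_config>
```

After editing, use BOINC Manager's "Read config files" option so the change takes effect without restarting the client.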
 

SmokeRngs

[H]ard|DCer of the Month - April 2008
Joined
Aug 9, 2001
Messages
17,469
Looks like with less than a full day of GPU crunching (plus several hours of downtime while gaming), a lowly RX 570 is putting out about 1.3 million points. After stabilizing, and maybe with a bit less gaming, I figure it could be doing around 1.5-1.6 million a day. Normal output for the 2600X and 5800X combined is around 200k-220k per day.
 

pututu

[H]ard DCOTM x2
Joined
Dec 27, 2015
Messages
2,173
May 2021 update: OpenPandemics

27 May 2021

Summary

The project has added GPU power to the existing strong CPU power that supports research for potential COVID-19 treatments.



Background
OpenPandemics - COVID-19 was created to help accelerate the search for potential COVID-19 treatments. The project also aims to build a fast-response, open source toolkit that will help all scientists quickly search for treatments in the event of future pandemics.
In late 2020, the researchers announced that they had selected 70 compounds (from an original group of approximately 20,000) that showed promise for investigation as potential inhibitors of the virus that causes COVID-19. Lab testing is currently underway for 25 of these compounds.
GPU work units
We recently completed beta testing and have released GPU work units for this project. Currently, the project is sending out 1,700 new work units every 30 minutes. We expect to be sending out GPU work at this pace for the foreseeable future.
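At that stated rate, the implied daily GPU supply works out as follows:

```python
# 1,700 new GPU work units released every 30 minutes.
releases_per_day = 24 * 2              # two releases per hour
wus_per_day = 1_700 * releases_per_day
print(wus_per_day)                     # 81600 work units per day
```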
We will continue to create and release regular work units that use CPU power. This will help keep the work going at a good pace, and will ensure that everyone who wants to contribute computing power can participate.
Stress test of World Community Grid's technical infrastructure
Earlier this month, the World Community Grid tech team wanted to determine the upper limit of computational power for the program, and to find out if the current infrastructure would be able to support the load if we provided enough GPU work to meet the demand.
The scientists for OpenPandemics - COVID-19 provided us with approximately 30,000 batches of GPU work (equal to the amount of work done in about 10 months by CPUs), and we let these batches run until they were fully processed.
The stress test took eight days to run, from April 26 through May 4, 2021. Thank you to everyone who participated in this important test. We expect to have a forum post from the tech team soon to summarize what they learned about World Community Grid's current capabilities and limitations.
Current status of work units
CPU
  • Available for download: 1,322 batches
  • In progress: 6,240 batches
  • Completed: 44,810 batches
    5,596 batches in the last 30 days
    Average of 186.5 batches per day
  • Estimated backlog: 7.1 days*

    *The research team is building more work units.
GPU
  • In progress: 2,391 batches
  • Completed: 37,569 batches
    35,296 batches in the last 30 days
    (largely due to the stress test)
    Average of 1,176.5 per day
    (again, largely due to the stress test)
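The "estimated backlog" figures above appear to be simply the batches available divided by the recent daily completion rate; checking against the posted CPU numbers:

```python
# Backlog estimate = batches available / average batches completed per day.
available = 1_322                  # CPU batches available for download
avg_per_day = 5_596 / 30           # 5,596 batches completed in the last 30 days
backlog_days = available / avg_per_day
print(round(backlog_days, 1))      # 7.1, matching the posted estimate
```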
 

pututu

[H]ard DCOTM x2
Joined
Dec 27, 2015
Messages
2,173
OpenPandemics GPU stress test (April) update. The WCG team has identified bottlenecks in their infrastructure and discusses potential future changes to enhance overall performance.

OpenPandemics GPU Stress Test

Background
In March 2021, World Community Grid released a GPU version of the AutoDock research application. Immediately, there was strong demand for work from volunteer machines; in fact, demand was considerably higher than the supply of GPU work units.

The World Community Grid tech team wanted to determine the upper limit of computational power for the program, and to find out if the current infrastructure would be able to support the load if enough GPU work was provided to meet the demand.

Additionally, the OpenPandemics - COVID-19 scientists and their collaborators at Scripps Research are exploring novel promising target sites on the spike protein of the SARS-CoV2 virus that could be vulnerable to different ligands, and they were eager to investigate this target as quickly and thoroughly as possible. They provided World Community Grid with approximately 30,000 batches of work (equal to the amount of work done in about 10 months by CPUs), and we let these batches run until they were fully processed.

The stress test took 8 days to run, from April 26 through May 4, 2021.
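Those figures imply a rough overall speedup from the GPU fleet; a back-of-the-envelope estimate, assuming 30-day months (the "10 months" is itself approximate):

```python
# ~30,000 batches, described as about 10 months of CPU work,
# were processed by GPUs in 8 days.
cpu_days = 10 * 30     # assume 30-day months
gpu_days = 8
speedup = cpu_days / gpu_days
print(speedup)         # 37.5
```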

The results outlined below represent World Community Grid's current technical capabilities. This information could help active and future projects make decisions about how they run work with us, keeping in mind that they have varying needs and resources.

Summary
The key findings of the stress test revealed the following points:
  • We had previously determined that in 2020, the volunteers contributing to World Community Grid delivered computing power from CPUs alone similar to a cluster of 12,000 computing nodes running at 100% capacity 24x7 for the entire year, where each node contains one Intel Core i7-9700K CPU @ 3.60GHz. We can now further state that the volunteers are able to provide an additional 8x that computing power from GPUs.
  • The current World Community Grid infrastructure is able to meet the load generated by this computing power with this particular mix of research projects. However, the infrastructure was pushed to its limit, and any further growth or possibly a different mix of research projects would require increased infrastructure.
  • The OpenPandemics - GPU workunits consisted of many small files that created high IO load on both the volunteers' computers and the World Community Grid infrastructure. Combining these small files into a few larger files may reduce that IO load on both sides. This change would likely allow the infrastructure to handle a greater load and improve the experience for the volunteers.
  • On the back side of the pipeline, backing up the data and sending results to the Scripps server does not appear to be a bottleneck. However, running OpenPandemics at a higher speed would force the research team to spend the majority of their time and energy preparing input data sets and archiving returned data rather than analyzing the results and moving the interesting ones to the next step in the pipeline. As a result, the project will remain at its current speed for the foreseeable future.
  • Now that we are able to quantify the capabilities of World Community Grid, scientists can use this information as a factor in their decision-making process in addition to their labs' resources and their own data analysis needs.
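The small-files point above is the classic many-inodes, high-IO pattern. A sketch of the kind of bundling described, using Python's tarfile on stand-in files (all file names here are made up for illustration; they are not WCG's actual result formats):

```python
import tarfile
import tempfile
from pathlib import Path

# Bundle many small per-result files into one archive so the filesystem
# sees a single large file instead of thousands of tiny ones.
workdir = Path(tempfile.mkdtemp())
for i in range(1_000):                         # stand-ins for small result files
    (workdir / f"result_{i:04d}.dlg").write_text(f"docking result {i}\n")

archive = workdir / "batch_results.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    for f in sorted(workdir.glob("*.dlg")):
        tar.add(f, arcname=f.name)

print(archive.exists())                        # True: 1,000 files -> 1 archive
```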
Bottlenecks identified
During the test, there were three major issues where the system became unstable until we could identify the bottlenecks and resolve them.

Prior to Launch
Before the launch of the stress test, while we were creating the individual workunits to send to volunteers, we exhausted the available inodes on the filesystem. This prevented new files or directories from being created, and as a result it caused an outage for our back-end processes and prevented results from being uploaded from volunteer machines. We resolved this issue by increasing the maximum number of inodes allowed and then added a monitor to warn us if we start approaching the new limit.
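A monitor of that kind can be as simple as parsing `df -i`; a sketch assuming a Linux `df` (the 80% threshold is likewise an assumed value, not WCG's published setting):

```shell
#!/bin/sh
# Warn when a filesystem approaches its inode limit.
THRESHOLD=80  # assumed warning threshold, in percent

check_inodes() {
    # $1 = mount point; column 5 of Linux `df -i` output is IUse%
    usage=$(df -i "$1" | awk 'NR==2 {gsub(/%/, "", $5); print $5}')
    if [ "$usage" -ge "$THRESHOLD" ]; then
        echo "WARN: $1 inode usage at ${usage}%"
    else
        echo "OK: $1 inode usage at ${usage}%"
    fi
}

check_inodes /
```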

Launch
Shortly after releasing the large supply of workunits, we experienced an issue where the connections from our load balancer to the backend servers reached their maximum configured limits and blocked new connections. This appears to have been caused by clients that opened connections and then stalled out or downloaded work very slowly. We implemented logic in the load balancer to automatically close those connections. Once this logic was deployed, the connections from the front-end became stable and work was able to flow freely.
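In HAProxy configuration terms, closing out stalled or very slow client connections is typically handled with timeouts like the following (a sketch; the exact mechanism and values WCG deployed were not published):

```
# haproxy.cfg fragment: drop clients that stall or trickle
defaults
    timeout connect      5s
    timeout client       30s    # close connections idle on the client side
    timeout server       30s
    timeout http-request 10s    # client must send a complete request promptly
```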

Packaging ramps up
The next obstacle occurred when batches started to complete and packaging became a heavy load on the system. Several changes were made to address this:

The process of marking batches complete in order to start the packaging process originally ran only every 8 hours. We changed that so that batches are marked complete and packaged every 30 minutes.
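If that schedule is cron-driven (an assumption on my part; the path below is hypothetical), the change amounts to:

```
# Before: mark batches complete / kick off packaging every 8 hours
0 */8 * * *   /opt/wcg/bin/mark_batches_complete
# After: every 30 minutes
*/30 * * * *  /opt/wcg/bin/mark_batches_complete
```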
Our clustered filesystem had configuration options that were sub-optimal and could be improved. We researched how the configuration could be improved in order to increase the performance and then made those changes.

Following these changes, the system was stable even with packaging and building occurring at a high level. The time to package a batch dropped from 9 minutes to 4.5 minutes and the time to build a batch dropped by a similar amount. Upload and downloads performed reliably as well. However, we were only able to run in this modified configuration during the final 12 hours of the stress test. It would have been useful to run with these settings for longer and to confirm that they resulted in a stable system over an extended period of time.

Potential future changes to further enhance performance

Clustered filesystem tuning

The disk drives that back our clustered filesystem only reached about 50% of their rated throughput. It is possible that the configuration of the filesystem could be optimized to further increase performance. This is not certain, but if the opportunity exists to engage an expert with deep experience optimizing high-performance clustered filesystems using IBM Spectrum Scale, this could be a worthwhile avenue to explore.


Website availability
We identified an issue where high IO load on the clustered filesystem can cause problems with the user experience and performance of the website. The two systems are logically isolated from each other, but share physical infrastructure due to the system design. This degradation of website performance should not have happened, but it clearly did. We want to determine for certain why this issue exists, but at this time we believe it stems from the way our load balancer, HAProxy, is configured.

We have HAProxy running as a single instance, with one front-end for all traffic passing the data back to multiple back-ends for the different types of traffic. We could instead run HAProxy with multiple instances on the same system, provided there is a separate IP address for each instance to bind to. If we were to run one instance for website traffic and a second for all BOINC traffic, we expect that website traffic would perform reliably even when the BOINC system is under heavy load.
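The proposed split could look like two independent HAProxy instances, each binding its own address (the IPs, certificate path, and backend names below are placeholders):

```
# haproxy-web.cfg -- website instance, its own IP
frontend website
    bind 192.0.2.10:443 ssl crt /etc/ssl/site.pem
    default_backend web_servers

# haproxy-boinc.cfg -- BOINC traffic instance, separate IP
frontend boinc
    bind 192.0.2.11:80
    default_backend boinc_servers
```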


Thank you to everyone who participated in the stress test.
 

AgrFan

[H]ard DCOTM October 2012
Joined
Sep 29, 2007
Messages
551
No, this does not include resends. Final tasks will be sent out towards the end of July. MIP should be finished by July 31st.
 

pututu

[H]ard DCOTM x2
Joined
Dec 27, 2015
Messages
2,173
WCG Jul 29 OpenPandemics project update

29 Jul 2021

Summary

The research team continues their data analysis, and lab testing is ongoing for potential COVID-19 treatments.



Background
OpenPandemics - COVID-19 was created to help accelerate the search for potential COVID-19 treatments. The project also aims to build a fast-response, open source toolkit that can help all scientists quickly search for treatments in the event of future pandemics.
In late 2020, the researchers announced that they had selected 70 compounds (from an original group of approximately 20,000) that showed promise for investigation as potential inhibitors of the virus that causes COVID-19. Lab testing is ongoing for 25 of these compounds.
Project update from researchers
The research team gave us an official update earlier this month. Highlights from the update include:
  • What they learned about their own workflow and tools during this spring's stress test, and how this could help current and future research
  • More information about exactly what they're analyzing with the help of donated computing power
  • Details about the ongoing lab testing of compounds that could be potential treatments for COVID-19
Since the update was published, they've also begun looking ahead to possibly testing a second set of compounds. They will share further details if and when this happens.
Current status of work units
CPU
  • Available for download: 3,601 batches
  • In progress: 2,245 batches
  • Completed: 53,615 batches
    2,950 batches in the last 30 days
    Average of 98.0 batches per day
  • Estimated backlog: 36.6 days
GPU
  • Available for download: 6,338 batches
  • In progress: 5,441 batches
  • Completed: 60,876 batches
    9,520 batches in the last 30 days
    Average of 317.0 per day
  • Estimated backlog: 20.0 days

Click here to learn more about World Community Grid's monthly project updates.
 

pututu

[H]ard DCOTM x2
Joined
Dec 27, 2015
Messages
2,173
WCG Aug 31 OpenPandemics project update

31 Aug 2021

Summary

The research team ran a group of high-priority batches on World Community Grid this past month.




Background
OpenPandemics - COVID-19 was created to help accelerate the search for potential COVID-19 treatments. The project also aims to build a fast-response, open source toolkit that can help all scientists quickly search for treatments in the event of future pandemics.
In late 2020, the researchers announced that they had selected 70 compounds (from an original group of approximately 20,000) that showed promise for investigation as potential inhibitors of the virus that causes COVID-19. Lab testing is ongoing for 25 of these compounds.
Accelerated World Community Grid work on four binding sites
The research team recently sent 600 batches of accelerated work to World Community Grid. These batches contained simulated experiments on several important binding sites and additional compounds that could be promising as potential treatments.
Sulfonyl fluoride (SuFEx) is a molecule that is known to react and covalently bind to lysine (K) and tyrosine (Y) amino acid sidechains in proteins.
The researchers selected four possible binding sites in the main protease (Mpro) of SARS-CoV-2 that are adjacent to K and Y sidechains. They then prepared about 600 packages to be docked against nearly 300,000 SuFEx-containing molecules from Enamine. These batches were recently completed on World Community Grid.
The most promising molecules will be purchased or synthesized, then shipped to the laboratory of Prof. Chris Schofield at the Chemistry Research Laboratory at the University of Oxford to verify experimentally whether they bind to SARS-CoV-2 Mpro.
Current status of work units
CPU
  • Available for download: 2,110 batches
  • 28-day average of 94.0 batches per day
  • Estimated backlog: 21.5 days
GPU
  • Available for download: 3,357 batches
  • 28-day average of 398 batches per day
  • Estimated backlog: 8.8 days
 

pututu

[H]ard DCOTM x2
Joined
Dec 27, 2015
Messages
2,173
Let's hope for better project management in the future, and for them really listening to their donors, especially the small donors (FAH anyone?) who represent the bulk of the population. I know they'll probably suck up to the corporate donors with the most money. Yeah, money rules, but who knows, one day the corporate donors might just pull out, while the small donors are likely to still be around....;)
 

EXT64

[H]ard|DCer of the Year 2020
Joined
Mar 27, 2013
Messages
674
Yep, hopefully they will add some folks interested in improving the 'donor relations' side. As much as I hate to say it, that is as important as having really smart researchers for distributed computing.
 

jojo69

[H]F Junkie
Joined
Sep 13, 2009
Messages
10,901
We got any competitions to get us motivated coming up? Our hemisphere is cooling off.
 

pututu

[H]ard DCOTM x2
Joined
Dec 27, 2015
Messages
2,173
We got any competitions to get us motivated coming up? Our hemisphere is cooling off.
There will be the 17th WCG birthday challenge organized by SETI.Germany. It will commence on Nov 16 00:00:00 UTC and end on Nov 22 23:59:59 UTC.
You can bunker as much as you like, as long as the tasks returned during the challenge period haven't expired.
Don't know if it will include the GPU tasks due to limited task availability, but they pay very well and are difficult to bunker since they're released at random intervals, though twice per hour.

The other challenge will be in February next year: a Folding@home race against TAAT. This still needs to be confirmed.

Come May 2022, there is the BOINC Pentathlon, so we'll need all hands on deck ;)
 

pututu

[H]ard DCOTM x2
Joined
Dec 27, 2015
Messages
2,173
Cancer, cancer, cancer. Such a dreaded word. How can we help? Well, one way is to allocate your unused CPU cycles to cancer research by joining our team on the WCG project. Today, WCG posted a milestone update: volunteers have tested 15 trillion signatures associated with common cancers such as lung cancer, ovarian cancer, and sarcoma. Read the article below, taken directly from the WCG news site.

Volunteers tested 15 trillion signatures for Mapping Cancer Markers project


In this update, coinciding with the 17th anniversary of WCG, the research team summarizes the contribution of all volunteers to each type of tumor and signature size that Mapping Cancer Markers has studied so far.


Project: Mapping Cancer Markers

Published on: 16 Nov 2021



Background
Mapping Cancer Markers (MCM) aims to identify the markers (sometimes referred to as signatures) associated with various types of cancer. The project is analyzing millions of data points collected from thousands of healthy and cancerous patient tissue samples. So far, these have included tissues with lung cancer, ovarian cancer, and sarcoma.
Looking at the prodigious volunteer contribution
Each Mapping Cancer Markers work unit tests multiple groups of biomarkers against a cancer dataset for use as diagnostic or prognostic signatures. Currently, that dataset is our sarcoma dataset. MCM has explored three datasets so far: lung cancer, ovarian cancer, and sarcoma. For lung and ovarian cancer, MCM explored different signature lengths (i.e., different numbers of genes included in the signature), while for sarcoma the signatures all have the same length but different compositions (i.e., they include markers measured with different techniques, in variable percentages). At the time of this report, volunteers had analyzed about 800 million work units.
Figure 1. Number of completed work units per cancer type and signature size.

A work unit will test signatures of a specific size (number of biomarkers) against its dataset. Added together, World Community Grid members have tested about 15 trillion signatures, a number that would have been unimaginable to test without your support.
Figure 2. Evaluated signatures by dataset (and size) for all tumours (A) and for ovarian cancer only (B).

The compute time required per signature in any given work unit depends on the signature size and the dataset. Since we try to keep the total amount of computation per work unit constant, the number of signatures per work unit will also vary with the signature size and dataset.
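Taken with the ~800 million work units and ~15 trillion signatures quoted above, the project-wide average is straightforward:

```python
# Project-wide averages from the quoted totals (both are approximate).
total_signatures = 15_000_000_000_000    # ~15 trillion signatures tested
total_work_units = 800_000_000           # ~800 million work units analyzed
avg_per_wu = total_signatures / total_work_units
print(int(avg_per_wu))                   # 18750 signatures per work unit
```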
In most datasets, larger signatures take more compute time than shorter signatures. Our lung dataset generally follows this pattern, while for our ovarian dataset, the opposite is true. For the ovarian cancer dataset, MCM tested prognostic signatures that predicted short or long survival times. This task was inherently more difficult than the diagnostic tasks in lung (cancer/no cancer) or sarcoma (distinguishing sarcoma subtypes). Because of this difficulty, MCM could not compute nearly as many signatures per ovarian work unit as it did for other datasets, regardless of signature size.
Figure 3. Signatures evaluated per work unit, by dataset (and size), for all tumours (A) and (B) specifically for ovarian cancer.

Our next step will be to evaluate how the different compositions of sarcoma signatures affect their predictive ability.
We are grateful to everyone who is supporting Mapping Cancer Markers, and all the important projects on World Community Grid.
If we are celebrating the WCG 17th anniversary it is only because of your dedication.
Thank you!
 

bluestang

Limp Gawd
Joined
Dec 14, 2018
Messages
261
To the guys running OpenPandemics on your 3000 series GPUs...can you get full power usage out of the card when running multiple work units at once? Like, if you set 100% power in Precision X1 or Afterburner, does GPU-Z, HWiNFO64, PX1, or AB show 100% power actually being used?

I think it's a driver issue (maybe?), as my 2080 Ti is now doing it after I updated the driver today...haven't done that in quite some time. 70% power meant 70% power, and 100% power went to 100% power...now 100% power never gets out of the 70s.

Yes, I've done DDU and clean installs and reboots.

I noticed it when I put my kids' 3000 series GPUs on this project for when they're not using their PCs.
 

pututu

[H]ard DCOTM x2
Joined
Dec 27, 2015
Messages
2,173
bluestang, the OPNG tasks never got to 100% power on the 1000, 2000, or 3000 series cards that I have tested, whether on my AMD or Intel systems. I think the OPNG algorithm is not well written to fully utilize the GPU, unlike folding. You probably already know this: when you run one task per GPU, the task spends quite a considerable amount of time on the CPU. I think the programmers still have a long way to go in optimizing the code.

I use MSI Afterburner and set a 70% power limit (the OPNG tasks never push the card to its power limit) and also set the voltage-frequency curve to further reduce power consumption.

Maybe things are different with low-end/older-generation cards, as reported here: https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,43547

Haven't tried the Linux drivers. Let us know if you find anything interesting.
 

bluestang

Limp Gawd
Joined
Dec 14, 2018
Messages
261
I'm not talking about the GPU usage %; I know about that with these WUs, and that's why most of us run concurrent WUs to keep the GPU fully busy. I'm talking about the power/wattage the GPU is using now after a driver update...like NVIDIA changed their power settings or something in the driver. I could easily get my 2080 Ti to use 250 or more watts if I cranked the power slider up to 85% or higher...now it does nothing in Precision X1. Old driver temps were 64-69 on the core and 70-74 on the memory, but with the new driver the core is a couple of degrees lower, the memory won't get out of the 50s/60s, and WUs are taking much longer. Same exact settings...makes no sense.

And of course I forgot to note down what driver I was using on my 2080 Ti before I updated it lol.

I also noticed this phenomenon when I installed a 3070 Ti FTW3 that never got its memory temps above the 50s, even at 100% power level, and was taking longer to finish tasks than a 3060 Ti FTW3 running the same settings. It's like these newer drivers are keeping the GPUs from reaching their full power limit by 25% or more.

I've read something about the 3D Power setting in NVIDIA Control Panel being changed on newer drivers, but that did nothing. I also heard about using NVIDIA Profile Inspector to change 'CUDA - Force P2 State' from On to Off, but haven't tried that yet.

I figured I'd ask, hoping someone here has maybe seen this issue.

PS... congrats on your 4th place in the WCG Birthday!
 

pututu

[H]ard DCOTM x2
Joined
Dec 27, 2015
Messages
2,173
I'm still using the 471.68 driver. Even with this older driver I could not get the GPU to run close to full power, or even close to the 70% power limit I usually set. On my 3070 Ti XC, I downclock the core to around 1750 MHz and consume ~145W, which is half of this card's max GPU power (290W). At least the card runs cool at around 50+°C, which I'm happy with. I haven't tried NVIDIA Profile Inspector on this yet, but it seems like it could give better control over the power settings. My 3070 Ti XC somehow has a different issue that doesn't allow me to go below 818mV. Someone in that same thread also suggested using NVIDIA Profile Inspector.

Thanks. Was hoping you guys could also participate in the WCG birthday challenge for the benefit of cancer research. ;)
 

motqalden

[H]ard|DCOTM x4
Joined
Jun 22, 2009
Messages
2,346
On my 3090 with 472.12 at 70% I was pulling 250W. Set to 100% it fluctuates between 330-350W. I had heard old versions of Afterburner did not handle power properly on 3000 series cards, so maybe check that.
 

bluestang

Limp Gawd
Joined
Dec 14, 2018
Messages
261
On my 3090 with 472.12 at 70% I was pulling 250W. Set to 100% it fluctuates between 330-350W. I had heard old versions of Afterburner did not handle power properly on 3000 series cards, so maybe check that.
So you're using MSI AfterBurner then? I'm using EVGA Precision X1 since I have EVGA cards... I'll uninstall it and try AB when I get a chance if so.
 

bluestang

Limp Gawd
Joined
Dec 14, 2018
Messages
261
Okay, uninstalled the 496.76 WHQL DRD that was the newest according to the NVIDIA driver site and downloaded 472.12. Ran DDU as always and reinstalled using 472.12.

All good now! Must have been a driver bug, as it had the same behavior across 3 machines and 3 different GPUs. A 65% power setting in PX1, and power wobbles from 215-ish to 240-ish watts depending on load, just like it used to.

So the new drivers must have gimped something. Time to do this on my other machines (3000 series) and hope for the same good result.
 