Dear crunchers!
We would like to share with you two things:
a) the GPU accelerated GW app is now much less memory hungry and less likely to produce errors. If you had previously opted out of the GW app, we invite you to reconsider and give the new version a chance.
b) to celebrate the new app, we have a holiday season special offer for our crunchers: you'll get twice the BOINC credits for the GW App results.
[see details in the forum thread]
Copyright © 2024 Einstein@Home. All rights reserved.
Comments
We have been busy lately to improve the crunching experience when running the O3 All Sky Gravitational Wave Search on O3 data (O3ASHF search).
We had noticed that a few things were not quite optimal: the app originally required almost 4 GB of memory on your graphics card, which, admittedly, was too ambitious. We saw many results coming back to us as computation errors caused by memory allocation failures. We also noticed that quite a few users had opted out of the gravitational wave app, and we suspect that the relatively high error rate and the high memory requirements are to blame. We are sorry for any inconvenience this might have caused.
So we changed the workunits and the app substantially: instead of crunching through a certain amount of parameter space in one go per workunit (and using up to almost 4 GB of VRAM for this), the new app runs on workunits that search through that same space in two sequential steps, each covering half the previous search volume. The advantage is that the maximum VRAM used is now only around 2 GB, and to create some safety headroom, we let BOINC assume a requirement of about 2.5 GB.
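The effect of the split can be sketched with a toy model (illustrative only, not the actual search code): doing the same total amount of work in two sequential passes halves the size of the largest buffer that must be resident at once.

```python
# Toy model of the change described above (illustrative only): instead of
# allocating one buffer for the whole search volume, the work is done in
# sequential passes over smaller chunks, so the peak "VRAM" requirement
# shrinks while the total work stays the same.

def run_search(n_points, n_passes):
    """Process n_points of parameter space in n_passes sequential chunks.

    Returns (peak_buffer_size, total_results_produced).
    """
    chunk = -(-n_points // n_passes)  # ceiling division
    peak, results = 0, 0
    for start in range(0, n_points, chunk):
        buf = list(range(start, min(start + chunk, n_points)))  # stand-in buffer
        peak = max(peak, len(buf))
        results += len(buf)  # stand-in for the template-matching work
    return peak, results

# One pass needs the whole volume resident at once; two passes need half:
# run_search(1000, 1) -> (1000, 1000)
# run_search(1000, 2) -> (500, 1000)
```

The same total number of points is processed either way; only the peak allocation changes.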
The new app version was deployed some time ago and we are indeed seeing a substantial decrease in the number of work units failing with an error, so this works as intended.
Important: If you have previously opted out from the gravitational wave search in favor of the BRP7 GPU accelerated search, we would like to invite you to re-enable the GW search again in your preferences (under the "Project" setting: note that you might want to set this in all of the BOINC "venues").
Happy New Year 2024 Extra Credits
As an incentive to try the new app (especially if you have previously opted out), and as compensation for potential troubles in the past, we have increased the credits for new workunits generated from now on to 10k, twice the previous amount.
Some additional technical notes
If you have a graphics card with 4 GB of VRAM or less, make sure you do not accidentally run two or more instances of the app at the same time. The "GPU utilization factor" in your BOINC project preferences should be set to 1.0 in this case, which is the default. If you reduced it to 0.5 or similar in the past to allow multiple units in parallel, and you now see computation errors, set it back to 1.0: BOINC is fooled by the app's low VRAM usage at the very start and will try to start two units that together need close to 4 GB.
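For reference, the same "one task at a time" behavior can also be pinned down per-app with an app_config.xml file in the project's data directory. A minimal sketch follows; the app name einstein_O3AS is taken from reports later in this thread and the project directory path is an assumption, so check the names your own client actually shows:

```xml
<!-- app_config.xml in projects/einstein.phys.uwm.edu/ (path may differ) -->
<app_config>
    <app>
        <name>einstein_O3AS</name>  <!-- assumed app name; verify locally -->
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>  <!-- one task per GPU -->
            <cpu_usage>1.0</cpu_usage>  <!-- reserve one CPU core per task -->
        </gpu_versions>
    </app>
</app_config>
```

After saving, tell the client to re-read its config files (or restart it) for the change to take effect.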
Happy crunching!
BM
sounds great Bernd :)
is the 10k points a "limited time offer" kind of thing? will it go back to 5k at some time? or will you leave it at 10k from here on out?
this might be a big ask, but would it be possible to compile the GW app for CUDA for nvidia devices? CUDA gives some special benefits for Linux users in being able to run the Multi-Process Service and gives the user some tweaking ability that can't be done on OpenCL. I use this a lot, and with running several other projects that are CUDA also (GPUGRID, Asteroids) it makes my life easier not having to stop and start MPS to switch between CUDA and OpenCL. Simply having the binary in CUDA with no other changes is sufficient.
_________________________________________________________________________
Can you update build script for brp application here https://einsteinathome.org/brp-src-release.zip ?
It tries to download zlib 1.2.8 from http://zlib.net/zlib-1.2.8.tar.gz but it was moved to https://www.zlib.net/fossils/zlib-1.2.8.tar.gz
There are probably more modifications needed.
kotenok2000 wrote: Can you update the build script for the brp application?
you can update the build script to pull a newer zlib version:
just change "ZLIB_VERSION=1.2.8" to "ZLIB_VERSION=1.3"
but this is all off-topic. this thread is talking about the Gravitational Wave tasks, and you're asking about the BRP app.
_________________________________________________________________________
Bernd Machenschalk wrote: We have been busy lately ...
I assume that's "All-Sky Gravitational Wave search on O3 (O3AS)" in the Project Preferences and Applications pages? (I don't see a "Continuous" or an O3ASHF)
Why do these application names often not quite match up??
Edited to add: OK - I see the task is listed as O3ASHF in BOINC's TASKS list. But I can't see that until a task is downloaded. If you're advising users to sign up for an application, it would be better to list that application by the name it appears in the Project Preferences page.
Thanks
WPrion wrote: I assume that's "All-Sky Gravitational Wave search on O3 (O3AS)" ...
yes. that is the one.
_________________________________________________________________________
...and the task name does not say "Continuous". Do I have the right one??
Thanks, but my recommendation still stands:
If you're advising users to sign up for an application, it would be better to list that application by the name it appears in the Project Preferences page.
there is only one Gravitational Wave O3AS selection; you can't get it wrong. It identifies as "All-Sky Gravitational Wave on O3", which is pretty self-explanatory given there's only one GW search. It's the right one.
the O3MD* are previous completed searches. there is no work for those.
_________________________________________________________________________
Got it. Choose the Project Preferences application by best guess and process of elimination. Accuracy is so overrated anyway. It's not like we're dealing with computers here...oh, wait...
Thanks Bernd! Let me give my Radeon VII a shot. It seems well suited for this task.
I tried it. All I can say is that it's not even close to what I expected from a "new" app. CPU utilization was consistently between 3.6 and 6% (on my AMD Ryzen 9 7950X3D with SMT on). GPU (RTX 4090) utilization was at most 52% at a load of 110 W for one task, and 95% and likewise 110 W for two tasks at the same time. With two tasks running concurrently, the runtime of both WUs more than doubled, so I canceled them. Is there a way to prevent the simultaneous calculation of two WUs of type O3AS?
You also get 3333 credits for calculating a MeerKAT WU, which only runs 160 to 190 seconds, while the "new" O3AS needs more than 600 seconds and gets 10,000 credits. So there is a clear imbalance.
I'd like to help but O3 GPU does absolutely nothing on my AMD iMac GPU, less than 5% of GPU is used, so there's no point.
It was the same back in 2022 when I started that thread (and I remember I had also tested this before and it was the same), and unfortunately the recent updates of the app didn't change that problem.
I too wish E@H would use the GPU effectively. I also see average utilisation of about 7% on a Vega64, for einstein_O3AS, and suspiciously that is irrespective of how many concurrent GPU tasks I run. Each GPU task also uses about 7 GB of GPU RAM, even with the latest version that you say should use only 2.5 GB.
The other GPU apps - e.g. einstein_O2MDF & hsgamma_FGRPB1G - were much more effective at utilising the GPU properly and required much less RAM - in particular, they scaled nicely; I could run five or more such GPU tasks simultaneously with close to linear throughput gains. It's unfortunate that E@H is no longer running those apps.
I know the credits are largely made up, but FWIW I used to earn an order of magnitude more from E@H than I do now, on this same computer. Back when E@H was using the GPU properly.
Other BOINC projects are able to utilise my GPU fully (often with a single task, which is convenient), though their project objectives have less merit (IMO). It's a shame that my preferred project performs so badly.
Got my Radeon VII to run some O3ASHF GPU tasks. Running 5 tasks per GPU by staggering each task so it starts after the CPU portion of the previous task is completed. Total average run time for 5 tasks = 2008 secs, or 401.6 secs per task, with average GPU board power of around ~105 W give or take. Average actual core clock ~1360 MHz, memory clock ~1800 MHz. AMD driver 23.5.2.
Link to host
GPU-Z screenshot:
Average PPD for O3ASHF is 2.15M at ~105 W board power, assuming 0% invalid. Not sure what the current average invalid rate looks like; maybe a few percent.
Seems like my Radeon VII still has some good life in it. Wondering how long the O3ASHF search will last, and whether the new credit amount is permanent or temporary.
pututu wrote: Got my Radeon VII to run some O3ASHF GPU tasks ...
Any recommendation on automating staggering? I am seeing quite large task times on my VII (~40 mins running 2x), going up to near 50 mins if running at the same percentage without staggering manually. Are my poor times due to CPU limitations? (It's on a Xeon E5 v4 host.) I run a slight auto-undervolt but don't think it's that.
I haven't seen the new credit bump yet either; must still be running through old tasks.
Just one data point on the opt-out. I once had to opt out of the GW app because it crashed on the latest Nvidia driver, and I happened to need that driver for a game. In addition, the errors blocked me from getting any E@H tasks, not just GW tasks, even though the other apps were working fine. So the only option was opting out of the GW app. Other than checking the forums occasionally, I had no way to learn if/when the issue was resolved and whether I needed to upgrade my driver again. Ultimately I forgot to retry for a long time.
It would be nice if the server side could quickly block only the problematic combinations of apps and drivers, though I know that's probably BOINC server code rather than E@H, and not necessarily easy to do. Assuming that's not worth the effort, it would be helpful to have a shout-out through the BOINC notices whenever new drivers or apps don't work for some specific versions, and when such issues are resolved. LHC can detect when VirtualBox is not installed and send a notice in the BOINC Manager. I wonder if that means there is some capability to detect the platform configuration and target a similar message at hosts with problematic drivers?
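For what it's worth, the BOINC scheduler does have some machinery in this direction: app versions are matched to hosts via plan classes, and the plan-class specification can gate on GPU driver version. A heavily hedged sketch of what a server-side exclusion might look like; the plan-class name is made up, and the element names are from memory and should be verified against the current BOINC server documentation before use:

```xml
<!-- plan_class_spec.xml fragment (sketch; element names unverified) -->
<plan_class>
    <name>opencl-nvidia-gw</name>  <!-- hypothetical plan-class name -->
    <gpu_type>nvidia</gpu_type>
    <opencl/>
    <!-- refuse hosts below a known-good driver; illustrative value only -->
    <min_driver_version>46677</min_driver_version>
</plan_class>
```

This only blocks drivers older than a cutoff; excluding a single bad driver release in the middle of a range would be harder with this mechanism.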
The 10k credits will remain until the end of this "run", as long as it takes. The current prediction is 2 months. However, we hope that with this offer we'll finish it a little faster.
I'm working on compiling this app for CUDA (Win&Lin), but ran into some problems and ultimately had to postpone that for more urgent things. I'll pick it up again ASAP.
BM
Quote: "I once had to opt-out GW apps because the app crashed on latest Nvidia driver ..."
Thanks, that's an important piece of information. By far most of the errors we saw were memory allocation errors, so we worked on fixing the memory issue first. I'll take a look at the remaining errors now. Could you post the driver version that's creating the problem? Do others see this, too? Is this limited to a specific OS (Windows, Linux) or does this happen on both?
BM
I ran 2 WUs concurrently on my 2080 since yesterday, and now it has crashed with a blue screen: https://einsteinathome.org/de/host/12987429
The box is normally doing fine, no crashes, regardless of what I crunch.
Windows 10, fully patched, Nvidia driver 466.77
Supporting BOINC, a great concept !
thanks Bernd :). I look forward to a CUDA version.
_________________________________________________________________________
I have it running up to 3x per GPU. My impression is there is enough CPU processing going on that we could run up to our memory limit and still get a speedup. Yes/no?
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Thank you for the update :)
I have a test rig running 4x per GPU. It looks like it is using 80% of the available user GPU memory.
The Epyc 7601 is showing 21% CPU usage with no other tasks processing.
I had it running at varying numbers of tasks per GPU, so only the "last" ones are 4x tasks.
It's on NNT (no new tasks); it will shut down after it runs out of tasks.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
I would love to help with this project; however, I have the above application selected in my preferences and I get the output below:
18/01/2024 9:52:00 AM | Einstein@Home | Project requested delay of 60 seconds
18/01/2024 9:53:01 AM | Einstein@Home | Sending scheduler request: To fetch work.
18/01/2024 9:53:01 AM | Einstein@Home | Requesting new tasks for NVIDIA GPU
18/01/2024 9:53:03 AM | Einstein@Home | Scheduler request completed: got 0 new tasks
18/01/2024 9:53:03 AM | Einstein@Home | No work sent
18/01/2024 9:53:03 AM | Einstein@Home | No work available for the applications you have selected. Please check your preferences on the web site.
18/01/2024 9:53:03 AM | Einstein@Home | Project requested delay of 60 seconds
Tried a project reset, to no avail. Not to move the focus away from this project, but I thought the search focus was on "BRP7"?
It is All-Sky Gravitational Wave search on O3 (O3AS)
So what tasks have you got enabled in your profile? If it is All-Sky GW, do you also have "and other tasks if needed" enabled? You might even turn on "beta test" for the profile too. I hope you have "Run GPU tasks" and both the Nvidia and AMD GPUs selected in your profile as well.
HTH,
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
I only have the project above plus "run test applications" selected, and Nvidia GPU only, as I don't want anything running on my AMD GPU since it is integrated with my CPU.
What happens when you set this to YES?
"Allow non-preferred apps:yes/no"
Does it download ANY gpu tasks?
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
My current guess is that if 4x really is averaging 47 minutes, then the RAC production is about equal to BRP7/MeerKAT.
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Not sure; I only want to run the project above. I may try again in the coming days. Thanks for the suggestions. I haven't had such issues with other subprojects.
I think the board power draw for O3ASHF will be much lower than for BRP7? So you could actually save some power running O3AS vs. BRP7 for the same RAC?
If it downloads any GPU tasks then it is likely not your setup that is the problem.
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Thank you for update!
Sounds very good, these are the exact reasons why I had to keep GPU off from crunch :)
ace_quaker wrote: Are my poor times due to CPU limitations?
Probably yes. I have been getting these tasks for a while now, I think because I opted in to 'run test applications'.
I run 4 of these tasks simultaneously on my 7900 XT (seemed most efficient when I tested a long time ago), and my CPU is a 12-core 5900X. When I'm running 4 CPU tasks at the same time as the 4 GPU tasks, thus still using only 8 of 12 physical cores, this will add about 5 minutes of runtime per GPU task, going from ~27 to ~32 minutes. That's quite a lot, seeing that the GPU is not taxed more in any way by the CPU tasks, so it seems the GPU tasks are heavily limited by the CPU.
I have not tried staggering the GPU tasks. It sounds like a very good idea. I also wonder what's causing this extra runtime. Perhaps the GPU tasks are limited mostly by CPU cache. Unfortunately, when running 4 CPU + 4 GPU tasks, the Windows thread scheduler will first put all six cores on the first chiplet to work, and put only two tasks on the second chiplet. This is very inefficient from a cache point of view, since each chiplet has its own cache shared between six cores. I might try to spread them out evenly with Process Lasso sometime if I feel like it.
I suspect that the moderators would prefer we stop cluttering up a "News" thread with ongoing discussion.
I have opened a discussion thread in the Crunchers area here.
You're invited.
Respectfully,
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Bernd Machenschalk wrote: Could you post the driver version that's creating the problem?
That was quite a while ago, and it has since been resolved. There was a thread on this forum back then, but I couldn't find it now. :-( IIRC, it was consistently crashing on Windows for one of the 525 driver updates.
Sorry if that caused confusion. I was only mentioning this to point out people could disable some app but never learn that it's fixed later. It's not a current problem.
I have tried to run the GW tasks, but they always indicate absurd times to completion. One estimated 330 days to complete. I abort these. Today I aborted one whose time to completion was 169 days. Even when I suspend all other tasks, I get the same result.
Can you suggest any solutions to this dilemma?
This computer has a 1TB SSD and 16 GB of RAM.
Total credit:1,478,027
Average credit:2,910.38
Cross project credit:
CPU type:AuthenticAMD AMD Ryzen 7 5700G with Radeon Graphics [Family 25 Model 80 Stepping 0]
Number of processors:16
Coprocessors:AMD AMD Radeon(TM) Graphics (6227MB)
Operating system:Microsoft Windows 11 Core x64 Edition, (10.00.22621.00)
BOINC client version:7.22.2
Memory:15754.27 MiB
Cache:512 KiB
Swap space:18186.27 MiB
Total disk space:930.81 GiB
Free disk space:817.61 GiB
Measured floating point speed:5727.01 million ops/sec
Measured integer speed:25051.66 million ops/sec
Average upload rate:46.22 KiB/sec
Average download rate:7945.19 KiB/sec
Average turnaround time:1 days
Tasks:86
Number of times client has contacted server:1437
Last time contacted server:22 Jan 2024 19:44:00 UTC
% of time BOINC client is running:93.9626 %
While BOINC running, % of time host has an Internet connection:99.9991 %
While BOINC running, % of time work is allowed:98.0415 %
Task duration correction factor:0.84422
S. Gaber
Sure. The time estimates for the first 11 or so tasks are completely meaningless; you don't begin to get a reliable estimate until later on.
I have run these tasks on my Windows 5700G and they don't take 330 days. Honest. They will likely take hours, not days. No, I don't regularly crunch on my Windows box, but I do experiment with it.
Try again. It will get better.
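To illustrate why the first estimates are so far off, here is a toy model only; this is not BOINC's exact algorithm. The client maintains a duration correction factor that is nudged toward the observed runtime after every completed task, so even an absurd initial estimate converges after roughly a dozen results:

```python
# Toy model of a duration correction factor (DCF). Assumption: the factor
# is pulled part-way toward the ratio of actual to estimated runtime after
# each completed task; the real BOINC update rule differs in detail.

def update_dcf(dcf, estimated_s, actual_s, rate=0.5):
    """Move the correction factor toward the observed runtime ratio."""
    return dcf + rate * (actual_s / estimated_s - dcf)

dcf = 50.0  # wildly wrong starting point ("330 days" territory)
for _ in range(12):  # roughly the first dozen completed tasks
    dcf = update_dcf(dcf, estimated_s=600.0, actual_s=600.0)
# dcf has now converged close to 1.0, so displayed estimates become sane
```

The point is only that the estimate self-corrects: each completed task shrinks the error, so after the first batch of results the reported times become trustworthy.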
We can provide more help at: https://einsteinathome.org/content/new-improved-gravational-wave-app-discussion so we don't clutter up this news thread any further.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
A small digression about the GW tasks: they run up to 99.5% and then hang for a while, doing only CPU work while the GPU sits at minimal utilization. Do those tasks really need to do that CPU evaluation for so long, or is it some "glitch"?
non-profit org. Play4Life in Zagreb, Croatia, EU
It stops at about 50 percent and 99.5 percent on every task I have processed.
This was described as planned earlier in the thread. This is why processing time is sensitive to CPU speed and cache.
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Not a glitch; this is how the OpenCL app currently works. The same happens around 49.5%..50%.
Here is why: After some initial data parsing, validation and preparations which are done on the CPU only, the GPU-heavy part starts and the application computes many thousands of potential signal candidates, ranked by some detection statistic. This part of the computation is very well suited for GPU processing because (somewhat simplified), we can try many signal templates in a regularly spaced frequency grid and for identical sky points in parallel, and doing the same operations in parallel for different but "similar" data points in such a configuration is what GPUs are really good at.
Now, once we have a list of candidates, we need to perform some operations on each of them. Because those candidates come from a sparse subset of the original search grids, they are scattered all over the sky and no longer arranged in a regular frequency grid, so this additional step is not as well suited for GPU computation. Therefore, the current OpenCL app does these operations on the CPU.
The "new" workunits contain a bundle of two sub-workunits, so this switching between GPU-intensive and CPU-only processing happens twice during a workunit: the first sub-workunit runs from 0%..49.5% (GPU-intensive) and 49.5%..50% (CPU-only), and then the second sub-workunit follows with 50%..99.5% (GPU) and 99.5%..100% (CPU).
It is possible that the next version of the app will be able to do this second step on the GPU as well, but because the data is (for the reasons outlined above) less "regular" than in the first step, it will still not be that much faster than the CPU code we use now, at least on most machines; it obviously depends a lot on the relative speed of your GPU and CPU. It is not clear whether we can roll out the new code before the end of the current O3ASHF search.
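The regular-grid versus scattered-candidates distinction can be sketched in a few lines. This is a toy stand-in, not the real detection statistic or search code:

```python
# Stage 1 scores every template on a regular frequency grid: the same
# operation on regularly spaced points, which maps well onto a GPU.
# Stage 2 takes only the top-ranked candidates, now scattered irregularly
# over the grid, and refines each one individually: the part the current
# app does on the CPU. The "statistic" here is a cheap deterministic
# stand-in, chosen only to make the example self-contained.

def stage1_grid_scores(n_templates):
    """Regular grid: identical work per point (GPU-friendly in the real app)."""
    return [((i * 37) % 101) / 100.0 for i in range(n_templates)]

def stage2_refine(scores, top_k):
    """Sparse follow-up: per-candidate work on an irregular subset (CPU)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[:top_k]
    # Each candidate gets individual treatment; no regular structure to batch.
    return {i: round(scores[i] ** 0.5, 4) for i in candidates}

scores = stage1_grid_scores(1000)
refined = stage2_refine(scores, top_k=10)  # 10 candidates, refined one by one
```

Stage 1 is one uniform operation over thousands of points, while stage 2 is a loop over a handful of unrelated points, which is why the two stages suit such different processors.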
I hope this answers your question.
Thank you for a great recap!
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Bikeman (Heinz-Bernd) wrote: Not a glitch; this is how the OpenCL app currently works ...
Hey, thanks for the explanation. Sorry I'm late to the party. Tossin a VII Pro on the pile to crunch this. Here's to some good data :)
All-Sky Gravitational Wave search on O3 v1.07
....No Work....any news?
I'm running 4 tasks on my Radeon VII but can't achieve results as good as yours, Pututu.
Low wattage, which is great though.
https://einsteinathome.org/host/12602626/tasks/4/56
Maybe it's the memory overclock?
Do you have a profile which you use? I used to have one for undervolting.
Thanks; maybe I should try running 4x WUs on my RTX 4000.
non-profit org. Play4Life in Zagreb, Croatia, EU
Bernd Machenschalk wrote: I'm working on compiling this app for CUDA (Win&Lin) ... I'll pick it up again ASAP.
And now? How far along are you with that work? Half a year is quite a long time.