I promised an analysis, but I got a bit carried away in a conversation with a fellow team member,
so that will need to wait... (sorry).
I did get permission to quote our conversation so, please, enjoy.
A word of caution: I will ask the moderators to intervene should the discussion descend to the level
of an e-peen contest, even in the slightest.
Code:
04:32 <&tear> kendrak, to put it short
04:32 <&tear> http://www.scalalife.eu/content/gpu-only-execution-gromacs
04:32 <&tear> open this.
04:32 <&tear> there's a chart close to the bottom of the page
04:33 <@kendrak> Ok.....
04:33 <&tear> now, parse this paragraph:
04:34 <&tear> MD algorithms are complex, and although the Gromacs code is highly tuned for them, they often do not translate very well onto the streaming architectures.
Realistic expectations about the achievable speed-up from tests with GTX280: for small protein systems in implicit solvent using all-vs-all kernels the
acceleration can be as high as 20 times, but in most other setups involving cutoffs and PME the acceleration is usually only about 5 times relative to a 3GHz CPU
core.
04:35 <@kendrak> Core as in singular
04:35 <@kendrak> Not a 48 core boxen
04:35 <&tear> yes
04:35 <&tear> so the chart illustrates these two types of simulations
04:36 <&tear> first three simulations illustrate the '20x' from the paragraph
04:36 <@kendrak> Yes, get that
04:36 <&tear> the last two -- the '5x'
04:36 <@kendrak> I see that
04:36 <&tear> now
04:37 <&tear> all GPU projects out there are of 20x type.
04:37 <&tear> there is no GPU unit that I know of that is of 5x type
04:37 <&tear> (simplifying)
04:37 <@kendrak> Not a surprise
04:37 <&tear> at least not in circulation at this time
04:38 <&tear> now
04:40 <&tear> similarly, all SMP projects I've dealt with are of '5x' type
04:41 <&tear> (that will become interesting later)
04:41 <@kendrak> Ok.....
04:41 <&tear> but for now, let's assume that we have a mix of 5x and 20x projects
04:41 <&tear> and that both types are being served to both GPUs and CPUs
04:41 <&tear> ==
04:41 <@kendrak> Talk about a PPD rollercoaster
04:41 <&tear> if PG uses SMP as a 'global' benchmark
04:42 <@kendrak> Ok....
04:42 <&tear> that means the SMP rigs get relatively flat PPD output across the board
04:42 <&tear> (for both '5x' and '20x')
04:42 <&tear> (a consequence of SMP being a benchmark)
04:42 <&tear> but... GPUs get ridiculously high points for 20x units
04:42 <@kendrak> Understand
04:42 <&tear> because they calculate them so well
04:43 <&tear> that creates an incentive for GPU users to cherry-pick WUs
04:43 <&tear> =====
04:43 <@kendrak> That will get messy quick
04:44 <&tear> now, imagine they do the opposite
04:44 <&tear> use GPU as a baseline
04:44 <&tear> GPU PPD output becomes ~flat
04:44 <&tear> but some units (20x units) get penalized on SMP
04:45 <&tear> == create incentive for SMP to only fold 5x units [!]
04:46 <&tear> with these two types of units
04:46 <&tear> it's impossible to satisfy these two at the same time:
04:46 <&tear> 1. Equal points for equal work
04:46 <&tear> 2. ~Flat PPD output for a given hardware
04:46 <&tear> ====
04:46 <&tear> I would honestly recommend keeping '20x' units to GPUs
04:47 <&tear> and keeping 5x to both SMP and GPUs
04:47 <&tear> ===
04:48 <@kendrak> So a 48 core system at 2.0ghz.....
04:48 <&tear> 20x units scale very bad on SMP, too.
04:49 <&tear> 2.4 GHz 12 core == TPF of 15m and some seconds (on an 8057 -- '20x' unit)
04:49 <&tear> 3.0 GHz 48 core == TPF of 8m
04:49 <&tear> (on the same unit)
04:49 <&tear> (even less reason to keep them on CPUs)
04:51 <@kendrak> I understand the points tear
04:52 <&tear> kendrak, now, what's even more funny
04:52 <@kendrak> However.... the community will only want to do 20x GPU
04:52 <&tear> kendrak, not if points are flat on the GPU
04:53 <&tear> kendrak, but wait for this...
04:53 * kendrak waits for it
04:53 <&tear> kendrak, I'm getting 6k PPD on 8057 on my desktop
04:53 <&tear> kendrak, regular SMP units are getting 26k PPD
04:54 <&tear> imagine what PPD GPUs would be getting now
04:54 <&tear> if points were such that I'd be getting 26k PPD on 8057, not 6k :)
04:54 <&tear> the gap is so BIG that it just doesn't make sense to feed these 20x units to SMP _at_ _all_.
04:56 <&tear> points would be even higher than the 50-250k they're getting now [!]
04:57 <@kendrak> Or do they just hand out points like penny candy?
04:57 <@kendrak> If they keep the WU on "optimized" hardware....
04:57 <&tear> I've no idea what their intents are
04:57 <&tear> or if they realize consequences..
04:58 <&tear> hell, they are researchers so they sure should know what performance difference is on '20x' type
04:58 <@kendrak> Only GPUs running 20x WU will be the focus
04:59 <&tear> the fact that we only have '20x' type units on the GPU (only one with bonus)
04:59 <@kendrak> Not sure they comprehend what giving 2/3rds the ppd to a gtx 580 vs a 4P does to their SMP potential
04:59 <&tear> could lead some to conspiracy theories :)
04:59 <@kendrak> Could.....
04:59 <&tear> ah, yes, that's another interesting consequence!
04:59 <@kendrak> Very diplomatic of you
04:59 <&tear> I'm trying :)
05:00 <&tear> now, look at this
05:00 <&tear> top dawg folding 8057 gets 250k PPD
05:00 <@kendrak> Mmm hum
05:00 <&tear> with power consumption of about 250W card + 100W system -- 350W
05:00 <&tear> +/-
05:01 <&tear> we're talking ballpark figures so we don't need to be super accurate
05:01 <&tear> that's 714 PPD/Watt
05:01 <@kendrak> About where a good 4P sits
05:01 <&tear> not exactly
05:01 <&tear> a _very_ good 4P gets about 500k PPD with 8101 (and 800W of power)
05:02 <&tear> that's 625 PPD/Watt [!]
05:03 <@kendrak> So we have GPUs as a new PPD/w king (in general)
05:03 <&tear> so 4P 8101 folders (read: majority) are getting screwed by lucky/cherry-picking GPU folders
05:03 <&tear> 8102s are better but only a little bit.
05:04 <@kendrak> Then the luck/cherry goes away and it becomes the norm
05:04 <@kendrak> If only 20x WU are given to GPUs
05:04 <&tear> correct.
05:05 <&tear> ==
05:05 <@kendrak> And SMP will go dark (for the most part)
05:05 <&tear> yup, completely dark.
05:05 <&tear> there will be no incentive for "regular SMP" any more
05:05 <&tear> forget all your 2700k, 3960X ...
05:06 <@kendrak> And the science is limited, because only one type of math is worth doing
05:06 <&tear> kendrak, not really
05:06 <&tear> kendrak, the rest is '5x'
05:06 <&tear> kendrak, so it's still very good on GPU
05:07 <&tear> kendrak, *if* they introduce 5x to the GPU
05:07 <&tear> (5x units)
05:07 <@kendrak> I don't think they will....
05:07 <&tear> that I do not know
05:07 <&tear> but I think they should be made aware of potential implications :)
05:07 <&tear> (if they aren't already)
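
For anyone who wants to play with the numbers, here is a rough Python sketch of the argument in the log. It is only a toy model: the 5x and 20x factors are the ballpark speed-ups from the quoted paragraph, the CPU is normalised to 1.0, the target of 1000 PPD is made up, and the names (SPEED, TARGET_PPD, ppd_table, ppd_per_watt) are mine, not anything the project actually uses. It sets points so that the chosen baseline hardware earns flat PPD on both unit types, then shows what the other hardware earns. It also redoes the PPD/Watt figures from the conversation.

```python
# Illustrative model only: all numbers and names below are assumptions,
# not actual project benchmarking rules.

# Relative throughput (work units per day) by hardware and unit type.
# CPU is normalised to 1.0; GPU uses the ballpark 5x / 20x speed-ups.
SPEED = {
    "cpu": {"5x": 1.0, "20x": 1.0},
    "gpu": {"5x": 5.0, "20x": 20.0},
}

TARGET_PPD = 1000.0  # hypothetical flat PPD for the baseline hardware


def ppd_table(baseline):
    """Return PPD per hardware and unit type when points are set so that
    `baseline` hardware earns TARGET_PPD on every unit type."""
    points = {unit: TARGET_PPD / rate for unit, rate in SPEED[baseline].items()}
    return {
        hw: {unit: rate * points[unit] for unit, rate in rates.items()}
        for hw, rates in SPEED.items()
    }


def ppd_per_watt(ppd, watts):
    """Efficiency in points per day per watt of wall power."""
    return ppd / watts


if __name__ == "__main__":
    for baseline in SPEED:
        print("baseline:", baseline, ppd_table(baseline))
    # Ballpark figures quoted in the conversation.
    print("GPU on 8057:", round(ppd_per_watt(250_000, 350)), "PPD/W")
    print("4P on 8101:", round(ppd_per_watt(500_000, 800)), "PPD/W")
```

Whichever hardware is the baseline gets flat PPD, and the other one sees a 4x spread between the two unit types. That spread is the cherry-picking incentive discussed above, and in this model no choice of points removes it for both at once.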