Parallel processing / numerical modeling: worthwhile to virtualize?

Thuleman

I just ordered a few servers that will only be used to run CPU-intensive climate models. Each model run needs to be allocated 6 cores and 72 GB of RAM, and the computation takes several weeks per run.

The servers I ordered have dual hex-core CPUs. Ordinarily we would install Debian and run the model software with settings that allocate six cores and the RAM to each run, which effectively lets us run two model simulations per physical server. When a simulation runs it maxes out each of its six cores at 100%.
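
Roughly what I have in mind for the bare-metal case, as a sketch only: it assumes the model is a single multi-threaded process and that the kernel numbers socket 0 as cores 0-5 and socket 1 as cores 6-11. The binary name, arguments and paths are placeholders, not the real software.

```python
#!/usr/bin/env python3
"""Sketch: launch two model runs on one dual hex-core box, each pinned
to its own six cores so they stay off each other's socket."""

import os
import subprocess

RUNS = [
    {"cores": {0, 1, 2, 3, 4, 5},   "workdir": "/data/run_a"},  # socket 0 (assumed numbering)
    {"cores": {6, 7, 8, 9, 10, 11}, "workdir": "/data/run_b"},  # socket 1 (assumed numbering)
]

procs = []
for run in RUNS:
    # preexec_fn runs in the child just before exec, so the affinity
    # mask is inherited by the model process and any threads it spawns.
    p = subprocess.Popen(
        ["./climate_model", "--threads", "6"],   # placeholder command
        cwd=run["workdir"],
        preexec_fn=lambda cores=run["cores"]: os.sched_setaffinity(0, cores),
    )
    procs.append(p)

for p in procs:
    p.wait()
```

(`numactl --cpunodebind` from the shell gets you much the same effect, plus memory locality; the point either way is one run per socket.)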

The VMware-guy in me wants to install ESXi and then create two VMs, each with six cores, and then run the model simulation inside of that.

The Lowest-Complexity-Possible-guy in me wants to install Debian and just run the models in it and not monkey with VMware.

This seems to be one of those cases where virtualization isn't a good fit. Nothing obvious is gained by virtualizing this workload, and performance will more likely be lower on ESXi because of the overhead the hypervisor introduces.

If the model run is interrupted for some reason the whole run is hosed, so HA wouldn't help. FT might, but it would take double the resources and isn't worth it.

Anyone running virtualized computationally intensive workloads that max out the CPU for days/weeks? Worth doing?
 
Can it be done? Sure. Is it worth it? The only way to find out is to try. Depending on how the climate model was written, the original programmer may have done some low-level Ring-0 work to make things "faster", and that won't virtualize well. If not, then we can get you within 2-5% of bare-metal performance, with the ease of management of VMs, via the low-latency tuning options. Maybe even closer, with a bit more work. We've got a lot of folks doing this - it just takes more work than "create VM, run software".

I've worked with both types of models, as a programmer and as a VMware dude, so it all depends on what was written and how much work you want to put in (oh, and forget backups while it's running - that'll be a no-go).
 
Interesting. Backups are a non-issue because the model run cannot be resumed once it's interrupted. At least that is what I was told; I should check whether that's actually true. It would be cool to pause the model on a schedule, run the backup, and then resume.

Do you have any resources for the low-latency tuning and the stuff that falls under "a bit more work"? If the model run can be paused and resumed, then deploying this in a VM is totally worth doing from my perspective. If it cannot be paused and resumed, I'm not sure whether it's worth the effort; it depends on how much effort it is.

At 5% off bare-metal performance we would be giving up ~437 hours, or about 18 days, per model per year, assuming 100% utilization. At a more likely 85% utilization it would be roughly 15 days lost to the overhead per model per year.
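
Back-of-the-envelope for those numbers, using 8,760 hours in a year and treating the overhead as a flat 5%:

```python
HOURS_PER_YEAR = 365 * 24   # 8,760

for utilization in (1.00, 0.85):
    lost = HOURS_PER_YEAR * utilization * 0.05   # hours lost to a 5% overhead
    print(f"{utilization:.0%} utilization: {lost:.0f} h (~{lost / 24:.1f} days) lost per model per year")

# 100% utilization: 438 h (~18.2 days) lost per model per year
# 85% utilization: 372 h (~15.5 days) lost per model per year
```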

I'll check with the project manager to see whether we can put a couple servers aside to test performance of VM vs bare metal. That seems reasonable to me.
 
Thanks for those, will dig into them in the coming days.

I did see this in the comment section of the first link:

Yes, we are recommending over-provisioning. This is to achieve two things:

1. The latency-sensitive feature performs best when a latency-sensitive VM gets exclusive access to PCPUs. Over-provisioning increases the chances of this. However, as explained in the white paper, the best way is to give a 100% CPU reservation to a given latency-sensitive VM, which guarantees it exclusive PCPU access.

2. Once PCPUs are exclusively owned by latency-sensitive VMs, they cannot be used by VMkernel threads and user-level processes. So it is recommended to leave one or more PCPUs for those.

Essentially what he's saying is that in my case, where I want to give the model six cores (6 vCPUs), I should have at least 7 physical cores available, as otherwise I cannot achieve exclusivity for the model VM's cores. So in my specific case of two physical CPUs with six cores each, I could only provision 5 cores per VM, or maybe 6 cores for one VM and 5 for the other.

This is somewhat similar in the bare-metal case as well, since the OS still needs to run on something, but on bare metal only a fraction of a core is lost to the OS, not a whole core. It will take some testing to see which approach is more efficient: losing a whole core but getting 100% of the remaining ones, or distributing the overhead across all of them.
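
Quick sanity check on that trade-off; the 2-5% per-core overhead figures below are just the range quoted earlier in the thread, not measurements:

```python
CORES = 6 * 2   # dual hex-core box, 12 physical cores

# Option A: give one whole core to ESXi/the OS and run the models on the
# remaining cores with exclusive access (e.g. a 6 vCPU VM and a 5 vCPU VM).
exclusive = CORES - 1   # 11 cores at ~100%

# Option B: run on all 12 cores and let the hypervisor/OS shave a slice off each.
for overhead in (0.02, 0.05):
    shared = CORES * (1 - overhead)
    print(f"{overhead:.0%} overhead: {shared:.2f} core-equivalents vs {exclusive} exclusive cores")

# 2% overhead: 11.76 core-equivalents vs 11 exclusive cores
# 5% overhead: 11.40 core-equivalents vs 11 exclusive cores
```

On paper distributing the overhead wins either way, but that assumes the model scales cleanly to the extra core and doesn't mind the jitter, which is exactly what the bare-metal vs VM testing should answer.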
 
Yeah, some workloads free up enough for ESX to be happy, others don't. It's always hard to say :) Setting a CPU reservation for the ESX kernel might free up a bit of that and not cost you a whole core either; don't be afraid to tweak those settings.
 
Has OP compared the cost of running his simulations in a few seconds on a few hundred cores of cloud or even university supercomputing infrastructure?
 
Has OP compared the cost of running his simulations in a few seconds on a few hundred cores of cloud or even university supercomputing infrastructure?

Sort of, we do have our own 1,750+ core HPC setup on campus. In theory HPC is great; in practice it becomes very situational. The issue with HPC is that jobs are queued, so they don't run immediately, and once they do run and you find out that something is wrong and you need to restart, you go to the back of the queue again. So HPC could be an option once a perfect run has been developed and just needs to be repeated over and over.

Another issue is that it takes time and approvals to install software in the HPC environment, and more time and approvals to make any changes to that software. Once again, HPC becomes viable only after a very stable set of prerequisites has been developed. Lastly, access to the HPC system isn't straightforward: files need to be SFTPed to one location and then copied over to the file system the HPC actually uses. All of this generates a lot of admin overhead and thus isn't viable for our purposes.

The other issue is that the standard version of the software they want to use recommends using fewer than 8 cores per run. The option exists to use more than 8 cores per model run, but it requires code modification to switch to a different MP library. As far as I can tell, no good stats exist on whether more cores per run are actually more efficient, since there's overhead to keep things distributed and in sync.
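
For what it's worth, the usual Amdahl's-law picture shows why more cores per run isn't automatically better. The 95% parallel fraction below is a made-up number, not something measured for this model:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup if only `parallel_fraction` of the work scales with cores."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / cores)

for n in (6, 8, 12, 24):
    print(f"{n:2d} cores: {amdahl_speedup(0.95, n):.2f}x")

#  6 cores: 4.80x
#  8 cores: 5.93x
# 12 cores: 7.74x
# 24 cores: 11.16x
```

And that's before any real synchronization or communication overhead; going from 12 to 24 cores doesn't come close to doubling throughput unless the parallel fraction is very high.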

This project is in its early stages, where frequent changes to the model parameters and inputs will be made. For now the best option was to buy a few servers and have the ability to run several models, each on fewer than 8 cores, at a time.

Cloud computing (Amazon etc.) wasn't viable since our needs are CPU- (6 cores per run), memory- (72 GB per run), and storage-intensive. Model input is about 2 TB; intermediate and final outputs total 4-5 TB per model run.

I bought a bunch of used Dell C1100s (dual hex-core, 144 GB of RAM); they are perfect for this job at less than $2k each.
 
The VMware-guy in me wants to install ESXi and then create two VMs, each with six cores, and then run the model simulation inside of that.

Did you not ask him why? If you didn't make him substantiate the validity of his approach, you should smack yourself.

It doesn't matter what he tells you, if he can't give you a good reason to support it.

The Lowest-Complexity-Possible-guy in me wants to install Debian and just run the models in it and not monkey with VMware.

While I'm sure this choice falls closer to the 'obvious solution' category than the other one, it still would have been valid for you to ask him what his concerns with VMware or other virtualization platforms could be.

This seems to be one of those cases where virtualization isn't a good fit. Nothing obvious is gained by virtualizing this workload, and performance will more likely be lower on ESXi because of the overhead the hypervisor introduces.

Well, the biggest benefit I can see is that you can isolate the simulations more granularly. If, for example, simulations A and B run on the same box, and something goes wrong with the software on that box (something system-wide, which would necessitate restarting the OS, etc.), you're going to lose both simulations. If simulations A and B run in their own virtual environments, there are situations where it's possible to restart the OS on one without doing so on the other. It might also keep any OS resources from being shared between the two simulations. But of course you have to examine whether or not that's actually valuable to you. It seems like a stretch to me.

Is this worth the potential overhead? Well, the thing I would look at is availability, i.e. MTTF/(MTTF + MTTR), of your setup. How often does the entire rig get taken down, and how long does it take you to bring the rig back up? If running the simulations natively without virtualization already provides a very high availability, I don't see anything to gain from virtualization. Do you ever find yourself needing to restart the entire system because of some kind of failure? How often are you going to have one simulation take out the whole system and therefore another simulation with it? If this doesn't really happen ever, then regardless of how small the virtualization overhead may be, it's probably not worth virtualizing.
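
To put a number on that formula, purely as an illustration (the MTTF/MTTR figures below are made up, not anything measured on your boxes):

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# e.g. a box that fails about twice a year and takes half a day to restore
print(f"{availability(4380, 12):.2%}")   # ~99.73%
```

If bare metal already sits in that range, a permanent few-percent CPU tax is a steep price for isolation.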

If it happens a lot, then there might be value in separating the simulations through virtualization. Of course, to know for sure you have to look at the performance impact of the virtualization software, and if that's causing you to give up a core for one of the simulations, any workload that gets near-linear speedup from additional cores will probably suffer far more than any downtime prevention can justify.

In your case, it sounds like the benefits of virtualization aren't really going to be that helpful.


Model input is about 2 TB, intermediate and final outputs total 4-5 TB per model run.

With this kind of storage requirement, I'd say you'd be wasting your time with anything other than your own dedicated hardware. If the hardware can't be in the same room with you, odds are you'll waste a lot of time moving all of that data around.
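
Rough transfer math, assuming a sustained 1 Gbit/s path to a remote facility (pick your own link speed; single-stream SFTP will usually come in well under line rate):

```python
def transfer_hours(terabytes: float, gbit_per_s: float) -> float:
    """Hours to move `terabytes` of data over a sustained `gbit_per_s` link."""
    return terabytes * 1e12 * 8 / (gbit_per_s * 1e9) / 3600

print(f"input (2 TB):   {transfer_hours(2, 1.0):.1f} h")   # ~4.4 h
print(f"outputs (5 TB): {transfer_hours(5, 1.0):.1f} h")   # ~11.1 h
```

Per run that's a workday or two of copying on top of the staging hoops, before you even hit the queue.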

Of course, it could be interesting to have access to a Hadoop cluster so you could load all that data into something like HDFS and search through the results in potentially interesting ways; a university HPC center that offers a Hadoop cluster could be beneficial there. But I imagine the size of the data you're working with and the duration of your jobs would make a shared computing center tedious to use.

I bought a bunch of used Dell C1100, dual hex core, 144 GB of RAM, they are perfect for this job at less than 2k each.

That's probably what I would have done too. When you have the option of purchasing used servers, the cost of owning your own hardware is much less severe.
 