I can also add this: I have a GTX 580, and with everything on Ultra and Antialiasing Deferred at 4x it consumes almost all of my VRAM. When I put everything on High with Antialiasing Deferred off, Anisotropic Filter at 16x, Ambient Occlusion off, Motion Blur off, and Antialiasing Post on High, it only consumes 730 MB of VRAM. I'll be honest, I really don't see a huge difference between the High settings and the low settings.
My friend also has SLI'd EVGA GTX 570s and gets stutter, and we have been trying to troubleshoot this for about a week. We have tried everything that Google has to offer and nada. Somebody earlier recommended the EVGA BIOS update... that is new, and we'll give it a try. If he plays with just one card the stuttering goes away, so it's obviously an SLI thing... which btw I hope doesn't affect me, since I have another GTX 580 on its way.
You know what, you are right. For some reason SLI seems to be the only setup that takes a massive performance hit when you hit the VRAM ceiling on your cards. I remember that before I got my 2nd 560 Ti I could force video settings that were too high for my single 560 Ti, and while it did not get great frame rates, it ran smoother than my SLI setup does with the same settings.
The only hardware-related cause I can think of is this: when running SLI, your cards on most motherboards each run in x8 mode rather than the x16 a single card gets. That halves the bus bandwidth each card can get, and on top of that you now have 2 cards that need the same rendering data at the same time. If the game runs out of space for this data in the video card's RAM, it automatically falls back on storing it in system RAM until it's needed. Once it is needed, instead of one GPU on an x16 slot pulling the data from system RAM, you now have TWO GPUs pulling the same data at nearly the same time, each with HALF the bandwidth to get it there. So as you can see, there are 2 reasons why it could cause massive slowdowns that don't happen with a single GPU: with SLI you need twice the amount of data moved, and each GPU has half the pipe to move it. Well, half to each GPU but the same total at the CPU end, because the CPU has the x16 PCI-e controller in it (or your chipset does, if you're not on a Sandy Bridge CPU).
I think the main problem is the twice-the-data part. System RAM, even dual-channel DDR3, is quite a bit slower than GDDR5, and any game forced to fall back on it will take a performance hit anyway. Doubling the amount of data that has to be transferred makes a bad situation even worse, and I can totally see this being the cause of the major performance hit when it happens.
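To put rough numbers on that theory, here is a back-of-the-envelope sketch in Python. None of the figures come from the posts above; they are assumptions: PCIe 2.0 at roughly 0.5 GB/s per lane in one direction, dual-channel DDR3-1333, the GTX 580's spec-sheet memory bandwidth of about 192 GB/s, and a made-up 100 MB of data that spilled into system RAM. These are theoretical peaks, and whether each SLI GPU really fetches its own copy of the spilled data is the theory being discussed, not a confirmed fact.

```python
# Back-of-the-envelope bandwidth model; all figures are theoretical peaks.
PCIE2_GBS_PER_LANE = 0.5   # assumption: PCIe 2.0, ~500 MB/s per lane, one direction
GDDR5_GTX580_GBS = 192.4   # assumption: GTX 580 spec-sheet memory bandwidth
SPILL_MB = 100             # hypothetical amount of data that did not fit in VRAM


def pcie_gbs(lanes):
    """Peak one-way bandwidth of a PCIe 2.0 link with the given lane count."""
    return lanes * PCIE2_GBS_PER_LANE


def ddr3_gbs(mega_transfers, channels=2):
    """Peak DDR3 bandwidth: 8 bytes per transfer per channel."""
    return channels * 8 * mega_transfers / 1000


def fetch_ms(megabytes, lanes):
    """Milliseconds to pull `megabytes` over one link (MB / (GB/s) = ms)."""
    return megabytes / pcie_gbs(lanes)


single_card = fetch_ms(SPILL_MB, lanes=16)   # one GPU on a full x16 link
sli_per_card = fetch_ms(SPILL_MB, lanes=8)   # each SLI GPU on x8, own copy

print(f"system RAM (DDR3-1333, dual channel): {ddr3_gbs(1333):.1f} GB/s")
print(f"VRAM (GTX 580 GDDR5):                 {GDDR5_GTX580_GBS:.1f} GB/s")
print(f"single card, x16: {single_card:.1f} ms per {SPILL_MB} MB")
print(f"SLI, x8 per card: {sli_per_card:.1f} ms per {SPILL_MB} MB")
```

At 60 fps a frame has about 16.7 ms, so under these assumptions a fetch that fits inside one frame on a single x16 card no longer fits on each x8 card. Either way, the PCIe link is far slower than the VRAM it is standing in for.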