Ohh, that is double-buffered vsync, in which you have your frame buffer and the "back buffer", which cannot be written again until it has been sent, and thus yeah, a constant 16.67ms lag (one refresh at 60Hz) will be present.
But with triple-buffered vsync, which Fito got wrong on page 1 of this thread, you get rid of that input lag feel, because you can write to a second back buffer even if the first one hasn't been sent to the monitor in its entirety.
Heck, if Fito had read the article he linked, he would have seen that the extra 16-33ms of input lag they talk about refers to displays that try to do "smart" postprocessing (another great reason to use digital inputs and turn all monitor/TV-based postprocessing off!), and that they say the best combo is triple buffering with a 120Hz monitor (which indeed, for twitch shooters, is sweet!).
And edit to add:
The developers of D3DOverrider warned against that input lag article on AnandTech, stating that the writer didn't really have much of an idea about how D3D programming works. (And that is the reason why D3DOverrider's forced triple buffering does work.)
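For anyone wondering where the 16.67ms and 16-33ms numbers come from, they are just one and two refresh intervals at 60Hz. A quick sketch of the arithmetic (the function name is made up for illustration, it isn't from any driver or tool):

```python
def refresh_interval_ms(refresh_hz):
    """Time between two vsyncs, in milliseconds."""
    return 1000.0 / refresh_hz


# One refresh at 60Hz, two refreshes at 60Hz, one refresh at 120Hz.
print(round(refresh_interval_ms(60), 2))       # 16.67
print(round(2 * refresh_interval_ms(60), 2))   # 33.33
print(round(refresh_interval_ms(120), 2))      # 8.33
```

So a 120Hz monitor halves the cost of every refresh you have to wait for, whichever buffering scheme is in use.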
Could you post the link to where they said that? I haven't found it yet.
I did find this thread, though, which contains more discussion on triple buffering:
http://forums.nvidia.com/index.php?showtopic=169911
This is what you seek:
http://forums.guru3d.com/showthread.php?t=315577
Q: Does anybody know if the triple buffering method implemented by D3Doverrider uses a larger flip queue size (increasing input lag), or does it display the newest back buffer and discard any buffered frames that are not needed (reducing input lag)? I've always assumed that it was the first due to various articles and benchmarks on the web, but a debate sparked on these very forums in the ATI driver section could do with this being answered.
A (from Unwinder who programs D3Doverrider): Of course the first one. That article about TB gave wrong concepts and understanding of TB triple nature and principles. Direct3D gives applications no way to manipulate swap chain that way, any D3D application using TB is always using the first implementation.
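To make the difference between the two implementations in that Q/A concrete, here is a rough toy timing model. It is not D3D code and it doesn't measure any real game. The function name, the "queue"/"newest" labels and the 5ms render time are all assumptions picked for illustration. It assumes the renderer samples input when it starts a frame, and it reports how old the displayed frame is at each vsync.

```python
from collections import deque


def average_frame_age(policy, refresh_hz=60.0, render_ms=5.0,
                      refreshes=600, back_buffers=2):
    """Average age in ms of the frame shown at each vsync (toy model).

    policy "queue":  finished frames are shown in FIFO order; the renderer
                     stalls while every back buffer holds an unshown frame.
    policy "newest": the renderer never stalls; each vsync shows the most
                     recently finished frame and older ones are discarded.
    """
    interval = 1000.0 / refresh_hz
    pending = deque()   # start times of finished frames awaiting display
    next_start = 0.0    # when the renderer starts (samples input for) its next frame
    ages = []

    for n in range(1, refreshes + 1):
        vsync = n * interval
        stalled = False

        # Let the renderer finish every frame it can before this vsync.
        while next_start + render_ms <= vsync:
            if policy == "queue" and len(pending) == back_buffers:
                stalled = True      # no free back buffer to draw into
                break
            pending.append(next_start)
            next_start += render_ms
            if policy == "newest" and len(pending) > 1:
                pending.popleft()   # drop the older unshown frame

        if pending:
            ages.append(vsync - pending.popleft())  # flip at vsync
            if stalled:
                next_start = vsync  # a buffer was freed, rendering resumes

    return sum(ages) / len(ages)


print("queue :", round(average_frame_age("queue"), 1), "ms")
print("newest:", round(average_frame_age("newest"), 1), "ms")
```

With these made-up numbers (a game that could render at 200fps on a 60Hz screen), the queued version settles at about two refreshes of age, while the discard-old-frames version stays between one and two render times. That is only what this toy model shows. Per Unwinder's answer above, D3D applications get the queued behaviour.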