Cluster Hosted Gaming

OK,

I am trying to get a better sense of what it takes to deploy a "cloud" or as [H]'ers like to say, "cluster" for hosting gaming applications with low latency. Moreover, I am interested in a gaming cafe that has a local cluster with thin clients connected so you virtually eliminate network latency to the cluster.

Obviously nVidia's GRID and VGX offerings based on the Kepler architecture are a good starting point, but it is still unclear to me what you would install on the cluster to host a variety of applications. I assume Gaikai has some proprietary software that they use, but I wonder if you could install that on a localized cluster, or whether there are other options?

I am sure this topic has been beaten to death on this forum, but with these new offerings I think the hardware infrastructure is finally there; it is just a question of how to utilize all those CUDA cores!
 
It seems to me that if you are doing a local cluster you are basically defeating the purpose of the whole thing. On a one-switch LAN there is basically no latency that matters (< 1 ms), so you don't gain anything by having the GFX housed in "the cloud" vs. the local machines. It may be that the cost of the cluster is lower than buying a bunch of actual PCs; I don't know about that.
 
That's the point: instead of buying a bunch of gaming PCs, which you'll have to upgrade if you want them to be able to play the latest games, you just have the cluster and thin clients and scale that to your load. That way every user gets unbelievable graphics and a supercomputer gaming experience without every machine having a GTX 680 in it. Also, you can dynamically allocate GFX resources with a cluster, whereas a bunch of local machines will have limited resources at some times and excess resources at others.
 
OK. What is your budget? How many seats? What minimum specs need to be available at each seat?
 
So how does the system spit out the game in real time with the graphics?

Applications are one thing, since you have a client on the thin side that accesses said resources on the server...

But how can you run, say, 1 game 15 times and get high-quality video out across the network to a thin client with some crappy built-in Intel graphics?
 
The whole concept is basically trading bandwidth for GPU processing horsepower. The whole point of the card the OP linked to is that the video is rendered and encoded on that GPU, and then the output is streamed to a client at each gaming station. The key to this scenario will be budget and the OP's knowledge in this arena. Asking a forum to design a system from the ground up is a bit risky IMO, especially without a defined budget, goals, target specs, etc. What he's asking for should be possible; I just doubt it'll be in the budget of a small business startup.

You can build a box with top-end playability for $800-ish with a monitor. How many seats? 8? 10? 16? Even at 16 seats you should be able to get a quantity discount on parts, so you're looking at under $12k. Think you can build a cloud gaming environment for that? It's a good idea and definitely one to build experience with, but we need more info.
 
Correct, Nate: you take the video output from the game and encode it in H.264, then the clients just decode that stream; even Atom procs can do that.
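
For illustration only, here's a rough sketch of that capture -> encode -> stream -> decode loop using plain ffmpeg driven from Python. It assumes a Linux/X11 host with ffmpeg installed and uses software x264 in place of the GRID hardware encoder; the client address, port, and capture settings are made up.

[code]
# Rough sketch only: desktop capture -> H.264 encode -> UDP stream to a thin client.
# Assumes Linux/X11 and ffmpeg on the PATH; a real GRID setup would use the GPU's
# hardware encoder instead of libx264, but the shape of the pipeline is the same.
import subprocess

CLIENT_IP = "192.168.1.50"   # placeholder thin-client address
PORT = 1234                  # placeholder port

server_cmd = [
    "ffmpeg",
    "-f", "x11grab", "-framerate", "60", "-video_size", "1920x1080", "-i", ":0.0",
    "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
    "-f", "mpegts", f"udp://{CLIENT_IP}:{PORT}",
]

# On the thin-client side even an Atom can keep up with the decode, e.g.:
#   ffplay -fflags nobuffer udp://0.0.0.0:1234
subprocess.run(server_cmd, check=True)
[/code]

A real GRID deployment would do both the render and the encode on the GPU itself, but the overall pipeline is the same shape.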

This is a hypothetical design scenario for now, but it is something that I would like to deploy in my home or in a gaming cafe environment. I think $50-$100K is within reason to spend on such a system as a small business investment for say ~20 computers to start. The load per client would vary depending of course on what you are playing/running. Maybe you want to run an instance of BF3 which would take a lot of GFX power, or maybe you want to do some Photoshop editing where you need more RAM allocated, or maybe you want to play My Little Pony Island Adventure in which case you would need very few resources from the cluster...
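
Back-of-envelope, using the $800-per-seat figure mentioned above against that budget (rough numbers only, ignoring monitors at each seat, thin clients, licensing, networking, etc.):

[code]
# Back-of-envelope seat-cost comparison using figures quoted in this thread.
SEATS = 20

local_pc_per_seat = 800                              # "top end playability for $800-ish w/ monitor"
cluster_budget_low, cluster_budget_high = 50_000, 100_000

local_total = SEATS * local_pc_per_seat
print(f"Local PCs: {SEATS} seats x ${local_pc_per_seat} = ${local_total:,}")
print(f"Cluster budget per seat: ${cluster_budget_low // SEATS:,} - ${cluster_budget_high // SEATS:,}")
# -> Local PCs: 20 seats x $800 = $16,000
# -> Cluster budget per seat: $2,500 - $5,000
[/code]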
 
I am intentionally reviving this thread because I never really got a solid answer, and am more serious about deploying such a system.

I am looking for someone with knowledge of current virtualization software, and am wondering if any of the current off-the-shelf offerings could take advantage of the fast Kepler HD video encoding, e.g.:
  • VMware
  • VirtualBox
  • Xen
  • Citrix
Or, I wonder if Gaikai or OnLive (formerly) offer local instances of their software :confused:
 
It is extremely unlikely you can get even decent performance out of such a system at scale that will be cost-competitive with a system at each seat, assuming you play GPU-intensive games. Virtualization does well with threaded, CPU-intensive tasks, not so much with GPU-intensive ones. You simply can't get the same performance potential from a virtualized environment as from a local one. You also eliminate potential gains from things like overclocking gaming systems.

I'm not going to run numbers for you, but I would guess a cluster with similar performance will be multiple times more expensive (not counting licensing for a hypervisor!), potentially orders of magnitude more. Future upgrades are unlikely to be a cost savings either.

I also haven't used services like OnLive or Gaikai, but I suspect they are trading performance for whatever gains colocation latency offers over truly remote hosting and capitalizing on that, plus they're scaling FAR beyond what you're talking about.
 
On the contrary, VGA passthrough with RemoteFX in Server 2012 allowed me to run current-generation games at 50+ frames per second, on high settings, over wireless to a laptop with no dedicated graphics card. The issue wasn't the hardware, it was RDP: all first-person games had horrible mouse sensitivity issues that I couldn't get past. The mouse look was 5 times more sensitive than it should have been. There were three workarounds: touchpads did not have this problem but were inconvenient; a cursor overlay sometimes fixed the problem, at the cost of performance and a cursor distraction; and USB gamepads were not affected.

I started a thread recently on this same subject here:
hardforum.com/showthread.php?t=1735453

I loved RemoteFX except for the mouse issue. I tried VMware Workstation instead and its 3D performance was garbage, and I didn't want to use ESXi because VGA passthrough there requires two cards and I thought that was unnecessary; same with Xen. Not sure about VirtualBox.
 
Yeah, I feel like using a cluster you can run games at settings desktops could never ever reach, and it's not just about the price per seat (which is important) but about providing a gaming experience superior to anything anyone can get anywhere.

It seems as though you could run BF3 at Ultra settings and max out all the AA by utilizing the cluster, then just compress that to a video stream. I also know that OnLive and Gaikai prioritize input packets to further minimize perceived latency.
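
Just to illustrate the idea (this is NOT how OnLive or Gaikai actually implement it), input prioritization on a managed LAN could be as simple as DSCP-marking the client's input socket so the switch services those packets ahead of the bulky video stream; the server address and payload format below are made up.

[code]
# Illustration only: mark a client's input-event packets with DSCP EF so a managed
# switch can prioritize them ahead of the video stream. Works on Linux; not how
# OnLive/Gaikai do it, and the address/payload here are placeholders.
import socket

DSCP_EF = 46                     # "Expedited Forwarding" traffic class
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)  # DSCP = top 6 bits of TOS byte

# pretend input report: one button byte plus signed mouse dx/dy
sock.sendto(bytes([0x01, 0x05, 0xFB]), ("192.168.1.10", 4000))
[/code]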

I also contacted Gaikai to see if they offered private instances of their software, but since Sony bought them they are on lockdown.

We are putting together details of changes to Gaikai's & Sony's strategic plans, and will have more information a bit later in the year.
 

Latency will never allow a remotely rendered game to deliver a superior experience over what is available from a high-end gaming desktop. Further, if you have to do video compression, quality drops compared to the uncompressed image.

I would absolutely love to be able to some day soon build a beastly server and replace the laptops/desktops in my house with thin clients, and not have any complaints about gaming performance (FPS, MMO, RTS, or otherwise). Realistically, this isn't happening until GPU virtualization is significantly improved, and even then, it has to remain on a local network or latency makes it unplayable when you're otherwise accustomed to ultra-low-latency gaming.
 
Server CPUs are simply slower than gaming CPUs if you overclock them (and sometimes even if you don't). We use a mess of gaming computers overclocked to 5 GHz to run single-threaded compiles as fast as possible, because servers suck at it.

If the games are crazy threaded you still won't win, now or in the near future, using a server. It will cost massively more to meet or beat the performance of a very cost-effective gaming system.

Desktop gaming systems are as cheap and high-performing as they've ever been.
 
Um, how are server CPUs slower than gaming CPUs?

For one, there is no such thing as a "gaming" CPU specifically, and server and desktop processors are built mostly the same, apart from things like ECC memory support.
 

They don't make multi-threaded compilers?
Of course a 5 GHz single thread will beat a ~2.5 GHz server CPU. Use a multi-threaded program and a 4P, 64-core Opteron system and then see which finishes faster.
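
A crude Amdahl's-law sanity check with the numbers being thrown around here (5 GHz single thread vs ~2.5 GHz x 64 cores; the parallel fraction is a pure assumption on my part):

[code]
# Crude Amdahl's-law comparison of the two positions above. Clock speeds come
# from the posts; the parallel fraction is a pure assumption, and the desktop's
# own extra cores are ignored since the argument is about single-threaded work.
def amdahl_speedup(parallel_fraction, cores):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

desktop_clock = 5.0    # GHz, overclocked i7
server_clock = 2.5     # GHz, 4P Opteron box
server_cores = 64

for p in (0.0, 0.5, 0.95):
    desktop_rate = desktop_clock
    server_rate = server_clock * amdahl_speedup(p, server_cores)
    winner = "desktop" if desktop_rate > server_rate else "server"
    print(f"parallel fraction {p:.2f}: desktop ~{desktop_rate:.1f} vs server ~{server_rate:.1f} -> {winner}")
[/code]

Which box wins depends almost entirely on how parallel the workload actually is, which is really the whole disagreement here.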
 
"They" do not make multi-threaded compilers (FPGA layout design). My overclocked 5ghz+ i7-2700k's will destroy any Xeon you can buy in single threaded performance, or really anything that doesn't scale beyond a few threads well. I have several multi CPU X5690 systems that are awesome for running many compiles at once, but during design phase engineers want 1-2 compiles ASAP to check tweaks.

One dual-X5690 system buys 5 or more i7-2700k (or similar) systems. Obviously there are benefits to servers (ECC and otherwise) but raw CPU or GPU potential is not one of them.

EDIT -- I guess I should specify potential per dollar, since you can do something stupid like buy Everest chips for $10k+/ea (and still be slower clock speeds) or very large core counts, as stated above. And "they" are one of the largest FPGA manufacturers out there.
 
"They" do not make multi-threaded compilers (FPGA layout design). My overclocked 5ghz+ i7-2700k's will destroy any Xeon you can buy in single threaded performance, or really anything that doesn't scale beyond a few threads well. I have several multi CPU X5690 systems that are awesome for running many compiles at once, but during design phase engineers want 1-2 compiles ASAP to check tweaks.

One dual-X5690 system buys 5 or more i7-2700k (or similar) systems. Obviously there are benefits to servers (ECC and otherwise) but raw CPU or GPU potential is not one of them.

EDIT -- I guess I should specify potential per dollar, since you can do something stupid like buy Everest chips for $10k+/ea (and still be slower clock speeds) or very large core counts, as stated above. And "they" are one of the largest FPGA manufacturers out there.

If you're looking at just raw CPU performance per dollar, then what you're saying is fine. But if you are going to be running multiple instances of a game, you want performance per watt, which means higher core counts at higher density, using fewer motherboards and power supplies. That is what this thread is about, even if the overall goal remains pretty much a pipe dream at this point.
 
Server CPUs are simply slower than gaming CPUs if you overclock them (and sometimes even if you don't).

That is kind of true, but what I am talking about is using a GPU cluster, not server processors. With nVidia's latest technology, the GPUs in a cluster can communicate and share memory/processing resources directly over the PCI-E interface instead of going through the system bus....

And with the latest GRID technology, the compression happens in ~10 ms, so your total latency is that plus the network lag, so roughly 15-20 ms added, and you don't think you can play an FPS with that? If we shoot for 30 fps in game, a new frame arrives about every 33 ms, so the added delay stays within roughly one frame time. (Granted, a LOT of assumptions go into that figure, but it's a rough range.)
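
Putting rough numbers on that budget (all estimates from this post, nothing measured; the decode figure is my own assumption):

[code]
# Rough latency-budget check using the estimates above; none of these numbers
# are measured, and the decode time is an extra assumption.
encode_ms  = 10      # claimed GRID H.264 encode time
network_ms = 5       # one-switch LAN, generous
decode_ms  = 5       # thin-client H.264 decode (assumed)

added_ms = encode_ms + network_ms + decode_ms    # ~20 ms of pipeline delay

for fps in (30, 50, 60):
    frame_ms = 1000.0 / fps
    print(f"{fps} fps -> new frame every {frame_ms:.1f} ms; stream adds ~{added_ms} ms on top")
[/code]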
 
TBH, in an FPS you want to aim for 50+ FPS, but that's still not impossible in the scenario you're describing.
 