Optimizing Game Load Times

Neapolitan6th

[H]ard|Gawd
Joined
Nov 18, 2016
Messages
1,182
I have historically been a major fan of large open world games and modding them to oblivion with large 2k-4k-8k texture files.

As a result I have found load times to be the biggest thorn in my side when playing such games as Skyrim.

I am currently toying with the idea of building a new system in the future that is not optimized for the highest FPS per se, but rather for the fastest loading times.

Here are the bottlenecks I am aware of:
-HDD/SSD storage
-RAM speed/latency
-CPU (decompression?)
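
One way to see which of these bottlenecks dominates is to time the storage read and the decompression step separately. The following is only a minimal sketch under stated assumptions: it uses zlib as a stand-in codec (real games use a variety of formats), and `make_sample_asset` and `time_load_stages` are hypothetical helper names, not part of any game or tool. A freshly written file will normally be served from the OS page cache, so the read figure is only meaningful on a real asset after a cold boot.

```python
import os
import tempfile
import time
import zlib


def time_load_stages(path):
    """Return (read_seconds, decompress_seconds, raw_bytes) for a zlib-compressed file."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        packed = f.read()           # storage-bound stage
    t1 = time.perf_counter()
    raw = zlib.decompress(packed)   # CPU-bound stage
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(raw)


def make_sample_asset(size_bytes=8 * 1024 * 1024):
    """Write a hypothetical compressed asset to a temp file and return its path."""
    # Half random, half zeros, so the payload compresses only partially.
    payload = os.urandom(size_bytes // 2) + b"\x00" * (size_bytes // 2)
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(zlib.compress(payload, 6))
    return path


if __name__ == "__main__":
    path = make_sample_asset()
    try:
        read_s, unpack_s, raw_len = time_load_stages(path)
        print(f"read: {read_s * 1000:.1f} ms, "
              f"decompress: {unpack_s * 1000:.1f} ms, {raw_len} bytes")
    finally:
        os.remove(path)
```

If the decompress figure is consistently the larger of the two on real assets, faster storage alone is unlikely to shorten the load much.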

To combat storage problems, I plan to use a RAM disk with the fastest low-latency RAM kit I can get my hands on, preferably B-die (probably north of 64 GB).

As far as the CPU goes, traditional wisdom points towards Intel being the winner. However, I have noticed that AMD tends to smoke Intel when it comes to 7-zip decompression tests. How important are these decompression metrics when it comes to game load times, and could an AMD CPU actually be faster than Intel in this scenario?

Thanks for any input anyone has on the matter. Any suggestions for trimming down even a minuscule amount of loading time would be appreciated.
 
Try to find an Optane drive for your games; those have the fastest load times, but of course you also need a strong CPU.

Of course it helps if the game is coded to make the most of what it has available, and sorry to say, Bethesda games are far from great in this regard. Adding mods on top of that, which might be even less optimised or might not work well together, only makes it worse.
 
In my experience, it won't help much.

I've tried putting games with long load times on RAMDisks before, and ... it didn't really help.

YMMV though - every game is a bit different.
 
I can help on the SSD end, for sure. Optane > SMI controllers (SM2262/EN, SM2263) > Phison controllers, with Samsung not being cost-effective. NVMe > SATA. If you're looking for the most cost-effective solution, that would be the 660p (or actually the upcoming 665p). If you can make do with 1TB or less, then the SX8200 Pro and EX950 are top dog among NAND drives (i.e. excluding non-NAND drives such as Optane). The older EX920/SX8200NP are nearly as good, but you can also find the Kingston KC2000 with the SM2262EN, which uses 96-layer NAND and will match the best. RAID-0/striping does not help with game load times because your latency and 4K random performance remain the same (with more overhead, if anything).

I'm less useful discussing memory and CPU, but I have tested my 6700K vs. 3700X extensively with different RAM. I would say that bandwidth is king for load times, which means interleaving is ideal; this would be 2 or 4 ranks per channel. Latency is a secondary concern. B-die will get you effectively the highest, however the most cost-effective will be Micron E-die for sure. For CPU, generally Intel is superior as it has higher single-core/thread performance (due to clock) and also better cross-core latency, although with core management (e.g. Process Lasso) I find Zen 2 is actually faster; the limitation is the core speed, which if improved with Zen 3 in 2020 might nudge it in AMD's favor.

Two additional notes: first, it depends on the game and second, use of software like PrimoCache can overcome some limitations with proper configuration. Also, I'm talking consumer systems, not enthusiast or something with persistent Optane DIMMs or anything exotic.
 
7-zip decompression can be made highly parallel, so it can utilize many cores.
Also, 7-zip compression is mostly integer instructions, negating any benefit from a CPU having a better FPU (which is important for games).
That's why it was a beloved benchmark for AMD Bulldozer: it could utilize the full set of integer cores and not show the slow FPU and the halved FPU count.

Texture compression, I believe, is mostly lossy compression, which can utilize FPU instructions (rounding is no longer an issue), depending on how the game engine codes the decompressor. 7-zip can be a pretty useless benchmark for the purpose you are thinking of.
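
To illustrate why this kind of decompression scales with core count, here is a minimal sketch that splits data into independently compressed chunks and decompresses them on several workers. It assumes Python's standard `lzma` module as a stand-in for 7-zip's LZMA, and `compress_in_chunks` and `decompress_parallel` are hypothetical helper names. It says nothing about how any particular game engine unpacks its assets.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_in_chunks(data, chunk_size=1 << 20):
    """Compress each fixed-size slice on its own so the chunks are independent."""
    return [lzma.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def decompress_parallel(chunks, workers=4):
    """Decompress independent chunks on a worker pool and rejoin them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input chunks.
        return b"".join(pool.map(lzma.decompress, chunks))


if __name__ == "__main__":
    original = bytes(range(256)) * 4096          # 1 MiB of sample data
    chunks = compress_in_chunks(original, 256 * 1024)
    restored = decompress_parallel(chunks)
    print(len(chunks), "chunks, round trip ok:", restored == original)
```

The catch is that the data has to be stored as independent chunks in the first place; a single monolithic compressed stream cannot be split up this way.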
 
The texture decompression should be running directly on the GPU, with little to no CPU involvement. Basically, you can send the compressed version right into the renderer as-is.

That's assuming the normal Block Compression standard formats are used, which DX requires. Most are low-bit depth integer ops, ideal for GPUs.
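
For a sense of how cheap these integer ops are, here is a minimal sketch of decoding a single BC1 (DXT1) block in software. It follows the commonly documented BC1 layout (two RGB565 endpoint colors followed by sixteen 2-bit palette indices); the function names are hypothetical, and the rounding of the interpolated colors is an assumption, since real GPUs may round slightly differently.

```python
import struct


def rgb565_to_rgb888(c):
    """Expand a 16-bit RGB565 color to 8 bits per channel by bit replication."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into 16 RGBA tuples (4x4 pixels, row-major)."""
    if len(block) != 8:
        raise ValueError("a BC1 block is exactly 8 bytes")
    c0, c1, bits = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-color mode: two interpolated colors between the endpoints.
        p2 = tuple((2 * a + b + 1) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b + 1) // 3 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), p3 + (255,)]
    else:
        # Three-color mode: one midpoint color plus transparent black.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), (0, 0, 0, 0)]
    # Pixel i uses bits 2*i and 2*i+1 of the 32-bit index word.
    return [palette[(bits >> (2 * i)) & 0x3] for i in range(16)]


if __name__ == "__main__":
    red_to_blue = struct.pack("<HHI", 0xF800, 0x001F, 0x00000000)
    print(decode_bc1_block(red_to_blue)[0])
```

Everything here is shifts, masks, and small integer arithmetic on a fixed 8-byte input, which is the sort of work GPU texture units handle in hardware.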
 
Correct, if we are talking about the "hot" compression, i.e. the compression used while rendering to save graphics RAM.

However, from what I see, a lot of games do not store the "cold" texture on the hard drive in this format natively, but in something like .png, .tga (Quake 3 does this), or .jpeg.
So from cold storage it gets decompressed from the storage media into RAM, then compressed into S3TC or (more likely) one of the DXTC formats, then uploaded to VRAM, where it is decompressed live while rendering.

The S3TC formats (and I believe the DXTC formats too, but I'm not 100% sure) are all constant-ratio, lossy, block-based compression.

So if you get a 4:1 compression ratio, the renderer knows that pixel 128 is in the block 32 bits in,
and can therefore seek to and decompress only the block needed while rendering.
PNG and JPEG, by contrast, require all data ahead of the needed pixels to be decompressed in some regard.
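
The seek described above can be sketched as a direct offset calculation. This is a minimal illustration that assumes 4x4-pixel blocks stored in plain row-major order, with 8 bytes per block as in BC1/DXT1; `bc_block_offset` is a hypothetical helper, and real GPUs typically use tiled or swizzled memory layouts rather than this simple linear one.

```python
def bc_block_offset(x, y, width, block_bytes=8):
    """Byte offset of the 4x4 block containing pixel (x, y).

    Assumes blocks are laid out row by row; block_bytes is 8 for BC1
    and would be 16 for formats such as BC3.
    """
    if x < 0 or y < 0 or x >= width:
        raise ValueError("pixel coordinate out of range")
    blocks_per_row = (width + 3) // 4      # round up for widths not divisible by 4
    return ((y // 4) * blocks_per_row + (x // 4)) * block_bytes


if __name__ == "__main__":
    # In a 256-pixel-wide texture, pixel (5, 0) sits in the second block.
    print(bc_block_offset(5, 0, 256))
```

Because every block has the same size, the offset depends only on the pixel coordinates and the texture width, with no need to read any earlier data. Variable-length formats such as PNG or JPEG have no equivalent closed-form lookup.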
 