Riddle me this: Better IPC?

For me, the 10980XE result was what I expected. HEDT parts, as I said, typically have additional latencies as a result of their design. Until the Threadripper 3960X and 3970X, anyway. The mesh bus is likely the culprit on the Intel side. The memory timings aren't as tight on that test system either, not that I'm certain those affect something like TS12 anyway. Ryzen has generally got better IPC than Intel does, but the clock rates are less static. The 3900X boosts up to 4.6GHz, but as we've seen in other tests, it falls behind the 9900K because the IPC isn't that much better and the higher clocks of the latter CPU offset that IPC advantage in games. So while interesting, that was predictable to some extent. I wouldn't have been able to tell you how much worse it would have done, but I figured it would do worse.
 
Dan, thanks for that. Looking back, it looks like I miscopied a number from my spreadsheet, as my SPI score is 431, not 411. To be sure, I re-ran the tests:

2600K 4.7G DDR-1866 9-9-9-27 No G-Sync: 32M SPI: 431s TS12 Bench1: 33 via AB OSD, dips to 32 about once every 10-15 seconds. So:

2600k 4.7G 24 32M SPI 411.077s. TS12 - 32FPS (Average - Eyeball - Train Stationary)
2600k 4.7G 24 32M SPI 431.044s. TS12 - 33FPS (Average - Eyeball - Train Stationary)
9900k(1) 24 32M SPI 424.57s. TS12 - 30-31FPS (Average - Eyeball - Train Stationary) (Keljian)
9900K(2) 24 32M SPI 380.36s TS12 - 37.6 (Average - Frameview - Train Stationary) 5G (Dan)
3900X 24 32M SPI 501s. TS12 - 34-36FPS (Average - Eyeball - Train Stationary)
10980XE 24 32M SPI 380.79s. TS12 - 29.2FPS (Average - Frameview - Train Stationary)

Starting from 2600K (SPI: 431s, TS12B1: 33FPS) and predicting others:

9900K(1) (SPI: 425s, TS12B1: 30.5 FPS)
SPI difference 1.4% faster, 9900K(1) prediction: 33 FPS + 0.014*33 FPS = 33.5 FPS vs Measured: 30.5 FPS

9900K(2) (SPI: 380s, TS12B1: 37.6 FPS)
SPI difference 13.4% faster, 9900K(2) prediction: 33 FPS + 0.134*33 FPS = 37.4 FPS vs Measured: 37.6 FPS

10980XE (SPI: 381s, TS12B1: 29.2 FPS)
SPI difference 13.1% faster, 10980XE prediction: 33 FPS + 0.131*33 FPS = 37.3 FPS vs Measured: 29.2 FPS

3900X (SPI: 501s, TS12B1: 35 FPS)
SPI difference 13.9% slower, 3900X prediction: 33 FPS - 0.139*33 FPS = 28.4 FPS vs Measured: 35 FPS

Notes about these results:
  • Keljian's machine did 3 FPS less in TS12 than predicted based on a SPI estimate from the 4.7G 2600K (-9% error)
  • Dan's 9900K did 0.2 FPS more in TS12 than predicted based on a SPI estimate from the 4.7G 2600K (+0.5% error)
  • 10980XE did 8.1 FPS less in TS12 than predicted based on a SPI estimate from the 4.7G 2600K (-22% error)
  • 3900X did 6.6 FPS more in TS12 than predicted based on a SPI estimate from the 4.7G 2600K (+23% error)
  • 9900Ks predicted pretty well. The 3900X was underestimated, but the 4.2G clock rate prevents the part from doing much better in the game.
2600K to 9900K didn't do half bad: it nearly got Dan's result right on the nose, and Keljian's was within 10%. The 10980XE and 3900X diverged by ~22%, but in different directions, with the Ryzen part beating the estimate and the 10980XE falling short.
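
For anyone who wants to check the math, the estimates can be reproduced with a short script. This is only a sketch of the scaling used in this post: it assumes TS12 frame rate scales with the ratio of 32M SPI times against the 2600K @ 4.7G baseline (431s, 33 FPS). The function names are made up for illustration, and it works from the rounded SPI times listed above.

```python
# Sketch: predict TS12 Bench1 FPS from Super Pi 32M times.
# Assumption: FPS scales with (reference SPI time / target SPI time).

def predict_fps(ref_spi_s, ref_fps, spi_s):
    """Scale the reference FPS by the Super Pi speed ratio."""
    return ref_fps * ref_spi_s / spi_s

def percent_error(measured, predicted):
    """Signed error of the measurement relative to the prediction, in percent."""
    return 100.0 * (measured - predicted) / predicted

REF_SPI_S, REF_FPS = 431.0, 33.0  # 2600K @ 4.7G baseline from this post

# name: (32M SPI seconds, measured TS12 Bench1 FPS), values as posted
systems = {
    "9900K(1)": (425.0, 30.5),
    "9900K(2)": (380.0, 37.6),
    "10980XE": (381.0, 29.2),
    "3900X": (501.0, 35.0),
}

results = {}
for name, (spi_s, measured) in systems.items():
    predicted = predict_fps(REF_SPI_S, REF_FPS, spi_s)
    error = percent_error(measured, predicted)
    results[name] = (predicted, error)
    print(f"{name:9s} predicted {predicted:5.1f} FPS, "
          f"measured {measured:5.1f} FPS, error {error:+6.1f}%")
```

The output should match the predictions and error percentages listed above to within rounding.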

I wonder if the difference in the 9900Ks is Frameview versus the OSD. Feel free to check the math, it can be tedious.

I'm also going to grab a 2600K @ 3.8G tomorrow and rerun the 4.7G test again for verification. Likely will also do some back to back multiple SPI runs to see how much variance there is run to run.

With any luck I should have a dynamic bench ready late tomorrow.

-Mike

Thank you, this is so much better. Focusing on the data is great.

The high error and the change in direction is what was driving me to say there was a lack of a clear answer.

I've settled in on a pretty good overclock. 4.525 on fastest CCX, lowest made it to 4.4 and the rest are in between.

I ran the two tests again. 35-37 in TS12 (1fps gain), 8m30s in SPI.

SPI was running strictly on a single core, so previously PBO was able to bring that individual core to 4.6.
With TS12 being slightly multithreaded, PBO was only raising the utilized cores to 4.2.

Based on the error, it's possible then, that the theory is accurate for mainstream Intel processors. I think that agrees with what Dan was saying about that not necessarily being the case for HEDT.
 
I am flat out with work and study at the moment so can only offer best effort. Note I was using FRAPS as opposed to anything else. I will not mess with SPI anymore; I feel it's a farce.

Render max = 44.3

There HAS to be something in the code that is locking this
 

Attachment: Capture.PNG (12.7 KB)

I skipped SBE because of reasons -- I think those reasons were the memory controller and SPI scores, but it was so long ago that I don't remember. I do remember the disappointment reading the [H]ardOCP SBE review. Is it fair to compare SBE to 10980XE in terms of arch reasons for the differences? What is stuck in my head is that SBE was not better for desktop sorts of apps, but I haven't looked at SBE in years and have not looked at 10980XE at all. It may have simply been that the 100% bump in DIMM memory bandwidth for SBE didn't pay off internally - again my memory is foggy about it.

Memory speeds impact SPI scores, and they impact TS12 cpu limited minimum frame rates. Low latency memory is generally good for a bump equivalent to 0.1GHz more clock on SB.

I'm really jammed up at work this week, so progress may be slow. In the dynamic test, I'm getting frame rates in the "good" part that are 100+ and I think I saw 18 in the "bad" part. I'll try to get that test out tonight so you guys can look at it.

And data is the way to look at this problem. I'm glad we got to the same page.

GalacticAC, I know this sounds like heresy, but is it possible to run a test of your Ryzen say at 3.0G or so? Both SPI32 and TS12B1.

If the old girl will still fire up, I think I can run these benches on a 3.8G P4 Prescott.

-Mike
 
WRT SPI, the data says there is something here, but feel free to leave it be.

From the tone of your posts I get the impression you are intrigued as to why this code runs the way it does. I have been too, for a very long time. There are places on the map where it drops like a stone and begs the question "just what the hell is it doing?". I don't have the means or the time to figure out the reasons behind it, but I'm interested in what you can figure out -- just to be clear, I'm not asking anybody to fix the code or help in tuning it.

-Mike
 
Sandy Bridge-E was an HEDT part, not a mainstream desktop part. Architecturally, it was no different than Sandy Bridge. The difference is that it offers four memory channels, has no integrated GPU, and has more PCIe lanes and two more CPU cores. This was at a time when there was absolutely no value in using something like that for gaming. HEDT parts have often been well ahead of the curve in terms of what they offer, but they've never been better than standard mainstream parts for applications such as gaming. The one and only reason why people like me used Sandy Bridge-E for gaming was because we could use two graphics cards with full x16 PCIe lanes without resorting to a latency-inducing PLX chip. Even this was of minimal value.

I also see no point in benchmarking Ryzen 3000 series CPUs at 3.0GHz. We know where it stands for IPC vs. Intel. There is plenty of benchmark data out there with them clocked at the same speeds. (4.3GHz or thereabouts.)

I will also try this on my 3900X and see what I get. It will give us another data point. Although, it might work slightly better on a single CCD Ryzen 3000 rather than a 3900X or 3950X.
 
Frankly the AMD 3.0G number is more for my curiosity and doesn't matter a whole lot now.

The bang for the buck on the 3600/3600X is very tempting. At about $350 for CPU/Mobo/RAM it's a heck of a deal considering 2600Ks new were about $300 just for the CPU. Is it correct to assume the better Ryzen performance shown here so far would translate to multi-threaded (not TS12) as well? I think so based on the theory presented earlier.

-Mike
 
Ryzen's design tends to translate to better multi-threaded performance. Intel's clock speed advantages tend to drop off in multi-threaded workloads and thus, the gap between the two closes. Ryzen has (slightly) superior IPC and thus, Intel often needs more cores to be competitive. For example, Intel's 10980XE loses some benchmarks against the 3950X, which has two fewer cores and better clock speeds. You have to overclock the 10980XE for it to come out ahead consistently.

Core i9 10980XE @ 4.7GHz - Static benchmark.
257672_upload_2019-12-6_13-7-33.png


Core i9 9900K @ 5.0GHz - Static benchmark.
258215_upload_2019-12-8_20-48-42.png


AMD Ryzen 9 3900X - PB2 - Static benchmark.
upload_2019-12-9_15-51-35.png


Core i9 9900K @ 5.0GHz - Proper benchmark
257664_upload_2019-12-6_12-50-33.png

Here is all my data gathered in one place, aside from the moving benchmarks for the 10980XE. Here is the text version:

Results displayed in Render Min/Avg/Max FPS

Core i9 10980XE @ 4.7GHz - Useless Static Benchmark
21.2/29.2/32.3
Core i9 9900K @ 5.0GHz - Useless Static Benchmark
28.4/37.6/40.1
AMD Ryzen 9 3900X @ Stock PB2 - Useless Static Benchmark
15.3/33.9/39.2
Core i9 9900K @ 5.0GHz - Proper Benchmark
2.4/29.8/46.9

What the data tells us here is a couple of things. First off, the Intel Core i9 9900K @ 5.0GHz is faster than AMD here. The minimum frame rates are much higher in the static scene on Intel than they are on AMD. The average seems only slightly higher, but when we are talking about less than 40FPS, those 3-4FPS are a good 9-10%. The maximum FPS is surprisingly close to the same, which tells me there is something limiting their performance here. I still think it's the V-Sync / frame rate locked physics engine. Until we can push upwards of 90FPS or more, there is no way we won't see V-Sync cut our FPS in half every time we drop below 60FPS.
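
To illustrate the V-Sync halving, here is a rough sketch of how classic double-buffered V-Sync quantizes frame rate. It assumes plain double buffering on a 60Hz display, which has not been confirmed for TS12, and the function name is only illustrative.

```python
import math

# Sketch: displayed frame rate under double-buffered V-Sync.
# Assumption: every frame waits for the next refresh, so it occupies
# a whole number of refresh intervals (60 -> 30 -> 20 -> 15 ...).

def vsynced_fps(render_fps, refresh_hz=60.0):
    """Return the frame rate actually shown for a given render rate."""
    if render_fps <= 0:
        raise ValueError("render_fps must be positive")
    intervals = math.ceil(refresh_hz / render_fps)
    return refresh_hz / max(intervals, 1)

# Render rates taken from the averages above, plus 90 and 59 for contrast.
for fps in (90.0, 59.0, 37.6, 33.9, 29.2):
    print(f"render {fps:5.1f} FPS -> displayed {vsynced_fps(fps):4.1f} FPS")
```

Under that assumption, anything rendering between 30 and 60 FPS is shown at 30, and a dip just under 30 falls to 20.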

Notice I did include one benchmark where the train actually moved. We can see a couple of important differences in the result, which make the case for why the static value isn't all that useful. Our minimum FPS tanks badly when moving. Sure, the maximums can go higher but ultimately, our minimums drop. Those ultra low minimums are what you want to look at. If you really want to get into it, the frame times will tell you how often this happens and what your performance looks like the bulk of the time. This tells you which CPU provides the best experience.

In this case, that is the Intel machine. Not only are the actual benchmark numbers better, but so are the frametimes. The render and display numbers are both a good 8-10% higher. Weirdly, this may actually bring us somewhat back in line with Super Pi, as the 3900X does worse there and does worse in the game. No doubt it's faster per clock, but we can't get them to 5.0GHz, so that doesn't matter. Intel's clock speed advantage is clear here. Going a step further, it also shows why we can't trust Super Pi, as our HEDT 10980XE has almost identical Super Pi results yet performs worse than the other test configurations by a substantial margin.

At least, that's what I can conclude from the static scene test, which I wouldn't advocate as being entirely accurate since it isn't representative of the actual gameplay experience, as I've shown.
 
The static scene test was just to see if we could correlate with the simplest test possible.

One of the other things N3V Games did is release service packs for TS12, and each made the game worse. The last "good" one was build 49922 and is the one I have experience with. The Steam version I have is build 61297 - you can see the build number in the lower right hand corner of the splash screen. I don't know the Amazon version. What I do know is TS12 doesn't like mixing some things between versions, so before I spend the time needed to finish this I want to make sure you guys can load the session.

The saved session is 110MB, too big for attaching here. Any suggestions for easily sharing it?

-Mike
 
48249 is the build of the Amazon version.
I think this is going to have to be shared as a .CDP as the saved sessions are pretty much binary blobs (I think) and I've never been able to run them on different versions. A little more work, but should be still doable. The nice thing is the .CDP file will be small and can be hosted here.

Dan, there is a patch that upgrades 48249 to 49922, arguably the "best" version: http://download.ts2009.com/patch/48249_to_49922.exe

-Mike
 
What about dropbox/onedrive/google drive etc hosting?

I'll get that patch installed when I get home. If you can share that dynamic benchmark (google drive ought to work), I'll get frameview going so we get that apples to apples test.

I can declock to 3.0 later, I'm guessing the fps will be around 25. If you want a direct clock for clock comparison perhaps we could all clock to 4.0 or 4.2 for testing, then again if you want to see what does the best period, then you'll want each CPU at its best.
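
A guess of around 25 FPS is about what straight linear clock scaling gives. A quick sketch, assuming frame rate scales in proportion to core clock and taking roughly 35 FPS at the 4.2G boost reported earlier in the thread as the baseline (both are assumptions, and the function name is hypothetical):

```python
# Sketch: linear clock scaling on the same CPU.
# Assumption: TS12 frame rate is proportional to core clock.

def scale_fps_by_clock(base_fps, base_ghz, target_ghz):
    """Scale a measured frame rate to a different clock on the same CPU."""
    return base_fps * target_ghz / base_ghz

# Baseline: ~35 FPS with the utilized cores at ~4.2G, as reported earlier.
estimate = scale_fps_by_clock(35.0, 4.2, 3.0)
print(f"Estimated TS12 Bench1 at 3.0G: {estimate:.1f} FPS")
```

The same function works for a clock-for-clock comparison at 4.0 or 4.2, as long as the baseline and the target are the same CPU.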
 
Ok, let's try this: download TS12Bench2Sesssion-Bench2.zip from my google drive:

Unzip it to a temporary directory. You should get a directory named TS12Bench2Session-Bench2. Inside that directory will be a bunch of .save files. Move the TS12Bench2Session-Bench2 directory to TS2012\UserData\cache\sessions, where TS2012 is the root directory of the game. In my Steam version I used the default installation and it ended up in C:\Program Files (x86)\Steam\steamapps\common\TS2012 on Win7. When done correctly, TS12Bench2Session-Bench2 will be a directory in that sessions folder.

Start TS12 and click Start on the splash screen. Click Saved Sessions and then double click on ts12bench2session-bench2. If successful you will be presented with an outside view of a running train. The view will update itself and the trains will drive themselves.

If the session is not there on the Sessions screen, then double check if you put it in the right directory. If it is in the right directory and does not show, then this technique will not work. It could fail to show in the Sessions and it could also fail to run.
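
For anyone who would rather script the file move, here is a minimal sketch of the step described above. It only moves the unzipped session directory into UserData\cache\sessions under the game root. The function name is made up, and the paths in the usage comment are examples that need adjusting to your own install.

```python
import shutil
from pathlib import Path

# Sketch: move an unzipped saved-session directory into the TS12 sessions folder.
# Assumption: the layout is <game root>/UserData/cache/sessions, as described above.

def install_session(extracted_dir, game_root):
    """Move the extracted session directory into the game's sessions folder."""
    src = Path(extracted_dir)
    root = Path(game_root)
    if not src.is_dir():
        raise FileNotFoundError(f"Session directory not found: {src}")
    if not root.is_dir():
        raise FileNotFoundError(f"Game root not found: {root}")
    sessions = root / "UserData" / "cache" / "sessions"
    sessions.mkdir(parents=True, exist_ok=True)
    dest = sessions / src.name
    if dest.exists():
        raise FileExistsError(f"Session already installed: {dest}")
    shutil.move(str(src), str(dest))
    return dest

# Example usage (paths are examples only):
# install_session(r"C:\Temp\TS12Bench2Session-Bench2",
#                 r"C:\Program Files (x86)\Steam\steamapps\common\TS2012")
```

It refuses to overwrite an existing session directory, so delete the old copy first if you are reinstalling.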

As a preview, the benchmark I'm intending is from the beginning of the session until the train gets to the second bridge on the far side of Tehachapi, CA. It will pass another running train in the opposite direction. Just like the first benchmark, no user intervention required beyond enabling data collection.

Please let me know if this works and on which TS12 version. We need to test all TS12 versions because I'm not sure this will work even same version to same version, as TS12 treats the saved session differently than anything else.

If this doesn't work I have a plan B which should work across versions, but the setup and running is less convenient.

-Mike
 
I patched to version 49922 from 48249. Everything appears as it is going to work, but the game crashes when attempting to load the session. Everything else as it comes with the game seems to still work.
 
That's what I was afraid of. Ok, plan B. Ironically, I could install 49922 but then the Steam people would be out in the cold. Using a CDP file should work with any version.

Stand by.

-Mike
 
ETA: This will not work for 49922. I removed the attachment. I'm going to try building the benches in 49922 as going up in build numbers generally works.

Ok, let's try this for dynamic scenarios:

1) Download the attached .zip and extract to a convenient directory.
2) In that zip are two .cdp files.
3) Start TS12
4) At the game splash menu, click Content
5) At the Planet Auran popup click cancel. Also cancel this anytime it pops up - it's nagware for our purpose.
6) Click on File and select Import CDP.
7) Select the extracted CDP files from step 1.
8) Click on File and Launch Trains.
9) Once inside TS12, click Select Route
10) Click Mojave Sub Division
11) Double click TS12Bench2 or TS12Bench3

The sessions run automatically. Bench3 is longer than Bench2 and will show more variation. If we can get these to run there is a way to shorten Bench2. In a way this is better because you can't change run controls or the views so every run will be the same.

When you exit TS12, it will take you back to the Content Manager, just exit that. You only have to do the content import once. The next time you start TS12 the Bench2 and Bench3 sessions will be there.

On my Steam version, there are sometimes video artifacts while switching views: most of the train disappears, then reappears. This does not seem to impact the frame rate behavior. I don't think this happens on 49922.

Please let me know if this works and then we can move on. I propose we end the test at the highway bridge on the other side of Tehachapi. We can make the bench faster if we start at the highway bridge before Tehachapi, but this stuff needs to work first.

Let me know.


-Mike
 
Ok, let's try this for a dynamic scenario:

1) Download the attached .zip and extract to a convenient directory.
2) In that zip there is a .cdp file.
3) Start TS12
4) At the game splash menu, click Content
5) At the Planet Auran popup click cancel. Also cancel this anytime it pops up - it's nagware for our purpose.
6) Click on File and select Import CDP.
7) Select the extracted .CDP files from step 1 (TS12Bench2)
8) Click on File and Launch Trains.
9) Once inside TS12, click Select Route
10) Click Mojave Sub Division
11) Double click TS12Bench2

The session runs automatically. In a way this is better because you can't change run controls or the views so every run will be the same. Make sure the TS12 settings are the same as TS12 Bench 1. This should work on 49922 or later builds, including the steam version 61297. I tested with 49922 and 61297.

When you exit TS12, it will take you back to the Content Manager, just exit that. You only have to do the content import once. The next time you start TS12 the Bench2 session will be there.

On my Steam version (61297), there are sometimes video artifacts while switching views: most of the train disappears, then reappears. This does not seem to impact the frame rate behavior. This does not happen on 49922.

Please let me know if this works and then we can move on. I propose we end the test at the highway bridge on the other side of Tehachapi. We can make the bench faster if we start at the highway bridge before Tehachapi, but this stuff needs to work first.

Let me know.

-Mike
 

Attachment: TS12Bench2.zip (83.3 KB)

Captured with FRAPS starting as soon as session finishes loading and stopping after 390 seconds. This is TS12 Bench 2 49922. Start the bench, start FRAPS, let it finish on its own. Unrestricted frame rate, 1080P low video settings, games settings same as TS12 Bench 1.

2600K 9-9-9-27 DDR3-1866.

I'm just going to leave this here:

upload_2019-12-11_20-56-24.png


-Mike
 
Fraps does not give frame times, just average frame rates, therefore is not representative - also, please remove the big chart
 
We will just have to agree to disagree on that. The chart shows I can predict a measure of frame rate performance for a 4.7G machine from a 3.8G machine without any data from the 4.7G machine except a SPI32 score. And I've been doing it all the way through this thread with the exception of the Ryzen 3900X - it would not surprise me if I could predict Ryzen to Ryzen as well. Need a differently clocked Ryzen to find out.

I also know from experience that if one can keep those frame dips above 33FPS or so, the game will run smoothly VSYNC'ed at 30FPS.


-Mike
 
No FRAPS. It doesn't work.

Looks like it works just fine to me. ETA: In context that was kind of snotty, sorry.

NVidia FrameView does not work, as in will not run on my Win7 system w/o the Win7 telemetry update, so that's a no go for me. If you collect FrameView data over time, you will very likely get plots very much like mine.

-Mike
 
That's the issue. I'm not running Windows 7. In the later builds of Windows 10, FRAPS often doesn't run. I've tried to make it run with compatibility mode settings but it doesn't work on any of my test systems. Frameview will get me the data I need.
 
Dan: I got fraps to work on the latest version of Windows.. want me to get some benches up or frameview?
 
I really don't think FRAPS or Frameview matters as long as Frameview can collect data over time, which I'm pretty sure it does. Frameview is likely better as FRAPS data is pretty limited.

-Mike
 
As an information point, in the past when trying to improve the performance in this section of the map, the only thing that helped was taking off a couple of locomotives and about 40% of the length of each train. Trimming back the buildings, removing cars, removing the windmills, and taking out 1/2 the trees didn't help at all; it did nothing for performance.

I wanted to keep the bench short, but if it starts farther outside of town, framerates above 60 FPS are possible. Coming into town, the framerate drops and when other rail vehicles start coming into the draw distance, the framerate further drops.

There are other parts of the Mojave map where the CPU loading is low and GPU is quite high. In these parts, the geography is much more complex.

-Mike
 
Just want to point out that both of these are important measures. Frame-time could show irregularities that are hidden under or over the measure of fps since frametime can be measured per frame while fps, by nature of the unit, is a moving average.
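
A small sketch of why both measures matter. The two frame-time traces below are invented for illustration (not measured data), and the function name is hypothetical. Both traces average exactly 40 FPS over one second, but one of them hides a single 220 ms hitch that only shows up in the frame times.

```python
# Sketch: average FPS can hide a hitch that per-frame times reveal.
# The traces below are made-up examples, not captured data.

def summarize(frame_times_ms):
    """Return (average FPS, worst frame time in ms, FPS during the worst frame)."""
    total_seconds = sum(frame_times_ms) / 1000.0
    average_fps = len(frame_times_ms) / total_seconds
    worst_ms = max(frame_times_ms)
    return average_fps, worst_ms, 1000.0 / worst_ms

steady = [25.0] * 40            # 40 even frames over 1 second
spiky = [20.0] * 39 + [220.0]   # 39 fast frames plus one long hitch, also 1 second

for label, trace in (("steady", steady), ("spiky", spiky)):
    average_fps, worst_ms, worst_fps = summarize(trace)
    print(f"{label:6s} avg {average_fps:.1f} FPS, "
          f"worst frame {worst_ms:.0f} ms ({worst_fps:.1f} FPS)")
```

A per-frame log (which is what a frame-time capture gives you) can feed the same function, while an averaged FPS counter only ever sees the first number.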

I'll see if I can get fraps to work and run it with both frameview and fraps. Hopefully there is no appreciable difference in data collected.

I don't think anyone is disputing that the same CPU at different clocks will perform proportionally better or worse. The problem from earlier was the predicting of performance across different CPU generations and brand.

As an information point, in the past when trying to improve the performance in this section of the map, the only thing that helped was taking off a couple of locomotives and about 40% of the length of each train. Trimming back the buildings, removing cars, removing the windmills, and taking out 1/2 the trees didn't help at all; it did nothing for performance.

Definitely sounds like physics calculations of some sort. I was impressed with the realism of the coupling mechanics, I would guess that this is why.
 
Just wanted to chime in about that, considering I have done it in real life almost every day for 18 years :)

It's something that's hard to replicate the feeling of in a game environment due to the shear forces involved.

Curiously, we now use simulators to recertify our engineers every three years, they even have an RCL simulator for PC used for recert for my craft.
 
It doesn't really matter. For whatever reason, I can't get FRAPS to work on Windows 10 1903 or 1909, which is what led me to getting Frameview. Frameview is an excellent tool and I like the fact that the output is a .CSV file with a lot of information in it. It can also pull power consumption data. This admittedly doesn't work perfectly on AMD cards, but for NVIDIA's it's spot on.
 