
First Windows x64 demo

Scali2

I've just made my first Windows x64 demo...
You can find more info and download it here: http://bohemiq.scali.eu.org/forum/viewtopic.php?t=39

I'm basically looking for guinea-pigs :)
I've only been able to test it on one system here at home. It worked there, but I'd like to know if it works on others as well. I'm not yet aware of any 64-bit compatibility issues.
Also, I'm interested in what kind of performance people are getting.
The system I tested on was a P4 650, clocked at 4 GHz, with an X1800XT 512 MB. It ran Windows XP x64, and it got about 550 fps on the 32-bit version and 650 fps on the 64-bit version. That's much more than I anticipated, and I'm very happy with this free performance increase :)
 
I get about 90 FPS on the 32 bit version (the plain one) with a Radeon 9550, no antialiasing, and a 3.0 GHz Pentium 4.

Will you be releasing your source code?
 
Turning it down to nearest-neighbor filtering, I had about 350 fps in x64 versus about 250 fps in every 32-bit binary other than SSE2, which was oddly maxing out around 230 fps. It works fine, but I noticed that your program borks out with a "Failed to reset" dialog box every time I ctrl-alt-delete to open Task Manager. I must say I like the sound / effects ;o)
 
mikeblas: 90 fps seems a bit low... Perhaps you've forced vsync on, and are running at a 90 Hz refresh rate?
I won't be releasing my source code.

Lord of Shadows: What system is that? Seems to get even more gain from the 64-bit version than my brother's system.
Do both the 32-bit and 64-bit versions crash that way? And does it also happen when you press alt-enter? (That should switch between windowed and fullscreen mode, which would also trigger a reset.) On my 32-bit system, it works okay.
The SSE2-version is indeed slower. I made that to check whether or not SSE2 was responsible for the performance-boost in 64-bit. It probably wasn't. I think it's the fastcall, but in 32-bit mode you don't have enough registers to take advantage of fastcall, apparently.
The music is made by Khrome, by the way.
 
Both crash that way, alt+enter works fine, I'll try in Win2k in a second. My system isn't overclocked right now due to a few reasons though.

Athlon 64 3400+ s939 Newcastle (2.2 GHz) with an FX 5900 Ultra.

I'd be interested to see how much that fast floating point math switch would affect things, if at all.

Edit: ctrl-alt-del forces a failed to render msg box in win2k as well.
 
Hum, okay... if it doesn't happen with alt+enter, then the problem is probably not in the reset itself, but in some kind of special condition when you press ctrl+alt+del. My box is a standalone single-user box, so I get the Task Manager when I press ctrl+alt+del... but if you get that screen where you can lock your box etc., it may block D3D rendering in some way that I am not handling properly.

I can do some builds with the fast FP switch enabled when I get home, but I don't expect much of a difference.
 
On my P4 560J @ 4.1 GHz / X850 XT combo it was bouncing around 470-510 fps for the plain/fastcall modes... but for whatever strange reason it ran significantly slower in SSE2 mode (only around 390-400 fps). That's weird. I'd run the 64-bit version, but there's no 64-bit CPU here.

edit: and yeah, it certainly doesn't like it when you ctrl-alt-del with fast user switching disabled. The music keeps going until you hit the "ok" button though :p
 
Scali2 said:
mikeblas: 90 fps seems a bit low... Perhaps you've forced vsync on, and are running at a 90 Hz refresh rate?
Nope.
 
Oh, something else I noticed too: Does this program have multiple threads? I notice that while it's running it tends to have CPU usage above 50% on my HT box (usually 51-53%), which indicates some kind of multithreading (I was thinking maybe sound). Is it just misreporting, is your demo multithreaded, or perhaps it is some multithreading from the video driver?

Just something I noticed.
 
If you want to see how many threads are running, you should add the threads column to task manager, or use PerfMon.

Is that 53% number for the processor, or for the system as a whole?
 
Eva_Unit_0 said:
edit: and yeah, it certainly doesn't like it when you ctrl-alt-del with fast user switching disabled. The music keeps going until you hit the "ok" button though :p

Ah, right, that way I can create the same situation on my box... I'll have to look into it, and also see what other windowed D3D applications do.
I suspect that my problem is that I only try to reset the device once, then it bails when it fails :) It probably fails to reset the device as long as the ctrl-alt-del screen is displayed, because all your windows are off the desktop then.
Perhaps I should just let it retry until the app is killed by the user.
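To make the "retry instead of bail" idea concrete, here is a minimal sketch. It is not the demo's actual code (that hasn't been posted). `DeviceState`, `FakeDevice`, `tryRecover` and `framesUntilRecovered` are all made-up names; the fake device only imitates the pattern of polling the device state each frame and resetting once the device reports it can be reset, which in Direct3D 9 would go through `TestCooperativeLevel` and `Reset`.

```cpp
#include <cassert>

// Stand-ins for the Direct3D 9 cooperative-level results. These are
// hypothetical names for illustration, not the real D3D constants.
enum class DeviceState { Ok, Lost, NotReset };

// Hypothetical device that stays lost for a fixed number of polls,
// imitating the period while the ctrl-alt-del screen is displayed.
struct FakeDevice {
    int pollsUntilResettable;
    bool needsReset;
    int resetCalls;

    explicit FakeDevice(int polls)
        : pollsUntilResettable(polls), needsReset(true), resetCalls(0) {
        assert(polls >= 0);
    }

    DeviceState testCooperativeLevel() {
        if (pollsUntilResettable > 0) {
            --pollsUntilResettable;
            return DeviceState::Lost;      // cannot be reset yet
        }
        return needsReset ? DeviceState::NotReset : DeviceState::Ok;
    }

    bool reset() {
        ++resetCalls;
        if (pollsUntilResettable > 0) return false;
        needsReset = false;
        return true;
    }
};

// Called once per frame. Returns true when the frame may be rendered.
// A lost device is not treated as fatal: the frame is skipped and the
// check is repeated on the next frame.
bool tryRecover(FakeDevice& dev) {
    switch (dev.testCooperativeLevel()) {
    case DeviceState::Ok:       return true;
    case DeviceState::Lost:     return false;        // wait, try again later
    case DeviceState::NotReset: return dev.reset();  // now a reset can succeed
    }
    return false;
}

// Runs the per-frame check until the device is usable again.
// Returns the number of skipped frames, or -1 if maxFrames ran out.
int framesUntilRecovered(FakeDevice& dev, int maxFrames) {
    for (int frame = 0; frame < maxFrames; ++frame) {
        if (tryRecover(dev)) return frame;
        // A real application would sleep briefly and keep pumping
        // window messages here, so the user can still close it.
    }
    return -1;
}
```

With a device that stays lost for three polls, `framesUntilRecovered` skips three frames and calls `reset()` exactly once, instead of failing on the first attempt.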
 
Eva_Unit_0 said:
Oh, something else I noticed too: Does this program have multiple threads? I notice that while it's running it tends to have CPU usage above 50% on my HT box (usually 51-53%), which indicates some kind of multithreading (I was thinking maybe sound). Is it just misreporting, is your demo multithreaded, or perhaps it is some multithreading from the video driver?

The sound is done in a separate thread, through fmod. That thread probably won't use a lot of CPU (playing an mp3 is < 1%).
The other thread does all the rendering, and it tries to go as fast as it can, so that thread will be taking 100% CPU.
 
Scali2 said:
The sound is done in a separate thread, through fmod. That thread probably won't use a lot of CPU (playing an mp3 is < 1%).
The other thread does all the rendering, and it tries to go as fast as it can, so that thread will be taking 100% CPU.

Ah, okay, that's what I figured.

And for the record that's 53% just for the one process, not for the entire system.
 
I've updated the archive with a few new exes, with 'fast' in the filename, those are compiled with the fast FP model. I notice a slight improvement on my PC, and also the SSE version is not slower now.
 
*bump*

Anyone tried the new versions yet, compiled with the fast FP switch?
 
Well, now that I have a dualcore processor, I've decided to make the routine multithreaded.
I seem to get a small gain out of the second core, despite the overhead and such... but only in specific cases, especially as the number of balls goes up.
I wonder if using more than two cores will show more of a speedup, since the overhead is already 'taken in' by the first two cores, and now the extra processing power is being harnessed.

I've added it to the archive here:
http://bohemiq.scali.eu.org/forum/viewtopic.php?t=39
 
I've rewritten my multithreading code in a different way, which means the threads can work virtually independently, although they generate more work for the GPU.
After some testing, it turned out that the first version was actually slower on a Pentium D, and was pretty much tied on an Athlon.
Only the Core2 Duo benefited from the multithreaded algo.

As it turns out, the rewritten algo is even faster on Core2 Duo, and now it should also give a significant speedup on Athlons and Pentium-D.

I've also used a different testcase, which is heavier on the algo, so it gives a better view of the time saved:

http://scali.eu.org/~bohemiq/Fire.rar
 
Oh, I forgot.
I created the multithreading version using OpenMP.
If it doesn't work, you should install the VC++ runtime: http://go.microsoft.com/fwlink/?linkid=65127&clcid=0x409
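For anyone who hasn't used OpenMP: a single pragma is enough to split a loop with independent iterations across cores. The sketch below is not the demo's code; `Volume` and `blurPass` are hypothetical names, and the 7-point average is just an assumed stand-in for whatever work the real routine does per cell. Without OpenMP enabled in the compiler, the pragma is ignored and the loop runs on one thread with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense cubic volume of scalar values, n * n * n cells (hypothetical type).
struct Volume {
    int n;
    std::vector<float> cells;

    explicit Volume(int size)
        : n(size),
          cells(static_cast<std::size_t>(size) * size * size, 0.0f) {
        assert(size > 0);
    }

    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * n + y) * n + x;
    }
    float& at(int x, int y, int z) { return cells[index(x, y, z)]; }
    float at(int x, int y, int z) const { return cells[index(x, y, z)]; }
};

// One blur pass: every interior cell of dst becomes the average of the
// matching src cell and its six face neighbours. Border cells of dst are
// left untouched. Each z-slice writes only to its own part of dst, so
// the outer loop can be divided between threads without locking.
void blurPass(const Volume& src, Volume& dst) {
    assert(src.n == dst.n);
    const int n = src.n;

    #pragma omp parallel for
    for (int z = 1; z < n - 1; ++z) {
        for (int y = 1; y < n - 1; ++y) {
            for (int x = 1; x < n - 1; ++x) {
                const float sum = src.at(x, y, z)
                                + src.at(x - 1, y, z) + src.at(x + 1, y, z)
                                + src.at(x, y - 1, z) + src.at(x, y + 1, z)
                                + src.at(x, y, z - 1) + src.at(x, y, z + 1);
                dst.at(x, y, z) = sum / 7.0f;
            }
        }
    }
}
```

Reading from one buffer and writing to another is what keeps the iterations independent; blurring in place would make the threads race on neighbouring cells.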

Lord of Shadows: It's a 3d-volume with a simple blur-effect (like the common 2d-fire). I use the MarchingCubes algorithm to visualize it. I 'peel off' isosurfaces of the volume at various isovalues, and give each value a different colour (the isovalue is basically the temperature-component of the volume). Then I just render the isosurfaces with alphablending.
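As a rough illustration of the per-cell step in Marching Cubes, here is a sketch with hypothetical names (`cubeIndex`, `cellIsCrossed`, `edgeCrossing`). It assumes a corner counts as "inside" when its value is at or above the isovalue, and it leaves out the triangle lookup table. Corner ordering and the inside test are conventions that differ between implementations, so none of this is taken from the demo itself.

```cpp
#include <array>
#include <cassert>

// Builds the 8-bit case index for one cell: bit i is set when corner i
// is at or above the isovalue. The full algorithm uses this index to
// look up which edges the surface crosses (table omitted here).
unsigned cubeIndex(const std::array<float, 8>& corners, float isovalue) {
    unsigned index = 0;
    for (unsigned i = 0; i < 8; ++i) {
        if (corners[i] >= isovalue) {
            index |= 1u << i;
        }
    }
    return index;
}

// A cell produces triangles only when its corners are mixed, that is,
// when the index is neither 0 (all outside) nor 255 (all inside).
bool cellIsCrossed(const std::array<float, 8>& corners, float isovalue) {
    const unsigned index = cubeIndex(corners, isovalue);
    return index != 0u && index != 255u;
}

// Fraction along an edge (0 at the corner with value a, 1 at the corner
// with value b) where the value reaches the isovalue, found by linear
// interpolation. The two corner values must differ.
float edgeCrossing(float a, float b, float isovalue) {
    assert(a != b);
    return (isovalue - a) / (b - a);
}
```

Peeling several isosurfaces then amounts to repeating this per-cell test for each isovalue and drawing each resulting surface in its own colour.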
 