CAD system for client - Intel 5520 chipset + 2x Quadro FX5800's = issues

been following this thread since the beginning.. crazy stuff but very interesting to read.. just out of curiosity.. have you tried running 2 regular nvidia cards (matching or non-matching) instead of the quadros.. just to eliminate the board as being the issue?
 
I'd love to do that. Sadly, I can't. I don't have a desktop SLI setup to test that theory out, or even two nvidia PCI-E cards.
 
darn.. well then i dunno.. this is way out of my league.. i just figured i'd take a stab at it since i didn't see anyone suggest trying that..

but i'll keep googling trying to find a way to disable the PhysX crap.. because that idea sounds like the most logical reason for it.. though have you tried using the SLI bridge just to see if it really is the PhysX.. because in SLI the PhysX problem shouldn't affect the second card..
would at least narrow down the possibilities.. (though i can't remember if you ever said if the cards even supported SLI)
 
from what i can find on PhysX.. the registry files are under the folder name AGEIA.. though there are still some left over in windows/system32/64 (been way too long since i used a 64bit os)

and to uninstall the PhysX drivers completely.. it should be under add and remove as AGEIA PhysX.. not sure if any of that would help..
 
It is no longer listed that way. It's been quite a few driver revisions since that was the case. PhysX is now fully integrated into the display driver package. The cards do support SLI, and I have an old SLI bridge from an old board that works with the spacing between the cards. However, the motherboard does not support SLI, so connecting the bridge effectively did nothing.
 
on my add and remove.. there is Nvidia PhysX which is 121mb..

at this point i'm just coming up with ideas.. who knows maybe a better.. more brilliant idea will come to mind for ya.. :p
 
Programs and Features in Vista lists:

Nvidia Display Driver
Nvidia Control Panel
Nvidia Performance Drivers

That's it, other than the Realtek Audio drivers. There's nothing else on the system. The computer gods hate me.

You know what I keep thinking, after wasting my entire weekend on this crap? Why, oh why didn't I just smile and nod when this client wanted to buy a liquid cooled Alienware for CAD? What the fuck is wrong with me?
 
lol you're not alone.. there's a reason i only build a new system every 2 - 3 years.. because every time i do.. it turns into the exact same problem you're having.. hell if it was me.. i would have gone to a store.. bought 2 gtx 285s.. thrown em in and sold the quadros on ebay or shipped em back at this point.. since in reality.. the gtx 285s would do just as good as the quadros.. and only cost 1/4 of the price..
 
We are very nearly coming into the timeframe where I will have to notify the client of the issue. We could drop the config down to the single Quadro, ship one back, lose 15% of $3k ($450) in restocking fees, and call it a day. The problem is, that $450 comes off of my P&L, and the client won't be happy with a less than expected config. And our president won't be happy at eating $450, with an unhappy client.
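
For anyone checking the math on the restocking hit, here is a quick sketch. The function name is made up for illustration; the 15% rate and the roughly $3k card price are the figures quoted above.

```python
def restocking_fee(price: float, rate: float) -> float:
    """Return the fee charged for sending an item back (price * rate)."""
    return price * rate

card_price = 3000.00  # approximate price of one Quadro, per the post
fee_rate = 0.15       # 15% restocking fee, per the post

fee = restocking_fee(card_price, fee_rate)
print(f"Restocking fee: ${fee:.2f}")
```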
 
Update: I just got an:

NMI: Memory Parity Check
Memory Parity error

Never seen that before.

Further reading on the memory spec shows quad-ranked DDR3-1333 ECC Registered, 4GB/stick, 12GB/kit (x2). This board supports SR and DR DIMMs for 1333 operation. I'm gonna go manually set RAM speed in BIOS to 1066.

QR fastest speeds, according to the motherboard manual, are supposed to be 1066.
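
The rank rule above boils down to a small lookup. This is only a sketch of the reasoning: the table holds just the two facts stated in this post (SR/DR at 1333, QR at 1066), it is not copied from the Supermicro manual, and the names are hypothetical.

```python
# Max DDR3 speed by DIMM rank, as described in the post above.
# Assumption: illustrative values only, not a transcription of the board manual.
MAX_SPEED_BY_RANK = {"SR": 1333, "DR": 1333, "QR": 1066}

def effective_speed(rank: str, rated_speed: int) -> int:
    """Speed the DIMM should run at: its rating, capped by the rank limit."""
    return min(rated_speed, MAX_SPEED_BY_RANK[rank])

# Quad-ranked DDR3-1333 sticks should be run at 1066 on this board.
print(effective_speed("QR", 1333))
```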
 
el ebay-o-time-o.. or just refund the client.. bite the bullet for the 3k and use it in a different rig down the road..

or hell.. just tell the client the god's honest truth.. that there's something wrong.. and you are still trying to troubleshoot it.. and just ask for a few more days.. unless the client's a total asshat.. i'm sure he'd be willing to give up a couple more days..
i mean hell there's no reason to run CAD right now anyways.. no one's buying houses or building anything for that matter.. :p just had to add that for my amusement..
 
Aww man, hopefully you don't have a RAM issue or you're going to be testing each of those sticks.

I've been following this thread from the beginning, and I must say kudos to you for sticking with it for this long. I probably would have notified the client by now of the situation and advised a different set of hardware. Good luck on finding out the problem. By the way, if you do figure this out, Nvidia, Intel, Supermicro, and PNY should all send you a word of thanks for the hard work you've put in when they're the ones that should have figured this out.
 
Failure is not an option...at least, not yet. I think I might just be too dumb to give up...which sucks.

On the RAM error, I have yet to see that one again, but I just forced 1066 operation speed in BIOS, so we'll see if that helps at all. If the card has to go back at all, we'll be eating the restock fee. Chances of us using this exact model card in a future build are very, very slim. Possible, but slim.

And no, I don't want to memtest all of these sticks.

With regard to the timeline, the client paid extra to expedite the shipping, and is expecting it ASAP. I gave a non-expedited delivery time of 14-16 business days, and I told them that paying actual cost for shipping to be expedited would likely shave 3-4 days off of the delivery estimate. I was hoping for Tuesday, but it's not looking possible, at all. Client is a mechanical and electrical engineering firm that just won a large project contract. They're expecting a lot from this machine, and quickly.
 
RAM speed change did nothing to solve the problem. I wonder if I installed the slipstreamed OS with both cards installed, what would happen...probably a kernel panic.
 
oh dang that sucks.. yeah.. screw memtesting all of that ram.. though the sad thing is bad memory can also cause issues with video cards.. though i've never seen it cause an issue where 1 works and 1 doesn't..

i've never run SLI before.. but would installing the bridge.. then disabling SLI in the drivers cause the second card to become completely useless or would it just treat it as a second card?

yet again another one of my dumb ideas to try and find a workaround for the PhysX crap..
 
i'm still reading up on various things.. just out of curiosity.. if you go into the nvidia control panel under manage 3d settings/global settings tab.. what does it have the multi-display/mixed-gpu acceleration set to? by default on mine it's set to multiple display performance mode.. i'm wondering if with the second card installed.. it's automatically setting it to multi-gpu mode which would then force you to use the PhysX on the second card.. it's an old BOINC issue/trick sorta thing that i remembered from running multiple cards in non-SLI mode.. so if it is.. i wonder what would happen if one of the cards was set to run in multiple display performance mode or single display performance mode..
 
Morning. *yawns* I called it a night early yesterday, weather's been kicking my rear.

I think I know what's going on now though. You're not going to like it. We need to get Supermicro involved here ASAP; this is exactly why I should have enabled the Watchdog NMI much, much sooner.

I have seen this error before. Not with Quadros, not with the same class of board, but it doesn't change things that much. The Quadros are doing something on the PCIe bus which is actually causing a board-level error. Memory Parity Error NMIs should never occur; you're running ECC Registered. It is, in essence, physically impossible.
Probably, you should get Supermicro on the phone and let them know about this. This is a major, major error that they really need to know about immediately. Make sure you identify what exact memory you're using, down to the part number, if you haven't already. That it hasn't tripped again is kind of irrelevant; it shouldn't be physically possible in the first place.
 
I have just written Supermicro, and will hopefully get to call them before CoB today, to update the case information and bug them on the phone.
 
With some luck that MPE will tip 'em to something obvious. I'm hoping, hoping it's a BIOS glitch that causes the power-off. But that won't solve the SERR/PERR I suspect is happening.
 
I just got feedback from a reliable source. He's with me in suspecting the problem is entirely nVidia; the 5520 chipset does NOT support SLI. nVidia is playing their games again, and refusing to consider it for certification, from what I'm being told. Who identified dual Quadros as being compatible? Single, yes, but dual definitely should never have been done. The shutdown is a completely unreasonable response to it, and most likely the driver deliberately doing so, which is why we can't bloody trap it. There's nothing WRONG.
 
Supermicro BIOS engineer confirmed that they reviewed the PCIEX.exe tool output, and found no conflicting issues, or memory allocation issues from the output. They also confirmed that the troubleshooting here has been extensive, and assisted greatly.

Thank you AreEss, for all of your help thus far. To everyone else, thank you to you, as well. Your input has prompted me to try things that I had not considered trying.

So, for now, it looks as if Supermicro is off of the hook, completely. I have one contact at Nvidia, and one at PNY to get ahold of, tomorrow. Hopefully, they can provide a final answer (death blow) to my issue. I have seen people running these dual Quadro configs on Intel workstation chipset boards, before, without issue. I have seen Quadro drivers of the x64 Windows variety that were a total joke, and put the second card in Code 10. I've seen QuadroPlex IV 1000's error out on the second card, on an entirely supported platform...but I've NEVER seen this.

I would agree that if it is entirely Nvidia's fault, that the way they are choosing to handle it, in the driver, is ridiculous. I guess I might find out tomorrow, for sure.

I am considering alternative options. However, I think the only solution that might be acceptable to the client would be an equally powerful GPU in the form of a Tesla. We can use the second DVI from this single Quadro to pump both 30" displays, and still have the second CUDA capable GPU to use for AutoCAD acceleration.

Anyone have any input, there? BTW, I have seen this same chassis, motherboard, CPUs, PNY Quadro, and the Tesla offered by a company (Saturday night, I found it Googling) for Maya and CAD workstation usage. Anyone know which Tesla (I haven't looked, yet) would be the closest match in performance to the current second Quadro?
 
Looks like the Tesla C1060 is an exact match, minus the framebuffer, SLI connector, and any sort of display output.
 

I wouldn't. Dollars to donuts, it's the same damn issues. If not worse. Frankly, it's even more likely to not work. The Tesla cards are far, far touchier than the Quadros - and nVidia is hellbent on people not finding this out. I have yet to see a working "hybrid" system that didn't take months and screaming, or wasn't a demo box from nVidia. (Literally, Tesla is just a GeForce with the output lines cut. That's it.) I am entirely unconvinced from this experience that you will fare any better there.

The only thing I can recommend with a clear conscience is to go to FirePro's or FireGL V8650's. Frankly, your customer is not likely to see significant - if any - performance difference. Not only that, but A) you'll get actual support, and B) I've tested dual FirePro and V8650. It works very well. The best option would be V8650's with the Stream SDK, each driving one display. I would have to know more about the specific datasets, but chances are quite good that they are severely overbuilding on the word of somebody who wants a stupid overpowered desktop. ;P
 
Simultaneous copies of AutoCAD 2010 MEP x64, NavisWorks 2010 MEP x64 and Revit 2010 MEP x64, rendering in real time, all in 3D. The customer will be receiving point cloud data from the primary on the contract that they just got, and then will be using those files to render the environment, and then modify and manipulate their objects inside of the laser scanned representation of the actual environment that they will do the work in.

Point cloud data will be upwards of 700MB files, just for the environment. Expectation is that these will soon grow exponentially larger in size. The customer's own data sets are anywhere from 2-100MB per object. These sizes only apply to the renderings happening on one monitor, in AutoCAD 2010, itself. The second screen will be running Revit for real time renderings of the changes taking place on monitor number one. NavisWorks will also be running.
 
What's the most comparable ATI solution then, what does it cost, and where do I get it?
 
from the chart:

X8DA3 Tested with 2x HIC cards installed.
RHEL 5 64-bit Check mark

&

X8DA3 RHEL 5.2 64-bit Tested with 2x Tesla C1060
 
So far I have talked to 3 people at both Nvidia and PNY, and I am making real progress. Channel Partnerships are good, too.
 
Yeah, I know it really sucks.
I would definitely buy an S1070...you know, those 1U rackmounts with 4 GPUs in them, but I can't run Windows.
 
I've got 2x FAEs from Nvidia, and 2x driver engineers from both PNY and Nvidia assisting me, at this point. They have had me run a dump tool, and ship off output from the tool to them. They have confirmed no issues with the tool's runs on single card configs (run on each card).

The 1U quad GPU solution is out of the question, currently.
 
Okay. So, we probably should talk offline. The email address I already gave you is fine.

They want the wrong answer to the wrong question. What they're talking about doing there is both madness and not going to work. Secondly, finalized render work should NEVER BE DONE ON NVIDIA UNDER ANY CIRCUMSTANCES. nVidia has significantly lower image quality, and that's just cold hard fact. Any finalized rendering work should only be performed on 3DLabs REALiZM 800's, FireMV or FirePro. Anything that involves lighting should be done purely on 3DLabs when possible.

Really, though, it sounds like their workflow is badly mangled. More likely, they should be using a pre-processor attached to SAN, given the dataset sizes, and feeding probably an LSI 8480-powered RAID5 using 10k's or 15k's on the workstation side. (RAID5 for performance, not capacity.) I'd need to sit and talk with 'em to be certain though, but it really sounds like up-front data conversion is a better route.
 
You have confused me. The client that this is for has specifically stated that they wanted Quadros. FireGL cards are only an option if we can't get the Quadros to work. They're not really worried about IQ here. The RealiZMs don't appear to even be in production anymore, and are not readily available. The timeline for delivery is very short, so waiting weeks for anything was out of the question.

And who does RAID-5 for performance? Whether you're using an onboard SAS controller, onboard SATA controller, or a dedicated RAID card, you still have to calculate parity, somewhere. They are not (for this machine) concerned with data loss all that much (a reload at this point only requires that they drop in my slipstreamed disk, and reload AutoCAD), they are concerned with machine downtime (that's why RAID10...as long as two disks in the same mirror set don't fail), and time incurred for parity calculations (hence RAID10: straight mirrored writes...no parity calcs). They don't care about the data on the machine, per se (well, only so long as it takes them to do the work on the machine...then it gets dumped off).
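
To make the parity point concrete, here is a rough sketch of the extra work a RAID-5 stripe write implies versus a mirrored write. It is a toy model only: the function names are made up, the blocks are two bytes long, and real controllers do this in hardware or in the driver with rotating parity across disks.

```python
from functools import reduce
from typing import List, Tuple

def raid5_parity(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte to produce the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def raid5_rebuild(surviving: List[bytes], parity: bytes) -> bytes:
    """Recover one lost data block by XORing the survivors with parity."""
    return raid5_parity(surviving + [parity])

def mirror_write(block: bytes) -> Tuple[bytes, bytes]:
    """RAID-1/10 style write: the same block goes to both disks, no math."""
    return block, block

# Toy stripe of three data blocks (hypothetical contents).
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = raid5_parity(stripe)  # must be computed on every stripe write

# Lose the middle block, then rebuild it from the rest plus parity.
recovered = raid5_rebuild([stripe[0], stripe[2]], parity)
print(recovered == stripe[1])
```

The mirrored path never touches the XOR step, which is the whole argument for RAID10 on a box where write latency matters more than usable capacity.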

They do have a SAN, but the requirement for this project, and therefore, this machine is that it will be VLAN'd off, and cannot touch the SAN, or any other shared network devices. This machine has to be standalone, or have dedicated resources. $20k to these folks is a lot for a computer, so the idea that they would buy an additional dual fabric config, RAID controller, and disk for this machine is out of the question.

The 30" screens that they went with were for the resolution, solely. They weren't at all interested in the overall IQ of the displays, response times, etc. Although they do intend to manipulate the objects in the environment on one screen, and have it rendered on the other, they don't do fluid animations, or renders that require fast response times, color gamut reproduction, etc. It's not that kind of application. They're doing Mechanical and Electrical work in AutoCAD, not video production, video editing, or animation stuff.
 
SM is apparently working under the direction of PNY and Nvidia FAEs to try to reproduce the error, and has the test system parts coming together. Having contacts is GOOD.
 