Dual Opteron and Registered RAM

Not possible. The socket 940 CPUs always require registered RAM.
 
Actually, I found one: it's an Abit WN-2S+... but it doesn't have enough PCI or PCI-X slots for what I want to do.
 
All opterons require registered ram. It doesn't matter what motherboard you use because the opteron memory controller is on die. Or at least, that's how it was explained to me.
 
defakto said:
All opterons require registered ram. It doesn't matter what motherboard you use because the opteron memory controller is on die.
QFT
Any board that claims it will run opty's without registered RAM would be big news indeed
 
If you're going to quote me, seriously, get the entire quote in. I don't mind being proven wrong. No big deal. But no need to be all, "QFT" about it.

Also, if you search their forums, it looks like Abit pulled the plug on that particular board and a lot of their planned server market. That board isn't even mentioned under their 940 boards anymore.

This looks like a promising alternative though

http://www.wideopenwest.com/~AXM77/images/MS-9620.JPG

this link breaks the news about discontinuation of this project
http://techreport.com/onearticle.x/8053
 
Ahh.. my thanks.. you proved me wrong.. I didn't find that discontinuation notice..

Why would you get that when the Tyan K8WE has more features and allows for more ram to be used..
 
difference in price probably. the k8we is an expensive board, worth every penny, but expensive.
 
defakto said:
If you're going to quote me, seriously, get the entire quote in. I don't mind being proven wrong. No big deal. But no need to be all, "QFT" about it.

Also, if you search their forums, it looks like Abit pulled the plug on that particular board and a lot of their planned server market. That board isn't even mentioned under their 940 boards anymore.

This looks like a promising alternative though

http://www.wideopenwest.com/~AXM77/images/MS-9620.JPG

this link breaks the news about discontinuation of this project
http://techreport.com/onearticle.x/8053


Looks nice for enthusiast based items. My qualms would be that it isn't E-ATX, which would help a lot with board placement; it doesn't have two separate RAM banks (meaning each processor will be fighting over the same channel); and it doesn't have any PCI-X slots on it. However, it hopefully will have advantages over the Tyan board as far as overclocking and tweaking options. I still haven't seen a dually Opteron workstation board that has all the features that I want on it.
 
Being non-eatx is a good thing though. It will give you a wider range of cases that the board will fit in, since eatx is typically the largest board you can buy. I agree with you on the RAM thing; it just depends on the application you want to use the board for, really.
 
defakto said:
Being non-eatx is a good thing though. It will give you a wider range of cases that the board will fit in, since eatx is typically the largest board you can buy. I agree with you on the RAM thing; it just depends on the application you want to use the board for, really.

I don't really agree.

Take a look at the spacing of the processors on MSI's dually board right there. Notice the proximity of the two CPU's to each other, and also the proximity to the top PCI-E slot. Then take a look at where the RAM is and how close that is to everything.

Simply said although larger and "limiting" as far as case size, eatx would have given much more room for the processors, the ram, having more features, and in this particular case more just regular PCI slots. Personally I think the fact that it is not eatx is more limiting than case size anyway.

For a dually CPU rig with SLi I would probably want to put it into a cube and watercool everything. If you're gonna go out and make a gawd box, might as well go all out.
 
eatx > atx imo.
I used to have the MSI master2 and it was a real pain in the ass....
came with its own heatsinks and fans (that were smaller than stock), one of which got within 2mm of the back of whatever is in the AGP slot. Watercooling or phasechanging was even worse as I had to cut away some of the blocks to stop them touching the graphics card, overall it was a very bad layout.

Now with my Iwill DK8N I have plenty of room between everything, which means less heat. It also means the board can have NUMA and allow 4 DIMMs per CPU. For me (and most people) there is no difference in choosing a case when looking at eatx over atx for these types of mobos, as most people who don't wanna rackmount it wanna put it in a tower.
 
Getting back to the ram issue, does anyone here run more than 8 Gigs of PC3200 ECC? I'm considering building a rig with K8WE and 16 Gigs and I'm wondering if I'm going to run into any problems. Is it better to get PC2700?

How much memory do you guys have? What brand and what speeds?
 
defakto said:
If you're going to quote me, seriously, get the entire quote in. I don't mind being proven wrong. No big deal. But no need to be all, "QFT" about it.

Also, if you search their forums, it looks like Abit pulled the plug on that particular board and a lot of their planned server market. That board isn't even mentioned under their 940 boards anymore.

This looks like a promising alternative though

http://www.wideopenwest.com/~AXM77/images/MS-9620.JPG

this link breaks the news about discontinuation of this project
http://techreport.com/onearticle.x/8053

too bad there are no 64-bit pci slots! :|
 
Not everyone uses 64 bit pci slots....Besides you can put a very nice scsi or sata controller on pci-e
 
defakto said:
Not everyone uses 64 bit pci slots....Besides you can put a very nice scsi or sata controller on pci-e

I haven't found one yet, but I haven't really been looking too hard... could you give me linkage? I would really like to find an affordable solution (SATA) that would go into PCI-E.

Of course it's hard to argue against 128bit PCI-X... Looking at server/workstation class hardware always makes me drool.
 
defakto said:
Don't get me started on server hardware, I work security in a data center, racks and racks of 15k scsi drives and miles of fiber switches...so neat.

just a few off a quick google search for pci-e drive controllers

pci-e sata raid controller http://www.topmicrousa.com/arc-1220.html
pci-e scsi raid controller http://www.intel.com/design/servers/RAID/srcu42e/

I don't have or see any PCI-E x8 slots though :( Unless of course they use physical x16 slots nerfed to x8...

Ugh I hope they start making x1, x2, and x4 ones... I really want to get a SATA controller that has 6-8 ports. Raid 0 and 1 is fine, I don't need anything else that is exotic or even raid 5. Full hardware control optional (nice feature don't necessarily need it), ecc ram for buffer optional (nice feature don't necessarily need it), cost sub $250. Unfortunately, that doesn't exist....
 
You can put lower x numbered cards in higher numbered slots. You could even stick a x1 card (if you could find one...) in a x16 slot. It would just run at x1 speeds. So the uber leet raid controllers in defakto's post would work fine in x16 slots.
 
I've heard rumours, though not substantiated, that you can put a higher number card in a lower slot and just lose the bandwidth, like an 8x card in a 4x slot with the extra hanging over, kind of like putting 64 bit pci-x cards in std 32 bit slots.
 
I thought PCI Express connectors were 'closed' at the end, so an x8 card wouldn't even fit in a smaller slot. Could be wrong though, as I don't have a PCI-E mobo, but it's what I've seen anyway.
 
A Dual Core - Dual Opteron Computer on a Tyan K8WE with 4 GB of Good Registered PC3200 RAM would be a powerhouse of a Machine. Period.
 
DarkBahamut said:
I thought PCI Express connectors were 'closed' at the end, so an x8 card wouldn't even fit in a smaller slot. Could be wrong though, as I don't have a PCI-E mobo, but it's what I've seen anyway.


After doing some poking around online looking at mobo images, it looks like you're right. That's a shame; they really should design them to be backwards compatible with slots that have fewer lanes. Even at a performance hit it would make the product much more desirable and flexible.
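
To put the slot rules from the last few posts in one place, here is a rough Python sketch. It assumes (my assumption, not something confirmed in this thread) that a card physically fits when the slot connector is at least as long as the card, or when the slot is open-ended, and that the link then runs at the smaller of the card's lane count and the lanes actually wired to the slot. The function and parameter names are made up for illustration.

```python
def pcie_link_width(card_lanes, slot_physical_lanes,
                    slot_electrical_lanes=None, open_ended=False):
    """Return the lane count the link would run at, or None if the card won't fit.

    slot_physical_lanes   -- connector size (e.g. 16 for an x16 slot)
    slot_electrical_lanes -- lanes actually wired (e.g. 8 for an x16 slot "nerfed" to x8);
                             defaults to the physical size
    open_ended            -- True if the slot has no plastic wall at the end
    """
    if slot_electrical_lanes is None:
        slot_electrical_lanes = slot_physical_lanes
    # A closed slot only accepts cards no longer than the connector.
    if not open_ended and card_lanes > slot_physical_lanes:
        return None
    # Assumed behaviour: the link settles on the narrower of the two sides.
    return min(card_lanes, slot_electrical_lanes)


print(pcie_link_width(1, 16))                            # x1 card in an x16 slot
print(pcie_link_width(8, 16))                            # x8 RAID card in an x16 slot
print(pcie_link_width(8, 4))                             # x8 card in a closed x4 slot
print(pcie_link_width(16, 16, slot_electrical_lanes=8))  # x16 slot wired for x8
```

So under these assumptions the x8 controllers linked above would drop into an x16 slot without trouble, while a closed x4 slot simply won't take them.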
 
defakto said:
Not everyone uses 64 bit pci slots....Besides you can put a very nice scsi or sata controller on pci-e

you're talking about opterons here, which are mainly for servers and workstations. a lot of people who already have a highend workstation would probably have a 64bit card already and would like to reuse it. :)
 
Sheff said:
Getting back to the ram issue, does anyone here run more than 8 Gigs of PC3200 ECC? I'm considering building a rig with K8WE and 16 Gigs and I'm wondering if I'm going to run into any problems. Is it better to get PC2700?

How much memory do you guys have? What brand and what speeds?

4GB of Micron\Crucial PC2700 (1 GB per channel, with 4 more slots open)
What are you doing that you want 16GB? Are you running a NUMA aware OS?
What apps are you running?
 
I have been asked by an ad agency to put down a bid on an image that when complete will be about 10.6 Gigs at the resolution and size the client is asking for. There are three other artists who are up for this assignment. They have not yet picked out who is going to do it.

Currently, I am working on an XP 2500+ with 1 Gig. My main apps are 2d apps like Photoshop, Painter, and Xara.

Should I land this assignment, I would need to upgrade my main rig significantly. All of my recent posts have been research as to what motherboard/processors/memory etc. would I need. Currently I'm considering K8WE and MS-9620 which isn't out yet.



I don't believe the MSI does NUMA, which I have no experience with, so I'm not sure whether or not NUMA will benefit me. I would like a second processor or a dual core machine as I have some 3d apps which I want to use more.

I need a better machine anyway for what I'm doing, but depending on whether or not I get this assignment, it will determine just how much memory I need to get. If I get 4 to 8 gigs of memory, that should be more than enough for my other work. If I hadn't been approached with this job, I wouldn't even be considering more than 4 gigs of memory.
 
Photoshop is limited to 2GB as a process (period, regardless of whether the OS and hardware allow more)
what you want to speed up the working time of such a large image is
say 4GB of RAM (assuming it's one 1GB stick for each of 2 channels for 2 processors)
and fast drives (or arrays) for both the scratch disk and pagefile (on separate channels)

that way you can employ the full 2GB for Photoshop and the remainder for your OS

you could look at a solid state disk for the scratchdisk
either as hardware > http://www.cenatek.com/store/category.cfm?Category=9&CFID=1857939&CFTOKEN=89152993
or software > http://www.cenatek.com/product_ramdisk.cfm (in which case you'd employ RAM on the mobo)

about the only apps that can employ 16GB are custom 64bit scientific R&D, the odd 3D renderer, or database\server, generally employing a NUMA aware OS (Linux, W2K3)

cut and paste 1
Opteron Memory Guide
NUMA FAQ (Non-Uniform Memory Access)
Error Correcting Memory - Part I
Error Correcting Memory - Part II: Myths and Realities
Virtual Memory In XP


cut and paste 2 (without review, Im rushed and have to fly) its quite old at this point too
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

while I didnt wait and went Suse 9.1 Pro
the question remains whether we are talking about simply porting an OS to employ more than 4GB of memory (2GB per process being the current 32bit limitation, with a few Xeon oriented 3GB switch exceptions in W2K\2003) or actually long coding it to employ the additional registers
Both Linux and the XP 64 beta are just ported to employ more memory per process, not to take advantage through long coding

and of course the same for applications

64bit computing in real life @ LostCircuits

outlining the advantages of the Opteron\AMD64 Architecture
(as youll see this primarily applies to multi-processor systems but some would carry over to the AMD64 as well)


AMD Opteron Coverage - Part 1: Intro to Opteron/K8 Architecture @ anandtech

Go Deep
The difference in pipeline architectures is what makes a clock-for-clock comparison between the Xeon and Opteron invalid (much like the Pentium 4 to Athlon XP comparison was invalid on a clock-for-clock basis). The Xeon's architecture allows it to reach high clock speeds at the expense of doing less work per clock cycle, the appropriate comparison ends up being one of cost and real-world performance, not one of clock speed.

The more pipeline stages you have, the less work is done per clock and thus the higher you're able to clock your CPU; this is the reason the 20-stage Xeon is currently at speeds of 3GHz, compared to the 12-stage Opteron which is debuting at 1.8GHz.
in short, it's not just how fast you run, but also how long your legs are; a Westie needs to run twice as fast as a Wolfhound to cover the same ground

AMD's 64-bit strategy - x86-64
The benefits of a 64-bit microprocessor architecture are mainly memory related; if you take two identical microprocessors, make one 64-bit and one 32-bit, the advantage of the 64-bit CPU is that it can address much more memory than the 32-bit CPU (2^64 vs. 2^32). For those that were hitting the limits of 32-bit memory addressability (4GB), Intel's only high performance solution was to transition to Itanium, but if all you're looking for is more than 4GB of memory and solid x86 performance, then you're SOL from Intel's perspective.

AMD's 64-bit strategy is significantly different; AMD has always been focused on the current customer needs, not on the vision of the computing future 5 - 10 years from now and this is reflected in their 64-bit strategy. The strategy is simple and has been done before in the past; stick with a high-performing x86 core, and simply extend the ISA to support 64-bit memory addressability - the end result is what AMD likes to call x86-64.

In legacy mode, the K8 will run all native 16 or 32-bit x86 applications, the processor basically acts just as a K7 would.

Things get interesting in "long" mode where a 64-bit x86-64 compliant OS is required; in this mode, the K8 can either operate in full 64-bit mode or in compatibility mode. Full 64-bit mode allows for all of the advantages of a 64-bit architecture to be realized, including 64-bit memory addressability. One of the major features of the K8 architecture is the fact that the number of general purpose registers is doubled when in x86-64 mode, and thus this feature is also taken advantage of in full 64-bit mode.

Compatibility mode gives you none of the advantages of a 64-bit architecture on the application level, as it is designed for running 32-bit apps on a 64-bit OS (hence the name compatibility); The extra registers and 64-bit register extensions are ignored in this mode. Compatibility mode is important because of the 2GB process size limitation under Windows OSes. Although 32-bit Windows offers support for a maximum of 4GB of memory, each process can only use a maximum of 2GB of memory - the remaining 2GB is reserved for the OS. By running a 64-bit version of Windows (when released) and a 32-bit application, compatibility mode allows for each 32-bit process to have up to a full 4GB of memory, with the OS using anything above that marker.
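
For anyone who wants the numbers behind the 2GB / 4GB figures quoted above, here is a minimal Python sketch of the arithmetic. It only works out the theoretical limits from the pointer width; the 2GB/2GB split is the default 32-bit Windows layout described in the excerpt, and real CPUs and OSes expose far less than the full 2^64. The names are mine, just for illustration.

```python
GIB = 2**30  # bytes in one GiB


def addressable_gib(address_bits):
    """Theoretical address space for a given pointer width, in GiB."""
    return 2**address_bits // GIB


total_32 = addressable_gib(32)       # whole 32-bit address space
user_32_default = total_32 // 2      # default split: half for the process, half for the OS
user_32_on_64bit_os = total_32       # compatibility mode, per the excerpt above
total_64 = addressable_gib(64)       # architectural limit only, not what any board supports

print("32-bit address space:        ", total_32, "GiB")
print("32-bit process, 32-bit OS:   ", user_32_default, "GiB")
print("32-bit process, 64-bit OS:   ", user_32_on_64bit_os, "GiB")
print("64-bit address space (limit):", total_64, "GiB")
```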

Finally we have 64-bit long mode, where there is more than meets the eye. In addition to > 4GB memory addressability, in 64-bit long mode, applications have access to twice as many named general purpose registers. Remember that registers are basically high speed memory locations on the microprocessor where temporary values are stored. For example, if you were to compute the sum of two numbers, both of those numbers as well as the final result would be stored in these registers.

There is an immediate advantage if you're running over 4GB of RAM and a 64bit operating system (Linux or the Windows 64bit evaluation), and while not that many applications are coded to employ long mode, animation software will be among the first

Look what we found, an on-die memory controller
The benefits of an integrated memory controller are clear - low latency memory accesses and an extremely fast controller design thanks to the fact that it is manufactured using the latest processes using the fastest transistors.

(see chart)

You can see that the integrated memory controller of the Opteron is significantly lower latency than the nForce2's dual-channel DDR memory controller. It is also worth noting that the 875P memory controller is extremely low latency, especially for an external controller - but you have to keep in mind that we're comparing two different clock speed CPUs here when we're comparing to the Intel platform. While the platform may have a latency similar to that of the Opteron, the CPU is running at a much higher frequency meaning that more clock cycles are being wasted in the same amount of time:

(see chart)

The above graph shows the number of clock cycles wasted on waiting for data from main memory, here we see the clear advantage of having an on-die memory controller.

The downside to the on-die memory controller is that in order to get support for new memory technologies, you need to replace your CPU, not just your motherboard. AMD has built functionality into the K8 core that allows an external chipset to disable the on-die memory controller and use an external one. However, remember that a K8 without the integrated memory controller is basically like an optimized K7 with a longer pipeline.

Considering the need to employ ECC RAM, and the amount of RAM a typical workstation would employ, especially to take advantage of breaking the 2GB application limit above, it's unlikely that the typical animator would be "upgrading" RAM speeds on a regular basis; rather, older workstations would be added to a rendering farm\cluster and new ones would replace them

Multiprocessor Mecca
The culmination of all of this is that the K8 core (and thus the Opteron) scales very well with the number of CPUs you have in a system, much better so than any Intel processor.

Whereas the Xeon only sees an 11% increase in performance from going to two CPUs, the Opteron sees an impressive 24% performance boost! These are not numbers to scoff at; AMD has clearly designed the Opteron for serious multiprocessing environments. We hope to be able to bring you 4-way scaling benchmarks very soon.

Another interesting thing about the K8 architecture is that it has already been engineered for use in multicore designs. AMD's Fred Weber mentioned to us that the logic for multicore, single die Opteron processors has already been verified, although nothing has taped out. The process is actually quite simple; AMD produces two Opteron cores, removes the physical layers of the Hyper Transport links and connects the two on a single die.

and this leads us to the two greatest potential advantages
one, AMD has announced that dual cores are on the way and that they will employ the current Opteron 940 socket, so it's possible, and potentially likely, that a dual processor board bought now could be upgraded to a quad processing powerhouse in the not too distant future
two, while Intel has also announced its intent to release dual cores, they are much farther out than AMD's, and with the current architecture they are bottlenecked: Xeons don't scale well at all, unlike Opterons with their on die memory controller and HyperTransport, and it's unlikely that Intel's dual core will employ the current socket and architecture

the above excerpts are from the initial release of the Opteron
a more current article would be below, and describes why the L3 cache is so important to the Xeon

AMD Opteron vs. Intel Xeon: Database Performance Shootout
Future Xeon and Pentium 4 processors will ship with the x86-64 extensions enabled but architecturally they will be identical to the currently available Prescott based Pentium 4. The architectural similarity between Intel's IA-32e and IA-32 processors (IA-32e is Intel's marketing equivalent to AMD64) is an important point to note as it means that if Opteron is able to outperform Xeon in 32-bit mode, it will maintain a performance advantage in 64-bit mode as well.

FSB Impact on Performance: Intel's Achilles' Heel

We've alluded to FSB bandwidth being a fundamental limitation in Intel's multiprocessor architecture, and now we're here to address the issue a bit further.
A major downside to Intel's reliance on an external North Bridge is that it becomes very expensive to implement multiple high speed FSB interfaces as well as a difficult engineering problem to solve once you grow beyond 2-way configurations. Unfortunately Intel's solution isn't a very elegant one; regardless of whether you're running 1, 2 or 4 Xeon processors they all share the same 64-bit FSB connection to the North Bridge.

The following diagram should help illustrate the bottleneck
(see chart)

In the case of a 4-way Xeon MP system with a 400MHz FSB, each processor can be offered a maximum of 800MB/s of bandwidth to the North Bridge. If you try running a single processor Pentium 4 3.0GHz with a 400MHz FSB you'll note a significant performance decrease and that's while still giving the processor a full 3.2GB/s of FSB bandwidth; now if you cut that down to 800MB/s the performance of the processor would suffer tremendously.
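
The 3.2GB/s and 800MB/s figures in that paragraph fall straight out of the bus width and transfer rate. A quick Python sketch of the arithmetic, assuming the "400MHz" is the effective transfer rate (400 million transfers per second) and that the CPUs split the bus evenly in the worst case; the function name is just for illustration:

```python
def fsb_bandwidth_mb_s(bus_width_bits, mega_transfers_per_s):
    """Peak bus bandwidth in MB/s: bytes per transfer times transfers per second."""
    return (bus_width_bits // 8) * mega_transfers_per_s


total = fsb_bandwidth_mb_s(64, 400)  # shared 64-bit FSB at an effective 400MHz

for cpus in (1, 2, 4):
    # Worst case: every CPU is hitting the bus at once and gets an equal share.
    print(f"{cpus} CPU(s): {total / cpus:.0f} MB/s each out of {total} MB/s")
```

With four CPUs on the one bus, each one's share works out to the 800MB/s quoted above.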

It is because of this limitation that Intel must rely on larger on-die L3 caches to hide the FSB bottleneck; the more information that can be stored locally in the Xeon's on-die cache, the less frequently the Xeon must request for data to be sent over the heavily trafficked FSB.

What's even worse about this shared FSB is that the problem grows larger as you increase the number of CPUs and their clock speed. A 2-way Xeon system won't experience the negative effects of this FSB bottleneck as much as a 4-way Xeon MP; and a 4-way Xeon MP running at 3GHz will be hurting even more than a 4-way 2.0GHz Xeon MP. It's not a nice situation to be in, but there's nothing you can do to skirt the issue, which is where AMD's solution begins to appear to be much more appealing:

(see chart)

First remember that each Opteron has its own on-die North Bridge and memory controller, so there are no external chipsets to deal with. Each Opteron CPU features three point-to-point Hyper Transport links, delivering 3.2GB/s of bandwidth in each direction (6.4GB/s full duplex). The advantage is clear: as you scale the number of CPUs in an Opteron server there are no FSB bottlenecks to worry about. Scalability on the Opteron is king, which is the result of designing the platform first and foremost for enterprise level server applications.

Intel may be able to add 64-bit extensions to their Xeon MPs, but the performance bottlenecks that exist today will continue to plague the Xeon line until there's a fundamental architecture change.
 
This may benefit him, but does Photoshop support the /PAE and /3GB switches for Windows?
 
nope, the limitation is in Photoshop itself and they have no plans to increase its memory
RenderMan Pro Server for instance has been coded to employ memory greater than 2GB as a process, which can be useful in high res complex scenes; still not long coded to fully employ 64bit however.

now that XP 64bit is out and Intel is doing 64bit, things may speed up on the application front, but there are still very few (if any) long coded apps

hit that Lost Circuits article first ;)
 
Thanks Ice,

I will read up on those links. I had been researching solid state drives for scratch in this thread.
Someone else in that thread recommended Cenatek as well. Do you or anyone know if that Ramdisk software will work with XP 64 bit? Price wise, Ramdisk seems like the way to go in combination with motherboard memory.

I do intend on putting in 10K raptors in it. I had already discussed some of these problems with the client. Possibly the solution would be to work on the image in sections.

I forgot about the Photoshop 2gig limit. Now that you've reminded me of it, I seem to remember Spooge Demon being frustrated with his 4 gig dual Xeon.

Do you know of any problems or issues of PC3200 vs PC2700 at larger capacities? With that much memory, does the speed make that much difference?
 
Depends on the board I believe. I think the thunder k8w, once you pass the 4 gig barrier, will default down to pc2700 speeds from higher clocked ram.
 
defakto said:
Depends on the board I believe. I think the thunder k8w, once you pass the 4 gig barrier, will default down to pc2700 speeds from higher clocked ram.
This is what was attached to Tyan's latest BIOS update for the k8w:

Note: This BIOS follows AMD recommendations for DDR bus speed as a function of loading. As a result if you are running CG or older stepping CPU AND more than 6 loads DDR 400 memory the BIOS will automatically set the bus speed to DDR 333 speeds. For more information see the AMD BIOS and Kernel Developer's Guide (PID# 26094)
So you really only need to worry if you have a CG or older core.
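
Spelled out as a quick Python sketch, the rule in that BIOS note looks like this. It is only my reading of the note, not Tyan's or AMD's actual logic: the function name and parameters are made up, and what exactly counts as a "load" is defined in the AMD guide, so here it is just a number you pass in.

```python
def ddr_bus_speed(cg_or_older_stepping, ddr400_loads, rated_speed=400):
    """Return the DDR speed the BIOS would pick, per the release note above.

    cg_or_older_stepping -- True if the CPU is a CG or older stepping
    ddr400_loads         -- number of DDR400 loads (as the AMD guide counts them)
    rated_speed          -- what the installed memory is rated for
    """
    if rated_speed == 400 and cg_or_older_stepping and ddr400_loads > 6:
        return 333  # BIOS drops to DDR333 (PC2700) speeds
    return rated_speed


print(ddr_bus_speed(True, 8))    # older core, heavily loaded
print(ddr_bus_speed(True, 4))    # older core, lightly loaded
print(ddr_bus_speed(False, 8))   # newer stepping, heavily loaded
```

Both conditions have to be true at once, so a newer stepping keeps DDR400 however many loads are installed.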
 
I'm looking at 252's. I'm guessing that will be okay with PC3200.

Thanks. When I figure out everything that's going into this rig, I will post it for feedback.
 
Got a response from Cenatek

Hi, Sorry, we don't currently support XP64 in any of our products. -Support

So it looks like I will need to figure something else out with regard to a scratch disk.
 
http://www.superspeed.com/servers/ramdisk.php

Operating Systems:

Windows Server 2003 - all editions, 32- and 64-bit (x64 and Itanium-based)

Processor (CPU) Support:

300 MHz processor or higher

All Intel and AMD Pentium-class platforms

Intel 64-bit: Itanium, Itanium 2, EM64T processor families

AMD 64-bit: Athlon 64, Opteron

All SMP configurations of the above that are supported by Windows™

 
Ice,

You rock dude. However, what are the disadvantages to using Win2k3 server versus xp64 bit?

It's unfortunate that the Cenatek software isn't 64 bit. The price difference between Cenatek and Superspeed is huge.

Also, I'm considering this memory.

thanks
 