New ESXi Build with Questions & Project Log

nuclearsnake
Limp Gawd
Joined: Mar 8, 2003
Messages: 445
Hi everyone,

It's been a while since I've had a big project to document, and seeing this new Virtualized Computing section made me feel like starting a new log.

The Goal
  • Migrate my Win2003 Servers to VMs
  • Migrate my Smoothwall to a VM
  • Build a testing network for Server 2008/Vista/Exchange 2007/etc.
  • Have enough free space to last a while, and be able to add to it if needed

My main Win2003 server is an old Dell PowerEdge 830 with 3x250GB (no RAID) drives and one 500GB that is out of space. It hosts user backups, along with being my central repository for music, etc.

My other Win2003 box is an old Compaq 4500, a 7U quad Xeon 400 from 1998 that needs to be shot, it's so loud.

Other Win2003 servers do DNS/DHCP/etc.

I was planning on putting the following together as my ESXi box:

  • Supermicro X7DCA-L-O (Dual 771)
  • 2 x Intel E5410
  • 6 x 2GB (KVR667D2D8P5/2G)
  • Adaptec 5805 Storage Controller
  • Adaptec Battery for the 5805
  • Supermicro Case CSE-742T-650B
  • 6 x WD WD1001FALS (4.65TB after formatting)

From my reading, the major issue everyone comes up with is disk performance. The 6 drives will be in a RAID5 array connected to the 5805. I don't want to put them in a RAID 0 array as the data is important, and SAS drives are out of the question in terms of cost. Will this work? I know the Adaptec 5805 is on the HCL, but will the performance be enough?

Also, is it wise to virtualize a Win2003 that will be acting as a file server with ~5TB of storage?
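
(For anyone wondering where figures like that "4.65TB after formatting" come from, here's the back-of-the-envelope math as a rough Python sketch. The per-drive byte count is WD's published figure for the WD1001FALS; the rest is just the RAID5 parity hit and the TB-vs-TiB conversion, so treat the result as approximate.)

Code:
# Rough RAID5 capacity math for 6 x 1TB drives (WD1001FALS, ~1,000,204,886,016 bytes each).
DRIVE_BYTES = 1_000_204_886_016
N_DRIVES = 6

usable_bytes = (N_DRIVES - 1) * DRIVE_BYTES   # RAID5 gives up one drive's worth to parity

TIB = 1024 ** 4
print(f"Usable: {usable_bytes / 1e12:.2f} TB (decimal, as on the drive label)")
print(f"Usable: {usable_bytes / TIB:.2f} TiB (what the OS reports, i.e. the ~4.6'TB' figure)")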


Props to agrikk for the thread titled "Your home ESX server lab hardware specs?" that I pulled a bunch of info from, and to sabregen for getting me turned onto the Supermicro X7DCA-L
 
I would consider an SSD from Intel; they are getting cheaper and would offer a lot more performance. From what I have seen, put all the OSes on the SSDs and then make a larger array for all your storage with a NAS or something.
 
You could go all out and build 2 boxes: 1 being just your storage box and 1 being your ESXi box. Your storage box wouldn't need much in terms of CPU/memory. Not sure what's in your PE830, but you might be able to salvage most of it and just put it in a new case. This would be the [H]ard way to do it anyway.

I built a storage-only box that's running Server 2003 and an iSCSI target application to give my ESX box storage. Spec-wise, it's very minimal (2.33GHz dual core w/ 4GB, which is overkill for it anyway). All the power is in my ESX box. And with the iSCSI target app I have, the storage for the ESX VMs is just a big file on my storage server.
 
No problem with the file server - I see a lot more than that. However, you won't easily get 5TB of storage on local drives. VMFS supports only a max of 2TB, and you do NOT want to use extents. This means you'll HAVE to have a storage server, and either use NFS w/ vmdk, or an RDM to be the storage volume. Disk access will be quick enough.

Do not use the Microsoft iSCSI target either - it's not really reliable and doesn't fully match the iSCSI spec (odd, since their initiator is so good), and is definitely NOT supported or recommended by VMware. Use Openfiler if you need an iSCSI target; 2.3 is fully compliant with VPD pages 80/83/86. Pretty fast too, for what it is. It was even briefly supported for some versions of ESX. Also avoid FreeNAS and ESPECIALLY the FreeBSD/NetBSD target (for some reason, they don't do Page 83 right, IIRC) - access will randomly break if you add a second server. Openfiler will also outperform MS's target and take less overhead. MS makes a great initiator for use with VCB though.

Last fair warning - 1TB drives might be a bit risky. Under load, rebuilding a 5TB RAID5 could take a while - long enough that you might lose another drive. Just fair warning - I've seen it happen.
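
(To put a rough number on that warning, here's a quick Python sketch. The URE rate of ~1 error per 10^14 bits and the 50MB/s rebuild rate are generic published/assumed figures for consumer SATA, not measurements of these drives.)

Code:
# Rough RAID5 rebuild exposure for a 6 x 1TB array.
# Assumptions: ~1e-14 unrecoverable read errors per bit (typical consumer SATA spec)
# and ~50 MB/s sustained rebuild rate; a loaded array will be slower than this.
DRIVE_BYTES = 1_000_204_886_016
N_DRIVES = 6
URE_PER_BIT = 1e-14
REBUILD_MBPS = 50.0

# Rebuilding one failed drive means reading every surviving drive end to end.
bytes_to_read = (N_DRIVES - 1) * DRIVE_BYTES
bits_to_read = bytes_to_read * 8

# Probability of hitting at least one unrecoverable read error during the rebuild.
p_ure = 1.0 - (1.0 - URE_PER_BIT) ** bits_to_read

hours_per_drive_pass = DRIVE_BYTES / (REBUILD_MBPS * 1e6) / 3600

print(f"Time to read one drive at {REBUILD_MBPS:.0f} MB/s: ~{hours_per_drive_pass:.1f} hours")
print(f"Chance of hitting a URE somewhere during the rebuild: ~{p_ure * 100:.0f}%")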

Plan looks solid!
 
So it looks like there is no way around the storage server. Hmm... that means I'll need 2 more small disks for the ESXi box and a new case for it. The Adaptec card will go into that PE830, as I lucked out and it has a PCIe 8x slot, and I'll probably need a new NIC for it too and a touch more RAM...

I'll have to crack open that PE830 to see what the mobo size is and if the power to the mobo is proprietary, as its current PSU won't handle six 1TB drives.

As for the 1TB drives, you do raise a good point that I've seen happen before too. I'll just order one drive every two weeks :cool::D

The specs of the PE830 are:
P4D 3.00 with 1GB RAM, running Win2003 Std
Broadcom integrated Nic
 
Wait, can't I build three VMFS volumes (2TB, 2TB, 1TB) and make one virtual machine on one of the 2TB volumes see the full 2TB vmdk file? :confused:

Or am I totally confused?
 
Wait, can't I build three VMFS volumes (2TB, 2TB, 1TB) and make one virtual machine on one of the 2TB volumes see the full 2TB vmdk file? :confused:

Or am I totally confused?

I'm not sure I follow you - I don't see any problem with what you just said.

VMFS is one volume per LUN or one partition per volume - it doesn't like being anything different, and I CANNOT recommend anything different. You also want 5% free space per volume minimum.
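
(A tiny sketch of those rules of thumb, if it helps when planning the carve-up. Python; the example volume sizes and usage numbers are made up.)

Code:
# Check a proposed set of VMFS volumes against the rules of thumb in this thread:
# no volume over 2TB (no extents), and keep at least 5% free per volume.
TWO_TB = 2 * 1024**4          # 2 TiB, the VMFS volume cap without extents
MIN_FREE_FRACTION = 0.05

# (volume size in bytes, planned vmdk usage in bytes) -- example numbers only
volumes = [
    (1.9 * 1024**4, 1.6 * 1024**4),
    (1.9 * 1024**4, 1.8 * 1024**4),
    (0.7 * 1024**4, 0.5 * 1024**4),
]

for i, (size, used) in enumerate(volumes, 1):
    ok_size = size <= TWO_TB
    ok_free = (size - used) / size >= MIN_FREE_FRACTION
    print(f"volume {i}: under 2TB={ok_size}, >=5% free={ok_free}")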
 
From my reading, the major issue everyone comes up with is disk performance. The 6 drives will be in a RAID5 array connected to the 5805. I don't want to put them in a RAID 0 array as the data is important, and SAS drives are out of the question in terms of cost. Will this work? I know the Adaptec 5805 is on the HCL, but will the performance be enough?

I'll probably be in the minority with this recommendation, and it certainly depends on how much storage you actually need (many users seem to way overprovision), but RAID50 may be a solution that would work for you. Speed and redundancy at essentially double the $/GB.

Your case has enough room for 8 HDDs that your controller supports, so at the most you'd end up with 3TB of usable space. Alternatively you can stick to 7 HDDs, your original RAID5 setup, and have the 7th HDD in the hotswap cage designated as a hot spare.

The problem I see is that RAID5 is slow to write under load. If your VMs are write intensive, then your disk I/O bottleneck will affect all VMs.

There really isn't a good (read: cheap) solution to this, especially if you plan to stick with 6 drives only. Depending on how much storage you have available elsewhere, you can always start out with your RAID5, and then if there is an actual performance problem, reconfigure the array if you have a place to make the data available while you monkey with the reconfiguration.

In the end it all depends on how much I/O load you really have. Most everything is just theorycraft till it's put to the test. ;)
 
I was thinking the RAID 0+1 option, but I don't like the drop in disk space. Now, adding a 7th and 8th drive will give me 4TB before funky counting dwindles that down to ~3.6TB, but that will fill up my Storage controller, and leave me with no room to grow, which is the problem I am facing now.
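
(Quick sanity check on that math, as a throwaway Python sketch; it assumes 1TB per drive and ignores formatting overhead.)

Code:
# Usable space for the layouts being discussed, assuming 1TB drives.
def usable_tb(level, n_drives):
    if level == "RAID5":
        return n_drives - 1        # one drive's worth of parity
    if level == "RAID 0+1":
        return n_drives // 2       # everything is mirrored
    raise ValueError(level)

for level, n in [("RAID5", 6), ("RAID5", 8), ("RAID 0+1", 8)]:
    print(f"{level} with {n} x 1TB: ~{usable_tb(level, n)} TB raw, less after formatting")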

Why is ESX all about tradeoffs?

How about this as a new idea. As I don't like the thought of having to build a "new" storage server at the present time (mostly a cost thing), can't I operate with my 4.5TB split up within the limits of VMFS, then when I get more time/money build a proper storage server, and move the VMs that need more space over to the storage server?
 
Why is ESX all about tradeoffs?

It's really not.
While CPU and memory are hardware which easily supports virtualization and has historically often been underutilized in existing physical servers, consumer grade storage hardware is simply not the right tool for production server virtualization.

Of course it all depends on actual usage. Is your server making money? If so, then allocating the proper hardware (and software) shouldn't be much of an issue. If your server is just a home lab setup you monkey around with for shits and giggles, then it's likely that RAID5 write performance won't be an issue as you don't put any load on that machine anyway.

You can run it at home with consumer hardware without any problems.
You can run it as a production (business) box with consumer hardware up to a point without any problems as well.
What you can't do is expect enterprise performance from consumer hardware (or at consumer hardware pricing, as the actual hardware between consumer and enterprise is in many cases not that different anymore), and that really has nothing to do with ESX at all.


How about this as a new idea. As I don't like the thought of having to build a "new" storage server at the present time (mostly a cost thing), can't I operate with my 4.5TB split up within the limits of VMFS, then when I get more time/money build a proper storage server, and move the VMs that need more space over to the storage server?

Yes.
 
I was thinking the RAID 0+1 option, but I don't like the drop in disk space. Now, adding a 7th and 8th drive will give me 4TB before funky counting dwindles that down to ~3.6TB, but that will fill up my Storage controller, and leave me with no room to grow, which is the problem I am facing now.

Why is ESX all about tradeoffs?

How about this as a new idea. As I don't like the thought of having to build a "new" storage server at the present time (mostly a cost thing), can't I operate with my 4.5TB split up within the limits of VMFS, then when I get more time/money build a proper storage server, and move the VMs that need more space over to the storage server?

It's not about tradeoffs. It's designed for enterprise level hardware that most of us don't get to use at home. ;) No offense, but your build is on the bottom end for most ESX installs.

You wouldn't be worrying about this if you had a DMX - you'd just pick a size for your LUN and go with it. Even a CLARiiON or NetApp...

Theoretically, yes, you can move things as needed. Migrate for the win!
 
It's really not.
While CPU and memory are hardware which easily supports virtualization and has historically often been underutilized in existing physical servers, consumer grade storage hardware is simply not the right tool for production server virtualization.

Of course it all depends on actual usage. Is your server making money? If so, then allocating the proper hardware (and software) shouldn't be much of an issue. If your server is just a home lab setup you monkey around with for shits and giggles, then it's likely that RAID5 write performance won't be an issue as you don't put any load on that machine anyway.

You can run it at home with consumer hardware without any problems.
You can run it as a production (business) box with consumer hardware up to a point without any problems as well.
What you can't do is expect enterprise performance from consumer hardware (or at consumer hardware pricing, as the actual hardware between consumer and enterprise is in many cases not that different anymore), and that really has nothing to do with ESX at all.




Yes.

EXACTLY

Realize this: ESX, with full options, starts at 3-5k for license and support (and support is REQUIRED) for production environments. This isn't Windows, or even Hyper-V. This is a datacenter product :) We just like playing with it at home!
 
I've been doing some more reading, and planning and have decided to change a few things around from my initial build.

I would like to take back my statement that ESX is all about tradeoffs. That's not quite what I meant. Doing ESX on the cheap is about tradeoffs in regard to when you will get everything together in the optimal setup.

First, Thuleman, thanks for the reminder about the 2TB limit of VMFS. As I do not currently have the cash to build a storage server, I will be altering my plans to incorporate this into the next revision by building my current setup around plans to have a storage server in the future.

This means that I'm going to change my case from a full tower (that can double as a 4U Rack) to a 2U Rackmount - namely the SUPERMICRO CSE-825TQ-560LPB. It comes with a 560W PSU, with the ability to add a second in the future (more on this to come).

The other problem I wanted to address with this update was the potential performance problem with using 1TB 7200RPM drives in RAID 5 for guest OSes that do a large amount of writes. For the initial build, I will keep my 6x1TB in RAID 5, but if performance is not as I expect, then the storage server will have these drives moved into it, and a set (6-8) of 36GB 10K or 15K SAS drives will be put into the ESX box for fast local storage that will not reach the 2TB cap. I'll probably do RAID 0+1 on that. VMs will be moved around as needed. Might even be able to get 73GB drives at that point...

If/when I set the ESX box up with the SAS drives, I'll add that second PSU, as the PSU calculators show that with all other hardware being equal and 6x7200RPM SATA drives in the system, the draw is ~475W. Once I swap out the 6xSATA in favor of 8x15K SAS, the load will increase to ~575W.

I know that I haven't started buying the parts for the ESX box, but here's what I have planned for the Storage Server:
Same SUPERMICRO CSE-825TQ-560LPB Case
Dell PE830 Server Guts (Mobo, CPU)
4 x 1GB DDR2 ("unbuffered ECC memory modules in the four memory module sockets on the system board", per Dell's documentation)
Same Adaptec 5805 Storage Controller (if I like the 1st one)
Intel Dual Port NIC (PCIe), and one for the ESX box

I think this addresses all the problems posted/talked about so far

Thanks to axan for his worklog that helped me get some ideas together (Link)
 
If/when I set the ESX box up with the SAS drives, I'll add that second PSU, as the PSU calculators show that with all other hardware being equal and 6x7200RPM SATA drives in the system, the draw is ~475W.

That doesn't seem quite right.
The specs for the WD1001FALS say that the R/W dissipation is 8.4W per drive. Let's say you need triple that during spin-up (just for argument's sake); that's still just roughly 25W per drive, or roughly 150W total for the six drives. I am sure the controller supports staggered spin-up, so you wouldn't actually place that 150W load on the system anyway.
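
(Same math as a throwaway Python snippet, using the 8.4W figure from the spec sheet and an assumed 3x spin-up multiplier.)

Code:
# Rough drive power budget for 6 x WD1001FALS, using WD's published 8.4W R/W figure
# and an assumed 3x multiplier for spin-up (worst case: all drives spin up at once).
RW_WATTS = 8.4
SPINUP_MULTIPLIER = 3
N_DRIVES = 6

steady_state = RW_WATTS * N_DRIVES
worst_case_spinup = RW_WATTS * SPINUP_MULTIPLIER * N_DRIVES

print(f"Steady-state R/W draw: ~{steady_state:.0f} W")
print(f"Worst-case simultaneous spin-up: ~{worst_case_spinup:.0f} W")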

The PSU calculators I have seen online are totally worthless because they are A) totally outdated in terms of draw, or B) put out by retailers who would rather sell you a bigger PSU than you need just to be sure you can power the system.

In the end the PSU is perhaps the most important component in your system, but while it's good to buy a quality one, there's no need to massively oversize it, especially if you are confining yourself to a 2U case which can physically only hold so much.
 
Updates with Pics!

So parts have started to arrive. I'm still missing the RAM, heatsinks, CD drive, and 4 of the drives (I order my drives from different suppliers at different times to minimize the chance of getting a bad "batch" of drives).

Anyway everyone's here for pics!

Box shots! Hey wait, is that a Celeron?
sIMG_7823.jpg


Nope, just two second-hand E5420s
sIMG_7824.jpg


Adaptec 5805
sIMG_7823.jpg


sIMG_7826.jpg

sIMG_7827.jpg


I love the packaging of Supermicro parts. Check this out; It's a box in a box!
sIMG_7829.jpg

sIMG_7830.jpg


Rails, and a ton of mounting screws were in the little one;
sIMG_7833.jpg

sIMG_7834.jpg


sIMG_7837.jpg


sIMG_7838.jpg


Action shots of the case;
sIMG_7843.jpg

sIMG_7844.jpg

sIMG_7845.jpg


sIMG_7848.jpg


Because the question I see everyone asking about these cases is about the length of the power wires; here's a nice shot to show the different lengths
sIMG_7851.jpg


Needs moar editing;
sIMG_7855.jpg


Screws:
sIMG_7857.jpg


I really like the standoffs for this case;
sIMG_7861.jpg


sIMG_7858.jpg


Macro!
sIMG_7862.jpg

sIMG_7863.jpg

sIMG_7864.jpg

sIMG_7865.jpg


There are a lot of headers on the mobo;
sIMG_7869.jpg


Two shots to show how the non-hotswap bays work
sIMG_7871.jpg

sIMG_7872.jpg


Here's where I had a problem; the power LED connector is three pins wide, whereas the plug on the mobo has it set up for being two pins wide...
sIMG_7875.jpg

sIMG_7877.jpg


See:
sIMG_7879.jpg


So I had to pop the little pin out, and move her over
sIMG_7880.jpg

sIMG_7881.jpg

sIMG_7883.jpg
 
The problem I see is that RAID5 is slow to write under load. If your VMs are write intensive, then your disk I/O bottleneck will affect all VMs.

There really isn't a good (read: cheap) solution to this, especially if you plan to stick with 6 drives only. Depending on how much storage you have available elsewhere, you can always start out with your RAID5, and then if there is an actual performance problem, reconfigure the array if you have a place to make the data available while you monkey with the reconfiguration.

In the end it all depends on how much I/O load you really have. Most everything is just theorycraft till it's put to the test. ;)

Our main ESX server contains a file server VM that serves redirected home drives/desktops as well as departmental data and large image files to about 500 users. We use 8 x 750GB SATA ES.2-series Seagates on an Adaptec 5805 in RAID 6. Performance is excellent. We have no speed complaints and run 17 other VMs on the box at this point.

The days of RAID 5/6 being slow are gone, as long as you buy the right hardware. Adaptec's 5 series are incredibly fast.
 
Tell me about it. I'm installing my guest OSes on the server and it's flying. Granted, I'm installing off an ISO image I made.

Question: I'm not able to give all my space to VMFS in ESXi. I can see all the space, but I can only make a VMFS volume out of 598.99GB of it...
Does it have anything to do with the RAID stripe size?
 
Tell me about it. I'm installing my guest OSes on the server and it's flying. Granted, I'm installing off an ISO image I made.

Question: I'm not able to give all my space to VMFS in ESXi. I can see all the space, but I can only make a VMFS volume out of 598.99GB of it...
Does it have anything to do with the RAID stripe size?

No. Are they local drives? If so, you may have to adjust the partitions in the install - it's been a long time since I've done an ESXi installable install.

Also, if it's >2TB, you'll only get to use the stuff over 2TB. VMFS has a maximum 2TB volume size without extents.
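
(That wraparound is almost certainly where the 598.99GB number comes from: with a LUN bigger than 2TB, ESX 3.x only shows the capacity left over past the 2TB boundary. A rough Python illustration; the exact figure depends on controller metadata overhead, so it won't match to the GB.)

Code:
# Why a ~4.6TB RAID5 LUN can show up as only ~600GB in ESX 3.x:
# capacity over the 2TB limit wraps around, leaving roughly (total mod 2TiB) visible.
DRIVE_BYTES = 1_000_204_886_016          # published WD1001FALS capacity
array_bytes = 5 * DRIVE_BYTES            # 6-drive RAID5, ignoring controller metadata
TWO_TIB = 2 * 1024**4

visible = array_bytes % TWO_TIB
print(f"Array: ~{array_bytes / 1e12:.2f} TB")
print(f"Visible after wraparound: ~{visible / 1e9:.0f} GB")  # same ballpark as the 598.99GB reported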
 
I decided to reinstall with a larger stripe size (1024K as opposed to 256K) and now I see this...

I did a build/verify on the array when I made it the 2nd time on the 5805.

vmwareissue01.jpg
 
How big is the array?
Edit: OP says 4.6TB. Make this into 3 logical drives: 1.9TB, 1.9TB, and the rest. You cannot have RAID devices over 2TB with ESX. Extent them together then.
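
(The carve-up is just chopping off chunks that stay under 2TB until the array is used up; a trivial sketch. The 1.9TB chunk size is the suggestion above, not a hard rule.)

Code:
# Carve a 4.6TB array into logical drives that each stay under the 2TB VMFS limit,
# following the 1.9TB + 1.9TB + remainder suggestion above.
def carve(total_tb, chunk_tb=1.9):
    chunks = []
    remaining = total_tb
    while remaining > chunk_tb:
        chunks.append(chunk_tb)
        remaining -= chunk_tb
    chunks.append(round(remaining, 2))
    return chunks

print(carve(4.6))   # -> [1.9, 1.9, 0.8]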
 
Oh okay, so there have to be 3 RAID pools.

Why doesn't anyone say that?
Edit: never mind, just found a whitepaper. Doh
 
Tell me about it. I'm installing my guest OSes on the server and it's flying. Granted, I'm installing off an ISO image I made.

Question: I'm not able to give all my space to VMFS in ESXi. I can see all the space, but I can only make a VMFS volume out of 598.99GB of it...
Does it have anything to do with the RAID stripe size?

2TB LUN limit on VMFS. Technically, you can merge LUNs with extents, but it's considered a bad idea. Not sure why.

We have a hair over 4TB usable, so we have two 2TB data volumes plus a 90GB LUN for the ESX OS.
 
2TB LUN limit on VMFS. Technically, you can merge LUNs with extents, but it's considered a bad idea. Not sure why.

We have a hair over 4TB usable, so we have two 2TB data volumes plus a 90GB LUN for the ESX OS.

Extents are software RAID0-ish, at least when you think about reliability and usability, although you don't get the performance boost and things may not be striped across LUNs. I'm sure you can see why that's bad ;)

Multiple points of failure for a datastore, and losing one means you can lose both.
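
(A toy calculation of why that matters, in Python; the 2% per-LUN failure chance is purely an illustrative assumption.)

Code:
# Toy illustration of why extents hurt reliability: the datastore dies if ANY
# member LUN dies, so the failure probability compounds with each extent added.
P_LUN_FAILURE = 0.02   # illustrative annual per-LUN failure chance, not a real figure

for n_extents in (1, 2, 3):
    p_loss = 1 - (1 - P_LUN_FAILURE) ** n_extents
    print(f"{n_extents} LUN(s): ~{p_loss * 100:.1f}% chance of losing the whole datastore")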
 
Another thing to think about with the 3 partitions: you have a 3-way contention scenario for I/O on a single volume.
 
I think I'm starting to have disk I/O troubles. Every now and then all the VMs lock up (the screen still displays, but nothing's going on) and on the performance log of the ESX console everything flatlines.

Looking at the case, I see all the drives' access LEDs lit, but nothing's going on... Not quite sure what to make of it, as all that was going on at the time was an install of SP2 on my 2003 box...

Anyone have any ideas?
 
Another thing to think about with the 3 partitions: you have a 3-way contention scenario for I/O on a single volume.

Yeah, that was bugging me, so I might build two RAID 5 volumes and add some more disks in order to balance out the load.
 
I think I'm starting to have disk I/O troubles. Every now and then all the VMs lock up (the screen still displays, but nothing's going on) and on the performance log of the ESX console everything flatlines.

Looking at the case, I see all the drives' access LEDs lit, but nothing's going on... Not quite sure what to make of it, as all that was going on at the time was an install of SP2 on my 2003 box...

Anyone have any ideas?

Pull the logs. file->export diagnostic data. Upload somewhere and pm me the link, I'll take a look.
 
I think I'm starting to have disk I/O troubles. Every now and then all the VMs lock up (the screen still displays, but nothing's going on) and on the performance log of the ESX console everything flatlines.

Looking at the case, I see all the drives' access LEDs lit, but nothing's going on... Not quite sure what to make of it, as all that was going on at the time was an install of SP2 on my 2003 box...

Anyone have any ideas?

Are you running current firmware on the 5805?
 
I think I'm starting to have disk I/O troubles. Every now and then all the VMs lock up (the screen still displays, but nothing's going on) and on the performance log of the ESX console everything flatlines.

Looking at the case, I see all the drives' access LEDs lit, but nothing's going on... Not quite sure what to make of it, as all that was going on at the time was an install of SP2 on my 2003 box...

Anyone have any ideas?

If you want the best disk performance, it is recommended that you use independent disks as opposed to a RAID volume.
You are bound to have I/O issues with many VMs on a single RAID volume.
Remember, most disk activities are going to be small reads and writes.
With many VMs, this would lead to a lot of I/O requests.

This has been discussed so many times.
Running your VMs on a RAID volume (other than RAID 1) is just not recommended.
People always get caught up with disk transfer rate throughput and forget about I/O requests scaling, when that's what matters in this scenario.
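
(To make the I/O-request point concrete, here's a rough Python sketch using the usual rules of thumb - roughly 80 random IOPS per 7200RPM spindle and a RAID5 write penalty of 4. These are assumptions, not measurements of this box.)

Code:
# Rough small-block IOPS budget for a 6-spindle RAID5 of 7200RPM SATA drives.
SPINDLES = 6
IOPS_PER_SPINDLE = 80          # rule of thumb for 7200RPM SATA random I/O
RAID5_WRITE_PENALTY = 4        # back-end I/Os per front-end write (read data+parity, write data+parity)

backend_iops = SPINDLES * IOPS_PER_SPINDLE

def frontend_iops(read_fraction):
    """Front-end IOPS the array can sustain at a given read/write mix."""
    write_fraction = 1 - read_fraction
    return backend_iops / (read_fraction + write_fraction * RAID5_WRITE_PENALTY)

for reads in (1.0, 0.7, 0.5):
    print(f"{int(reads * 100)}% reads: ~{frontend_iops(reads):.0f} IOPS for all VMs combined")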

I had a lot of fun playing with ESXi 3.5 update 3 yesterday on both my T60 laptop and the server in my sig.
I was quite surprised to find the performance to be really good using a NFS datastore.
The NFS server was running in a Ubuntu VM on another laptop (a Dell 9300).

I am a bit disappointed that ESXi does not support using raw local disks.
If RDM is well supported for external storage, I don't see why raw local disks can't be well supported as well.
Having to build a separate storage server for datastore use kind of defeats the point of consolidation unless you really need shared storage.
 
If you want the best disk performance, it is recommended that you use independent disks as opposed to a RAID volume.
You are bound to have I/O issues with many VMs on a single RAID volume.
Remember, most disk activities are going to be small reads and writes.
With many VMs, this would lead to a lot of I/O requests.

This has been discussed so many times.
Running your VMs on a RAID volume (other than RAID 1) is just not recommended.
People always get caught up with disk transfer rate throughput and forget about I/O requests scaling, when that's what matters in this scenario.

I had a lot of fun playing with ESXi 3.5 update 3 yesterday on both my T60 laptop and the server in my sig.
I was quite surprised to find the performance to be really good using a NFS datastore.
The NFS server was running in a Ubuntu VM on another laptop (a Dell 9300).

I am a bit disappointed that ESXi does not support using raw local disks.
If RDM is well supported for external storage, I don't see why raw local disks can't be well supported as well.
Having to build a separate storage server for datastore use kind of defeats the point of consolidation unless you really need shared storage.

It has to do with how we do SCSI reservations. And remember, this is an enterprise product. Few people using ESX don't have a storage array of some kind. Most people aren't using RDMs either.
 
It has to do with how we do SCSI reservations. And remember, this is an enterprise product. Few people using ESX don't have a storage array of some kind. Most people aren't using RDMs either.

With free ESXi, this is no longer true (meaning the target audience has been scaled back a bit).
By not allowing raw local disk, VMware is pretty much saying it does not support the virtualization of large storage arrays, which sucks.
 
With free ESXi, this is no longer true (meaning the target audience has been scaled back a bit).
By not allowing raw local disk, VMware is pretty much saying it does not support the virtualization of large storage arrays, which sucks.

Except ESXi was originally released to enterprise customers, and then made free - it was never designed for small customers in the first place.

And sure they support virtualization of local disk - you just have to use something like LeftHand's VSA appliance - I'm pretty sure there's a demo for it even, although it requires ESX Classic, IIRC. VMware is not a storage vendor, it's a hypervisor vendor.

What do you call large? To me, large is 3 clustered DMX-4s with 2.5PB of storage on 8Gb fibre or a 10GbE front end... :) No local storage will ever touch that. THAT'S what ESX was designed for.

Why would you want to do a raw device to a local disk anyway? What's the point? You gain maybe a percentage point of performance, but that's about it... You can't split it up among multiple VMs, and only a single one has access at a time (you can't MSCS around local disk - the SCSI-3 reservation would lock ESX up totally since it's a device-level reservation)... It just makes no sense at all for normal use.
 
This has been discussed so many times.
Running your VMs on a RAID volume (other than RAID 1) is just not recommended.
This may hold true for your home-grown RAID setup, but in an Enterprise environment where VMs run off of SANs, those are all RAIDed.
 
Pretty sure you aren't running into disk contention issues with the random lockups, as that's typically not how ESX exhibits high I/O lag. If I were to guess, it's an issue with a faulty drive, RAID controller, or cable. Did you enable TLER on those drives so the RAID controller handles error correction and you don't get hangups from the drives' own deep error recovery?
 