OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Hi _Gea,

Can I make a suggestion? I think it would be very helpful to be able to make selections with check boxes. I've run into a few instances now where I've had to delete all my snaps and jobs. It's quite time consuming having to delete hundreds of snaps and jobs one at a time.

Thanks and continue the great work!

Riley


Hello Riley, you can mass-delete snaps based on date or on any characters in the name.
Each job is a pair of files in the job folder, so it's no problem to delete them directly.
 
I have an odd SMB/Permissions issue I can't seem to solve, here goes:


You need to know that CIFS/SMB is ACL-only, while on the console you deal with simple Unix permissions.

It should work with SMB and ACL inheritance when:
the aclmode and aclinherit ZFS properties are set to passthrough (can be modified with napp-it and the newest ACL extension)
the ACL of the shared Video folder is set to the desired permissions with file and folder inheritance enabled

Included files and folders should now inherit the ACL from the parent folder.
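
For reference, the same thing can be done from the CLI; a minimal sketch, assuming the shared filesystem is named tank/Video (substitute your own pool/filesystem):

Code:
# let ZFS pass ACLs through unchanged on chmod and on inheritance
zfs set aclmode=passthrough tank/Video
zfs set aclinherit=passthrough tank/Video

# verify
zfs get aclmode,aclinherit tank/Video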
 

Thanks GEA for the fast reply, but can you please clarify?

the aclmode and aclinherit ZFS properties are set to passthrough (can be modified with napp-it and the newest ACL extension)

I have v0.7g and I can't seem to find this option?

the ACL of the shared Video folder is set to the desired permissions with file and folder inheritance enabled
When viewing the ACL on SMB in the ACL extension tab, I'm not able to change the "inheritance" value. Please advise.

Thanks again GEA, much appreciated.
 

Use 0.7i or set the ZFS property via CLI:
http://www.napp-it.org/downloads/changelog_en.html

Optionally delete/reset the ACL and
set a new trivial or user ACL; this is free with the extension.
 
Much obliged, that did the trick :)

P.S. Any way of resetting ALL ACLs in ALL folders, or do I have to manually go through every single one?

For inherited ACLs it is enough to modify the parent ACL;
otherwise try chmod -R ..

Even a chmod -R 700 /folder resets the ACL to match this permission.
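
As a concrete sketch of the recursive reset (assuming the share sits at /tank/Video and using the Solaris /usr/bin/chmod, not GNU chmod):

Code:
# reset everything below the folder to a trivial ACL (here: 700)
/usr/bin/chmod -R 700 /tank/Video

# or set an explicit inheritable ACL in one pass
# ("fd" = inherit to files and directories)
/usr/bin/chmod -R A=owner@:full_set:fd:allow,group@:read_set:fd:allow /tank/Video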
 
I have 8 WD 5000AAKS drives, and only get about 35 MB/s write speed and 330 MB/s read speed.

I have 3 IBM M1015s flashed to IT mode.

I'm running OI 151, ESXi 5.0, 8 GB RAM for the VM and two vCPUs.

I have three 64 GB Kingston V100 SSDs that I can use for a write cache.

Would this improve my write speed greatly?

I tried it with 4 older Hitachi drives and it didn't make much improvement.

*edit: this is for VM storage - holding about 25 VMs with about a dozen on at any one time. Nothing super I/O intensive.
 
I think you are getting bit by the NFS sync-mode writes ESXi uses. To confirm, do:

zfs set sync=disabled

on the pool and re-test?
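
A minimal test sequence, assuming a hypothetical pool named vmpool - the property is easy to restore afterwards:

Code:
# check the current setting first (default is "standard")
zfs get sync vmpool

# disable sync writes for the benchmark only
zfs set sync=disabled vmpool

# ... run the test, then restore the default
zfs set sync=standard vmpool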
 

Don't mean to derail, but don't you run the risk of corruption doing this? The main reason I'm pursuing ZFS is for the corruption-fighting characteristics.
 

Yes. I think he means just to test if that is the performance bottleneck.

What config are the 8 drives?
 
I have 8 WD 5000AAKS drives, and only get about 35 MB/s write speed and 330 MB/s read speed.

I have three 64 GB Kingston V100 SSDs that I can use for a write cache.

Would this improve my write speed greatly?
Yeah, as suggested I would try with sync writes turned off, but leaving it off means VMs, and especially any databases on VMs, risk getting corrupted data if there is a power cut or hardware lockup, as the last writes will only be in RAM. Try it, and if it makes a noticeable difference to your VMs' performance, then think about adding a dedicated cache/log device or two. I would not use the Kingston V100 for this though, as they don't have supercaps and could lose data in a power cut, which is as bad as turning the write cache off anyway! Also check their write endurance and compare it to your expected write load. There are many suggestions for good ZFS write log devices if you search the net. The really good ones are expensive though.

Another option is to create a second NFS share and turn off sync writes only for that one (napp-it has an option for this per NFS share in the web GUI). You can then allocate high-write-speed storage to some VMs that don't need perfect consistency but need high write performance.
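
If you prefer the CLI, the equivalent is roughly a separate filesystem shared with sync disabled; a sketch, assuming a pool named tank and a hypothetical filesystem vm-fast:

Code:
# a separate filesystem that trades sync safety for write speed
zfs create tank/vm-fast
zfs set sync=disabled tank/vm-fast
zfs set sharenfs=on tank/vm-fast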

One interesting option I've been looking at is whether you can make a lower-cost SSD, which is not suitable for ZFS log use, suitable by protecting it from power loss. Currently only a small selection of drives have supercaps etc., and many cost a lot or have much lower write performance. Some high-speed modern SF-22xx-controller SSDs, which don't come with power-loss data protection by default, could be used this way. Because they are not as reliable as some drives you may want two in a mirror, and the simplest method is to wire up a cheap 5V DC power adapter to a SATA power adapter cable and plug it into a dedicated small UPS. If the main system loses power or crashes and powers down, the drive will stay alive and commit its last writes safely. The only problem is that if you trip over the external power adapter you cut the power to the drives, and the ZFS pool goes into slow write mode when it detects the log devices going down. To stop this, two simple diodes could be added to feed in 5V from the computer's power supply as well as the external one, so it will stay live. But then you're in an unsafe situation if the adapter power dies, so you would ideally also need to add a buzzer circuit to alert you that the external power has failed.

Another option I designed is to use a large supercapacitor which you could add between the computer's power supply and the SSD, so that it has an extra 10 seconds of power, which is all it needs. But to make it safe and not hurt the drive it's a bit more complex, as it requires extra chips and circuitry to safely cut the power when the supercap voltage drops. Ideally I would make a 12V supercapacitor bank and then output this to a 5V regulator, with a circuit to turn off a relay when the capacitor output fell to around 6-7V. It would be great if someone could build and sell this kind of thing as a simple plug-in module.
 
Correct, I was proposing disabling sync mode for testing purposes only. On the other hand, if this is an all-in-one, I think a ZIL is much less useful for two reasons: 1) if you have a power failure and crash, everything is going to crash, including ESXi and the guests, and 2) since you can easily get 2-3 Gb/sec with internal network traffic (the SAN is virtualized as well as the guests and the hypervisor), your write performance will suck anyway unless you get a very expensive SSD for the ZIL (with 300 MB/sec or better write performance...)
 
If you were to run cheaper SSDs for the ZIL in an all-in-one that aren't backed by internal capacitors, is the biggest issue a complete power failure (pulling the plug), or do you have to worry about software issues forcing a hard shutdown as well? For example, say for some strange reason ESXi completely locks up and you need to hit the reset/power button on the machine: is there any additional risk of data loss with the cheaper SSDs? How about the same scenario without a ZIL and sync disabled?
 

my understanding:

All writes go to RAM and then, after a few seconds, to disk. If sync (confirmed) writes are requested, they go in parallel to a ZIL (even after years I have to remind myself: the ZIL is the one that logs writes in parallel - horrid naming for a simple behaviour) and are confirmed only after they have really been written to disk.

Without an SSD, this ZIL (other name is LOG; Apple, help with better naming) is located on the pool (as slow as the pool), but it can also be on
faster disks (cheap, fast, does not matter). If you define another disk, it will be used.

In case of a sudden power failure, the data in the ZIL may be lost when it is not on an SSD or battery-buffered RAM. This does not affect ZFS data consistency,
but it may be a problem for the file systems built on top. They may have optimized data in RAM for performance reasons; on a power loss their view of the data is inconsistent
(not from the ZFS view). Therefore some ESXi VMs may have problems after a power failure, even when on ZFS.

This can also affect VMs on an all-in-one if sync is set to off.
(Most VMs are quite safe against power failures, but don't count on it with non-transactional databases like MyISAM MySQL.)
 
I'm sure this is an easy fix, but I only tried doing this late last night and it didn't work as expected:

I had 2x 2TB disks in a non-RAID array; now I have a 3rd disk and wanted to switch to RAIDZ.

Copied all my data off to another machine, destroyed the pool and recreated it with all 3 disks; however, it seems to be maintaining the old dataset and I cannot get Samba shares for the new dataset.

Is there something simple I'm missing? I tried destroying the dataset first, but that didn't seem to make any difference; it just destroyed the pool as well, and once again when I re-added the pool the dataset was back again.

edit: sorted, was missing one step.
 
I got my Kill-A-Watt meter today. My server (ASUS-WS-8B, Pentium G620, Mtron 16GB SSD, 5x 2TB @5900 RPM hard disks, 16GB RAM) draws about 70-80 watts. The power consumption did not vary much during use, so it is very likely the disks are not spinning down. The server of course does not go to sleep, and manually suspending hangs the machine :(. I pay 25c/kWh, so it would cost about $13/month to run it 24x7.

In contrast, my i5-2500K + ASUS P67 + ASUS HD6870 + dual-port Intel Gigabit PCI card + 8GB RAM + 1 SSD + 1x 1TB @7200 RPM on W7P draws about 120 watts at idle but almost zero watts when it goes to sleep.

They are both running a 760W PC Power and Cooling Silencer (Seasonic, 80 Plus Silver). In retrospect I realize that the PSU is overkill, but I want some headroom for adding more disks.

I expect the server to be idle more than 95% of the time, so the ability to go to sleep is perhaps more critical than I thought it would be.

Is there any other Solaris distro which handles sleep better on these x86 architectures? The MB has the C206 chipset. Any other ideas on how to reduce power consumption for this machine?
 
Your disks are probably using 3-5 watts each while spinning, so if you can get those to idle that should provide some good savings. I'm expecting to have the same problem, and I wish there were a way for guest OSes to go to sleep and then, if all are asleep, have VMware trigger the entire server to sleep... but I'm guessing that isn't in the cards. You also have the problem of having to trigger things to wake up.
 

I am not virtualizing (yet); I am running OI directly.

Any clues on how to control disk-spin downs?

I have done the following: added 'setprop interval 24h' to /usr/lib/fm/fmd/plugins/disk-transport.conf and manually set the spindown times:

Code:
cpu-threshold 30s 
device-thresholds         /dev/dsk/c6d0     90s 
device-thresholds         /dev/dsk/c4t3d0     90s 
device-thresholds         /dev/dsk/c4t4d0     90s 
device-thresholds         /dev/dsk/c4t5d0     90s 
device-thresholds         /dev/dsk/c4t6d0     90s 
device-thresholds         /dev/dsk/c4t7d0     90s
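
For what it's worth, a minimal sketch of applying the change, assuming the device-thresholds lines above live in /etc/power.conf (the usual place on OI/Solaris):

Code:
# re-read /etc/power.conf and apply the new thresholds
pmconfig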


Also, in general, do servers ever go to sleep like desktops do? I do not have an IT background (engineering, yes) so I don't have any idea about how they typically operate.
 
I haven't run OI directly yet so I can't comment on spindown, although I've read about people getting it to work. Generally speaking I don't think servers go to sleep very often, although my home Windows Media Center PC sleeps every time it's not being watched or recording.

Also a question: is your hardware Sandy Bridge based? I thought I had read that the latest Solaris wasn't supporting that particularly well yet.
 
Hello all, I have been using napp-it (currently .6r) for some time on Solaris 11 and I had my first hard drive die. I wanted to share my experience and see if there was anything I did wrong or could improve. My first problem was attempting to identify the failed drive. I tried to use DD but that did not trigger the LEDs. What does this function depend on? Just wondering if there is anything I can do to get this to work.

Now I did figure it out simply by watching the activity and noticing the drive that was not active. Once I identified the drive I got a replacement and swapped out the drive. I expected that since I had autoreplace enabled it would do the replacement, however this didn't happen. I was able to do it manually through the web GUI, but I am wondering why this didn't work.

Once replaced, it was able to rebuild successfully. I ran the vendor tools and sure enough it had failed. However, the next day I got another alert about the new disk failing. This time the errors were only checksum errors. I brought it back online, cleared the errors, and triggered a scrub. Currently the scrub is running but has already found 12 checksum errors, although it says it is repairing them. It seems unlikely (though I know possible) that the new drive just happened to have issues as well. Does anyone have any recommendations to figure out what is going on?
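
One thing worth checking on the autoreplace question (a sketch, assuming the pool is named tank - substitute yours):

Code:
# confirm the property is really on for the pool
zpool get autoreplace tank

# enable it; note it generally only triggers for a new disk found
# in the same physical location as the old one
zpool set autoreplace=on tank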
 
Gea,

I am planning on an all-in-one build, and saw this in your latest documentation discussing different virtualization modes:

Code:
6.5 Virtualization within the OS-Kernel (KVM - Kernel based virtual machines, needs real stable systems best with isolation of functions like with zones)

Have you played with Joyent's KVM port, and do you consider it stable for use? Do you have any timeframe for switching napp-it to use this instead of ESXi? I'm wondering if I should wait instead of doing an ESXi build right now? Thanks.
 
.. I tried to use DD but that did not trigger the LEDs. What does this function depend on? Just wondering if there is anything I can do to get this to work.

Now I did figure it out simply by watching the activity and noticing the drive that was not active. ..

.. Currently the scrub is running but has already found 12 checksum errors, although it says it is repairing them. ..

DD just generates traffic. If your disk has failed, you cannot detect it via DD.
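
(The usual trick behind such a "blink" function is simply reading the raw device so its activity LED flashes - which, as noted, only works while the disk still answers. A sketch with a made-up device path:)

Code:
# read the disk to make its LED blink; replace the cXtYdZ id
# with the one shown in "zpool status" / "format"
dd if=/dev/rdsk/c4t3d0p0 of=/dev/null bs=1M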

For LSI 2008 based systems and SES enclosures, there are options to display the
physical slots (e.g. my napp-it monitor extension) - otherwise you should write down
the slot and the id/serial number of your disks and where they are.

A single checksum error is not a problem. A lot of them indicates a disk, enclosure, power, cabling or controller problem.
 
Gea,

I am planning on an all-in-one build, and saw this in your latest documentation discussing different virtualization modes:

Code:
6.5 Virtualization within the OS-Kernel (KVM - Kernel based virtual machines, needs real stable systems best with isolation of functions like with zones)

Have you played with Joyent's KVM port, and do you consider it stable for use? Do you have any timeframe for switching napp-it to use this instead of ESXi? I'm wondering if I should wait instead of doing an ESXi build right now? Thanks.

Not yet, to be honest. ESXi is very stable and very comfortable, but it adds an extra layer.
Unless KVM becomes easy to use (e.g. includes a web management tool) I will wait.
 
Can anyone suggest a bare-bones box so I don't have to spec all these cards and such? I'm looking at the SuperMicro 6026T-NTR+, which is even on the OI community HCL. But it only supports 6 SATA drives as-is. I don't know what card to add to it. I don't really understand why they make a case with 8 bays but without 8 ports...
So far I think I want a box that would do:

6 SATA drives in raidz2 + 1 or 2 spares in the hot-swap bays
2x 2.5" drives for the OS (mirrored)
Room to add an SSD or two for cache or ZIL if it becomes necessary

Would there still be room to add more NICs or some kind of Fibre Channel or InfiniBand?
 
Yes. I think he means just to test if that is the performance bottleneck.

What config are the 8 drives?


vm                          ONLINE  0  0  0
  mirror-0                  ONLINE  0  0  0
    c2t50014EE100A5D2EAd0   ONLINE  0  0  0  500.11 GB  WDC WD5000AAKS-6
    c2t50014EE156934189d0   ONLINE  0  0  0  500.11 GB  WDC WD5000AAKS-2
  mirror-1                  ONLINE  0  0  0
    c2t50014EE1AB6BA40Bd0   ONLINE  0  0  0  500.11 GB  WDC WD5000AAKS-6
    c2t50014EE1ADF6F144d0   ONLINE  0  0  0  500.11 GB  WDC WD5000AAKS-2
  mirror-2                  ONLINE  0  0  0
    c2t50014EE200C320FAd0   ONLINE  0  0  0  500.11 GB  WDC WD5000AAKS-0
    c2t50014EE201ABB7BCd0   ONLINE  0  0  0  500.11 GB  WDC WD5000AAKS-2
  mirror-3                  ONLINE  0  0  0
    c2t50014EE2560F4133d0   ONLINE  0  0  0  500.11 GB  WDC WD5000AAKS-7
    c2t50014EE25610C2F2d0   ONLINE  0  0  0  500.11 GB  WDC WD5000AAKS-7

I'll do some more testing and see what happens.

Edit:

I did zfs set sync=disabled on my pool; instead of 45 MB/s I'm getting 450 MB/s.

What's the best course of action here to keep these speeds and not have sync writes disabled?
 
Can anyone suggest a bare-bones box so I don't have to spec all these cards and such? I'm looking at the SuperMicro 6026T-NTR+, which is even on the OI community HCL. But it only supports 6 SATA drives as-is. I don't know what card to add to it. I don't really understand why they make a case with 8 bays but without 8 ports...
So far I think I want a box that would do:

6 SATA drives in raidz2 + 1 or 2 spares in the hot-swap bays
2x 2.5" drives for the OS (mirrored)
Room to add an SSD or two for cache or ZIL if it becomes necessary

Would there still be room to add more NICs or some kind of Fibre Channel or InfiniBand?
They do this just to save money and to let people choose what RAID/HBA card they want.

This board, like almost all boards, does not come with a proper dedicated SAS/SATA RAID/HBA controller and just has the essentially free ports that are included in the motherboard chipset:

6x SATA2 (3 Gbps) Ports via ICH10R Controller

This is what this board and most similar boards use, and these ports work fine in AHCI mode in OI etc. You have to be aware of the limitations of these internal ports though. These ones are only SATA 2, which may slow down the fastest SSDs a little, and they all share the limited bandwidth the ICH10R gets, so they are fine for normal hard drives, but stick 6 SSDs on them and they will limit things a bit. Also, SMART monitoring in napp-it currently does not support SATA ports like these and only works with SAS/RAID cards. Note as well that if you plan to do an all-in-one with ESXi installed first, then you either pass all 6 ports to an OI VM or you use these ports to boot ESXi; you can't do both, because of the way DirectPath has to pass the whole ICH10R controller to the VM. Also, if you boot ESXi from these ports you can't do any RAID, as these are software RAID only and not supported by ESXi, so you will have one boot disk, and if it dies you will be reinstalling ESXi from scratch and reinstalling OI and napp-it - but your ZFS pool with all your VMs will be fine and easy to import again.

Because of this you need to either skip ESXi and boot OI off two disks with 4 disks left for your main pool, or add extra SAS (or SATA) adapters. In that case they will have to be half-height cards as well, which makes it a little trickier to find one that works.

You can get cheap 2-4 port AHCI SATA cards and boot ESXi from these (check the web to see if the chipset on the card is supported by ESXi first though!), but you will not have RAID for your ESXi boot disks, as above. Or you can get an LSI 1064 based card that ESXi can do RAID 1 with. Either way you then have 6 ports to use for your main ZFS data disks.

But the best option is to find an OI-supported SAS HBA that is LSI 1068 or 2008 based and flash it to IT mode for the best support. Once you have one of these cards you can either boot OI directly off the onboard 6 ports (using 2 disks in a mirrored boot pool) and use the 8-port HBA for the main data drives, or do the same thing with ESXi booted off the onboard 6 ports (with no RAID mirroring :/ ).

For an ESXi all-in-one your best bet is to get a cheaper 4-port LSI 1064 card to boot ESXi with RAID and then a better 8-port card for the data disks, though you could get two 1064s and add the 6 onboard ports to the OI VM, giving you 10 ports.

Also work out where you are going to mount your 2x 2.5" boot drives. That case may have no good place to mount them, so you might need to use 3.5-to-2.5 inch adapters and lose 2 of the hot-swap bays for boot drives. You may be able to rig them somewhere random inside the case, but that's at your own risk.
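
Whichever route you take, it is easy to check what OI actually sees on the onboard or passed-through controller; a quick sketch using standard tools:

Code:
# list all disks OI can see, non-interactively
echo | format

# vendor/product/serial and error counters per device
iostat -En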
 
.. I did zfs set sync=disabled on my pool; instead of 45 MB/s I'm getting 450 MB/s.

What's the best course of action here to keep these speeds and not have sync writes disabled?

Dedicated ZIL log device(s) (size can be very small but MLC will last longer with larger capacity for wear levelling)

Depending on budget and use-case (home/business critical) from cheapest to most expensive:

MLC SSD
SLC SSD (Intel 311)
2x MLC SSD mirrored
2x SLC SSD (Intel 311) mirrored
RAM based 'SSD' such as Acard or STEC solutions
2x RAM based 'SSD' such as Acard or STEC solutions mirrored


Bear in mind writes will be limited to the write speed of the ZIL logging device.
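
Adding such a device later is straightforward; a minimal sketch, assuming the pool is named vm and using made-up device ids:

Code:
# add a mirrored slog/ZIL device to an existing pool
zpool add vm log mirror c3t0d0 c3t1d0

# confirm it shows up under "logs"
zpool status vm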
 
I have three Kingston V100 64GB MLC SSDs currently not in use.

I tried two of them mirrored and wasn't happy with their performance; it was only a slightly higher write speed.

I have a budget of about $600 left (maybe more depending on what it gets me).

Would 4 of the Intel 311 SSDs be a good move, or should I jump right to a RAM disk?

I have plenty of DDR2 I have access to.

 
I'm not sure if it's possible to use more than a single mirror as a ZIL log device. It would be pretty awesome if you could do 2x mirror vdevs striped (4, 6, 8 drives etc.).

The 311s have a write speed of around 110 MB/s and random writes top out around 70 MB/s - that's still plenty to saturate Gigabit for sequential, and obviously the random performance is also very good.
 
DD just generates traffic. If your disk has failed, you cannot detect it via DD.

For LSI 2008 based systems and SES enclosures, there are options to display the
physical slots (e.g. my napp-it monitor extension) - otherwise you should write down
the slot and the id/serial number of your disks and where they are.

A single checksum error is not a problem. A lot of them indicates a disk, enclosure, power, cabling or controller problem.

Gea, thanks for the response. I do have an LSI 2008 system, so I tried the extension and it was able to find the slots; however, the identification did not work. There are a number of pins on my backplane so I am thinking it supports it, but they didn't give me much in the way of a manual for it. :( I'll have to see if I can figure that out.

The scrub completed and repaired 88 checksum errors. It has been error-free since yesterday evening, so I guess I'll just keep an eye on it. Perhaps it was cleaning up after the failed disk?

Any idea on why the auto-replace didn't work? Thanks!
 
I am not virtualizing (yet); I am running OI directly.

Any clues on how to control disk-spin downs?

I have done the following: added 'setprop interval 24h' to /usr/lib/fm/fmd/plugins/disk-transport.conf and manually set the spindown times:

Code:
cpu-threshold 30s 
device-thresholds         /dev/dsk/c6d0     90s 
device-thresholds         /dev/dsk/c4t3d0     90s 
device-thresholds         /dev/dsk/c4t4d0     90s 
device-thresholds         /dev/dsk/c4t5d0     90s 
device-thresholds         /dev/dsk/c4t6d0     90s 
device-thresholds         /dev/dsk/c4t7d0     90s


Also, in general, do servers ever go to sleep like desktops do? I do not have an IT background (engineering, yes) so I don't have any idea about how they typically operate.

Try this:

http://www.nexenta.org/boards/1/topics/1414

worked for me

P.S. You might want to set the timer a bit higher. 90 seconds is very little; I would set them to 10-15 minutes. Remember that disks will wear out VERY quickly if they spin up/down often.
 
I have three Kingston V100 64GB MLC SSDs currently not in use.

I tried two of them mirrored and wasn't happy with their performance; it was only a slightly higher write speed.

I have a budget of about $600 left (maybe more depending on what it gets me).

Would 4 of the Intel 311 SSDs be a good move, or should I jump right to a RAM disk?

I have plenty of DDR2 I have access to.

Just watch out, as the Intel 311 is not really designed for log use and does not have a supercap, and it's unknown whether it will lose all uncommitted writes if there is a power cut or the host system has an unexpected power-down. Only a few higher-end drives have some form of power-loss protection like this. It's hard to find reliable info on all this online as everyone has their own 2 cents.

Also, you often can't just use some spare DDR2 you have lying around. The proper DRAM-based log devices normally come bundled with the RAM, and they are a special device made of the following components:

An interface/controller chip, either SATA/SAS or PCI Express
4 or more DIMM slots populated with DIMMs
A small SLC flash drive
Supercap/battery/external power adapter

When the power cuts, the RAM is backed up using the backup power to the attached flash, and when it powers up it copies back the other way.

I've been thinking that it would be useful if someone actually tested SSDs to see what happens in a power-failure situation, to see how they handle it in the real world. If I had the devices and the time I would set up the following test steps:

1. Secure-erase or blank with zeroes a known-safe drive and the drive to be tested
2. Add these two devices to a pool as mirrored log devices
3. Create a high-throughput write load and yank the power out
4. Don't boot back into your Solaris-based OS; instead boot a Linux live CD, maybe
5. DD-copy the first few gigabytes of data from both raw drives to files
6. Compare these files to see if the known-safe drive with the supercap wrote different data than the test drive

In theory the drives should be nearly identical except for the ZFS identifying drive headers. Note you would expect a very small possible difference, as the last sector written may not be in perfect parallel, but this will only be the final bit that hasn't even been confirmed back to the originating sync-write initiator.
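
A rough sketch of steps 5 and 6 from a Linux live CD, assuming the two drives show up as /dev/sda (known-safe) and /dev/sdb (test device):

Code:
# copy the first 4 GiB of each raw drive to a file
# (write somewhere with enough space)
dd if=/dev/sda of=/tmp/safe.img bs=1M count=4096
dd if=/dev/sdb of=/tmp/test.img bs=1M count=4096

# cmp prints the first differing byte, or nothing if identical
cmp /tmp/safe.img /tmp/test.img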

This would then allow someone to test different brand controllers and see just how safe they are, but remember: no one try this on anything other than test systems, as cutting power like this is not great.
 
They do this just to save money and to let people choose what RAID/HBA card they want.
...

Also work out where you are going to mount your 2x 2.5" boot drives. That case may have no good place to mount them, so you might need to use 3.5-to-2.5 inch adapters and lose 2 of the hot-swap bays for boot drives. You may be able to rig them somewhere random inside the case, but that's at your own risk.

Thanks for all that. I guess there's no way for me to get out of learning all this HBA, Backplane, SAS controller stuff. :(
 
Just watch out, as the Intel 311 is not really designed for log use and does not have a supercap, and it's unknown whether it will lose all uncommitted writes if there is a power cut or the host system has an unexpected power-down.

I thought it was well known that Intel's own SSD controllers don't store data in any caches...

Although saying that doesn't explain why the 320 series has high-capacity capacitors...

edit:

An Intel 710 or two would be a safe choice.

Another thought though - if your SATA controller is set to turn drive write caches off, surely an SSD should obey this in the same way a hard drive does?
 
Has anyone benchmarked read/write performance with OpenIndiana vs Solaris 11 vs FreeBSD 9? Pretty curious how BSD performs
 
I thought it was well known that Intel's own SSD controllers don't store data in any caches...

Although saying that doesn't explain why the 320 series has high-capacity capacitors...

edit:

An Intel 710 or two would be a safe choice.

Another thought though - if your SATA controller is set to turn drive write caches off, surely an SSD should obey this in the same way a hard drive does?

Yeah, the 710 would be a safe bet, but they don't have very good IOPS/$ because they are very expensive and slower than even the Intel 320. They have higher write endurance, so they will last longer in a high-end production environment. For a smaller workload you can use a larger Intel 320. One interesting point though is that the only thing that will really create large wear on a log device is massive bulk copying of data to your pool, for example backing up large data sets daily/weekly, and for workloads like that you don't need perfect write consistency, so setting up an alternative share that has sync writes disabled will keep your speed high, and your main important workloads can go through the log as normal so they won't lose data.

Also note that the Intel flash controllers don't use the attached DRAM chip for write caches, but the controller has to have some internal SRAM etc. which it uses during writes. The other big problem is that when a controller loses power in the middle of a flash write cycle it can start writing random corrupt data. With a supercap etc. the controller has a chance to finish what it's doing and get to a safe shutdown state before the power runs out.

Here is someone who tested some Intel SSDs:
http://www.evanjones.ca/intel-ssd-durability.html

This is why you have to be so careful. It's really bad that when people review SSDs online they never test this kind of thing. It's not just important for ZFS log use; it is also very important for server and database use.
 
About missing manuals around Illumian, OpenIndiana and napp-it:

I'm aware that my English is not perfect, but I suppose it does not help if I write manuals in perfect German (my mother language). I invite everyone to help with a better manual - related to napp-it as well as Illumian and OpenIndiana, regarding language and content.

Recently I started to move my manuals to booki.cc, see http://booki.cc/illumian-openindiana-napp-it-the-missing-manual/_info/ where it is possible to collaborate on writing manuals under an open licence. I invite everyone to help writing manuals, not only about napp-it but also about things in OpenIndiana and the differences to Illumian or Solaris 11 (this is the annoying part). You can also start books there. Linking between them may some day give a manual that replaces the Oracle docs (the differences are growing) or the old OpenSolaris Bible (too outdated).

If you or others would like editing access, please send a mail to [email protected]
 