OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Yay, something happened between my M1015 HBA, Supermicro 826E1 expander, and 8x 3TB drives in RAIDZ2...
Basically the pool is suspended and all of the disks are unavailable, yet the disks are present in format etc.
I tried clearing the errors in fmadm, but no go. zpool clear gives a "pool I/O is currently suspended" error. Any ideas?

Posting from work, and since the storage for my network VMs is down, I can't post better info until I get home.

Edit: Aaaand solved. I just had to turn off the server, pull the suspended pool's disks, boot and export the pool, shut the server down again, plug the disks back in, and import the pool.
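
For anyone hitting the same thing, the console sequence was roughly as follows (the pool name "tank" is a placeholder for the suspended pool):

Code:
# power off, disconnect the suspended pool's disks, then boot and:
zpool export tank      # drop the pool from the system configuration
# shut down, reconnect the disks, boot again, then:
zpool import           # list pools available for import
zpool import tank      # re-import the pool (add -f if it complains)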
 
I also had all the drives in my 16-bay JBOD disappear. The server is shut down most of the time; I booted it, and the pool was down because most drives were unavailable. Several reboots later (with the drives detected during boot, but not in OpenIndiana/napp-it), nothing had changed, so I tried an older boot environment, and voilà, the pool was up without my doing anything. I set that boot environment as the default and upgraded to the latest napp-it from there, but I wonder what happened, considering my very light use and no change in parameters.
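
For reference, boot environments can be listed and switched from the console with beadm; a minimal sketch (the BE name is hypothetical):

Code:
beadm list                  # show available boot environments
beadm activate oi-previous  # make an older BE the default for the next boot
init 6                      # reboot into it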
 
I need help with a potential problem with my two-disk mirrored pool. Lately it has had really bad access times (30+ seconds), and some files a directory or two deep don't show up.

Unfortunately I've never fully scrubbed this mirror (or any other pools I have set up).

napp-it reports no errors:
Code:
pool: important
state: ONLINE
scan: scrub canceled on Thu Jan 9 22:34:56 2014
config:

NAME STATE READ WRITE CKSUM
important ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c3t5000CCA37EC1CC3Ad0 ONLINE 0 0 0
c3t5000CCA37EC227BAd0 ONLINE 0 0 0

errors: No known data errors


Code:
id part partcap diskcap identify stat state pool vdev error vendor product sn
c3t5000CCA37EC1CC3Ad0 (!parted) 3 TB via dd ok ONLINE important mirror S:0 H:0 T:0 ATA Hitachi HDS72303 MS79215X03YLLA
c3t5000CCA37EC227BAd0 (!parted) 3 TB via dd ok ONLINE important mirror S:0 H:3 T:0 ATA Hitachi HDS72303 MS79215X04RZ6G

Before I rebooted, disk c3t5000CCA37EC227BAd0 had a lot of S/H/T errors (several hundred). What are these numbers?


What should I be running to find out what's wrong with the mirror? I already have a replacement HDD ready to pop in if need be. What is the process to replace the disk?

Any help would be greatly appreciated. Thank you!


-----
Current specs:
Intel Xeon E3-1230v2
Supermicro X9SCL+-F
32GB ECC Kingston RAM
IBM M1015 (IT Mode)
4x 3TB Toshiba HDD

ESXi 5.1
OI+napp-it v0.9d2
 
Soft/Hard/Transfer errors are warnings from iostat. Look at them as warnings, not real errors like ZFS checksum errors, but they do indicate problems.
Also look at the wait value on writes (%w) in menu System - Statistics - Disks or on the status page. Significantly high wait or busy values on a single disk indicate a problem as well.

I would replace the disk (menu Disks - Replace) and do a low-level check with a (mostly Windows-based) tool from the disk manufacturer.
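
On the console, the same counters can be checked directly; a quick sketch (standard Solaris iostat, nothing napp-it specific):

Code:
iostat -En     # per-device totals of Soft, Hard and Transport errors
iostat -xn 5   # extended stats every 5s; watch %w (wait) and %b (busy)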
 
Thanks for the quick reply and your work on napp-it, _Gea!

How would I physically replace this disk? Do I leave the server on, plug in the new disk, and then select the bad disk to replace with the new one? Or should I really power everything down to replace it?
 
Your IBM M1015 is hotplug-capable.
If you have a backplane or a hot-pluggable case:
just hot-plug the new disk, do a disk replace, and hot-remove the old disk after the resilver.

Otherwise, shut down for disk removal/insertion.
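
Behind napp-it's menu, the replace itself is a single ZFS command; roughly like this, using the disk IDs from the status output above and a hypothetical new disk ID:

Code:
zpool replace important c3t5000CCA37EC227BAd0 c3t<NEWDISKID>d0
zpool status important   # watch resilver progress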
 
Hello guys, I would like to build a little file server/NAS. It will mainly be used to download files (via torrent) and share them within my home network. I'd like to use a ZFS-compatible OS. I've downloaded OpenIndiana and I'm trying to install napp-it without success (I think I am missing something with the static IP address/static route and so on) :(

Do you have a step-by-step guide/video that explains how to install napp-it?

Thank you ^^
 
Thank you _Gea, I'll read all the documentation. I have another question: is there a BitTorrent client for OI? Or maybe a plugin in napp-it?

Thank you :)
 
I have successfully installed napp-it :D

Now, how can I share my zpool with Windows PCs? Do you suggest CIFS or SMB?
 
CIFS is built into ZFS, so go with that. It's simple: just create a new ZFS folder, and with the default settings sharing is enabled.
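
At the ZFS level, what napp-it does behind the menu looks roughly like this (pool/filesystem names are examples; the share is served by the Solaris kernel SMB server):

Code:
zfs create tank/media             # create a filesystem to share
zfs set sharesmb=on tank/media    # enable SMB/CIFS sharing on it
smbadm join -w WORKGROUP          # join the workgroup your Windows PCs use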
 
My thinking is that I can virtualize W7 x86 or Lubuntu (minimal hardware resources needed) and have it save the downloaded (uTorrent) files on the ZFS pool.

That way I have rather fast storage plus the download software.

I only have to find a disk to install the OS onto... at the moment I only have two drives.
 
I am in the process of upgrading my home storage from 4 x 1TB RAIDZ1. I am planning on using 3TB disks, and given the potential for issues when resilvering 3 and 4TB disks, I am wondering if I should mirror and stripe or use RAIDZ2 with 4 drives. Any thoughts? My main concern is durability.
 
I think that mirror and stripe would be faster than RAIDZ2, and would also be easier on the CPU.
 
I agree it would be faster, but I am more concerned about resilience. With Z2 I believe any two drives can fail. With mirror and stripe, if the second drive to go is the other one in the mirror, I have a problem (the two layouts are sketched after this post). Have others gone through the pros and cons of both options? In the end I guess it always comes down to backups.

On an unrelated note, what do others do to burn in their drives?
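
For reference, the two layouts under discussion would be created like this (pool and disk names are placeholders):

Code:
# striped mirrors: survives one failure per mirror vdev
zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0

# raidz2: survives any two of the four disks failing
zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0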
 
One of my drives is shown in the napp-it SMART tool as having a smart_selftest "ERROR".

However, when I look at the SMART info details for the drive, I do not see anything wrong. Am I missing something?

Code:
smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

Vendor:               SEAGATE 
Product:              ST32000645SS    
Revision:             0004
User Capacity:        2,000,398,934,016 bytes [2.00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5004173077b
Serial number:        Z1K01QE7000092373NHV
Device type:          disk
Transport protocol:   SAS
Local Time is:        Sun Jan 19 22:55:39 2014 EST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     34 C
Drive Trip Temperature:        68 C
Manufactured in week 14 of year 2012
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  112
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  112
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 3205916666
  Blocks received from initiator = 1899239487
  Blocks read from cache and sent to initiator = 509538132
  Number of read and write commands whose size <= segment size = 12988653
  Number of read and write commands whose size > segment size = 1929
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 11383.38
  number of minutes until next internal SMART test = 57

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   959902262        0         0  959902262          0       8238.500           0
write:         0        0         0         0          0        987.895           0
verify:    32308        0         0     32308          0          0.000           0

Non-medium error count:        0

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -     266                 - [-   -    -]

Long (extended) Self Test duration: 32767 seconds [546.1 minutes]
 
All my disks return a "Completed without error"; yours reports only a "Completed".

I will add the following to the affected script, get-disk-smart-bg.pl, at line 185:

Code:
          if ($c=~/without error|Completed\s+\-/) {        # some disk report 'Completed  -' (not 'Completed without error')

in the loop

Code:
      foreach my $c (@s) {
            if (!($c=~/Completed/i))    { next; }

            if ($c=~/without error|Completed\s+\-/) {        # some disk report 'Completed  -' (not 'Completed without error')
                $sv{$k}="without error";                     #  Completed without error
            } else {
                $sv{$k}="ERROR";
            }
      }

If you like, you can try it.
 
_Gea, I made that change in /var/web-gui/data/napp-it/zfsos/_lib/get-disk-smart-bg.pl on line 185. However, when I load the SMART page, it gets into an infinite loop of reloading.
 
Looks like I managed to fry my SAS card somehow. I just flashed it to the latest firmware, and in addition to a bunch of NVDATA mismatch errors, using the configuration utility via Ctrl+C ends up hard-locking it after a minute or two. Good times.
 
_Gea, I made that change in /var/web-gui/data/napp-it/zfsos/_lib/get-disk-smart-bg.pl on line 185. However, when I load the SMART page, it gets into an infinite loop of reloading.

This may happen with syntax errors.
Just reload 0.9e from today.
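
If you want to check a hand-edited script for syntax slips before reloading, Perl can do it directly (path as given above):

Code:
perl -c /var/web-gui/data/napp-it/zfsos/_lib/get-disk-smart-bg.pl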
 
Ugh, I've hit a snag and I'm not sure how to fix it.

I'm following the All-in-One guide.

Napp-it is currently fully functional, but I can't create a pool because it doesn't see my disks.

Since I am using the onboard SATA ports rather than a SAS controller card like the tutorial uses, I suppose this is why.

I tried fiddling with the virtual machine's settings, but without much success.

What do I have to do to make the virtual machine see the disks that are connected to the motherboard's SATA ports?
 
You would have to pass the SATA controller through (if possible), but then, of course, you have to find somewhere else to put the storage appliance...
 
By storage appliance, do you mean the SSD that runs the OS?

This is a NAS for my father; I built my own NAS with napp-it a few years ago. Back then I installed OpenIndiana manually and napp-it afterwards. In my NAS, I have the SSD plus 4 disks connected to the mobo, and 4 more disks connected to an LSI SAS controller.

If I am to repeat this setup, I guess I should just do the same thing.
 
But the OI SSD can't be on a controller that is passed through to OI itself. Unless there are two different controllers on the mobo? You need to be more specific about how things are hooked up...
 
My own NAS doesn't use VMware (or ESXi) at all. It's a bare-metal OI install with napp-it on top. I don't remember ever having to deal with controllers.

I plugged my SSD into the SATA0 port, installed OI from a USB stick, then added HDD1-4 on SATA1, SATA2, etc., and OI instantly recognized all the drives. I made a pool with those four. A few months later I added an LSI SAS controller and four more drives. I had to flash its firmware, but that's all it took. The whole thing runs on an MSI X58M desktop motherboard, though, whereas for my dad I got him a real server mobo (Supermicro X9SCL with ECC RAM).

So if controllers were important before, I didn't know about it and got lucky.
 
I'm confused then. When I think of _Gea's All-in-One guide, it involves ESXi with an OI storage appliance and controller(s) passed through via VT-d. I must have mixed your post up with someone else's. That said, if you're running a bare-metal storage server with napp-it on top, that is not an All-in-One per this thread.
 
When I originally built my own NAS (September 2011), the All-in-One tutorial wasn't up yet as far as I remember, so I never got to do it this way.

For the NAS I am building now, I was following the All-in-One guide, but I didn't get a SAS card because I wanted to use my mobo's SATA ports like I had done on my first build. That is why I'm here again.

If I wanted to do that, would I have to run OI on a USB stick with the All-in-One method through ESXi?
 
Success!

However, in RAIDZ I seem to have less capacity than my other NAS with an identical hard drive configuration.

Four 2TB drives in RAIDZ only give 4.7TB of usable space?

Edit: N/m, I had a 532G reservation set somehow.
 
You get performance problems with any filesystem when it is nearly full - also, or especially, with copy-on-write filesystems like ZFS.
When you create a pool within napp-it, the default is a 10% pool reservation. If you need to, you can delete or increase this reservation by clicking on the reservation value in menu ZFS Filesystems.
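
The same reservation can also be inspected and removed with plain zfs commands; a sketch assuming the reservation sits on the pool's root filesystem (pool name is a placeholder):

Code:
zfs get reservation tank        # show the ~10% reservation napp-it set
zfs set reservation=none tank   # drop it to reclaim the space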
 
Your IBM M1015 is hotplug-capable.
If you have a backplane or a hot-pluggable case:
just hot-plug the new disk, do a disk replace, and hot-remove the old disk after the resilver.

Otherwise, shut down for disk removal/insertion.

I did the replacement, but towards the end the resilver slowed to around 4MB/s. The entire resilvering process took over 4 days. Two days ago it was at around 96%. When I checked yesterday, the pool page didn't list anything; it just said "Pool Status:" with nothing populated after that line, so I let it sit for another day.

Today the same thing is happening: I cannot get any of the menus to show disks or pools. So I forced a restart from the napp-it menu.

Currently the console is outputting:
WARNING: /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Disconnected command timeout for Target 12

This has been showing up repeatedly for the past 10 minutes. Is it safe to force a shutdown from vSphere?

I really have no idea what went wrong! Can anyone help?

EDIT: I forced a reboot, and now it's stuck at the OI progress bar when the OS loads. The error message now:
Jan 23 21:05:16 zfs scsi: WARNING: /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Jan 23 21:05:16 zfs : Disconnected command timeout for Target 12

The message repeats every 60-70 seconds.

At this point, is OI buggered?

Should I be looking to switch to OmniOS and import the pool there?
 
I can't say what that message means, but are your drives seen during boot? You should try booting a live OS (OI, OmniOS or Solaris) and see what you get.
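
From the live environment, a first pass could look like this (pool name is a placeholder; use -f only if the pool looks intact but refuses a normal import):

Code:
format                 # confirm the disks are visible at all
zpool import           # list importable pools without importing anything
zpool import -f tank   # force-import if needed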
 
I just had an issue like this: several disks kept causing messages like that and then disconnecting.

This turned out to simply be UREs (unrecoverable read errors) on the disks, and the card not handling them nicely. Moving the disks to an Intel ICH controller and fixing the UREs got the disks to play nice again and brought the system back to life.
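
To check a suspect disk for this, smartmontools can be pointed at the raw device; an example (device path is hypothetical, and -d sat is only needed for SATA disks behind a SAS HBA):

Code:
smartctl -a -d sat /dev/rdsk/c0t0d0s0
# look at Current_Pending_Sector and Reallocated_Sector_Ct in the output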
 