OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

unclerunkle

Weaksauce
Joined
Nov 9, 2010
Messages
94
Hello everyone -

This may be a dumb question, but after changing the settings in power.conf, how can I tell whether it's actually working? Also, is there a power log somewhere to see when/why the disks spin up and down? I'm hoping to eliminate any automated processes spinning up the disks.

Here are my settings for reference:
Code:
Status

fmri         svc:/system/power:default
name         power management
enabled      true
state        online
next_state   none
state_time   Sat Mar  1 15:59:13 2014
logfile      /var/svc/log/system-power:default.log
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/system/filesystem/minimal (online)
Code:
device-dependency-property removable-media /dev/fb
autopm                  enable
autoS3                  default
cpu-threshold           1s
# Auto-Shutdown         Idle(min)       Start/Finish(hh:mm)     Behavior
#autoshutdown            30              9:00 9:00               noshutdown
cpupm  disable
#device-thresholds         /dev/dsk/c2t0d0     15m
device-thresholds         /dev/dsk/c6t5000C5006CDDF84Cd0     15m
device-thresholds         /dev/dsk/c6t5000C5006CDD228Ad0     15m
device-thresholds         /dev/dsk/c6t5000C5006D028327d0     15m
device-thresholds         /dev/dsk/c6t5000C5006CDEA757d0     15m
device-thresholds         /dev/dsk/c6t5000C5006CDCE307d0     15m
device-thresholds         /dev/dsk/c6t5000C5006CDDBDADd0     15m
device-thresholds         /dev/dsk/c6t5000C5006CDD03DAd0     15m
device-thresholds         /dev/dsk/c6t5000C5006D16ECBDd0     15m
device-thresholds         /dev/dsk/c6t5000C5006CDD1722d0     15m
device-thresholds         /dev/dsk/c6t5000C5006CDDC3B1d0     15m

device-thresholds         /dev/dsk/c6t50014EE103AB86E4d0     30m
device-thresholds         /dev/dsk/c6t50014EE103AB80FFd0     30m
Note: I don't want CPU throttling, so I turned that feature off.
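(For what it's worth, a hedged sketch of one way to check this yourself, assuming DTrace is available as it normally is on OmniOS: watching the disks shows whether they actually go idle long enough to spin down, and the DTrace io provider can show which processes generate the I/O that wakes them.)
Code:
# watch per-disk activity; a disk that is allowed to spin down should show no reads/writes for the threshold period
iostat -xn 30

# count disk I/O by process name until Ctrl-C (asynchronous writes may be attributed to "sched")
dtrace -n 'io:::start { @[execname] = count(); }'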
 

grendel19

Gawd
Joined
Jun 26, 2009
Messages
579
I couldn't open my shares today and noticed that the SMB server service had suddenly stopped.
Code:
root@omnios:~# svcs -xv svc:/network/smb/server:default
svc:/network/smb/server:default (smbd daemon)
State: offline since Sat Mar 8 19:31:36 2014
Reason: Service svc:/network/rpc/bind:default
is not running because a method failed.
See: http://illumos.org/msg/SMF-8000-GE
Path: svc:/network/smb/server:default
svc:/system/idmap:default
svc:/network/rpc/bind:default
See: man -M /usr/share/man -s 1M smbd
Impact: This service is not running.
So I then checked the rpc/bind service:
Code:
root@omnios:~# svcs -xv svc:/network/rpc/bind:default
svc:/network/rpc/bind:default (RPC bindings)
State: maintenance since Sat Mar 8 20:04:47 2014
Reason: Start method exited with $SMF_EXIT_ERR_CONFIG.
See: http://illumos.org/msg/SMF-8000-KS
See: man -M /usr/share/man -s 1M rpcbind
See: /var/svc/log/network-rpc-bind:default.log
Impact: 6 dependent services are not running:
svc:/network/rpc/gss:default
svc:/network/smb/client:default
svc:/network/smb/server:default
svc:/system/idmap:default
svc:/system/filesystem/autofs:default
svc:/network/rpc/smserver:default
Checking the log file:
Code:
root@omnios:~# tail /var/svc/log/network-rpc-bind:default.log
[ Mar 8 19:54:15 svc.startd could not set context for method: ]
[ Mar 8 19:54:15 chdir: Permission denied ("/root") ]
[ Mar 8 19:54:15 Method "start" exited with status 96. ]
[ Mar 8 20:04:38 Leaving maintenance because disable requested. ]
[ Mar 8 20:04:38 Disabled. ]
[ Mar 8 20:04:47 Enabled. ]
[ Mar 8 20:04:47 Executing start method ("/lib/svc/method/rpc-bind start"). ]
[ Mar 8 20:04:47 svc.startd could not set context for method: ]
[ Mar 8 20:04:47 chdir: Permission denied ("/root") ]
[ Mar 8 20:04:47 Method "start" exited with status 96. ]
I haven't made any changes; this just suddenly happened. Any ideas?
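(A hedged guess based on the log above, not a confirmed fix: the start method is denied a chdir into /root, so it may be worth checking whether the permissions or ownership of /root have changed, and then clearing the maintenance state.)
Code:
ls -ld /root                                  # the method context chdirs here and is denied
# if the mode/owner look wrong, restore them, e.g.:
# chmod 700 /root && chown root:root /root
svcadm clear svc:/network/rpc/bind:default    # take the service out of maintenance and retry
svcs -xv svc:/network/smb/server:default      # check whether the dependents came back online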
 

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
Man, I could really use some help. I'm getting frustrated, but I can't afford to do anything else right now. napp-it has marked a bunch of drives as unavailable, but if I run fmadm repair it can't find anything wrong with anything. What can I do to get my pools back online, so I can at least figure out some way to get my data off if this is just going to crap out on me :(.
 

cw823

n00b
Joined
Mar 15, 2006
Messages
34
Man, I could really use some help. I'm getting frustrated, but I can't afford to do anything else right now. napp-it has marked a bunch of drives as unavailable, but if I run fmadm repair it can't find anything wrong with anything. What can I do to get my pools back online, so I can at least figure out some way to get my data off if this is just going to crap out on me :(.

Have you tried to export and then import your pool?
 

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
[Attached screenshot: zfs.PNG]
Won't let me export. I shut down and reseated all the connectors and such between my M1015 and the Intel SAS expander, and still nothing.
 

chune

Weaksauce
Joined
Nov 2, 2013
Messages
70
[Attached screenshot: zfs.PNG]
Won't let me export. I shut down and reseated all the connectors and such between my M1015 and the Intel SAS expander, and still nothing.

There has been much debate about problems with ZFS when using SATA drives behind SAS expanders. Can you bypass the expander and see if the drives show up? Not sure why you are using one anyway if you only have 4 drives...
 
Last edited:

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
Not enough onboard ports to plug that RAID pool in. But yeah, it's not just the 5 drives LOL, it's a full Norco chassis.
 

chune

Weaksauce
Joined
Nov 2, 2013
Messages
70
Not enough onboard ports to plug that RAID pool in. But yeah, it's not just the 5 drives LOL, it's a full Norco chassis.

Not sure I follow; the pool you screencapped only has the four disks... Throw the disks in a different server to rule out hardware?
 

twistacatz

Limp Gawd
Joined
Jan 3, 2005
Messages
182
I didn't screencap the other pools as they are fine-ish; a couple of unavail drives on them as well

If some drives are coming up and some are not it sounds like you have a hardware issue. What is the topology of your system?

Is it possible one of your backplanes has gone bad? I know Norco backplanes are notorious for dying. What Norco case do you have?
 

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
If some drives are coming up and some are not it sounds like you have a hardware issue. What is the topology of your system?

Is it possible one of your backplanes has gone bad? I know Norco backplanes are notorious for dying. What Norco case do you have?

I've got an M1015 in a PCIe slot, of course, connected to an Intel RES2SV240. From there I have SATA breakout cables to the HDD bays; it's a Norco 4020. I'm strongly suspecting the backplane at this point, because I started moving the SAS connectors around on the expander last night and still got nothing, and even moved it directly to the controller and still got nothing. But this morning the system stopped responding, and HDD activity LEDs all over the place are flashing, making me think it's doing something, but ironically I don't have a monitor I can hook up to see what's going on, and the web interface seems to have stopped working. I'd really like to get my hands on a quality Supermicro chassis, but I don't have a job at the moment so it's not even feasible.
 

twistacatz

Limp Gawd
Joined
Jan 3, 2005
Messages
182
I've got an M1015 in a PCIe slot, of course, connected to an Intel RES2SV240. From there I have SATA breakout cables to the HDD bays; it's a Norco 4020. I'm strongly suspecting the backplane at this point, because I started moving the SAS connectors around on the expander last night and still got nothing, and even moved it directly to the controller and still got nothing. But this morning the system stopped responding, and HDD activity LEDs all over the place are flashing, making me think it's doing something, but ironically I don't have a monitor I can hook up to see what's going on, and the web interface seems to have stopped working. I'd really like to get my hands on a quality Supermicro chassis, but I don't have a job at the moment so it's not even feasible.

Yeah man, I think there's a Norco owners thread somewhere around here. I would check it out and hit up Google to see what you can find. Like I said, you wouldn't be the first person to have issues with a Norco backplane.

I know you can't make the purchase now, but I would keep my eye on the deals thread, as I've seen a lot of people buy used Supermicro rigs for the low.

Good Luck
 

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
Well, I managed to hook up a monitor today. All 24 drives are seen on boot, so it's purely got to be something with ZFS.

EDIT: Did some checking and all 25 drives are showing up to the OS just fine; I'm even able to use the activity LED to identify a drive from napp-it with no problem, so I really have no clue how to clear the drives and bring my pool back online :(
 
Last edited:

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,032
Well, I managed to hook up a monitor today. All 24 drives are seen on boot, so it's purely got to be something with ZFS.

EDIT: Did some checking and all 25 drives are showing up to the OS just fine; I'm even able to use the activity LED to identify a drive from napp-it with no problem, so I really have no clue how to clear the drives and bring my pool back online :(

If your hardware is working properly and the WWN IDs have not changed for whatever reason, the pool should be imported on reboot without any action.

The typical ZFS mechanisms for such problems are:
- clear errors (napp-it menu Pools), which clears former errors that are no longer valid
- export + import the pool, where disk IDs are newly assigned if needed

If you cannot export + import, you can unplug the pool disks, boot up, hot-plug the disks and try an import.

As this does not happen, you must think of reasons why the disks are unavailable. These are usually driver + HBA problems, power and cabling problems, SAS expander problems with SATA disks, and single disks that are blocking the controller.

Connect one disk and then another to SATA, then to the IBM without the expander, to check whether that disk is available then (the pool stays unavailable unless enough disks are detected and online).
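(For reference, a hedged sketch of the plain-CLI equivalents of those napp-it menu actions; "tank" is just a placeholder, substitute the real pool name.)
Code:
zpool status -v          # show pool state and which disks are UNAVAIL
zpool clear tank         # clear error counters / old faults
zpool export tank        # release the pool ...
zpool import             # ... scan the disks and list importable pools, then:
zpool import tank        # re-import; device IDs are picked up again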
 

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
The disks are already directly connected to the IBM without the expander. The OS sees the disks; I just don't know how to clear the suspended I/O state. I'm getting so fed up with ZFS. I thought it would make things easier to manage, but it's been more of a pain than just going with a true RAID setup.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,032
The disks are already directly connected to the IBM without the expander. The OS sees the disks; I just don't know how to clear the suspended I/O state. I'm getting so fed up with ZFS. I thought it would make things easier to manage, but it's been more of a pain than just going with a true RAID setup.

You are working with a true RAID setup that is more intelligent than any hardware RAID. But that does not help if your OS knows about the pool while the disks are not available.

Have you tried:

- clear errors (napp-it menu Pools)
- clear the napp-it cache (menu Disks - delete disk buffer, and ZFS filesystems - delete ZFS buffer)
- pool import
- pool export + import
- check whether one single disk or another is online (in case one disk is blocking all the others)
- use SATA (in case the IBM is bad)
- connect directly without the backplane (in case the backplane is bad)
 
Last edited:

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
You are working with a true RAID setup that is more intelligent than any hardware RAID. But that does not help if your OS knows about the pool while the disks are not available.

Have you tried:

- clear errors (napp-it menu Pools)
- clear the napp-it cache (menu Disks - delete disk buffer, and ZFS filesystems - delete ZFS buffer)
- pool import
- pool export + import
- check whether one single disk or another is online (in case one disk is blocking all the others)
- use SATA (in case the IBM is bad)
- connect directly without the backplane (in case the backplane is bad)

If I do clear errors I get the following: cannot clear errors for NFS: I/O error
- clearing the napp-it cache: no errors
- pool import: NFS SUSPENDED (one or more devices are unavailable in response to I/O failures; the pool is suspended)
- pool export + import: cannot export 'NFS': pool I/O is currently suspended
- according to both napp-it and the OS, all disk drives from that pool are online; if one is blocking the rest, I have no idea how to check, but no activity LEDs are lit solid
- can't use SATA, as there aren't enough onboard ports for those drives plus my OS drive, so I wouldn't get anywhere
- if the backplane were bad, wouldn't they not show up in the BIOS either?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,032
If I do clear errors I get the following: cannot clear errors for NFS: I/O error
- clearing the napp-it cache: no errors
- pool import: NFS SUSPENDED (one or more devices are unavailable in response to I/O failures; the pool is suspended)
- pool export + import: cannot export 'NFS': pool I/O is currently suspended
- according to both napp-it and the OS, all disk drives from that pool are online; if one is blocking the rest, I have no idea how to check, but no activity LEDs are lit solid
- can't use SATA, as there aren't enough onboard ports for those drives plus my OS drive, so I wouldn't get anywhere
- if the backplane were bad, wouldn't they not show up in the BIOS either?

Connect only one disk or another to SATA to check whether these disks are displayed as online; it does not matter if the pool remains offline. Remove the IBM for this test. Be aware that these disks get new short port numbers like c0t1d0 on AHCI SATA (only an LSI SAS2 controller in IT mode shows the disk's unique WWN number as the disk ID).

If the disks are seen, call pool import to check whether the pool is reported as not importable because too many disks are missing. In that case the problem is located at the IBM or the backplane.

If you can, move the disks to another server where you can import the pool (to rule out a power problem).
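(A hedged sketch of what that check could look like from the shell after a reboot; the read-only import at the end is my own assumption-level suggestion for copying data off if the pool will not import normally.)
Code:
format                             # list the disks the OS currently sees (quit at the prompt)
zpool import                       # scan and report which pools are visible and whether enough disks are present
# zpool import -o readonly=on NFS  # last resort: try a read-only import to rescue data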
 

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
OK, that makes sense. I'll have to give it a shot later tonight/tomorrow and see what I can turn up. I guess when I get a job again I need to look into new hardware, maybe :(
 

mkush

n00b
Joined
Aug 2, 2012
Messages
36
A strange thing happened and I'd like someone to reassure me that nothing is really wrong. It caused me to lose confidence in my box and OmniOS/illumos, so I hope there is a good explanation.

What happened is that I built my box, installed OmniOS and did my tried-and-true command sequence to share out my pool via iSCSI. I was investigating other paths but in the end that is what I've come back to. The iSCSI target is connected to a Mac Pro (OS X 10.9.x). The zpool is about 32TB in size. As you'll see below I initially left about 1TB free, then subsequently 2TB. This was only to keep up performance as the iSCSI volume approaches capacity.

Here is what I did originally:

Code:
zpool create -f zp raidz2 [disk list]        # create the raidz2 pool
zpool set feature@lz4_compress=enabled zp    # enable the lz4 feature flag on the pool
zfs create -s -V 31T zp/iSCSI                # sparse (thin-provisioned) 31T zvol to export over iSCSI
svcadm enable stmf                           # enable the COMSTAR STMF framework
pkg install storage-server                   # install the COMSTAR/iSCSI target packages
svcadm enable iscsi/target                   # enable the iSCSI target service
sbdadm create-lu /dev/zvol/rdsk/zp/iSCSI     # create a logical unit backed by the zvol
itadm create-target                          # create an iSCSI target
stmfadm add-view [GUID from sbdadm above]    # expose the LU to all initiators
I then connected to it from my Mac, formatted it and copied many terabytes of data to it. Finally I scrubbed it, no data errors, cool! I should mention that my data to be copied to it was on individual 4TB drives and it took each one probably 10 hours to copy, maybe a bit more.

Then I realized that I'd left out something important: when I created the filesystem (zfs create), I neglected to turn on lz4 compression. Oops. I figured, no big deal, I'll just recreate it and copy the stuff again.
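(Side note, hedged: compression can also be switched on for an existing volume with zfs set, but it only applies to blocks written after the change, so data already copied stays uncompressed; hence the recreate.)
Code:
zfs set compression=lz4 zp/iSCSI            # new writes are lz4-compressed from here on
zfs get compression,compressratio zp/iSCSI  # verify the setting and the achieved ratio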

So, I got rid of the view, target, lu, and filesystem (in that order). Then I redid the above commands, starting with the zfs create like so:

Code:
zfs create -s -o compression=lz4 -V 30T zp/iSCSI
Followed by the other steps to create the lu/target/view. Note that I did NOT recreate the pool, just the filesystem on it. Reconnected the Mac, formatted, started copying files. Except this time, the Mac said it was going to take a couple of DAYS to copy the files. I left it all day, came back and it was still less than half. So there was no doubt that it was MUCH slower than the first time.

Baffled and doubting my design choices, I just redid the whole thing including the zpool itself. Why I didn't do that before is a mystery to me since it's just one command more, and the pool didn't contain any other data. The Mac is currently copying files and it is again fast, in fact it seems faster than the first time which may make sense given the lz4 compression this time (less data to write to disk).

So after all that blah blah, does anyone see a reason that the performance was so much slower the second time and why recreating the zpool seemed to "fix" it?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,032
Have you modified any other settings?
Compression needs some CPU power but improves performance in most cases.

I would:
- disable compression and compare performance
- use iostat to check CPU load and compare the busy/wait values of the disks

- check/set the iSCSI blocksize (try higher values like 64KB; see the sketch below)
- check the LU writeback cache setting (on = fast, off = secure but slow without a fast ZIL)

- use a local benchmark like bonnie to check basic values
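(A hedged sketch of how those checks might look on the CLI; the zvol name zp/iSCSI comes from the earlier posts, and wcd is, as far as I know, the COMSTAR write-cache-disable property, so check the stmfadm man page before relying on it.)
Code:
iostat -xn 5                             # compare per-disk busy (%b) and wait columns under load
zfs get volblocksize zp/iSCSI            # the zvol blocksize; it can only be set at creation time
# e.g. recreate the zvol with a larger blocksize:
# zfs create -s -o compression=lz4 -o volblocksize=64K -V 30T zp/iSCSI
stmfadm list-lu -v                       # shows "Writeback Cache Disable" per LU
# stmfadm modify-lu -p wcd=false <GUID>  # assumption: re-enable writeback cache (fast, less safe)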
 

mkush

n00b
Joined
Aug 2, 2012
Messages
36
No other settings modified. In fact, the commands shown are everything I did with the exception of setting up the NIC (an Intel x540-T1 10GbE, jumbo frames enabled). Very vanilla.

The system has plenty of CPU avail, it's a Xeon E5-2620 v2 (6 cores at 2.1GHz), 64GB memory.

To me the question boils down to: why does a "fresh" pool, one that never had a zfs create run on it, perform so much better than a pool which had a filesystem created and deleted on it?

In other words, the same commands run on a pre-used but now-empty pool vs. on a freshly created pool result in vastly different performance. I don't see how lz4 has anything to do with it... that has effectively been "factored out".
 

mkush

n00b
Joined
Aug 2, 2012
Messages
36
Didn't touch that setting, it's always been on.

I think I can summarize best like this:

1. Create vanilla OmniOS install
2. Create pool
3. Create volume (no lz4), share with iSCSI, write lots of data from Mac

-> PERFORMANCE IS GREAT

4. Destroy volume and iSCSI "share"
5. Create volume (with lz4), share with iSCSI, write lots of data from Mac

-> PERFORMANCE IS BAD

6. Destroy volume, iSCSI "share" and POOL
7. Create pool
8. Create volume (with lz4), share with iSCSI, write lots of data from Mac

-> PERFORMANCE IS GREAT

In the above I noted whether or not lz4 was enabled but I do not believe it matters here since it was enabled for both "try 2" and "try 3" above. It seems to me that it is then "factored out" of the equation. The only difference seems to be whether the volume was created on a fresh pool or not.

By the way, I'm now copying my third huge disk of data to the server and it is still very fast.
 

Freak1

Limp Gawd
Joined
Sep 9, 2009
Messages
191
Hi.

I'd like to update my all-in-one. I have ESXi 5.0 and napp-it 0.8h nightly May.03.2012 on OpenIndiana.

Last time, when I updated to ESXi 5.0, you (_Gea) told me to do a fresh installation rather than an update. Is that also recommended this time? Is there a mini how-to for doing the update?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,032
You need to

1. update ESXi
- download the new ISO, boot from it and select update
- do not update the virtual machine hardware version to v10 if you use ESXi free

2. update OI and the VMware tools,
or better, reinstall OI, install the VMware tools and install napp-it via wget

or use the prebuilt napp-it virtual storage appliance (OmniOS stable),
downloadable from napp-it.org
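(For reference, the napp-it wget installer mentioned above is, as far as I recall, this one-liner, run as root inside the OI/OmniOS VM.)
Code:
wget -O - www.napp-it.org/nappit | perl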
 

kronik8

Limp Gawd
Joined
Oct 16, 2002
Messages
167
So, when I reboot my file server, after unlocking the encrypted volume, SMB needs to essentially be disabled and re-enabled before it'll work. This does not happen with NFS. Anything I can do about this?
 

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
Connect only one disk or another to SATA to check whether these disks are displayed as online; it does not matter if the pool remains offline. Remove the IBM for this test. Be aware that these disks get new short port numbers like c0t1d0 on AHCI SATA (only an LSI SAS2 controller in IT mode shows the disk's unique WWN number as the disk ID).

If the disks are seen, call pool import to check whether the pool is reported as not importable because too many disks are missing. In that case the problem is located at the IBM or the backplane.

If you can, move the disks to another server where you can import the pool (to rule out a power problem).

I removed the IBM and connected the disks directly, and I'm still not able to import the pool due to too many I/O errors. Would reinstalling the OS just clear everything, or is the suspension state stored on the drives as well? This particular RAID pool is no big loss, but my other one would be; that one responds just fine, though.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,032
So, when I reboot my file server, after unlocking the encrypted volume, SMB needs to essentially be disabled and re-enabled before it'll work. This does not happen with NFS. Anything I can do about this?

Have you tried to restart the NFS service?

svcadm disable nfs/server
svcadm enable nfs/server
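(If the same restart trick is wanted for the SMB side, which is what the question was about, the equivalent would presumably be:)
Code:
svcadm disable smb/server
svcadm enable smb/server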
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,032
I removed the IBM and connected the disks directly, and I'm still not able to import the pool due to too many I/O errors. Would reinstalling the OS just clear everything, or is the suspension state stored on the drives as well? This particular RAID pool is no big loss, but my other one would be; that one responds just fine, though.

There is nothing stored on the disks (the suspended state is not persistent). If the I/O error remains on a different controller, one or more disks are damaged; a damaged disk can block all disks on a controller, even good ones. Another possible problem is the PSU (not enough / bad power).
 

Freak1

Limp Gawd
Joined
Sep 9, 2009
Messages
191
You need to

1. update ESXi
- download the new ISO, boot from it and select update
- do not update the virtual machine hardware version to v10 if you use ESXi free

2. update OI and the VMware tools,
or better, reinstall OI, install the VMware tools and install napp-it via wget

or use the prebuilt napp-it virtual storage appliance (OmniOS stable),
downloadable from napp-it.org

Thanks. I'd like to use the new OmniOS. Do I need to export the pool or anything like that before I shut down OpenIndiana?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,032
It is good practice to export before an import, but if you forget, you can import without a previous export. All pool properties are kept. You may need to recreate users and reassign permissions.
 

moose517

Gawd
Joined
Feb 28, 2009
Messages
640
There is nothing stored on the disks (the suspended state is not persistent). If the I/O error remains on a different controller, one or more disks are damaged; a damaged disk can block all disks on a controller, even good ones. Another possible problem is the PSU (not enough / bad power).

I really hope it's not a PSU problem; I'm on my third one now. I would think 560W would be sufficient for 24 drives, wouldn't it?

EDIT: Managed to get it to recognize 2 of the disks in the "damaged" pool, but now my other pool is suspended and unavailable. WTF.

EDIT2: Whatever. I'm done with this ZFS bullshit. There isn't a damn thing wrong with the drives; garbage RAID system. Screw the 30TB of movies I had stored on it.
 
Last edited:

schleicher

n00b
Joined
Jun 4, 2012
Messages
39
ZFS is not garbage; your hardware maybe is... I personally know a dozen people who are totally happy with ZFS, and this forum and others prove you wrong :)
 