OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Gea,
Pushover isn't available for Windows Phone users. I have no idea how much work it is, but could you add support for Pushalot as well? It is something like Pushover: it has apps for Windows 8 and Windows Phone, and it is totally free.

I will check that but cannot promise anything


update
Pushalot support is available in napp-it 0.9f6 (Jun.10.2015)
 
If anyone encounters trouble with an iSCSI target freezing on a JBOD SAS expander box, here is a discussion:
http://lists.omniti.com/pipermail/omnios-discuss/2015-March/004593.html

It doesn't look good though. It looks like SATA drives and SAS expanders don't go well together, especially on servers with high load (average IOPS on our server is 7k/s read and 1k/s write). There can be trouble when a drive in the array hangs and the controller sends a reset command: in some cases the whole SAS expander resets, all commands in the queue and on the bus are lost, and the system panics :)

In my case, sometimes only some hosts lose their connection, sometimes all hosts drop...

Matej

We managed to change all the hard drives, JBOD cases, servers, LSI HBA cards and SAS cables, and upgraded OmniOS to the latest x14 version... Actually, we changed all the hardware, we changed the pool design (from a single 50-drive pool to a 7 vdev x 10 drive pool), we even changed datacenters and switches, yet we still had the same problems.

Then I thought: what is the only common thing left? Well, it's the firmware. We downgraded from P19 to P17 and so far it looks way better. Speeds are actually a lot higher, the pool is not 100% busy anymore, and so far no freezing. It's maybe still too early to say, but it looks promising.

Also, it wasn't an iSCSI problem: the pool itself froze (issuing 'touch a' in a pool folder made the command hang for up to 15 min) for X amount of time and then continued with operations. A SAS HBA reset did not help. After so much downtime, iSCSI crashed as well.

It might help someone with the same problems.

Matej
 
If anyone wants to try Pushalot with napp-it
use the following dev key

monitor dev - 12.06.2015::mVqmVqsmVVTTVsqmlqsmVKTTstDl

Copy/paste the whole line into menu extension >> register
and you can update to latest 0.9f6_dev
 
It works perfectly! :D
thank you for the quick implementation

[edit] found one typo in Push - Alert menu:
Service provider: Pushover (Android, IOS, Desktop) or Pushover (Win8.1+, Windows Phone, free)

should be: Service provider: Pushover (Android, IOS, Desktop) or Pushalot (Win8.1+, Windows Phone, free)
 
I guess I spoke too soon:)

It crashed last night, although it looks like only the iSCSI target froze this time, not the filesystem itself. At least Nagios did not report any problems...

Matej
 
Hi everyone, I'm running a napp-it all-in-one server and have a question about ESXi NIC teaming.

I'm running ESXi 5.5U2 on a machine with two physical Intel NICs, and my overall goal is to get 2 Gbps of throughput over the network (outside of the VM network) to the ZFS share.

I have used the vSphere client to enable NIC teaming in ESXi.
I have the two network adapters added to the same VM network and have enabled NIC teaming. I chose "Route based on originating virtual port ID", which, if I understand it correctly, won't increase the throughput; it will just spread the load to one NIC or the other.
I would use "Route based on IP hash", which, if I understand it correctly, would let the throughput reach 2 Gbps on one 10 Gbps vmxnet3. But from what I have read, this will not work properly with a "dumb" switch (which is what I have).

To some extent I have achieved 2 Gbps, but it is difficult to manage:
I have been able to max the throughput on my ZFS server only by adding two virtual NICs to my napp-it VM and then sending data to each of them, pointing two copy jobs at the two IP addresses, one each. I can see that when I do this I am indeed saturating 2 Gbps worth of bandwidth. The downside is that it is inconvenient to keep track of which IP is currently in use and then use the other one when starting a copy job, especially when I'd like to just reference the ZFS shares by name rather than by various IP addresses. I'd like one IP that can do it all.

I guess I'll phrase this differently now that the scenario is painted:



A) Rather than trying to NIC team via "Route based on IP hash" in ESXi, is it possible to team the two virtual LAN ports within OmniOS in order to have one IP address but obtain 2 Gbps of throughput? (A rough sketch of what this could look like is at the end of this post.)

B) Is it possible to "Route based on ip hash" on a dumb switch through some method?

Thanks for any insight you all might have on obtaining more than 1 Gbps of throughput.

P.S. On further reading, I do see that many state that bonding can cause more problems than it is worth. I would still be interested in hearing from people who do have it set up, since I often do large copy jobs, and even gigabit is pretty slow these days with USB 3.0+ HDDs and SSDs, etc.
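For reference, the OmniOS side of option A would be done with dladm link aggregation. The sketch below is only an illustration under assumptions: the two vmxnet3 interfaces are named vmxnet3s0 and vmxnet3s1, the address is an example, and whether OmniOS will aggregate two virtual NICs at all, or whether doing so over a plain vSwitch helps a single transfer, is something that would need testing.

Code:
# remove any existing IP configuration from the two virtual NICs
ipadm delete-if vmxnet3s0
ipadm delete-if vmxnet3s1

# bundle both links into one aggregation (no LACP by default)
dladm create-aggr -l vmxnet3s0 -l vmxnet3s1 aggr0

# put a single IP address on the aggregation (example address)
ipadm create-if aggr0
ipadm create-addr -T static -a 192.168.1.10/24 aggr0/v4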
 
Just wanted to reply to my own post... I discovered 2 things that are helping me out a lot to get better reliability / faster transfers.

A) USB 3.0 support via an ESXi patch: http://www.v-front.de/2014/11/vmware-silently-adds-native-usb-30.html
-Be sure to force USB 3.0 in the BIOS of your bare metal server, and use the automatic startup script that the website references.


B) Removing any e1000 NICs from the OmniOS VM on ESXi 5.5U2
I was having a horrible time with super high CPU usage and other problems, the network freezing up, etc. I finally removed the e1000 NIC (that e1000 is bad news and worked horribly on this version of ESXi) and added a second vmxnet3. This way I can still saturate 2 Gbps via two IP addresses and the CPU is no longer pegged at 100%, which is pretty good for my usage.


With these two fixes, I can now mount USB 3.0 drives directly into the system via a Windows 8.1 VM and do copies, and with multiple IPs I can saturate 2 Gbps on the network! It isn't one big bruiser of a single link, but it's pretty good considering I can really do some damage now when copying to/from the ZFS pool or backing up my files to external drives.

Pretty happy at this point. Hope this post might help someone.
 
I need to replace all the drives in a RAIDZ3 vdev with bigger ones. We're talking 19 drives. I've got enough ports to put all the new drives into the server at once, and I can offline the pool. What would be the fastest way to do this? Can I clone all the drives, so that when doing a one-by-one replace, the resilver would be instantaneous?
 
With enough slots, I would create a new pool and replicate the filesystems to the new pool like
zfs send oldpool/filesystem -> newpool

The other option is a 19x disk replace olddisk -> newdisk with autoexpand enabled.
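A rough sketch of both options; the pool, filesystem and disk names below are placeholders, not taken from a real setup:

Code:
# Option 1: replicate everything to a new pool built from the new disks
zfs snapshot -r oldpool@move
zfs send -R oldpool@move | zfs receive -F newpool

# Option 2: replace the disks one by one and let the vdev grow afterwards
zpool set autoexpand=on tank
zpool replace tank olddisk newdisk    # repeat 19x, waiting for each resilver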
 
The pool is actually 57 drives (3 vdevs) so I can't do that. There is a backup but I would rather not lose redundancy while doing the operation.

My question is really about how the replace command works. I've seen that it uses the old disk if available, but if something goes wrong, will it go back automatically?

Also, copying the 19 drives at once will probably be quite slow as I'm bandwidth limited, so I would do 4 or 5 at a time.

Finally, does autoexpand need to be set for each replace?
 
replace command
zpool replace tank c1t1d0 c2t0d0

- The old disk is valid until the replace is finished
- autoexpand is a pool property
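Since autoexpand is a pool property, it only needs to be set once; a quick sketch with an assumed pool name:

Code:
zpool set autoexpand=on tank    # extra capacity appears once all disks of a vdev are replaced
zpool get autoexpand tank       # verify the setting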
 
Neat! Thanks! I'll be switching back to Solaris from OmniOS for the SMB 2.1 feature. Hopefully I can keep my ZFS pools at the older version for compatibility if I want to switch back to OmniOS.
 
I have an OpenIndiana system with Napp-it, which shows the following:

Code:
  pool: rpool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: scrub repaired 0 in 0h28m with 0 errors on Sun Jul 12 23:28:12 2015
config:

	NAME        STATE     READ WRITE CKSUM     CAP            Product
	rpool       DEGRADED     0     0     0
	  mirror-0  DEGRADED     0     0     0
	    c5d0s0  OFFLINE      0     0     0     80 GB          
	    c2d0s0  ONLINE       0     0     0     80 GB          

errors: No known data errors

From what I was told, the original second drive had failed and has been physically replaced (I don’t have access to it anymore) and an attempt was made to add a new drive to the system.

Trying
Code:
zpool online rpool c5d0s0
gave me the following warning
Code:
warning: device 'c5d0s0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
and now the pool lists the drive as unavailable:

Code:
	NAME        STATE     READ WRITE CKSUM     CAP            MODELL
	rpool       DEGRADED     0     0     0
	  mirror-0  DEGRADED     0     0     0
	    c5d0s0  UNAVAIL      0     0     0  cannot open     80 GB          
	    c2d0s0  ONLINE       0     0     0     80 GB

Now I'm stuck... :( Can anyone help?

Thanks!
 
Unavail means missing or damaged.

You can insert a new disk (>= 80 GB, same sector size), reboot, and replace the unavailable disk with the new one on another port. If you replace on the same port, a reboot may also be needed.

If this disk is on SATA:
Onboard SATA/AHCI is basically hotplug capable, but this is disabled by default.
To enable it, add the following line to /etc/system (OmniOS; should work on OI as well):

set sata:sata_auto_online=1
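Once the new disk is visible, the replace itself would look roughly like the following. The new device name is only a placeholder, and because rpool boots from a slice, the new disk may also need an s0 slice and a boot loader installed (an assumption about this particular setup):

Code:
# replace the unavailable mirror member with the new disk on another port
zpool replace rpool c5d0s0 c6d0s0

# on OpenIndiana with BIOS/grub boot, make the new mirror half bootable too
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c6d0s0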
 
Quick question: I'm currently running my ZFS pool on OpenIndiana. I'm doing updates and consolidating services onto my server. I need to upgrade to ESXi 6 and I'm trying to move all my media services to the same ESXi box. I am choosing between OmniOS and FreeNAS, which has them all included. Setting these up on OI (SABnzbd, CouchPotato and SickBeard) was a pain, but I'd also like to try Sonarr and NZBGet/Drone, which I could never get to work on OI.

Performance-wise, I see OmniOS is the clear choice for now, until FreeNAS 10 might be available. Is it recommended to run these in separate VMs and access the ZFS storage through OmniOS? I can't decide if I want to just import the pool into FreeNAS and let it take control. I like napp-it and am not sure if I'm comfortable switching. What's the ideal config for this situation?

Also, looking into it, I could run OmniOS and then, looking at this Flawless Media Server Linux Mint distro, stream everything through that...
 
The strengths of Solaris-based systems are performance, stability, the CIFS server, and the fact that an appliance is independent of a particular distribution (you can use a vanilla OS like OmniOS, OI or Oracle Solaris, with the option of a GUI).

While there are many applications available via pkgin from SmartOS, the focus is not a home media server. But with ESXi as a base, you can split storage, media server, AMP server etc. into different VMs, from BSD and OS X over Linux to Windows.


Regarding portability of pools:
If the disks were formatted in BSD with GPT partitions that FreeBSD recognizes but Solaris doesn't, you cannot import the pool into Solaris.

Pools with disks formatted with GEOM (whole disks) can be exported from FreeBSD/FreeNAS/ZFSguru and reimported into Solaris. GPT may also work, but only if the partition spans the whole disk.
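The move itself is just an export on the BSD side and an import on Solaris, pool version and feature flags permitting (the pool name is an example):

Code:
# on FreeBSD / FreeNAS / ZFSguru
zpool export tank

# on the Solaris / OmniOS side, after attaching the disks
zpool import            # lists importable pools found on the disks
zpool import tank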
 
Hey Gea, the Pushover feature is awesome, thank you so much for adding it. I was wondering if you could expand it to notify on S/H/T errors as well? And maybe Smart info failures?

I ask because (at least in my experience) I've never had a disk just fail and go offline with ZFS except for physically pulling it to test Pushover. I always get hours/days of S/H/T errors, reduced performance (which I don't always notice right away), then maybe the disk fails. Usually I get to the "catastrophic performance degradation" phase before the "complete disk failure" phase.

So I think if Napp-IT can notify on S/H/T errors increasing by say more than 1 in the last X minutes (to account for Smart checks, etc) that would be incredibly useful.
 
BTW, I'm doing a scrub of my pool and there have been a few MB repaired on two drives of two vdevs; however, I noticed that (repairing) stayed listed beside those drives for hours afterwards (the scrub takes several days). I did a reboot to plug some drives in, and then (repairing) had disappeared while the scrub continued.

The drives I'm plugging in are NTFS, as I'm migrating the last of my data to ZFS. For some reason I can mount GPT drives fine, while MBR drives always fail; any help appreciated. I'm using ntfs-3g, and here is what I get:

root@X6:~# mount -F ntfs-3g /dev/dsk/c6t5000C5002F75FBA0d0s1 /mnt/_win2T01
Error opening '/devices/scsi_vhci/disk@g5000c5002f75fba0:b': I/O error
Failed to mount '/devices/scsi_vhci/disk@g5000c5002f75fba0:b': I/O error
NTFS is either inconsistent, or there is a hardware fault, or it's a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows twice. The usage of the /f parameter is very
important! If the device is a SoftRAID/FakeRAID then first activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for more details.

The drives pass chkdsk without errors under Windows, and I eject them properly.
 
This has hampered me for more than a week and I just figured it out: with MBR drives I need to use the "p1" partition instead of the "s1" slice.
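Building on the earlier command, the working mount for an MBR disk then looks like this (same device as above, p1 instead of s1):

Code:
mount -F ntfs-3g /dev/dsk/c6t5000C5002F75FBA0d0p1 /mnt/_win2T01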
 
Hey Gea, the Pushover feature is awesome, thank you so much for adding it. I was wondering if you could expand it to notify on S/H/T errors as well? And maybe Smart info failures?

I ask because (at least in my experience) I've never had a disk just fail and go offline with ZFS except for physically pulling it to test Pushover. I always get hours/days of S/H/T errors, reduced performance (which I don't always notice right away), then maybe the disk fails. Usually I get to the "catastrophic performance degradation" phase before the "complete disk failure" phase.

So I think if Napp-IT can notify on S/H/T errors increasing by say more than 1 in the last X minutes (to account for Smart checks, etc) that would be incredibly useful.

An extendable alert mechanism is already planned for one of the next releases.
 
Has anybody tried the Solaris 11.3 beta and/or using napp-it with it? Does it install and work OK?
 
Has anybody tried the Solaris 11.3 beta and/or using napp-it with it? Does it install and work OK?

I have done only some basic tests and it is running.
You need the newest wget installer from today (online now) to install Solaris 11.3 properly
 
EDIT: Disregard. My Windows 8.1 machine was showing SMB dialect 1.50 until I rebooted and now it shows 2.10 with Solaris 11.3 beta.
 
Should proftpd be included with Solaris 11.3? I'm trying to enable it but the 'proftpd' service can't be found. However, the 'network/ftp' service exists and looks to be proftpd.

Also, when I try to enable the tftpd service, I get the following error: "/var/web-gui/_log/tftpd.conf must contain -s /folder". It looks like this tftpd.conf file exists but is empty.
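For what it's worth, the error message suggests the file just needs the tftpd arguments, presumably a single line such as the following, where the folder is only an example:

Code:
-s /tftpboot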
 
On Solaris, you must set up ProFTPD manually.

ProFTPD and the AMP stack setup within napp-it are based on pkgin from SmartOS,
which can be used with OmniOS as an additional (free and community-driven) setup.
 
Most napp-it users are on OmniOS now.
There are many professional Solaris users around, but only with basic storage needs (CIFS, NFS, iSCSI).

While FTP is an add-on, tftpd is included in the napp-it distribution.
I have not tried it, but the included tftpd may work on Solaris via the Services menu.
 
I had still been having this problem which required me to delete old snaps myself. I am on omnios-r151014.

My snap logs have been showing lines like:

hold 5 days: match rule: snap-age d1error days

I traced the "d1error" to the date2diff function and ran the following test:

perl -e 'use lib "/var/web-gui/data/napp-it/CGI"; \
require "/var/web-gui/data/wwwroot/cgi-bin/admin-lib.pl"; \
$d = &date2diff("01.01.2015", "01.02.2015"); print $d, "\n"'

This produced the following error:

Modification of a read-only value attempted at /var/web-gui/data/wwwroot/cgi-bin/admin-lib.pl line 2621.

The relevant lines are:

2617 sub date2valid {
2618 ##############
2619 # parameter: datum ("tt.mm.jjjj")
2620 # return datum oder -1
2621 $_[0]=~s/ +//g;

Changing the code to copy the argument into a variable instead of modifying $_[0] directly resolves the problem, and snaps are now being cleaned up.
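A self-contained illustration of the bug and the fix (this is not napp-it's actual code, just the pattern):

Code:
# minimal Perl demo: string literals passed as arguments are read-only,
# so substituting on $_[0] dies; copying into a lexical first avoids that
use strict;
use warnings;

sub date2valid_buggy {
    $_[0] =~ s/ +//g;              # "Modification of a read-only value attempted"
    return $_[0];                  # when called with a literal like "01.01.2015 "
}

sub date2valid_fixed {
    my ($datum) = @_;              # copy the argument first
    $datum =~ s/ +//g;
    return $datum;
}

print date2valid_fixed("01.01.2015 "), "\n";   # prints 01.01.2015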

FYI:
# perl --version

This is perl 5, version 16, subversion 1 (v5.16.1) built for i86pc-solaris-thread-multi-64

Copyright 1987-2012, Larry Wall
...

# uname -a
SunOS fishlaris.local 5.11 omnios-170cea2 i86pc i386 i86pc

The above setting creates 48 snaps per day.
As keep and hold are both respected, hold 3 days is the effective setting,
which means that you should have 144 snaps.

If the snap job is recursive, you have 144 snaps x the number of filesystems.
 
Most napp-it users are on OmniOS now.
There are many professional Solaris users around, but only with basic storage needs (CIFS, NFS, iSCSI).

While FTP is an add-on, tftpd is included in the napp-it distribution.
I have not tried it, but the included tftpd may work on Solaris via the Services menu.
OK - Thanks for the information. I was using OmniOS and really enjoyed it, but I feel that SMB 2.1 is important. Hopefully Nexenta merges their changes into Illumos so it's available in OmniOS in the future...
 
OK - Thanks for the information. I was using OmniOS and really enjoyed it, but I feel that SMB 2.1 is important. Hopefully Nexenta merges their changes into Illumos so it's available in OmniOS in the future...

There is some work by Nexenta to integrate SMB 2.1 into Illumos.
 
replace command
zpool replace tank c1t1d0 c2t0d0

- The old disk is valid until the replace is finished
- autoexpand is a pool property

I'm doing something wrong.

root@X6:~# zpool replace bay c4t0d0 C6t50014EE20707A7AAd0
cannot open 'C6t50014EE20707A7AAd0': no such device in /dev/dsk
must be a full path or shorthand device name

c4t0d0 is the old 2TB connected on an internal SATA port
C6t50014EE20707A7AAd0 is the new 3TB in a JBOD enclosure. It had an NTFS partition; I initialized it with napp-it, but things didn't improve.
 
After trying a million things it finally worked, but I don't know why, annoying.

I'm seeing this:

root@X6:~# zpool status -v bay
pool: bay
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Jul 24 19:19:07 2015
8.45T scanned out of 134T at 2.47G/s, 14h28m to go
2.99M resilvered, 6.30% done


It doesn't make much sense: 6.30% done with only 2.99M resilvered, and that 2TB drive is pretty full.

Is it reading the whole array like a scrub, rather than just copying from the current drive? If so it will waste a lot of time (and I fear that 2.47G/s will not last), since my pool is very unbalanced. I did a scrub a couple of days ago (while using the pool) and it took 165 hours, so I wonder if this replace will take a similar amount of time.
 