OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Gea,
Pushover isn't available for Windows Phone users. I have no idea how much work it is, but could you add support for Pushalot as well? It is something like Pushover: it has apps for Windows 8 and Windows Phone, and it is totally free.

I will check that but cannot promise anything


update
Pushalot support is available in napp-it 0.9f6 (Jun.10.2015)
 
If anyone encounters trouble with an iSCSI target freezing on a JBOD SAS expander box, here is a discussion:
http://lists.omniti.com/pipermail/omnios-discuss/2015-March/004593.html

It doesn't look good though. It looks like SATA drives and SAS expanders don't go well together, especially on servers with high load (average IOPS on our server is 7k/s read and 1k/s write). There can be trouble when a drive in the array hangs and the controller sends a reset command: in some cases the whole SAS expander resets, all commands in the queue and on the bus are lost, and the system panics :)

In my case, sometimes only some hosts lose their connection, sometimes all hosts drop...

Matej

We managed to change all the hard drives, JBOD cases, servers, LSI HBA cards and SAS cables, and upgraded OmniOS to the latest x14 version... Actually, we changed all the hardware, we changed the pool design (from a single 50-drive pool to a 7 vdev x 10 drive pool), we even changed datacenters and switches, yet we still had the same problems.

Then I thought: what is the only common thing left? Well, it's the firmware. We downgraded from P19 to P17 and so far it looks way better. Speeds are actually a lot higher, the pool is not 100% busy anymore, and so far no freezing. It's maybe still too early to say, but it looks promising.

Also, it wasn't an iSCSI problem: the pool itself froze (issuing 'touch a' in a pool folder made the command hang for up to 15 min) for X amount of time and then continued with operations. A SAS HBA reset did not help. After so much downtime, iSCSI crashed as well.

It might help someone with the same problems.

Matej
 
If anyone wants to try Pushalot with napp-it
use the following dev key

monitor dev - 12.06.2015::mVqmVqsmVVTTVsqmlqsmVKTTstDl

Copy/paste the whole line into menu extension >> register
and you can update to latest 0.9f6_dev
 
It works perfectly! :D
thank you for the quick implementation

[edit] found one typo in Push - Alert menu:
Service provider: Pushover (Android, IOS, Desktop) or Pushover (Win8.1+, Windows Phone, free)

should be: Service provider: Pushover (Android, IOS, Desktop) or Pushalot (Win8.1+, Windows Phone, free)
 
I guess I spoke too soon:)

It crashed last night, although it looks like only the iSCSI target froze this time, not the filesystem itself. At least Nagios did not report any problems...

Matej
 
Hi everyone, I'm running a napp-it all-in-one server and have a question about ESXi NIC teaming.

I'm running ESXi 5.5U2 on a machine with two physical Intel NICs, and my overall goal is to get 2 Gbps of throughput over the network (outside of the VM network) to the ZFS share.

I have used the vSphere client to enable NIC teaming in ESXi.
I have the two network adapters added to the same VM network and have enabled NIC teaming. I chose "Route based on originating virtual port ID", which, if I understand it correctly, won't increase the throughput; it will just spread the load to one NIC or the other.
I would use "Route based on IP hash", which, if I understand it correctly, would let the throughput reach 2 Gbps on one 10 Gbps vmxnet3. But from what I have read, this will not work properly with a "dumb" switch (which is what I have).

To some extent I have achieved 2 Gbps, but it is difficult to manage:
I have been able to max the throughput on my ZFS server only by adding two virtual NICs to my napp-it VM and then sending data to each of them, pointing two copy jobs at the two IP addresses, one each. I can see that when I do this I am indeed saturating 2 Gbps worth of bandwidth. The downside is that it is inconvenient to keep track of which IP is currently in use and then use the other one when starting a copy job, especially when I'd like to just reference the ZFS shares by name rather than by various IP addresses. I'd like one IP that can do it all.

I guess I'll phrase this differently now that the scenario is painted:



A) Rather than trying to NIC team via "Route based on IP hash" in ESXi, is it possible to team the two virtual LAN ports within OmniOS in order to have one IP address but obtain 2 Gbps of throughput? (A rough sketch of what this could look like is at the end of this post.)

B) Is it possible to "Route based on ip hash" on a dumb switch through some method?

Thanks for any insight you all might have on obtaining more than 1 Gbps of throughput.

P.S. On further reading, I do see that many state that bonding can cause more problems than it is worth. I would still be interested in hearing from people who do have it set up, since I often do large copy jobs, and even gigabit is pretty slow these days with USB 3.0+ HDDs and SSDs, etc.
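For reference, the OmniOS side of option A would be done with dladm link aggregation. The sketch below is only an illustration under assumptions: the two vmxnet3 interfaces are named vmxnet3s0 and vmxnet3s1, the address is an example, and whether OmniOS will aggregate two virtual NICs at all, or whether doing so over a plain vSwitch helps a single transfer, is something that would need testing.

Code:
# remove any existing IP configuration from the two virtual NICs
ipadm delete-if vmxnet3s0
ipadm delete-if vmxnet3s1

# bundle both links into one aggregation (no LACP by default)
dladm create-aggr -l vmxnet3s0 -l vmxnet3s1 aggr0

# put a single IP address on the aggregation (example address)
ipadm create-if aggr0
ipadm create-addr -T static -a 192.168.1.10/24 aggr0/v4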
 
Just wanted to reply to my own post... I discovered 2 things that are helping me out a lot to get better reliability / faster transfers.

A) USB 3.0 support via an ESXi patch: http://www.v-front.de/2014/11/vmware-silently-adds-native-usb-30.html
-Be sure to force USB 3.0 in the BIOS of your bare metal server, and use the automatic startup script that the website references.


B) Removing any e1000 NICs from the OmniOS VM on ESXi 5.5U2
I was having a horrible time with super high CPU usage and other problems, the network freezing up, etc. I finally removed the e1000 NIC (that e1000 is bad news and worked horribly on this version of ESXi) and added a second vmxnet3. This way I can still saturate 2 Gbps via two IP addresses and the CPU is no longer pegged at 100%, which is pretty good for my usage.


With these two fixes, I can now mount USB 3.0 drives directly into the system via a Windows 8.1 VM and do copies, and with multiple IPs I can saturate 2 Gbps on the network! It isn't one big bruiser of a single link, but it's pretty good considering I can really do some damage now when copying to/from the ZFS pool or backing up my files to external drives.

Pretty happy at this point. Hope this post might help someone.
 
I need to replace all the drives in a RAIDZ3 vdev with bigger ones. We're talking 19 drives. I've got enough ports to put all the new drives into the server at once, and I can offline the pool. What would be the fastest way to do this? Can I clone all the drives, so that when doing a one-by-one replace, the resilver would be instantaneous?
 
With enough slots, I would create a new pool and replicate the filesystems to the new pool like
zfs send oldpool/filesystem -> newpool

The other option is a 19x disk replace olddisk -> newdisk with autoexpand enabled.
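A rough sketch of both options; the pool, filesystem and disk names below are placeholders, not taken from a real setup:

Code:
# Option 1: replicate everything to a new pool built from the new disks
zfs snapshot -r oldpool@move
zfs send -R oldpool@move | zfs receive -F newpool

# Option 2: replace the disks one by one and let the vdev grow afterwards
zpool set autoexpand=on tank
zpool replace tank olddisk newdisk    # repeat 19x, waiting for each resilver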
 
The pool is actually 57 drives (3 vdevs) so I can't do that. There is a backup but I would rather not lose redundancy while doing the operation.

My question is really about how the replace command works. I've seen that it uses the old disk if available, but if something goes wrong, will it go back automatically?

Also, copying the 19 drives at once will probably be quite slow as I'm bandwidth limited, so I would do 4 or 5 at a time.

Finally, does autoexpand need to be set for each replace?
 
replace command
zpool replace tank c1t1d0 c2t0d0

- The old disk is valid until the replace is finished
- autoexpand is a pool property
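Since autoexpand is a pool property, it only needs to be set once; a quick sketch with an assumed pool name:

Code:
zpool set autoexpand=on tank    # extra capacity appears once all disks of a vdev are replaced
zpool get autoexpand tank       # verify the setting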
 
Neat! Thanks! I'll be switching back to Solaris from OmniOS for the SMB 2.1 feature. Hopefully I can keep my ZFS pools at the older version for compatibility if I want to switch back to OmniOS.
 
I have an OpenIndiana system with Napp-it, which shows the following:

Code:
  pool: rpool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: scrub repaired 0 in 0h28m with 0 errors on Sun Jul 12 23:28:12 2015
config:

	NAME        STATE     READ WRITE CKSUM     CAP            Product
	rpool       DEGRADED     0     0     0
	  mirror-0  DEGRADED     0     0     0
	    c5d0s0  OFFLINE      0     0     0     80 GB          
	    c2d0s0  ONLINE       0     0     0     80 GB          

errors: No known data errors

From what I was told, the original second drive had failed and has been physically replaced (I don’t have access to it anymore) and an attempt was made to add a new drive to the system.

Trying
Code:
zpool online rpool c5d0s0
gave me the following warning
Code:
warning: device 'c5d0s0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
and now the pool lists the drive as unavailable:

Code:
	NAME        STATE     READ WRITE CKSUM     CAP            MODELL
	rpool       DEGRADED     0     0     0
	  mirror-0  DEGRADED     0     0     0
	    c5d0s0  UNAVAIL      0     0     0  cannot open     80 GB          
	    c2d0s0  ONLINE       0     0     0     80 GB

Now I'm stuck... :( Can anyone help?

Thanks!
 
Unavail means missing or damaged.

You can insert a new disk (>= 80 GB, same sector size), reboot, and replace the unavailable disk with the new one on another port. If you replace on the same port, a reboot may also be needed.

If this disk is on SATA:
Onboard SATA/AHCI is basically hotplug capable, but this is disabled by default.
To enable it, add the following line to /etc/system (OmniOS; should work on OI as well):

set sata:sata_auto_online=1
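Once the new disk is visible, the replace itself would look roughly like the following. The new device name is only a placeholder, and because rpool boots from a slice, the new disk may also need an s0 slice and a boot loader installed (an assumption about this particular setup):

Code:
# replace the unavailable mirror member with the new disk on another port
zpool replace rpool c5d0s0 c6d0s0

# on OpenIndiana with BIOS/grub boot, make the new mirror half bootable too
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c6d0s0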
 
Quick question: I'm currently running my ZFS pool on OpenIndiana. I'm doing updates and consolidating services onto my server. I need to upgrade to ESXi 6 and I'm trying to move all my media services to the same ESXi box. I am choosing between OmniOS and FreeNAS, which has them all included. Setting these up on OI (SABnzbd, CouchPotato and SickBeard) was a pain, but I'd also like to try Sonarr and NZBGet/Drone, which I could never get to work on OI.

Performance-wise, I see OmniOS is the clear choice for now, until FreeNAS 10 might be available. Is it recommended to run these in separate VMs and access the ZFS storage through OmniOS? I can't decide if I want to just import the pool into FreeNAS and let it take control. I like napp-it and am not sure if I'm comfortable switching. What's the ideal config for this situation?

Also, looking into it, I could run OmniOS and then, looking at this Flawless Media Server Linux Mint distro, stream everything through that...
 
The strengths of Solaris-based systems are performance, stability, the CIFS server, and the fact that an appliance is independent of a particular distribution (you can use a vanilla OS like OmniOS, OI or Oracle Solaris, with the option of a GUI).

While there are many applications available via pkgin from SmartOS, the focus is not a home media server. But with ESXi as a base, you can split storage, media server, AMP server etc. into different VMs, from BSD and OS X over Linux to Windows.


Regarding portability of pools:
If the disks were formatted in BSD with GPT partitions that FreeBSD recognizes but Solaris doesn't, you cannot import the pool into Solaris.

Pools with disks formatted with GEOM (whole disks) can be exported from FreeBSD/FreeNAS/ZFSguru and reimported into Solaris. GPT may also work, but only if the partition spans the whole disk.
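The move itself is just an export on the BSD side and an import on Solaris, pool version and feature flags permitting (the pool name is an example):

Code:
# on FreeBSD / FreeNAS / ZFSguru
zpool export tank

# on the Solaris / OmniOS side, after attaching the disks
zpool import            # lists importable pools found on the disks
zpool import tank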
 
Hey Gea, the Pushover feature is awesome, thank you so much for adding it. I was wondering if you could expand it to notify on S/H/T errors as well? And maybe Smart info failures?

I ask because (at least in my experience) I've never had a disk just fail and go offline with ZFS except for physically pulling it to test Pushover. I always get hours/days of S/H/T errors, reduced performance (which I don't always notice right away), then maybe the disk fails. Usually I get to the "catastrophic performance degradation" phase before the "complete disk failure" phase.

So I think if Napp-IT can notify on S/H/T errors increasing by say more than 1 in the last X minutes (to account for Smart checks, etc) that would be incredibly useful.
 
BTW, I'm doing a scrub of my pool and there have been a few MB repaired on two drives of two vdevs; however, I noticed that (repairing) stayed listed beside those drives for hours afterwards (the scrub takes several days). I did a reboot to plug some drives in, and then (repairing) had disappeared while the scrub continued.

The drives I'm plugging in are NTFS, as I'm migrating the last of my data to ZFS. For some reason I can mount GPT drives fine, while MBR drives always fail; any help appreciated. I'm using ntfs-3g, and here is what I get:

root@X6:~# mount -F ntfs-3g /dev/dsk/c6t5000C5002F75FBA0d0s1 /mnt/_win2T01
Error opening '/devices/scsi_vhci/disk@g5000c5002f75fba0:b': I/O error
Failed to mount '/devices/scsi_vhci/disk@g5000c5002f75fba0:b': I/O error
NTFS is either inconsistent, or there is a hardware fault, or it's a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows twice. The usage of the /f parameter is very
important! If the device is a SoftRAID/FakeRAID then first activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for more details.

The drives pass chkdsk without errors under Windows, and I eject them properly.
 
This has hampered me for more than a week and I just figured it out: with MBR drives I need to use the "p1" partition instead of the "s1" slice.
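Building on the earlier command, the working mount for an MBR disk then looks like this (same device as above, p1 instead of s1):

Code:
mount -F ntfs-3g /dev/dsk/c6t5000C5002F75FBA0d0p1 /mnt/_win2T01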
 
Hey Gea, the Pushover feature is awesome, thank you so much for adding it. I was wondering if you could expand it to notify on S/H/T errors as well? And maybe Smart info failures?

I ask because (at least in my experience) I've never had a disk just fail and go offline with ZFS except for physically pulling it to test Pushover. I always get hours/days of S/H/T errors, reduced performance (which I don't always notice right away), then maybe the disk fails. Usually I get to the "catastrophic performance degradation" phase before the "complete disk failure" phase.

So I think if Napp-IT can notify on S/H/T errors increasing by say more than 1 in the last X minutes (to account for Smart checks, etc) that would be incredibly useful.

An extendable alert mechanism is already planned for one of the next releases.
 
Has anybody tried the Solaris 11.3 beta and/or using napp-it with it? Does it install and work OK?
 
Has anybody tried the Solaris 11.3 beta and/or using napp-it with it? Does it install and work OK?

I have done only some basic tests and it is running.
You need the newest wget installer from today (online now) to install Solaris 11.3 properly
 
EDIT: Disregard. My Windows 8.1 machine was showing SMB dialect 1.50 until I rebooted and now it shows 2.10 with Solaris 11.3 beta.
 
Should proftpd be included with Solaris 11.3? I'm trying to enable it but the 'proftpd' service can't be found. However, the 'network/ftp' service exists and looks to be proftpd.

Also, when I try to enable the tftpd service, I get the following error: "/var/web-gui/_log/tftpd.conf must contain -s /folder". It looks like this tftpd.conf file exists but is empty.
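For what it's worth, the error message suggests the file just needs the tftpd arguments, presumably a single line such as the following, where the folder is only an example:

Code:
-s /tftpboot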
 
On Solaris, you must set up ProFTPD manually.

ProFTPD and the AMP stack setup within napp-it are based on pkgin from SmartOS,
which can be used with OmniOS as an additional (free and community-driven) setup.
 
Most napp-it users are on OmniOS now.
There are many professional Solaris users around, but only with basic storage needs (CIFS, NFS, iSCSI).

While FTP is an add-on, tftpd is included in the napp-it distribution.
I have not tried it, but the included tftpd may work on Solaris via the Services menu.
 
I had still been having this problem which required me to delete old snaps myself. I am on omnios-r151014.

My snap logs have been showing lines like:

hold 5 days: match rule: snap-age d1error days

I traced the "d1error" to the date2diff function and ran the following test:

perl -e 'use lib "/var/web-gui/data/napp-it/CGI"; \
require "/var/web-gui/data/wwwroot/cgi-bin/admin-lib.pl"; \
$d = &date2diff("01.01.2015", "01.02.2015"); print $d, "\n"'

This produced the following error:

Modification of a read-only value attempted at /var/web-gui/data/wwwroot/cgi-bin/admin-lib.pl line 2621.

The relevant lines are:

2617 sub date2valid {
2618 ##############
2619 # parameter: datum ("tt.mm.jjjj")
2620 # return datum oder -1
2621 $_[0]=~s/ +//g;

Changing the code to copy the argument into a variable instead of modifying $_[0] directly resolves the problem, and snaps are now being cleaned up.
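A self-contained illustration of the bug and the fix (this is not napp-it's actual code, just the pattern):

Code:
# minimal Perl demo: string literals passed as arguments are read-only,
# so substituting on $_[0] dies; copying into a lexical first avoids that
use strict;
use warnings;

sub date2valid_buggy {
    $_[0] =~ s/ +//g;              # "Modification of a read-only value attempted"
    return $_[0];                  # when called with a literal like "01.01.2015 "
}

sub date2valid_fixed {
    my ($datum) = @_;              # copy the argument first
    $datum =~ s/ +//g;
    return $datum;
}

print date2valid_fixed("01.01.2015 "), "\n";   # prints 01.01.2015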

FYI:
# perl --version

This is perl 5, version 16, subversion 1 (v5.16.1) built for i86pc-solaris-thread-multi-64

Copyright 1987-2012, Larry Wall
...

# uname -a
SunOS fishlaris.local 5.11 omnios-170cea2 i86pc i386 i86pc

The above setting creates 48 snaps per day.
As keep and hold are both respected, hold 3 days is the effective setting,
which means that you should have 144 snaps.

If the snap job is recursive, you have 144 snaps x the number of filesystems.
 
Most napp-it users are on OmniOS now.
There are many professional Solaris users around, but only with basic storage needs (CIFS, NFS, iSCSI).

While FTP is an add-on, tftpd is included in the napp-it distribution.
I have not tried it, but the included tftpd may work on Solaris via the Services menu.
OK - Thanks for the information. I was using OmniOS and really enjoyed it, but I feel that SMB 2.1 is important. Hopefully Nexenta merges their changes into Illumos so it's available in OmniOS in the future...
 
OK - Thanks for the information. I was using OmniOS and really enjoyed it, but I feel that SMB 2.1 is important. Hopefully Nexenta merges their changes into Illumos so it's available in OmniOS in the future...

There is some work by Nexenta to integrate SMB 2.1 into Illumos.
 
replace command
zpool replace tank c1t1d0 c2t0d0

- The old disk is valid until the replace is finished
- autoexpand is a pool property

I'm doing something wrong.

root@X6:~# zpool replace bay c4t0d0 C6t50014EE20707A7AAd0
cannot open 'C6t50014EE20707A7AAd0': no such device in /dev/dsk
must be a full path or shorthand device name

c4t0d0 is the old 2TB connected on an internal SATA port
C6t50014EE20707A7AAd0 is the new 3TB in a JBOD enclosure. It had an NTFS partition; I initialized it with napp-it, but things didn't improve.
 
After trying a million things it finally worked, but I don't know why, annoying.

I'm seeing this:

root@X6:~# zpool status -v bay
pool: bay
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Jul 24 19:19:07 2015
8.45T scanned out of 134T at 2.47G/s, 14h28m to go
2.99M resilvered, 6.30% done


It doesn't make much sense: 6.30% done with only 2.99M resilvered, and that 2TB drive is pretty full.

Is it reading the whole array like a scrub, rather than just copying from the current drive? If so it will waste a lot of time (and I fear that 2.47G/s will not last), since my pool is very unbalanced. I did a scrub a couple of days ago (while using the pool) and it took 165 hours, so I wonder if this replace will take a similar amount of time.
 