OpenSolaris derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Discussion in 'SSDs & Data Storage' started by _Gea, Dec 30, 2010.

  1. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    I will check that but cannot promise anything


    update
    Pushalot support is available in napp-it 0.9f6 (Jun.10.2015)
     
    Last edited: Jun 10, 2015
  2. levak

    levak Limp Gawd

    Messages:
    386
    Joined:
    Mar 27, 2011
    We replaced all the hard drives, JBOD cases, servers, LSI HBA cards and SAS cables, and upgraded OmniOS to the latest x14 version. In fact, we changed all the hardware, changed the pool design (from a single 50-drive pool to a 7 vdev x 10 drive pool), and even changed datacenters and switches, yet we still had the same problems.

    Then I thought, what is the only common thing left? Well, it's the firmware. We downgraded from P19 to P17 and so far it looks way better. Speeds are actually a lot higher, the pool is not 100% busy anymore and so far there has been no freezing. It may still be too early to say, but it looks promising.

    Also, it wasn't an iSCSI problem: the pool itself froze for some amount of time (issuing 'touch a' in the zpool folder made the command hang for up to 15 min) and then continued with operations. A SAS HBA reset did not help. After that much downtime, iSCSI crashed as well.

    It might help someone with the same problems.

    Matej
     
  3. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    Good to hear about solutions, not only problems...
     
  4. nl-darklord

    nl-darklord n00b

    Messages:
    43
    Joined:
    Dec 30, 2010
    Thanks! I will test it
    [edit] I'll have to wait: the free updates are 0.9f4 and 0.9f5.
     
    Last edited: Jun 10, 2015
  5. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    If anyone wants to try Pushalot with napp-it
    use the following dev key

    monitor dev - 12.06.2015::mVqmVqsmVVTTVsqmlqsmVKTTstDl

    Copy/paste the whole line into menu extension >> register
    and you can update to latest 0.9f6_dev
     
    Last edited: Jun 10, 2015
  6. nl-darklord

    nl-darklord n00b

    Messages:
    43
    Joined:
    Dec 30, 2010
    It works perfectly! :D
    thank you for the quick implementation

    [edit] found one typo in Push - Alert menu:
    Service provider: Pushover (Android, IOS, Desktop) or Pushover (Win8.1+, Windows Phone, free)

    should be: Service provider: Pushover (Android, IOS, Desktop) or Pushalot (Win8.1+, Windows Phone, free)
     
  7. levak

    levak Limp Gawd

    Messages:
    386
    Joined:
    Mar 27, 2011
    I guess I spoke too soon:)

    It crashed last night, although it looks like only the iSCSI target froze this time, not the FS itself. At least Nagios did not report problems...

    Matej
     
  8. cbutters

    cbutters Gawd

    Messages:
    513
    Joined:
    Dec 30, 2005
    Hi everyone, I'm running a napp-it all-in-one server and have a question about ESXi NIC teaming.

    I'm running ESXi 5.5U2 on a machine with two physical Intel NICs, and my overall goal is to get 2 Gbps of throughput over the network (outside of the VM network) to the ZFS share.

    I have used the vSphere client to enable NIC teaming in ESXi.
    I have added the two network adapters to the same VM Network and enabled NIC teaming. I chose "Route based on originating virtual port ID" which, if I understand it correctly, won't increase the throughput; it will just spread the load to one NIC or the other.
    I would use "Route based on IP Hash" which, if I understand it correctly, would let the throughput reach 2 Gbps on one 10 Gbps vmxnet3. But from what I have read, this will not work properly with a "dumb" switch (which is what I have).
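
    For illustration, here is a minimal Python sketch of how an IP-hash policy might pick an uplink. It assumes a simplified model (XOR of the two IPv4 addresses, modulo the number of uplinks); the function name pick_uplink and the addresses are made up for the example, and the real ESXi implementation may differ in detail. What it is meant to show is that one source/destination pair always maps to the same uplink, so a single copy job between two IPs would likely still be limited to one 1 Gbps link.

```python
import ipaddress

def pick_uplink(src_ip: str, dst_ip: str, n_uplinks: int = 2) -> int:
    """Simplified IP-hash model: XOR both addresses, modulo the uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_uplinks

# Hypothetical addresses: one client talking to two IPs on the storage VM.
client = "192.168.1.50"
for server in ("192.168.1.10", "192.168.1.11"):
    print(client, "->", server, "uses uplink", pick_uplink(client, server))
```

    Under this model, only traffic to (or from) different IP addresses can be spread across both uplinks; repeating the same pair always gives the same result.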

    To some extent, I have achieved 2 Gbps, but it is difficult to manage:
    I have been able to max out the throughput on my ZFS server only by adding two virtual NICs to my napp-it VM and then pointing each of two copy jobs at one of the two IP addresses. When I do this, I can see that I am indeed saturating 2 Gbps. The downside is that it is inconvenient to keep track of which IP is currently in use and then use the other one for the next copy job, especially when I'd like to just reference the ZFS shares by name rather than by various IP addresses. I'd like one IP that can do it all.

    I guess I'll phrase this differently now that the scenario is painted:

    A) Rather than trying to NIC team via "Route based on IP hash" in ESXi, is it possible to team the two virtual LAN ports within OmniOS in order to have one IP address but obtain 2 Gbps throughput?

    B) Is it possible to "Route based on IP hash" on a dumb switch through some method?

    Thanks for any insight you all might have on obtaining more than 1gbps throughput. Thanks

    P.S. On further reading, I do see that many state that bonding can cause more problems than it is worth. I would still be interested in hearing from people who do have it set up, since I often do large copy jobs, and these days even gigabit is pretty slow with USB 3.0+ HDDs, SSDs, etc.
     
    Last edited: Jun 12, 2015
  9. cbutters

    cbutters Gawd

    Messages:
    513
    Joined:
    Dec 30, 2005
    Just wanted to reply to my own post... I discovered 2 things that are helping me out a lot to get better reliability / faster transfers.

    A) USB 3.0 support via an ESXi patch http://www.v-front.de/2014/11/vmware-silently-adds-native-usb-30.html
    -Be sure to force USB 3.0 in the BIOS of your bare metal server, and use the automatic startup script that the website references.


    B) Removing any e1000 NICs from the OmniOS VM on ESXi 5.5U2
    I was having a horrible time with super high CPU usage and other problems, such as the network freezing up. I finally removed the e1000 NIC (the e1000 is bad news and worked horribly on this version of ESXi) and added a second vmxnet3. This way I can still saturate 2 Gbps via two IP addresses without the CPU being pegged at 100%, and that is pretty good for my usage.


    With these two fixes, I can now mount USB 3.0 drives directly into the system via a Windows 8.1 VM and do copies, and with multiple IPs I can saturate 2 Gbps on the network! It isn't one big bruiser of a single link, but it's pretty good, considering I can really do some damage now when copying to/from the ZFS pool or backing up my files to external drives.

    Pretty happy at this point. Hope this post might help someone.
     
  10. Aesma

    Aesma [H]ard|Gawd

    Messages:
    1,844
    Joined:
    Mar 24, 2010
    I need to replace all the drives in a RAIDZ3 vdev with bigger ones. We're talking 19 drives. I've got enough ports to put all the new drives into the server at once, and I can offline the pool. What would be the fastest way to do this? Can I clone all the drives first, so that when I then do a one-by-one replace the resilver is instantaneous?
     
  11. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
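    With enough slots, I would create a new pool and replicate the filesystems to the new pool from a snapshot, like this (the snapshot name "migrate" is only an example):
    zfs snapshot -r oldpool/filesystem@migrate
    zfs send -R oldpool/filesystem@migrate | zfs receive newpool/filesystem

    The other option is to do a disk replace (olddisk -> newdisk) 19 times with autoexpand enabled.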
     
    Last edited: Jun 21, 2015
  12. Aesma

    Aesma [H]ard|Gawd

    Messages:
    1,844
    Joined:
    Mar 24, 2010
    The pool is actually 57 drives (3 vdevs) so I can't do that. There is a backup but I would rather not lose redundancy while doing the operation.

    My question is really about how the replace command works. I've seen that it uses the old disk if available, but if something goes wrong, will it fall back automatically?

    Also, copying the 19 drives at once will probably be quite slow as I'm bandwidth limited, so I would do 4 or 5 at once.

    Finally, does autoexpand need to be set for each replace?
     
  13. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    The replace command is:
    zpool replace tank c1t1d0 c2t0d0

    - The old disk remains valid until the replace has finished
    - autoexpand is a pool property, so it is set once for the pool (zpool set autoexpand=on tank), not for each replace
     
  14. Aesma

    Aesma [H]ard|Gawd

    Messages:
    1,844
    Joined:
    Mar 24, 2010
    Thanks. I will have fun with the WWIDs !
     
  15. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
  16. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
  17. jmk396

    jmk396 Gawd

    Messages:
    783
    Joined:
    Jul 22, 2004
    Neat! Thanks! I'll be switching back to Solaris from OmniOS for the SMB 2.1 feature. Hopefully I can keep my ZFS pools at the older version for compatibility if I want to switch back to OmniOS.
     
  18. TheLastBoyscout

    TheLastBoyscout Limp Gawd

    Messages:
    142
    Joined:
    Feb 13, 2011
    I have an OpenIndiana system with Napp-it, which shows the following:

    Code:
      pool: rpool
     state: DEGRADED
    status: One or more devices has been taken offline by the administrator.
    	Sufficient replicas exist for the pool to continue functioning in a
    	degraded state.
    action: Online the device using 'zpool online' or replace the device with
    	'zpool replace'.
      scan: scrub repaired 0 in 0h28m with 0 errors on Sun Jul 12 23:28:12 2015
    config:
    
    	NAME        STATE     READ WRITE CKSUM     CAP            Product
    	rpool       DEGRADED     0     0     0
    	  mirror-0  DEGRADED     0     0     0
    	    c5d0s0  OFFLINE      0     0     0     80 GB          
    	    c2d0s0  ONLINE       0     0     0     80 GB          
    
    errors: No known data errors
    From what I was told, the original second drive had failed and has been physically replaced (I don’t have access to it anymore) and an attempt was made to add a new drive to the system.

    Trying
    Code:
    zpool online rpool c5d0s0
    gave me the following warning
    Code:
    warning: device 'c5d0s0' onlined, but remains in faulted state
    use 'zpool replace' to replace devices that are no longer present
    and now the pool lists the drive as unavailable:

    Code:
    	NAME        STATE     READ WRITE CKSUM     CAP            MODELL
    	rpool       DEGRADED     0     0     0
    	  mirror-0  DEGRADED     0     0     0
    	    c5d0s0  UNAVAIL      0     0     0  cannot open     80 GB          
    	    c2d0s0  ONLINE       0     0     0     80 GB     
    Now I'm stuck... :( Can anyone help?

    Thanks!
     
  19. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    UNAVAIL means the disk is missing or damaged.

    You can insert a new disk (>= 80 GB, same sector size) on another port, reboot and replace the unavailable disk with the new one. If you replace it on the same port, a reboot may also be needed.

    If this disk is on SATA:
    Onboard SATA/AHCI is basically hotplug capable, but this is disabled by default.
    To enable it, add the following line to /etc/system (OmniOS, should work on OI as well):

    set sata:sata_auto_online=1
     
  20. bjamm

    bjamm Gawd

    Messages:
    529
    Joined:
    Apr 18, 2005
    Quick question: I'm currently running my ZFS on OpenIndiana. I'm doing updates and consolidating services onto my server. I need to upgrade to ESXi 6 and I'm trying to move all my media services to the same ESXi box. I am choosing between OmniOS and FreeNAS, which has them all included. Setting up SABnzbd, CouchPotato and Sick Beard on OI was a pain, and I'd also like to try Sonarr and NZBGet/drone, which I could never get to work on OI.

    From what I see, OmniOS is the clear choice performance-wise for now, until FreeNAS 10 becomes available. Is it recommended to run these two in separate VMs and access the ZFS storage through OmniOS? I can't decide if I want to just import into FreeNAS and let it take control. I like napp-it and I'm not sure I'm comfortable switching. What's the ideal config for this situation?

    I'm also looking into running OmniOS and then using this Flawless Media Server Linux Mint distro to stream everything.
     
    Last edited: Jul 19, 2015
  21. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    The strengths of Solaris-based systems are performance, stability, the CIFS server, and the fact that an appliance is independent of a distribution (you can use a vanilla OS like OmniOS, OI or Oracle Solaris with the option of a GUI).

    While there are many applications available via pkgin from SmartOS, the focus is not a home media server. But with ESXi as a base, you can split storage, media server, AMP server etc. across different VMs, from BSD and OSX to Linux and Windows.


    Regarding portability of pools:
    If the disks were formatted in BSD with GPT partitions (which FreeBSD recognizes but Solaris doesn't), you cannot import the pool into Solaris.

    Pools with disks formatted with GEOM can be exported from FreeBSD/FreeNAS/ZFSGuru and reimported into Solaris. GPT may also work, but only if the partition spans the whole disk.
     
  22. thedge

    thedge Limp Gawd

    Messages:
    273
    Joined:
    Dec 8, 2010
    Hey Gea, the Pushover feature is awesome, thank you so much for adding it. I was wondering if you could expand it to notify on S/H/T errors as well? And maybe SMART info failures?

    I ask because (at least in my experience) I've never had a disk just fail and go offline with ZFS, except when physically pulling it to test Pushover. I always get hours or days of S/H/T errors and reduced performance (which I don't always notice right away), and then maybe the disk fails. Usually I get to the "catastrophic performance degradation" phase before the "complete disk failure" phase.

    So I think if napp-it could notify on S/H/T errors increasing by, say, more than 1 in the last X minutes (to account for SMART checks, etc.), that would be incredibly useful.
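
    A rough Python sketch of the check suggested above, assuming the combined soft/hard/transfer error total is sampled periodically. The function name, the sample format and the numbers are all hypothetical and not part of napp-it.

```python
from typing import List, Tuple

def errors_increased(samples: List[Tuple[float, int]],
                     now: float,
                     window_minutes: float,
                     threshold: int = 1) -> bool:
    """Return True if the error total grew by more than `threshold`
    within the last `window_minutes`.

    `samples` is a list of (unix_timestamp, total_error_count) pairs,
    oldest first.
    """
    cutoff = now - window_minutes * 60
    recent = [count for ts, count in samples if ts >= cutoff]
    if len(recent) < 2:
        return False  # not enough data points inside the window
    return recent[-1] - recent[0] > threshold

# Hypothetical samples taken every 5 minutes (timestamps in seconds).
history = [(0, 3), (300, 3), (600, 4), (900, 9)]
print(errors_increased(history, now=900, window_minutes=10))  # compares 3 -> 9
```

    Comparing the oldest and newest sample inside the window keeps a single stray error (for example from a SMART check) from triggering an alert, while a steady climb does.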
     
  23. Aesma

    Aesma [H]ard|Gawd

    Messages:
    1,844
    Joined:
    Mar 24, 2010
    BTW, I'm doing a scrub of my pool, and a few MB have been repaired on two drives in two vdevs. However, I noticed that "(repairing)" remained displayed next to the drives for hours after that (the scrub takes several days). I did a reboot to plug some drives in, and after that "(repairing)" had disappeared while the scrub continued.

    The drives I'm plugging in are NTFS, as I'm migrating the last of my data to ZFS, and for some reason I can mount GPT drives fine, while MBR drives always fail. Any help appreciated. I'm using ntfs-3g; here is what I get:

    root@X6:~# mount -F ntfs-3g /dev/dsk/c6t5000C5002F75FBA0d0s1 /mnt/_win2T01
    Error opening '/devices/scsi_vhci/disk@g5000c5002f75fba0:b': I/O error
    Failed to mount '/devices/scsi_vhci/disk@g5000c5002f75fba0:b': I/O error
    NTFS is either inconsistent, or there is a hardware fault, or it's a
    SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
    then reboot into Windows twice. The usage of the /f parameter is very
    important! If the device is a SoftRAID/FakeRAID then first activate
    it and mount a different device under the /dev/mapper/ directory, (e.g.
    /dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
    for more details.

    The drives pass chkdsk without error under Windows, and I eject them properly.
     
  24. Aesma

    Aesma [H]ard|Gawd

    Messages:
    1,844
    Joined:
    Mar 24, 2010
    This held me up for more than a week and I just figured it out: with MBR drives I need to use the "p1" partition instead of "s1".
     
  25. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    An extendable alert mechanism is already planned for one of the next releases.
     
  26. jmk396

    jmk396 Gawd

    Messages:
    783
    Joined:
    Jul 22, 2004
    Has anybody tried the Solaris 11.3 beta and/or using napp-it with it? Does it install and work OK?
     
  27. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    I have done only some basic tests and it is running.
    You need the newest wget installer from today (online now) to install Solaris 11.3 properly
     
  28. jmk396

    jmk396 Gawd

    Messages:
    783
    Joined:
    Jul 22, 2004
    Thanks so much Gea.
     
  29. thedge

    thedge Limp Gawd

    Messages:
    273
    Joined:
    Dec 8, 2010
    Awesome, not at all surprised that you're steps ahead of me. I look forward to it!
     
  30. jmk396

    jmk396 Gawd

    Messages:
    783
    Joined:
    Jul 22, 2004
    EDIT: Disregard. My Windows 8.1 machine was showing SMB dialect 1.50 until I rebooted and now it shows 2.10 with Solaris 11.3 beta.
     
    Last edited: Jul 21, 2015
  31. jmk396

    jmk396 Gawd

    Messages:
    783
    Joined:
    Jul 22, 2004
    Should proftpd be included with Solaris 11.3? I'm trying to enable it but the 'proftpd' service can't be found. However, the 'network/ftp' service exists and looks to be proftpd.

    Also, when I try to enable the tftpd service, I get the following error: "/var/web-gui/_log/tftpd.conf must contain -s /folder". It looks like this tftpd.conf file exists but is empty.
     
    Last edited: Jul 22, 2015
  32. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    On Solaris, you must set up ProFTPD manually.

    The ProFTPD and AMP stack settings within napp-it are based on pkgin from SmartOS,
    which can be used with OmniOS as an additional (free and community-driven) setup.
     
  33. jmk396

    jmk396 Gawd

    Messages:
    783
    Joined:
    Jul 22, 2004
    OK - Thanks Gea. Does the same apply for the tftpd service?
     
  34. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    Most napp-it users are on OmniOS now.
    There are many professional Solaris users around but only with basic storage needs (CIFS, NFS, iSCSI)

    While ftp is an addon, tftpd is included in the napp-it distribution.
    I have not tried but the included tftpd may work on Solaris with the services menu.
     
  35. natkin

    natkin n00b

    Messages:
    30
    Joined:
    Mar 31, 2014
    I had still been having this problem which required me to delete old snaps myself. I am on omnios-r151014.

    My snap logs have been showing lines like:

    hold 5 days: match rule: snap-age d1error days

    I traced the "d1error" to the date2diff function and ran the following test:

    perl -e 'use lib "/var/web-gui/data/napp-it/CGI";
    require "/var/web-gui/data/wwwroot/cgi-bin/admin-lib.pl";
    $d = &date2diff("01.01.2015", "01.02.2015"); print $d, "\n"'

    This produced the following error:

    Modification of a read-only value attempted at /var/web-gui/data/wwwroot/cgi-bin/admin-lib.pl line 2621.

    The relevant lines are:

    2617 sub date2valid {
    2618 ##############
    2619 # parameter: datum ("tt.mm.jjjj")
    2620 # return datum oder -1
    2621 $_[0]=~s/ +//g;

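    Changing the function to copy the argument into a local variable instead of modifying $_[0] in place (it aliases the caller's value, here a read-only string literal) resolves the problem, and snaps are now being cleaned up.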

    FYI:
    # perl --version

    This is perl 5, version 16, subversion 1 (v5.16.1) built for i86pc-solaris-thread-multi-64

    Copyright 1987-2012, Larry Wall
    ...

    # uname -a
    SunOS fishlaris.local 5.11 omnios-170cea2 i86pc i386 i86pc

     
  36. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    @natkin
    I will add this modification to the next dev release.
     
  37. jmk396

    jmk396 Gawd

    Messages:
    783
    Joined:
    Jul 22, 2004
    OK - Thanks for the information. I was using OmniOS and really enjoyed it, but I feel that SMB 2.1 is important. Hopefully Nexenta merges their changes into Illumos so it's available in OmniOS in the future...
     
  38. _Gea

    _Gea 2[H]4U

    Messages:
    3,874
    Joined:
    Dec 5, 2010
    There is some work by Nexenta to integrate SMB 2.1 into Illumos.
     
  39. Aesma

    Aesma [H]ard|Gawd

    Messages:
    1,844
    Joined:
    Mar 24, 2010
    I'm doing something wrong.

    root@X6:~# zpool replace bay c4t0d0 C6t50014EE20707A7AAd0
    cannot open 'C6t50014EE20707A7AAd0': no such device in /dev/dsk
    must be a full path or shorthand device name

    c4t0d0 is the old 2TB connected on an internal SATA port
    C6t50014EE20707A7AAd0 is the new 3TB in a JBOD enclosure. It had an NTFS partition, I initialized it with napp-it but things didn't improve.
     
  40. Aesma

    Aesma [H]ard|Gawd

    Messages:
    1,844
    Joined:
    Mar 24, 2010
    After trying a million things it finally worked, but I don't know why, annoying.

    I'm seeing this :

    root@X6:~# zpool status -v bay
    pool: bay
    state: ONLINE
    status: One or more devices is currently being resilvered. The pool will
    continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
    scan: resilver in progress since Fri Jul 24 19:19:07 2015
    8.45T scanned out of 134T at 2.47G/s, 14h28m to go
    2.99M resilvered, 6.30% done


    This doesn't make much sense to me: 6.30% done with only 2.99M resilvered, even though that 2TB drive is pretty full.

    Is it reading the whole array like a scrub, and not just copying from the current drive? If so, it will waste a lot of time (and I fear that 2.47G/s will not last), since my pool is very unbalanced. I did a scrub a couple of days ago (while using the pool) and it took 165 hours, so I wonder if this replace will take a similar amount of time.
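
    As a sanity check on the numbers in the status output above, this small Python sketch reproduces the percentage and the time estimate. It assumes binary units (1T = 1024G) and that the percentage is derived from the scanned amount rather than from the resilvered amount; the variable names are my own.

```python
scanned_t = 8.45      # "8.45T scanned"
total_t = 134.0       # "out of 134T"
rate_g_per_s = 2.47   # "at 2.47G/s"

# Progress appears to be scanned / total, independent of "resilvered".
percent_done = scanned_t / total_t * 100

# Remaining time at the current scan rate, assuming 1T = 1024G.
remaining_s = (total_t - scanned_t) * 1024 / rate_g_per_s
hours, rem = divmod(int(remaining_s), 3600)
minutes = rem // 60

print(f"{percent_done:.1f}% done, about {hours}h{minutes:02d}m to go")
```

    Both results come out within rounding of the reported 6.30% and 14h28m, which suggests the progress figure tracks how much of the 134T pool has been scanned, while "resilvered" only counts what has actually been written to the new drive so far.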