OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Discussion in 'SSDs & Data Storage' started by _Gea, Dec 30, 2010.

  1. spazoid

    spazoid Limp Gawd

    Messages:
    289
    Joined:
    Jun 16, 2008
Thanks, this helped a lot! I can now SSH to the server without getting spammed :)
     
  2. spazoid

    spazoid Limp Gawd

    Messages:
    289
    Joined:
    Jun 16, 2008
    Well, I've run into a new problem that I can't seem to solve.

I've passed through an M1015, and I can see the controller (lspci) but the disks don't show up. I know the disks work, because if I pass the controller through to a Windows 7 machine, they pop up perfectly in the disk manager.

    Any ideas?
     
  3. _Gea

    _Gea 2[H]4U

    Messages:
    3,782
    Joined:
    Dec 5, 2010
For a regular M1015 you need to install drivers from LSI, but most people reflash the IBM card to LSI 9211 IT-mode firmware.

This is the perfect raidless mode for ZFS and is supported without additional drivers:
    http://www.servethehome.com/ibm-serveraid-m1015-part-4/
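Once the card is flashed and passed through, a rough sanity check from the OI/OmniOS side (just a sketch; device names will differ on your box) looks something like this:
Code:
# is the HBA bound to the native mpt_sas driver (SAS2008 / 9211 IT firmware)?
prtconf -D | grep mpt
# do the attached disks enumerate?
format < /dev/null
# controller / attachment point state
cfgadm -al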
     
  4. jb33

    jb33 n00b

    Messages:
    39
    Joined:
    Feb 6, 2013
How does napp-it get its System->Network info? It's different on one of my two boxes from what ifconfig -a and/or dladm show-link return. I happen to have network-related IO performance issues on the box where napp-it shows it wrong, so I'm wondering if that's a clue to the cause.

    thanks,
    jb
     
  5. jb33

    jb33 n00b

    Messages:
    39
    Joined:
    Feb 6, 2013
Ah, so System->Network info includes values from ipadm? I used ifconfig for everything and now I think that's why one NIC doesn't retain its address on reboot!

Just cleaned out old interfaces. There is one disabled address, e1000g0/v4, that seems to be populating the value on the napp-it screen, but I can't delete it:
Code:
# ipadm show-addr
ADDROBJ       TYPE     STATE      ADDR
lo0/v4        static   ok         127.0.0.1/8
e1000g0/_a    static   ok         192.168.1.186/24
aggr1/_a      static   ok         10.10.2.201/24
lo0/v6        static   ok         ::1/128
e1000g0/v4    static   disabled   10.10.2.202/24

# ipadm delete-addr e1000g0/v4
ipadm: could not delete address: Object not found
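(For anyone hitting the same "Object not found" error: one workaround that has reportedly worked is to temporarily enable the stale address object and then delete it; addresses created with ipadm rather than ifconfig then persist across reboots. A rough sketch:)
Code:
# enable the disabled address object temporarily, then delete it for good
ipadm enable-addr -t e1000g0/v4
ipadm delete-addr e1000g0/v4
# creating addresses this way (instead of with ifconfig) makes them persistent
ipadm create-addr -T static -a <ip-address>/24 e1000g0/v4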
     
  6. jb33

    jb33 n00b

    Messages:
    39
    Joined:
    Feb 6, 2013
    del, sorry.
     
    Last edited: Mar 21, 2013
  7. jb33

    jb33 n00b

    Messages:
    39
    Joined:
    Feb 6, 2013
Interesting! Significant performance boost after reconfiguring the network with ipadm. IO is all ARC-driven in this test, but what's important is that I'm finally maxing out the NICs!

SERVER TYPE: Dell PowerEdge 2900, 48GB RAM
CPU TYPE / NUMBER: 2x Xeon E5410 2.33GHz
STORAGE: 6x 2TB SATA 7.2K in RAID 10

    Test name Avg MBps
    Max Throughput-100%Read 221
    RealLife-60%Rand-65%Read 95
    Max Throughput-50%Read 211
    Random-8k-70%Read 4.07 102
     
  8. Rectal Prolapse

    Rectal Prolapse Gawd

    Messages:
    575
    Joined:
    Jan 19, 2007
    jb33, what did you do with ipadm exactly? This knowledge could be useful for others with similar issues! Thanks.
     
  9. tladuke

    tladuke n00b

    Messages:
    52
    Joined:
    Sep 26, 2011
    I recently finished sending a snapshot of my pool to another box, but now I want to use napp-it replication. Is there a way to rename the snapshot so I don't have to resend everything when I first turn on replication?

    I sent nas1:tank/work@daily to nas2:tank/work@daily
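For what it's worth, a snapshot can be renamed in place with zfs rename; whether napp-it will then pick it up as the replication base depends on the snapshot naming its replication jobs expect, which is worth checking against a test job first. Roughly:
Code:
# rename an existing snapshot (it has to stay within the same filesystem)
zfs rename tank/work@daily tank/work@<name-the-replication-job-expects>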
     
  10. jb33

    jb33 n00b

    Messages:
    39
    Joined:
    Feb 6, 2013
Well, I didn't actually know about ipadm (on Solaris-based OSes). I used ifconfig to set up my NICs. I *think* that if you use the napp-it GUI to set up network adapters, it calls ipadm, so you won't have the problem.

Reconfigured with ipadm using the instructions here, and after a reboot I was at full throttle in my tests. No idea why, of course: https://blogs.oracle.com/GmG/entry/ipadm_1m_the_new_ifconfig

    Having sorted that out, I'm back to testing link-aggregation vs. COMSTAR multi-pathing and now I think COMSTAR MPIO is winning. Here are results from the last run. Anyone know how I'm getting 326MBps across 2x1Gbps links!?

Link aggregation works great for reads, but writes were all coming from one NIC. I couldn't seem to change that with the aggregation policy (L2, L3, L4; the command is sketched below the results), but COMSTAR seems to distribute writes evenly.

    Test name Avg MBps
    Max Throughput-100%Read 224
    RealLife-60%Rand-65%Read 100
    Max Throughput-50%Read 326
    Random-8k-70%Read 79
100% sequential write (1MB) 212
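(The aggregation policy change mentioned above, as a rough sketch; aggr1 is the aggregation name from the earlier ipadm output:)
Code:
# show the aggregation and its current load-balancing policy
dladm show-aggr
# switch the outbound hashing policy (L2 = MAC, L3 = IP, L4 = port based)
dladm modify-aggr -P L3,L4 aggr1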
     
  11. tladuke

    tladuke n00b

    Messages:
    52
    Joined:
    Sep 26, 2011
I got a new WD15EARS and it still couldn't replace; it still says different sector sizes.
I am on 151a7 now. Maybe I shouldn't have done that first?
Any way to fix it?

    Here's the one error I can find that might be related

    Code:
     zdb -l /dev/dsk/c3t4d0
    --------------------------------------------
    LABEL 0
    --------------------------------------------
    failed to unpack label 0
    --------------------------------------------
    LABEL 1
    --------------------------------------------
    failed to unpack label 1
    --------------------------------------------
    LABEL 2
    --------------------------------------------
    failed to unpack label 2
    --------------------------------------------
    LABEL 3
    --------------------------------------------
    failed to unpack label 3
    
I exported and then imported, and now the pool status looks like this. The funny number is the new disk:
    Code:
     tank                     DEGRADED     0     0     0
              raidz2-0               DEGRADED     0     0     0
                c3t1d0               ONLINE       0     0     0
                c3t2d0               ONLINE       0     0     0
                c3t3d0               ONLINE       0     0     0
                3832917263814807041  FAULTED      0     0     0  was /dev/dsk/c3t4d0s0
                c3t5d0               ONLINE       0     0     0
    
     
    Last edited: Mar 22, 2013
  12. _Gea

    _Gea 2[H]4U

    Messages:
    3,782
    Joined:
    Dec 5, 2010
    The "funny number" is a disk GUID that can be used in disk replace instead of the port number
     
  13. EnderW

    EnderW [H]ardForum Junkie

    Messages:
    10,727
    Joined:
    Sep 25, 2003
    What do you guys do for your backups? If your data is saved on the server, does it back up to a second server? Or is everything on one system and you use snapshots to backup to different drives within the same server?
     
  14. bl4s7er

    bl4s7er n00b

    Messages:
    17
    Joined:
    Aug 4, 2011
    Hopefully someone can help!

I've been running napp-it for almost 2 years (RAIDZ2) and everything's been great. Decided it was time to swap out 2 drives for new ones, upgrading from 2TB to 3TB drives. I've shut down, swapped out just one drive and restarted. The pool is degraded as expected, BUT I can't seem to add the new drive to the pool.

    "cfgadm -al" indicates its connected and configured.

    napp-it indicates the drive with STATE 'unavailable' and CAPACITY 'cannot open'

    Does it need to be formatted prior to installing?

    Thanks for your assistance :)
     
  15. _Gea

    _Gea 2[H]4U

    Messages:
    3,782
    Joined:
    Dec 5, 2010
A backup is never on the same machine; otherwise it isn't a backup but just a copy.
     
  16. _Gea

    _Gea 2[H]4U

    Messages:
    3,782
    Joined:
    Dec 5, 2010
    Does your controller support 3TB disks?
An LSI SAS2008 is OK, while an LSI SAS1068 is not (2TB max).
     
  17. bl4s7er

    bl4s7er n00b

    Messages:
    17
    Joined:
    Aug 4, 2011
Yep, confirming that it supports 3TB+.

I've been at it for hours now trying to resolve this. I think it's got something to do with it not recognizing the names? Using the command line and GUIDs to reference the drives has got me closer.

Currently it looks like this:
    Code:
    	NAME          STATE     READ WRITE CKSUM     CAP            Product
    	megadrive     DEGRADED     0     0     0
    	  raidz2-0    DEGRADED     0     0     0
    	    c4t0d0p0  ONLINE       0     0     0     2.00 TB        SAMSUNG HD204UI
    	    c4t1d0p0  OFFLINE      0     0     0     3.00 TB        ST3000DM001-1CH1
    	    c4t2d0p0  ONLINE       0     0     0     2.00 TB        ST2000DL003-9VT1
    	    c4t3d0p0  ONLINE       0     0     0     2.00 TB        ST2000DL003-9VT1
    	    c4t4d0p0  ONLINE       0     0     0     2.00 TB        SAMSUNG HD204UI
    	    c4t5d0p0  ONLINE       0     0     0     2.00 TB        SAMSUNG HD204UI
I'm trying to run the 'zpool replace' command but it's just not working (whether the drive is online or offline):
    Code:
    -zpool replace megadrive c4t1d0
    cannot replace c4t1d0 with c4t1d0: no such device in pool
    
    -zpool replace megadrive 16795158329404238342
    cannot open '16795158329404238342': no such device in /dev/dsk
    must be a full path or shorthand device name
     
  18. bollar

    bollar n00b

    Messages:
    2
    Joined:
    Mar 5, 2013

    For what it's worth, I've always had to replace failed devices by GUID and I have LSI controllers. I find the GUID for the failed drive with
    Code:
     zdb -C tank
    Once I find the failed drive's GUID in that output, I replace with
    Code:
     zpool replace tank 17301056034236317553 c4t5000CCA225E17DC9d0
Note that the replace command takes the old drive by GUID AND the new drive by device name.

    This is one case where I think napp-it would be more reliable if it used GUID for these types of operations, but I understand that it might break functionality for others.
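The zdb -C output is fairly long; filtering it down to just the vdev GUIDs and device paths makes matching a drive to its GUID quicker, e.g.:
Code:
zdb -C tank | egrep "guid|path"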
     
    Last edited: Mar 26, 2013
  19. bl4s7er

    bl4s7er n00b

    Messages:
    17
    Joined:
    Aug 4, 2011
    Thanks!!.. that did the trick.

    Code:
    zpool replace tank GUID PORT
    :D
     
  20. bollar

    bollar n00b

    Messages:
    2
    Joined:
    Mar 5, 2013
    Excellent!
     
  21. _Gea

    _Gea 2[H]4U

    Messages:
    3,782
    Joined:
    Dec 5, 2010
    Another option:
    If it was essentially a partition problem (missing partition), you can also try
    napp-it menu disk - initialize
     
  22. EnderW

    EnderW [H]ardForum Junkie

    Messages:
    10,727
    Joined:
    Sep 25, 2003
  23. _Gea

    _Gea 2[H]4U

    Messages:
    3,782
    Joined:
    Dec 5, 2010
  24. craggy

    craggy n00b

    Messages:
    4
    Joined:
    Mar 27, 2013
    Hi,

    First time to post on the forum so not sure if this is even where I should be posting this.

I have moved from Nexenta 3.1.3.5 to OpenIndiana + napp-it and have seen a significant drop in iSCSI performance on the very same hardware/config as before.

With Nexenta and my ESXi 5.1 hosts I was getting 243MB/s read and 247MB/s write using dual GbE NICs and MPIO, but since switching to OpenIndiana on the same hardware this has dropped to 174MB/s read, though writes are still at 235MB/s.

NICs in the OI server are Intel PRO/1000 PT dual-port adapters. Is there some setting like Large Send Offload or something else that needs to be tweaked in the illumos core?

    Thanks,
     
  25. _Gea

    _Gea 2[H]4U

    Messages:
    3,782
    Joined:
    Dec 5, 2010
I have heard of performance problems with 10GbE adapters and MPIO that can be fixed by editing the driver config file. Maybe you can check this or other settings in the e1000g config file.

    http://lists.omniti.com/pipermail/omnios-discuss/2012-August/000053.html
     
  26. craggy

    craggy n00b

    Messages:
    4
    Joined:
    Mar 27, 2013
Thanks, I'll give that a go.

Do you know of any commands I can issue over SSH to help tweak things?

I've enabled jumbo frames across the board, tried increasing the max TCP window size, etc., but none of this has made even a slight difference.
     
  27. craggy

    craggy n00b

    Messages:
    4
    Joined:
    Mar 27, 2013
Bingo, I had to change this in my e1000g.conf file. It was set at 0,0,0,0...:

Code:
MaxFrameSize=3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3;
# 0 is for normal ethernet frames.
# 1 is for upto 4k size frames.
# 2 is for upto 8k size frames.
# 3 is for upto 16k size frames.

Now I'm seeing 243MB/s read and write.

    Thanks
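For anyone finding this later: on a stock illumos/OI install that file should be /kernel/drv/e1000g.conf (an assumption; check your own system). A reboot is the sure way to apply the change, though something like this may work without one:
Code:
# ask the kernel to reread the e1000g driver config (a reboot also does it)
update_drv e1000g
# confirm the interface now allows a jumbo-frame MTU
dladm show-linkprop -p mtu e1000g0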
     
  28. _Gea

    _Gea 2[H]4U

    Messages:
    3,782
    Joined:
    Dec 5, 2010
  29. iolaus

    iolaus n00b

    Messages:
    21
    Joined:
    Jul 12, 2009
    This is fantastic information for someone just starting out with OpenIndiana (like me). Thanks so much!
     
  30. twistacatz

    twistacatz Limp Gawd

    Messages:
    182
    Joined:
    Jan 3, 2005
I'm starting to understand why Gea recommends dedicated hardware. Without going into too much detail about my setup, I guess I have a broad question for the hive mind: has anyone achieved transfers over 125MB/s using OI + napp-it and VMware?

I've tried using the E1000 and VMXNET3 to no avail. I've also tried passing through a 10Gb NIC, but it was unstable with OI for some reason.

With my goal of 1GB/s transfers and 50,000 IOPS from SSD pools, it seems like the virtual NIC is the bottleneck and dedicated hardware is the only option. Am I right?

    Thanks in advance!
     
  31. halcyon

    halcyon Gawd

    Messages:
    719
    Joined:
    Mar 20, 2003
I can get around 700-800Mbps from an in-memory transfer to my machine using an E1000, but my disks certainly aren't fast enough to sustain that.
     
  32. halcyon

    halcyon Gawd

    Messages:
    719
    Joined:
    Mar 20, 2003
I decided to update ESXi to the latest patches for 5.0, and in the process I moved the NIC from E1000 to VMXNET3 and updated the tools. This made a big difference: the guest NIC now reports 10Gbps instead of 1Gbps (this could just be cosmetic), but my file transfers are now around 850-900Mbps (95-100MB/s) without the data being in cache on the box. This is using Solaris 11.
     
  33. twistacatz

    twistacatz Limp Gawd

    Messages:
    182
    Joined:
    Jan 3, 2005
    Yeah it sounds like 1Gbps is the limit for OI and ESX. I think it's time to look at building a new dedicated box. Thanks for the input guys!
     
  34. Aesma

    Aesma [H]ard|Gawd

    Messages:
    1,844
    Joined:
    Mar 24, 2010
I'm ordering the parts I'm lacking to start my OI server tonight, but the more I think about it, the more I realize that while it will help me manage my data, it will not help my already cumbersome backup routine. Backing up dozens of TB to a bunch of loose drives will not work, so I'll build another server, maybe OI, maybe OmniOS or ZoL; it'll have to wait a few months anyway.

If I keep it on my LAN then there are tools to do the syncing, but what about remote backup? Let's say I put the backup server in my parents' garage. With only 1Mb/s upload for them and for me, there is no way to back up anything meaningful over the wire, but is there a way to copy the differences (added files, mainly) onto a hard drive that I would then bring from the server to the backup server?
     
  35. twistacatz

    twistacatz Limp Gawd

    Messages:
    182
    Joined:
    Jan 3, 2005
I'm not a pro myself, but I would look into the zfs send and receive commands.
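A rough sketch of how that can work for the sneakernet case Aesma describes (all pool, dataset and snapshot names here are invented; the point is that incrementals can be written to a file on a carry disk and received at the far end):
Code:
# one-time full copy onto a pool living on the carry disk
zfs snapshot -r tank/data@base
zfs send -R tank/data@base | zfs receive -F carry/data
# ...the carry disk then travels to the backup site and seeds the backup pool,
#    so both ends hold @base...

# each later trip: send only what changed since the last common snapshot
zfs snapshot -r tank/data@2013-04
zfs send -R -i @base tank/data@2013-04 > /carry/incr-base-to-2013-04.zfs
# at the backup site, apply the increment
zfs receive -F backup/data < /carry/incr-base-to-2013-04.zfs

One caveat: a send stream stored as a file is only verified when it is received, so keep the previous increment around until the receive at the far end has succeeded.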
     
  36. noise850

    noise850 n00b

    Messages:
    22
    Joined:
    Dec 9, 2005
    Hey Guys,

    I apologize if this has been covered in this thread. I searched and read all I could but only ended up slightly more confused.

I have a small all-in-one server box running OI and napp-it. It has 6x 2TB Hitachi drives as a RAID-Z2. I also have a SATA drive sled with 4 individual 2TB Hitachi drives. Right now the server only has 1.7TB used, but of course that will only continue to grow.

    What I am looking to do is keep the 4 drive sleds as off-site backup for the server. The idea I had was to run 2 sets of drives that get swapped out every month with incrementals. To explain, I'll call the Drives A1, A2 and B1, B2.

    A1 and B1 get a full backup of the 1.7TB initially. A1 is placed off-site and B1 remains. B1 continues to have daily or weekly backups made to it, then after a month, B1 gets swapped with A1 off-site. A1 is then brought up to date with the new material in the last month, and continues to get the weekly backups until it gets swapped with B1 again. Once A1 and B1 are full, the incrementals would switch over to A2 and B2. Once those are full, new drives and sleds would be purchased as A3/B3 etc.

So, what is the best way to accomplish this task? ZFS send? rsync? A third-party program? If it helps, the data to back up is 3 filesystems shared over SMB, 1 shared over NFS for ESXi, and one iSCSI target for WHS2011 backups.

    Thanks for the help! I'm very new to this whole concept and maybe I am asking too much, but it's all a learning experience.
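A sketch of how the A/B rotation could map onto incremental zfs send, assuming each set is its own pool and each keeps its own baseline snapshot on the server (all names invented):
Code:
# set B is attached locally: send everything new since its last sync snapshot
zfs snapshot -r tank@toB-2013-04
zfs send -R -i tank@toB-2013-03 tank@toB-2013-04 | zfs receive -F -d setB
# next month the A drives come back from off-site and continue their own chain
zfs snapshot -r tank@toA-2013-04
zfs send -R -i tank@toA-2013-02 tank@toA-2013-04 | zfs receive -F -d setA

The snapshot that anchors each chain has to stay on the server until the next send for that set, otherwise you are back to a full copy.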
     
  37. levak

    levak Limp Gawd

    Messages:
    386
    Joined:
    Mar 27, 2011
I have a problem and I'm not sure if it's possible to solve it or not...

I have an NFS export on OmniOS shared to a Linux server. On the Linux server, the NFS share is mounted as the /home folder, with many users...

The problem I have is that after a reboot, all users' folders and files end up owned by nobody/nogroup. Is it possible to force NFS to keep the UIDs from the Linux server and not change them on remount?

I'm using NFSv3; with v4 I can't even set my users, since they don't exist on OmniOS.

Is it possible to keep permissions with NFSv3, or do I need to go with iSCSI, which I don't want to...

Matej
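With plain NFSv3, ownership travels as raw numeric UIDs/GIDs, so nobody/nogroup after a remount often points at the client having silently negotiated v4 (where the idmap domains do not match) or at root squashing on the export. A couple of things worth checking on the Linux client, as a sketch:
Code:
# which NFS version and options did the mount actually end up with?
nfsstat -m
# pin the mount to v3 so numeric IDs pass through unmapped (paths invented)
mount -t nfs -o vers=3 omnios:/tank/home /home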
     
  38. levak

    levak Limp Gawd

    Messages:
    386
    Joined:
    Mar 27, 2011
    I'm testing NFS and iSCSI today and I'm getting mixed results:

I have an all-in-one server with ESXi 5.1.
OmniOS uses the vmxnet3 driver for LAN.
Linux also uses the vmxnet3 driver for LAN.
Storage on OmniOS is RAID10 with 4x 1TB drives...

    iSCSI over dedicated LAN:
    Read: 160MB/s
    Write: 60MB/s

    NFS(sync standard) over dedicated LAN:
    Read: 160MB/s
    Write: 120MB/s

NFS (sync disabled) over dedicated LAN:
    Read: 160MB/s
    Write: 140MB/s

Why is writing over iSCSI slower? I was expecting better results, not half of what NFS can do...

    Anyone had the same problems?

    Matej
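When iSCSI writes lag NFS like this, the usual suspects are the zvol's sync handling and the COMSTAR logical unit's write-cache setting. A hedged sketch of what to look at (the dataset name and LU GUID are placeholders, and wcd is the write-cache-disabled property, if I have the name right):
Code:
# is the backing zvol forcing synchronous writes?
zfs get sync,volblocksize tank/iscsi-vol
# COMSTAR logical units carry their own write-cache flag
stmfadm list-lu -v
# wcd=false enables the writeback cache on a LU (faster, less safe on power loss)
stmfadm modify-lu -p wcd=false 600144F0XXXXXXXXXXXX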
     
  39. halcyon

    halcyon Gawd

    Messages:
    719
    Joined:
    Mar 20, 2003
I just tested my setup: I have 7 Hitachi 3TB green drives in a RAID-Z2 connected to an M1015 passed through to Solaris 11. The test was run from a Win 7 VM on the same box, with a 1TB iSCSI volume on the pool. Both VMs use the VMXNET3 NIC driver.

[benchmark screenshot]
     
  40. levak

    levak Limp Gawd

    Messages:
    386
    Joined:
    Mar 27, 2011
Can you try with a bigger file? You tried with 1GB, and sometimes caching can interfere and give you false results... Try something that is 2x RAM...

I'm also now testing in Win2008R2... With a 1GB file I got 250/110...

Trying now with a 4GB file...