OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

J-san · Dec 15, 2014

Thanks _Gea,

I didn't know about the Autostart settings, I'll have to check those out.

The problem is that if the OmniOS VM has more than one vmxnet3s NIC, then setting the 9000 MTU in the vmxnet3s kernel driver will by default (I think) raise both NIC interfaces to 9000.

You can still lower the 2nd management interface to 1500 MTU but I think you have to change it after setting the 9000MTU in the kernel driver:

## Set management interface to persistently stay at 1500 MTU
## Edit for your environment (maybe you want vmxnet3s1 as your mgmt interface)

Code:

# ipadm set-ifprop -p mtu=1500 -m ipv4 vmxnet3s0

You may be able to set the management interface as above before the 9000 MTU kernel driver edit; maybe it only auto-adjusts MTUs higher if they are not specifically set.
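
To see what each interface ended up with after the change, the link and IP properties can be queried. A minimal sketch, assuming the interface names vmxnet3s0 and vmxnet3s1 from above; the function name show_mtu and the IPADM/DLADM override variables are made up here so the commands can be swapped out for a dry run.

```shell
# Show the MTU at the link layer and at the IPv4 interface layer.
# IPADM and DLADM can be pointed at another command (e.g. echo) to preview.
IPADM=${IPADM:-ipadm}
DLADM=${DLADM:-dladm}

show_mtu() {
  "$DLADM" show-linkprop -p mtu "$1"
  "$IPADM" show-ifprop -p mtu -m ipv4 "$1"
}
```

Run it as `show_mtu vmxnet3s0` and `show_mtu vmxnet3s1` and compare the two.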

_Gea · Dec 16, 2014

Another option is a basic e1000 vnic for management

moose517 · Dec 16, 2014

_Gea said:
I would suppose, one disk is blocking the bus or controller.
What you can do on a hotpluggable server is (to find a single disk problem)
- power off and remove all disks
- power on, boot Solaris
- plug in a disk and wait some seconds until it comes up in menu disks
- do this with all disks. With one disk you may have problems and this is the disk that you need to replace

Unlike hardwareraid, such a procedure is not a problem with ZFS
The pool simply stays offline until enough disks come back, if a disk is missing within the redundancy the poolstate is degraded.
When the last disk is back the poolstate goes to online. If you have modified data in the meantime, it initiates a resilver.

If this does not help, you may need another setup where you can test
if the pool itself is the problem or
- power
- the SAS controller
- the expander
- the backplane

Yeah, at this point I'm thinking it's a combination of the SAS controller <-> backplane. Depending on how I plug things in, I can get some drives to read and nothing on others. Gonna pull my entire server upstairs in the next day or so and just sit down and play around with different connections and see what I can get. Thinking I might just break down and go with a new chassis that has actual SAS ports and a new SAS controller that has enough ports to not need an expander. Any suggestions on what's known to work well to control 24 drives?

toelie · Dec 16, 2014

J-san said:
This might be related to enabling 9000MTU and jumbo frames, or using Solaris 10 vs 11 vmware drivers. Did you modify the vmware tools script to try and select Solaris 11 drivers for OmniOS?

I noticed that in ESXi I had the guest OS set to solaris 11 x64. I did not modify the install script. Is this ESXi setting used by the perl script?

I tried setting it to solaris 10 x64, but after the vmware tools update the napp-it interface still wasn't working. Is there a way to see if I currently use the solaris 11 driver?

Never tried anything with jumbo frames on this machine.

J-san · Dec 16, 2014

toelie said:
I noticed that in ESXi I had the guest OS set to solaris 11 x64. I did not modify the install script. Is this ESXi setting used by the perl script?

I tried setting it to solaris 10 x64, but after the vmware tools update the napp-it interface still wasn't working. Is there a way to see if I currently use the solaris 11 driver?

Never tried anything with jumbo frames on this machine.

My VM's guest operating system is set to "Oracle Solaris 10 (64-bit)".. I'm not sure if it reads that during the vmware tools install though.

You should be able to figure out which one you're using by matching the file sizes:

from the vmware-tools-distrib folder, look at the filesizes
(eg the 10_64 version is 33888 bytes)

Code:

~/vmware-tools-distrib# find . -name 'vmxnet3s' -ls

 78270   34 -rw-r--r--   1 root     root        34072 Aug 23 00:41 ./lib/modules/binary/2009.06_64/vmxnet3s
 78280   25 -rw-r--r--   1 root     root        24568 Aug 23 00:41 ./lib/modules/binary/11/vmxnet3s
 78264   35 -rw-r--r--   1 root     root        35088 Aug 23 00:41 ./lib/modules/binary/11_64/vmxnet3s
 78256   34 -rw-r--r--   1 root     root        33888 Aug 23 00:41 ./lib/modules/binary/10_64/vmxnet3s
 78274   25 -rw-r--r--   1 root     root        24424 Aug 23 00:41 ./lib/modules/binary/2009.06/vmxnet3s
 78247   25 -rw-r--r--   1 root     root        24336 Aug 23 00:41 ./lib/modules/binary/10/vmxnet3s

Then look in your /kernel/drv/:

Code:

# find /kernel/drv/ -name 'vmxnet3s' -ls
 74473   34 -rw-r--r--   1 root     root        33888 Dec  1 19:39 /kernel/drv/amd64/vmxnet3s
 74472   25 -rw-r--r--   1 root     root        24336 Dec  1 19:39 /kernel/drv/vmxnet3s

The size of amd64 vmxnet3s in /kernel/drv/ matches the Solaris 10_64 version: 33888 bytes
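
The comparison can also be scripted. A minimal sketch, assuming the tools were unpacked to ~/vmware-tools-distrib as in the listing above; the function name match_vmxnet3s is made up for illustration. It uses a byte-for-byte cmp instead of matching file sizes, which is a stricter check than the one described.

```shell
# Print the lib/modules/binary/<version> directory whose vmxnet3s file is
# byte-identical to the installed driver. Nothing is modified.
match_vmxnet3s() {
  installed=$1   # e.g. /kernel/drv/amd64/vmxnet3s
  distrib=$2     # e.g. ~/vmware-tools-distrib
  for candidate in "$distrib"/lib/modules/binary/*/vmxnet3s; do
    [ -f "$candidate" ] || continue
    if cmp -s "$installed" "$candidate"; then
      basename "$(dirname "$candidate")"
    fi
  done
}
```

For example, `match_vmxnet3s /kernel/drv/amd64/vmxnet3s ~/vmware-tools-distrib` should name the 10_64 directory if the installed driver is the Solaris 10 64-bit one.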

toelie · Dec 17, 2014

solaris 10 x64 driver is installed now.

After vmware tools update the solaris 11 x64 driver is installed. So that would probably explain why napp-it breaks.

I can not find a way to force version 10.

J-san · Dec 17, 2014

toelie said:
solaris 10 x64 driver is installed now.

After vmware tools update the solaris 11 x64 driver is installed. So that would probably explain why napp-it breaks.

I can not find a way to force version 10.

Hi Toelie, as a last resort you could try to do the opposite in Step 4 of the following post and edit the script to use version 10:
https://forums.servethehome.com/ind...le-and-napp-it-vmxnet3-and-jumbo-frames.2853/

(they were trying to force using version 11)

You could try modifying the vmware-config-tools.pl to override the Solaris version reported by uname:

Under vmware-tools-distrib/bin

# vi vmware-config-tools.pl

Search for solaris_os_version:
Type:
/solaris_os_version (hit enter)

Add the hack section to the following sub, which basically overrides what uname reports to the script.

Note: This will affect all the functions in the VMware tools install so use at your own risk, eg don't use unless you have nothing else to lose

Code:

sub solaris_os_version {
  my $solVersion = direct_command(shell_string($gHelper{'uname'}) . ' -r');
  chomp($solVersion);
  my ($major, $minor) = split /\./, $solVersion;

  # HardOCP hack - force to use Solaris 10.
  # Use at your own risk! (create a boot environment first)
  # This will force all function checks in VMware install
  #  to think you are on Solaris 10.
  $minor = 10;
  # END hack  

  return ($major, $minor);
}

toelie · Dec 19, 2014

@J-San

Thank you very much. I could only get it to work with the perl script modification. Very strange because it should detect the version automatically by querying omnios itself. No idea why it does not work on my omnios install.

I will try your omnios tweaks (Jumbo frames, TCP and NFS) on the new test server that will arrive in the next few weeks. Looks promising...

J-san · Dec 20, 2014

Glad that helped!

My version of OmniOS is from Gea's vmware appliance.
It's on OmniOS r151010

Maybe there's an issue with a newer version?
There is an OmniOS r151012 which is the latest "stable" release.. and maybe it reports as Solaris 11?

To determine the major release your system is on, look at /etc/release
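
For comparison with what the installer sees: the solaris_os_version sub quoted earlier in the thread splits the output of `uname -r` on the dot. Here is a minimal sketch of the same split in shell; the function name split_release is hypothetical. As far as I know, illumos-based systems such as OmniOS report 5.11 here, which would explain the installer picking the Solaris 11 drivers.

```shell
# Split a SunOS release string such as "5.11" into major and minor,
# similar to what the installer's solaris_os_version sub does.
# With no argument, the output of `uname -r` is used.
split_release() {
  release=${1:-$(uname -r)}
  major=${release%%.*}
  case $release in
    *.*) minor=${release#*.}; minor=${minor%%.*} ;;
    *)   minor= ;;
  esac
  echo "major=$major minor=$minor"
}
```

Running `split_release` on the box, next to `cat /etc/release`, shows both the kernel release the script keys on and the distribution release.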

_Gea · Dec 20, 2014

I have uploaded a new ESXi appliance with current OmniOS 151012 and napp-it 0.9f3

The installed vmware tools are from a default 5.5U2 setup that installs the Solaris 10-64 versions
If you want to try Solaris 11 versions of vmxnet3, the vmware tools installer is in /root
with the drivers in /root/vmware-tools-distrib/lib/modules/binary/

download: http://napp-it.org/downloads/index_en.html

paniolo · Dec 22, 2014

Is there any way I can choose which NIC to use for Replication? If possible, can I use multiple NICs?

Also FYI appliance-group does not work when admin pwd starts with !.

/Paniolo

_Gea · Dec 22, 2014

You replicate to another host/ip so the nic is selected based on the ip settings.
With two nics they should not be in the same subnet.

If you want to use two nics you must create a link aggregation (LACP).
A better option may be a switch to 10G if you need more performance.
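
Whether two addresses fall into the same subnet can be checked with a little arithmetic. A minimal sketch for IPv4 only; the function names and the addresses in the usage note are made-up examples, not values from this thread.

```shell
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if both addresses share the same network for the given prefix length.
same_subnet() {
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  prefix=$3
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

For example, `same_subnet 192.168.1.10 192.168.2.10 24` fails, so two NICs with those addresses sit in different /24 subnets, and traffic to a given target IP goes out via the NIC in the matching subnet.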

About pw
napp-it parses the form values and removes some characters
Allowed characters see menu napp-it settings -> pw
allowed: [a-zA-Z0-9,.-;:_#]

Some more characters would be possible for pw like !
I will check if I can parse less restrictively.
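
A password can be checked against the allowed set before it is entered. A minimal sketch, not napp-it's own parser; the function name pw_allowed is made up, and the hyphen in the list is treated as a literal character, which is an assumption (inside a real regex character class, `.-;` would read as a range).

```shell
# Succeed only if the password is non-empty and consists solely of
# the characters a-z A-Z 0-9 , . - ; : _ #
pw_allowed() {
  pw=$1
  [ -n "$pw" ] || return 1
  # Delete every allowed character; anything left over is not allowed.
  rest=$(printf '%s' "$pw" | LC_ALL=C tr -d 'a-zA-Z0-9,.;:_#-')
  [ -z "$rest" ]
}
```

With this check, `pw_allowed '!secret'` fails because of the leading `!`, while `pw_allowed 'secret_1#'` passes.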

paniolo · Dec 22, 2014

_Gea said:
You replicate to another host/ip so the nic is selected based on the ip settings.
With two nics they should not be in the same subnet.

If you want to use two nics you must create a link aggregation (LACP).
A better option may be a switch to 10G if you need more performance.

I have 2x1gbit in LACP I use for management and then two Infiniband 32Gbit I have each on their own subnet.

netcat happens over the management network - each host can ping each other across all 3 ip addresses.

I would prefer if it would use at least one of the 32gbit links instead of the 1gbit link.

Paniolo

shanester · Dec 22, 2014

Auto scrub jobs are no longer working since Dec 6. I am unable to delete or manually run the jobs. I have restarted the server (OmniOS) with the same results. Any suggestions for troubleshooting/resolution?

_Gea · Dec 22, 2014

confirmed, seems to be a bug in dev edition
.... fixed in 0.9f4_dev dec 23 nightly

shanester · Dec 23, 2014

_Gea said:
confirmed, seems to be a bug in dev edition
fixed in 0.9f4_dev dec 23 nightly

Thanks...what is the command to update to this nightly build?

_Gea · Dec 23, 2014

For napp-it Home and Pro: menu About - Update >> download 0.9f4

I have also added 0.9f4 dev to today's free ESXi appliance v15a (2015, first edition).
It includes support for push alerts to your smartphone/ tablet and first efforts to support IB

shanester · Dec 23, 2014

_Gea said:
For napp-it Home and Pro: menu About - Update >> download 0.9f4

I have also added 0.9f4 dev to today's free ESXi appliance v15a (2015, first edition).
It includes support for push alerts to your smartphone/ tablet and first efforts to support IB

Since I was already at a 0.9f4 release it wasn't clear. Appears to be working fine. Thanks for your quick responses.

_Gea · Dec 23, 2014

shanester said:
Since I was already at a 0.9f4 release it wasn't clear. Appears to be working fine. Thanks for your quick responses.

I do not assign a new release number to small bugfixes in the developer nightly edition.
A new date must be enough as they can be (and are) updated daily.

This is different to the stable numbers like 0.9f3 and the next 0.9f5 stable that are installed as a default setup.
They are published only after some time of "nobody reports problems"

balance101 · Dec 27, 2014

_Gea said:
If you want to improve local transfers
- use current ESXi 5.5U2
- use Vmxnet3 vnics
- opt. enable Jumbo frames

similar posts:
https://forums.servethehome.com/ind...le-and-napp-it-vmxnet3-and-jumbo-frames.2853/
https://forums.servethehome.com/index.php?threads/napp-it-all-in-one-vmxnet3s-jumbo-frames.3858/

Will this increase transfer speed from external computers to napp-it in ESXi, or just VM to VM inside ESXi?

I'm moving my raidz1 5x2TB from a FreeNAS VM to an OmniOS VM, but transfer speed ranges from about 100 KB/s to 6 MB/s with large files from the OmniOS VM to Windows 7. I did a clean install from the OmniOS ISO and followed the instructions to set up vmxnet3 only. I have been running ESXi 5.5.0 U2.

btw I have uploaded napp-it_15a_vm_for_ESXi_5.0-5.5.zip to my server if anyone else want it because the transfer speed to my end was pretty slow.

Will the preconfigured version be any different?

My standalone OI 151a5 w/ napp-it has worked great over the years. If this doesn't work I still have some consumer hardware I can build a standalone machine from, but it just won't have ECC memory.

_Gea · Dec 27, 2014

vmxnet3 speeds up traffic from a guest VM to the ESXi virtual switch.
If you have two guests, transfers can go up to several GByte/s on internal transfers.

External transfers are limited by the physical NIC (1 Gbit/s, about 100 MByte/s).
In your case, you need to connect all VMs to the same vSwitch or transfers are going
over the slower physical NICs.
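
As a rough check on the physical NIC figure: a link rate in Mbit/s divided by 8 gives the theoretical ceiling in MByte/s, and protocol overhead brings a gigabit link down to roughly the 100 MByte/s quoted above. A small sketch of that arithmetic; the function name is made up.

```shell
# Convert a link rate in Mbit/s to the theoretical maximum in MByte/s
# (integer division, no protocol overhead accounted for).
mbit_to_mbyte() {
  echo $(( $1 / 8 ))
}

mbit_to_mbyte 1000    # gigabit link: prints 125
mbit_to_mbyte 10000   # 10G link: prints 1250
```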

Have you enabled sync on those filesystems that are connected to ESXi over NFS?
If so disable sync and recheck performance
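
From the console, the sync setting can be inspected and changed with the zfs command. A minimal sketch, assuming a filesystem named tank/nfs (a placeholder, not from this thread); the function names and the ZFS override variable are made up. Note that sync=disabled can lose the last few seconds of writes on a crash or power loss, so it is best treated as a test setting.

```shell
# ZFS can be pointed at another command (e.g. echo) to preview the calls.
ZFS=${ZFS:-zfs}

# Show the current sync setting of a filesystem.
show_sync()    { "$ZFS" get -H -o name,value sync "$1"; }
# Disable sync writes for a performance test.
disable_sync() { "$ZFS" set sync=disabled "$1"; }
# Go back to the inherited (default) setting afterwards.
restore_sync() { "$ZFS" inherit sync "$1"; }
```

A test run would be `show_sync tank/nfs`, `disable_sync tank/nfs`, repeat the transfer, then `restore_sync tank/nfs`.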

balance101 · Dec 28, 2014

_Gea said:
vmxnet3 speeds up traffic from a guest VM to the ESXi virtual switch.
If you have two guests, transfers can go up to several GByte/s on internal transfers.

External transfers are limited by the physical NIC (1 Gbit/s, about 100 MByte/s).
In your case, you need to connect all VMs to the same vSwitch or transfers are going
over the slower physical NICs.

Have you enabled sync on those filesystems that are connected to ESXi over NFS?
If so disable sync and recheck performance

OK, tested the system again and found something interesting. New files I dump onto the ZFS drive transfer really fast, and when I transfer them back, it's really fast too.

ONLY the old files that were on the drive (from FreeNAS) are super slow.

zpool version is 5000 ???

HammerSandwich · Dec 28, 2014

balance101 said:
zpool version is 5000 ???

Yes, because it's OpenZFS.
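
To spell that out: open-source ZFS used numbered pool versions up to 28, and OpenZFS then switched to feature flags, which show up as version 5000. A minimal sketch that labels a version number accordingly; the function name and the wording of the labels are made up.

```shell
# Give a rough label for a ZFS pool version number.
describe_pool_version() {
  case $1 in
    5000) echo "feature flags (OpenZFS)" ;;
    ''|*[!0-9]*) echo "not a version number" ;;
    *)
      if [ "$1" -le 28 ]; then
        echo "legacy numbered version"
      else
        echo "above 28 (outside the open-source numbering)"
      fi
      ;;
  esac
}
```

`zpool upgrade -v` lists the features the running system supports.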

8qpew2YA · Dec 29, 2014

I'm in the process of rebuilding one of my NAS machines following issues with a ST2000DM001 (apparently there is a firmware update that I need to apply to the disks). I had been running in RaidZ mode (Raid5), but want more security of data so am going to RaidZ2 (Raid6) with 6 * ST2000DM001.

This means I'm booting off the USB, which I have been doing on a couple of the other machines. However I'm unable to get the 2nd USB device to show up. The boot device is the lower of the 4 front USB slots, and I want to mirror with a device in the slot above.

The lower slot presents as c2t0d0, and the one above seems to come up as c4t0d0 - I say 'seems' as both USBs appeared in the Napp-It screen (Disks), but both had a status of 'removed'; a few minutes later the mirror device no longer appeared.

Anyone have any ideas on how to mirror the zpool when booting from USB on a N40L? There's currently nothing on the machine so I have no issues with rebuilding anything to achieve the required end state (boot off mirrored USB's).

pcd
NB I'm aware that I can periodically 'clone' the USB device - not really what I'm looking for.

_Gea · Dec 30, 2014

Can you just mirror the sticks?
You can check the disk state in menu pools.

zervun · Dec 31, 2014

Hey everyone,

I'm entering the final stages of my Napp-it NAS.

What I have run into is an interesting problem.

I have a Supermicro MBD-X10SL7-F-O that has the 2308 controller on it.

I've successfully flashed the onboard to IT mode and it works fine by itself.

If I add in my 9207 card, only the bios for the onboard loads. If I disable the onboard SAS the 9207 loads.

I've tried flashing the 9207 to both IR and IT mode.

What I was trying to accomplish was to have the onboard handle my ZFS drives for Napp-it while the 9207 would be in IR mode hardware mirror on SSD for the ESXI and napp-it Install.

I have the following configuration so far

On the 9207 (IR Mode)
2x 120gig SSD, raid 1 mirror for ESXI and napp-it install

On the onboard 2308 (IT Mode)
2x 256gig SSD ZFS which will host all the other VMs
4x 4TB 7200rpm Seagates ZFS for SMB or iSCSI (fast media storage)

_Gea · Dec 31, 2014

I would flash

- one card with bios extension and IR firmware (boot)
- one card with removed bios and IT firmware (data, pass-through)

zervun · Dec 31, 2014

_Gea said:
I would flash

- one card with bios extension and IR firmware (boot)
- one card with removed bios and IT firmware (data, pass-through)

Ah thanks! I didn't know that you could remove the bios on them.

I'm having a hard time finding out how to remove the bios - do you happen to have an article on that?

I'm quite familiar with how to update the firmware.

TCM2 · Dec 31, 2014

Clear the flash (as you should do anyway on an update) and just don't flash a BIOS.

Edit: Specifically, the options -o -e 6 should be used to clear everything except the manufacturer area of the flash. See the sas2flash manual.

zervun · Dec 31, 2014

TCM2 said:
Clear the flash (as you should do anyway on an update) and just don't flash a BIOS.

Edit: Specifically, the options -o -e 6 should be used to clear everything except the manufacturer area of the flash. See the sas2flash manual.

Ah gotcha

sas2flash -o -e 6

to erase, then

sas2flash -o -f XXXXXXXX.bin

leaving off the -b mptsas2.rom - the bios is the .rom in this, got it
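
Put together as a reviewable plan, with two controllers in the box. A minimal sketch that only prints the commands and runs nothing; the function name is made up, the firmware file name stays a placeholder, and the controller index passed to -c is an assumption to be read off `sas2flash -listall` first. The card should not be rebooted or powered off between the erase and the flash.

```shell
# Print (not run) the sas2flash sequence for IT firmware without a BIOS.
#   $1 = firmware image file
#   $2 = controller index as shown by -listall (defaults to 0)
print_flash_plan() {
  firmware=$1
  ctrl=${2:-0}
  cat <<EOF
sas2flash -listall
sas2flash -c $ctrl -o -e 6
sas2flash -c $ctrl -o -f $firmware
sas2flash -c $ctrl -list
EOF
}
```

`print_flash_plan XXXXXXXX.bin 1` prints the four steps for controller 1: list, erase, flash firmware only (no -b), list again to verify.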

TCM2 · Dec 31, 2014

Exactly.

zervun · Jan 1, 2015

TCM2 said:
Exactly.

So the saga still goes on - I'm hoping this will help someone if they run into this same situation -

1. I have successfully flashed my 2308 onboard to IT without bios

Hooked up to it:
4x Seagate 7200rpm 4TB drives (for ZFS)
2x Samsung 840 EVOs 256 (for ZFS) which will hold the VMs

2. I have successfully flashed my 9207-8i to IR mode (making it a 9217-8i) with bios

This has fixed my bios issues - the mpt/avago bios manager correctly loads which recognizes both LSI cards

However - I found out a major issue with this. I had purchased Kingston 120gig SSDNOW v300's for a raid 1 mirror off the 9207 in IR mode.

On boot it either hangs at initialization indefinitely or bugs out for a while; one time it made it into the BIOS manager, but it failed almost immediately on the RAID 1 creation.

If I disconnect the SAS cable on the 9207 this issue goes immediately away. If I plug any other SATA drive (I have a 30gig SSD as well as a SATA DVDR on the same system) it loads perfectly. If I move the samsung 840 EVOs to the controller all the issues go away.

I was pretty stumped and looked up the Kingston drive (I got them off of Amazon last week) and lo and behold they have a SandForce controller (ironically owned by LSI). After googling a bit, there are reports of issues with TRIM and LSI controllers, SandForce in general on LSI controllers, etc.

tl;dr - the SandForce controllers on my Kingston SSDs are screwing me over and my LSI 9207 has major issues with them. I didn't think to check for that when buying them.

I haven't checked on them to see if they have any bios updates that fix the issues.

zervun · Jan 1, 2015

Another follow up -

I had purchased a pair of 120gig PNY Optima SSD's from Bestbuy. After reading the tweaktown and other horror stories of how they swapped the SMI controllers for sandforce ones (bait and switch) and were dying frequently I kept them in the box and ordered the Kingston v300's from Amazon not knowing they were sandforce.

Tonight after having the Kingston issues with the LSI, I decided to open up the PNYs. They were not the sandforce ones but the SMI ones so I lucked out. They work perfect and I was able to raid 1 on the 9207 IR as well as installing ESXI. The bios on the 9207 sees all the SSDs in the system and HDs perfectly (now that i have removed the Kingstons). Booting is quick.

The Kingston ones seem to work fine booting them on non-LSI controllers (one of my other systems).

_Gea · Jan 1, 2015

Which LSI firmware have you flashed?

There are some bug reports around with the current P20 firmware,
especially with faster disks and SSD.

Suggested workaround
use P19 firmware

zervun · Jan 1, 2015

_Gea said:
Which LSI firmware have you flashed?

There are some bug reports around with the current P20 firmware,
especially with faster disks and SSD.

Suggested workaround
use P19 firmware

I flashed the P20 firmware. Now I'm wondering if I should go back to P19

It does seem to be working fine with the Samsung EVOs, SMI controller PNY's and regular hard drives.

8qpew2YA · Jan 1, 2015

_Gea said:
Can you just mirror the sticks?
You can check the disk state in menu pools.

Gea
I wish I could, you even provide a menu option to do this. The problem is that the boot USB device appears as 'removed' and the candidate mirror appears as removed for a short period of time then vanishes altogether.

This is with OmniOS, which I notice is missing some utilities such as 'top' and 'rmformat'. These are both present in OpenIndiana, not sure if this helps or not. Installing the package that has 'rmformat' allowed me to at least 'see' the devices.

pcd

_Gea · Jan 2, 2015

Can you see the sticks when you enter format (console, end with ctrl-c or napp-it cmd form)
You can disable monitoring (top level menu mon). That may keep the removed state.

I have currently no N40 to check the base problem but the needed steps are easy, see
http://omnios.omniti.com/wiki.php/GeneralAdministration#MirroringARootPool

8qpew2YA · Jan 3, 2015

_Gea said:
Can you see the sticks when you enter format (console, end with ctrl-c or napp-it cmd form)
You can disable monitoring (top level menu mon). That may keep the removed state.

I have currently no N40 to check the base problem but the needed steps are easy, see
http://omnios.omniti.com/wiki.php/GeneralAdministration#MirroringARootPool

Format only shows the 6 main disks:

Code:

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <ATA-ST2000DM001-9YN1-CC4H-1.82TB>
          /pci@0,0/pci103c,1609@11/disk@0,0
       1. c1t1d0 <ATA-ST2000DM001-9YN1-CC4H-1.82TB>
          /pci@0,0/pci103c,1609@11/disk@1,0
       2. c1t2d0 <ATA-ST2000DM001-9YN1-CC4H-1.82TB>
          /pci@0,0/pci103c,1609@11/disk@2,0
       3. c1t3d0 <ATA-ST2000DM001-9YN1-CC4H-1.82TB>
          /pci@0,0/pci103c,1609@11/disk@3,0
       4. c1t4d0 <ATA-ST2000DM001-1ER1-CC25-1.82TB>
          /pci@0,0/pci103c,1609@11/disk@4,0
       5. c1t5d0 <ATA-ST2000DM001-1ER1-CC25-1.82TB>
          /pci@0,0/pci103c,1609@11/disk@5,0
Specify disk (enter its number):

rmformat shows the USB devices

Code:

     1. Logical Node: /dev/rdsk/c2t0d0p0
        Physical Node: /pci@0,0/pci103c,1609@12,2/storage@2/disk@0,0
        Connected Device: SanDisk  Ultra Fit        1.00
        Device Type: Removable
        Bus: USB
        Size: 30.6 GB
        Label: <None>
        Access permissions: Medium is not write protected.
     2. Logical Node: /dev/rdsk/c4t0d0p0
        Physical Node: /pci@0,0/pci103c,1609@12,2/storage@1/disk@0,0
        Connected Device: SanDisk  Ultra Fit        1.00
        Device Type: Removable
        Bus: USB
        Size: 30.6 GB
        Label: <Unknown>
        Access permissions: Medium is not write protected.

The disks menu only shows the boot device and indicates that it is removed:

Code:

id      part       identify  stat     diskcap  partcap  error        vendor   product           sn
c1t0d0  (!parted)  via dd    ok                2 TB     S:0 H:0 T:0  ATA      ST2000DM001-9YN1  S2F028NR
c1t1d0  (!parted)  via dd    ok                2 TB     S:0 H:0 T:0  ATA      ST2000DM001-9YN1  S2F00KND
c1t2d0  (!parted)  via dd    ok                2 TB     S:0 H:0 T:0  ATA      ST2000DM001-9YN1  S2F028W4
c1t3d0  (!parted)  via dd    ok                2 TB     S:0 H:0 T:0  ATA      ST2000DM001-9YN1  S2F0291P
c1t4d0  (!parted)  via dd    ok                2 TB     S:0 H:0 T:0  ATA      ST2000DM001-1ER1  Z4Z1MPCD
c1t5d0  (!parted)  via dd    ok                2 TB     S:0 H:0 T:0  ATA      ST2000DM001-1ER1  Z4Z14834
c2t0d0  -          -         removed           32 GB    -            SanDisk  Ultra Fit

_Gea · Jan 3, 2015

Ok, menu disks is using format to detect disks.
You can now
- mirror manually with the disk-ids

What happens if you enable partition support in napp-it
where parted is used instead of format.
>> Menu disks >> Partitions >> enable Partition support

8qpew2YA · Jan 3, 2015

_Gea said:
Can you see the sticks when you enter format (console, end with ctrl-c or napp-it cmd form)
You can disable monitoring (top level menu mon). That may keep the removed state.

I have currently no N40 to check the base problem but the needed steps are easy, see
http://omnios.omniti.com/wiki.php/GeneralAdministration#MirroringARootPool

Followup to the post above and my initial reply. I followed the instructions, and the pool is currently resilvering. So whilst the devices are appearing for some commands and not for others, blindly following the instructions seems to work!

Resilver complete, Grub installed, system rebooted - All OK
Everything is looking great.
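
For anyone landing here later, the steps from the linked wiki page look roughly like the following. This is a sketch from memory that only prints the commands and runs nothing; the function name is made up, the slice numbers (s0, s2) and the pool name rpool are assumptions, and the wiki page should be checked before running any of it.

```shell
# Print (not run) the usual sequence for attaching a second boot device
# to the root pool on a GRUB-based OmniOS install.
#   $1 = current boot device, e.g. c2t0d0
#   $2 = new mirror device,   e.g. c4t0d0
#   $3 = pool name (defaults to rpool)
print_mirror_plan() {
  src=$1
  dst=$2
  pool=${3:-rpool}
  cat <<EOF
fdisk -B /dev/rdsk/${dst}p0
prtvtoc /dev/rdsk/${src}s2 | fmthard -s - /dev/rdsk/${dst}s2
zpool attach -f $pool ${src}s0 ${dst}s0
zpool status $pool
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/${dst}s0
EOF
}
```

`print_mirror_plan c2t0d0 c4t0d0` prints the plan for the two USB sticks discussed above: partition the new stick, copy the slice layout, attach it, wait for the resilver shown by zpool status, then install GRUB on it.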
