OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Unfortunately Jumboframes aren't an option on my client machines because a lot of the other servers and hosts they connect to don't have Jumboframes enabled.

Oddly enough, the Intel site doesn't have any drivers for the X540-T1 for Windows 10 but I think the 8.1 drivers work.
 
Hi, I'm encountering a strange permission problem on one ZFS-Filesystem:
One user can't delete non-empty folders (others can). When he first deletes the files within the folder, he is then able to delete the folder itself.
What I've done so far is to check the ZFS filesystem's aclinherit (passthrough) and aclmode (passthrough) and to reset the ACLs to "modify" with the napp-it "reset ACL's" function. No improvement so far.
The same user is able to delete non-empty folders on another ZFS filesystem on the same machine with the same rights (napp-it "modify").
Maybe this hint is helpful: a short time ago I accidentally changed aclinherit from passthrough to restricted but changed it back to passthrough immediately.
Thanks in advance for your help!
 
To be more precise about "can't delete": Yes, he can delete without an error message. But shortly after deletion (or F5) the folder reappears.
 
Have you tried

permission problems
- reset acl to modify recursively
- have you enabled any special share settings like abe
- have you checked for hidden files, e.g. via WinSCP
- smb-connect as root and try to delete

ZFS problems
- have you used OmniOS 151014 (prior to September 2015) with an L2Arc device?

Other
- what OS release
- have you tried from a different client/Windows machine?
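
The server-side part of this checklist can be sketched as a few commands. This is only a dry-run sketch: tank/data and the folder path are placeholder names (not from the thread), and the helper just prints the commands unless APPLY=1 is set on a Solarish host.

```shell
#!/bin/sh
# Dry-run helper: prints each command; set APPLY=1 to really run it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

FS=tank/data                      # placeholder filesystem name
DIR=/tank/data/problem-folder     # placeholder folder that cannot be deleted

check_acl_state() {
    # ACL inheritance settings of the filesystem
    run zfs get aclinherit,aclmode "$FS"
    # ACL of the folder itself (Solaris ls, -V prints the ACL entries)
    run /usr/bin/ls -dV "$DIR"
    # Everything inside the folder, including hidden files
    run /usr/bin/ls -laV "$DIR"
}

check_acl_state
```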
 
Hi Gea, thanks for the good hints!

permission problems
- reset acl to modify recursively -> Yes, I've done that several times
- have you enabled any special share settings like abe -> No
- have you checked for hidden files ex via WinSCP -> Nothing special to see. I can see and delete the files.
- smb-connect as root and try to delete -> SAME error! I'm not able to delete non empty folders. They reappear after hitting F5.

ZFS problems
- have you used OmniOS151014 (prior september 2015) with an L2Arc device? -> I'm using OmniOS omnios-8322307 November built. Clean install. No L2Arc.

Other
- what OS release -> omnios-8322307 2015-11
- have you tried from a different client/Windows machine? -> Other physical Windows 7 machines with the same user logged on are working!

To sum it up: From one specific Windows 7 client I'm not able to delete non-empty folders. There is no error message after hitting delete. The folder disappears and reappears after hitting F5. And the whole thing happens even when logged on as root! So it must be something closely related to that Windows client.

One thought I had: there are two machines with a cloned image in the network, and one of them is the machine with the problems. Because of that, the SID of both machines is the same. I'm not logged in concurrently with the same user on both machines. Any ideas?
 
Ok, a Windows problem.
Have you already googled "Windows 7 delete reappear"?
 
Gea, do you know if NFS can be tuned at all with respect to TCP window size? I know that CIFS will pick up the TCP properties; does the same apply to NFS?
 
I am not sure if I understand the question correctly
but NFS 3/4 should benefit from the same tcp settings as smb

There are different levels of tunable parameters.
Their effects may depend on each other, but basically you can

- tune tcp-ip, mainly buffers and mtu
- tune the blocksize/recordsize of the filesystem. Default is 128k for a filesystem and 8k for a zvol
- tune system settings like cache behaviour or disk timeout
- tune services; with NFS this depends on version and load
NFS Tunable Parameters (Oracle Solaris 11.3 Tunable Parameters Reference Manual): https://docs.oracle.com/cd/E53394_01/html/E54818/chapter3-1.html#scrolltoc
Features of the NFS Service (Oracle Solaris Administration: Network Services): https://docs.oracle.com/cd/E23824_01/html/821-1454/rfsintro-101.html#scrolltoc

Solaris defaults are good for average use, 1G networks and low memory demand.
With 10G and newer hardware you can modify these settings. As all of them are specific to the hardware
and use case, you can only try some settings and check the effect.
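
As a rough illustration of the first two levels (tcp-ip and filesystem), something like the following could be a starting point on OmniOS. The link name ixgbe0, the filesystem tank/data and all values are assumptions for the example, not recommendations, and property names can differ on Oracle Solaris. The helper only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper: prints each command; set APPLY=1 to really run it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

NIC=ixgbe0      # placeholder 10G link name
FS=tank/data    # placeholder filesystem

tune_network() {
    # Larger TCP buffers (example values only)
    run ipadm set-prop -p max_buf=4194304 tcp
    run ipadm set-prop -p send_buf=1048576 tcp
    run ipadm set-prop -p recv_buf=1048576 tcp
    # Jumbo frames, only if every host on that network supports them
    run dladm set-linkprop -p mtu=9000 "$NIC"
}

tune_filesystem() {
    # Smaller recordsize than the 128k default (example value only)
    run zfs set recordsize=64k "$FS"
}

tune_network
tune_filesystem
```

Change one setting at a time and measure again, since the effects depend on each other.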
 
I finally got it. The Windows 7 Sync Center / Offline Files Service was stuck. Reboots didn't fix that. I had to reset the settings of the service:
Code:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\CSC\Parameters]
"FormatDatabase"=dword:00000001
And everything was working again.
I narrowed it down by recognising that only the Windows Explorer wasn't able to delete. Yet other programs e.g. PowerShell could do it.
Thanks @Gea for the guidance!
 
New OmniOS 151017 February beta with SMB 2.1 available
There is an update for OpenSSH that fixes the sudo bug of the January beta.
Napp-it is working again with this update.

update via:
pkg update

from OmniOS discuss:
Based on illumos-omnios master commit cc70d5b, uname -v reports "omnios-cc70d5b".

And omnios-build master commit 45f0d7b.

New with this update:

- Case-sensitivity now in ZFS test suite.

- GLDv3 improvement (by OmniTI's own Dale Ghent) which should produce less confusing dladm(1M) output.

- SunSSH and sudo now honor the illumos-bug-6057 ways of doing things. Please PLEASE look for other "last login" weirdness, but there shouldn't be any. If there is, PLEASE report it to this list ASAP.

- Speaking of SSH, OpenSSH now includes the full wad of Joyent work (modulo 1-2 small differences) to be more of a SunSSH replacement. If you haven't tried OpenSSH because you needed something from SunSSH, PLEASE try it now.


There is no new release media for this update, but I did update all of the packages, so you'll need to create a new BE and reboot.

Thanks,
Dan
 
Dear _Gea

I am using the napp-it pro version for replication, and it works like a charm.

But today the board displayed
job/ task server: missing
and replication stopped working on one machine.

It doesn't work even if I restart cron with
svcadm restart svc:/system/cron:default
What can I do about it?
 
Taskserver is a service controlled by napp-it.
You can restart at console via
/etc/init.d/napp-it restart

or via web-ui in menu
System >> restart napp-it
 
After restarting napp-it,
the taskserver came back, but the replication jobs didn't.

The log says "error; source missing: lfpool/vm.infra", but I didn't change any settings @@

I tried to start the job manually, but it still got the same error.

I went to Extensions/appliance-group to check the config. It seems ok, but when I click on the source machine's zfs
it shows "the key is wrong".

Both the source machine's and the target machine's replication extensions are valid.
 
I've been able to tune my TCP parameters, and my CIFS performance can now saturate the link. My NFS performance cannot (on Windows). Perhaps it is related to the Windows NFS driver.
 
"the key is wrong" is related to the communication key that is generated when you establish the group - not the license key.

Check
In Extension > appliance group, click on ZFS or snaps on a group member to check communication
On problems, rejoin the group (++ add appliance)

If this does not help, for example if you have several entries after a host rename or IP change, delete the group member either
manually in /var/web-gui/_log/group or in current napp-it with menu Extensions > appliance-group > delete group members
(check on both sides)
 
Thanks , Gea
Helps a lot!!


It worked !! But I am encountering a little strange situation:


I have two hosts that replicate to each other, let's say zfs01/zfs02.
Both hosts have four network cards.
zfs01: 192.168.1.5, 192.168.1.6, 192.168.10.5, 192.168.10.6
zfs02: 192.168.1.33, 192.168.1.34, 192.168.10.33, 192.168.10.34

192.168.1.* provides smb storage for other Windows machines, so we can ignore it.
192.168.10.* is for nfs and zfs communication between servers.

If I join 192.168.10.5 + 192.168.10.6 on zfs02 first,
and then join 192.168.10.33 + 192.168.10.34 on zfs01,
the replication job on zfs02 fails and shows that the key is wrong,
while the replication job on zfs01 keeps working correctly.

But if I join 192.168.10.5 + 192.168.10.6 on zfs02 first,
and then join only 192.168.10.34 on zfs01,
the replication jobs on both hosts work correctly.

So I am using the second way now, but I still don't know why this happened @@
 
This seems to be a multihome network problem with different paths/network connections between the servers, as replication requires two-way communication.
For example, the request goes out over one nic, the answer comes back on another one and is then discarded.
 
Here's a question: I am back to trying omnios+napp-it (had been playing with linux+scst). One nice thing about scst is that I can set up esxi to use a zvol RW for the actual VM storage. The backup SW (veeam) is then presented with a totally different target/lun which is readonly. I trust veeam not to step on the VM zvol, but this is extra protection. Is there any way to do this with comstar? I don't see how, since the RO vs RW decision seems to be associated with the lun, and I can't seem to create multiple luns using the same underlying storage :(
 
It should be possible to
- create a logical unit, e.g. from a zvol
- create two target groups, each with a different target as member
- create a view with a different lun number for the logical unit to each target group

This should result in two targets showing the same logical unit.
But as the readonly property can be assigned only to the logical unit, it will affect both targets.

So this might work, but only if the client itself takes care not to write.
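
The three steps above could look roughly like this with itadm/stmfadm. It is a dry-run sketch only: the zvol, target names, group names and the GUID are placeholders (on a real system, use the GUID that stmfadm create-lu prints), and the helper just prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper: prints each command; set APPLY=1 to really run it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# All names below are placeholders for illustration.
ZVOL=tank/vmstore
T_ESXI=iqn.2010-09.org.example:esxi
T_VEEAM=iqn.2010-09.org.example:veeam
LU_GUID=${LU_GUID:-REPLACE_WITH_GUID_FROM_CREATE_LU}

comstar_plan() {
    # 1. one logical unit on top of the zvol
    run stmfadm create-lu "/dev/zvol/rdsk/$ZVOL"
    # 2. two targets, each in its own target group
    #    (a target must be offline while it is added to a group)
    for pair in "tg-esxi $T_ESXI" "tg-veeam $T_VEEAM"; do
        set -- $pair
        run itadm create-target -n "$2"
        run stmfadm create-tg "$1"
        run stmfadm offline-target "$2"
        run stmfadm add-tg-member -g "$1" "$2"
        run stmfadm online-target "$2"
    done
    # 3. one view per target group, same LU, different lun numbers
    run stmfadm add-view -t tg-esxi -n 0 "$LU_GUID"
    run stmfadm add-view -t tg-veeam -n 1 "$LU_GUID"
}

comstar_plan
```

Both targets then show the same logical unit, so a read-only setting on that LU would still apply to both.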
 
Yeah, I know. I don't *need* RO for the veeam backup but trying to play it safe. Just have to live with it, I guess :)
 
Gea,

What setup must be done in napp-it to ensure you get notified if a disk or smart values go bad?
 
In menu About >> settings
- enter email or push account

In menu Jobs:

- enable autoservice
- create an alert job and enable it

optionally: set TLS mail (see menu Jobs)


This will send an alert but only if a disk fails, not on smart checks
 
I'm having trouble with Realtimemonitor – iostat2 as the table is missing headers as of 16.02 pro Feb. 2 2016. I tried the Feb. 19 update to no avail.

My system is OmniOS r151016 omnios-33c53a8.

Thank you.
 
Have you tried to disable/enable monitoring
("Mon" in the upper right top level menu right of logout)
 
I disabled Mon, and then I saw "iostat disabled." After I reactivated Mon, iostat2 still showed without table column headers.

The rows of data are there, but there are no column headers.
 
Hey Gea, just purchased 2x monitor licenses, thanks for all the great work!

Question about snap jobs: Is it possible for me to edit the code to change how it names snaps? I want it to just be the date in the form of month/day/year instead of the very long string that Napp-it currently creates. So for instance I just want it to be pool@DailySnap-02-25-2016
 
It is possible to edit /var/web-gui/data/napp-it/zfsos/_lib/scripts/job-snap.pl, but this is not suggested as it is not update safe.
Basically the current name is zfs@ "jobname like daily" - "jobid" _ "date" . "hour"

As you can create different snap jobs with their own retention, e.g. hourly keep 12 or daily keep 20, you need the jobid to identify a snap source.
You also need zfs and date, so the only parts that could be removed would be jobname and hour.

So the best option would be to create your own snap job as an "other-job" that you save below /var/web-gui/_my (update safe).
You can use the default snap job (Perl) as a reference.
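
If you write your own job, you also have to handle retention yourself. A minimal sketch of that part in plain shell (not the napp-it job code; the function name, the DailySnap prefix and tank/data are made up for the example):

```shell
#!/bin/sh
# Reads snapshot names on stdin (oldest first) and prints every name
# except the newest $1 ones, i.e. the snapshots that may be destroyed.
snaps_to_destroy() {
    keep=$1
    all=$(cat)
    total=$(printf '%s\n' "$all" | grep -c .)
    drop=$((total - keep))
    [ "$drop" -gt 0 ] || return 0
    printf '%s\n' "$all" | head -n "$drop"
}

# Possible use on a real system (left commented out on purpose):
#   zfs list -H -t snapshot -o name -s creation -d 1 tank/data \
#     | grep '@DailySnap-' | snaps_to_destroy 20 \
#     | while read -r snap; do zfs destroy "$snap"; done
```

Sorting by the creation property instead of by name means the retention still works when the date in the name is month-day-year.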
 
Cool.

So on Solaris how would I use the date variable to put it in the snapshot?

Like would it be
Code:
zfs snapshot tank@'gdate'
?
 
You need a shell or Perl script that processes the output of date, for example

# date
# Fri Feb 26 16:40:40 UTC 2016
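
In a shell script the date can be formatted directly with a format string and inserted via command substitution. Single quotes as in tank@'gdate' would not run any command; the literal text would be used. A small sketch, where the function name and the DailySnap prefix are just taken from the example above:

```shell
#!/bin/sh
# Builds a snapshot name like tank@DailySnap-02-26-2016
snap_name() {
    # $1 = pool or filesystem; date prints month-day-year
    printf '%s@DailySnap-%s\n' "$1" "$(date +%m-%d-%Y)"
}

# Show the name that would be used
snap_name tank

# To really create the snapshot:
#   zfs snapshot "$(snap_name tank)"
```

Note that a slash cannot be used inside a snapshot name, so hyphens are used as separators.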
 
Hello. Please may I request: when using "Filter snaps" on Snapshots that the column headers continue to show after having filtered.

Thank you.
 
Hi,

I have lost a disk in a RaidZ1 array. While I'm getting a replacement, is it possible to pass through a USB drive to the VM, temporarily replace the bad disk with the USB drive, and replace it again when the replacement arrives?

Thanks,
 
I have a napp-it server where I needed to install runtime/perl-64 as it is a pre-requisite for another package.

After each napp-it update, some functionality (e.g., "ZFS Filesystems," "Pools") does not work unless I delete the IO/ perl package directories in /var/web-gui/data. How can I make my napp-it installation sensitive to the perl-64 installation so it doesn't install the superfluous, incompatible IO directories?
 
Napp-it includes some precompiled Perl modules that are not included per default like Expect.pm as this is needed for some interactive functionality and as they are not trivial to install. The reference to them is in /var/web-gui/data/wwwroot/cgi-bin/admin.pl line 4. You can comment this out (or delete the unwanted IO modules) but then your Perl config must include these modules.

The suggested way is to use napp-it for storage only and install other server applications in a virtualized environment.
 
Please may I request that S99napp-it stopcmd lock/offline any "encrypted pool on files" (pool oef) which napp-it is managing. I've had pools corrupted unexpectedly in nut power events which execute SHUTDOWNCMD and also when executing System -> Restart napp-it after updating /var/web-gui/_log/mini_httpd.pem with new SSL certificates.

Luckily, I had snaps to rescue the pool files, but I would greatly prefer if they weren't so at risk for corruption.
 
I have a napp-it pro license. Does that allow me to use the tuning panel? If so, how and where do I download that?
 