OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

I've been using OpenIndiana with napp-it for 2 years and it has been running well. Today I tried OmniOS, so I exported my pools from OpenIndiana and imported them into OmniOS, but when I click on the pool it shows the error below. Do I need to do anything else so I can import them into OmniOS?

Software error:
Can't load '/var/web-gui/data/napp-it/CGI/auto/IO/Tty/Tty.so' for module IO::Tty: ld.so.1: perl: fatal: /var/web-gui/data/napp-it/CGI/auto/IO/Tty/Tty.so: wrong ELF class: ELFCLASS32 at /usr/perl5/5.16.1/lib/i86pc-solaris-thread-multi-64/DynaLoader.pm line 190.
at /var/web-gui/data/napp-it/CGI/IO/Tty.pm line 30.
Compilation failed in require at /var/web-gui/data/napp-it/CGI/IO/Pty.pm line 7.
BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/CGI/IO/Pty.pm line 7.
Compilation failed in require at /var/web-gui/data/napp-it/CGI/Expect.pm line 22.
BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/CGI/Expect.pm line 22.
Compilation failed in require at /var/web-gui/data/napp-it/zfsos/_lib/illumos/zfslib.pl line 2758.
BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/zfsos/_lib/illumos/zfslib.pl line 2758.
For help, please send mail to this site's webmaster, giving this error message and the time and date of the error.
 
This error message is related to Expect, a software module that is needed to run interactive console commands within the GUI.

Napp-it comes with different versions of Expect (Linux, OI, different OmniOS versions).
It usually detects the needed version during a login to napp-it, so first try a napp-it logout/login.

If this does not help, please add information about your OmniOS and napp-it releases.
 
Still the same problem after logout/login.

I'm using the current stable release (r151012, omnios-10b9c79) and napp-it 0.9f4 eval.
 
Depending on the cause, you have three options:
- redo a basic napp-it setup via wget (helps if something went wrong with the setup); the usual one-liner is sketched after this list

- or copy one of the two possible OmniOS Expect modules manually (try both):
cp /var/web-gui/data/tools/omni_bloody/CGI/* /var/web-gui/data/napp-it/
cp /var/web-gui/data/tools/omni_stable/CGI/* /var/web-gui/data/napp-it/
(replaces the Expect files in CGI)

- or redo an OmniOS setup followed by a napp-it wget setup
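
For reference, the wget setup is normally this one-liner from the napp-it homepage (from memory, so double-check against napp-it.org; run it as root with internet access):
Code:
wget -O - www.napp-it.org/nappit | perl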
 
OK, I did a fresh install of OmniOS and then the napp-it setup, and the problem was gone; I was able to import my pools. Then I started to install Transmission. The installation went fine, and after a reboot Transmission is running, but the same problem is back. So I guess it's a conflict between napp-it and Transmission. These are the dependency packages that I installed together with Transmission and pkg-config:

pkg install omniti/library/libevent
pkg install library/perl-5/xml-parser

So it seems the error relates to Perl, but I'm not sure how it can be fixed so both can run together.
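
In case it helps, this is what I plan to check, guessing that the bundled Tty.so and the default Perl now disagree about 32/64 bit (the path is taken from the error above):
Code:
# which perl is found first and which architecture it was built for
which perl
perl -V:archname
# ELF class (32- or 64-bit) of the library napp-it tries to load
file /var/web-gui/data/napp-it/CGI/auto/IO/Tty/Tty.so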
 
Okay, I'm throwing in the towel.. I need help!

I'm trying to move the metadata of my Plex Media Server (PMS) to an NFS-exported ZFS filesystem. It all seems to work fine: I can mount the exported filesystem and I've copied the old PMS metadata to it.

Now PMS won't start, and I think the issue is permissions or ownership.

Code:
<username>@Plex:/var/lib/plexmediaserver/Library/Application Support$ ls -l
total 9
drwxrwxrwx 10 messagebus users 11 Mar 25 12:32 Plex Media Server

The filesystem is mounted as folder "Plex Media Server".
I've created a user and group named "plex" on the OmniOS fileserver and changed ownership of the filesystem to plex:plex, but the above is what I see on the client.
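
My guess is that this is NFSv3 uid/gid mapping: the server sends only numeric IDs, so "messagebus" is simply whichever client account happens to share the uid I gave "plex" on the server. I suppose I should compare the numeric IDs on both sides, something like this (the server-side path is a placeholder for my exported filesystem):
Code:
# on the OmniOS server
id plex
ls -ln /tank/plexmeta        # placeholder path of the exported filesystem
# on the Linux client (check the uid/gid the PMS user actually runs as)
id plex
ls -ln "/var/lib/plexmediaserver/Library/Application Support"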

Any ideas?
 
Does power management work in OmniOS? I set my disks to spin down after 900s, but it seems they keep spinning all the time.

Thanks
 
I have not heard of a general problem, unless you or a service like napp-it alerts or fmd (the Solaris fault management service, which can be disabled) hits a disk.
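
A quick way to check whether something is still touching the disks (standard illumos tools; the fmd module name is from memory):
Code:
# per-disk I/O statistics every 30 seconds; non-zero r/s or w/s keeps drives awake
iostat -xn 30
# list loaded fmd modules; disk-transport polls disks periodically
fmadm config
# if fmd turns out to be the cause, it can be disabled (losing its fault monitoring)
svcadm disable svc:/system/fmd:default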
 
I installed Logitech Media Server, so I guess it keeps checking the disks all the time, which keeps them from spinning down.
 
Hello!

We are having issues with iSCSI at work. Every now and then the iSCSI target just hangs. We are unable to kill it, restart it or do anything else to restore the service. The only option to restore the iSCSI target to a working state is to reboot the whole server and lose all the sessions (around 100 clients).

The weird thing is that only the iSCSI target hangs. I can ssh to the server and work on it without any problem; there is no load or anything else, just the iSCSI target locking up :(

The server is an IBM 3550 M4 with dual Xeon E5-2640 CPUs and 160GB of memory.
SAS HBA: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2

Anyone encounter similar troubles?

Matej
 
I had a similar problem recently on a test machine where I needed a reboot, but
I have not found the time since to dig deeper into it.

If this is on OmniOS, you should ask on the illumos or OmniOS discuss mailing lists (http://lists.omniti.com/mailman/listinfo/omnios-discuss and http://wiki.illumos.org/display/illumos/illumos+Mailing+Lists). Maybe someone at OmniTI can give an answer even without a support contract. (That can be helpful in a production environment.)

btw
every illumos/OmniOS user should join the illumos and OmniOS mailing lists, as there is more technical information there compared to a more general forum like this.
 
I am trying to configure MediaTomb and want it to scan my media directories, but it doesn't seem to pick anything up. All that shows up is some pictures from my home directory, which I don't want to share.
 
_Gea: Thanks for your input.

I will give the OmniOS mailing list a try. I already wanted to send a mail there anyway.

In case you get any more information on this issue, please post here.

Matej
 
I am having trouble with my hourly snaps not being maintained as expected: specifically, they are accumulating in the hundreds despite the jobs' keep/hold parameters.

As an example, I have an hourly job with the following properties:

Opt1 /from: "keep 50"
Opt2 /to: "hld 3"
Opt3: ""
every month
every day
every 1-BC hour
15 min

but it has accumulated hundreds of snaps. After the first time I noticed the accumulation, I deleted all but fifteen, but the number has built up again.

Does anyone know if this is a problematic configuration?
 
The above setting creates 48 snaps per day.
As both keep and hold are respected, hold 3 days is the effective setting,
which means you should have 144 snaps.

If the snap is recursive, you have 144 snaps x the number of filesystems.
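
To see how many snaps the job has really created, you can count them, for example (the filesystem name is a placeholder; drop -r if the job is not recursive):
Code:
zfs list -H -t snapshot -r -o name tank/data | wc -l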
 
With those settings I'm only seeing 12 snaps per day (at fifteen past the hour, from 06:15 to 17:15), but I have 359 of them now since 1 March, and climbing.

 
I got a pro evaluation today and have a few questions:

1. If I start a replication that takes longer than the eval, will it stop?

2. How much is it for just the async module for a non-government, non-educational US company?

3. When I add my 2nd Solaris server that I want to back up via "++add appliance", it says "wrong password". It resolved the IP to a hostname correctly, so there is definitely a connection. I'm 100% sure the password is correct because I use it to log in to the napp-it config every day.
 
If anyone encounters trouble with an iSCSI target freezing with a JBOD SAS expander box, here is a discussion:
http://lists.omniti.com/pipermail/omnios-discuss/2015-March/004593.html

It doesn't look good, though. It looks like SATA drives and SAS expanders don't go well together, especially on servers with high load (average IOPS on our server is 7k/s read and 1k/s write). There can be trouble when a drive in the array hangs and the controller sends a reset command. In some cases the whole SAS expander resets and all commands in the queue and on the bus are lost, producing a panic on the system :)

In my case, sometimes only some hosts lose the connection, sometimes all hosts drop...

Matej
 
Matej -

What expander part # are you using?

Is this occurring on all your machines or just one?
 
Backplane
- BPN-SAS2-837EL2 +
- BPN-SAS-837A

Hmmmm. Are you using both ports or just one? SATA or SAS drives, or a mix?

I have to check tomorrow at work, but I think I'm only using one port and all drives are SATA, which, as I now know, can be the cause of the problems because of SATA's poor error handling...

I get some errors in the logs, but I can't access them right now; I will have to post them tomorrow. They are messages about drive resets, but I can't figure out which drive is causing the problems.

Matej
 
1. Which disks make/manufacturer?

2. Are you passing the disks to ZFS for management or using hardware raid and passing the array through or?
 
1. Which disks make/manufacturer?

Seagate Constellation ES.3 4TB, model ST4000NM0033-9ZM170

2. Are you passing the disks to ZFS for management or using hardware raid and passing the array through or?

We are passing disks to ZFS for management. Disks -> SAS expander -> SAS HBA LSI SAS2308.

And we are only using one port on a JBOD.

Matej
 
Your options are mainly:

- check the HBA firmware (P20 has problems, use P19 instead); a way to read the installed firmware version from the OS is sketched after this list
- depending on the OmniOS release: timeout handling on OmniOS 151012 is better than on previous releases,
so update, or optionally wait a few days, as the 151014 long term stable is expected next week (with an update option from 151006 and up)

- it seems that one or more disks are causing the problem, as it was not there from the beginning.
If you have indications in the logs or SMART values, replace those disk(s), ideally with the SAS Constellation ES.3 (they cost about the same as the SATA ones)

In general:
- prefer expander-less, multiple-HBA solutions when using SATA disks
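
A possible way to read the HBA firmware version from the running system, assuming LSI's sas2flash utility is installed (it is a separate LSI/Avago download, not part of OmniOS; lsiutil works as well):
Code:
# lists every LSI SAS2 controller with its firmware and BIOS versions
sas2flash -listall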
 
I checked the firmware with the lsiutil software and it reports version 15.00.00, so I guess it is quite old:
Current active firmware version is 0f000000 (15.00.00)
Firmware image's version is MPTFW-15.00.00.00-IT
LSI Logic
Not Packaged Yet
x86 BIOS image's version is MPT2BIOS-7.29.00.00 (2012.11.12)
EFI BIOS image's version is 7.22.01.00

I will try to update it to P19.

- I heard about the 151014 release and that there should be some nice mpt_sas updates in it as well... I will probably wait another month for any issues to surface and then do the upgrade. I could create a new BE and upgrade ASAP, but I'm afraid to in case something goes wrong :)

- it looks like one or more drives are misbehaving, but I can't figure out which. I have a bunch of errors like this:
Apr 2 09:23:15 storage.host.org scsi: [ID 107833 kern.notice] /pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Apr 2 09:23:15 storage.host.org Timeout of 0 seconds expired with 1 commands on target 68 lun 0.
Apr 2 09:23:15 storage.host.org scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Apr 2 09:23:15 storage.host.org Disconnected command timeout for target 68 w500304800039d83d, enclosure 3
Apr 2 09:23:15 storage.host.org scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3c02@1/pci1000,3040@0 (mpt_sas0):
Apr 2 09:23:15 storage.host.org Log info 0x31140000 received for target 68 w500304800039d83d.
Apr 2 09:23:15 storage.host.org scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
How do I find out which device/drive is the one causing me problems?
If I do 'zpool status' or 'cfgadm -vla', I can't see a drive with a 'w5003...' WWN; all drives have WWNs in the 'w5000....' format.
I would like to look at the SMART stats, but Seagate SMART reporting is poor, since they don't report the actual number of errors. For example:
1 Raw_Read_Error_Rate 0x000f 068 063 044 Pre-fail Always - 6424418
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 092 060 030 Pre-fail Always - 1773170504
195 Hardware_ECC_Recovered 0x001a 062 003 000 Old_age Always - 6424418

Which parameters do you usually look at?
We currently have around 140TB of storage in 2 JBODs and will attach 2 more, so I can't imagine going without expanders, although I would love to :) Also, the new drives will all be SAS, but 3 years ago, when we bought the current JBODs, SAS drives were still very expensive...

Matej
 
Your problem is target 68 on mpt_sas0.

If you are using napp-it with the monitor extension, you can find the target-to-WWN/slot translation
in menu Disks >> SAS2 extension.

(screenshot: sas2_slots.png, the Disks >> SAS2 extension slot map)
 
I'm not the admin of the iSCSI server, and the current admin prefers not to install napp-it.

Can I do it from the CLI? Which tools do you use to get the target number and the mpt_sas number? I guess I could somehow produce the same result using only the CLI.
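
I guess something like this could work, if LSI's sas2ircu utility runs against this HBA (it is a separate LSI download, so this is just an idea):
Code:
# list the LSI controllers and their index numbers
sas2ircu LIST
# show enclosure/slot, SAS address and serial number of every attached drive on controller 0;
# the SAS address can then be matched against the one from /var/adm/messages
sas2ircu 0 DISPLAY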

Matej
 
1.
Once started, it will run to the end.

2.
Replication extension; for options see
http://napp-it.org/extensions/quotation_en.html

3.
Enter the IP of the source server with its napp-it admin password (not the root password).
Some characters are not allowed in the password; optionally try a password from [A-Za-z0-9].

Thanks Gea, we just bought the Async extension. Napp-it has been great for us, we thank you a million times.

And you were right, we had a special character in the password which it didn't like.
 
Any thoughts on a 4TB difference between host and target when using the replication extension? On the server that has the original data, the snapshot REFER shows 16TB; on the backup server it's only showing 12TB. I thought maybe the backup got interrupted somehow, so I ran the job again, but there was no change.
 
Most probably:

- you have additional snaps on the source;
the replication parameter -I includes them, otherwise only the base and the last snap are used (default)

- replication decodes compress or dedup;
if the source is not compressed but the target (parent) is, this can be the result

- suboptimal ZFS layout on the source, for example a non-optimal number of 4k disks in a RAID-Z
(this alone would not give a 30% difference but can add to the other effects)
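
To see where the space actually goes on each side, a per-dataset breakdown may help; a generic sketch (replace tank with the pool name on each server):
Code:
# USEDSNAP vs. USEDDS shows how much space is held by snapshots vs. the live data
zfs list -o space -r tank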
 
:hello:

Are all multi-TB drives the exact same size?

When I built my only vdev I used all kinds of 2TB drives I had, partly because that's what I had and partly because I thought that way the size of the smallest one would be used for the others, so I could replace any drive with any 2TB drive I have in case of failure.

Was that necessary? I've received 20 HGST NAS 4TB drives and plan a vdev with them, so I want to be sure I can replace them with 4TB WDs or 4TB Seagates if the need arises.
 
Most probably:

- you have additional snaps on the source;
the replication parameter -I includes them, otherwise only the base and the last snap are used (default)

Perhaps I'm misreading it, but isn't the 15.6 the size of the full dataset, not including previous snapshots?

pool3d/U pool3d/U@1427812105_repli_zfs_xxx-frontend-02_nr_1 Tue Mar 31 10:29 2015 878M - 15.6T delete

- replication decodes compress or dedup;
if the source is not compressed but the target (parent) is, this can be the result

No dedup but compression is on both target and parent.

- suboptimal ZFS layout on the source, for example a non-optimal number of 4k disks in a RAID-Z
(this alone would not give a 30% difference but can add to the other effects)

Ashift is showing 12 for all VDEVs on both servers.

The backup server is RAID-Z2 and the source is RAID 10, if that matters.


 
It's hard to compare a snap size with a filesystem size without knowing all the details.

What should happen:
if you transfer a filesystem without snaps, compress or dedup from one server
to another with the same disks and pool layout, the size of the source and the newly created
target filesystem should be about the same.

If anything is different, the result is different.
If you are in doubt whether everything was transferred, create a file list on both and compare.
But usually a zfs send that ends without an error is very trustworthy.
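
A minimal sketch of such a comparison (paths are placeholders; run the find on each server, then diff the two lists on one machine):
Code:
# on the source server
find /tank/data -type f | sort > /tmp/source.files
# on the backup server
find /backup/data -type f | sort > /tmp/target.files
# after copying one list over
diff /tmp/source.files /tmp/target.files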
 