OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
pkg update -f -r --be-name=r151026
just creates a boot environment (BE) with this name and loads the newest updates for 151022.

To update to a newer release, you must first unset the omnios repository for 151022, set it to 151024 and update to 151024,
and then go from 151024 to 151030 with another change of repository (a direct jump from 151022 to 151030 may be possible).
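One upgrade hop can be sketched as a small shell helper. This is a minimal sketch, not an official procedure: it assumes the publisher is named `omnios` and that release repositories live under `$REPO_BASE/<release>/core`, with `https://pkg.omnios.org` as a hypothetical default (check the exact URL in the release notes of the target release). `pkg set-publisher -O` replaces the publisher's origin in one step, which has the same effect as unsetting and re-setting it. The helper names `run` and `upgrade_to` are hypothetical, and commands are only printed unless `DRY_RUN=0`.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# ASSUMPTION: repository base URL; set REPO_BASE to the one given in the
# release notes of the target release.
REPO_BASE=${REPO_BASE:-https://pkg.omnios.org}

# One upgrade hop (hypothetical helper): point the omnios publisher at the
# target release repository (-O replaces the old origin), then update
# into a new boot environment named after the release.
upgrade_to() {
  rel=$1
  run pkg set-publisher -O "$REPO_BASE/$rel/core" omnios
  run pkg update -f -r --be-name="$rel"
}

# Path from the post: 151022 -> 151024, reboot into the new BE,
# then switch the repository again and go on to 151030:
#   upgrade_to r151024
#   upgrade_to r151030
```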
 

Captainquark

Weaksauce
Joined
Dec 14, 2011
Messages
97
Oh wow, sorry, that was a noob question...
So I made it to r151028, but as predicted in your docs, SSH didn't start afterwards. I logged into napp-it (which was on 19.01c1, so presumably new enough) and tried "set defaults after 151028 update". It gave me a number of errors in the process, and SSH still does not start. I'm still hoping that you can help me make this work... Thanks!

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
To be honest, I would simply:

- save /var/web-gui/_log/* (napp-it settings)
- write down users with their UIDs and, optionally, SMB groups
- export the pool

- do a fresh install of 151030
- install napp-it via wget
- import the pool
- re-create users and SMB groups
- restore the napp-it settings in /var/web-gui/_log/*

Or, with the very newest napp-it Pro:
- run a backup job
- reinstall
- restore users and settings in menu User > Restore

This is done in 30 minutes.
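The manual steps of the first list can be sketched as shell helpers around the reinstall. This is a sketch under assumptions, not napp-it's own backup job: the pool name `tank`, the backup directory on the data pool (`/<pool>/backup/napp-it`) and the helper names are hypothetical, and the UID range used to pick regular users (100 up to, but excluding, 60001) is an assumption about illumos defaults. Commands are only printed unless `DRY_RUN=0`.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Write down local users with their UIDs ("name uid"), so they can be
# re-created with the same UID later, e.g. with useradd -u <uid> <name>.
# ASSUMPTION: regular users have 100 <= uid < 60001.
list_users() {
  awk -F: '$3 >= 100 && $3 < 60001 { print $1, $3 }' "${1:-/etc/passwd}"
}

# Before the reinstall: copy the napp-it settings onto the data pool
# (hypothetical backup directory), then export the pool.
before_reinstall() {
  pool=$1
  dest=/$pool/backup/napp-it
  run mkdir -p "$dest"
  run cp -rp /var/web-gui/_log "$dest/"
  run zpool export "$pool"
}

# After the fresh install and the napp-it setup: import the pool and
# copy the saved settings back into /var/web-gui/_log.
after_reinstall() {
  pool=$1
  run zpool import "$pool"
  run cp -rp "/$pool/backup/napp-it/_log" /var/web-gui/
}
```

Save the output of `list_users` next to the settings before calling `before_reinstall tank`; keeping the backup on the data pool means it comes back with `zpool import`.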
 
Joined
Apr 3, 2011
Messages
62
For anyone interested, upgrading from R151028 to R151030 went without a hitch after removing all previous GCC packages.
 

Captainquark

Weaksauce
Joined
Dec 14, 2011
Messages
97
Again, thanks to _Gea for his invaluable help. I have now upgraded my NAS to OmniOS 151031 (bloody), and my Windows client now connects using SMB 3.0.2; however, the problem persists. I hope the guys at the VLC forum can help me further with this one.

sjalloq

n00b
Joined
Jun 20, 2011
Messages
54
Hi,

I've got an AiO setup with an Optane 900P and I'd like some advice on the best setup as I'm not getting the same performance as detailed in Gea's optane_slog_pool_performance.pdf.

I've got a Supermicro X9SCL with an E3-1240 V2. My OmniOS VM has 4 vCPUs, 16GB of RAM and an LSI2008 passed through. I'm using three WD Red Pro disks in a 3-way mirror. My ZFS setup seems to match the test setup, and I have sync=always, readcache=all, recsize=128k, compr=off. But I'm getting the following randomwrite test result:

55325 ops, 1844.119 ops/s, (0/1844 r/w), 14.4mb/s, 3649us cpu/op, 0.5ms latency

which is less than half of that detailed in sec 4.2 of the pdf. Where should I start looking?
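The reported figures can be cross-checked with a little arithmetic: throughput is ops/s times I/O size, and run time is total ops divided by ops/s. A minimal sketch, assuming filebench's randomwrite.f issues 8 KiB writes (an assumption, but consistent with the numbers: 1844 ops/s x 8 KiB is about 14.4 MiB/s). The helper names are hypothetical.

```sh
#!/bin/sh
# Throughput in MiB/s from ops/s and I/O size in KiB.
mib_per_s() {
  awk -v ops="$1" -v kib="$2" 'BEGIN { printf "%.1f\n", ops * kib / 1024 }'
}

# Run length in seconds from total ops and ops/s.
run_seconds() {
  awk -v total="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", total / rate }'
}

mib_per_s 1844.119 8        # prints 14.4 (ASSUMPTION: 8 KiB writes)
run_seconds 55325 1844.119  # prints 30
```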

Shareef.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
You are using a 3-way mirror?
The config in 4.2 is a raid-0 of 4 disks.

With redundancy, this is similar on writes to a Raid-10 of 4 mirrors (8 disks).
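The comparison rests on a common rule of thumb: write IOPS scale roughly with the number of top-level vdevs, while all disks of a mirror must write the same blocks, so a mirror writes at about the speed of one disk. A minimal sketch of that rule only, with a placeholder per-disk figure of 150 IOPS (an assumption, not a measurement); the helper name is hypothetical.

```sh
#!/bin/sh
# Rough write-IOPS estimate: number of top-level data vdevs x per-disk IOPS.
# Layouts: raid0 (each disk is a vdev), mirror (one n-way mirror),
# raid10 (2-way mirrors). Rule of thumb only.
est_write_iops() {
  layout=$1
  disks=$2
  per_disk=$3
  case $layout in
    raid0)  vdevs=$disks ;;
    mirror) vdevs=1 ;;
    raid10) vdevs=$((disks / 2)) ;;
    *) echo "unknown layout: $layout" >&2; return 1 ;;
  esac
  echo $((vdevs * per_disk))
}

est_write_iops raid0 4 150    # prints 600 (4-disk raid-0 as in the PDF)
est_write_iops mirror 3 150   # prints 150 (3-way mirror)
est_write_iops raid10 8 150   # prints 600 (4 x 2-way mirror)
```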
 

sjalloq

n00b
Joined
Jun 20, 2011
Messages
54
Yes, a 3 way mirror. I was assuming these benchmarks would be small enough to be soaked up by the Optane. So am I seeing a negative effect from the underlying vdev?
 

sjalloq

n00b
Joined
Jun 20, 2011
Messages
54
Perhaps I'm missing something here, but isn't the point of using Optane to decouple the slow spinny things behind it from the write data? I've got a 20G write cache, so shouldn't that be able to soak up over 20G of writes before it fills and back-pressures the incoming stream?

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
An Slog is NOT a write cache device; it's an ultrafast protector for the RAM-based write cache.

Every write on ZFS goes to the RAM-based write cache. Its default size is 10% of RAM, max 4 GB. It collects small random writes and transforms them into large and fast sequential writes. This happens with sync enabled or disabled.

If you enable sync (without an Slog), ZFS creates a special area on the pool, called the ZIL, where it logs every single committed small write. This is done to the ZIL and not the pool, as the pool itself would be negatively affected by fragmentation from small writes; the ZIL is faster than the pool with such small random writes. But with disks, even this trick is not enough to make sync writes really fast.

If you add an Slog, all the logging goes to the Slog. An Optane is about the fastest logging option there is. Be aware that the data logged to the Slog is never read, except after a crash: on the next reboot, the writes that would otherwise have been lost from the RAM cache are then done with a delay.

Again: an Slog is not a write cache; the write cache on ZFS is always and only RAM. Think of it as similar to the cache+BBU on a hardware RAID.
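The default size rule from this post (10% of RAM, at most 4 GB) can be written as a one-line calculation. A minimal sketch in MiB with integer arithmetic; the helper name is hypothetical. As far as I know this corresponds to the Open-ZFS `zfs_dirty_data_max` tunable, so the effective value may differ if it has been changed.

```sh
#!/bin/sh
# Default RAM write cache as described above: 10% of RAM, capped at 4 GiB.
# Input and output in MiB.
writecache_mib() {
  wc=$(($1 / 10))
  if [ "$wc" -gt 4096 ]; then wc=4096; fi
  echo "$wc"
}

writecache_mib 16384   # prints 1638 (16 GB RAM -> about 1.6 GB)
writecache_mib 32768   # prints 3276 (32 GB RAM -> about 3 GB)
writecache_mib 65536   # prints 4096 (capped)
```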
 

sjalloq

n00b
Joined
Jun 20, 2011
Messages
54
Every description I've read of the ZIL says it's a write cache. For example, this quote from iXsystems: "When synchronous writes are requested, the ZIL is the short-term place on disk where the data lands prior to being formally spread across the pool for long-term storage at the configured level of redundancy."

Or this by servethehome: https://www.servethehome.com/exploring-best-zfs-zil-slog-ssd-intel-optane-nand/

Can you provide a link that says otherwise?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Basically, a cache is a device that sits in the data path between a slow and a fast device in a technical system. This is not completely untrue for an Slog, so the term cache is not completely wrong. But unlike the real RAM-based write cache, which collects all small writes and writes them to the pool together as one large and fast sequential write, a ZIL or Slog is there to log all committed writes going to the write cache, for the sole reason of preserving them in case of a power outage or crash, where the content of the write cache would otherwise be lost. If it were a real write cache, all data would go through the Slog to the data pool. This is not the case, as without a crash the ZIL or Slog is never read.

So no, using the term cache for an Slog is at least misleading. I would not call a logging device a cache, as many people then misunderstand the device.

See for example the data paths at
https://jrs-s.net/2019/05/02/zfs-sync-async-zil-slog/

sjalloq

n00b
Joined
Jun 20, 2011
Messages
54
OK, thanks. So there are a lot of badly worded descriptions out there then! But that really doesn't change my question. The Optane SLOG is removing the write latency associated with the ZIL, so why is my randomwrite performance not as good? What is it about FileBench's randomwrite.f test that means the underlying vdev is causing a performance issue? Surely now that I'm using a SLOG I've decoupled the slow spinning disks from the write path? Unless the RAM-based cache is overflowing?

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Look at section 3.5 of https://www.napp-it.org/doc/downloads/optane_slog_pool_performane.pdf

There I use a 4-disk raid-0 pool with 32 GB RAM, which means around 3 GB of RAM-based write cache (10%). With sync disabled, the result of the randomwrite filebench is 72 MB/s. This is the result under this special filebench access pattern.

If you just enable sync without the Optane, so that sync logging goes to the onpool ZIL, performance goes down to 1.6 MB/s. With the Optane as Slog, performance goes up to 15 MB/s in this test sequence.

So the really important result here is that randomwrite with sync enabled (write cache protection on) is 10x faster with an Optane than without. If you do not need the security of sync and write cache protection, you can just disable sync and save the expense of the Optane, with 4x the performance as a result compared to sync+Slog (Optane).
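The ratios behind these statements can be checked directly from the three quoted results (72, 1.6 and 15 MB/s). A minimal sketch; the helper name is hypothetical.

```sh
#!/bin/sh
# Ratio of two throughput figures, one decimal.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 15 1.6   # prints 9.4  (sync + Optane Slog vs. sync on the onpool ZIL)
ratio 72 15    # prints 4.8  (sync disabled vs. sync + Optane Slog)
ratio 72 1.6   # prints 45.0 (sync disabled vs. sync on the onpool ZIL)
```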

Sync and Slog are not for performance, only for security. Sync must always be slower than unsync, as you write all data sequentially via the write cache to the pool and additionally log every small and quite slow commit to the Slog. Effectively you write the data twice.

Additionally, on random writes (small data blocks), you also need a read for every action (reading metadata to determine where to place the data).

All this means that you can improve performance with a faster pool or with more RAM than the current 16 GB, which serves as write cache (currently around 1.6 GB) and read cache (currently maybe 10 GB).

About the size of the write cache:
The cache is filled to about half its size, then flushed to the data pool while the other half collects the next writes. The pool must be fast enough to write the cache content sequentially as one large write in less time than it takes to fill the second half of the cache, or you see a performance degradation.
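A rough model of this double buffering: one half of the cache fills at the incoming write rate while the other half is written to the pool at its sequential write rate, and writes slow down once flushing a half takes longer than filling one. A minimal sketch of that simplified model only (real ZFS throttles incoming writes gradually rather than all at once); the helper name and the example rates are hypothetical, and 1638 MiB is the 16 GB RAM case from above. Under this model the cache size cancels out of the ok/throttle decision, so a larger cache only lengthens the burst it can absorb.

```sh
#!/bin/sh
# Arguments: cache size (MiB), incoming write rate (MiB/s),
# pool sequential write rate (MiB/s).
cache_verdict() {
  awk -v c="$1" -v r_in="$2" -v r_pool="$3" 'BEGIN {
    half = c / 2
    fill = half / r_in       # seconds until the collecting half is full
    flush = half / r_pool    # seconds to write the other half to the pool
    if (flush <= fill) { print "ok"; exit }
    # Rough length of a burst the cache absorbs before writes slow down.
    printf "throttle after ~%.0f s\n", c / (r_in - r_pool)
  }'
}

cache_verdict 1638 100 150   # prints ok
cache_verdict 1638 200 150   # prints throttle after ~33 s
```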

Writes become faster with the size of the data. Small writes, e.g. 8k, are ultra slow, while from a few megabytes upward there is no further improvement.

davewolfs

Limp Gawd
Joined
Nov 7, 2006
Messages
333
Does it make sense to migrate to ZoL? It seems like this is where active development happens now, with Delphix moving away from illumos.

OmniOS/napp-it has been extremely stable: it just works. But I'm wondering if Linux is the way to go.

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Linux is a general-purpose mainstream platform with many more developers available than the Unix platforms OS X, FreeBSD or illumos. You can see this with some Open-ZFS features like encryption (nearly ready on illumos, but finally finished by Datto on ZoL), trim (there was a pool trim on FreeBSD and illumos/Nexenta, but the ZoL version seems to be becoming the standard) or ZFS allocation classes/special vdevs for metadata, small blocks or dedup.

This does not mean that there is no further development of Open-ZFS on FreeBSD or illumos, or that these features are ZoL-exclusive. It only means that the Open-ZFS platform in general is moving faster, getting better and is on the way to being superior even to native Oracle Solaris ZFS, which is/was the fastest and most feature-rich ZFS. What is no longer the case is that illumos is the main and only Open-ZFS development platform, as it was in the past.

But every platform has its special use case where it is superior: the Mac, e.g., for usability or media creation; illumos because it "just works" and has some superior features, e.g. the kernel-based SMB/NFS servers or Comstar FC/iSCSI.

ZoL is superior when you need more than just storage. There is no problem or hardware without a Linux solution, or rather a dozen Linux solutions or distributions.

If you just need ultrastable, minimalistic storage (FC/iSCSI, NFS and SMB) without hassles on setup or on every update, I prefer OmniOS/OI over any other solution. Feature-wise, new ZFS features, no matter on which OS they are first published, become available on illumos with no or only a short delay; see ZFS encryption, vdev removal, system-wide checkpoints, pool trim, special vdevs etc., which are already in illumos.

davewolfs

Limp Gawd
Joined
Nov 7, 2006
Messages
333
Thanks for that detailed explanation Gea.

Do you know if OmniOS will support Optane via passthrough in ESXi?

I have an S3700 as my current Slog and was thinking of replacing it with a P4801x. I'm not sure if there will be a large difference between the 100GB P4801x and a 900p. The former seems to support PLP at the expense of slightly slower write performance, but I'm not sure how much that matters for real-world performance.

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Optane 900P works for me on ESXi 6.7u3 with OmniOS 151030.
If you see problems, send a mail to illumos-discuss. They are currently working on NVMe hotplug support.

I share your opinion about the 4801x vs the 900P.
I expect very good PLP behaviour from the 900P, but for a production machine you need confirmation from Intel.

sjalloq

n00b
Joined
Jun 20, 2011
Messages
54
Hi, I've been trying to understand the ESXi hot snap feature of napp-it and now have it working after some experimenting. What I'd like to do now is use zfs send to back up those snapshots but I can't see an option in the napp-it UI. Is there one or do I need to script this myself?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Create a replication job (Jobs > Replicate > Create) to send/sync a filesystem via zfs send to a backup disk.
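What such a job does can be sketched with plain zfs commands: a full `zfs send` for the first run, then incremental sends (`-i`) from the last snapshot both sides have. This is a manual sketch, not how napp-it implements its replication jobs; dataset names, snapshot names and helper names are hypothetical, and commands are only printed unless `DRY_RUN=0`.

```sh
#!/bin/sh
# Print a shell command line; execute it only when DRY_RUN=0 (default: dry run).
# sh -c is used because the send/receive step is a pipeline.
runsh() {
  echo "+ $1"
  if [ "${DRY_RUN:-1}" = 0 ]; then sh -c "$1"; fi
}

# replicate <source fs> <target fs> <new snap> [<previous snap>]
# Without a previous snapshot: full send. With one: incremental send.
replicate() {
  src=$1
  dst=$2
  new=$3
  prev=${4:-}
  runsh "zfs snapshot $src@$new"
  if [ -z "$prev" ]; then
    runsh "zfs send $src@$new | zfs receive -F $dst"
  else
    runsh "zfs send -i $src@$prev $src@$new | zfs receive -F $dst"
  fi
}

# Example: a first run, then a later incremental run.
#   replicate tank/vmstore backup/vmstore snap1
#   replicate tank/vmstore backup/vmstore snap2 snap1
```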
 

sjalloq

n00b
Joined
Jun 20, 2011
Messages
54
(Jobs > Replicate > Create)
Great, couldn't see the wood for the trees there. So that seems to work nicely and I'm now getting the snapshot created and backed up.

At the moment my ESXi VM store is a basic pool of one SSD. If I want to change that in the future, is this the process I need to follow: delete the current pool of one disk; create a new pool of whatever; zfs receive the snapshot?

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Much easier:

Use menu Disks > Add to create a mirror from a basic disk or to extend an n-way mirror to n+1.
Use menu Disks > Remove to remove a disk from a mirror (or to remove an Slog/L2Arc).
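These menu actions presumably map to standard zpool subcommands (an assumption about what napp-it runs internally). A minimal sketch with placeholder pool and disk names and hypothetical helper names; commands are only printed unless `DRY_RUN=0`.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Disks > Add: attach <new> to <existing>; a basic vdev becomes a mirror,
# an n-way mirror becomes n+1-way. ZFS then resilvers the new disk.
mirror_add() { run zpool attach "$1" "$2" "$3"; }

# Disks > Remove: detach a disk from a mirror (a 2-way mirror becomes basic).
mirror_remove() { run zpool detach "$1" "$2"; }

# Remove an Slog or L2Arc device from the pool.
aux_remove() { run zpool remove "$1" "$2"; }

# Example with placeholder names:
#   mirror_add tank c1t0d0 c1t1d0
```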
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
About ZFS Allocation Classes on OmniOS

I have made some performance benchmarks:
http://napp-it.org/doc/downloads/special-vdev.pdf

I am really impressed by the result, as this allows you to use a slow disk pool where you can decide per ZFS filesystem, based on the "recsize" vs "special_small_blocks" settings, that the data of this filesystem lands on the special vdev, e.g. an Intel Optane.
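The placement rule can be sketched as a small decision: with a special vdev, metadata goes there, and so do data blocks whose size is at or below the filesystem's `special_small_blocks`; if `recsize` is not larger than `special_small_blocks`, all data blocks of that filesystem land on the special vdev. A minimal sketch in KiB (the helper name is hypothetical); the commented commands show the related settings with placeholder pool, filesystem and disk names.

```sh
#!/bin/sh
# Where does a data block of a filesystem land? Sizes in KiB;
# special_small_blocks=0 means no data blocks go to the special vdev.
block_target() {
  block=$1
  small=$2
  if [ "$small" -gt 0 ] && [ "$block" -le "$small" ]; then
    echo special
  else
    echo normal
  fi
}

block_target 128 128   # prints special (recsize=128K, special_small_blocks=128K)
block_target 128 32    # prints normal  (full records go to the disk vdevs)
block_target 16 32     # prints special (small blocks go to the special vdev)
block_target 16 0      # prints normal  (no data blocks on the special vdev)

# Related settings (placeholder names):
#   zpool add tank special mirror c4t0d0 c4t1d0
#   zfs set recordsize=128K tank/fast
#   zfs set special_small_blocks=128K tank/fast
```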

btw
Special vdevs should be removable. When I tested this, the OS crashed and the pool seemed damaged.

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Update

Dedup and special vdevs are removable vdevs. This works only when all vdevs in the pool
have the same ashift setting, e.g. ashift=12, best for 4k disks.

At least in current illumos there is a problem that a pool crashes (gets corrupted) when you try to remove a special vdev from a pool with different ashift settings, e.g. a pool with ashift=12 vdevs and a special vdev with ashift=9. In current napp-it-dev I therefore set ashift=12 instead of "auto" as the default to create or extend a pool.

If you want to remove a special or dedup vdev, first check the ashift setting of all vdevs (menu Pool, click on the datapool). I have sent a mail to illumos-dev and hope that this bug is solved before the next OmniOS stable.

If you create or extend a pool, I suggest making sure all vdevs use the same ashift. When you try to remove a regular vdev (e.g. basic, mirror) from a pool with a different ashift, it stops with a message that this cannot be done due to different ashift settings (but there is no crash as with special vdevs).
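A quick way to check the ashift of all vdevs from the shell is to read the pool configuration with `zdb -C <pool>` and compare the `ashift:` values. A minimal sketch, assuming each vdev entry in that output contains a line of the form `ashift: 12`; the helper name is hypothetical.

```sh
#!/bin/sh
# Reads `zdb -C <pool>` output on stdin and prints "same" if all listed
# vdevs share one ashift, "mixed" otherwise ("none" if no ashift lines).
ashift_check() {
  awk '$1 == "ashift:" { seen[$2] = 1 }
    END {
      n = 0
      for (v in seen) n++
      if (n == 0) print "none"
      else if (n == 1) print "same"
      else print "mixed"
    }'
}

# Usage on the server (pool name is a placeholder):
#   zdb -C tank | ashift_check
```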
 

ARNiTECT

n00b
Joined
Aug 4, 2012
Messages
43
I have been pretty happy with our server so far; it is due for an upgrade, but I hoped not right now.

It's all become a bit of a mess.

I logged into napp-it on Friday and noticed one of the drives had been removed. I assumed it was dead, so I turned off the server, removed the drive and fitted a new one.
The server is an all-in-one with the main pool as RAID-10 (4 sets of 2x 3TB).
The resilvering started, but on completion it restarted, saying "too many errors". The drive it was resilvering from apparently has errors. The resilvering has finished, but it still reports errors.

Should I put the old 'removed' HDD back in to see if the files can be recovered from it? If so, how do I do this?

Here is the report:


  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 2.12T in 19h41m with 20 errors on Sun Oct 20 21:48:23 2019
config:

        NAME                          STATE     READ WRITE CKSUM  CAP   Product /napp-it IOstat mess
        tank                          DEGRADED    36     0     0
          mirror-0                    ONLINE       0     0     0
            c3t50014EE0035E4708d0     ONLINE       0     0     0  3 TB  WDC WD30EFRX-68A  S:0 H:0 T:0
            c3t50014EE0035E6499d0     ONLINE       0     0     0  3 TB  WDC WD30EFRX-68A  S:0 H:0 T:0
          mirror-1                    ONLINE       0     0     0
            c3t50014EE0035E6F58d0     ONLINE       0     0     0  3 TB  WDC WD30EFRX-68A  S:0 H:0 T:0
            c3t50014EE2B7DD75CAd0     ONLINE       0     0     0  3 TB  WDC WD30EFRX-68E  S:0 H:0 T:0
          mirror-2                    DEGRADED    36     0    12
            c3t50014EE0AE090EC9d0     DEGRADED    37     0    19  too many errors  3 TB  WDC WD30EFRX-68A  S:0 H:865 T:771
            replacing-1               DEGRADED    55     0     0
              c3t50014EE603E4F695d0   REMOVED      0     0     0
              c3t50014EE264F37380d0   ONLINE       0     0    55  3 TB  WDC WD30EFRX-68E  S:0 H:0 T:0
          mirror-4                    ONLINE       0     0     0
            c3t50014EE20D866CA5d0     ONLINE       0     0     0  3 TB  WDC WD30EFRX-68E  S:0 H:0 T:0
            c3t50014EE2B33F3F60d0     ONLINE       0     0     0  3 TB  WDC WD30EFRX-68A  S:0 H:0 T:0
        logs
          c3t5001517BB287C992d0       ONLINE       0     0     0  24 GB  INTEL SSDSA2VP02  S:0 H:0 T:0
        cache
          c3t5E83A97C9E1F7B12d0       ONLINE       0     0     0  128 GB  OCZ-VERTEX4  S:0 H:0 T:0

errors: 9 data errors, use '-v' for a list


I entered "zpool status -v" into the cmd box, and the files are ...delta.vmdk files from many snapshots I was not able to consolidate.


errors: Permanent errors have been detected in the following files:
/tank/xxx/xxx/xxx-000017-delta.vmdk
/tank/xxx/xxx/xxx_14-000016-delta.vmdk
/tank/xxx/xxx/xxx_11-000016-delta.vmdk
/tank/xxx/xxx/xxx-000003-delta.vmdk
/tank/xxx/yyy/yyy-000001-delta.vmdk
/tank/xxx/xxx/xxx-000004-delta.vmdk
/tank/xxx/xxx/xxx-000014-delta.vmdk
/tank/xxx/xxx/xxx-000016-delta.vmdk
tank/xxx@2016.09.09.10.25.30_before_acl_reset:/xxx/xxx-000001-delta.vmdk


The VM "xxx" with file errors is SBS 2011 (old I know). "xxx-000017-delta.vmdk" is the current snapshot and appears to be running, I have left it on, but to reduce the load on the drives I have disabled the nightly backup from within SBS 2011 for tonight (very little data added this weekend).

If I can't recover the files from the 'Removed' HDD, what other options are there?


I have another, empty all-in-one napp-it server I haven't used in years. It should have enough storage, but only has 16 GB of memory, so I could start it up as a temporary server if that would help? I will have to investigate how to set it up with the same configuration as the existing server.

I have backups from within the SBS2011 VM, so I could hopefully rebuild the VM from these on the other server. This was one idea I had a while ago for getting around my problem of not being able to consolidate the many VM snapshots. I could then strip the current server for parts to build a better all-in-one.

I plan to migrate from Exchange 2010 to Office 365 and from SBS2011 to some other suitable file server, but I keep putting it off. Is now perhaps the time to do this? I assumed it would be best to get the server stable first, rather than migrate from the degraded pool.

...Will start up the old server in the morning.

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
There are mainly three aspects:

1. Pool consistency
In mirror-2 you have one disk with status ONLINE, one REMOVED and one with too many errors.

I would consider the removed disk dead and the disk with too many errors bad.
Usual solution: replace the removed disk with a new one.

Unplug the two problem disks and check them externally, e.g. with WD Data Lifeguard and an extended test. Depending on the result (e.g. repaired bad blocks), try to re-use them or, better, bin them.

Remember: in a ZFS pool, all vdevs are striped like a raid-0.
A lost vdev means a lost pool.

btw
A resilver creates a mirror between the new and the failed disk. As with any mirror, you can use Disks > Remove to remove one of the mirrored disks. The "too many errors" message can be cleared with zpool clear (menu Pools), but this does not repair a disk.


2. data consistency
ZFS can detect corrupt files thanks to data and metadata checksums. This is absolutely trustworthy, which means that a file with checksum errors is definitely damaged.

There is no other solution besides deleting it and restoring from backup (or re-creating the VM).


3. ESXi snapshots
ESXi snapshots are based on delta files, which grow massively over time. Use only a single ESXi snapshot, and only for a short time, say one or two days. Never keep them for a long time and never use many of them. Use ZFS snaps instead.
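Working with ZFS snapshots of the VM filesystem instead of ESXi snapshots can be sketched as follows. Filesystem, snapshot and helper names are hypothetical; for a consistent snapshot the VM should be shut down or quiesced first (napp-it's ESXi hot snap feature is one way to do that). Commands are only printed unless `DRY_RUN=0`.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Take a ZFS snapshot of the filesystem that holds the VM.
vm_snap() { run zfs snapshot "$1@$2"; }

# List the snapshots of that filesystem with their space usage.
vm_snaps() { run zfs list -t snapshot -o name,used,creation -r "$1"; }

# Roll back to a snapshot. Without -r, zfs refuses if newer snapshots
# exist; with -r, those newer snapshots would be destroyed.
vm_rollback() { run zfs rollback "$1@$2"; }

# Example with placeholder names:
#   vm_snap tank/vm before-update
```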
 

ARNiTECT

n00b
Joined
Aug 4, 2012
Messages
43
Hi Gea, thanks for getting back to me so quickly.

The read errors are now at 795 (checksum still 19), so I'm running out of time.

1. The report lists 3 drives in mirror-2; I had already physically removed the dead drive and replaced it with a new one. I will remove the other bad drive and scan both with WD Data Lifeguard. I obviously hope they can be repaired, but I am pressing ahead with setting up the temporary server. I do have another spare new HDD to replace the 2nd bad drive. If all goes to plan, I will wipe the pool, create a new VM and rebuild it from the same backup.

2. A lot of work to do then. I need to get this second server up and running.

3. With the snapshots, I realised the mistake I'd made too late.

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Update

To complete my earlier update on removing dedup and special vdevs from pools with mixed ashift settings, for those using ZoL:
https://github.com/zfsonlinux/zfs/issues/9363

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Some more insights about the new ZFS Allocation Classes feature
http://napp-it.org/doc/downloads/special-vdev.pdf

1. About Allocation Classes
2. Performance of a slow diskbased pool
3. With special vdev (metadata only)
4. With special vdev (for a single filesystem)
5. With special vdev (for a single filesystem) and Slog (Optane)
6. Performance of a fast diskbased pool
7. Fast diskbased pool with special vdev
8. NVMe Pool vs special vdev (same NVMe)
9. Compare Results
10. Conclusion
11. When is a special vdev helpful
12. When not
13. General suggestions
 

ARNiTECT

n00b
Joined
Aug 4, 2012
Messages
43
I have rebuilt my server on different hardware and it's up and running.
On my old server Mirror-2 is still degraded, but the pool data is still accessible and I plan to copy off the data from the degraded Pool.
Should I leave it as it is (see below) and continue to copy off the data, or can I tidy up the degraded vDev mirror (unavail, replacing etc), accepting that there are corrupt files?

Here is how I got to where I am:
  • I first physically removed the drive that was marked as 'Removed' from Mirror-2 (c3t50014EE603E4F695d0) and replaced with a new drive (c3t50014EE264F37380d0).
  • The mirror drive (c3t50014EE0AE090EC9d0) then started to show errors during resilvering, which completed on the second attempt, with errors.
  • I scanned the first 'Removed' drive (c3t50014EE603E4F695d0) in WD Data Lifeguard and it found bad blocks and confirmed it repaired all of them.
  • I then removed the 2nd faulty mirror drive (c3t50014EE0AE090EC9d0), scanned in WD Data Lifeguard, which confirmed it was irreparable and is ready for the bin.
  • I installed the repaired 'Removed' drive (c3t50014EE603E4F695d0) and it completed resilvering.
(c3t50014EE0AE090EC9d0) binned
(c3t50014EE603E4F695d0) checked in WD Data Lifeguard and re-used
(c3t50014EE264F37380d0) new

Once I have copied off the data, I plan to strip the server for parts and build a new one.


  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 2.12T in 14h46m with 3 errors on Mon Oct 28 11:43:24 2019
config:

        NAME                          STATE     READ WRITE CKSUM  SLOT  CAP     MODELL
        tank                          DEGRADED     3     0     0
          mirror-0                    ONLINE       0     0     0
            c3t50014EE0035E4708d0     ONLINE       0     0     0        3 TB    WDC WD30EFRX-68A
            c3t50014EE0035E6499d0     ONLINE       0     0     0        3 TB    WDC WD30EFRX-68A
          mirror-1                    ONLINE       0     0     0
            c3t50014EE0035E6F58d0     ONLINE       0     0     0        3 TB    WDC WD30EFRX-68A
            c3t50014EE2B7DD75CAd0     ONLINE       0     0     0        3 TB    WDC WD30EFRX-68E
          mirror-2                    DEGRADED     6     0     0
            c3t50014EE0AE090EC9d0     UNAVAIL      0     0     0  cannot open
            replacing-1               ONLINE       0     0     6
              c3t50014EE603E4F695d0   ONLINE       0     0     6        3 TB    WDC WD30EFRX-68E
              c3t50014EE264F37380d0   ONLINE       0     0     6        3 TB    WDC WD30EFRX-68E
          mirror-4                    ONLINE       0     0     0
            c3t50014EE20D866CA5d0     ONLINE       0     0     0        3 TB    WDC WD30EFRX-68E
            c3t50014EE2B33F3F60d0     ONLINE       0     0     0        3 TB    WDC WD30EFRX-68A
        logs
          c3t5001517BB287C992d0       ONLINE       0     0     0        24 GB   INTEL SSDSA2VP02
        cache
          c3t5E83A97C9E1F7B12d0       ONLINE       0     0     0        128 GB  OCZ-VERTEX4

errors: 2 data errors, use '-v' for a list

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
Delete the corrupted files and copy off the rest.

If you want to repair the pool:
- delete the corrupted files and rerun a scrub

Repair mirror-2:
- remove the disk c3t50014EE0AE090EC9d0
- optionally clear the errors
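The repair steps above correspond to these zpool commands. A minimal sketch, with the pool and disk names taken from this thread and hypothetical helper names; deleting the listed files between the two helpers stays a manual step, and commands are only printed unless `DRY_RUN=0`.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Step 1: list the files with permanent errors, then delete them by hand.
show_damage() {
  run zpool status -v "$1"
}

# Step 2, after the deletions: scrub, detach the failed disk from its
# mirror and optionally reset the error counters.
finish_repair() {
  pool=$1
  disk=$2
  run zpool scrub "$pool"
  run zpool detach "$pool" "$disk"
  run zpool clear "$pool"
}

#   show_damage tank
#   finish_repair tank c3t50014EE0AE090EC9d0
```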
 

ARNiTECT

n00b
Joined
Aug 4, 2012
Messages
43
Thanks _Gea.

After I deleted the corrupt files, it resilvered, but it also removed disk c3t50014EE603E4F695d0 for some reason.
I removed the unavailable disk c3t50014EE0AE090EC9d0
now Mirror-2 is gone and drive c3t50014EE264F37380d0 is on its own

Disk c3t50014EE603E4F695d0 is available to add, but napp-it will not let me add it to disk c3t50014EE264F37380d0 to recreate a mirror vdev. Is this still possible now?


  pool: tank
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the
        pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: resilvered 2.08T in 15h49m with 0 errors on Fri Nov 1 10:02:51 2019
config:

        NAME                          STATE     READ WRITE CKSUM  CAP     MODELL
        tank                          ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            c3t50014EE0035E4708d0     ONLINE       0     0     0  3 TB    WDC WD30EFRX-68A
            c3t50014EE0035E6499d0     ONLINE       0     0     0  3 TB    WDC WD30EFRX-68A
          mirror-1                    ONLINE       0     0     0
            c3t50014EE0035E6F58d0     ONLINE       0     0     0  3 TB    WDC WD30EFRX-68A
            c3t50014EE2B7DD75CAd0     ONLINE       0     0     0  3 TB    WDC WD30EFRX-68E
          c3t50014EE264F37380d0       ONLINE       0     0     0  3 TB    WDC WD30EFRX-68E
          mirror-4                    ONLINE       0     0     0
            c3t50014EE20D866CA5d0     ONLINE       0     0     0  3 TB    WDC WD30EFRX-68E
            c3t50014EE2B33F3F60d0     ONLINE       0     0     0  3 TB    WDC WD30EFRX-68A
        logs
          c3t5001517BB287C992d0       ONLINE       0     0     0  24 GB   INTEL SSDSA2VP02
        cache
          c3t5E83A97C9E1F7B12d0       ONLINE       0     0     0  128 GB  OCZ-VERTEX4

errors: No known data errors

-----------------------------------
add disk [ ] c3t50014EE603E4F695d0 - ATA WDC WD30EFRX-68E 3 TB

to basic/ mirrored vdev
[ ] c2t0d0 rpool basic VMware Virtual disk 37.6 GB
[ ] c3t5E83A97D7DAA481Bd0 tank2 basic ATA OCZ-VERTEX4 128 GB
-----------------------------------

....I've been reading the results of your previous post on the 'ZFS Allocation Classes feature', which are helpful for specifying my new all-in-one server.

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
You can remove a disk from a mirror, resulting in a basic vdev.
You can also add a disk to a basic or mirror vdev to create a mirror or n+1 way mirror.

Can you try "ZFS Filesystem > Delete ZFS Buffer" (in case the ZFS information is not up to date)?

ARNiTECT

n00b
Joined
Aug 4, 2012
Messages
43
I tried "ZFS Filesystem > Delete ZFS Buffer"

it says:
processing, please wait..
ZFS properties are buffered on a Pro edition.
They are now reloaded manually, please wait.
(I don't have Pro)

then:
on problems with buffering=on, you can reload list with menu ZFS folder - reload
Size example 1T means 1 TiB
(I can't find a menu "ZFS folder - reload")

I restarted the server.

Unfortunately, I still can't add the disk to the basic vdev to make a mirror.

ARNiTECT

n00b
Joined
Aug 4, 2012
Messages
43
The server is currently:
OmniOS v11 r151018
ESXi 5.5.0 (4179633)

My other server I recovered with backups has the same CPU (Xeon X3450) with ESXi 6.5.0 U2 (8294253), which is the last compatible release and OmniOS v11 r151026c.

So I could upgrade to r151026c, but I'd rather not have to, if not necessary.

I'll look into the command line option, thanks for the link.

Edit: using the command
"zpool attach tank c3t50014EE264F37380d0 c3t50014EE603E4F695d0"
it worked, and it is now resilvering.
It is strange that the menu option didn't work.

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,903
OmniOS 151018 is VERY old. Probably the napp-it release is just as old.

Current Open-ZFS in OmniOS 151032 is a huge improvement (encryption, trim, special vdevs, fast resilver, removable vdevs, system checkpoints, SMB3 among others).

https://github.com/omniosorg/omnios-build/blob/r151032/doc/ReleaseNotes.md
(probably next week)

You can then do a clean install of OmniOS, install the Open-VM tools, install napp-it, add users as now, restore /var/web-gui/_log/* for the napp-it settings and import the data pool to be up to date.