OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

I'm moving my target replication node tomorrow to our VPN-connected satellite office and will have to change the target-node’s IP address for the new local subnet. Will the existing Napp-it replication extension jobs continue to run?

thanks,
jb
 
I'm moving my target replication node tomorrow to our VPN-connected satellite office and will have to change the target-node’s IP address for the new local subnet. Will the existing Napp-it replication extension jobs continue to run?

thanks,
jb

yes, on target machine (napp-it 0.9)
goto extension - appliance group and
- delete old source
- add new one with new ip

goto menu jobs and click on jobid
- edit ip
 
Thanks Gea.

Source IP unchanged. Target IP has changed. Current source appliance status (as read in the target appliance group) is 'remote call: timeout.' A manual nc send/receive test was successful across the VPN.

To confirm: delete the source from the appliance group and re-add it?

I tried to add before deleting and received an error that the box is not a napp-it appliance. As a test, I tried to add the target to an appliance group on the source and received the same error, in case there's a chance my firewall is blocking the necessary comms.

thanks,
jb
 
That's great news! :)

The PSU I am using in my Norco 4224 is a Corsair HX650 I think. Or something like that. I used the cables that are standard with the PSU - no more than 4 SATA power connectors per rail. Each backplane has one power connector for 4 drives. I use 4 of the modular SATA power cables to power the 6 backplanes. No problems!

But the HX650 is kind of a pricey PSU - no less than $120 usually.

EDIT: My M1015s by default stagger the spinup of each hdd so that must help a lot.

If I could hug you I would. I completely removed the new pool and the entire original array came online. It's absolutely the case that I overloaded the Molex expander from Norco. Now I just need to figure out a way to power all 6 backplanes without the use of expanders, or at least not chain all of them off one massive expander like I did. Got any tips? This is the PSU my server is running (since it's damn near impossible to find a generic ATX server-grade PSU): http://www.amazon.com/gp/product/B00284AJ1G/ref=wms_ohs_product?ie=UTF8&psc=1
 
Thanks Gea.

Source IP unchanged. Target IP has changed. Current source appliance status (as read in the target appliance group) is 'remote call: timeout.' A manual nc send/receive test was successful across the VPN.

To confirm: delete the source from the appliance group and re-add it?

I tried to add before deleting and received an error that the box is not a napp-it appliance. As a test, I tried to add the target to an appliance group on the source and received the same error, in case there's a chance my firewall is blocking the necessary comms.

thanks,
jb

The appliance group is needed for communication (it places a key on both sides).
If this fails, you may have a firewall issue. Delete/re-add is independent of the job settings.

You must open port 81 for base communication and the replication port from the job settings.
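A quick way to verify both from a shell is a netcat check (the target IP and replication port below are only placeholders; use your own job settings):

Code:
nc -vz <target-ip> 81             # napp-it webserver / base communication
nc -vz <target-ip> <repl-port>    # replication transfer port from the job settings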
 
The appliance group is needed for communication (it places a key on both sides).
If this fails, you may have a firewall issue. Delete/re-add is independent of the job settings.

You must open port 81 for base communication and the replication port from the job settings.

81 is open - ran a netcat port scan! Handy little feature! - and I've confirmed all traffic is allowed between hosts. I think I've got a host routing issue but I'm not sure what it is yet. Both hosts can ping one another and resolve each other's hostname with an answer from our domain DNS box, and I've got 2 manual netcat send/receive jobs running now.

But on either host, if I run a traceroute to the other, the trace does not complete. Run the trace from a Windows box sitting next to it and it completes.

LAN 1.............................LAN 2
Nappit0 -TRACEROUTE->Nappit1: Fail
WIN----------TRACERT----->Nappit1: Success
 
Hmm, maybe that's not it. An ICMP traceroute completes between both napp-it boxes... Do the hosts "register" any route/MAC or IP info about each other when the appliance group is first established?
 
Hmm, maybe that's not it. An ICMP traceroute completes between both napp-it boxes... Do the hosts "register" any route/MAC or IP info about each other when the appliance group is first established?

Hostname and IP are part of the group and the job settings. The hostname is also part of the snap names.
If you only change the IP, it is enough to delete/recreate the group and edit the job settings for the new IP.

Can you create a new replication (can be a smaller one) for testing?

What you can check:
Is the gateway (IP setting) correct when routing, or are you in the same network via the VPN?
Also check the default route settings.
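As a rough sketch of those checks from the shell (the gateway address below is only an example):

Code:
netstat -rn                            # show the current routing table and default route
route -p add default 192.168.200.1     # example: set a persistent default gateway if it is missing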
 
Good morning,

So I have a napp-it "all-in-one" running OmniOS. Its primary use is to back up my Nexenta boxes. I am first trying "auto-sync", which automates zfs send/receive.

I created a pool on napp-it called "bakpool"

I am backing up the folder "infra" from the Nexenta box.

When the job is complete, in the napp-it GUI ZFS shares I now see /bakpool/infra, and the Nexenta box says the job finished correctly. However, napp-it (and the shell) says /bakpool/infra does not exist! (But I see the space used up.)

Here are the properties shown via napp-it for the ZFS share /bakpool/infra:



Am I not seeing anything because it is not mounted? Do I need to mount it read-only to another folder or share? Any suggestions to accomplish this?

I'd prefer to make this work before I give up and replicate directly from the napp-it box via zrep.
 
Am I not seeing anything because it is not mounted? Do I need to mount it read-only to another folder or share? Any suggestions to accomplish this?

Trying to answer my own question, so I tried:

zfs mount bakpool/infra

To which I get: cannot mount 'bakpool/infra': 'canmount' property is set to 'off'

You can't see it in the image, but here is the canmount property from the same page:

bakpool/infra canmount off local

Can I just set this to on? (I assume I would also want to set it to read-only?)
 
So I am working on getting my backup server back up and running, but in the meantime I would like to prevent any additional data loss on my primary array. Now that we've determined it was a power issue, is there a way I can clear the ZFS checksum errors by forcing a resilver? I checked the drives that are logging checksum errors, and I'm getting 0 soft, hard, transport, media, device not ready, recoverable, illegal, and predictive failure analysis errors. I'm thinking the checksum issue was due to insufficient power to the drives and they wrote garbage. My array is online at this point, but I've got 2 degraded drives in one RAIDZ2, and an UNAVAIL drive (the only disk actually failing before I had the power supply issue) plus a drive with 47 checksum errors in the other RAIDZ2.

The drives to get my backup server up and running won't get here until Tuesday so I'd like to ensure I can recover at least one of the drives on each array so there's some redundancy left just in case.

This is the output of my zpool status right now:

Code:
  pool: data
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 1.96M in 0h0m with 0 errors on Wed May  1 21:37:51 2013
config:

        NAME                       STATE     READ WRITE CKSUM
        data                       DEGRADED     0     0     0
          raidz2-0                 DEGRADED     0     0     0
            c3t5000CCA228C2DD95d0  ONLINE       0     0     0
            11292073787090335211   UNAVAIL      0     0     0  was /dev/dsk/c3t5000CCA228C2E47Ed0s0
            c3t5000CCA228C33703d0  ONLINE       0     0     0
            c3t5000CCA228C34508d0  ONLINE       0     0    47
            c3t5000CCA228C349FAd0  ONLINE       0     0     0
            c3t5000CCA228C35CBFd0  ONLINE       0     0     0
          raidz2-1                 DEGRADED     0     0     0
            c3t5000CCA228C34321d0  ONLINE       0     0     0
            c3t5000CCA228C34EB9d0  ONLINE       0     0     0
            c3t5000CCA228C3542Ad0  DEGRADED     0     0    55  too many errors
            c3t5000CCA228C35875d0  ONLINE       0     0     0
            c3t5000CCA228C35A4Ad0  DEGRADED     0     0    60  too many errors
            c3t5000CCA228C35CD2d0  ONLINE       0     0     0

errors: No known data errors
 
So I am working on getting my backup server back up and running, but in the meantime I would like to prevent any additional data loss on my primary array. Now that we've determined it was a power issue, is there a way I can clear the ZFS checksum errors by forcing a resilver?

I would clear errors:
zpool clear data

followed by a scrub to verify/repair checksum errors
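Putting that together for the pool named data from your output, the sequence would be:

Code:
zpool clear data     # reset the read/write/checksum error counters
zpool scrub data     # re-read all data and repair against checksums
zpool status data    # watch scrub progress and any remaining errors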
 
Trying to answer my own question, so I tried:

zfs mount bakpool/infra

To which I get: cannot mount 'bakpool/infra': 'canmount' property is set to 'off'

You can't see it in the image, but here is the canmount property from the same page:

bakpool/infra canmount off local

Can I just set this to on? (I assume I would also want to set it to read-only?)

You should be able to mount it once you set canmount to on.
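A rough sketch with the dataset names from your post (read-only is optional, but sensible for a replication target):

Code:
zfs set canmount=on bakpool/infra   # allow the dataset to be mounted
zfs set readonly=on bakpool/infra   # optional: keep the backup copy untouched
zfs mount bakpool/infra             # mount it at its mountpoint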
 
I would clear errors:
zpool clear data

followed by a scrub to verify/repair checksum errors

I assume then that this is normal (since Time Slider has been having fun making snapshots while the disks were in a degraded state)? Once the scrub finishes it should be back online?

Code:
  pool: data
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub in progress since Sat May  4 11:20:03 2013
    4.01G scanned out of 26.5T at 7.52M/s, (scan is slow, no estimated time)
    444K repaired, 0.01% done
config:

        NAME                       STATE     READ WRITE CKSUM
        data                       DEGRADED     0     0     0
          raidz2-0                 DEGRADED     0     0     0
            c3t5000CCA228C2DD95d0  ONLINE       0     0     0
            11292073787090335211   UNAVAIL      0     0     0  was /dev/dsk/c3t5000CCA228C2E47Ed0s0
            c3t5000CCA228C33703d0  ONLINE       0     0     0
            c3t5000CCA228C34508d0  DEGRADED     0     0   138  too many errors  (repairing)
            c3t5000CCA228C349FAd0  ONLINE       0     0     0
            c3t5000CCA228C35CBFd0  ONLINE       0     0     0
          raidz2-1                 DEGRADED     0     0     0
            c3t5000CCA228C34321d0  ONLINE       0     0     0
            c3t5000CCA228C34EB9d0  ONLINE       0     0     0
            c3t5000CCA228C3542Ad0  DEGRADED     0     0   169  too many errors  (repairing)
            c3t5000CCA228C35875d0  ONLINE       0     0     0
            c3t5000CCA228C35A4Ad0  DEGRADED     0     0   194  too many errors  (repairing)
            c3t5000CCA228C35CD2d0  ONLINE       0     0     0

errors: No known data errors

Since booting (when the checksum counts showed 0), the drives show the following in iostat:

c3t5000CCA228C34508d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: Hitachi HDS5C303 Revision: A5C0 Serial No:
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t5000CCA228C3542Ad0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: Hitachi HDS5C303 Revision: A5C0 Serial No:
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t5000CCA228C35A4Ad0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: Hitachi HDS5C303 Revision: A5C0 Serial No:
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
 
Hostname and IP are part of the group and the job settings. The hostname is also part of the snap names.
If you only change the IP, it is enough to delete/recreate the group and edit the job settings for the new IP.

Can you create a new replication (can be a smaller one) for testing?

What you can check:
Is the gateway (IP setting) correct when routing, or are you in the same network via the VPN?
Also check the default route settings.

I seem to have a handful of issues. The big one is that my target-side firewall [SonicWall TZ210] can't seem to handle the traffic. CPU spikes to 100% from about 5% as soon as I kick off a manual replication, and latency goes from <30 ms to >2000 ms!

During a manual remote replication, the target appliance group reports the source appliance as offline. Once I kill the manual replication job, the target appliance group reports the source as online with a status of 'remote call: timeout.' Now, if I try to add the target to the source appliance group (just for testing) I get the error that the host is not a napp-it appliance.

Do you know what would throw those two messages, 'remote call: timeout' and 'not a napp-it appliance'? Other than the SonicWall failing under load, I think the networking checks out. Here are the relevant outputs.

root@nappit0:~# ping nappit1
nappit1 is alive


root@nappit0:~# traceroute -I nappit1
traceroute to nappit1 (192.168.1.184), 30 hops max, 40 byte packets
1 * * *
2 nappit1.keslerassociates.local (192.168.1.184) 29.133 ms 32.935 ms 46.147 ms

root@nappit0:~# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 6
inet 192.168.200.186 netmask ffffff00 broadcast 192.168.200.255
ether 0:2:b3:d8:d:f4
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
inet6 ::1/128
e1000g0: flags=20002000840<RUNNING,MULTICAST,IPv6> mtu 1500 index 6
inet6 ::/0
ether 0:2:b3:d8:d:f4


root@nappit0:~# netstat -nr

Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ---------- ---------
default 192.168.200.9 UG 8 2396765
127.0.0.1 127.0.0.1 UH 2 1406 lo0
192.168.200.0 192.168.200.186 U 255 1495 e1000g0

Routing Table: IPv6
Destination/Mask Gateway Flags Ref Use If
--------------------------- --------------------------- ----- --- ------- -----
::1 ::1 UH 2 548 lo0
 
timeout: ping ok, but no answer within 60s on port 81
not a napp-it appliance: ping ok but cannot connect port 81
 
Seems like it's open; see the port scans in both directions below. I clicked the ZFS link for the appliance in the target appliance group and see this corresponding output in the jobs monitor. Anything in the fact that host_ip=nappit1? By the way, there is no question that I'm buying the extension on Tuesday. No way do I want to try and script this myself!

-> grouplib_ask_remote_zfslist 1379
nc -w 60 192.168.1.184 81
do=request_zfslist&hostname=nappit0&host_ip=nappit1


Port Scans:
root@nappit1:~# nc -vzu 192.168.200.186 81
Connection to 192.168.200.186 81 port [udp/*] succeeded!

root@nappit1:~# nc -vz 192.168.200.186 81
Connection to 192.168.200.186 81 port [tcp/*] succeeded!


root@nappit0:~# nc -vzu 192.168.1.184 81
Connection to 192.168.1.184 81 port [udp/*] succeeded!

root@nappit0:~# nc -vz 192.168.1.184 81
Connection to 192.168.1.184 81 port [tcp/*] succeeded!
 
Good question! But yes, in both directions.

root@nappit0:~# ping nappit1
nappit1 is alive
root@nappit0:~# nc -vzu nappit1 81
Connection to nappit1 81 port [udp/*] succeeded!
root@nappit0:~# nc -vz nappit1 81
Connection to nappit1 81 port [tcp/*] succeeded!

root@nappit1:~# ping nappit0
nappit0 is alive
root@nappit1:~# nc -vzu nappit0 81
Connection to nappit0 81 port [udp/*] succeeded!
root@nappit1:~# nc -vz nappit0 81
Connection to nappit0 81 port [tcp/*] succeeded!

I noticed an extra 1 in the minilog when trying to add an appliance as a test on the source server. Any chance that's getting passed to the connection on port 81, so that the ping is good but the connect on port 81 fails? I did upgrade since first adding the appliance. Running 9b1.

minilog
exe: ping 192.168.200.186 1192.168.200.186 is alive

Sorry I took a while to respond. SonicWall support got me mostly sorted - just turn off everything you paid extra for! Getting about 20 Mbps throughput. Thought I'd double down and try concurrent replications. Pushed it up to ~35 Mbps, but then the firewall took a dump 12 GB into my transfer :( Trying again with just one stream.
 
Any advice on enabling jumbo frames on OpenIndiana with a server running Broadcom (bnx) NICs? Setting the MTU via dladm yields 'unsupported operation', so that pretty much leaves editing /kernel/drv/bnx.conf as the only option on the table, but I think that would affect both NICs, which I really don't want; I just want jumbo frames on bnx1.

Also, does anyone have any general NFS tuning tips for OpenIndiana? I'm only getting about 20 MB/s writes on a 6-spindle raidz2 with a dedicated 15k RPM log.
 
You do not set a fixed MTU value in the driver config file but rather the allowed range.
You then set the actual value per link.
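For example (a sketch; the exact syntax for widening the allowed range in /kernel/drv/bnx.conf depends on the driver, so check its documentation), once the driver accepts larger frames the per-link value is set with dladm:

Code:
dladm show-linkprop -p mtu bnx1       # check the currently allowed MTU range
dladm set-linkprop -p mtu=9000 bnx1   # jumbo frames on bnx1 only; bnx0 is unaffected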

I suppose you use secure unbuffered sync writes with NFS (with a dedicated ZIL log).
Now turn off sync and recheck performance.
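A quick way to run that test, assuming your NFS dataset is called tank/nfs (substitute your own name):

Code:
zfs set sync=disabled tank/nfs   # benchmark only: skip the ZIL for this dataset
# ...rerun the NFS write test...
zfs set sync=standard tank/nfs   # restore safe sync behaviour afterwards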

Even if you put in the best ZIL (e.g. a ZeusRAM), you are still far below that figure.
If you add a good SSD ZIL with a supercap, values are a fraction, like 1/3 or worse, of a ZeusRAM's.

Using a spinning disk as a ZIL is not totally, but mostly, useless.
 
Alright! Got my first 92 GB of snapshots from Friday manually WAN-replicated! Looks like there are 200 GB more as of this morning (Sunday), which means that at ~20 Mbps I should just be caught up before the doors open Monday. Phew.

Had to wake up at 3 in the morning to kick off my 2nd stream so I'm super keen to get the appliance group happy. What's the significance of the Host IP value in About->settings? My values are blank.

In related news, my source node has reached >95% capacity and VDP doesn't seem to de-allocate the unused space on its dynamic disks :(. Hopefully that will hold till I can add storage. Gulp.
 
OK, I suppose your firewall has deep inspection and blocked netcat.

About the host IP in the napp-it settings:
there you can restrict access to napp-it to an IP or range, e.g.
192.168.3. restricts access to addresses beginning with those numbers.

If your pool is nearly full:
if you created your pool with a 10% base reservation, you can delete this reservation now;
otherwise check for unneeded snaps.
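In shell terms that would be something like the following; the pool name r10 is taken from earlier posts, so adjust it to wherever the reservation was actually set:

Code:
zfs get reservation r10                              # see whether a reservation is set
zfs set reservation=none r10                         # drop it to free the space
zfs list -t snapshot -o name,used -s used -r r10     # find snapshots worth deleting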
 
OK, I suppose your firewall has deep inspection and blocked netcat.

If I understand it correctly, netcat is working. It's how I am manually replicating for the time being. It seems like it's just the appliance-group recognition piece that's having trouble. I *think* I've got the firewalls passing the traffic and bypassing the IPS, etc.

See the nc send/receive commands below. Since I'm sticking to the replication snapshot naming convention, I'm hopeful napp-it replication will pick right up once I get the appliance recognition sorted out?

nc -lv -p 5431 | zfs receive r10/B1@1366412794_repli_zfs_nappit0_nr_15

zfs send -v -i @1366412794_repli_zfs_nappit0_nr_14 r10/B1@1366412794_repli_zfs_nappit0_nr_15 | nc 192.168.200.186 5431
 
Hi Gea, maybe this is the problem?

root@nappit0:~# nc -l -p 81
nc: Address already in use
 
nc shell output: writing the nappit1 zfs list to the nappit0 console on port 82.

root@nappit1:~# zfs list | nc nappit0 82
root@nappit0:~# nc -l -p 82
NAME USED AVAIL REFER MOUNTPOINT
r10 3.91T 235G 32K /r10
r10/B1 2.00T 102G 1.85T -
r10/B2 1.79T 102G 1.73T -
rpool 7.92G 126G 49K /rpool
rpool/ROOT 3.70G 126G 31K legacy
rpool/ROOT/napp-it-0.9a6 3.68G 126G 3.44G /
rpool/ROOT/openindiana 13.9M 126G 3.30G /
rpool/ROOT/openindiana-backup-1 122K 126G 3.03G /
rpool/ROOT/pre_napp-it-0.9a6 94K 126G 2.86G /
rpool/dump 2.00G 126G 2.00G -
rpool/export 103M 126G 32K /export
rpool/export/home 103M 126G 32K /export/home
rpool/export/home/josh 103M 126G 103M /export/home/josh
rpool/swap 2.13G 128G 136M -
 
Hi Gea, maybe this is the problem?

root@nappit0:~# nc -l -p 81
nc: Address already in use

The napp-it webserver is running on port 81.
Netcat replication is on the transfer port from the job settings.

PS:
if your firewall supports deep inspection, switch it off.
Grouping is initiated from netcat as well and answered by the webserver on port 81.
 
PS:
if your firewall supports deep inspection, switch it off.
Gea, thank you! I'm sure that was getting annoying! It was an HTTP proxy inspector on our WatchGuard set to deny unknown web request methods. Here's a list of the accepted methods. I screen-captured a minilog and thought I saw it was a POST request. Any idea why it might have been failing?

HEAD
GET
POST
OPTIONS
PUT
DELETE
COPY
LOCK
MKCOL
MOVE
PROPFIND
PROPPATCH
UNLOCK
BCOPY
BDELETE
BMOVE
BPROPFIND
BPROPPATCH
NOTIFY
POLL
SEARCH
SUBSCRIBE
UNSUBSCRIBE
CCM_POST
MKACTIVITY
CHECKOUT
MERGE
REPORT
CHECKIN
UNCHECKOUT
UPDATE
LABEL
VERSION-CONTROL
BASELINE_CONTROL
MKWORKSPACE
 
Gea, thank you! I'm sure that was getting annoying! It was an HTTP proxy inspector on our WatchGuard set to deny unknown web request methods. Here's a list of the accepted methods. I screen-captured a minilog and thought I saw it was a POST request. Any idea why it might have been failing?

GET
POST

The methods used are GET and POST, but I suppose the WatchGuard discovers that they are not initiated by Explorer or Firefox but by netcat.
Set a firewall rule like: allow port 81 and the replication ports from/to.

Netcat is a universal tool; it can also be used as a hacking tool.
 
Bummer, I think WAN link reliability is going to be an issue. Yesterday's manual 152 GB send dropped 129 GB into it last night. Do we have anything like zfs send --resume in the napp-it replication?

And is this normal? I kicked off my napp-it replication this morning. It has been logging these every 15 seconds under monitor:

noid 07.58:02
glib 942 repli ask remote host 192.168.1.184 -> grouplib_ask_remote_pslist 1332
nc -w 60 192.168.1.184 81
do=request_pslist&hostname=nappit0 answer=
29788 sh -c zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13
29790 zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13665525
29791 /var/web-gui/data/tools/nc/nc -b 262144 -w 30 192.168.200.186 53120

noid 07.57:48
glib 942 repli ask remote host 192.168.1.184 -> grouplib_ask_remote_pslist 1332
nc -w 60 192.168.1.184 81
do=request_pslist&hostname=nappit0 answer=
29788 sh -c zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13
29790 zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13665525
29791 /var/web-gui/data/tools/nc/nc -b 262144 -w 30 192.168.200.186 53120

noid 07.57:33
glib 942 repli ask remote host 192.168.1.184 -> grouplib_ask_remote_pslist 1332
nc -w 60 192.168.1.184 81
do=request_pslist&hostname=nappit0 answer=
29788 sh -c zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13
29790 zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13665525
29791 /var/web-gui/data/tools/nc/nc -b 262144 -w 30 192.168.200.186 53120
 
There is no -resume in zfs send.
What you can do:
- do a local initial replication
- or run incremental replications more often to have smaller deltas

The rest is monitoring (is the job running on the source?, % finished).
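If the WAN link keeps dropping mid-stream, one manual workaround (a sketch only, not a napp-it feature; the dataset, snapshot and path names below are placeholders in the style of this thread) is to write the incremental stream to a file, move it with rsync's resume support, and receive it from the file on the target:

Code:
# on the source: dump the incremental stream to a compressed file
zfs send -i r10/B2@snapA r10/B2@snapB | gzip > /r10/xfer/B2_delta.zfs.gz
# copy it over the WAN with resume and a bandwidth cap (KB/s)
rsync --partial --bwlimit=2500 /r10/xfer/B2_delta.zfs.gz root@target:/r10/xfer/
# on the target: receive the stream from the file
gunzip -c /r10/xfer/B2_delta.zfs.gz | zfs receive r10/B2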
 
Hi Gea,
I am running an older release (0.8l3) of napp-it on my NAS (no all-in-one). I am on Oracle Solaris 11 11/11. Is there anything I have to be cautious about before I upgrade to the latest version of napp-it? Also, can I still go directly to the newest, or do I have to upgrade in stages? I don't want to change the underlying OS necessarily, as it took me dozens of hours to get my system running in a stable state...
Thanks and best regards,
Cap'
 
Hi Gea,
I am running an older release (0.8l3) of napp-it on my NAS (no all-in-one). I am on Oracle Solaris 11 11/11. Is there anything I have to be cautious about before I upgrade to the latest version of napp-it? Also, can I still go directly to the newest, or do I have to upgrade in stages? I don't want to change the underlying OS necessarily, as it took me dozens of hours to get my system running in a stable state...
Thanks and best regards,
Cap'

A napp-it update is not a problem
(update via wget, reboot, recreate jobs apart from replication).

The problem:
Oracle changed some network and share basics with 11.1.
Napp-it 0.9 only supports Solaris 11.1, so I would suggest:
stay at napp-it 0.8 unless you are ready to move to 11.1.
 
Thanks for the quick reply.
I'm no Solaris expert, but I guess there's no upgrade path to 11.1, right? So if I have to reinstall from scratch, can I re-import my pools afterwards? Or do I have to back them up and then re-create them from scratch? If so, I could also switch to OI, I guess... which I cannot do right now, as my pools are ZFS v.33, if I'm not mistaken. Or does OI support ZFS v.33?
Thanks for helping!
Cap'
 
Solaris 11.1 can import your pool; OI cannot.
ZFS v.29 and up are Oracle closed source.

This is the reason why Illumos introduced feature flags and ZFS version 5000,
to add new features independently of Oracle.
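If you want to check what your pools actually are before deciding (the pool name is just an example):

Code:
zpool get version rpool   # feature-flag pools report '-', Oracle pools report a number like 33
zpool upgrade -v          # lists the pool versions/features this OS can handle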
 
I have a Supermicro MBD-X9DR7-LN4F-O with the LSI 2308 in passthrough with ESXi 5.1 going to an OI + napp-it guest.

This was a pain in the arse to set up. The first issue you will have, if you use the onboard LSI 2308 together with the LSI 9207-8i (which I also have), is that the machine will not boot until you disable the boot ROM for the onboard controller; however, you need to remove the LSI 9207-8i to do this. So just be warned. The LSI boot ROM will handle both controllers.

The second issue, which is the trickier one, is that you need to flash the onboard controller with the IT ROM; it comes with the IR ROM. This is tricky because the board is a UEFI board, which means you need to find the flash binary that works from UEFI. When I did this a while ago I had issues finding the right one, and I cannot find it now :(. Now, BIG WARNING - if you screw this up, the best-case scenario is that the onboard controller is bricked and no longer usable; worst case, the whole motherboard is bricked. So be careful, have a UPS attached to the machine, etc.

If I were building a new machine I would get the board without the onboard controller, unless I had a real need to avoid external controllers. Because my other complaint about the onboard controller, besides the pain I mentioned making it work, is that the location of the ports sucks for most cases. I have a Supermicro 846 chassis and I can just barely make the SAS cable fit.
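For what it's worth, a typical outline of the flash procedure once you do find UEFI-capable tools (the file names and SAS address below are hypothetical, and all the warnings above still apply), run from the UEFI shell:

Code:
sas2flash.efi -listall                          # note the controller and its SAS address first
sas2flash.efi -o -e 6                           # erase the existing (IR) flash region
sas2flash.efi -o -f 2308IT.bin -b mptsas2.rom   # write the IT firmware and boot ROM
sas2flash.efi -o -sasadd 5003048xxxxxxxxx       # restore the SAS address noted earlier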
Hi.
I did end up purchasing http://www.supermicro.com/products/system/2u/6027/ssg-6027r-e1r12l.cfm which is supposed to have the onboard 2308 flashed in IT mode. The reseller promised me it would be so, with a promise of a return if it doesn't work out. Since it has an expander backplane, I won't need an additional card. I'll update with how it goes.
 
Oh no! Gea, what happened!? Two replication jobs have deleted all my target snaps and all but one of my source snaps, and I think forced me to reseed!! Given the fragility of WAN replication and the potential for jobs to fail, I think zfs destroy should maybe only fire after replication completes successfully.

Also, a future feature request: zfs send to a .zip, rsync the archive (with resume and bandwidth throttling), receive from the file?

target# zpool history
2013-05-06.07:11:31 zfs destroy -f r10/B2@1366552594_repli_zfs_nappit0_nr_12
2013-05-06.07:11:36 zfs destroy -f r10/B2@1366552594_repli_zfs_nappit0_nr_13
2013-05-06.08:37:22 zfs set readonly=off r10/B2
2013-05-06.08:37:27 zfs set readonly=on r10/B2
2013-05-06.20:59:19 zfs destroy -f r10/B2@1366552594_repli_zfs_nappit0_nr_14
2013-05-06.20:59:21 zfs set readonly=off r10/B2
2013-05-06.20:59:26 zfs set readonly=on r10/B2

source# zpool history

2013-05-06.07:11:46 zfs snapshot r10/B2@1366552594_repli_zfs_nappit0_nr_15
2013-05-06.20:59:26 zfs destroy r10/B2@1366552594_repli_zfs_nappit0_nr_14
2013-05-06.20:59:31 zfs snapshot r10/B2@1366552594_repli_zfs_nappit0_nr_16
2013-05-06.20:59:36 zfs destroy r10/B2@1366552594_repli_zfs_nappit0_nr_16
 
Solaris 11.1 can import your pool; OI cannot.
ZFS v.29 and up are Oracle closed source.

This is the reason why Illumos introduced feature flags and ZFS version 5000,
to add new features independently of Oracle.

Thanks, _Gea, most helpful.

On an unrelated subject, I recently created a new ZFS folder on my data pool and then moved some 100 gigs of data from one folder to the other (within the same pool, that is). I just realized that it did not free up the space of the moved-away data on the source folder. Is that expected behaviour, and is there something I can do about it? I am currently pretty tight on disk space and cannot afford to buy bigger disks at the moment...

Thanks & regards,
Cap'
 