OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

jb33

n00b
Joined
Feb 6, 2013
Messages
39
I'm moving my target replication node tomorrow to our VPN-connected satellite office and will have to change the target-node’s IP address for the new local subnet. Will the existing Napp-it replication extension jobs continue to run?

thanks,
jb
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
I'm moving my target replication node tomorrow to our VPN-connected satellite office and will have to change the target-node’s IP address for the new local subnet. Will the existing Napp-it replication extension jobs continue to run?

thanks,
jb

Yes. On the target machine (napp-it 0.9),
go to Extension > Appliance Group and
- delete the old source
- add a new one with the new IP

Then go to menu Jobs, click on the job-id and
- edit the IP
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
Thanks Gea.

The source IP is unchanged; the target IP has changed. The current source appliance status (as read in the target appliance-group) is 'remote call: timeout.' A manual nc send/receive test was successful across the VPN.

Confirm delete source from appliance-group and re-add?

I tried to add before deleting and received an error that the box is not a napp-it appliance. As a test (in case my firewall is blocking the necessary comms), I tried to add the target to an appliance group on the source and received the same error.

thanks,
jb
 
Joined
Jan 19, 2007
Messages
575
That's great news! :)

The PSU I am using in my Norco 4224 is a Corsair HX650 I think. Or something like that. I used the cables that are standard with the PSU - no more than 4 SATA power connectors per rail. Each backplane has one power connector for 4 drives. I use 4 of the modular SATA power cables to power the 6 backplanes. No problems!

But the HX650 is kind of a pricey PSU - no less than $120 usually.

EDIT: My M1015s by default stagger the spinup of each hdd so that must help a lot.

If I could hug you I would. I completely removed the new pool and the entire original array came online. I absolutely overloaded the molex expander from Norco. Now I just need to figure out a way to power all 6 backplanes without the use of expanders, or else not chain all of them off a massive one like I did. Got any tips? This is the PSU my server is running (since it's damn near impossible to find a generic ATX server-grade PSU): http://www.amazon.com/gp/product/B00284AJ1G/ref=wms_ohs_product?ie=UTF8&psc=1
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
Thanks Gea.

The source IP is unchanged; the target IP has changed. The current source appliance status (as read in the target appliance-group) is 'remote call: timeout.' A manual nc send/receive test was successful across the VPN.

Confirm delete source from appliance-group and re-add?

I tried to add before deleting and received an error that the box is not a napp-it appliance. As a test (in case my firewall is blocking the necessary comms), I tried to add the target to an appliance group on the source and received the same error.

thanks,
jb

The appliance group is needed for communication (it places a key on both sides).
If this fails, you may have a firewall problem. Delete/re-add is independent from the job settings.

You must open port 81 for base communication, plus the replication port from the job settings.
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
The appliance group is needed for communication (it places a key on both sides).
If this fails, you may have a firewall problem. Delete/re-add is independent from the job settings.

You must open port 81 for base communication, plus the replication port from the job settings.

Port 81 is open - I ran a netcat port scan (handy little feature!) - and I've confirmed all traffic is allowed between the hosts. I think I've got a host routing issue but I'm not sure what it is yet. Both hosts can ping one another and resolve each other's hostname with an answer from our domain DNS box, and I've got 2 manual netcat send/receive jobs running now.

But if I run a traceroute from either one to the other, the trace does not complete. A trace from a Windows box sitting next to it completes.

LAN 1.............................LAN 2
Nappit0 -TRACEROUTE->Nappit1: Fail
WIN----------TRACERT----->Nappit1: Success
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
hmm, maybe that's not it. ICMP Traceroute completes between both nappit boxes... Do the hosts "register" any Route/MAC or IP info about each other when the appliance group is first established?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
hmm, maybe that's not it. ICMP Traceroute completes between both nappit boxes... Do the hosts "register" any Route/MAC or IP info about each other when the appliance group is first established?

Hostname and IP are part of both the group and the job settings. The hostname is also part of the snap names.
If you only change the IP, it is enough to delete/recreate the group and edit the job settings for the new IP.

Can you create a new replication (can be a smaller one) for testing?

What you can check:
Is the gateway (IP setting) correct when routing or are you in the same network via VPN?
Check also default route settings.
 

stevebaynet

Limp Gawd
Joined
Nov 9, 2011
Messages
204
Good morning,

So I have a napp-it "all in one" running OmniOS. Its primary use is to back up my Nexenta boxes. I am first trying "auto-sync", which automates zfs send/receive.

I created a pool on the napp-it box called "bakpool".

I am backing up the folder "infra" from the Nexenta box.

When the job is complete, in the napp-it GUI ZFS shares I now see /bakpool/infra, and the Nexenta box says the job finished correctly. However, napp-it (and the shell) say /bakpool/infra does not exist! (but I do see the space used up)

Here are the properties shown via napp-it for the ZFS share /bakpool/infra:



Am I not seeing anything because it is not mounted? Do I need to mount it read-only to another folder or share? Any suggestions on how to accomplish this?

I'd prefer to make this work before I give up and replicate directly from the napp-it box via zrep.
 

stevebaynet

Limp Gawd
Joined
Nov 9, 2011
Messages
204
Am I not seeing anything because it is not mounted? Do I need to mount it read-only to another folder or share? Any suggestions on how to accomplish this?

Trying to answer my own question, I tried:

zfs mount bakpool/infra

To which I get: cannot mount 'bakpool/infra': 'canmount' property is set to 'off'

You can't see it in the image, but here is the canmount property from the same page:

bakpool/infra canmount off local

Can I just set this to on? (I assume I would also want to set it to read-only?)
 

MistrWebmastr

Limp Gawd
Joined
Feb 24, 2008
Messages
279
So I am working on getting my backup server back up and running, but in the meantime I would like to prevent any additional data loss on my primary array. Now that we've determined it was a power issue, is there a way I can clear the ZFS checksum errors by forcing a resilver? I checked the drives that are showing checksum errors, and I'm getting 0 soft, hard, transport, media, device not ready, recoverable, illegal, and predictive failure analysis errors. I'm thinking the checksum issue was due to the insufficient power to the drives, and they wrote garbage. My array is online at this point, but I've got 2 degraded drives in one RAIDZ2, and an UNAVAIL drive (the only actually failing disk before I had the power supply issue) plus a drive with 47 checksum errors in the other RAIDZ2.

The drives to get my backup server up and running won't get here until Tuesday so I'd like to ensure I can recover at least one of the drives on each array so there's some redundancy left just in case.

This is the output of my zpool status right now:

Code:
  pool: data
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 1.96M in 0h0m with 0 errors on Wed May  1 21:37:51 2013
config:

        NAME                       STATE     READ WRITE CKSUM
        data                       DEGRADED     0     0     0
          raidz2-0                 DEGRADED     0     0     0
            c3t5000CCA228C2DD95d0  ONLINE       0     0     0
            11292073787090335211   UNAVAIL      0     0     0  was /dev/dsk/c3t5000CCA228C2E47Ed0s0
            c3t5000CCA228C33703d0  ONLINE       0     0     0
            c3t5000CCA228C34508d0  ONLINE       0     0    47
            c3t5000CCA228C349FAd0  ONLINE       0     0     0
            c3t5000CCA228C35CBFd0  ONLINE       0     0     0
          raidz2-1                 DEGRADED     0     0     0
            c3t5000CCA228C34321d0  ONLINE       0     0     0
            c3t5000CCA228C34EB9d0  ONLINE       0     0     0
            c3t5000CCA228C3542Ad0  DEGRADED     0     0    55  too many errors
            c3t5000CCA228C35875d0  ONLINE       0     0     0
            c3t5000CCA228C35A4Ad0  DEGRADED     0     0    60  too many errors
            c3t5000CCA228C35CD2d0  ONLINE       0     0     0

errors: No known data errors
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
So I am working on getting my backup server back up and running, but in the meantime I would like to prevent any additional data loss on my primary array. Now that we've determined it was a power issue, is there a way I can clear the ZFS checksum errors by forcing a resilver?

I would clear errors:
zpool clear data

followed by a scrub to verify/repair checksum errors
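As a sketch, the suggested sequence for the pool "data" from the post could look like this (guarded so it only executes on a host that actually has ZFS):

```shell
# Sketch of the suggested recovery steps for the pool "data" from the post.
# clear_and_scrub resets the error counters, then scrubs to verify/repair.
clear_and_scrub() {
  pool=$1
  zpool clear "$pool"        # reset READ/WRITE/CKSUM counters
  zpool scrub "$pool"        # re-read all data and repair from redundancy
  zpool status -v "$pool"    # watch scrub progress and repaired bytes
}

# only attempt this on a host that actually has ZFS:
if command -v zpool >/dev/null 2>&1; then
  clear_and_scrub data
fi
```

The scrub is what actually validates the data; clearing only resets the counters so new errors stand out.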
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
Trying to answer my own question, I tried:

zfs mount bakpool/infra

To which I get: cannot mount 'bakpool/infra': 'canmount' property is set to 'off'

You can't see it in the image, but here is the canmount property from the same page:

bakpool/infra canmount off local

Can I just set this to on? (I assume I would also want to set it to read-only?)

You should be able to mount it once you set canmount to on.
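A minimal sketch of that, using the dataset name bakpool/infra from the post. Setting readonly=on first is my own assumption (to keep the replication target pristine), and napp-it may set canmount back to off on the next replication run:

```shell
# Sketch: make a received backup dataset browsable without risking writes.
# The dataset name bakpool/infra comes from the post; adjust to your own.
mount_backup_ro() {
  ds=$1
  zfs set readonly=on "$ds"   # protect the replication target from changes
  zfs set canmount=on "$ds"   # allow the dataset to be mounted again
  zfs mount "$ds"
}

if command -v zfs >/dev/null 2>&1; then
  mount_backup_ro bakpool/infra
fi
```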
 

MistrWebmastr

Limp Gawd
Joined
Feb 24, 2008
Messages
279
I would clear errors:
zpool clear data

followed by a scrub to verify/repair checksum errors

I assume then that this is normal (since Time Slider has been having fun making snapshots while the disks were in a degraded state)? Once the scan finishes it should be back online?

Code:
  pool: data
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub in progress since Sat May  4 11:20:03 2013
    4.01G scanned out of 26.5T at 7.52M/s, (scan is slow, no estimated time)
    444K repaired, 0.01% done
config:

        NAME                       STATE     READ WRITE CKSUM
        data                       DEGRADED     0     0     0
          raidz2-0                 DEGRADED     0     0     0
            c3t5000CCA228C2DD95d0  ONLINE       0     0     0
            11292073787090335211   UNAVAIL      0     0     0  was /dev/dsk/c3t5000CCA228C2E47Ed0s0
            c3t5000CCA228C33703d0  ONLINE       0     0     0
            c3t5000CCA228C34508d0  DEGRADED     0     0   138  too many errors  (repairing)
            c3t5000CCA228C349FAd0  ONLINE       0     0     0
            c3t5000CCA228C35CBFd0  ONLINE       0     0     0
          raidz2-1                 DEGRADED     0     0     0
            c3t5000CCA228C34321d0  ONLINE       0     0     0
            c3t5000CCA228C34EB9d0  ONLINE       0     0     0
            c3t5000CCA228C3542Ad0  DEGRADED     0     0   169  too many errors  (repairing)
            c3t5000CCA228C35875d0  ONLINE       0     0     0
            c3t5000CCA228C35A4Ad0  DEGRADED     0     0   194  too many errors  (repairing)
            c3t5000CCA228C35CD2d0  ONLINE       0     0     0

errors: No known data errors

Since booting (when the checksum counts showed 0), the drives show the following with iostat:

c3t5000CCA228C34508d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: Hitachi HDS5C303 Revision: A5C0 Serial No:
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t5000CCA228C3542Ad0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: Hitachi HDS5C303 Revision: A5C0 Serial No:
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t5000CCA228C35A4Ad0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: Hitachi HDS5C303 Revision: A5C0 Serial No:
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
Hostname and IP are part of both the group and the job settings. The hostname is also part of the snap names.
If you only change the IP, it is enough to delete/recreate the group and edit the job settings for the new IP.

Can you create a new replication (can be a smaller one) for testing?

What you can check:
Is the gateway (IP setting) correct when routing or are you in the same network via VPN?
Check also default route settings.

I seem to have a handful of issues. The big one is that my target-side firewall [SonicWall TZ210] can't seem to handle the traffic. CPU spikes from about 5% to 100% as soon as I kick off a manual replication, and latency goes from <30ms to >2000ms!

During a manual remote replication, the target appliance group reports the source appliance as offline. Once I kill the manual replication job, the target appliance group reports the source as online with a status of 'remote call: timeout.' Now, if I try to add the target to the source appliance group (just for testing) I get the error that the host is not a napp-it appliance.

Do you know what would throw those two messages, 'remote call: timeout' and 'not a napp-it appliance'? Other than the SonicWall falling over under load, I think the networking checks out. Here are the relevant outputs.

root@nappit0:~# ping nappit1
nappit1 is alive


root@nappit0:~# traceroute -I nappit1
traceroute to nappit1 (192.168.1.184), 30 hops max, 40 byte packets
1 * * *
2 nappit1.keslerassociates.local (192.168.1.184) 29.133 ms 32.935 ms 46.147 ms

root@nappit0:~# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 6
inet 192.168.200.186 netmask ffffff00 broadcast 192.168.200.255
ether 0:2:b3:d8:d:f4
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
inet6 ::1/128
e1000g0: flags=20002000840<RUNNING,MULTICAST,IPv6> mtu 1500 index 6
inet6 ::/0
ether 0:2:b3:d8:d:f4


root@nappit0:~# netstat -nr

Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ---------- ---------
default 192.168.200.9 UG 8 2396765
127.0.0.1 127.0.0.1 UH 2 1406 lo0
192.168.200.0 192.168.200.186 U 255 1495 e1000g0

Routing Table: IPv6
Destination/Mask Gateway Flags Ref Use If
--------------------------- --------------------------- ----- --- ------- -----
::1 ::1 UH 2 548 lo0
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
timeout: ping OK, but no answer within 60s on port 81
not a napp-it appliance: ping OK, but cannot connect to port 81
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
Seems like it's open - see the port scans in both directions below. I clicked the ZFS link for the appliance in the target appliance-group and see this corresponding output in the jobs monitor. Anything in the fact that host_ip=nappit1? By the way, there's no question I'm buying the extension on Tuesday. No way I want to try and script this myself!

-> grouplib_ask_remote_zfslist 1379
nc -w 60 192.168.1.184 81
do=request_zfslist&hostname=nappit0&host_ip=nappit1


Port Scans:
root@nappit1:~# nc -vzu 192.168.200.186 81
Connection to 192.168.200.186 81 port [udp/*] succeeded!

root@nappit1:~# nc -vz 192.168.200.186 81
Connection to 192.168.200.186 81 port [tcp/*] succeeded!


root@nappit0:~# nc -vzu 192.168.1.184 81
Connection to 192.168.1.184 81 port [udp/*] succeeded!

root@nappit0:~# nc -vz 192.168.1.184 81
Connection to 192.168.1.184 81 port [tcp/*] succeeded!
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
Good question! But yes, in both directions.

root@nappit0:~# ping nappit1
nappit1 is alive
root@nappit0:~# nc -vzu nappit1 81
Connection to nappit1 81 port [udp/*] succeeded!
root@nappit0:~# nc -vz nappit1 81
Connection to nappit1 81 port [tcp/*] succeeded!

root@nappit1:~# ping nappit0
nappit0 is alive
root@nappit1:~# nc -vzu nappit0 81
Connection to nappit0 81 port [udp/*] succeeded!
root@nappit1:~# nc -vz nappit0 81
Connection to nappit0 81 port [tcp/*] succeeded!

I noticed an extra 1 in the minilog when trying to add an appliance as a test on the source server. Any chance that's getting passed to the connection on port 81, so that the ping is good but the connect on 81 fails? I did upgrade since first adding the appliance. Running 0.9b1.

minilog
exe: ping 192.168.200.186 1192.168.200.186 is alive

Sorry I took a while to respond. SonicWall support got me mostly sorted - just turn off everything you paid extra for! I'm getting about 20Mbps throughput. I thought I'd double down and try concurrent replications; that pushed it up to ~35Mbps, but then the firewall took a dump 12GB into my transfer :( Trying again with just one stream.
 

jdk

Limp Gawd
Joined
Nov 23, 2006
Messages
437
Any advice on enabling jumbo frames on OpenIndiana with a server running Broadcom (bnx) NICs? Setting the MTU via dladm yields 'unsupported operation', which pretty much leaves editing /kernel/drv/bnx.conf as the only option on the table, but I think that would affect both NICs, which I really don't want; I just want jumbo frames on bnx1.

Also, does anyone have any general NFS tuning tips for OpenIndiana? I'm only getting about 20 MB/s writes on a 6-spindle raidz2 with a dedicated 15k rpm log.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
You do not set a fixed MTU value in the driver config file, but the allowed range.
You then set the actual MTU per link.
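A hedged sketch of those two steps. The exact property name and range syntax in bnx.conf vary by driver version, so treat this as an assumption to verify against your bnx man page before editing anything:

```shell
# Sketch: after widening the allowed MTU range in /kernel/drv/bnx.conf
# (driver-specific syntax; check your bnx man page), set the actual MTU
# per link so only bnx1 gets jumbo frames.
set_jumbo() {
  link=$1
  grep -i mtu /kernel/drv/bnx.conf        # inspect the configured MTU range
  dladm set-linkprop -p mtu=9000 "$link"  # per-link setting, bnx1 only
  dladm show-linkprop -p mtu "$link"      # confirm the effective value
}

if command -v dladm >/dev/null 2>&1; then
  set_jumbo bnx1
fi
```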

I suppose you use secure unbuffered sync writes with NFS (with a dedicated ZIL log).
Now turn off sync and recheck performance.

Even if you put in the best ZIL (e.g. a ZeusRAM) you stay far below the async value.
With a good SSD ZIL with supercap, the values are a fraction, like 1/3 or worse, of a ZeusRAM.

Using a spindle disk as log is not totally, but mostly, useless.
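The sync test above might be sketched like this. "tank/nfs" is a placeholder dataset name, and turning sync off trades safety for speed, so only do it for the duration of the benchmark:

```shell
# Sketch: toggle sync on the NFS-shared dataset to find the async upper
# bound, then restore the safe default. "tank/nfs" is a placeholder name.
sync_off() { zfs set sync=disabled "$1"; }  # no ZIL writes: upper bound
sync_on()  { zfs set sync=standard "$1"; }  # restore the safe default

if command -v zfs >/dev/null 2>&1; then
  sync_off tank/nfs
  # ... re-run the NFS write benchmark here ...
  sync_on tank/nfs
fi
```

If the async number is dramatically higher, the log device is the bottleneck, which is Gea's point about slog quality.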
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
Alright! Got my first 92GB of snapshots from Friday manually replicated over the WAN! Looks like there are 200GB more as of this morning (Sunday), which means that at ~20Mbps I should be caught up just before the doors open Monday. Phew.

Had to wake up at 3 in the morning to kick off my 2nd stream, so I'm super keen to get the appliance group happy. What's the significance of the Host IP value in About > Settings? My values are blank.

In related news, my source node has reached >95% capacity and VDP doesn't seem to de-allocate the unused space on its dynamic disks :(. Hopefully that will hold till I can add storage. Gulp.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
OK, I suppose your firewall has deep inspection and blocked netcat.

About the host-IP in napp-it settings:
you can restrict access to napp-it there to an IP or range, e.g.
192.168.3. restricts access to addresses beginning with those numbers.

If your pool is nearly full:
if you created your pool with a 10% base reservation, you can delete this reservation now;
otherwise check for unneeded snaps.
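Both checks can be sketched as shell, with r10 standing in for the nearly-full pool (the names are examples from this thread, not napp-it's actual reservation layout):

```shell
# Sketch: free space on a nearly-full pool. "r10" is an example name.
free_space_checks() {
  pool=$1
  zfs get reservation "$pool"     # an unneeded base reservation can go:
  zfs set reservation=none "$pool"
  # list snapshots by space used to find deletion candidates:
  zfs list -t snapshot -o name,used -s used -r "$pool"
}

if command -v zfs >/dev/null 2>&1; then
  free_space_checks r10
fi
```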
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
OK, I suppose your firewall has deep inspection and blocked netcat.

If I understand it correctly, netcat is working - it's how I am manually replicating for the time being. It seems like it's just the appliance-group recognition piece that's having trouble. I *think* I've got the firewalls passing the traffic and bypassing the IPS, etc.

See the nc send/receive commands below. Since I'm sticking to the replication snapshot naming convention, I'm hopeful napp-it replication will pick right up once I get the appliance recognition sorted out?

nc -lv -p 5431 | zfs receive r10/B1@1366412794_repli_zfs_nappit0_nr_15

zfs send -v -i @1366412794_repli_zfs_nappit0_nr_14 r10/B1@1366412794_repli_zfs_nappit0_nr_15 | nc 192.168.200.186 5431
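Since sticking to the naming convention is the whole point here, a tiny helper for computing the next snap name might look like this. The `<jobid>_repli_zfs_<host>_nr_<n>` format is inferred from the snapshot names shown in this thread, so verify it against your own job before relying on it:

```shell
# Hypothetical helper: given the last replication snapshot, print the next
# name in the <jobid>_repli_zfs_<host>_nr_<n> convention (format inferred
# from the snapshot names in this thread).
next_snapname() {
  last=$1            # e.g. r10/B1@1366412794_repli_zfs_nappit0_nr_15
  nr=${last##*_nr_}  # trailing counter
  printf '%s_nr_%d\n' "${last%_nr_*}" $((nr + 1))
}

next_snapname r10/B1@1366412794_repli_zfs_nappit0_nr_15
# prints: r10/B1@1366412794_repli_zfs_nappit0_nr_16
```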
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
Hi Gea, maybe this is the problem?

root@nappit0:~# nc -l -p 81
nc: Address already in use
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
nc shell output: writing the nappit1 zfs list to the nappit0 console on port 82.

root@nappit1:~# zfs list | nc nappit0 82
root@nappit0:~# nc -l -p 82
NAME USED AVAIL REFER MOUNTPOINT
r10 3.91T 235G 32K /r10
r10/B1 2.00T 102G 1.85T -
r10/B2 1.79T 102G 1.73T -
rpool 7.92G 126G 49K /rpool
rpool/ROOT 3.70G 126G 31K legacy
rpool/ROOT/napp-it-0.9a6 3.68G 126G 3.44G /
rpool/ROOT/openindiana 13.9M 126G 3.30G /
rpool/ROOT/openindiana-backup-1 122K 126G 3.03G /
rpool/ROOT/pre_napp-it-0.9a6 94K 126G 2.86G /
rpool/dump 2.00G 126G 2.00G -
rpool/export 103M 126G 32K /export
rpool/export/home 103M 126G 32K /export/home
rpool/export/home/josh 103M 126G 103M /export/home/josh
rpool/swap 2.13G 128G 136M -
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
Hi Gea, maybe this is the problem?

root@nappit0:~# nc -l -p 81
nc: Address already in use

The napp-it webserver is running on port 81.
Netcat replication runs on the transfer port from the job settings.

PS:
if your firewall supports deep inspection, switch it off.
Grouping is initiated from netcat as well and answered by the webserver on port 81.
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
PS:
if your firewall supports deep inspection, switch it off.
Gea, thank you! I'm sure that was getting annoying! It was an HTTP-proxy inspector on our WatchGuard, set to deny unknown web request methods. Here's the list of accepted methods. I screen-captured a minilog and thought I saw it was a POST request. Any idea why it might have been failing?

HEAD
GET
POST
OPTIONS
PUT
DELETE
COPY
LOCK
MKCOL
MOVE
PROPFIND
PROPPATCH
UNLOCK
BCOPY
BDELETE
BMOVE
BPROPFIND
BPROPPATCH
NOTIFY
POLL
SEARCH
SUBSCRIBE
UNSUBSCRIBE
CCM_POST
MKACTIVITY
CHECKOUT
MERGE
REPORT
CHECKIN
UNCHECKOUT
UPDATE
LABEL
VERSION-CONTROL
BASELINE_CONTROL
MKWORKSPACE
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
Gea, thank you! I'm sure that was getting annoying! It was an HTTP-proxy inspector on our WatchGuard, set to deny unknown web request methods. Here's the list of accepted methods. I screen-captured a minilog and thought I saw it was a POST request. Any idea why it might have been failing?

GET
POST

The methods used are GET and POST, but I suppose the WatchGuard discovers that they are not initiated by Explorer or Firefox but by netcat.
Set a firewall rule like: allow port 81 and the replication ports from/to these hosts.

Netcat is a universal tool; it can also be used as a hacking tool.
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
Bummer, I think WAN link reliability is going to be an issue. Yesterday's manual 152GB send dropped last night, 129GB in. Do we have anything like a zfs send --resume in the napp-it replication?

And is this normal? I kicked off my napp-it replication this morning. It has been logging these every 15 seconds under Monitor:

noid 07.58:02
glib 942 repli ask remote host 192.168.1.184 -> grouplib_ask_remote_pslist 1332
nc -w 60 192.168.1.184 81
do=request_pslist&hostname=nappit0 answer=
29788 sh -c zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13
29790 zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13665525
29791 /var/web-gui/data/tools/nc/nc -b 262144 -w 30 192.168.200.186 53120

noid 07.57:48
glib 942 repli ask remote host 192.168.1.184 -> grouplib_ask_remote_pslist 1332
nc -w 60 192.168.1.184 81
do=request_pslist&hostname=nappit0 answer=
29788 sh -c zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13
29790 zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13665525
29791 /var/web-gui/data/tools/nc/nc -b 262144 -w 30 192.168.200.186 53120

noid 07.57:33
glib 942 repli ask remote host 192.168.1.184 -> grouplib_ask_remote_pslist 1332
nc -w 60 192.168.1.184 81
do=request_pslist&hostname=nappit0 answer=
29788 sh -c zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13
29790 zfs send -i r10/B2@1366552594_repli_zfs_nappit0_nr_14 r10/B2@13665525
29791 /var/web-gui/data/tools/nc/nc -b 262144 -w 30 192.168.200.186 53120
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
There is no -resume in zfs send.
What you can do:
- do the initial replication locally
- or run incremental replications more often to have smaller deltas

The rest is a matter of monitoring (is the job running on the source? what % is finished?)
 

Captainquark

Weaksauce
Joined
Dec 14, 2011
Messages
103
Hi Gea,
I am running an older release (0.8l3) of napp-it on my NAS (no All-in-One). I am on Oracle Solaris 11 11/11. Is there anything I have to be cautious about before I upgrade to the latest version of napp-it? Also, can I go directly to the newest, or do I have to upgrade in stages? I don't necessarily want to change the underlying OS, as it took me dozens of hours to get my system running in a stable state...
Thanks and best regards,
Cap'
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
Hi Gea,
I am running an older release (0.8l3) of napp-it on my NAS (no All-in-One). I am on Oracle Solaris 11 11/11. Is there anything I have to be cautious about before I upgrade to the latest version of napp-it? Also, can I go directly to the newest, or do I have to upgrade in stages? I don't necessarily want to change the underlying OS, as it took me dozens of hours to get my system running in a stable state...
Thanks and best regards,
Cap'

A napp-it update is not a problem
(update per wget, reboot, recreate jobs besides replication).

The problem:
Oracle changed some network and share basics with 11.1.
Napp-it 0.9 only supports Solaris 11.1, so I would suggest:
stay at napp-it 0.8 unless you are ready to move to 11.1.
 

Captainquark

Weaksauce
Joined
Dec 14, 2011
Messages
103
Thanks for the quick reply.
I'm no Solaris expert, but I guess there's no upgrade path to 11.1, right? So if I have to reinstall from scratch, can I re-import my pools afterwards? Or do I have to back them up, then re-create them from scratch? If so, I could also switch to OI, I guess... which I cannot do right now, as my pools are ZFS v.33, if I'm not mistaken. Or does OI support ZFS v.33?
Thanks for helping!
Cap'
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,051
Solaris 11.1 can import your pool; OI cannot.
ZFS v.29 and up are Oracle closed source.

This is the reason why illumos introduced feature flags and ZFS pool version 5000:
to add new features independently of Oracle.
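One quick way to see where a pool stands before planning such a migration is to check its version: a pool above v28 is Oracle-only, while an illumos feature-flag pool reports 5000. A small sketch (the pool name rpool is an example):

```shell
# Sketch: check a pool's ZFS version before planning a migration.
# Pools above version 28 are Oracle-only; feature-flag pools show 5000.
check_pool_version() {
  zpool get version "$1"   # the pool's on-disk version
  zpool upgrade -v         # versions/features this OS supports
}

if command -v zpool >/dev/null 2>&1; then
  check_pool_version rpool
fi
```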
 

sha-1024

n00b
Joined
Jan 18, 2013
Messages
3
I have a Supermicro MBD-X9DR7-LN4F-O with the LSI 2308 in passthrough with ESXi 5.1 going to an OI + napp-it guest.

This was a pain in the arse to set up. First issue you will have, if you use the onboard LSI 2308 with the LSI 9207-8i (which I also have): the machine will not boot until you disable the boot ROM for the onboard controller, but you need to remove the LSI 9207-8i to do this. So just be warned. The LSI will handle the boot ROM for both controllers.

Second issue, which is the trickier one: you need to flash the onboard controller with the IT ROM; it comes with the IR ROM. This is tricky because the board is a UEFI board, which means you need to find the flash binary that works from UEFI. When I did this a while ago I had issues finding the right one, and cannot find it now :(. Now BIG WARNING - if you screw this up, the best-case scenario is that the onboard controller is bricked and no longer usable; worst case, the whole motherboard is bricked. So be careful, have a UPS attached to the machine, etc.

If I were building a new machine, I would get the board without the onboard controller, unless I had a real need to have no external controllers. Because my other complaint about the onboard, besides the pain of making it work, is that the location of the ports sucks for most cases. I have a Supermicro 846 chassis and I can just barely make the SAS cable fit.
Hi.
I did end up purchasing http://www.supermicro.com/products/system/2u/6027/ssg-6027r-e1r12l.cfm which is supposed to have the onboard 2308 flashed in IT mode. The reseller promised me it would be so, with a promise to return it if it doesn't work out. Since it has an expander backplane I won't need an additional card. I'll update with how it goes.
 

jb33

n00b
Joined
Feb 6, 2013
Messages
39
Oh no! Gea, what happened!? Two replication jobs have deleted all my target snaps and all but one of my source snaps, and I think forced me to re-seed!! I think, given the fragility of WAN replication and the potential for jobs to fail, zfs destroy should only fire after replication completes successfully.

Also, a future feature request: zfs send to a file (.zip), rsync the archive (with resume and bandwidth throttling), receive from the file?

target# zpool history
2013-05-06.07:11:31 zfs destroy -f r10/B2@1366552594_repli_zfs_nappit0_nr_12
2013-05-06.07:11:36 zfs destroy -f r10/B2@1366552594_repli_zfs_nappit0_nr_13
2013-05-06.08:37:22 zfs set readonly=off r10/B2
2013-05-06.08:37:27 zfs set readonly=on r10/B2
2013-05-06.20:59:19 zfs destroy -f r10/B2@1366552594_repli_zfs_nappit0_nr_14
2013-05-06.20:59:21 zfs set readonly=off r10/B2
2013-05-06.20:59:26 zfs set readonly=on r10/B2

source# zpool history

2013-05-06.07:11:46 zfs snapshot r10/B2@1366552594_repli_zfs_nappit0_nr_15
2013-05-06.20:59:26 zfs destroy r10/B2@1366552594_repli_zfs_nappit0_nr_14
2013-05-06.20:59:31 zfs snapshot r10/B2@1366552594_repli_zfs_nappit0_nr_16
2013-05-06.20:59:36 zfs destroy r10/B2@1366552594_repli_zfs_nappit0_nr_16
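Until something like that exists in napp-it, a resumable transfer can be improvised by sending the delta to a file and moving it with rsync, which supports resume (--partial) and throttling (--bwlimit). This is only a sketch under my own assumptions: the snapshot names reuse the convention from earlier posts, the /tank/tmp paths and the nappit0 receive host are invented for illustration, and napp-it itself does not do this:

```shell
# Sketch: resumable WAN transfer of a replication delta via file + rsync.
# Snapshot names follow earlier posts; /tank/tmp paths are examples only.
OLD=r10/B2@1366552594_repli_zfs_nappit0_nr_14
NEW=r10/B2@1366552594_repli_zfs_nappit0_nr_15
STREAM=/tank/tmp/B2_nr_15.zfs

send_to_file() { zfs send -i "$OLD" "$NEW" > "$STREAM"; }
# --partial keeps a partially transferred file so the next run resumes it;
# --bwlimit throttles in KB/s
transfer()     { rsync --partial --bwlimit=2500 "$STREAM" nappit0:/tank/tmp/; }
receive_file() { ssh nappit0 "zfs receive r10/B2 < /tank/tmp/B2_nr_15.zfs"; }

if command -v zfs >/dev/null 2>&1; then
  send_to_file && transfer && receive_file
fi
```

The trade-off is needing scratch space for the full stream on both ends, but an interrupted WAN transfer then costs only the missing bytes, not a full re-send.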
 

Captainquark

Weaksauce
Joined
Dec 14, 2011
Messages
103
Solaris 11.1 can import your pool; OI cannot.
ZFS v.29 and up are Oracle closed source.

This is the reason why illumos introduced feature flags and ZFS pool version 5000:
to add new features independently of Oracle.

Thanks, _Gea, most helpful.

On an unrelated subject: I recently created a new ZFS folder on my datapool and then moved some 100 gigs of data from one folder to the other (within the same pool, that is). I just realized that this did not free up the space of the moved-away data on the source folder. Is that expected behaviour, and is there something I can do about it? I am currently pretty tight on diskspace and cannot afford to buy bigger disks at the moment...

Thanks & regards,
Cap'
 