iSCSI Enterprise Target problem after upgrading Ubuntu 13.04 -> 13.10

extide

OK, so let me explain the situation a bit. The overall setup is pretty complicated, but you can ignore most of it; just remember this is a problem with iSCSI Enterprise Target (IET). Please read all of this before commenting, thanks!

The setup is as follows:
Ubuntu Server 13.10 running on bare metal, X58 system, i7 980, 24GB of RAM
I am running IET version 1.4.20.3 and ZoL version 0.6.2, both the latest stable releases.
Linux Kernel version 3.11.0-12
Problem started after 'do-release-upgrade' from Ubuntu Server 13.04 to 13.10.
I am using ZFS on Linux (ZoL) on this machine, installed via the PPA, so updates are pretty much automatic, i.e. whenever I get a new kernel version or ZoL version, all the kernel modules are automatically rebuilt and everything is happy schlappy. Note: I have upgraded this machine from 12.04 -> 12.10 -> 13.04 -> 13.10. Every upgrade in the past has been fine.

So I have ZFS set up with a single 6-disk raidz2 vdev, and on top of that I have a single large ZVOL which takes up basically all the space. A ZVOL essentially exposes a block device sitting on top of the ZFS storage that you can do whatever you want with, much like mdadm exposes /dev/md0, etc. If you aren't familiar with ZFS then don't worry about it; just remember I am using a block device.
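(For anyone unfamiliar, creating a zvol like that looks roughly like this; the size is illustrative, but the dataset name matches the device path in my ietd.conf below.)
Code:
# sketch only -- the pool already exists as a 6-disk raidz2 named 'hive'
zfs create -V 7T hive/iscsi-hive   # exposes /dev/zvol/hive/iscsi-hive (also /dev/zd0)
zfs list -t volume                 # confirm the zvol shows up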

So now my ietd.conf has a single target with a single LUN, which is the ZVOL block device from above.

I have a windows 2008R2 box that has the MS iSCSI initiator on it, and this box mounts up the drive. The volume is rather large (~7TB) and thus has a GPT partition table on it, and then a large NTFS partition with all my data on it.

Ok, if you have gotten this far, thank you.

My problem is that on the Windows box, when I mount up the iSCSI drive, it shows up as uninitialized, as if the partition table is corrupt. Now, the thing is, I KNOW the partition table is fine, and the data is also fine, because I can mount the same volume locally on the Linux box using ntfs-3g and see all the data just fine. I can load up the partition table with parted and see everything just fine. I see the proper additional block devices in Linux for the whole drive and for each partition. (There are 2 partitions total because there is a hidden Windows partition before the NTFS data partition.)
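(Roughly how I check it locally, in case that helps; device path per my config below, and /mnt/check is just an example mount point. ZoL names zvol partitions with a -partN suffix.)
Code:
# inspect the GPT on the zvol directly
parted /dev/zvol/hive/iscsi-hive print
# mount the NTFS data partition read-only with ntfs-3g (partition 2 on this disk)
mount -t ntfs-3g -o ro /dev/zvol/hive/iscsi-hive-part2 /mnt/check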

As part of testing this I have uninstalled and reinstalled the iscsitarget kernel module (which includes rebuilding the module from source). I also happen to have a 30GB SSD in the Linux box that isn't currently being used for anything, so I changed my ietd.conf to have this 'spare' SSD be my target. Initially I had problems making that work, but after I rebuilt the iscsitarget kernel modules I was able to get that SSD working just fine as an iSCSI target: I could mount it up on the Windows box, create a partition, format it, copy data on, and read it back off the drive. But when I switch ietd.conf back to my ZFS ZVOL, Windows still says that the disk is uninitialized. As I said before, I know the volume is indeed intact, as I have mounted it and read data just fine locally on the Linux machine.

As a note, I run monthly scrubs on the ZFS pool as well. It has always shown zero errors and has never had any data integrity problems.

What the heck can I do? The data on here IS backed up, but I would much rather not have to pull it all down from CrashPlan unless I have to. Plus, all of my TV shows (~3TB worth) are not backed up, so I would have to download them again as well. It wouldn't be the end of the world, but I definitely want to avoid it.

Should I attempt to delete the partition table and re-create it in-place?
Is this maybe a bug with the version of iscsitarget I am using? Or maybe a bug with ZoL?
I am kind of running out of ideas at this point. I mean I could just mount the drive locally using ntfs-3g and then share it over NFS/samba but I really would rather not do that...
 
Is there a specific reason you are using Ubuntu? Did it work for a while before doing this, or has it always done it? IET isn't brilliant from what I remember and will most probably be causing your issue, even if it's just a misconfig problem. Can you try LIO? Although I'm not sure on extent compatibility between the two.
 
My ietd.conf:
Code:
Target iqn.2012-04.net.teraknor:hive
        Lun 0 Path=/dev/zvol/hive/iscsi-hive,Type=blockio
#       Lun 0 Path=/dev/zd0,Type=blockio
#       Lun 1 Path=/dev/sdh,Type=blockio,IOMode=wt
#       Lun 1 Type=nullio
        Alias Hive
        HeaderDigest None,CRC32C
        DataDigest None,CRC32C
        MaxConnections 1
        MaxSessions 1
        InitialR2T No
        ImmediateData Yes
        MaxRecvDataSegmentLength 32768
        MaxXmitDataSegmentLength 32768
        MaxBurstLength 2621440
        FirstBurstLength 524288
        #DefaultTime2Wait 2
        #DefaultTime2Retain 0
        MaxOutstandingR2T 32
        DataPDUInOrder Yes
        DataSequenceInOrder Yes
        ErrorRecoveryLevel 0
        #NOPInterval 0
        #NOPTimeout 0
        Wthreads 8
        QueuedCommands 32

I am using Ubuntu (instead of, say, BSD or OpenIndiana) because I want to fold on this box as well, and as far as I know there isn't a FAH client for those OS's. This setup has worked fine for almost 2 years now.

What is LIO?

EDIT: Looking into LIO now. I may give it a shot...
 
I think I know what might be going on. ZoL still has problems with pools and/or zvols mounting due to the async nature of udev on Linux. When I was still using ZoL, I'd have ietd not work right for clients, since when ietd started, the zvol wasn't yet visible. A quick test to see if this is the issue: stop ietd, check to make sure the zvol is accessible, restart ietd. See if it works then...
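Something along these lines (service name as packaged on Ubuntu; zvol path assumed from the config above):
Code:
service iscsitarget stop
ls -l /dev/zvol/hive/iscsi-hive    # make sure the zvol device node actually exists
service iscsitarget start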
 
Check whether IET uses the right block sizes. The easiest way to do this is to connect the iscsi volume on the ubuntu machine locally and use 'lsblk -t' to determine the block sizes.
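A rough sketch of that check, assuming open-iscsi is installed on the Ubuntu box (target name taken from the config above):
Code:
# log in to the target over loopback, then compare sector sizes
iscsiadm -m discovery -t sendtargets -p 127.0.0.1
iscsiadm -m node -T iqn.2012-04.net.teraknor:hive -p 127.0.0.1 --login
lsblk -t    # compare LOG-SEC/PHY-SEC of the new iSCSI disk against the zvol (zd0)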
 
Check whether IET uses the right block sizes. The easiest way to do this is to connect the iscsi volume on the ubuntu machine locally and use 'lsblk -t' to determine the block sizes.

I am currently working on switching to LIO, but I will keep this idea on the back-burner.

I think I know what might be going on. ZoL still has problems with pools and/or zvols mounting due to the async nature of udev on Linux. When I was still using ZoL, I'd have ietd not work right for clients, since when ietd started, the zvol wasn't yet visible. A quick test to see if this is the issue: stop ietd, check to make sure the zvol is accessible, restart ietd. See if it works then...

I have already tried doing this, in the course of the other stuff I tried. No dice.
 
Question: have you tried disabling autostart of ietd? I had to do that due to the above. So I would start up the system, make sure the pool was visible, and any zvols, and *then* manually start ietd. One thing I ended up doing instead: don't use zvols. Instead create a single large (sparse) file, and use /tank/foo/bar as the target with fileio instead of blockio.
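For reference, that workaround would look roughly like this (path and size are just examples):
Code:
# create a large sparse file on the ZFS filesystem and export it with fileio
truncate -s 7T /tank/foo/bar
# then in ietd.conf:
#   Target iqn.2012-04.net.example:tank
#           Lun 0 Path=/tank/foo/bar,Type=fileio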
 
So, wow, I am in the process of switching to LIO and so far it has been a major pain in the ass!
1. For some reason installing the package 'targetcli' wanted to pull in a shit-ton of dependencies (an entire X11 setup!) it didn't need, so I had to manually figure out which packages I actually needed to make it work.

2. Then I go to set this up and it doesn't recognize a zvol as a TYPE_DISK block device! Apparently a patch to fix this was submitted 2 years ago and hasn't made it in yet. (See https://github.com/zfsonlinux/zfs/issues/515) So I manually made that change and was finally able to add the zvol as a 'blockstore' in LIO.

LIO Configuration is kinda weird, but I like it.

It took me a little while to figure out how to disable auth, but switching to LIO worked!
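For anyone following along, the targetcli flow ended up looking roughly like this (exact object paths differ between targetcli versions, so treat it as a sketch rather than the literal commands I ran):
Code:
targetcli
/> /backstores/block create name=hive dev=/dev/zvol/hive/iscsi-hive
/> /iscsi create iqn.2012-04.net.teraknor:hive
/> /iscsi/iqn.2012-04.net.teraknor:hive/tpg1/portals create 0.0.0.0 3260
/> /iscsi/iqn.2012-04.net.teraknor:hive/tpg1/luns create /backstores/block/hive
/> /iscsi/iqn.2012-04.net.teraknor:hive/tpg1 set attribute authentication=0 generate_node_acls=1 demo_mode_write_protect=0
/> saveconfig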
 
Question: have you tried disabling autostart of ietd? I had to do that due to the above. So I would start up the system, make sure the pool was visible, and any zvols, and *then* manually start ietd. One thing I ended up doing instead: don't use zvols. Instead create a single large (sparse) file, and use /tank/foo/bar as the target with fileio instead of blockio.

Yeah, I had tried that. In the end switching to LIO worked. It is a bit more complex to get going, but I like it, as it paves a clear path towards moving to a Fibre Channel setup eventually.

One day I will probably just ditch the whole ZVOL/iSCSI thing entirely, save files directly onto the ZFS volume, and share it with NFS/Samba. However, for now I am kinda locked into this setup. My wife will be very happy, as we do not subscribe to cable/sat/etc. and all of our TV/movies/etc. are stored on this volume, lol.
 
Technically this post is unrelated, well mostly. :)

Aaaaand today one of my drives started to fail, lol. What are the chances? Oh well, I love ZFS; it is handling the whole situation perfectly. I went ahead and swapped in my hot spare and will physically swap in my cold spare some time this weekend. Pretty sure these drives are still under warranty too...

FWIW, if you want to see what any of this looks like:
Code:
james-hive /home/james # while true; do dmesg -c ; sleep 1 ; done
[152310.921117] ata2: hard resetting link
[152311.239220] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[152311.242065] ata2.00: configured for UDMA/33
[152311.242076] ata2: EH complete
[152311.464486] ata2.00: exception Emask 0x10 SAct 0x3ff SErr 0x400100 action 0x6 frozen
[152311.464718] ata2.00: irq_stat 0x08000000, interface fatal error
[152311.464947] ata2: SError: { UnrecovData Handshk }
[152311.465402] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.465865] ata2.00: cmd 61/19:00:44:2c:5b/00:00:bb:00:00/40 tag 0 ncq 12800 out
[152311.465865]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.466807] ata2.00: status: { DRDY }
[152311.467284] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.467743] ata2.00: cmd 61/08:08:76:ef:aa/00:00:de:00:00/40 tag 1 ncq 4096 out
[152311.467743]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.468688] ata2.00: status: { DRDY }
[152311.469157] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.469622] ata2.00: cmd 61/ff:10:25:2b:5b/00:00:bb:00:00/40 tag 2 ncq 130560 out
[152311.469622]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.470564] ata2.00: status: { DRDY }
[152311.471034] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.471497] ata2.00: cmd 61/20:18:24:2c:5b/00:00:bb:00:00/40 tag 3 ncq 16384 out
[152311.471497]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.472442] ata2.00: status: { DRDY }
[152311.472910] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.473374] ata2.00: cmd 61/ff:20:d6:29:5b/00:00:bb:00:00/40 tag 4 ncq 130560 out
[152311.473374]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.474320] ata2.00: status: { DRDY }
[152311.474787] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.475249] ata2.00: cmd 61/4c:28:d5:2a:5b/00:00:bb:00:00/40 tag 5 ncq 38912 out
[152311.475249]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.476191] ata2.00: status: { DRDY }
[152311.476660] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.477124] ata2.00: cmd 61/04:30:21:2b:5b/00:00:bb:00:00/40 tag 6 ncq 2048 out
[152311.477124]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.478052] ata2.00: status: { DRDY }
[152311.478521] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.478982] ata2.00: cmd 61/02:38:5d:2c:5b/00:00:bb:00:00/40 tag 7 ncq 1024 out
[152311.478982]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.479913] ata2.00: status: { DRDY }
[152311.480376] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.480837] ata2.00: cmd 61/04:40:8a:fb:aa/00:00:de:00:00/40 tag 8 ncq 2048 out
[152311.480837]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.481769] ata2.00: status: { DRDY }
[152311.482235] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.482695] ata2.00: cmd 61/08:48:e6:4f:ac/00:00:de:00:00/40 tag 9 ncq 4096 out
[152311.482695]          res 50/00:02:5d:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.483634] ata2.00: status: { DRDY }
[152311.484098] ata2: hard resetting link
[152311.802744] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[152311.805574] ata2.00: configured for UDMA/33
[152311.805587] ata2: EH complete
[152311.806977] ata2.00: exception Emask 0x10 SAct 0x3f0 SErr 0x400100 action 0x6 frozen
[152311.807208] ata2.00: irq_stat 0x08000000, interface fatal error
[152311.807432] ata2: SError: { UnrecovData Handshk }
[152311.807887] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.808351] ata2.00: cmd 61/4c:20:d5:2a:5b/00:00:bb:00:00/40 tag 4 ncq 38912 out
[152311.808351]          res 50/00:19:44:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.809281] ata2.00: status: { DRDY }
[152311.809751] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.810216] ata2.00: cmd 61/ff:28:d6:29:5b/00:00:bb:00:00/40 tag 5 ncq 130560 out
[152311.810216]          res 50/00:19:44:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.811157] ata2.00: status: { DRDY }
[152311.811617] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.812082] ata2.00: cmd 61/20:30:24:2c:5b/00:00:bb:00:00/40 tag 6 ncq 16384 out
[152311.812082]          res 50/00:19:44:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.813015] ata2.00: status: { DRDY }
[152311.813483] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.813948] ata2.00: cmd 61/ff:38:25:2b:5b/00:00:bb:00:00/40 tag 7 ncq 130560 out
[152311.813948]          res 50/00:19:44:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.814885] ata2.00: status: { DRDY }
[152311.815344] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.815806] ata2.00: cmd 61/08:40:76:ef:aa/00:00:de:00:00/40 tag 8 ncq 4096 out
[152311.815806]          res 50/00:19:44:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.816738] ata2.00: status: { DRDY }
[152311.817206] ata2.00: failed command: WRITE FPDMA QUEUED
[152311.817670] ata2.00: cmd 61/19:48:44:2c:5b/00:00:bb:00:00/40 tag 9 ncq 12800 out
[152311.817670]          res 50/00:19:44:2c:5b/00:00:bb:00:00/40 Emask 0x10 (ATA bus error)
[152311.818605] ata2.00: status: { DRDY }
[152311.819075] ata2: hard resetting link
[152312.138459] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[152312.141275] ata2.00: configured for UDMA/33
[152312.141286] ata2: EH complete
[152312.183242] ata2.00: exception Emask 0x10 SAct 0x3ff SErr 0x400100 action 0x6 frozen
[152312.183475] ata2.00: irq_stat 0x08000000, interface fatal error
[152312.183699] ata2: SError: { UnrecovData Handshk }
[152312.184151] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.184615] ata2.00: cmd 61/03:00:61:2c:5b/00:00:bb:00:00/40 tag 0 ncq 1536 out
[152312.184615]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.185554] ata2.00: status: { DRDY }
[152312.186025] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.186495] ata2.00: cmd 61/3c:08:6b:2c:5b/00:00:bb:00:00/40 tag 1 ncq 30720 out
[152312.186495]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.187438] ata2.00: status: { DRDY }
[152312.187911] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.188383] ata2.00: cmd 61/02:10:91:fb:aa/00:00:de:00:00/40 tag 2 ncq 1024 out
[152312.188383]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.189325] ata2.00: status: { DRDY }
[152312.189790] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.190254] ata2.00: cmd 61/01:18:68:69:d6/00:00:16:00:00/40 tag 3 ncq 512 out
[152312.190254]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.191200] ata2.00: status: { DRDY }
[152312.191668] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.192137] ata2.00: cmd 61/1e:20:f8:a3:46/00:00:17:00:00/40 tag 4 ncq 15360 out
[152312.192137]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.193069] ata2.00: status: { DRDY }
[152312.193537] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.194003] ata2.00: cmd 61/04:28:64:69:d6/00:00:16:00:00/40 tag 5 ncq 2048 out
[152312.194003]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.194939] ata2.00: status: { DRDY }
[152312.195405] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.195872] ata2.00: cmd 61/01:30:60:2c:5b/00:00:bb:00:00/40 tag 6 ncq 512 out
[152312.195872]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.196809] ata2.00: status: { DRDY }
[152312.197277] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.197748] ata2.00: cmd 61/02:38:16:a4:46/00:00:17:00:00/40 tag 7 ncq 1024 out
[152312.197748]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.198685] ata2.00: status: { DRDY }
[152312.199154] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.199621] ata2.00: cmd 61/01:40:68:2c:5b/00:00:bb:00:00/40 tag 8 ncq 512 out
[152312.199621]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.200563] ata2.00: status: { DRDY }
[152312.201034] ata2.00: failed command: WRITE FPDMA QUEUED
[152312.201504] ata2.00: cmd 61/01:48:8e:fb:aa/00:00:de:00:00/40 tag 9 ncq 512 out
[152312.201504]          res 50/00:02:91:fb:aa/00:00:de:00:00/40 Emask 0x10 (ATA bus error)
[152312.202445] ata2.00: status: { DRDY }
[152312.202915] ata2: hard resetting link
[152312.522140] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[152312.524941] ata2.00: configured for UDMA/33
[152312.524954] ata2: EH complete
^C
james-hive /home/james :( # zpool status
  pool: hive
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 504K in 0h0m with 0 errors on Fri Oct 25 13:21:43 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        hive                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151V6D  ONLINE       0     0     0
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151WPD  ONLINE       0     0     0
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151XBD  ONLINE       0     0     0
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151XGD  ONLINE       4 1.59K    24
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151YYD  ONLINE       0     0     0
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151Z2D  ONLINE       0     0     0
        spares
          ata-Hitachi_HDS5C3020ALA632_ML0220F31524BD    AVAIL

errors: No known data errors
james-hive /home/james :( # zpool replace hive ata-Hitachi_HDS5C3020ALA632_ML0220F3151XGD ata-Hitachi_HDS5C3020ALA632_ML0220F31524BD
james-hive /home/james # zpool status
  pool: hive
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Oct 25 13:23:43 2013
    15.9M scanned out of 7.56T at 903K/s, (scan is slow, no estimated time)
    2.47M resilvered, 0.00% done
config:

        NAME                                              STATE     READ WRITE CKSUM
        hive                                              ONLINE       0     0     0
          raidz2-0                                        ONLINE       0     0     0
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151V6D    ONLINE       0     0     0
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151WPD    ONLINE       0     0     0
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151XBD    ONLINE       0     0     0
            spare-3                                       ONLINE       0     0     4
              ata-Hitachi_HDS5C3020ALA632_ML0220F3151XGD  ONLINE       8 2.12K    24  (resilvering)
              ata-Hitachi_HDS5C3020ALA632_ML0220F31524BD  ONLINE       0     0     0  (resilvering)
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151YYD    ONLINE       0     0     0
            ata-Hitachi_HDS5C3020ALA632_ML0220F3151Z2D    ONLINE       0     0     0
        spares
          ata-Hitachi_HDS5C3020ALA632_ML0220F31524BD      INUSE     currently in use

errors: No known data errors
james-hive /home/james #

This is going to take a while, heh.
 