Help/Guidance: Hyper-V Cluster (CSVFS) + ISCSI - SLOW

Discussion in 'SSDs & Data Storage' started by Dillirium, Oct 31, 2017.

  1. Dillirium

    Dillirium Limp Gawd

    Messages:
    439
    Joined:
    Sep 16, 2004
    All,
    I'm in need of some help. I'm trying to figure out where to go with this. I recently setup a new cluster that's using a NETAPP LUN. When accessing the LUN as a mapped drive (before mapping as clustered storage) I get very fast performance. E.G. 900 MB/s xfer of files. Once I add the volume to the cluster (CSVFS) then it goes 10x slower. Max I've seen files transfer to the storage is at 140 MB/s. Guests also seem to suffer a bit in performance. Still usable but nothing like I expected.

    Crucial info:
    - The Hyper-V server I'm running the test from is the owner of the storage (role holder)
    - MTU size is 9000 (tested with the ping -l 8000 -f trick)
    - Block size is 64K
    - MPIO is in use (2x 10 Gb connections)
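
    For reference, a fuller version of that jumbo-frame check might look like this (the target IP and adapter name are placeholders, substitute your own):

    ```powershell
    # -f sets Don't Fragment; -l sets the ICMP payload size.
    # A 9000-byte MTU carries at most 8972 bytes of ICMP payload
    # (8972 + 20 IP header + 8 ICMP header = 9000), so test with 8972,
    # not 8000 -- an 8000-byte payload would also pass on an MTU as low as ~8028.
    ping -f -l 8972 10.10.10.50

    # Confirm the jumbo setting actually took on the adapter itself:
    Get-NetAdapterAdvancedProperty -Name "iSCSI-A" -DisplayName "Jumbo Packet"
    ```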

    I created a second LUN and I indeed get full speed over the same iSCSI connection, but still only a fraction of that on the CSVFS volume. What could I be missing? Any ideas?

    Edit 1: Forgot to add that I'm using Intel NICs, not Broadcom. I saw issues reported for the Broadcom adapters and tried those troubleshooting steps for kicks and giggles, but it didn't help. Some of the steps weren't applicable anyway due to the newer OS/NICs.
     
  2. Biznatch

    Biznatch 2[H]4U

    Messages:
    2,224
    Joined:
    Nov 16, 2009
    Any errors in Event Viewer? I've used CSV + iSCSI since 2008 R2 and never had performance issues like this. When you drop the drive from CSV, does the performance come back?

    Are you clustered with multiple servers? That's really the only reason to use CSV. If you are, do both nodes experience the slowness, or only the node that isn't directly attached to the iSCSI volume?
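
    One way to check that from either node (a sketch, assuming the CSV shows up as "Cluster Disk 1") is to query the coordinator node and the I/O mode with the failover-clustering cmdlets:

    ```powershell
    # Which node currently coordinates (owns) each CSV?
    Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State

    # Is I/O going direct, or redirected over the cluster network?
    # "FileSystemRedirected" or "BlockRedirected" on a node would explain
    # CSV traffic crawling over a slower inter-node link.
    Get-ClusterSharedVolumeState -Name "Cluster Disk 1" |
        Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason
    ```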
     
  3. Langly

    Langly Only Three Midgets

    Messages:
    4,229
    Joined:
    Dec 23, 2002
    Make sure your MPIO is truly set up correctly, and that the NICs/CNAs have the recommended settings for whatever storage you're using. I can't count the number of times people forget to do things like enable flow control on the network, or think they have 9000 MTU set up correctly without realizing some NICs add 4 to 16 bytes (or more) of overhead, making the actual frame size 9016 and causing the network to start chop chop chopping. Make sure the switches are set higher for their MTU; on the Ciscos I install I generally run 9216 (there are a few models that don't let me go that high, but you get the idea).

    Definitely check the event log. It's even possible part of your network stack or another service is having issues and restarting over and over, causing latency.
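
    To rule out a half-configured MPIO stack, something like this (the Windows MPIO cmdlets plus the built-in mpclaim tool) shows the claimed disks, paths, and the active load-balance policy:

    ```powershell
    # List MPIO-claimed disks and the paths/policy per disk.
    mpclaim -s -d

    # Default load-balance policy the Microsoft DSM applies to new LUNs
    # (e.g. RR = Round Robin, LQD = Least Queue Depth).
    Get-MSDSMGlobalDefaultLoadBalancePolicy

    # Path-verification and timer settings.
    Get-MPIOSetting
    ```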
     
  4. Dillirium

    Dillirium Limp Gawd

    Messages:
    439
    Joined:
    Sep 16, 2004

    I took some time tonight to evacuate the storage on one of the LUNs. I removed it from the cluster and, sure as hell, I got full speed. It was doing 114 MB/s before I removed it from the clustered storage. After removing it and mapping a drive to it instead, I got 805 MB/s.
     
    Last edited: Nov 2, 2017
  5. Dillirium

    Dillirium Limp Gawd

    Messages:
    439
    Joined:
    Sep 16, 2004
    Checked and didn't see any errors popping up, at least none related to this. 99% of the entries were informational updates and nothing was spamming the log.
     
  6. Dillirium

    Dillirium Limp Gawd

    Messages:
    439
    Joined:
    Sep 16, 2004
    I was doing some more research and happened upon someone else with the same type of issue. They flat out gave up and just used individual LUNs. I have 3 nodes, so I need the shared storage.
     
  7. Child of Wonder

    Child of Wonder 2[H]4U

    Messages:
    3,269
    Joined:
    May 22, 2006
    What version of Hyper-V? It could be a CSV ownership issue where metadata is going through a slower 1Gb link between hosts.
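
    A quick way to test that theory (a sketch; the CSV and node names are placeholders): move the coordinator role onto the node you benchmark from, then check which cluster network would carry CSV traffic (the lowest metric wins):

    ```powershell
    # Move CSV ownership (the coordinator role) to the node under test.
    Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node HV-NODE1

    # CSV/metadata traffic prefers the cluster network with the lowest metric;
    # if that's a 1Gb management network, redirected I/O will crawl.
    Get-ClusterNetwork | Format-Table Name, Role, Metric
    ```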
     
  8. Dillirium

    Dillirium Limp Gawd

    Messages:
    439
    Joined:
    Sep 16, 2004
    That's a good point about the metadata. I will check/test. I'm running 2016 (version 10.0.14393.0).