VMware vSAN - Single Disk Noncompliant

Discussion in 'Virtualized Computing' started by gimp, Dec 28, 2018.

  1. gimp

    gimp Pound Me Too

    Messages:
    9,749
    Joined:
    Jul 25, 2008
    Just for funsies, and to see what this vSAN thing is all about, I decided to build a 2-node VMware vSAN cluster.
    Hosts are 8th-gen Intel NUC i7s with 32 GB RAM, 1x 250 GB NVMe, and 1x 2 TB SSD, booting off old USB thumb drives.

    Took a bit to figure out how to get it all working, but I managed.

    Only one snag, which I don't think is directly related to the 2-node setup (with the witness appliance).

    The VMware vSAN witness appliance has 3 disks.
    Currently, 1 of those 3 disks is out of compliance with the storage policy.
    It reports that one disk as RAID0, while everything else is RAID1, and the policy states RAID1.
    All of the disks in the VCSA are properly reported as RAID1.
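For context on why a RAID0 object reads as noncompliant under a RAID1 policy, here is a minimal sketch of the mirroring arithmetic. It relies on vSAN's documented rule for RAID-1: tolerating n failures (FTT=n) takes n+1 full data copies spread across 2n+1 hosts or fault domains. In a 2-node cluster with FTT=1, the third fault domain is the witness host. If the RAID0 layout means that object ended up with a single data copy, it falls short of the two copies the policy asks for. The function names are illustrative only, not a VMware API.

```python
def raid1_replicas(ftt: int) -> int:
    """Full data copies a RAID-1 (mirroring) policy keeps for a given FTT."""
    return ftt + 1


def raid1_fault_domains(ftt: int) -> int:
    """Minimum hosts/fault domains for RAID-1: data copies plus witness components."""
    return 2 * ftt + 1


def raid1_raw_capacity_gb(object_size_gb: float, ftt: int) -> float:
    """Raw vSAN capacity consumed by one mirrored object (ignores metadata overhead)."""
    return object_size_gb * raid1_replicas(ftt)


def is_compliant(observed_replicas: int, ftt: int) -> bool:
    """Simplified check: the object needs at least the required number of copies."""
    return observed_replicas >= raid1_replicas(ftt)


if __name__ == "__main__":
    ftt = 1
    # 2 copies, 3 fault domains: two data nodes plus the witness host
    print(raid1_replicas(ftt), raid1_fault_domains(ftt))
    # A single-copy (RAID0-style) layout versus the RAID1 policy
    print(is_compliant(1, ftt), is_compliant(2, ftt))
```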

    I've tried creating a new storage policy and applying it, but that doesn't change anything.
    I also tried doing the "Repair Objects Immediately" option from the vSAN Health menu, under the vSAN cluster/monitor.

    Nothing seems to want to correct it.
    I've done a whole lot of searching, but everything I find is related to all or multiple disks in a noncompliant state. In those cases, it was usually creating and applying a new policy that got things working.

    Any tips? What might I be doing wrong? Could it be a bug in something else?
    I have already updated everything with the latest patches (2 physical hosts, witness host, and VCSA).

    [Screenshots: gs012865.png, gs012866.png, gs012867.png]
     
  2. gimp

    gimp Pound Me Too

    Well, never mind.
    vCenter crashed, and now all VMs show an "unknown" status and their names are just single-digit numbers.
    Browsing the vSAN datastore isn't showing the VCSA or vSAN witness appliance VMDKs on either host.
    Neat...
    Time to rebuild.
     
  3. Child of Wonder

    Child of Wonder 2[H]4U

    Messages:
    3,264
    Joined:
    May 22, 2006
    This is why I avoid VSAN like the plague.
     
  4. gimp

    gimp Pound Me Too

    I've heard a fair amount of good about it.
    And really, I'm just using the 60-day trial to test it out because I was curious. I also haven't gotten any experience with vSphere 6.7 yet, as we are still on 6.0 because of other infrastructure pieces we need to upgrade first for compatibility.

    Since "hyperconverged" is the latest fad, I wanted to check it out.
    And I don't have anything capable of shared storage at the moment; or, well, my current setup and the physical placement of my devices make it impractical.
     
  5. gimp

    gimp Pound Me Too

    Gotta say, maybe it's because I'm not on 10GbE yet, I'm running on NUCs, and it's only a 2-node stretched cluster with a nested witness host, but... none of that should be causing these issues.
    It occurred this morning as well, and it only seems to happen after Storage vMotioning a single disk from an NFS share back to the vSAN datastore. And it doesn't happen until after the Storage vMotion actually completes.
    The 2 nodes are connected to the same switch, and I doubt I'm saturating 1 Gb.
    Both nodes show similar issues, while there are no problems connecting to the nodes themselves.

    [Screenshot: gs012868.png]

    not terribly impressed.
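A quick back-of-the-envelope check on the "I doubt I'm saturating 1 Gb" point, assuming decimal units and a hypothetical 90% efficiency factor for protocol overhead. 1 Gb/s tops out at 125 MB/s, while a single SATA SSD can sustain several hundred MB/s. A Storage vMotion onto a RAID-1 object writes a copy to each data node, so at least one copy of every block has to cross the inter-node link, and that can plausibly fill a 1 GbE link. For reference, VMware lists 10 GbE as the minimum for all-flash vSAN (1 GbE is only supported for hybrid). The 100 GB disk size and 500 MB/s SSD figure are made-up example numbers, not from the thread.

```python
def link_mb_per_s(gbit_per_s: float) -> float:
    """Convert a link speed in Gb/s to MB/s (decimal: 1 Gb = 1000 Mb, 8 bits per byte)."""
    return gbit_per_s * 1000 / 8


def copy_minutes(size_gb: float, mb_per_s: float) -> float:
    """Minutes to move size_gb (decimal GB) at a sustained mb_per_s."""
    return size_gb * 1000 / mb_per_s / 60


EFFICIENCY = 0.9    # hypothetical allowance for TCP/vSAN protocol overhead
SSD_MB_PER_S = 500  # example sequential throughput for a SATA SSD

if __name__ == "__main__":
    one_gbe = link_mb_per_s(1) * EFFICIENCY
    ten_gbe = link_mb_per_s(10) * EFFICIENCY
    print(f"1 GbE usable: {one_gbe:.1f} MB/s; the SSD alone could push {SSD_MB_PER_S} MB/s")
    for label, rate in (("1 GbE", one_gbe), ("10 GbE", ten_gbe)):
        # Example: a 100 GB virtual disk being Storage vMotioned back onto vSAN
        print(f"{label}: ~{copy_minutes(100, rate):.1f} min for 100 GB")
```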
     
  6. Shockey

    Shockey [H]ard|Gawd

    Messages:
    1,967
    Joined:
    Nov 24, 2008
    Why is SCSI controller showing yellow?

    I assume you are running this on 6.7 based on the screenshots. Those are nicer than 6.0 and 6.5.
     
  7. gimp

    gimp Pound Me Too

    The controller isn't on the HCL, or at least isn't certified.
     
  8. gimp

    gimp Pound Me Too

    So today... I think I destroyed vSAN again by powering both hosts off.
    Per VMware, everything should get re-established:
    https://blogs.vmware.com/virtualblocks/2018/07/13/understanding-ve-booting-w-vc-unavailable/
    Sadly, it did not. Or, has not,
    even after sitting for over 30 minutes.
    Each host did eventually report the vSAN datastore size and usage, but the data reported was only what that individual host was providing, not the total.
    VMs are showing back up with just numbers as names and a status of invalid.

    I powered my hosts off so I could verify that the firmware on the Corsair M.2 and SanDisk SSD was up to date.
    Had to use the hot-swap bays on my media/file server for the SanDisk.
    Ran the "long" test with the SanDisk SSD software. Both came back clean.
    Its firmware is up to date too.
    Firmware on the Corsair M.2 is up to date as well.

    Yet on one host...
    event.WARNING - VSAN device t10.ATA___SanDisk_SDSSDH32000G_______________183795801332_____ is degrading. Consider replacing it..fullFormat (WARNING - VSAN device t10.ATA___SanDisk_SDSSDH32000G_______________183795801332_____ is degrading. Consider replacing it.)
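To match a warning like this against the host's device list (for example, the output of `esxcli storage core device list`), the device ID can be pulled out of the event text. A minimal sketch, assuming the event always uses the "VSAN device <id> is degrading" wording shown above. The regex and function name are illustrative, not part of any VMware tooling.

```python
import re

# Assumption: the event text uses the wording seen above, "VSAN device <id> is degrading".
DEGRADING_RE = re.compile(r"VSAN device (\S+) is degrading")


def degrading_devices(event_text: str) -> list[str]:
    """Return the unique device IDs named in 'is degrading' warnings, in the order seen."""
    seen: list[str] = []
    for device_id in DEGRADING_RE.findall(event_text):
        if device_id not in seen:  # the event repeats the message in its fullFormat part
            seen.append(device_id)
    return seen


if __name__ == "__main__":
    device = "t10.ATA___SanDisk_SDSSDH32000G_______________183795801332_____"
    event = (f"event.WARNING - VSAN device {device} is degrading. Consider replacing it..fullFormat "
             f"(WARNING - VSAN device {device} is degrading. Consider replacing it.)")
    print(degrading_devices(event))
```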

    So now I'm trying to run a test on the SSD while it's still in the NUC. It's a bit problematic figuring out what to use and how, as I would prefer a live boot disk so I don't have to blow it all away again.

    Currently running a badblocks read-only test off a GParted Live disk.
     
  9. danswartz

    danswartz 2[H]4U

    Messages:
    3,602
    Joined:
    Feb 25, 2011
    When do you call it quits, here? :)
     
  10. gimp

    gimp Pound Me Too

    Well, the smartctl long test passed on the SSD.
    Now running a badblocks non-destructive read-write test.
    if that passes then I'd have to say the issue is compatibility-related, and I'd be calling it quits with vSAN attempts for now.
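For anyone repeating these checks from a live Linux environment, the smartctl long self-test and the two badblocks passes map to standard smartmontools and e2fsprogs commands. A minimal sketch that only prints the commands unless `dry_run` is turned off. `/dev/sdX` is a placeholder, and the script assumes `smartctl` and `badblocks` are installed on the live system. The SMART long test runs inside the drive in the background, so the self-test log only shows a result once it finishes. `badblocks -n` needs the device unmounted, and the destructive `-w` mode is deliberately left out.

```python
import shlex
import subprocess
import sys


def disk_check_commands(device: str) -> list[list[str]]:
    """Build argv lists for the disk checks discussed in the thread."""
    return [
        ["smartctl", "-t", "long", device],      # start the SMART extended ("long") self-test
        ["smartctl", "-l", "selftest", device],  # self-test log; re-run after the test completes
        ["badblocks", "-sv", device],            # read-only surface scan (badblocks' default mode)
        ["badblocks", "-nsv", device],           # non-destructive read-write test; device must be unmounted
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, only when dry_run is False, execute it."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=False: a failing test should be reported, not abort the remaining checks
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdX"  # hypothetical device path
    run(disk_check_commands(target), dry_run=True)
```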
     
  11. Child of Wonder

    Child of Wonder 2[H]4U

     
  12. gimp

    gimp Pound Me Too

    Well, I just fucked up.
    I forgot I had disabled the vmw_ahci module on host 1, but never rebooted it.

    oops.
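A disabled-but-never-rebooted module like this can be spotted ahead of time. Disabling a module on ESXi (typically `esxcli system module set --enabled=false --module=vmw_ahci`) only takes effect at the next boot, and `esxcli system module list` reports both whether each module is currently loaded and whether it is enabled. A module that is still loaded but no longer enabled is most likely waiting on a reboot. A minimal sketch that parses that table, assuming the plain-text columns `Name`, `Is Loaded`, `Is Enabled`. The sample output is hand-written to mimic the layout, not captured from the thread's hosts.

```python
# Hypothetical sample mimicking `esxcli system module list` output.
SAMPLE = """\
Name                 Is Loaded  Is Enabled
-------------------  ---------  ----------
nvme                 true       true
vmw_ahci             true       false
vmkusb               true       true
"""


def parse_module_list(text: str) -> dict[str, tuple[bool, bool]]:
    """Map module name -> (is_loaded, is_enabled) from the plain-text table."""
    modules: dict[str, tuple[bool, bool]] = {}
    for line in text.splitlines()[2:]:  # skip the header and dashed separator rows
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or unexpected lines
        name, loaded, enabled = parts
        modules[name] = (loaded == "true", enabled == "true")
    return modules


def pending_disable(modules: dict[str, tuple[bool, bool]]) -> list[str]:
    """Modules still loaded although disabled: the change only applies after a reboot."""
    return [name for name, (loaded, enabled) in modules.items() if loaded and not enabled]


if __name__ == "__main__":
    print(pending_disable(parse_module_list(SAMPLE)))  # ['vmw_ahci']
```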

    vSAN is back up.
    Still not sure what's going on with host 2.
    Even the badblocks non-destructive read-write test came back with 0 errors, but the host is still barking about the drive "degrading".
     
  13. Child of Wonder

    Child of Wonder 2[H]4U

    Dude, VSAN is steaming garbage. Just quit banging your head against a wall and buy a cheapo used Synology box or build a FreeNAS box. You'll have far fewer sleepless nights. Trust me, you've only just started to find all the problems and idiosyncrasies VSAN has to offer.
     
  14. gimp

    gimp Pound Me Too

    Considering the host itself is reporting issues with my disk but no test reports any issues, methinks it's not vSAN-related.
    and, as I want to get some experience with it just because, well... your solution is not a solution.
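One possible explanation for a drive that passes SMART and badblocks yet is still flagged, offered as an assumption and not confirmed anywhere in the thread: vSAN's degraded-device handling is, as far as I understand it, driven by sustained I/O latency rather than media errors. A drive with no bad sectors can still trip it if its latency stays high. The sketch below only illustrates that kind of latency-based check. The window size and 50 ms threshold are made-up values, not VMware's, and the class is hypothetical.

```python
from collections import deque
from statistics import mean


class LatencyWatch:
    """Flag a device whose rolling average latency stays above a threshold.

    Illustrative only: window and threshold_ms are hypothetical, not vSAN's values.
    """

    def __init__(self, window: int = 5, threshold_ms: float = 50.0):
        self.samples: deque = deque(maxlen=window)
        self.threshold_ms = threshold_ms

    def add(self, latency_ms: float) -> bool:
        """Record a sample; return True once a full window averages above the threshold."""
        self.samples.append(latency_ms)
        full = len(self.samples) == self.samples.maxlen
        return full and mean(self.samples) > self.threshold_ms


if __name__ == "__main__":
    watch = LatencyWatch()
    # A disk with no bad sectors can still show sustained latency spikes
    readings = [2, 3, 80, 95, 120, 110, 90]
    print([watch.add(r) for r in readings])  # [False, False, False, False, True, True, True]
```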
     
  15. danswartz

    danswartz 2[H]4U

    You misspelled 'idiotsyncrasies' :)
     
  16. H2R2P2

    H2R2P2 Limp Gawd

    Messages:
    382
    Joined:
    Jun 18, 2006
    No he diddnt..
     
  17. danswartz

    danswartz 2[H]4U

    That was a joke...
     
  18. H2R2P2

    H2R2P2 Limp Gawd

    Touché!
     
  19. Modder man

    Modder man [H]ard|Gawd

    Messages:
    1,770
    Joined:
    May 13, 2009
    Suggesting vSAN is garbage because it doesn't run properly on hardware it was never intended to run on doesn't make any sense. I have hundreds of servers running vSAN, and it works out quite well. There are caveats to its use, like with any solution. This particular use case is not the one vSAN was intended for.