VMware vSAN - Single Disk Noncompliant

Discussion in 'Virtualized Computing' started by gimp, Dec 28, 2018.

  1. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    Just for funsies, and to see what this vSAN thing is all about, decided to build a 2-node VMware vSAN cluster.
    Hosts are running on 8th Gen NUC i7's, 32GB RAM, 1x 250GB NVMe, 1x 2TB SSD, booting off old USB thumbsticks.

    Took a bit to figure out how to get it all working, but I managed.

    Only one snag, which I don't think is directly related to the 2-node set up (with witness appliance).

    The VMware vSAN witness appliance has 3 disks.
    Currently, 1 of those 3 disks is out of compliance with the storage policy.
    It reports that one disk as RAID0, when everything else is RAID1; and policy states RAID1.
    All of the disks in the VCSA are properly reported as RAID1.

    I've tried creating a new storage policy and applying it, but that doesn't change anything.
    I also tried doing the "Repair Objects Immediately" option from the vSAN Health menu, under the vSAN cluster/monitor.

    Nothing seems to want to correct it.
    I've done a whole lot of searching, but everything I find is related to all or multiple disks in a noncompliant state. In those cases, it was usually creating and applying a new policy that got things working.

    any tips? what might I be doing wrong? bug in something else?
    I have already updated everything with the latest patches (2 physical hosts, witness host, and VCSA)

    gs012865.png

    gs012866.png

    gs012867.png
     
  2. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    Well never mind.
    vCenter crashed, and now all VMs are "unknown" status and "name" is just a single-digit number.
    browsing the vsandatacenter store isn't showing the VCSA or vsan witness appliance vmdks on either host.
    neat...
    time to rebuild.
     
  3. Child of Wonder

    Child of Wonder 2[H]4U

    Messages:
    3,266
    Joined:
    May 22, 2006
    This is why I avoid VSAN like the plague.
     
  4. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    I've heard a fair amount of good.
    And really, just using the 60-day trial to test it out because I was curious. I also haven't gotten any experience with vSphere 6.7 yet, as we are still on 6.0 due to other infrastructure compatibility pieces that we need to upgrade first.

    since "hyperconverged" is the latest fad, I wanted to check it out.
    and I don't have anything capable of shared storage atm; or, well, current setup/physical placement of devices makes it not very capable.
     
  5. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    gotta say, maybe it's because I'm not on 10GbE yet, running on NUCs, and it's only a 2-node stretched cluster with a nested witness host, but... none of that should be causing these issues.
    Occurred this morning as well, and it only seems to occur after storage vmotioning a single disk from an NFS share back to the vSAN datastore. And it doesn't happen until after the storage vmotion actually completes.
    the 2 nodes are connected to the same switch, and I doubt I'm saturating 1Gb.
    Both nodes show similar issues, while there are no issues connecting to the nodes themselves.

    gs012868.png

    not terribly impressed.
     
  6. Shockey

    Shockey [H]ard|Gawd

    Messages:
    1,978
    Joined:
    Nov 24, 2008
    Why is SCSI controller showing yellow?

    I assume you are running this on 6.7 based on the screenshots. Those are nicer than 6.0 and 6.5.
     
  7. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    controller isn't on HCL, or at least not certified
     
  8. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    so today... think I destroyed vsan again, by powering both hosts off.
    per VMware, everything should get re-established.
    https://blogs.vmware.com/virtualblocks/2018/07/13/understanding-ve-booting-w-vc-unavailable/
    sadly, it did not. Or, has not.
    Even after sitting for over 30 minutes.
    each host did eventually report vsan datastore size and usage, but data reported was what the individual host was providing; not the total.
    VMs showing back up with names as just numbers and status of invalid.

    Powered my hosts off so I could validate up-to-date firmware on the Corsair m.2 and SanDisk SSD.
    Had to use the hotswap bays on my media/fileserver for the SanDisk.
    Ran the "long" test with the SanDisk SSD software. Both came back clean.
    Also up to date firmware.
    Firmware on Corsair m.2 up to date as well.

    Yet on one host...
    event.WARNING - VSAN device t10.ATA___SanDisk_SDSSDH32000G_______________183795801332_____ is degrading. Consider replacing it..fullFormat (WARNING - VSAN device t10.ATA___SanDisk_SDSSDH32000G_______________183795801332_____ is degrading. Consider replacing it.)

    So now trying to run a test on the SSD while in the NUC; bit problematic figuring out what/how to do that, as I would prefer to use a live boot disk so I don't have to blow it all away again.

    currently running a badblocks read-only test off a GParted live disc.
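    For reference, this is roughly what I'm running from the live environment (`/dev/sdX` is a placeholder for whatever the SSD enumerates as; the badblocks call here is the read-only mode, so nothing on the disk gets touched):

    ```shell
    # Kick off the drive's built-in long self-test (takes a while on a 2TB SSD)
    smartctl -t long /dev/sdX

    # Once it's done, check the self-test log and overall SMART health
    smartctl -a /dev/sdX

    # Read-only surface scan; -s shows progress, -v prints errors as found
    badblocks -sv /dev/sdX
    ```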
     
  9. danswartz

    danswartz 2[H]4U

    Messages:
    3,609
    Joined:
    Feb 25, 2011
    When do you call it quits, here? :)
     
  10. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    well smartctl long test passed on the ssd.
    running badblocks non-destructive read-write test.
    if that passes then I'd have to say the issue is compatibility-related, and I'd be calling it quits with vSAN attempts for now.
     
  11. Child of Wonder

    Child of Wonder 2[H]4U

    Messages:
    3,266
    Joined:
    May 22, 2006
     
    danswartz likes this.
  12. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    well I just fucked up.
    I forgot I had disabled the vmw_ahci module on host 1, but never rebooted it.

    oops.

    vsan is back up.
    still not sure what's going on with host 2.
    even the badblocks non-destructive read-write test came back with 0 errors, but the host is still barking about the drive "degrading"
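    For anyone else labbing on NUCs: the vmw_ahci workaround I'd applied on host 1 looks like the below. It's a common community workaround for flaky consumer AHCI controllers, not anything VMware-blessed, and it only takes effect after a reboot — the step I skipped.

    ```shell
    # Disable the native AHCI driver so the host falls back to the legacy
    # sata-ahci driver on next boot
    esxcli system module set --enabled=false --module=vmw_ahci

    # Verify the module's enabled flag changed
    esxcli system module list | grep ahci

    # Put the host in maintenance mode first, then reboot to apply
    reboot
    ```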
     
  13. Child of Wonder

    Child of Wonder 2[H]4U

    Messages:
    3,266
    Joined:
    May 22, 2006
    Dude, VSAN is steaming garbage. Just quit banging your head against a wall and buy a cheapo used Synology box or build a FreeNAS box. You'll have far fewer sleepless nights. Trust me, you've only just started to find all the problems and idiosyncrasies VSAN has to offer.
     
    danswartz likes this.
  14. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    considering the host itself is reporting issues with my disk but no testing reports any issues, me thinks it's not vsan related.
    and, as I want to get some experience with it just because, well... your solution is not a solution.
     
  15. danswartz

    danswartz 2[H]4U

    Messages:
    3,609
    Joined:
    Feb 25, 2011
    You misspelled 'idiotsyncrasies' :)
     
  16. H2R2P2

    H2R2P2 Limp Gawd

    Messages:
    412
    Joined:
    Jun 18, 2006
    No he didn't..
     
  17. danswartz

    danswartz 2[H]4U

    Messages:
    3,609
    Joined:
    Feb 25, 2011
    That was a joke...
     
  18. H2R2P2

    H2R2P2 Limp Gawd

    Messages:
    412
    Joined:
    Jun 18, 2006
    Touche!
     
  19. Modder man

    Modder man [H]ard|Gawd

    Messages:
    1,771
    Joined:
    May 13, 2009
    Suggesting vSAN is garbage because it doesn't run properly on hardware it was never intended to run on doesn't make any sense... I have hundreds of servers running vSAN and it works out quite well. There are caveats to its use, like with any solution. This particular use case is not the one vSAN was intended for.
     
    schizrade, RiDDLeRThC and Eulogy like this.
  20. Eulogy

    Eulogy 2[H]4U

    Messages:
    2,192
    Joined:
    Nov 9, 2005
    We have a few dozen PBs of vSAN in production, and a few hundred more TB in labs without any real issues.
     
    schizrade and RiDDLeRThC like this.
  21. Shockey

    Shockey [H]ard|Gawd

    Messages:
    1,978
    Joined:
    Nov 24, 2008
    If you don't mind me asking, what is the maximum number of nodes per cluster you use with vSAN?
     
  22. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    Related to my topic... :p

    since I destroyed the vSAN cluster, node2 no longer complains of "degrading" health on the SSD.

    ended up piecing together some old drives and mobo and loaded up FreeNAS until I can build myself a proper NAS.
    Was hoping to hack together my old MediaSmart Server, but looks like I'll need a proper breakout cable for video out since the thing completely fails to boot (possibly POST?) with a bootable thumbdrive.

    Gotta lab up a project I have coming up, and our prod environment is our test environment at work. So breaking down the 6.7 stuff.
    I'd rather go through trials and tribulations on my garbage first.
     
  23. Eulogy

    Eulogy 2[H]4U

    Messages:
    2,192
    Joined:
    Nov 9, 2005
    32, and our smallest is 8.
     
  24. Shockey

    Shockey [H]ard|Gawd

    Messages:
    1,978
    Joined:
    Nov 24, 2008
    how long does a 32-node vSAN cluster take to upgrade?
     
  25. Eulogy

    Eulogy 2[H]4U

    Messages:
    2,192
    Joined:
    Nov 9, 2005
    Define upgrade? I'm actually not even sure I know, really, as our automation just rolls through when we tell it to. Next time we upgrade, if I remember to, I can see how long a single node takes and extrapolate that out, but how long a thing takes isn't a metric I particularly care to concern myself with :D
     
  26. Shockey

    Shockey [H]ard|Gawd

    Messages:
    1,978
    Joined:
    Nov 24, 2008
    Ahhh, automation explains it.

    Like telling VUM to go and upgrade an entire cluster from 6.0 to 6.5/6.7U1
     
  27. Eulogy

    Eulogy 2[H]4U

    Messages:
    2,192
    Joined:
    Nov 9, 2005
    We don't use VUM.. and we're already on 6.7U1
     
  28. Modder man

    Modder man [H]ard|Gawd

    Messages:
    1,771
    Joined:
    May 13, 2009
    We also do not use VUM, All host updates are automated and take between 30-50 minutes per host. We have ~10 or so hosts upgrading at any given time.
     
  29. Child of Wonder

    Child of Wonder 2[H]4U

    Messages:
    3,266
    Joined:
    May 22, 2006
    A 32-node cluster would take an entire day to upgrade? Is this with moving data as each host goes into maintenance mode? All-flash or hybrid? What FTT? Any data services enabled? What's baseline latency, and how is it affected with hosts going offline?
     
  30. Shockey

    Shockey [H]ard|Gawd

    Messages:
    1,978
    Joined:
    Nov 24, 2008
    What do you use? PowerCLI?
     
  31. Eulogy

    Eulogy 2[H]4U

    Messages:
    2,192
    Joined:
    Nov 9, 2005
    Like I said, that was a gut guess as I'm not really sure on the timing. We don't sit and watch things. If it takes an hour or a day to upgrade a cluster, it doesn't matter to us.
    When we place a host into maint mode for upgrade, we do a full data evac, not ensure accessibility.
    FTT is 2. All flash. No data services.
    Latency on what? Disk I/O? If so, zero impact.
    Most of our automation is in powercli, but we also use a bit of python in places for this.
     
    Shockey likes this.
  32. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005

    Stupid question cuz I'm like that.. hah

    I've only done ESXi 6.7 with local storage and no other experience with ESXi, but... now the dumb q..
    Is it FreeNAS iSCSI that you're connecting to? I gotta read up on vSAN...
     
  33. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    no, just using it for NFS storage.
    One of these days I want to do iSCSI boot for ESXi hosts. But... I need a proper NAS. Not the thing I hobbled together that has garbage performance.
     
    TeleFragger likes this.
  34. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005

    define hobbled? I'm going to do a FreeNAS box and am curious what you consider so... I just posted a thread in Virtualized Computing on my hardware looking for thoughts on what to use as what...
     
  35. gimp

    gimp [H]ardForum Junkie

    Messages:
    9,795
    Joined:
    Jul 25, 2008
    spare parts I had lying around.
    4 hdds, 3 different sizes (2tb, 3tb, 4tb)
    celeron
    8gb RAM
     
    TeleFragger likes this.
  36. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    yeah I've got E5's and 24GB/64GB and 128GB RAM, but I too have various drives as well...