VMware vSAN - Single Disk Noncompliant

gimp · Dec 28, 2018

Just for funsies, and to see what this vSAN thing is all about, decided to build a 2-node VMware vSAN cluster.
Hosts are running on 8th Gen NUC i7's, 32GB RAM, 1x 250gb nvme, 1x 2tb SSD, booting off old USB thumbsticks.

Took a bit to figure out how to get it all working, but I managed.

Only one snag, which I don't think is directly related to the 2-node set up (with witness appliance).

The VMware vSAN witness appliance has 3 disks.
Currently, 1 of those 3 disks is out of compliance with the storage policy.
It reports that one disk as RAID0, when everything else is RAID1; and policy states RAID1.
All of the disks in the VCSA are properly reported as RAID1.
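
For reference, a RAID1 (mirroring) policy keeps FTT + 1 full copies of each object and needs 2 × FTT + 1 fault domains, which is why a 2-node cluster relies on the witness as the third one. A minimal Python sketch of that arithmetic follows. The function name and the 100 GB disk size are made up for illustration, and small witness components are ignored in the capacity figure:

```python
def raid1_requirements(ftt: int, vmdk_gb: float) -> dict:
    """Rough RAID-1 (mirroring) footprint for a failures-to-tolerate value.

    Hypothetical helper for illustration only; not part of any VMware API.
    """
    if ftt < 0:
        raise ValueError("ftt must be >= 0")
    replicas = ftt + 1  # full data copies kept by mirroring
    return {
        "replicas": replicas,
        "min_fault_domains": 2 * ftt + 1,  # hosts (or host + witness) needed
        "raw_capacity_gb": vmdk_gb * replicas,  # excludes witness components
    }


# FTT=1 is the mirrored case described above; FTT=0 is a single unmirrored copy.
print(raid1_requirements(1, 100))
print(raid1_requirements(0, 100))
```

With FTT = 0 there is only one copy and no mirror, which is the kind of layout a RAID1 policy would flag as noncompliant.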

I've tried creating a new storage policy and applying it, but that doesn't change anything.
I also tried doing the "Repair Objects Immediately" option from the vSAN Health menu, under the vSAN cluster/monitor.

Nothing seems to want to correct it.
I've done a whole lot of searching, but everything I find is related to all or multiple disks in a noncompliant state. In those cases, it was usually creating and applying a new policy that got things working.

any tips? what might I be doing wrong? bug in something else?
I have already updated everything with the latest patches (2 physical hosts, witness host, and VCSA)

gimp · Dec 30, 2018

Well never mind.
vcenter crashed, and now all VMs are "unknown" status and "name" is just a single digit number.
browsing the vSAN datastore isn't showing the VCSA or vSAN witness appliance vmdks on either host.
neat...
time to rebuild.

Child of Wonder · Dec 30, 2018

This is why I avoid VSAN like the plague.

gimp · Dec 30, 2018

Child of Wonder said:
This is why I avoid VSAN like the plague.

I've heard a fair amount of good.
And really, just using the 60-day trial to test it out because I was curious. I also haven't gotten any experience with vSphere 6.7 yet, as we are still on 6.0 due to other infrastructure compatibility pieces that we need to upgrade first.

since "hyperconverged" is the latest fad, I wanted to check it out.
and I don't have anything capable of shared storage atm; or, well, current setup/physical placement of devices makes it not very capable.

gimp · Jan 7, 2019

gotta say, maybe it's because I'm not 10gbe yet, running on NUCs, and it's only a 2-node stretched cluster with a nested witness host, but... none of that should be causing these issues.
Occurred this morning as well, and it only seems to occur after storage vmotioning a single disk from an NFS share back to the vSAN datastore. And it doesn't happen until after the storage vmotion actually completes.
the 2 nodes are connected to the same switch and I doubt I'm saturating 1gb.
Both nodes show similar issues, while there are no issues connecting to the nodes themselves.

not terribly impressed.

Shockey · Jan 11, 2019

Why is SCSI controller showing yellow?

I assume you are running this on 6.7 based on screenshots. Those are nicer than 6.0 and 6.5

gimp · Jan 11, 2019

Shockey said:
Why is SCSI controller showing yellow?

I assume you are running this on 6.7 based on screenshots. Those are nicer than 6.0 and 6.5

controller isn't on HCL, or at least not certified

gimp · Jan 13, 2019

so today... think I destroyed vsan again, by powering both hosts off.
per VMware, everything should get re-established.
https://blogs.vmware.com/virtualblocks/2018/07/13/understanding-ve-booting-w-vc-unavailable/
sadly, it did not. Or, has not.
Even after sitting for over 30 minutes.
each host did eventually report vsan datastore size and usage, but data reported was what the individual host was providing; not the total.
VMs showing back up with names as just numbers and status of invalid.

Powered my hosts off so I could validate up-to-date firmware on the Corsair m.2 and SanDisk SSD.
Had to use the hotswap bays on my media/fileserver for the SanDisk.
Ran the "long" test with the SanDisk SSD software. Both came back clean.
Also up to date firmware.
Firmware on Corsair m.2 up to date as well.

Yet on one host...
event.WARNING - VSAN device t10.ATA___SanDisk_SDSSDH32000G_______________183795801332_____ is degrading. Consider replacing it..fullFormat (WARNING - VSAN device t10.ATA___SanDisk_SDSSDH32000G_______________183795801332_____ is degrading. Consider replacing it.)

So now trying to run a test on the SSD while in the NUC; bit problematic figuring out what/how to do that, as I would prefer to use a live boot disk so I don't have to blow it all away again.

currently running a badblocks read-only test off a gparted live disk.

danswartz · Jan 13, 2019

When do you call it quits, here?

gimp · Jan 13, 2019

danswartz said:
When do you call it quits, here?

well smartctl long test passed on the ssd.
running badblocks non-destructive read-write test.
if that passes then I'd have to say the issue is compatibility-related, and I'd be calling it quits with vSAN attempts for now.
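
For anyone repeating these checks from a live Linux disk, here is a minimal Python sketch that just builds and prints the commands rather than running them. The device path `/dev/sdX` and the helper name are placeholders, not what I actually typed; the flags are the standard smartctl and badblocks ones (`-t long` for the extended self-test, `-n` for the non-destructive read-write mode, which needs the device unmounted):

```python
import shlex
from typing import List


def disk_check_commands(device: str) -> List[List[str]]:
    """Return the disk checks as argument lists (hypothetical helper)."""
    return [
        ["smartctl", "-t", "long", device],  # start the extended SMART self-test
        ["smartctl", "-a", device],          # read the results once it finishes
        ["badblocks", "-sv", device],        # read-only scan (the default mode)
        ["badblocks", "-nsv", device],       # non-destructive read-write scan
    ]


# /dev/sdX is a placeholder; substitute the real device before running anything.
for cmd in disk_check_commands("/dev/sdX"):
    print(shlex.join(cmd))
```

Printing first and pasting by hand is deliberate, since pointing badblocks at the wrong device is an easy way to ruin a day.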

Child of Wonder · Jan 13, 2019

Child of Wonder said:
This is why I avoid VSAN like the plague.

gimp · Jan 14, 2019

well I just fucked up.
I forgot I had disabled the vmw_ahci module on host 1, but never rebooted it.

oops.

vsan is back up.
still not sure what's going on with host 2.
even badblocks non-destructive read-write test came back with 0 errors, but host is still barking about the drive "degrading"

Child of Wonder · Jan 15, 2019

Dude, VSAN is steaming garbage. Just quit banging your head against a wall and buy a cheapo used Synology box or build a FreeNAS box. You'll have far less sleepless nights. Trust me, you've only just started to find all the problems and idiosyncrasies VSAN has to offer.

gimp · Jan 15, 2019

Child of Wonder said:
Dude, VSAN is steaming garbage. Just quit banging your head against a wall and buy a cheapo used Synology box or build a FreeNAS box. You'll have far less sleepless nights. Trust me, you've only just started to find all the problems and idiosyncrasies VSAN has to offer.

considering the host itself is reporting issues with my disk but no testing reports any issues, methinks it's not vsan related.
and, as I want to get some experience with it just because, well... your solution is not a solution.

danswartz · Jan 16, 2019

Child of Wonder said:
Dude, VSAN is steaming garbage. Just quit banging your head against a wall and buy a cheapo used Synology box or build a FreeNAS box. You'll have far less sleepless nights. Trust me, you've only just started to find all the problems and idiosyncrasies VSAN has to offer.

You misspelled 'idiotsyncrasies'

H2R2P2 · Jan 16, 2019

danswartz said:
You misspelled 'idiotsyncrasies'

No he didn't..

danswartz · Jan 16, 2019

H2R2P2 said:
No he didn't..

That was a joke...

H2R2P2 · Jan 16, 2019

danswartz said:
That was a joke...

Touche!

Modder man · Jan 17, 2019

Suggesting vSAN is garbage because it doesn't run properly on hardware it was never intended to run on does not make any sense... I have hundreds of servers running vSAN and it works out quite well. There are caveats to its use, like with any solution. This particular use case is not the one vSAN was intended for.

Eulogy · Jan 18, 2019

Child of Wonder said:
Dude, VSAN is steaming garbage. Just quit banging your head against a wall and buy a cheapo used Synology box or build a FreeNAS box. You'll have far less sleepless nights. Trust me, you've only just started to find all the problems and idiosyncrasies VSAN has to offer.

We have a few dozen PBs of vSAN in production, and a few hundred more TB in labs without any real issues.

Shockey · Jan 26, 2019

Eulogy said:
We have a few dozen PBs of vSAN in production, and a few hundred more TB in labs without any real issues.

If you don't mind me asking, what is the maximum number of nodes per cluster you use with vSAN?

gimp · Jan 26, 2019

Related to my topic...

since I destroyed the vSAN cluster, node2 no longer complains of "degrading" health on the SSD.

ended up piecing together some old drives and mobo and loaded up FreeNAS until I can build myself a proper NAS.
Was hoping to hack together my old MediaSmart Server, but looks like I'll need a proper breakout cable for video out since the thing completely fails to boot (possibly POST?) with a bootable thumbdrive.

Gotta lab up a project I have coming up, and our prod environment is our test environment at work. So breaking down the 6.7 stuff.
I'd rather go through trials and tribulations on my garbage first.

Eulogy · Jan 29, 2019

Shockey said:
If you don't mind me asking, what is the maximum number of nodes per cluster you use with vSAN?

32, and our smallest is 8.

Shockey · Feb 17, 2019

Eulogy said:
32, and our smallest is 8.

how long does a 32 Node vsan take to upgrade?

Eulogy · Feb 18, 2019

Shockey said:
how long does a 32 Node vsan take to upgrade?

Define upgrade? I'm actually not even sure I know, really, as our automation just rolls through when we tell it to. Next time we upgrade, if I remember to, I can see how long a single node takes and extrapolate that out, but how long a thing takes isn't a metric I particularly care to concern myself with.

Shockey · Feb 18, 2019

Eulogy said:
Define upgrade? I'm actually not even sure I know, really, as our automation just rolls through when we tell it to. Next time we upgrade, if I remember to, I can see how long a single node takes and extrapolate that out, but how long a thing takes isn't a metric I particularly care to concern myself with.

Ahhh, Automation explains it.

Like tell VUM to go and upgrade entire cluster from 6.0 to 6.5/6.7U1

Eulogy · Feb 18, 2019

Shockey said:
Ahhh, Automation explains it.

Like tell VUM to go and upgrade entire cluster from 6.0 to 6.5/6.7U1

We don't use VUM.. and we're already on 6.7U1

Modder man · Feb 20, 2019

Eulogy said:
We don't use VUM.. and we're already on 6.7U1

We also do not use VUM, All host updates are automated and take between 30-50 minutes per host. We have ~10 or so hosts upgrading at any given time.

Child of Wonder · Feb 20, 2019

Modder man said:
We also do not use VUM, All host updates are automated and take between 30-50 minutes per host. We have ~10 or so hosts upgrading at any given time.

A 32 node cluster would take an entire day to upgrade? Is this with moving data during each host going into maintenance mode? All flash or hybrid? What FTT? Any data services enabled? What's baseline latency and how is it affected with hosts going offline?
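
Rough math behind that "entire day" guess, as a Python sketch. It assumes the hosts in one cluster are upgraded strictly one at a time and that the quoted 30-50 minutes covers everything for a host, including data evacuation; neither is confirmed above, and the function name is made up:

```python
def serial_upgrade_hours(hosts: int, minutes_per_host: float) -> float:
    """Total wall-clock hours if hosts are upgraded one after another."""
    return hosts * minutes_per_host / 60


low = serial_upgrade_hours(32, 30)   # best case from the quoted range
high = serial_upgrade_hours(32, 50)  # worst case from the quoted range
print(f"{low:.1f} to {high:.1f} hours")
```

That works out to roughly 16 to 27 hours for 32 nodes, so "an entire day" only holds if nothing in the cluster runs in parallel.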

Shockey · Feb 20, 2019

Modder man said:
We also do not use VUM, All host updates are automated and take between 30-50 minutes per host. We have ~10 or so hosts upgrading at any given time.

What do you use? PowerCLI?

Eulogy · Feb 20, 2019

Child of Wonder said:
A 32 node cluster would take an entire day to upgrade? Is this with moving data during each host going into maintenance mode? All flash or hybrid? What FTT? Any data services enabled? What's baseline latency and how is it affected with hosts going offline?

Like I said, that was a gut guess as I'm not really sure on the timing. We don't sit and watch things. If it takes an hour or a day to upgrade a cluster, it doesn't matter to us.
When we place a host into maint mode for upgrade, we do a full data evac, not ensure accessibility.
FTT is 2. All flash. No data services.
Latency on what? Disk I/O? If so, zero impact.

Shockey said:
What do you use? PowerCLI?

Most of our automation is in powercli, but we also use a bit of python in places for this.

TeleFragger · Feb 20, 2019

gimp said:
Related to my topic...

since I destroyed the vSAN cluster, node2 no longer complains of "degrading" health on the SSD.

ended up piecing together some old drives and mobo and loaded up FreeNAS until I can build myself a proper NAS.
Was hoping to hack together my old MediaSmart Server, but looks like I'll need a proper breakout cable for video out since the thing completely fails to boot (possibly POST?) with a bootable thumbdrive.

Gotta lab up a project I have coming up, and our prod environment is our test environment at work. So breaking down the 6.7 stuff.
I'd rather go through trials and tribulations on my garbage first.

Stupid question cuz im like that..hah

I've only done esxi 6.7 with local storage and no other experience with esxi but... Now the dumb q..
Is the freenas iscsi that you're connecting to? I gotta read up on vSAN...

gimp · Feb 20, 2019

TeleFragger said:
Stupid question cuz im like that..hah

I've only done esxi 6.7 with local storage and no other experience with esxi but... Now the dumb q..
Is the freenas iscsi that you're connecting to? I gotta read up on vSAN...

no, just using it for NFS storage.
One of these days I want to do iSCSI boot for ESXi hosts. But... I need a proper NAS. Not the thing I hobbled together that has garbage performance.

TeleFragger · Feb 21, 2019

gimp said:
Not the thing I hobbled together that has garbage performance.

define hobbled? I'm going to do a freenas box and curious what you consider so... I just posted a thread in virtualized computing on my hardware lookin for thoughts on what to use what as...

gimp · Feb 21, 2019

TeleFragger said:
define hobbled? I'm going to do a freenas box and curious what you consider so... I just posted a thread in virtualized computing on my hardware lookin for thoughts on what to use what as...

spare parts I had lying around.
4 hdds, 3 different sizes (2tb, 3tb, 4tb)
celeron
8gb RAM

TeleFragger · Feb 21, 2019

gimp said:
spare parts I had lying around.
4 hdds, 3 different sizes (2tb, 3tb, 4tb)
celeron
8gb RAM

yeah I've got e5's and 24gb/64gb and 128gb ram, but I have various drives as well...

gimp · Aug 12, 2019

got another NUC, so 3.
Recently'ish picked up a Ubiquiti ES-16-XG 10Gb switch.
Picked up 3x QNAP QNA-T310G1S (TB3 to SFP+) as Aquantia does have ESXi drivers that support the AQC-100 chip.
Rebuilt my homelab, using vSAN.
No issues.

Figured out how to silence the compliance pieces. There's a Ruby vSphere Console where you can silence the health checks (along with a lot of other things.)
