So we decided to upgrade our HDS AMS500 SAN last year (totalling about 50TB in a mix of SATA and SAS) to what we thought would be the next version of their product, the AMS2100. We also had Dell knocking on our door about Compellent. It sounded great: they came in at a price point well under what Hitachi could offer, and with features (automated tiering) that the Hitachi guys couldn't offer without going to the USP range, which was well above us.
The upgrade went in well. We migrated a few things over and performance was great. Watching the automated tiering was a bit scary at first, but after a couple of months of letting it do its thing, I had no issues letting it do what it does best. We slowly migrated the ESXi cluster over to the Compellent, and it's been running like that for the last few months without issue.
Fast forward to last night: at about 10:15, VMware sent me some emails about lost redundancy (which I failed to look at). When I arrived the next morning, I found that the shit had hit the fan.
Both controllers were down. I couldn't ping either, and there was no web interface. Looking at the box, the fibre channel was down, though the network ports had some lights.
I contacted Co-Pilot, and from what I could gather, it seems both controllers go into a reboot loop complaining about failed controllers, and then sit in SafeMode until the problem is resolved.
The problem? Well, that's still to come. I've uploaded the core dumps and will hopefully get an answer tomorrow.
It's quite frustrating: we made sure the solution we accepted was redundant in theory, and yet this happened to both controllers at the same time. Not really sure what else to say.
Is this a rant? Probably, sorry for that. I'm not asking for help on what to do; I'm sure it will be handled by Dell. I just felt the need to type this one out.