Petabyte Storage

madrebel · Nov 13, 2012

kdh said:
Im 2n for most things

are you really though? are your raid sets configured evenly across cabinets? its one thing to be 2N at the head/controller node its another to be 2N across raid sets across cabinets and JBODs. that can be done with EMC but it isn't as easy.

then you should have the money to buy a branded hardware solution if you really care about your data and business that much.

if you can accomplish the same thing for 1/4 the cost, why wouldn't you?

that impacts your nexenta install, but it will be outside the realm of nexenta support.(Hardware).

you would be surprised how good their support is. further, i also have a hardware partner to fall back on.

Unless there is a training class that has the exact config you have, it would be impossible for someone to pick it up out of the chute.

a good storage admin would figure it out in a week while working with nexenta support who has full documentation of my configs.

I can get 5 9's of uptime with my gear, and deliver the performance my business partners need.

and? really not terribly difficult to hit 5 9s of availability. uptime be damned idc if i reboot my heads monthly, i don't but as an example if that was required (it isnt) as long as the data is available is all that matters.
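For scale, the "5 9s" figure both posters keep citing can be turned into a yearly downtime budget. This is a minimal sketch of the standard arithmetic, assuming a 365-day year; the function name is made up for illustration.

```python
# Yearly downtime budget for a given number of "nines" of availability.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(nines: int) -> float:
    """Return the downtime allowed per year, in minutes, for N nines."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Five nines works out to a little over five minutes of unavailability per year, which is why the distinction between "uptime" of a single head and "availability" of the data matters here.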

and you can pretty much expect it to perform the same way every time. You can't do that with a DIY solution. It's impossible.

i cant or you cant? cause it is quite doable and done a lot.

Your hardware/software stack could completely break because your server manufacture did a firmware update that busted the way your sas cards work.

i read updates and unless i plan on upgrading CPUs i rarely upgrade the bios. if it isn't rebooting, crashing, or has a remote exploit what is the point?

I don't have that problem with a solution that my vendor tested for me when I do a code upgrade on it.

what about those flare updates that went wrong? i just did an OS upgrade that went fine with zero downtime ... that is what a HA system is for.

The amount of hardware variables are pretty small.

as opposed to my variables?

Lets see,
Intel servers ... same as EMC
STEC SSDs ... same as EMC
Seagate SAS drives ... same as EMC

JBODs made by the same folks who make them for EMC, just not with the EMC logo ... ok you got me there i'm totally hosed.

Youtube while a good example, is also a bad example. If a youtube video doesn't load.. People goto the next video..

that is a nice if, however when was the last time you went to a youtube video and it didn't load and wasnt removed for violation or deleted by the OP? Do you really think it is ok for youtube to randomly not serve a video request? you don't think they lose money for downtime? really?

With the work that I do, if the service my company provides doesn't work, we go out of business. I would almost guess the same for you, if one of your large data stores took a dump, there would be a fairly large impact to your business.

not just mine, i host datastores for quite a few fortune 1000. their admins are all the same, just like you. they come in hating on 'that nexenta thing' and during the migration there is always something that goes wrong and they instantly point the finger at me but it is ALWAYS some other problem. a few months later after zero downtime, better performance, and half the cost they all shut up.

While I do agree, yes, there are some terrified admins.. but that's a small view of things.

im in the data center business. people come to us for VPDC/cloud, IaaS, colo+services etc. typically they come to us to cut costs and improve uptime. typically the C levels are also somewhat disappointed in their own IT ... some of it is unjustified but much of it is (especially costs).

I think your view of the storage landscape is a bit jaded.

not just storage, dont get me started on cisco.

Anyone and any solution can provide giant dumping grounds of space. Not everyone can provide the level of performance,

my tier0 benchmarked out to 600k IOPs ... not over the wire i would lose probably 40% of that over the wire but none the less i hit 600k iops. my hybrid pools are good for 50k ish. i can go higher on the hybrids, just haven't had a need yet. once the quad socket sandy stuff is blessed by nexenta i double my controller throughput and at least double my IOP capability. i can do in place upgrades on that too, sure, its a new mobo/cpus/ram but it will be a zero downtime upgrade. will probably run about 30k for upgrade since i can reuse the ram and HBAs. how much would it cost to double the performance potential of your EMC heads?
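The IOPS figures in this paragraph reduce to simple arithmetic. A rough sketch, using only the numbers quoted above (600k local, a guessed 40% loss over the wire, and a doubling after the controller upgrade); the helper name is hypothetical.

```python
# Back-of-the-envelope IOPS figures from the numbers quoted in the post.

def over_the_wire(local_iops: int, loss_pct: int) -> float:
    """IOPS left after losing loss_pct percent to the network path."""
    return local_iops * (100 - loss_pct) / 100


LOCAL_TIER0_IOPS = 600_000   # benchmarked locally, per the post
WIRE_LOSS_PCT = 40           # the poster's own estimate, not a measurement

if __name__ == "__main__":
    wire = over_the_wire(LOCAL_TIER0_IOPS, WIRE_LOSS_PCT)
    print(f"over the wire today:      {wire:,.0f} IOPS")
    # "at least double my IOP capability" after the planned upgrade
    print(f"local after doubling:     {2 * LOCAL_TIER0_IOPS:,} IOPS")
    print(f"over the wire after that: {over_the_wire(2 * LOCAL_TIER0_IOPS, WIRE_LOSS_PCT):,.0f} IOPS")
```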

ease of use and level of uptime except a few key players.

ease of use is relative. i can train someone to present nfs or iscsi to clients in a week. replacing failed drives, 2 weeks. the ins and outs of the hardware config and why it is built the way it is might take another day or two. anyone that is comfortable with linux/unix can pick it up pretty easily really and there is a nice gui that makes certain things a breeze (cli is better imo but im weird like that).

I have zero problems presenting a 7 figure quote to my leadership when I know it's aligned with their goals and the level of service they expect from me.

ok cool, just know that if they ever start talking about cost cutting and go looking for a data center partner to help them cut costs or completely take over their hardware you're going to be competing with guys like me delivering the same or better performance at a lower cost.

I just went to Nexenta's website and did a search for "uptime".. Found 1 hit in regards to how things are monitored. I did the same with EMC's front page, and the first document, dated 2007, talking about CX3 CLARiiONs, has 5 9s of uptime right there on page 4. If a storage vendor doesn't brag about uptime, then it's a storage vendor I would be wary of.

tell your boss i'll sell him a 5 9's SLA for less than he spends on IT now ...

400% markup is silly, and rarely if ever see that. If my sales guy tried to pull those kinds of shenanigans.. then we just get a different sales guy.

EMC tried to buy my business. we gave them our requirements. 2 HA/clustered setups for two different locations with X usable space at the main location and half that at the DR site. we also had a raw IOP requirement for the main location. the first quote came back at just under 3 million (2x vnx7500s, 2 x 5700s). we said we'll continue exploring our options.

brought DDN in, great company, amazing hardware, still too expensive but they were under 7 figures.

by this time i had the hardware config done for a nexenta based HA solution. all LSI, all SAS, lots of SSD, 512GB ram in each server head, screaming fast ... total cost for 4 heads, licensing and all the JBODs/drives was just over 400k .. i also learned a lot from DDN in how they lay out the raid sets i hadn't thought of that before but now that is part of how i build my pools.

anyway, EMC kept calling and said they would do anything. we told them what we were building with nexenta and if they could match it ok cool. they dropped like 83% off list and were still 100K more expensive only now we didnt have HA at either site ...
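The quote figures in this story can be sanity-checked against each other. A quick sketch, assuming the rounded values from the posts above ("just under 3 million" list, "like 83%" off, "just over 400k" for the Nexenta build); the function name is illustrative.

```python
# Cross-check of the quoted prices, using rounded figures from the posts.

def discounted_price(list_price: int, discount_pct: int) -> float:
    """Price after taking discount_pct percent off list."""
    return list_price * (100 - discount_pct) / 100


EMC_LIST_QUOTE = 3_000_000   # "just under 3 million", rounded up
EMC_DISCOUNT_PCT = 83        # "like 83% off list"
NEXENTA_BUILD = 400_000      # "just over 400k", rounded down

if __name__ == "__main__":
    emc_price = discounted_price(EMC_LIST_QUOTE, EMC_DISCOUNT_PCT)
    print(f"EMC after discount: ${emc_price:,.0f}")
    print(f"gap vs. the build:  ${emc_price - NEXENTA_BUILD:,.0f}")
```

With these roundings the discounted quote lands near $510k, roughly $100k above the build, which matches the "still 100K more expensive" claim.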

i spent less and got over twice the usable space and a boat load more SSD AND got HA and you're telling me they aren't marking up 400%? i have seen their quotes. i've talked to my hardware partners who sell directly to EMC. their markup starts at 400% and if you run extreme volume they'll drop their shorts to move the hardware however what they will not do, under any circumstances is discount the support which is done on a pre-discounted rate.

if your setup costs 5 million but they really really love you and take 50 percent off and you spend 2.5 million for the hardware guess what the support is based on ... not 2.5 million ... 5 million.
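To see what that claim would mean in practice, here is a small sketch of the pricing model as described in this post (support billed against list price, not the discounted price). The 15% annual support rate is a made-up placeholder, not a figure from the thread or from any vendor.

```python
# Effect of billing support on list price while discounting only the hardware.
# ASSUMPTION: ANNUAL_SUPPORT_RATE is hypothetical, chosen only for illustration.

ANNUAL_SUPPORT_RATE = 0.15


def support_cost(list_price: float, rate: float) -> float:
    """Yearly support bill when support is based on list price."""
    return list_price * rate


def effective_rate(list_price: float, paid_price: float, rate: float) -> float:
    """Support bill expressed as a fraction of what was actually paid."""
    return support_cost(list_price, rate) / paid_price


if __name__ == "__main__":
    list_price, paid = 5_000_000, 2_500_000   # the example in the post
    print(f"support per year: ${support_cost(list_price, ANNUAL_SUPPORT_RATE):,.0f}")
    print(f"effective rate:   {effective_rate(list_price, paid, ANNUAL_SUPPORT_RATE):.0%}")
```

Whatever the real rate is, a 50 percent hardware discount under this model doubles the support rate relative to the money actually spent.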

how much do you buy 3TB SAS drives for from EMC? I'm guessing about 900 dollars, at best. that is 3 times the cost of buying a SAS drive direct from seagate. the only difference is a sticker and a special firmware. thats the only difference.

EMC are fucking criminals man.

tangoseal · Nov 13, 2012

I recommend complete professional top tier storage experts to come to your place of business and offer full consultation and planning. It sounds like you are looking at close to a $500,000 SAN system if not close to a $million.

madrebel · Nov 13, 2012

kdh said:
In EMC terms.. Nope. Not even close. The DAEs in both the CX/NS/VNX and V-Max are made for EMC. Provide a link to prove me wrong. Netapp.. I can't answer that.

you used to be able to buy the DAEs from the manufacturer, it may have been quanta who built them i forget. EMC eventually paid for exclusivity. Same thing for netapp only it was more recent with them. My hardware partner used to buy them.

made for and made by are different things. i can buy PCBs with my name stamped all over them too, really doesn't mean shit. its still intel, lsi, pmc sierra, arm, or whoever's actual silicon on there. EMC is the apple of storage. they run on a very defined set of hardware and they gouge you for the cool looking rack. they don't actually make anything other than software.

From a mechanical stand point, they are close to the same. But there are differences. Try looking up the part numbers from an EMC provided drive with a seagate label. You can't buy that model off the shelf.

you used to be able to flash seagate drives with EMC firmware. i dont think you can do that anymore but we used to do it all the time to save money. mechanically they're IDENTICAL. the only difference is firmware and an extra sticker.

Is the amount of money saved on a cheap solution worth it when your cheap solution takes a dump and wipes out the whole company? I don't think so.

define cheap solution. are you implying that my solution, using LSI controllers, intel cpu/nics, STEC SSDs, and seagate SAS drives is somehow not an enterprise solution?

yet your solution, using basically the same hardware is somehow less prone to failure because you paid four times what i did? our parts are made by the same manufacturers, yet i got all the shitty ones? really?

I agree with the above statement but there are times when the enterprise also needs to buck up and get real compute solutions instead of one off solutions. One off solutions will always cause problems down the road. You, as someone who stated being in the business for the last 20 years.. you can't tell me that some random 1 off solution didn't punch you in the IT face at some point in time.

define 1 off.

define real compute solutions cause i have racks of white box compute (intel, supermicro, and dell) down stairs orchestrated by razor/puppet for provisioning and presenting a mix of vcloud and openstack.

i have some 'one offs' for certain extreme database stuff and in some of those scenarios i'll dedicate hardware to solve the problem (or move the load away to drive more density). sometimes you just can't fix stupid DBAs (and or bloated code) and the only way to solve the problem is a terabyte of ram and 100% SSD storage. for the most part though proper VM clustering and NFS over 10gig handles everything else with ease.

i'm tellin yah dude ... names and spending huge amounts of money for them is a waste. spend 50% of the money and build in more redundancy. you get more performance and better availability. the tools and eco system are out there, it is a lot easier than you think.

madrebel · Nov 13, 2012

tangoseal said:
I recommend complete professional top tier storage experts to come to your place of business and offer full consultation and planning. It sounds like you are looking at close to a $500,000 SAN system if not close to a $million.

who me? its just over 500k at this point i recently added 96 more drives and 8 more TB of SSD. so grand total of 720TB raw spin disk and just under 30TB raw of SSD along with 8 zeusram write cache drives.

been up and running for 6 months, works great, performance is crazy good.
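Those totals give a rough cost per raw terabyte. A minimal sketch, assuming the rounded figures from this post ($500k, 720TB spinning, 30TB SSD) and ignoring the ZeusRAM devices, RAID overhead, and anything else that separates raw from usable capacity.

```python
# Rough cost per raw TB from the totals quoted in the post.

def cost_per_raw_tb(total_cost: float, spinning_tb: float, ssd_tb: float) -> float:
    """Total spend divided by total raw capacity (spinning + SSD)."""
    return total_cost / (spinning_tb + ssd_tb)


if __name__ == "__main__":
    # "just over 500k", "720TB raw spin disk", "just under 30TB raw of SSD"
    print(f"${cost_per_raw_tb(500_000, 720, 30):,.2f} per raw TB")
```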

Deleted member 82943 · Nov 14, 2012

interesting read kdh and madrebel.

@madrebel are you running 10gbit switches?

kdh · Nov 14, 2012

madrebel said:
are you really though? are your raid sets configured evenly across cabinets? its one thing to be 2N at the head/controller node its another to be 2N across raid sets across cabinets and JBODs. that can be done with EMC but it isn't as easy.

Yes. I have 3 vmaxs with my most critical data. each array has its own rack cluster, and the dbas can fail over to any cluster at any given point in time without me. 2 arrays in town, 1 in vegas. My tier 2/3 stuff is on vnx5700 or ns480, again 2 in town, 1 in vegas. I do some replication stuff, san copy, open replicator but nothing all that complicated. Few snaps and clones when needed.

if you can accomplish the same thing for 1/4 the cost, why wouldn't you?

You're beating cost into the ground, and while it's a valid point, you're missing the big picture. It goes back to, when you pay for a certain level of hardware, you’re paying for more than just the hardware. You’re also paying for the support, the software, the single person on the other end of the phone.

you would be surprised how good their support is. further, i also have a hardware partner to fall back on.

I don't doubt their support is good. But at some point, they’ll throw in the towel when something goes sideways and or they can't figure it out, and will most likely point fingers at the hardware vendor. Then you have 2 phone calls you have to make. That's double the amount of hold time, and double the amount of downtime.

a good storage admin would figure it out in a week while working with nexenta support who has full documentation of my configs.

Unlikely. It took me 6 months in my current role to figure out where things lay. That included ramping up on HDS and Xiotech, and understanding which of the 48 arrays and 18 cisco switches I had at the time were connected where, to which server, and how critical each server was. Talking just under 1PB of space, not including backups.

If it takes a week to learn your environment, then your environment isn't nearly as big as you claim.

and? really not terribly difficult to hit 5 9s of availability. uptime be damned idc if i reboot my heads monthly, i don't but as an example if that was required (it isnt) as long as the data is available is all that matters.

And Yes. Sure you can hit 5 9s on your hardware, but you’re most likely one of the few who got lucky.

i cant or you cant? cause it is quite doable and done a lot.

Both. If you build one of your storage subsystems today, you'll be hard pressed to build the same exact hardware config next year. But if you buy a vnx5700 a year ago, and one today, it will be exactly the same parts.

i read updates and unless i plan on upgrading CPUs i rarely upgrade the bios. if it isn't rebooting, crashing, or has a remote exploit what is the point?

If you are a publicly traded company, or do anything with PCI or SOX for that matter, your boxes will get patched on a regular basis. Otherwise you will be in violation of compliance. Your DIY system will get patched on a much more regular basis than most storage subsystems. If you're not a publicly traded company but host a publicly traded company, then you are bound to PCI2.0 standards, which means more patching. I'm personally working an issue with CentOS/Red Hat and Oracle where native multipathing and round robin aren't working right with LVM because of an upstream patch. That's an example of a current issue, but I could easily see your DIY solution going sideways due to an upstream HBA patch, sorta like a database company provider. *cough cough oracle exadata cough cough*

what about those flare updates that went wrong? i just did an OS upgrade that went fine with zero downtime ... that is what a HA system is for.

Closest I had a Flarecode upgrade go wrong was an SP didn't reboot. Didn't matter because all my LUNs trespassed to the other SP and my hosts never lost data. I unplugged the hung SP and the upgrade kept on trucking. In the 6 years I've been doing Symm (DMX1000, DMX3, V-Max20k, V-Max40k) I've never had a patch go sideways on me.

as opposed to my variables?

Lets see,
Intel servers ... same as EMC
STEC SSDs ... same as EMC
Seagate SAS drives ... same as EMC

That's just low hanging fruit.. Depending on how and when you build things, your Raid Cards, Motherboards, Network Interfaces could all be different. Its like trying to convince me that building one of your solutions on a HP G5 server is the same as building a G6 server a year later. Different platform, bunch of unknowns, and a different level of performance.

JBODs made by the same folks who make them for EMC, just not with the EMC logo ... ok you got me there i'm totally hosed.

We all know everything is made by Foxconn anyway so why does it matter? It matters because no one sells the generic EMC DAEs or a DAE that is interchangeable with another solution, and never could. I called a few internal EMC folks and asked. On top of that, you're not getting some dumb DAE, you're getting the software that makes that dumb DAE smart.

that is a nice if, however when was the last time you went to a youtube video and it didn't load and wasnt removed for violation or deleted by the OP? Do you really think it is ok for youtube to randomly not serve a video request? you don't think they lose money for downtime? really?

You tube loses a few cents if a video doesn't get serviced, whoopie doo... My company goes out of business if my data doesn't get served. Big big difference.

not just mine, i host datastores for quite a few fortune 1000. their admins are all the same, just like you. they come in hating on 'that nexenta thing' and during the migration there is always something that goes wrong and they instantly point the finger at me but it is ALWAYS some other problem. a few months later after zero downtime, better performance, and half the cost they all shut up.

Any "upgrade" to newer and better hardware is always going to seem faster and better. More so if said customer didn't have a good solution to begin with. However, if I had an oracle app that was shoving close to 40k random IOPS (which I do), and I thought it would be an awesome idea to put that app on some zfs data stores instead of ASM volumes? I'd be looking for a new job.

im in the data center business. people come to us for VPDC/cloud, IaaS, colo+services etc. typically they come to us to cut costs and improve uptime. typically the C levels are also somewhat disappointed in their own IT ... some of it is unjustified but much of it is (especially costs).

The above makes sense, just doesn't apply to me. On top of that, if folks are going to you to host their systems, then they are small shops without the demand for dedicated systems or they have bursty traffic. My hardware does a bunch of vmware, but my biggest user of storage is Oracle. Only up until recently did Oracle give the green light to run Oracle on Vmware. There is no way in hell I'd run my OLTP platform behind vmware, with zfs data stores. You're just asking for performance problems. Furthermore, I've done business with 4 different data center providers, and not some chop shops either. The ones I've worked with, their hosting and cloud services didn't even come close to meeting my needs. They flat out said they'd have to buy an HDS or EMC array to do what we need.

not just storage, dont get me started on cisco.

why not.. UCS is one hell of a platform. Big wins all around with it in my shop. We will be doing more with cisco.

my tier0 benchmarked out to 600k IOPs ... not over the wire i would lose probably 40% of that over the wire but none the less i hit 600k iops. my hybrid pools are good for 50k ish. i can go higher on the hybrids, just haven't had a need yet. once the quad socket sandy stuff is blessed by nexenta i double my controller throughput and at least double my IOP capability. i can do in place upgrades on that too, sure, its a new mobo/cpus/ram but it will be a zero downtime upgrade. will probably run about 30k for upgrade since i can reuse the ram and HBAs.

Totally and completely disagree. A tier0 pool with 600K IOPS would need over 840 Flash/EFDs in a mirrored config or 60000 450gig 15K FC drives to pull it off. That's a massive system. It becomes laughable and even more unbelievable with 60k 450gig 15k drives.

http://www.wmarow.com/storage/strcalc.html
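The drive counts above imply a per-device IOPS figure, which is easy to back out. A hedged sketch: the two device counts come from this post, while the 180 IOPS per 15K drive used in the last line is a commonly quoted rule of thumb and an assumption here; RAID write penalty, caching, and read/write mix are all ignored.

```python
import math

# Per-device IOPS implied by the drive counts in the post, plus a naive
# "how many devices for a target" estimate.

TARGET_IOPS = 600_000


def implied_iops_per_device(target_iops: int, device_count: int) -> float:
    """IOPS each device must deliver if device_count devices hit the target."""
    return target_iops / device_count


def devices_needed(target_iops: int, iops_per_device: float) -> int:
    """Naive device count: no RAID penalty, no cache, no controller limits."""
    return math.ceil(target_iops / iops_per_device)


if __name__ == "__main__":
    print(f"840 EFDs implies    {implied_iops_per_device(TARGET_IOPS, 840):.0f} IOPS each")
    print(f"60000 15K implies   {implied_iops_per_device(TARGET_IOPS, 60_000):.0f} IOPS each")
    # ASSUMPTION: ~180 IOPS per 15K drive is a rule of thumb, not from the thread.
    print(f"at 180 IOPS/drive:  {devices_needed(TARGET_IOPS, 180)} drives")
```

The answer is very sensitive to the per-device number you plug in, which is most of what the two posters are arguing about.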

There is no single server on the market where you could cram that many controller cards to pull it off. If you are doing HA on something that large, it would be huge. You're talking at least 8 servers, and at least 5 racks of disk drives in JBODs. Then the internal HBAs in your servers would be completely maxed out, as the fan-out ratio on server based HBAs isn't that high. You would completely crush the bus of every one of those systems. Even if you somehow are pulling it off, the numbers in terms of cost you keep kicking around in your posts don't even come close to adding up.

how much would it cost to double the performance potential of your EMC heads?

Depends on which quote the sales guy sends me.. the 400% mark up one, or the real ones i'm used to getting. *wink*

ease of use is relative. i can train someone to present nfs or iscsi to clients in a week.

A week? Wow.

I trained someone to present an NFS share via the Isilon GUI in 10 minutes. Took me an hour on Celerra.

replacing failed drives, 2 weeks.

2 weeks to replace a drive? I do it in 4 1/2 hours. 4 hours is just me waiting for the part to land onsite.

the ins and outs of the hardware config and why it is built the way it is might take another day or two. anyone that is comfortable with linux/unix can pick it up pretty easily really and there is a nice gui that makes certain things a breeze (cli is better imo but im weird like that).

Spoken like a true linux expert. You fail to realize that not every storage admin out there is as highly proficient at linux as you. Just because you can nail it every time doesn't mean everyone else can. It's also a very small shop way of thinking. When you have a team of 12+ folks, there can't be 1 go-to guy who knows everything and only gives out small nuggets of wisdom from time to time. Everyone has to know what's going on, and how to fix something when it goes sideways.

ok cool, just know that if they ever start talking about cost cutting and go looking for a data center partner to help them cut costs or completely take over their hardware you're going to be competing with guys like me delivering the same or better performance at a lower cost.

It will never happen. I'll say it again. I work in the financial sector.. Due to the way our PCI2.0 regulations are defined, we must have 100% positive control of the data. When you allow an outside source access to your data, you just violated the policy. When you run the risk of losing the ability to do business with visa and mastercard.. You hold onto your data.

tell your boss i'll sell him a 5 9's SLA for less than he spends on IT now ...

And my boss will walk away. You keep missing my point. You keep talking about cost savings, and I'll say it again. When you buy a hardware stack, it has a known proven track record and will function the same way every time. You also get a lot more than just a hardware stack, you get a whole level of support that you don't get with your DIY solution. When things go sideways, your DIY solution has you calling every one of the X vendors that make it function.

EMC tried to buy my business. we gave them our requirements. 2 HA/clustered setups for two different locations with X usable space at the main location and half that at the DR site. we also had a raw IOP requirement for the main location. the first quote came back at just under 3 million (2x vnx7500s, 2 x 5700s). we said we'll continue exploring our options.

3 million? That's list, and they most likely didn't take you seriously. They knew you were shopping around and weren’t really all that interested to begin with.

brought DDN in, great company, amazing hardware, still too expensive but they were under 7 figures.

by this time i had the hardware config done for a nexenta based HA solution. all LSI, all SAS, lots of SSD, 512GB ram in each server head, screaming fast ... total cost for 4 heads, licensing and all the JBODs/drives was just over 400k .. i also learned a lot from DDN in how they lay out the raid sets i hadn't thought of that before but now that is part of how i build my pools.

400K is nothing to laugh at, but it’s still a complicated 1 off solution that only you know the ins and outs of. That makes sense for small shops.. but not large ones.

anyway, EMC kept calling and said they would do anything. we told them what we were building with nexenta and if they could match it ok cool. they dropped like 83% off list and were still 100K more expensive only now we didnt have HA at either site ...

83% isn't shocking. End of quarter and they'll work massive magic. How'd you lose HA? I'm confused as to how you define HA. You can't interconnect 2 VNXs so they run active/active. It doesn't work that way.. UNLESS, you have v-plex in front of them.

i spent less and got over twice the usable space and a boat load more SSD AND got HA and you're telling me they aren't marking up 400%? i have seen their quotes.

Again, a custom 1 off solution that works for you. But doesn't work and or apply to everyone else.

I’ve talked to my hardware partners who sell directly to EMC.

I highly doubt you've talked to the sales guys over at FoxConn, Cisco(vblock). Hitachi, and Seagate about the costs and prices that they set with EMC. Even if you did, I wouldn't brag about it on a public message board for fear of violating some NDA.

Their markup starts at 400% and if you run extreme volume they'll drop their shorts to move the hardware however what they will not do, under any circumstances is discount the support which is done on a pre-discounted rate.

You might get 400% markup, but hardly anyone else does. I know I don't.

if your setup costs 5 million but they really really love you and take 50 percent off and you spend 2.5 million for the hardware guess what the support is based on ... not 2.5 million ... 5 million.

Not even close to accurate. Are you just pulling stuff out of the air right now?

Support isn't based on total cost of solution, but based on the hardware itself which is pro-rated over 3 years, the amount of space used, and license used. I know because I do all the support for my shop.

how much do you buy 3TB SAS drives for from EMC? I'm guessing about 900 dollars, at best. that is 3 times the cost of buying a SAS drive direct from seagate. the only difference is a sticker and a special firmware. thats the only difference.

That drive may cost me 900 bucks(if that), but when it takes a dump, I'll have a replacement in my hands in less than 4 hours from the local parts depot. Call Seagate and ask them to ship you a drive in 4 hours, and let me know how that works out for you.

EMC are fucking criminals man.

Everyone is entitled to their own opinion.. But I completely disagree.

kdh · Nov 14, 2012

madrebel said:
you used to be able to buy the DAEs from the manufacturer, it may have been quanta who built them i forget. EMC eventually paid for exclusivity. Same thing for netapp only it was more recent with them. My hardware partner used to buy them.

Doubt it. I've had the joy of replacing parts in cx3s, cx4s, nss, and vnx, and they all said foxconn.

Now prior to that, i'm not sure, but you're talking 10+ years ago, so it's completely null and void at this time.

made for and made by are different things. i can buy PCBs with my name stamped all over them too, really doesn't mean shit. its still intel, lsi, pmc sierra, arm, or whoever's actual silicon on there. EMC is the apple of storage. they run on a very defined set of hardware and they gouge you for the cool looking rack. they don't actually make anything other than software.

When you run on a very defined set of hardware and combine that with software, you get unparalleled performance and a guaranteed uptime with little fuss. The racks ain't cool either. They design the hardware, and then foxconn builds it. You are sadly mistaken if you think you can buy the same exact hardware that makes up a v-max engine or clariion service processor.

you used to be able to flash seagate drives with EMC firmware. i dont think you can do that anymore but we used to do it all the time to save money. mechanically they're IDENTICAL. the only difference is firmware and an extra sticker.

Not true.. only up until Flare 30 did they allow you to flash the drives yourself. They didn't allow you to get the firmware. You had to have an EMC CE download it and do it.

define cheap solution. are you implying that my solution, using LSI controllers, intel cpu/nics, STEC SSDs, and seagate SAS drives is somehow not an enterprise solution?

Honest answer? Yes. You may be using it in an enterprise, but it's far from an enterprise solution. It's a complicated custom 1 off that no one else has. That in itself makes it a scary solution to support for anyone that may have to fill your shoes one day.

yet your solution, using basically the same hardware is somehow less prone to failure because you paid four times what i did? our parts are made by the same manufacturers, yet i got all the shitty ones? really?

Standards. I can go from my shop, walk in the door of another, log into SMC or Unisphere and expect to see the same exact stuff from location to location. I never said your parts were shitty. But I know I can get my replacement parts in less than 4 hours. Can you say the same?

define 1 off.

Your solution doesn't exist in any other shop but your own. That makes it a 1 off.

define real compute solutions cause i have racks of white box compute (intel, supermicro, and dell) down stairs orchestrated by razor/puppet for provisioning and presenting a mix of vcloud and openstack.

If that stuff works for you, awesome. But it has its limits.

i have some 'one offs' for certain extreme database stuff and in some of those scenarios i'll dedicate hardware to solve the problem (or move the load away to drive more density). sometimes you just can't fix stupid DBAs (and or bloated code) and the only way to solve the problem is a terabyte of ram and 100% SSD storage.

What you call a 1 off for DBAs, I call normal for my OLTP and data warehouse applications. But if you're using 100% SSD, you're still doing it wrong. Any modern system with built in automatic storage tiering will negate the need for 100% ssd.

for the most part though proper VM clustering and NFS over 10gig handles everything else with ease.

In your world, yes. But that doesn't apply to everyone else.

i'm tellin yah dude ... names and spending huge amounts of money for them is a waste. spend 50% of the money and build in more redundancy. you get more performance and better availability. the tools and eco system are out there, it is a lot easier than you think.

Everyone has an opinion, and I disagree with the above statement. In the long term, a DIY solution will punch you in the face.. it always does.

kdh · Nov 14, 2012

madrebel said:
who me? its just over 500k at this point i recently added 96 more drives and 8 more TB of SSD. so grand total of 720TB raw spin disk and just under 30TB raw of SSD along with 8 zeusram write cache drives.

been up and running for 6 months, works great, performance is crazy good.

Nah, I think this was directed at the OP, whose thread you and I completely hijacked.

whoops.

Child of Wonder · Nov 14, 2012

Bottom line is this --

Custom built SANs or software based SANs fill a particular niche when businesses want to cut costs. However, as a former Sys Admin and now a consultant, I would NEVER stake my job or career on a solution like that. Sure, everything may run fine and you may get kudos for saving the company money but when shit hits the fan and you end up with significant downtime or problems with the solution where does the blame fall? Squarely on your shoulders. It's YOUR baby. If something goes wrong I want to get an EMC, Netapp, or Hitachi on the phone and get it figured out ASAP and let them take the blame.

I have absolutely no doubt that Nexenta is a great product and have many, many happy customers. But when I walk into a Fortune 500 company and talk to them about storage solutions, if I asked them "hey, have you considered building your own Linux SAN?" I would get laughed out of the room and my competitors would sell them an EMC, Netapp, or HDS system.

When Nexenta builds their own software AND hardware platform and establishes a track record with that platform of reliability and performance, then they may enter the discussion. Until then, I wouldn't recommend them except to very small businesses who care only about cost.

EDIT:

While checking on a Nexenta 4 ETA I found a perfect example on why I would not stake my career on a product like Nexenta. While all vendors are guilty of code problems (EMC dual control station upgrades, VMware 3.5 U2, etc.) this problem with Nexenta brings down the entire SAN. After an upgrade to 3.1.3 QLogic FC cards refuse to work and only a rollback to 3.1.2 fixes them. It's not just a Community Edition problem either, it's in Enterprise as well. They released 3.1.3.5 that says it's fixed but it wasn't. This has been an issue for 5 months.

http://nexentastor.org/boards/1/topics/7009

kdh · Nov 14, 2012

Child of Wonder said:
Bottom line is this --

Custom built SANs or software based SANs fill a particular niche when businesses want to cut costs. However, as a former Sys Admin and now a consultant, I would NEVER stake my job or career on a solution like that. Sure, everything may run fine and you may get kudos for saving the company money but when shit hits the fan and you end up with significant downtime or problems with the solution where does the blame fall? Squarely on your shoulders. It's YOUR baby. If something goes wrong I want to get an EMC, Netapp, or Hitachi on the phone and get it figured out ASAP and let them take the blame.

I have absolutely no doubt that Nexenta is a great product and have many, many happy customers. But when I walk into a Fortune 500 company and talk to them about storage solutions, if I asked them "hey, have you considered building your own Linux SAN?" I would get laughed out of the room and my competitors would sell them an EMC, Netapp, or HDS system.

When Nexenta builds their own software AND hardware platform and establishes a track record with that platform of reliability and performance, then they may enter the discussion. Until then, I wouldn't recommend them except to very small businesses who care only about cost.

EDIT:

While checking on a Nexenta 4 ETA I found a perfect example on why I would not stake my career on a product like Nexenta. While all vendors are guilty of code problems (EMC dual control station upgrades, VMware 3.5 U2, etc.) this problem with Nexenta brings down the entire SAN. After an upgrade to 3.1.3 QLogic FC cards refuse to work and only a rollback to 3.1.2 fixes them. It's not just a Community Edition problem either, it's in Enterprise as well. They released 3.1.3.5 that says it's fixed but it wasn't. This has been an issue for 5 months.

http://nexentastor.org/boards/1/topics/7009

This guy right here said what I've been trying to say eloquently and right to the point. *thumbs up*

kdh · Nov 14, 2012

@madrebel, after my late night posts, I took a step back and thought about your direction and mine in terms of storage. You focus on 10GigE and NFS, maybe some CIFS and a little bit of direct attached storage, using mostly VMware and some DB software.

I focus on 4/8gig FC all direct attach storage, Oracle/RAC Clusters, Vmware, and Linux(Centos/Redhat/Oracle) and a tiny bit of nfs/cifs(isilon).

Your response times in your 10GigE and NFS, will never come close to the response times I need in my direct attached 8Gig FC environment. I highly doubt you can get and maintain a 4ms response time at all times over NFS vs FC.

With that said, neither of us are "wrong"..

I just disagree with the technology stack you are using to go the direction you are going.

Child of wonder hit a key point.. When Nexenta starts building hardware to go with their solution people will without a doubt take notice. But at that point, what would Nexenta give me that Isilon doesn't already do?

Isilon has its OneFS OS and its own filesystem that spans nodes, which does what Nexenta is currently doing, but with a proven hardware stack and a dedicated hardware and software team that works together at the hip. Looking at previous sales quotes for my Isilon X-Blades, my 40TB of storage comes in at almost the same numbers you claim you can get with your DIY solution. You mentioned you built your solution for 400K. I could build the same exact solution with Isilon with a 400K budget, using an InfiniBand backend, that has a proven hardware and software stack that would walk all over your DIY solution in every way, shape and form, every day.

One more note.. I went to Monster, and did a few searches... Searched EMC and found thousands of hits. 450 hits for Netapp, Almost 200 for Hitachi and 30 for Isilon.

Guess how many hits for Nexenta.. 1.

Being a Nexenta storage admin gives you an extremely limited set of options. Your skill set is niche at best, and not very many folks will value it. You can't say the same for EMC, Netapp or Hitachi.

madrebel · Nov 14, 2012

kdh said:
@madrebel, after my late night posts, I took step back and thought about your directions and mine in terms of storage. You focus on 10gig E and NFS, maybe some CIFs and a little bit of direct attached storage using mostly vmware and some db software.

i use NFS because it is easier. i can do and do do iscsi, too.

Your response times in your 10GigE and NFS, will never come close to the response times I need in my direct attached 8Gig FC environment. I highly doubt you can get and maintain a 4ms response time at all times over NFS vs FC.

i can sustain 1-2ms over NFS or iscsi at obscenely high random IO rates.

Being a Nexenta storage admin gives you an extremely limited set of options. Your skill set is niche at best, and not very many folks will value it. You can't say the same for EMC, Netapp or Hitachi.

certifications have never meant much to me. i have hands on with emc, netapp, compellant, 3par, and most others. its all the same thing. just like networking with cisco is the same as networking with juniper. there are differences of course but they all do the same thing. if you can do X with cisco you can do X with juniper, very few exceptions to that rule. same goes with EMC, netapp, etc.

madrebel · Nov 14, 2012

Child of Wonder said:
If something goes wrong I want to get an EMC, Netapp, or Hitachi on the phone and get it figured out ASAP and let them take the blame.

have you ever been in a situation where data was actually lost? the SAN blew up, the backups weren't good, data was just fucking gone? I have, well, I have been called in after the fact to clean up. On one occasion we managed to recover some crucial data, in most cases it's just fucking gone. It didn't matter that the box had EMC or NETAPP on it. That didn't save these particular clients.

Heads rolled, etc etc. Granted, not all the time was it the fault of emc or netapp or whomever. The worst one I dealt with was during Katrina and the san was in 6 feet of water. oops. oddly enough the netapp nameplate didn't make that particular installation water proof.

point is, if your backups and DR plan aren't sound and TESTED it doesn't matter what you run.

I have absolutely no doubt that Nexenta is a great product and have many, many happy customers. But when I walk into a Fortune 500 company and talk to them about storage solutions, if I asked them "hey, have you considered building your own Linux SAN?" I would get laughed out of the room and my competitors would sell them an EMC, Netapp, or HDS system.

first, it is based on solaris. i know, enterprises laugh at solaris ....

second, i will probably displace at least 15 SANs this year from the big 3. folks looking at renewing support or just looking to get out of on prem self hosting.

this problem with Nexenta brings down the entire SAN. After an upgrade to 3.1.3 QLogic FC cards refuse to work and only a rollback to 3.1.2 fixes them. It's not just a Community Edition problem either, it's in Enterprise as well. They released 3.1.3.5 that says it's fixed but it wasn't. This has been an issue for 5 months.

known issue. shitty issue, however, 90% + of their business is ethernet facing. infiniband is picking up steam too but i wouldn't recommend nexenta for FC deployments. they aren't the right tool for that job and if you're heavily invested in FC ... well i pity you but its what you have so may as well run with it and use the best tool.

for new ethernet build outs or existing ethernet only shops that FC bug has zero impact.

Child of Wonder · Nov 14, 2012

madrebel said:
have you ever been in a situation where data was actually lost? the SAN blew up, the backups weren't good, data was just fucking gone? I have, well, I have been called in after the fact to clean up. One one occasion we managed to recover some crucial data, in most cases its just fucking gone. It didn't matter that the box had EMC or NETAPP on it. That didn't save these particular clients.

Heads rolled, etc etc. Granted, not all the time was it the fault of emc or netapp or whomever. The worst one I dealt with was during Katrina and the san was in 6 feet of water. oops. oddly enough the netapp nameplate didn't make that particular installation water proof.

You're absolutely right, failures can happen to any vendor. But there's a reason so many businesses put their money on EMC, Netapp, etc. when it comes to avoiding failures and rapid failure resolution versus Starwind, Nexenta, and so on.

10 years ago I thought the same way you do. Why spend so much money on an expensive SAN? I can set up desktop equipment running Linux to do the same thing! They want how much for a drive?? That same drive is 10% the cost on Newegg!

Now I've been doing this stuff for 10 years and I've seen the problems that come from hodge podge solutions, and the people who have had their careers ruined because of them or because there was little market for their skillset. When it comes to trusting your business applications and data to rapidly spinning metal, you better have a rock solid solution in place with an army of support with the expertise and logistics to correct any problems that come up in the shortest amount of time possible. There is no time for finger pointing between hardware and software vendors or waiting on parts while your business loses access, even temporarily, to data.

point is, if your backups and DR plan aren't sound and TESTED it doesn't matter what you run.
first, it is based on solaris. i know, enterprises laugh at solaris ....

I don't. I love Linux/BSD/Solaris based stuff. Was a Linux Admin for a few years and enjoyed it.

And, sure, a DR plan can help, but it's not optimal. I wouldn't build a house out of 2x2's because I know I have another one also built out of 2x2's ready for me to move into.

second, i will probably displace at least 15 SANs this year from the big 3. folks looking at renewing support or just looking to get out of on prem self hosting.

That's cool. When cost is the primary driving factor it's hard not to look at a software based SAN.

known issue. shitty issue, however, 90% + of their business is ethernet facing. infiniband is picking up steam too but i wouldn't recommend nexenta for FC deployments. they aren't the right tool for that job and if you're heavily invested in FC ... well i pity you but its what you have so may as well run with it and use the best tool.

for new ethernet build outs or existing ethernet only shops that FC bug has zero impact.

5 month old issue that has caused major problems for a lot of people. If a client of mine performed a FLARE upgrade on their VNX and lost the ability to use FC I don't think there would be any way in hell I could talk them out of replacing it with something else.

Anyway... I'm not going to respond on this subject anymore. A heated or emotional debate isn't going to accomplish anything. I'm very happy you've got a Nexenta setup working for you and your clients. The market needs competition and the more players in the market the better. :thumbsup

Deleted member 82943 · Nov 14, 2012

hey op what did you end up doing?

madrebel · Nov 14, 2012

kdh said:
why not.. UCS is one hell of a platform. Big wins all around with it in my shop. We will be doing more with cisco.

meh ... pretty cool but grossly over priced.

Totally and completely disagree. A tier0 pool with 600K IOPS would need over 840 flash/EFDs in a mirrored config or 60,000 450gig 15K FC drives to pull it off. That's a massive system. It becomes laughable and even more unbelievable with 60k 450gig 15k drives.

i'm not sure where you get your numbers but here you go

http://www.stec-inc.com/product/zeusiops.php

32 of those in raid10, 16 'drives', test was 50/50 read/write. ok, to be completely transparent i'm partially stretching the truth. the way zfs works in arc reads/writes were 600K but that was in ram. once we finally added enough threads to over run the ARC the disk layer itself was hitting 300K. these were 4K tests.

i recently ran a small test on some new enterprise kingston sata SSDs (not part of my prod setup). 8 of them in a stripe to simulate the write performance of a larger setup. i got 21937 IOPs @8K blocks with a 100% write and 100% random iometer test pattern. that was 2 hosts and 2 single gigabit links with an average aggregate throughput of 171.3MBps at 1500MTU.
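The IOPS and throughput figures quoted above can be cross-checked with a line of arithmetic. The sketch below is only a sanity check, not anything from the poster's setup; the function names are made up for illustration, and it assumes "8K" means 8 KiB and that the reported MBps figure is in binary megabytes (MiB/s).

```python
def throughput_mib_per_s(iops: float, block_bytes: int) -> float:
    """Convert an IOPS figure at a fixed block size into MiB/s (binary megabytes)."""
    return iops * block_bytes / (1024 * 1024)


def throughput_mb_per_s(iops: float, block_bytes: int) -> float:
    """Same conversion in decimal megabytes per second."""
    return iops * block_bytes / 1_000_000


if __name__ == "__main__":
    # 21937 IOPS at 8 KiB blocks, the Kingston SSD test from the post.
    print(f"{throughput_mib_per_s(21937, 8 * 1024):.2f} MiB/s")
    print(f"{throughput_mb_per_s(21937, 8 * 1024):.2f} MB/s")
    # 300K IOPS at 4 KiB blocks, the disk-layer figure from the ZeusIOPS test.
    print(f"{throughput_mib_per_s(300_000, 4 * 1024):.2f} MiB/s")
```

Under those assumptions the 8K test works out to about 171.4 MiB/s, which lines up with the 171.3MBps reported, and sits below the roughly 250 MB/s raw ceiling of two gigabit links.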

There is no single server on the market where you could cram that many controller cards to pull it off. If you are doing HA on something that large, it would be huge. You're talking at least 8 servers, and at least 5 racks of disk drives in JBODs. Then the internal HBAs in your servers would be completely maxed out, as the fan-out ratio on server based HBAs isn't that high. You would completely crush the bus of every one of those systems. Even if you somehow are pulling it off, the numbers in terms of cost you keep kicking around in your posts don't even come close to adding up.

my heads have 6 controllers each, lsi 9205s. each controller LSI claims can do 600k IOPs i'm guessing that is 4K blocks. 4 of those 6 are meshed between 4 JBODs that house only solid state. the remaining two are meshed to SAS switches and below them spinning disk. 288gbps aggregate throughput at the controller however the spin disks currently live in pci-e 2.0 x4 slots so the actual throughput is limited there until the quad socket sandy bridge motherboards are blessed off and tested (in process now). sandy bridge brings pci-e 3.0 which doubles my throughput on a per slot basis and will allow me to run x8 slots for all my HBAs instead of just the SSD HBAs. sandy bridge CPUs move the IO processor from the northbridge (westmere) to the CPU itself so i can scale IO capability by filling all 4 sockets.
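For anyone following the bandwidth numbers, here is a rough sketch of the arithmetic. It assumes each HBA exposes 8 SAS lanes at a 6 Gb/s line rate (an assumption, chosen because it reproduces the 288gbps figure for 6 controllers), and it uses the standard per-lane PCIe signalling rates and encodings (5 GT/s with 8b/10b for gen 2, 8 GT/s with 128b/130b for gen 3). Protocol overhead is ignored, and the function names are hypothetical.

```python
SAS2_LANE_GBPS = 6.0  # SAS-2 line rate per lane, before encoding overhead

# PCIe generation -> (transfer rate per lane in GT/s, encoding efficiency)
PCIE = {
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def sas_line_rate_gbps(controllers: int, lanes_per_controller: int,
                       lane_gbps: float = SAS2_LANE_GBPS) -> float:
    """Aggregate SAS line rate across all HBAs, in Gb/s."""
    return controllers * lanes_per_controller * lane_gbps


def pcie_usable_gbps(gen: int, lanes: int) -> float:
    """Usable slot bandwidth per direction in Gb/s, after line encoding only."""
    gt_per_s, efficiency = PCIE[gen]
    return gt_per_s * efficiency * lanes


if __name__ == "__main__":
    print("SAS aggregate:", sas_line_rate_gbps(6, 8), "Gb/s")
    print("PCIe 2.0 x4:  ", pcie_usable_gbps(2, 4), "Gb/s")
    print("PCIe 3.0 x4:  ", round(pcie_usable_gbps(3, 4), 1), "Gb/s")
    print("PCIe 3.0 x8:  ", round(pcie_usable_gbps(3, 8), 1), "Gb/s")
```

On these assumptions a single 8-lane HBA (48 Gb/s of SAS line rate) is well ahead of what a PCIe 2.0 x4 slot can carry, which fits the point that the slot is the limit for the spinning-disk HBAs until the move to PCIe 3.0.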

like i said, you should do your homework on the performance potential of nexenta.

if you have any other questions, i'll happily answer.

Deleted member 82943 · Nov 14, 2012

lol this is about to spin out of control...

madrebel · Nov 14, 2012

Child of Wonder said:
You're absolutely right, failures can happen to any vendor. But there's a reason so many businesses put their money on EMC, Netapp, etc. when it comes to avoiding failures and rapid failure resolution versus Starwind, Nexenta, and so on.

status quo

10 years ago I thought the same way you do. Why spend so much money on an expensive SAN? I can set up desktop equipment running Linux to do the same thing! They want how much for a drive?? That same drive is 10% the cost on Newegg!

for the record 10 years ago you would be a fool to suggest such a thing. although, we did flash regular off the shelf drives with emc/netapp firmware from time to time. at any rate in the past 5 years there have been many significant changes and/or new technologies that have changed what can/can't or should/shouldn't be done. there was no facebook in 2002, no SSDs, etc.

Now that I've been doing this stuff for 10 years and I see the problems that come from hodge podge solutions, the people that have had their careers ruined because of them or because there was little market for their skillset, and that when it comes to trusting your business applications and data to rapidly spinning metal, you better have a rock solid solution in place with an army of support with the expertise and logistics to correct any problems that come up in the shortest amount of time possible. There is no time for finger pointing between hardware and software vendors or waiting on parts your business loses access, even temporarily, to data.

and yet the fact that the googles and facebooks of the world built billion dollar businesses using software driven COTS solution is apparently completely lost on you.

some of you enterprise grognards seem to think your databases running at 40K IOPs is some crazy impressive super high end piece of technology. the guys in wallstreet trading, the guys inside facebook, they laugh at those tiny numbers. you think greenplum didnt know what they were doing when they standardized on solaris/zfs for their data mining appliances? btw, emc bought greenplum .... want to bet whos filesystem/hardware still powers that? it ain't a VNX.

5 month old issue that has caused major problems for a lot of people. If a client of mine performed a FLARE upgrade on their VNX and lost the ability to use FC I don't think there would be any way in hell I could talk them out of replacing it with something else.

last point i'll respond to. i have heard this is an issue for one relatively large customer of theirs. shitty problem, no doubt. however don't overestimate the scope of the problem. 'lots' of people are not affected.

GeorgeHR · Nov 14, 2012

gigatexal said:
interesting read kdh and madrebel.

I expect that 1PB of data is worth enough that the cost of the storage is not material.

If anyone suggested a DIY solution for my 1PB of data, they would be out on the road.

----

Certainly there are data systems larger than 1PB that are for all practical purposes worth nothing, but ...

----

My business has 50GB of data that is worth about $250K. About $5 billion/PB. I wish I had 1PB of data.
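The per-petabyte figure follows from simple scaling. A minimal sketch, assuming decimal units (1 PB = 1,000,000 GB); the storage-cost comparison uses the rough numbers madrebel gave earlier in the thread (just over 500k for about 750TB raw) and is only illustrative.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10^6 GB


def value_per_pb(value_usd: float, size_gb: float) -> float:
    """Scale a dataset's stated value up to a per-petabyte figure."""
    return value_usd / size_gb * GB_PER_PB


def cost_per_gb(total_cost_usd: float, raw_tb: float) -> float:
    """Hardware cost per raw GB, with 1 TB = 1000 GB."""
    return total_cost_usd / (raw_tb * 1000)


if __name__ == "__main__":
    print(f"data value:   ${value_per_pb(250_000, 50):,.0f} per PB")
    print(f"storage cost: ${cost_per_gb(500_000, 750):.2f} per raw GB")
```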

Deleted member 82943 · Nov 14, 2012

the banter back and forth was interesting as I saw two different approaches: DIY vs. BTO. I also learned a few things even tho I lost a lot in the terminology.

I'm still curious if the OP got scared away and what direction he decided to go.

kdh · Nov 14, 2012

madrebel said:
for the record 10 years ago you would be a fool to suggest such a thing. although, we did flash regular off the shelf drives with emc/netapp firmware from time to time .

I'm going to quote this last line and then I'm out.

After poking many holes in your arguments, I'll raise the bullshit flag on you, yet again.

Customers had no access to drive firmware until recently, and none had the tools to flash them. To put it bluntly, you're talking out your ass when it comes to EMC hardware and support.

I'm going to follow Child of Wonder's lead, and exit. Best of luck.

To the OP.. Sorry for train wrecking your thread. Best of luck.

madrebel · Nov 14, 2012

GeorgeHR said:
I expect that 1PB of data is worth enough that the cost of the storage is not material.

have you priced out 1PB worth of storage recently? the cost is VERY material.

Certainly there are data systems larger than 1PB that are for all practical purposes worth nothing, but ...

care to link me one?

My business has 50GB of data that is worth about $250K. About $5 billion/PB. I wish I had 1PT of data.

don't wish for things you don't understand. managing large amounts of data presents many challenges.

that said you're a fool for not considering your options. you have 50GB of data now ... which is nothing ... so you probably have some low end synology or something yes? maybe a compellant. at best you run raid 6 and probably a tape back up, hopefully. ok.

what is protecting your data on disk? that file you haven't read in 5 years that contains some important info you may need how do you know it isn't corrupted? with zfs if that read fails you rebuild from checksum.

you put so much faith in products you've at least heard about but you can't tell me why apart from the fact that you know about it and they have a support number. ok, great. EMC drives still suffer from silent corruption. EMC drives still lose sectors. it is absolutely asinine to completely discount something that you really know nothing about simply because you haven't heard of it.

Customers had no access to drive firmware only until recently, and none had the tools to flash them. To put it bluntly, youre talking out your ass when it comes to EMC hardware and support.

i had direct lines to development staff at EMC and netapp. those folks have long moved on, many to form new companies. what you did or didn't have access to is not what i did or didn't have access to.

you haven't poked holes in anything btw, you've buried your head further and further in the sand. you've told me what i can't do, despite the fact that i've been doing it for awhile.

keep doing what you're doing and i'll keep delivering cost effective OLAs/SLAs. your boss has been reading all about 'cloud' and is actively looking for ways to reduce spending. if i can deliver the same thing you can for a lower price, and i can, well ... i'm just saying. in the past 3 months my sales guys have been talking with an alarming number of very large companies, all of which are interested in getting out of or drastically scaling back their on prem IT infrastructure. organizations like mine then really only have to deliver an equivalent feature set at a more competitive price. if we can deliver beyond that, and we can, in house IT (specifically network, storage, and server procurement/deployment/break fix) is in a bad position.

olavgg · Nov 15, 2012

I don't understand why so many are upset with what madrebel says.

I would never ever outsource my data storage infrastructure to a third party. No matter the cost. I want full control, and full knowledge about how to fix things. I want to be agile to new requirements.

9999 of 10000 people working with data storage do not have enough knowledge about data security and consistency. Heck, even big companies like Oracle do weird things with their data storage solutions that could cause "dirty" data being written.

If someone came in here suggesting outsourcing our storage to EMC/IBM/Oracle, I would do anything to get them fired. I do not want to work with people who want to rely on an external third party because they don't dare to take the blame!

TeeJayHoward · Nov 15, 2012

Threads like this make me realize how little I really know!

madrebel said:
i also learned a lot from DDN in how they lay out the raid sets i hadn't thought of that before but now that is part of how i build my pools.

Would you mind sharing this information? I'd love to learn more about why stuff gets laid out in certain ways.

madrebel · Nov 15, 2012

TeeJayHoward said:
Would you mind sharing this information? I'd love to learn more about why stuff gets laid out in certain ways.

so that is a DDN rack, they will do everything in their power to sell you a whole rack too. the reason for that is they use lots of raid6. raid6 is best with 6 or 10 drives. if you look at that picture you can see 10 JBODs. each JBOD holds 60 disks (they have an 80 drive JBOD on the way).

now to make a raid device in the DDN world you do that via a single disk in each JBOD. 10 JBOD, 1 disk in each, now you have the perfect number for raid 6. You're also spreading out the throughput for reading from that raid device across 10 JBOD controllers.

a second benefit to that is you can fail an entire JBOD (2 actually) and you've only lost a single drive in each raid set. the rest of your system continues operating without problem. rather low probability of that happening without the rest of the rack failing but nice none the less.
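The layout described above can be modelled in a few lines. This is a toy sketch under the numbers in the post (10 JBODs, 60 disks each, RAID 6 sets built from one disk per JBOD); the function names are hypothetical and it says nothing about how DDN actually implements it.

```python
from itertools import combinations


def build_raid_sets(num_jbods: int = 10, slots_per_jbod: int = 60):
    """Raid set i takes slot i from every JBOD: exactly one member disk per enclosure."""
    return [[(jbod, slot) for jbod in range(num_jbods)]
            for slot in range(slots_per_jbod)]


def worst_case_losses(raid_sets, failed_jbods) -> int:
    """Largest number of member disks that any single raid set loses."""
    failed = set(failed_jbods)
    return max(sum(1 for jbod, _ in members if jbod in failed)
               for members in raid_sets)


def survives(raid_sets, failed_jbods, parity: int = 2) -> bool:
    """RAID 6 (parity=2) survives as long as no set loses more than two disks."""
    return worst_case_losses(raid_sets, failed_jbods) <= parity


def max_tolerated_jbod_failures(raid_sets, num_jbods: int, parity: int = 2) -> int:
    """Largest k such that every combination of k failed JBODs is survivable."""
    tolerated = 0
    for k in range(1, num_jbods + 1):
        combos = combinations(range(num_jbods), k)
        if all(survives(raid_sets, combo, parity) for combo in combos):
            tolerated = k
        else:
            break
    return tolerated
```

With one disk per JBOD in each set, one failed enclosure costs every set a single drive and two failed enclosures cost every set two, which is the limit of what RAID 6 can absorb.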

so the way i do it is along that same line of thought. build the raid sets with failure in mind. i use only mirroring or 4 disk raidz2. i have two racks so when building mirrors i pick a drive in each rack in opposing JBODs and then the next mirror is in the next JBODs, etc, etc till i start again on the first set of JBODs. for raidz2 opposing racks again but 1 drive in 4 opposing JBODs.

i can fail an entire rack and still be online. rebuild wouldn't be fun as i can't lose anymore drives during said rebuild however I am still available and can always go to tape if the rebuild fails (well, replicate from disk first, failing that, tape).
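The same idea for the two-rack layout, as a hedged sketch. Disks are identified as (rack, jbod, slot); the JBOD and slot counts in the usage note are made up, and the 4-disk raidz2 placement (two JBODs in each rack, one disk per JBOD) is one reading of the description above, not a confirmed detail of the actual pools.

```python
def build_mirrors(jbods_per_rack: int, slots_per_jbod: int):
    """2-way mirrors: same JBOD position and slot in each rack, rotating through JBODs."""
    return [[(0, jbod, slot), (1, jbod, slot)]
            for slot in range(slots_per_jbod)
            for jbod in range(jbods_per_rack)]


def build_raidz2(jbods_per_rack: int, slots_per_jbod: int):
    """4-disk raidz2: one disk in each of two JBODs per rack, so two disks per rack."""
    return [[(0, jbod, slot), (0, jbod + 1, slot),
             (1, jbod, slot), (1, jbod + 1, slot)]
            for slot in range(slots_per_jbod)
            for jbod in range(0, jbods_per_rack - 1, 2)]


def online_after_rack_loss(vdevs, failed_rack: int, redundancy: int) -> bool:
    """The pool stays online if no vdev loses more disks than it can tolerate."""
    return all(sum(1 for rack, _, _ in vdev if rack == failed_rack) <= redundancy
               for vdev in vdevs)
```

For example, `online_after_rack_loss(build_mirrors(4, 24), failed_rack=0, redundancy=1)` is true, and so is the raidz2 case with `redundancy=2`. In both cases the surviving vdevs have no redundancy left, which is why the rebuild is the risky part.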

GeorgeHR · Nov 15, 2012

madrebel said:
1) have you priced out 1PB worth of storage recently? the cost is VERY material.

2) don't wish for things you don't understand. managing large amounts of data presents many challenges.

3) that said you're a fool for not considering your options. you have 50GB of data now ... which is nothing ... so you probably have some low end synology or something yes? maybe a compellant. at best you run raid 6 and probably a tape back up, hopefully. ok.

4) what is protecting your data on disk? that file you haven't read in 5 years that contains some important info you may need how do you know it isn't corrupted? with zfs if that read fails you rebuild from checksum.

1) Considering that 1PB of my data would be worth billions, the cost of the storage is not material. Buying a high end system from someone in the business of data storage and with deep pockets is much better than some DIY solution. It also is very cheap insurance.

2) Managing 1PB of data does present a lot of challenges, but the biggest challenge is to maintain sufficient staff so that if someone decides to quit, the data is safe. An off the shelf implementation does that better than a DIY implementation.

3) No. I use an off the shelf PC. No RAID. I have a backup policy. I have a recovery plan which has been tested sufficiently.

4) Any file that is 5 years old has 60 copies (spread over 5 hard drives) in each of 4 places. It is difficult to imagine not finding a usable copy.

---

You appear to know very little except that DIY is cheaper than a purchased/supported solution.

madrebel · Nov 15, 2012

so the guy telling me i know nothing is the same guy protecting '$250k' data by copying it everywhere, not using any raid, with no on disk protection.

luckily for you that is a feasible plan i would like to see you copy 1TB 60 times and then tell me cost isn't a factor.

zrav · Nov 16, 2012

I worked for a movie post house that had a ~400TB SAN cluster from Bright Technologies, which is specifically tailored for the needs of the digital intermediate workflow, with things like sequence aware file ordering, etc. We were generally happy with it and would've upgraded/expanded capacity further if budget wasn't constrained. But it was, so ZFS options were being investigated when I left.
Recently I also witnessed a successful deployment of a ~0.5PB Nexenta cluster...

dedobot · Nov 16, 2012

Very interesting, even if a little bit off topic.
I like madrebel's approach and had to read almost all of his posts on [H], but his opponent, kdh, has very valid points.
One of them is about HW parts with an "EMC" logo or some other big company's, where I have some humble experience. I'm tempted to post again a picture of my 7 year old XserveRAID.
http://hardforum.com/showthread.php?t=1718304
Filled with 500GB IDE Deskstars with the "Apple" logo on them. Not a single drive dropped. Almost 50k hours of uptime. I attribute this to the "Apple" logo.
I never achieved such HDD reliability with my other storage: Lacie fc xl, Gtech XL16, big DIY boxes with 2x 3ware9650-12, or my recent ZFS based builds. Not with Ultrastars and not with WD RE HDDs.
Between 5-10% failed HDDs per year, and 0% for Apple marked HDDs for 7 years!
I have no idea how they're made: separate production line, better QC or ...!?

GeorgeHR · Nov 16, 2012

madrebel said:
so the guy telling me i know nothing is the same guy protecting '$250k' data by copying it everywhere, not using any raid, with no on disk protection.

luckily for you that is a feasible plan i would like to see you copy 1TB 60 times and then tell me cost isn't a factor.

History is wonderful. The 60 copies was not a plan but simply an outcome based on economics. When a backup hard drive was full it was simply stored and replaced by a new larger, faster drive.

As I said before, any DIY solution (even one by me) to a 1PB data storage problem is not as cost effective as one by a large firm.

---

I think you would object to paying to see anyone keeping 60 copies of data. If I had $5 billion of data and a backup copy cost me $1 million, I think I would have a lot of copies around. The actual number would depend on how reliable the copy media was.

madrebel · Nov 16, 2012

DIY, you keep saying that. i dont think you know what DIY is and why a nexenta solution isn't DIY.

schizrade · Nov 16, 2012

TeeJayHoward said:
Unless prices have drastically changed since I last looked, you're not going to get a petabyte from a vendor for 130K USD.

Heck, for a homebrew setup, you'd barely be able to squeeze in for that price:
34 3Us with 15x drives per 3U (10 data, 3 parity, 2 hot-swap)
510 raw drives. Best $/GB is the 3TB drives. Since I'm a Seagate fan, we'll use those at $150 per.
That's $76,500 in drives alone.
Then add in about $1500 for each box you put the drives in... $51000
You're sitting at $127,500 and you still don't have racks, power, network, or backups.

I would start by trying to convince your boss to double the budget for a homebrew setup, or quadruple it for a vendor solution.
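The homebrew estimate quoted above is easy to re-run with different prices. A small sketch using only the numbers in the quote (34 chassis, 15 drives each of which 10 hold data, 3TB drives at $150, $1500 per chassis); the function and key names are made up for illustration.

```python
def homebrew_estimate(chassis: int = 34, drives_per_chassis: int = 15,
                      data_drives_per_chassis: int = 10, drive_tb: int = 3,
                      drive_cost: int = 150, chassis_cost: int = 1500) -> dict:
    """Drive count, cost breakdown and usable capacity for the quoted build."""
    drives = chassis * drives_per_chassis
    drive_total = drives * drive_cost
    chassis_total = chassis * chassis_cost
    usable_tb = chassis * data_drives_per_chassis * drive_tb
    total = drive_total + chassis_total
    return {
        "drives": drives,
        "drive_total": drive_total,
        "chassis_total": chassis_total,
        "total": total,
        "usable_tb": usable_tb,
        "usd_per_usable_tb": total / usable_tb,
    }


if __name__ == "__main__":
    for key, value in homebrew_estimate().items():
        print(f"{key:>18}: {value}")
```

The totals match the quote ($76,500 in drives, $51,000 in chassis, $127,500 overall) for about 1020TB usable, before racks, power, network, or backups.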

This. I just installed 18TB for 70K.

brutalizer · Nov 20, 2012

GeorgeHR said:
History is wonderful. The 60 copies was not a plan but simply an outcome based on economics. When a backup hard drive was full it was simply stored and replaced by a new larger, faster drive.

A question. I have done something similar and seen the number of backups explode exponentially. That is a real pain. That is the same reason you have 60 backups now. And if you want to back up these 60 backups, you have another 60 disks = 120 backups. And then 240 backups, etc. Exponential growth is a bad thing.

The reason I got this exponential growth was because I backed up from a Windows server. The data was important, so I took two copies to my USB stick. I don't have access to the Windows server anymore. When I got home, I copied them to my PC. To be sure, I copied them again. So now I have four copies, etc etc. The number of copies just explodes.

To counter this exponential growth, to reduce the number of copies, I want to do MD5 checksums of all copies. If they are identical, I can discard all but one. Now I have verified with MD5 checksums that my USB stick and the copies I have are identical. etc.

My point is, instead of having lot of backups, I can just have one backup. If I take MD5 checksums all the time. Right?

The problem with our approach, with 60 backups, is this: assume your data is corrupt when you try to access a certain piece. Byte number X is corrupt and just displays garbage on the screen. So, you go to a backup. It seems to be correct. Everything is good. But... maybe the same file was corrupted in other bytes at the same time? Or several files? How can you check that all data in each file is correct? Which backup should you go back to?

MD5 checksums (or SHA256, or...) solve this problem. So, instead of having lots of backups, just have one good backup and that is enough. But... a new problem: for each file you need to save its corresponding MD5 checksum in a text file. And when you back up some files you need to update the MD5 checksum text file. Let's assume you have some data corruption and some files break. Time to go to the backup: which files are most recent? You do MD5 checksums on all of them, and resolve which files are valid and which are not. This is tremendous bookkeeping. Do you really want to spend time on all this?
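The manual bookkeeping described here might look roughly like the sketch below. It is only an illustration of the idea, using Python's standard `hashlib`; the function names are hypothetical, the manifest is kept in memory rather than written to a text file, and SHA-256 is the default (pass `"md5"` to match the post).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, algorithm: str = "sha256") -> dict:
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): file_digest(p, algorithm)
            for p in sorted(root.rglob("*")) if p.is_file()}


def find_duplicates(manifest: dict) -> list:
    """Group file names that share a digest, i.e. copies that are byte-identical."""
    by_digest = defaultdict(list)
    for name, digest in manifest.items():
        by_digest[digest].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


def verify(root, manifest: dict, algorithm: str = "sha256") -> list:
    """Return the names that are missing or whose contents no longer match."""
    root = Path(root)
    return [name for name, expected in manifest.items()
            if not (root / name).is_file()
            or file_digest(root / name, algorithm) != expected]
```

A typical flow would be to call `build_manifest` right after taking a backup, store the result alongside it, and run `verify` later to see which files have changed or rotted. Every backup set needs its own manifest kept up to date, which is exactly the bookkeeping the post is complaining about.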

ZFS does all this automatically for you. Every time you read a block in a file, it calculates a checksum and compares. And if files are corrupt, ZFS automatically resolves which files are valid and which are not. You don't need to do anything other than issue the "scrub" command and ZFS will calculate the checksum on every block and do this for you. Then you can discard all these 59 backups.

If you are really cautious, you can have two backups. Preferably two ZFS servers, which you send your data to over the internet via SSH and "zfs send" / "zfs receive", which will automatically calculate checksums and send only the newest changed data. Everything is taken care of for you.

I am now trying to move my backups to ZFS, after I have confirmed which copies are working. Finally with ZFS, I am safe. No need to do manual checksums anymore. Let the computer do all the work for you?

PS. Great thread, I learned lots!
