Need help on building data storage system

johnksrv

n00b
Joined
Jun 4, 2012
Messages
6
I want to build a custom disk storage system to store backup of my customers' data. My main concerns:
_ Enclosure: rack-mount form (preferably 4U).
_ Storage density: Able to combine as many 3.5" hard drives as possible (preferably hot-swap).
_ Works well with consumer-grade 3TB 3.5" SATA drives.
_ Support RAID 6.
_ Lowest investment cost possible.
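
For a rough feel of how RAID 6 and drive count interact, here is a minimal sketch. It assumes a single RAID 6 group per chassis and raw decimal terabytes; the drive counts are hypothetical examples, not specific products, and the function name is made up for illustration.

```python
def raid6_usable_tb(num_drives, drive_tb=3.0):
    """Usable capacity of one RAID 6 group: two drives' worth is spent on parity."""
    if num_drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (num_drives - 2) * drive_tb


# Hypothetical chassis sizes, all filled with 3TB drives.
for n in (12, 24, 36, 45):
    usable = raid6_usable_tb(n)
    print(f"{n} x 3TB: {usable:.0f} TB usable ({usable / (n * 3.0):.0%} of raw)")
```

With 24 drives this gives 66 TB usable. In practice a large chassis is often split into several smaller RAID 6 groups to keep rebuild times down, which costs two more drives per group.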

This is my first time building this kind of system. The criteria above come from some hasty research on data storage over the last few days. If I have missed anything important, please let me know (e.g. I've read a lot about SAN, NAS, DAS, etc., but I'm still not sure how these concepts apply to my situation).

P.S. Please also tell me if you know of any pre-built commercial solution that fits the criteria above.
 
Other than as cheap as possible, what are your immediate storage needs (size-wise) and the expected needs over the next 12 months? Are you looking for something with a single-vendor warranty? Onsite service? How are you going to be getting the data from your customers sites (Online/Tape/Disk etc)? How are you planning on backing up your customers' backups? Will you be handling all restore requests or is this something you are going to build a front-end for your customers? What is your upper limit on cost? How technically inclined are you? There are many pre-built options, but it mostly comes down to how much you are willing to spend.
 
Hi mwroobel, thanks for your response.

What are your immediate storage needs (size-wise) and the expected needs over the next 12 months?
_ We are going to run an online backup service, so it will mainly depend on the number of customers. My plan is to build a cost-effective, high-density storage system (to save rack space), and when we need more storage we can build another one with the same configuration.

Are you looking for something with a single-vendor warranty?
_ I'm not really sure, but I'm most concerned about the hard drive and chassis warranties. Do you have any suggestions on this?

Onsite service?
_ We don't need onsite service.

How are you going to be getting the data from your customers sites (Online/Tape/Disk etc)?
_ We will have a software client installed on our customers' computers to upload their data to storage server.

How are you planning on backing up your customers' backups?
_ Besides RAID 6 to increase data safety, my plan is to mirror the storage server to another storage server located in a different datacenter (I think it's the most cost-effective way). Do you have any suggestions for me?

Will you be handling all restore requests or is this something you are going to build a front-end for your customers?
_ As I've said before, there will be a software client and that software will also handle restore requests.

What is your upper limit on cost?
_ Honestly, I'm still not confident enough to decide on a budget at the moment. I'm just trying to find the best combination of cost and reliability so I can finish my planning and start looking for investment from venture capitalists.

How technically inclined are you?
_ I consider myself tech-savvy. However, I'm very new to server hardware, so there are a lot of things to learn.
 
Hi mwroobel, thanks for your response.

What are your immediate storage needs (size-wise) and the expected needs over the next 12 months?
_ We are going to run an online backup service, so it will mainly depend on the number of customers. My plan is to build a cost-effective, high-density storage system (to save rack space), and when we need more storage we can build another one with the same configuration.

I work for a company that produces online backup service software, and I will be straight with you. The people who call us with problems with our software are the people who went really cheap on hardware and didn't realize what they were getting into.

A few questions for you:
How many users are you expecting for this service?
What is the average size of each account you are looking to provide service for?
What is your incoming bandwidth at your datacenter?

Mirroring is not a backup plan; in my book it is a DR plan. If you have bad data or a delete on the main array and it is mirrored to the array at your DR location, you are screwed. I have seen this so many times it is not funny. Please make sure you look into doing backups of the actual data that you are storing, not just mirroring.
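
To make the mirroring point concrete, here is a small sketch. The class names and the in-memory dictionaries are purely illustrative assumptions, not any particular product: a mirror copies the current state, deletions included, while a versioned backup keeps older snapshots to restore from.

```python
import copy


class Mirror:
    """Replica that always matches the primary's current state."""

    def __init__(self):
        self.data = {}

    def sync(self, primary):
        self.data = copy.deepcopy(primary)


class VersionedBackup:
    """Keeps every snapshot, so older versions stay restorable."""

    def __init__(self):
        self.snapshots = []

    def snapshot(self, primary):
        self.snapshots.append(copy.deepcopy(primary))

    def restore(self, name):
        # Walk back from the newest snapshot to the last one holding the file.
        for snap in reversed(self.snapshots):
            if name in snap:
                return snap[name]
        return None


primary = {"report.doc": b"v1"}
mirror, backup = Mirror(), VersionedBackup()
mirror.sync(primary)
backup.snapshot(primary)

del primary["report.doc"]  # accidental delete on the main array
mirror.sync(primary)       # the delete is faithfully mirrored
backup.snapshot(primary)

print("report.doc" in mirror.data)   # False: the mirror lost it too
print(backup.restore("report.doc"))  # b'v1': still in an older snapshot
```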
 
Hi mwroobel, thanks for your response.

What are your immediate storage needs (size-wise) and the expected needs over the next 12 months?
_ We are going to run an online backup service, so it will mainly depend on the number of customers. My plan is to build a cost-effective, high-density storage system (to save rack space), and when we need more storage we can build another one with the same configuration.
If this is going to be your business and your responsibility, I would honestly suggest hiring someone versed in network storage who can give you specifics based on your needs, rather than choosing based on what you hear from people on a forum. In any case, how many users are you planning on, and how much space are you going to give them?

Are you looking for something with a single-vendor warranty?
_ I'm not really sure, but I'm most concerned about the hard drive and chassis warranties. Do you have any suggestions on this?
Well, unless you are purchasing drives from a specific server manufacturer, the warranties will be whatever you get based on the drives you choose. If you choose consumer OEM drives, this will be as little as 1 year. If you choose enterprise drives, it could be as much as 5.

Onsite service?
_ We don't need onsite service.
Are you going to host this from your own offices or colo it locally? That will affect your decisions on service needs.

How are you going to be getting the data from your customers sites (Online/Tape/Disk etc)?
_ We will have a software client installed on our customers' computers to upload their data to storage server.
How is this client going to talk to your back-end system? Proprietary protocol, FTP, NFS, etc.?

How are you planning on backing up your customers' backups?
_ Besides RAID 6 to increase data safety, my plan is to mirror the storage server to another storage server located in a different datacenter (I think it's the most cost-effective way). Do you have any suggestions for me?
You are going to need more than that. Some kind of offline backup system with dedup and versioning at a minimum.

Will you be handling all restore requests or is this something you are going to build a front-end for your customers?
_ As I've said before, there will be a software client and that software will also handle restore requests.

What is your upper limit on cost?
_ Honestly, I'm still not confident enough to decide on a budget at the moment. I'm just trying to find the best combination of cost and reliability so I can finish my planning and start looking for investment from venture capitalists.

How technically inclined are you?
_ I consider myself tech-savvy. However, I'm very new to server hardware, so there are a lot of things to learn.

If you want to look at a system that is up and running well, go to backblaze.com and look at their pod 2g system. This is not something that someone who is not very technical will be able to administer easily, and your client software authors are going to want some input on the back end. How much bandwidth are you going to allow each client, in addition to the storage you are going to offer them? These are not simple questions, and you will need some idea of the answers before proceeding.
 
Thank you very much, I'm taking a look at the Backblaze Pod.

I just wonder how companies like Carbonite or Backblaze can offer unlimited backup for just $50 a year. I have done a calculation, and even with cheap storage like the Backblaze Pod they can hardly make any profit, not to mention the additional cost of backing up customers' backups (e.g. an additional Pod to back up a Pod).

Does anyone know how they can do that?
 
With the right software (front end and back end: compression, dedupe, snapshots, delta updates, etc.), the right business plan, the right up-front capital expenditure and a lot of luck, you can make a go of it. Backblaze offers unlimited backup for as low as $3.96 per machine per month, and according to them they are profitable. In any case, this is a risk-filled investment, like many online propositions. It's up to you to decide whether everything you need to fund (front- and back-end software, capital equipment purchases, and monthly expenses such as power, bandwidth, personnel and advertising) is worth it based on your projections.
 
I would expect they archive your backups to tape (which becomes cheaper than disk after a certain capacity point).
 
I would expect they archive your backups to tape (which becomes cheaper than disk after a certain capacity point).

Actually, Backblaze is completely disk-based, with redundancy at both the disk and pod level provided by the tools they have written. Since they are using consumer-level OEM SATA drives rather than enterprise SAS drives, things are significantly cheaper for them.
 
Yes, they use consumer-level OEM SATA drives, but they don't pay what we pay. They have contracts in place to purchase drives at a much lower price than we do.
If you do a bit of market research, you will find that 90% of computer users who have some type of backup service only use it for documents and family videos/photos. That does not take a lot of space on a server. The only thing they really need to worry about is bandwidth, and that is not that expensive these days for organizations like that.
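
A back-of-the-envelope way to check this argument is to ask how much data the average customer can store before a flat yearly price stops covering the storage. The sketch below uses the $50/year price mentioned earlier in the thread; the storage cost per TB per year and the number of copies kept are hypothetical placeholders, and it deliberately ignores bandwidth, power, staff and software costs.

```python
def breakeven_gb_per_user(price_per_year, storage_cost_per_tb_year, copies=2):
    """Average GB per user at which storage cost alone equals the yearly price.

    storage_cost_per_tb_year and copies are assumptions to be replaced
    with real figures from your own hardware quotes.
    """
    cost_per_gb_year = storage_cost_per_tb_year * copies / 1000.0
    return price_per_year / cost_per_gb_year


# Hypothetical inputs: $100 per TB per year, every byte stored twice.
limit = breakeven_gb_per_user(50.0, 100.0, copies=2)
print(f"Break-even at about {limit:.0f} GB per user")
```

If most customers only back up documents and photos, their average usage sits well under such a limit, and the light users subsidize the few heavy ones.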
 
Since we talked about deduplication, do you have any idea how to implement block-level deduplication on file-level storage such as a NAS?

Another question: What is the difference between 1) A storage server connected to a disk array and 2) A storage server that has a disk array in it?

I found some high density JBOD disk arrays. Is there any way to configure them to work as a RAID array?

@AceXsmurF: thank you, that looks interesting!
 
Unfortunately, unless your programmers are prepared to roll their own dedupe, you have few choices for scalable (and reliable) block- or zone-level dedupe. You can build on ZFS or another filesystem that supports dedupe, or go hog wild with a commercial vendor such as NetApp, ExaGrid or EMC. There are also less expensive options such as Symantec PureDisk, but scalability is a concern there.

As to the other question, it depends on how the storage server is attached to the array. If it is attached over SAS or IB (InfiniBand), for example, it could be considered DAS. Otherwise, if the "storage server" is only connected to the network and the disk array is only connected to the storage server, the difference is semantics.

In the end, you will be best off doing dedupe yourself in whatever client you decide to write, and just transferring a token to your software on the back end, rather than spending the time and expense of copying the file across the net and letting your filesystem dedupe it once it has been written to your storage infrastructure. You kill two birds with one stone that way.
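
A minimal sketch of that client-side idea, assuming fixed-size blocks and a SHA-256 hash as the token. The `BlockStore` class, the function names and the tiny block size are all hypothetical and only there to show the flow; a real service would use much larger blocks, a network protocol and persistent storage.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


class BlockStore:
    """Stand-in for the back end: maps token -> block contents."""

    def __init__(self):
        self.blocks = {}

    def has(self, token):
        return token in self.blocks

    def put(self, token, block):
        self.blocks[token] = block


def upload(data, store):
    """Send only blocks the store lacks; return (recipe, bytes_sent)."""
    recipe, sent = [], 0
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        token = hashlib.sha256(block).hexdigest()
        if not store.has(token):  # only the token crosses the wire first
            store.put(token, block)
            sent += len(block)
        recipe.append(token)
    return recipe, sent


def restore(recipe, store):
    """Rebuild a file from its list of tokens."""
    return b"".join(store.blocks[token] for token in recipe)


store = BlockStore()
recipe1, sent1 = upload(b"AAAABBBBCCCC", store)  # all three blocks are new
recipe2, sent2 = upload(b"AAAABBBBDDDD", store)  # only the last block is new
print(sent1, sent2)
print(restore(recipe2, store))
```

The second upload sends 4 bytes instead of 12, because two of its three blocks are already on the back end. Each file is then just a list of tokens, which is also what makes restores possible.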
 
I know that Backblaze is concerned about data corruption and is therefore following ZFS closely. If you are serious about avoiding data corruption, you should avoid RAID 6 and do some research. Start here; it links to several research papers:
http://en.wikipedia.org/wiki/ZFS#Data_Integrity


If you want ZFS, then you must not use hardware RAID, which also makes the server cheaper. Just connect all the disks directly to the host. Here is a company that has rewritten ZFS deduplication to get better performance, for those who depend on dedupe:
http://www.theregister.co.uk/2012/06/01/tegile_zebi/

Basically, you can build a PC with several SATA disks without any raid card, using ZFS and get a safer solution.
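
The reason checksumming helps can be shown with a toy example. This is not ZFS code, only an illustration of the general idea under simple assumptions: a checksum is kept separately from the data, every read is verified against it, and a bad copy is repaired from a good one. The class name and the two-copy layout are made up for the sketch.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Two copies of a block plus a checksum stored apart from the data."""

    def __init__(self, data):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self):
        # Find any copy whose contents still match the stored checksum.
        good = None
        for c in self.copies:
            if checksum(bytes(c)) == self.expected:
                good = bytes(c)
                break
        if good is None:
            raise IOError("all copies failed checksum verification")
        # Repair every copy that no longer matches.
        for i, c in enumerate(self.copies):
            if checksum(bytes(c)) != self.expected:
                self.copies[i] = bytearray(good)
        return good


block = MirroredBlock(b"customer backup data")
block.copies[0][0] ^= 0xFF  # simulate a silent bit flip on one disk
data = block.read()
print(data)
print(bytes(block.copies[0]) == data)  # the damaged copy was repaired
```

A RAID controller that does not verify data against an independent checksum has no way to tell which copy is the corrupted one.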

Here is a thread about enclosures accepting many disks:
http://hardforum.com/showthread.php?t=1643120&highlight=thumper
 
You might look into CrashPlan. They offer business- and enterprise-level backup now. I think that at the enterprise level they offer the software and you provide the hardware. You said you want to mirror to another server; I hope you are taking into account restoring files after accidental deletion, because a mirror only holds the current data.
 
So what strategy do you think I should use to back up my customers' backups? I want it to be automatic and disk-based, so tape backup is not suitable for me.
 