1.5 Petabytes: Swiftstack In Pictures

Bigdady92

At my job we use a ton of storage. A ridiculous amount, scattered all over the place across SAN, NAS, and local disk. Year upon year it's been added on piecemeal, until we finally realized we are spending more on power and expansion than we would by just doing it "right".

The new paradigm (BINGO!) is to consolidate all this mishmash of data onto a Hadoop cluster and use OpenStack Swift to store the output of those jobs, incoming files from various 3rd-party vendors, and data from our endpoints.


I shall capture the process of installing these 500 3TB hard drives into our new cluster, with images hopefully worthy of [H].

Here they are on the pallet ready to ship.

[Image: GDyZJXh.jpg (the drives on the pallet)]
 
What is the DoA count? Statistically there should be 5!

What hard drives are these? :D


I expect the DoA count to be around 5%, which would be roughly 25 failed drives. They are on their way from California to our data center via freight truck, so I have not opened them yet. When I do I'll take more pictures.


They are TOSHIBA DT01ACA300 3TB 7200 RPM 64MB cache SATA 6.0Gb/s 3.5" internal hard drives.

http://www.newegg.com/Product/Product.aspx?Item=N82E16822149408

We did not get them from Newegg (disclaimer), and since we were able to buy 500 at a time our price was below retail, but not by much; nobody makes much margin on these drives. I shopped around heavily with a shortlist of 7 different drives to find one that met our criteria: relatively inexpensive, 3-year warranty, 6Gb/s SATA, 3-4TB, 7200 RPM, 32MB+ cache.
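For what it's worth, that shortlist boils down to a filter like the sketch below. The criteria are the real ones above; the candidate data is a placeholder (only the Toshiba entry comes from this thread), and price comparison would happen separately against whatever quotes you get.

[code]
# Sketch of the drive shortlist filter. Criteria are from the post above;
# the candidate list is a placeholder -- only the Toshiba is real here.
CRITERIA = {
    "capacity_tb": (3, 4),       # 3-4TB
    "rpm": 7200,
    "min_cache_mb": 32,
    "min_warranty_years": 3,
    "interface_gbps": 6,
}

candidates = [
    {"model": "Toshiba DT01ACA300", "capacity_tb": 3, "rpm": 7200,
     "cache_mb": 64, "warranty_years": 3, "interface_gbps": 6},
    # ...the other six drives on the shortlist would be listed here
]

def meets_criteria(d):
    lo, hi = CRITERIA["capacity_tb"]
    return (lo <= d["capacity_tb"] <= hi
            and d["rpm"] == CRITERIA["rpm"]
            and d["cache_mb"] >= CRITERIA["min_cache_mb"]
            and d["warranty_years"] >= CRITERIA["min_warranty_years"]
            and d["interface_gbps"] >= CRITERIA["interface_gbps"])

# Anything that passes gets compared on price separately.
print([d["model"] for d in candidates if meets_criteria(d)])
[/code]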
 
That's a lot of hard drives :D


I've never ordered this many hard drives before. I have a team of 5 people who will, hopefully, help me put 400 or so of these into caddies and then into the servers to burn in.

Even with 1.5PB of raw space we will be lucky to get 500TB of usable space, as OpenStack Swift stores three copies of every object: one third of the raw space holds unique data and the other two thirds is redundancy.
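The back-of-the-envelope math, as a quick sketch (drive count and size are the ones from this build; these are decimal/marketing terabytes, so formatted capacity comes out a bit lower):

[code]
# Capacity math for a 3-replica Swift cluster, using this build's numbers.
drives = 500
drive_tb = 3        # decimal TB; formatted capacity will be somewhat lower
replicas = 3        # Swift keeps 3 copies of every object

raw_tb = drives * drive_tb       # 1500 TB = 1.5 PB raw
usable_tb = raw_tb / replicas    # ~500 TB of unique data
print(f"raw: {raw_tb} TB, usable: {usable_tb:.0f} TB")
[/code]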


We have a Cisco Nexus to provide 10GbE for all of these systems. The price of 10 gigabit Ethernet versus 1 gigabit has come down significantly, and at the rate data is coming in and moving around we need as little lag as possible between servers while the data propagates.
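To put the 1 vs 10 gig difference in perspective, here's a rough line-rate calculation; it ignores protocol overhead, disk throughput, and contention, so treat it as a best case:

[code]
# Best-case time to move 1 TB between nodes at line rate.
TB = 1e12  # bytes
for gbps in (1, 10):
    seconds = TB * 8 / (gbps * 1e9)
    print(f"{gbps:>2} GbE: ~{seconds / 3600:.1f} hours per TB")
[/code]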
 
Why are you using 3TB drives and not 4TB ones? The power - and thus money - saved by using the latter must be considerable.
 
The systems are in an enterprise data center, split between 2 cabinets on four 30A circuits. Since we pay per month for the 30A circuits (usable to about 24A before the DC staff whine), I'm not concerned about HVAC; that's for the DC staff to deal with, and there's no way I'll be hitting that 24A line anytime soon.

The price difference between the 3TB and 4TB drives was outside our planned budget ($30k+ extra), so we went with the 3TB, which is more than enough for our needs for the next 18 months. If we need to expand further, we purchase more servers and add to the array in 90TB chunks. It is preferable to have more, smaller hard drives than fewer, bigger ones, within reason of course.
 
A couple of questions for you:
1) Why Openstack instead of some of the other cloud platforms out there?
2) Are you going to be just using the storage stack or are you looking into using the compute and networking components?
3) 500TB usable after everything is said and done. I am going to assume this is all static data that is going to live out there?
4) How are you guys planning on backing that up?
5) Where in Cali is this going? (I am in the LA area-ish and would love to hear/see your experience with Openstack)
 


1. After looking at traditional vendor solutions for housing our data (EMC, Compellent, etc.) we decided to look for alternatives. Our group went out and scoured the internet and found that OpenStack Swift meets our need for a massively scalable storage solution built on commodity hardware. Nothing else out there can really compare to OpenStack Swift at this price point, including the 24/7/365 support, which is vital. If it's good enough for Rackspace, eBay, and PayPal, it's good enough for us. We even looked at rolling our own via Backblaze pods, but that was not viable either due to hardware limitations (no dual power supply, difficult access to the hardware, no 10GbE cards, etc.).

2. Just the storage for now, eventually the rest of Openstack. Openstack is very complicated and I want to focus on one small part with my team before tackling the larger projects.

3. Yes, 500TB of usable space. Call it 50/50 with data rotation, as we replace old files with new ones on a constant basis. There will be a sharp ramp-up of files for the first 6 months, then steady growth of 10% or so per month (a rough projection is sketched just after this list). We have 150TB of storage currently that needs to move over, and roughly 200TB by EOY.

4. Remote replication of additional pods in 2 other locations: 1 in Asia and 1 somewhere in the midwest

5. Sorry, not Cali, wrong coast. :)
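To put point 3 in rough numbers, here's an illustrative fill projection. It ignores the constant rotation and deletion of old files, so it overstates growth; the 200TB and 10%/month figures are the ones from the answer above.

[code]
# Illustrative fill projection: ~200 TB by EOY, then ~10% growth per month,
# against ~500 TB usable. Ignores the constant rotation of old files.
used_tb, usable_tb, months = 200.0, 500.0, 0
while used_tb < usable_tb:
    used_tb *= 1.10
    months += 1
print(f"~{months} months past EOY before the 500 TB is full ({used_tb:.0f} TB)")
[/code]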
 
Why not a zfs solution? Lower price. Data is protected.


Our data is protected by having 3 copies of the same file on 3 different servers at the same time, with CRC checks verifying that all copies are in sync. Furthermore, having the same data replicated out to 2 different sites gives us even more resilience in case we get leveled by an Act of God event.
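Just to illustrate what that cross-replica check amounts to (this is not Swift's actual auditor, which does its own internal checksumming; the paths here are made up):

[code]
# Toy cross-replica integrity check -- NOT Swift's real auditor, just an
# illustration of "3 copies that all have to agree". Paths are made up.
import zlib

def crc32_of(path):
    """CRC32 of a file, read in 1 MB chunks so big objects don't eat RAM."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return crc

# Hypothetical paths to the three replicas of the same object.
replicas = ["/srv/node1/objects/abc", "/srv/node2/objects/abc", "/srv/node3/objects/abc"]
checksums = {p: crc32_of(p) for p in replicas}
if len(set(checksums.values())) > 1:
    print("replica mismatch, needs re-sync:", checksums)
[/code]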

Also we plan on turning our old EMC into a backup repo for this data eventually. Gotta use it for something :)

I honestly did not consider a ZFS solution for our storage needs. We settled on OpenStack Swift fairly early, as it fit our niche, and when comparing with the other vendors above the prices skyrocketed.
 
Thanks for your answers, that is great!


4. Remote replication of additional pods in 2 other locations: 1 in Asia and 1 somewhere in the midwest
-> Replication is good, but replication is not backup. I am going to assume you are going to do snapshots on the data too; what kind of retention are you thinking for that large an amount of data? Do you have any plans to do 'traditional' backup at all?

5. Sorry, not Cali, wrong coast. :)
-> Sorry I thought I saw something about Cali in the post :)
 


Once we move all data off our current myriad of SAN/NAS/iSCSI/local disk/HORROR storage, we have 2 EMC systems that will house the data as 'traditional' backup. We will retrofit them with big/slow disks and dump data onto those as well. One EMC will then mirror the data to the other EMC (they do this now), so we have backups in 2 different locations.

The servers are coming from California out to the East Coast.
 
I've read those threads before and my experiences with off the shelf ZFS solutions were not favorable, so I stopped considering them as alternatives.

I did reach out to vendors (EMC and Dell for their storage lines) for quotes, and both were significantly out of our price range, as we do have cost constraints in terms of required capacity per dollar.

I have secured 24/7/365 support on every part of the stack, from networking to servers to OS to software, to get this puppy working if there is a problem anywhere in the pipeline. This was a requirement from the get-go, as my data is worth far more than the couple thousand extra a month we pay for support.

We may go back to Compellent for future storage projects, as we like how they integrate with other systems; this time we chose price over functionality to get this project going.

On a side note, my 500 hard drives have arrived at the data center. I'm not spending this weekend there installing them; they can wait for my entire team to roll out and help.

Anyone want to make a couple bucks screwing in hard drives? Hah!
 
Can't wait to see this racked up. I have ~1.1PB of storage in house and another 3PB between remote clusters. Ours are all ZFS based and broken into 140TB-336TB pools due to speed issues when attaching them to our cluster. (Each controller can max out our 40Gb InfiniBand network with 140TB.)
 
What hardware platform are you using for your cluster?

Supermicro recently announced a sweet box for Hadoop:

http://www.supermicro.com/products/system/4U/F617/SYS-F617H6-FT_.cfm

(you actually want the other chassis that has the LSI 2308 (non-raid) HBA ... unfortunately there's not a full web page yet for that one on the SM site)

Gives you 48 hard drives in each 4U/4Node chassis and inside the same 4U are 4 dual-processor (Sandy Bridge) motherboards. This thing is extremely dense and uses 80-Plus *Platinum* power supplies, so should end up being very power efficient.

I'm currently running a 6 node (250 TB) cluster using the older incarnation of Supermicro's "Twin" servers combined with external JBOD chassis via SAS switching, everything connected together with 40Gbps QDR InfiniBand (I'm seeing intra-node communication in the 1.0 - 5.0 Gbps range).

On Friday I ordered two of the F617s (2x E5-2620s, 128GB/node) and 8 more QDR IB cards. I already have the 96 4TB Hitachi drives "laying around" to populate these fully. All in, I'm looking at about $45K, which will take the customer up to about 600TB of capacity. The cost of the SM chassis, including CPU and memory (no disks or IB), is not much more than $10K apiece, which is insane.

My current 6 nodes are sucking down about 2-3kW of power, or about $200-300/month. I'm hoping that after I add the additional 8 nodes it'll all run under 6kW, which will be pretty amazing for a cluster with 336 cores (including hyperthreading).
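For reference, the back-of-the-envelope behind those power numbers; the electricity rate here is an assumed ~$0.12/kWh, so plug in your own:

[code]
# Monthly power cost at an assumed ~$0.12/kWh (adjust for your rate).
rate = 0.12                      # $/kWh, assumption
for kw in (2.5, 6.0):            # current ~2-3 kW draw vs. hoped-for ceiling
    monthly = kw * 24 * 30 * rate
    print(f"{kw:.1f} kW -> ~${monthly:.0f}/month")
[/code]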
 
What hardware platform are you using for your cluster?

For our OpenStack cluster we went with a heavily customized Supermicro SC847E16:

[Image: SC847E16-R1K28LPB_spec.jpg (chassis spec sheet)]


Key Features

1. Intel® Xeon® processor E5-2600 family; QPI up to 8GT/s
2. Up to 768GB DDR3 1600MHz ECC Registered DIMM; 24x DIMM sockets (depends on memory configuration)
3. 3x PCI-E 3.0 x16 and 1x PCI-E 3.0 x8 slots (low-profile)
4. Intel® i350 GbE controller; 4x ports
5. 36x hot-swap 3.5" SAS2 (with LSI expander) / SATA3 HDD bays; 24 front + 12 rear
6. Hardware RAID controller and JBOD expansion; RAID 0, 1, 5, 6, 10, 50, 60
7. Server remote management: IPMI 2.0 / KVM over LAN / Media over LAN
8. 3x heavy-duty PWM fans
9. 1280W redundant power supplies, Platinum level (95%)


For Hadoop we had talks with the big 3 vendors and went with a heavily customized solution. I am unable to talk about that portion of the project at this time, but when I can I'll be sure to post pictures!

I did look at those Supermicro systems (Twin/Fat/Quad) for our test bed/dev playground and may still go that route.

I did pick up a few of those 4-node-in-2U servers with 24 2.5" drive bays, stuck 256GB SSDs in them, and they work like screaming demons. Highly recommend them.
 
What SSDs were you using? I'm about to build one of those for a project. I have been pretty happy with the Intel 520s.
 
You could be down here in 12 hours or so if you drove like a typical Philly driver ;)

so Florida somewhere? how far from Miami? :D

I did pick up a few of those 4-node-in-2U servers with 24 2.5" drive bays, stuck 256GB SSDs in them, and they work like screaming demons. Highly recommend them.

dell cloud servers?
 
the lost city of Atlanta?

I picked up some used C6100s off eBay for cheap; they are nice boxes. Turned 2 into an 8 node Xen cluster...

the new stuff is astronomically expensive tho...
 
DAMN.....................................
Lord, I declare you the "king" of the [H] forum with the biggest e-penis of them all...
While it isn't your rig, still, it's 1.5 PETA!!!

I can't stop drooling
 
I'm interested in the use case here... Just using Hadoop to host HDFS for file storage, or will you be doing processing on Hadoop?

The project I'm currently on was an early adopter of Hadoop (we've been using it for about 4.5 years now). Since our need was computational as well as storage, our cluster is about 60 datanodes, each with 16 cores, 48GB of memory, and 4x2TB drives. Folks didn't listen to me (my recommendation was 8x1TB) and we're severely constrained in terms of IOPS. We've been Cloudera customers essentially since they were founded, so if you have any questions I'd love to help out. I've given a couple of presentations on performance concerns with Hadoop and HBase.
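The spindle argument in rough numbers: the node count and drive layouts are the real ones above, but the per-drive random IOPS for 7,200 RPM SATA is an assumed ballpark (~80), so adjust for your drives.

[code]
# Same ~8 TB per node, different spindle counts. Per-drive random IOPS for
# 7,200 RPM SATA is an assumed ballpark (~80); adjust for your drives.
IOPS_PER_SPINDLE = 80
NODES = 60
for label, drives_per_node in (("4 x 2TB (as built)", 4),
                               ("8 x 1TB (recommended)", 8)):
    cluster_iops = NODES * drives_per_node * IOPS_PER_SPINDLE
    print(f"{label}: ~{cluster_iops:,} random IOPS across the cluster")
[/code]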
 