Date: 2021-10-20
500TB+ storage

Hello, at my internship I'm supposed to look into and choose good options for a big storage system (70TB is already here and is not sufficient, so we're talking about 500TB and more). There are plenty of general approaches and I'm getting a little lost in them, but when it comes to concrete choices I can't find many options, since the products don't seem to be widespread.

I've already asked around on the internet and I'd like to get some second opinions from you guys here. Cloud storage seems to be absolutely out of the question, since the prices are really high. Some doable solutions might be SAN or SDS. What do you think, any other ideas?

The situation is that physical space isn't a problem and the storage would probably be used by fewer than 10 people. Fast read/write speed would be very handy, but I don't think the budget will allow us to push on this front.

P.S.: A little question on the side - what do you think of assembling a NAS (e.g. Synology) of 192TB, is that generally a good idea? (12 times 16TB disks)

The storage would be used by application engineers running programs for 3D reconstruction in cryo-EM. The initial input for a project is usually images from the microscope (a few hundred GB), and while the data is being processed the project grows to a few TB. The whole project is stored afterwards, so it's not hard to fill 100TB in a few weeks or months. How long each project is kept differs. Usually the project directory is set up on the external storage along with the data, so parts of the data are read into memory on the local machine and processed there.
 
When I saw 500TB I immediately thought tape. You can certainly do it with a bunch of drives, but with 32 drives minimum (using 16TB drives) you wouldn't want to rely on zero drive failures, so once you add redundancy you're actually talking about 50+, depending on how you do it. If I were you, though, I'd investigate using tape for the bulk of the data and a few tens of TB of hard drive for active data.
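To make the drive-count arithmetic concrete, here is a rough sketch. The 500TB target and 16TB drive size come from the thread; the parity layouts (10-drive groups with 2 parity drives, or plain mirroring) and the 2 hot spares are my own assumptions for illustration, and the helper `drives_needed` is hypothetical, not from any tool. It also ignores filesystem overhead and the TB vs. TiB difference.

```python
import math

def drives_needed(usable_tb, drive_tb, group_size, parity_per_group, hot_spares=0):
    """Total drives needed to reach usable_tb when drives are arranged in
    redundancy groups (hypothetical layout, for estimation only)."""
    data_per_group = group_size - parity_per_group
    data_drives = math.ceil(usable_tb / drive_tb)      # drives holding actual data
    groups = math.ceil(data_drives / data_per_group)   # whole groups only
    return groups * group_size + hot_spares

# No redundancy at all: the "32 drives minimum" figure.
bare = math.ceil(500 / 16)

# Assumed layout: RAID6-style groups of 10 (8 data + 2 parity) plus 2 hot spares.
raid6_style = drives_needed(500, 16, group_size=10, parity_per_group=2, hot_spares=2)

# Assumed layout: mirrored pairs (1 data + 1 copy).
mirrored = drives_needed(500, 16, group_size=2, parity_per_group=1)

print(bare, raid6_style, mirrored)
```

Depending on the layout you land somewhere between the low 40s and the mid 60s, which is where the "50+" ballpark comes from.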
 
Lots of things to consider here, but simply reaching 500TB is pretty darn easy these days. Supermicro makes chassis with up to 90 drives, perhaps even larger now; just add the SAS drives of your choice. Last time I bought SAS drives, they went up to 18TB.
 
Is it fair to say you need two different kinds of storage (with different read/write patterns)? One to store older projects and back up the currently active ones, and another for the currently active projects?

Would that make something like tape possible for one of them?
 
As this isn't for home use, go with a vendor that offers 24/7 support, such as Dell/HPE/Cisco. Their pre-sales team can get you a suitable build if you have properly defined requirements.
 
I mean... interns should do real work. But this isn't "go out and get this and set it up", this is "go out and find out what our options are so we can have a discussion".

Personally, without any other specs, I'd go to whatever server vendor they already use, find the box with the most disk slots, and fill it with big disks. You might need two boxes if your preferred vendor tops out at 24 disks, as used to be popular. You might want two (or more) boxes for redundancy anyway. It sounds like this is mostly just a need for an NFS/Samba/etc. shared workspace with a lot of disks? Not some sort of big data processing where you may want a lot more nodes for processing power and fewer disks per node.

If the data is mostly archived after a small period of active use, enterprise tape seems like a good idea, too.
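To put rough numbers on why archiving matters, here is a small sketch of how long a disk tier lasts if nothing is ever moved off it. The 500TB and 70TB figures are from the original post; the fill rate of 100TB per 12 weeks is purely my guess at what "a few weeks or months" means, the idea that the existing 70TB gets migrated onto the new storage is also an assumption, and `weeks_until_full` is a hypothetical helper.

```python
def weeks_until_full(capacity_tb, used_tb, fill_tb_per_week):
    """Weeks until the storage is full at a constant fill rate,
    assuming nothing is deleted or archived in the meantime."""
    if fill_tb_per_week <= 0:
        raise ValueError("fill rate must be positive")
    free_tb = max(capacity_tb - used_tb, 0)
    return free_tb / fill_tb_per_week

# Assumption: 100TB fills in about 12 weeks.
assumed_rate = 100 / 12

# Assumption: the existing 70TB of data is moved onto the new 500TB system.
weeks = weeks_until_full(500, 70, assumed_rate)
print(f"Full after roughly {weeks:.0f} weeks")
```

With those guesses, 500TB is full in about a year, so without a cheaper archive tier (tape or otherwise) the same sizing discussion comes back fairly soon.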
 
