Does anyone know of a software solution for encrypted multi-HDD storage with deduplication?

Discussion in 'SSDs & Data Storage' started by postcd, Apr 16, 2019.

  1. postcd

    postcd n00b

    Messages:
    45
    Joined:
    Nov 24, 2016
    Hello,

    I want to raise a problem I have been thinking about for a long time and have been unable to solve, despite searching and asking on other sites. The summary is in the title of this page.

    I am currently on a Windows system that I do not want to leave.
    I need to store around 7TB of data, growing by maybe 150GB/month: mainly movies, plus the data of an application that uses roughly a million files (mostly small files, maybe 500GB in total). Currently I am on HDDs (the app files on one external HDD, movies etc. on another external, plus the system HDD), but I would welcome speeding up the storage by at least 100% (-> read striping), and I am also running out of disk space.

    ISSUE: I need more space than SSDs can offer and more speed than a single HDD can offer.

    I think I may need to create some storage "pool" out of my HDDs. I want this storage pool to be encrypted once I shut down the computer. I think I would need to use USB enclosures, as I do not want to buy an expensive and power-hungry NAS, and I do not trust the compatibility of a network-attached filesystem with my Windows 10 PC (that all apps will use it without problems).

    I can use the Windows app DrivePool to combine multiple USB drives, but this app does not offer deduplication (if the virtual storage pool's NTFS filesystem contains one file in multiple directories, the software cannot save physical disk space by using only the space of one file). But no problem, maybe I should find some software that will do it regularly for me by replacing dupes with hardlinks?
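
    The "replace dupes with hardlinks" idea can be sketched in a few lines of Python. This is only an illustration, not an existing tool: the function names and the `.dedup-tmp` suffix are made up for the example, it hashes every file (slow on millions of files), and it assumes all files sit on one volume whose filesystem supports hardlinks. Whether a pooled volume supports them would need to be checked first. Hardlinked files share their content, so editing one "copy" changes all of them.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root, dry_run=True):
    """Replace duplicate files under root with hardlinks to a single copy.

    Returns the number of bytes that were (or, with dry_run, would be) reclaimed.
    """
    by_content = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks, special files and empty files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size == 0:
                continue
            by_content[(size, file_digest(path))].append(path)

    saved = 0
    for (size, _digest), paths in by_content.items():
        paths.sort()
        keep = paths[0]
        for dup in paths[1:]:
            if os.path.samefile(keep, dup):
                continue  # already a hardlink to the kept copy
            saved += size
            if not dry_run:
                # Create the link under a temporary name, then swap it in,
                # so the duplicate is never missing if something fails.
                tmp = dup + ".dedup-tmp"  # hypothetical suffix for this sketch
                os.link(keep, tmp)
                os.replace(tmp, dup)
    return saved
```

    Running `hardlink_duplicates(some_folder)` with the default `dry_run=True` only reports how many bytes could be reclaimed, which is a safe first step before letting it touch anything.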

    I do not need real-time deduplication, as I read that the deduplication feature of ZFS, and maybe of BTRFS, requires let's say 6GB RAM + 1GB RAM per 1TB of storage, plus additional CPU time (https://hardforum.com/threads/zfs-dedupe-is-fixed-soon.1854062/). That is too many resources for me. I am wondering if there is some software solution that does not require me to purchase an expensive mini PC with lots of RAM.
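
    To put numbers on that rule of thumb, here is a small Python sketch. It only encodes the "6GB + 1GB per 1TB" figure quoted above together with the 7TB and 150GB/month numbers from this post. It is not an official ZFS sizing formula, and the function names are made up for the example.

```python
def zfs_dedup_ram_estimate_gb(pool_tb, base_gb=6.0, per_tb_gb=1.0):
    """RAM estimate from the rule of thumb quoted in the post (not an official formula)."""
    if pool_tb < 0:
        raise ValueError("pool size must not be negative")
    return base_gb + per_tb_gb * pool_tb


def pool_size_after_months_tb(current_tb, growth_gb_per_month, months):
    """Projected data size in TB, assuming constant monthly growth and 1 TB = 1000 GB."""
    return current_tb + growth_gb_per_month * months / 1000.0


# Today's 7 TB under the quoted rule of thumb.
ram_now_gb = zfs_dedup_ram_estimate_gb(7)  # 13.0

# The same estimate after two more years of growth at 150 GB/month.
ram_in_two_years_gb = zfs_dedup_ram_estimate_gb(pool_size_after_months_tb(7, 150, 24))
```

    Under these assumptions the 7TB of today already calls for about 13GB of RAM for dedup alone, and the figure keeps growing with the data.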

    Maybe I am overcomplicating it, and maybe I should buy 2 large external HDDs and continue as now, but I am starting to need more space than the largest affordable SSDs offer and also more than single-HDD speed (IOPS); at least certain directories/files, about 500GB in size, need faster reads than my current single HDD offers. I cannot move them to an SSD, because they are part of the app, which also contains very large files needing terabytes of storage. I tried to use symlinks (symbolic links) to a different HDD, but the app did not work with symlinks (and hardlinks on Windows only work within a single drive).

    I do not know what to do.
    Note that I prefer external 2.5" USB drives because of their lower noise, power consumption, and price.
    I do not wish to run some old noisy computer/NAS just to connect 2-3 HDDs together. I would also like to avoid buying additional expensive or power-hungry hardware. So I prefer (but do not insist on) a software solution or a solution using simple static hardware. DrivePool can do everything I want, except that it will waste my disk space (in my case, terabytes) because it cannot handle duplicates on the storage. Maybe I should find Windows software that will regularly scan for dupes and replace them with hardlinks? Symlinks do not help, as mentioned.
    Maybe I should buy some mini PC like an RPi (<10 watts) to offload my HDD IOPS and CPU cycles (which are also becoming a problem). I just do not know how I would reliably and simply connect my Windows PC storage and the Linux PC storage. From this point of view, it looks better to use a single storage.

    Advice on a good setup is very welcome. Thank you in advance, and sorry for the hard reading (I am not a native speaker).

    ---------------------

    Interesting comments found:
    https://news.ycombinator.com/item?id=15070542

    https://leaf.dragonflybsd.org/cgi/web-man?command=hammer&section=8
    (unsure whether the quoted text shows it can somehow work in my case, but the DragonFly download page shows no ARM CPU support; it says "DragonFly BSD is 64-bit only")
     
    Last edited: Apr 19, 2019 at 5:26 AM
  2. kdh

    kdh Gawd

    Messages:
    700
    Joined:
    Mar 16, 2005
    Your requirements for what you are trying to do are not reasonable. You'll have to flex on budget, performance, or some other need. Also, encrypted data does not dedupe or compress well. Why? Encrypted data is almost all unique, with very little commonality. In other words, encrypted data is a lot like a compressed media file in that it doesn't dedupe at all. You may get 1:1.2 dedupe if you are lucky, but you are going to waste a ton of CPU cycles doing it.

    You are overcomplicating it, and what you posted would be an animal to manage. Don't reinvent the wheel, because all you are going to end up with is some clunky, convoluted, complicated system that doesn't work and locks you into a hardware model you can't get out of. Buy a small 4-drive NAS with encryption built in that lets you easily move your data in and out of it down the road. It's a lot cheaper than you think.
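
    A rough way to see this effect in Python is sketched below. No real encryption is performed here: random bytes from `os.urandom` stand in for ciphertext (an assumption of the sketch, since good ciphertext looks random), and a buffer of identical 4096-byte blocks stands in for the same file stored many times. The names and the block size are made up for the example.

```python
import hashlib
import os
import zlib

BLOCK = 4096  # hypothetical dedupe block size for this sketch


def compression_ratio(data):
    """Compressed size divided by original size (lower means it compresses better)."""
    return len(zlib.compress(data, 9)) / len(data)


def unique_block_fraction(data, block_size=BLOCK):
    """Fraction of fixed-size blocks that are unique (1.0 means nothing to dedupe)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    digests = {hashlib.sha256(block).digest() for block in blocks}
    return len(digests) / len(blocks)


# "Plaintext" stand-in: one 4096-byte block stored 256 times.
duplicated = (bytes(range(256)) * 16) * 256

# "Ciphertext" stand-in: random bytes of the same length.
random_like = os.urandom(len(duplicated))

for label, data in (("duplicated", duplicated), ("random-like", random_like)):
    print(label, compression_ratio(data), unique_block_fraction(data))
```

    The duplicated buffer shrinks to a small fraction of its size and has a single unique block, while the random-like buffer neither compresses nor shares any blocks.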
     
  3. SvenBent

    SvenBent 2[H]4U

    Messages:
    2,744
    Joined:
    Sep 13, 2008

    You are right in your arguments.
    But you can dedupe before you encrypt.
     
  4. kdh

    kdh Gawd

    Yes, you can, depending on the data. But then going back and layering encryption on top of it will be ugly. Very hacky.

    If you are absolutely serious about data encryption on the disk, you'll spend the money on self-encrypting drives. Meaning they do encryption at rest without you, the end user, having to do anything, as it's baked into the firmware. You interact with the drive to get its security key so that whatever disk management system you use can then interact with it. All it means is that if someone takes a drive out of your hardware stack, the drive just looks like it has garbage on it. However, if the drive is online and mounted using the security key, then anyone who has permission-wise access to the system can access the data. At a high level, think of it like using secure keys in ssh to log into a server with no password prompt. The difference is that your storage controller uses the key provided by the drive to access the data.
     
  5. _Gea

    _Gea 2[H]4U

    Messages:
    3,788
    Joined:
    Dec 5, 2010
    Oracle Solaris 11.4 and a Raid-Z2 pool, e.g. 6 x 4TB = 16TB usable, where 2 disks can fail without data loss?

    All the unique ZFS security features, Windows NTFS-like ACLs with ZFS snaps as "previous versions", SMB 3.1, real-time dedup2 with reduced memory needs, and encryption per ZFS filesystem. Despite dedup2, I would use at least 16 GB RAM, but RAM is not that expensive nowadays.
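
    The capacity arithmetic behind that example can be written down in Python as a rough sketch. It only subtracts the parity disks and ignores ZFS metadata overhead, padding, and the TB/TiB difference, so real usable space will be somewhat lower. The function name is made up for the example.

```python
def raidz_usable_tb(disks, disk_tb, parity=2):
    """Approximate usable capacity of a single RAID-Z vdev, in TB.

    parity=2 corresponds to Raid-Z2 (two disks may fail without data loss).
    """
    if disks <= parity:
        raise ValueError("a RAID-Z vdev needs more disks than parity disks")
    return (disks - parity) * disk_tb


usable_tb = raidz_usable_tb(6, 4)  # the 6 x 4TB Raid-Z2 example: 16
```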
     
  6. kdh

    kdh Gawd

    This would work, but what's OP's budget?

    @OP what's your budget to work with? That may help us point you in a better direction.
     
  7. _Gea

    _Gea 2[H]4U

    Solaris 11.4 for a commercial use case is at least 800 USD/year with support and updates.
    You can download it for free for test and development, but without support, and without updates before a Solaris 11.5.
     
  8. kdh

    kdh Gawd

    Unless someone is familiar with Linux, taking on a Solaris project might be a little advanced. There are tons of tutorials for home-grown ZFS-based solutions, but they can be tough to follow for the average user. And building a ZFS solution is going to be around $800.

    Banging around Newegg, I saw this: https://www.newegg.com/Product/Prod...=1&cm_re=encrypted_nas-_-22-108-687-_-Product a 4-bay NAS with encryption built in. OP just has to buy some drives. Solid platform, vendor-backed on a known-good stack of hardware, and a warranty? Yeah. It would be sub-$800, and even with a few 4TB spinners https://www.newegg.com/Product/Product.aspx?Item=N82E16822149627&ignorebbr=1 it could get OP in under $800.
     
  9. Nenu

    Nenu [H]ardened

    Messages:
    18,587
    Joined:
    Apr 28, 2007
    Is there a reason you are looking at USB drives other than not wanting a NAS?
    Wouldn't it be better to fit drives directly into your PC?
    This will remove many problems that can occur when accessing drives through a USB interface over the long term; reliability will be improved substantially, and data recovery will be much simpler if there is a problem.
    It's also cheaper and guarantees you will get max performance.
     
  10. postcd

    postcd n00b

    OP here!

    _Gea and a few others did not understand from what I wrote that my budget is about the cost of a mini PC or less (which means maybe up to $350):
    My current PC's motherboard supports up to 16GB RAM, which I have and am already using to the maximum. My computer can hold only a single 2.5" drive, which is why I came up with the idea of external USB HDDs (Nenu). If I do not use a pure software solution to build the storage pool, I would have to buy a NAS, as you suggest, or a mini PC (energy efficient and silent, as it will be in my room; I would prefer less than 10 watts as mentioned, which means a Raspberry Pi etc.), and these, I think, do not support 16+GB RAM.

    PS: in the meantime I discovered I may not need a deduplication feature built into the HDD pooling solution, because maybe I can find and use some external app to regularly replace duplicates with hardlinks within the resulting pool.
    So the requirement now would be "only":

    a budget/silent/energy-efficient solution for
    - joining multiple HDDs into a storage pool, preferably with read striping
    - the resulting storage "pool" is supported by my Windows apps (I have no experience with NAS, so I do not know whether some apps have problems with network-attached storage)
    - data readable on Linux (in case my Windows PC is out of order)
    - the storage pool is encrypted when the PC is shut down

    Thanks all for trying to help. I welcome and appreciate all your on-topic ideas that do not go wildly over the mentioned budget. And sorry for my too broad and maybe hard to understand "request" o_O
     
    Last edited: Apr 19, 2019 at 5:22 AM
  11. Keljian

    Keljian Limp Gawd

    Messages:
    311
    Joined:
    Nov 7, 2006
    FreeNAS will do this.

    But you will need 8 GB of RAM as a minimum, and you need to skip the deduping option.

    FreeNAS will work on most reasonable hardware (maybe a used HP or Lenovo desktop?) and will encrypt fine.

    As you're managing lots of small files, you could add an Optane Memory NVMe drive to use as cache.

    Done and dusted.
     
  12. _Gea

    _Gea 2[H]4U

    And where is the problem?

    If you can build your own and skip professional server features like ECC and IPMI but insist on Intel-quality NICs
    (I note Euro prices with EU VAT, but supposedly USD is quite similar)

    - SilverStone CS380 case with 8 hotplug bays (SATA, dual-path SAS) 130 Euro
    - PSU 50 Euro
    - Mainboard, socket 1151, e.g. ASRock H270 Pro4 or B360M Pro4 (1151v2) 70 Euro
    - the cheapest Celeron 40 Euro
    - 8 GB RAM 50 Euro

    The sum is less than 350 Euro without disks, where the case is the most expensive part.
    Without a backplane it can be much cheaper.

    Look for a free storage OS with web management, like my napp-it for Solaris or its free forks,
    or FreeNAS/XigmaNAS based on FreeBSD; they are all based on ZFS (either native ZFS on Solaris or OpenZFS).

    From an outside view (a Windows client), such a ZFS appliance is nearly identical to a Windows machine, as it just offers SMB shares, in the Solarish case even with Windows NTFS-like ACLs and ZFS snaps as Windows "previous versions" out of the box.
     
    Last edited: Apr 19, 2019 at 10:31 AM
  13. kdh

    kdh Gawd

    Budget vs requirements will not work for what you are trying to do. Hopefully you'll find a solution that works for you.
     
  14. postcd

    postcd n00b

    Thanks to both for the good tips.

    Method A:

    The setup _Gea advised has the downside of higher power consumption (possibly over 45 watts), I think, and maybe also noise from the PSU fan. Here is an idea for how to reduce power consumption by at least 20W, possibly on a lower budget:

    Method B - I will use Windows software like DrivePool + DiskCryptor. The initial software cost is <$50. I mentioned this setup already; it would not add any noise or additional power consumption.

    Method C - I will wait a few years for manufacturers to build better mini PCs with more RAM and CPUs with AV1 support, and in the meantime I will continue with a non-ideal setup like now (standalone HDDs without any read striping); I will just buy a new, bigger HDD and migrate the small one to it.

    I have also checked various NAS units, but the ones I like (8GB RAM, encryption...) cost too much.

    It is sad that there is no software pooling solution for a low-RAM mini PC like the Raspberry Pi (1GB RAM) where I can also use encryption. FreeNAS was suggested with 8GB RAM, and ZFS with 8GB RAM, even though I read in the above-linked reddit article that ZFS can be run with less RAM. Dedup would not be used (too high RAM usage).

    Another question: I am wondering whether certain file types on a ZFS pool can be accelerated by placing them on an SSD that would be part of a pool made up mainly of HDDs.
     
  15. _Gea

    _Gea 2[H]4U

    You can't do that in ZFS.

    The idea behind ZFS is different.
    For a sequential workload, disks are fast enough. In a ZFS raid, sequential performance scales with the number of data disks, so a few disks can give sequential performance good enough for a 10G network. For a random read/write workload, where SSDs are much faster, ZFS uses its superior RAM-based read/write caches (block-based, not file-based, with a read-most/read-last strategy on reads). It is quite usual to have a read hit rate from cache of over 80%. This is why you see ZFS storage systems with > 100 GB RAM.

    Additionally, you can extend the RAM-based read cache (ARC) with an SSD or NVMe read cache (L2ARC, max 10 x RAM), where you can additionally enable read-ahead, which can improve some sequential workloads.
     
  16. mwroobel

    mwroobel [H]ardness Supreme

    Messages:
    4,886
    Joined:
    Jul 24, 2008
    What you are looking for is Tiered Storage, a form of Hierarchical Storage. It is often used in enterprise SAN systems, but I don't know of any actively-maintained HSM system being developed today that is FOSS. The last one I remember was lvmts, but that hasn't been updated in a very long time.
     
  17. HammerSandwich

    HammerSandwich [H]ard|Gawd

    Messages:
    1,104
    Joined:
    Nov 18, 2004
    postcd, how much duplication do you have in your current 7TB? If you don't actually know, run a scan overnight to help make an informed decision.
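
    One possible shape for such an overnight scan is sketched below in Python. It is an illustration only, with made-up function names. It groups files by size first and hashes only files that share a size, so most unique files are never read. It does not modify anything, and it counts files that are already hardlinked to each other as duplicates.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_report(root):
    """Return (total_bytes, duplicate_bytes) for all regular files under root."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable entry, skip it

    total = sum(size * len(paths) for size, paths in by_size.items())
    duplicate = 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # a unique size cannot be a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[sha256_of(path)].append(path)
            except OSError:
                continue
        for group in by_hash.values():
            # Every copy beyond the first one is reclaimable space.
            duplicate += size * (len(group) - 1)
    return total, duplicate
```

    Calling `duplicate_report` on each drive and comparing the two numbers shows what share of the space deduplication could actually give back.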