Hey guys, I finally decided to create a real backup of my ZFS array, and I've opted for an offline backup for a few reasons.
By offline I mean individual hard drives sitting in a drawer:
- It's "cheap".
- They will be powered off nearly all the time so they will last longer.
- I will keep them at work so they aren't at home where my server is.
Anyway, I get the drives in a couple of days, and I'm trying to plan out the best way to transfer my array to them initially, and the best way to manage them going forward.
Here are some of the ideas I have come up with so far. I'm looking for suggestions or tips to improve on them. Maybe some of you have done something like this before, or just have good ideas or good software to recommend.
Pre-info
My array is a ZFS pool with 20TB usable, of which 13TB is currently used.
I have three 4TB drives and one 3TB drive to use as backups, so 15TB of backup space.
Part 1: Initially filling my drives.
The problem I see is that I have a lot of data in just a few folders. For example, my Movies folder alone is about 9TB. How do I easily copy that across multiple individual disks?
My current idea is to rsync the first directory (Movies) until the drive is full, then output the list of files that copied successfully into a text file and feed that into the next rsync command as an --exclude-from list for the next disk. Once Movies is done, move on to my 3TB TV Shows folder, and so on.
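In command form, here is roughly what I'm picturing. This is just a sketch; the pool and mount points (/tank, /mnt/backup1, /mnt/backup2) are placeholders for whatever the real setup uses.

```bash
# Placeholder paths -- substitute your real pool and mount points.
SRC=/tank/Movies
DST1=/mnt/backup1/Movies
DST2=/mnt/backup2/Movies

# Pass 1: copy until disk 1 fills up. rsync will start failing with
# "No space left on device" once the disk is full; that's expected here.
rsync -a "$SRC/" "$DST1/"

# Build the exclude list from what actually landed on disk 1, with each
# path anchored at the transfer root so rsync matches it exactly.
( cd "$DST1" && find . -type f | sed 's|^\.||' ) > /tmp/on-disk1.txt

# Pass 2: copy everything NOT already on disk 1 over to disk 2.
rsync -a --exclude-from=/tmp/on-disk1.txt "$SRC/" "$DST2/"
```

One caveat: rsync treats exclude entries as patterns, so filenames containing *, ?, or [ (common in media names) would need escaping before this can be trusted blindly.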
Think this will work well, or do you see a better way to do it?
Part 2: Keeping the backup updated.
One piece of information to keep in mind is that my ZFS pool is 90% media, so I very rarely delete or modify files. 90%+ of the time changes to my data are simply file additions.
I was thinking of updating my backup monthly, or possibly more often. Once I finish the initial backup, I was going to take a zfs snapshot. Then, when it's time to update the backup, I would take another snapshot and run zfs diff between the two to quickly and easily tell me the file differences. Again, 90%+ of these would be additions, so I would just copy those files to the backup disk that has room. I was going to do this manually, but perhaps I can parse the zfs diff output into a file and then script an rsync of each file to a disk of my choosing. Then, for any file deletions or modifications (the rare case), pop in the other backup disks and make those changes.
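Something like the sketch below is what I have in mind. The snapshot names and destination mount point are made up, and zfs diff can octal-escape unusual characters in paths, so this is a starting point rather than a finished script.

```bash
# Made-up names -- substitute your pool, snapshot names, and mount point.
POOL=tank
LAST=backup-2024-01   # snapshot taken after the previous backup run
NOW=backup-2024-02    # snapshot for this run
DST=/mnt/backup3      # whichever disk still has room

zfs snapshot "$POOL@$NOW"

# zfs diff prints one change per line: a type field (+, -, M, or R),
# a tab, then the path. Keep only the additions.
zfs diff "$POOL@$LAST" "$POOL@$NOW" \
  | awk -F'\t' '$1 == "+" {print $2}' > /tmp/additions.txt

# --files-from reads the list and preserves each path relative to the
# source dir (/), so the layout on the backup disk mirrors the pool.
rsync -a --files-from=/tmp/additions.txt / "$DST/"
```

Deletions (-) and modifications (M) would still mean pulling out whichever disk holds the old copy, but this covers the common case of pure additions.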
I'm OK with this process, and I don't think it will take too long to do once a month or so. But again, maybe there is a much easier way I'm missing.
Part 3: Extra data maintenance
I was also thinking that, since my backup isn't going to have any redundancy or parity, I should "scrub" the backups periodically. I was thinking of using SFV or some similar tool where I can tell it to checksum all the files on a disk, and it maintains a text file on the disk holding all the checksums. That way, if a file becomes corrupted or bit-rotted and a sector needs to be remapped, I will see which file failed its checksum and can restore it fresh from my array.
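If nothing SFV-specific is required, plain coreutils can do the same job. A minimal sketch, run from the root of one backup disk (the mount point is again a placeholder):

```bash
cd /mnt/backup1

# Create (or refresh) the manifest. This hashes every file on the disk,
# so it takes a while -- best done right after filling the drive.
find . -type f ! -name CHECKSUMS.sha256 -print0 \
  | xargs -0 sha256sum > CHECKSUMS.sha256

# Later, the periodic "scrub": any file that no longer matches its
# recorded hash is reported as FAILED and can be recopied from the pool.
sha256sum --check --quiet CHECKSUMS.sha256
```

This assumes a Linux box for verification; if the disks need checking from Windows too, a cross-platform SFV/hash tool is probably the more portable choice.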
Part 4: Random thoughts
I was thinking of using NTFS as the filesystem for my backup disks. I want them to be readable anywhere (mainly Windows and Linux), and NTFS has built-in compression that works in both places. That is, unless you guys see a really bad reason to use NTFS and have compelling reasons for me to consider something else. Maybe a different filesystem would even make the ideas above easier?
Thanks.