Preventing bitrot on USB drives in Windows (ReFS?)

Skipper007

Hello,


So for the past five years I have been using a 1.5TB USB drive to transport files between two PCs I have in different locations. This usually worked pretty well, but on a couple of occasions I tried to open a file only to find it was corrupt. I never figured out what was causing the issue, as the drive's sectors would show up as fine when scanned.


I just bought a 3TB drive to replace the 1.5TB drive (I need the space going forward) and I am wondering if there is anything I can do to minimize the chance of data being corrupted. Does anyone have recommendations for best practices?


In particular, does anyone have experience with ReFS? I understand it won't correct bitrot automatically unless used with Storage Spaces, but if it can help me detect it in a timely fashion it sounds like a useful tool. I am asking now because, if ReFS is going to be useful, I should probably set it up before I start putting data on the new drive.
 
If these are mechanical drives formatted with NTFS, the biggest cause of issues is forgetting to use the "Safely Remove Hardware" icon in your Windows system tray. This stops the drive from doing any writes, optimizations, indexing, defrags, etc. and lets it finish the current action. You may get lucky 9 times out of 10 unplugging a drive without doing it, but that 1 time you will interrupt something and possibly corrupt the drive's tables.

The other things to look for are worn USB ports and cables. I just had a 3TB WD Passport that would power on but was never recognized, even in Windows Disk Management. I thought the drive was toast. I went and bought a spare, but then remembered that "the simplest things are usually the issue" and swapped the cable. Sure enough, it worked. The cable was getting bent somehow while in the laptop bag.
 
I use TeraCopy, which prevents a large number of errors that come from just transferring data. If you have a dying drive, a low-quality drive, or a bad link, you can get total trash data just by moving files. TeraCopy has saved hundreds of files on good drives over the last year and thousands of files from dying hard drives.
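To illustrate the general idea of a verified copy (hash the source, copy it, then hash the destination and compare), here is a minimal Python sketch. It is not how TeraCopy is implemented; the function names `file_digest` and `copy_and_verify` are made up for this example, and SHA-256 is simply an assumed choice of hash.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy source to destination, then compare the hashes of both files.

    Returns True when the hashes match. The read-back may be served from
    the operating system's cache, so a match only rules out errors that
    are visible at read time; it does not prove the bytes on disk are good.
    """
    source = Path(source)
    destination = Path(destination)
    expected = file_digest(source)
    shutil.copy2(source, destination)  # copy2 also preserves timestamps
    return file_digest(destination) == expected
```

A False result would mean the copy should be retried or the drive, cable, or port investigated before trusting the file.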
 
I did have trouble with the drive dropping offline due to worn connectors at one point, but the corruption issues showed up well before that. Sometimes the corruption would be in a single file that I had added months earlier, that worked when I first added it to the drive, and that hadn't been opened since except in programs I would assume are read-only (e.g. opening a JPEG with the Windows 7 photo viewer). Any thoughts on what would cause that kind of corruption?
 
Are you doing the "Safely Remove Hardware" option that I mentioned before?
 
Since the PCs were in different locations, my routine was to power down the PC and then unplug the drive for transport. That should have the same effect.

I do have to make one correction though: in the original post I said the drive's sectors showed up fine when scanned. On further reflection, I recall maybe two instances where the scan marked a single sector as bad, which may explain the issue, though I feel like a couple of the corruption incidents also occurred after the sectors were marked.

Again, it sounds to me like this is the kind of thing ReFS was designed to help deal with, but I am also not sure whether there is enough benefit in a single-drive configuration to make up for the performance and compatibility hit (both of the systems I will use the new drive with will run Windows 10, but I like having the option to use it with other systems in a pinch).
 
I use ReFS on all my HDDs. For most purposes it's no different from using NTFS. You miss a few things like EFS, and it's slower if you checksum all file data. None of my drives has had any corruption, so I don't know how that's handled or whether there are notifications. I just use PowerShell to check for read errors once in a while, and so far nothing.

By default, I think formatting with ReFS does not checksum file data; if you want file data checksummed you also need to format with 'format drive: /fs:ReFS /i:enable /q'. I think in Windows 10 you can only format a Storage Spaces mirror or parity space as ReFS with file checksumming. To do a single drive you'll need to boot into Server; the evaluation version should do, and you can do it from the install screen, but it's tedious. It's way too much work for the average person, unfortunately.
 
Thanks for the reply. I do have a couple of follow-up questions.
1) How hard is it to check for read errors (I have a bit of experience using command lines, but not a lot), and how long does it take?
2) How bad is the performance hit with checksums? Would I still be able to use it to store my main Adobe Lightroom catalog (i.e. the one that I actually work on)? What if I use drive encryption as well?

Once again, thanks for the help.
 
For 1, I run this PowerShell command: "Get-PhysicalDisk | Get-StorageReliabilityCounter". It returns this in a second or two:

Code:
DeviceId Temperature ReadErrorsUncorrected Wear PowerOnHours
-------- ----------- --------------------- ---- ------------
3                    0                     0    16980
2                                          0    8684
4                    0                     0    47734
0                                          0    20619
10                                         0
1                                          0    10387
9                                          0
6                    0                     0    3165
5                    0                     0    3165

Then there are other commands to get the serial number and drive type from the device ID, etc. in the event of a failure. You'll probably want to check this out if you do go this route: Storage Cmdlets in Windows PowerShell. The only catch is that if you have long-dormant files that have not been read in a while, it won't know they have a problem until they are read. If you make a Storage Spaces mirror, Windows is supposed to run a background process when the system is idle to read and check all data.

For 2, I don't use Lightroom, so you'll have to test it for your workload. I just moved a 21GB file from a 2TB WD Green drive to a 2x4TB Seagate Storage Spaces mirror, both with file integrity enabled, and after the memory cache filled up I was getting 35MB/s. I think 4K random reads/writes will take a big hit too. It's really only for people for whom data integrity is the major concern and speed is a distant second.
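For a sense of scale, at 35MB/s a 21GB file takes roughly 21,000 / 35 = 600 seconds, or about ten minutes. For testing your own workload, a rough sequential-write measurement could look like the Python sketch below. The function name and the default sizes are assumptions made for this example, and it only measures large sequential writes, not the 4K random I/O mentioned above.

```python
import os
import time


def sequential_write_mb_per_s(path, total_mb=256, block_mb=4):
    """Write about total_mb MiB to path and return the rate in MiB/s."""
    block = os.urandom(block_mb * 1024 * 1024)  # incompressible test data
    blocks = total_mb // block_mb
    start = time.perf_counter()
    with open(path, "wb") as handle:
        for _ in range(blocks):
            handle.write(block)
        handle.flush()
        # Ask the OS to push cached data to the device before stopping
        # the clock, so the memory cache does not inflate the result.
        os.fsync(handle.fileno())
    elapsed = time.perf_counter() - start
    return (blocks * block_mb) / elapsed
```

Pointing `path` at a scratch file on the drive under test, once with integrity enabled and once without, would give a like-for-like comparison. Delete the scratch file afterwards.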
 
So it took me a while to get around to testing things, but I got a single drive formatted in ReFS (it can be done on Windows 10 using a registry hack) and tested its performance. For the most part performance is fine even with checksums and BitLocker enabled, probably faster than the 1.5TB Seagate it will replace. Changes to large files like virtual hard disks are an exception, but I can disable checksums for those files.

I am curious as to what happens when the checksum doesn't match: does it warn you about the corruption and ask what you want to do, or does it simply deny access? If anyone knows how this works I would appreciate the info.

devil22:
Thanks for the additional info. I am aware that ReFS won't check the files automatically without storage spaces but I figure that as long as the files get read occasionally - say using a program like RapidCRC - I should be fine.
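As a rough illustration of the "read everything occasionally" approach, the Python sketch below records a checksum for every file under a folder and later re-reads the files to compare. It is not how RapidCRC works internally; the function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing, unreadable, or changed."""
    root = Path(root)
    problems = []
    for rel_path, expected in manifest.items():
        try:
            actual = sha256_of(root / rel_path)
        except OSError:
            # Missing file, or a read error reported by the file system.
            problems.append(rel_path)
            continue
        if actual != expected:
            problems.append(rel_path)
    return problems
```

The manifest is a plain dictionary, so it could be saved as JSON somewhere off the drive. Files that were edited on purpose will also show up as changed, so the manifest needs rebuilding after intentional edits.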
 
Does two-drive mirroring set up through Microsoft Disk Management also do integrity checks in the background?
 
It's odd: back in the USB 2.0 days I never worried about or bothered with "Safely Remove Hardware". I never had an issue. Just plug and play, and yank it out when done. Rock solid.

Then when I moved over to USB 3.0 gear... oh boy, that cavalier attitude just did not work well. So many "The drive has an issue/is corrupted/needs formatting" messages. I now always make sure I use the option (I wish it was a bit bigger and easier to click on). SanDisk USB 3.0 drives seem very picky. The activity light on them will keep blinking long after the data transfer is over. The same goes for some high-end Lexars I have.
 