Samsung 2TB green drives defective firmware?

Discussion in 'SSDs & Data Storage' started by larrymoencurly, Dec 2, 2010.

  1. larrymoencurly

    larrymoencurly [H]ard|Gawd

    Messages:
    1,614
    Joined:
    Jul 18, 2002
    Data loss with 2TB Samsung HD204UI with firmware 1AQ10001 and some other Samsung F3 and F4 drives:

    http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

    A translation of the article linked on that page says Samsung knows of the bug. Funny, but one of Samsung's FAQ pages said Samsung didn't issue firmware updates because their firmware was rigorously tested for bugs.
     
  2. mjn

    mjn [H]Lite

    Messages:
    90
    Joined:
    May 8, 2007
    Although I'm not suffering from data loss, I cannot get the drives to work in SATA II mode; they constantly "drop off" the SATA II channel, and I get read/write errors.
     
  3. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    13,810
    Joined:
    Nov 19, 2008
  4. Cliff Couser

    Cliff Couser [H]ard|Gawd

    Messages:
    1,807
    Joined:
    Dec 22, 2003
    Any way to check the firmware if the HD204UI is on a USB dock?
     
  5. john4200

    john4200 [H]ard|Gawd

    Messages:
    1,537
    Joined:
    Oct 16, 2009
  6. odditory

    odditory [H]ardness Supreme

    Messages:
    4,709
    Joined:
    Dec 23, 2007
    HDTune will let you see your firmware under the "Information" tab. You should be able to read it through a USB dock. But regardless, pretty much all HD204UIs right now have 1AQ10001.
     
    Last edited: Dec 3, 2010
  7. Gambit

    Gambit Gawd

    Messages:
    764
    Joined:
    Aug 26, 2002
    I take it this problem will still exist if the drives are in a RAID array? I can't think of why it wouldn't be an issue.
     
  8. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    13,810
    Joined:
    Nov 19, 2008
    If this is indeed a drive bug where the drive randomly skips writing back 64 sectors at a time without telling the OS, anyone storing any data on these drives is at serious risk of data loss. The best procedure for now would be to disable the write cache while waiting for Samsung to admit and fix the issue. I am going to test this at home since I have 3 drives, 1 of them full of data.

    Well if you use raid5 or better yet raid6 with these drives the raid should be able to detect and fix the problem for each stripe as long as only 1 drive (or 2 in raid6) in the stripe exhibits this problem. Run a raid integrity test to see if there are problems.
     
    Last edited: Dec 3, 2010
  9. tormentum

    tormentum Limp Gawd

    Messages:
    207
    Joined:
    Apr 18, 2010
    Aye, in addition, if you're a zfs user, the checksum writing will save you in this case. It would be a good idea to run a scrub more frequently for now, depending on your setup:

    zpool scrub <pool_name>

    I'm going to have a play at home and see if I can get this to report under solaris.
     
  10. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    13,810
    Joined:
    Nov 19, 2008
    BTW, when I did my testing in the other thread I noticed a smartmontools oddity with this drive that I have never seen with any other drive. While badblocks was running, any check of the SMART data with smartctl returned a stale power-on hours (POH) value. I mean, 10 hours into the test it told me the drive had been used for a total of 2 hours. After badblocks ended, however, the POH was correct.
     
  11. Gambit

    Gambit Gawd

    Messages:
    764
    Joined:
    Aug 26, 2002
    This only affects HD204UI drives with firmware 1AQ10001, correct?
     
  12. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    13,810
    Joined:
    Nov 19, 2008
    That seems to be the case, going by the one article on the smartmontools website and the one on the German magazine website. I have that model, though, and have experienced problems with badblocks. I have not seen this with my own data, however. I do have 1 of my 3 2TB Samsungs full of HTPC data. Perhaps the reason I have not seen any problems yet is, first, that I did not watch all 2TB of video and, second, that I probably did not run smartmontools or hdparm -I while recording video.
     
    Last edited: Dec 3, 2010
  13. odditory

    odditory [H]ardness Supreme

    Messages:
    4,709
    Joined:
    Dec 23, 2007
    A few updates to the findings linked in the OP:

    - It seems the issue has been narrowed from the original theories, that certain SMART commands sent to the drive during a write operation leave a write hole, to just one specific command: IDENTIFY DEVICE, an ATA command that returns the drive's identification data (model, firmware revision, capacity and so on).

    - Samsung is well aware of the issue and working on a firmware update

    Opinion: It doesn't sound like this issue affects these drives in day-to-day operation at least under Windows, unless during a write operation you also launch certain harddisk utilities that have the capability of sending that command, and do so let's say as part of their default startup behavior (I'm not aware of any that do so, just hypothesizing scenarios). It also doesn't *sound* like the issue would affect these drives while in a hardware based raid array, as I don't imagine a controller is constantly querying the drives for that information. Ultimately we'll all sleep better when a firmware update eliminates any remote possibility.
     
    Last edited: Dec 4, 2010
  14. larrymoencurly

    larrymoencurly [H]ard|Gawd

    Messages:
    1,614
    Joined:
    Jul 18, 2002
    HDDscan, from http://www.hddscan.com can do that for many USB interface chips, including the ones used by Western Digital (Initio).
     
  15. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    13,810
    Joined:
    Nov 19, 2008
    Yes I think you are okay in Windows as long as you do not use any smart monitoring software or any other software that identifies the disk while writing to the disk. On top of that the OS may at some point use this command to identify the disk and also possibly some node locked software.
     
  16. odditory

    odditory [H]ardness Supreme

    Messages:
    4,709
    Joined:
    Dec 23, 2007
    And as you said drescherjm, people can just disable write cache for now. In Windows you go to System (WindowsKey+Break) -> Device Manager -> Disk Drives -> right-click each Samsung HD204UI and on the Policies tab uncheck "Enable write caching on the device" and click OK. Since these drives are fairly speedy and most people are using them for storage/archiving anyway, you may not even notice a difference with the write cache disabled.

     
    Last edited: Dec 4, 2010
  17. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    13,810
    Joined:
    Nov 19, 2008
    Yes, that should prevent data loss on Windows. I am on Linux, but I know I can use hdparm to disable the write cache there. I have not done this yet; I want to test this myself first. I'd better disable any automatic SMART checks.


    Edit: Now I remember I disabled that a few months ago because the smart monitoring daemon appeared to be waking my drives from sleep and I was trying to sleep as many drives as possible to cut down on noise and power (especially in the summer).
     
  18. Cliff Couser

    Cliff Couser [H]ard|Gawd

    Messages:
    1,807
    Joined:
    Dec 22, 2003
    thanks for all the useful advice odditory & drescherjm
     
  19. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    13,810
    Joined:
    Nov 19, 2008
    I have duplicated the result in the first link to smartmontools. If I enable the cache and run a loop of hdparm -i in the background while running badblocks I get 64 or 128 bad blocks and these are because the disk did not write the data. There are no errors reported in dmesg.

    After this I disabled the cache, and the problem did not appear in 4 tests of different patterns in badblocks, although the drive is very slow with the cache disabled. I think this is partly due to writing 512 byte sectors to a 4096 byte sector drive.
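
    To illustrate what a badblocks-style write/read-back comparison reports, here is a minimal Python sketch. It is not badblocks and does not touch a drive: it compares two in-memory buffers sector by sector and groups the mismatches into runs. The function names, the 0xAA pattern and the simulated 64-sector gap are assumptions made for the example.

```python
SECTOR_SIZE = 512  # logical sector size, as mentioned in the post above


def find_bad_sectors(expected: bytes, actual: bytes, sector_size: int = SECTOR_SIZE) -> list[int]:
    """Return indices of sectors whose read-back data differs from what was written."""
    if len(expected) != len(actual):
        raise ValueError("buffers must have the same length")
    bad = []
    for offset in range(0, len(expected), sector_size):
        if expected[offset:offset + sector_size] != actual[offset:offset + sector_size]:
            bad.append(offset // sector_size)
    return bad


def group_runs(sectors: list[int]) -> list[tuple[int, int]]:
    """Collapse sorted sector indices into (first_sector, run_length) pairs."""
    runs = []
    for s in sectors:
        if runs and s == runs[-1][0] + runs[-1][1]:
            # Sector is adjacent to the current run, so extend it.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs


if __name__ == "__main__":
    # Simulate a 0xAA pattern where one 64-sector write was silently dropped
    # (hypothetical data; a real test would read the pattern back from the drive).
    written = b"\xaa" * (SECTOR_SIZE * 256)
    read_back = bytearray(written)
    start = 100 * SECTOR_SIZE
    read_back[start:start + 64 * SECTOR_SIZE] = b"\x00" * (64 * SECTOR_SIZE)
    print(group_runs(find_bad_sectors(written, bytes(read_back))))  # [(100, 64)]
```

    A run length of 64 or 128 in the output would correspond to the block counts reported above.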
     
  20. Metaluna

    Metaluna Limp Gawd

    Messages:
    393
    Joined:
    Jan 23, 2008
    Does anyone have experience with Samsung and their efficiency/willingness at supplying firmware fixes? I just took delivery of 5 of these drives a couple of days ago and while I don't need them right away, it would be nice to know if we are talking about 6 weeks or 6 months for typical Samsung bugfixes. Obviously every bug is different so there are no guarantees, but I'm trying to decide if I should just RMA them while I still can.

    One of the use cases I was considering for these drives was to use smartmontools at bootup to set the ERC timeout (aka TLER/CCTL) to enable them to work in a hardware array (via the 'smartctl -l scterc,70,70' command). Seems like it would be a risky strategy to issue any SMART commands to a live array unless you can guarantee there isn't an IDENTIFY command buried in there somewhere (or guarantee that the array isn't being written).
     
  21. odditory

    odditory [H]ardness Supreme

    Messages:
    4,709
    Joined:
    Dec 23, 2007
    I have to take back, or at least amend, what I posted earlier: I was able to get an MD5 CRC check to fail when copying a 1GB test file to a HD204UI even with write cache disabled in Windows.

    Granted it was inconsistent, sometimes CRC failed and sometimes it didn't when I simultaneously ran a smartctl -i command against the drive in a dos window even with write cache disabled. But once was enough to give me doubt about putting these drives into hardcore use. I was also able to get the CRC to fail by launching HDTune during the copy.

    What's the likelihood under normal operation that this write-hole firmware bug affects most people? Probably low, unless you have certain HDD utilities idling in the systray. For now, although I'm mostly going to have these drives sit on the sidelines until Samsung issues a firmware update, any file copying I do will be done with a CRC-checking capable copier, in my case Teracopy with the "always test after copy" option checked.
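
    For anyone without a verifying copier, the copy-then-check idea can be sketched in a few lines of Python. This is a rough illustration, not how Teracopy works internally; the function names are made up for the example, and MD5 is used only because it is the check mentioned above.

```python
import hashlib
import shutil


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src, dst, algorithm="md5"):
    """Copy src to dst, then re-read both files and compare their digests."""
    shutil.copyfile(src, dst)
    return file_digest(src, algorithm) == file_digest(dst, algorithm)
```

    One caveat: a read-back done right after the copy may be served from the operating system's file cache rather than from the disk, so a check like this is more convincing when it is repeated later, for example after a reboot.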

    FWIW I ran all the same tests against a Hitachi 2TB drive and couldn't ever get CRC to fail a single time; I was mostly curious in case this issue wasn't just limited to Samsung but at this point it seems to be.
     
    Last edited: Dec 7, 2010
  22. Stereodude

    Stereodude 2[H]4U

    Messages:
    3,302
    Joined:
    Oct 20, 2000
    Well, now we know why they were $60 at Newegg. :(
     
  23. odditory

    odditory [H]ardness Supreme

    Messages:
    4,709
    Joined:
    Dec 23, 2007
    that's silly. the $60 at Newegg thing was a pricing mistake according to a rep there. and the more recent $79 sale price still had nothing to do with the relative merits of this drive. i think longterm it'll prove itself a solid drive once it gets past the firmware hurdle. if I had to choose i'd take the HD204UI over any model WD green any day.
     
  24. SB1

    SB1 [H]Lite

    Messages:
    119
    Joined:
    Jan 21, 2009
    Yeah if anyone knows about how fast they are at doing firmware updates, that would be appreciated. If it's a month or 1.5 months that's fine I guess since I'll be using it as storage, but it will be plugged in all the time on my main PC (thinking as of right now).

    I just received mine today from SuperBiiz and haven't opened up the box yet. I just noticed that MicroCenter dropped the price from $100 to $90 today. Before I read this I was thinking of picking up another one once I did some testing on it in the next couple of days.

    If all went good I was going to get another one and also talk to my brother, friend and cousins and have them pick up one or two.
    But now I'm going to hold off on that and see when they plan on updating the firmware. I did a lot of research (well thought I did) and overall people were happy, well much happier than other 2tb drive models, that's for sure.

    You could say most people probably just transferred stuff over to it and played a couple of things from their transfers and that was that, for the most part. But then again others would have done the same with the WD green 2TB ones also. Now the 7200 rpm 2TB drives like the Hitachi would be used as boot drives way more than these green type hard drives, so they would see problems much sooner. People on this forum would test and see problems very soon, but regular folks probably wouldn't do any testing and may not notice anything.

    odditory, thanks for talking about Teracopy, I'll give that a try. Was thinking about getting SyncBack Pro, but I'll test Teracopy first since it's new to me, never hurts to try.

    I am planning on doing a clean install of Win 7 from WinXP and will be moving, deleting and repartitioning my hard drives. So hopefully the hard drive will be good enough until the firmware fix so I can go ahead with my plans.

    I'm going to have to re-read this tomorrow (way past my bedtime) and then hopefully do my testing right.
     
  25. rekd0514

    rekd0514 Gawd

    Messages:
    722
    Joined:
    Nov 24, 2007
    Well, I just disabled the write cache on mine in my WHS. Can you update the firmware on the drives without having to reload everything on them?
     
  26. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    13,810
    Joined:
    Nov 19, 2008
    Most likely you will not need to erase the drive and reload everything. However I saw in a second thread that the fix would not be for a few weeks (early 2011) since Samsung did not write the firmware.
     
  27. odditory

    odditory [H]ardness Supreme

    Messages:
    4,709
    Joined:
    Dec 23, 2007
    Again you don't really have to be paranoid about writing files to the Samsung HD204UI unless you have other disk utils running in the systray, at least that's the way I see it. Under normal use with no other utils running the drive isn't going to get hit with the "Identify Device" ATA command to trigger the write-hole issue.

    Note Teracopy is more of a drop-in replacement to windows explorer file copy (like when you drag and drop files and folders in windows, Teracopy does the copy). SyncbackPro is more of a file backup and syncing utility.
     
  28. Stereodude

    Stereodude 2[H]4U

    Messages:
    3,302
    Joined:
    Oct 20, 2000
    That was supposed to be a joke. :p
     
  29. paziu

    paziu n00bie

    Messages:
    1
    Joined:
    Dec 8, 2010
    Disabling the write cache in Windows under drive properties will not disable the cache on the drive itself, only in the operating system; the same applies to Linux. To get the status of the on-board write cache (in Linux), or to disable/enable it, you could use hdparm (one command to view the status, a second to disable caching), executed as root or via sudo.
    For Windows users (and not only them) I would suggest Parted Magic: loads of features and broad hardware support, plus it looks cool.
    The only problem is that after power cycling the drive, it defaults back to enabled. (There is also a version of hdparm for Windows.)
    If there is an issue with handling the write-back or write-through cache policy on Hitachi drives, disabling it in the OS will not make any difference.
    I am also waiting for the F/W update. I got a 2TB yesterday for $80 from SuperBiiz, then found out about the F/W problem. If the price had not been so good, I would have researched first. Lost my head again :)
    I also have a Samsung HD154UI, FwRev=1AG01118. Very nice drive, used for off-line storage/backup only. I wonder if it also suffers from a bug in the F/W (if it's really just a F/W bug).
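
    The two commands referred to at the top of this post are not shown. One plausible reading, assuming they were the standard hdparm -W flag (query with no value, -W0 to disable, -W1 to enable), is sketched below in Python. The helper names and the /dev/sdX device node are placeholders for illustration, not something given in the post.

```python
import subprocess


def write_cache_command(device, enable=None):
    """Build the hdparm argv: query when enable is None, otherwise set on/off."""
    if enable is None:
        return ["hdparm", "-W", device]
    return ["hdparm", "-W1" if enable else "-W0", device]


def run(argv):
    """Run the command (needs root) and return its text output."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    device = "/dev/sdX"  # placeholder: substitute the real device node
    # Only print the commands here; call run(...) as root to actually execute them.
    print(" ".join(write_cache_command(device)))         # view the current setting
    print(" ".join(write_cache_command(device, False)))  # disable the write cache
```

    Since the setting reportedly reverts to enabled after a power cycle, a command like this would have to be reissued at every boot.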
     
    Last edited: Dec 8, 2010
  30. Gambit

    Gambit Gawd

    Messages:
    764
    Joined:
    Aug 26, 2002
    Holy shit, that was damn quick. Makes me feel better about the drive I've purchased. Problems will always creep up, having a quick response is what makes the difference.

    Thanks for the quick find.
     
  31. DTN107

    DTN107 [H]ardness Supreme

    Messages:
    4,544
    Joined:
    Jun 30, 2008
    +1

    I was feeling like Samsung was creeping up on pulling a Seagate but looks like Samsung got the job done and fast.

    Rep points to Samsung.
     
  32. pyr02k1

    pyr02k1 Limp Gawd

    Messages:
    393
    Joined:
    Nov 29, 2010
    I was thinking much the same. Was afraid I'd have to return the 4 I just bought and exchange them for the Hitachis. Instead, I'll probably be buying another 18 of them in a few days. Gotta love a company like that.
     
  33. drescherjm

    drescherjm [H]ardForum Junkie

    Messages:
    13,810
    Joined:
    Nov 19, 2008
    I installed it on one of my drives in linux and it appears to work. Here is what I did.

    I applied the fix on 1 drive using the grub boot disk method from the
    following site:

    http://idolinux.blogspot.com/2009/10/create-dos-boot-disk-for-cd-or-grub.html

    Since http://www.fdos.org does not have the files mentioned in the
    blog I had to find it here:
    http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.0/

    After that I rebooted into free dos. I selected something like highmem
    only in freedos and then ran the samsung executable 181550HD204UI.EXE.

    It found my 1 Samsung F4 attached to the current machine (I have 2
    others on a different box) and patched it. The firmware revision # did
    not change, however. That will be annoying, as users can't tell the
    bad firmware from the good.

    I then ran the "How to reproduce" 3 times from the smartctl website
    and I was not able to trigger this bug with the new firmware.
     
  34. Tolyngee

    Tolyngee [H]ardness Supreme

    Messages:
    4,522
    Joined:
    Oct 17, 2005
    /facepalm

    I guess the only way to be fairly certain it did indeed update the firmware is to do what you suggested and see if the bug's triggered.
     
  35. tormentum

    tormentum Limp Gawd

    Messages:
    207
    Joined:
    Apr 18, 2010
    Props to Samsung for the quick release, however they should re-release with this fixed. I don't care what anyone says, version numbers are important.
     
  36. Metaluna

    Metaluna Limp Gawd

    Messages:
    393
    Joined:
    Jan 23, 2008
    Yes, and to be really certain, you need to run the procedure first with the old firmware and confirm that it fails, then confirm that it no longer fails with the new firmware.

    Maybe keeping the revision code the same was a way for them to bypass some internal bureaucracy or release/qualification procedure and get the fix out quicker (not necessarily a good thing if it means the fix isn't well tested).
     
  37. Gambit

    Gambit Gawd

    Messages:
    764
    Joined:
    Aug 26, 2002
    Technically, unless there's a problem running the patch against an already patched drive, no you don't. It doesn't matter if it shows the problem before the patch, just so long as it doesn't do it *after* the patch. I agree though, they should've had the version change; though for whatever the reason is, glad they got out a fix ASAP.
     
  38. Metaluna

    Metaluna Limp Gawd

    Messages:
    393
    Joined:
    Jan 23, 2008
    Well yes, but I was thinking more about validating that you are running the "How to reproduce" procedure correctly (especially for those of us who have not tried reproducing the bug yet). If you run the procedure on a known bad drive, and you can't make it fail, then you know something is wrong with the way you're doing it, or there's something odd about your system configuration, etc.. If your test is faulty, then you can't reliably tell if a drive has been updated or not.
     
    Last edited: Dec 9, 2010
  39. Gambit

    Gambit Gawd

    Messages:
    764
    Joined:
    Aug 26, 2002