Corruption 101


Ice Czar

Guest
There sometimes seem to be a few hundred ways to munge your data;
my personal list, ordered by rough probability, is based on

The Risks To Your Data @ the PC Guide
(with a few additions, and split into hardware, software and filesystem)

Hardware Failure

Memory Errors: With so many systems today running without error detection or correction on their system memory, there is a chance of a memory error corrupting the data on the hard disk. It is rare for it to happen, but it does happen.
Test your RAM with Memtest86+ and/or Memtest86; consider a board that supports ECC RAM

Power Loss: Losing power at the wrong time, such as when you are doing sensitive work on your hard disk, can easily result in the loss of many files.
Use a high quality PSU with stable voltage, employ a UPS or other line conditioning, and avoid hard restarts

Cables: Not included on the PC Guide's original list, but nonetheless an important possibility, especially with today's transfer speeds. There is a very real reason the industry is adopting SATA over PATA; see the post below for a full discussion and links

System Timing Problems: Setting the timing for memory or cache access too aggressively, or using a hard disk interface transfer mode that is too fast for the system or device, can cause data loss. This is often not something that will be realized until after some amount of damage has been done.
see tRAS below; bump back an overclock, try a different divider, and don't overclock if you can't lock the PCI bus. I would bump the probability of this up a few slots if the machine is overclocked; otherwise I've put it here

Resource Conflicts: Conflicts resulting from peripherals that try to use the same interrupt requests, DMA channels or I/O addresses, can cause data to become corrupted.
Review PIRQ routing and manual assignment of IRQs in the worst-case scenario; chipset specific, but a good overview

The Hard Drive: Test with the manufacturer's diagnostic. Most of the rest is preventative: proper handling, a vibration-free and cool environment with clean air, and a stable floor

Software Failure
Busmastering Drivers

Filesystem Corruption


Power Issues
There are three basic areas of power problems:
1. Source Power: Brownouts, blackouts, spikes/surges, etc.
see > Power Conditioning and DIY UPS @ Dan's Data, for the basics
In this category I would also place power issues due to pilot error, hard restarts and shorts; avoid both. Shut down properly and pay attention when mounting your motherboard and routing power cables.

2. Under Power: Basically too many components for the power supply.
Don't be deceived by wattage figures; it's the amount of amps per rail that is really important (see the sketch after this list).
See > Choosing the right Power Supply &
takaman's Power Supply Calculator rev0.61x
to determine the amps you need per rail

3. Voltage Stability: Pretty much all the following
[H]ardcore PSU info (Charts)
http://terasan.okiraku-pc.net/dengen/tester/index.html
http://terasan.okiraku-pc.net/dengen/tester2/index.html
(note the PC Power & Cooling, Antec, Ablecom, and Zippy)

In Japanese :p
but the graphs speak volumes,
and the PSUs are identified in English
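
if you want to sanity-check the amps-per-rail arithmetic from point 2 yourself, here is a minimal sketch in Python of what those calculators are doing; every component figure in it is a made-up placeholder, not a measured value:

# Rough per-rail load check: wattage alone tells you little, what matters
# is whether each rail can supply the amps actually drawn from it.
# All component figures below are hypothetical placeholders, NOT real specs.
psu_rail_amps = {"+3.3V": 28.0, "+5V": 30.0, "+12V": 18.0}  # from the PSU label

component_draw = {                      # estimated amps per rail, per component
    "CPU":         {"+12V": 6.5},
    "video card":  {"+3.3V": 4.0, "+5V": 2.0, "+12V": 2.5},
    "2x HDD":      {"+5V": 1.4, "+12V": 1.2},
    "optical":     {"+5V": 1.0, "+12V": 1.5},
    "fans, misc":  {"+12V": 1.0},
}

for rail, capacity in psu_rail_amps.items():
    load = sum(draw.get(rail, 0.0) for draw in component_draw.values())
    headroom = capacity - load
    verdict = "OK" if headroom >= 0.2 * capacity else "TOO TIGHT"
    print(f"{rail}: {load:4.1f}A of {capacity:4.1f}A ({verdict})")

swap in the numbers from your own PSU label and component datasheets; the 20% headroom test in the last lines is just the usual rule of thumb, not a hard spec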



Continuous Power vs. Peak Power at Spin-Up
12V power profile (current vs. time) of an IDE/ATA hard disk at startup. You can see that the peak power draw is over quadruple
the steady-state operating requirement. The graph appears "noisy"
due to frequent oscillations in current requirements.

Peak vs. Continuous Power
Despite this extra capacity, it is still a good idea to not load up your system to the very limit of your power supply's stated power capacity. It is also wise, if possible, to employ features that delay the startup of some disk drive motors when the PC is first turned on, so the +12 voltage is not overloaded by everything drawing maximum current at the same time.
referring to the links above again
http://terasan.okiraku-pc.net/dengen/tester/index.html

note the consistent voltage instability at startup and shortly thereafter in those graphs
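
a quick back-of-the-envelope sketch of why staggered spin-up helps, with purely illustrative per-drive figures (check your drive's datasheet for the real ones):

# Spin-up peak vs. steady-state draw on the +12V rail. The per-drive
# figures are illustrative assumptions, not from any datasheet.
drives = 4
peak_amps, steady_amps = 2.5, 0.6       # per drive, hypothetical

all_at_once = drives * peak_amps
staggered   = peak_amps + (drives - 1) * steady_amps  # one spins up at a time

print(f"all drives at once: {all_at_once:.1f}A peak on +12V")   # 10.0A
print(f"staggered spin-up:  {staggered:.1f}A worst case")       # 4.3A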

Winbond Launches New Bus Termination Regulator April 4th 2003

"Winbond Electronics Corporation, a leading supplier of semiconductor solutions, today launched the W83310S, a new DDR SDRAM bus termination regulator. The solution, new to Winbond's ACPI product family, is aimed at desktop PC and embedded system applications with DDR SDRAM requirements.

Computer systems architectures continue to evolve and are becoming more complex; CPU and memory speeds continue to increase ever more rapidly with every technology turn. More and more high current/low voltage power sources are required for PC systems. This is particularly true for high-speed components such as CPU, memory, and system chipsets. The performance of these components is highly dependent upon stable power. Therefore, motherboard designers require accurate, stable, low-ripple and robust power solutions for these components.

Many system designs use discrete components to implement bus termination functions. This approach creates several problems including poorer quality load regulation; higher voltage-ripple, increased usage of board space and inconsistent designs when different discrete components are used.
"

and just to reiterate this point one more time
http://www.anandtech.com/showdoc.html?i=1774&p=8
"the majority of damaged RAM returned to memory manufacturers is destroyed by fluctuations in the voltage."

the transient response is the critical measure; unfortunately it's not a metric that is commonly supplied with PSU specs
(this seems to be slowly changing, as some manufacturers are supplying the transient response now)

Transient Response: As shown in the diagram here, a switching power supply uses a closed feedback loop to allow measurements of the output of the supply to control the way the supply is operating. This is analogous to how a thermometer and thermostat work together to control the temperature of a house. As mentioned in the description of load regulation above, the output voltage of a signal varies as the load on it varies. In particular, when the load is drastically changed--either increased or decreased a great deal, suddenly--the voltage level may shift drastically. Such a sudden change is called a transient. If one of the voltages is under heavy load from several demanding components and suddenly all but one stops drawing current, the voltage to the remaining current may temporarily surge. This is called a voltage overshoot.

Transient response measures how quickly and effectively the power supply can adjust to these sudden changes. Here's an actual transient response specification that we can work together to decode: "+5V,+12V outputs return to within 5% in less than 1ms for 20% load change." What this means is the following: "for either the +5 V or +12 V outputs, if the output is at a certain level (call it V1) and the current load on that signal either increases or decreases by up to 20%, the voltage on that output will return to a value within 5% of V1 within 1 millisecond". Obviously, faster responses closer to the original voltage are best."
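
that kind of spec can be checked mechanically against a scope capture; a minimal sketch in Python, with an invented sample trace:

# Check a sampled voltage trace against a transient response spec of the
# form "output returns to within 5% of nominal in under 1 ms".
def meets_transient_spec(trace, nominal, tolerance=0.05, window_ms=1.0):
    """trace: (time_ms, volts) samples, t=0 at the moment of the load step.
    Every sample at or after window_ms must be back inside the band."""
    return all(abs(v - nominal) / nominal <= tolerance
               for t_ms, v in trace if t_ms >= window_ms)

# Hypothetical +12V trace after a 20% load step: overshoot, then recovery.
trace = [(0.0, 12.0), (0.1, 12.9), (0.5, 12.4), (1.0, 12.1), (2.0, 12.05)]
print(meets_transient_spec(trace, nominal=12.0))  # True: within 5% by 1 ms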
 
Cables

there is definitely a reason why SATA is being adopted: ATA/IDE/EIDE/ATAPI is an unterminated standard, and as speeds increase that causes more and more problems, especially with cheap cables, complex device configurations, hotswap/removable drivebay bridge cards, and poor cable routing

a few :p links and excerpts:
Standard (40-Conductor) IDE/ATA Cables

In many ways, the cable is the weak link in the IDE/ATA interface. It was originally designed for very slow hard disks that transferred less than 5 MB/s, not the high-speed devices of today. Flat ribbon cables have no insulation or protection from electromagnetic interference. Of course, these are reasons why the 80-conductor cable was developed for Ultra DMA. However, even with slower transfer modes there are limitations on how the cable can be used.

The main issue is the length of the cable. The longer the cable, the more the chance of data corruption due to interference on the cable and uneven signal propagation, and therefore, it is often recommended that the cable be kept as short as possible. According to the ATA standards, the official maximum length is 18 inches, but if you suspect problems with your hard disk you may find that a shorter cable will eliminate them. Sometimes moving where the disks are physically installed in the system case will let you use a shorter cable.

Warning: There are companies that sell 24" and even 36" IDE cables. They are not recommended because they can lead to data corruption and other problems. Many people use these with success, but many people do a lot of things they shouldn't and get away with it. :^)

Ultra DMA (80-Conductor) IDE/ATA Cables

There are a lot of issues and problems associated with the original 40-conductor IDE cable, due to its very old and not very robust design. Unterminated flat ribbon cables have never been all that great in terms of signal quality and dealing with reflections from the end of the cable. The warts of the old design were tolerable while signaling speeds on the IDE/ATA interface were relatively low, but as the speed of the interface continued to increase, the limitations of the cable were finally too great to be ignored.

that "upgrade happened at 66MB/s burst, we are now at the same speed as the PCI bus for burst rates 133MB/s

Fancy IDE leads - The Terrible Truth

The spec mandates such short cables for two reasons.

Reason one - practically all IDE cables are unshielded. There's nothing around the conductors but insulation. Electromagnetic radiation goes straight through insulation. So external interference from the rest of your computer's giblets can influence the signal on your IDE leads.

Unshielded cables act like antennas. Generally speaking, the longer you make 'em, the more energy they can pick up from their environment.

Reason two - IDE cables are unterminated. "Termination", in the electrical sense, is essential to provide "impedance matching", which in English is what you have to do to stop the signal from reflecting off the end of the cable like a wave that hits the end of a bathtub.

Electric current does not move instantaneously down a wire. It travels at nearly the speed of light, but when you've got thirty-three and a third million clock pulses per second - which is the speed of the IDE bus - even light in a vacuum only moves a hair under nine metres per clock pulse.

So if you're fooling around with, say, a double-the-rated-length 900mm IDE lead, there's an end-to-end signal delay in it of about a tenth of a clock pulse. The signals you want your drives and your motherboard to be able to hear will be significantly blurred by delayed reflections from each end of the cable.

Transfer your data at twice or three times the UDMA/33 speed - as UDMA/66 and 100 do - and reflected signals get more and more out of step with the real signal, and do it more and more harm.
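
the arithmetic in that excerpt is easy to reproduce; a small sketch (the ~5 ns/m cable figure is the one used in the ATA FAQ quoted further down):

# Reproduce the numbers from the excerpt above. The IDE bus is clocked
# at 33.33 MHz (UDMA/33), so even light in a vacuum covers only ~9 m per
# clock pulse; a real cable signal is slower, ~5 ns/m per the ATA FAQ.
C = 299_792_458            # speed of light, m/s
clock_hz = 33_333_333      # 33.33 MHz

print(f"light per pulse: {C / clock_hz:.2f} m")        # ~8.99 m

pulse_ns = 1e9 / clock_hz                              # 30 ns clock pulse
delay_ns = 0.9 * 5.0                                   # 900 mm lead at 5 ns/m
print(f"900 mm lead: {delay_ns:.1f} ns one-way delay "
      f"= {delay_ns / pulse_ns:.2f} clock pulses")     # ~0.15, Dan's 'tenth'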

IDE ATA and ATAPI Disks Use PIO Mode After Multiple Time-Out or CRC Errors Occur

Serial ATA and the 7 Deadly Sins of Parallel ATA
Critical Limiting Factors in Parallel Design
There are some fundamental differences between serial and parallel buses; more importantly, there are some critical limiting factors in the design and implementation of any parallel bus.

1. Non-Interlocked (source synchronous) clocking
2. 3.3 V high-low signaling with 5V legacy tolerance
3. Cabling constraints
4. Connector legacy
5. Termination
6. Command queuing
7. PCB Design

3. Cable Design Issues: Cross-Talk and Ground Bouncing vs. Ringing

Each signal propagating through a data line makes the data line act like the inductor of a transformer. That is, each voltage swing generates a dynamic electromagnetic field that, depending on cable length and proximity, will induce another signal in adjacent data lines. This cross-talk adds noise to data lines and can produce errors by generating false positives or negatives simply by induction of voltage swings in data lines.

Another problem with parallel pathways is the phenomenon of simultaneously switching outputs (SSO) noise. As we explained in detail in our reviews of the i845 and the SIS645 chipsets, SSO noise becomes really problematic if the majority of signals switch from high to low, since this can induce ground bouncing. On the chipset level, a workaround in the form of dynamic bus inversion (DBI) is feasible; that is, instead of switching all bits, only the reference bit is switched simultaneously at the sender and receiver end, which has the same net effect, namely that the system does not see the reference switch but thinks that all other lines have switched. DBI, however, requires an additional latency cycle, and this is where the 40 ns clock cycle time starts to look really ugly.
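
the DBI trick is easy to express in code; a toy Python sketch of one common formulation of the idea (invert the word when more than half the lines would toggle), not any particular chipset's implementation:

# Dynamic Bus Inversion, toy version: if more than half the lines would
# switch between the previous word and the next one, send the inverted
# word and raise the DBI flag, so fewer outputs actually toggle at once.
WIDTH = 8
MASK = (1 << WIDTH) - 1

def dbi_encode(prev_word, next_word):
    switching = bin((prev_word ^ next_word) & MASK).count("1")
    if switching > WIDTH // 2:
        return (~next_word) & MASK, True   # send inverted, DBI bit set
    return next_word, False

def dbi_decode(word, dbi_bit):
    return (~word) & MASK if dbi_bit else word

prev = 0b11111111
nxt  = 0b00000001                      # 7 of 8 lines would switch
sent, flag = dbi_encode(prev, nxt)
assert dbi_decode(sent, flag) == nxt
print(f"sent={sent:08b} dbi={flag}")   # only 1 line toggles instead of 7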

ATA not so Frequently Asked Questions
Or: Why Ribbon Cables are unsuitable for RF transmission of data


The following article was written by snn47 to address some of the issues associated with standard ribbon cables and the use of e.g. removable drive racks, as an attempt to share some insight into factors that can adversely affect the life or reliability of desktop Hard Disk Drives. Specifically, issues like why some drives are working in some systems and not in others, the impact of cable routing, and why it is that the drive manufacturers always recommend using their own cables (if supplied with the drive). (emphasis mine)

Any RF system has a limited tolerance for distortion of signals, which, in the worst case, can destroy some of the semiconductor components. While a certain amount of variation is part of any system's specification, one needs to remember that ATA was never intended to handle today's data rates. ATA or Advanced Technology Attachment started as the usual run of the mill, or: "just a system at the lowest possible price point that will work most of the time without the need for huge financial investments". The problems started when the system was forced to handle higher and higher clock and data rates within the original design limitations. Keep in mind that the latest ATA/ATAPI-7 specifications allow data rates of 133 MB/sec, which is 44 times faster than the original ATA transfer of 3 MB/sec. This increase in speed makes it necessary to enforce minimum tolerances and detailed specifications to allow for the manufacturing of affordable systems with minimum compatibility problems.


these are just a few excerpts; I would highly advise that everyone give them a good read. There ARE good rounded ATA cables, e.g. the RD3XP Super Shielded:
"RD3XP is made from ATA 100/133 High impedance flat cable cut into 8 layers of 10 cable wires, with a ground wire and signal wire alternatively, and folded in zigzag-piled so that each signal wire is surrounded by 4 ground wires."

but like their SCSI counterparts, they ain't cheap. There are also high quality flat cables (you buy a $300 RAID card, and they don't ship you crappy PVC cables; they are either Teflon or Thermoplastic Olefin (TPO))

Up until a little while ago I would have said any investment made in high quality cables was money well spent; however, with the introduction of SATA, that doesn't necessarily hold true anymore,
unless you're dealing with critical data (in which case you should be running ECC RAM) or you're actually experiencing problems

a further excerpt from ATA not so FAQs

Preliminary Conclusions and Possible Cure

Reasons for changes in the propagation impedance, cross-coupling between adjacent signal wires and signal-velocity from one setup to another are:

Impedance of the drive and controller in high/low signal level will be different for different models.
Reflection of signals that garble the pulse, due to incorrect termination impedance or impedance inconsistencies from the controller to the drive, meaning the impedance of the controller and the drive(s) differ.


If there is a second drive (connector present/connected), the impedance will fluctuate at this point.

A. Only one HDD per controller channel.

B. Use a cable with only 2 connectors.

Signal delay will increase with the length of the flat-ribbon-cable; propagation of the signals was intended for a max. flat-ribbon-cable length of 18", which at ~5 ns/m works out to a 2.3 ns delay.

C. Shorten the cable whenever possible.

D. If the case requires long cables, consider mounting just the HDD closer to the connectors of the controller, or consider exchanging the usual desktop case for a 19" case. Mount the HDD just above or below the PCB-controller-connector to allow you to reduce the length of the flat-ribbon-cable to a few cm.

Flat-ribbon-cable with different isolation material (higher/lower eR) and change in the conductor diameter will change the ratio of (2D/d).

Are rounded cables used?
E. Try exchanging the cable for another type/brand of flat-ribbon-cable.

Is the flat-ribbon-cable at some point parallel to a conducting grounded surface?
F. Try a different routing of your flat-ribbon-cable, away from a ground-plane.

Was the cable cut apart and/or rolled to get a rounded cable?
G. Unroll it and try B.; if cut apart, then start with A.

Is the drive mounted in a removable drive rack?
H. Remove HDD from the drive-bay and start with A.

However, you should check out the section in Dan's Data's "Fancy IDE leads - The Terrible Truth" as to why, with all this going on,
for the most part, it still works anyway :p

Check out what your chipset has to sort out here
http://www.vicstech.com/en/rd3xp/NoiseTest/
click on a picture to see an animated test
(note not all types of cables were employed; for instance there are no high quality TPO or Teflon cables in this test)

like the power supply, cables are widely underrated as a source of problems, and few ever spend any money on them for anything but "looks"

System Timing
Setting the memory timings too aggressively, overclocking the front side bus (without locking the PCI bus), or using a hard disk interface transfer mode that is too fast for the system or device (or cables) can cause data loss.

These days most "enthusiast" boards allow the PCI bus to be locked or employ a divider, but overclocking is always a risk to your data. A more common tweak is aggressive memory timings,
and those too can affect data integrity


from Eight Ways To Kill Your HDD

TRAS Violation: The Creeping Corruption of a HDD

One of the most common reasons for HDD failure is what is called tRAS violation. tRAS is the minimum bank open time of the DRAM; that is, we are talking about system memory here. Many mainboard manufacturers still include Ultra and Turbo settings in their CMOS setup options that are only workable at 100 MHz memory bus settings, a.k.a. PC1600 mode. One setting that has absolutely no impact on performance is the minimum bank open time or tRAS, while the same setting can have catastrophic consequences for data integrity, including HDD addressing schemes, if the latency is set too short. In theory, tRAS can be as short as tRCD + CAS delay; however, in reality, the minimum bank open time is dictated by the RAS Pulse Width, that is, the time required to reach a voltage differential between memory bitlines and reference lines to safely identify a 0 or 1 logical state.

The main reason why tRAS violation does commonly lead to HDD corruption may relate to the translation of the physical memory space into virtual memory sub-spaces by the operating system and finally writing the data back to the storage media, but it is not entirely clear what is going on there. A fact is, though, that a tRAS value of 5 is adequate for PC1600 or 100 MHz operation. At 133 MHz or PC2100, tRAS should never undercut 6T; likewise, at PC2700, the value should be increased to 7T where applicable. In terms of performance, tRAS settings hardly make any difference. We challenged some performance gurus at AMD on this matter and they reported a drop in Quake frame rates from 792 fps to 790 fps when increasing tRAS from 5T to 6T.

for in-depth information > tRAS violation as a cause of data corruption @ Lost Circuits
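
to put those cycle counts in perspective, a minimal sketch converting tRAS settings into absolute bank-open time at each clock; the 40 ns floor is my illustrative assumption, not a figure from the article:

# tRAS in cycles only means something relative to the clock period: if the
# DRAM cell needs roughly constant absolute bank-open time, the cycle count
# must grow with clock speed. The 40 ns floor is an illustrative assumption.
MIN_BANK_OPEN_NS = 40.0

settings = [("PC1600 (100 MHz)", 100, 5),
            ("PC2100 (133 MHz)", 133, 5),
            ("PC2100 (133 MHz)", 133, 6),
            ("PC2700 (166 MHz)", 166, 6),
            ("PC2700 (166 MHz)", 166, 7)]

for name, mhz, tras in settings:
    open_ns = tras * 1000.0 / mhz
    verdict = "ok" if open_ns >= MIN_BANK_OPEN_NS else "violation risk"
    print(f"{name} tRAS {tras}T = {open_ns:4.1f} ns  ({verdict})")

with that assumed floor, 5T is fine at 100 MHz but risky at 133 MHz, and 6T is risky at 166 MHz, which lines up with the article's advice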
 
FileSystem
http://ntfs.com/data-integrity.htm


An Explanation of CHKDSK and the New /C and /I Switches


"To understand when it might be appropriate to use these switches (/C and /I) , it is important to have a basic understanding of some of the internal NTFS data structures, the kinds of corruption that can take place, what actions CHKDSK takes when it verifies a volume, and what the potential consequences are in circumventing CHKDSK's usual verification steps.

CHKDSK's activity is split into three major "passes" during which it examines all the "metadata" on the volume and an optional fourth pass. Metadata is "data about data." It is the file system overhead, so to speak, that is used to keep track of everything about all of the files on the volume. Metadata tells what allocation units make up the data for a given file, what allocation units are free, what allocation units contain bad sectors, and so on. The "contents" of a file, on the other hand, is termed "user data." NTFS protects its metadata through the use of a transaction log. User data is not so protected.

During its first pass, CHKDSK displays a message on the screen saying that it is verifying files and counts from 0 to 100 percent complete. During this phase, CHKDSK examines each file record segment (FRS) in the volume's master file table (MFT). Every file and directory on an NTFS volume is uniquely identified by a specific FRS in the MFT and the percent complete that CHKDSK displays during this phase is the percent of the MFT that has been verified. During this pass, CHKDSK examines each FRS for internal consistency and builds two bitmaps, one representing what FRSs are in use, and the other representing what clusters on the volume are in use. At the end of this phase, CHKDSK knows what space is in use and what space is available both within the MFT and on the volume as a whole. NTFS keeps track of this information in bitmaps of its own that are stored on the disk allowing CHKDSK to compare its results with NTFS's stored bitmaps. If there are discrepancies, they are noted in CHKDSK's output. For example, if an FRS that had been in use is found to be corrupted, the disk clusters formerly associated with that FRS will end up being marked as available in CHKDSK's bitmap, but will be marked as being "in use" according to NTFS's bitmap.

During its second pass, CHKDSK displays a message on the screen saying that it is verifying indexes and counts from 0 to 100 percent complete a second time. During this phase, CHKDSK examines each of the indexes on the volume. Indexes are essentially NTFS directories and the percent complete that CHKDSK displays during this phase is the percent of the total number of directories on the volume that have to be checked. During this pass, CHKDSK examines each directory on the volume for internal consistency and also verifies that every file and directory represented by an FRS in the MFT is referenced by at least one directory. It also confirms that every file or subdirectory referenced in each directory actually exists as a valid FRS in the MFT and checks for circular directory references. Finally, it confirms that the various time stamps and file size information associated with files are all up-to-date in the directory listings for those files. At the end of this phase, CHKDSK has ensured that there are no "orphaned" files and that all the directory listings are for legitimate files. An orphaned file is one for which a legitimate FRS exists, but which is not listed in any directory. When an orphaned file is found, it can often be restored to its rightful directory, provided that directory is still around. If the directory that should hold the file no longer exists, CHKDSK will create a directory in the root directory and place the file there. If directory listings are found that reference FRSs that are no longer in use or that are in use but do not correspond to the file listed in the directory, the directory entry is simply removed.

During its third pass, CHKDSK displays a message on the screen saying that it is verifying security descriptors and counts from 0 to 100 percent complete a third time. During this phase, CHKDSK examines each of the security descriptors associated with each of the files and directories on the volume. Security descriptors contain information regarding the owner of the file or directory, NTFS permission for the file or directory, and auditing information for the file or directory. The percent complete in this case is the percent of the number of files and directories on the volume. CHKDSK verifies that each security descriptor structure is well formed and internally consistent. It does not verify that the listed users or groups actually exist or that the permissions granted are in any way appropriate.

The fourth pass of CHKDSK is only invoked if the /R switch is used. /R is used to locate bad sectors in the volume's free space. When /R is used, CHKDSK attempts to read every sector on the volume to confirm that the sector is usable. Sectors associated with metadata are read during the natural course of running CHKDSK even when /R is not used. Sectors associated with user data are read during earlier phases of CHKDSK provided /R is specified. When an unreadable sector is located, NTFS will add the cluster containing that sector to its list of bad clusters and, if the cluster was in use, allocate a new cluster to do the job of the old. If a fault tolerant disk driver is being used, data is recovered and written to the newly allocated cluster. Otherwise, the new cluster is filled with a pattern of 0xFF bytes. When NTFS encounters unreadable sectors during the course of normal operation, it will also remap them in the same way. Thus, the /R switch is usually not essential, but it can be used as a convenient mechanism for scanning the entire volume if a disk is suspected of having bad sectors.

The preceding paragraphs give only the broadest outline of what CHKDSK is actually doing to verify the integrity of an NTFS volume. There are many specific checks made during each pass and several quick checks between passes that have not been mentioned. Instead, this is simply an outline to the more important facets of CHKDSK activity as a basis for the following discussion regarding the time required to run CHKDSK and the impact of the new switches provided in SP4"
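
to make the trade-off concrete, the standard CHKDSK invocations look like this:

CHKDSK C: = all three verification passes, report only
CHKDSK C: /F = same checks, and fix what it finds
CHKDSK C: /F /C /I = skip the cycle and index checks for a faster run, at the cost of the guarantees described above
CHKDSK C: /R = add the fourth pass, scanning the whole volume for bad sectors (implies /F)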



Description of Enhanced Chkdsk, Autochk, and Chkntfs Tools in Windows 2000
 
Originally posted by Bung
There have been many cases of problems with SATA drives esp in RAID arrays traced to noise in cabling or connectors.

very true (I compiled that quite a while ago and haven't updated it all that recently)

I have a SATA Cautions thread stickied on my forum :p

http://www.ata-atapi.com/sata.htm

FIRST, THINGS YOU DO NOT DO WHEN USING SATA!


If you are setting up a system using SATA here are some things you must be aware of:

DO NOT operate SATA devices outside of a sealed system unit. DO NOT operate SATA devices from a power supply that is not the system unit's power supply.
DO NOT tie wrap SATA cables together. DO NOT put sharp bends in SATA cables. DO NOT route SATA cables near PATA cables. Avoid placing SATA devices close to each other such that the SATA cable connectors are close to each other.
DO NOT operate a radio transmitter (such as a cell phone) near an exposed SATA cable or device.
Why all these warnings? The basic problem is that the SATA cable connector is not shielded. This has to be the number one most stupid thing that has been done in the SATA world.

SECOND, LETS TALK ABOUT SATA RELIABILITY!

Are you thinking about buying a Serial ATA system and drive? If yes, read this... The Serial ATA (or SATA) products that are now shipping and available in your local computer store may not be the most reliable products. Testing of SATA products with tools such as the ATACT program is finding a variety of problems. These problems are timeout errors, data compare errors, and strange status errors. These problems are being reported by a large number of people doing SATA product testing. Hale's advice at this time is: be very careful - make sure you can return the SATA product you purchased if it does not perform as you expect. See the ATACT link above for some ATACT log files showing both normal testing of a parallel ATA (PATA) drive (no errors!) and testing of a SATA drive (lots of errors!).

The unshielded SATA cable connector is most likely the source of many of these problems. Making things worse is the failure of the SATA specification to implement an equivalent to the ATA Soft Reset. On a PATA interface, Soft Reset rarely fails to get ATA/ATAPI devices back to a known state so that a command can be retried. On a SATA interface, the equivalent to this reset does not seem to reset anything, and at times it is basically ignored by the SATA controller and device.

And finally, ... Don't buy SATA because it claims to be faster than PATA. The marketing claims that it can transfer data at up to 150MB/second (making it faster than the fastest PATA Ultra DMA mode, mode 6 or 133MB/second) will not be seen with the SATA products that are shipping today (late 2003). Today's SATA products are actually 10% to 20% slower than PATA. This is because today's SATA products are really PATA products with an extra SATA-to-PATA 'bridge chip' in the device. These bridge chips add significant overhead to the SATA protocols. In time there will be real 'native' SATA devices that do not need these bridge chips - then we can see what the true performance of SATA is. But remember SATA is a 'serial interface', and serial interfaces rarely live up to their marketing claims.

Hale Landis maintains the ata-atapi.com website, and has been working for open standards for 25 years. He has been a participant in the ANSI X3/NCITS Technical Committees that developed the ATA and ATA/ATAPI standards since 1990, and works as a consultant and provider of test software.

http://www.theregister.co.uk/2001/03/07/the_open_pc_is_dead/


there is a demo version of ATA Command Test available for download there as well ;)

quite a few people have fabricated their own shielded cables to good effect
 
The Consequences

well, other than your data going south...

KingPariah777 said:
I just like being able to see jenna jameson without huge pixelated neon green corruptions all over her.

Windows will start to default to lower and lower transfer modes
DMA Mode for ATA/ATAPI Devices in Windows XP

For repeated DMA errors: Windows XP will turn off DMA mode for a device after encountering certain errors during data transfer operations. If more than six DMA transfer timeouts occur, Windows will turn off DMA and use only PIO mode on that device.

In this case, the user cannot turn on DMA for this device. The only option for the user who wants to enable DMA mode is to uninstall and reinstall the device.

Windows XP downgrades the Ultra DMA transfer mode after receiving more than six CRC errors. Whenever possible, the operating system will step down one UDMA mode at a time (from UDMA mode 4 to UDMA mode 3, and so on).

If the mini-IDE driver for the device does not support stepping down transfer modes, or if the device is running UDMA mode 0, Windows XP will step down to PIO mode after encountering six or more CRC errors. In this case, a system reboot should restore the original DMA mode settings.

All CRC and timeout errors are logged in the system event log. These types of errors could be caused by improper mounting or improper cabling (for example, 40-pin instead of 80-pin cable). Or such errors could indicate imminent hardware failure, for example, in a hard drive or chipset.

and the W2K counterpart
IDE ATA and ATAPI Disks Use PIO Mode After Multiple Time-Out or CRC Errors Occur
After the Windows IDE/ATAPI Port driver (Atapi.sys) receives a cumulative total of six time-out or cyclical redundancy check (CRC) errors, the driver reduces the communications speed (the transfer mode) from the highest Direct Memory Access (DMA) mode to lower DMA modes in steps. If the driver continues to receive time-out or CRC errors, the driver eventually reduces the transfer mode to the slowest mode (PIO mode).
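
the step-down behaviour both articles describe boils down to a simple error counter and mode ladder; a toy Python sketch (whether the counter resets after each step-down is my assumption, the articles don't say):

# Toy model of the fallback logic described in both KB articles: after a
# cumulative total of six timeout/CRC errors, step down one transfer mode
# at a time, bottoming out at PIO.
MODES = ["UDMA5", "UDMA4", "UDMA3", "UDMA2", "UDMA1", "UDMA0", "PIO"]

class Channel:
    def __init__(self, mode="UDMA5"):
        self.mode, self.errors = mode, 0

    def crc_error(self):
        self.errors += 1
        if self.errors >= 6 and self.mode != "PIO":
            self.mode = MODES[MODES.index(self.mode) + 1]
            self.errors = 0      # assumed: counter restarts at the new mode

ch = Channel()
for _ in range(18):              # three bursts of six errors on a flaky cable
    ch.crc_error()
print(ch.mode)                   # UDMA2 - three steps down from UDMA5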

this fallback has further consequences
There is a myth about putting optical drives on the same channel as HDDs. It is just that, a myth, but it keeps getting reinforced by the way Windows deals with ATA/ATAPI issues.
Basically, with Independent Device Timing, two devices (master/slave) both transfer their data at their own highest speed, but they both have to be either PIO (which is glacially slow) or UDMA; if one defaults to PIO because of some issue, Windows will default the other as well. There was a time when CD-ROMs were only PIO and HDDs were DMA, and for that period of history you didn't want to share a channel, but modern opticals are UDMA mode 2, so there is rarely any issue

some of the reasons a device might default to PIO
DMA Mode for ATA/ATAPI Devices in Windows XP
 
Integrity Testing

USING ATACT (edited into the top, follies follow :p )

well, this is how I did it, and it was only employed on my PATA onboard channels,
so for IDE cards, RAID channels, or SATA ports, you'll have to blunder around yourself :p

download ATACT demo
extract and change the .ug files to .txt files and read them
Download Dr DOS
run it and it will extract to a new floppy
access the floppy and remove the CONFIG.SYS file
(if you employ an MS-DOS boot disk you also need to remove the AUTOEXEC.BAT)
then add the ATACT.EXE file to it

reboot and change your boot order to floppy first if it isn't already
I was prompted to enter the Date and Time (???)
and then given an A:/ prompt
I always check the directory at that point (just to be sure)
A:/DIR which looked OK so
A:/ATACT P0 = runs on Primary Channel Device Zero (Master)
A:/ATACT P1 = runs on Primary Channel Device One (Slave)
A:/ATACT P01 = runs on Primary Channel Both Devices
A:/ATACT S0 = runs on Secondary Channel Device Zero (Master)
A:/ATACT S1 = runs on Secondary Channel Device One (Slave)
A:/ATACT S01 = runs on Secondary Channel Both Devices

there are lots of other switches, troubleshooting options and modes,
but I haven't tried them, so Good Luck

another way to test data integrity is to copy and transfer it between partitions, drives, arrays, and computers, then run a checksum on both sets of files to compare them; they should match.
a very good freeware checksum utility is FSUM by SlavaSoft:
Possibility to calculate a file message digest and/or a checksum using any of the 12 well-known and documented hash and checksum algorithms: MD2, MD4, MD5, SHA-1, SHA-2( 256, 384, 512), RIPEMD-160, PANAMA, TIGER, ADLER32, CRC32;
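
if you'd rather script it than use FSUM, a minimal Python sketch of the same idea; the paths are placeholders:

# Hash every file under two directory trees and report mismatches.
# MD5 is plenty for corruption detection (no adversary here). Note that
# read_bytes() loads each whole file into memory - fine for a sketch.
import hashlib
from pathlib import Path

def tree_digests(root):
    root = Path(root)
    return {str(p.relative_to(root)): hashlib.md5(p.read_bytes()).hexdigest()
            for p in root.rglob("*") if p.is_file()}

original = tree_digests("D:/archive")       # placeholder path
copy     = tree_digests("E:/archive_copy")  # placeholder path

for rel in sorted(set(original) | set(copy)):
    if original.get(rel) != copy.get(rel):
        print("MISMATCH or missing:", rel)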

however, since IDE covers up errors and resends the data, while this will reveal whether the data has been transferred without error, it won't verify that the data signaling is optimized. Actual errors are pretty rare, so even a few are a sign something is wrong. In my personal experience, various compressed archives seem to be more susceptible to corruption than a normal file. Data corruption can strike any backup or redundancy strategy, so it's important to test, and on occasion retest; when you need to restore important data isn't the time to find out it's corrupted ;)

Good Luck, and have fun.
 