2006-12-06 22:02:08

by Matthias Schniedermeyer

Subject: single bit errors on files stored on USB-HDDs via USB2/usb_storage

1:
245A E2F0: 0E D9 35 01 00 F4 7B F8 00 00 01 E0 09 00 80 00 ..5...{. ........
245A E300: 0D FF FF FF FF FF FF FF FF FF FF FF FF FF DF FC ........ ........
245A E310: 20 92 50 90 DC F4 0C 1A 1A 18 DB 80 4E 61 25 80 .P..... ....Na%.

245A E2F0: 0E D9 35 01 00 F4 7B F8 00 00 01 E0 09 00 80 00 ..5...{. ........
245A E300: 0D FF FF FF FF F7 FF FF FF FF FF FF FF FF DF FC ........ ........
245A E310: 20 92 50 90 DC F4 0C 1A 1A 18 DB 80 4E 61 25 80 .P..... ....Na%.

2:
24F9 F770: 00 00 01 E0 09 00 80 00 0D FF FF FF FF FF FF FF ........ ........
24F9 F780: FF FF FF FF FF FF FC 13 64 0B 38 68 EA A2 11 86 ........ d.8h....
24F9 F790: 61 7A EE EC ED 1D 6F 31 32 6E 4D D9 B5 31 37 66 az....o1 2nM..17f

24F9 F770: 00 00 01 E0 09 00 80 00 0D FF FF FF FF FF FF FF ........ ........
24F9 F780: FF FF FF FF FF F7 FC 13 64 0B 38 68 EA A2 11 86 ........ d.8h....
24F9 F790: 61 7A EE EC ED 1D 6F 31 32 6E 4D D9 B5 31 37 66 az....o1 2nM..17f

3:
20CB C6B0: 00 FB 3F F8 00 00 01 E0 09 00 80 80 0D 21 2A 1B ..?..... .....!*.
20CB C6C0: 65 F1 FF FF FF FF FF FF FF FF 3E C4 BC 2B 39 A4 e....... ..>..+9.
20CB C6D0: 8E 85 50 EB 7B 02 7B 93 79 77 50 EF 60 32 8C 03 ..P.{.{. ywP.`2..

20CB C6B0: 00 FB 3F F8 00 00 01 E0 09 00 80 80 0D 21 2A 1B ..?..... .....!*.
20CB C6C0: 65 F1 FF FF FF F7 FF FF FF FF 3E C4 BC 2B 39 A4 e....... ..>..+9.
20CB C6D0: 8E 85 50 EB 7B 02 7B 93 79 77 50 EF 60 32 8C 03 ..P.{.{. ywP.`2..

4:
1F13 06B0: 00 00 01 E0 09 00 80 00 0D FF FF FF FF FF FF FF ........ ........
1F13 06C0: FF FF FF FF FF FF 7F 5C 14 05 F2 9E 90 0F 6F A4 .......\ ......o.
1F13 06D0: B8 10 BF E9 6A 78 A3 00 13 00 FD 9C 00 A5 5B EB ....jx.. ......[.

1F13 06B0: 00 00 01 E0 09 00 80 00 0D FF FF FF FF FF FF FF ........ ........
1F13 06C0: FF FF FF FF FF F7 7F 5C 14 05 F2 9E 90 0F 6F A4 .......\ ......o.
1F13 06D0: B8 10 BF E9 6A 78 A3 00 13 00 FD 9C 00 A5 5B EB ....jx.. ......[.

5:
1F13 06B0: 00 00 01 E0 09 00 80 00 0D FF FF FF FF FF FF FF ........ ........
1F13 06C0: FF FF FF FF FF FF 7F 5C 14 05 F2 9E 90 0F 6F A4 .......\ ......o.
1F13 06D0: B8 10 BF E9 6A 78 A3 00 13 00 FD 9C 00 A5 5B EB ....jx.. ......[.

1F13 06B0: 00 00 01 E0 09 00 80 00 0D FF FF FF FF FF FF FF ........ ........
1F13 06C0: FF FF FF FF FF F7 7F 5C 14 05 F2 9E 90 0F 6F A4 .......\ ......o.
1F13 06D0: B8 10 BF E9 6A 78 A3 00 13 00 FD 9C 00 A5 5B EB ....jx.. ......[.
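
The pattern visible in these dumps can be cross-checked mechanically: a minimal Python sketch, assuming each row keeps the `ADDR ADDR: 16 hex bytes` layout shown above. `parse_row` and `diff_rows` are hypothetical helpers written for this illustration; they are not part of vbindiff. Block 5 is byte-for-byte identical to block 4 in the attachment, so only the four distinct middle-row pairs are included.

```python
from typing import List, Tuple

# Middle rows (original, corrupted) of blocks 1-4 from the attached dump.
PAIRS = [
    ("245A E300: 0D FF FF FF FF FF FF FF FF FF FF FF FF FF DF FC",
     "245A E300: 0D FF FF FF FF F7 FF FF FF FF FF FF FF FF DF FC"),
    ("24F9 F780: FF FF FF FF FF FF FC 13 64 0B 38 68 EA A2 11 86",
     "24F9 F780: FF FF FF FF FF F7 FC 13 64 0B 38 68 EA A2 11 86"),
    ("20CB C6C0: 65 F1 FF FF FF FF FF FF FF FF 3E C4 BC 2B 39 A4",
     "20CB C6C0: 65 F1 FF FF FF F7 FF FF FF FF 3E C4 BC 2B 39 A4"),
    ("1F13 06C0: FF FF FF FF FF FF 7F 5C 14 05 F2 9E 90 0F 6F A4",
     "1F13 06C0: FF FF FF FF FF F7 7F 5C 14 05 F2 9E 90 0F 6F A4"),
]


def parse_row(row: str) -> Tuple[int, bytes]:
    """Split 'HHHH HHHH: b0 ... b15 [ascii]' into (address, 16 data bytes)."""
    addr_text, rest = row.split(":", 1)
    address = int(addr_text.replace(" ", ""), 16)
    # Only the first 16 tokens are hex bytes; an ASCII column may follow.
    data = bytes(int(tok, 16) for tok in rest.split()[:16])
    return address, data


def diff_rows(original: str, corrupted: str) -> List[Tuple[int, int, int, int]]:
    """Return (absolute offset, good byte, bad byte, XOR mask) per differing byte."""
    addr_a, a = parse_row(original)
    addr_b, b = parse_row(corrupted)
    if addr_a != addr_b:
        raise ValueError("rows describe different offsets")
    return [(addr_a + i, x, y, x ^ y)
            for i, (x, y) in enumerate(zip(a, b)) if x != y]


for good, bad in PAIRS:
    for offset, x, y, mask in diff_rows(good, bad):
        flipped = [bit for bit in range(8) if mask >> bit & 1]
        print(f"offset 0x{offset:08X}: 0x{x:02X} -> 0x{y:02X}, "
              f"XOR 0x{mask:02X}, bit(s) {flipped}, low nibble {offset & 0xF}")
```

For each of the four distinct blocks this reports exactly one differing byte, with XOR mask 0x08 (bit 3 cleared) at an offset whose low nibble is 5, which matches the description given in the thread.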


Attachments:
errors.txt (2.34 kB)

2006-12-07 01:13:59

by Robert Hancock

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Matthias Schniedermeyer wrote:
> Hi
>
>
> I'm using a bunch of HDDs in USB enclosures for storing files.
> (currently 38 HDDs, with a total capacity of 9.5 TB, of which 8.5 TB is used)

All the same enclosure type?

> This time I kept the defective files and used "vbindiff" to show me the
> difference. Strangely, in EVERY case the difference is a single bit in a
> sequence of 0xff bytes inside a block of varying bit values, which
> changed a 0xff into a 0xf7.
> Also interesting is that each error is at an offset ending in 5
> (0xXXXXXXX5).
>
> Attached is a file with 5 of the 6 differences, numbered 1-5. In each of
> the 5 blocks of 2x3 lines, the first 3 lines are the original and the
> following 3 lines contain the error, in the 6th value of the middle row.
>
> NEVER did I see any messages in syslog regarding errors, or a program
> aborting due to errors passed down from the kernel or something like that.

The fact that the corruption seems data dependent would seem to me to
point to some kind of hardware problem. I would tend to suspect the
USB-to-IDE converters in the enclosures as being faulty or something
like that.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-12-07 09:03:39

by Matthias Schniedermeyer

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Robert Hancock wrote:
> Matthias Schniedermeyer wrote:
>> Hi
>>
>>
>> I'm using a bunch of HDDs in USB enclosures for storing files.
>> (currently 38 HDDs, with a total capacity of 9.5 TB, of which 8.5 TB
>> is used)
>
> All the same enclosure type?

36x "Fantec (formerly MaPower) DB-335U2-1" with a Genesys Logic chipset
(at least the one I used yesterday said so. I bought these 36 enclosures
between May 2005 and October 2006, so it is possible that they use
different chipsets and/or chipset revisions)
2x "IOmega 33644" bought last week, with a chipset that claims to be from
IOmega, but I guess it is just rebranded.
I have errors with all of them.

I have a spare enclosure, a Fantec DB-35U2-2. AFAICT it uses a Cypress
chipset; I haven't used it for some time, so ATM I don't remember
whether I had errors with this one too.


>> This time I kept the defective files and used "vbindiff" to show me the
>> difference. Strangely, in EVERY case the difference is a single bit in a
>> sequence of 0xff bytes inside a block of varying bit values, which
>> changed a 0xff into a 0xf7.
>> Also interesting is that each error is at an offset ending in 5
>> (0xXXXXXXX5).
>>
>> Attached is a file with 5 of the 6 differences, numbered 1-5. In each of
>> the 5 blocks of 2x3 lines, the first 3 lines are the original and the
>> following 3 lines contain the error, in the 6th value of the middle row.
>>
>> NEVER did I see any messages in syslog regarding errors, or a program
>> aborting due to errors passed down from the kernel or something like that.
>
> The fact that the corruption seems data dependent would seem to me to
> point to some kind of hardware problem. I would tend to suspect the
> USB-to-IDE converters in the enclosures as being faulty or something
> like that.
>


--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.

2006-12-08 16:19:29

by John Stoffel

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

>>>>> "Stefan" == Stefan Richter <[email protected]> writes:

Stefan> Search for firmware updates from the manufacturer of the
Stefan> enclosure, of the bridge board, or of the bridge chip... if
Stefan> you didn't do so already. Some chips support firmware upload
Stefan> to an EEPROM, usually via a Windows utility.

I did this with my USB2.0/Firewire external enclosure and it fixed
most of the problems, but enough remained that I've basically given up
on those enclosures, they're just not reliable in my book.

John

2006-12-08 10:39:37

by Matthias Schniedermeyer

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Stefan Richter wrote:
> Matthias Schniedermeyer wrote:
>
>>Robert Hancock wrote:
>>
>>>Matthias Schniedermeyer wrote:
>>>
>>>>I have a 1.5 m and a 4.5 m cable connected to the USB controller
>>>>and I only use one of them, depending on where the HDD is placed in my
>>>>room; the other one is left dangling, unconnected.
>>>>
>>>>Then I will disconnect the short cable, use the long cable exclusively
>>>>and see if it gets better(tm).
>
> BTW, I suspect front panel connectors could introduce noise too, via the
> jumper cables from motherboard to the panel.

It's a 5-port PCI add-on card, so no front panel connectors.
(The computer has only an OHCI/USB 1.1 controller onboard, which I use
for keyboard & mouse.)

>>>That long cable could be part of the problem - I don't think the USB
>>>specification allows for cables that long (something like a 6 foot max
>>>as I recall).
>>
>>http://en.wikipedia.org/wiki/USB2
>>
>>Says that 5 meters are allowed.
>
>
> I don't know about USB 2.0, but in case of FireWire, ~4.5m long cables
> are theoretically in spec too. I've got a FireWire 400 and a FireWire
> 800 cable this long, and both work quite unreliably. Depending on
> what's connected, they fail sooner or later. However due to how FireWire
> works, this is immediately noticed as data CRC errors or bus resets.
> I.e. it's nearly impossible for noisy hardware to _silently_ cause data
> corruption. I would suppose USB has similar CRC checks.
>
> Also, you mentioned that the corruption occurs systematically on certain
> byte patterns. Therefore it's certainly not related to the cables.

I'd guess that too, but who can say for sure. :-|





See you then


2006-12-08 03:20:11

by Robert Hancock

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Matthias Schniedermeyer wrote:
> Hmmm. That's the only thing that I may currently be doing wrong.
> I have a 1.5 m and a 4.5 m cable connected to the USB controller
> and I only use one of them, depending on where the HDD is placed in my
> room; the other one is left dangling, unconnected.
>
> Then I will disconnect the short cable, use the long cable exclusively
> and see if it gets better(tm).

That long cable could be part of the problem - I don't think the USB
specification allows for cables that long (something like a 6 foot max
as I recall).


2006-12-08 10:25:10

by Stefan Richter

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Matthias Schniedermeyer wrote:
> Robert Hancock wrote:
>> Matthias Schniedermeyer wrote:
>>> I have a 1.5 m and a 4.5 m cable connected to the USB controller
>>> and I only use one of them, depending on where the HDD is placed in my
>>> room; the other one is left dangling, unconnected.
>>>
>>> Then I will disconnect the short cable, use the long cable exclusively
>>> and see if it gets better(tm).

BTW, I suspect front panel connectors could introduce noise too, via the
jumper cables from motherboard to the panel.

>> That long cable could be part of the problem - I don't think the USB
>> specification allows for cables that long (something like a 6 foot max
>> as I recall).
>
> http://en.wikipedia.org/wiki/USB2
>
> Says that 5 meters are allowed.

I don't know about USB 2.0, but in case of FireWire, ~4.5m long cables
are theoretically in spec too. I've got a FireWire 400 and a FireWire
800 cable this long, and both work quite unreliably. Depending on
what's connected, they fail sooner or later. However due to how FireWire
works, this is immediately noticed as data CRC errors or bus resets.
I.e. it's nearly impossible for noisy hardware to _silently_ cause data
corruption. I would suppose USB has similar CRC checks.

Also, you mentioned that the corruption occurs systematically on certain
byte patterns. Therefore it's certainly not related to the cables.
--
Stefan Richter
-=====-=-==- ==-- -=---
http://arcgraph.de/sr/

2006-12-08 09:07:36

by Matthias Schniedermeyer

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Robert Hancock wrote:
> Matthias Schniedermeyer wrote:
>
>> Hmmm. That's the only thing that I may currently be doing wrong.
>> I have a 1.5 m and a 4.5 m cable connected to the USB controller
>> and I only use one of them, depending on where the HDD is placed in my
>> room; the other one is left dangling, unconnected.
>>
>> Then I will disconnect the short cable, use the long cable exclusively
>> and see if it gets better(tm).
>
>
> That long cable could be part of the problem - I don't think the USB
> specification allows for cables that long (something like a 6 foot max
> as I recall).

http://en.wikipedia.org/wiki/USB2

It says that 5 meters are allowed.



See you then


2006-12-07 19:41:22

by Matthias Schniedermeyer

Subject: Re: [usb-storage] single bit errors on files stored on USB-HDDs via USB2/usb_storage

Alan Stern wrote:
> On Wed, 6 Dec 2006, Matthias Schniedermeyer wrote:
>
>
>>Hi
>>
>>
>>I'm using a bunch of HDDs in USB enclosures for storing files.
>>(currently 38 HDDs, with a total capacity of 9.5 TB, of which 8.5 TB is used)
>>
>>After I realised about a year(!) ago that the files copied to the HDDs
>>sometimes aren't identical to the "original" files, I changed my
>>procedure so that each file is MD5-summed before and after copying, and
>>deleted/copied again if an error is detected.
>>
>>My average file size is about 1 GB, with files from about 400 MB to
>>5000 MB. I estimate the average error rate at about one damaged file
>>per 10 GB of data.
>>
>>I'm not sure, and haven't checked, whether the files are written wrongly
>>or "only" read back wrongly, as I delete the defective files and copy
>>them again.
>>
>>Today I copied a few files back and checked them against the stored MD5
>>sums, and 5 of 86 files (each about 700 MB) had errors. So I copied the
>>5 files again. 4 of the files were OK after that, and copying the last
>>file a third time also resulted in the correct MD5.
>>
>>This time I kept the defective files and used "vbindiff" to show me the
>>difference. Strangely, in EVERY case the difference is a single bit in a
>>sequence of 0xff bytes inside a block of varying bit values, which
>>changed a 0xff into a 0xf7.
>>Also interesting is that each error is at an offset ending in 5
>>(0xXXXXXXX5).
>>
>>Attached is a file with 5 of the 6 differences, numbered 1-5. In each of
>>the 5 blocks of 2x3 lines, the first 3 lines are the original and the
>>following 3 lines contain the error, in the 6th value of the middle row.
>>
>>NEVER did I see any messages in syslog regarding errors, or a program
>>aborting due to errors passed down from the kernel or something like that.
>
>
> This was almost certainly caused by hardware flaws in the USB interface
> chips of the enclosures. There's nothing the kernel can do about it
> because the errors aren't reported; all that happens is that incorrect
> data is sent to or from the drive.

So pretty much all I can do is pray that the errors don't corrupt
the filesystem metadata (XFS).

So I should definitely consider writing myself a "NO-FS", where the
"filesystem" part is stored elsewhere and the HDD contains 100% content
(minus a dummy MBR block for sector 0). On the plus side, such a
filesystem won't have any overhead at all, but on the flip side you lose
pretty much the whole content if you lose the metadata. But I guess in
my case it would considerably lower the risk of losing data.




See you then


2006-12-07 22:57:20

by Matthias Schniedermeyer

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

DervishD wrote:
> Hi Matthias :)
>
> * Matthias Schniedermeyer <[email protected]> wrote:
>
>>My average file size is about 1 GB, with files from about 400 MB to
>>5000 MB. I estimate the average error rate at about one damaged file
>>per 10 GB of data.
>>
>>I'm not sure, and haven't checked, whether the files are written wrongly
>>or "only" read back wrongly, as I delete the defective files and copy
>>them again.
>>
>>Today I copied a few files back and checked them against the stored MD5
>>sums, and 5 of 86 files (each about 700 MB) had errors. So I copied the
>>5 files again. 4 of the files were OK after that, and copying the last
>>file a third time also resulted in the correct MD5.
>
>
> I had more or less the same issue a week or two ago. I performed
> lots of tests, and only replacing the USB 2.0 PCI card, the USB cable
> and the power supply of the USB-HDD adapter got the problem solved.
>
> I'm not sure if the problem is really gone, but the system now works
> reliably. I don't know if sooner or later I'll get the issue again,
> because I didn't really identify a culprit: it looks like the
> card+adapter+cable combination was just "ugly", and errors from the
> adapter were not reported correctly.

The 38 HDDs are in 38 enclosures, so each has its own power supply. I
have used different cables and I replaced the USB controller once.

So it can't be a single faulty component, unless the computer
itself is the culprit.

>>NEVER did I see any messages in syslog regarding errors, or a program
>>aborting due to errors passed down from the kernel or something like
>>that.
>
> The same here! Looks like USB-HDD adapters don't report any errors
> to the kernel :?????
>
> The best advice I can give you, from my limited experience with the
> problem, is: replace the cable. This minimizes the chance of corrupted
> data getting into the adapter. If that doesn't solve the problem, try
> removing any unconnected cable that is plugged into the USB card.
> Believe it or not, a long but unconnected cable (put there just to be
> able to plug my USB card-reader without having to look for the cable in
> a drawer) was causing errors *even in a Kingston USB key that worked
> flawlessly otherwise*!!!

Hmmm. That's the only thing that I may currently be doing wrong.
I have a 1.5 m and a 4.5 m cable connected to the USB controller
and I only use one of them, depending on where the HDD is placed in my
room; the other one is left dangling, unconnected.

Then I will disconnect the short cable, use the long cable exclusively
and see if it gets better(tm).

> If you have any other question, feel free to drop me a note. I'm
> sorry I cannot give a much more technical or scientific answer, but
> unfortunately I have none :((

Thank you anyway.





See you then


2006-12-07 23:46:37

by Pete Zaitcev

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

On Thu, 07 Dec 2006 20:41:12 +0100, Matthias Schniedermeyer <[email protected]> wrote:

> >>I'm using a bunch of HDDs in USB enclosures for storing files.
> >>(currently 38 HDDs, with a total capacity of 9.5 TB, of which 8.5 TB is used)
> >>[....]
> >>This time I kept the defective files and used "vbindiff" to show me the
> >>difference. Strangely, in EVERY case the difference is a single bit in a
> >>sequence of 0xff bytes inside a block of varying bit values, which
> >>changed a 0xff into a 0xf7.

> > This was almost certainly caused by hardware flaws in the USB interface
> > chips of the enclosures. There's nothing the kernel can do about it
> > because the errors aren't reported; all that happens is that incorrect
> > data is sent to or from the drive.
>
> So pretty much all I can do is pray that the errors don't corrupt
> the filesystem metadata (XFS).

No, this is not all. You should buy a variety of different enclosures
with different chipsets (e.g. find a Freecom if you can), and also
use decent cables.

-- Pete

2006-12-07 18:08:50

by Alan Stern

Subject: Re: [usb-storage] single bit errors on files stored on USB-HDDs via USB2/usb_storage

On Wed, 6 Dec 2006, Matthias Schniedermeyer wrote:

> Hi
>
>
> I'm using a bunch of HDDs in USB enclosures for storing files.
> (currently 38 HDDs, with a total capacity of 9.5 TB, of which 8.5 TB is used)
>
> After I realised about a year(!) ago that the files copied to the HDDs
> sometimes aren't identical to the "original" files, I changed my
> procedure so that each file is MD5-summed before and after copying, and
> deleted/copied again if an error is detected.
>
> My average file size is about 1 GB, with files from about 400 MB to
> 5000 MB. I estimate the average error rate at about one damaged file
> per 10 GB of data.
>
> I'm not sure, and haven't checked, whether the files are written wrongly
> or "only" read back wrongly, as I delete the defective files and copy
> them again.
>
> Today I copied a few files back and checked them against the stored MD5
> sums, and 5 of 86 files (each about 700 MB) had errors. So I copied the
> 5 files again. 4 of the files were OK after that, and copying the last
> file a third time also resulted in the correct MD5.
>
> This time I kept the defective files and used "vbindiff" to show me the
> difference. Strangely, in EVERY case the difference is a single bit in a
> sequence of 0xff bytes inside a block of varying bit values, which
> changed a 0xff into a 0xf7.
> Also interesting is that each error is at an offset ending in 5
> (0xXXXXXXX5).
>
> Attached is a file with 5 of the 6 differences, numbered 1-5. In each of
> the 5 blocks of 2x3 lines, the first 3 lines are the original and the
> following 3 lines contain the error, in the 6th value of the middle row.
>
> NEVER did I see any messages in syslog regarding errors, or a program
> aborting due to errors passed down from the kernel or something like that.

This was almost certainly caused by hardware flaws in the USB interface
chips of the enclosures. There's nothing the kernel can do about it
because the errors aren't reported; all that happens is that incorrect
data is sent to or from the drive.

Alan Stern
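
The copy-then-verify procedure described in the original report (MD5 before and after each copy, delete and recopy on a mismatch) can be sketched in Python as below. This is an illustration under stated assumptions, not the reporter's actual script, and `copy_and_verify`, `evict_from_cache` and `reread_is_stable` are hypothetical names. One caveat worth knowing: right after a write, Linux normally serves reads from the page cache, so an "after" checksum may never touch the disk. The sketch therefore asks the kernel to drop the cached pages with `os.posix_fadvise(..., POSIX_FADV_DONTNEED)`, which is only advisory (and Unix-only). Hashing the same file several times with the cache dropped is one way to approach the open question of written-wrong versus read-wrong: digests that differ between reads point at the read path.

```python
import hashlib
import os
import shutil


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so multi-GB files need little RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def evict_from_cache(path: str) -> None:
    """Best effort: ask the kernel to drop cached pages of `path`."""
    if not hasattr(os, "posix_fadvise"):  # not available on every platform
        return
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    except OSError:
        pass  # advisory only; some filesystems may refuse
    finally:
        os.close(fd)


def copy_and_verify(src: str, dst: str, max_attempts: int = 3) -> int:
    """Copy src to dst, re-copying until the MD5s match. Returns attempts used."""
    expected = md5_of(src)
    for attempt in range(1, max_attempts + 1):
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout, 1 << 20)
            fout.flush()
            os.fsync(fout.fileno())  # push the data to the device first
        evict_from_cache(dst)  # so the check below re-reads from the disk
        if md5_of(dst) == expected:
            return attempt
        os.remove(dst)  # mismatch: delete and copy again
    raise OSError(f"{dst}: MD5 still wrong after {max_attempts} attempts")


def reread_is_stable(path: str, reads: int = 3) -> bool:
    """Hash a file several times; differing digests implicate the read path."""
    digests = set()
    for _ in range(reads):
        evict_from_cache(path)
        digests.add(md5_of(path))
    return len(digests) == 1
```

Note that even with the page cache dropped, the drive or the enclosure may still answer from its own cache, so a clean re-read is evidence, not proof, that the data on the platter is correct.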

2006-12-07 22:09:40

by DervishD

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Hi Matthias :)

* Matthias Schniedermeyer <[email protected]> wrote:
> My average file size is about 1 GB, with files from about 400 MB to
> 5000 MB. I estimate the average error rate at about one damaged file
> per 10 GB of data.
>
> I'm not sure, and haven't checked, whether the files are written wrongly
> or "only" read back wrongly, as I delete the defective files and copy
> them again.
>
> Today I copied a few files back and checked them against the stored MD5
> sums, and 5 of 86 files (each about 700 MB) had errors. So I copied the
> 5 files again. 4 of the files were OK after that, and copying the last
> file a third time also resulted in the correct MD5.

I had more or less the same issue a week or two ago. I performed
lots of tests, and only replacing the USB 2.0 PCI card, the USB cable
and the power supply of the USB-HDD adapter got the problem solved.

I'm not sure if the problem is really gone, but the system now works
reliably. I don't know if sooner or later I'll get the issue again,
because I didn't really identify a culprit: it looks like the
card+adapter+cable combination was just "ugly", and errors from the
adapter were not reported correctly.

> NEVER did I see any messages in syslog regarding errors, or a program
> aborting due to errors passed down from the kernel or something like
> that.

The same here! Looks like USB-HDD adapters don't report any errors
to the kernel :?????

The best advice I can give you, from my limited experience with the
problem, is: replace the cable. This minimizes the chance of corrupted
data getting into the adapter. If that doesn't solve the problem, try
removing any unconnected cable that is plugged into the USB card.
Believe it or not, a long but unconnected cable (put there just to be
able to plug my USB card-reader without having to look for the cable in
a drawer) was causing errors *even in a Kingston USB key that worked
flawlessly otherwise*!!!

If you have any other question, feel free to drop me a note. I'm
sorry I cannot give a much more technical or scientific answer, but
unfortunately I have none :((

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!
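
The error rate quoted above (about one damaged file per 10 GB, and 5 of 86 files of about 700 MB in one check) can be turned into a rough per-bit figure and compared with that check. A back-of-the-envelope Python sketch; it assumes decimal units (1 GB = 10^9 bytes) and independent bit errors. The data-dependent pattern in the dumps suggests the errors are not really independent, so treat the result as an order-of-magnitude consistency check only.

```python
import math

# Figures quoted in the thread; decimal units are an assumption.
BYTES_PER_ERROR = 10e9  # "about one damaged file in about 10GB of data"
FILE_SIZE = 700e6       # the 86 test files were "each about 700 MB"
N_FILES = 86
OBSERVED_BAD = 5

# Every observed corruption was a single flipped bit, so one error per
# BYTES_PER_ERROR bytes corresponds to this per-bit error probability.
p_bit = 1 / (BYTES_PER_ERROR * 8)

# With independent errors, a file of n bits is damaged with probability
# 1 - (1 - p)^n; log1p/expm1 keep this accurate for a tiny p.
n_bits = FILE_SIZE * 8
p_file = -math.expm1(n_bits * math.log1p(-p_bit))

print(f"per-bit error rate ~ {p_bit:.2e}")
print(f"P(one 700 MB file damaged) ~ {p_file:.3f}")
print(f"expected damaged out of {N_FILES} ~ {N_FILES * p_file:.1f} "
      f"(observed: {OBSERVED_BAD})")
```

With these inputs the model expects a little under 6 damaged files out of 86, so the two figures in the report are at least mutually consistent.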

2006-12-08 12:21:31

by Stefan Richter

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Matthias Schniedermeyer wrote:
> Pete Zaitcev wrote:
...
>>>>>I'm using a bunch of HDDs in USB enclosures for storing files.
>>>>>(currently 38 HDDs, with a total capacity of 9.5 TB, of which 8.5 TB is used)
...
>> You should buy a variety of different enclosures
>> with different chipsets (e.g. find a Freecom if you can),
>
> That would definitely cost way too much money and time to be in any way
> "efficient".

Search for firmware updates from the manufacturer of the enclosure, of
the bridge board, or of the bridge chip... if you didn't do so already.
Some chips support firmware upload to an EEPROM, usually via a Windows
utility.

2006-12-08 12:27:33

by Stefan Richter

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Oliver Neukum wrote:
> On Friday, 8 December 2006 11:39, Matthias Schniedermeyer wrote:
>> > Also, you mentioned that the corruption occurs systematically on certain
>> > byte patterns. Therefore it's certainly not related to the cables.
>>
>> I'd guess that too, but who can say for sure. :-|
>
> You may have a bit pattern that stresses the controllers and suddenly
> a marginal cable may matter.

And one more thing: I have heard of FireWire enclosures which corrupted
data (although AFAIR with error detection by the drivers) due to an
overheating PHY or bridge chip. Gluing a small passive heat sink to the
respective chip solved it in the reported case.

2006-12-08 11:00:22

by Oliver Neukum

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

On Friday, 8 December 2006 11:39, Matthias Schniedermeyer wrote:
> > I.e. it's nearly impossible for noisy hardware to _silently_ cause data
> > corruption. I would suppose USB has similar CRC checks.

It has.

> > Also, you mentioned that the corruption occurs systematically on certain
> > byte patterns. Therefore it's certainly not related to the cables.
>
> I'd guess that too, but who can say for sure. :-|

You may have a bit pattern that stresses the controllers and suddenly
a marginal cable may matter.

Regards
Oliver
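
Oliver's "It has." refers to the CRCs on USB packets: data packets carry a CRC16. As a small illustration of why a bit flip on the cable would not go unnoticed, here is a Python sketch of the CRC-16 variant commonly catalogued as CRC-16/USB (polynomial 0x8005, processed bit-reflected as 0xA001, initial value 0xFFFF, final XOR 0xFFFF), applied to the middle row of block 1 from the dump with and without the 0xFF to 0xF7 flip. It computes the checksum over a plain byte string and does not model packet framing. Since a CRC catches every single-bit error, silent corruption of this kind is more plausibly introduced before the CRC is generated or after it is checked, for example inside the USB-to-IDE bridge, which is what Alan and Robert suspected.

```python
def crc16_usb(data: bytes) -> int:
    """CRC-16/USB: poly 0x8005 (reflected 0xA001), init 0xFFFF, xorout 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte  # reflected input: each byte is processed LSB first
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc ^ 0xFFFF


# Middle row of block 1 from the attached dump, and the corrupted copy.
good = bytes.fromhex("0D FF FF FF FF FF FF FF FF FF FF FF FF FF DF FC")
bad = bytearray(good)
bad[5] = 0xF7  # the single-bit flip (bit 3 cleared) seen in the dump

print(hex(crc16_usb(b"123456789")))  # catalogue check value is 0xb4c8
print(hex(crc16_usb(good)), hex(crc16_usb(bytes(bad))))
```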

2006-12-07 23:07:51

by Jan Engelhardt

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage


On Dec 7 2006 23:57, Matthias Schniedermeyer wrote:
>DervishD wrote:
>
>The 38 HDDs are in 38 enclosures, so each has its own power supply. I
>have used different cables and I replaced the USB controller once.
>
>So it can't be a single faulty component, unless the computer
>itself is the culprit.

Or a manufacturing defect.


-`J'
--

2006-12-08 09:31:57

by DervishD

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Hi Matthias :)

* Matthias Schniedermeyer <[email protected]> wrote:
> > * Matthias Schniedermeyer <[email protected]> wrote:
> >
> >>Today I copied a few files back and checked them against the stored
> >>MD5 sums, and 5 of 86 files (each about 700 MB) had errors. So I
> >>copied the 5 files again. 4 of the files were OK after that, and
> >>copying the last file a third time also resulted in the correct
> >>MD5.
> >
> > I had more or less the same issue a week or two ago. I performed
> > lots of tests, and only replacing the USB 2.0 PCI card, the USB
> > cable and the power supply of the USB-HDD adapter got the
> > problem solved.
> >
> The 38 HDDs are in 38 enclosures, so each has its own power supply. I
> have used different cables and I replaced the USB controller once.
>
> So it can't be a single faulty component, unless the computer
> itself is the culprit.

In my case the same applied: the problem didn't seem to be a single
faulty component, but a combination. Since in the end I wasn't able to
test on another computer, or under Windows on the same machine, I don't
know whether the problem was a faulty device driver, a faulty
motherboard, RAM problems (although memtest said my RAM was OK), etc.

> > The best advice I can give you, from my limited experience with the
> > problem, is: replace the cable. This minimizes the chance of corrupted
> > data getting into the adapter. If that doesn't solve the problem, try
> > removing any unconnected cable that is plugged into the USB card.
>
> Hmmm. That's the only thing that I may currently be doing wrong.
> I have a 1.5 m and a 4.5 m cable connected to the USB controller
> and I only use one of them, depending on where the HDD is placed in my
> room; the other one is left dangling, unconnected.

I don't know why the heck this was causing problems, since any
electrical noise picked up by the dangling cable shouldn't affect
either the card or the other USB ports, but...

Raúl Núñez de Arenas Coronado


2006-12-08 09:16:48

by Matthias Schniedermeyer

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Pete Zaitcev wrote:
> On Thu, 07 Dec 2006 20:41:12 +0100, Matthias Schniedermeyer <[email protected]> wrote:
>
>
>>>>I'm using a bunch of HDDs in USB enclosures for storing files.
>>>>(currently 38 HDDs, with a total capacity of 9.5 TB, of which 8.5 TB is used)
>>>>[....]
>>>>This time I kept the defective files and used "vbindiff" to show me the
>>>>difference. Strangely, in EVERY case the difference is a single bit in a
>>>>sequence of 0xff bytes inside a block of varying bit values, which
>>>>changed a 0xff into a 0xf7.
>
>
>>>This was almost certainly caused by hardware flaws in the USB interface
>>>chips of the enclosures. There's nothing the kernel can do about it
>>>because the errors aren't reported; all that happens is that incorrect
>>>data is sent to or from the drive.
>>
>>So pretty much all I can do is pray that the errors don't corrupt
>>the filesystem metadata (XFS).
>
>
> No, this is not all. You should buy a variety of different enclosures
> with different chipsets (e.g. find a Freecom if you can),

That would definitely cost way too much money and time to be in any way
"efficient".

> and also use decent cables.

I replaced all cables with "high quality" cables.
But as a "Joe user" it is practically impossible to really know whether
the cables are good.
All I can say is that the "original" cables that came with the
enclosures appear a bit thin, while the ones I bought are much
thicker, have gold-plated contacts and massive braided shielding,
IOW appear much more trustworthy. But, as I said, in the end I can't
really know if they are better than the original ones.




See you then


2006-12-09 06:11:19

by Ben Nizette

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

>>> Also, you mentioned that the corruption occurs systematically on certain
>>> byte patterns. Therefore it's certainly not related to the cables.
>> I'd guess that too, but who can say for sure. :-|
>
> You may have a bit pattern that stresses the controllers and suddenly
> a marginal cable may matter.

The errors occur in strings of 0xFFs. From the USB standard:

a ?1? is represented by no change in level and a ?0? is represented by a
change in level

so this error-infested bytes are effectively long, quiet times on the
wire. I would have thought this would be the _least_ stressful time for
the controllers but maybe they are also more susceptible to noise during
this period.
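To make the NRZI point concrete, here is a minimal sketch of the encoding rule quoted above, deliberately ignoring bit stuffing. Line levels are modeled as abstract 0/1 values (an assumption; real USB uses J/K differential states), and the helper names are hypothetical. It shows that a run of 0xFF bytes, sent LSB first, produces no level changes at all.

```python
def nrzi_encode(bits, initial_level=1):
    """NRZI as quoted from the USB spec: a 0 bit toggles the line level,
    a 1 bit leaves it unchanged. Returns one abstract level per bit time."""
    level = initial_level
    levels = []
    for bit in bits:
        if bit == 0:
            level ^= 1  # a 0 is sent as a change in level
        levels.append(level)  # a 1 simply holds the previous level
    return levels


def byte_to_bits_lsb_first(byte):
    """USB sends the least significant bit of each byte first."""
    return [(byte >> i) & 1 for i in range(8)]


def transitions(levels):
    """Count level changes between consecutive bit times."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a != b)


if __name__ == "__main__":
    ff_run = [bit for _ in range(4) for bit in byte_to_bits_lsb_first(0xFF)]
    mixed = [bit for b in (0x0E, 0xD9, 0x35, 0x01) for bit in byte_to_bits_lsb_first(b)]
    print(transitions(nrzi_encode(ff_run)))  # 0: the line never changes level
    print(transitions(nrzi_encode(mixed)))
```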

Regards,
Ben

2006-12-09 08:17:06

by Oliver Neukum

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Am Samstag, 9. Dezember 2006 07:11 schrieb Ben Nizette:
> >>> Also, you mentioned that the corruption occurs systematically on certain
> >>> byte patterns. Therefore it's certainly not related to the cables.
> >> I'd guess that too, but who can say for sure. :-|
> >
> > You may have a bit pattern that stresses the controllers and suddenly
> > a marginal cable may matter.
>
> The errors occur in strings of 0xFFs. From the USB standard:
>
> a "1" is represented by no change in level and a "0" is represented by a
> change in level

Yes, plus added stuffing bits.

> so these error-infested bytes are effectively long, quiet times on the
> wire. I would have thought this would be the _least_ stressful time for
> the controllers but maybe they are also more susceptible to noise during
> this period.

The longer the voltage doesn't change, the likelier the receiver and
transmitter are to get out of sync.
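A minimal sketch of how bit stuffing bounds those quiet periods, assuming the USB rule that a 0 is inserted after every six consecutive 1s before NRZI encoding. The 512-byte all-0xFF buffer stands in for one full high-speed bulk packet; function names are hypothetical and levels are abstract 0/1 values.

```python
def bit_stuff(bits):
    """Insert a 0 after every six consecutive 1s, forcing a level change."""
    out = []
    ones = 0
    for bit in bits:
        out.append(bit)
        if bit == 1:
            ones += 1
            if ones == 6:
                out.append(0)  # stuffed bit, removed again by the receiver
                ones = 0
        else:
            ones = 0
    return out


def nrzi_encode(bits, initial_level=1):
    """A 0 toggles the abstract line level, a 1 holds it."""
    level = initial_level
    levels = []
    for bit in bits:
        if bit == 0:
            level ^= 1
        levels.append(level)
    return levels


def longest_constant_run(levels):
    """Longest stretch of bit times without a level change."""
    if not levels:
        return 0
    longest = current = 1
    for prev, cur in zip(levels, levels[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    packet = [1] * (512 * 8)  # 512 bytes of 0xFF
    stuffed = bit_stuff(packet)
    print(len(stuffed) - len(packet))                   # 682 stuffed bits
    print(longest_constant_run(nrzi_encode(packet)))    # 4096 without stuffing
    print(longest_constant_run(nrzi_encode(stuffed)))   # 7 with stuffing
```

With stuffing, the line never stays at one level for more than seven bit times, at a cost of roughly one extra bit per six 1s.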

Regards
Oliver

2006-12-09 10:16:14

by Ben Nizette

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Oliver Neukum wrote:
> Am Samstag, 9. Dezember 2006 07:11 schrieb Ben Nizette:
>>>>> Also, you mentioned that the corruption occurs systematically on certain
>>>>> byte patterns. Therefore it's certainly not related to the cables.
>>>> I'd guess that too, but who can say for sure. :-|
>>> You may have a bit pattern that stresses the controllers and suddenly
>>> a marginal cable may matter.
>> The errors occur in strings of 0xFFs. From the USB standard:
>>
>> a "1" is represented by no change in level and a "0" is represented by a
>> change in level
>
> Yes, plus added stuffing bits.
>
>> so these error-infested bytes are effectively long, quiet times on the
>> wire. I would have thought this would be the _least_ stressful time for
>> the controllers but maybe they are also more susceptible to noise during
>> this period.
>
> The longer the voltage doesn't change, the likelier the receiver and
> transmitter are to get out of sync.

Yes, hence the bit-stuffing, you're right :). And hence this period
isn't really too stressful for the controller as the stuffed bits come
relatively often.

We're hoping that any wire errors get picked up by the CRC anyway, so a
marginal cable shouldn't silently corrupt data under any circumstances.
I love that word 'shouldn't' ;)

Regards,
Ben.

2006-12-10 08:44:55

by George Spelvin

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

How the wires can cause single-bit errors is a bit beyond me;
USB protects every bit on the wire well enough that communication
errors should be detected.

Every packet starts with a packet identifier (PID) byte; this contains a
4-bit packet type followed by its ones' complement as a check field.
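A small sketch of the PID check, based on the USB 2.0 layout: the low nibble is the packet type and the high nibble must be its ones' complement. Only a few PID values are listed; the table and function name are illustrative, not taken from any driver.

```python
# A few packet types from the USB 2.0 specification (low nibble of the PID byte).
PID_NAMES = {
    0x1: "OUT", 0x9: "IN", 0x5: "SOF", 0xD: "SETUP",
    0x3: "DATA0", 0xB: "DATA1", 0x2: "ACK", 0xA: "NAK",
}


def decode_pid(pid_byte):
    """Return the packet type name, or None if the check nibble is wrong."""
    packet_type = pid_byte & 0x0F
    check = (pid_byte >> 4) & 0x0F
    if check != (~packet_type & 0x0F):
        return None  # corrupted PID: the packet cannot be trusted
    return PID_NAMES.get(packet_type, f"PID {packet_type:#x}")


if __name__ == "__main__":
    print(decode_pid(0xC3))         # DATA0
    print(decode_pid(0xC3 ^ 0x10))  # None: one flipped bit breaks the check
```

Any single flipped bit in the PID byte breaks the complement relation, so it is always caught.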

Some small "token" packets have an 11-bit payload (7 address and 4
endpoint bits) and a 5-bit CRC.
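A sketch of the token CRC-5, assuming the USB parameters: generator x^5 + x^2 + 1, register preset to all ones, bits processed LSB first, remainder inverted. It is written in bit-reflected form, so the polynomial 0b00101 appears as 0x14. The address/endpoint values in the usage line are hypothetical, and the exact on-wire bit order of the CRC field is not modeled here.

```python
def usb_crc5(value, nbits=11):
    """CRC-5 over the `nbits` least significant bits of `value`, LSB first."""
    crc = 0x1F  # preset to all ones
    for i in range(nbits):
        bit = (value >> i) & 1
        if (crc ^ bit) & 1:
            crc = (crc >> 1) ^ 0x14  # x^5 + x^2 + 1, bit-reversed
        else:
            crc >>= 1
    return crc ^ 0x1F  # the transmitted CRC is the inverted remainder


def token_payload(address, endpoint):
    """Pack a 7-bit address and 4-bit endpoint into the 11-bit token field
    (address in the low bits, since it is sent first)."""
    if not (0 <= address < 128 and 0 <= endpoint < 16):
        raise ValueError("address must fit in 7 bits, endpoint in 4 bits")
    return address | (endpoint << 7)


if __name__ == "__main__":
    payload = token_payload(address=0x15, endpoint=0xE)  # hypothetical values
    print(f"{usb_crc5(payload):05b}")
```

Because x^5 + x^2 + 1 is primitive, every single- and double-bit error in the 11-bit field changes the CRC.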

Any corruption of those would result in USB state machine confusion and
at least large data gaps.

Packets with an actual data payload are protected with a CRC-16.
Not quite as strong as Ethernet, but sufficient to detect all errors of
three bits or less, and all burst errors of 16 consecutive bits or less.

A single-bit flip can't get past a CRC-16 unless you flip at least
three bits in the CRC as well. The actual pattern depends on the bit
position and averages 8 bits; given the documented bit error positions
and a better knowledge of the ATA-over-USB encapsulation protocol,
the actual CRC changes could be computed.
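The CRC-16 claims can be checked with a short sketch. It assumes the USB data-packet parameters (generator x^16 + x^15 + x^2 + 1, i.e. 0x8005, preset to all ones, LSB first, inverted result), the model usually catalogued as CRC-16/USB. `crc_bits_to_flip` is a hypothetical helper that counts how many CRC bits would also have to change to hide a single flipped data bit. The 64-byte 0xFF buffer only mimics the runs where the errors were seen; real packets are longer.

```python
def usb_crc16(data):
    """CRC-16 over `data`, bits processed LSB first (bit-reflected form)."""
    crc = 0xFFFF  # preset to all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001  # 0x8005 bit-reversed
            else:
                crc >>= 1
    return crc ^ 0xFFFF  # inverted remainder


def crc_bits_to_flip(data, bit_index):
    """Number of CRC bits that differ after flipping one data bit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bin(usb_crc16(data) ^ usb_crc16(bytes(corrupted))).count("1")


if __name__ == "__main__":
    print(hex(usb_crc16(b"123456789")))  # 0xb4c8, the CRC-16/USB check value
    block = bytes([0xFF] * 64)
    weights = [crc_bits_to_flip(block, i) for i in range(len(block) * 8)]
    print(min(weights), max(weights), sum(weights) / len(weights))
```

Since 0x8005 factors as (x + 1)(x^15 + x + 1), every odd number of bit errors and every double error within a packet is detected, so the minimum of `weights` can never drop below 3.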


Now, I can imagine a USB slave controller so cheap and/or buggy that it
doesn't check the CRC, but I'd think that most would. Checking a CRC
is hardly a novel challenge.

2006-12-10 15:37:38

by Clemens Koller

Subject: Re: single bit errors on files stored on USB-HDDs via USB2/usb_storage

Hi There!

> Now, I can imagine a USB slave controller so cheap and/or buggy that it
> doesn't check the CRC, but I'd think that most would. Checking a CRC
> is hardly a novel challenge.

Do we have any counters in the USB Stack and the drivers which count the
USB transaction errors?
According to some datasheets (e.g. NXP ISP1563) there are bits called
"USB Error" and "USB Error Interrupt", etc.
Are those implemented / counted in the driver stack somewhere, or should they be?
Okay, simple question...

A quick look into ehci.h tells me that inside the kernel the bit is
probably called STS_ERR; it is used e.g. in ehci-dbg.c, where it is
printed through dbg_status_buf() and dbg_intr_buf().
Maybe it's sufficient to turn on debugging and turn the
error flag into an error counter, just to get an idea of whether errors accumulate?
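The counting idea can be modeled in a few lines. In the driver this would be C in the EHCI interrupt path. The Python below only sketches the logic, using the USBSTS bit positions as named in ehci.h (STS_INT = bit 0, STS_ERR = bit 1, STS_PCD = bit 2). The class, its methods and the register snapshots are hypothetical.

```python
# EHCI USBSTS bits, named as in the kernel's ehci.h.
STS_INT = 1 << 0  # USB interrupt: a transaction completed
STS_ERR = 1 << 1  # USB error interrupt: a transaction completed with an error
STS_PCD = 1 << 2  # port change detect


class UsbStatusCounters:
    """Hypothetical running totals instead of a one-shot debug print."""

    def __init__(self):
        self.completions = 0
        self.errors = 0

    def record(self, usbsts):
        """Account for one sampled USBSTS value."""
        if usbsts & STS_INT:
            self.completions += 1
        if usbsts & STS_ERR:
            self.errors += 1


if __name__ == "__main__":
    counters = UsbStatusCounters()
    # Made-up register snapshots, for illustration only.
    for sample in (STS_INT, STS_INT | STS_PCD, STS_ERR, STS_INT | STS_ERR):
        counters.record(sample)
    print(counters.completions, counters.errors)  # 3 2
```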

Just my five cents,

Clemens Koller
_______________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Str. 45/1
81379 Muenchen
Germany

http://www.anagramm-technology.com
Phone: +49-89-741518-50
Fax: +49-89-741518-19