2001-04-12 08:43:55

by Lorenzo Marcantonio

Subject: SCSI Tape Corruption - update

Still experimenting with my SDT-9000... I tried connecting it to another
controller (a 2940AU in place of the 2904; sorry, but I only have Adaptec
stuff :). Same problem. I tried another tape (even an old DDS-2 tape).
Same. I even tried another cable and removing the CDWR drive from the bus.

It seems that the tape is written incorrectly. I wrote a large file (300MB)
and read it back four times. The read copies are all the same. They differ
from the original only in 32 consecutive bytes (the replaced values SEEM
random). Of course, 32 bad bytes in a 300MB tar.gz file are TOO MANY to be
accepted :)

Now I'll build some old 2.2 kernel to try...

-- Lorenzo Marcantonio


2001-04-12 12:44:14

by rct

Subject: Re: SCSI Tape Corruption - update

Lorenzo Marcantonio wrote:
> It seems that the tape is written incorrectly. I wrote a large file (300MB)
> and read it back four times. The read copies are all the same. They differ
> from the original only in 32 consecutive bytes (the replaced values SEEM
> random). Of course, 32 bad bytes in a 300MB tar.gz file are TOO MANY to be
> accepted :)

Several years ago I ran into a problem with similar symptoms on an old
Adaptec AHA-154X controller. Files (and most certainly "file systems"
if I had persisted) on my hard disk were getting corrupted in random
places with constant length strings of garbage. This turned out to be
an inappropriate setting for the AHA1542_SCATTER constant: it *was* 16,
and setting it to 8 fixed my problem. I'd look for a similar "#define"
in the header file for your SCSI device driver and try cutting the value
by half. Why "half"? No justification other than it worked for me, and
it's a power-of-two kind of thing that hardware seems to like :-).

--Bob

2001-04-12 18:54:52

by Gérard Roudier

Subject: Re: SCSI Tape Corruption - update



On Thu, 12 Apr 2001, Lorenzo Marcantonio wrote:

> Still experimenting with my SDT-9000... I tried connecting it to another
> controller (a 2940AU in place of the 2904; sorry, but I only have Adaptec
> stuff :). Same problem. I tried another tape (even an old DDS-2 tape).
> Same. I even tried another cable and removing the CDWR drive from the bus.
>
> It seems that the tape is written incorrectly. I wrote a large file (300MB)
> and read it back four times. The read copies are all the same. They differ
> from the original only in 32 consecutive bytes (the replaced values SEEM
> random). Of course, 32 bad bytes in a 300MB tar.gz file are TOO MANY to be
> accepted :)

A similar problem was reported under Linux/PPC a couple of weeks ago
using a sym53c875 controller. In that case, kernel 2.2 was fine.

> Now I'll build some old 2.2 kernel to try...

If 2.2 is OK with your tape, a software error in 2.4 becomes very likely,
in my opinion.

Gérard.

2001-04-12 20:06:15

by Lorenzo Marcantonio

Subject: Re: SCSI Tape Corruption - update 2

On Thu, 12 Apr 2001, Gérard Roudier wrote:

> using a sym53c875 controller. In this case, kernel 2.2 was fine.
>
> > Now I'll build some old 2.2 kernel to try...
>
> If 2.2 is OK with your tape, a software error in 2.4 becomes very likely,
> in my opinion.

Well, the 2.2 distributed with Mandrake 7.2 works fine ... :)

Hmmm... 32 CONSECUTIVE bytes are a very peculiar error. What can it be?

Still experimenting...

-- Lorenzo Marcantonio

2001-04-13 07:04:19

by Geert Uytterhoeven

Subject: Re: SCSI Tape Corruption - update

On Thu, 12 Apr 2001, Lorenzo Marcantonio wrote:
> Still experimenting with my SDT-9000... I tried connecting it to another
> controller (a 2940AU in place of the 2904; sorry, but I only have Adaptec
> stuff :). Same problem. I tried another tape (even an old DDS-2 tape).
> Same. I even tried another cable and removing the CDWR drive from the bus.
>
> It seems that the tape is written incorrectly. I wrote a large file (300MB)
> and read it back four times. The read copies are all the same. They differ
> from the original only in 32 consecutive bytes (the replaced values SEEM
> random). Of course, 32 bad bytes in a 300MB tar.gz file are TOO MANY to be
> accepted :)

As Gérard already replied, I have the same problem on my PPC box (cf. my
postings last month) with a DDS-1 tape drive. It has 2 SCSI adapters (MESH and
Sym53c875), and it seems to happen with the '875 only (but the MESH sucks
anyway and has other problems making it unusable for my DDS-1).

In my case, the 32 bad bytes are always a copy of the 32 bytes 10K before (10K
= blocksize of tar). Can you verify that's the case for you as well? For
reference, I have approx. 6 sequences of corrupted data when writing 256 MB to
tape. Reading gives no problems.
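
A check like the one asked for above could be sketched in Python roughly as
follows. This is only an illustration and not something posted in the thread:
the names diff_runs and is_stale_copy are made up here, and the sketch assumes
that both the original file and the copy read back from tape fit in memory as
bytes objects. The pure-Python loop will be slow on a 300MB file.

```python
from typing import List, Tuple


def diff_runs(original: bytes, readback: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) for each maximal run of differing bytes.

    Only the common prefix of the two buffers is compared.
    """
    runs = []
    start = None
    n = min(len(original), len(readback))
    for i in range(n):
        if original[i] != readback[i]:
            if start is None:
                start = i          # a new run of bad bytes begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the data
        runs.append((start, n - start))
    return runs


def is_stale_copy(original: bytes, readback: bytes,
                  offset: int, length: int, blocksize: int = 10240) -> bool:
    """True if the bad bytes equal the original bytes one block earlier.

    blocksize defaults to 10K, the tar block size mentioned above.
    """
    src = offset - blocksize
    if src < 0:
        return False
    return readback[offset:offset + length] == original[src:src + length]
```

Calling diff_runs on the two buffers and then is_stale_copy on each reported
run, with blocksize set to whatever block size was used for writing, would
show whether the corruption follows the "copy of the data one block before"
pattern. A garbage byte that happens to match the original splits a run in
two, so adjacent short runs may need to be looked at together.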

The problem does not appear in 2.2.13 (yep, that's old, but so far the latest
2.2.x kernel that runs on my CHRP LongTrail). I have to fix later kernels
first.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2001-04-13 07:14:44

by Geert Uytterhoeven

Subject: Re: SCSI Tape Corruption - update

On Fri, 13 Apr 2001, Geert Uytterhoeven wrote:
> On Thu, 12 Apr 2001, Lorenzo Marcantonio wrote:
> > It seems that the tape is written incorrectly. I wrote a large file (300MB)
> > and read it back four times. The read copies are all the same. They differ
> > from the original only in 32 consecutive bytes (the replaced values SEEM
> > random). Of course, 32 bad bytes in a 300MB tar.gz file are TOO MANY to be
> > accepted :)
>
> In my case, the 32 bad bytes are always a copy of the 32 bytes 10K before (10K
> = blocksize of tar). Can you verify that's the case for you as well? For
> reference, I have approx. 6 sequences of corrupted data when writing 256 MB to
> tape. Reading gives no problems.

Forgot some things...

It also happens with dd, so it's not a bug in tar.
If I set the tar blocksize to 512 bytes, the offset changes to 512 bytes as
well.
When I set the tar blocksize to 57*512 bytes, I didn't see a problem (however,
that could have been `good luck').

The problem seems to be there since at least 2.4.0-test1-ac10, which means
quite some people may no longer have known good backups of their valuable data
(of course we should not run 2.[34].x kernels on our systems, right? :-)

Since you have a different SCSI host adapter, the problem is most likely in
st.c. I was thinking of writing `predictable' data (or checksummed blocks or
so) to tape and adding some data verification tests to st.c at the very last
moment before it sends a write command to the SCSI host adapter, but I haven't
found time for that yet.
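
The user-space half of that idea (generating `predictable' data and checking
it after reading it back) might look something like the Python sketch below.
It is an assumption-laden illustration, not code from the thread: the names
make_block and find_corruption and the 8-byte word layout are invented here,
and it does not cover the verification inside st.c at all. Every 8-byte word
encodes its own block number and word number, so a misplaced word says where
it came from.

```python
import struct


def make_block(index: int, blocksize: int = 10240) -> bytes:
    """Build one block in which each 8-byte word is (block index, word no.)."""
    if blocksize % 8:
        raise ValueError("blocksize must be a multiple of 8")
    return b"".join(struct.pack(">II", index, w)
                    for w in range(blocksize // 8))


def find_corruption(data: bytes, blocksize: int = 10240) -> list:
    """Return ((block, word) expected, (block, word) found) per bad word."""
    bad = []
    usable = len(data) - len(data) % 8   # ignore a trailing partial word
    for off in range(0, usable, 8):
        expected = (off // blocksize, (off % blocksize) // 8)
        found = struct.unpack_from(">II", data, off)
        if found != expected:
            bad.append((expected, found))
    return bad
```

The generated blocks would be written to a file, sent to tape with dd or tar
using the same block size, and the data read back would be passed to
find_corruption. If the corruption is the one described earlier in this
thread, the "found" block number should be one less than the expected one,
with the same word number.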

Gr{oetje,eeting}s,

Geert
