2003-02-10 18:13:14

by Kevin Fenzi

Subject: 2.4.x end of tape handling error

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Greetings.

I have had a report from a client that they are having problems with
backups that span more than one tape. Instead of getting an EOT error
or EOM, they are getting an I/O error which requires the driver to be
unloaded and reloaded before the tape will work again.

http://www.linuxtapecert.org/ says that the Red Hat 2.4.9-34 kernel is
the last one that had proper EOT handling. Indeed, if they use the
2.4.9-34 kernel, the tape works properly. That's not a very good
solution, however.

Is this fixed in the latest 2.4.21-pres? How about in 2.5.x?

Has anyone else seen this?

I can get further details to track this down and fix it if it's not
already fixed.

kevin
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQE+R+3+3imCezTjY0ERAuZMAKCcQJInxbLBaOOFaTdIRxFhzVT+LQCfSqHg
9HpLrQOVYetev4zhYXnFD/o=
=WN5h
-----END PGP SIGNATURE-----


2003-02-10 18:54:46

by Pete Zaitcev

Subject: Re: 2.4.x end of tape handling error

> I have had a report from a client that they are having problems with
> backups that span more than one tape. Instead of getting an EOT error
> or EOM, they are getting an I/O error which requires the driver to be
> unloaded and reloaded before the tape will work again.
>
> http://www.linuxtapecert.org/ says that the Red Hat 2.4.9-34 kernel is
> the last one that had proper EOT handling. Indeed, if they use the
> 2.4.9-34 kernel, the tape works properly. That's not a very good
> solution, however.

You neglected to mention what kind of tape it is. There are
several types of tapes, served by a jigsaw puzzle of various
drivers.

> Is this fixed in the latest 2.4.21-pres? How about in 2.5.x?

Why don't you try and verify it, then let us know? You may
be the only guy in the world mad enough to use a tape with 2.5.x.
Please share your valuable experience.

-- Pete

2003-02-10 20:53:59

by Kai Mäkisara (Kolumbus)

Subject: Re: 2.4.x end of tape handling error

This discussion should really be moved to linux-scsi...

On Mon, 10 Feb 2003, Kevin Fenzi wrote:

> Greetings.
>
> I have had a report from a client that they are having problems with
> backups that span more than one tape. Instead of getting an EOT error
> or EOM, they are getting an I/O error which requires the driver to be
> unloaded and reloaded before the tape will work again.
>
What messages have they seen in the system log? Some messages should be
logged after this kind of error. It is difficult to see where the problem
is without any details. There have not been any significant changes in EOM
handling in st between the 2.4 kernels.

> http://www.linuxtapecert.org/ says that the Red Hat 2.4.9-34 kernel is
> the last one that had proper EOT handling. Indeed, if they use the
> 2.4.9-34 kernel, the tape works properly. That's not a very good
> solution, however.
>
> Is this fixed in the latest 2.4.21-pres? How about in 2.5.x?
>
Don't know. EOM handling has worked with my test system (HP DDS drives
connected to a SYM53c896) with both 2.4 and 2.5 kernels. I just reran the
eom tests with 2.4.20 and 2.5.60 without problems.

Kai

2003-02-18 02:00:47

by Kevin Fenzi

Subject: Re: 2.4.x end of tape handling error

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>>>> "Pete" == Pete Zaitcev writes:

>> I have had a report from a client that they are having problems
>> with backups that span more than one tape. Instead of getting an
>> EOT error or EOM, they are getting an I/O error which requires the
>> driver to be unloaded and reloaded before the tape will work again.
>>
>> http://www.linuxtapecert.org/ says that the Red Hat 2.4.9-34 kernel
>> is the last one that had proper EOT handling. Indeed, if they use
>> the 2.4.9-34 kernel, the tape works properly. That's not a very good
>> solution, however.

Pete> You neglected to mention what kind of tape it is. There are
Pete> several types of tapes, served by a jigsaw puzzle of various
Pete> drivers.

The problem was reported to me on an LTO SCSI drive, but they also said
it happened on normal DAT drives. I am trying to get the exact model
and other details of that drive.

>> Is this fixed in the latest 2.4.21-pres? How about in 2.5.x?

Pete> Why don't you try and verify it, then let us know? You may be
Pete> the only guy in the world mad enough to use a tape with 2.5.x.
Pete> Please share your valuable experience.

Well, I have an HP DDS-2 drive here, so I was happy to try to duplicate
the problem, starting with the Red Hat 2.4.18-24.7.x-i686-smp kernel.

In the interests of getting the problem to occur quickly, I
partitioned the DDS-2 tape into 2 partitions, the second having only
10 MB in it. That doesn't show the problem. I get ENOSPC as expected at
the end of the small partition.
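
To make the expected behavior concrete, here is a minimal Python sketch of
how a multi-volume writer might tell a normal end of tape from a real
failure. It is only an illustration, not tar's actual code: the names
write_blocks and VolumeFull are hypothetical, and the write function is
injectable so the logic can be exercised without a drive. It assumes the
driver signals end of medium with ENOSPC (as seen above) or a zero-byte
write; any other error, such as EIO, is treated as fatal.

```python
import errno
import os


class VolumeFull(Exception):
    """Hypothetical signal: the device reported end of medium."""

    def __init__(self, blocks_written):
        super().__init__(f"volume full after {blocks_written} blocks")
        self.blocks_written = blocks_written


def write_blocks(fd, blocks, write=os.write):
    """Write each block to fd and return the number of blocks written.

    ENOSPC or a zero-byte write is treated as end of tape and raised as
    VolumeFull, so the caller can ask for the next volume. Any other
    OSError (for example EIO) propagates unchanged, because it is not a
    normal end-of-tape signal. Short writes are not retried here; this
    is a simplification for the sketch.
    """
    written = 0
    for block in blocks:
        try:
            n = write(fd, block)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                raise VolumeFull(written) from exc
            raise
        if n == 0:
            raise VolumeFull(written)
        written += 1
    return written
```

With this split, a caller would catch VolumeFull, prompt for a new tape,
and resume at blocks_written, while an EIO would abort the backup.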

Without partitions, if I write more than fits on a DDS-2 tape, I get:

...
write(3, "r\342H\\5,\341\235\203\6\245`\264.C\303*\262\27qZ\343\305"..., 10240) = -1 EIO (Input/output error)
write(2, "tar: ", 5) = 5
write(2, "/dev/nst0: Wrote only 0 of 10240"..., 38) = 38
write(2, "\n", 1) = 1
write(2, "tar: ", 5) = 5
write(2, "Error is not recoverable: exitin"..., 37) = 37
write(2, "\n", 1) = 1
munmap(0x4002e000, 4096) = 0
_exit(2) = ?

st0: Error with sense data: Info fld=0x28000, Current st09:00: sense key Medium Error
Additional sense indicates Write error
st0: Error with sense data: Info fld=0x0, Current st09:00: sense key Medium Error
Additional sense indicates Write error
st0: Error on write filemark.
st: Unloaded.

Sounds like it might be this issue:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&frame=right&th=85f41070543a0b41&seekm=DHn4y1.49t%40temic-ech.spacenet.de#s

I ran another test with buffering off to see whether that fixes it: nope.
I also tried loading st with everything set to 0; no dice.

Pete> -- Pete

kevin
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQE+UZYf3imCezTjY0ERAtaeAJsH7cwVy8HCkzHoUH+x4D0t1En0NACeMR91
osNsXmVCPrvFCDRrUQ3NPPk=
=9ji/
-----END PGP SIGNATURE-----

2003-02-24 23:44:03

by Kevin Fenzi

Subject: Re: 2.4.x end of tape handling error

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Some more information on this problem, discovered by Tim Jones:

Tim> Additional news.

Tim> This is actually related to the check sense bit not being
Tim> propagated up to the ST driver. A simpler test (beats writing
Tim> 40GB to a tape ...):

Tim> Use a 2.2.19/20/21 or 22 kernel, or a 2.4.9-34 kernel.
Tim> Remove the tape from the tape device.
Tim> Execute:

Tim> tar -cvvf /dev/nst0 /etc

Tim> You will receive a "No medium found" message

Tim> Replace the kernel with 2.4.11+ and repeat the tar write test.
Tim> This time, you will receive a write failure.

Tim> This is caused by the check sense not being set and the ST driver
Tim> sending up an EIO instead of the ENOMEDIUM.

So, it looks like this problem is _not_ in the st driver itself, but
somewhere in the SCSI layer.
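
For anyone who wants to repeat Tim's test without tar, a small Python
sketch along these lines could report which errno comes back. It is an
assumption-laden illustration: classify and probe are hypothetical names,
/dev/nst0 and the 10240-byte block size are simply taken from the tar run
above, and whether the error shows up at open or at write may depend on
the kernel. Note that probe writes to the device, so only run it with the
tape removed or with a scratch tape.

```python
import errno
import os

# ENOMEDIUM is Linux-specific; fall back to its Linux value (123) elsewhere.
ENOMEDIUM = getattr(errno, "ENOMEDIUM", 123)


def classify(err):
    """Map an errno value to a short label for the cases discussed here."""
    if err == ENOMEDIUM:
        return "no medium"
    if err == errno.ENOSPC:
        return "end of medium"
    if err == errno.EIO:
        return "generic I/O error"
    return os.strerror(err)


def probe(device="/dev/nst0", block_size=10240):
    """Open the device, write one zero-filled block, report what happened.

    WARNING: this writes to the device. Intended for the no-tape test.
    """
    try:
        fd = os.open(device, os.O_WRONLY)
    except OSError as exc:
        return "open: " + classify(exc.errno)
    try:
        os.write(fd, bytes(block_size))
        result = "write: ok"
    except OSError as exc:
        result = "write: " + classify(exc.errno)
    try:
        os.close(fd)
    except OSError as exc:
        # Closing a tape device can itself fail (e.g. writing a filemark).
        result += "; close: " + classify(exc.errno)
    return result


if __name__ == "__main__":
    print(probe())
```

Going by Tim's description, a kernel that behaves correctly should report
"no medium" with the tape removed, and an affected kernel should report
"generic I/O error".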

Anyone have any ideas how to better track it down?

Happy to run debug code/test cases here.

anyone?

kevin
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQE+WrCT3imCezTjY0ERAoNDAJ9kx5aTtxJZlxKL04IJmVTztvM5MQCeIuFS
Y4RxoYEC619ckzSxXGIAlcM=
=A5/e
-----END PGP SIGNATURE-----