2002-04-02 20:38:36

by Evan Harris

Subject: Problem with scsi tape drives (2.4.18) and soft error count (BusLogic, AIC7xxx)


I've had a long-standing problem trying to get the total soft error count
from tape devices when using the kernel-provided tape interface.
Hopefully, someone here can shed some light on the problem. Using several
different DAT and DLT tape drives, the behavior seems the same.

I'm trying to figure out how to retrieve the soft error count from a tape
drive after having performed a backup. It helps me to gauge when a tape
needs to be retired, and I'm used to being able to get the total soft error
count from other backup software packages for dos/windows.

mt apparently queries the soft error count, but it always seems to be zero.
I've dug into the problem a bit, and it seems that mt reports zero because
the kernel checks and clears the drive's count at every drive operation.
Is there any place in the kernel where this information is stored so that
it may be retrieved?

I've also tried different SCSI adapters, in case it comes down to an
adapter/driver issue. For instance, the BusLogic driver keeps a lot more
statistics in /proc/scsi/BusLogic/1 than the Adaptec driver, but it
doesn't happen to include the soft error count.

Any help or pointers would be appreciated. Web search hasn't turned up much
useful information.

Evan

--
| Evan Harris - Consultant, Harris Enterprises - [email protected]
|
| Custom Solutions for your Software, Networking, and Telephony Needs


2002-04-02 20:59:07

by Richard B. Johnson

Subject: Re: Problem with scsi tape drives (2.4.18) and soft error count (BusLogic, AIC7xxx)

On Tue, 2 Apr 2002, Evan Harris wrote:

>
> I've had a long time problem with trying to get the total soft error count
> from tape devices when using the kernel provided tape interface.
> Hopefully, someone here can shed some light on the problem. Using several
> different DAT and DLT tape drives, the behavior seems the same.
>
> I'm trying to figure out how to retrieve the soft error count from a tape
> drive after having performed a backup. It helps me to gauge when a tape
> needs to be retired, and I'm used to being able to get the total soft error
> count from other backup software packages for dos/windows.
>
> mt apparently queries the soft error count, but it always seems to be zero.
> I've dug into the problem a bit, and it seems that mt reports zero because
> the tape drive has had it checked and cleared by the kernel at every drive
> operation. Is there any place in the kernel that this information is stored
> so that it may be retrieved?


Not really. The soft error count is preserved across the 'correct' kinds
of open/close operations. To use `mt` to get the count and to preserve
the state of the tape machine, you need to do your open/close against
the device whose minor number has the high bit set:

# file /dev/st*
st0: character special (9/0)
st1: character special (9/1)
st3: character special (9/128)

Instead of using /dev/st0, you would use (on this machine) /dev/st3.

So, if you do your I/O and status through /dev/st3, you will get
meaningful information. Once you close /dev/st0, all history is
lost (correctly). Note that if you do I/O through /dev/st3, the
tape will not automatically rewind on close. You will need to
use `mt` for that.
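
As a rough illustration of the minor-number check described above, here is
a minimal Python sketch. It assumes the st driver convention that bit 7
(0x80) of the minor number selects the no-rewind device, which matches the
9/128 entry in the listing. The function names are hypothetical and not
part of any existing tool.

```python
import os
import stat

# Assumption: the high bit of the 8-bit minor number marks the
# no-rewind device, as with minor 128 in the listing above.
NO_REWIND_BIT = 0x80


def is_no_rewind_minor(minor: int) -> bool:
    """Return True if the minor number has the high (no-rewind) bit set."""
    return bool(minor & NO_REWIND_BIT)


def is_no_rewind_device(path: str) -> bool:
    """Stat a character device node and test the no-rewind bit of its minor."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError(f"{path} is not a character device")
    return is_no_rewind_minor(os.minor(st.st_rdev))
```

Calling is_no_rewind_device() on a device node reports whether closing it
would leave the tape position alone. This works from the device number
only, so the name of the node does not matter.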

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

Windows-2000/Professional isn't.

2002-04-02 21:24:33

by Evan Harris

Subject: Re: Problem with scsi tape drives (2.4.18) and soft error count (BusLogic, AIC7xxx)


Only one problem: I'm using devfs, so the major/minor means nothing. But
from looking at online docs, the normal name for the high-bit devices is
nst0, and that just happens to be the device I am using.

But that does explain why the soft error count is always 0 after doing a
tar, since tar closes the device when it's done, and you stated that it
loses all history after that. A subsequent call to mt would therefore
report 0.

I guess the only way to get the info is to hack a call to retrieve the soft
errors into tar, but I was hoping to avoid that.

Thanks for the info.

Evan

--
| Evan Harris - Consultant, Harris Enterprises - [email protected]
|
| Custom Solutions for your Software, Networking, and Telephony Needs

2002-04-03 15:55:16

by Kai Mäkisara (Kolumbus)

Subject: Re: Problem with scsi tape drives (2.4.18) and soft error count (BusLogic, AIC7xxx)

On Tue, 2 Apr 2002, Evan Harris wrote:

>
> Only one problem, I'm using devfs, so the major/minor means nothing. But
> from looking at online docs, the normal name for the high bit devices is
> nst0, and that just happens to be the device I am using.
>
> But that does explain why the soft error count is always 0 after doing a
> tar, since tar closes the device when it's done, and you stated that it
> loses all history after that. A subsequent call to mt would therefore
> report 0.
>
> I guess the only way to get the info is to hack a call to retrieve the soft
> errors into tar, but I was hoping to avoid that.
>
Quoting from README.st:

The number of recovered errors since the previous status call
is stored in the lower word of the field mt_erreg.

I.e., the number of recovered errors is cleared when it is read with
MTIOCGET (e.g., by mt status). It does not matter which of the device
nodes pointing to the same drive you use.

Quoting from 'man st':

mt_erreg The only field defined in mt_erreg is the recovered error count
in the low 16 bits (as defined by MT_ST_SOFTERR_SHIFT and
MT_ST_SOFTERR_MASK). Due to inconsistencies in the way drives
report recovered errors, this count is often not maintained
(most drives do not by default report soft errors but this can
be changed with a SCSI MODE SELECT command).
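
To make the two quotes above concrete, here is a minimal Python sketch of
reading mt_erreg with MTIOCGET and extracting the recovered error count.
It is a sketch under assumptions, not a tested tool: the struct mtget
layout (five longs followed by two int-sized fields) and the ioctl number
encoding are taken from the Linux headers for x86/ARM and may differ on
other architectures, and the function names are hypothetical.

```python
import fcntl
import os
import struct

# Assumed layout of struct mtget: mt_type, mt_resid, mt_dsreg, mt_gstat,
# mt_erreg (longs), then mt_fileno and mt_blkno (assumed to be C ints).
MTGET_FORMAT = "5l2i"
MTGET_SIZE = struct.calcsize(MTGET_FORMAT)

# MTIOCGET is _IOR('m', 2, struct mtget). This uses the generic Linux
# ioctl encoding; some architectures encode ioctl numbers differently.
IOC_READ = 2
MTIOCGET = (IOC_READ << 30) | (MTGET_SIZE << 16) | (ord("m") << 8) | 2

# Values of the constants named in the man page quote.
MT_ST_SOFTERR_SHIFT = 0
MT_ST_SOFTERR_MASK = 0xFFFF


def soft_errors_from_erreg(mt_erreg: int) -> int:
    """Extract the recovered error count from the low 16 bits of mt_erreg."""
    return (mt_erreg & MT_ST_SOFTERR_MASK) >> MT_ST_SOFTERR_SHIFT


def read_soft_errors(path: str) -> int:
    """Issue MTIOCGET on the tape device and return the recovered error count.

    Per README.st, the count covers the period since the previous status
    call, so reading it resets it.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = fcntl.ioctl(fd, MTIOCGET, bytes(MTGET_SIZE))
    finally:
        os.close(fd)
    fields = struct.unpack(MTGET_FORMAT, raw)
    return soft_errors_from_erreg(fields[4])  # index 4 is mt_erreg
```

Something like read_soft_errors("/dev/nst0") run after the backup would
return the count, provided the drive reports recovered errors at all.
Because the read clears the value, a second call right afterwards would be
expected to return zero.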

You should check that your drive is configured to report the soft errors.
This can be done using the mode page 01h (read-write error recovery page,
bit PER). Some drives don't support setting this bit to one. You should be
able to see the value of the bit and change it using the scsi tools
probably included in your distribution.
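
For illustration, checking the PER bit in the raw bytes of mode page 01h
might look like the Python sketch below. It assumes the standard SCSI page
layout, with the page code in the low six bits of byte 0 and PER at bit 2
of byte 2, and it assumes the caller has already obtained the page bytes
from a MODE SENSE and skipped the mode parameter header and any block
descriptors. The function name is hypothetical.

```python
PAGE_CODE_RW_ERROR_RECOVERY = 0x01
PER_BIT = 0x04  # assumed position: bit 2 of byte 2 of the page


def per_enabled(page: bytes) -> bool:
    """Return True if the PER bit is set in a raw mode page 01h.

    `page` must start at the page-code byte of the read-write error
    recovery page.
    """
    if len(page) < 3:
        raise ValueError("mode page too short")
    # The top two bits of byte 0 are flags, not part of the page code.
    if page[0] & 0x3F != PAGE_CODE_RW_ERROR_RECOVERY:
        raise ValueError("not mode page 01h")
    return bool(page[2] & PER_BIT)
```

If this returns False, the drive is not configured to report recovered
errors, and the count seen through MTIOCGET would be expected to stay at
zero until the bit is set with MODE SELECT (where the drive allows it).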

Kai