2001-10-16 21:31:53

by Jeff V. Merkey

Subject: SCSI tape load problem with Exabyte Drive



On 2.4.6 with st and AICXXXX driver, issuance of an MTLOAD command
via st ioctl() calls results in a unit attention and failure of
the drive while loading a tape from an EXB-480 robotics tape
library.

Code which generates this error is attached. The error will not
clear unless the code first closes the open handle to the device,
then reopens the handle and retries the load command. The failure
scenario is always the same. The first MTLOAD command triggers
the tape drive to load the tape, then all subsequent commands
fail until the handle is closed and the device is reopened and
a second MTLOAD command gets issued, then the drive starts
working.

I have written a tape robotics library for the Exabyte EXB-480
robotic tape library on Linux for a customer of Canopy. The
offending code is from this library.

Code attached. This error is persistent and 100% reproducible on
this hardware. The error does not involve the robot in the
library, as the robot has a unique SCSI id, and commands being
sent to the robot via the SCSI generic interface work flawlessly.
The tape drives in the robotics library appear to Linux as
tape devices on the SCSI bus, and the problems are apparent
on the SCSI tape drives via st.o.

There are no other problems at present with any of the remaining
tape code, and it is working perfectly, with the exception of
this load command.

Jeff

//
// function used to call the st ioctl() interface. handles are opened
// in kernel space via filp_open()
//

int trx_tape_command(int tape_id, int cmd, int count, int count_bits)
{
    struct file *filp;
    register int ccode;
    struct mtop mt_com;

    if (tape_id >= MAX_TAPES)
        return -EINVAL;

    if ((!SystemTape[tape_id]) || (!SystemTape[tape_id]->filp))
        return -EINVAL;

    filp = SystemTape[tape_id]->filp;

    mt_com.mt_op = cmd;
    mt_com.mt_count = count;
    mt_com.mt_count |= count_bits;

    if (mt_com.mt_count < 0)
    {
        TRXDRVPrint("mt: negative repeat count\n");
        return -EIO;
    }

    ccode = tape_ioctl(filp, MTIOCTOP, (char *)&mt_com);
    if (ccode)
        return ccode;

    return 0;
}

//
//
// This code segment is what I am doing to get around this problem
// however, we still see a unit attention error periodically, and
// it does not reliably work every time.
//
//

reload:;
    err = trx_tape_open(device); // this is a filp_open() call
    if (err)
        goto done;

    err = trx_tape_status(device);
    if (err)
        goto done;

    // load command
    err = trx_tape_command(device, MTLOAD, 0, 0);
    TRXDRVPrint("loading tape %d ret-%d\n", (int)device, (int)err);
    if (err)
        goto done;

    err = trx_tape_command(device, MTSETBLK, 4096, 0);
    TRXDRVPrint("set tape blksize %d ret-%d\n", (int)device, (int)err);
    if (err)
    {
        if (rcount++ < 3)
        {
            trx_tape_close(device);
            goto reload;
        }
        else
            goto done;
    }

    err = trx_tape_command(device, MTREW, 0, 0);
    TRXDRVPrint("rewinding tape %d ret-%d\n", (int)device, (int)err);
    if (err)
        goto done;

    //
    // redacted code.
    //

done:;


2001-10-17 06:42:51

by Kai Mäkisara (Kolumbus)

Subject: Re: SCSI tape load problem with Exabyte Drive

On Tue, 16 Oct 2001, Jeff V. Merkey wrote:

>
>
> On 2.4.6 with st and AICXXXX driver, issuance of an MTLOAD command
> via st ioctl() calls results in a unit attention and failure of
> the drive while loading a tape from an EXB-480 robotics tape
> library.
>
> Code which generates this error is attached. The error will not
> clear unless the code first closes the open handle to the device,
> then reopens the handle and retries the load command. The failure
> scenario is always the same. The first MTLOAD command triggers
> the tape drive to load the tape, then all subsequent commands
> fail until the handle is closed and the device is reopened and
> a second MTLOAD command gets issued, then the drive starts
> working.
>
This is a "feature" of the st driver: if you get UNIT ATTENTION anywhere
other than within open(), it is considered an error. In most cases this is
true, but MTLOAD is an exception. I have not thought about this exception
and no one before you has reported it ;-)

As you say, the workaround is to close and reopen the device after MTLOAD.
You should not need the second MTLOAD.
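
For readers driving st from user space, the close-and-reopen workaround might look roughly like the sketch below. This is only an illustration under assumptions: the helper names tape_op() and tape_load_and_reopen() are hypothetical, and the O_NONBLOCK flag on the first open is an assumption meant to let the open succeed before a tape is loaded. The code in the original report runs in kernel space via filp_open() instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <unistd.h>

/* Issue one MTIOCTOP operation; returns 0 or a negative errno value. */
static int tape_op(int fd, short op, int count)
{
    struct mtop mt = { .mt_op = op, .mt_count = count };

    if (ioctl(fd, MTIOCTOP, &mt) < 0)
        return -errno;
    return 0;
}

/*
 * Hypothetical helper: issue MTLOAD, then close and reopen the device so
 * that the UNIT ATTENTION raised by the load is seen inside open().
 * Returns an open file descriptor on success, a negative errno on failure.
 */
static int tape_load_and_reopen(const char *path)
{
    int fd, err;

    /* Assumption: O_NONBLOCK allows the open with no tape loaded yet. */
    fd = open(path, O_RDWR | O_NONBLOCK);
    if (fd < 0)
        return -errno;

    err = tape_op(fd, MTLOAD, 0);
    close(fd);                  /* close whether or not the load worked */
    if (err)
        return err;

    /* Reopen; later commands (MTSETBLK, MTREW, ...) use this handle. */
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -errno;
    return fd;
}
```

A caller would use the returned descriptor for the following MTSETBLK and MTREW operations, with no second MTLOAD.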

I will think about a fix to this problem. The basic reason for not
allowing UNIT ATTENTION anywhere is that flushing the driver state
properly in any condition is complicated and there has been no legitimate
reason to allow this. However, here it should be sufficient to use a no-op
SCSI command after LOAD to get the UNIT ATTENTION.
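
As an illustration of what "getting the UNIT ATTENTION" means at the SCSI level, here is a small sketch that decodes the sense key from the sense data returned by a command such as TEST UNIT READY (a six-byte CDB of all zeros, the usual no-op). It is not code from the st driver: the function names are hypothetical, and it only assumes the standard fixed (0x70/0x71) and descriptor (0x72/0x73) sense formats.

```c
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_UNIT_ATTENTION 0x06

/*
 * Return the sense key (0..15) from a SCSI sense buffer, or -1 if the
 * buffer is too short or the response code is not recognised.
 */
static int sense_key(const uint8_t *sense, size_t len)
{
    if (sense == NULL || len < 1)
        return -1;

    switch (sense[0] & 0x7f) {
    case 0x70:                  /* fixed format, current error */
    case 0x71:                  /* fixed format, deferred error */
        return len >= 3 ? (sense[2] & 0x0f) : -1;
    case 0x72:                  /* descriptor format, current error */
    case 0x73:                  /* descriptor format, deferred error */
        return len >= 2 ? (sense[1] & 0x0f) : -1;
    default:
        return -1;
    }
}

/* Non-zero if the sense data reports UNIT ATTENTION. */
static int is_unit_attention(const uint8_t *sense, size_t len)
{
    return sense_key(sense, len) == SENSE_KEY_UNIT_ATTENTION;
}
```

After a load, a drive typically reports UNIT ATTENTION once (for example with additional sense code 0x28, medium may have changed), so a no-op command that absorbs it would leave later commands unaffected.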

Kai


2001-10-17 16:50:49

by Jeff V. Merkey

Subject: Re: SCSI tape load problem with Exabyte Drive

On Wed, Oct 17, 2001 at 09:43:42AM +0300, Kai Makisara wrote:
> On Tue, 16 Oct 2001, Jeff V. Merkey wrote:
>
> >
> >
> > On 2.4.6 with st and AICXXXX driver, issuance of an MTLOAD command
> > via st ioctl() calls results in a unit attention and failure of
> > the drive while loading a tape from an EXB-480 robotics tape
> > library.
> >
> > Code which generates this error is attached. The error will not
> > clear unless the code first closes the open handle to the device,
> > then reopens the handle and retries the load command. The failure
> > scenario is always the same. The first MTLOAD command triggers
> > the tape drive to load the tape, then all subsequent commands
> > fail until the handle is closed and the device is reopened and
> > a second MTLOAD command gets issued, then the drive starts
> > working.
> >
> This is a "feature" of the st driver: if you get UNIT ATTENTION anywhere
> else than within open(), it is considered an error. In most cases this is
> true but MTLOAD is an exception. I have not thought about this exception
> and no one before you has reported it ;-)
>
> As you say, the workaround is to close and reopen the device after MTLOAD.
> You should not need the second MTLOAD.
>
> I will think about a fix to this problem. The basic reason for not
> allowing UNIT ATTENTION anywhere is that flushing the driver state
> properly in any condition is complicated and there has been no legitimate
> reason to allow this. However, here it should be sufficient to use a no-op
> SCSI command after LOAD to get the UNIT ATTENTION.
>
> Kai
>

Kai,

Thanks for the prompt response. We will continue using the current
recovery method since this appears to work based upon your
description of what is happening here. I will remove the second MTLOAD
command and test with the robotics library. Sounds like it should
work OK. Please let us know what you decide if you feel a workaround
is needed for this problem, and we will be happy to test it for
you.

Do-na-da Go-hv-e

Wa-do

Thanks

Jeff

2001-10-18 00:15:34

by Jeff V. Merkey

Subject: Re: SCSI tape load problem with Exabyte Drive



Kai,

We have seen some other weirdness, but it seems related to the
file handle engine in Linux. When we open a handle from
a user space context via ioctl() calls through to our kernel
level components, close this handle from another kernel
thread, then attempt to reopen the handle from a
different kernel thread, we see a return code
of -21 (EISDIR) from filp_open(), which is strange and
looks like a bug.

We are then unable to reopen the handle to /dev/st0 or /dev/st1
until the system has been rebooted. This looks like either
a bug in Linux or something we have caused by using handles
in this manner.

Do you have any idea what might be happening here?

Jeff

