2004-01-19 04:36:13

by Andries E. Brouwer

[permalink] [raw]
Subject: [PATCH] fix for ide-scsi crash

A rather reproducible crash with ide-scsi that I reported
yesterday is fixed by the patch below.

(Most of this is just polishing that emacs did while my eyes
looked at the code. The only change is not doing
ide_do_drive_cmd (drive, rq, ide_end);
spin_lock_irq(cmd->device->host->host_lock);
since cmd may be freed already.)

Andries

[With this in place I wrote three times 640 MB to floptical and diffed;
no problems occurred. Without it the system would crash each time.]

diff -u --recursive --new-file -X /linux/dontdiff a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
--- a/drivers/ide/ide-io.c 2004-01-11 14:20:49.000000000 +0100
+++ b/drivers/ide/ide-io.c 2004-01-19 05:00:47.000000000 +0100
@@ -1300,7 +1300,7 @@
* Initialize a request before we fill it in and send it down to
* ide_do_drive_cmd. Commands must be set up by this function. Right
* now it doesn't do a lot, but if that changes abusers will have a
- * nasty suprise.
+ * nasty surprise.
*/

void ide_init_drive_cmd (struct request *rq)
diff -u --recursive --new-file -X /linux/dontdiff a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
--- a/drivers/scsi/ide-scsi.c 2003-12-18 03:59:05.000000000 +0100
+++ b/drivers/scsi/ide-scsi.c 2004-01-19 05:30:28.000000000 +0100
@@ -148,7 +148,8 @@
count = IDE_MIN (pc->sg->length - pc->b_count, bcount);
buf = page_address(pc->sg->page) + pc->sg->offset;
atapi_input_bytes (drive, buf + pc->b_count, count);
- bcount -= count; pc->b_count += count;
+ bcount -= count;
+ pc->b_count += count;
if (pc->b_count == pc->sg->length) {
pc->sg++;
pc->b_count = 0;
@@ -191,8 +192,12 @@
return;
if (drive->media == ide_cdrom || drive->media == ide_optical) {
if (c[0] == READ_6 || c[0] == WRITE_6) {
- c[8] = c[4]; c[5] = c[3]; c[4] = c[2];
- c[3] = c[1] & 0x1f; c[2] = 0; c[1] &= 0xe0;
+ c[8] = c[4];
+ c[5] = c[3];
+ c[4] = c[2];
+ c[3] = c[1] & 0x1f;
+ c[2] = 0;
+ c[1] &= 0xe0;
c[0] += (READ_10 - READ_6);
}
if (c[0] == MODE_SENSE || c[0] == MODE_SELECT) {
@@ -380,7 +385,7 @@
static ide_startstop_t idescsi_pc_intr (ide_drive_t *drive)
{
idescsi_scsi_t *scsi = drive_to_idescsi(drive);
- idescsi_pc_t *pc=scsi->pc;
+ idescsi_pc_t *pc = scsi->pc;
struct request *rq = pc->rq;
atapi_bcount_t bcount;
atapi_status_t status;
@@ -664,8 +669,6 @@
.ioctl = idescsi_ide_ioctl,
};

-static int idescsi_attach(ide_drive_t *drive);
-
static int idescsi_slave_configure(Scsi_Device * sdp)
{
/* Configure detected device */
@@ -794,7 +797,8 @@
idescsi_pc_t *pc = NULL;

if (!drive) {
- printk (KERN_ERR "ide-scsi: drive id %d not present\n", cmd->device->id);
+ printk (KERN_ERR "ide-scsi: drive id %d not present\n",
+ cmd->device->id);
goto abort;
}
scsi = drive_to_idescsi(drive);
@@ -827,25 +831,30 @@
idescsi_transform_pc1 (drive, pc);

if (test_bit(IDESCSI_LOG_CMD, &scsi->log)) {
- printk ("ide-scsi: %s: que %lu, cmd = ", drive->name, cmd->serial_number);
+ printk ("ide-scsi: %s: que %lu, cmd = ",
+ drive->name, cmd->serial_number);
hexdump(cmd->cmnd, cmd->cmd_len);
if (memcmp(pc->c, cmd->cmnd, cmd->cmd_len)) {
- printk ("ide-scsi: %s: que %lu, tsl = ", drive->name, cmd->serial_number);
+ printk("ide-scsi: %s: que %lu, tsl = ",
+ drive->name, cmd->serial_number);
hexdump(pc->c, 12);
}
}

- ide_init_drive_cmd (rq);
+ ide_init_drive_cmd(rq);
rq->special = (char *) pc;
- rq->bio = idescsi_dma_bio (drive, pc);
+ rq->bio = idescsi_dma_bio(drive, pc);
rq->flags = REQ_SPECIAL;
- spin_unlock_irq(cmd->device->host->host_lock);
- (void) ide_do_drive_cmd (drive, rq, ide_end);
- spin_lock_irq(cmd->device->host->host_lock);
+ {
+ struct Scsi_Host *host = cmd->device->host;
+ spin_unlock_irq(host->host_lock);
+ (void) ide_do_drive_cmd(drive, rq, ide_end);
+ spin_lock_irq(host->host_lock);
+ }
return 0;
abort:
- if (pc) kfree (pc);
- if (rq) kfree (rq);
+ kfree (pc);
+ kfree (rq);
cmd->result = DID_ERROR << 16;
done(cmd);
return 1;


2004-01-19 18:51:57

by Pascal Schmidt

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash


> A rather reproducible crash with ide-scsi that I reported
> yesterday is fixed by the patch below.

> [With this in place I wrote three times 640 MB to floptical and diffed;
> no problems occurred. Without it the system would crash each time.]

I can confirm that it also makes my MO drive work with ide-scsi.
I could also successfully write 4.1 GB of data to a DVD+RW disc.
Writing 700 MB to a normal CD-RW disc also worked (yes, I know I
could use ide-cd for this, I just did this for testing the ide-scsi
fix).

This patch seems to solve all my 2.6 ide-scsi problems.

Andrew, you can drop the atapi-mo-support patches from -mm if you
like. That patch only works with 2048 byte sector discs, while
the ide-scsi/sd solution also works with 512 and 1024 byte sector
discs.

--
Ciao,
Pascal

2004-01-19 22:35:56

by Willem Riede

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

On Mon, 19 Jan 2004 05:35:58 +0100, Andries.Brouwer wrote:

> - spin_unlock_irq(cmd->device->host->host_lock);
> - (void) ide_do_drive_cmd (drive, rq, ide_end);
> - spin_lock_irq(cmd->device->host->host_lock);
> + {
> + struct Scsi_Host *host = cmd->device->host;
> + spin_unlock_irq(host->host_lock);
> + (void) ide_do_drive_cmd(drive, rq, ide_end);
> + spin_lock_irq(host->host_lock);
> + }

Interesting. So cmd->device->host changed after ide_do_drive_cmd?

That may be a problem if the scsi error handler has to spring
into action for this command, as it uses that field extensively...

> - if (pc) kfree (pc);
> - if (rq) kfree (rq);
> + kfree (pc);
> + kfree (rq);

So it is OK to rely on the NULL check in kfree?

Thanks, Willem Riede.

2004-01-19 22:56:49

by Andries E. Brouwer

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

From: Willem Riede <[email protected]>

> - spin_unlock_irq(cmd->device->host->host_lock);
> - (void) ide_do_drive_cmd (drive, rq, ide_end);
> - spin_lock_irq(cmd->device->host->host_lock);
> + {
> + struct Scsi_Host *host = cmd->device->host;
> + spin_unlock_irq(host->host_lock);
> + (void) ide_do_drive_cmd(drive, rq, ide_end);
> + spin_lock_irq(host->host_lock);
> + }

Interesting. So cmd->device->host changed after ide_do_drive_cmd?

It is a race.
ide_do_drive_cmd(..., ide_end) does not wait for I/O to finish,
but when I/O finishes cmd is freed. So, if the device is quick
and the kernel is slow - maybe a timer interrupt in between -
the command can have finished before the spin_lock_irq() is done.

That may be a problem if the scsi error handler has to spring
into action for this command, as it uses that field extensively...

> - if (pc) kfree (pc);
> - if (rq) kfree (rq);
> + kfree (pc);
> + kfree (rq);

So it is OK to rely on the NULL check in kfree?

Yes. Many years ago I regarded it as ugly to do so, but by now
it is common usage.

Andries


2004-01-20 07:33:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash



On Mon, 19 Jan 2004, Pascal Schmidt wrote:
>
> This patch seems to solve all my 2.6 ide-scsi problems.

I've applied the fix part of it and pushed it out. If Andries wants to
re-send the whitespace fixes, I can apply those too, but I hate applying
patches like this where the whitespace fixes hide the real fix.

> Andrew, you can drop the atapi-mo-support patches from -mm if you
> like. That patch only works with 2048 byte sector discs, while
> the ide-scsi/sd solution also works with 512 and 1024 byte sector
> discs.

I'd really like the ATA cdrom driver to handle different sector sizes
properly. There really is no excuse for a block device driver to hardcode
its blocksize if it can avoid it.

Linus

2004-01-20 09:44:40

by Andries E. Brouwer

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

From [email protected] Tue Jan 20 08:33:03 2004

> This patch seems to solve all my 2.6 ide-scsi problems.

I've applied the fix part of it and pushed it out.

Good.

If Andries wants to
re-send the whitespace fixes, I can apply those too, but I hate applying
patches like this where the whitespace fixes hide the real fix.

Yes, it seems we presently have no good mechanism / policy here.
Patches are noise. If some kernel version works and another doesnt,
one has to look at the diffs. Whitespace-only diffs are bad,
I would never submit them. They also needlessly invalidate existing patches.

On the other hand, nice, readable kernel sources are important.
I used to polish the immediate neighbourhood of an actual change.
If that is undesirable, what would you prefer?

> Andrew, you can drop the atapi-mo-support patches from -mm if you
> like. That patch only works with 2048 byte sector discs, while
> the ide-scsi/sd solution also works with 512 and 1024 byte sector
> discs.

I'd really like the ATA cdrom driver to handle different sector sizes
properly. There really is no excuse for a block device driver to hardcode
its blocksize if it can avoid it.

Yes, it is very easy to change that.
And another thing that is very easy is to move partitioning away
from the individual block devices. It was part of the stuff I did
last year. Hope to try again for 2.7.
And then there is the read-only part that must be removed.

Those are three reasons why ide-cd today doesnt work so well
with optical disks. But I am not sure it is desirable to make
ide-cd work with them. The source would be littered with ifs -
all this toc stuff is inappropriate for disks.

Andries




2004-01-20 12:04:20

by Pascal Schmidt

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

On Tue, 20 Jan 2004 [email protected] wrote:

> And then there is the read-only part that must be removed.

Not a problem for ide-cd, it already supports writing for DVD-RAM,
and in -mm also packet writing.

--
Ciao,
Pascal

2004-01-20 13:56:47

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

On Tue, Jan 20 2004, Pascal Schmidt wrote:
> On Tue, 20 Jan 2004 [email protected] wrote:
>
> > And then there is the read-only part that must be removed.
>
> Not a problem for ide-cd, it already supports writing for DVD-RAM,
> and in -mm also packet writing.

Packet writing isn't in -mm, it's mt rainier writing.

--
Jens Axboe

2004-01-20 13:54:35

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

On Mon, Jan 19 2004, Linus Torvalds wrote:
> > Andrew, you can drop the atapi-mo-support patches from -mm if you
> > like. That patch only works with 2048 byte sector discs, while
> > the ide-scsi/sd solution also works with 512 and 1024 byte sector
> > discs.
>
> I'd really like the ATA cdrom driver to handle different sector sizes
> properly. There really is no excuse for a block device driver to hardcode
> its blocksize if it can avoid it.

Agree, it's a bug. Pascal, care to take a stab at fixing it? You're the
most avid ide-cd non-2kb block size user at the moment :)

--
Jens Axboe

2004-01-20 14:59:13

by Pascal Schmidt

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

On Tue, 20 Jan 2004 15:00:16 +0100, you wrote in linux.kernel:

>> I'd really like the ATA cdrom driver to handle different sector sizes
>> properly. There really is no excuse for a block device driver to hardcode
>> its blocksize if it can avoid it.
>
> Agree, it's a bug. Pascal, care to take a stab at fixing it? You're the
> most avid ide-cd non-2kb block size user at the moment :)

There's a lot of macros related to sector sizes in ide-cd.h. I suppose
all that would need to be changed to depend on the real hardware sector
size?

I don't have any idea about block layer internals or about sending
commands to a drive.

In fact, I don't even know how to start or what uses of sector size
in ide-cd.c really need to be changed. Much less do I know where
information about hardware sector size could be gotten and how it should
be stored for use by ide-cd.

--
Ciao,
Pascal

2004-01-20 17:08:50

by Pascal Schmidt

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash


>> Agree, it's a bug. Pascal, care to take a stab at fixing it? You're the
>> most avid ide-cd non-2kb block size user at the moment :)

> There's a lot of macros related to sector sizes in ide-cd.h. I suppose
> all that would need to be changed to depend on the real hardware sector
> size?

I've actually looked at the code now and it seems that it might be
quite easy to fix this, after all.

I have a question about cdrom_start_read_continuation:

Variables called nframes and frames are computed but never used. Only
nskip actually gets factored into the request:

rq->current_nr_sectors += nskip;

The others are local vars and never get assigned to anything more
global. So I conclude they are meaningless? I ask because this
is one of the places that uses SECTORS_PER_FRAME and it doesn't make
sense to me.

Unrelated question:

Later on, the code decides to use DMA only for requests aligned
on a SECTORS_PER_FRAME boundary. Where does this limitation come from?
Does it come from the drive, so that alignment to 512 byte would be
enough on a drive with 512 byte sectors?

I've actually ordered some 512 byte sector MO discs today and will try
to get them working with ide-cd once they arrive here.

--
Ciao,
Pascal

2004-01-20 17:56:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash



On Tue, 20 Jan 2004 [email protected] wrote:
>
> If Andries wants to
> re-send the whitespace fixes, I can apply those too, but I hate applying
> patches like this where the whitespace fixes hide the real fix.
>
> Yes, it seems we presently have no good mechanism / policy here.
> Patches are noise. If some kernel version works and another doesnt,
> one has to look at the diffs. Whitespace-only diffs are bad,
> I would never submit them. They also needlessly invalidate existing patches.

Whitespace-only diffs can be very useful. In particular, they are common
when somebody starts working on a piece of code without a maintainer, and
the old code was terminally broken wrt whitespace. Happens quite often in
the driver world.

So I don't have any real issues with applying whitespace-only patches, and
I much prefer them to patches that mix whitespace and bugfixes. In
particular, if the whitespace fixes are preparation for some other
cleanup, it's usually a good idea.

(I agree that if the whitespace fix is just random, it's usually not worth
it).

Linus

2004-01-20 19:04:57

by Bill Davidsen

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

In article <[email protected]>,
Pascal Schmidt <[email protected]> wrote:
| On Tue, 20 Jan 2004 15:00:16 +0100, you wrote in linux.kernel:
|
| >> I'd really like the ATA cdrom driver to handle different sector sizes
| >> properly. There really is no excuse for a block device driver to hardcode
| >> its blocksize if it can avoid it.
| >
| > Agree, it's a bug. Pascal, care to take a stab at fixing it? You're the
| > most avid ide-cd non-2kb block size user at the moment :)
|
| There's a lot of macros related to sector sizes in ide-cd.h. I suppose
| all that would need to be changed to depend on the real hardware sector
| size?
|
| I don't have any idea about block layer internals or about sending
| commands to a drive.
|
| In fact, I don't even know how to start or what uses of sector size
| in ide-cd.c really need to be changed. Much less do I know where
| information about hardware sector size could be gotten and how it should
| be stored for use by ide-cd.

To pour confusion on this, I thought this was working already. I just
(Sunday) tried burning audio CDs with the ide-cd interface, using
Joerg's new a25 version of cdrecord which claimed to use DMA for 2352
byte audio sectors. Worked! About 92% idle and 5% wait-io. only 1%
system while burning at 32x. And the CD does play just fine, I listened
to it all the way to work.

Is this a physical/logical sector thing, or is Joerg doing a magic
trick, or what? Definitely does work in terms of producing correct
output, is it really writing 2352 byte secotrs?
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2004-01-20 19:20:41

by Ben Pfaff

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

[email protected] writes:

> Yes, it seems we presently have no good mechanism / policy here.
> Patches are noise. If some kernel version works and another doesnt,
> one has to look at the diffs. Whitespace-only diffs are bad,
> I would never submit them. They also needlessly invalidate existing patches.

When one version of a source file works and another doesn't,
`diff -b' or `diff -w' usually does a good job of ignoring
whitespace changes.
--
<[email protected]> <[email protected]> <[email protected]> <[email protected]>
Stanford Ph.D. Candidate - MSU Alumnus - Debian Maintainer - GNU Developer
http://www.benpfaff.org

2004-01-20 20:01:32

by Andries Brouwer

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

On Tue, Jan 20, 2004 at 06:08:44PM +0100, Pascal Schmidt wrote:

> I have a question about cdrom_start_read_continuation:
>
> Variables called nframes and frames are computed but never used. Only
> nskip actually gets factored into the request:
>
> rq->current_nr_sectors += nskip;
>
> The others are local vars and never get assigned to anything more
> global. So I conclude they are meaningless? I ask because this
> is one of the places that uses SECTORS_PER_FRAME and it doesn't make
> sense to me.

Yes, they are meaningless.
The code that used them was removed in 2.5.1.

2004-01-20 20:58:16

by Pascal Schmidt

[permalink] [raw]
Subject: Re: [PATCH] fix for ide-scsi crash

On Tue, 20 Jan 2004, Andries Brouwer wrote:

> Yes, they are meaningless.
> The code that used them was removed in 2.5.1.

Then perhaps it would be a good idea to apply the following cleanup
patch:

--- ide-cd.c.orig Tue Jan 20 21:47:30 2004
+++ ide-cd.c Tue Jan 20 21:52:34 2004
@@ -1236,13 +1236,7 @@ static int cdrom_read_from_buffer (ide_d
static ide_startstop_t cdrom_start_read_continuation (ide_drive_t *drive)
{
struct request *rq = HWGROUP(drive)->rq;
- int nsect, sector, nframes, frame, nskip;
-
- /* Number of sectors to transfer. */
- nsect = rq->nr_sectors;
-
- /* Starting sector. */
- sector = rq->sector;
+ int nskip;

/* If the requested sector doesn't start on a cdrom block boundary,
we must adjust the start of the transfer so that it does,
@@ -1251,7 +1245,7 @@ static ide_startstop_t cdrom_start_read_
of the buffer, it will mean that we're to skip a number
of sectors equal to the amount by which CURRENT_NR_SECTORS
is larger than the buffer size. */
- nskip = (sector % SECTORS_PER_FRAME);
+ nskip = (rq->sector % SECTORS_PER_FRAME);
if (nskip > 0) {
/* Sanity check... */
if (rq->current_nr_sectors != bio_cur_sectors(rq->bio) &&
@@ -1261,21 +1255,9 @@ static ide_startstop_t cdrom_start_read_
cdrom_end_request(drive, 0);
return ide_stopped;
}
- sector -= nskip;
- nsect += nskip;
rq->current_nr_sectors += nskip;
}

- /* Convert from sectors to cdrom blocks, rounding up the transfer
- length if needed. */
- nframes = (nsect + SECTORS_PER_FRAME-1) / SECTORS_PER_FRAME;
- frame = sector / SECTORS_PER_FRAME;
-
- /* Largest number of frames was can transfer at once is 64k-1. For
- some drives we need to limit this even more. */
- nframes = MIN (nframes, (CDROM_CONFIG_FLAGS (drive)->limit_nframes) ?
- (65534 / CD_FRAMESIZE) : 65535);
-
/* Set up the command */
rq->timeout = WAIT_CMD;


--
Ciao,
Pascal