2007-10-23 06:49:55

by David Miller

[permalink] [raw]
Subject: IDE crash...


I'm debugging a blk_rq_map_sg() crash that i'm getting on sparc64 as
root is mounted over IDE. I think I know what is happening now.

The IDE sg table is allocated and initialized like this in
drivers/ide/ide-probe.c:

x = kmalloc(sizeof(struct scatterlist) * nents, GFP_XXX);
sg_init_table(x, nents);

So far, so good.

Now, ide_map_sg() passes requests down to blk_rq_map_sg() like this in
drivers/block/ide-io.c:

hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);

Ok, so what does blk_rq_map_sg() do?

sg = NULL;
rq_for_each_segment(bvec, rq, iter) {
...
if (bvprv && cluster) {
...
} else {
new_segment:
if (!sg)
sg = sglist;
else
sg = sg_next(sg);
...
}
bvprv = bvec;
} /* segments in rq */

if (sg)
__sg_mark_end(sg);

So let's say the first request comes in and needs 2 segs.
This will mark sg[1].page_link with 0x2

If the next request from IDE needs 4 segs, we'll OOPS because
sg_next() on &sg[1] will see page_link bit 0x2 is set and
therefore return NULL.

A quick look shows that if you're testing on SCSI (or something
layered on top of it like SATA or PATA) you won't see this seemingly
guarenteed crash because the SCSI mid-layer allocates a fresh sglist
via mempool_alloc() and runs sg_init_table() on it for every I/O
request.


2007-10-23 07:04:27

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Mon, Oct 22 2007, David Miller wrote:
>
> I'm debugging a blk_rq_map_sg() crash that i'm getting on sparc64 as
> root is mounted over IDE. I think I know what is happening now.
>
> The IDE sg table is allocated and initialized like this in
> drivers/ide/ide-probe.c:
>
> x = kmalloc(sizeof(struct scatterlist) * nents, GFP_XXX);
> sg_init_table(x, nents);
>
> So far, so good.
>
> Now, ide_map_sg() passes requests down to blk_rq_map_sg() like this in
> drivers/block/ide-io.c:
>
> hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
>
> Ok, so what does blk_rq_map_sg() do?
>
> sg = NULL;
> rq_for_each_segment(bvec, rq, iter) {
> ...
> if (bvprv && cluster) {
> ...
> } else {
> new_segment:
> if (!sg)
> sg = sglist;
> else
> sg = sg_next(sg);
> ...
> }
> bvprv = bvec;
> } /* segments in rq */
>
> if (sg)
> __sg_mark_end(sg);
>
> So let's say the first request comes in and needs 2 segs.
> This will mark sg[1].page_link with 0x2
>
> If the next request from IDE needs 4 segs, we'll OOPS because
> sg_next() on &sg[1] will see page_link bit 0x2 is set and
> therefore return NULL.
>
> A quick look shows that if you're testing on SCSI (or something
> layered on top of it like SATA or PATA) you won't see this seemingly
> guarenteed crash because the SCSI mid-layer allocates a fresh sglist
> via mempool_alloc() and runs sg_init_table() on it for every I/O
> request.

We should never see the end pointer in blk_rq_map_sg(), or that's a bug
in the driver. So it should be OK to just clear the end pointer always
in there, even if it's not the prettiest solution...

This just needs to be wrapped up in some scatterlist.h macro/function.

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 61c2e39..a3bda2f 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1354,6 +1354,12 @@ new_segment:
else
sg = sg_next(sg);

+ /*
+ * Clear end-of-table pointer, we'll mark a new one
+ * at the end
+ */
+ sg->page_link &= ~0x2;
+
sg_dma_len(sg) = 0;
sg_dma_address(sg) = 0;
sg_set_page(sg, bvec->bv_page);

--
Jens Axboe

2007-10-23 07:11:13

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, Oct 23 2007, Jens Axboe wrote:
> On Mon, Oct 22 2007, David Miller wrote:
> >
> > I'm debugging a blk_rq_map_sg() crash that i'm getting on sparc64 as
> > root is mounted over IDE. I think I know what is happening now.
> >
> > The IDE sg table is allocated and initialized like this in
> > drivers/ide/ide-probe.c:
> >
> > x = kmalloc(sizeof(struct scatterlist) * nents, GFP_XXX);
> > sg_init_table(x, nents);
> >
> > So far, so good.
> >
> > Now, ide_map_sg() passes requests down to blk_rq_map_sg() like this in
> > drivers/block/ide-io.c:
> >
> > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> >
> > Ok, so what does blk_rq_map_sg() do?
> >
> > sg = NULL;
> > rq_for_each_segment(bvec, rq, iter) {
> > ...
> > if (bvprv && cluster) {
> > ...
> > } else {
> > new_segment:
> > if (!sg)
> > sg = sglist;
> > else
> > sg = sg_next(sg);
> > ...
> > }
> > bvprv = bvec;
> > } /* segments in rq */
> >
> > if (sg)
> > __sg_mark_end(sg);
> >
> > So let's say the first request comes in and needs 2 segs.
> > This will mark sg[1].page_link with 0x2
> >
> > If the next request from IDE needs 4 segs, we'll OOPS because
> > sg_next() on &sg[1] will see page_link bit 0x2 is set and
> > therefore return NULL.
> >
> > A quick look shows that if you're testing on SCSI (or something
> > layered on top of it like SATA or PATA) you won't see this seemingly
> > guarenteed crash because the SCSI mid-layer allocates a fresh sglist
> > via mempool_alloc() and runs sg_init_table() on it for every I/O
> > request.
>
> We should never see the end pointer in blk_rq_map_sg(), or that's a bug
> in the driver. So it should be OK to just clear the end pointer always
> in there, even if it's not the prettiest solution...
>
> This just needs to be wrapped up in some scatterlist.h macro/function.
>
> diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
> index 61c2e39..a3bda2f 100644
> --- a/block/ll_rw_blk.c
> +++ b/block/ll_rw_blk.c
> @@ -1354,6 +1354,12 @@ new_segment:
> else
> sg = sg_next(sg);
>
> + /*
> + * Clear end-of-table pointer, we'll mark a new one
> + * at the end
> + */
> + sg->page_link &= ~0x2;
> +
> sg_dma_len(sg) = 0;
> sg_dma_address(sg) = 0;
> sg_set_page(sg, bvec->bv_page);

Eh this wont work, it's the wrong entry... Here's a temporary
work-around.

diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index c89f0d3..108202b 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
return;

if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
+ sg_init_table(hwif->sg_table, hwif->sg_max_nents);
hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
} else {
sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index ec55a17..cbfb113 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -1324,8 +1324,6 @@ static int hwif_init(ide_hwif_t *hwif)
goto out;
}

- sg_init_table(hwif->sg_table, hwif->sg_max_nents);
-
if (init_irq(hwif) == 0)
goto done;


--
Jens Axboe

2007-10-23 07:14:46

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, 23 Oct 2007 09:09:33 +0200
Jens Axboe <[email protected]> wrote:

> On Tue, Oct 23 2007, Jens Axboe wrote:
> > On Mon, Oct 22 2007, David Miller wrote:
> > >
> > > I'm debugging a blk_rq_map_sg() crash that i'm getting on sparc64 as
> > > root is mounted over IDE. I think I know what is happening now.
> > >
> > > The IDE sg table is allocated and initialized like this in
> > > drivers/ide/ide-probe.c:
> > >
> > > x = kmalloc(sizeof(struct scatterlist) * nents, GFP_XXX);
> > > sg_init_table(x, nents);
> > >
> > > So far, so good.
> > >
> > > Now, ide_map_sg() passes requests down to blk_rq_map_sg() like this in
> > > drivers/block/ide-io.c:
> > >
> > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > >
> > > Ok, so what does blk_rq_map_sg() do?
> > >
> > > sg = NULL;
> > > rq_for_each_segment(bvec, rq, iter) {
> > > ...
> > > if (bvprv && cluster) {
> > > ...
> > > } else {
> > > new_segment:
> > > if (!sg)
> > > sg = sglist;
> > > else
> > > sg = sg_next(sg);
> > > ...
> > > }
> > > bvprv = bvec;
> > > } /* segments in rq */
> > >
> > > if (sg)
> > > __sg_mark_end(sg);
> > >
> > > So let's say the first request comes in and needs 2 segs.
> > > This will mark sg[1].page_link with 0x2
> > >
> > > If the next request from IDE needs 4 segs, we'll OOPS because
> > > sg_next() on &sg[1] will see page_link bit 0x2 is set and
> > > therefore return NULL.
> > >
> > > A quick look shows that if you're testing on SCSI (or something
> > > layered on top of it like SATA or PATA) you won't see this seemingly
> > > guarenteed crash because the SCSI mid-layer allocates a fresh sglist
> > > via mempool_alloc() and runs sg_init_table() on it for every I/O
> > > request.
> >
> > We should never see the end pointer in blk_rq_map_sg(), or that's a bug
> > in the driver. So it should be OK to just clear the end pointer always
> > in there, even if it's not the prettiest solution...
> >
> > This just needs to be wrapped up in some scatterlist.h macro/function.
> >
> > diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
> > index 61c2e39..a3bda2f 100644
> > --- a/block/ll_rw_blk.c
> > +++ b/block/ll_rw_blk.c
> > @@ -1354,6 +1354,12 @@ new_segment:
> > else
> > sg = sg_next(sg);
> >
> > + /*
> > + * Clear end-of-table pointer, we'll mark a new one
> > + * at the end
> > + */
> > + sg->page_link &= ~0x2;
> > +
> > sg_dma_len(sg) = 0;
> > sg_dma_address(sg) = 0;
> > sg_set_page(sg, bvec->bv_page);
>
> Eh this wont work, it's the wrong entry... Here's a temporary
> work-around.

Yeah, it won't work. Now we must call sg_init_table for every I/O
request (it's not nice).

I think that there are other blk_rq_map_sg users need such fix.


> diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> index c89f0d3..108202b 100644
> --- a/drivers/ide/ide-io.c
> +++ b/drivers/ide/ide-io.c
> @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> return;
>
> if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> } else {
> sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
> index ec55a17..cbfb113 100644
> --- a/drivers/ide/ide-probe.c
> +++ b/drivers/ide/ide-probe.c
> @@ -1324,8 +1324,6 @@ static int hwif_init(ide_hwif_t *hwif)
> goto out;
> }
>
> - sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> -
> if (init_irq(hwif) == 0)
> goto done;
>
>
> --
> Jens Axboe
>

2007-10-23 07:18:35

by David Miller

[permalink] [raw]
Subject: Re: IDE crash...

From: Jens Axboe <[email protected]>
Date: Tue, 23 Oct 2007 09:09:33 +0200

> Eh this wont work, it's the wrong entry... Here's a temporary
> work-around.
>
> diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> index c89f0d3..108202b 100644
> --- a/drivers/ide/ide-io.c
> +++ b/drivers/ide/ide-io.c
> @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> return;
>
> if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> } else {
> sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);

That's the exact patch I'm about to boot test :-)

2007-10-23 07:24:58

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> On Tue, 23 Oct 2007 09:09:33 +0200
> Jens Axboe <[email protected]> wrote:
>
> > On Tue, Oct 23 2007, Jens Axboe wrote:
> > > On Mon, Oct 22 2007, David Miller wrote:
> > > >
> > > > I'm debugging a blk_rq_map_sg() crash that i'm getting on sparc64 as
> > > > root is mounted over IDE. I think I know what is happening now.
> > > >
> > > > The IDE sg table is allocated and initialized like this in
> > > > drivers/ide/ide-probe.c:
> > > >
> > > > x = kmalloc(sizeof(struct scatterlist) * nents, GFP_XXX);
> > > > sg_init_table(x, nents);
> > > >
> > > > So far, so good.
> > > >
> > > > Now, ide_map_sg() passes requests down to blk_rq_map_sg() like this in
> > > > drivers/block/ide-io.c:
> > > >
> > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > >
> > > > Ok, so what does blk_rq_map_sg() do?
> > > >
> > > > sg = NULL;
> > > > rq_for_each_segment(bvec, rq, iter) {
> > > > ...
> > > > if (bvprv && cluster) {
> > > > ...
> > > > } else {
> > > > new_segment:
> > > > if (!sg)
> > > > sg = sglist;
> > > > else
> > > > sg = sg_next(sg);
> > > > ...
> > > > }
> > > > bvprv = bvec;
> > > > } /* segments in rq */
> > > >
> > > > if (sg)
> > > > __sg_mark_end(sg);
> > > >
> > > > So let's say the first request comes in and needs 2 segs.
> > > > This will mark sg[1].page_link with 0x2
> > > >
> > > > If the next request from IDE needs 4 segs, we'll OOPS because
> > > > sg_next() on &sg[1] will see page_link bit 0x2 is set and
> > > > therefore return NULL.
> > > >
> > > > A quick look shows that if you're testing on SCSI (or something
> > > > layered on top of it like SATA or PATA) you won't see this seemingly
> > > > guarenteed crash because the SCSI mid-layer allocates a fresh sglist
> > > > via mempool_alloc() and runs sg_init_table() on it for every I/O
> > > > request.
> > >
> > > We should never see the end pointer in blk_rq_map_sg(), or that's a bug
> > > in the driver. So it should be OK to just clear the end pointer always
> > > in there, even if it's not the prettiest solution...
> > >
> > > This just needs to be wrapped up in some scatterlist.h macro/function.
> > >
> > > diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
> > > index 61c2e39..a3bda2f 100644
> > > --- a/block/ll_rw_blk.c
> > > +++ b/block/ll_rw_blk.c
> > > @@ -1354,6 +1354,12 @@ new_segment:
> > > else
> > > sg = sg_next(sg);
> > >
> > > + /*
> > > + * Clear end-of-table pointer, we'll mark a new one
> > > + * at the end
> > > + */
> > > + sg->page_link &= ~0x2;
> > > +
> > > sg_dma_len(sg) = 0;
> > > sg_dma_address(sg) = 0;
> > > sg_set_page(sg, bvec->bv_page);
> >
> > Eh this wont work, it's the wrong entry... Here's a temporary
> > work-around.
>
> Yeah, it won't work. Now we must call sg_init_table for every I/O
> request (it's not nice).

I think the fix would be to have a sg_next_and_clear() or something that
doesn't honor the 0x02 termination bit and clears it, for the cases
where you KNOW that there are more entries.

> I think that there are other blk_rq_map_sg users need such fix.

Possibly, I did do quite a few of them. Alternatively, we can remove
__sg_mark_end() and leave that up to the driver.


diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 61c2e39..290836f 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1352,7 +1352,7 @@ new_segment:
if (!sg)
sg = sglist;
else
- sg = sg_next(sg);
+ sg = sg_next_force(sg);

sg_dma_len(sg) = 0;
sg_dma_address(sg) = 0;
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 42daf5e..a98a2ee 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -99,6 +99,22 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg)
return sg;
}

+/**
+ * sg_next_force - return the next scatterlist entry in a list
+ * @sg: The current sg entry
+ *
+ * Description:
+ * Must only be used when more entries beyond this one is known to exist,
+ * as it clears the termination bit. Useful to avoid adding a full sg
+ * table init on every mapping.
+ *
+ **/
+static inline struct scatterlist *sg_next_force(struct scatterlist *sg)
+{
+ sg->page_link &= ~0x02;
+ return sg_next(sg);
+}
+
/*
* Loop over each sg element, following the pointer to a new list if necessary
*/

--
Jens Axboe

2007-10-23 07:25:33

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, Oct 23 2007, David Miller wrote:
> From: Jens Axboe <[email protected]>
> Date: Tue, 23 Oct 2007 09:09:33 +0200
>
> > Eh this wont work, it's the wrong entry... Here's a temporary
> > work-around.
> >
> > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> > index c89f0d3..108202b 100644
> > --- a/drivers/ide/ide-io.c
> > +++ b/drivers/ide/ide-io.c
> > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> > return;
> >
> > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > } else {
> > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
>
> That's the exact patch I'm about to boot test :-)

That should work - once you verify that, would you mind testing this one
as well? Thanks!

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 61c2e39..290836f 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1352,7 +1352,7 @@ new_segment:
if (!sg)
sg = sglist;
else
- sg = sg_next(sg);
+ sg = sg_next_force(sg);

sg_dma_len(sg) = 0;
sg_dma_address(sg) = 0;
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 42daf5e..a98a2ee 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -99,6 +99,22 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg)
return sg;
}

+/**
+ * sg_next_force - return the next scatterlist entry in a list
+ * @sg: The current sg entry
+ *
+ * Description:
+ * Must only be used when more entries beyond this one is known to exist,
+ * as it clears the termination bit. Useful to avoid adding a full sg
+ * table init on every mapping.
+ *
+ **/
+static inline struct scatterlist *sg_next_force(struct scatterlist *sg)
+{
+ sg->page_link &= ~0x02;
+ return sg_next(sg);
+}
+
/*
* Loop over each sg element, following the pointer to a new list if necessary
*/

--
Jens Axboe

2007-10-23 07:43:11

by David Miller

[permalink] [raw]
Subject: Re: IDE crash...

From: Jens Axboe <[email protected]>
Date: Tue, 23 Oct 2007 09:23:59 +0200

> On Tue, Oct 23 2007, David Miller wrote:
> > From: Jens Axboe <[email protected]>
> > Date: Tue, 23 Oct 2007 09:09:33 +0200
> >
> > > Eh this wont work, it's the wrong entry... Here's a temporary
> > > work-around.
> > >
> > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> > > index c89f0d3..108202b 100644
> > > --- a/drivers/ide/ide-io.c
> > > +++ b/drivers/ide/ide-io.c
> > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> > > return;
> > >
> > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > } else {
> > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> >
> > That's the exact patch I'm about to boot test :-)
>
> That should work - once you verify that, would you mind testing this one
> as well? Thanks!

This one works here too, thanks.

2007-10-23 07:47:28

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, Oct 23 2007, David Miller wrote:
> From: Jens Axboe <[email protected]>
> Date: Tue, 23 Oct 2007 09:23:59 +0200
>
> > On Tue, Oct 23 2007, David Miller wrote:
> > > From: Jens Axboe <[email protected]>
> > > Date: Tue, 23 Oct 2007 09:09:33 +0200
> > >
> > > > Eh this wont work, it's the wrong entry... Here's a temporary
> > > > work-around.
> > > >
> > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> > > > index c89f0d3..108202b 100644
> > > > --- a/drivers/ide/ide-io.c
> > > > +++ b/drivers/ide/ide-io.c
> > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> > > > return;
> > > >
> > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > > } else {
> > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> > >
> > > That's the exact patch I'm about to boot test :-)
> >
> > That should work - once you verify that, would you mind testing this one
> > as well? Thanks!
>
> This one works here too, thanks.

Great, thanks for testing that as well. Thinking a bit more about it, I
think the forced clear should stay in blk_rq_map_sg() since we don't
want to advertise it to drivers. So I'll just open-code it in there.

--
Jens Axboe

2007-10-23 10:53:57

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, 23 Oct 2007 00:43:21 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: Jens Axboe <[email protected]>
> Date: Tue, 23 Oct 2007 09:23:59 +0200
>
> > On Tue, Oct 23 2007, David Miller wrote:
> > > From: Jens Axboe <[email protected]>
> > > Date: Tue, 23 Oct 2007 09:09:33 +0200
> > >
> > > > Eh this wont work, it's the wrong entry... Here's a temporary
> > > > work-around.
> > > >
> > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> > > > index c89f0d3..108202b 100644
> > > > --- a/drivers/ide/ide-io.c
> > > > +++ b/drivers/ide/ide-io.c
> > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> > > > return;
> > > >
> > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > > } else {
> > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> > >
> > > That's the exact patch I'm about to boot test :-)
> >
> > That should work - once you verify that, would you mind testing this one
> > as well? Thanks!
>
> This one works here too, thanks.

BTW, we avoid calling sg_init_table() for every I/O request due to Jens'
trick. And Jens will remove the code to clear sg_dma_len/addr in
blk_rq_map_sg(). So sparc64 iommu needs to do that for itself?

http://marc.info/?l=linux-scsi&m=119261920425120&w=2

2007-10-23 10:58:51

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> On Tue, 23 Oct 2007 00:43:21 -0700 (PDT)
> David Miller <[email protected]> wrote:
>
> > From: Jens Axboe <[email protected]>
> > Date: Tue, 23 Oct 2007 09:23:59 +0200
> >
> > > On Tue, Oct 23 2007, David Miller wrote:
> > > > From: Jens Axboe <[email protected]>
> > > > Date: Tue, 23 Oct 2007 09:09:33 +0200
> > > >
> > > > > Eh this wont work, it's the wrong entry... Here's a temporary
> > > > > work-around.
> > > > >
> > > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> > > > > index c89f0d3..108202b 100644
> > > > > --- a/drivers/ide/ide-io.c
> > > > > +++ b/drivers/ide/ide-io.c
> > > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> > > > > return;
> > > > >
> > > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> > > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> > > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > > > } else {
> > > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> > > >
> > > > That's the exact patch I'm about to boot test :-)
> > >
> > > That should work - once you verify that, would you mind testing this one
> > > as well? Thanks!
> >
> > This one works here too, thanks.
>
> BTW, we avoid calling sg_init_table() for every I/O request due to Jens'
> trick. And Jens will remove the code to clear sg_dma_len/addr in
> blk_rq_map_sg(). So sparc64 iommu needs to do that for itself?
>
> http://marc.info/?l=linux-scsi&m=119261920425120&w=2

It does - Dave, do you agree with the patch? I think it should be added,
so that sparc64 isn't the odd-arch-out in this respect. I've tentatively
applied the patch, Davem?

--
Jens Axboe

2007-10-23 10:59:46

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, Oct 23 2007, Jens Axboe wrote:
> On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> > On Tue, 23 Oct 2007 00:43:21 -0700 (PDT)
> > David Miller <[email protected]> wrote:
> >
> > > From: Jens Axboe <[email protected]>
> > > Date: Tue, 23 Oct 2007 09:23:59 +0200
> > >
> > > > On Tue, Oct 23 2007, David Miller wrote:
> > > > > From: Jens Axboe <[email protected]>
> > > > > Date: Tue, 23 Oct 2007 09:09:33 +0200
> > > > >
> > > > > > Eh this wont work, it's the wrong entry... Here's a temporary
> > > > > > work-around.
> > > > > >
> > > > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> > > > > > index c89f0d3..108202b 100644
> > > > > > --- a/drivers/ide/ide-io.c
> > > > > > +++ b/drivers/ide/ide-io.c
> > > > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> > > > > > return;
> > > > > >
> > > > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> > > > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> > > > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > > > > } else {
> > > > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> > > > >
> > > > > That's the exact patch I'm about to boot test :-)
> > > >
> > > > That should work - once you verify that, would you mind testing this one
> > > > as well? Thanks!
> > >
> > > This one works here too, thanks.
> >
> > BTW, we avoid calling sg_init_table() for every I/O request due to Jens'
> > trick. And Jens will remove the code to clear sg_dma_len/addr in
> > blk_rq_map_sg(). So sparc64 iommu needs to do that for itself?
> >
> > http://marc.info/?l=linux-scsi&m=119261920425120&w=2
>
> It does - Dave, do you agree with the patch? I think it should be added,
> so that sparc64 isn't the odd-arch-out in this respect. I've tentatively
> applied the patch, Davem?

BTW, you sure that should not include to check whether sg_next() returns
non-NULL?

--
Jens Axboe

2007-10-23 11:11:23

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, 23 Oct 2007 12:58:11 +0200
Jens Axboe <[email protected]> wrote:

> On Tue, Oct 23 2007, Jens Axboe wrote:
> > On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> > > On Tue, 23 Oct 2007 00:43:21 -0700 (PDT)
> > > David Miller <[email protected]> wrote:
> > >
> > > > From: Jens Axboe <[email protected]>
> > > > Date: Tue, 23 Oct 2007 09:23:59 +0200
> > > >
> > > > > On Tue, Oct 23 2007, David Miller wrote:
> > > > > > From: Jens Axboe <[email protected]>
> > > > > > Date: Tue, 23 Oct 2007 09:09:33 +0200
> > > > > >
> > > > > > > Eh this wont work, it's the wrong entry... Here's a temporary
> > > > > > > work-around.
> > > > > > >
> > > > > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> > > > > > > index c89f0d3..108202b 100644
> > > > > > > --- a/drivers/ide/ide-io.c
> > > > > > > +++ b/drivers/ide/ide-io.c
> > > > > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> > > > > > > return;
> > > > > > >
> > > > > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> > > > > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> > > > > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > > > > > } else {
> > > > > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> > > > > >
> > > > > > That's the exact patch I'm about to boot test :-)
> > > > >
> > > > > That should work - once you verify that, would you mind testing this one
> > > > > as well? Thanks!
> > > >
> > > > This one works here too, thanks.
> > >
> > > BTW, we avoid calling sg_init_table() for every I/O request due to Jens'
> > > trick. And Jens will remove the code to clear sg_dma_len/addr in
> > > blk_rq_map_sg(). So sparc64 iommu needs to do that for itself?
> > >
> > > http://marc.info/?l=linux-scsi&m=119261920425120&w=2
> >
> > It does - Dave, do you agree with the patch? I think it should be added,
> > so that sparc64 isn't the odd-arch-out in this respect. I've tentatively
> > applied the patch, Davem?
>
> BTW, you sure that should not include to check whether sg_next() returns
> non-NULL?

Yeah, the following part checks that:

+ if (dma_sg != sg) {

2007-10-23 11:44:49

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> On Tue, 23 Oct 2007 12:58:11 +0200
> Jens Axboe <[email protected]> wrote:
>
> > On Tue, Oct 23 2007, Jens Axboe wrote:
> > > On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> > > > On Tue, 23 Oct 2007 00:43:21 -0700 (PDT)
> > > > David Miller <[email protected]> wrote:
> > > >
> > > > > From: Jens Axboe <[email protected]>
> > > > > Date: Tue, 23 Oct 2007 09:23:59 +0200
> > > > >
> > > > > > On Tue, Oct 23 2007, David Miller wrote:
> > > > > > > From: Jens Axboe <[email protected]>
> > > > > > > Date: Tue, 23 Oct 2007 09:09:33 +0200
> > > > > > >
> > > > > > > > Eh this wont work, it's the wrong entry... Here's a temporary
> > > > > > > > work-around.
> > > > > > > >
> > > > > > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> > > > > > > > index c89f0d3..108202b 100644
> > > > > > > > --- a/drivers/ide/ide-io.c
> > > > > > > > +++ b/drivers/ide/ide-io.c
> > > > > > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> > > > > > > > return;
> > > > > > > >
> > > > > > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> > > > > > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> > > > > > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > > > > > > } else {
> > > > > > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> > > > > > >
> > > > > > > That's the exact patch I'm about to boot test :-)
> > > > > >
> > > > > > That should work - once you verify that, would you mind testing this one
> > > > > > as well? Thanks!
> > > > >
> > > > > This one works here too, thanks.
> > > >
> > > > BTW, we avoid calling sg_init_table() for every I/O request due to Jens'
> > > > trick. And Jens will remove the code to clear sg_dma_len/addr in
> > > > blk_rq_map_sg(). So sparc64 iommu needs to do that for itself?
> > > >
> > > > http://marc.info/?l=linux-scsi&m=119261920425120&w=2
> > >
> > > It does - Dave, do you agree with the patch? I think it should be added,
> > > so that sparc64 isn't the odd-arch-out in this respect. I've tentatively
> > > applied the patch, Davem?
> >
> > BTW, you sure that should not include to check whether sg_next() returns
> > non-NULL?
>
> Yeah, the following part checks that:
>
> + if (dma_sg != sg) {

Ah good, thanks for confirming.

--
Jens Axboe

2007-10-23 15:11:19

by John Stoffel

[permalink] [raw]
Subject: Re: IDE crash...

>>>>> "Jens" == Jens Axboe <[email protected]> writes:

Jens> On Tue, Oct 23 2007, David Miller wrote:
>> From: Jens Axboe <[email protected]>
>> Date: Tue, 23 Oct 2007 09:23:59 +0200
>>
>> > On Tue, Oct 23 2007, David Miller wrote:
>> > > From: Jens Axboe <[email protected]>
>> > > Date: Tue, 23 Oct 2007 09:09:33 +0200
>> > >
>> > > > Eh this wont work, it's the wrong entry... Here's a temporary
>> > > > work-around.
>> > > >
>> > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
>> > > > index c89f0d3..108202b 100644
>> > > > --- a/drivers/ide/ide-io.c
>> > > > +++ b/drivers/ide/ide-io.c
>> > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
>> > > > return;
>> > > >
>> > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
>> > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
>> > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
>> > > > } else {
>> > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
>> > >
>> > > That's the exact patch I'm about to boot test :-)
>> >
>> > That should work - once you verify that, would you mind testing this one
>> > as well? Thanks!
>>
>> This one works here too, thanks.

Jens> Great, thanks for testing that as well. Thinking a bit more
Jens> about it, I think the forced clear should stay in
Jens> blk_rq_map_sg() since we don't want to advertise it to
Jens> drivers. So I'll just open-code it in there.

Should there be more bounds checking here, so that if you try to do a
_force() you at least check to make sure that there's something beyond
there and useable on the list?

We're saving the time from not reallocating from scratch, but let's
make it robust by doing at least some more checks here.

John

2007-10-23 21:17:41

by David Miller

[permalink] [raw]
Subject: Re: IDE crash...

From: Jens Axboe <[email protected]>
Date: Tue, 23 Oct 2007 12:57:11 +0200

> On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> > BTW, we avoid calling sg_init_table() for every I/O request due to Jens'
> > trick. And Jens will remove the code to clear sg_dma_len/addr in
> > blk_rq_map_sg(). So sparc64 iommu needs to do that for itself?
> >
> > http://marc.info/?l=linux-scsi&m=119261920425120&w=2
>
> It does - Dave, do you agree with the patch? I think it should be added,
> so that sparc64 isn't the odd-arch-out in this respect. I've tentatively
> applied the patch, Davem?

No objections.

2007-10-23 21:45:22

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, Oct 23 2007, David Miller wrote:
> From: Jens Axboe <[email protected]>
> Date: Tue, 23 Oct 2007 12:57:11 +0200
>
> > On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> > > BTW, we avoid calling sg_init_table() for every I/O request due to Jens'
> > > trick. And Jens will remove the code to clear sg_dma_len/addr in
> > > blk_rq_map_sg(). So sparc64 iommu needs to do that for itself?
> > >
> > > http://marc.info/?l=linux-scsi&m=119261920425120&w=2
> >
> > It does - Dave, do you agree with the patch? I think it should be added,
> > so that sparc64 isn't the odd-arch-out in this respect. I've tentatively
> > applied the patch, Davem?
>
> No objections.

Great, thanks Dave.

--
Jens Axboe

2007-10-24 06:49:20

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Tue, Oct 23 2007, John Stoffel wrote:
> >>>>> "Jens" == Jens Axboe <[email protected]> writes:
>
> Jens> On Tue, Oct 23 2007, David Miller wrote:
> >> From: Jens Axboe <[email protected]>
> >> Date: Tue, 23 Oct 2007 09:23:59 +0200
> >>
> >> > On Tue, Oct 23 2007, David Miller wrote:
> >> > > From: Jens Axboe <[email protected]>
> >> > > Date: Tue, 23 Oct 2007 09:09:33 +0200
> >> > >
> >> > > > Eh this wont work, it's the wrong entry... Here's a temporary
> >> > > > work-around.
> >> > > >
> >> > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> >> > > > index c89f0d3..108202b 100644
> >> > > > --- a/drivers/ide/ide-io.c
> >> > > > +++ b/drivers/ide/ide-io.c
> >> > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> >> > > > return;
> >> > > >
> >> > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> >> > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> >> > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> >> > > > } else {
> >> > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> >> > >
> >> > > That's the exact patch I'm about to boot test :-)
> >> >
> >> > That should work - once you verify that, would you mind testing this one
> >> > as well? Thanks!
> >>
> >> This one works here too, thanks.
>
> Jens> Great, thanks for testing that as well. Thinking a bit more
> Jens> about it, I think the forced clear should stay in
> Jens> blk_rq_map_sg() since we don't want to advertise it to
> Jens> drivers. So I'll just open-code it in there.
>
> Should there be more bounds checking here, so that if you try to do a
> _force() you at least check to make sure that there's something beyond
> there and useable on the list?
>
> We're saving the time from not reallocating from scratch, but let's
> make it robust by doing at least some more checks here.

We have to rely on the caller passing in an sgtable that is big enough
to map the request, we have always made that assumption. Anything else
would be a bug, of course. Now, catching that bug would indeed be nice.
Any suggestions?

--
Jens Axboe

2007-10-24 16:50:49

by John Stoffel

[permalink] [raw]
Subject: Re: IDE crash...

>>>>> "Jens" == Jens Axboe <[email protected]> writes:

Jens> On Tue, Oct 23 2007, John Stoffel wrote:
>> >>>>> "Jens" == Jens Axboe <[email protected]> writes:
>>
Jens> On Tue, Oct 23 2007, David Miller wrote:
>> >> From: Jens Axboe <[email protected]>
>> >> Date: Tue, 23 Oct 2007 09:23:59 +0200
>> >>
>> >> > On Tue, Oct 23 2007, David Miller wrote:
>> >> > > From: Jens Axboe <[email protected]>
>> >> > > Date: Tue, 23 Oct 2007 09:09:33 +0200
>> >> > >
>> >> > > > Eh this wont work, it's the wrong entry... Here's a temporary
>> >> > > > work-around.
>> >> > > >
>> >> > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
>> >> > > > index c89f0d3..108202b 100644
>> >> > > > --- a/drivers/ide/ide-io.c
>> >> > > > +++ b/drivers/ide/ide-io.c
>> >> > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
>> >> > > > return;
>> >> > > >
>> >> > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
>> >> > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
>> >> > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
>> >> > > > } else {
>> >> > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
>> >> > >
>> >> > > That's the exact patch I'm about to boot test :-)
>> >> >
>> >> > That should work - once you verify that, would you mind testing this one
>> >> > as well? Thanks!
>> >>
>> >> This one works here too, thanks.
>>
Jens> Great, thanks for testing that as well. Thinking a bit more
Jens> about it, I think the forced clear should stay in
Jens> blk_rq_map_sg() since we don't want to advertise it to
Jens> drivers. So I'll just open-code it in there.
>>
>> Should there be more bounds checking here, so that if you try to do a
>> _force() you at least check to make sure that there's something beyond
>> there and useable on the list?
>>
>> We're saving the time from not reallocating from scratch, but let's
>> make it robust by doing at least some more checks here.

Jens> We have to rely on the caller passing in an sgtable that is big enough
Jens> to map the request, we have always made that assumption. Anything else
Jens> would be a bug, of course. Now, catching that bug would indeed be nice.
Jens> Any suggestions?

Poor mans slab poisoning maybe? As Alan Cox was saying about keeping
it simple with a NULL entry on the end of the SG list, having an easy
way to not fall of the end seems key.

But from reading and re-reading the thread and some of the code, it
looks like this list is really allocated, then the driver can just
re-use the list (or a subset) as much as it likes without clearing and
then reallocating.

So if you've got a chain of 127 items, does it make sense when you
only use 16 of them to poison the next three or four in the chain to
really make sure you don't fall off the end?

Sorry I don't have a concrete example, my C-foo is quite weak and I
haven't had the time to really dive into this code. So take my
comments as just an interested but ignorant bystander butting in. :]

John

2007-10-24 18:10:42

by Jens Axboe

[permalink] [raw]
Subject: Re: IDE crash...

On Wed, Oct 24 2007, John Stoffel wrote:
> >>>>> "Jens" == Jens Axboe <[email protected]> writes:
>
> Jens> On Tue, Oct 23 2007, John Stoffel wrote:
> >> >>>>> "Jens" == Jens Axboe <[email protected]> writes:
> >>
> Jens> On Tue, Oct 23 2007, David Miller wrote:
> >> >> From: Jens Axboe <[email protected]>
> >> >> Date: Tue, 23 Oct 2007 09:23:59 +0200
> >> >>
> >> >> > On Tue, Oct 23 2007, David Miller wrote:
> >> >> > > From: Jens Axboe <[email protected]>
> >> >> > > Date: Tue, 23 Oct 2007 09:09:33 +0200
> >> >> > >
> >> >> > > > Eh this wont work, it's the wrong entry... Here's a temporary
> >> >> > > > work-around.
> >> >> > > >
> >> >> > > > diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> >> >> > > > index c89f0d3..108202b 100644
> >> >> > > > --- a/drivers/ide/ide-io.c
> >> >> > > > +++ b/drivers/ide/ide-io.c
> >> >> > > > @@ -822,6 +822,7 @@ void ide_map_sg(ide_drive_t *drive, struct request *rq)
> >> >> > > > return;
> >> >> > > >
> >> >> > > > if (rq->cmd_type != REQ_TYPE_ATA_TASKFILE) {
> >> >> > > > + sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> >> >> > > > hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> >> >> > > > } else {
> >> >> > > > sg_init_one(sg, rq->buffer, rq->nr_sectors * SECTOR_SIZE);
> >> >> > >
> >> >> > > That's the exact patch I'm about to boot test :-)
> >> >> >
> >> >> > That should work - once you verify that, would you mind testing this one
> >> >> > as well? Thanks!
> >> >>
> >> >> This one works here too, thanks.
> >>
> Jens> Great, thanks for testing that as well. Thinking a bit more
> Jens> about it, I think the forced clear should stay in
> Jens> blk_rq_map_sg() since we don't want to advertise it to
> Jens> drivers. So I'll just open-code it in there.
> >>
> >> Should there be more bounds checking here, so that if you try to do a
> >> _force() you at least check to make sure that there's something beyond
> >> there and useable on the list?
> >>
> >> We're saving the time from not reallocating from scratch, but let's
> >> make it robust by doing at least some more checks here.
>
> Jens> We have to rely on the caller passing in an sgtable that is big enough
> Jens> to map the request, we have always made that assumption. Anything else
> Jens> would be a bug, of course. Now, catching that bug would indeed be nice.
> Jens> Any suggestions?
>
> Poor mans slab poisoning maybe? As Alan Cox was saying about keeping
> it simple with a NULL entry on the end of the SG list, having an easy
> way to not fall of the end seems key.
>
> But from reading and re-reading the thread and some of the code, it
> looks like this list is really allocated, then the driver can just
> re-use the list (or a subset) as much as it likes without clearing and
> then reallocating.
>
> So if you've got a chain of 127 items, does it make sense when you
> only use 16 of them to poison the next three or four in the chain to
> really make sure you don't fall off the end?

Yes, that is indeed how most drivers work. It's both cheaper and faster
to keep the list than allocating and initializing it every time. With
the current termination bit approach, only that bits needs to be
set/cleared to properly terminate the sgtable. So it's free, since it
happens with setting page/length/offset.

--
Jens Axboe