Hi!
While doing some stress testing on the 2.5.2-pre5 kernel, I am hitting
a kernel BUG at scsi_merge.c:83, followed by a kernel panic. The
problem is that scsi_alloc_sgtable fails because the request contains
too many physical segments. I think this patch is the correct fix:
--- linux-2.5.2-pre5/drivers/scsi/scsi.c Fri Dec 28 12:38:01 2001
+++ linux-2.5-packet/drivers/scsi/scsi.c Wed Jan 2 02:27:45 2002
@@ -201,11 +201,6 @@
 	/* Hardware imposed limit. */
 	blk_queue_max_hw_segments(q, SHpnt->sg_tablesize);
 
-	/*
-	 * When we remove scsi_malloc soonish, this can die too
-	 */
-	blk_queue_max_phys_segments(q, PAGE_SIZE / sizeof(struct scatterlist));
-
 	blk_queue_max_sectors(q, SHpnt->max_sectors);
 
 	if (!SHpnt->use_clustering)
--
Peter Osterlund
http://w1.894.telia.com/~u89404340
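A rough userspace sketch of the failure mode described in the message above: if the queue admits more physical segments than the largest scatter-gather table the allocator can hand out, some requests can never get a table. The pool sizes and the names pick_sg_pool and queue_limit_is_safe are assumptions made for illustration; this is not the actual scsi_alloc_sgtable code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative table sizes only; the real allocator's sizes are an assumption here. */
static const int sg_pool_sizes[] = { 8, 16, 32, 64, 128 };
#define NR_SG_POOLS (sizeof(sg_pool_sizes) / sizeof(sg_pool_sizes[0]))

/*
 * Return the index of the smallest pool that can hold nr_segments
 * entries, or -1 if the request has too many segments for any pool.
 */
static int pick_sg_pool(int nr_segments)
{
    assert(nr_segments > 0);
    for (size_t i = 0; i < NR_SG_POOLS; i++) {
        if (nr_segments <= sg_pool_sizes[i])
            return (int)i;
    }
    return -1;
}

/*
 * A queue limit is safe only if a request that reaches the limit
 * can still be given a table.
 */
static int queue_limit_is_safe(int max_phys_segments)
{
    return pick_sg_pool(max_phys_segments) >= 0;
}
```

With these assumed sizes, a limit of 128 is safe while a limit of 256 is not, which is the kind of mismatch the patch removes by falling back to the block layer default.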
On Wed, Jan 02 2002, Peter Osterlund wrote:
> Hi!
>
> While doing some stress testing on the 2.5.2-pre5 kernel, I am hitting
> a kernel BUG at scsi_merge.c:83, followed by a kernel panic. The
> problem is that scsi_alloc_sgtable fails because the request contains
> too many physical segments. I think this patch is the correct fix:
Correct, ll_rw_blk default is ok now. I missed this when killing
scsi_malloc/scsi_dma, thanks.
--
Jens Axboe
Jens Axboe writes:
> Correct, ll_rw_blk default is ok now. I missed this when killing
> scsi_malloc/scsi_dma, thanks.
It turns out this is still not enough to fix the problem for me,
because ll_new_hw_segment is still allowing nr_phys_segments to become
too large. Is the following patch the correct way to deal with this
problem, or is that case supposed to be prevented by some other means?
At least, this patch prevents the kernel panic during my stress test.
--- linux-2.5.2-pre5/drivers/block/ll_rw_blk.c Mon Dec 31 14:56:37 2001
+++ linux-2.5-packet/drivers/block/ll_rw_blk.c Wed Jan 2 11:44:21 2002
@@ -530,6 +530,7 @@
 				    struct bio *bio)
 {
 	int nr_hw_segs = bio_hw_segments(q, bio);
+	int nr_phys_segs;
 
 	if (req->nr_hw_segments + nr_hw_segs > q->max_hw_segments) {
 		req->flags |= REQ_NOMERGE;
@@ -537,12 +538,19 @@
 		return 0;
 	}
 
+	nr_phys_segs = bio_phys_segments(q, bio);
+	if (req->nr_phys_segments + nr_phys_segs > q->max_phys_segments) {
+		req->flags |= REQ_NOMERGE;
+		q->last_merge = NULL;
+		return 0;
+	}
+
 	/*
 	 * This will form the start of a new hw segment. Bump both
 	 * counters.
 	 */
 	req->nr_hw_segments += nr_hw_segs;
-	req->nr_phys_segments += bio_phys_segments(q, bio);
+	req->nr_phys_segments += nr_phys_segs;
 	return 1;
 }
 
--
Peter Osterlund
http://w1.894.telia.com/~u89404340
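The check proposed in the patch above can be modelled in userspace roughly as follows. This is only a sketch: struct queue_limits, struct request_counts and try_new_hw_segment are hypothetical stand-ins for the kernel's queue, request and ll_new_hw_segment, and the nomerge flag stands in for REQ_NOMERGE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-queue segment limits. */
struct queue_limits {
    int max_hw_segments;
    int max_phys_segments;
};

/* Hypothetical stand-in for the per-request segment counters. */
struct request_counts {
    int nr_hw_segments;
    int nr_phys_segments;
    bool nomerge;           /* models REQ_NOMERGE */
};

/*
 * Model of the patched logic: refuse the merge if EITHER the hardware
 * or the physical segment limit would be exceeded. On refusal the
 * counters are left untouched and the request is marked unmergeable.
 * Returns 1 if the bio was accounted for, 0 otherwise.
 */
static int try_new_hw_segment(const struct queue_limits *q,
                              struct request_counts *req,
                              int bio_hw_segs, int bio_phys_segs)
{
    assert(bio_hw_segs >= 0 && bio_phys_segs >= 0);

    if (req->nr_hw_segments + bio_hw_segs > q->max_hw_segments ||
        req->nr_phys_segments + bio_phys_segs > q->max_phys_segments) {
        req->nomerge = true;
        return 0;
    }

    /* Both limits hold, so bump both counters. */
    req->nr_hw_segments += bio_hw_segs;
    req->nr_phys_segments += bio_phys_segs;
    return 1;
}
```

Without the second half of the condition, a bio that fits the hardware limit but not the physical one would still be merged, and nr_phys_segments would grow past what the queue allows.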
Peter Osterlund wrote:
> It turns out this is still not enough to fix the problem for me,
> because ll_new_hw_segment is still allowing nr_phys_segments to become
> too large. Is the following patch the correct way to deal with this
> problem, or is that case supposed to be prevented by some other means?
> At least, this patch prevents the kernel panic during my stress test.
<snipped patches/>
Peter,
I was able to get a repeatable oops at that line copying
files from /boot onto a "fake" scsi_debug disk with "pre5".
The first largish file it attempted to copy caused the
oops (which I sent to Jens).
Anyway, I just applied your 2 patches (to scsi.c and ll_rw_blk.c)
and the oops is no more.
Good work.
Doug Gilbert
On Wed, Jan 02 2002, Douglas Gilbert wrote:
> Peter,
> I was able to get a repeatable oops at that line copying
> files from /boot onto a "fake" scsi_debug disk with "pre5".
> The first largish file it attempted to copy caused the
> oops (which I sent to Jens).
>
> Anyway, I just applied your 2 patches (to scsi.c and ll_rw_blk.c)
> and the oops is no more.
I've included a slightly modified version; your logic was correct,
though, Peter. Thanks.
--
Jens Axboe