2002-06-05 23:02:56

by Adam J. Richter

[permalink] [raw]
Subject: Patch: linux-2.5.20/fs/bio.c - ll_rw_kio made incorrect assumptions about queue handler's capabilities

I have not tested this patch, and I am not sure how to.
Perhaps you or someone else can suggest a program that exercises
ll_rw_kio.

It seems to me that ll_rw_kio in linux-2.5.20 can generate
bio's that have more segments or more sectors than their queues declare
that they can handle. For example, you could have block driver
that has max_hw_segments == 1, max_phys_segments == 1, max_sectors == 1.

The intermediate bio merging code in drivers/block/ll_rw_blk.c
obeys these constraints when it comes to merging requests that
also are all within these constraints already, but it does break
up bio's that were to big for the underlying queue at the time they
were submitted. As far as I can tell, it is the responsibility of
anything that calls submit_bio or generic_make_request to ensure that the
bio that is submitted is within the request_queue_t's limits
if the driver that pulls requests off of the queue is going to
be able to use max_hw_segments, max_phys_segments and max_sectors
as guarantees.

I have attached an improved version of the patch that I
posted to linux-kernel yesterday. There have been no comments
about that posting. This new version also handles the case
where the queue's maximum number of sectors is too little to
hold even one full page. I am cc'ing this version to linux-kernel
as well, in case anyone wants to comment.

Jens, does this patch look OK to you? If it does look OK,
is there anything else that I should do (e.g., is there some test I should
run?), or are do you want to take it from here and handle submitting
it to Linus?

--
Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."


2002-06-06 00:04:37

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.20/fs/bio.c - ll_rw_kio made incorrect assumptions about queue handler's capabilities

Arg! I forgot to attach the patch. Here it is.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

--- linux-2.5.20/fs/bio.c 2002-06-02 18:44:40.000000000 -0700
+++ linux/fs/bio.c 2002-06-05 15:46:16.000000000 -0700
@@ -338,10 +338,12 @@
**/
void ll_rw_kio(int rw, struct kiobuf *kio, struct block_device *bdev, sector_t sector)
{
- int i, offset, size, err, map_i, total_nr_pages, nr_pages;
+ int offset, size, err, map_i, total_nr_pages, nr_pages;
struct bio_vec *bvec;
struct bio *bio;
kdev_t dev = to_kdev_t(bdev->bd_dev);
+ request_queue_t *q = bdev->bd_queue;
+ int max_bytes, max_pages;

err = 0;
if ((rw & WRITE) && is_read_only(dev)) {
@@ -355,6 +357,7 @@
goto out;
}

+ max_bytes = q->max_sectors << 9;
/*
* maybe kio is bigger than the max we can easily map into a bio.
* if so, split it up in appropriately sized chunks.
@@ -367,8 +370,14 @@

map_i = 0;

+ max_pages = (max_bytes + PAGE_MASK) >> PAGE_SHIFT;
+ if (max_pages > q->max_phys_segments)
+ max_pages = q->max_phys_segments;
+ if (max_pages > q->max_hw_segments)
+ max_pages = q->max_hw_segments;
+
next_chunk:
- nr_pages = BIO_MAX_SECTORS >> (PAGE_SHIFT - 9);
+ nr_pages = max_pages;
if (nr_pages > total_nr_pages)
nr_pages = total_nr_pages;

@@ -389,15 +398,17 @@
bio->bi_private = kio;

bvec = bio->bi_io_vec;
- for (i = 0; i < nr_pages; i++, bvec++, map_i++) {
+ for (;total_nr_pages > 0; bvec++) {
int nbytes = PAGE_SIZE - offset;

if (nbytes > size)
nbytes = size;
+ if (nbytes > max_bytes)
+ nbytes = max_bytes;

BUG_ON(kio->maplist[map_i] == NULL);

- if (bio->bi_size + nbytes > (BIO_MAX_SECTORS << 9))
+ if (bio->bi_size + nbytes > max_bytes)
goto queue_io;

bio->bi_vcnt++;
@@ -407,14 +418,14 @@
bvec->bv_len = nbytes;
bvec->bv_offset = offset;

- /*
- * kiobuf only has an offset into the first page
- */
- offset = 0;
+ offset = (offset + nbytes) & PAGE_MASK;
+ if (offset == 0) {
+ total_nr_pages--;
+ map_i++;
+ }

sector += nbytes >> 9;
size -= nbytes;
- total_nr_pages--;
kio->offset += nbytes;
}

2002-06-06 00:23:32

by Andrew Morton

[permalink] [raw]
Subject: Re: Patch: linux-2.5.20/fs/bio.c - ll_rw_kio made incorrect assumptions about queue handler's capabilities

"Adam J. Richter" wrote:
>
> ..
> + max_bytes = q->max_sectors << 9;

Does this not also need to be done in fs/mpage.c? It's
just using BIO_MAX_SIZE.

What particular problem are you trying to solve here?

> /*
> * maybe kio is bigger than the max we can easily map into a bio.
> * if so, split it up in appropriately sized chunks.
> @@ -367,8 +370,14 @@
>
> map_i = 0;
>
> + max_pages = (max_bytes + PAGE_MASK) >> PAGE_SHIFT;
> + if (max_pages > q->max_phys_segments)
> + max_pages = q->max_phys_segments;
> + if (max_pages > q->max_hw_segments)
> + max_pages = q->max_hw_segments;
> +

I think probably this should be implemented as a block API
function.

This is going to drag us back into the BIO splitting quagmire.

> next_chunk:
> - nr_pages = BIO_MAX_SECTORS >> (PAGE_SHIFT - 9);
> + nr_pages = max_pages;

hmm. So BIO is based on PAGE_SIZE pages. Not PAGE_CACHE_SIZE.
I currently have:

unsigned nr_bvecs = MPAGE_BIO_MAX_SIZE / PAGE_CACHE_SIZE;

Which is about the only sane way in which the pagecache BIO
assembly code can go from "bytes" to "number of pages".
It's going to get interesting if someone makes PAGE_SIZE != PAGE_CACHE_SIZE.

2002-06-06 01:02:52

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.20/fs/bio.c - ll_rw_kio made incorrect assumptions about queue handler's capabilities

Andrew Morton <[email protected]> writes:
>What particular problem are you trying to solve here?

In 2.5.20, ll_rw_kio could submit bio's that are too
big for the underlying queue to handle (they might have more
sectors than q->max_sectors, more iovec's than q->max_phys_segments
or q->max_hw_segments).

As a hypothetical example, suppose that ll_rw_kio is
called to copy 148kB of data (37 4kB pages on x86) to a Compaq
Smart2 raid controller. Assume also that the pages are not
contiguous in physical memory and there is no iommu to make them
look contiguous to a DMA device, so there will be no merging.

In this example, ll_rw_kio in linux-2.5.20 would handle this
call by building a single bio with 37 iovec's and pass it in a single
call to submit_bio. Nothing in the request merging code in
drivers/block/ll_rw_blk.c will break up this bio. blk_recount_segments
will fill in bio->bi_phys_segments and bio->bi_hw_segments as 37
(because there is no merging to make it any smaller).
blk_recalc_rq_segments will then set rq->nr_phys_segments to 37
(the request will contain only this one bio, becasue bio->bi_phys_segments
already excceds q->max_phys_segments).

Eventually, do_cciss_request (in drivers/block/cciss.c) will
extract the request containing the bio with bio->bi_phys_segments == 37.
request and generate a kernel panic because the number of segements
exceeds 32. In the case of linux-2.5.20/drivers/block/cciss.c, this
happens around line 1799:

creq = elv_next_request(q);
if (creq->nr_phys_segments > MAXSGENTRIES)
BUG();


Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."