On Tue, 7 May 2002, Jens Axboe wrote:
> On Tue, May 07 2002, Peter Osterlund wrote:
> > On Mon, 6 May 2002, Johnathan Hicks wrote:
> >
> > > I thought some people here might find this amusing. After I
> > > quick-formatted a CD-RW with cdrwtool today I then did the
> > > following:
> > >
> > > mke2fs -b 2048 /dev/pktcdvd0
> > >
> > > It appears to work OK and I'm compiling a kernel on it as I type
> > > this. pktcdvd can't handle this properly unless the '-b 2048' is
> > > given for a pretty obvious reason. df tells me that the capacity of
> > > a disc formatted like this is 530MB, so a few MB are lost
> > > compared to UDF. I'm pretty sure this is a foolish thing to do, so
> > > would anyone care to give me some technical reasons why? :)
> >
> > Yes, I have tested this too. It works in 2.4 but not in 2.5. In 2.5
> > sr.c complains about unaligned transfers when I try to mount the
> > filesystem. 2.4 has code (sr_scatter_pad) to handle unaligned
> > transfers, but that code was removed in 2.5, and I haven't yet
> > investigated how it is supposed to work without that code.
>
> There's supposed to be a reblocking layer in the middle, a la loop. This
> doesn't really work well for pktcdvd of course, so I'd suggest writing a
> helper for reading and writing sub-block size units.
I finally decided to investigate why this didn't work in the 2.5/2.6 code.
(My tests were made with the 2.6.1-rc1 code and the latest packet patch.)
There are two problems. The first is that the packet writing code forgot
to set the hardsect size. This is fixed by the following patch.
--- linux/drivers/block/pktcdvd.c.old 2004-01-02 00:23:57.000000000 +0100
+++ linux/drivers/block/pktcdvd.c 2004-01-02 00:24:01.000000000 +0100
@@ -2164,6 +2164,7 @@
request_queue_t *q = disks[pkt_get_minor(pd)]->queue;
blk_queue_make_request(q, pkt_make_request);
+ blk_queue_hardsect_size(q, CD_FRAMESIZE);
blk_queue_max_sectors(q, PACKET_MAX_SECTORS);
blk_queue_merge_bvec(q, pkt_merge_bvec);
q->queuedata = pd;
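As a rough illustration of what the missing call changes (a userspace sketch using the patch's constants, not kernel code): with a hardsect size of 2048, every transfer must start and end on a multiple of 2048/512 = 4 kernel sectors, which is exactly the alignment sr.c was complaining about.

```c
#include <assert.h>

#define CD_FRAMESIZE  2048   /* bytes per CD sector (frame) */
#define KERNEL_SECTOR  512   /* the block layer's sector unit */

/* With a hardsect size of 2048 a transfer is only valid if it starts
 * and ends on a multiple of 2048/512 = 4 kernel sectors.  Without the
 * blk_queue_hardsect_size() call the queue defaults to 512, and the
 * block layer builds requests that the lower layer has to reject. */
static int transfer_aligned(unsigned long long sector, unsigned nsectors,
                            unsigned hardsect)
{
    unsigned scale = hardsect / KERNEL_SECTOR;
    return sector % scale == 0 && nsectors % scale == 0;
}
```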
That's not enough to make it work, though. If I create an ext2 filesystem
with 2kb blocksize, I hit a bug when I try to write some large files to
the filesystem. The problem is that the code in mpage_writepage() fails if
a page is mapped to disk across a packet boundary. In that case, the
bio_add_page() call at line 543 in mpage.c can fail even if the bio was
previously empty. The code then passes an empty bio to submit_bio(), which
triggers a bug at line 2303 in ll_rw_blk.c. This patch seems to fix the
problem.
--- linux/fs/mpage.c.old 2004-01-02 00:26:19.000000000 +0100
+++ linux/fs/mpage.c 2004-01-02 00:26:50.000000000 +0100
@@ -541,6 +541,11 @@
length = first_unmapped << blkbits;
if (bio_add_page(bio, page, length, 0) < length) {
+ if (!bio->bi_size) {
+ bio_put(bio);
+ bio = NULL;
+ goto confused;
+ }
bio = mpage_bio_submit(WRITE, bio);
goto alloc_new;
}
I noted that performance is quite bad with 2kb blocksize. It is a lot
faster with 4kb blocksize. (2kb blocksize with the udf filesystem is not
slow, only ext2 seems to have this problem.) Maybe the "confused" case
(which calls a_ops->writepage()) in mpage_writepage() isn't really meant
to be fast. Is there a better way to fix this problem?
--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340
Peter Osterlund <[email protected]> wrote:
>
> If I create an ext2 filesystem
> with 2kb blocksize, I hit a bug when I try to write some large files to
> the filesystem. The problem is that the code in mpage_writepage() fails if
> a page is mapped to disk across a packet boundary. In that case, the
> bio_add_page() call at line 543 in mpage.c can fail even if the bio was
> previously empty. The code then passes an empty bio to submit_bio(), which
> triggers a bug at line 2303 in ll_rw_blk.c. This patch seems to fix the
> problem.
>
> --- linux/fs/mpage.c.old 2004-01-02 00:26:19.000000000 +0100
> +++ linux/fs/mpage.c 2004-01-02 00:26:50.000000000 +0100
> @@ -541,6 +541,11 @@
>
> length = first_unmapped << blkbits;
> if (bio_add_page(bio, page, length, 0) < length) {
> + if (!bio->bi_size) {
> + bio_put(bio);
> + bio = NULL;
> + goto confused;
> + }
> bio = mpage_bio_submit(WRITE, bio);
> goto alloc_new;
> }
Confused. We initially have an empty BIO, and we run bio_add_page()
against it, adding one page.
How can that bio_add_page() fail to add the page?
Could you describe the failure a little more, please?
> I noted that performance is quite bad with 2kb blocksize. It is a lot
> faster with 4kb blocksize. (2kb blocksize with the udf filesystem is not
> slow, only ext2 seems to have this problem.) Maybe the "confused" case
> (which calls a_ops->writepage()) in mpage_writepage() isn't really meant
> to be fast. Is there a better way to fix this problem?
How much slower is it?
Andrew Morton <[email protected]> writes:
> Peter Osterlund <[email protected]> wrote:
> >
> > If I create an ext2 filesystem
> > with 2kb blocksize, I hit a bug when I try to write some large files to
> > the filesystem. The problem is that the code in mpage_writepage() fails if
> > a page is mapped to disk across a packet boundary. In that case, the
> > bio_add_page() call at line 543 in mpage.c can fail even if the bio was
> > previously empty. The code then passes an empty bio to submit_bio(), which
> > triggers a bug at line 2303 in ll_rw_blk.c. This patch seems to fix the
> > problem.
> >
> > --- linux/fs/mpage.c.old 2004-01-02 00:26:19.000000000 +0100
> > +++ linux/fs/mpage.c 2004-01-02 00:26:50.000000000 +0100
> > @@ -541,6 +541,11 @@
> >
> > length = first_unmapped << blkbits;
> > if (bio_add_page(bio, page, length, 0) < length) {
> > + if (!bio->bi_size) {
> > + bio_put(bio);
> > + bio = NULL;
> > + goto confused;
> > + }
> > bio = mpage_bio_submit(WRITE, bio);
> > goto alloc_new;
> > }
>
> Confused. We initially have an empty BIO, and we run bio_add_page()
> against it, adding one page.
>
> How can that bio_add_page() fail to add the page?
>
> Could you describe the failure a little more, please?
In bio_add_page(), there is this check:
/*
* if queue has other restrictions (eg varying max sector size
* depending on offset), it can specify a merge_bvec_fn in the
* queue to get further control
*/
if (q->merge_bvec_fn) {
/*
* merge_bvec_fn() returns number of bytes it can accept
* at this offset
*/
if (q->merge_bvec_fn(q, bio, bvec) < len) {
bvec->bv_page = NULL;
bvec->bv_len = 0;
bvec->bv_offset = 0;
return 0;
}
}
The packet writing code has the restriction that a bio must not span a
packet boundary. (A packet is 32*2048 bytes.) If the page when mapped
to disk starts 2kb before a packet boundary, merge_bvec_fn therefore
returns 2048, which is less than len, which is 4096 if the whole page
is mapped, so the bio_add_page() call fails.
I don't think this case ever happens with the standard kernel, because
it doesn't (afaik) contain any merge_bvec_fn implementations that are
as restrictive as the one in the cdrw packet writing code. I still
think the code in mpage_writepage() is wrong, even if it happens to
work in the standard kernel.
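For concreteness, the failing arithmetic can be modelled in a few lines of plain C (a sketch of the packet-boundary calculation, not the actual pkt_merge_bvec()):

```c
#include <assert.h>

#define CD_FRAMESIZE   2048
#define PACKET_SECTORS (32 * CD_FRAMESIZE / 512)  /* 128 sectors per packet */

/* Bytes a boundary-respecting merge_bvec_fn can accept at this offset:
 * whatever is left before the end of the current packet. */
static int bytes_to_packet_end(unsigned long long bio_sector,
                               unsigned bio_size)
{
    unsigned long long zone = bio_sector - bio_sector % PACKET_SECTORS;
    return (int)(PACKET_SECTORS * 512
                 - ((bio_sector - zone) * 512 + bio_size));
}
```

An empty bio starting 2kb (4 sectors) before a packet boundary gets bytes_to_packet_end(124, 0) == 2048, which is less than the 4096-byte page, so even the very first bio_add_page() call fails.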
> > I noted that performance is quite bad with 2kb blocksize. It is a lot
> > faster with 4kb blocksize. (2kb blocksize with the udf filesystem is not
> > slow, only ext2 seems to have this problem.) Maybe the "confused" case
> > (which calls a_ops->writepage()) in mpage_writepage() isn't really meant
> > to be fast. Is there a better way to fix this problem?
>
> How much slower is it?
I didn't make any exact measurements, but it looked like the pageout
code didn't throw enough parallel bios at the packet writing request
queue, so a lot of incomplete packets had to be written to disk. This
is very costly, because the code then has to do read/modify/write
cycles instead of just writing the whole packet directly.
A rough guess is that 2kb block size was 2-3 times slower than 4kb
block size, but I'll make more tests later to see what's really going
on here.
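A back-of-the-envelope model of that cost (assumed numbers, not measurements): a full packet is committed with a single write, while any partial packet forces the driver to read the whole packet, patch it, and write it all back.

```c
#include <assert.h>

#define PACKET_BYTES (32 * 2048)   /* one fixed packet on the disc */

/* Media traffic needed to commit `dirty` bytes into one packet:
 * a full packet is one write; anything less is a read/modify/write
 * of the entire packet. */
static unsigned media_bytes(unsigned dirty)
{
    return dirty == PACKET_BYTES ? PACKET_BYTES : 2 * PACKET_BYTES;
}
```

So committing a single 2kb block moves 128kb on the media instead of 64kb, before even counting the extra seeks.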
--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340
On Fri, 2004-01-02 at 02:30, Peter Osterlund wrote:
> The packet writing code has the restriction that a bio must not span a
> packet boundary. (A packet is 32*2048 bytes.) If the page when mapped
> to disk starts 2kb before a packet boundary, merge_bvec_fn therefore
> returns 2048, which is less than len, which is 4096 if the whole page
> is mapped, so the bio_add_page() call fails.
devicemapper has similar restrictions for raid0 format; in that case
it's device-mapper's job to split the page/bio. Just as it is UDF's task
to do the same I suspect...
On Fri, Jan 02 2004, Peter Osterlund wrote:
> Andrew Morton <[email protected]> writes:
>
> > Peter Osterlund <[email protected]> wrote:
> > >
> > > If I create an ext2 filesystem
> > > with 2kb blocksize, I hit a bug when I try to write some large files to
> > > the filesystem. The problem is that the code in mpage_writepage() fails if
> > > a page is mapped to disk across a packet boundary. In that case, the
> > > bio_add_page() call at line 543 in mpage.c can fail even if the bio was
> > > previously empty. The code then passes an empty bio to submit_bio(), which
> > > triggers a bug at line 2303 in ll_rw_blk.c. This patch seems to fix the
> > > problem.
> > >
> > > --- linux/fs/mpage.c.old 2004-01-02 00:26:19.000000000 +0100
> > > +++ linux/fs/mpage.c 2004-01-02 00:26:50.000000000 +0100
> > > @@ -541,6 +541,11 @@
> > >
> > > length = first_unmapped << blkbits;
> > > if (bio_add_page(bio, page, length, 0) < length) {
> > > + if (!bio->bi_size) {
> > > + bio_put(bio);
> > > + bio = NULL;
> > > + goto confused;
> > > + }
> > > bio = mpage_bio_submit(WRITE, bio);
> > > goto alloc_new;
> > > }
> >
> > Confused. We initially have an empty BIO, and we run bio_add_page()
> > against it, adding one page.
> >
> > How can that bio_add_page() fail to add the page?
> >
> > Could you describe the failure a little more, please?
>
> In bio_add_page(), there is this check:
>
> /*
> * if queue has other restrictions (eg varying max sector size
> * depending on offset), it can specify a merge_bvec_fn in the
> * queue to get further control
> */
> if (q->merge_bvec_fn) {
> /*
> * merge_bvec_fn() returns number of bytes it can accept
> * at this offset
> */
> if (q->merge_bvec_fn(q, bio, bvec) < len) {
> bvec->bv_page = NULL;
> bvec->bv_len = 0;
> bvec->bv_offset = 0;
> return 0;
> }
> }
>
> The packet writing code has the restriction that a bio must not span a
> packet boundary. (A packet is 32*2048 bytes.) If the page when mapped
> to disk starts 2kb before a packet boundary, merge_bvec_fn therefore
> returns 2048, which is less than len, which is 4096 if the whole page
> is mapped, so the bio_add_page() call fails.
The packet writing code is buggy, then; you must always allow a page to
be added to an empty bio. You'll have to deal with the pieces there, I'm
afraid.
So it's not a bug in mpage. If that were the case, there would be other
buggy places in the kernel.
--
Jens Axboe
On Fri, Jan 02 2004, Arjan van de Ven wrote:
> On Fri, 2004-01-02 at 02:30, Peter Osterlund wrote:
>
> > The packet writing code has the restriction that a bio must not span a
> > packet boundary. (A packet is 32*2048 bytes.) If the page when mapped
> > to disk starts 2kb before a packet boundary, merge_bvec_fn therefore
> > returns 2048, which is less than len, which is 4096 if the whole page
> > is mapped, so the bio_add_page() call fails.
>
> devicemapper has similar restrictions for raid0 format; in that case
> it's device-mapper's job to split the page/bio. Just as it is UDF's task
> to do the same I suspect...
It has nothing to do with UDF, it's a driver problem (API breakage).
--
Jens Axboe
On Fri, Jan 02 2004, Peter Osterlund wrote:
> Arjan van de Ven <[email protected]> writes:
>
> > On Fri, 2004-01-02 at 02:30, Peter Osterlund wrote:
> >
> > > The packet writing code has the restriction that a bio must not span a
> > > packet boundary. (A packet is 32*2048 bytes.) If the page when mapped
> > > to disk starts 2kb before a packet boundary, merge_bvec_fn therefore
> > > returns 2048, which is less than len, which is 4096 if the whole page
> > > is mapped, so the bio_add_page() call fails.
> >
> > devicemapper has similar restrictions for raid0 format; in that case
> > it's device-mapper's job to split the page/bio. Just as it is UDF's task
> > to do the same I suspect...
>
> Old versions of the packet writing code did just that, but Jens told
> me that bio splitting was evil, so when the merge_bvec_fn
> functionality was added to the kernel, I started to use it.
>
> http://lists.suse.com/archive/packet-writing/2002-Aug/0044.html
Splitting is evil, but unfortunately it's a necessary evil... There are
a few kernel helpers to make supporting it easier (see bio_split()). Not
so sure how well that'll work for you, you may have to do the grunt work
yourself.
> If merge_bvec_fn is not supposed to be able to handle the need of the
> packet writing code, I can certainly resurrect my bio splitting code.
Only partially. Read my email: you _must_ accept a page addition to an
empty bio. You can refuse others. For the single page case, you may need
to split.
> Btw, for some reason, this bug is not triggered when using the UDF
> filesystem on a CDRW. I've only seen it with the ext2 filesystem.
Does UDF use mpage? The fact that it doesn't trigger on UDF doesn't mean
that packet writing isn't breaking the API :)
--
Jens Axboe
Arjan van de Ven <[email protected]> writes:
> On Fri, 2004-01-02 at 02:30, Peter Osterlund wrote:
>
> > The packet writing code has the restriction that a bio must not span a
> > packet boundary. (A packet is 32*2048 bytes.) If the page when mapped
> > to disk starts 2kb before a packet boundary, merge_bvec_fn therefore
> > returns 2048, which is less than len, which is 4096 if the whole page
> > is mapped, so the bio_add_page() call fails.
>
> devicemapper has similar restrictions for raid0 format; in that case
> it's device-mapper's job to split the page/bio. Just as it is UDF's task
> to do the same I suspect...
Old versions of the packet writing code did just that, but Jens told
me that bio splitting was evil, so when the merge_bvec_fn
functionality was added to the kernel, I started to use it.
http://lists.suse.com/archive/packet-writing/2002-Aug/0044.html
If merge_bvec_fn is not supposed to be able to handle the need of the
packet writing code, I can certainly resurrect my bio splitting code.
Btw, for some reason, this bug is not triggered when using the UDF
filesystem on a CDRW. I've only seen it with the ext2 filesystem.
--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340
Jens Axboe <[email protected]> writes:
> On Fri, Jan 02 2004, Peter Osterlund wrote:
> > Arjan van de Ven <[email protected]> writes:
> >
> > > On Fri, 2004-01-02 at 02:30, Peter Osterlund wrote:
> > >
> > > > The packet writing code has the restriction that a bio must not span a
> > > > packet boundary. (A packet is 32*2048 bytes.) If the page when mapped
> > > > to disk starts 2kb before a packet boundary, merge_bvec_fn therefore
> > > > returns 2048, which is less than len, which is 4096 if the whole page
> > > > is mapped, so the bio_add_page() call fails.
> > >
> > > devicemapper has similar restrictions for raid0 format; in that case
> > > it's device-mapper's job to split the page/bio. Just as it is UDF's task
> > > to do the same I suspect...
> >
> > Old versions of the packet writing code did just that, but Jens told
> > me that bio splitting was evil, so when the merge_bvec_fn
> > functionality was added to the kernel, I started to use it.
> >
> > http://lists.suse.com/archive/packet-writing/2002-Aug/0044.html
>
> Splitting is evil, but unfortunately it's a necessary evil... There are
> a few kernel helpers to make supporting it easier (see bio_split()). Not
> so sure how well that'll work for you, you may have to do the grunt work
> yourself.
OK, I'll fix the packet writing code.
> > If merge_bvec_fn is not supposed to be able to handle the need of the
> > packet writing code, I can certainly resurrect my bio splitting code.
>
> Only partially. Read my email: you _must_ accept a page addition to an
> empty bio. You can refuse others. For the single page case, you may need
> to split.
>
> > Btw, for some reason, this bug is not triggered when using the UDF
> > filesystem on a CDRW. I've only seen it with the ext2 filesystem.
>
> Does UDF use mpage? The fact that it doesn't trigger on UDF doesn't mean
> that packet writing isn't breaking the API :)
Agreed, this was not meant to excuse the packet writing code, just
some additional trivia.
Btw, is this API documented somewhere, or does it have to be reverse
engineered by means of understanding implementation details in
mpage_writepage() and similar functions? ;) May I suggest this patch?
--- linux/drivers/block/ll_rw_blk.c.old 2004-01-02 12:56:55.000000000 +0100
+++ linux/drivers/block/ll_rw_blk.c 2004-01-02 13:07:25.000000000 +0100
@@ -173,9 +173,11 @@
* are dynamic, and thus we have to query the queue whether it is ok to
* add a new bio_vec to a bio at a given offset or not. If the block device
* has such limitations, it needs to register a merge_bvec_fn to control
- * the size of bio's sent to it. Per default now merge_bvec_fn is defined for
- * a queue, and only the fixed limits are honored.
- *
+ * the size of bio's sent to it. Note that a block device *must* allow a
+ * single page to be added to an empty bio. The block device driver may want
+ * to use the bio_split() function to deal with these bio's. Per default
+ * no merge_bvec_fn is defined for a queue, and only the fixed limits are
+ * honored.
*/
void blk_queue_merge_bvec(request_queue_t *q, merge_bvec_fn *mbfn)
{
--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340
On Fri, Jan 02 2004, Peter Osterlund wrote:
> > Does UDF use mpage? The fact that it doesn't trigger on UDF doesn't mean
> > that packet writing isn't breaking the API :)
>
> Agreed, this was not meant to excuse the packet writing code, just
> some additional trivia.
>
> Btw, is this API documented somewhere, or does it have to be reverse
> engineered by means of understanding implementation details in
> mpage_writepage() and similar functions? ;) May I suggest this patch?
>
> --- linux/drivers/block/ll_rw_blk.c.old 2004-01-02 12:56:55.000000000 +0100
> +++ linux/drivers/block/ll_rw_blk.c 2004-01-02 13:07:25.000000000 +0100
> @@ -173,9 +173,11 @@
> * are dynamic, and thus we have to query the queue whether it is ok to
> * add a new bio_vec to a bio at a given offset or not. If the block device
> * has such limitations, it needs to register a merge_bvec_fn to control
> - * the size of bio's sent to it. Per default now merge_bvec_fn is defined for
> - * a queue, and only the fixed limits are honored.
> - *
> + * the size of bio's sent to it. Note that a block device *must* allow a
> + * single page to be added to an empty bio. The block device driver may want
> + * to use the bio_split() function to deal with these bio's. Per default
> + * no merge_bvec_fn is defined for a queue, and only the fixed limits are
> + * honored.
> */
> void blk_queue_merge_bvec(request_queue_t *q, merge_bvec_fn *mbfn)
> {
I just looked but could not find anything about it; there's been some
talk on this list, but it doesn't look like it ever got documented in
writing. That needs to be fixed for sure, thanks for the patch. It
probably wants documenting in fs/bio.c:bio_add_page() too.
--
Jens Axboe
Jens Axboe <[email protected]> writes:
> I just looked but could not find anything about it; there's been some
> talk on this list, but it doesn't look like it ever got documented in
> writing. That needs to be fixed for sure, thanks for the patch. It
> probably wants documenting in fs/bio.c:bio_add_page() too.
OK, here is an updated patch.
Improved documentation for blk_queue_merge_bvec() and bio_add_page().
linux-petero/drivers/block/ll_rw_blk.c | 8 +++++---
linux-petero/fs/bio.c | 4 +++-
2 files changed, 8 insertions(+), 4 deletions(-)
diff -puN drivers/block/ll_rw_blk.c~block-api-doc drivers/block/ll_rw_blk.c
--- linux/drivers/block/ll_rw_blk.c~block-api-doc 2004-01-02 13:54:38.000000000 +0100
+++ linux-petero/drivers/block/ll_rw_blk.c 2004-01-02 13:55:50.000000000 +0100
@@ -173,9 +173,11 @@ EXPORT_SYMBOL(blk_queue_prep_rq);
* are dynamic, and thus we have to query the queue whether it is ok to
* add a new bio_vec to a bio at a given offset or not. If the block device
* has such limitations, it needs to register a merge_bvec_fn to control
- * the size of bio's sent to it. Per default now merge_bvec_fn is defined for
- * a queue, and only the fixed limits are honored.
- *
+ * the size of bio's sent to it. Note that a block device *must* allow a
+ * single page to be added to an empty bio. The block device driver may want
+ * to use the bio_split() function to deal with these bio's. Per default
+ * no merge_bvec_fn is defined for a queue, and only the fixed limits are
+ * honored.
*/
void blk_queue_merge_bvec(request_queue_t *q, merge_bvec_fn *mbfn)
{
diff -puN fs/bio.c~block-api-doc fs/bio.c
--- linux/fs/bio.c~block-api-doc 2004-01-02 14:00:13.000000000 +0100
+++ linux-petero/fs/bio.c 2004-01-02 14:37:41.000000000 +0100
@@ -290,7 +290,9 @@ int bio_get_nr_vecs(struct block_device
*
* Attempt to add a page to the bio_vec maplist. This can fail for a
* number of reasons, such as the bio being full or target block
- * device limitations.
+ * device limitations. The target block device must not disallow bio's
+ * smaller than PAGE_SIZE, so it is always possible to add a single
+ * page to an empty bio.
*/
int bio_add_page(struct bio *bio, struct page *page, unsigned int len,
unsigned int offset)
_
--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340
On Fri, Jan 02 2004, Peter Osterlund wrote:
> Jens Axboe <[email protected]> writes:
>
> > I just looked but could not find anything about it; there's been some
> > talk on this list, but it doesn't look like it ever got documented in
> > writing. That needs to be fixed for sure, thanks for the patch. It
> > probably wants documenting in fs/bio.c:bio_add_page() too.
>
> OK, here is an updated patch.
>
> Improved documentation for blk_queue_merge_bvec() and bio_add_page().
>
>
> linux-petero/drivers/block/ll_rw_blk.c | 8 +++++---
> linux-petero/fs/bio.c | 4 +++-
> 2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff -puN drivers/block/ll_rw_blk.c~block-api-doc drivers/block/ll_rw_blk.c
> --- linux/drivers/block/ll_rw_blk.c~block-api-doc 2004-01-02 13:54:38.000000000 +0100
> +++ linux-petero/drivers/block/ll_rw_blk.c 2004-01-02 13:55:50.000000000 +0100
> @@ -173,9 +173,11 @@ EXPORT_SYMBOL(blk_queue_prep_rq);
> * are dynamic, and thus we have to query the queue whether it is ok to
> * add a new bio_vec to a bio at a given offset or not. If the block device
> * has such limitations, it needs to register a merge_bvec_fn to control
> - * the size of bio's sent to it. Per default now merge_bvec_fn is defined for
> - * a queue, and only the fixed limits are honored.
> - *
> + * the size of bio's sent to it. Note that a block device *must* allow a
> + * single page to be added to an empty bio. The block device driver may want
> + * to use the bio_split() function to deal with these bio's. Per default
> + * no merge_bvec_fn is defined for a queue, and only the fixed limits are
> + * honored.
> */
> void blk_queue_merge_bvec(request_queue_t *q, merge_bvec_fn *mbfn)
> {
> diff -puN fs/bio.c~block-api-doc fs/bio.c
> --- linux/fs/bio.c~block-api-doc 2004-01-02 14:00:13.000000000 +0100
> +++ linux-petero/fs/bio.c 2004-01-02 14:37:41.000000000 +0100
> @@ -290,7 +290,9 @@ int bio_get_nr_vecs(struct block_device
> *
> * Attempt to add a page to the bio_vec maplist. This can fail for a
> * number of reasons, such as the bio being full or target block
> - * device limitations.
> + * device limitations. The target block device must not disallow bio's
That may not be unparsable, but I'll remove the double negative :).
Otherwise committed, thanks.
--
Jens Axboe
Jens Axboe <[email protected]> writes:
> On Fri, Jan 02 2004, Peter Osterlund wrote:
> >
> > Old versions of the packet writing code did just that, but Jens told
> > me that bio splitting was evil, so when the merge_bvec_fn
> > functionality was added to the kernel, I started to use it.
> >
> > http://lists.suse.com/archive/packet-writing/2002-Aug/0044.html
>
> Splitting is evil, but unfortunately it's a necessary evil... There are
> a few kernel helpers to make supporting it easier (see bio_split()). Not
> so sure how well that'll work for you, you may have to do the grunt work
> yourself.
It seems like bio_split() does exactly what I need. The patch below
makes 2kb blocksize ext2 work and also fixes the performance problem
compared to 4kb blocksize ext2. Thanks to everyone involved for their
help.
--- linux/drivers/block/pktcdvd.c.old 2004-01-02 16:58:52.000000000 +0100
+++ linux/drivers/block/pktcdvd.c 2004-01-02 16:59:26.000000000 +0100
@@ -2083,11 +2083,23 @@
(unsigned long long)bio->bi_sector,
(unsigned long long)(bio->bi_sector + bio_sectors(bio)));
- /* Some debug code to make sure the merge_bvec_fn function is working. */
+ /* Check if we have to split the bio */
{
+ struct bio_pair *bp;
sector_t last_zone;
+ int first_sectors;
+
last_zone = ZONE(bio->bi_sector + bio_sectors(bio) - 1, pd);
- BUG_ON(last_zone != zone);
+ if (last_zone != zone) {
+ BUG_ON(last_zone != zone + pd->settings.size);
+ first_sectors = last_zone - bio->bi_sector;
+ bp = bio_split(bio, bio_split_pool, first_sectors);
+ BUG_ON(!bp);
+ pkt_make_request(q, &bp->bio1);
+ pkt_make_request(q, &bp->bio2);
+ bio_pair_release(bp);
+ return 0;
+ }
}
/*
@@ -2153,6 +2165,15 @@
sector_t zone = ZONE(bio->bi_sector, pd);
int used = ((bio->bi_sector - zone) << 9) + bio->bi_size;
int remaining = (pd->settings.size << 9) - used;
+ int remaining2;
+
+ /*
+ * A bio <= PAGE_SIZE must be allowed. If it crosses a packet
+ * boundary, pkt_make_request() will split the bio.
+ */
+ remaining2 = PAGE_SIZE - bio->bi_size;
+ remaining = max_t(int, remaining, remaining2);
+
BUG_ON(remaining < 0);
return remaining;
}
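The split point chosen by the patch can be checked in isolation (a userspace model of the arithmetic; ZONE() is assumed to round down to a packet boundary):

```c
#include <assert.h>

#define PACKET_SECTORS 128   /* pd->settings.size: 32 * 2048 / 512 */

static unsigned long long zone_of(unsigned long long sector)
{
    return sector - sector % PACKET_SECTORS;
}

/* first_sectors as computed before bio_split(): the sectors of the
 * bio that still belong to the first packet.  Only meaningful when
 * the bio actually crosses a packet boundary. */
static unsigned split_point(unsigned long long bi_sector, unsigned nsectors)
{
    unsigned long long last_zone = zone_of(bi_sector + nsectors - 1);
    return (unsigned)(last_zone - bi_sector);
}
```

A 4kb bio at sector 124 crosses the boundary at sector 128, so split_point(124, 8) is 4: the first fragment keeps 4 sectors (2kb) and the remaining 2kb goes to the next packet.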
--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340
Hi!
> > > > I thought some people here might find this amusing. After I
> > > > quick-formatted a CD-RW with cdrwtool today I then did the
> > > > following:
> > > >
> > > > mke2fs -b 2048 /dev/pktcdvd0
> > > >
> > > > It appears to work OK and I'm compiling a kernel on it as I type
> > > > this. pktcdvd can't handle this properly unless the '-b 2048' is
> > > > given for a pretty obvious reason. df tells me that the capacity of
> > > > a disc formatted like this is 530MB, so a few MB are lost
> > > > compared to UDF. I'm pretty sure this is a foolish thing to do, so
> > > > would anyone care to give me some technical reasons why? :)
> > >
> > > Yes, I have tested this too. It works in 2.4 but not in 2.5. In 2.5
> > > sr.c complains about unaligned transfers when I try to mount the
> > > filesystem. 2.4 has code (sr_scatter_pad) to handle unaligned
> > > transfers, but that code was removed in 2.5, and I haven't yet
> > > investigated how it is supposed to work without that code.
> >
> > There's supposed to be a reblocking layer in the middle, a la loop. This
> > doesn't really work well for pktcdvd of course, so I'd suggest writing a
> > helper for reading and writing sub-block size units.
>
> I finally decided to investigate why this didn't work in the 2.5/2.6 code.
> (My tests were made with the 2.6.1-rc1 code and the latest packet patch.)
> There are two problems. The first is that the packet writing code forgot
> to set the hardsect size. This is fixed by the following patch.
>
> --- linux/drivers/block/pktcdvd.c.old 2004-01-02 00:23:57.000000000 +0100
> +++ linux/drivers/block/pktcdvd.c 2004-01-02 00:24:01.000000000 +0100
> @@ -2164,6 +2164,7 @@
> request_queue_t *q = disks[pkt_get_minor(pd)]->queue;
>
> blk_queue_make_request(q, pkt_make_request);
> + blk_queue_hardsect_size(q, CD_FRAMESIZE);
> blk_queue_max_sectors(q, PACKET_MAX_SECTORS);
> blk_queue_merge_bvec(q, pkt_merge_bvec);
> q->queuedata = pd;
>
Where do I get this file? It does not appear to be in 2.6.0.
[I have a few partly-bad cd-rws, and putting ext2 on them would be
"cool" :-)]
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Pavel Machek <[email protected]> writes:
> > --- linux/drivers/block/pktcdvd.c.old 2004-01-02 00:23:57.000000000 +0100
> > +++ linux/drivers/block/pktcdvd.c 2004-01-02 00:24:01.000000000 +0100
> > @@ -2164,6 +2164,7 @@
> > request_queue_t *q = disks[pkt_get_minor(pd)]->queue;
> >
> > blk_queue_make_request(q, pkt_make_request);
> > + blk_queue_hardsect_size(q, CD_FRAMESIZE);
> > blk_queue_max_sectors(q, PACKET_MAX_SECTORS);
> > blk_queue_merge_bvec(q, pkt_merge_bvec);
> > q->queuedata = pd;
> >
>
> Where do I get this file? It does not appear to be in 2.6.0.
>
> [I have a few partly-bad cd-rws, and putting ext2 on them would be
> "cool" :-)]
Look here:
http://w1.894.telia.com/~u89404340/packet.html
Download here:
http://w1.894.telia.com/~u89404340/patches/packet/2.6/
Note that this software is still beta quality at best. I'm not sure if
you'll have any success with partly bad cd-rws, but it doesn't hurt to
try, I guess.
I haven't made a new release with the two latest bug fixes, so if you
plan to put ext2 on the disc, you need the quoted hardsect_size patch
and the bio split patch from a few days ago.
http://marc.theaimsgroup.com/?l=linux-kernel&m=107306015810846&w=2
--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340
Jens Axboe <[email protected]> writes:
> On Fri, Jan 02 2004, Peter Osterlund wrote:
> >
> > Old versions of the packet writing code did just that, but Jens told
> > me that bio splitting was evil, so when the merge_bvec_fn
> > functionality was added to the kernel, I started to use it.
> >
> > http://lists.suse.com/archive/packet-writing/2002-Aug/0044.html
>
> Splitting is evil, but unfortunately it's a necessary evil... There are
> a few kernel helpers to make supporting it easier (see bio_split()). Not
> so sure how well that'll work for you, you may have to do the grunt work
> yourself.
It seems like bio_split() does exactly what I need. The patch below
makes 2kb blocksize ext2 work and also fixes the performance problem
compared to 4kb blocksize ext2. Thanks to everyone involved for their
help.
--- linux/drivers/block/pktcdvd.c.old 2004-01-02 16:58:52.000000000 +0100
+++ linux/drivers/block/pktcdvd.c 2004-01-02 16:59:26.000000000 +0100
@@ -2083,11 +2083,23 @@
(unsigned long long)bio->bi_sector,
(unsigned long long)(bio->bi_sector + bio_sectors(bio)));
- /* Some debug code to make sure the merge_bvec_fn function is working. */
+ /* Check if we have to split the bio */
{
+ struct bio_pair *bp;
sector_t last_zone;
+ int first_sectors;
+
last_zone = ZONE(bio->bi_sector + bio_sectors(bio) - 1, pd);
- BUG_ON(last_zone != zone);
+ if (last_zone != zone) {
+ BUG_ON(last_zone != zone + pd->settings.size);
+ first_sectors = last_zone - bio->bi_sector;
+ bp = bio_split(bio, bio_split_pool, first_sectors);
+ BUG_ON(!bp);
+ pkt_make_request(q, &bp->bio1);
+ pkt_make_request(q, &bp->bio2);
+ bio_pair_release(bp);
+ return 0;
+ }
}
/*
@@ -2153,6 +2165,15 @@
sector_t zone = ZONE(bio->bi_sector, pd);
int used = ((bio->bi_sector - zone) << 9) + bio->bi_size;
int remaining = (pd->settings.size << 9) - used;
+ int remaining2;
+
+ /*
+ * A bio <= PAGE_SIZE must be allowed. If it crosses a packet
+ * boundary, pkt_make_request() will split the bio.
+ */
+ remaining2 = PAGE_SIZE - bio->bi_size;
+ remaining = max_t(int, remaining, remaining2);
+
BUG_ON(remaining < 0);
return remaining;
}
--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340