2008-10-10 14:06:42

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [patch 4/8] mm: write_cache_pages type overflow fix

On Fri, Oct 10, 2008 at 02:48:02PM +0100, Steven Whitehouse wrote:
> I've not looked at ext4's copy of write_cache_pages, but there is also a
> copy in GFS2. Its used only for journaled data, and it is pretty much a
> direct copy of write_cache_pages except that its split into two so that
> a transaction can be opened in the "middle".
>
> Perhaps it would be possible to make some changes so that we can both
> share the "core" version of write_cache_pages. My plan was to wait until
> Nick's patches have made it to Linus, and then to look into what might
> be done,

To be clear; ext4 doesn't have its own copy of write_cache_pages (at
least not yet); there is a patch that creates our own copy, mainly to
disable updates to writeback_index, range_start, and nr_to_write in
the wbc structure.

Christoph has suggested a patch which modifies write_cache_pages() so
that a filesystem could set a flag which disables those updates,
instead of just making a copy of write_cache_pages. (Maybe eventually
we would get rid of those updates unconditionally; it's not clear to
me though that this makes sense for all filesystems.)

So we have three choices as far as getting the 10x for (large)
streaming write performance into 2.6.28:

1) Aneesh's first patch, which called write_cache_pages()
and then undid the effects of the updates to the relevant wbc fields.
(http://lkml.org/lkml/2008/10/6/61)

2) Aneesh's second version of the patch, which copied
write_cache_pages() into ext4_write_cache_pages() and then removed the
updates; resulting in a large patch, but one that might be easier to
understand, although harder to maintain in the long term.
(http://lkml.org/lkml/2008/10/7/52)

3) A version which (optionally via a flag in the wbc structure)
instructs write_cache_pages() to not pursue those updates. This has
not been written yet.

For why we need to do this, see Aneesh's explanation here:

http://lkml.org/lkml/2008/10/7/78

If we don't think the Nick's patches are going to be stable enough for
merging in time for 2.6.28, it's possible we could pursue (1) or (2),
and if there's -mm concurrence, even (3). (1) might be the best if
the goal is to wait for Nick's patches to hit mainline first, and then
we can try to sort and merge our per-filesystems unique hacks (or
copies of write_cache_pages or whatever) back to the upstream version.

All of this being said, I'll confess that I have *not* had time to
look deeply at Nick's full patchset yet. Which has been another
reason why I haven't queued up any of Aneesh's patches in this area
yet; all of this hit just right before the merge window opened up, and
I've been insanely busy as of late.

- Ted


2008-10-10 14:08:29

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 4/8] mm: write_cache_pages type overflow fix

On Fri, Oct 10, 2008 at 10:05:35AM -0400, Theodore Tso wrote:
> 3) A version which (optionally via a flag in the wbc structure)
> instructs write_cache_pages() to not pursue those updates. This has
> not been written yet.

This one sounds best to me (although we'd have to actualy see it..)

2008-10-10 15:54:47

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [patch 4/8] mm: write_cache_pages type overflow fix

On Fri, Oct 10, 2008 at 10:08:29AM -0400, Christoph Hellwig wrote:
> On Fri, Oct 10, 2008 at 10:05:35AM -0400, Theodore Tso wrote:
> > 3) A version which (optionally via a flag in the wbc structure)
> > instructs write_cache_pages() to not pursue those updates. This has
> > not been written yet.
>
> This one sounds best to me (although we'd have to actualy see it..)

something like the below ?

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index bd91987..7599af2 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -63,6 +63,8 @@ struct writeback_control {
unsigned for_writepages:1; /* This is a writepages() call */
unsigned range_cyclic:1; /* range_start is cyclic */
unsigned more_io:1; /* more io to be dispatched */
+ /* flags which control the write_cache_pages behaviour */
+ int writeback_flags;
};

/*
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 718efa6..c198ead 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -876,11 +876,18 @@ int write_cache_pages(struct address_space *mapping,
pgoff_t end; /* Inclusive */
int scanned = 0;
int range_whole = 0;
+ int flags = wbc->writeback_flags;
+ long *nr_to_write, count;

if (wbc->nonblocking && bdi_write_congested(bdi)) {
wbc->encountered_congestion = 1;
return 0;
}
+ if (flags & WB_NO_NRWRITE_UPDATE) {
+ count = wbc->nr_to_write;
+ nr_to_write = &count;
+ } else
+ nr_to_write = &wbc->nr_to_write;

pagevec_init(&pvec, 0);
if (wbc->range_cyclic) {
@@ -939,7 +946,7 @@ int write_cache_pages(struct address_space *mapping,
unlock_page(page);
ret = 0;
}
- if (ret || (--(wbc->nr_to_write) <= 0))
+ if (ret || (--(*nr_to_write) <= 0))
done = 1;
if (wbc->nonblocking && bdi_write_congested(bdi)) {
wbc->encountered_congestion = 1;
@@ -958,8 +965,11 @@ int write_cache_pages(struct address_space *mapping,
index = 0;
goto retry;
}
- if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
+ if ((wbc->range_cyclic ||
+ (range_whole && wbc->nr_to_write > 0)) &&
+ (flags & ~WB_NO_INDEX_UPDATE)) {
mapping->writeback_index = index;
+ }

return ret;
}

2008-10-10 15:59:18

by Chris Mason

[permalink] [raw]
Subject: Re: [patch 4/8] mm: write_cache_pages type overflow fix

On Fri, 2008-10-10 at 21:24 +0530, Aneesh Kumar K.V wrote:
> On Fri, Oct 10, 2008 at 10:08:29AM -0400, Christoph Hellwig wrote:
> > On Fri, Oct 10, 2008 at 10:05:35AM -0400, Theodore Tso wrote:
> > > 3) A version which (optionally via a flag in the wbc structure)
> > > instructs write_cache_pages() to not pursue those updates. This has
> > > not been written yet.
> >
> > This one sounds best to me (although we'd have to actualy see it..)
>
> something like the below ?
>
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index bd91987..7599af2 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -63,6 +63,8 @@ struct writeback_control {
> unsigned for_writepages:1; /* This is a writepages() call */
> unsigned range_cyclic:1; /* range_start is cyclic */
> unsigned more_io:1; /* more io to be dispatched */
> + /* flags which control the write_cache_pages behaviour */
> + int writeback_flags;
> };

Doesn't seem in line with the bitflag method currently used in struct
writeback_control.

-chris


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2008-10-10 16:11:55

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [patch 4/8] mm: write_cache_pages type overflow fix

On Fri, Oct 10, 2008 at 09:24:47PM +0530, Aneesh Kumar K.V wrote:
> On Fri, Oct 10, 2008 at 10:08:29AM -0400, Christoph Hellwig wrote:
> > On Fri, Oct 10, 2008 at 10:05:35AM -0400, Theodore Tso wrote:
> > > 3) A version which (optionally via a flag in the wbc structure)
> > > instructs write_cache_pages() to not pursue those updates. This has
> > > not been written yet.
> >
> > This one sounds best to me (although we'd have to actualy see it..)
>
> something like the below ?
>
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index bd91987..7599af2 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -63,6 +63,8 @@ struct writeback_control {
> unsigned for_writepages:1; /* This is a writepages() call */
> unsigned range_cyclic:1; /* range_start is cyclic */
> unsigned more_io:1; /* more io to be dispatched */
> + /* flags which control the write_cache_pages behaviour */
> + int writeback_flags;
> };

I don't see a definition for WB_NO_NRWRITE_UPDATE and
WB_NO_INDEX_UPDATE in your patch?

Given the structure seems to be using bitfields for all of the other
fields, why not do this instead?

unsigned no_nrwrite_update:1;
unsigned no_index_update:1;

Personally, I'm old school, and prefer using an int flag field and
using #define's for flags, but the rest of the structure is using
bitfields for flags, and it's probably better to be consistent...


- Ted

2008-10-10 16:38:19

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 4/8] mm: write_cache_pages type overflow fix

On Fri, Oct 10, 2008 at 09:24:47PM +0530, Aneesh Kumar K.V wrote:
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index bd91987..7599af2 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -63,6 +63,8 @@ struct writeback_control {
> unsigned for_writepages:1; /* This is a writepages() call */
> unsigned range_cyclic:1; /* range_start is cyclic */
> unsigned more_io:1; /* more io to be dispatched */
> + /* flags which control the write_cache_pages behaviour */
> + int writeback_flags;

As Ted already said please follow the bitfields style already used.

> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -876,11 +876,18 @@ int write_cache_pages(struct address_space *mapping,
> pgoff_t end; /* Inclusive */
> int scanned = 0;
> int range_whole = 0;
> + int flags = wbc->writeback_flags;
> + long *nr_to_write, count;
>
> if (wbc->nonblocking && bdi_write_congested(bdi)) {
> wbc->encountered_congestion = 1;
> return 0;
> }
> + if (flags & WB_NO_NRWRITE_UPDATE) {
> + count = wbc->nr_to_write;
> + nr_to_write = &count;
> + } else
> + nr_to_write = &wbc->nr_to_write;

I think we'd be better off always using a local variable and updating
wbc->nr_to_write again before the exit for the !WB_NO_NRWRITE_UPDATE
case.

> - if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
> + if ((wbc->range_cyclic ||
> + (range_whole && wbc->nr_to_write > 0)) &&
> + (flags & ~WB_NO_INDEX_UPDATE)) {
> mapping->writeback_index = index;

The conditional looks rather odd, what about:

if (!wbc->no_index_update &&
(wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))

Also I wonder what this is for. Do you want what Chris did in his
original patch in ext4 code, or is there another reason to not update
the writeback_index sometimes?