2005-12-09 22:09:25

by Hugh Dickins

[permalink] [raw]
Subject: [PATCH] PageCompound avoid page[1].mapping

If a compound page has its own put_page_testzero destructor (the only
current example is free_huge_page), that is noted in page[1].mapping of
the compound page. But David Gibson's recent fix to access_process_vm
shows that to be rather a poor place to keep it: functions which call
set_page_dirty(_lock) after get_user_pages ought to check !PageCompound
first, otherwise set_page_dirty may crash on what's not the address of
a struct address_space; but Infiniband for one is unaware of this issue.

Even if we fixed all callers, or set_page_dirty(_lock) itself, it would
still be unsatisfactory: e.g. get_user_pages calls flush_dcache_page,
which involves page->mapping on some architectures - not a problem while
hugetlb goes its own way in get_user_pages, but needs a test if another
compound page destructor were added. page->mapping is used too widely
to be a safe field to reuse in this way.

The safest field to reuse, given how PageCompound redirects callers to
the page count of the first page, is actually the _count field of the
second page: save order (only used for debug) there, and move destructor
address from mapping to index. Add __page_count inline for internal
debug use - to avoid reliance on page_private when page is in doubt.

Revert David's mod to access_process_vm, no longer required. But leave
the PageCompound tests in fs/bio.c and fs/direct-io.c: perhaps those are
worthwhile optimizations when working on hugetlb areas.

Signed-off-by: Hugh Dickins <[email protected]>
---

kernel/ptrace.c | 3 +--
mm/hugetlb.c | 5 ++---
mm/page_alloc.c | 25 ++++++++++++++++++-------
mm/swap.c | 2 +-
4 files changed, 22 insertions(+), 13 deletions(-)

--- 2.6.15-rc5/kernel/ptrace.c 2005-12-04 06:48:17.000000000 +0000
+++ linux/kernel/ptrace.c 2005-12-09 20:45:21.000000000 +0000
@@ -241,8 +241,7 @@ int access_process_vm(struct task_struct
if (write) {
copy_to_user_page(vma, page, addr,
maddr + offset, buf, bytes);
- if (!PageCompound(page))
- set_page_dirty_lock(page);
+ set_page_dirty_lock(page);
} else {
copy_from_user_page(vma, page, addr,
buf, maddr + offset, bytes);
--- 2.6.15-rc5/mm/hugetlb.c 2005-12-04 06:48:17.000000000 +0000
+++ linux/mm/hugetlb.c 2005-12-09 20:45:21.000000000 +0000
@@ -78,7 +78,7 @@ void free_huge_page(struct page *page)
BUG_ON(page_count(page));

INIT_LIST_HEAD(&page->lru);
- page[1].mapping = NULL;
+ page[1].index = 0; /* reset dtor */

spin_lock(&hugetlb_lock);
enqueue_huge_page(page);
@@ -98,7 +98,7 @@ struct page *alloc_huge_page(void)
}
spin_unlock(&hugetlb_lock);
set_page_count(page, 1);
- page[1].mapping = (void *)free_huge_page;
+ page[1].index = (unsigned long)free_huge_page; /* set dtor */
for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
clear_highpage(&page[i]);
return page;
@@ -147,7 +147,6 @@ static void update_and_free_page(struct
page[i].flags &= ~(1 << PG_locked | 1 << PG_error | 1 << PG_referenced |
1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
1 << PG_private | 1<< PG_writeback);
- set_page_count(&page[i], 0);
}
set_page_count(page, 1);
__free_pages(page, HUGETLB_PAGE_ORDER);
--- 2.6.15-rc5/mm/page_alloc.c 2005-12-04 06:48:17.000000000 +0000
+++ linux/mm/page_alloc.c 2005-12-09 20:45:21.000000000 +0000
@@ -122,13 +122,22 @@ static int bad_range(struct zone *zone,
return 0;
}

+/*
+ * Ignore PageCompound when checking page_count for debugging -
+ * page_private might be corrupt; but never expose this to wider use.
+ */
+static inline int __page_count(struct page *page)
+{
+ return atomic_read(&page->_count) + 1;
+}
+
static void bad_page(const char *function, struct page *page)
{
printk(KERN_EMERG "Bad page state at %s (in process '%s', page %p)\n",
function, current->comm, page);
printk(KERN_EMERG "flags:0x%0*lx mapping:%p mapcount:%d count:%d\n",
(int)(2*sizeof(unsigned long)), (unsigned long)page->flags,
- page->mapping, page_mapcount(page), page_count(page));
+ page->mapping, page_mapcount(page), __page_count(page));
printk(KERN_EMERG "Backtrace:\n");
dump_stack();
printk(KERN_EMERG "Trying to fix it up, but a reboot is needed\n");
@@ -157,10 +166,10 @@ static void bad_page(const char *functio
* All pages have PG_compound set. All pages have their ->private pointing at
* the head page (even the head page has this).
*
- * The first tail page's ->mapping, if non-zero, holds the address of the
+ * The first tail page's ->index, if non-zero, holds the address of the
* compound page's put_page() function.
*
- * The order of the allocation is stored in the first tail page's ->index
+ * The order of the allocation is stored in the first tail page's ->count.
* This is only for debug at present. This usage means that zero-order pages
* may not be compound.
*/
@@ -169,8 +178,9 @@ static void prep_compound_page(struct pa
int i;
int nr_pages = 1 << order;

- page[1].mapping = NULL;
- page[1].index = order;
+ set_page_count(&page[1], order);
+ page[1].index = 0; /* reset dtor */
+
for (i = 0; i < nr_pages; i++) {
struct page *p = page + i;

@@ -187,8 +197,9 @@ static void destroy_compound_page(struct
if (!PageCompound(page))
return;

- if (page[1].index != order)
+ if (__page_count(&page[1]) != order)
bad_page(__FUNCTION__, page);
+ set_page_count(&page[1], 0);

for (i = 0; i < nr_pages; i++) {
struct page *p = page + i;
@@ -403,7 +414,7 @@ void __free_pages_ok(struct page *page,

#ifndef CONFIG_MMU
if (order > 0)
- for (i = 1 ; i < (1 << order) ; ++i)
+ for (i = 1 + PageCompound(page); i < (1 << order); ++i)
__put_page(page + i);
#endif

--- 2.6.15-rc5/mm/swap.c 2005-12-04 06:48:17.000000000 +0000
+++ linux/mm/swap.c 2005-12-09 20:45:21.000000000 +0000
@@ -41,7 +41,7 @@ void put_page(struct page *page)
if (put_page_testzero(page)) {
void (*dtor)(struct page *page);

- dtor = (void (*)(struct page *))page[1].mapping;
+ dtor = (void (*)(struct page *))page[1].index;
(*dtor)(page);
}
return;


2005-12-09 22:35:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] PageCompound avoid page[1].mapping

Hugh Dickins <[email protected]> wrote:
>
> If a compound page has its own put_page_testzero destructor (the only
> current example is free_huge_page), that is noted in page[1].mapping of
> the compound page. But David Gibson's recent fix to access_process_vm
> shows that to be rather a poor place to keep it: functions which call
> set_page_dirty(_lock) after get_user_pages ought to check !PageCompound
> first, otherwise set_page_dirty may crash on what's not the address of
> a struct address_space; but Infiniband for one is unaware of this issue.
>
> Even if we fixed all callers, or set_page_dirty(_lock) itself, it would
> still be unsatisfactory: e.g. get_user_pages calls flush_dcache_page,
> which involves page->mapping on some architectures - not a problem while
> hugetlb goes its own way in get_user_pages, but needs a test if another
> compound page destructor were added. page->mapping is used too widely
> to be a safe field to reuse in this way.
>
> The safest field to reuse, given how PageCompound redirects callers to
> the page count of the first page, is actually the _count field of the
> second page: save order (only used for debug) there, and move destructor
> address from mapping to index. Add __page_count inline for internal
> debug use - to avoid reliance on page_private when page is in doubt.
>
> Revert David's mod to access_process_vm, no longer required. But leave
> the PageCompound tests in fs/bio.c and fs/direct-io.c: perhaps those are
> worthwhile optimizations when working on hugetlb areas.
>
> ...
> - page[1].mapping = (void *)free_huge_page;
> + page[1].index = (unsigned long)free_huge_page; /* set dtor */

This is a little awkward, IMO. page.index is actually pgoff_t, not
unsigned long. Conceivably someday someone may want to make pgoff_t 64-bit
on 32-bit machines, for example. Yes, that'll still work, but it's still
awkward.

Is it not possible to put the dtor address into some address-type field
within the pageframe rather than into an integer-type?

Or just leave it using page.mapping? Anyone who uses page.mapping of a
tail page in a compound page should go oops.

> +/*
> + * Ignore PageCompound when checking page_count for debugging -
> + * page_private might be corrupt; but never expose this to wider use.
> + */
> +static inline int __page_count(struct page *page)
> +{
> + return atomic_read(&page->_count) + 1;
> +}

hm, spose so. Nick has a patch which unskews page.count which will need
updating for this.

<checks>

Looks like I didn't apply Nick's patch yet - mm/ in -mm has a nutty amount
of stuff in it now.

2005-12-10 07:03:39

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] PageCompound avoid page[1].mapping

On Fri, Dec 09, 2005 at 09:56:42PM +0000, Hugh Dickins wrote:
> - page[1].mapping = (void *)free_huge_page;
> + page[1].index = (unsigned long)free_huge_page; /* set dtor */
> for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
> clear_highpage(&page[i]);
> return page;

Hmm, ->lru would be nicer. What prompted the use of ->index?


-- wli

2005-12-12 00:27:18

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH] PageCompound avoid page[1].mapping

On Fri, Dec 09, 2005 at 09:56:42PM +0000, Hugh Dickins wrote:
> If a compound page has its own put_page_testzero destructor (the only
> current example is free_huge_page), that is noted in page[1].mapping of
> the compound page. But David Gibson's recent fix to access_process_vm
> shows that to be rather a poor place to keep it: functions which call
> set_page_dirty(_lock) after get_user_pages ought to check !PageCompound
> first, otherwise set_page_dirty may crash on what's not the address of
> a struct address_space; but Infiniband for one is unaware of this issue.
>
> Even if we fixed all callers, or set_page_dirty(_lock) itself, it would
> still be unsatisfactory: e.g. get_user_pages calls flush_dcache_page,
> which involves page->mapping on some architectures - not a problem while
> hugetlb goes its own way in get_user_pages, but needs a test if another
> compound page destructor were added. page->mapping is used too widely
> to be a safe field to reuse in this way.
>
> The safest field to reuse, given how PageCompound redirects callers to
> the page count of the first page, is actually the _count field of the
> second page: save order (only used for debug) there, and move destructor
> address from mapping to index. Add __page_count inline for internal
> debug use - to avoid reliance on page_private when page is in doubt.

It's not clear to me this is a good way to go. It will make things
work neatly in the case where the correct action on a compound page is
to do nothing (and we already have a mapping == NULL test). However
it will fail silently in cases where we actually need to get at the
mapping for a compound page (so we need to check for CompoundPage and
find the master page). The existing code will probably blow up in
that case, at least showing there's a problem.

> Revert David's mod to access_process_vm, no longer required. But leave
> the PageCompound tests in fs/bio.c and fs/direct-io.c: perhaps those are
> worthwhile optimizations when working on hugetlb areas.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2005-12-12 10:47:01

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] PageCompound avoid page[1].mapping

William Lee Irwin III wrote:
> On Fri, Dec 09, 2005 at 09:56:42PM +0000, Hugh Dickins wrote:
>
>>- page[1].mapping = (void *)free_huge_page;
>>+ page[1].index = (unsigned long)free_huge_page; /* set dtor */
>> for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
>> clear_highpage(&page[i]);
>> return page;
>
>
> Hmm, ->lru would be nicer. What prompted the use of ->index?
>

Yes, agreed. That would rid you of __page_count() too.

Send instant messages to your online friends http://au.messenger.yahoo.com