2005-11-03 19:27:51

by Hugh Dickins

[permalink] [raw]
Subject: [PATCH] mm: poison struct page for ptlock

The split ptlock patch enlarged the default SMP PREEMPT struct page from
32 to 36 bytes on most 32-bit platforms, from 32 to 44 bytes on PA-RISC
7xxx (without PREEMPT). That was not my intention, and I don't believe
that split ptlock deserves any such slice of the user's memory.

While leaving most of the page_private() mods in place for the moment,
could we please try this patch, or something like it? Again to overlay
the spinlock_t from &page->private onwards, with corrected BUILD_BUG_ON
that we don't go beyond ->lru; with poisoning of the fields overlaid,
and unsplit config verifying that the split config is safe to use them.

Signed-off-by: Hugh Dickins <[email protected]>
---

include/linux/mm.h | 69 +++++++++++++++++++++++++++++++++++++++++++----------
1 files changed, 56 insertions(+), 13 deletions(-)

--- 2.6.14-git6/include/linux/mm.h 2005-11-03 18:38:01.000000000 +0000
+++ linux/include/linux/mm.h 2005-11-03 18:46:06.000000000 +0000
@@ -226,18 +226,19 @@ struct page {
* to show when page is mapped
* & limit reverse map searches.
*/
- union {
- unsigned long private; /* Mapping-private opaque data:
+ unsigned long private; /* Mapping-private opaque data:
* usually used for buffer_heads
* if PagePrivate set; used for
* swp_entry_t if PageSwapCache
* When page is free, this indicates
* order in the buddy system.
*/
-#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
- spinlock_t ptl;
-#endif
- } u;
+ /*
+ * Along with private, the mapping, index and lru fields of a
+ * page table page's struct page may be overlaid by a spinlock
+ * for pte locking: see comment on "split ptlock" below. Please
+ * do not rearrange these fields without considering that usage.
+ */
struct address_space *mapping; /* If low bit clear, points to
* inode address_space, or NULL.
* If page mapped as anonymous
@@ -265,8 +266,8 @@ struct page {
#endif /* WANT_PAGE_VIRTUAL */
};

-#define page_private(page) ((page)->u.private)
-#define set_page_private(page, v) ((page)->u.private = (v))
+#define page_private(page) ((page)->private)
+#define set_page_private(page, v) ((page)->private = (v))

/*
* FIXME: take this include out, include page-flags.h in
@@ -787,25 +788,67 @@ static inline pmd_t *pmd_alloc(struct mm
}
#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */

+/*
+ * In the split ptlock case, we shall be overlaying the struct page
+ * of a page table page with a spinlock starting at &page->private,
+ * ending dependent on architecture and config, but never beyond lru.
+ *
+ * So poison the struct page in all cases (in part to assert our
+ * territory: that pte locking owns these fields of a page table
+ * struct page), and verify it when freeing in the unsplit ptlock
+ * case, when none of these fields should have been touched.
+ * Poison lru back-to-front, to make sure list_del was not used.
+ *
+ * The time may come when important configs requiring split ptlock
+ * have a spinlock_t which cannot fit here: then kmalloc a spinlock_t
+ * (perhaps in its own cacheline) and keep the pointer in struct page.
+ */
+static inline void poison_struct_page(struct page *page)
+{
+ page->private = (unsigned long) page;
+ page->mapping = (struct address_space *) page;
+ page->index = (pgoff_t) page;
+ page->lru.next = LIST_POISON2;
+ page->lru.prev = LIST_POISON1;
+}
+
+static inline void verify_struct_page(struct page *page)
+{
+ BUG_ON(page->private != (unsigned long) page);
+ BUG_ON(page->mapping != (struct address_space *) page);
+ BUG_ON(page->index != (pgoff_t) page);
+ BUG_ON(page->lru.next != LIST_POISON2);
+ BUG_ON(page->lru.prev != LIST_POISON1);
+ /*
+ * Reset page->mapping so free_pages_check won't complain.
+ */
+ page->mapping = NULL;
+}
+
#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
/*
* We tuck a spinlock to guard each pagetable page into its struct page,
* at page->private, with BUILD_BUG_ON to make sure that this will not
- * overflow into the next struct page (as it might with DEBUG_SPINLOCK).
- * When freeing, reset page->mapping so free_pages_check won't complain.
+ * overflow beyond page->lru (as it might with PA-RISC DEBUG_SPINLOCK).
*/
-#define __pte_lockptr(page) &((page)->u.ptl)
+#define __pte_lockptr(page) ((spinlock_t *)&((page)->private))
#define pte_lock_init(_page) do { \
+ BUILD_BUG_ON(__pte_lockptr((struct page *)0) + 1 > \
+ (spinlock_t *)(&((struct page *)0)->lru + 1)); \
+ poison_struct_page(_page); \
spin_lock_init(__pte_lockptr(_page)); \
} while (0)
+/*
+ * When freeing, reset page->mapping so free_pages_check won't complain.
+ */
#define pte_lock_deinit(page) ((page)->mapping = NULL)
#define pte_lockptr(mm, pmd) ({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
#else
/*
* We use mm->page_table_lock to guard all pagetable pages of the mm.
*/
-#define pte_lock_init(page) do {} while (0)
-#define pte_lock_deinit(page) do {} while (0)
+#define pte_lock_init(page) poison_struct_page(page)
+#define pte_lock_deinit(page) verify_struct_page(page)
#define pte_lockptr(mm, pmd) ({(void)(pmd); &(mm)->page_table_lock;})
#endif /* NR_CPUS < CONFIG_SPLIT_PTLOCK_CPUS */


2005-11-05 05:32:34

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

Hugh Dickins <[email protected]> wrote:
>
> The split ptlock patch enlarged the default SMP PREEMPT struct page from
> 32 to 36 bytes on most 32-bit platforms, from 32 to 44 bytes on PA-RISC
> 7xxx (without PREEMPT). That was not my intention, and I don't believe
> that split ptlock deserves any such slice of the user's memory.

Only with preempt and >= 4 CPUs. Vendors don't ship preemptible kernels,
especially on SMP. So the impact is minor.

> While leaving most of the page_private() mods in place for the moment,
> could we please try this patch, or something like it? Again to overlay
> the spinlock_t from &page->private onwards, with corrected BUILD_BUG_ON
> that we don't go beyond ->lru; with poisoning of the fields overlaid,
> and unsplit config verifying that the split config is safe to use them.

Does your family know you do this sort of thing?

> --- 2.6.14-git6/include/linux/mm.h 2005-11-03 18:38:01.000000000 +0000
> +++ linux/include/linux/mm.h 2005-11-03 18:46:06.000000000 +0000
> @@ -226,18 +226,19 @@ struct page {
> * to show when page is mapped
> * & limit reverse map searches.
> */
> - union {
> - unsigned long private; /* Mapping-private opaque data:
> + unsigned long private; /* Mapping-private opaque data:
> * usually used for buffer_heads
> * if PagePrivate set; used for
> * swp_entry_t if PageSwapCache
> * When page is free, this indicates
> * order in the buddy system.
> */
> -#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
> - spinlock_t ptl;
> -#endif
> - } u;
> + /*
> + * Along with private, the mapping, index and lru fields of a
> + * page table page's struct page may be overlaid by a spinlock
> + * for pte locking: see comment on "split ptlock" below. Please
> + * do not rearrange these fields without considering that usage.
> + */
> struct address_space *mapping; /* If low bit clear, points to
> * inode address_space, or NULL.
> * If page mapped as anonymous

What happened to my suggestion that we use anonymous structs here, and
abandon gcc-2.9x?

> @@ -265,8 +266,8 @@ struct page {
> #endif /* WANT_PAGE_VIRTUAL */
> };
>
> -#define page_private(page) ((page)->u.private)
> -#define set_page_private(page, v) ((page)->u.private = (v))
> +#define page_private(page) ((page)->private)
> +#define set_page_private(page, v) ((page)->private = (v))

Need to rename ->private to ->_private here, otherwise people will start
using page->private again.

2005-11-05 06:41:46

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

On Fri, 4 Nov 2005, Andrew Morton wrote:
> Hugh Dickins <[email protected]> wrote:
>
> > While leaving most of the page_private() mods in place for the moment,
> > could we please try this patch, or something like it? Again to overlay
> > the spinlock_t from &page->private onwards, with corrected BUILD_BUG_ON
> > that we don't go beyond ->lru; with poisoning of the fields overlaid,
> > and unsplit config verifying that the split config is safe to use them.
>
> Does your family know you do this sort of thing?

They are beginning to suspect it, so I've made sure they're still asleep.

> What happened to my suggestion that we use anonymous structs here, and
> abandon gcc-2.9x?

Much as I'd hate split ptlock to enlarge struct page, I'd hate it to be
the last straw that forced you off gcc-2.9x: it does not deserve that.

But also, I just misunderstood you until now: I'd been stupidly thinking
you meant an anonymous union in place of your current u there; whereas
you meant extending the union to cover mapping, index, lru. Hmm.

Well, if you're actively seeking a excuse to abandon gcc-2.9x? But
although you're its best-known advocate, I think you're not alone.
Shall we wait for a more pressing reason?

> > -#define page_private(page) ((page)->u.private)
> > -#define set_page_private(page, v) ((page)->u.private = (v))
> > +#define page_private(page) ((page)->private)
> > +#define set_page_private(page, v) ((page)->private = (v))
>
> Need to rename ->private to ->_private here, otherwise people will start
> using page->private again.

I'm happy for people to use page->private again. I thought we
ought to try out this minimal patch first, for reassurance that the
arches are not using those fields of a pagetable struct page; but that
once it's had some exposure, we'd revert the page_private mods elsewhere.
There's no point to that wrapper with the union gone, is there?

Hugh

2005-11-05 07:17:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

Hugh Dickins <[email protected]> wrote:
>
> I'm happy for people to use page->private again. I thought we
> ought to try out this minimal patch first, for reassurance that the
> arches are not using those fields of a pagetable struct page; but that
> once it's had some exposure, we'd revert the page_private mods elsewhere.
> There's no point to that wrapper with the union gone, is there?

I'd be inclined to leave it there - we wrap access to most of the rest of
struct page. It might come in handy for the next brainwave - dunno.

2005-11-06 19:28:51

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

Hugh Dickins <[email protected]> wrote:
>
> The split ptlock patch enlarged the default SMP PREEMPT struct page from
> 32 to 36 bytes on most 32-bit platforms, from 32 to 44 bytes on PA-RISC
> 7xxx (without PREEMPT). That was not my intention, and I don't believe
> that split ptlock deserves any such slice of the user's memory.
>
> While leaving most of the page_private() mods in place for the moment,
> could we please try this patch, or something like it? Again to overlay
> the spinlock_t from &page->private onwards, with corrected BUILD_BUG_ON
> that we don't go beyond ->lru; with poisoning of the fields overlaid,
> and unsplit config verifying that the split config is safe to use them.
>

This patch makes the ppc64 crash. See
http://www.zip.com.au/~akpm/linux/patches/stuff/dsc02976.jpg

I don't know what the access address was (ia32 nicely tells you), but if
it's `DAR' then we have LIST_POISON1. Which would indicate that the slab
page which backs the mm_struct itself is getting freed-up-pte-page
treatment, which is deeply screwed up.

I'll try it on x86_64 and ia64, see if it's specific to ppc64.

>
> include/linux/mm.h | 69 +++++++++++++++++++++++++++++++++++++++++++----------
> 1 files changed, 56 insertions(+), 13 deletions(-)
>
> --- 2.6.14-git6/include/linux/mm.h 2005-11-03 18:38:01.000000000 +0000
> +++ linux/include/linux/mm.h 2005-11-03 18:46:06.000000000 +0000
> @@ -226,18 +226,19 @@ struct page {
> * to show when page is mapped
> * & limit reverse map searches.
> */
> - union {
> - unsigned long private; /* Mapping-private opaque data:
> + unsigned long private; /* Mapping-private opaque data:
> * usually used for buffer_heads
> * if PagePrivate set; used for
> * swp_entry_t if PageSwapCache
> * When page is free, this indicates
> * order in the buddy system.
> */
> -#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
> - spinlock_t ptl;
> -#endif
> - } u;
> + /*
> + * Along with private, the mapping, index and lru fields of a
> + * page table page's struct page may be overlaid by a spinlock
> + * for pte locking: see comment on "split ptlock" below. Please
> + * do not rearrange these fields without considering that usage.
> + */
> struct address_space *mapping; /* If low bit clear, points to
> * inode address_space, or NULL.
> * If page mapped as anonymous
> @@ -265,8 +266,8 @@ struct page {
> #endif /* WANT_PAGE_VIRTUAL */
> };
>
> -#define page_private(page) ((page)->u.private)
> -#define set_page_private(page, v) ((page)->u.private = (v))
> +#define page_private(page) ((page)->private)
> +#define set_page_private(page, v) ((page)->private = (v))
>
> /*
> * FIXME: take this include out, include page-flags.h in
> @@ -787,25 +788,67 @@ static inline pmd_t *pmd_alloc(struct mm
> }
> #endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
>
> +/*
> + * In the split ptlock case, we shall be overlaying the struct page
> + * of a page table page with a spinlock starting at &page->private,
> + * ending dependent on architecture and config, but never beyond lru.
> + *
> + * So poison the struct page in all cases (in part to assert our
> + * territory: that pte locking owns these fields of a page table
> + * struct page), and verify it when freeing in the unsplit ptlock
> + * case, when none of these fields should have been touched.
> + * Poison lru back-to-front, to make sure list_del was not used.
> + *
> + * The time may come when important configs requiring split ptlock
> + * have a spinlock_t which cannot fit here: then kmalloc a spinlock_t
> + * (perhaps in its own cacheline) and keep the pointer in struct page.
> + */
> +static inline void poison_struct_page(struct page *page)
> +{
> + page->private = (unsigned long) page;
> + page->mapping = (struct address_space *) page;
> + page->index = (pgoff_t) page;
> + page->lru.next = LIST_POISON2;
> + page->lru.prev = LIST_POISON1;
> +}
> +
> +static inline void verify_struct_page(struct page *page)
> +{
> + BUG_ON(page->private != (unsigned long) page);
> + BUG_ON(page->mapping != (struct address_space *) page);
> + BUG_ON(page->index != (pgoff_t) page);
> + BUG_ON(page->lru.next != LIST_POISON2);
> + BUG_ON(page->lru.prev != LIST_POISON1);
> + /*
> + * Reset page->mapping so free_pages_check won't complain.
> + */
> + page->mapping = NULL;
> +}
> +
> #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
> /*
> * We tuck a spinlock to guard each pagetable page into its struct page,
> * at page->private, with BUILD_BUG_ON to make sure that this will not
> - * overflow into the next struct page (as it might with DEBUG_SPINLOCK).
> - * When freeing, reset page->mapping so free_pages_check won't complain.
> + * overflow beyond page->lru (as it might with PA-RISC DEBUG_SPINLOCK).
> */
> -#define __pte_lockptr(page) &((page)->u.ptl)
> +#define __pte_lockptr(page) ((spinlock_t *)&((page)->private))
> #define pte_lock_init(_page) do { \
> + BUILD_BUG_ON(__pte_lockptr((struct page *)0) + 1 > \
> + (spinlock_t *)(&((struct page *)0)->lru + 1)); \
> + poison_struct_page(_page); \
> spin_lock_init(__pte_lockptr(_page)); \
> } while (0)
> +/*
> + * When freeing, reset page->mapping so free_pages_check won't complain.
> + */
> #define pte_lock_deinit(page) ((page)->mapping = NULL)
> #define pte_lockptr(mm, pmd) ({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
> #else
> /*
> * We use mm->page_table_lock to guard all pagetable pages of the mm.
> */
> -#define pte_lock_init(page) do {} while (0)
> -#define pte_lock_deinit(page) do {} while (0)
> +#define pte_lock_init(page) poison_struct_page(page)
> +#define pte_lock_deinit(page) verify_struct_page(page)
> #define pte_lockptr(mm, pmd) ({(void)(pmd); &(mm)->page_table_lock;})
> #endif /* NR_CPUS < CONFIG_SPLIT_PTLOCK_CPUS */
>

2005-11-06 19:35:44

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

On Sun, Nov 06, 2005 at 11:28:38AM -0800, Andrew Morton wrote:

> This patch makes the ppc64 crash. See
> http://www.zip.com.au/~akpm/linux/patches/stuff/dsc02976.jpg
>
> I don't know what the access address was (ia32 nicely tells you), but if
> it's `DAR' then we have LIST_POISON1. Which would indicate that the slab

Yes, DAR is the faulting address on PPC. I'll submit a patch to improve
our Oops message later today unless someone beats me to it. :)


-Olof

2005-11-06 19:37:46

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock


> This patch makes the ppc64 crash. See
> http://www.zip.com.au/~akpm/linux/patches/stuff/dsc02976.jpg
>
> I don't know what the access address was (ia32 nicely tells you), but if
> it's `DAR' then we have LIST_POISON1. Which would indicate that the slab
> page which backs the mm_struct itself is getting freed-up-pte-page
> treatment, which is deeply screwed up.

Yep on a 300 exception the DAR is the faulting address.

We should spit out a line like x86 does at the start of the oops to make
it clear.

Anton

2005-11-06 22:39:33

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

Andrew Morton writes:

> I don't know what the access address was (ia32 nicely tells you), but if
> it's `DAR' then we have LIST_POISON1.

Yes, DAR is the access address.

Paul.

2005-11-06 22:57:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

Andrew Morton <[email protected]> wrote:
>
> Hugh Dickins <[email protected]> wrote:
> >
> > The split ptlock patch enlarged the default SMP PREEMPT struct page from
> > 32 to 36 bytes on most 32-bit platforms, from 32 to 44 bytes on PA-RISC
> > 7xxx (without PREEMPT). That was not my intention, and I don't believe
> > that split ptlock deserves any such slice of the user's memory.
> >
> > While leaving most of the page_private() mods in place for the moment,
> > could we please try this patch, or something like it? Again to overlay
> > the spinlock_t from &page->private onwards, with corrected BUILD_BUG_ON
> > that we don't go beyond ->lru; with poisoning of the fields overlaid,
> > and unsplit config verifying that the split config is safe to use them.
> >
>
> This patch makes the ppc64 crash. See
> http://www.zip.com.au/~akpm/linux/patches/stuff/dsc02976.jpg
>
> I don't know what the access address was (ia32 nicely tells you), but if
> it's `DAR' then we have LIST_POISON1. Which would indicate that the slab
> page which backs the mm_struct itself is getting freed-up-pte-page
> treatment, which is deeply screwed up.
>
> I'll try it on x86_64 and ia64, see if it's specific to ppc64.

Yup, the patch works OK on x86, x86_64 and ia64.

2005-11-06 22:59:26

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

On Sun, 6 Nov 2005, Andrew Morton wrote:
>
> This patch makes the ppc64 crash. See
> http://www.zip.com.au/~akpm/linux/patches/stuff/dsc02976.jpg
>
> I don't know what the access address was (ia32 nicely tells you), but if
> it's `DAR' then we have LIST_POISON1. Which would indicate that the slab
> page which backs the mm_struct itself is getting freed-up-pte-page
> treatment, which is deeply screwed up.
>
> I'll try it on x86_64 and ia64, see if it's specific to ppc64.

I think it'll turn out to be (my patch, yes, but) the way mm/slab.c does

#define SET_PAGE_CACHE(pg,x) ((pg)->lru.next = (struct list_head *)(x))
#define GET_PAGE_CACHE(pg) ((kmem_cache_t *)(pg)->lru.next)
#define SET_PAGE_SLAB(pg,x) ((pg)->lru.prev = (struct list_head *)(x))
#define GET_PAGE_SLAB(pg) ((struct slab *)(pg)->lru.prev)

and needs those fields preserved while that page is in the slab.
Though I've not tried to work out why it crashes on an mm_struct.

I'd checked that none of the architectures were using those page fields
of a page table page, but never considered that slab was using them: my
patch probably breaks all those which use slab for their page tables.

Drat. I'm trying to think of the best way to retrieve the situation.
The priority must be for you to get 2.6.14-mm1 out: is the easiest for
now simply to revert my patch (and the _private one(s) you added on top)?

Well, at least that patch has told us something we needed to know:
sorry for wasting _your_ time with it. I'll try to dream up some other
way (or config restriction) to avoid enlarging struct page for ptlock.

Or am I jumping to conclusions and on the wrong track?

Hugh

2005-11-06 23:13:37

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

Hugh Dickins <[email protected]> wrote:
>
> On Sun, 6 Nov 2005, Andrew Morton wrote:
> >
> > This patch makes the ppc64 crash. See
> > http://www.zip.com.au/~akpm/linux/patches/stuff/dsc02976.jpg
> >
> > I don't know what the access address was (ia32 nicely tells you), but if
> > it's `DAR' then we have LIST_POISON1. Which would indicate that the slab
> > page which backs the mm_struct itself is getting freed-up-pte-page
> > treatment, which is deeply screwed up.
> >
> > I'll try it on x86_64 and ia64, see if it's specific to ppc64.
>
> I think it'll turn out to be (my patch, yes, but) the way mm/slab.c does
>
> #define SET_PAGE_CACHE(pg,x) ((pg)->lru.next = (struct list_head *)(x))
> #define GET_PAGE_CACHE(pg) ((kmem_cache_t *)(pg)->lru.next)
> #define SET_PAGE_SLAB(pg,x) ((pg)->lru.prev = (struct list_head *)(x))
> #define GET_PAGE_SLAB(pg) ((struct slab *)(pg)->lru.prev)
>
> and needs those fields preserved while that page is in the slab.
> Though I've not tried to work out why it crashes on an mm_struct.
>
> I'd checked that none of the architectures were using those page fields
> of a page table page, but never considered that slab was using them: my
> patch probably breaks all those which use slab for their page tables.

Ah, of course, yes. pagetable pages which come from slab have a live
page.lru even while the memory is in use by the caller.

> Drat. I'm trying to think of the best way to retrieve the situation.

I suspect a slab-based fix/workaround would be unpleasant. Simpler to not
use slab for pagetable pages.

I doubt if there's much benefit to pagetable-pages-in-slab, really. It
_used_ to make sense because slab has the per-cpu LIFO magazines. But now
the page allocator has them too, it's probably better to rely upon that
magazine to provide cache-warm pages.

> The priority must be for you to get 2.6.14-mm1 out: is the easiest for
> now simply to revert my patch (and the _private one(s) you added on top)?

yup, when I can get the steaming pile to compile.

2005-11-06 23:37:57

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

From: Andrew Morton <[email protected]>
Date: Sun, 6 Nov 2005 15:13:26 -0800

> I doubt if there's much benefit to pagetable-pages-in-slab, really.

It helps for D-cache coloring, that's mainly why ppc uses them
this way I think.

I would do the same on sparc64 if I didn't have to virtually
map the page tables for the fast TLB miss handlers.

2005-11-06 23:50:04

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

On Sun, 6 Nov 2005, Andrew Morton wrote:
> Hugh Dickins <[email protected]> wrote:
> >
> > I'd checked that none of the architectures were using those page fields
> > of a page table page, but never considered that slab was using them: my
> > patch probably breaks all those which use slab for their page tables.
>
> Ah, of course, yes. pagetable pages which come from slab have a live
> page.lru even while the memory is in use by the caller.

arm26 seems to be the only arch which uses slab for pts other than ppc64.

But (without fully working it out) I think sparc (32) may also use page
parts: never mind slab using page->lru, I'm utterly broken using fields
of one struct page for two or more page tables, union or overlay or not.

So as well as reverting my poisonous patch, we'll need the patch at the
bottom, at least for now. We could almost keep my poisonous patch (but
I expect you're well on your way reverting it), except for how it uses
the unsplit config to test the split - we could suppress its poisoning
and verification when ARM26 || SPARC32 || PPC64. If reverting turns
out to be tiresome, then that would be another direction to try.

> > Drat. I'm trying to think of the best way to retrieve the situation.
>
> I suspect a slab-based fix/workaround would be unpleasant. Simpler to not
> use slab for pagetable pages.
>
> I doubt if there's much benefit to pagetable-pages-in-slab, really. It
> _used_ to make sense because slab has the per-cpu LIFO magazines. But now
> the page allocator has them too, it's probably better to rely upon that
> magazine to provide cache-warm pages.

I think they use slab, not for speed, but because they only need a
fraction of PAGE_SIZE for the page table - easy to imagine that in
the case of Ben's 64kB ppc64 pages, and he did mention that he was
trying to get away from the idea that a page table was a page.

> > The priority must be for you to get 2.6.14-mm1 out: is the easiest for
> > now simply to revert my patch (and the _private one(s) you added on top)?
>
> yup, when I can get the steaming pile to compile.

Suppress split ptlock on arches which may use one page for multiple page
tables. Reconsider what better to do (particularly on ppc64) later on.

Signed-off-by: Hugh Dickins <[email protected]>

--- 2.6.14-git/mm/Kconfig 2005-11-05 16:03:24.000000000 +0000
+++ linux/mm/Kconfig 2005-11-06 23:32:23.000000000 +0000
@@ -126,9 +126,11 @@ comment "Memory hotplug is currently inc
# Default to 4 for wider testing, though 8 might be more appropriate.
# ARM's adjust_pte (unused if VIPT) depends on mm-wide page_table_lock.
# PA-RISC's debug spinlock_t is too large for the 32-bit struct page.
+# ARM26 and SPARC32 and PPC64 may use one page for multiple page tables.
#
config SPLIT_PTLOCK_CPUS
int
default "4096" if ARM && !CPU_CACHE_VIPT
default "4096" if PARISC && DEBUG_SPINLOCK && !64BIT
+ default "4096" if ARM26 || SPARC32 || PPC64
default "4"

2005-11-07 00:00:19

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

Hugh Dickins <[email protected]> wrote:
>
> Suppress split ptlock on arches which may use one page for multiple page
> tables. Reconsider what better to do (particularly on ppc64) later on.

But why is that a problem per-se? A few "pagetable pages" will share the
same lock. Deadlocky when we take two pagetable locks?

The code ran OK on ppc64, fwiw.

2005-11-07 00:16:26

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] mm: poison struct page for ptlock

On Sun, 6 Nov 2005, Andrew Morton wrote:
> Hugh Dickins <[email protected]> wrote:
> >
> > Suppress split ptlock on arches which may use one page for multiple page
> > tables. Reconsider what better to do (particularly on ppc64) later on.
>
> But why is that a problem per-se? A few "pagetable pages" will share the
> same lock. Deadlocky when we take two pagetable locks?

Good point, it may not be as disastrous as I was imagining. Certainly
deadlocky in copy_page_range and mremap move at present, but that can
be easily surmounted. However, there is the slab lru issue too: if it's
convenient for an arch to use slab cache, I'd prefer not to make life
difficult for them. I'll come back to this but not tonight.

Hugh