2006-03-03 01:04:45

by David Gibson

Subject: hugepage: Fix hugepage logic in free_pgtables() harder

Sigh. Turns out the hugepage logic in free_pgtables() was doubly
broken. The loop coalescing multiple normal page VMAs into one call
to free_pgd_range() had an off-by-one error, which could mean it would
coalesce one hugepage VMA into the same bundle (checking 'vma' not
'next' in the loop). I transferred this bug into the new
is_vm_hugetlb_page() based version. Here's the fix.

This one didn't bite on powerpc previously for the same reason the
is_hugepage_only_range() problem didn't: powerpc's
hugetlb_free_pgd_range() is identical to free_pgd_range(). It didn't
bite on ia64 because the hugepage region is distant enough from any
other region that the separated PMD_SIZE distance test would always
prevent coalescing the two together.

No libhugetlbfs testsuite regressions (ppc64, POWER5).

Signed-off-by: David Gibson <[email protected]>

Index: working-2.6/mm/memory.c
===================================================================
--- working-2.6.orig/mm/memory.c 2006-03-03 11:39:33.000000000 +1100
+++ working-2.6/mm/memory.c 2006-03-03 11:39:50.000000000 +1100
@@ -285,7 +285,7 @@ void free_pgtables(struct mmu_gather **t
 			 * Optimization: gather nearby vmas into one call down
 			 */
 			while (next && next->vm_start <= vma->vm_end + PMD_SIZE
-			  && !is_vm_hugetlb_page(vma)) {
+			  && !is_vm_hugetlb_page(next)) {
 				vma = next;
 				next = vma->vm_next;
 				anon_vma_unlink(vma);


--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
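
For readers following along, here is a minimal user-space sketch of why the loop condition has to test 'next' rather than 'vma'. It is not the kernel code: struct toy_vma, TOY_PMD_SIZE, the is_hugetlb flag and every function name below are hypothetical stand-ins invented for illustration, and the sketch assumes the caller has already handled the case where the first VMA is itself a hugepage VMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration only; the real PMD_SIZE is per-arch. */
#define TOY_PMD_SIZE 0x200000UL

/* Toy stand-in for struct vm_area_struct: only what the loop reads. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	bool is_hugetlb;		/* models is_vm_hugetlb_page() */
	struct toy_vma *vm_next;
};

/* Condition as it stood before the patch: tests the current vma. */
static struct toy_vma *coalesce_buggy(struct toy_vma *vma)
{
	struct toy_vma *next;

	assert(vma != NULL);
	next = vma->vm_next;
	while (next && next->vm_start <= vma->vm_end + TOY_PMD_SIZE
	  && !vma->is_hugetlb) {
		vma = next;
		next = vma->vm_next;
	}
	return vma;	/* last vma gathered into the bundle */
}

/* Condition after the patch: tests the vma about to be gathered. */
static struct toy_vma *coalesce_fixed(struct toy_vma *vma)
{
	struct toy_vma *next;

	assert(vma != NULL);
	next = vma->vm_next;
	while (next && next->vm_start <= vma->vm_end + TOY_PMD_SIZE
	  && !next->is_hugetlb) {
		vma = next;
		next = vma->vm_next;
	}
	return vma;
}

/*
 * Builds normal -> hugepage -> normal, all within TOY_PMD_SIZE of each
 * other, and reports whether the bundle ends on the hugepage vma.
 */
static bool bundle_swallows_hugepage(bool use_fix)
{
	struct toy_vma c = { 0x800000UL, 0x801000UL, false, NULL };
	struct toy_vma b = { 0x400000UL, 0x800000UL, true, &c };
	struct toy_vma a = { 0x3ff000UL, 0x400000UL, false, &b };
	struct toy_vma *last = use_fix ? coalesce_fixed(&a)
				       : coalesce_buggy(&a);

	return last->is_hugetlb;
}
```

In this toy model the pre-patch condition only notices the hugepage VMA one iteration too late, after it has already become 'vma', so exactly one hugepage VMA ends up in the bundle. With the patched condition the loop stops before gathering it.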


2006-03-03 05:18:04

by Hugh Dickins

Subject: Re: hugepage: Fix hugepage logic in free_pgtables() harder

On Fri, 3 Mar 2006, 'David Gibson' wrote:

> Sigh. Turns out the hugepage logic in free_pgtables() was doubly
> broken. The loop coalescing multiple normal page VMAs into one call
> to free_pgd_range() had an off-by-one error, which could mean it would
> coalesce one hugepage VMA into the same bundle (checking 'vma' not
> 'next' in the loop). I transferred this bug into the new
> is_vm_hugetlb_page() based version. Here's the fix.
>
> This one didn't bite on powerpc previously for the same reason the
> is_hugepage_only_range() problem didn't: powerpc's
> hugetlb_free_pgd_range() is identical to free_pgd_range(). It didn't
> bite on ia64 because the hugepage region is distant enough from any
> other region that the separated PMD_SIZE distance test would always
> prevent coalescing the two together.

I agree with your patch, but not with your comment: it's just a fix
to your earlier patch, there's no such off-by-one in the mainline
free_pgtables. Probably you were misled by my use of "vma->vm_mm"
rather than "next->vm_mm", equal but admittedly confusing, when
looking at the "next" vma.

Hugh

2006-03-03 05:26:43

by David Gibson

Subject: Re: hugepage: Fix hugepage logic in free_pgtables() harder

On Fri, Mar 03, 2006 at 05:18:51AM +0000, Hugh Dickins wrote:
> On Fri, 3 Mar 2006, 'David Gibson' wrote:
>
> > Sigh. Turns out the hugepage logic in free_pgtables() was doubly
> > broken. The loop coalescing multiple normal page VMAs into one call
> > to free_pgd_range() had an off-by-one error, which could mean it would
> > coalesce one hugepage VMA into the same bundle (checking 'vma' not
> > 'next' in the loop). I transferred this bug into the new
> > is_vm_hugetlb_page() based version. Here's the fix.
> >
> > This one didn't bite on powerpc previously for the same reason the
> > is_hugepage_only_range() problem didn't: powerpc's
> > hugetlb_free_pgd_range() is identical to free_pgd_range(). It didn't
> > bite on ia64 because the hugepage region is distant enough from any
> > other region that the separated PMD_SIZE distance test would always
> > prevent coalescing the two together.
>
> I agree with your patch, but not with your comment: it's just a fix
> to your earlier patch, there's no such off-by-one in the mainline
> free_pgtables. Probably you were misled by my use of "vma->vm_mm"
> rather than "next->vm_mm", equal but admittedly confusing, when
> looking at the "next" vma.

Ah, yes, indeed. The bug's all my fault, but it's still a bug.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson