2006-03-02 04:52:41

by David Gibson

Subject: hugepage: Fix hugepage logic in free_pgtables()

free_pgtables() has special logic to call hugetlb_free_pgd_range()
instead of the normal free_pgd_range() on hugepage VMAs. However, the
test it uses to do so is incorrect: it calls is_hugepage_only_range on
a hugepage sized range at the start of the vma.
is_hugepage_only_range() will return true if the given range has any
intersection with a hugepage address region, and in this case the
given region need not be hugepage aligned. So, for example, this test
can return true if called on, say, a 4k VMA immediately preceding a
(nicely aligned) hugepage VMA.

At present we get away with this because the powerpc version of
hugetlb_free_pgd_range() is just a call to free_pgd_range(). On ia64
(the only other arch with a non-trivial is_hugepage_only_range()) we
get away with it for a different reason; the hugepage area is not
contiguous with the rest of the user address space, and VMAs are not
permitted in between, so the test can't return a false positive there.

Nonetheless this should be fixed. We do that in the patch below by
replacing the is_hugepage_only_range() test with an explicit test of
the VMA using is_vm_hugetlb_page().
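
The false positive described above, and the VMA-based test that
replaces it, can be sketched in plain C. This is only an illustrative
model, not kernel code: the toy_vma structure, the TOY_* constants and
the address layout are all made up for the example, and the overlap
helper merely imitates the "any intersection" semantics described for
is_hugepage_only_range().

```c
#include <stdbool.h>

/* Hypothetical layout, chosen only for illustration. */
#define TOY_PAGE_SIZE   0x1000UL      /* 4k base page */
#define TOY_HPAGE_SIZE  0x1000000UL   /* 16M huge page */
#define TOY_HUGE_BASE   0x10000000UL  /* start of the hugepage-only region */
#define TOY_HUGE_END    0x20000000UL  /* end (exclusive) of that region */

/* Minimal stand-in for a VMA: an address range plus a hugetlb flag. */
struct toy_vma {
	unsigned long start;
	unsigned long end;
	bool is_hugetlb;
};

/* "Overlaps" semantics: true if [addr, addr+len) touches the region. */
static bool toy_overlaps_hugepage_region(unsigned long addr, unsigned long len)
{
	return addr < TOY_HUGE_END && addr + len > TOY_HUGE_BASE;
}

/* Old test: probe a hugepage-sized range starting at the VMA's start. */
static bool toy_old_test(const struct toy_vma *vma)
{
	return toy_overlaps_hugepage_region(vma->start, TOY_HPAGE_SIZE);
}

/* New test: ask the VMA itself whether it is a hugepage mapping. */
static bool toy_new_test(const struct toy_vma *vma)
{
	return vma->is_hugetlb;
}
```

With a 4k VMA ending exactly at TOY_HUGE_BASE, toy_old_test() reports
true because the hugepage-sized probe runs past the end of the small
VMA into the hugepage region, while toy_new_test() reports false.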

This in turn changes behaviour for platforms where
is_hugepage_only_range() returns false always (everything except
powerpc and ia64). We address this by ensuring that
hugetlb_free_pgd_range() is defined to be identical to
free_pgd_range() (instead of a no-op) on everything except ia64. Even
so, it will prevent some otherwise possible coalescing of calls down
to free_pgd_range(). Since this only happens for hugepage VMAs,
removing this small optimization seems unlikely to cause any trouble.
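
To illustrate why the generic fallback has to become an alias rather
than a no-op, here is a small stand-alone sketch. The toy_ names, the
two-argument signature and the call counter are assumptions made for
the example; the real functions take an mmu_gather plus floor and
ceiling arguments.

```c
#include <stdbool.h>

static int toy_free_calls;	/* how often the stand-in below was reached */

/* Stand-in for free_pgd_range(); it only records that it was called. */
static void toy_free_pgd_range(unsigned long addr, unsigned long end)
{
	(void)addr;
	(void)end;
	toy_free_calls++;
}

/*
 * An arch with its own implementation would define the guard macro and
 * provide the function itself. Everyone else gets a plain alias, so the
 * hugepage branch still frees page tables instead of doing nothing.
 */
#ifndef TOY_ARCH_HAS_HUGETLB_FREE_PGD_RANGE
#define toy_hugetlb_free_pgd_range toy_free_pgd_range
#endif

struct toy_vma {
	unsigned long start;
	unsigned long end;
	bool is_hugetlb;
};

/* Shape of the branch in free_pgtables() once the test is VMA based. */
static void toy_free_pgtables(const struct toy_vma *vma)
{
	if (vma->is_hugetlb)
		toy_hugetlb_free_pgd_range(vma->start, vma->end);
	else
		toy_free_pgd_range(vma->start, vma->end);
}
```

Had the fallback stayed "do { } while (0)", the hugepage branch in this
model would never reach toy_free_pgd_range() on an arch without its own
implementation.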

This patch causes no regressions on the libhugetlbfs testsuite - ppc64
POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).

Signed-off-by: David Gibson <[email protected]>

Index: working-2.6/mm/memory.c
===================================================================
--- working-2.6.orig/mm/memory.c 2006-02-24 11:44:36.000000000 +1100
+++ working-2.6/mm/memory.c 2006-03-02 11:14:03.000000000 +1100
@@ -277,7 +277,7 @@ void free_pgtables(struct mmu_gather **t
anon_vma_unlink(vma);
unlink_file_vma(vma);

- if (is_hugepage_only_range(vma->vm_mm, addr, HPAGE_SIZE)) {
+ if (is_vm_hugetlb_page(vma)) {
hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
floor, next? next->vm_start: ceiling);
} else {
@@ -285,8 +285,7 @@ void free_pgtables(struct mmu_gather **t
* Optimization: gather nearby vmas into one call down
*/
while (next && next->vm_start <= vma->vm_end + PMD_SIZE
- && !is_hugepage_only_range(vma->vm_mm, next->vm_start,
- HPAGE_SIZE)) {
+ && !is_vm_hugetlb_page(vma)) {
vma = next;
next = vma->vm_next;
anon_vma_unlink(vma);
Index: working-2.6/include/asm-ia64/page.h
===================================================================
--- working-2.6.orig/include/asm-ia64/page.h 2006-03-02 11:12:40.000000000 +1100
+++ working-2.6/include/asm-ia64/page.h 2006-03-02 11:30:26.000000000 +1100
@@ -57,6 +57,7 @@

# define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
# define ARCH_HAS_HUGEPAGE_ONLY_RANGE
+# define ARCH_HAS_HUGETLB_FREE_PGD_RANGE
#endif /* CONFIG_HUGETLB_PAGE */

#ifdef __ASSEMBLY__
Index: working-2.6/include/asm-powerpc/pgtable.h
===================================================================
--- working-2.6.orig/include/asm-powerpc/pgtable.h 2006-02-24 11:44:35.000000000 +1100
+++ working-2.6/include/asm-powerpc/pgtable.h 2006-03-02 11:29:26.000000000 +1100
@@ -468,11 +468,6 @@ extern pgd_t swapper_pg_dir[];

extern void paging_init(void);

-#ifdef CONFIG_HUGETLB_PAGE
-#define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) \
- free_pgd_range(tlb, addr, end, floor, ceiling)
-#endif
-
/*
* This gets called at the end of handling a page fault, when
* the kernel has put a new PTE into the page table for the process.
Index: working-2.6/include/linux/hugetlb.h
===================================================================
--- working-2.6.orig/include/linux/hugetlb.h 2006-03-02 11:12:40.000000000 +1100
+++ working-2.6/include/linux/hugetlb.h 2006-03-02 11:47:30.000000000 +1100
@@ -43,8 +43,10 @@ void hugetlb_change_protection(struct vm

#ifndef ARCH_HAS_HUGEPAGE_ONLY_RANGE
#define is_hugepage_only_range(mm, addr, len) 0
-#define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) \
- do { } while (0)
+#endif
+
+#ifndef ARCH_HAS_HUGETLB_FREE_PGD_RANGE
+#define hugetlb_free_pgd_range free_pgd_range
#endif

#ifndef ARCH_HAS_PREPARE_HUGEPAGE_RANGE
@@ -93,8 +95,7 @@ static inline unsigned long hugetlb_tota
#define prepare_hugepage_range(addr, len) (-EINVAL)
#define pmd_huge(x) 0
#define is_hugepage_only_range(mm, addr, len) 0
-#define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) \
- do { } while (0)
+#define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
#define hugetlb_fault(mm, vma, addr, write) ({ BUG(); 0; })

#define hugetlb_change_protection(vma, address, end, newprot)

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


2006-03-02 18:52:43

by Hugh Dickins

Subject: Re: hugepage: Fix hugepage logic in free_pgtables()

On Thu, 2 Mar 2006, 'David Gibson' wrote:

> free_pgtables() has special logic to call hugetlb_free_pgd_range()
> instead of the normal free_pgd_range() on hugepage VMAs. However, the
> test it uses to do so is incorrect: it calls is_hugepage_only_range on
> a hugepage sized range at the start of the vma.
> is_hugepage_only_range() will return true if the given range has any
> intersection with a hugepage address region, and in this case the
> given region need not be hugepage aligned. So, for example, this test
> can return true if called on, say, a 4k VMA immediately preceding a
> (nicely aligned) hugepage VMA.
>
> At present we get away with this because the powerpc version of
> hugetlb_free_pgd_range() is just a call to free_pgd_range(). On ia64
> (the only other arch with a non-trivial is_hugepage_only_range()) we
> get away with it for a different reason; the hugepage area is not
> contiguous with the rest of the user address space, and VMAs are not
> permitted in between, so the test can't return a false positive there.
>
> Nonetheless this should be fixed. We do that in the patch below by
> replacing the is_hugepage_only_range() test with an explicit test of
> the VMA using is_vm_hugetlb_page().
>
> This in turn changes behaviour for platforms where
> is_hugepage_only_range() returns false always (everything except
> powerpc and ia64). We address this by ensuring that
> hugetlb_free_pgd_range() is defined to be identical to
> free_pgd_range() (instead of a no-op) on everything except ia64. Even
> so, it will prevent some otherwise possible coalescing of calls down
> to free_pgd_range(). Since this only happens for hugepage VMAs,
> removing this small optimization seems unlikely to cause any trouble.
>
> This patch causes no regressions on the libhugetlbfs testsuite - ppc64
> POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
>
> Signed-off-by: David Gibson <[email protected]>

Yes, okay, you can add my

Acked-by: Hugh Dickins <[email protected]>

(ARCH_HAS... and HAVE_ARCH... have fallen into disfavour, but I
don't think you're doing wrong by splitting the old one into two.)

But let me emphasize again, in case Andrew wonders, that no current bug
is fixed by this (as indeed you indicate in your "we get away with this"
comments).

Whereas there's still a real ia64 get_unmapped_area bug to be fixed,
arising from the same confusion, that is_hugepage_only_range needs
to mean overlaps_hugepage_only_range (as on PowerPC) rather than
within_hugepage_only_range (as on IA64). Is Ken fixing that one?

Hugh


2006-03-02 19:43:09

by Chen, Kenneth W

Subject: RE: hugepage: Fix hugepage logic in free_pgtables()

Hugh Dickins wrote on Thursday, March 02, 2006 10:53 AM
> On Thu, 2 Mar 2006, 'David Gibson' wrote:
> > free_pgtables() has special logic to call hugetlb_free_pgd_range()
> > instead of the normal free_pgd_range() on hugepage VMAs. However, the
> > test it uses to do so is incorrect: it calls is_hugepage_only_range on
> > a hugepage sized range at the start of the vma.
> > is_hugepage_only_range() will return true if the given range has any
> > intersection with a hugepage address region, and in this case the
> > given region need not be hugepage aligned. So, for example, this test
> > can return true if called on, say, a 4k VMA immediately preceding a
> > (nicely aligned) hugepage VMA.
> >
> > At present we get away with this because the powerpc version of
> > hugetlb_free_pgd_range() is just a call to free_pgd_range(). On ia64
> > (the only other arch with a non-trivial is_hugepage_only_range()) we
> > get away with it for a different reason; the hugepage area is not
> > contiguous with the rest of the user address space, and VMAs are not
> > permitted in between, so the test can't return a false positive there.
> >
> > Nonetheless this should be fixed. We do that in the patch below by
> > replacing the is_hugepage_only_range() test with an explicit test of
> > the VMA using is_vm_hugetlb_page().
> >
> Yes, okay, you can add my
>
> Acked-by: Hugh Dickins <[email protected]>
>
> (ARCH_HAS... and HAVE_ARCH... have fallen into disfavour, but I
> don't think you're doing wrong by splitting the old one into two.)
>
> But let me emphasize again, in case Andrew wonders, that no current bug
> is fixed by this (as indeed you indicate in your "we get away with this"
> comments).


I've double checked that David's patch is OK for ia64.


> Whereas there's still a real ia64 get_unmapped_area bug to be fixed,
> arising from the same confusion, that is_hugepage_only_range needs
> to mean overlaps_hugepage_only_range (as on PowerPC) rather than
> within_hugepage_only_range (as on IA64). Is Ken fixing that one?


Yes, I'm fixing it. See patch below.


[patch] ia64: fix is_hugepage_only_range() definition to be overlaps
instead of within architectural restricted hugetlb address
range. Fix all affected usages.

Signed-off-by: Ken Chen <[email protected]>

--- ./include/asm-ia64/page.h.orig 2006-03-02 12:16:00.636688455 -0800
+++ ./include/asm-ia64/page.h 2006-03-02 12:23:30.151331386 -0800
@@ -147,7 +147,7 @@ typedef union ia64_va {
| (REGION_OFFSET(x) >> (HPAGE_SHIFT-PAGE_SHIFT)))
# define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
# define is_hugepage_only_range(mm, addr, len) \
- (REGION_NUMBER(addr) == RGN_HPAGE && \
+ (REGION_NUMBER(addr) == RGN_HPAGE || \
REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE)
extern unsigned int hpage_shift;
#endif
--- ./arch/ia64/mm/hugetlbpage.c.orig 2006-03-02 12:31:12.020466353 -0800
+++ ./arch/ia64/mm/hugetlbpage.c 2006-03-02 12:31:02.944294589 -0800
@@ -112,8 +112,7 @@ void hugetlb_free_pgd_range(struct mmu_g
unsigned long floor, unsigned long ceiling)
{
/*
- * This is called only when is_hugepage_only_range(addr,),
- * and it follows that is_hugepage_only_range(end,) also.
+ * This is called to free hugetlb page tables.
*
* The offset of these addresses from the base of the hugetlb
* region must be scaled down by HPAGE_SIZE/PAGE_SIZE so that
@@ -125,9 +124,9 @@ void hugetlb_free_pgd_range(struct mmu_g

addr = htlbpage_to_page(addr);
end = htlbpage_to_page(end);
- if (is_hugepage_only_range(tlb->mm, floor, HPAGE_SIZE))
+ if (REGION_NUMBER(floor) == RGN_HPAGE)
floor = htlbpage_to_page(floor);
- if (is_hugepage_only_range(tlb->mm, ceiling, HPAGE_SIZE))
+ if (REGION_NUMBER(ceiling) == RGN_HPAGE)
ceiling = htlbpage_to_page(ceiling);

free_pgd_range(tlb, addr, end, floor, ceiling);





2006-03-02 20:26:40

by Hugh Dickins

Subject: RE: hugepage: Fix hugepage logic in free_pgtables()

On Thu, 2 Mar 2006, Chen, Kenneth W wrote:
> Hugh Dickins wrote on Thursday, March 02, 2006 10:53 AM
>
> > Whereas there's still a real ia64 get_unmapped_area bug to be fixed,
> > arising from the same confusion, that is_hugepage_only_range needs
> > to mean overlaps_hugepage_only_range (as on PowerPC) rather than
> > within_hugepage_only_range (as on IA64). Is Ken fixing that one?
>
> Yes, I'm fixing it. See patch below.

Great, thanks. The second part, using REGION_NUMBER instead of
is_hugepage_only_range in the ia64 hugetlb_free_pgd_range, looks nice.

But the first part, || instead of && in is_hugepage_only_range, looks
insufficient: the start and end of the range might each fall in a
non-huge region, but the range still cross a huge region.

Ah, does RGN_HPAGE nestle up against the TASK_SIZE roof, so any range
already tested against TASK_SIZE (as get_unmapped_area has) cannot
cross RGN_HPAGE? If so, perhaps it deserves a comment there. And
if that is so, and can be relied upon, is_hugepage_only_range need
only be testing REGION_NUMBER(addr+len-1) - but it does seem fragile.
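
The gap can be sketched with a toy model of the region arithmetic. The
names and constants below are assumptions for illustration (region
number taken from the top three bits of a 64-bit address, hugepage
region number 4); they are not copied from the ia64 headers, and the
helpers assume len > 0 and no address wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_RGN_SHIFT	61	/* assumed: region number = top 3 bits */
#define TOY_RGN_HPAGE	4ULL	/* assumed: number of the hugepage region */

static uint64_t toy_region(uint64_t addr)
{
	return addr >> TOY_RGN_SHIFT;
}

/* "Within" semantics (the && form): both ends lie in the huge region. */
static bool toy_within_hpage(uint64_t addr, uint64_t len)
{
	return toy_region(addr) == TOY_RGN_HPAGE &&
	       toy_region(addr + len - 1) == TOY_RGN_HPAGE;
}

/* The || form: at least one end lies in the huge region. */
static bool toy_either_end_in_hpage(uint64_t addr, uint64_t len)
{
	return toy_region(addr) == TOY_RGN_HPAGE ||
	       toy_region(addr + len - 1) == TOY_RGN_HPAGE;
}

/* Full "overlaps" semantics: any part of the range is in the huge region. */
static bool toy_overlaps_hpage(uint64_t addr, uint64_t len)
{
	return toy_region(addr) <= TOY_RGN_HPAGE &&
	       toy_region(addr + len - 1) >= TOY_RGN_HPAGE;
}
```

A range that starts in region 3 and ends in region 5 is rejected by
both endpoint forms but caught by toy_overlaps_hpage(), which is the
case the || form alone does not cover.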

Hugh


2006-03-02 21:30:39

by Chen, Kenneth W

Subject: RE: hugepage: Fix hugepage logic in free_pgtables()

Hugh Dickins wrote on Thursday, March 02, 2006 12:27 PM
> But the first part, || instead of && in is_hugepage_only_range, looks
> insufficient: the start and end of the range might each fall in a
> non-huge region, but the range still cross a huge region.
>
> Ah, does RGN_HPAGE nestle up against the TASK_SIZE roof, so any range
> already tested against TASK_SIZE (as get_unmapped_area has) cannot
> cross RGN_HPAGE? If so, perhaps it deserves a comment there. And
> if that is so, and can be relied upon, is_hugepage_only_range need
> only be testing REGION_NUMBER(addr+len-1) - but it does seem fragile.

There are many address range checks before we hit get_unmapped_area.
ia64 can never have a vma range that crosses a region boundary. David
pointed out earlier that shmat and mremap can still slip through the
cracks and he has a patch that fixes it. But yes, this patch is making
that assumption (or relying on checks being done properly beforehand).

- Ken

2006-03-02 23:15:01

by David Gibson

Subject: Re: hugepage: Fix hugepage logic in free_pgtables()

On Thu, Mar 02, 2006 at 11:42:15AM -0800, Chen, Kenneth W wrote:
> Hugh Dickins wrote on Thursday, March 02, 2006 10:53 AM
> > On Thu, 2 Mar 2006, 'David Gibson' wrote:
> > > free_pgtables() has special logic to call hugetlb_free_pgd_range()
> > > instead of the normal free_pgd_range() on hugepage VMAs. However, the
> > > test it uses to do so is incorrect: it calls is_hugepage_only_range on
> > > a hugepage sized range at the start of the vma.
> > > is_hugepage_only_range() will return true if the given range has any
> > > intersection with a hugepage address region, and in this case the
> > > given region need not be hugepage aligned. So, for example, this test
> > > can return true if called on, say, a 4k VMA immediately preceding a
> > > (nicely aligned) hugepage VMA.
> > >
> > > At present we get away with this because the powerpc version of
> > > hugetlb_free_pgd_range() is just a call to free_pgd_range(). On ia64
> > > (the only other arch with a non-trivial is_hugepage_only_range()) we
> > > get away with it for a different reason; the hugepage area is not
> > > contiguous with the rest of the user address space, and VMAs are not
> > > permitted in between, so the test can't return a false positive there.
> > >
> > > Nonetheless this should be fixed. We do that in the patch below by
> > > replacing the is_hugepage_only_range() test with an explicit test of
> > > the VMA using is_vm_hugetlb_page().
> > >
> > Yes, okay, you can add my
> >
> > Acked-by: Hugh Dickins <[email protected]>
> >
> > (ARCH_HAS... and HAVE_ARCH... have fallen into disfavour, but I
> > don't think you're doing wrong by splitting the old one into two.)
> >
> > But let me emphasize again, in case Andrew wonders, that no current bug
> > is fixed by this (as indeed you indicate in your "we get away with this"
> > comments).
>
> I've double checked that David's patch is OK for ia64.

Speaking of checking things on ia64, it would be really nice if
someone could see if libhugetlbfs and its testsuite can be made to
work on ia64. It should be easy for at least the basics - I just have
no machine to try on. Full support will mean hacking up linker
scripts, which will be a bit trickier.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2006-03-02 23:14:59

by David Gibson

Subject: Re: hugepage: Fix hugepage logic in free_pgtables()

On Thu, Mar 02, 2006 at 01:29:50PM -0800, Chen, Kenneth W wrote:
> Hugh Dickins wrote on Thursday, March 02, 2006 12:27 PM
> > But the first part, || instead of && in is_hugepage_only_range, looks
> > insufficient: the start and end of the range might each fall in a
> > non-huge region, but the range still cross a huge region.
> >
> > Ah, does RGN_HPAGE nestle up against the TASK_SIZE roof, so any range
> > already tested against TASK_SIZE (as get_unmapped_area has) cannot
> > cross RGN_HPAGE? If so, perhaps it deserves a comment there. And
> > if that is so, and can be relied upon, is_hugepage_only_range need
> > only be testing REGION_NUMBER(addr+len-1) - but it does seem fragile.
>
> There are many address range checks before we hit get_unmapped_area.
> ia64 can never have a vma range that crosses a region boundary. David
> pointed out earlier that shmat and mremap can still slip through the
> cracks and he has a patch that fixes it. But yes, this patch is making
> that assumption (or relying on checks being done properly beforehand).

In fact with that other patch, which ensures that no region-crossing
ranges get through, simply (REGION_NUMBER(addr) == RGN_HPAGE) would be
sufficient; either both start and end are in the hugepage region, or
they're both in the same different region.
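
As a rough sketch of that argument, using a toy model rather than the
real ia64 macros (the TOY_ constants, the 3-bit region number and the
helper names are all assumptions for illustration, and len is assumed
to be non-zero with no wraparound):

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_RGN_SHIFT	61	/* assumed: region number = top 3 bits */
#define TOY_RGN_HPAGE	4ULL	/* assumed: number of the hugepage region */

static uint64_t toy_region(uint64_t addr)
{
	return addr >> TOY_RGN_SHIFT;
}

/* True if [addr, addr+len) spans more than one region. */
static bool toy_crosses_region(uint64_t addr, uint64_t len)
{
	return toy_region(addr) != toy_region(addr + len - 1);
}

/* The simplified check: look at the start address only. */
static bool toy_start_in_hpage(uint64_t addr)
{
	return toy_region(addr) == TOY_RGN_HPAGE;
}

/* The || form, for comparison. */
static bool toy_either_end_in_hpage(uint64_t addr, uint64_t len)
{
	return toy_region(addr) == TOY_RGN_HPAGE ||
	       toy_region(addr + len - 1) == TOY_RGN_HPAGE;
}
```

Whenever toy_crosses_region() is false, the start-only check and the
|| form give the same answer; they can only disagree on a range that
crosses a region boundary, which is exactly what the earlier checks
are meant to rule out.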

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson