From: Martin Schwidefsky <[email protected]>
In order to change the layout of the page tables after an mmap has
crossed the address space limit of the current page table layout, an
architecture hook in get_unmapped_area is needed. The arguments
are the address of the new mapping and its length.
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
---

 mm/mmap.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c
+++ linux-2.6/mm/mmap.c
@@ -36,6 +36,10 @@
 #define arch_mmap_check(addr, len, flags)	(0)
 #endif

+#ifndef arch_rebalance_pgtables
+#define arch_rebalance_pgtables(addr, len)	(addr)
+#endif
+
 static void unmap_region(struct mm_struct *mm,
 		struct vm_area_struct *vma, struct vm_area_struct *prev,
 		unsigned long start, unsigned long end);
@@ -1436,7 +1440,7 @@ get_unmapped_area(struct file *file, uns
 	if (addr & ~PAGE_MASK)
 		return -EINVAL;

-	return addr;
+	return arch_rebalance_pgtables(addr, len);
 }

 EXPORT_SYMBOL(get_unmapped_area);
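
The fallback pattern used by this patch can be tried out in user space.
What follows is a minimal sketch, not kernel code: model_get_unmapped_area,
MODEL_PAGE_SIZE and MODEL_PAGE_MASK are hypothetical stand-ins, and only the
#ifndef fallback mirrors the patch. An architecture that wants the hook
would presumably define arch_rebalance_pgtables in one of its headers before
this point; without such a definition the macro just hands the address back.

```c
#include <errno.h>

/* Hypothetical 4 KiB page size, used by this model only. */
#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/* Same fallback as in the patch: no arch definition means a no-op. */
#ifndef arch_rebalance_pgtables
#define arch_rebalance_pgtables(addr, len) (addr)
#endif

/*
 * Stand-in for the tail of get_unmapped_area(): reject unaligned
 * addresses, then give the architecture a last look at the result.
 */
static unsigned long model_get_unmapped_area(unsigned long addr,
					     unsigned long len)
{
	if (addr & ~MODEL_PAGE_MASK)
		return (unsigned long)-EINVAL;

	return arch_rebalance_pgtables(addr, len);
}
```

With the default macro a page-aligned address comes back unchanged, so
architectures that do not define the hook see no change in behaviour.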
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
On Tuesday 13 November 2007 01:30, [email protected] wrote:
> From: Martin Schwidefsky <[email protected]>
>
> In order to change the layout of the page tables after an mmap has
> crossed the address space limit of the current page table layout, an
> architecture hook in get_unmapped_area is needed. The arguments
> are the address of the new mapping and its length.

Can you add a comment somewhere saying what this is supposed to be for?

> Cc: Benjamin Herrenschmidt <[email protected]>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
>
> mm/mmap.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/mm/mmap.c
> ===================================================================
> --- linux-2.6.orig/mm/mmap.c
> +++ linux-2.6/mm/mmap.c
> @@ -36,6 +36,10 @@
> #define arch_mmap_check(addr, len, flags) (0)
> #endif
>
> +#ifndef arch_rebalance_pgtables
> +#define arch_rebalance_pgtables(addr, len) (addr)
> +#endif
> +
> static void unmap_region(struct mm_struct *mm,
> struct vm_area_struct *vma, struct vm_area_struct *prev,
> unsigned long start, unsigned long end);
> @@ -1436,7 +1440,7 @@ get_unmapped_area(struct file *file, uns
> if (addr & ~PAGE_MASK)
> return -EINVAL;
>
> - return addr;
> + return arch_rebalance_pgtables(addr, len);
> }
>
> EXPORT_SYMBOL(get_unmapped_area);
On Tue, 2007-11-13 at 23:33 +1100, Nick Piggin wrote:
> On Tuesday 13 November 2007 01:30, [email protected] wrote:
> > From: Martin Schwidefsky <[email protected]>
> >
> > In order to change the layout of the page tables after an mmap has
> > crossed the address space limit of the current page table layout, an
> > architecture hook in get_unmapped_area is needed. The arguments
> > are the address of the new mapping and its length.
>
> Can you add a comment somewhere saying what this is supposed to be for?

This hook is going to be used by the dynamic page table patch for s390:
http://marc.info/?l=linux-mm&m=119333667710539&w=2
That patch allows processes to have different numbers of page table
levels: 31 bit processes have 2 levels (2GB), normal 64 bit processes
have 3 levels (4TB), and really big 64 bit processes can have 4 levels
(8PB). The downgrade of a page table to use fewer levels than the parent
process is done in arch_pick_mmap_layout. The upgrade is done by using
the arch_rebalance_pgtables call. I considered doing the upgrade in
arch_get_unmapped_area but got scared by the indirection in
get_unmapped_area:

	get_area = current->mm->get_unmapped_area;
	if (file && file->f_op && file->f_op->get_unmapped_area)
		get_area = file->f_op->get_unmapped_area;
	addr = get_area(file, addr, len, pgoff, flags);

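
The relation between the end of a mapping and the number of page table
levels described above can be written down as a small user-space sketch.
The limits (2GB, 4TB, 8PB) are the ones quoted in this mail; the function
name model_levels_needed and the MODEL_LIMIT_* macros are hypothetical and
do not come from the s390 patch.

```c
/* Address space limits per layout, as quoted in the mail. */
#define MODEL_LIMIT_2_LEVELS (1ULL << 31)	/* 2GB */
#define MODEL_LIMIT_3_LEVELS (1ULL << 42)	/* 4TB */
#define MODEL_LIMIT_4_LEVELS (1ULL << 53)	/* 8PB */

/*
 * Return the number of page table levels needed so that the range
 * [addr, addr + len) fits, or -1 if it cannot fit in any layout.
 */
static int model_levels_needed(unsigned long long addr,
			       unsigned long long len)
{
	unsigned long long end = addr + len;

	/* Wrap-around or a range beyond the largest layout. */
	if (end < addr || end > MODEL_LIMIT_4_LEVELS)
		return -1;
	if (end <= MODEL_LIMIT_2_LEVELS)
		return 2;
	if (end <= MODEL_LIMIT_3_LEVELS)
		return 3;
	return 4;
}
```

An upgrade would be due whenever the value computed for a new mapping is
larger than the number of levels the process currently has.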
--
blue skies,
Martin.
On Wed, 2007-11-14 at 10:26 +0100, Martin Schwidefsky wrote:
> That patch allows processes to have different numbers of page table
> levels: 31 bit processes have 2 levels (2GB), normal 64 bit processes
> have 3 levels (4TB), and really big 64 bit processes can have 4 levels
> (8PB). The downgrade of a page table to use fewer levels than the parent
> process is done in arch_pick_mmap_layout. The upgrade is done by using
> the arch_rebalance_pgtables call. I considered doing the upgrade in
> arch_get_unmapped_area but got scared by the indirection in
> get_unmapped_area:
>
> get_area = current->mm->get_unmapped_area;
> if (file && file->f_op && file->f_op->get_unmapped_area)
> get_area = file->f_op->get_unmapped_area;
> addr = get_area(file, addr, len, pgoff, flags);

Don't be. It's really only hugetlb and other arch-specific code that
hooks in here on platforms with an MMU. (It's also used by /dev/mem etc.
on MMU-less platforms, but you don't need to care about those.)

Ben.
On Wed, 2007-11-14 at 21:06 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2007-11-14 at 10:26 +0100, Martin Schwidefsky wrote:
> > That patch allows processes to have different numbers of page table
> > levels: 31 bit processes have 2 levels (2GB), normal 64 bit processes
> > have 3 levels (4TB), and really big 64 bit processes can have 4 levels
> > (8PB). The downgrade of a page table to use fewer levels than the parent
> > process is done in arch_pick_mmap_layout. The upgrade is done by using
> > the arch_rebalance_pgtables call. I considered doing the upgrade in
> > arch_get_unmapped_area but got scared by the indirection in
> > get_unmapped_area:
> >
> > get_area = current->mm->get_unmapped_area;
> > if (file && file->f_op && file->f_op->get_unmapped_area)
> > get_area = file->f_op->get_unmapped_area;
> > addr = get_area(file, addr, len, pgoff, flags);
>
> Don't be. It's really only hugetlb and other arch-specific code that
> hooks in here on platforms with an MMU. (It's also used by /dev/mem etc.
> on MMU-less platforms, but you don't need to care about those.)

I find 8 places where a get_unmapped_area function pointer is used:
ipc/shm.c: shm_get_unmapped_area / shm_file_operations
drivers/char/mem.c: get_unmapped_area_mem / mem_fops & kmem_fops
drivers/video/fbmem.c: get_fb_unmapped_area / fb_fops
drivers/pci/proc.c: get_pci_unmapped_area / proc_bus_pci_operations
fs/hugetlbfs/inode.c: hugetlb_get_unmapped_area / hugetlbfs_file_operations
fs/bad_inode.c: bad_file_get_unmapped_area / bad_file_ops
fs/ramfs/file-nommu.c: ramfs_nommu_get_unmapped_area / ramfs_file_operations
arch/powerpc/platforms/cell/spufs/file.c:
spufs_get_unmapped_area / spufs_mem_fops
They all either have an arch override, call get_unmapped_area again, or
are not relevant. So it should be possible to do the upgrade in
arch_get_unmapped_area. I still have my doubts, though: every future use
of the get_unmapped_area pointer would have to be checked, and I feel it
is easier to understand if the upgrade / rebalance of the page table is
done at the end of get_unmapped_area, where every caller of mmap is
guaranteed to pass through.
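
The argument for placing the call at the end of get_unmapped_area can be
illustrated with a small user-space model of the dispatch. This is only a
sketch: all model_* names are hypothetical, the pgoff and flags arguments
are left out, and model_rebalance merely counts its calls instead of
touching any page table.

```c
#include <stddef.h>

struct model_file;

typedef unsigned long (*get_area_fn)(struct model_file *file,
				     unsigned long addr, unsigned long len);

struct model_file_ops { get_area_fn get_unmapped_area; };
struct model_file { const struct model_file_ops *f_op; };
struct model_mm { get_area_fn get_unmapped_area; };

/* Counts how often the final hook ran. */
static int model_rebalance_calls;

static unsigned long model_rebalance(unsigned long addr, unsigned long len)
{
	(void)len;
	model_rebalance_calls++;
	return addr;
}

/* Stand-in for the per-mm (arch) search: just accept the hint. */
static unsigned long model_mm_get_area(struct model_file *file,
				       unsigned long addr, unsigned long len)
{
	(void)file;
	(void)len;
	return addr;
}

/* Stand-in for a driver override that ignores the hint. */
static unsigned long model_driver_get_area(struct model_file *file,
					   unsigned long addr,
					   unsigned long len)
{
	(void)file;
	(void)addr;
	(void)len;
	return 0x40000000UL;
}

/* Same dispatch shape as get_unmapped_area(), hook at the very end. */
static unsigned long model_get_unmapped_area(struct model_mm *mm,
					     struct model_file *file,
					     unsigned long addr,
					     unsigned long len)
{
	get_area_fn get_area = mm->get_unmapped_area;

	if (file && file->f_op && file->f_op->get_unmapped_area)
		get_area = file->f_op->get_unmapped_area;
	addr = get_area(file, addr, len);

	return model_rebalance(addr, len);
}
```

Whichever function pointer ends up being called, the final hook sees the
chosen address, which is the property argued for above.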
--
blue skies,
Martin.
On Wed, 2007-11-14 at 12:49 +0100, Martin Schwidefsky wrote:
>
> They all either have an arch override, call get_unmapped_area again, or
> are not relevant. So it should be possible to do the upgrade in
> arch_get_unmapped_area. I still have my doubts, though: every future use
> of the get_unmapped_area pointer would have to be checked, and I feel it
> is easier to understand if the upgrade / rebalance of the page table is
> done at the end of get_unmapped_area, where every caller of mmap is
> guaranteed to pass through.

Well, if something does what you are worried about, then it would be
broken on powerpc as well (among others). We have various constraints on
the address space layout that must be handled by our arch g_u_a (or our
hugetlb one).
Ben.
On Thu, 2007-11-15 at 09:07 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2007-11-14 at 12:49 +0100, Martin Schwidefsky wrote:
> >
> > They all either have an arch override, call get_unmapped_area again, or
> > are not relevant. So it should be possible to do the upgrade in
> > arch_get_unmapped_area. I still have my doubts, though: every future use
> > of the get_unmapped_area pointer would have to be checked, and I feel it
> > is easier to understand if the upgrade / rebalance of the page table is
> > done at the end of get_unmapped_area, where every caller of mmap is
> > guaranteed to pass through.
>
> Well, if something does what you are worried about, then it would be
> broken on powerpc as well (among others). We have various constraints on
> the address space layout that must be handled by our arch g_u_a (or our
> hugetlb one).

Ok, I rearranged the dynamic page tables code (and fixed the bug that 31
bit processes had a 3 level page table instead of 2). It is working fine
with s390 specific versions of arch_get_unmapped_area and
arch_get_unmapped_area_topdown, which do the page table upgrade.
This means we can drop arch_rebalance_pgtables-call.patch from -mm
again.
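
The idea of an architecture-specific get_unmapped_area that performs the
upgrade itself can be sketched in user space as follows. This is not the
s390 code: struct model_mm, its limit field, model_table_upgrade and
model_arch_get_unmapped_area are hypothetical names, and the generic
address search is replaced by simply accepting the hint. The sketch assumes
that each additional level multiplies the reachable address space by 2048,
which matches the 2GB / 4TB / 8PB steps mentioned earlier in the thread.

```c
#define MODEL_LIMIT_MIN (1ULL << 31)	/* 2 levels, 2GB */
#define MODEL_LIMIT_MAX (1ULL << 53)	/* 4 levels, 8PB */
#define MODEL_ENOMEM ((unsigned long long)-12)

struct model_mm {
	unsigned long long limit;	/* current address space limit */
};

/* Grow the limit one level at a time until `end` fits. */
static int model_table_upgrade(struct model_mm *mm, unsigned long long end)
{
	if (end > MODEL_LIMIT_MAX)
		return -1;
	while (end > mm->limit)
		mm->limit <<= 11;	/* 2GB -> 4TB -> 8PB */
	return 0;
}

/* Pick an address, then upgrade the page table if the range needs it. */
static unsigned long long
model_arch_get_unmapped_area(struct model_mm *mm, unsigned long long hint,
			     unsigned long long len)
{
	unsigned long long addr = hint;	/* stand-in for the real search */

	if (addr + len < addr)		/* range wraps around */
		return MODEL_ENOMEM;
	if (model_table_upgrade(mm, addr + len))
		return MODEL_ENOMEM;
	return addr;
}
```

A mapping that stays below the current limit leaves the limit untouched;
one that ends above it raises the limit before the address is returned.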
--
blue skies,
Martin.