2004-06-19 16:29:40

by Andrea Arcangeli

[permalink] [raw]
Subject: mincore on anon mappings

Hi David,

here a first (untested) attempt to allow mincore on anon vmas too.

I heard you need this from gcc, right?

(btw, returning -ENOMEM for anon vmas was pretty bogus, -EINVAL or
even -ENOSYS would been more correct)

--- sles/mm/mincore.c.~1~ 2004-02-04 16:07:06.000000000 +0100
+++ sles/mm/mincore.c 2004-06-15 00:42:52.122127224 +0200
@@ -11,6 +11,8 @@
#include <linux/pagemap.h>
#include <linux/mm.h>
#include <linux/mman.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>

#include <asm/uaccess.h>
#include <asm/pgtable.h>
@@ -22,7 +24,7 @@
* and is up to date; i.e. that no page-in operation would be required
* at this time if an application were to map and access this page.
*/
-static unsigned char mincore_page(struct vm_area_struct * vma,
+static unsigned char mincore_page_inode(struct vm_area_struct * vma,
unsigned long pgoff)
{
unsigned char present = 0;
@@ -38,16 +40,74 @@ static unsigned char mincore_page(struct
return present;
}

+/*
+ * Careful this only works with anon-vma for the vma->vm_pgoff settings.
+ */
+static unsigned char mincore_page_anon(struct vm_area_struct * vma,
+ unsigned long pgoff)
+{
+ unsigned char present = 0;
+ struct page * page;
+ struct mm_struct *mm = vma->vm_mm;
+ pgd_t *pgd;
+ pmd_t *pmd;
+ pte_t *ptep, pte;
+ unsigned long address;
+
+ address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+ BUG_ON(address < vma->vm_start || address >= vma->vm_end);
+
+ spin_lock(&mm->page_table_lock);
+
+ pgd = pgd_offset(mm, address);
+ if (!pgd_present(*pgd))
+ goto out_unlock;
+
+ pmd = pmd_offset(pgd, address);
+ if (!pmd_present(*pmd))
+ goto out_unlock;
+
+ ptep = pte_offset_map(pmd, address);
+ pte = *ptep;
+ pte_unmap(ptep);
+
+ spin_unlock(&mm->page_table_lock);
+
+ if (pte_none(pte))
+ goto out;
+ if (!pte_present(pte)) {
+ swp_entry_t entry = pte_to_swp_entry(pte);
+ page = lookup_swap_cache(entry);
+ if (page) {
+ present = PageUptodate(page);
+ page_cache_release(page);
+ }
+ } else
+ present = 1;
+
+ out:
+ return present;
+
+ out_unlock:
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+}
+
+static unsigned char mincore_page(struct vm_area_struct * vma,
+ unsigned long pgoff)
+{
+ if (vma->vm_file)
+ return mincore_page_inode(vma, pgoff);
+ else
+ return mincore_page_anon(vma, pgoff);
+}
+
static long mincore_vma(struct vm_area_struct * vma,
unsigned long start, unsigned long end, unsigned char __user * vec)
{
long error, i, remaining;
unsigned char * tmp;

- error = -ENOMEM;
- if (!vma->vm_file)
- return error;
-
start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
if (end > vma->vm_end)
end = vma->vm_end;


2004-06-19 18:55:22

by David Miller

[permalink] [raw]
Subject: Re: mincore on anon mappings

On Sat, 19 Jun 2004 18:25:03 +0200
Andrea Arcangeli <[email protected]> wrote:

> here a first (untested) attempt to allow mincore on anon vmas too.
>
> I heard you need this from gcc, right?

Sort of. What I think we really need is a new MAP_FIXED that
doesn't wipe out existing mappings.

The issue is that architectures make decisions about passed in
"hint" addresses based upon cache aliasing concerns sometimes.

You can bypass that by using MAP_FIXED, but MAP_FIXED has the nasty
side effect of zapping existing mappings. That's not what the user
wants in this case, the user here wants to tell the cache aliasing
virtual address hint changing logic to just go away.

mincore() is a really inefficient way to accomplish what GCC wants.
What gcc would do with this is basically:

addr = mmap(request_address, ..., len, ...);
if (addr == MMAP_FAIL) {
use_mincore_to_see_if_area_free(request_address, len);
if (really_is_free)
addr = mmap( ... | MAP_FIXED );
}

See? That's terribly inefficient.

Basically what GCC is trying to do is mmap a file at a fixed location.
Per-architecture get_unmapped_area() implementations will change the
requested address, unless MAP_FIXED is set, in order to more efficiently
handle cache aliasing issues in data caches. Using MAP_FIXED has the
unwanted side-effect of munmap()'ing any existing things in that range,
which is not what GCC wants.

Therefore I propose we add a MAP_FORCE which does exactly what GCC wants
which is:

1) The passed in 'hint' address is treated as mandatory, if exactly that
address cannot be used, we fail.

2) Existing areas get in the way, and cause failure.

3) get_unmapped_area() implementations shut off any 'hint' address
modification logic they may have.

I think the mincore() fix is important independant of how GCC eventually
deals with the thing it wants to accomplish.

2004-06-19 21:55:20

by William Lee Irwin III

[permalink] [raw]
Subject: Re: mincore on anon mappings

On Sat, Jun 19, 2004 at 06:25:03PM +0200, Andrea Arcangeli wrote:
> Hi David,
> here a first (untested) attempt to allow mincore on anon vmas too.
> I heard you need this from gcc, right?
> (btw, returning -ENOMEM for anon vmas was pretty bogus, -EINVAL or
> even -ENOSYS would been more correct)

Thanks for cleaning this up. I gave up on fixing mincore after the
nonlinear fixes for it got dropped.


-- wli

2004-06-19 22:04:58

by Andi Kleen

[permalink] [raw]
Subject: Re: mincore on anon mappings

"David S. Miller" <[email protected]> writes:

>
> Therefore I propose we add a MAP_FORCE which does exactly what GCC wants
> which is:
>
> 1) The passed in 'hint' address is treated as mandatory, if exactly that
> address cannot be used, we fail.
>
> 2) Existing areas get in the way, and cause failure.

That sounds unintuitive. I would expect MAP_FORCE to do exactly
that (that is is done by default right now is a different story).
But you want to reverse the meaning.

How about calling it MAP_STRICT or just MAP_CHECK ?

>
> 3) get_unmapped_area() implementations shut off any 'hint' address
> modification logic they may have.

Good idea definitely.


-Andi

2004-06-19 22:51:58

by David Miller

[permalink] [raw]
Subject: Re: mincore on anon mappings

On Sun, 20 Jun 2004 02:05:19 +0200
Andi Kleen <[email protected]> wrote:

> How about calling it MAP_STRICT or just MAP_CHECK ?

MAP_STRICT sounds fine to me.