2003-05-13 21:41:38

by Paul E. McKenney

Subject: [RFC][PATCH] vm_operation to avoid pagefault/inval race

This patch adds a vm_operations_struct function pointer that allows
networked and distributed filesystems to avoid a race between a
pagefault on an mmap and an invalidation request from some other
node. The race goes as follows:

1. A user process on node A accesses a portion of a mapped
file, resulting in a page fault. The pagefault handler
invokes the corresponding nopage function, which reads
the page into memory.

2. A user process on node B writes to the same portion of
the file (either via mmap or write()), thereby sending
node A an invalidation request.

3. Node A receives this invalidate request, and dutifully
invalidates all mmaps. Except for the one that has
not yet been fully mapped by step 1.

4. Node A then executes the rest of do_no_page(), entering
the now-invalid page into the PTEs.

5. One way or another, life is now hard.

One solution would be for the distributed filesystem to hold
onto a lock or semaphore upon return from the nopage function.
The problem is that there is no way to determine (in a timely
fashion) when it is safe to release this lock or semaphore.

The attached patch addresses this by adding a nopagedone
function for when do_no_page() exits. The filesystem may then
drop the lock or semaphore in this nopagedone function.
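
As a rough user-space illustration of how a filesystem might pair the two hooks, here is a minimal sketch. Everything prefixed dfs_ or model_ is hypothetical, the struct definitions are stand-ins rather than the kernel's, and the "lock" is just a counter. The early return on a failed nopage is an assumption about the code surrounding the hunk below, which the hunk itself does not show; on that path the filesystem would have to drop its lock itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct page { int id; };
struct vm_area_struct;

struct vm_operations_struct {
	struct page *(*nopage)(struct vm_area_struct *area,
			       unsigned long address, int unused);
	void (*nopagedone)(struct vm_area_struct *area,
			   unsigned long address, int status);
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* Hypothetical per-filesystem invalidation lock, modeled as a counter. */
static int dfs_inval_lock_held;
static struct page dfs_page = { 1 };
static struct page *model_pte;	/* stands in for the page-table entry */

static struct page *dfs_nopage(struct vm_area_struct *area,
			       unsigned long address, int unused)
{
	(void)area; (void)address; (void)unused;
	dfs_inval_lock_held++;	/* hold off invalidations from other nodes */
	return &dfs_page;	/* the lock stays held across the return */
}

static void dfs_nopagedone(struct vm_area_struct *area,
			   unsigned long address, int status)
{
	(void)area; (void)address; (void)status;
	assert(dfs_inval_lock_held > 0);
	dfs_inval_lock_held--;	/* the mapping is complete, safe to release */
}

static const struct vm_operations_struct dfs_vm_ops = {
	.nopage = dfs_nopage,
	.nopagedone = dfs_nopagedone,
};

/* Simplified model of the patched fault path: 0 on success, -1 on failure. */
static int model_do_no_page(struct vm_area_struct *vma, unsigned long address)
{
	struct page *new_page = vma->vm_ops->nopage(vma, address, 0);
	int ret = 0;

	if (!new_page)
		return -1;	/* assumed early exit: the hook is not reached */

	model_pte = new_page;	/* stands in for entering the page into the PTEs */

	if (vma->vm_ops->nopagedone)
		vma->vm_ops->nopagedone(vma, address, ret);
	return ret;
}
```

The point of the sketch is only the ordering: the invalidation lock is taken inside nopage and is still held while the page is entered into the page table, so an invalidation arriving in that window has to wait.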

Thoughts? Is there some other existing way to get this done?

Thanx, Paul


diff -urN -X dontdiff linux-2.5.69/include/linux/mm.h linux-2.5.69.stmmap/include/linux/mm.h
--- linux-2.5.69/include/linux/mm.h Sun May 4 16:53:00 2003
+++ linux-2.5.69.stmmap/include/linux/mm.h Fri May 9 09:30:37 2003
@@ -134,6 +134,7 @@
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+ void (*nopagedone)(struct vm_area_struct * area, unsigned long address, int status);
int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
};

diff -urN -X dontdiff linux-2.5.69/mm/memory.c linux-2.5.69.stmmap/mm/memory.c
--- linux-2.5.69/mm/memory.c Sun May 4 16:53:14 2003
+++ linux-2.5.69.stmmap/mm/memory.c Fri May 9 17:04:09 2003
@@ -1426,6 +1487,9 @@
ret = VM_FAULT_OOM;
out:
pte_chain_free(pte_chain);
+ if (vma->vm_ops && vma->vm_ops->nopagedone) {
+ vma->vm_ops->nopagedone(vma, address & PAGE_MASK, ret);
+ }
return ret;
}


2003-05-17 15:53:21

by Christoph Hellwig

Subject: Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race

On Tue, May 13, 2003 at 01:53:26PM -0700, Paul E. McKenney wrote:
> This patch adds a vm_operations_struct function pointer that allows
> networked and distributed filesystems to avoid a race between a
> pagefault on an mmap and an invalidation request from some other
> node. The race goes as follows:

The race is real, although currently no in-tree filesystem is affected.
The patch is ugly as hell, though. The right fix is to change the
->nopage method to cover what do_no_page is currently, change anonymous
vmas to have vm_ops as well and set ->nopage to do_anonymous_page.

The guts of the current do_no_page become a new helper (__finish_nopage?)
and EXPORT_SYMBOL_GPL()ed. It would also be nice if you could point to
a filesystem that actually needs this, but if you can get rid of the
do_anonymous_page special casing a patch might even be acceptable without it.
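
To make the shape of that suggestion concrete, here is a small user-space sketch of a ->nopage method that covers the whole fault and calls a helper for the page-table work. It is a model under stated assumptions, not kernel code: finish_nopage_model() stands in for the suggested __finish_nopage helper, dfs_nopage() is a hypothetical filesystem method, the structs are toy stand-ins, and the VM_FAULT_* values are chosen for the model.

```c
#include <assert.h>
#include <stddef.h>

/* Model return codes; the values are illustrative. */
enum { VM_FAULT_SIGBUS = 0, VM_FAULT_MINOR = 1 };

/* Toy stand-ins for kernel structures. */
struct page { int uptodate; };
struct vm_area_struct {
	int lock_depth;		/* hypothetical filesystem lock, as a counter */
	int pte_installed;	/* stands in for the page-table entry */
};

/* Model of the helper that would hold the guts of do_no_page(). */
static int finish_nopage_model(struct vm_area_struct *vma, struct page *page)
{
	(void)page;
	assert(vma->lock_depth > 0);	/* in this model the caller holds its lock */
	vma->pte_installed = 1;
	return VM_FAULT_MINOR;
}

/* Hypothetical distributed-filesystem ->nopage under the wider contract. */
static int dfs_nopage(struct vm_area_struct *vma, struct page *page)
{
	int ret;

	vma->lock_depth++;		/* hold off remote invalidations */
	if (!page || !page->uptodate) {
		vma->lock_depth--;
		return VM_FAULT_SIGBUS;	/* nothing valid to map */
	}
	ret = finish_nopage_model(vma, page);
	vma->lock_depth--;		/* the mapping is in place, safe to drop */
	return ret;
}
```

Because the method itself calls the helper, the filesystem knows exactly when the mapping is complete and needs no separate completion callback.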

2003-05-17 18:08:59

by Daniel Phillips

Subject: Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race

Please don't take lack of response for lack of interest. The generic issue
here is "what are the vfs changes needed to support cross-host mmap?". You
defined the problem nicely.

>
> [...]
>
> 5. One way or another, life is now hard.

Indeed. In brief, ->nopage just doesn't provide adequate coverage to support
the cross-host lock.

> One solution would be for the distributed filesystem to hold
> onto a lock or semaphore upon return from the nopage function.
> The problem is that there is no way to determine (in a timely
> fashion) when it is safe to release this lock or semaphore.
>
> The attached patch addresses this by adding a nopagedone
> function for when do_no_page() exits. The filesystem may then
> drop the lock or semaphore in this nopagedone function.
>
> Thoughts? Is there some other existing way to get this done?

There is. One way is to make all of do_no_page a hook, and clearly this is
more generic than what you proposed, since it covers your hook and the rest
can be done with library calls. Once you've gone there, the next question to
ask is "what use is the existing ->nopage hook?", and the answer is: none,
really. The existing usage of ->nopage can be replaced by ->do_no_page plus
library code, and the only problem is, we have to change pretty well every
filesystem in and out of tree. So that gets a little, em, interesting from
the 2.6.0 point of view, which is why I cc'd Andrew on this. Christoph has
also expressed interest in this, which explains the other cc.

Any clustered filesystem that wants to support posix mmap is going to need
this hook, so the sooner we hash this out, the better.

Regards,

Daniel

2003-05-17 19:35:03

by Andrew Morton

Subject: Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race

Daniel Phillips wrote:
>
> and the only problem is, we have to change pretty well every
> filesystem in and out of tree.

But it's only a one-liner per fs.


2003-05-20 02:11:39

by Paul E. McKenney

Subject: Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race

On Sat, May 17, 2003 at 12:49:48PM -0700, Andrew Morton wrote:
> Daniel Phillips wrote:
> >
> > and the only problem is, we have to change pretty well every
> > filesystem in and out of tree.
>
> But it's only a one-liner per fs.

So the general idea is to do something as follows, right?
(Sorry for not just putting together a patch -- I want
to make sure I understand all of your advice first!)

o Make all callers to do_no_page() instead call
vma->vm_ops->nopage().

o Make a function, perhaps named something like
install_new_page(), that does the PTE-installation
and RSS-adjustment tasks currently performed by
both do_no_page() and by do_anonymous_page().
(Not clear to me yet whether a full merge of
these two functions is the right approach, more
thought needed. Note that the nopage function
is implicitly aware of whether it is handling
an anonymous page or not, so a pair of functions
that both call another function containing the
common code is reasonable, if warranted.)

The install_new_page() function needs an additional
argument to accept the new_page value that used
to be returned by the nopage() function.

o Add arguments to nopage() to allow it to invoke
install_new_page().

o Change all nopage() functions to invoke install_new_page(),
but only in cases where they would -not- return
VM_FAULT_OOM or VM_FAULT_SIGBUS. In these cases,
these two return codes must be handed back to the
caller without invoking install_new_page().

o Otherwise, the value that these nopage() functions
would normally return must be passed to
install_new_page(), and the value returned by
install_new_page() must be returned to the nopage()
function's caller.

o Replace all occurrences of "->vm_ops = NULL" with
"->vm_ops = anonymous_vm_ops" or some such.

o The anonymous_vm_ops would have the following members:

nopage: pointer to a function containing the page-allocation
code extracted from do_anonymous_page(), followed
by a call to install_new_page().

populate: NULL.

open: NULL.

close: NULL.
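
The plan above can be sketched in user space roughly as follows. This is a model under assumptions, not a patch: the structs are toy stand-ins, model_install_new_page() reduces PTE installation to RSS accounting and omits the pmd argument, the model_out_of_memory knob is hypothetical, and the VM_FAULT_* values are chosen for the model.

```c
#include <assert.h>
#include <stddef.h>

/* Model return codes; the values are illustrative. */
enum { VM_FAULT_OOM = -1, VM_FAULT_SIGBUS = 0, VM_FAULT_MINOR = 1 };

/* Toy stand-ins for kernel structures. */
struct page { int zeroed; };
struct mm_struct { long rss; };
struct vm_area_struct;

struct vm_operations_struct {
	int (*nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
		      unsigned long address, int write_access);
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* Model of install_new_page(): PTE setup reduced to RSS accounting. */
static int model_install_new_page(struct mm_struct *mm,
				  struct vm_area_struct *vma,
				  unsigned long address, int write_access,
				  struct page *new_page)
{
	(void)vma; (void)address; (void)write_access; (void)new_page;
	mm->rss++;
	return VM_FAULT_MINOR;
}

static struct page model_zero_page = { 1 };
static int model_out_of_memory;	/* hypothetical knob to force allocation failure */

/* Page allocation as in do_anonymous_page(), then the common install step. */
static int anonymous_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
			    unsigned long address, int write_access)
{
	struct page *page = model_out_of_memory ? NULL : &model_zero_page;

	if (!page)
		return VM_FAULT_OOM;	/* handed back without installing */
	return model_install_new_page(mm, vma, address, write_access, page);
}

static const struct vm_operations_struct anonymous_vm_ops = {
	.nopage = anonymous_nopage,	/* populate, open, and close stay NULL */
};

/* With vm_ops never NULL, the caller needs no anonymous special case. */
static int handle_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
			  unsigned long address, int write_access)
{
	return vma->vm_ops->nopage(mm, vma, address, write_access);
}
```

The error codes pass straight through the caller, and the install step runs only when a page was actually obtained, which matches the two cases spelled out in the list.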

Thoughts?

Thanx, Paul

2003-05-20 07:56:40

by Andrew Morton

Subject: Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race

"Paul E. McKenney" <[email protected]> wrote:
>
> So the general idea is to do something as follows, right?

It sounds reasonable. A matter of putting together the appropriate
library functions and refactoring a few things.

>
> o Make a function, perhaps named something like
> install_new_page(), that does the PTE-installation
> and RSS-adjustment tasks currently performed by
> both do_no_page() and by do_anonymous_page().

That's similar to mm/fremap.c:install_page(). (Which forgets to call
update_mmu_cache(). Debatably a buglet.)

However there is not a lot of commonality between the various nopage()s and
there may not be a lot to be gained from all this. There is subtle code in
there and it is performance-critical. I'd be inclined to try to minimise
overall code churn in this work.


2003-05-23 15:23:48

by Paul E. McKenney

Subject: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Tue, May 20, 2003 at 01:11:57AM -0700, Andrew Morton wrote:
> "Paul E. McKenney" wrote:
> >
> > So the general idea is to do something as follows, right?
>
> It sounds reasonable. A matter of putting together the appropriate
> library functions and refactoring a few things.
>
> >
> > o Make a function, perhaps named something like
> > install_new_page(), that does the PTE-installation
> > and RSS-adjustment tasks currently performed by
> > both do_no_page() and by do_anonymous_page().
>
> That's similar to mm/fremap.c:install_page(). (Which forgets to call
> update_mmu_cache(). Debatably a buglet.)
>
> However there is not a lot of commonality between the various nopage()s and
> there may not be a lot to be gained from all this. There is subtle code in
> there and it is performance-critical. I'd be inclined to try to minimise
> overall code churn in this work.

Good point! Here is a patch to do this. A "few" caveats:

o I have not tested this, in fact, I have only compiled
it for i386.

o The bit about removing checks for vm_ops==NULL and
making do_anonymous_page() be just another nopage
function turned out to be problematic, since the
do_anonymous_page() function wants page_table_lock
held, and the other nopage functions do not.
So I kept the NULL checks, since I was going to
need some anyway (I did -not- want to make every
nopage function have to explicitly drop page_table_lock).

I am especially interested in feedback on this point --
did I miss something here?

o I had to expand the argument list of the nopage functions to
pass through the mm_struct and the pmd_t to the
new install_new_page() function.

o The nopage functions now return an int instead of
the old struct page*.

o NOPAGE_OOM and NOPAGE_SIGBUS are no more, since
one can just use VM_FAULT_OOM and VM_FAULT_SIGBUS
instead.

o I still need to remove some LINUX_2_2 stuff (thanks
to Dan Phillips for letting me know it was OK to
do so...).

o This patch is a bit long. Thoughts on how to break it up?
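
The locking mismatch described in the second caveat can be modeled in a few lines of user-space C. This is only a sketch under assumptions: the page_table_lock is reduced to a flag, the _model functions are hypothetical stand-ins for the kernel functions they are named after, and the VM_FAULT_MINOR value is chosen for the model.

```c
#include <assert.h>
#include <stddef.h>

/* Model return code; the value is illustrative. */
enum { VM_FAULT_MINOR = 1 };

/* Toy stand-ins; the spinlock is modeled as a flag. */
struct mm_struct { int page_table_lock_held; };
struct vm_area_struct;

struct vm_operations_struct {
	int (*nopage)(struct mm_struct *mm, struct vm_area_struct *vma);
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* The anonymous path is entered with the lock held and releases it. */
static int do_anonymous_page_model(struct mm_struct *mm)
{
	assert(mm->page_table_lock_held);
	mm->page_table_lock_held = 0;
	return VM_FAULT_MINOR;
}

/* A ->nopage method is entered with the lock already dropped. */
static int file_nopage_model(struct mm_struct *mm, struct vm_area_struct *vma)
{
	(void)vma;
	assert(!mm->page_table_lock_held);
	return VM_FAULT_MINOR;
}

static const struct vm_operations_struct file_vm_ops_model = {
	.nopage = file_nopage_model,
};

/* Dispatch that keeps the NULL check so no method has to drop the lock. */
static int do_no_page_model(struct mm_struct *mm, struct vm_area_struct *vma)
{
	assert(mm->page_table_lock_held);	/* callers enter with it held */
	if (!vma->vm_ops || !vma->vm_ops->nopage)
		return do_anonymous_page_model(mm);
	mm->page_table_lock_held = 0;		/* models spin_unlock() */
	return vma->vm_ops->nopage(mm, vma);
}
```

Keeping the NULL check in the dispatcher means the unlock lives in one place; turning the anonymous path into an ordinary method would push that unlock into every nopage function.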

Thanx, Paul


diff -urN -X dontdiff linux-2.5.69-mm7/arch/ia64/ia32/binfmt_elf32.c linux-2.5.69-mm7.install_new_page/arch/ia64/ia32/binfmt_elf32.c
--- linux-2.5.69-mm7/arch/ia64/ia32/binfmt_elf32.c Tue May 20 09:10:43 2003
+++ linux-2.5.69-mm7.install_new_page/arch/ia64/ia32/binfmt_elf32.c Thu May 22 16:26:07 2003
@@ -56,13 +56,13 @@
extern struct page *ia32_shared_page[];
extern unsigned long *ia32_gdt;

-struct page *
-ia32_install_shared_page (struct vm_area_struct *vma, unsigned long address, int no_share)
+int
+ia32_install_shared_page (struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd)
{
struct page *pg = ia32_shared_page[(address - vma->vm_start)/PAGE_SIZE];

get_page(pg);
- return pg;
+ return install_new_page(mm, vma, address, write_access, pmd, pg);
}

static struct vm_operations_struct ia32_shared_page_vm_ops = {
diff -urN -X dontdiff linux-2.5.69-mm7/arch/sparc64/mm/hugetlbpage.c linux-2.5.69-mm7.install_new_page/arch/sparc64/mm/hugetlbpage.c
--- linux-2.5.69-mm7/arch/sparc64/mm/hugetlbpage.c Sun May 4 16:53:35 2003
+++ linux-2.5.69-mm7.install_new_page/arch/sparc64/mm/hugetlbpage.c Thu May 22 16:23:56 2003
@@ -633,11 +633,12 @@
return (int) htlbzone_pages;
}

-static struct page *
-hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+static int
+hugetlb_nopage(struct mm_struct * mm, struct vm_area_struct *vma,
+ unsigned long address, int write_access, pmd_t * pmd)
{
BUG();
- return NULL;
+ return VM_FAULT_SIGBUS;
}

static struct vm_operations_struct hugetlb_vm_ops = {
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/char/agp/alpha-agp.c linux-2.5.69-mm7.install_new_page/drivers/char/agp/alpha-agp.c
--- linux-2.5.69-mm7/drivers/char/agp/alpha-agp.c Tue May 20 09:10:46 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/char/agp/alpha-agp.c Thu May 22 16:02:30 2003
@@ -11,9 +11,11 @@

#include "agp.h"

-static struct page *alpha_core_agp_vm_nopage(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+static int alpha_core_agp_vm_nopage(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access,
+ pmd_t *pmd)
{
alpha_agp_info *agp = agp_bridge->dev_private_data;
dma_addr_t dma_addr;
@@ -23,14 +25,15 @@
dma_addr = address - vma->vm_start + agp->aperture.bus_base;
pa = agp->ops->translate(agp, dma_addr);

- if (pa == (unsigned long)-EINVAL) return NULL; /* no translation */
+ if (pa == (unsigned long)-EINVAL) return VM_FAULT_SIGBUS;
+ /* no translation */

/*
* Get the page, inc the use count, and return it
*/
page = virt_to_page(__va(pa));
get_page(page);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

static struct aper_size_info_fixed alpha_core_agp_sizes[] =
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/char/drm/drmP.h linux-2.5.69-mm7.install_new_page/drivers/char/drm/drmP.h
--- linux-2.5.69-mm7/drivers/char/drm/drmP.h Sun May 4 16:53:36 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/char/drm/drmP.h Thu May 22 16:15:52 2003
@@ -620,18 +620,26 @@
extern int DRM(fasync)(int fd, struct file *filp, int on);

/* Mapping support (drm_vm.h) */
-extern struct page *DRM(vm_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access);
-extern struct page *DRM(vm_shm_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access);
-extern struct page *DRM(vm_dma_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access);
-extern struct page *DRM(vm_sg_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access);
+extern int DRM(vm_nopage)(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access,
+ pmd_t *pmd);
+extern int DRM(vm_shm_nopage)(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access,
+ pmd_t *pmd);
+extern int DRM(vm_dma_nopage)(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access,
+ pmd_t *pmd);
+extern int DRM(vm_sg_nopage)(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access,
+ pmd_t *pmd);
extern void DRM(vm_open)(struct vm_area_struct *vma);
extern void DRM(vm_close)(struct vm_area_struct *vma);
extern void DRM(vm_shm_close)(struct vm_area_struct *vma);
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/char/drm/drm_vm.h linux-2.5.69-mm7.install_new_page/drivers/char/drm/drm_vm.h
--- linux-2.5.69-mm7/drivers/char/drm/drm_vm.h Sun May 4 16:53:57 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/char/drm/drm_vm.h Thu May 22 15:09:40 2003
@@ -55,9 +55,11 @@
.close = DRM(vm_close),
};

-struct page *DRM(vm_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+int DRM(vm_nopage)(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access,
+ pmd_t *pmd)
{
#if __REALLY_HAVE_AGP
drm_file_t *priv = vma->vm_file->private_data;
@@ -114,35 +116,38 @@
DRM_DEBUG("baddr = 0x%lx page = 0x%p, offset = 0x%lx\n",
baddr, __va(agpmem->memory->memory[offset]), offset);

- return page;
+ return install_new_page(mm, vma, address, write_access,
+ pmd, page);
}
vm_nopage_error:
#endif /* __REALLY_HAVE_AGP */

- return NOPAGE_SIGBUS; /* Disallow mremap */
+ return VM_FAULT_SIGBUS; /* Disallow mremap */
}

-struct page *DRM(vm_shm_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+int DRM(vm_shm_nopage)(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access,
+ pmd_t *pmd)
{
drm_map_t *map = (drm_map_t *)vma->vm_private_data;
unsigned long offset;
unsigned long i;
struct page *page;

- if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
- if (!map) return NOPAGE_OOM; /* Nothing allocated */
+ if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+ if (!map) return VM_FAULT_OOM; /* Nothing allocated */

offset = address - vma->vm_start;
i = (unsigned long)map->handle + offset;
page = vmalloc_to_page((void *)i);
if (!page)
- return NOPAGE_OOM;
+ return VM_FAULT_OOM;
get_page(page);

DRM_DEBUG("shm_nopage 0x%lx\n", address);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

/* Special close routine which deletes map information if we are the last
@@ -221,9 +226,11 @@
up(&dev->struct_sem);
}

-struct page *DRM(vm_dma_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+int DRM(vm_dma_nopage)(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access,
+ pmd_t *pmd)
{
drm_file_t *priv = vma->vm_file->private_data;
drm_device_t *dev = priv->dev;
@@ -232,9 +239,9 @@
unsigned long page_nr;
struct page *page;

- if (!dma) return NOPAGE_SIGBUS; /* Error */
- if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
- if (!dma->pagelist) return NOPAGE_OOM ; /* Nothing allocated */
+ if (!dma) return VM_FAULT_SIGBUS; /* Error */
+ if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+ if (!dma->pagelist) return VM_FAULT_OOM ; /* Nothing allocated */

offset = address - vma->vm_start; /* vm_[pg]off[set] should be 0 */
page_nr = offset >> PAGE_SHIFT;
@@ -244,12 +251,14 @@
get_page(page);

DRM_DEBUG("dma_nopage 0x%lx (page %lu)\n", address, page_nr);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

-struct page *DRM(vm_sg_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+int DRM(vm_sg_nopage)(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access,
+ pmd_t *pmd)
{
drm_map_t *map = (drm_map_t *)vma->vm_private_data;
drm_file_t *priv = vma->vm_file->private_data;
@@ -260,9 +269,9 @@
unsigned long page_offset;
struct page *page;

- if (!entry) return NOPAGE_SIGBUS; /* Error */
- if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
- if (!entry->pagelist) return NOPAGE_OOM ; /* Nothing allocated */
+ if (!entry) return VM_FAULT_SIGBUS; /* Error */
+ if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+ if (!entry->pagelist) return VM_FAULT_OOM ; /* Nothing allocated */


offset = address - vma->vm_start;
@@ -271,7 +280,7 @@
page = entry->pagelist[page_offset];
get_page(page);

- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

void DRM(vm_open)(struct vm_area_struct *vma)
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/ieee1394/dma.c linux-2.5.69-mm7.install_new_page/drivers/ieee1394/dma.c
--- linux-2.5.69-mm7/drivers/ieee1394/dma.c Sun May 4 16:53:31 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/ieee1394/dma.c Thu May 22 16:16:07 2003
@@ -184,28 +184,28 @@

/* nopage() handler for mmap access */

-static struct page*
-dma_region_pagefault(struct vm_area_struct *area, unsigned long address, int write_access)
+static int
+dma_region_pagefault(struct mm_struct *mm, struct vm_area_struct *area,
+ unsigned long address, int write_access, pmd_t *pmd)
{
unsigned long offset;
unsigned long kernel_virt_addr;
- struct page *ret = NOPAGE_SIGBUS;
+ struct page *page;

struct dma_region *dma = (struct dma_region*) area->vm_private_data;

if(!dma->kvirt)
- goto out;
+ return VM_FAULT_SIGBUS;

if( (address < (unsigned long) area->vm_start) ||
(address > (unsigned long) area->vm_start + (PAGE_SIZE * dma->n_pages)) )
- goto out;
+ return VM_FAULT_SIGBUS;

offset = address - area->vm_start;
kernel_virt_addr = (unsigned long) dma->kvirt + offset;
- ret = vmalloc_to_page((void*) kernel_virt_addr);
- get_page(ret);
-out:
- return ret;
+ page = vmalloc_to_page((void*) kernel_virt_addr);
+ get_page(page);
+ return install_new_page(mm, area, address, write_access, pmd, page);
}

static struct vm_operations_struct dma_region_vm_ops = {
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/media/video/video-buf.c linux-2.5.69-mm7.install_new_page/drivers/media/video/video-buf.c
--- linux-2.5.69-mm7/drivers/media/video/video-buf.c Tue May 20 09:10:50 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/media/video/video-buf.c Thu May 22 17:59:32 2003
@@ -979,21 +979,21 @@
* now ...). Bounce buffers don't work very well for the data rates
* video capture has.
*/
-static struct page*
-videobuf_vm_nopage(struct vm_area_struct *vma, unsigned long vaddr,
- int write_access)
+static int
+videobuf_vm_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long vaddr, int write_access, pmd_t *pmd)
{
struct page *page;

dprintk(3,"nopage: fault @ %08lx [vma %08lx-%08lx]\n",
vaddr,vma->vm_start,vma->vm_end);
if (vaddr > vma->vm_end)
- return NOPAGE_SIGBUS;
+ return VM_FAULT_SIGBUS;
page = alloc_page(GFP_USER);
if (!page)
- return NOPAGE_OOM;
+ return VM_FAULT_OOM;
clear_user_page(page_address(page), vaddr, page);
- return page;
+ return install_new_page(mm, vma, vaddr, write_access, pmd, page);
}

static struct vm_operations_struct videobuf_vm_ops =
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/scsi/sg.c linux-2.5.69-mm7.install_new_page/drivers/scsi/sg.c
--- linux-2.5.69-mm7/drivers/scsi/sg.c Tue May 20 09:10:57 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/scsi/sg.c Thu May 22 16:34:00 2003
@@ -1121,21 +1121,22 @@
}
}

-static struct page *
-sg_vma_nopage(struct vm_area_struct *vma, unsigned long addr, int unused)
+static int
+sg_vma_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, int write_access, pmd_t *pmd)
{
Sg_fd *sfp;
- struct page *page = NOPAGE_SIGBUS;
+ struct page *page = NULL;
void *page_ptr = NULL;
unsigned long offset;
Sg_scatter_hold *rsv_schp;

if ((NULL == vma) || (!(sfp = (Sg_fd *) vma->vm_private_data)))
- return page;
+ return VM_FAULT_SIGBUS;
rsv_schp = &sfp->reserve;
offset = addr - vma->vm_start;
if (offset >= rsv_schp->bufflen)
- return page;
+ return VM_FAULT_SIGBUS;
SCSI_LOG_TIMEOUT(3, printk("sg_vma_nopage: offset=%lu, scatg=%d\n",
offset, rsv_schp->k_use_sg));
if (rsv_schp->k_use_sg) { /* reserve buffer is a scatter gather list */
@@ -1162,7 +1163,7 @@
page = virt_to_page(page_ptr);
get_page(page); /* increment page count */
}
- return page;
+ return page ? install_new_page(mm, vma, addr, write_access, pmd, page) : VM_FAULT_SIGBUS;
}

static struct vm_operations_struct sg_mmap_vm_ops = {
diff -urN -X dontdiff linux-2.5.69-mm7/drivers/sgi/char/graphics.c linux-2.5.69-mm7.install_new_page/drivers/sgi/char/graphics.c
--- linux-2.5.69-mm7/drivers/sgi/char/graphics.c Sun May 4 16:53:31 2003
+++ linux-2.5.69-mm7.install_new_page/drivers/sgi/char/graphics.c Thu May 22 16:37:00 2003
@@ -211,9 +211,9 @@
/*
* This is the core of the direct rendering engine.
*/
-struct page *
-sgi_graphics_nopage (struct vm_area_struct *vma, unsigned long address, int
- no_share)
+int
+sgi_graphics_nopage (struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, int write_access, pmd_t *pmdpf)
{
pgd_t *pgd; pmd_t *pmd; pte_t *pte;
int board = GRAPHICS_CARD (vma->vm_dentry->d_inode->i_rdev);
@@ -249,7 +249,7 @@
pte = pte_kmap_offset(pmd, address);
page = pte_page(*pte);
pte_kunmap(pte);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmdpf, page);
}

/*
diff -urN -X dontdiff linux-2.5.69-mm7/fs/ncpfs/mmap.c linux-2.5.69-mm7.install_new_page/fs/ncpfs/mmap.c
--- linux-2.5.69-mm7/fs/ncpfs/mmap.c Sun May 4 16:53:35 2003
+++ linux-2.5.69-mm7.install_new_page/fs/ncpfs/mmap.c Thu May 22 16:28:25 2003
@@ -25,8 +25,10 @@
/*
* Fill in the supplied page for mmap
*/
-static struct page* ncp_file_mmap_nopage(struct vm_area_struct *area,
- unsigned long address, int write_access)
+static int ncp_file_mmap_nopage(struct mm_struct *mm,
+ struct vm_area_struct *area,
+ unsigned long address, int write_access,
+ pmd_t *pmd)
{
struct file *file = area->vm_file;
struct dentry *dentry = file->f_dentry;
@@ -85,7 +87,7 @@
memset(pg_addr + already_read, 0, PAGE_SIZE - already_read);
flush_dcache_page(page);
kunmap(page);
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);
}

static struct vm_operations_struct ncp_file_mmap =
diff -urN -X dontdiff linux-2.5.69-mm7/include/linux/mm.h linux-2.5.69-mm7.install_new_page/include/linux/mm.h
--- linux-2.5.69-mm7/include/linux/mm.h Tue May 20 09:11:08 2003
+++ linux-2.5.69-mm7.install_new_page/include/linux/mm.h Thu May 22 20:00:41 2003
@@ -142,7 +142,8 @@
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
- struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+ int (*nopage)(struct mm_struct * mm, struct vm_area_struct * area,
+ unsigned long address, int write_access, pmd_t *pmd);
int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
};

@@ -380,12 +381,6 @@
}

/*
- * Error return values for the *_nopage functions
- */
-#define NOPAGE_SIGBUS (NULL)
-#define NOPAGE_OOM ((struct page *) (-1))
-
-/*
* Different kinds of faults, as returned by handle_mm_fault().
* Used to decide whether a process gets delivered SIGBUS or
* just gets major/minor fault counters bumped up.
@@ -402,8 +397,8 @@

extern void show_free_areas(void);

-struct page *shmem_nopage(struct vm_area_struct * vma,
- unsigned long address, int unused);
+int shmem_nopage(struct mm_struct * mm, struct vm_area_struct * vma,
+ unsigned long address, int write_access, pmd_t * pmd);
struct file *shmem_file_setup(char * name, loff_t size, unsigned long flags);
void shmem_lock(struct file * file, int lock);
int shmem_zero_setup(struct vm_area_struct *);
@@ -421,6 +416,9 @@
int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
unsigned long size, pgprot_t prot);

+extern int install_new_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, int write_access,
+ pmd_t *pmd, struct page * new_page);
extern void invalidate_mmap_range(struct address_space *mapping,
loff_t const holebegin,
loff_t const holelen);
@@ -559,7 +557,8 @@
extern void truncate_inode_pages(struct address_space *, loff_t);

/* generic vm_area_ops exported for stackable file systems */
-extern struct page *filemap_nopage(struct vm_area_struct *, unsigned long, int);
+int filemap_nopage(struct mm_struct *, struct vm_area_struct *,
+ unsigned long, int, pmd_t *);

/* mm/page-writeback.c */
int write_one_page(struct page *page, int wait);
diff -urN -X dontdiff linux-2.5.69-mm7/kernel/ksyms.c linux-2.5.69-mm7.install_new_page/kernel/ksyms.c
--- linux-2.5.69-mm7/kernel/ksyms.c Tue May 20 09:11:09 2003
+++ linux-2.5.69-mm7.install_new_page/kernel/ksyms.c Thu May 22 14:54:24 2003
@@ -116,6 +116,7 @@
EXPORT_SYMBOL(max_mapnr);
#endif
EXPORT_SYMBOL(high_memory);
+EXPORT_SYMBOL(install_new_page);
EXPORT_SYMBOL(invalidate_mmap_range);
EXPORT_SYMBOL(vmtruncate);
EXPORT_SYMBOL(find_vma);
diff -urN -X dontdiff linux-2.5.69-mm7/mm/filemap.c linux-2.5.69-mm7.install_new_page/mm/filemap.c
--- linux-2.5.69-mm7/mm/filemap.c Tue May 20 09:11:09 2003
+++ linux-2.5.69-mm7.install_new_page/mm/filemap.c Thu May 22 19:55:56 2003
@@ -982,7 +982,8 @@
* it in the page cache, and handles the special cases reasonably without
* having a lot of duplicated code.
*/
-struct page * filemap_nopage(struct vm_area_struct * area, unsigned long address, int unused)
+int filemap_nopage(struct mm_struct * mm, struct vm_area_struct * area,
+ unsigned long address, int write_access, pmd_t * pmd)
{
int error;
struct file *file = area->vm_file;
@@ -1003,7 +1004,7 @@
*/
size = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
if ((pgoff >= size) && (area->vm_mm == current->mm))
- return NULL;
+ return VM_FAULT_SIGBUS;

/*
* The "size" of the file, as far as mmap is concerned, isn't bigger
@@ -1057,7 +1058,7 @@
* Found the page and have a reference on it.
*/
mark_page_accessed(page);
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);

no_cached_page:
/*
@@ -1080,8 +1081,8 @@
* to schedule I/O.
*/
if (error == -ENOMEM)
- return NOPAGE_OOM;
- return NULL;
+ return VM_FAULT_OOM;
+ return VM_FAULT_SIGBUS;

page_not_uptodate:
inc_page_state(pgmajfault);
@@ -1138,7 +1139,7 @@
* mm layer so, possibly freeing the page cache page first.
*/
page_cache_release(page);
- return NULL;
+ return VM_FAULT_SIGBUS;
}

static struct page * filemap_getpage(struct file *file, unsigned long pgoff,
diff -urN -X dontdiff linux-2.5.69-mm7/mm/memory.c linux-2.5.69-mm7.install_new_page/mm/memory.c
--- linux-2.5.69-mm7/mm/memory.c Tue May 20 09:11:09 2003
+++ linux-2.5.69-mm7.install_new_page/mm/memory.c Thu May 22 14:56:27 2003
@@ -1385,28 +1385,49 @@
* This is called with the MM semaphore held and the page table
* spinlock held. Exit with the spinlock released.
*/
-static int
+static inline int
do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)
{
- struct page * new_page;
- pte_t entry;
- struct pte_chain *pte_chain;
- int ret;
-
if (!vma->vm_ops || !vma->vm_ops->nopage)
return do_anonymous_page(mm, vma, page_table,
- pmd, write_access, address);
+ pmd, write_access, address);
pte_unmap(page_table);
spin_unlock(&mm->page_table_lock);

- new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, 0);
+ return vma->vm_ops->nopage(mm, vma, address & PAGE_MASK,
+ write_access, pmd);
+}

- /* no page was available -- either SIGBUS or OOM */
- if (new_page == NOPAGE_SIGBUS)
- return VM_FAULT_SIGBUS;
- if (new_page == NOPAGE_OOM)
- return VM_FAULT_OOM;
+/**
+ * install_new_page - tries to create a new page mapping.
+ * @mm: mmap structure, locus of locking and RSS activity.
+ * @vma: the vm_area_struct controlling the virtual address at
+ * which the page fault occurred.
+ * @address: address of fault.
+ * @write_access: write access required.
+ * @pmd: PMD for faulting address.
+ * @page: physical page to satisfy fault.
+ *
+ * The install_new_page() function aggressively tries to share with
+ * existing pages, but makes a separate copy if the "write_access"
+ * parameter is true in order to avoid the next page fault.
+ *
+ * As this is called only for pages that do not currently exist, we
+ * do not need to flush old virtual caches or the TLB.
+ *
+ * This is called with the MM semaphore held and the page table
+ * spinlock held. Exit with the spinlock released.
+ */
+int
+install_new_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, int write_access,
+ pmd_t *pmd, struct page * new_page)
+{
+ pte_t entry;
+ pte_t *page_table;
+ struct pte_chain *pte_chain;
+ int ret;

pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
diff -urN -X dontdiff linux-2.5.69-mm7/mm/shmem.c linux-2.5.69-mm7.install_new_page/mm/shmem.c
--- linux-2.5.69-mm7/mm/shmem.c Tue May 20 09:11:10 2003
+++ linux-2.5.69-mm7.install_new_page/mm/shmem.c Thu May 22 17:18:55 2003
@@ -936,7 +936,8 @@
return error;
}

-struct page *shmem_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+int shmem_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, int write_access, pmd_t *pmd)
{
struct inode *inode = vma->vm_file->f_dentry->d_inode;
struct page *page = NULL;
@@ -949,10 +950,10 @@

error = shmem_getpage(inode, idx, &page, SGP_CACHE);
if (error)
- return (error == -ENOMEM)? NOPAGE_OOM: NOPAGE_SIGBUS;
+ return (error == -ENOMEM)? VM_FAULT_OOM: VM_FAULT_SIGBUS;

mark_page_accessed(page);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

static int shmem_populate(struct vm_area_struct *vma,
diff -urN -X dontdiff linux-2.5.69-mm7/sound/core/pcm_native.c linux-2.5.69-mm7.install_new_page/sound/core/pcm_native.c
--- linux-2.5.69-mm7/sound/core/pcm_native.c Sun May 4 16:53:09 2003
+++ linux-2.5.69-mm7.install_new_page/sound/core/pcm_native.c Thu May 22 20:07:04 2003
@@ -2693,7 +2693,7 @@
#endif

#ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_status_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
#else
static unsigned long snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
#endif
@@ -2703,12 +2703,12 @@
struct page * page;

if (substream == NULL)
- return NOPAGE_OOM;
+ return VM_FAULT_OOM;
runtime = substream->runtime;
page = virt_to_page(runtime->status);
get_page(page);
#ifndef LINUX_2_2
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);
#else
return page_address(page);
#endif
@@ -2747,7 +2747,7 @@
}

#ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_control_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
#else
static unsigned long snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
#endif
@@ -2757,12 +2757,12 @@
struct page * page;

if (substream == NULL)
- return NOPAGE_OOM;
+ return VM_FAULT_OOM;
runtime = substream->runtime;
page = virt_to_page(runtime->control);
get_page(page);
#ifndef LINUX_2_2
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);
#else
return page_address(page);
#endif
@@ -2813,7 +2813,7 @@
}

#ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_data_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
#else
static unsigned long snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
#endif
@@ -2826,7 +2826,7 @@
size_t dma_bytes;

if (substream == NULL)
- return NOPAGE_OOM;
+ return VM_FAULT_OOM;
runtime = substream->runtime;
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 3, 25)
offset = area->vm_pgoff << PAGE_SHIFT;
@@ -2834,21 +2834,21 @@
offset = area->vm_offset;
#endif
offset += address - area->vm_start;
- snd_assert((offset % PAGE_SIZE) == 0, return NOPAGE_OOM);
+ snd_assert((offset % PAGE_SIZE) == 0, return VM_FAULT_OOM);
dma_bytes = PAGE_ALIGN(runtime->dma_bytes);
if (offset > dma_bytes - PAGE_SIZE)
- return NOPAGE_SIGBUS;
+ return VM_FAULT_SIGBUS;
if (substream->ops->page) {
page = substream->ops->page(substream, offset);
if (! page)
- return NOPAGE_OOM;
+ return VM_FAULT_OOM;
} else {
vaddr = runtime->dma_area + offset;
page = virt_to_page(vaddr);
}
get_page(page);
#ifndef LINUX_2_2
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);
#else
return page_address(page);
#endif
diff -urN -X dontdiff linux-2.5.69-mm7/sound/oss/emu10k1/audio.c linux-2.5.69-mm7.install_new_page/sound/oss/emu10k1/audio.c
--- linux-2.5.69-mm7/sound/oss/emu10k1/audio.c Sun May 4 16:53:02 2003
+++ linux-2.5.69-mm7.install_new_page/sound/oss/emu10k1/audio.c Thu May 22 16:18:16 2003
@@ -970,7 +970,7 @@
return 0;
}

-static struct page *emu10k1_mm_nopage (struct vm_area_struct * vma, unsigned long address, int write_access)
+static int emu10k1_mm_nopage (struct mm_struct * mm, struct vm_area_struct * vma, unsigned long address, int write_access, pmd_t * pmd)
{
struct emu10k1_wavedevice *wave_dev = vma->vm_private_data;
struct woinst *woinst = wave_dev->woinst;
@@ -983,8 +983,8 @@
DPD(3, "addr: %#lx\n", address);

if (address > vma->vm_end) {
- DPF(1, "EXIT, returning NOPAGE_SIGBUS\n");
- return NOPAGE_SIGBUS; /* Disallow mremap */
+ DPF(1, "EXIT, returning VM_FAULT_SIGBUS\n");
+ return VM_FAULT_SIGBUS; /* Disallow mremap */
}

pgoff = vma->vm_pgoff + ((address - vma->vm_start) >> PAGE_SHIFT);
@@ -1013,7 +1013,7 @@
get_page (dmapage);

DPD(3, "page: %#lx\n", (unsigned long) dmapage);
- return dmapage;
+ return install_new_page(mm, vma, address, write_access, pmd, dmapage);
}

struct vm_operations_struct emu10k1_mm_ops = {
diff -urN -X dontdiff linux-2.5.69-mm7/sound/oss/via82cxxx_audio.c linux-2.5.69-mm7.install_new_page/sound/oss/via82cxxx_audio.c
--- linux-2.5.69-mm7/sound/oss/via82cxxx_audio.c Sun May 4 16:53:08 2003
+++ linux-2.5.69-mm7.install_new_page/sound/oss/via82cxxx_audio.c Thu May 22 17:48:25 2003
@@ -1846,8 +1846,8 @@
}


-static struct page * via_mm_nopage (struct vm_area_struct * vma,
- unsigned long address, int write_access)
+static int via_mm_nopage (struct mm_struct *mm, struct vm_area_struct * vma,
+ unsigned long address, int write_access, pmd_t *pmd)
{
struct via_info *card = vma->vm_private_data;
struct via_channel *chan = &card->ch_out;
@@ -1863,12 +1863,12 @@
write_access);

if (address > vma->vm_end) {
- DPRINTK ("EXIT, returning NOPAGE_SIGBUS\n");
- return NOPAGE_SIGBUS; /* Disallow mremap */
+ DPRINTK ("EXIT, returning VM_FAULT_SIGBUS\n");
+ return VM_FAULT_SIGBUS; /* Disallow mremap */
}
if (!card) {
- DPRINTK ("EXIT, returning NOPAGE_OOM\n");
- return NOPAGE_OOM; /* Nothing allocated */
+ DPRINTK ("EXIT, returning VM_FAULT_OOM\n");
+ return VM_FAULT_OOM; /* Nothing allocated */
}

pgoff = vma->vm_pgoff + ((address - vma->vm_start) >> PAGE_SHIFT);
@@ -1895,10 +1895,10 @@
assert ((((unsigned long)chan->pgtbl[pgoff].cpuaddr) % PAGE_SIZE) == 0);

dmapage = virt_to_page (chan->pgtbl[pgoff].cpuaddr);
- DPRINTK ("EXIT, returning page %p for cpuaddr %lXh\n",
+ DPRINTK ("EXIT, installing page %p for cpuaddr %lXh\n",
dmapage, (unsigned long) chan->pgtbl[pgoff].cpuaddr);
get_page (dmapage);
- return dmapage;
+ return install_new_page(mm, vma, address, write_access, pmd, dmapage);
}


2003-05-23 16:06:33

by Hugh Dickins

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Fri, 23 May 2003, Paul E. McKenney wrote:
> On Tue, May 20, 2003 at 01:11:57AM -0700, Andrew Morton wrote:
> >
> > However there is not a lot of commonality between the various nopage()s and
> > there may not be a lot to be gained from all this. There is subtle code in
> > there and it is performance-critical. I'd be inclined to try to minimise
> > overall code churn in this work.
>
> Good point! Here is a patch to do this. A "few" caveats:

Sorry, I miss the point of this patch entirely. At the moment it just
looks like an unattractive rearrangement - the code churn akpm advised
against - with no bearing on that vmtruncate race. Please correct me.

Hugh

2003-05-23 16:57:48

by Daniel Phillips

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Friday 23 May 2003 18:21, Hugh Dickins wrote:
> Sorry, I miss the point of this patch entirely. At the moment it just
> looks like an unattractive rearrangement - the code churn akpm advised
> against - with no bearing on that vmtruncate race. Please correct me.

This is all about supporting cross-host mmap (nice trick, huh?). Yes,
somebody should post a detailed rfc on that subject.

Regards,

Daniel

2003-05-23 17:32:11

by Hugh Dickins

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Fri, 23 May 2003, Daniel Phillips wrote:
> On Friday 23 May 2003 18:21, Hugh Dickins wrote:
> > Sorry, I miss the point of this patch entirely. At the moment it just
> > looks like an unattractive rearrangement - the code churn akpm advised
> > against - with no bearing on that vmtruncate race. Please correct me.
>
> This is all about supporting cross-host mmap (nice trick, huh?). Yes,
> somebody should post a detailed rfc on that subject.

Ah, thanks - translated into terms that I can understand, so that
some ->nopage() not yet in the tree could do something after the
install_new_page() returns. Hmm. Can we be sure it's appropriate
for install_new_page to drop mm->page_table_lock before it returns?

Hugh
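
The calling convention described above can be sketched in userspace C. This is only a model under stated assumptions: every name prefixed with model_ is hypothetical, the lock is a plain flag rather than a kernel semaphore, and the VM_FAULT_* values are stand-ins rather than the kernel's definitions. It shows a ->nopage()-style handler that holds a filesystem lock across the install step and releases it only once the mapping is visible, so that an invalidation taking the same lock sees either no mapping or a complete one.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in status codes; the values are assumptions for this model only. */
enum { VM_FAULT_OOM = -1, VM_FAULT_SIGBUS = 0, VM_FAULT_MINOR = 1 };

/* A non-recursive lock modelled as a flag; the asserts catch misuse. */
struct model_lock { int held; };

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct model_page { int valid; };

struct model_vma {
	struct model_lock fs_lock;	/* hypothetical per-file cluster lock */
	struct model_page *mapped;	/* stands in for the PTE */
};

/* Models install_new_page(): enters the page and returns a fault status. */
static int model_install_new_page(struct model_vma *vma, struct model_page *page)
{
	if (page == NULL)
		return VM_FAULT_OOM;
	vma->mapped = page;
	return VM_FAULT_MINOR;
}

/* Hypothetical distributed-filesystem ->nopage(). */
static int model_dfs_nopage(struct model_vma *vma, struct model_page *page)
{
	int ret;

	model_lock_acquire(&vma->fs_lock);	/* holds off invalidation */
	ret = model_install_new_page(vma, page);
	model_lock_release(&vma->fs_lock);	/* mapping now visible */
	return ret;
}

/* Invalidation takes the same lock, so it never sees a half-done fault. */
static void model_invalidate(struct model_vma *vma)
{
	model_lock_acquire(&vma->fs_lock);
	if (vma->mapped) {
		vma->mapped->valid = 0;
		vma->mapped = NULL;
	}
	model_lock_release(&vma->fs_lock);
}
```

The point of the sketch is the ordering inside model_dfs_nopage(): because the handler, not its caller, performs the install, it knows exactly when the lock can be dropped.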

2003-05-23 19:30:57

by Paul E. McKenney

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Fri, May 23, 2003 at 06:47:31PM +0100, Hugh Dickins wrote:
> On Fri, 23 May 2003, Daniel Phillips wrote:
> > On Friday 23 May 2003 18:21, Hugh Dickins wrote:
> > > Sorry, I miss the point of this patch entirely. At the moment it just
> > > looks like an unattractive rearrangement - the code churn akpm advised
> > > against - with no bearing on that vmtruncate race. Please correct me.
> >
> > This is all about supporting cross-host mmap (nice trick, huh?). Yes,
> > somebody should post a detailed rfc on that subject.
>
> Ah, thanks - translated into terms that I can understand, so that
> some ->nopage() not yet in the tree could do something after the
> install_new_page() returns. Hmm. Can we be sure it's appropriate
> for install_new_page to drop mm->page_table_lock before it returns?

Exactly -- allows a ->nopage() to drop some lock to avoid races
between pagefault and either vmtruncate() or invalidate_mmap_range().
This race (from the cross-host mmap viewpoint) is described in:

http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2

install_new_page() has to drop mm->page_table_lock for the same
reason that the previous do_no_page() did. In addition, dropping
the lock permits a ->nopage() to invoke things like zap_page_range(),
which acquire mm->page_table_lock.

Thanx, Paul
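
The locking argument in this reply can be illustrated with a small userspace model. It is a sketch, not kernel code: the model_ names are hypothetical, the spinlock is a flag whose asserts stand in for a self-deadlock, and it assumes that install_new_page() acquires the page table lock itself, as the old do_no_page() body did after ->nopage() returned. Because the lock is released before install returns, the handler may afterwards call a zap_page_range()-like function that takes the same lock.

```c
#include <assert.h>

/* Models mm_struct: one non-recursive lock and a count of installed PTEs. */
struct model_mm {
	int page_table_lock_held;
	int nr_ptes;
};

static void model_ptl_lock(struct model_mm *mm)
{
	/* On a real non-recursive spinlock this would be a self-deadlock. */
	assert(!mm->page_table_lock_held);
	mm->page_table_lock_held = 1;
}

static void model_ptl_unlock(struct model_mm *mm)
{
	assert(mm->page_table_lock_held);
	mm->page_table_lock_held = 0;
}

/* Assumed shape of install_new_page(): lock, install, unlock. */
static int model_install_new_page(struct model_mm *mm)
{
	model_ptl_lock(mm);
	mm->nr_ptes++;
	model_ptl_unlock(mm);
	return 1;	/* stands in for a VM_FAULT_* success code */
}

/* Stands in for zap_page_range(), which takes the same lock. */
static void model_zap_page_range(struct model_mm *mm)
{
	model_ptl_lock(mm);
	mm->nr_ptes = 0;
	model_ptl_unlock(mm);
}

/* Hypothetical ->nopage() that zaps after install if the page went stale. */
static int model_nopage(struct model_mm *mm, int stale)
{
	int ret = model_install_new_page(mm);

	if (stale)
		model_zap_page_range(mm);	/* safe: lock already dropped */
	return ret;
}

/* Models do_no_page(): entered with the lock held, drops it first. */
static int model_do_no_page(struct model_mm *mm, int stale)
{
	assert(mm->page_table_lock_held);
	model_ptl_unlock(mm);
	return model_nopage(mm, stale);
}
```

If model_install_new_page() returned with the lock still held, the assert in model_ptl_lock() would fire inside model_zap_page_range(), which is the model's version of the deadlock this design avoids.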

2003-05-29 15:01:48

by Paul E. McKenney

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Fri, May 23, 2003 at 11:42:02AM -0700, Paul E. McKenney wrote:
> On Fri, May 23, 2003 at 06:47:31PM +0100, Hugh Dickins wrote:
> > On Fri, 23 May 2003, Daniel Phillips wrote:
> > > On Friday 23 May 2003 18:21, Hugh Dickins wrote:
> > > > Sorry, I miss the point of this patch entirely. At the moment it just
> > > > looks like an unattractive rearrangement - the code churn akpm advised
> > > > against - with no bearing on that vmtruncate race. Please correct me.
> > >
> > > This is all about supporting cross-host mmap (nice trick, huh?). Yes,
> > > somebody should post a detailed rfc on that subject.
> >
> > Ah, thanks - translated into terms that I can understand, so that
> > some ->nopage() not yet in the tree could do something after the
> > install_new_page() returns. Hmm. Can we be sure it's appropriate
> > for install_new_page to drop mm->page_table_lock before it returns?
>
> Exactly -- allows a ->nopage() to drop some lock to avoid races
> between pagefault and either vmtruncate() or invalidate_mmap_range().
> This race (from the cross-host mmap viewpoint) is described in:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2
>
> install_new_page() has to drop mm->page_table_lock for the same
> reason that the previous do_no_page() did. In addition, dropping
> the lock permits a ->nopage() to invoke things like zap_page_range(),
> which acquire mm->page_table_lock.

Rediffed for 2.5.70-mm1. A few lines of code were added in order to
follow the "#ifndef LINUX_2_2" convention in the sound system. The patch
in the following email removes these #ifdefs, on the off-chance that
they are a holdover rather than the sound system's way of maintaining
a single code base across all versions of Linux.

Thanx, Paul


diff -urN -x dontdiff linux-2.5.70-mm1/arch/ia64/ia32/binfmt_elf32.c linux-2.5.70-mm1.install_new_page/arch/ia64/ia32/binfmt_elf32.c
--- linux-2.5.70-mm1/arch/ia64/ia32/binfmt_elf32.c 2003-05-26 18:00:58.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/arch/ia64/ia32/binfmt_elf32.c 2003-05-28 20:17:42.000000000 -0700
@@ -56,13 +56,13 @@
extern struct page *ia32_shared_page[];
extern unsigned long *ia32_gdt;

-struct page *
-ia32_install_shared_page (struct vm_area_struct *vma, unsigned long address, int no_share)
+int
+ia32_install_shared_page (struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd)
{
struct page *pg = ia32_shared_page[(address - vma->vm_start)/PAGE_SIZE];

get_page(pg);
- return pg;
+ return install_new_page(mm, vma, address, write_access, pmd, pg);
}

static struct vm_operations_struct ia32_shared_page_vm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/arch/sparc64/mm/hugetlbpage.c linux-2.5.70-mm1.install_new_page/arch/sparc64/mm/hugetlbpage.c
--- linux-2.5.70-mm1/arch/sparc64/mm/hugetlbpage.c 2003-05-26 18:00:42.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/arch/sparc64/mm/hugetlbpage.c 2003-05-28 20:17:42.000000000 -0700
@@ -633,11 +633,11 @@
return (int) htlbzone_pages;
}

-static struct page *
-hugetlb_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+static int
+hugetlb_nopage(struct mm_struct * mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t * pmd)
{
BUG();
- return NULL;
+ return VM_FAULT_SIGBUS;
}

static struct vm_operations_struct hugetlb_vm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/char/agp/alpha-agp.c linux-2.5.70-mm1.install_new_page/drivers/char/agp/alpha-agp.c
--- linux-2.5.70-mm1/drivers/char/agp/alpha-agp.c 2003-05-26 18:00:42.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/char/agp/alpha-agp.c 2003-05-28 20:37:38.000000000 -0700
@@ -11,9 +11,9 @@

#include "agp.h"

-static struct page *alpha_core_agp_vm_nopage(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+static int alpha_core_agp_vm_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access, pmd_t *pmd)
{
alpha_agp_info *agp = agp_bridge->dev_private_data;
dma_addr_t dma_addr;
@@ -23,14 +23,14 @@
dma_addr = address - vma->vm_start + agp->aperture.bus_base;
pa = agp->ops->translate(agp, dma_addr);

- if (pa == (unsigned long)-EINVAL) return NULL; /* no translation */
+ if (pa == (unsigned long)-EINVAL) return VM_FAULT_SIGBUS; /* no translation */

/*
* Get the page, inc the use count, and return it
*/
page = virt_to_page(__va(pa));
get_page(page);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

static struct aper_size_info_fixed alpha_core_agp_sizes[] =
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/char/drm/drmP.h linux-2.5.70-mm1.install_new_page/drivers/char/drm/drmP.h
--- linux-2.5.70-mm1/drivers/char/drm/drmP.h 2003-05-26 18:00:45.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/char/drm/drmP.h 2003-05-28 20:55:40.000000000 -0700
@@ -620,18 +620,17 @@
extern int DRM(fasync)(int fd, struct file *filp, int on);

/* Mapping support (drm_vm.h) */
-extern struct page *DRM(vm_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access);
-extern struct page *DRM(vm_shm_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access);
-extern struct page *DRM(vm_dma_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access);
-extern struct page *DRM(vm_sg_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access);
+extern int DRM(vm_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, int write_access, pmd_t *pmd);
+extern int DRM(vm_shm_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access, pmd_t *pmd);
+extern int DRM(vm_dma_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access, pmd_t *pmd);
+extern int DRM(vm_sg_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access, pmd_t *pmd);
extern void DRM(vm_open)(struct vm_area_struct *vma);
extern void DRM(vm_close)(struct vm_area_struct *vma);
extern void DRM(vm_shm_close)(struct vm_area_struct *vma);
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/char/drm/drm_vm.h linux-2.5.70-mm1.install_new_page/drivers/char/drm/drm_vm.h
--- linux-2.5.70-mm1/drivers/char/drm/drm_vm.h 2003-05-26 18:01:02.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/char/drm/drm_vm.h 2003-05-28 20:57:19.000000000 -0700
@@ -55,9 +55,9 @@
.close = DRM(vm_close),
};

-struct page *DRM(vm_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+int DRM(vm_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access, pmd_t *pmd)
{
#if __REALLY_HAVE_AGP
drm_file_t *priv = vma->vm_file->private_data;
@@ -114,35 +114,35 @@
baddr, __va(agpmem->memory->memory[offset]), offset,
atomic_read(&page->count));

- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}
vm_nopage_error:
#endif /* __REALLY_HAVE_AGP */

- return NOPAGE_SIGBUS; /* Disallow mremap */
+ return VM_FAULT_SIGBUS; /* Disallow mremap */
}

-struct page *DRM(vm_shm_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+int DRM(vm_shm_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access, pmd_t *pmd)
{
drm_map_t *map = (drm_map_t *)vma->vm_private_data;
unsigned long offset;
unsigned long i;
struct page *page;

- if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
- if (!map) return NOPAGE_OOM; /* Nothing allocated */
+ if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+ if (!map) return VM_FAULT_OOM; /* Nothing allocated */

offset = address - vma->vm_start;
i = (unsigned long)map->handle + offset;
page = vmalloc_to_page((void *)i);
if (!page)
- return NOPAGE_OOM;
+ return VM_FAULT_OOM;
get_page(page);

DRM_DEBUG("shm_nopage 0x%lx\n", address);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

/* Special close routine which deletes map information if we are the last
@@ -221,9 +221,9 @@
up(&dev->struct_sem);
}

-struct page *DRM(vm_dma_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+int DRM(vm_dma_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access, pmd_t *pmd)
{
drm_file_t *priv = vma->vm_file->private_data;
drm_device_t *dev = priv->dev;
@@ -232,9 +232,9 @@
unsigned long page_nr;
struct page *page;

- if (!dma) return NOPAGE_SIGBUS; /* Error */
- if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
- if (!dma->pagelist) return NOPAGE_OOM ; /* Nothing allocated */
+ if (!dma) return VM_FAULT_SIGBUS; /* Error */
+ if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+ if (!dma->pagelist) return VM_FAULT_OOM ; /* Nothing allocated */

offset = address - vma->vm_start; /* vm_[pg]off[set] should be 0 */
page_nr = offset >> PAGE_SHIFT;
@@ -244,12 +244,12 @@
get_page(page);

DRM_DEBUG("dma_nopage 0x%lx (page %lu)\n", address, page_nr);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

-struct page *DRM(vm_sg_nopage)(struct vm_area_struct *vma,
- unsigned long address,
- int write_access)
+int DRM(vm_sg_nopage)(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address,
+ int write_access, pmd_t *pmd)
{
drm_map_t *map = (drm_map_t *)vma->vm_private_data;
drm_file_t *priv = vma->vm_file->private_data;
@@ -260,9 +260,9 @@
unsigned long page_offset;
struct page *page;

- if (!entry) return NOPAGE_SIGBUS; /* Error */
- if (address > vma->vm_end) return NOPAGE_SIGBUS; /* Disallow mremap */
- if (!entry->pagelist) return NOPAGE_OOM ; /* Nothing allocated */
+ if (!entry) return VM_FAULT_SIGBUS; /* Error */
+ if (address > vma->vm_end) return VM_FAULT_SIGBUS; /* Disallow mremap */
+ if (!entry->pagelist) return VM_FAULT_OOM ; /* Nothing allocated */


offset = address - vma->vm_start;
@@ -271,7 +271,7 @@
page = entry->pagelist[page_offset];
get_page(page);

- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

void DRM(vm_open)(struct vm_area_struct *vma)
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/ieee1394/dma.c linux-2.5.70-mm1.install_new_page/drivers/ieee1394/dma.c
--- linux-2.5.70-mm1/drivers/ieee1394/dma.c 2003-05-26 18:00:40.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/ieee1394/dma.c 2003-05-28 20:39:31.000000000 -0700
@@ -184,28 +184,27 @@

/* nopage() handler for mmap access */

-static struct page*
-dma_region_pagefault(struct vm_area_struct *area, unsigned long address, int write_access)
+static int
+dma_region_pagefault(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
{
unsigned long offset;
unsigned long kernel_virt_addr;
- struct page *ret = NOPAGE_SIGBUS;
+ struct page *page;

struct dma_region *dma = (struct dma_region*) area->vm_private_data;

if(!dma->kvirt)
- goto out;
+ return VM_FAULT_SIGBUS;

if( (address < (unsigned long) area->vm_start) ||
(address > (unsigned long) area->vm_start + (PAGE_SIZE * dma->n_pages)) )
- goto out;
+ return VM_FAULT_SIGBUS;

offset = address - area->vm_start;
kernel_virt_addr = (unsigned long) dma->kvirt + offset;
- ret = vmalloc_to_page((void*) kernel_virt_addr);
- get_page(ret);
-out:
- return ret;
+ page = vmalloc_to_page((void*) kernel_virt_addr);
+ get_page(page);
+ return install_new_page(mm, area, address, write_access, pmd, page);
}

static struct vm_operations_struct dma_region_vm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/media/video/video-buf.c linux-2.5.70-mm1.install_new_page/drivers/media/video/video-buf.c
--- linux-2.5.70-mm1/drivers/media/video/video-buf.c 2003-05-26 18:00:40.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/media/video/video-buf.c 2003-05-28 20:17:42.000000000 -0700
@@ -979,21 +979,21 @@
* now ...). Bounce buffers don't work very well for the data rates
* video capture has.
*/
-static struct page*
-videobuf_vm_nopage(struct vm_area_struct *vma, unsigned long vaddr,
- int write_access)
+static int
+videobuf_vm_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long vaddr, int write_access, pmd_t *pmd)
{
struct page *page;

dprintk(3,"nopage: fault @ %08lx [vma %08lx-%08lx]\n",
vaddr,vma->vm_start,vma->vm_end);
if (vaddr > vma->vm_end)
- return NOPAGE_SIGBUS;
+ return VM_FAULT_SIGBUS;
page = alloc_page(GFP_USER);
if (!page)
- return NOPAGE_OOM;
+ return VM_FAULT_OOM;
clear_user_page(page_address(page), vaddr, page);
- return page;
+ return install_new_page(mm, vma, vaddr, write_access, pmd, page);
}

static struct vm_operations_struct videobuf_vm_ops =
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/scsi/sg.c linux-2.5.70-mm1.install_new_page/drivers/scsi/sg.c
--- linux-2.5.70-mm1/drivers/scsi/sg.c 2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/scsi/sg.c 2003-05-28 20:39:59.000000000 -0700
@@ -1121,21 +1121,21 @@
}
}

-static struct page *
-sg_vma_nopage(struct vm_area_struct *vma, unsigned long addr, int unused)
+static int
+sg_vma_nopage(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, int write_access, pmd_t *pmd)
{
Sg_fd *sfp;
- struct page *page = NOPAGE_SIGBUS;
+ struct page *page = NULL;
void *page_ptr = NULL;
unsigned long offset;
Sg_scatter_hold *rsv_schp;

if ((NULL == vma) || (!(sfp = (Sg_fd *) vma->vm_private_data)))
- return page;
+ return VM_FAULT_SIGBUS;
rsv_schp = &sfp->reserve;
offset = addr - vma->vm_start;
if (offset >= rsv_schp->bufflen)
- return page;
+ return VM_FAULT_SIGBUS;
SCSI_LOG_TIMEOUT(3, printk("sg_vma_nopage: offset=%lu, scatg=%d\n",
offset, rsv_schp->k_use_sg));
if (rsv_schp->k_use_sg) { /* reserve buffer is a scatter gather list */
@@ -1162,7 +1162,7 @@
page = virt_to_page(page_ptr);
get_page(page); /* increment page count */
}
- return page;
+ return install_new_page(mm, vma, addr, write_access, pmd, page);
}

static struct vm_operations_struct sg_mmap_vm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/drivers/sgi/char/graphics.c linux-2.5.70-mm1.install_new_page/drivers/sgi/char/graphics.c
--- linux-2.5.70-mm1/drivers/sgi/char/graphics.c 2003-05-26 18:00:40.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/drivers/sgi/char/graphics.c 2003-05-28 20:17:42.000000000 -0700
@@ -211,9 +211,9 @@
/*
* This is the core of the direct rendering engine.
*/
-struct page *
-sgi_graphics_nopage (struct vm_area_struct *vma, unsigned long address, int
- no_share)
+int
+sgi_graphics_nopage (struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, int write_access, pmd_t *pmdpf)
{
pgd_t *pgd; pmd_t *pmd; pte_t *pte;
int board = GRAPHICS_CARD (vma->vm_dentry->d_inode->i_rdev);
@@ -249,7 +249,7 @@
pte = pte_kmap_offset(pmd, address);
page = pte_page(*pte);
pte_kunmap(pte);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmdpf, page);
}

/*
diff -urN -x dontdiff linux-2.5.70-mm1/fs/ncpfs/mmap.c linux-2.5.70-mm1.install_new_page/fs/ncpfs/mmap.c
--- linux-2.5.70-mm1/fs/ncpfs/mmap.c 2003-05-26 18:00:43.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/fs/ncpfs/mmap.c 2003-05-28 20:17:42.000000000 -0700
@@ -25,8 +25,8 @@
/*
* Fill in the supplied page for mmap
*/
-static struct page* ncp_file_mmap_nopage(struct vm_area_struct *area,
- unsigned long address, int write_access)
+static int ncp_file_mmap_nopage(struct mm_struct *mm, struct vm_area_struct *area,
+ unsigned long address, int write_access, pmd_t *pmd)
{
struct file *file = area->vm_file;
struct dentry *dentry = file->f_dentry;
@@ -85,7 +85,7 @@
memset(pg_addr + already_read, 0, PAGE_SIZE - already_read);
flush_dcache_page(page);
kunmap(page);
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);
}

static struct vm_operations_struct ncp_file_mmap =
diff -urN -x dontdiff linux-2.5.70-mm1/include/linux/mm.h linux-2.5.70-mm1.install_new_page/include/linux/mm.h
--- linux-2.5.70-mm1/include/linux/mm.h 2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/include/linux/mm.h 2003-05-28 20:17:42.000000000 -0700
@@ -142,7 +142,7 @@
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
- struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+ int (*nopage)(struct mm_struct * mm, struct vm_area_struct * area, unsigned long address, int write_access, pmd_t *pmd);
int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
};

@@ -380,12 +380,6 @@
}

/*
- * Error return values for the *_nopage functions
- */
-#define NOPAGE_SIGBUS (NULL)
-#define NOPAGE_OOM ((struct page *) (-1))
-
-/*
* Different kinds of faults, as returned by handle_mm_fault().
* Used to decide whether a process gets delivered SIGBUS or
* just gets major/minor fault counters bumped up.
@@ -402,8 +396,8 @@

extern void show_free_areas(void);

-struct page *shmem_nopage(struct vm_area_struct * vma,
- unsigned long address, int unused);
+int shmem_nopage(struct mm_struct * mm, struct vm_area_struct * vma,
+ unsigned long address, int write_access, pmd_t * pmd);
struct file *shmem_file_setup(char * name, loff_t size, unsigned long flags);
void shmem_lock(struct file * file, int lock);
int shmem_zero_setup(struct vm_area_struct *);
@@ -421,6 +415,7 @@
int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
unsigned long size, pgprot_t prot);

+extern int install_new_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd, struct page * new_page);
extern void invalidate_mmap_range(struct address_space *mapping,
loff_t const holebegin,
loff_t const holelen);
@@ -559,7 +554,7 @@
extern void truncate_inode_pages(struct address_space *, loff_t);

/* generic vm_area_ops exported for stackable file systems */
-extern struct page *filemap_nopage(struct vm_area_struct *, unsigned long, int);
+int filemap_nopage(struct mm_struct *, struct vm_area_struct *, unsigned long, int, pmd_t *);

/* mm/page-writeback.c */
int write_one_page(struct page *page, int wait);
diff -urN -x dontdiff linux-2.5.70-mm1/kernel/ksyms.c linux-2.5.70-mm1.install_new_page/kernel/ksyms.c
--- linux-2.5.70-mm1/kernel/ksyms.c 2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/kernel/ksyms.c 2003-05-28 20:17:42.000000000 -0700
@@ -116,6 +116,7 @@
EXPORT_SYMBOL(max_mapnr);
#endif
EXPORT_SYMBOL(high_memory);
+EXPORT_SYMBOL(install_new_page);
EXPORT_SYMBOL(invalidate_mmap_range);
EXPORT_SYMBOL(vmtruncate);
EXPORT_SYMBOL(find_vma);
diff -urN -x dontdiff linux-2.5.70-mm1/mm/filemap.c linux-2.5.70-mm1.install_new_page/mm/filemap.c
--- linux-2.5.70-mm1/mm/filemap.c 2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/mm/filemap.c 2003-05-28 20:17:42.000000000 -0700
@@ -1013,7 +1013,7 @@
* it in the page cache, and handles the special cases reasonably without
* having a lot of duplicated code.
*/
-struct page * filemap_nopage(struct vm_area_struct * area, unsigned long address, int unused)
+int filemap_nopage(struct mm_struct * mm, struct vm_area_struct * area, unsigned long address, int write_access, pmd_t * pmd)
{
int error;
struct file *file = area->vm_file;
@@ -1034,7 +1034,7 @@
*/
size = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
if ((pgoff >= size) && (area->vm_mm == current->mm))
- return NULL;
+ return VM_FAULT_SIGBUS;

/*
* The "size" of the file, as far as mmap is concerned, isn't bigger
@@ -1088,7 +1088,7 @@
* Found the page and have a reference on it.
*/
mark_page_accessed(page);
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);

no_cached_page:
/*
@@ -1111,8 +1111,8 @@
* to schedule I/O.
*/
if (error == -ENOMEM)
- return NOPAGE_OOM;
- return NULL;
+ return VM_FAULT_OOM;
+ return VM_FAULT_SIGBUS;

page_not_uptodate:
inc_page_state(pgmajfault);
@@ -1169,7 +1169,7 @@
* mm layer so, possibly freeing the page cache page first.
*/
page_cache_release(page);
- return NULL;
+ return VM_FAULT_SIGBUS;
}

static struct page * filemap_getpage(struct file *file, unsigned long pgoff,
diff -urN -x dontdiff linux-2.5.70-mm1/mm/memory.c linux-2.5.70-mm1.install_new_page/mm/memory.c
--- linux-2.5.70-mm1/mm/memory.c 2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/mm/memory.c 2003-05-28 20:43:16.000000000 -0700
@@ -1374,39 +1374,33 @@
}

/*
- * do_no_page() tries to create a new page mapping. It aggressively
- * tries to share with existing pages, but makes a separate copy if
- * the "write_access" parameter is true in order to avoid the next
- * page fault.
- *
- * As this is called only for pages that do not currently exist, we
- * do not need to flush old virtual caches or the TLB.
- *
- * This is called with the MM semaphore held and the page table
- * spinlock held. Exit with the spinlock released.
+ * do_no_page() invokes do_anonymous_page() or ->nopage, as appropriate.
+ * Called w/ MM sema and page_table_lock held, the latter released before exit.
*/
-static int
+static inline int
do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)
{
- struct page * new_page;
- pte_t entry;
- struct pte_chain *pte_chain;
- int ret;
-
if (!vma->vm_ops || !vma->vm_ops->nopage)
- return do_anonymous_page(mm, vma, page_table,
- pmd, write_access, address);
+ return do_anonymous_page(mm, vma, page_table, pmd, write_access, address);
pte_unmap(page_table);
spin_unlock(&mm->page_table_lock);
+ return vma->vm_ops->nopage(mm, vma, address & PAGE_MASK, write_access, pmd);
+}

- new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, 0);
-
- /* no page was available -- either SIGBUS or OOM */
- if (new_page == NOPAGE_SIGBUS)
- return VM_FAULT_SIGBUS;
- if (new_page == NOPAGE_OOM)
- return VM_FAULT_OOM;
+/*
+ * install_new_page - tries to create a new page mapping.
+ * install_new_page() tries to share w/existing pages, but makes separate
+ * copy if "write_access" is true in order to avoid the next page fault.
+ * As this is called only for pages that do not currently exist, we
+ * do not need to flush old virtual caches or the TLB.
+ */
+int
+install_new_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd, struct page * new_page)
+{
+ pte_t entry, *page_table;
+ struct pte_chain *pte_chain;
+ int ret;

pte_chain = pte_chain_alloc(GFP_KERNEL);
if (!pte_chain)
diff -urN -x dontdiff linux-2.5.70-mm1/mm/shmem.c linux-2.5.70-mm1.install_new_page/mm/shmem.c
--- linux-2.5.70-mm1/mm/shmem.c 2003-05-26 18:00:39.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/mm/shmem.c 2003-05-28 20:17:42.000000000 -0700
@@ -936,7 +936,7 @@
return error;
}

-struct page *shmem_nopage(struct vm_area_struct *vma, unsigned long address, int unused)
+int shmem_nopage(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, int write_access, pmd_t *pmd)
{
struct inode *inode = vma->vm_file->f_dentry->d_inode;
struct page *page = NULL;
@@ -949,10 +949,10 @@

error = shmem_getpage(inode, idx, &page, SGP_CACHE);
if (error)
- return (error == -ENOMEM)? NOPAGE_OOM: NOPAGE_SIGBUS;
+ return (error == -ENOMEM)? VM_FAULT_OOM: VM_FAULT_SIGBUS;

mark_page_accessed(page);
- return page;
+ return install_new_page(mm, vma, address, write_access, pmd, page);
}

static int shmem_populate(struct vm_area_struct *vma,
diff -urN -x dontdiff linux-2.5.70-mm1/sound/core/pcm_native.c linux-2.5.70-mm1.install_new_page/sound/core/pcm_native.c
--- linux-2.5.70-mm1/sound/core/pcm_native.c 2003-05-26 18:00:37.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/sound/core/pcm_native.c 2003-05-28 21:39:45.000000000 -0700
@@ -60,6 +60,11 @@
static int snd_pcm_hw_refine_old_user(snd_pcm_substream_t * substream, struct sndrv_pcm_hw_params_old * _oparams);
static int snd_pcm_hw_params_old_user(snd_pcm_substream_t * substream, struct sndrv_pcm_hw_params_old * _oparams);

+#ifndef LINUX_2_2
+#define NOPAGE_OOM VM_FAULT_OOM
+#define NOPAGE_SIGBUS VM_FAULT_SIGBUS
+#endif
+
/*
*
*/
@@ -2693,7 +2698,7 @@
#endif

#ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_status_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
#else
static unsigned long snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
#endif
@@ -2708,7 +2713,7 @@
page = virt_to_page(runtime->status);
get_page(page);
#ifndef LINUX_2_2
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);
#else
return page_address(page);
#endif
@@ -2747,7 +2752,7 @@
}

#ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_control_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
#else
static unsigned long snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
#endif
@@ -2762,7 +2767,7 @@
page = virt_to_page(runtime->control);
get_page(page);
#ifndef LINUX_2_2
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);
#else
return page_address(page);
#endif
@@ -2813,7 +2818,7 @@
}

#ifndef LINUX_2_2
-static struct page * snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
+static int snd_pcm_mmap_data_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
#else
static unsigned long snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
#endif
@@ -2848,7 +2853,7 @@
}
get_page(page);
#ifndef LINUX_2_2
- return page;
+ return install_new_page(mm, area, address, write_access, pmd, page);
#else
return page_address(page);
#endif
diff -urN -x dontdiff linux-2.5.70-mm1/sound/oss/emu10k1/audio.c linux-2.5.70-mm1.install_new_page/sound/oss/emu10k1/audio.c
--- linux-2.5.70-mm1/sound/oss/emu10k1/audio.c 2003-05-26 18:00:23.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/sound/oss/emu10k1/audio.c 2003-05-28 20:17:42.000000000 -0700
@@ -970,7 +970,7 @@
return 0;
}

-static struct page *emu10k1_mm_nopage (struct vm_area_struct * vma, unsigned long address, int write_access)
+static int emu10k1_mm_nopage (struct mm_struct * mm, struct vm_area_struct * vma, unsigned long address, int write_access, pmd_t * pmd)
{
struct emu10k1_wavedevice *wave_dev = vma->vm_private_data;
struct woinst *woinst = wave_dev->woinst;
@@ -983,8 +983,8 @@
DPD(3, "addr: %#lx\n", address);

if (address > vma->vm_end) {
- DPF(1, "EXIT, returning NOPAGE_SIGBUS\n");
- return NOPAGE_SIGBUS; /* Disallow mremap */
+ DPF(1, "EXIT, returning VM_FAULT_SIGBUS\n");
+ return VM_FAULT_SIGBUS; /* Disallow mremap */
}

pgoff = vma->vm_pgoff + ((address - vma->vm_start) >> PAGE_SHIFT);
@@ -1013,7 +1013,7 @@
get_page (dmapage);

DPD(3, "page: %#lx\n", (unsigned long) dmapage);
- return dmapage;
+ return install_new_page(mm, vma, address, write_access, pmd, dmapage);
}

struct vm_operations_struct emu10k1_mm_ops = {
diff -urN -x dontdiff linux-2.5.70-mm1/sound/oss/via82cxxx_audio.c linux-2.5.70-mm1.install_new_page/sound/oss/via82cxxx_audio.c
--- linux-2.5.70-mm1/sound/oss/via82cxxx_audio.c 2003-05-26 18:00:27.000000000 -0700
+++ linux-2.5.70-mm1.install_new_page/sound/oss/via82cxxx_audio.c 2003-05-28 20:17:44.000000000 -0700
@@ -1846,8 +1846,8 @@
}


-static struct page * via_mm_nopage (struct vm_area_struct * vma,
- unsigned long address, int write_access)
+static int via_mm_nopage (struct mm_struct *mm, struct vm_area_struct * vma,
+ unsigned long address, int write_access, pmd_t *pmd)
{
struct via_info *card = vma->vm_private_data;
struct via_channel *chan = &card->ch_out;
@@ -1863,12 +1863,12 @@
write_access);

if (address > vma->vm_end) {
- DPRINTK ("EXIT, returning NOPAGE_SIGBUS\n");
- return NOPAGE_SIGBUS; /* Disallow mremap */
+ DPRINTK ("EXIT, returning VM_FAULT_SIGBUS\n");
+ return VM_FAULT_SIGBUS; /* Disallow mremap */
}
if (!card) {
- DPRINTK ("EXIT, returning NOPAGE_OOM\n");
- return NOPAGE_OOM; /* Nothing allocated */
+ DPRINTK ("EXIT, returning VM_FAULT_OOM\n");
+ return VM_FAULT_OOM; /* Nothing allocated */
}

pgoff = vma->vm_pgoff + ((address - vma->vm_start) >> PAGE_SHIFT);
@@ -1895,10 +1895,10 @@
assert ((((unsigned long)chan->pgtbl[pgoff].cpuaddr) % PAGE_SIZE) == 0);

dmapage = virt_to_page (chan->pgtbl[pgoff].cpuaddr);
- DPRINTK ("EXIT, returning page %p for cpuaddr %lXh\n",
+ DPRINTK ("EXIT, installing page %p for cpuaddr %lXh\n",
dmapage, (unsigned long) chan->pgtbl[pgoff].cpuaddr);
get_page (dmapage);
- return dmapage;
+ return install_new_page(mm, vma, address, write_access, pmd, dmapage);
}


2003-05-29 15:05:35

by Paul E. McKenney

Subject: [RFC][PATCH] Remove LINUX_2_2

On Thu, May 29, 2003 at 08:14:24AM -0700, Paul E. McKenney wrote:
> Rediffed for 2.5.70-mm1. Some added lines of code due to following
> the "#ifndef LINUX_2_2" in the sound system. The patch in the following
> email removes these #ifdefs on the off-chance that they are a
> holdover rather than the sound system's way of maintaining
> a single code base across all versions of Linux or some such.

This is the patch to remove LINUX_2_2. It depends on the
install_new_page.2.5.70-mm1-3.patch sent earlier.

Thanx, Paul


diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/control.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/control.c
--- linux-2.5.70-mm1.install_new_page/sound/core/control.c 2003-05-26 18:00:24.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/control.c 2003-05-28 22:41:51.000000000 -0700
@@ -931,9 +931,7 @@

static struct file_operations snd_ctl_f_ops =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.read = snd_ctl_read,
.open = snd_ctl_open,
.release = snd_ctl_release,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/hwdep.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/hwdep.c
--- linux-2.5.70-mm1.install_new_page/sound/core/hwdep.c 2003-05-26 18:00:21.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/hwdep.c 2003-05-28 22:41:51.000000000 -0700
@@ -292,9 +292,7 @@

static struct file_operations snd_hwdep_f_ops =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.llseek = snd_hwdep_llseek,
.read = snd_hwdep_read,
.write = snd_hwdep_write,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/info.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/info.c
--- linux-2.5.70-mm1.install_new_page/sound/core/info.c 2003-05-26 18:00:59.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/info.c 2003-05-28 22:41:51.000000000 -0700
@@ -126,27 +126,6 @@
snd_info_entry_t *snd_oss_root = NULL;
#endif

-#ifdef LINUX_2_2
-static void snd_info_fill_inode(struct inode *inode, int fill)
-{
- if (fill)
- MOD_INC_USE_COUNT;
- else
- MOD_DEC_USE_COUNT;
-}
-
-static inline void snd_info_entry_prepare(struct proc_dir_entry *de)
-{
- de->fill_inode = snd_info_fill_inode;
-}
-
-void snd_remove_proc_entry(struct proc_dir_entry *parent,
- struct proc_dir_entry *de)
-{
- if (parent && de)
- proc_unregister(parent, de->low_ino);
-}
-#else
static inline void snd_info_entry_prepare(struct proc_dir_entry *de)
{
de->owner = THIS_MODULE;
@@ -158,7 +137,6 @@
if (de)
remove_proc_entry(de->name, parent);
}
-#endif

static loff_t snd_info_entry_llseek(struct file *file, loff_t offset, int orig)
{
@@ -520,9 +498,7 @@

static struct file_operations snd_info_entry_operations =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.llseek = snd_info_entry_llseek,
.read = snd_info_entry_read,
.write = snd_info_entry_write,
@@ -533,67 +509,22 @@
.release = snd_info_entry_release,
};

-#ifdef LINUX_2_2
-static struct inode_operations snd_info_entry_inode_operations =
-{
- &snd_info_entry_operations, /* default sound info directory file-ops */
-};
-
-static struct inode_operations snd_info_device_inode_operations =
-{
- &snd_fops, /* default sound info directory file-ops */
-};
-#endif /* LINUX_2_2 */
-
static int snd_info_card_readlink(struct dentry *dentry,
char *buffer, int buflen)
{
char *s = PDE(dentry->d_inode)->data;
-#ifndef LINUX_2_2
return vfs_readlink(dentry, buffer, buflen, s);
-#else
- int len;
-
- if (s == NULL)
- return -EIO;
- len = strlen(s);
- if (len > buflen)
- len = buflen;
- if (copy_to_user(buffer, s, len))
- return -EFAULT;
- return len;
-#endif
}

-#ifndef LINUX_2_2
static int snd_info_card_followlink(struct dentry *dentry,
struct nameidata *nd)
{
- char *s = PDE(dentry->d_inode)->data;
- return vfs_follow_link(nd, s);
-}
-#else
-static struct dentry *snd_info_card_followlink(struct dentry *dentry,
- struct dentry *base,
- unsigned int follow)
-{
char *s = PDE(dentry->d_inode)->data;
- return lookup_dentry(s, base, follow);
+ return vfs_follow_link(nd, s);
}
-#endif
-
-#ifdef LINUX_2_2
-static struct file_operations snd_info_card_link_operations =
-{
- NULL
-};
-#endif

struct inode_operations snd_info_card_link_inode_operations =
{
-#ifdef LINUX_2_2
- .default_file_ops = &snd_info_card_link_operations,
-#endif
.readlink = snd_info_card_readlink,
.follow_link = snd_info_card_followlink,
};
@@ -744,12 +675,8 @@
if (p == NULL)
return -ENOMEM;
p->data = s;
-#ifndef LINUX_2_2
p->owner = card->module;
p->proc_iops = &snd_info_card_link_inode_operations;
-#else
- p->ops = &snd_info_card_link_inode_operations;
-#endif
card->proc_root_link = p;
return 0;
}
@@ -1008,40 +935,11 @@
snd_magic_kfree(entry);
}

-#ifdef LINUX_2_2
-static void snd_info_device_fill_inode(struct inode *inode, int fill)
-{
- struct proc_dir_entry *de;
- snd_info_entry_t *entry;
-
- if (!fill) {
- MOD_DEC_USE_COUNT;
- return;
- }
- MOD_INC_USE_COUNT;
- de = PDE(inode);
- if (de == NULL)
- return;
- entry = (snd_info_entry_t *) de->data;
- if (entry == NULL)
- return;
- inode->i_gid = device_gid;
- inode->i_uid = device_uid;
- inode->i_rdev = MKDEV(entry->c.device.major, entry->c.device.minor);
-}
-
-static inline void snd_info_device_entry_prepare(struct proc_dir_entry *de, snd_info_entry_t *entry)
-{
- de->fill_inode = snd_info_device_fill_inode;
- de->ops = &snd_info_device_inode_operations;
-}
-#else
static inline void snd_info_device_entry_prepare(struct proc_dir_entry *de, snd_info_entry_t *entry)
{
de->rdev = mk_kdev(entry->c.device.major, entry->c.device.minor);
de->owner = THIS_MODULE;
}
-#endif /* LINUX_2_2 */

/*
* create a procfs device file
@@ -1119,15 +1017,9 @@
up(&info_mutex);
return -ENOMEM;
}
-#ifndef LINUX_2_2
p->owner = entry->module;
-#endif
if (!S_ISDIR(entry->mode)) {
-#ifndef LINUX_2_2
p->proc_fops = &snd_info_entry_operations;
-#else
- p->ops = &snd_info_entry_inode_operations;
-#endif
}
p->size = entry->size;
p->data = entry;
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/init.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/init.c
--- linux-2.5.70-mm1.install_new_page/sound/core/init.c 2003-05-26 18:00:25.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/init.c 2003-05-28 22:41:51.000000000 -0700
@@ -193,9 +193,7 @@
f_ops = &s_f_ops->f_ops;

memset(f_ops, 0, sizeof(*f_ops));
-#ifndef LINUX_2_2
f_ops->owner = file->f_op->owner;
-#endif
f_ops->release = file->f_op->release;
f_ops->poll = snd_disconnect_poll;

diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/oss/mixer_oss.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/oss/mixer_oss.c
--- linux-2.5.70-mm1.install_new_page/sound/core/oss/mixer_oss.c 2003-05-26 18:00:42.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/oss/mixer_oss.c 2003-05-28 22:41:51.000000000 -0700
@@ -376,9 +376,7 @@

static struct file_operations snd_mixer_oss_f_ops =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.open = snd_mixer_oss_open,
.release = snd_mixer_oss_release,
.ioctl = snd_mixer_oss_ioctl,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/oss/pcm_oss.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/oss/pcm_oss.c
--- linux-2.5.70-mm1.install_new_page/sound/core/oss/pcm_oss.c 2003-05-26 18:00:56.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/oss/pcm_oss.c 2003-05-28 22:41:51.000000000 -0700
@@ -2148,9 +2148,7 @@

static struct file_operations snd_pcm_oss_f_reg =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.read = snd_pcm_oss_read,
.write = snd_pcm_oss_write,
.open = snd_pcm_oss_open,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/pcm_native.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/pcm_native.c
--- linux-2.5.70-mm1.install_new_page/sound/core/pcm_native.c 2003-05-28 21:39:45.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/pcm_native.c 2003-05-28 22:46:38.000000000 -0700
@@ -60,11 +60,6 @@
static int snd_pcm_hw_refine_old_user(snd_pcm_substream_t * substream, struct sndrv_pcm_hw_params_old * _oparams);
static int snd_pcm_hw_params_old_user(snd_pcm_substream_t * substream, struct sndrv_pcm_hw_params_old * _oparams);

-#ifndef LINUX_2_2
-#define NOPAGE_OOM VM_FAULT_OOM
-#define NOPAGE_SIGBUS VM_FAULT_SIGBUS
-#endif
-
/*
*
*/
@@ -2687,21 +2682,13 @@
}

#ifndef VM_RESERVED
-#ifndef LINUX_2_2
static int snd_pcm_mmap_swapout(struct page * page, struct file * file)
-#else
-static int snd_pcm_mmap_swapout(struct vm_area_struct * area, struct page * page)
-#endif
{
return 0;
}
#endif

-#ifndef LINUX_2_2
static int snd_pcm_mmap_status_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
-#else
-static unsigned long snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
-#endif
{
snd_pcm_substream_t *substream = (snd_pcm_substream_t *)area->vm_private_data;
snd_pcm_runtime_t *runtime;
@@ -2712,11 +2699,7 @@
runtime = substream->runtime;
page = virt_to_page(runtime->status);
get_page(page);
-#ifndef LINUX_2_2
return install_new_page(mm, area, address, write_access, pmd, page);
-#else
- return page_address(page);
-#endif
}

static struct vm_operations_struct snd_pcm_vm_ops_status =
@@ -2740,22 +2723,14 @@
if (size != PAGE_ALIGN(sizeof(snd_pcm_mmap_status_t)))
return -EINVAL;
area->vm_ops = &snd_pcm_vm_ops_status;
-#ifndef LINUX_2_2
area->vm_private_data = substream;
-#else
- area->vm_private_data = (long)substream;
-#endif
#ifdef VM_RESERVED
area->vm_flags |= VM_RESERVED;
#endif
return 0;
}

-#ifndef LINUX_2_2
static int snd_pcm_mmap_control_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
-#else
-static unsigned long snd_pcm_mmap_control_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
-#endif
{
snd_pcm_substream_t *substream = (snd_pcm_substream_t *)area->vm_private_data;
snd_pcm_runtime_t *runtime;
@@ -2766,11 +2741,7 @@
runtime = substream->runtime;
page = virt_to_page(runtime->control);
get_page(page);
-#ifndef LINUX_2_2
return install_new_page(mm, area, address, write_access, pmd, page);
-#else
- return page_address(page);
-#endif
}

static struct vm_operations_struct snd_pcm_vm_ops_control =
@@ -2794,11 +2765,7 @@
if (size != PAGE_ALIGN(sizeof(snd_pcm_mmap_control_t)))
return -EINVAL;
area->vm_ops = &snd_pcm_vm_ops_control;
-#ifndef LINUX_2_2
area->vm_private_data = substream;
-#else
- area->vm_private_data = (long)substream;
-#endif
#ifdef VM_RESERVED
area->vm_flags |= VM_RESERVED;
#endif
@@ -2817,11 +2784,7 @@
atomic_dec(&substream->runtime->mmap_count);
}

-#ifndef LINUX_2_2
static int snd_pcm_mmap_data_nopage(struct mm_struct *mm, struct vm_area_struct *area, unsigned long address, int write_access, pmd_t *pmd)
-#else
-static unsigned long snd_pcm_mmap_data_nopage(struct vm_area_struct *area, unsigned long address, int no_share)
-#endif
{
snd_pcm_substream_t *substream = (snd_pcm_substream_t *)area->vm_private_data;
snd_pcm_runtime_t *runtime;
@@ -2852,11 +2815,7 @@
page = virt_to_page(vaddr);
}
get_page(page);
-#ifndef LINUX_2_2
return install_new_page(mm, area, address, write_access, pmd, page);
-#else
- return page_address(page);
-#endif
}

static struct vm_operations_struct snd_pcm_vm_ops_data =
@@ -2906,11 +2865,7 @@
return -EINVAL;

area->vm_ops = &snd_pcm_vm_ops_data;
-#ifndef LINUX_2_2
area->vm_private_data = substream;
-#else
- area->vm_private_data = (long)substream;
-#endif
#ifdef VM_RESERVED
area->vm_flags |= VM_RESERVED;
#endif
@@ -3040,9 +2995,7 @@
*/

static struct file_operations snd_pcm_f_ops_playback = {
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.write = snd_pcm_write,
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 3, 44)
.writev = snd_pcm_writev,
@@ -3056,9 +3009,7 @@
};

static struct file_operations snd_pcm_f_ops_capture = {
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.read = snd_pcm_read,
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 3, 44)
.readv = snd_pcm_readv,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/rawmidi.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/rawmidi.c
--- linux-2.5.70-mm1.install_new_page/sound/core/rawmidi.c 2003-05-26 18:00:24.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/rawmidi.c 2003-05-28 22:41:51.000000000 -0700
@@ -1316,9 +1316,7 @@

static struct file_operations snd_rawmidi_f_ops =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.read = snd_rawmidi_read,
.write = snd_rawmidi_write,
.open = snd_rawmidi_open,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/seq/oss/seq_oss.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/oss/seq_oss.c
--- linux-2.5.70-mm1.install_new_page/sound/core/seq/oss/seq_oss.c 2003-05-26 18:00:46.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/oss/seq_oss.c 2003-05-28 22:41:51.000000000 -0700
@@ -194,9 +194,7 @@

static struct file_operations seq_oss_f_ops =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.read = odev_read,
.write = odev_write,
.open = odev_open,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/seq/seq_clientmgr.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/seq_clientmgr.c
--- linux-2.5.70-mm1.install_new_page/sound/core/seq/seq_clientmgr.c 2003-05-26 18:00:24.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/seq_clientmgr.c 2003-05-28 22:41:51.000000000 -0700
@@ -2454,9 +2454,7 @@

static struct file_operations snd_seq_f_ops =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.read = snd_seq_read,
.write = snd_seq_write,
.open = snd_seq_open,
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/seq/seq_memory.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/seq_memory.c
--- linux-2.5.70-mm1.install_new_page/sound/core/seq/seq_memory.c 2003-05-26 18:00:23.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/seq/seq_memory.c 2003-05-28 22:41:51.000000000 -0700
@@ -235,18 +235,7 @@
while (pool->free == NULL && ! nonblock && ! pool->closing) {

spin_unlock(&pool->lock);
-#ifdef LINUX_2_2
- /* change semaphore to allow other clients
- to access device file */
- if (file)
- up(&semaphore_of(file));
-#endif
interruptible_sleep_on(&pool->output_sleep);
-#ifdef LINUX_2_2
- /* restore semaphore again */
- if (file)
- down(&semaphore_of(file));
-#endif
spin_lock(&pool->lock);
/* interrupted? */
if (signal_pending(current)) {
diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/sound.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/sound.c
--- linux-2.5.70-mm1.install_new_page/sound/core/sound.c 2003-05-26 18:00:43.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/sound.c 2003-05-28 22:41:51.000000000 -0700
@@ -157,9 +157,7 @@

struct file_operations snd_fops =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.open = snd_open
};

diff -urN -X dontdiff linux-2.5.70-mm1.install_new_page/sound/core/timer.c linux-2.5.70-mm1.loseLINUX_2_2/sound/core/timer.c
--- linux-2.5.70-mm1.install_new_page/sound/core/timer.c 2003-05-26 18:00:41.000000000 -0700
+++ linux-2.5.70-mm1.loseLINUX_2_2/sound/core/timer.c 2003-05-28 22:41:51.000000000 -0700
@@ -1733,9 +1733,7 @@

static struct file_operations snd_timer_f_ops =
{
-#ifndef LINUX_2_2
.owner = THIS_MODULE,
-#endif
.read = snd_timer_user_read,
.open = snd_timer_user_open,
.release = snd_timer_user_release,

2003-05-29 16:17:31

by Hugh Dickins

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Thu, 29 May 2003, Paul E. McKenney wrote:
> On Fri, May 23, 2003 at 11:42:02AM -0700, Paul E. McKenney wrote:
> >
> > Exactly -- allows a ->nopage() to drop some lock to avoid races
> > between pagefault and either vmtruncate() or invalidate_mmap_range().
> > This race (from the cross-host mmap viewpoint) is described in:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2
>
> Rediffed for 2.5.70-mm1.

Me? I much preferred your original, much sparer, nopagedone patch
(labelled "ugly as hell" by hch). I dislike passing lots of args
down a level so they can be passed up again to the library function.

In particular, I feel queasy (fear loss of control) about passing a
pmd_t* down to a filesystem, which I'd prefer to have no access to
such. But I may be in a minority, and the decision won't be mine.
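
The difference between the two calling conventions can be sketched in
userspace C. This is only an illustration under stated assumptions: the
struct bodies, the MOCK_FAULT_* values, and every mock_*, fault_*,
old_style_* and new_style_* name are hypothetical stand-ins, not the
kernel's definitions. In the old style the handler returns a page and
the caller installs it; in the install_new_page style the handler
receives mm, write_access and pmd only to hand them back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values; not the kernel's VM_FAULT_* constants. */
#define MOCK_FAULT_SIGBUS 0
#define MOCK_FAULT_MINOR  1

/* Userspace stand-ins for the kernel types (assumptions). */
struct mm_struct { int dummy; };
struct vm_area_struct { int dummy; };
struct page { int dummy; };
typedef struct { unsigned long val; } pmd_t;

static struct page the_page;

/* Stand-in for the library function that enters the page into the PTEs. */
static int mock_install_new_page(struct mm_struct *mm,
		struct vm_area_struct *vma, unsigned long address,
		int write_access, pmd_t *pmd, struct page *new_page)
{
	(void)mm; (void)vma; (void)address; (void)write_access; (void)pmd;
	return new_page ? MOCK_FAULT_MINOR : MOCK_FAULT_SIGBUS;
}

/* Old style: the handler sees only the vma and address, returns a page. */
static struct page *old_style_nopage(struct vm_area_struct *vma,
		unsigned long address, int unused)
{
	(void)vma; (void)address; (void)unused;
	return &the_page;
}

/* Old style caller: mm and pmd never leave the mm layer. */
static int fault_old_style(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, int write_access, pmd_t *pmd)
{
	struct page *new_page = old_style_nopage(vma, address, 0);

	return mock_install_new_page(mm, vma, address, write_access,
				     pmd, new_page);
}

/* New style: mm, write_access and pmd go down only to come back up. */
static int new_style_nopage(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, int write_access, pmd_t *pmd)
{
	return mock_install_new_page(mm, vma, address, write_access,
				     pmd, &the_page);
}

/* New style caller: a plain pass-through to the handler. */
static int fault_new_style(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, int write_access, pmd_t *pmd)
{
	return new_style_nopage(mm, vma, address, write_access, pmd);
}
```

Both paths install the same page; the only difference in this sketch is
which layer gets to see the pmd_t pointer.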

Hugh

2003-05-29 17:02:03

by Daniel Phillips

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Thursday 29 May 2003 18:33, you wrote:
> On Thu, 29 May 2003, Paul E. McKenney wrote:
> > On Fri, May 23, 2003 at 11:42:02AM -0700, Paul E. McKenney wrote:
> > > Exactly -- allows a ->nopage() to drop some lock to avoid races
> > > between pagefault and either vmtruncate() or invalidate_mmap_range().
> > > This race (from the cross-host mmap viewpoint) is described in:
> > >
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2
> >
> > Rediffed for 2.5.70-mm1.
>
> Me? I much preferred your original, much sparer, nopagedone patch
> (labelled "ugly as hell" by hch).

"me too".

The fat patch that hits every fs to get rid of two lines and .5 cycles per
no_page fault could be an epilogue (if/when it passes muster) to the little
one that does the job and has already been thoroughly tested.

I see both sides of the argument. The third side, not yet discussed, is the
value of doing things incrementally, with widespread testing of the system at
each step.

Regards,

Daniel

2003-05-29 17:26:26

by Daniel Phillips

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Thursday 29 May 2003 19:15, Daniel Phillips wrote:
> On Thursday 29 May 2003 18:33, you wrote:
> > Me? I much preferred your original, much sparer, nopagedone patch
> > (labelled "ugly as hell" by hch).
>
> "me too".

Oh wait, I misspoke... there is another formulation of the patch that hasn't
yet been posted for review. Instead of having the nopagedone hook, it turns
the entire do_no_page into a hook, per hch's suggestion, but leaves in the
->nopage hook, which makes the patch small and obviously right. I need to
post that version for comparison, please bear with me.

IMHO, it's nicer than the ->nopagedone form.

Regards,

Daniel

2003-05-29 20:11:53

by Paul E. McKenney

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Thu, May 29, 2003 at 07:39:47PM +0200, Daniel Phillips wrote:
> On Thursday 29 May 2003 19:15, Daniel Phillips wrote:
> > On Thursday 29 May 2003 18:33, you wrote:
> > > Me? I much preferred your original, much sparer, nopagedone patch
> > > (labelled "ugly as hell" by hch).
> >
> > "me too".
>
> Oh wait, I misspoke... there is another formulation of the patch that hasn't
> yet been posted for review. Instead of having the nopagedone hook, it turns
> the entire do_no_page into a hook, per hch's suggestion, but leaves in the
> ->nopage hook, which makes the patch small and obviously right. I need to
> post that version for comparison, please bear with me.
>
> IMHO, it's nicer than the ->nopagedone form.

I put together something like this, but the problem with it is that
do_anonymous_page() needs the mm->page_table_lock held, but the
->nopage functions want this lock not to be held. One could require
that the lock be held on entry to all ->nopage functions, but
this would require almost all ->nopage functions to drop the lock
immediately upon entry. This seemed error-prone to me, but could
certainly be done...
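
The mismatch between the two lock conventions can be sketched in
userspace C. This is a rough illustration only: a plain int flag stands
in for mm->page_table_lock, and all mock_* and fault_* names, along with
the remembers_unlock parameter, are hypothetical. A return of -1 marks a
handler entered with the lock in the wrong state.

```c
#include <assert.h>

/* Flag standing in for mm->page_table_lock: 1 = held, 0 = not held. */
struct mm_struct { int page_table_lock; };

/* Anonymous-page path: expects the lock held, releases it before exit. */
static int mock_do_anonymous_page(struct mm_struct *mm)
{
	if (!mm->page_table_lock)
		return -1;		/* entered without the lock */
	mm->page_table_lock = 0;	/* released before exit */
	return 0;
}

/* Filesystem ->nopage path: expects the lock to have been dropped. */
static int mock_fs_nopage(struct mm_struct *mm)
{
	if (mm->page_table_lock)
		return -1;		/* entered with the lock still held */
	return 0;
}

/* Convention 1: the caller drops the lock before invoking ->nopage. */
static int fault_caller_drops(struct mm_struct *mm, int has_nopage)
{
	mm->page_table_lock = 1;	/* fault path starts with lock held */
	if (!has_nopage)
		return mock_do_anonymous_page(mm);
	mm->page_table_lock = 0;
	return mock_fs_nopage(mm);
}

/*
 * Convention 2: every handler is entered with the lock held, so each
 * ->nopage must remember to drop it first. remembers_unlock models a
 * handler that does (1) or does not (0).
 */
static int fault_handler_drops(struct mm_struct *mm, int remembers_unlock)
{
	mm->page_table_lock = 1;
	if (remembers_unlock)
		mm->page_table_lock = 0;
	return mock_fs_nopage(mm);
}
```

Under the second convention, a single handler that forgets the unlock
fails in this model, which is the error-prone case described above.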

Thoughts? Me, I don't care as long as there is some reasonable
way for distributed filesystems to safely resolve the race between
page faults and invalidation requests from other nodes. ;-)

Thanx, Paul

2003-05-30 16:45:20

by Paul E. McKenney

Subject: Re: [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race

On Thu, May 29, 2003 at 05:33:04PM +0100, Hugh Dickins wrote:
> On Thu, 29 May 2003, Paul E. McKenney wrote:
> > On Fri, May 23, 2003 at 11:42:02AM -0700, Paul E. McKenney wrote:
> > >
> > > Exactly -- allows a ->nopage() to drop some lock to avoid races
> > > between pagefault and either vmtruncate() or invalidate_mmap_range().
> > > This race (from the cross-host mmap viewpoint) is described in:
> > >
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=105286345316249&w=2
> >
> > Rediffed for 2.5.70-mm1.
>
> Me? I much preferred your original, much sparer, nopagedone patch
> (labelled "ugly as hell" by hch). I dislike passing lots of args
> down a level so they can be passed up again to the library function.
>
> In particular, I feel queasy (fear loss of control) about passing a
> pmd_t* down to a filesystem, which I'd prefer to have no access to
> such. But I may be in a minority, and the decision won't be mine.

Fine by me either way. ;-) Here is the rediffed nopagedone patch
for 2.5.70-mm1.

Thanx, Paul


diff -urN -X dontdiff linux-2.5.70-mm1/include/linux/mm.h linux-2.5.70-mm1.nopagedone/include/linux/mm.h
--- linux-2.5.70-mm1/include/linux/mm.h 2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.nopagedone/include/linux/mm.h 2003-05-29 19:34:55.000000000 -0700
@@ -143,6 +143,7 @@
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+ void (*nopagedone)(struct vm_area_struct * area, unsigned long address, int status);
int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
};

diff -urN -X dontdiff linux-2.5.70-mm1/mm/memory.c linux-2.5.70-mm1.nopagedone/mm/memory.c
--- linux-2.5.70-mm1/mm/memory.c 2003-05-28 20:16:04.000000000 -0700
+++ linux-2.5.70-mm1.nopagedone/mm/memory.c 2003-05-29 19:34:55.000000000 -0700
@@ -1468,6 +1468,9 @@
ret = VM_FAULT_OOM;
out:
pte_chain_free(pte_chain);
+ if (vma->vm_ops && vma->vm_ops->nopagedone) {
+ vma->vm_ops->nopagedone(vma, address & PAGE_MASK, ret);
+ }
return ret;
}
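
How a distributed filesystem might use the new hook can be sketched in
userspace C. Everything here is an assumption made for illustration:
struct dfs_file, the dfs_* and mock_* functions, the MOCK_* constants,
and the use of a plain flag in place of a real lock or semaphore are all
hypothetical, and a real invalidator would block on the lock rather than
defer. The ->nopage handler takes the lock, an invalidation arriving
before the page is mapped is held off, and ->nopagedone releases the
lock so the invalidation can then remove the mapping.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PAGE_MASK    (~0xfffUL)	/* assumes 4 KiB pages */
#define MOCK_FAULT_SIGBUS 0		/* hypothetical status values */
#define MOCK_FAULT_MINOR  1

struct page { int uptodate; };
struct vm_area_struct;

struct vm_operations_struct {
	struct page *(*nopage)(struct vm_area_struct *area,
			       unsigned long address, int unused);
	void (*nopagedone)(struct vm_area_struct *area,
			   unsigned long address, int status);
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
	void *vm_private_data;
};

/* Hypothetical per-file state of a distributed filesystem. */
struct dfs_file {
	int fault_lock_held;	/* flag standing in for a lock or semaphore */
	int inval_pending;	/* invalidation deferred while lock is held */
	struct page page;
	struct page *pte;	/* stands in for the page-table entry */
};

/* Take the lock and "read" the page; the lock stays held on return. */
static struct page *dfs_nopage(struct vm_area_struct *area,
			       unsigned long address, int unused)
{
	struct dfs_file *f = area->vm_private_data;

	(void)address; (void)unused;
	f->fault_lock_held = 1;
	f->page.uptodate = 1;
	return &f->page;
}

/* Invalidation request from another node; deferred during a fault. */
static void dfs_invalidate(struct dfs_file *f)
{
	if (f->fault_lock_held) {
		f->inval_pending = 1;
		return;
	}
	f->page.uptodate = 0;
	f->pte = NULL;
	f->inval_pending = 0;
}

/* Drop the lock once the fault is complete, then run deferred work. */
static void dfs_nopagedone(struct vm_area_struct *area,
			   unsigned long address, int status)
{
	struct dfs_file *f = area->vm_private_data;

	(void)address; (void)status;
	f->fault_lock_held = 0;
	if (f->inval_pending)
		dfs_invalidate(f);
}

static const struct vm_operations_struct dfs_vm_ops = {
	.nopage = dfs_nopage,
	.nopagedone = dfs_nopagedone,
};

/* Mock fault path; race != 0 injects an invalidation mid-fault. */
static int mock_do_no_page(struct vm_area_struct *vma,
			   unsigned long address, int race)
{
	struct dfs_file *f = vma->vm_private_data;
	struct page *new_page;
	int ret;

	new_page = vma->vm_ops->nopage(vma, address & MOCK_PAGE_MASK, 0);
	if (!new_page)
		return MOCK_FAULT_SIGBUS;	/* no hook call on this path */
	if (race)
		dfs_invalidate(f);		/* request from another node */
	f->pte = new_page;			/* enter the page into the "PTE" */
	ret = MOCK_FAULT_MINOR;
	if (vma->vm_ops && vma->vm_ops->nopagedone)
		vma->vm_ops->nopagedone(vma, address & MOCK_PAGE_MASK, ret);
	return ret;
}

/* Returns 1 if an invalidated page is left mapped after the race. */
static int stale_mapping_after_race(void)
{
	struct dfs_file f = { 0 };
	struct vm_area_struct vma = { &dfs_vm_ops, &f };

	mock_do_no_page(&vma, 0x1234UL, 1);
	return f.pte != NULL && !f.page.uptodate;
}
```

In this model the invalidation that arrives mid-fault is applied only
after the mapping is in place, so it removes the mapping instead of
missing it.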