2010-01-08 22:05:27

by David Howells

[permalink] [raw]
Subject: [PATCH 1/6] NOMMU: Fix SYSV SHM for NOMMU

Commit c4caa778157dbbf04116f0ac2111e389b5cd7a29 broke SYSV SHM for NOMMU by
taking away the pointer to shm_get_unmapped_area() from shm_file_operations.

Put it back conditionally on CONFIG_MMU=n.

file->f_ops->get_unmapped_area() is used to find out the base address for a
mapping of a mappable chardev device or mappable memory-based file (such as a
ramfs file). It needs to be called prior to file->f_ops->mmap() being called.

Signed-off-by: David Howells <[email protected]>
---

ipc/shm.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)


diff --git a/ipc/shm.c b/ipc/shm.c
index 92fe923..f2da7d2 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -298,6 +298,7 @@ static const struct file_operations shm_file_operations = {
.mmap = shm_mmap,
.fsync = shm_fsync,
.release = shm_release,
+ .get_unmapped_area = shm_get_unmapped_area,
};

static const struct file_operations shm_file_operations_huge = {


2010-01-08 22:05:56

by David Howells

[permalink] [raw]
Subject: [PATCH 6/6] NOMMU: Fix shared mmap after truncate shrinkage problems

Fix a problem in NOMMU mmap with ramfs whereby a shared mmap can happen over
the end of a truncation. The problem is that ramfs_nommu_check_mappings()
checks that the reduced file size against the VMA tree, but not the vm_region
tree.

The following sequence of events can cause the problem:

fd = open("/tmp/x", O_RDWR|O_TRUNC|O_CREAT, 0600);
ftruncate(fd, 32 * 1024);
a = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
b = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
munmap(a, 32 * 1024);
ftruncate(fd, 16 * 1024);
c = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

Mapping 'a' creates a vm_region covering 32KB of the file. Mapping 'b' sees
that the vm_region from 'a' is covering the region it wants and so shares it,
pinning it in memory.

Mapping 'a' then goes away and the file is truncated to the end of VMA 'b'.
However, the region allocated by 'a' is still in effect, and has _not_ been
reduced.

Mapping 'c' is then created, and because there's a vm_region covering the
desired region, get_unmapped_area() is _not_ called to repeat the check, and
the mapping is granted, even though the pages from the latter half of the
mapping have been discarded.

However:

d = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

Mapping 'd' should work, and should end up sharing the region allocated by
'a'.

To deal with this, we shrink the vm_region struct during the truncation, lest
do_mmap_pgoff() take it as licence to share the full region automatically
without calling the get_unmapped_area() file op again.

Signed-off-by: David Howells <[email protected]>
---

fs/ramfs/file-nommu.c | 31 +------------------------
include/linux/mm.h | 1 +
mm/nommu.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+), 30 deletions(-)


diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index 2665313..1739a4a 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -123,35 +123,6 @@ add_error:

/*****************************************************************************/
/*
- * check that file shrinkage doesn't leave any VMAs dangling in midair
- */
-static int ramfs_nommu_check_mappings(struct inode *inode,
- size_t newsize, size_t size)
-{
- struct vm_area_struct *vma;
- struct prio_tree_iter iter;
-
- down_write(&nommu_region_sem);
-
- /* search for VMAs that fall within the dead zone */
- vma_prio_tree_foreach(vma, &iter, &inode->i_mapping->i_mmap,
- newsize >> PAGE_SHIFT,
- (size + PAGE_SIZE - 1) >> PAGE_SHIFT
- ) {
- /* found one - only interested if it's shared out of the page
- * cache */
- if (vma->vm_flags & VM_SHARED) {
- up_write(&nommu_region_sem);
- return -ETXTBSY; /* not quite true, but near enough */
- }
- }
-
- up_write(&nommu_region_sem);
- return 0;
-}
-
-/*****************************************************************************/
-/*
*
*/
static int ramfs_nommu_resize(struct inode *inode, loff_t newsize, loff_t size)
@@ -169,7 +140,7 @@ static int ramfs_nommu_resize(struct inode *inode, loff_t newsize, loff_t size)

/* check that a decrease in size doesn't cut off any shared mappings */
if (newsize < size) {
- ret = ramfs_nommu_check_mappings(inode, newsize, size);
+ ret = nommu_shrink_inode_mappings(inode, size, newsize);
if (ret < 0)
return ret;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2265f28..60c467b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1089,6 +1089,7 @@ extern void zone_pcp_update(struct zone *zone);

/* nommu.c */
extern atomic_long_t mmap_pages_allocated;
+extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);

/* prio_tree.c */
void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);
diff --git a/mm/nommu.c b/mm/nommu.c
index 32be0cf..5e09ab6 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1914,3 +1914,65 @@ int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, in
mmput(mm);
return len;
}
+
+/**
+ * nommu_shrink_inode_mappings - Shrink the shared mappings on an inode
+ * @inode: The inode to check
+ * @size: The current filesize of the inode
+ * @newsize: The proposed filesize of the inode
+ *
+ * Check the shared mappings on an inode on behalf of a shrinking truncate to
+ * make sure that that any outstanding VMAs aren't broken and then shrink the
+ * vm_regions that extend that beyond so that do_mmap_pgoff() doesn't
+ * automatically grant mappings that are too large.
+ */
+int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
+ size_t newsize)
+{
+ struct vm_area_struct *vma;
+ struct prio_tree_iter iter;
+ struct vm_region *region;
+ pgoff_t low, high;
+ size_t r_size, r_top;
+
+ low = newsize >> PAGE_SHIFT;
+ high = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+ down_write(&nommu_region_sem);
+
+ /* search for VMAs that fall within the dead zone */
+ vma_prio_tree_foreach(vma, &iter, &inode->i_mapping->i_mmap,
+ low, high) {
+ /* found one - only interested if it's shared out of the page
+ * cache */
+ if (vma->vm_flags & VM_SHARED) {
+ up_write(&nommu_region_sem);
+ return -ETXTBSY; /* not quite true, but near enough */
+ }
+ }
+
+ /* reduce any regions that overlap the dead zone - if in existence,
+ * these will be pointed to by VMAs that don't overlap the dead zone
+ *
+ * we don't check for any regions that start beyond the EOF as there
+ * shouldn't be any
+ */
+ vma_prio_tree_foreach(vma, &iter, &inode->i_mapping->i_mmap,
+ 0, ULONG_MAX) {
+ if (!(vma->vm_flags & VM_SHARED))
+ continue;
+
+ region = vma->vm_region;
+ r_size = region->vm_top - region->vm_start;
+ r_top = (region->vm_pgoff << PAGE_SHIFT) + r_size;
+
+ if (r_top > newsize) {
+ region->vm_top -= r_top - newsize;
+ if (region->vm_end > region->vm_top)
+ region->vm_end = region->vm_top;
+ }
+ }
+
+ up_write(&nommu_region_sem);
+ return 0;
+}

2010-01-08 22:05:43

by David Howells

[permalink] [raw]
Subject: [PATCH 3/6] NOMMU: Remove a superfluous check of vm_region::vm_usage

In split_vma(), there's no need to check if the VMA being split has a region
that's in use by more than one VMA because:

(1) The preceding test prohibits splitting of non-anonymous VMAs and regions
(eg: file or chardev backed VMAs).

(2) Anonymous regions can't be mapped multiple times because there's no handle
by which to refer to the already existing region.

(3) If a VMA has previously been split, then the region backing it has also
been split into two regions, each of usage 1.

Signed-off-by: David Howells <[email protected]>
---

mm/nommu.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)


diff --git a/mm/nommu.c b/mm/nommu.c
index 5e39294..d6dd656 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1441,10 +1441,9 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,

kenter("");

- /* we're only permitted to split anonymous regions that have a single
- * owner */
- if (vma->vm_file ||
- vma->vm_region->vm_usage != 1)
+ /* we're only permitted to split anonymous regions (these should have
+ * only a single usage on the region) */
+ if (vma->vm_file)
return -ENOMEM;

if (mm->map_count >= sysctl_max_map_count)

2010-01-08 22:05:36

by David Howells

[permalink] [raw]
Subject: [PATCH 2/6] NOMMU: struct vm_region's vm_usage count need not be atomic

The vm_usage count field in struct vm_region does not need to be atomic as it's
only even modified whilst nommu_region_sem is write locked.

Signed-off-by: David Howells <[email protected]>
---

include/linux/mm_types.h | 2 +-
mm/nommu.c | 14 +++++++-------
2 files changed, 8 insertions(+), 8 deletions(-)


diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 84d020b..80cfa78 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -122,7 +122,7 @@ struct vm_region {
unsigned long vm_pgoff; /* the offset in vm_file corresponding to vm_start */
struct file *vm_file; /* the backing file or NULL */

- atomic_t vm_usage; /* region usage count */
+ int vm_usage; /* region usage count (access under nommu_region_sem) */
bool vm_icache_flushed : 1; /* true if the icache has been flushed for
* this region */
};
diff --git a/mm/nommu.c b/mm/nommu.c
index 1777386..5e39294 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -552,11 +552,11 @@ static void free_page_series(unsigned long from, unsigned long to)
static void __put_nommu_region(struct vm_region *region)
__releases(nommu_region_sem)
{
- kenter("%p{%d}", region, atomic_read(&region->vm_usage));
+ kenter("%p{%d}", region, region->vm_usage);

BUG_ON(!nommu_region_tree.rb_node);

- if (atomic_dec_and_test(&region->vm_usage)) {
+ if (--region->vm_usage == 0) {
if (region->vm_top > region->vm_start)
delete_nommu_region(region);
up_write(&nommu_region_sem);
@@ -1205,7 +1205,7 @@ unsigned long do_mmap_pgoff(struct file *file,
if (!vma)
goto error_getting_vma;

- atomic_set(&region->vm_usage, 1);
+ region->vm_usage = 1;
region->vm_flags = vm_flags;
region->vm_pgoff = pgoff;

@@ -1272,7 +1272,7 @@ unsigned long do_mmap_pgoff(struct file *file,
}

/* we've found a region we can share */
- atomic_inc(&pregion->vm_usage);
+ pregion->vm_usage++;
vma->vm_region = pregion;
start = pregion->vm_start;
start += (pgoff - pregion->vm_pgoff) << PAGE_SHIFT;
@@ -1289,7 +1289,7 @@ unsigned long do_mmap_pgoff(struct file *file,
vma->vm_region = NULL;
vma->vm_start = 0;
vma->vm_end = 0;
- atomic_dec(&pregion->vm_usage);
+ pregion->vm_usage--;
pregion = NULL;
goto error_just_free;
}
@@ -1444,7 +1444,7 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
/* we're only permitted to split anonymous regions that have a single
* owner */
if (vma->vm_file ||
- atomic_read(&vma->vm_region->vm_usage) != 1)
+ vma->vm_region->vm_usage != 1)
return -ENOMEM;

if (mm->map_count >= sysctl_max_map_count)
@@ -1518,7 +1518,7 @@ static int shrink_vma(struct mm_struct *mm,

/* cut the backing region down to size */
region = vma->vm_region;
- BUG_ON(atomic_read(&region->vm_usage) != 1);
+ BUG_ON(region->vm_usage != 1);

down_write(&nommu_region_sem);
delete_nommu_region(region);

2010-01-08 22:05:51

by David Howells

[permalink] [raw]
Subject: [PATCH 5/6] NOMMU: Fix race between ramfs truncation and shared mmap

Fix the race between the truncation of a ramfs file and an attempt to make a
shared mmap of region of that file.

The problem is that do_mmap_pgoff() calls f_op->get_unmapped_area() to verify
that the file region is made of contiguous pages and to find its base address -
but there isn't any locking to guarantee this region until
vma_prio_tree_insert() is called by add_vma_to_mm().

Note that moving the functionality into f_op->mmap() doesn't help as that is
also called before vma_prio_tree_insert().

Instead make ramfs_nommu_check_mappings() grab nommu_region_sem whilst it does
its checks. This means that this function will wait whilst mmaps take place.

Signed-off-by: David Howells <[email protected]>
---

fs/ramfs/file-nommu.c | 7 ++++++-
include/linux/sched.h | 2 +-
2 files changed, 7 insertions(+), 2 deletions(-)


diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index 2efc571..2665313 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -131,6 +131,8 @@ static int ramfs_nommu_check_mappings(struct inode *inode,
struct vm_area_struct *vma;
struct prio_tree_iter iter;

+ down_write(&nommu_region_sem);
+
/* search for VMAs that fall within the dead zone */
vma_prio_tree_foreach(vma, &iter, &inode->i_mapping->i_mmap,
newsize >> PAGE_SHIFT,
@@ -138,10 +140,13 @@ static int ramfs_nommu_check_mappings(struct inode *inode,
) {
/* found one - only interested if it's shared out of the page
* cache */
- if (vma->vm_flags & VM_SHARED)
+ if (vma->vm_flags & VM_SHARED) {
+ up_write(&nommu_region_sem);
return -ETXTBSY; /* not quite true, but near enough */
+ }
}

+ up_write(&nommu_region_sem);
return 0;
}

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 08a2523..6f7bba9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -389,7 +389,7 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
extern void arch_unmap_area(struct mm_struct *, unsigned long);
extern void arch_unmap_area_topdown(struct mm_struct *, unsigned long);
#else
-extern void arch_pick_mmap_layout(struct mm_struct *mm) {}
+static inline void arch_pick_mmap_layout(struct mm_struct *mm) {}
#endif

#if USE_SPLIT_PTLOCKS

2010-01-08 22:06:24

by David Howells

[permalink] [raw]
Subject: [PATCH 4/6] NOMMU: Don't need get_unmapped_area() for NOMMU

get_unmapped_area() is unnecessary for NOMMU as no-one calls it.

Signed-off-by: David Howells <[email protected]>
---

include/linux/mm_types.h | 2 ++
include/linux/sched.h | 7 +++++--
mm/nommu.c | 21 ---------------------
mm/util.c | 2 +-
4 files changed, 8 insertions(+), 24 deletions(-)


diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 80cfa78..36f9627 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -205,10 +205,12 @@ struct mm_struct {
struct vm_area_struct * mmap; /* list of VMAs */
struct rb_root mm_rb;
struct vm_area_struct * mmap_cache; /* last find_vma result */
+#ifdef CONFIG_MMU
unsigned long (*get_unmapped_area) (struct file *filp,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags);
void (*unmap_area) (struct mm_struct *mm, unsigned long addr);
+#endif
unsigned long mmap_base; /* base of mmap area */
unsigned long task_size; /* size of task vm space */
unsigned long cached_hole_size; /* if non-zero, the largest hole below free_area_cache */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8d4991b..08a2523 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -377,6 +377,8 @@ extern int sysctl_max_map_count;

#include <linux/aio.h>

+#ifdef CONFIG_MMU
+extern void arch_pick_mmap_layout(struct mm_struct *mm);
extern unsigned long
arch_get_unmapped_area(struct file *, unsigned long, unsigned long,
unsigned long, unsigned long);
@@ -386,6 +388,9 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
unsigned long flags);
extern void arch_unmap_area(struct mm_struct *, unsigned long);
extern void arch_unmap_area_topdown(struct mm_struct *, unsigned long);
+#else
+extern void arch_pick_mmap_layout(struct mm_struct *mm) {}
+#endif

#if USE_SPLIT_PTLOCKS
/*
@@ -2491,8 +2496,6 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)

#endif /* CONFIG_SMP */

-extern void arch_pick_mmap_layout(struct mm_struct *mm);
-
#ifdef CONFIG_TRACING
extern void
__trace_special(void *__tr, void *__data,
diff --git a/mm/nommu.c b/mm/nommu.c
index d6dd656..32be0cf 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1761,27 +1761,6 @@ void unmap_mapping_range(struct address_space *mapping,
EXPORT_SYMBOL(unmap_mapping_range);

/*
- * ask for an unmapped area at which to create a mapping on a file
- */
-unsigned long get_unmapped_area(struct file *file, unsigned long addr,
- unsigned long len, unsigned long pgoff,
- unsigned long flags)
-{
- unsigned long (*get_area)(struct file *, unsigned long, unsigned long,
- unsigned long, unsigned long);
-
- get_area = current->mm->get_unmapped_area;
- if (file && file->f_op && file->f_op->get_unmapped_area)
- get_area = file->f_op->get_unmapped_area;
-
- if (!get_area)
- return -ENOSYS;
-
- return get_area(file, addr, len, pgoff, flags);
-}
-EXPORT_SYMBOL(get_unmapped_area);
-
-/*
* Check that a process has enough memory to allocate a new virtual
* mapping. 0 means there is enough memory for the allocation to
* succeed and -ENOMEM implies there is not.
diff --git a/mm/util.c b/mm/util.c
index 7c35ad9..834db7b 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -220,7 +220,7 @@ char *strndup_user(const char __user *s, long n)
}
EXPORT_SYMBOL(strndup_user);

-#ifndef HAVE_ARCH_PICK_MMAP_LAYOUT
+#if defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
void arch_pick_mmap_layout(struct mm_struct *mm)
{
mm->mmap_base = TASK_UNMAPPED_BASE;

2010-01-08 22:10:22

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 1/6] NOMMU: Fix SYSV SHM for NOMMU

David Howells <[email protected]> wrote:

> Put it back conditionally on CONFIG_MMU=n.

Seems I forgot to put in the conditional bits. Revised patch attached.

David
---
From: David Howells <[email protected]>
Subject: [PATCH] NOMMU: Fix SYSV SHM for NOMMU

Commit c4caa778157dbbf04116f0ac2111e389b5cd7a29 broke SYSV SHM for NOMMU by
taking away the pointer to shm_get_unmapped_area() from shm_file_operations.

Put it back conditionally on CONFIG_MMU=n.

file->f_ops->get_unmapped_area() is used to find out the base address for a
mapping of a mappable chardev device or mappable memory-based file (such as a
ramfs file). It needs to be called prior to file->f_ops->mmap() being called.

Signed-off-by: David Howells <[email protected]>
---

ipc/shm.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)


diff --git a/ipc/shm.c b/ipc/shm.c
index 92fe923..23256b8 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -298,6 +298,9 @@ static const struct file_operations shm_file_operations = {
.mmap = shm_mmap,
.fsync = shm_fsync,
.release = shm_release,
+#ifndef CONFIG_MMU
+ .get_unmapped_area = shm_get_unmapped_area,
+#endif
};

static const struct file_operations shm_file_operations_huge = {

2010-01-09 00:56:10

by Mike Frysinger

[permalink] [raw]
Subject: Re: [PATCH 4/6] NOMMU: Don't need get_unmapped_area() for NOMMU

On Fri, Jan 8, 2010 at 17:05, David Howells wrote:
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
>
> +#ifdef CONFIG_MMU
> +extern void arch_pick_mmap_layout(struct mm_struct *mm);
> +#else
> +extern void arch_pick_mmap_layout(struct mm_struct *mm) {}
> +#endif

static inline instead of extern when !MMU ?
-mike

2010-01-09 00:57:42

by Mike Frysinger

[permalink] [raw]
Subject: Re: [PATCH 5/6] NOMMU: Fix race between ramfs truncation and shared mmap

On Fri, Jan 8, 2010 at 17:05, David Howells wrote:
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -389,7 +389,7 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
>  extern void arch_unmap_area(struct mm_struct *, unsigned long);
>  extern void arch_unmap_area_topdown(struct mm_struct *, unsigned long);
>  #else
> -extern void arch_pick_mmap_layout(struct mm_struct *mm) {}
> +static inline void arch_pick_mmap_layout(struct mm_struct *mm) {}
>  #endif

oh, i guess the static inline change was moved to this patch for some reason ...
-mike

2010-01-11 12:12:03

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 4/6] NOMMU: Don't need get_unmapped_area() for NOMMU

Mike Frysinger <[email protected]> wrote:

> static inline instead of extern when !MMU ?

Yep. The fix accidentally wound up in the next patch. I'll move it.

David

2010-01-14 05:36:53

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 1/6] NOMMU: Fix SYSV SHM for NOMMU

On Fri, Jan 08, 2010 at 10:10:13PM +0000, David Howells wrote:
> David Howells <[email protected]> wrote:
>
> > Put it back conditionally on CONFIG_MMU=n.
>
> Seems I forgot to put in the conditional bits. Revised patch attached.

Series looks sane... ACK on the entire bunch

2010-01-14 13:59:14

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 1/6] NOMMU: Fix SYSV SHM for NOMMU

Al Viro <[email protected]> wrote:

> Series looks sane... ACK on the entire bunch

Thanks!

David