2012-06-11 09:18:01

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 0/6][resend] mempolicy memory corruption fixlet

From: KOSAKI Motohiro <[email protected]>

Hi

This is trivial fixes of mempolicy meory corruption issues. There
are independent patches each ather. and, they don't change userland
ABIs.

Thanks.

changes from v1: fix some typo of changelogs.

-----------------------------------------------
KOSAKI Motohiro (6):
Revert "mm: mempolicy: Let vma_merge and vma_split handle
vma->vm_policy linkages"
mempolicy: Kill all mempolicy sharing
mempolicy: fix a race in shared_policy_replace()
mempolicy: fix refcount leak in mpol_set_shared_policy()
mempolicy: fix a memory corruption by refcount imbalance in
alloc_pages_vma()
MAINTAINERS: Added MEMPOLICY entry

MAINTAINERS | 7 +++
mm/mempolicy.c | 151 ++++++++++++++++++++++++++++++++++++++++----------------
mm/shmem.c | 9 ++--
3 files changed, 120 insertions(+), 47 deletions(-)


2012-06-11 09:18:17

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"

From: KOSAKI Motohiro <[email protected]>

commit 05f144a0d5 "mm: mempolicy: Let vma_merge and vma_split handle
vma->vm_policy linkages" removed a vma->vm_policy updates. But it is
a primary purpose of mbind_range(). Now, mbind(2) is no-op in several
case unintentionally. It is not ideal fix. This patch reverts it.

Cc: Dave Jones <[email protected]>,
Cc: Mel Gorman <[email protected]>
Cc: Christoph Lameter <[email protected]>,
Cc: Andrew Morton <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/mempolicy.c | 41 ++++++++++++++++++++++++-----------------
1 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f15c1b2..0a60def 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -607,6 +607,27 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
return first;
}

+/* Apply policy to a single VMA */
+static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
+{
+ int err = 0;
+ struct mempolicy *old = vma->vm_policy;
+
+ pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
+ vma->vm_start, vma->vm_end, vma->vm_pgoff,
+ vma->vm_ops, vma->vm_file,
+ vma->vm_ops ? vma->vm_ops->set_policy : NULL);
+
+ if (vma->vm_ops && vma->vm_ops->set_policy)
+ err = vma->vm_ops->set_policy(vma, new);
+ if (!err) {
+ mpol_get(new);
+ vma->vm_policy = new;
+ mpol_put(old);
+ }
+ return err;
+}
+
/* Step 2: apply policy to a range and do splits. */
static int mbind_range(struct mm_struct *mm, unsigned long start,
unsigned long end, struct mempolicy *new_pol)
@@ -655,23 +676,9 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
if (err)
goto out;
}
-
- /*
- * Apply policy to a single VMA. The reference counting of
- * policy for vma_policy linkages has already been handled by
- * vma_merge and split_vma as necessary. If this is a shared
- * policy then ->set_policy will increment the reference count
- * for an sp node.
- */
- pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
- vma->vm_start, vma->vm_end, vma->vm_pgoff,
- vma->vm_ops, vma->vm_file,
- vma->vm_ops ? vma->vm_ops->set_policy : NULL);
- if (vma->vm_ops && vma->vm_ops->set_policy) {
- err = vma->vm_ops->set_policy(vma, new_pol);
- if (err)
- goto out;
- }
+ err = policy_vma(vma, new_pol);
+ if (err)
+ goto out;
}

out:
--
1.7.1

2012-06-11 09:18:25

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 2/6] mempolicy: remove all mempolicy sharing

From: KOSAKI Motohiro <[email protected]>

Dave Jones' system call fuzz testing tool "trinity" triggered the following
bug error with slab debugging enabled

[ 7613.229315] =============================================================================
[ 7613.229955] BUG numa_policy (Not tainted): Poison overwritten
[ 7613.230560] -----------------------------------------------------------------------------
[ 7613.230560]
[ 7613.231834] INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
[ 7613.232518] INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
[ 7613.233188] __slab_alloc+0x3d3/0x445
[ 7613.233877] kmem_cache_alloc+0x29d/0x2b0
[ 7613.234564] mpol_new+0xa3/0x140
[ 7613.235236] sys_mbind+0x142/0x620
[ 7613.235929] system_call_fastpath+0x16/0x1b
[ 7613.236640] INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
[ 7613.237354] __slab_free+0x2e/0x1de
[ 7613.238080] kmem_cache_free+0x25a/0x260
[ 7613.238799] __mpol_put+0x27/0x30
[ 7613.239515] remove_vma+0x68/0x90
[ 7613.240223] exit_mmap+0x118/0x140
[ 7613.240939] mmput+0x73/0x110
[ 7613.241651] exit_mm+0x108/0x130
[ 7613.242367] do_exit+0x162/0xb90
[ 7613.243074] do_group_exit+0x4f/0xc0
[ 7613.243790] sys_exit_group+0x17/0x20
[ 7613.244507] system_call_fastpath+0x16/0x1b
[ 7613.245212] INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x (null) flags=0x20000000004080
[ 7613.246000] INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0

The problem was created by a reference count imbalance. Example, In following case,
mbind(addr, len) try to replace mempolicies of vma1 and vma2 and then they will
be share the same mempolicy, and the new mempolicy has MPOL_F_SHARED flag.

+-------------------+-------------------+
| vma1 | vma2(shmem) |
+-------------------+-------------------+
| |
addr addr+len

Look at alloc_pages_vma(), it uses get_vma_policy() and mpol_cond_put() pair
for maintaining mempolicy refcount. The current rule is, get_vma_policy() does
NOT increase a refcount if the policy is not attached shmem vma and mpol_cond_put()
DOES decrease a refcount if mpol has MPOL_F_SHARED.

In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED! then,
get_vma_policy() doesn't increase a refcount and mpol_cond_put() decrease a
refcount whenever alloc_page_vma() is called.

The bug was introduced by commit 52cd3b0740 (mempolicy: rework mempolicy Reference
Counting) at 4 years ago.

More unfortunately mempolicy has one another serious broken. Currently,
mempolicy rebind logic (it is called from cpuset rebinding) ignore a refcount
of mempolicy and override it forcibly. Thus, any mempolicy sharing may
cause mempolicy corruption. The bug was introduced by commit 68860ec10b
(cpusets: automatic numa mempolicy rebinding) at 7 years ago.

To disable policy sharing solves user visible breakage and this patch does it.
Maybe, we need to rewrite MPOL_F_SHARED and mempolicy rebinding code and aim
to proper cow logic eventually, but I think this is good first step.

Reported-by: Dave Jones <[email protected]>,
Cc: Mel Gorman <[email protected]>
Cc: Christoph Lameter <[email protected]>,
Cc: Andrew Morton <[email protected]>
Cc: <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/mempolicy.c | 49 ++++++++++++++++++++++++++++++++++++-------------
1 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 0a60def..9505cb9 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -607,24 +607,38 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
return first;
}

-/* Apply policy to a single VMA */
-static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
+/*
+ * Apply policy to a single VMA
+ * This must be called with the mmap_sem held for writing.
+ */
+static int policy_vma(struct vm_area_struct *vma, struct mempolicy *pol)
{
- int err = 0;
- struct mempolicy *old = vma->vm_policy;
+ int err;
+ struct mempolicy *old;
+ struct mempolicy *new;

pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
vma->vm_start, vma->vm_end, vma->vm_pgoff,
vma->vm_ops, vma->vm_file,
vma->vm_ops ? vma->vm_ops->set_policy : NULL);

- if (vma->vm_ops && vma->vm_ops->set_policy)
+ new = mpol_dup(pol);
+ if (IS_ERR(new))
+ return PTR_ERR(new);
+
+ if (vma->vm_ops && vma->vm_ops->set_policy) {
err = vma->vm_ops->set_policy(vma, new);
- if (!err) {
- mpol_get(new);
- vma->vm_policy = new;
- mpol_put(old);
+ if (err)
+ goto err_out;
}
+
+ old = vma->vm_policy;
+ vma->vm_policy = new; /* protected by mmap_sem */
+ mpol_put(old);
+
+ return 0;
+ err_out:
+ mpol_put(new);
return err;
}

@@ -2147,15 +2161,24 @@ static void sp_delete(struct shared_policy *sp, struct sp_node *n)
static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
struct mempolicy *pol)
{
- struct sp_node *n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
+ struct sp_node *n;
+ struct mempolicy *newpol;

+ n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
if (!n)
return NULL;
+
+ newpol = mpol_dup(pol);
+ if (IS_ERR(newpol)) {
+ kmem_cache_free(sn_cache, n);
+ return NULL;
+ }
+ newpol->flags |= MPOL_F_SHARED;
+
n->start = start;
n->end = end;
- mpol_get(pol);
- pol->flags |= MPOL_F_SHARED; /* for unref */
- n->policy = pol;
+ n->policy = newpol;
+
return n;
}

--
1.7.1

2012-06-11 09:18:32

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 3/6] mempolicy: fix a race in shared_policy_replace()

From: KOSAKI Motohiro <[email protected]>

shared_policy_replace() uses sp_alloc() wrongly. 1) sp_node can't be dereferenced when
not holding sp->lock and 2) another thread can modify sp_node when sp->lock is unlocked.

This patch fixes them.

Note: The bug was introduced pre-git age (IOW, before 2.6.12-rc2). I believe nobody
uses this feature in production systems.

Cc: Dave Jones <[email protected]>,
Cc: Mel Gorman <[email protected]>
Cc: Christoph Lameter <[email protected]>,
Cc: Andrew Morton <[email protected]>
Cc: <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/mempolicy.c | 55 ++++++++++++++++++++++++++++++++++++++-----------------
1 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 9505cb9..d97d2db 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2158,6 +2158,14 @@ static void sp_delete(struct shared_policy *sp, struct sp_node *n)
kmem_cache_free(sn_cache, n);
}

+static void sp_node_init(struct sp_node *node, unsigned long start,
+ unsigned long end, struct mempolicy *pol)
+{
+ node->start = start;
+ node->end = end;
+ node->policy = pol;
+}
+
static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
struct mempolicy *pol)
{
@@ -2174,10 +2182,7 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
return NULL;
}
newpol->flags |= MPOL_F_SHARED;
-
- n->start = start;
- n->end = end;
- n->policy = newpol;
+ sp_node_init(n, start, end, newpol);

return n;
}
@@ -2186,7 +2191,9 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
static int shared_policy_replace(struct shared_policy *sp, unsigned long start,
unsigned long end, struct sp_node *new)
{
+ int err;
struct sp_node *n, *new2 = NULL;
+ struct mempolicy *new2_pol = NULL;

restart:
spin_lock(&sp->lock);
@@ -2202,16 +2209,16 @@ restart:
} else {
/* Old policy spanning whole new range. */
if (n->end > end) {
- if (!new2) {
- spin_unlock(&sp->lock);
- new2 = sp_alloc(end, n->end, n->policy);
- if (!new2)
- return -ENOMEM;
- goto restart;
- }
- n->end = start;
+ if (!new2)
+ goto alloc_new2;
+
+ *new2_pol = *n->policy;
+ atomic_set(&new2_pol->refcnt, 1);
+ sp_node_init(new2, n->end, end, new2_pol);
sp_insert(sp, new2);
+ n->end = start;
new2 = NULL;
+ new2_pol = NULL;
break;
} else
n->end = start;
@@ -2223,11 +2230,25 @@ restart:
if (new)
sp_insert(sp, new);
spin_unlock(&sp->lock);
- if (new2) {
- mpol_put(new2->policy);
- kmem_cache_free(sn_cache, new2);
- }
- return 0;
+ err = 0;
+
+ err_out:
+ if (new2_pol)
+ mpol_put(new2_pol);
+ if (new2)
+ kmem_cache_free(sn_cache, new2);
+ return err;
+
+ alloc_new2:
+ spin_unlock(&sp->lock);
+ err = -ENOMEM;
+ new2 = kmem_cache_alloc(sn_cache, GFP_KERNEL);
+ if (!new2)
+ goto err_out;
+ new2_pol = kmem_cache_alloc(policy_cache, GFP_KERNEL);
+ if (!new2_pol)
+ goto err_out;
+ goto restart;
}

/**
--
1.7.1

2012-06-11 09:18:37

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 4/6] mempolicy: fix refcount leak in mpol_set_shared_policy()

From: KOSAKI Motohiro <[email protected]>

When shared_policy_replace() failure, new->policy is not freed correctly.
The real problem is, shared mempolicy codes directly call kmem_cache_free()
in multiple place.

This patch creates proper wrapper function and uses it.
Note: The bug was introduced pre-git age (IOW, before 2.6.12-rc2).

Cc: Dave Jones <[email protected]>,
Cc: Mel Gorman <[email protected]>
Cc: Christoph Lameter <[email protected]>,
Cc: Andrew Morton <[email protected]>
Cc: <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/mempolicy.c | 15 +++++++++------
1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d97d2db..7fb7d51 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2150,12 +2150,17 @@ mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)
return pol;
}

+static void sp_free(struct sp_node *n)
+{
+ mpol_put(n->policy);
+ kmem_cache_free(sn_cache, n);
+}
+
static void sp_delete(struct shared_policy *sp, struct sp_node *n)
{
pr_debug("deleting %lx-l%lx\n", n->start, n->end);
rb_erase(&n->nd, &sp->root);
- mpol_put(n->policy);
- kmem_cache_free(sn_cache, n);
+ sp_free(n);
}

static void sp_node_init(struct sp_node *node, unsigned long start,
@@ -2320,7 +2325,7 @@ int mpol_set_shared_policy(struct shared_policy *info,
}
err = shared_policy_replace(info, vma->vm_pgoff, vma->vm_pgoff+sz, new);
if (err && new)
- kmem_cache_free(sn_cache, new);
+ sp_free(new);
return err;
}

@@ -2337,9 +2342,7 @@ void mpol_free_shared_policy(struct shared_policy *p)
while (next) {
n = rb_entry(next, struct sp_node, nd);
next = rb_next(&n->nd);
- rb_erase(&n->nd, &p->root);
- mpol_put(n->policy);
- kmem_cache_free(sn_cache, n);
+ sp_delete(p, n);
}
spin_unlock(&p->lock);
}
--
1.7.1

2012-06-11 09:18:40

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()

From: KOSAKI Motohiro <[email protected]>

commit cc9a6c8776 (cpuset: mm: reduce large amounts of memory barrier related
damage v3) introduced a memory corruption.

shmem_alloc_page() passes pseudo vma and it has one significant unique
combination, vma->vm_ops=NULL and (vma->policy->flags & MPOL_F_SHARED).

Now, get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
and mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
Therefore, when alloc_pages_vma() goes 'goto retry_cpuset' path, a policy
refcount will be decreased too much and therefore it will make a memory corruption.

This patch fixes it.

Cc: Dave Jones <[email protected]>,
Cc: Mel Gorman <[email protected]>
Cc: Christoph Lameter <[email protected]>,
Cc: Andrew Morton <[email protected]>
Cc: <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Andi Kleen <[email protected]>
---
mm/mempolicy.c | 13 ++++++++++++-
mm/shmem.c | 9 +++++----
2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7fb7d51..0da0969 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1544,18 +1544,29 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
struct vm_area_struct *vma, unsigned long addr)
{
struct mempolicy *pol = task->mempolicy;
+ int got_ref;

if (vma) {
if (vma->vm_ops && vma->vm_ops->get_policy) {
struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
addr);
- if (vpol)
+ if (vpol) {
pol = vpol;
+ got_ref = 1;
+ }
} else if (vma->vm_policy)
pol = vma->vm_policy;
}
if (!pol)
pol = &default_policy;
+
+ /*
+ * shmem_alloc_page() passes MPOL_F_SHARED policy with vma->vm_ops=NULL.
+ * Thus, we need to take additional ref for avoiding refcount imbalance.
+ */
+ if (!got_ref && mpol_needs_cond_ref(pol))
+ mpol_get(pol);
+
return pol;
}

diff --git a/mm/shmem.c b/mm/shmem.c
index d576b84..eb5f1eb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -919,6 +919,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
struct shmem_inode_info *info, pgoff_t index)
{
struct vm_area_struct pvma;
+ struct page *page;

/* Create a pseudo vma that just contains the policy */
pvma.vm_start = 0;
@@ -926,10 +927,10 @@ static struct page *shmem_alloc_page(gfp_t gfp,
pvma.vm_ops = NULL;
pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);

- /*
- * alloc_page_vma() will drop the shared policy reference
- */
- return alloc_page_vma(gfp, &pvma, 0);
+ page = alloc_page_vma(gfp, &pvma, 0);
+
+ mpol_put(pvma.vm_policy);
+ return page;
}
#else /* !CONFIG_NUMA */
#ifdef CONFIG_TMPFS
--
1.7.1

2012-06-11 09:18:44

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH 6/6] MAINTAINERS: Added MEMPOLICY entry

From: KOSAKI Motohiro <[email protected]>

Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
---
MAINTAINERS | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a246490..6f4a8e2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4467,6 +4467,13 @@ F: drivers/mtd/
F: include/linux/mtd/
F: include/mtd/

+MEMPOLICY
+M: KOSAKI Motohiro <[email protected]>
+L: [email protected]
+S: Maintained
+F: include/linux/mempolicy.h
+F: mm/mempolicy.c
+
MICROBLAZE ARCHITECTURE
M: Michal Simek <[email protected]>
L: [email protected] (moderated for non-subscribers)
--
1.7.1

2012-06-11 13:33:40

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()

On Mon, 2012-06-11 at 05:17 -0400, [email protected] wrote:
> From: KOSAKI Motohiro <[email protected]>
>
> commit cc9a6c8776 (cpuset: mm: reduce large amounts of memory barrier related
> damage v3) introduced a memory corruption.
>
> shmem_alloc_page() passes pseudo vma and it has one significant unique
> combination, vma->vm_ops=NULL and (vma->policy->flags & MPOL_F_SHARED).
>
> Now, get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
> and mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
> Therefore, when alloc_pages_vma() goes 'goto retry_cpuset' path, a policy
> refcount will be decreased too much and therefore it will make a memory corruption.
[...]
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1544,18 +1544,29 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
> struct vm_area_struct *vma, unsigned long addr)
> {
> struct mempolicy *pol = task->mempolicy;
> + int got_ref;

= 0

And this should really be a bool.

> if (vma) {
> if (vma->vm_ops && vma->vm_ops->get_policy) {
> struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
> addr);
> - if (vpol)
> + if (vpol) {
> pol = vpol;
> + got_ref = 1;
> + }
> } else if (vma->vm_policy)
> pol = vma->vm_policy;
> }
> if (!pol)
> pol = &default_policy;
> +
> + /*
> + * shmem_alloc_page() passes MPOL_F_SHARED policy with vma->vm_ops=NULL.
> + * Thus, we need to take additional ref for avoiding refcount imbalance.
> + */
> + if (!got_ref && mpol_needs_cond_ref(pol))
> + mpol_get(pol);
> +
> return pol;
> }
>
[...]

--
Ben Hutchings
Computers are not intelligent. They only think they are.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part
Subject: Re: [PATCH 1/6] Revert "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages"

On Mon, 11 Jun 2012, [email protected] wrote:

> From: KOSAKI Motohiro <[email protected]>
>
> commit 05f144a0d5 "mm: mempolicy: Let vma_merge and vma_split handle
> vma->vm_policy linkages" removed a vma->vm_policy updates. But it is
> a primary purpose of mbind_range(). Now, mbind(2) is no-op in several
> case unintentionally. It is not ideal fix. This patch reverts it.

Rewritten changelog:

commit 05f144a0d5 "mm: mempolicy: Let vma_merge and vma_split handle
vma->vm_policy linkages" removed policy_vma() but the function is
needed in this patchset.


(It is not clear to me what the last sentences mean. AFAICT the code for
policy_vma() still exists in another function prior to this patch)

Subject: Re: [PATCH 0/6][resend] mempolicy memory corruption fixlet

On Mon, 11 Jun 2012, [email protected] wrote:

> changes from v1: fix some typo of changelogs.

I still have a hard time with the changelogs. Will try to give you
hopefulyl better ones.

Again you need to CC Andi Kleen on this. He wrote most of the shared
mempolicy code (long time ago) and then Lee started tinkering with it.

Subject: Re: [PATCH 2/6] mempolicy: remove all mempolicy sharing

Some more attempts to cleanup changelogs:

> The problem was created by a reference count imbalance. Example, In following case,
> mbind(addr, len) try to replace mempolicies of vma1 and vma2 and then they will
> be share the same mempolicy, and the new mempolicy has MPOL_F_SHARED flag.

The bug that we saw <where ? details?> was created by a refcount
imbalance. If mbind() replaces the memory policies of vma1 and vma and
they share the same shared mempolicy (MPOL_F_SHARED set) then an imbalance
may occur.

> +-------------------+-------------------+
> | vma1 | vma2(shmem) |
> +-------------------+-------------------+
> | |
> addr addr+len
>
> Look at alloc_pages_vma(), it uses get_vma_policy() and mpol_cond_put() pair
> for maintaining mempolicy refcount. The current rule is, get_vma_policy() does
> NOT increase a refcount if the policy is not attached shmem vma and mpol_cond_put()
> DOES decrease a refcount if mpol has MPOL_F_SHARED.

alloc_pages_vma() uses the two function get_vma_policy() and
mpol_cond_put() to maintain the refcount on the memory policies. However,
the current rule is that get_vma_policy() does *not* increase the refcount
if the policy is not attached to a shm vma. mpol_cond_put *does* decrease
the refcount if the memory policy has MPOL_F_SHARED set.

> In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED! then,
> get_vma_policy() doesn't increase a refcount and mpol_cond_put() decrease a
> refcount whenever alloc_page_vma() is called.
>
> The bug was introduced by commit 52cd3b0740 (mempolicy: rework mempolicy Reference
> Counting) at 4 years ago.
>
> More unfortunately mempolicy has one another serious broken. Currently,
> mempolicy rebind logic (it is called from cpuset rebinding) ignore a refcount
> of mempolicy and override it forcibly. Thus, any mempolicy sharing may
> cause mempolicy corruption. The bug was introduced by commit 68860ec10b
> (cpusets: automatic numa mempolicy rebinding) at 7 years ago.

Memory policies have another issue. Currently the mempolicy rebind logic
used for cpuset rebinding ignores the refcount of memory policies.
Therefore, any memory policy sharing can cause refcount mismatches. The
bug was ...

> To disable policy sharing solves user visible breakage and this patch does it.
> Maybe, we need to rewrite MPOL_F_SHARED and mempolicy rebinding code and aim
> to proper cow logic eventually, but I think this is good first step.

Disabling policy sharing solves the breakage and that is how this patch
fixes the issue for now. Rewriting the shared policy handling with proper
COW logic support will be necessary to cleanly address the
problem and allow proper sharing of memory policies.

2012-06-11 15:24:15

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()

(6/11/12 9:33 AM), Ben Hutchings wrote:
> On Mon, 2012-06-11 at 05:17 -0400, [email protected] wrote:
>> From: KOSAKI Motohiro<[email protected]>
>>
>> commit cc9a6c8776 (cpuset: mm: reduce large amounts of memory barrier related
>> damage v3) introduced a memory corruption.
>>
>> shmem_alloc_page() passes pseudo vma and it has one significant unique
>> combination, vma->vm_ops=NULL and (vma->policy->flags& MPOL_F_SHARED).
>>
>> Now, get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
>> and mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
>> Therefore, when alloc_pages_vma() goes 'goto retry_cpuset' path, a policy
>> refcount will be decreased too much and therefore it will make a memory corruption.
> [...]
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -1544,18 +1544,29 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
>> struct vm_area_struct *vma, unsigned long addr)
>> {
>> struct mempolicy *pol = task->mempolicy;
>> + int got_ref;
>
> = 0
>
> And this should really be a bool.

Good catch. Thanks.



From 6a635a77e7b192413525855e19df7a724c81ae5b Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <[email protected]>
Date: Tue, 29 May 2012 22:23:46 -0400
Subject: [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()

commit cc9a6c8776 (cpuset: mm: reduce large amounts of memory barrier related
damage v3) introduced a memory corruption.

shmem_alloc_page() passes pseudo vma and it has one significant unique
combination, vma->vm_ops=NULL and (vma->policy->flags & MPOL_F_SHARED).

Now, get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
and mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
Therefore, when alloc_pages_vma() goes 'goto retry_cpuset' path, a policy
refcount will be decreased too much and therefore it will make a memory corruption.

This patch fixes it.

Cc: Dave Jones <[email protected]>,
Cc: Mel Gorman <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Christoph Lameter <[email protected]>,
Cc: Andrew Morton <[email protected]>
Cc: <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
mm/mempolicy.c | 13 ++++++++++++-
mm/shmem.c | 9 +++++----
2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7fb7d51..ddde834 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1544,18 +1544,29 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
struct vm_area_struct *vma, unsigned long addr)
{
struct mempolicy *pol = task->mempolicy;
+ bool got_ref = 0;

if (vma) {
if (vma->vm_ops && vma->vm_ops->get_policy) {
struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
addr);
- if (vpol)
+ if (vpol) {
pol = vpol;
+ got_ref = 1;
+ }
} else if (vma->vm_policy)
pol = vma->vm_policy;
}
if (!pol)
pol = &default_policy;
+
+ /*
+ * shmem_alloc_page() passes MPOL_F_SHARED policy with vma->vm_ops=NULL.
+ * Thus, we need to take additional ref for avoiding refcount imbalance.
+ */
+ if (!got_ref && mpol_needs_cond_ref(pol))
+ mpol_get(pol);
+
return pol;
}

diff --git a/mm/shmem.c b/mm/shmem.c
index d576b84..eb5f1eb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -919,6 +919,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
struct shmem_inode_info *info, pgoff_t index)
{
struct vm_area_struct pvma;
+ struct page *page;

/* Create a pseudo vma that just contains the policy */
pvma.vm_start = 0;
@@ -926,10 +927,10 @@ static struct page *shmem_alloc_page(gfp_t gfp,
pvma.vm_ops = NULL;
pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);

- /*
- * alloc_page_vma() will drop the shared policy reference
- */
- return alloc_page_vma(gfp, &pvma, 0);
+ page = alloc_page_vma(gfp, &pvma, 0);
+
+ mpol_put(pvma.vm_policy);
+ return page;
}
#else /* !CONFIG_NUMA */
#ifdef CONFIG_TMPFS
--
1.7.1

2012-06-12 13:55:39

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 2/6] mempolicy: remove all mempolicy sharing

On Mon, Jun 11, 2012 at 05:17:26AM -0400, [email protected] wrote:
> From: KOSAKI Motohiro <[email protected]>
>
> Dave Jones' system call fuzz testing tool "trinity" triggered the following
> bug error with slab debugging enabled
>
> [ 7613.229315] =============================================================================
> [ 7613.229955] BUG numa_policy (Not tainted): Poison overwritten
> [ 7613.230560] -----------------------------------------------------------------------------
> [ 7613.230560]
> [ 7613.231834] INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
> [ 7613.232518] INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
> [ 7613.233188] __slab_alloc+0x3d3/0x445
> [ 7613.233877] kmem_cache_alloc+0x29d/0x2b0
> [ 7613.234564] mpol_new+0xa3/0x140
> [ 7613.235236] sys_mbind+0x142/0x620
> [ 7613.235929] system_call_fastpath+0x16/0x1b
> [ 7613.236640] INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
> [ 7613.237354] __slab_free+0x2e/0x1de
> [ 7613.238080] kmem_cache_free+0x25a/0x260
> [ 7613.238799] __mpol_put+0x27/0x30
> [ 7613.239515] remove_vma+0x68/0x90
> [ 7613.240223] exit_mmap+0x118/0x140
> [ 7613.240939] mmput+0x73/0x110
> [ 7613.241651] exit_mm+0x108/0x130
> [ 7613.242367] do_exit+0x162/0xb90
> [ 7613.243074] do_group_exit+0x4f/0xc0
> [ 7613.243790] sys_exit_group+0x17/0x20
> [ 7613.244507] system_call_fastpath+0x16/0x1b
> [ 7613.245212] INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x (null) flags=0x20000000004080
> [ 7613.246000] INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0
>
> The problem was created by a reference count imbalance. Example, In following case,
> mbind(addr, len) try to replace mempolicies of vma1 and vma2 and then they will
> be share the same mempolicy, and the new mempolicy has MPOL_F_SHARED flag.
>
> +-------------------+-------------------+
> | vma1 | vma2(shmem) |
> +-------------------+-------------------+
> | |
> addr addr+len
>

Your example is missing some important detail. When I was looking at this
I thought of the same scenario because initially I thought this might be
the problem Dave's test case was hitting. Obviously I then proceeded to
mess up anyway so take this with a grain of salt but why is this particular
situation not prevented by vma_merge? is_mergeable_vma() should have spotted
that the vm_files differed and mbind_range() should not have tried
sharing them.

> Look at alloc_pages_vma(), it uses get_vma_policy() and mpol_cond_put() pair
> for maintaining mempolicy refcount. The current rule is, get_vma_policy() does
> NOT increase a refcount if the policy is not attached shmem vma and mpol_cond_put()
> DOES decrease a refcount if mpol has MPOL_F_SHARED.
>

The rules about refcounting are indeed annoying. It would be a lot easier
to understand if the reference counting was unconditional but then every
page allocation in a large VMA would also bounce the cacheline storing
the count which would just generate a new bug later.

> In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED! then,
> get_vma_policy() doesn't increase a refcount and mpol_cond_put() decrease a
> refcount whenever alloc_page_vma() is called.
>
> The bug was introduced by commit 52cd3b0740 (mempolicy: rework mempolicy Reference
> Counting) at 4 years ago.
>
> More unfortunately mempolicy has one another serious broken. Currently,
> mempolicy rebind logic (it is called from cpuset rebinding) ignore a refcount
> of mempolicy and override it forcibly. Thus, any mempolicy sharing may
> cause mempolicy corruption. The bug was introduced by commit 68860ec10b
> (cpusets: automatic numa mempolicy rebinding) at 7 years ago.
>

I suspect these bugs were not noticed because the shmem policies are
typically large and very long lived without much use of mbind() but
that's not an excuse.

> To disable policy sharing solves user visible breakage and this patch does it.
> Maybe, we need to rewrite MPOL_F_SHARED and mempolicy rebinding code and aim
> to proper cow logic eventually, but I think this is good first step.
>
> Reported-by: Dave Jones <[email protected]>,
> Cc: Mel Gorman <[email protected]>
> Cc: Christoph Lameter <[email protected]>,
> Cc: Andrew Morton <[email protected]>
> Cc: <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> mm/mempolicy.c | 49 ++++++++++++++++++++++++++++++++++++-------------
> 1 files changed, 36 insertions(+), 13 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 0a60def..9505cb9 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -607,24 +607,38 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
> return first;
> }
>
> -/* Apply policy to a single VMA */
> -static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
> +/*
> + * Apply policy to a single VMA
> + * This must be called with the mmap_sem held for writing.
> + */
> +static int policy_vma(struct vm_area_struct *vma, struct mempolicy *pol)

If we're going to change this, change the policy_vma() name as well to
set_vma_policy. We currently have policy_vma() and vma_policy() which mean
totally different things which is partially why I deleted it entirely the
first time around. It's a small issue but it might make mempolicy.c 0.0001%
easier to follow.

> {
> - int err = 0;
> - struct mempolicy *old = vma->vm_policy;
> + int err;
> + struct mempolicy *old;
> + struct mempolicy *new;
>
> pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
> vma->vm_start, vma->vm_end, vma->vm_pgoff,
> vma->vm_ops, vma->vm_file,
> vma->vm_ops ? vma->vm_ops->set_policy : NULL);
>
> - if (vma->vm_ops && vma->vm_ops->set_policy)
> + new = mpol_dup(pol);
> + if (IS_ERR(new))
> + return PTR_ERR(new);
> +
> + if (vma->vm_ops && vma->vm_ops->set_policy) {
> err = vma->vm_ops->set_policy(vma, new);
> - if (!err) {
> - mpol_get(new);
> - vma->vm_policy = new;
> - mpol_put(old);
> + if (err)
> + goto err_out;
> }
> +
> + old = vma->vm_policy;
> + vma->vm_policy = new; /* protected by mmap_sem */
> + mpol_put(old);
> +
> + return 0;
> + err_out:
> + mpol_put(new);
> return err;
> }
>
> @@ -2147,15 +2161,24 @@ static void sp_delete(struct shared_policy *sp, struct sp_node *n)
> static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
> struct mempolicy *pol)
> {
> - struct sp_node *n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
> + struct sp_node *n;
> + struct mempolicy *newpol;
>
> + n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
> if (!n)
> return NULL;
> +
> + newpol = mpol_dup(pol);
> + if (IS_ERR(newpol)) {
> + kmem_cache_free(sn_cache, n);
> + return NULL;
> + }
> + newpol->flags |= MPOL_F_SHARED;
> +
> n->start = start;
> n->end = end;
> - mpol_get(pol);
> - pol->flags |= MPOL_F_SHARED; /* for unref */
> - n->policy = pol;
> + n->policy = newpol;
> +
> return n;
> }
>

--
Mel Gorman
SUSE Labs

2012-06-12 14:20:19

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()

On Mon, Jun 11, 2012 at 05:17:29AM -0400, [email protected] wrote:
> From: KOSAKI Motohiro <[email protected]>
>
> commit cc9a6c8776 (cpuset: mm: reduce large amounts of memory barrier related
> damage v3) introduced a memory corruption.
>

Ouch. No biscuits for Mel.

> shmem_alloc_page() passes pseudo vma and it has one significant unique
> combination, vma->vm_ops=NULL and (vma->policy->flags & MPOL_F_SHARED).
>
> Now, get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
> and mpol_cond_put() DOES decrease a policy ref when a policy has MPOL_F_SHARED.
> Therefore, when alloc_pages_vma() goes 'goto retry_cpuset' path, a policy
> refcount will be decreased too much and therefore it will make a memory corruption.
>

Yes, this is true. Hitting the bug requires that the cpuset is being
updated during the allocation so it's not a common but it is real. I'm
surprised I did not hit this while I was running the cpuset stress test
that originally introduced [get|put]_mems_allowed().

> This patch fixes it.
>
> Cc: Dave Jones <[email protected]>,
> Cc: Mel Gorman <[email protected]>
> Cc: Christoph Lameter <[email protected]>,
> Cc: Andrew Morton <[email protected]>
> Cc: <[email protected]>
> Cc: Miao Xie <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Acked-by: Andi Kleen <[email protected]>
> ---
> mm/mempolicy.c | 13 ++++++++++++-
> mm/shmem.c | 9 +++++----
> 2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 7fb7d51..0da0969 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1544,18 +1544,29 @@ struct mempolicy *get_vma_policy(struct task_struct *task,
> struct vm_area_struct *vma, unsigned long addr)
> {
> struct mempolicy *pol = task->mempolicy;
> + int got_ref;
>
> if (vma) {
> if (vma->vm_ops && vma->vm_ops->get_policy) {
> struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
> addr);
> - if (vpol)
> + if (vpol) {
> pol = vpol;
> + got_ref = 1;
> + }
> } else if (vma->vm_policy)
> pol = vma->vm_policy;
> }
> if (!pol)
> pol = &default_policy;
> +
> + /*
> + * shmem_alloc_page() passes MPOL_F_SHARED policy with vma->vm_ops=NULL.
> + * Thus, we need to take additional ref for avoiding refcount imbalance.
> + */
> + if (!got_ref && mpol_needs_cond_ref(pol))
> + mpol_get(pol);
> +
> return pol;
> }
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d576b84..eb5f1eb 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -919,6 +919,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
> struct shmem_inode_info *info, pgoff_t index)
> {
> struct vm_area_struct pvma;
> + struct page *page;
>
> /* Create a pseudo vma that just contains the policy */
> pvma.vm_start = 0;
> @@ -926,10 +927,10 @@ static struct page *shmem_alloc_page(gfp_t gfp,
> pvma.vm_ops = NULL;
> pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, index);
>
> - /*
> - * alloc_page_vma() will drop the shared policy reference
> - */
> - return alloc_page_vma(gfp, &pvma, 0);
> + page = alloc_page_vma(gfp, &pvma, 0);
> +
> + mpol_put(pvma.vm_policy);
> + return page;
> }

Why does dequeue_huge_page_vma() not need to be changed as well? It's
currently using mpol_cond_put() but if there is a goto retry_cpuset then
will it have not take an additional reference count and leak?

Would it be more straight forward to put the mpol_cond_put() and __mpol_put()
calls after the "goto retry_cpuset" checks instead?

> #else /* !CONFIG_NUMA */
> #ifdef CONFIG_TMPFS
> --
> 1.7.1
>

--
Mel Gorman
SUSE Labs

2012-06-12 16:31:52

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 5/6] mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()

> Why does dequeue_huge_page_vma() not need to be changed as well? It's
> currently using mpol_cond_put() but if there is a goto retry_cpuset then
> will it have not take an additional reference count and leak?

dequeue_huge_page_vma() also uses get_vma_policy() and mpol_cond_put()
pair. thus we don't need special concern.


> Would it be more straight forward to put the mpol_cond_put() and __mpol_put()
> calls after the "goto retry_cpuset" checks instead?

I hope to keep symmetric. Sane design prevent a lot of unintentional breakage.
Frankly says, now all caller assume the symmetric. It's natural.

2012-06-12 16:46:15

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/6] mempolicy: remove all mempolicy sharing

> Your example is missing some important detail. When I was looking at this
> I thought of the same scenario because initially I thought this might be
> the problem Dave's test case was hitting. Obviously I then proceeded to
> mess up anyway so take this with a grain of salt but why is this particular
> situation not prevented by vma_merge? is_mergeable_vma() should have spotted
> that the vm_files differed and mbind_range() should not have tried
> sharing them.

vma1 and vma2 are never merged. but policy_vma() used mpol_get() instaed
of mpol_dup(). then vma1 and vma2 became to use the same mempolicy.

vma merge/split are completely unrelated. Antually, vma1 and vma2 don't need
to be neighbor vma. | vma1 | hole | vma2| pattern makes the same scenario.


>> Look at alloc_pages_vma(), it uses get_vma_policy() and mpol_cond_put() pair
>> for maintaining mempolicy refcount. The current rule is, get_vma_policy() does
>> NOT increase a refcount if the policy is not attached shmem vma and mpol_cond_put()
>> DOES decrease a refcount if mpol has MPOL_F_SHARED.
>
> The rules about refcounting are indeed annoying. It would be a lot easier
> to understand if the reference counting was unconditional but then every
> page allocation in a large VMA would also bounce the cacheline storing
> the count which would just generate a new bug later.

Yes. regular task/vma policy shouldn't take refcount in fast path. In the other
hands, shmem policy can't avoid refcount game because we have to avoid a
race that another thread free the policy in same time.


> I suspect these bugs were not noticed because the shmem policies are
> typically large and very long lived without much use of mbind() but
> that's not an excuse.

I agree your suspection. I haven't heared this issue.



>> -/* Apply policy to a single VMA */
>> -static int policy_vma(struct vm_area_struct *vma, struct mempolicy *new)
>> +/*
>> + * Apply policy to a single VMA
>> + * This must be called with the mmap_sem held for writing.
>> + */
>> +static int policy_vma(struct vm_area_struct *vma, struct mempolicy *pol)
>
> If we're going to change this, change the policy_vma() name as well to
> set_vma_policy. We currently have policy_vma() and vma_policy() which mean
> totally different things which is partially why I deleted it entirely the
> first time around. It's a small issue but it might make mempolicy.c 0.0001%
> easier to follow.

100% agree. I'll make simple renaming patch.

2012-06-12 16:47:07

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/6] mempolicy: remove all mempolicy sharing

On Mon, Jun 11, 2012 at 11:02 AM, Christoph Lameter <[email protected]> wrote:
> Some more attempts to cleanup changelogs:
>
>> The problem was created by a reference count imbalance. Example, In following case,
>> mbind(addr, len) try to replace mempolicies of vma1 and vma2 and then they will
>> be share the same mempolicy, and the new mempolicy has MPOL_F_SHARED flag.
>
> The bug that we saw <where ? details?> was created by a refcount
> imbalance. If mbind() replaces the memory policies of vma1 and vma and
> they share the same shared mempolicy (MPOL_F_SHARED set) then an imbalance
> may occur.
>
>> ? +-------------------+-------------------+
>> ? | ? ? vma1 ? ? ? ? ?| ? ? vma2(shmem) ? |
>> ? +-------------------+-------------------+
>> ? | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? |
>> ?addr ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? addr+len
>>
>> Look at alloc_pages_vma(), it uses get_vma_policy() and mpol_cond_put() pair
>> for maintaining mempolicy refcount. The current rule is, get_vma_policy() does
>> NOT increase a refcount if the policy is not attached shmem vma and mpol_cond_put()
>> DOES decrease a refcount if mpol has MPOL_F_SHARED.
>
> alloc_pages_vma() uses the two function get_vma_policy() and
> mpol_cond_put() to maintain the refcount on the memory policies. However,
> the current rule is that get_vma_policy() does *not* increase the refcount
> if the policy is not attached to a shm vma. mpol_cond_put *does* decrease
> the refcount if the memory policy has MPOL_F_SHARED set.
>
>> In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED! then,
>> get_vma_policy() doesn't increase a refcount and mpol_cond_put() decrease a
>> refcount whenever alloc_page_vma() is called.
>>
>> The bug was introduced by commit 52cd3b0740 (mempolicy: rework mempolicy Reference
>> Counting) at 4 years ago.
>>
>> More unfortunately mempolicy has one another serious broken. Currently,
>> mempolicy rebind logic (it is called from cpuset rebinding) ignore a refcount
>> of mempolicy and override it forcibly. Thus, any mempolicy sharing may
>> cause mempolicy corruption. The bug was introduced by commit 68860ec10b
>> (cpusets: automatic numa mempolicy rebinding) at 7 years ago.
>
> Memory policies have another issue. Currently the mempolicy rebind logic
> used for cpuset rebinding ignores the refcount of memory policies.
> Therefore, any memory policy sharing can cause refcount mismatches. The
> bug was ...
>
>> To disable policy sharing solves user visible breakage and this patch does it.
>> Maybe, we need to rewrite MPOL_F_SHARED and mempolicy rebinding code and aim
>> to proper cow logic eventually, but I think this is good first step.
>
> Disabling policy sharing solves the breakage and that is how this patch
> fixes the issue for now. Rewriting the shared policy handling with proper
> COW logic support will be necessary to cleanly address the
> problem and allow proper sharing of memory policies.

Thanks, Christoph.
I'll rewrite the description as your suggestion.

2012-08-06 19:32:04

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 0/6][resend] mempolicy memory corruption fixlet

On 7/31/2012 8:33 AM, Josh Boyer wrote:
> On Mon, Jun 11, 2012 at 5:17 AM, <[email protected]> wrote:
>> From: KOSAKI Motohiro <[email protected]>
>>
>> Hi
>>
>> This is trivial fixes of mempolicy meory corruption issues. There
>> are independent patches each ather. and, they don't change userland
>> ABIs.
>>
>> Thanks.
>>
>> changes from v1: fix some typo of changelogs s.
>>
>> -----------------------------------------------
>> KOSAKI Motohiro (6):
>> Revert "mm: mempolicy: Let vma_merge and vma_split handle
>> vma->vm_policy linkages"
>> mempolicy: Kill all mempolicy sharing
>> mempolicy: fix a race in shared_policy_replace()
>> mempolicy: fix refcount leak in mpol_set_shared_policy()
>> mempolicy: fix a memory corruption by refcount imbalance in
>> alloc_pages_vma()
>> MAINTAINERS: Added MEMPOLICY entry
>>
>> MAINTAINERS | 7 +++
>> mm/mempolicy.c | 151 ++++++++++++++++++++++++++++++++++++++++----------------
>> mm/shmem.c | 9 ++--
>> 3 files changed, 120 insertions(+), 47 deletions(-)
>
> I don't see these patches queued anywhere. They aren't in linux-next,
> mmotm, or Linus' tree. Did these get dropped? Is the revert still
> needed?

Sorry. my fault. yes, it is needed. currently, Some LTP was fail since
Mel's "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages" patch.



2012-08-15 11:40:49

by Josh Boyer

[permalink] [raw]
Subject: Re: [PATCH 0/6][resend] mempolicy memory corruption fixlet

On Mon, Aug 6, 2012 at 3:32 PM, KOSAKI Motohiro
<[email protected]> wrote:
> On 7/31/2012 8:33 AM, Josh Boyer wrote:
>> On Mon, Jun 11, 2012 at 5:17 AM, <[email protected]> wrote:
>>> From: KOSAKI Motohiro <[email protected]>
>>>
>>> Hi
>>>
>>> This is trivial fixes of mempolicy meory corruption issues. There
>>> are independent patches each ather. and, they don't change userland
>>> ABIs.
>>>
>>> Thanks.
>>>
>>> changes from v1: fix some typo of changelogs s.
>>>
>>> -----------------------------------------------
>>> KOSAKI Motohiro (6):
>>> Revert "mm: mempolicy: Let vma_merge and vma_split handle
>>> vma->vm_policy linkages"
>>> mempolicy: Kill all mempolicy sharing
>>> mempolicy: fix a race in shared_policy_replace()
>>> mempolicy: fix refcount leak in mpol_set_shared_policy()
>>> mempolicy: fix a memory corruption by refcount imbalance in
>>> alloc_pages_vma()
>>> MAINTAINERS: Added MEMPOLICY entry
>>>
>>> MAINTAINERS | 7 +++
>>> mm/mempolicy.c | 151 ++++++++++++++++++++++++++++++++++++++++----------------
>>> mm/shmem.c | 9 ++--
>>> 3 files changed, 120 insertions(+), 47 deletions(-)
>>
>> I don't see these patches queued anywhere. They aren't in linux-next,
>> mmotm, or Linus' tree. Did these get dropped? Is the revert still
>> needed?
>
> Sorry. my fault. yes, it is needed. currently, Some LTP was fail since
> Mel's "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages" patch.

The series still isn't queued anywhere. Are you planning on resending
it again, or should it get picked up in a particular tree?

josh

2012-08-15 20:20:58

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/6][resend] mempolicy memory corruption fixlet

On Wed, 15 Aug 2012 07:40:46 -0400
Josh Boyer <[email protected]> wrote:

> >> I don't see these patches queued anywhere. They aren't in linux-next,
> >> mmotm, or Linus' tree. Did these get dropped? Is the revert still
> >> needed?
> >
> > Sorry. my fault. yes, it is needed. currently, Some LTP was fail since
> > Mel's "mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages" patch.
>
> The series still isn't queued anywhere. Are you planning on resending
> it again, or should it get picked up in a particular tree?

The patches need a refresh and a retest please, including incorporation
of Christoph's changelog modifications.

And for gawd's sake, please stop using my google address! It messes me
all up.