Andrea posted this one-liner a while ago as part of a larger patch. He said
it fixed return of the wrong policy in some conditions. Was this a valid fix?
--- linux-2.6.10-rc2/mm/mempolicy.c 2004-11-11 03:23:03.000000000 -0500
+++ edited/mm/mempolicy.c 2004-11-15 22:09:41.387881104 -0500
@@ -902,7 +902,7 @@ sp_lookup(struct shared_policy *sp, unsi
struct sp_node *p = rb_entry(n, struct sp_node, nd);
if (start >= p->end) {
n = n->rb_right;
- } else if (end < p->start) {
+ } else if (end <= p->start) {
n = n->rb_left;
} else {
break;
--Chuck Ebbert 15-Nov-04 22:20:16
On Mon, Nov 15, 2004 at 11:15:51PM -0500, Chuck Ebbert wrote:
> Andrea posted this one-liner a while ago as part of a larger patch. He said
> it fixed return of the wrong policy in some conditions. Was this a valid fix?
Yes it was.
-Andi
On Wed, 17 Nov 2004 at 02:00:20 +0100, Andi Kleen wrote:
> On Mon, Nov 15, 2004 at 11:15:51PM -0500, Chuck Ebbert wrote:
> > Andrea posted this one-liner a while ago as part of a larger patch. He said
> > it fixed return of the wrong policy in some conditions. Was this a valid fix?
>
> Yes it was.
At least it wasn't dropped -- it's in -mm as part of
fix-for-mpol-mm-corruption-on-tmpfs, though it's unrelated to tmpfs.
(That patch contains three separate changes...)
Should just this part, which changes '<' to '<=', be pushed upstream?
--Chuck Ebbert 16-Nov-04 22:54:02
On Tue, Nov 16, 2004 at 10:54:09PM -0500, Chuck Ebbert wrote:
> On Wed, 17 Nov 2004 at 02:00:20 +0100, Andi Kleen wrote:
>
> > On Mon, Nov 15, 2004 at 11:15:51PM -0500, Chuck Ebbert wrote:
> > > Andrea posted this one-liner a while ago as part of a larger patch. He said
> > > it fixed return of the wrong policy in some conditions. Was this a valid fix?
> >
> > Yes it was.
>
> At least it wasn't dropped -- it's in -mm as part of
> fix-for-mpol-mm-corruption-on-tmpfs, though it's unrelated to tmpfs.
> (That patch contains three separate changes...)
>
> Should just this part, which changes '<' to '<=', be pushed upstream?
Yes. I'm sure Andrea will take care of that himself.
-Andi
Andi Kleen <[email protected]> wrote:
>
> On Tue, Nov 16, 2004 at 10:54:09PM -0500, Chuck Ebbert wrote:
> > On Wed, 17 Nov 2004 at 02:00:20 +0100, Andi Kleen wrote:
> >
> > > On Mon, Nov 15, 2004 at 11:15:51PM -0500, Chuck Ebbert wrote:
> > > > Andrea posted this one-liner a while ago as part of a larger patch. He said
> > > > it fixed return of the wrong policy in some conditions. Was this a valid fix?
> > >
> > > Yes it was.
> >
> > At least it wasn't dropped -- it's in -mm as part of
> > fix-for-mpol-mm-corruption-on-tmpfs, though it's unrelated to tmpfs.
> > (That patch contains three separate changes...)
> >
> > Should just this part, which changes '<' to '<=', be pushed upstream?
>
> Yes. I'm sure Andrea will take care of that himself.
>
That fix is contained within fix-for-mpol-mm-corruption-on-tmpfs.patch
anyway, isn't it? And that patch is slated for merging once I work out
whether Hugh and Andrea have sorted things out?
From: Andrea Arcangeli <[email protected]>
With the inline symlink shmem_inode_info structure is overwritten with data
until vfs_inode, and that caused the ->policy to be a corrupted pointer
during unlink. It wasn't immediatly easy to see what was going on due the
random mm corruption that generated a weird oops, it looked more like a
race condition on freed memory at first.
There's simply no need to set a policy for inodes, since the idx is always
zero. All we have to do is to initialize the data structure (the semaphore
may need to run during the page allocation for the non-inline symlink) but
we don't need to allocate the rb nodes. This way we don't need to call
mpol_free during the destroy_inode (not doable at all if the policy rbtree
is corrupt by the inline symlink ;).
An equivalent version of this patch based on a 2.6.5 tree with additional
numa features on top of this (i.e. interleaved by default, and that's
prompted me to add a comment in the LNK init path), works fine in a numa
simulation on my laptop (untested on the bare hardware).
The patch includes another unrelated bugfix I did while checking
mempolicy.c code that would return the wrong policy in some case and some
unrelated optimizations again in mempolicy.c (like to avoid rebalancing the
tree while destroying it and by breaking loops early and not checking for
invariant conditions in the replace operation). You want to review the
rebalance optimization I did in shared_policy_replace, that's tricky code.
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
25-akpm/mm/mempolicy.c | 18 ++++++++----------
25-akpm/mm/shmem.c | 12 ++++++++++--
2 files changed, 18 insertions(+), 12 deletions(-)
diff -puN mm/mempolicy.c~fix-for-mpol-mm-corruption-on-tmpfs mm/mempolicy.c
--- 25/mm/mempolicy.c~fix-for-mpol-mm-corruption-on-tmpfs 2004-11-15 20:01:14.174522104 -0800
+++ 25-akpm/mm/mempolicy.c 2004-11-15 20:01:14.179521344 -0800
@@ -1078,13 +1078,13 @@ sp_lookup(struct shared_policy *sp, unsi
while (n) {
struct sp_node *p = rb_entry(n, struct sp_node, nd);
- if (start >= p->end) {
+
+ if (start >= p->end)
n = n->rb_right;
- } else if (end < p->start) {
+ else if (end <= p->start)
n = n->rb_left;
- } else {
+ else
break;
- }
}
if (!n)
return NULL;
@@ -1197,12 +1197,10 @@ restart:
return -ENOMEM;
goto restart;
}
- n->end = end;
+ n->end = start;
sp_insert(sp, new2);
- new2 = NULL;
- }
- /* Old crossing beginning, but not end (easy) */
- if (n->start < start && n->end > start)
+ break;
+ } else
n->end = start;
}
if (!next)
@@ -1256,11 +1254,11 @@ void mpol_free_shared_policy(struct shar
while (next) {
n = rb_entry(next, struct sp_node, nd);
next = rb_next(&n->nd);
- rb_erase(&n->nd, &p->root);
mpol_free(n->policy);
kmem_cache_free(sn_cache, n);
}
spin_unlock(&p->lock);
+ p->root = RB_ROOT;
}
struct page *
diff -puN mm/shmem.c~fix-for-mpol-mm-corruption-on-tmpfs mm/shmem.c
--- 25/mm/shmem.c~fix-for-mpol-mm-corruption-on-tmpfs 2004-11-15 20:01:14.175521952 -0800
+++ 25-akpm/mm/shmem.c 2004-11-15 20:01:14.181521040 -0800
@@ -1283,7 +1283,6 @@ shmem_get_inode(struct super_block *sb,
info = SHMEM_I(inode);
memset(info, 0, (char *)inode - (char *)info);
spin_lock_init(&info->lock);
- mpol_shared_policy_init(&info->policy);
INIT_LIST_HEAD(&info->swaplist);
switch (mode & S_IFMT) {
@@ -1294,6 +1293,7 @@ shmem_get_inode(struct super_block *sb,
case S_IFREG:
inode->i_op = &shmem_inode_operations;
inode->i_fop = &shmem_file_operations;
+ mpol_shared_policy_init(&info->policy);
break;
case S_IFDIR:
inode->i_nlink++;
@@ -1303,6 +1303,11 @@ shmem_get_inode(struct super_block *sb,
inode->i_fop = &simple_dir_operations;
break;
case S_IFLNK:
+ /*
+ * Must not load anything in the rbtree,
+ * mpol_free_shared_policy will not be called.
+ */
+ mpol_shared_policy_init(&info->policy);
break;
}
} else if (sbinfo) {
@@ -2021,7 +2026,10 @@ static struct inode *shmem_alloc_inode(s
static void shmem_destroy_inode(struct inode *inode)
{
- mpol_free_shared_policy(&SHMEM_I(inode)->policy);
+ if ((inode->i_mode & S_IFMT) == S_IFREG) {
+ /* only struct inode is valid if it's an inline symlink */
+ mpol_free_shared_policy(&SHMEM_I(inode)->policy);
+ }
kmem_cache_free(shmem_inode_cachep, SHMEM_I(inode));
}
_
On Wed, 17 Nov 2004, Andrew Morton wrote:
> Andi Kleen <[email protected]> wrote:
> > On Tue, Nov 16, 2004 at 10:54:09PM -0500, Chuck Ebbert wrote:
> > > On Wed, 17 Nov 2004 at 02:00:20 +0100, Andi Kleen wrote:
> > > > On Mon, Nov 15, 2004 at 11:15:51PM -0500, Chuck Ebbert wrote:
> > > > > Andrea posted this one-liner a while ago as part of a larger patch. He said
> > > > > it fixed return of the wrong policy in some conditions. Was this a valid fix?
> > > >
> > > > Yes it was.
> > >
> > > At least it wasn't dropped -- it's in -mm as part of
> > > fix-for-mpol-mm-corruption-on-tmpfs, though it's unrelated to tmpfs.
> > > (That patch contains three separate changes...)
> > >
> > > Should just this part, which changes '<' to '<=', be pushed upstream?
> >
> > Yes. I'm sure Andrea will take care of that himself.
>
> That fix is contained within fix-for-mpol-mm-corruption-on-tmpfs.patch
> anyway, isn't it?
Yes; and Chuck is right that it's three patches not one.
I think at the least you should split it by file into mm/shmem.c
and mm/mempolicy.c parts, they're entirely independent.
I've seen Andi's ack on the '<=' fix,
I've not seen his ack on the mempolicy optimizations.
> And that patch is slated for merging once I work out
> whether Hugh and Andrea have sorted things out?
Well... it remains the case that Andrea prefers his shmem.c
patch to mine, and I prefer mine to his, while we both agree
the other's works. I'm a lot more anxious to see the fix go
into 2.6.10 than to lose it amidst debate back and forth; and
I bet Andrea feels just the same. Choose whichever you prefer
or find easier to go with - I expect that'll be Andrea's since
you have it there in your tree.
I'm rather more relaxed about it since observing that you now have
Steve Longerbeam's patch, acked by Andi, in your tree. I presume
you're intending that to go in 2.6.11 or 12, rather than just putting
it there to experiment? It's a bit silly at present since it leaves
the shmem info->policy in place, while adding a mapping->policy:
I need to go in to convert over and remove shmem's info->policy.
Whereupon the whole problem fixed by Andrea, and the area of our
disagreement, will just vanish.
Hugh
Hugh Dickins <[email protected]> wrote:
>
> On Wed, 17 Nov 2004, Andrew Morton wrote:
> > Andi Kleen <[email protected]> wrote:
> > > On Tue, Nov 16, 2004 at 10:54:09PM -0500, Chuck Ebbert wrote:
> > > > On Wed, 17 Nov 2004 at 02:00:20 +0100, Andi Kleen wrote:
> > > > > On Mon, Nov 15, 2004 at 11:15:51PM -0500, Chuck Ebbert wrote:
> > > > > > Andrea posted this one-liner a while ago as part of a larger patch. He said
> > > > > > it fixed return of the wrong policy in some conditions. Was this a valid fix?
> > > > >
> > > > > Yes it was.
> > > >
> > > > At least it wasn't dropped -- it's in -mm as part of
> > > > fix-for-mpol-mm-corruption-on-tmpfs, though it's unrelated to tmpfs.
> > > > (That patch contains three separate changes...)
> > > >
> > > > Should just this part, which changes '<' to '<=', be pushed upstream?
> > >
> > > Yes. I'm sure Andrea will take care of that himself.
> >
> > That fix is contained within fix-for-mpol-mm-corruption-on-tmpfs.patch
> > anyway, isn't it?
>
> Yes; and Chuck is right that it's three patches not one.
Always a source of hassles, that.
> I think at the least you should split it by file into mm/shmem.c
> and mm/mempolicy.c parts, they're entirely independent.
>
> I've seen Andi's ack on the '<=' fix,
> I've not seen his ack on the mempolicy optimizations.
Sigh. OK, I'll split the patch into three and will feed the `<=' fix and
the symlink fix into 2.6.10. The mempolicy optimisation can await 2.6.11.
Andrew Morton <[email protected]> wrote:
>
> OK, I'll split the patch into three and will feed the `<=' fix and
> the symlink fix into 2.6.10.
Here's the splitup:
fix-for-mpol-mm-corruption-on-tmpfs.patch:
From: Andrea Arcangeli <[email protected]>
With the inline symlink shmem_inode_info structure is overwritten with data
until vfs_inode, and that caused the ->policy to be a corrupted pointer
during unlink. It wasn't immediatly easy to see what was going on due the
random mm corruption that generated a weird oops, it looked more like a
race condition on freed memory at first.
There's simply no need to set a policy for inodes, since the idx is always
zero. All we have to do is to initialize the data structure (the semaphore
may need to run during the page allocation for the non-inline symlink) but
we don't need to allocate the rb nodes. This way we don't need to call
mpol_free during the destroy_inode (not doable at all if the policy rbtree
is corrupt by the inline symlink ;).
An equivalent version of this patch based on a 2.6.5 tree with additional
numa features on top of this (i.e. interleaved by default, and that's
prompted me to add a comment in the LNK init path), works fine in a numa
simulation on my laptop (untested on the bare hardware).
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
25-akpm/mm/shmem.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff -puN mm/shmem.c~fix-for-mpol-mm-corruption-on-tmpfs mm/shmem.c
--- 25/mm/shmem.c~fix-for-mpol-mm-corruption-on-tmpfs 2004-11-17 12:25:10.202621096 -0800
+++ 25-akpm/mm/shmem.c 2004-11-17 12:25:10.208620184 -0800
@@ -1283,7 +1283,6 @@ shmem_get_inode(struct super_block *sb,
info = SHMEM_I(inode);
memset(info, 0, (char *)inode - (char *)info);
spin_lock_init(&info->lock);
- mpol_shared_policy_init(&info->policy);
INIT_LIST_HEAD(&info->swaplist);
switch (mode & S_IFMT) {
@@ -1294,6 +1293,7 @@ shmem_get_inode(struct super_block *sb,
case S_IFREG:
inode->i_op = &shmem_inode_operations;
inode->i_fop = &shmem_file_operations;
+ mpol_shared_policy_init(&info->policy);
break;
case S_IFDIR:
inode->i_nlink++;
@@ -1303,6 +1303,11 @@ shmem_get_inode(struct super_block *sb,
inode->i_fop = &simple_dir_operations;
break;
case S_IFLNK:
+ /*
+ * Must not load anything in the rbtree,
+ * mpol_free_shared_policy will not be called.
+ */
+ mpol_shared_policy_init(&info->policy);
break;
}
} else if (sbinfo) {
@@ -2021,7 +2026,10 @@ static struct inode *shmem_alloc_inode(s
static void shmem_destroy_inode(struct inode *inode)
{
- mpol_free_shared_policy(&SHMEM_I(inode)->policy);
+ if ((inode->i_mode & S_IFMT) == S_IFREG) {
+ /* only struct inode is valid if it's an inline symlink */
+ mpol_free_shared_policy(&SHMEM_I(inode)->policy);
+ }
kmem_cache_free(shmem_inode_cachep, SHMEM_I(inode));
}
_
mempolicy-selects-wrong-policy-fix.patch:
From: Andrea Arcangeli <[email protected]>
mempolicy.c code will return the wrong policy in some cases.
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
25-akpm/mm/mempolicy.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff -puN mm/mempolicy.c~mempolicy-selects-wrong-policy-fix mm/mempolicy.c
--- 25/mm/mempolicy.c~mempolicy-selects-wrong-policy-fix 2004-11-17 12:25:42.253748584 -0800
+++ 25-akpm/mm/mempolicy.c 2004-11-17 12:25:42.257747976 -0800
@@ -1092,13 +1092,13 @@ sp_lookup(struct shared_policy *sp, unsi
while (n) {
struct sp_node *p = rb_entry(n, struct sp_node, nd);
- if (start >= p->end) {
+
+ if (start >= p->end)
n = n->rb_right;
- } else if (end < p->start) {
+ else if (end <= p->start)
n = n->rb_left;
- } else {
+ else
break;
- }
}
if (!n)
return NULL;
_
mempolicy-optimization.patch:
From: Andrea Arcangeli <[email protected]>
Some optimizations in mempolicy.c (like to avoid rebalancing the tree while
destroying it and by breaking loops early and not checking for invariant
conditions in the replace operation).
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
25-akpm/mm/mempolicy.c | 10 ++++------
1 files changed, 4 insertions(+), 6 deletions(-)
diff -puN mm/mempolicy.c~mempolicy-optimization mm/mempolicy.c
--- 25/mm/mempolicy.c~mempolicy-optimization 2004-11-17 12:26:40.149947024 -0800
+++ 25-akpm/mm/mempolicy.c 2004-11-17 12:26:40.153946416 -0800
@@ -1211,12 +1211,10 @@ restart:
return -ENOMEM;
goto restart;
}
- n->end = end;
+ n->end = start;
sp_insert(sp, new2);
- new2 = NULL;
- }
- /* Old crossing beginning, but not end (easy) */
- if (n->start < start && n->end > start)
+ break;
+ } else
n->end = start;
}
if (!next)
@@ -1270,11 +1268,11 @@ void mpol_free_shared_policy(struct shar
while (next) {
n = rb_entry(next, struct sp_node, nd);
next = rb_next(&n->nd);
- rb_erase(&n->nd, &p->root);
mpol_free(n->policy);
kmem_cache_free(sn_cache, n);
}
spin_unlock(&p->lock);
+ p->root = RB_ROOT;
}
struct page *
_
On Wed, Nov 17, 2004 at 12:21:23PM -0800, Andrew Morton wrote:
> Sigh. OK, I'll split the patch into three and will feed the `<=' fix and
> the symlink fix into 2.6.10. [..]
thanks.
> [..] The mempolicy optimisation can await 2.6.11.
sure.
About Hugh's version of the shmem.c part, I'm fine with it, but I find
more robust to destroy the mpol in the delete_inode callback than in
delete_inode (for shmfs is the same due the dcache pin), since delete_inode
is normally associated with the unlink operation, but the mpol must go
away before the inode is freed, and the inode is freed in the
destroy_inode (again for shmfs it's the same as delete_inode), plus I
find my version a bit simpler.
As Hugh said as far as one of the two ges merged I'm fine of course ;).
On Wed, Nov 17, 2004 at 08:06:35PM +0000, Hugh Dickins wrote:
> I bet Andrea feels just the same. [..]
indeed
> Some optimizations in mempolicy.c (like to avoid rebalancing the tree while
> destroying it and by breaking loops early and not checking for invariant
> conditions in the replace operation).
[...]
>
> diff -puN mm/mempolicy.c~mempolicy-optimization mm/mempolicy.c
> --- 25/mm/mempolicy.c~mempolicy-optimization 2004-11-17 12:26:40.149947024 -0800
> +++ 25-akpm/mm/mempolicy.c 2004-11-17 12:26:40.153946416 -0800
> @@ -1211,12 +1211,10 @@ restart:
> return -ENOMEM;
> goto restart;
> }
> - n->end = end;
> + n->end = start;
> sp_insert(sp, new2);
> - new2 = NULL;
> - }
> - /* Old crossing beginning, but not end (easy) */
> - if (n->start < start && n->end > start)
> + break;
> + } else
> n->end = start;
> }
> if (!next)
I'm not quite sure about this one.
> @@ -1270,11 +1268,11 @@ void mpol_free_shared_policy(struct shar
> while (next) {
> n = rb_entry(next, struct sp_node, nd);
> next = rb_next(&n->nd);
> - rb_erase(&n->nd, &p->root);
> mpol_free(n->policy);
> kmem_cache_free(sn_cache, n);
> }
> spin_unlock(&p->lock);
> + p->root = RB_ROOT;
> }
This hunk is fine.
-Andi