From: Andi Kleen <[email protected]>
slab_node() could access current->mempolicy from interrupt context.
However there's a race condition during exit where the mempolicy
is first freed and then the pointer zeroed.
Using this from interrupts seems bogus anyways. The interrupt
will interrupt a random process and therefore get a random
mempolicy. Many times, this will be idle's, which no one can change.
Just disable this here and always use local for slab
from interrupts. I also cleaned up the callers of slab_node a bit
which always passed the same argument.
I believe the original mempolicy code did that in fact,
so it's likely a regression.
v2: send version with correct logic
v3: simplify. fix typo.
Reported-by: Arun Sharma <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/mempolicy.h | 2 +-
mm/mempolicy.c | 6 ++++--
mm/slab.c | 4 ++--
mm/slub.c | 2 +-
4 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 7c727a9..7106786 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -215,7 +215,7 @@ extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
const nodemask_t *mask);
-extern unsigned slab_node(struct mempolicy *policy);
+extern unsigned slab_node(void);
extern enum zone_type policy_zone;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index cfb6c86..e05e007 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1586,9 +1586,11 @@ static unsigned interleave_nodes(struct mempolicy *policy)
* task can change it's policy. The system default policy requires no
* such protection.
*/
-unsigned slab_node(struct mempolicy *policy)
+unsigned slab_node(void)
{
- if (!policy || policy->flags & MPOL_F_LOCAL)
+ struct mempolicy *policy = current->mempolicy;
+
+ if (!in_interrupt() || !policy || policy->flags & MPOL_F_LOCAL)
return numa_node_id();
switch (policy->mode) {
diff --git a/mm/slab.c b/mm/slab.c
index e901a36..af3b405 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3336,7 +3336,7 @@ static void *alternate_node_alloc(struct kmem_cache *cachep, gfp_t flags)
if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
nid_alloc = cpuset_slab_spread_node();
else if (current->mempolicy)
- nid_alloc = slab_node(current->mempolicy);
+ nid_alloc = slab_node();
if (nid_alloc != nid_here)
return ____cache_alloc_node(cachep, flags, nid_alloc);
return NULL;
@@ -3368,7 +3368,7 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
retry_cpuset:
cpuset_mems_cookie = get_mems_allowed();
- zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+ zonelist = node_zonelist(slab_node(), flags);
retry:
/*
diff --git a/mm/slub.c b/mm/slub.c
index ffe13fd..ef936f3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1614,7 +1614,7 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags,
do {
cpuset_mems_cookie = get_mems_allowed();
- zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+ zonelist = node_zonelist(slab_node(), flags);
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
struct kmem_cache_node *n;
--
1.7.7.6
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index 7c727a9..7106786 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -215,7 +215,7 @@ extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
>  extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
>  extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
>  				const nodemask_t *mask);
> -extern unsigned slab_node(struct mempolicy *policy);
> +extern unsigned slab_node(void);
>
>  extern enum zone_type policy_zone;
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index cfb6c86..e05e007 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1586,9 +1586,11 @@ static unsigned interleave_nodes(struct mempolicy *policy)
>   * task can change it's policy.  The system default policy requires no
>   * such protection.
>   */
> -unsigned slab_node(struct mempolicy *policy)
> +unsigned slab_node(void)
>  {
> -	if (!policy || policy->flags & MPOL_F_LOCAL)
> +	struct mempolicy *policy = current->mempolicy;
> +
> +	if (!in_interrupt() || !policy || policy->flags & MPOL_F_LOCAL)
>  		return numa_node_id();
I think your patch is correct, but I don't like dereferencing the
current task from interrupt context.
It would be nicer if we only looked at current->mempolicy when !in_interrupt().
But this isn't a NAK anyway.
Acked-by: KOSAKI Motohiro <[email protected]>
On Mon, 7 May 2012, Andi Kleen wrote:
> slab_node() could access current->mempolicy from interrupt context.
> However there's a race condition during exit where the mempolicy
> is first freed and then the pointer zeroed.
Acked-by: Christoph Lameter <[email protected]>
On Mon, 7 May 2012, KOSAKI Motohiro wrote:
> > diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> > index 7c727a9..7106786 100644
> > --- a/include/linux/mempolicy.h
> > +++ b/include/linux/mempolicy.h
> > @@ -215,7 +215,7 @@ extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
> >  extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
> >  extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
> >  				const nodemask_t *mask);
> > -extern unsigned slab_node(struct mempolicy *policy);
> > +extern unsigned slab_node(void);
> >
> >  extern enum zone_type policy_zone;
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index cfb6c86..e05e007 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1586,9 +1586,11 @@ static unsigned interleave_nodes(struct mempolicy *policy)
> >   * task can change it's policy.  The system default policy requires no
> >   * such protection.
> >   */
> > -unsigned slab_node(struct mempolicy *policy)
> > +unsigned slab_node(void)
> >  {
> > -	if (!policy || policy->flags & MPOL_F_LOCAL)
> > +	struct mempolicy *policy = current->mempolicy;
> > +
> > +	if (!in_interrupt() || !policy || policy->flags & MPOL_F_LOCAL)
> >  		return numa_node_id();
>
> I think your patch is correct. but I don't like interrupt context
> dereference current task.
> It would be nice if we only see current->mempolicy when !in_interrupt.
>
> But this doesn't mean NAK anyway.
>
> Acked-by: KOSAKI Motohiro <[email protected]>
Sigh, this was acked by Christoph and KOSAKI when the logic is reversed
and does the exact opposite of what's intended?
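To make the inversion concrete, here is a small user-space sketch, not kernel code. The names model_policy, MODEL_F_LOCAL, LOCAL_NODE, slab_node_as_posted() and slab_node_intended() are made up for illustration, the in_irq argument stands in for in_interrupt(), and the policy->mode switch is collapsed to a single preferred node.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions; illustration only. */
#define MODEL_F_LOCAL 0x1u
#define LOCAL_NODE    0u	/* models the value of numa_node_id() */

struct model_policy {
	unsigned flags;
	unsigned preferred_node;	/* stands in for the policy->mode switch */
};

/* Condition as posted: takes the local-node path whenever NOT in interrupt. */
static unsigned slab_node_as_posted(bool in_irq, const struct model_policy *policy)
{
	if (!in_irq || !policy || (policy->flags & MODEL_F_LOCAL))
		return LOCAL_NODE;
	return policy->preferred_node;
}

/* Condition matching the changelog: always local when in interrupt. */
static unsigned slab_node_intended(bool in_irq, const struct model_policy *policy)
{
	if (in_irq || !policy || (policy->flags & MODEL_F_LOCAL))
		return LOCAL_NODE;
	return policy->preferred_node;
}
```

With a non-NULL, non-local policy, the posted condition returns the local node in process context and consults the policy only in interrupt context, which is the opposite of what the changelog describes.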
>> > +	if (!in_interrupt() || !policy || policy->flags & MPOL_F_LOCAL)
>> >  		return numa_node_id();
>>
>> I think your patch is correct. but I don't like interrupt context
>> dereference current task.
>> It would be nice if we only see current->mempolicy when !in_interrupt.
>>
>> But this doesn't mean NAK anyway.
>>
>> Acked-by: KOSAKI Motohiro <[email protected]>
>
> Sigh, this was acked by Christoph and KOSAKI when the logic is reversed
> and does the exact opposite of what's intended?
Ahhh, Good catch!
On Tue, 8 May 2012, David Rientjes wrote:
> > But this doesn't mean NAK anyway.
> >
> > Acked-by: KOSAKI Motohiro <[email protected]>
>
> Sigh, this was acked by Christoph and KOSAKI when the logic is reversed
> and does the exact opposite of what's intended?
Ping? Anyone going to send a fixed up patch?
On Wed, 30 May 2012, Pekka Enberg wrote:
> On Tue, 8 May 2012, David Rientjes wrote:
> > > But this doesn't mean NAK anyway.
> > >
> > > Acked-by: KOSAKI Motohiro <[email protected]>
> >
> > Sigh, this was acked by Christoph and KOSAKI when the logic is reversed
> > and does the exact opposite of what's intended?
>
> Ping? Anyone going to send a fixed up patch?
I thought this was done and fixed in another email thread?
On Wed, May 30, 2012 at 10:40:39AM -0500, Christoph Lameter wrote:
> On Wed, 30 May 2012, Pekka Enberg wrote:
>
> > On Tue, 8 May 2012, David Rientjes wrote:
> > > > But this doesn't mean NAK anyway.
> > > >
> > > > Acked-by: KOSAKI Motohiro <[email protected]>
> > >
> > > Sigh, this was acked by Christoph and KOSAKI when the logic is reversed
> > > and does the exact opposite of what's intended?
> >
> > Ping? Anyone going to send a fixed up patch?
>
> I thought this was done and fixed in another email thread?
Yes the latest patch was final. But I can resend.
-Andi
>
--
[email protected] -- Speaking for myself only
On Wed, 30 May 2012, Andi Kleen wrote:
> > > > Sigh, this was acked by Christoph and KOSAKI when the logic is reversed
> > > > and does the exact opposite of what's intended?
> > >
> > > Ping? Anyone going to send a fixed up patch?
> >
> > I thought this was done and fixed in another email thread?
>
> Yes the latest patch was final. But I can resend.
>
Latest patch where? The last patch posted in this thread had reversed
logic.
Any idea why this isn't in today's linux-next if it was fixed in another
thread?
From: Andi Kleen <[email protected]>
slab_node() could access current->mempolicy from interrupt context.
However there's a race condition during exit where the mempolicy
is first freed and then the pointer zeroed.
Using this from interrupts seems bogus anyways. The interrupt
will interrupt a random process and therefore get a random
mempolicy. Many times, this will be idle's, which no one can change.
Just disable this here and always use local for slab
from interrupts. I also cleaned up the callers of slab_node a bit
which always passed the same argument.
I believe the original mempolicy code did that in fact,
so it's likely a regression.
v2: send version with correct logic
v3: simplify. fix typo.
Reported-by: Arun Sharma <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
[[email protected]: Rework patch logic and avoid dereference of current
task if in interrupt context.]
Signed-off-by: David Mackey <[email protected]>
---
include/linux/mempolicy.h | 2 +-
mm/mempolicy.c | 4 +++-
mm/slab.c | 4 ++--
mm/slub.c | 2 +-
4 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 4aa4273..95b738c 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -215,7 +215,7 @@ extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
const nodemask_t *mask);
-extern unsigned slab_node(struct mempolicy *policy);
+extern unsigned slab_node(void);
extern enum zone_type policy_zone;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f15c1b2..65801a0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1602,8 +1602,10 @@ static unsigned interleave_nodes(struct mempolicy *policy)
* task can change it's policy. The system default policy requires no
* such protection.
*/
-unsigned slab_node(struct mempolicy *policy)
+unsigned slab_node(void)
{
+ struct mempolicy *policy = in_interrupt() ? NULL : current->mempolicy;
+
if (!policy || policy->flags & MPOL_F_LOCAL)
return numa_node_id();
diff --git a/mm/slab.c b/mm/slab.c
index e901a36..af3b405 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3336,7 +3336,7 @@ static void *alternate_node_alloc(struct kmem_cache *cachep, gfp_t flags)
if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
nid_alloc = cpuset_slab_spread_node();
else if (current->mempolicy)
- nid_alloc = slab_node(current->mempolicy);
+ nid_alloc = slab_node();
if (nid_alloc != nid_here)
return ____cache_alloc_node(cachep, flags, nid_alloc);
return NULL;
@@ -3368,7 +3368,7 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
retry_cpuset:
cpuset_mems_cookie = get_mems_allowed();
- zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+ zonelist = node_zonelist(slab_node(), flags);
retry:
/*
diff --git a/mm/slub.c b/mm/slub.c
index 80848cd..b4f23ad 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1614,7 +1614,7 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags,
do {
cpuset_mems_cookie = get_mems_allowed();
- zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+ zonelist = node_zonelist(slab_node(), flags);
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
struct kmem_cache_node *n;
--
1.7.7.6
> [[email protected]: Rework patch logic and avoid dereference of current
> task if in interrupt context.]
Avoiding this reference doesn't make sense; it's totally valid.
This is based on an older version. I sent the fixed one some time ago.
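As a rough user-space illustration (not kernel code), the two ways of handling interrupt context can be compared side by side. model_task, model_policy, MODEL_F_LOCAL and LOCAL_NODE are hypothetical stand-ins, and the policy->mode switch is reduced to one preferred node. The first variant is only an assumed shape for a corrected check. The second mirrors the reworked mm/mempolicy.c hunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions; illustration only. */
#define MODEL_F_LOCAL 0x1u
#define LOCAL_NODE    0u	/* models the value of numa_node_id() */

struct model_policy {
	unsigned flags;
	unsigned preferred_node;	/* stands in for the policy->mode switch */
};

struct model_task {
	const struct model_policy *mempolicy;	/* models current->mempolicy */
};

/*
 * Variant A (assumed shape): always load the pointer, test the context in
 * the condition. Short-circuit evaluation means the policy is never
 * dereferenced when in_irq is true.
 */
static unsigned slab_node_check_in_condition(bool in_irq, const struct model_task *cur)
{
	const struct model_policy *policy = cur->mempolicy;

	if (in_irq || !policy || (policy->flags & MODEL_F_LOCAL))
		return LOCAL_NODE;
	return policy->preferred_node;
}

/* Variant B: do not even load the pointer when in interrupt context. */
static unsigned slab_node_null_in_irq(bool in_irq, const struct model_task *cur)
{
	const struct model_policy *policy = in_irq ? NULL : cur->mempolicy;

	if (!policy || (policy->flags & MODEL_F_LOCAL))
		return LOCAL_NODE;
	return policy->preferred_node;
}
```

In this model both variants return the same node for every combination of context and policy; they differ only in whether the pointer is loaded at all in interrupt context.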
-Andi
--
[email protected] -- Speaking for myself only
On Thu, May 31, 2012 at 12:22 AM, Andi Kleen <[email protected]> wrote:
>> [[email protected]: Rework patch logic and avoid dereference of current
>> task if in interrupt context.]
>
> avoiding this reference doesn't make sense, it's totally valid.
> This is based on a older version. I sent the fixed one some time ago.
Where? I think David's version is the cleanest one.
Acked-by: KOSAKI Motohiro <[email protected]>
On Thu, May 31, 2012 at 12:30 AM, David Rientjes <[email protected]> wrote:
> Any idea why this isn't in today's linux-next if it was fixed in another
> thread?
I haven't seen the fixed one either, which is why I haven't picked it up.
On Thu, May 31, 2012 at 12:22 AM, Andi Kleen <[email protected]> wrote:
>>> [[email protected]: Rework patch logic and avoid dereference of current
>>> task if in interrupt context.]
>>
>> avoiding this reference doesn't make sense, it's totally valid.
>> This is based on a older version. I sent the fixed one some time ago.
On Thu, May 31, 2012 at 7:59 AM, KOSAKI Motohiro
<[email protected]> wrote:
> Where? I think David's version is most cleaner one.
>
> Acked-by: KOSAKI Motohiro <[email protected]>
Monsieur Lameter, Monsieur Rientjes, ACK/NAK?
On Wed, 30 May 2012, David Mackey wrote:
> From: Andi Kleen <[email protected]>
>
> slab_node() could access current->mempolicy from interrupt context.
> However there's a race condition during exit where the mempolicy
> is first freed and then the pointer zeroed.
>
> Using this from interrupts seems bogus anyways. The interrupt
> will interrupt a random process and therefore get a random
> mempolicy. Many times, this will be idle's, which no one can change.
>
> Just disable this here and always use local for slab
> from interrupts. I also cleaned up the callers of slab_node a bit
> which always passed the same argument.
>
> I believe the original mempolicy code did that in fact,
> so it's likely a regression.
>
> v2: send version with correct logic
> v3: simplify. fix typo.
> Reported-by: Arun Sharma <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>
> [[email protected]: Rework patch logic and avoid dereference of current
> task if in interrupt context.]
> Signed-off-by: David Mackey <[email protected]>
Acked-by: David Rientjes <[email protected]>
Thanks for following up on this.