From: Andi Kleen <[email protected]>
slab_node() could access current->mempolicy from interrupt context.
However there's a race condition during exit where the mempolicy
is first freed and then the pointer zeroed.
Using this from interrupts seems bogus anyways. The interrupt
will interrupt a random process and therefore get a random
mempolicy. Many times, this will be idle's, which no one can change.
Just disable this here and always use local for slab
from interrupts. I also cleaned up the callers of slab_node a bit
which always passed the same argument.
I believe the original mempolicy code did that in fact,
so it's likely a regression.
v2: send version with correct logic
v3: simplify. fix typo.
Reported-by: Arun Sharma <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/mempolicy.h | 2 +-
mm/mempolicy.c | 6 ++++--
mm/slab.c | 4 ++--
mm/slub.c | 2 +-
4 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 7c727a9..7106786 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -215,7 +215,7 @@ extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
const nodemask_t *mask);
-extern unsigned slab_node(struct mempolicy *policy);
+extern unsigned slab_node(void);
extern enum zone_type policy_zone;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index cfb6c86..b65eb06 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1586,9 +1586,11 @@ static unsigned interleave_nodes(struct mempolicy *policy)
* task can change it's policy. The system default policy requires no
* such protection.
*/
-unsigned slab_node(struct mempolicy *policy)
+unsigned slab_node(void)
{
- if (!policy || policy->flags & MPOL_F_LOCAL)
+ struct mempolicy *policy = current->mempolicy;
+
+ if (in_interrupt() || !policy || policy->flags & MPOL_F_LOCAL)
return numa_node_id();
switch (policy->mode) {
diff --git a/mm/slab.c b/mm/slab.c
index e901a36..af3b405 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3336,7 +3336,7 @@ static void *alternate_node_alloc(struct kmem_cache *cachep, gfp_t flags)
if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
nid_alloc = cpuset_slab_spread_node();
else if (current->mempolicy)
- nid_alloc = slab_node(current->mempolicy);
+ nid_alloc = slab_node();
if (nid_alloc != nid_here)
return ____cache_alloc_node(cachep, flags, nid_alloc);
return NULL;
@@ -3368,7 +3368,7 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
retry_cpuset:
cpuset_mems_cookie = get_mems_allowed();
- zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+ zonelist = node_zonelist(slab_node(), flags);
retry:
/*
diff --git a/mm/slub.c b/mm/slub.c
index ffe13fd..ef936f3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1614,7 +1614,7 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags,
do {
cpuset_mems_cookie = get_mems_allowed();
- zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+ zonelist = node_zonelist(slab_node(), flags);
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
struct kmem_cache_node *n;
--
1.7.7.6
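For readers following the thread, here is a minimal userspace sketch of the
ordering problem the changelog describes: the policy object is released
first and the task's pointer is cleared second, so anything that looks at
the pointer in between sees a non-NULL pointer to a dead object. The names
policy_model, task_model and the helper functions are hypothetical
stand-ins, not kernel code, and a "freed" flag replaces a real free() so
the sketch itself has no use-after-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct policy_model {
	bool freed;		/* set instead of calling free() */
	int preferred_node;
};

struct task_model {
	struct policy_model *mempolicy;
};

/* Exit step 1: the policy is released while the task still points at it. */
static void exit_free_policy(struct task_model *t)
{
	assert(t->mempolicy != NULL);
	t->mempolicy->freed = true;
}

/* Exit step 2: only now is the task's pointer cleared. */
static void exit_clear_pointer(struct task_model *t)
{
	t->mempolicy = NULL;
}

/*
 * What a caller running between the two steps would observe:
 * true when the pointer still looks valid but the object is gone.
 */
static bool sees_stale_policy(const struct task_model *t)
{
	return t->mempolicy != NULL && t->mempolicy->freed;
}
```

Calling sees_stale_policy() between exit_free_policy() and
exit_clear_pointer() models an interrupt landing inside that window. The
patch does not reorder the two steps; it avoids consulting the pointer from
interrupt context at all.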
>
> slab_node() could access current->mempolicy from interrupt context.
> However there's a race condition during exit where the mempolicy
> is first freed and then the pointer zeroed.
>
> Using this from interrupts seems bogus anyways. The interrupt
> will interrupt a random process and therefore get a random
> mempolicy. Many times, this will be idle's, which no one can change.
>
> Just disable this here and always use local for slab
> from interrupts. I also cleaned up the callers of slab_node a bit
> which always passed the same argument.
>
> I believe the original mempolicy code did that in fact,
> so it's likely a regression.
Reviewed-by: Christoph Lameter <[email protected]>
On Wed, 30 May 2012, Andi Kleen wrote:
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index cfb6c86..b65eb06 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1586,9 +1586,11 @@ static unsigned interleave_nodes(struct mempolicy *policy)
> * task can change it's policy. The system default policy requires no
> * such protection.
> */
> -unsigned slab_node(struct mempolicy *policy)
> +unsigned slab_node(void)
> {
> - if (!policy || policy->flags & MPOL_F_LOCAL)
> + struct mempolicy *policy = current->mempolicy;
> +
> + if (in_interrupt() || !policy || policy->flags & MPOL_F_LOCAL)
> return numa_node_id();
>
> switch (policy->mode) {
I think the version proposed by David Mackey is clearer: it makes it
obvious that we don't want to dereference current in interrupt context,
whereas your approach relies on the conditional short-circuiting.
On Thu, May 31, 2012 at 11:45 PM, David Rientjes <[email protected]> wrote:
> On Wed, 30 May 2012, Andi Kleen wrote:
>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index cfb6c86..b65eb06 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -1586,9 +1586,11 @@ static unsigned interleave_nodes(struct mempolicy *policy)
>> * task can change it's policy. The system default policy requires no
>> * such protection.
>> */
>> -unsigned slab_node(struct mempolicy *policy)
>> +unsigned slab_node(void)
>> {
>> - if (!policy || policy->flags & MPOL_F_LOCAL)
>> + struct mempolicy *policy = current->mempolicy;
>> +
>> + if (in_interrupt() || !policy || policy->flags & MPOL_F_LOCAL)
>> return numa_node_id();
>>
>> switch (policy->mode) {
>
> I think the version proposed by David Mackey is clearer: it makes it
> obvious that we don't want to dereference current in interrupt context,
> whereas your approach relies on the conditional short-circuiting.
I like it better also. Christoph, Andi?
On Fri, 1 Jun 2012, Pekka Enberg wrote:
> > I think the version proposed by David Mackey is clearer: it makes it
> > obvious that we don't want to dereference current in interrupt context,
> > whereas your approach relies on the conditional short-circuiting.
>
> I like it better also. Christoph, Andi?
I don't like the conditional in the assignment followed by a test for NULL,
which is also not easy to read.
Would prefer that the control flow be redesigned.
Something like
	if (in_interrupt())
		return numa_node_id();
	policy = current->mempolicy;
	if (!policy || ...)
		return numa_node_id();
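The outline above can be compared with the v3 shape in a small userspace
sketch. Everything here is a hypothetical model (env_model,
read_task_policy(), the *_model functions and the flag value are not kernel
names): the model counts how often the task's policy pointer is loaded so
the difference between the two control flows is observable. In both shapes
the policy is never dereferenced in interrupt context; the early return
simply makes that explicit and skips the pointer load as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MPOL_F_LOCAL_MODEL 0x1	/* hypothetical flag value for the model */

struct policy_model {
	unsigned flags;
	unsigned node;		/* node the policy would pick */
};

/* Hypothetical per-call environment standing in for kernel state. */
struct env_model {
	bool in_interrupt;
	unsigned local_node;
	struct policy_model *task_policy;
	unsigned policy_reads;	/* counts loads of the task policy pointer */
};

static struct policy_model *read_task_policy(struct env_model *e)
{
	assert(e != NULL);
	e->policy_reads++;
	return e->task_policy;
}

/* v3 shape: the initializer loads the pointer before the interrupt test. */
static unsigned slab_node_v3_model(struct env_model *e)
{
	struct policy_model *policy = read_task_policy(e);

	if (e->in_interrupt || !policy || (policy->flags & MPOL_F_LOCAL_MODEL))
		return e->local_node;
	return policy->node;
}

/* Early-return shape: interrupt context never touches the task policy. */
static unsigned slab_node_early_return_model(struct env_model *e)
{
	struct policy_model *policy;

	if (e->in_interrupt)
		return e->local_node;

	policy = read_task_policy(e);
	if (!policy || (policy->flags & MPOL_F_LOCAL_MODEL))
		return e->local_node;
	return policy->node;
}
```

With in_interrupt set, both functions return the local node, but only the
v3 shape increments policy_reads.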
From: Andi Kleen <[email protected]>
slab_node() could access current->mempolicy from interrupt context.
However there's a race condition during exit where the mempolicy
is first freed and then the pointer zeroed.
Using this from interrupts seems bogus anyways. The interrupt
will interrupt a random process and therefore get a random
mempolicy. Many times, this will be idle's, which no one can change.
Just disable this here and always use local for slab
from interrupts. I also cleaned up the callers of slab_node a bit
which always passed the same argument.
I believe the original mempolicy code did that in fact,
so it's likely a regression.
v2: send version with correct logic
v3: simplify. fix typo.
Reported-by: Arun Sharma <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
[[email protected]: Rework control flow based on feedback from
[email protected], fix logic, and cleanup current task_struct reference]
Signed-off-by: David Mackey <[email protected]>
---
include/linux/mempolicy.h | 2 +-
mm/mempolicy.c | 8 +++++++-
mm/slab.c | 4 ++--
mm/slub.c | 2 +-
4 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 4aa4273..95b738c 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -215,7 +215,7 @@ extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
const nodemask_t *mask);
-extern unsigned slab_node(struct mempolicy *policy);
+extern unsigned slab_node(void);
extern enum zone_type policy_zone;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f15c1b2..cb0b230 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1602,8 +1602,14 @@ static unsigned interleave_nodes(struct mempolicy *policy)
* task can change it's policy. The system default policy requires no
* such protection.
*/
-unsigned slab_node(struct mempolicy *policy)
+unsigned slab_node(void)
{
+ struct mempolicy *policy;
+
+ if (in_interrupt())
+ return numa_node_id();
+
+ policy = current->mempolicy;
if (!policy || policy->flags & MPOL_F_LOCAL)
return numa_node_id();
diff --git a/mm/slab.c b/mm/slab.c
index e901a36..af3b405 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3336,7 +3336,7 @@ static void *alternate_node_alloc(struct kmem_cache *cachep, gfp_t flags)
if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
nid_alloc = cpuset_slab_spread_node();
else if (current->mempolicy)
- nid_alloc = slab_node(current->mempolicy);
+ nid_alloc = slab_node();
if (nid_alloc != nid_here)
return ____cache_alloc_node(cachep, flags, nid_alloc);
return NULL;
@@ -3368,7 +3368,7 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
retry_cpuset:
cpuset_mems_cookie = get_mems_allowed();
- zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+ zonelist = node_zonelist(slab_node(), flags);
retry:
/*
diff --git a/mm/slub.c b/mm/slub.c
index 8c691fa..0d9241a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1617,7 +1617,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
do {
cpuset_mems_cookie = get_mems_allowed();
- zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+ zonelist = node_zonelist(slab_node(), flags);
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
struct kmem_cache_node *n;
--
1.7.4.1
On Sat, 9 Jun 2012, David Mackey wrote:
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f15c1b2..cb0b230 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1602,8 +1602,14 @@ static unsigned interleave_nodes(struct mempolicy *policy)
> * task can change it's policy. The system default policy requires no
> * such protection.
> */
> -unsigned slab_node(struct mempolicy *policy)
> +unsigned slab_node(void)
> {
> + struct mempolicy *policy;
> +
> + if (in_interrupt())
> + return numa_node_id();
> +
> + policy = current->mempolicy;
> if (!policy || policy->flags & MPOL_F_LOCAL)
> return numa_node_id();
>
Should probably be numa_mem_id() in both these cases for
CONFIG_HAVE_MEMORYLESS_NODES, but it won't cause a problem in this form
either.
Acked-by: David Rientjes <[email protected]>
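As background for the numa_mem_id() remark: with
CONFIG_HAVE_MEMORYLESS_NODES, numa_node_id() reports the node the running
CPU belongs to, while numa_mem_id() reports the nearest node that actually
has memory. The sketch below is a hedged userspace model of that
distinction only. The four-node topology, the lookup table and the *_model
names are made up for illustration and do not reflect how the kernel stores
this information.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES_MODEL 4

/* Hypothetical topology: node 1 has CPUs but no memory of its own. */
static const bool node_has_memory[NR_NODES_MODEL] = {
	true, false, true, true
};

/* Hypothetical precomputed "nearest node with memory" for each node. */
static const int nearest_mem_node[NR_NODES_MODEL] = { 0, 0, 2, 3 };

/* Models numa_node_id(): the node the running CPU sits on. */
static int node_id_model(int cpu_node)
{
	assert(cpu_node >= 0 && cpu_node < NR_NODES_MODEL);
	return cpu_node;
}

/* Models numa_mem_id(): always a node that can satisfy an allocation. */
static int mem_id_model(int cpu_node)
{
	int node = nearest_mem_node[node_id_model(cpu_node)];

	assert(node_has_memory[node]);
	return node;
}
```

For a CPU on the memoryless node 1, node_id_model() returns 1 while
mem_id_model() returns 0; on nodes that have memory the two agree, which
matches the observation that the patch "won't cause a problem in this
form" on such systems.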
On Sat, 9 Jun 2012, David Mackey wrote:
> I believe the original mempolicy code did that in fact,
> so it's likely a regression.
Acked-by: Christoph Lameter <[email protected]>
On Sat, 9 Jun 2012, David Rientjes wrote:
> On Sat, 9 Jun 2012, David Mackey wrote:
>
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index f15c1b2..cb0b230 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1602,8 +1602,14 @@ static unsigned interleave_nodes(struct mempolicy *policy)
> > * task can change it's policy. The system default policy requires no
> > * such protection.
> > */
> > -unsigned slab_node(struct mempolicy *policy)
> > +unsigned slab_node(void)
> > {
> > + struct mempolicy *policy;
> > +
> > + if (in_interrupt())
> > + return numa_node_id();
> > +
> > + policy = current->mempolicy;
> > if (!policy || policy->flags & MPOL_F_LOCAL)
> > return numa_node_id();
> >
>
> Should probably be numa_mem_id() in both these cases for
> CONFIG_HAVE_MEMORYLESS_NODES, but it won't cause a problem in this form
> either.
>
> Acked-by: David Rientjes <[email protected]>
>
Still missing from linux-next, who's going to pick this up?
On Sun, Jun 17, 2012 at 4:11 AM, David Rientjes <[email protected]> wrote:
> On Sat, 9 Jun 2012, David Rientjes wrote:
>
>> On Sat, 9 Jun 2012, David Mackey wrote:
>>
>> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> > index f15c1b2..cb0b230 100644
>> > --- a/mm/mempolicy.c
>> > +++ b/mm/mempolicy.c
>> > @@ -1602,8 +1602,14 @@ static unsigned interleave_nodes(struct mempolicy *policy)
>> > * task can change it's policy. The system default policy requires no
>> > * such protection.
>> > */
>> > -unsigned slab_node(struct mempolicy *policy)
>> > +unsigned slab_node(void)
>> > {
>> > + struct mempolicy *policy;
>> > +
>> > + if (in_interrupt())
>> > + return numa_node_id();
>> > +
>> > + policy = current->mempolicy;
>> > if (!policy || policy->flags & MPOL_F_LOCAL)
>> > return numa_node_id();
>> >
>>
>> Should probably be numa_mem_id() in both these cases for
>> CONFIG_HAVE_MEMORYLESS_NODES, but it won't cause a problem in this form
>> either.
>>
>> Acked-by: David Rientjes <[email protected]>
>>
>
> Still missing from linux-next, who's going to pick this up?
I'm going to pick it up. I've been postponing merging it until the dust
has settled from Christoph's "common slab" patch series.
(6/9/12 5:40 AM), David Mackey wrote:
> From: Andi Kleen<[email protected]>
>
> From: Andi Kleen<[email protected]>
>
> slab_node() could access current->mempolicy from interrupt context.
> However there's a race condition during exit where the mempolicy
> is first freed and then the pointer zeroed.
>
> Using this from interrupts seems bogus anyways. The interrupt
> will interrupt a random process and therefore get a random
> mempolicy. Many times, this will be idle's, which no one can change.
>
> Just disable this here and always use local for slab
> from interrupts. I also cleaned up the callers of slab_node a bit
> which always passed the same argument.
>
> I believe the original mempolicy code did that in fact,
> so it's likely a regression.
>
> v2: send version with correct logic
> v3: simplify. fix typo.
> Reported-by: Arun Sharma<[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Andi Kleen<[email protected]>
> [[email protected]: Rework control flow based on feedback from
> [email protected], fix logic, and cleanup current task_struct reference]
> Signed-off-by: David Mackey<[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
On Mon, Jun 18, 2012 at 11:20 AM, KOSAKI Motohiro
<[email protected]> wrote:
> (6/9/12 5:40 AM), David Mackey wrote:
>> From: Andi Kleen<[email protected]>
>>
>> From: Andi Kleen<[email protected]>
>>
>> slab_node() could access current->mempolicy from interrupt context.
>> However there's a race condition during exit where the mempolicy
>> is first freed and then the pointer zeroed.
>>
>> Using this from interrupts seems bogus anyways. The interrupt
>> will interrupt a random process and therefore get a random
>> mempolicy. Many times, this will be idle's, which no one can change.
>>
>> Just disable this here and always use local for slab
>> from interrupts. I also cleaned up the callers of slab_node a bit
>> which always passed the same argument.
>>
>> I believe the original mempolicy code did that in fact,
>> so it's likely a regression.
>>
>> v2: send version with correct logic
>> v3: simplify. fix typo.
>> Reported-by: Arun Sharma<[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Andi Kleen<[email protected]>
>> [[email protected]: Rework control flow based on feedback from
>> [email protected], fix logic, and cleanup current task_struct reference]
>> Signed-off-by: David Mackey<[email protected]>
>
> Acked-by: KOSAKI Motohiro <[email protected]>
Applied, thanks!