2020-07-28 09:49:29

by Zhang, Qiang

Subject: [PATCH] mm/slab.c: add node spinlock protect in __cache_free_alien

From: Zhang Qiang <[email protected]>

Take the node spinlock to protect "n->alien", which cpuup_canceled()
may set to NULL concurrently; otherwise __cache_free_alien() can hit
an invalid address access.

Fixes: 18bf854117c6 ("slab: use get_node() and kmem_cache_node() functions")
Signed-off-by: Zhang Qiang <[email protected]>
---
mm/slab.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index a89633603b2d..290523c90b4e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -759,8 +759,10 @@ static int __cache_free_alien(struct kmem_cache *cachep, void *objp,
 
 	n = get_node(cachep, node);
 	STATS_INC_NODEFREES(cachep);
+	spin_lock(&n->list_lock);
 	if (n->alien && n->alien[page_node]) {
 		alien = n->alien[page_node];
+		spin_unlock(&n->list_lock);
 		ac = &alien->ac;
 		spin_lock(&alien->lock);
 		if (unlikely(ac->avail == ac->limit)) {
@@ -769,14 +771,15 @@ static int __cache_free_alien(struct kmem_cache *cachep, void *objp,
 		}
 		ac->entry[ac->avail++] = objp;
 		spin_unlock(&alien->lock);
-		slabs_destroy(cachep, &list);
 	} else {
+		spin_unlock(&n->list_lock);
 		n = get_node(cachep, page_node);
 		spin_lock(&n->list_lock);
 		free_block(cachep, &objp, 1, page_node, &list);
 		spin_unlock(&n->list_lock);
-		slabs_destroy(cachep, &list);
 	}
+
+	slabs_destroy(cachep, &list);
 	return 1;
 }

--
2.26.2


2020-07-28 20:04:56

by David Rientjes

Subject: Re: [PATCH] mm/slab.c: add node spinlock protect in __cache_free_alien

On Tue, 28 Jul 2020, [email protected] wrote:

> From: Zhang Qiang <[email protected]>
>
> Take the node spinlock to protect "n->alien", which cpuup_canceled()
> may set to NULL concurrently; otherwise __cache_free_alien() can hit
> an invalid address access.
>

Hi, do you have an example NULL pointer dereference where you have hit
this?

This rather looks like something to fix up in cpuup_canceled() since it's
currently manipulating the alien cache for the canceled cpu's node.


2020-07-29 01:28:28

by Zhang, Qiang

Subject: Re: [PATCH] mm/slab.c: add node spinlock protect in __cache_free_alien



________________________________________
From: David Rientjes <[email protected]>
Sent: 29 July 2020 3:46
To: Zhang, Qiang
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH] mm/slab.c: add node spinlock protect in __cache_free_alien

On Tue, 28 Jul 2020, [email protected] wrote:

> From: Zhang Qiang <[email protected]>
>
> Take the node spinlock to protect "n->alien", which cpuup_canceled()
> may set to NULL concurrently; otherwise __cache_free_alien() can hit
> an invalid address access.
>

>Hi, do you have an example NULL pointer dereference where you have hit
>this?

>This rather looks like something to fix up in cpuup_canceled() since it's
>currently manipulating the alien cache for the canceled cpu's node.

Yes, it can be fixed up in cpuup_canceled(). That function currently
manipulates the alien cache for the canceled CPU's node, which may be the
same node that __cache_free_alien() is operating on.

static void cpuup_canceled(long cpu)
{
	...
	n = get_node(cachep, node);
	spin_lock_irq(&n->list_lock);
	...
	n->alien = NULL;	/* cleared while holding n->list_lock */
	spin_unlock_irq(&n->list_lock);
	...
}
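
To make the race above concrete, here is a minimal userspace sketch in C
that uses pthreads rather than kernel primitives. All names in it
(node_sketch, alien_cache_sketch, alien_lookup_locked, alien_detach) are
hypothetical and are not kernel APIs; the mutex merely stands in for
n->list_lock. It only shows the check-and-read of the alien pointer being
serialized against the teardown path. It does not address how long the
returned object stays valid once the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-node alien cache entry. */
struct alien_cache_sketch {
	int avail;
	int limit;
};

/* Hypothetical stand-in for struct kmem_cache_node. */
struct node_sketch {
	pthread_mutex_t list_lock;		/* plays the role of n->list_lock */
	struct alien_cache_sketch **alien;	/* teardown may set this to NULL */
};

/*
 * Reader side: test n->alien and index it inside one critical section,
 * so the array pointer cannot become NULL between the check and the use.
 */
static struct alien_cache_sketch *
alien_lookup_locked(struct node_sketch *n, int page_node)
{
	struct alien_cache_sketch *ac = NULL;

	assert(page_node >= 0);

	pthread_mutex_lock(&n->list_lock);
	if (n->alien)
		ac = n->alien[page_node];
	pthread_mutex_unlock(&n->list_lock);

	return ac;
}

/*
 * Teardown side: detach the array under the same lock and hand it back
 * to the caller, mirroring the "n->alien = NULL" step quoted above.
 */
static struct alien_cache_sketch **
alien_detach(struct node_sketch *n)
{
	struct alien_cache_sketch **old;

	pthread_mutex_lock(&n->list_lock);
	old = n->alien;
	n->alien = NULL;
	pthread_mutex_unlock(&n->list_lock);

	return old;
}
```

With this shape, a lookup that runs after alien_detach() simply returns
NULL and the caller falls back to the non-alien path, instead of
dereferencing an array pointer that was cleared in between.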


2020-07-29 23:36:03

by David Rientjes

Subject: Re: Re: [PATCH] mm/slab.c: add node spinlock protect in __cache_free_alien

On Wed, 29 Jul 2020, Zhang, Qiang wrote:

> > From: Zhang Qiang <[email protected]>
> >
> > Take the node spinlock to protect "n->alien", which cpuup_canceled()
> > may set to NULL concurrently; otherwise __cache_free_alien() can hit
> > an invalid address access.
> >
>
> >Hi, do you have an example NULL pointer dereference where you have hit
> >this?
>

If you have a NULL pointer dereference or a GPF that occurred because of
this, it would be helpful to provide it as rationale.

> >This rather looks like something to fix up in cpuup_canceled() since it's
> >currently manipulating the alien cache for the canceled cpu's node.
>
> Yes, it can be fixed up in cpuup_canceled(). That function currently
> manipulates the alien cache for the canceled CPU's node, which may be the
> same node that __cache_free_alien() is operating on.
>
> static void cpuup_canceled(long cpu)
> {
> 	...
> 	n = get_node(cachep, node);
> 	spin_lock_irq(&n->list_lock);
> 	...
> 	n->alien = NULL;	/* cleared while holding n->list_lock */
> 	spin_unlock_irq(&n->list_lock);
> 	...
> }
>

Right, so the idea is that this should be fixed in cpuup_canceled()
instead -- why would we invalidate the entire node's alien cache because a
single cpu failed to come online?