2007-10-18 03:26:36

by Yasunori Goto

Subject: [Patch](memory hotplug) Make kmem_cache_node for SLUB on memory online to avoid panic(take 3)


This patch fixes a panic caused by a NULL pointer dereference of
kmem_cache_node in discard_slab() after memory online.

When memory online is called, kmem_cache_node structures are created
for all SLUB caches on the new node whose memory has become available.

slab_mem_going_online_callback() is called from the memory online
event notifier to create the kmem_cache_node structures. If it (or
another callback) fails, slab_mem_offline_callback() is called for
rollback.

On memory offline, slab_mem_going_offline_callback() is called to
shrink all SLUB caches, then slab_mem_offline_callback() is called later.

This patch is tested on my ia64 box.

Please apply.

Signed-off-by: Yasunori Goto <[email protected]>


---
mm/slub.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 115 insertions(+)

Index: current/mm/slub.c
===================================================================
--- current.orig/mm/slub.c 2007-10-17 21:17:53.000000000 +0900
+++ current/mm/slub.c 2007-10-17 22:23:08.000000000 +0900
@@ -20,6 +20,7 @@
#include <linux/mempolicy.h>
#include <linux/ctype.h>
#include <linux/kallsyms.h>
+#include <linux/memory.h>

/*
* Lock order:
@@ -2688,6 +2689,118 @@ int kmem_cache_shrink(struct kmem_cache
}
EXPORT_SYMBOL(kmem_cache_shrink);

+#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
+static int slab_mem_going_offline_callback(void *arg)
+{
+ struct kmem_cache *s;
+
+ down_read(&slub_lock);
+ list_for_each_entry(s, &slab_caches, list)
+ kmem_cache_shrink(s);
+ up_read(&slub_lock);
+
+ return 0;
+}
+
+static void slab_mem_offline_callback(void *arg)
+{
+ struct kmem_cache_node *n;
+ struct kmem_cache *s;
+ struct memory_notify *marg = arg;
+ int offline_node;
+
+ offline_node = marg->status_change_nid;
+
+ /*
+ * If the node still has available memory, we still need its
+ * kmem_cache_node, so keep it.
+ */
+ if (offline_node < 0)
+ return;
+
+ down_read(&slub_lock);
+ list_for_each_entry(s, &slab_caches, list) {
+ n = get_node(s, offline_node);
+ if (n) {
+ /*
+ * if n->nr_slabs > 0, slabs still exist on the node
+ * that is going down. We were unable to free them,
+ * and the offline_pages() function shouldn't call this
+ * callback. So, we must fail.
+ */
+ BUG_ON(atomic_read(&n->nr_slabs));
+
+ s->node[offline_node] = NULL;
+ kmem_cache_free(kmalloc_caches, n);
+ }
+ }
+ up_read(&slub_lock);
+}
+
+static int slab_mem_going_online_callback(void *arg)
+{
+ struct kmem_cache_node *n;
+ struct kmem_cache *s;
+ struct memory_notify *marg = arg;
+ int nid = marg->status_change_nid;
+
+ /*
+ * If the node's memory is already available, then kmem_cache_node is
+ * already created. Nothing to do.
+ */
+ if (nid < 0)
+ return 0;
+
+ /*
+ * We are bringing a node online. No memory is available yet. We must
+ * allocate a kmem_cache_node structure in order to bring the node
+ * online.
+ */
+ down_read(&slub_lock);
+ list_for_each_entry(s, &slab_caches, list) {
+ /*
+ * XXX: kmem_cache_alloc_node will fallback to other nodes
+ * since memory is not yet available from the node that
+ * is brought up.
+ */
+ n = kmem_cache_alloc(kmalloc_caches, GFP_KERNEL);
+ if (!n)
+ return -ENOMEM;
+ init_kmem_cache_node(n);
+ s->node[nid] = n;
+ }
+ up_read(&slub_lock);
+
+ return 0;
+}
+
+static int slab_memory_callback(struct notifier_block *self,
+ unsigned long action, void *arg)
+{
+ int ret = 0;
+
+ switch (action) {
+ case MEM_GOING_ONLINE:
+ ret = slab_mem_going_online_callback(arg);
+ break;
+ case MEM_GOING_OFFLINE:
+ ret = slab_mem_going_offline_callback(arg);
+ break;
+ case MEM_OFFLINE:
+ case MEM_CANCEL_ONLINE:
+ slab_mem_offline_callback(arg);
+ break;
+ case MEM_ONLINE:
+ case MEM_CANCEL_OFFLINE:
+ break;
+ }
+
+ ret = notifier_from_errno(ret);
+ return ret;
+}
+
+#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
+
/********************************************************************
* Basic setup of slabs
*******************************************************************/
@@ -2709,6 +2822,8 @@ void __init kmem_cache_init(void)
sizeof(struct kmem_cache_node), GFP_KERNEL);
kmalloc_caches[0].refcount = -1;
caches++;
+
+ hotplug_memory_notifier(slab_memory_callback, 1);
#endif

/* Able to allocate the per node structures */

--
Yasunori Goto



2007-10-18 03:47:19

by Andrew Morton

Subject: Re: [Patch](memory hotplug) Make kmem_cache_node for SLUB on memory online to avoid panic(take 3)

On Thu, 18 Oct 2007 12:25:37 +0900 Yasunori Goto <[email protected]> wrote:

>
> This patch fixes a panic caused by a NULL pointer dereference of
> kmem_cache_node in discard_slab() after memory online.
>
> When memory online is called, kmem_cache_node structures are created
> for all SLUB caches on the new node whose memory has become available.
>
> slab_mem_going_online_callback() is called from the memory online
> event notifier to create the kmem_cache_node structures. If it (or
> another callback) fails, slab_mem_offline_callback() is called for
> rollback.
>
> In memory offline, slab_mem_going_offline_callback() is called to
> shrink all SLUB caches, then slab_mem_offline_callback() is called later.
>
> This patch is tested on my ia64 box.
>
> ...
>
> +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)

hm. There should be no linkage between memory hotpluggability and
NUMA, surely?

> +static int slab_mem_going_offline_callback(void *arg)
> +{
> + struct kmem_cache *s;
> +
> + down_read(&slub_lock);
> + list_for_each_entry(s, &slab_caches, list)
> + kmem_cache_shrink(s);
> + up_read(&slub_lock);
> +
> + return 0;
> +}
> +
> +static void slab_mem_offline_callback(void *arg)
> +{
> + struct kmem_cache_node *n;
> + struct kmem_cache *s;
> + struct memory_notify *marg = arg;
> + int offline_node;
> +
> + offline_node = marg->status_change_nid;
> +
> + /*
> + * If the node still has available memory, we still need its
> + * kmem_cache_node, so keep it.
> + */
> + if (offline_node < 0)
> + return;
> +
> + down_read(&slub_lock);
> + list_for_each_entry(s, &slab_caches, list) {
> + n = get_node(s, offline_node);
> + if (n) {
> + /*
> + * if n->nr_slabs > 0, slabs still exist on the node
> + * that is going down. We were unable to free them,
> + * and the offline_pages() function shouldn't call this
> + * callback. So, we must fail.
> + */
> + BUG_ON(atomic_read(&n->nr_slabs));

Experience tells us that WARN_ON is preferred for newly added code ;)

> + s->node[offline_node] = NULL;
> + kmem_cache_free(kmalloc_caches, n);
> + }
> + }
> + up_read(&slub_lock);
> +}
> +
> +static int slab_mem_going_online_callback(void *arg)
> +{
> + struct kmem_cache_node *n;
> + struct kmem_cache *s;
> + struct memory_notify *marg = arg;
> + int nid = marg->status_change_nid;
> +
> + /*
> + * If the node's memory is already available, then kmem_cache_node is
> + * already created. Nothing to do.
> + */
> + if (nid < 0)
> + return 0;
> +
> + /*
> + * We are bringing a node online. No memory is available yet. We must
> + * allocate a kmem_cache_node structure in order to bring the node
> + * online.
> + */
> + down_read(&slub_lock);
> + list_for_each_entry(s, &slab_caches, list) {
> + /*
> + * XXX: kmem_cache_alloc_node will fallback to other nodes
> + * since memory is not yet available from the node that
> + * is brought up.
> + */
> + n = kmem_cache_alloc(kmalloc_caches, GFP_KERNEL);
> + if (!n)
> + return -ENOMEM;

err, we forgot to release slub_lock on that error path. I'll fix that.

> + init_kmem_cache_node(n);
> + s->node[nid] = n;
> + }
> + up_read(&slub_lock);
> +
> + return 0;
> +}

So that's slub. Does slab already have this functionality or are you
not bothering to maintain slab in this area?

2007-10-18 06:26:25

by Christoph Lameter

Subject: Re: [Patch](memory hotplug) Make kmem_cache_node for SLUB on memory online to avoid panic(take 3)

On Wed, 17 Oct 2007, Andrew Morton wrote:

> > +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
>
> hm. There should be no linkage between memory hotpluggability and
> NUMA, surely?

NUMA support in the slab allocators requires allocation of per-node
structures. The per-node structures are folded into the global structure
in the non-NUMA case.

> > + /*
> > + * if n->nr_slabs > 0, slabs still exist on the node
> > + * that is going down. We were unable to free them,
> > + * and the offline_pages() function shouldn't call this
> > + * callback. So, we must fail.
> > + */
> > + BUG_ON(atomic_read(&n->nr_slabs));
>
> Experience tells us that WARN_ON is preferred for newly added code ;)

It would be bad to just zap a per-node array while there is still data in
there. That would cause later failures when an attempt is made to free
objects that no longer have a per-node structure.

> > + /*
> > + * XXX: kmem_cache_alloc_node will fallback to other nodes
> > + * since memory is not yet available from the node that
> > + * is brought up.
> > + */
> > + n = kmem_cache_alloc(kmalloc_caches, GFP_KERNEL);
> > + if (!n)
> > + return -ENOMEM;
>
> err, we forgot to release slub_lock on that error path. I'll fix that.

Right.

> So that's slub. Does slab already have this functionality or are you
> not bothering to maintain slab in this area?

Slab brings up a per node structure when the corresponding cpu is brought
up. That was sufficient as long as we did not have any memoryless nodes.
Now we may have to fix some things over there as well.

2007-10-18 07:27:08

by Andrew Morton

Subject: Re: [Patch](memory hotplug) Make kmem_cache_node for SLUB on memory online to avoid panic(take 3)

On Wed, 17 Oct 2007 23:25:58 -0700 (PDT) Christoph Lameter <[email protected]> wrote:

> > So that's slub. Does slab already have this functionality or are you
> > not bothering to maintain slab in this area?
>
> Slab brings up a per node structure when the corresponding cpu is brought
> up. That was sufficient as long as we did not have any memoryless nodes.
> Now we may have to fix some things over there as well.

Is there any point? Our time would be better spent in making
slab.c go away. How close are we to being able to do that anyway?

2007-10-18 08:34:17

by Yasunori Goto

Subject: Re: [Patch](memory hotplug) Make kmem_cache_node for SLUB on memory online to avoid panic(take 3)

> On Wed, 17 Oct 2007 23:25:58 -0700 (PDT) Christoph Lameter <[email protected]> wrote:
>
> > > So that's slub. Does slab already have this functionality or are you
> > > not bothering to maintain slab in this area?
> >
> > Slab brings up a per node structure when the corresponding cpu is brought
> > up. That was sufficient as long as we did not have any memoryless nodes.

Right. At least, I have not seen any panic with SLAB so far.
(If a panic had occurred, I would already have made a patch.)

> > Now we may have to fix some things over there as well.

Though fixing it may be worthwhile, its priority is very low for me
now.



--
Yasunori Goto


2007-10-18 09:13:45

by Christoph Lameter

Subject: Re: [Patch](memory hotplug) Make kmem_cache_node for SLUB on memory online to avoid panic(take 3)

On Thu, 18 Oct 2007, Andrew Morton wrote:

> > Slab brings up a per node structure when the corresponding cpu is brought
> > up. That was sufficient as long as we did not have any memoryless nodes.
> > Now we may have to fix some things over there as well.
>
> Is there any point? Our time would be better spent in making
> slab.c go away. How close are we to being able to do that anyway?

Well the problem right now is the regression in slab_free() on SMP.
AFAICT UP and NUMA are fine, and so are most loads under SMP. Concurrent
allocation / frees on multiple processors are several times faster (I see
up to 10 fold improvements on an 8p).

However, long sequences of free operations from a single processor under
SMP require too many atomic operations compared with SLAB. If I only do
frees on a single processor on SMP then I can produce a 30% regression for
slabs between 128 and 1024 bytes in size. I have a patchset in the works
that reduces the atomic operations for those.

SLAB currently has an advantage since it uses coarser grained locking.
SLAB can take a global lock and then perform queue operations on
multiple objects. SLUB has fine grained locking which increases
concurrency but also the overhead of atomic operations.

The regression does not surface under UP since we do not need to do
locking. And it does not surface under NUMA since the alien cache stuff in
SLAB is reducing slab_free performance compared to SMP.

2007-10-18 09:21:34

by Yasunori Goto

Subject: Re: [Patch](memory hotplug) Make kmem_cache_node for SLUB on memory online to avoid panic(take 3)

> On Thu, 18 Oct 2007 12:25:37 +0900 Yasunori Goto <[email protected]> wrote:
>
> >
> > This patch fixes a panic caused by a NULL pointer dereference of
> > kmem_cache_node in discard_slab() after memory online.
> >
> > When memory online is called, kmem_cache_node structures are created
> > for all SLUB caches on the new node whose memory has become available.
> >
> > slab_mem_going_online_callback() is called from the memory online
> > event notifier to create the kmem_cache_node structures. If it (or
> > another callback) fails, slab_mem_offline_callback() is called for
> > rollback.
> >
> > In memory offline, slab_mem_going_offline_callback() is called to
> > shrink all SLUB caches, then slab_mem_offline_callback() is called later.
> >
> > This patch is tested on my ia64 box.
> >
> > ...
> >
> > +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
>
> hm. There should be no linkage between memory hotpluggability and
> NUMA, surely?

Sure. IBM's powerpc boxes have to support memory hotplug even on
non-NUMA machines, since they have the Dynamic Logical Partitioning feature.

> > + down_read(&slub_lock);
> > + list_for_each_entry(s, &slab_caches, list) {
> > + n = get_node(s, offline_node);
> > + if (n) {
> > + /*
> > + * if n->nr_slabs > 0, slabs still exist on the node
> > + * that is going down. We were unable to free them,
> > + * and the offline_pages() function shouldn't call this
> > + * callback. So, we must fail.
> > + */
> > + BUG_ON(atomic_read(&n->nr_slabs));
>
> Experience tells us that WARN_ON is preferred for newly added code ;)

Oh... Ok!

> > + s->node[offline_node] = NULL;
> > + kmem_cache_free(kmalloc_caches, n);
> > + }
> > + }
> > + up_read(&slub_lock);
> > +}
> > +
> > +static int slab_mem_going_online_callback(void *arg)
> > +{
> > + struct kmem_cache_node *n;
> > + struct kmem_cache *s;
> > + struct memory_notify *marg = arg;
> > + int nid = marg->status_change_nid;
> > +
> > + /*
> > + * If the node's memory is already available, then kmem_cache_node is
> > + * already created. Nothing to do.
> > + */
> > + if (nid < 0)
> > + return 0;
> > +
> > + /*
> > + * We are bringing a node online. No memory is available yet. We must
> > + * allocate a kmem_cache_node structure in order to bring the node
> > + * online.
> > + */
> > + down_read(&slub_lock);
> > + list_for_each_entry(s, &slab_caches, list) {
> > + /*
> > + * XXX: kmem_cache_alloc_node will fallback to other nodes
> > + * since memory is not yet available from the node that
> > + * is brought up.
> > + */
> > + n = kmem_cache_alloc(kmalloc_caches, GFP_KERNEL);
> > + if (!n)
> > + return -ENOMEM;
>
> err, we forgot to release slub_lock on that error path. I'll fix that.

Oops. Indeed. Thanks for your check.

Bye.

--
Yasunori Goto


2007-10-23 04:15:08

by Olof Johansson

Subject: [PATCH] Fix warning in mm/slub.c

Hi,

"Make kmem_cache_node for SLUB on memory online to avoid panic" introduced
the following:

mm/slub.c:2737: warning: passing argument 1 of 'atomic_read' from
incompatible pointer type


Signed-off-by: Olof Johansson <[email protected]>


diff --git a/mm/slub.c b/mm/slub.c
index aac1dd3..bcdb2c8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2734,7 +2734,7 @@ static void slab_mem_offline_callback(void *arg)
* and the offline_pages() function shouldn't call this
* callback. So, we must fail.
*/
- BUG_ON(atomic_read(&n->nr_slabs));
+ BUG_ON(atomic_long_read(&n->nr_slabs));

s->node[offline_node] = NULL;
kmem_cache_free(kmalloc_caches, n);

2007-10-23 05:36:45

by Yasunori Goto

Subject: Re: [PATCH] Fix warning in mm/slub.c

> "Make kmem_cache_node for SLUB on memory online to avoid panic" introduced
> the following:
>
> mm/slub.c:2737: warning: passing argument 1 of 'atomic_read' from
> incompatible pointer type
>
>
> Signed-off-by: Olof Johansson <[email protected]>
>
>
> diff --git a/mm/slub.c b/mm/slub.c
> index aac1dd3..bcdb2c8 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2734,7 +2734,7 @@ static void slab_mem_offline_callback(void *arg)
> * and offline_pages() function shoudn't call this
> * callback. So, we must fail.
> */
> - BUG_ON(atomic_read(&n->nr_slabs));
> + BUG_ON(atomic_long_read(&n->nr_slabs));
>
> s->node[offline_node] = NULL;
> kmem_cache_free(kmalloc_caches, n);


Oops, yes. Thanks.

Acked-by: Yasunori Goto <[email protected]>



--
Yasunori Goto


2007-10-23 07:52:37

by Pekka Enberg

Subject: Re: [PATCH] Fix warning in mm/slub.c

Hi,

On 10/23/07, Yasunori Goto <[email protected]> wrote:
> > "Make kmem_cache_node for SLUB on memory online to avoid panic" introduced
> > the following:
> >
> > mm/slub.c:2737: warning: passing argument 1 of 'atomic_read' from
> > incompatible pointer type
> >
> >
> > Signed-off-by: Olof Johansson <[email protected]>
> >
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index aac1dd3..bcdb2c8 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2734,7 +2734,7 @@ static void slab_mem_offline_callback(void *arg)
> > * and the offline_pages() function shouldn't call this
> > * callback. So, we must fail.
> > */
> > - BUG_ON(atomic_read(&n->nr_slabs));
> > + BUG_ON(atomic_long_read(&n->nr_slabs));
> >
> > s->node[offline_node] = NULL;
> > kmem_cache_free(kmalloc_caches, n);
>
>
> Oops, yes. Thanks.
>
> Acked-by: Yasunori Goto <[email protected]>

Reviewed-by: Pekka Enberg <[email protected]>

2007-10-23 16:21:32

by Christoph Lameter

Subject: Re: [PATCH] Fix warning in mm/slub.c

Acked-by: Christoph Lameter <[email protected]>