We currently provide lockdep annotation for kmalloc caches, and also
caches that have SLAB_DEBUG_OBJECTS enabled. The reason for this is that
we can quite frequently nest in the l3->list_lock lock, which is not
something trivial to avoid.
My proposal with this patch is to extend this to caches whose slab
management object lives within the slab as well ("on_slab"). The need
for this arose in the context of testing the kmemcg-slab patches. With that
patchset, we can have per-memcg kmalloc caches. So the same path that
led to nesting between kmalloc caches could then lead to in-memcg
nesting. Because they are not annotated, lockdep will trigger.
Signed-off-by: Glauber Costa <[email protected]>
CC: Christoph Lameter <[email protected]>
CC: Pekka Enberg <[email protected]>
CC: David Rientjes <[email protected]>
CC: JoonSoo Kim <[email protected]>
---
Instead of "on_slab", I considered checking the memcg cache's root
cache, and annotating it only in case it is a kmalloc cache.
I ended up annotating on_slab caches because, given how frequently
those locks can nest, it seemed like the safe choice. I was
a little bit inspired by the key's name as well, which indicated
this could work for all on_slab caches. Let me know if you guys
want a different test condition for this.
---
mm/slab.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/mm/slab.c b/mm/slab.c
index 9b7f6b63..ef1c8b3 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -654,6 +654,26 @@ static void init_node_lock_keys(int q)
}
}
+static void on_slab_lock_classes_node(struct kmem_cache *cachep, int q)
+{
+ struct kmem_list3 *l3;
+ l3 = cachep->nodelists[q];
+ if (!l3)
+ return;
+
+ slab_set_lock_classes(cachep, &on_slab_l3_key,
+ &on_slab_alc_key, q);
+}
+
+static inline void on_slab_lock_classes(struct kmem_cache *cachep)
+{
+ int node;
+
+ VM_BUG_ON(OFF_SLAB(cachep));
+ for_each_node(node)
+ on_slab_lock_classes_node(cachep, node);
+}
+
static inline void init_lock_keys(void)
{
int node;
@@ -670,6 +690,10 @@ static inline void init_lock_keys(void)
{
}
+static inline void on_slab_lock_classes(struct kmem_cache *cachep)
+{
+}
+
static void slab_set_debugobj_lock_classes_node(struct kmem_cache *cachep, int node)
{
}
@@ -1397,6 +1421,9 @@ static int __cpuinit cpuup_prepare(long cpu)
free_alien_cache(alien);
if (cachep->flags & SLAB_DEBUG_OBJECTS)
slab_set_debugobj_lock_classes_node(cachep, node);
+ else if (!OFF_SLAB(cachep) &&
+ !(cachep->flags & SLAB_DESTROY_BY_RCU))
+ on_slab_lock_classes_node(cachep, node);
}
init_node_lock_keys(node);
@@ -2554,7 +2581,8 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
WARN_ON_ONCE(flags & SLAB_DESTROY_BY_RCU);
slab_set_debugobj_lock_classes(cachep);
- }
+ } else if (!OFF_SLAB(cachep) && !(flags & SLAB_DESTROY_BY_RCU))
+ on_slab_lock_classes(cachep);
return 0;
}
--
1.7.11.7
(Adding Peter and Michael to CC.)
On Mon, Oct 29, 2012 at 12:49 PM, Glauber Costa <[email protected]> wrote:
On 10/29/2012 06:49 PM, Glauber Costa wrote:
> We currently provide lockdep annotation for kmalloc caches, and also
> caches that have SLAB_DEBUG_OBJECTS enabled. The reason for this is that
> we can quite frequently nest in the l3->list_lock lock, which is not
> something trivial to avoid.
>
> My proposal with this patch, is to extend this to caches whose slab
> management object lives within the slab as well ("on_slab"). The need
> for this arose in the context of testing kmemcg-slab patches. With such
> patchset, we can have per-memcg kmalloc caches. So the same path that
> led to nesting between kmalloc caches will could then lead to in-memcg
> nesting. Because they are not annotated, lockdep will trigger.
Hi, Glauber
I'm trying to understand the issue we are trying to solve, but it
looks like I need some help...
So allow me to ask a few questions:
1. What scenario causes the false deadlock?
2. Which caches conflict?
3. How do their lock operations nest?
It would also be better to include the lockdep splat in the patch
description, so folks can easily see why we need this patch ;-)
Regards,
Michael Wang
On 11/01/2012 11:11 AM, Michael Wang wrote:
>
> Hi, Glauber
>
> I'm trying to understand the issue we are trying to solve, but it
> looks like I need some help...
>
Understandably =)
This will not trigger in an upstream kernel, so in this sense, it is not
an existing bug. It happens when the kmemcg-slab series is applied
(https://lkml.org/lkml/2012/10/16/186) and (http://lwn.net/Articles/519877/)
Because this is a big series, I have for a while been adopting the policy of
sending out patches that are in principle independent of the series, to
be reviewed on their own. But in some cases like this, some context may
end up missing.
Now, of course I won't tell you to go read it all, so here is a summary:
* We operate in a containerized environment, with each container inside
a cgroup
* In this context, it is necessary to account for and limit the amount of
kernel memory that can be tracked back to processes. This is akin to
OpenVZ's beancounters (http://wiki.openvz.org/Proc/user_beancounters).
* To do that, we create a version of each slab that a cgroup uses.
Processes in that cgroup will allocate from that slab.
This means that we will have cgroup-specific versions of slabs like
kmalloc-XX, dentry, inode, etc.
> So allow me to ask a few questions:
>
> 1. What scenario causes the false deadlock?
This lockdep annotation exists because, when freeing from kmalloc caches,
it is possible to nest in the l3 list_lock. The particular case I hit was
reaching cache_flusharray() with the l3 list_lock held, which seems
to happen quite often.
> 2. Which caches conflict?
kmalloc-XX and kmalloc-memcg-y-XX
> 3. How do their lock operations nest?
>
In the same way kmalloc-XX would nest with itself.
On 11/02/2012 12:48 AM, Glauber Costa wrote:
> This will not trigger in an upstream kernel, so in this sense, it is not
> an existing bug. It happens when the kmemcg-slab series is applied
> (https://lkml.org/lkml/2012/10/16/186) and (http://lwn.net/Articles/519877/)
>
> Because this is a big series, I am for a while adopting the policy of
> sending out patches that are in principle independent of the series, to
> be reviewed on their own. But in some cases like this, some context may
> end up missing.
>
> Now, of course I won't tell you to go read it all, so here is a summary:
> * We operate in a containerized environment, with each container inside
> a cgroup
> * in this context, it is necessary to account and limit the amount of
> kernel memory that can be tracked back to processes. This is akin of
> OpenVZ's beancounters (http://wiki.openvz.org/Proc/user_beancounters)
> * To do that, we create a version of each slab that a cgroup uses.
> Processes in that cgroup will allocate from that slab.
>
> This means that we will have cgroup-specific versions of slabs like
> kmalloc-XX, dentry, inode, etc.
>
>> So allow me to ask few questions:
>>
>> 1. what's scene will cause the fake dead lock?
>
> This lockdep annotation exists because when freeing from kmalloc caches,
> it is possible to nest in the l3 list_lock. The particular one I hit was
> when we reach cache_flusharray with the l3 list_lock held, which seems
> to happen quite often.
>
>> 2. what's the conflict caches?
> kmalloc-XX and kmalloc-memcg-y-XX
>
>> 3. how does their lock operation nested?
>>
>
> In the same way kmalloc-XX would nest with itself.
So this is a patch to fix a possible bug that only appears once another
patch set is applied? I'm not sure, but that sounds like the wrong
process... adding this one to that patch set may be better :)
Regards,
Michael Wang
On 11/01/2012 01:10 PM, Michael Wang wrote:
> So this is a patch to fix a possible bug that only appears once another
> patch set is applied? I'm not sure, but that sounds like the wrong
> process... adding this one to that patch set may be better :)
>
It is in the patchset. As I said, I have *also* (not exclusively) been
sending separately, for a while, patches that are potentially good on
their own (IOW, have no code dependency on the rest of the series). In
some cases it helps, in some it doesn't.