Kmemleak can quickly fail to allocate an object structure and then
disable itself in a low-memory situation, for example when running an
mmap() workload that triggers swapping and OOM. This is especially
problematic when running something like the LTP test suite, where a
single OOM test case disables kmemleak entirely and leaves the
remaining test cases running without kmemleak watching for leaks.
A kmemleak allocation can fail even though the allocation of the
tracked memory succeeded. In that case, kmemleak can still attempt a
direct reclaim if it is not executing in an atomic context (spinlock,
IRQ handler, etc.), or fall back to a high-priority allocation in an
atomic context as a last-ditch effort. Since kmemleak is a debug
feature, it is unlikely to be used in production where memory resources
are scarce and direct reclaim or high-priority atomic allocations
should not be granted lightly.
Unless some brave soul reimplements kmemleak to embed its metadata
into the tracked memory itself in the foreseeable future, this
provides a good balance between keeping kmemleak enabled in a
low-memory situation and not introducing too much hackiness into the
existing code for now.
Signed-off-by: Qian Cai <[email protected]>
---
v3: Update the commit log.
Simplify the code inspired by graph_trace_open() from ftrace.
v2: Remove the needless checking for NULL objects in slab_post_alloc_hook()
per Catalin.
mm/kmemleak.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index a2d894d3de07..239927166894 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -581,6 +581,17 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
unsigned long untagged_ptr;
object = kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
+ if (!object) {
+ /*
+ * The tracked memory was allocated successful, if the kmemleak
+ * object failed to allocate for some reasons, it ends up with
+ * the whole kmemleak disabled, so let it success at all cost.
+ */
+ gfp = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC :
+ gfp_kmemleak_mask(gfp) | __GFP_DIRECT_RECLAIM;
+ object = kmem_cache_alloc(object_cache, gfp);
+ }
+
if (!object) {
pr_warn("Cannot allocate a kmemleak_object structure\n");
kmemleak_disable();
--
2.17.2 (Apple Git-113)
On Tue, 26 Mar 2019, Qian Cai wrote:
> + if (!object) {
> + /*
> + * The tracked memory was allocated successful, if the kmemleak
> + * object failed to allocate for some reasons, it ends up with
> + * the whole kmemleak disabled, so let it success at all cost.
"let it succeed at all costs"
> + */
> + gfp = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC :
> + gfp_kmemleak_mask(gfp) | __GFP_DIRECT_RECLAIM;
> + object = kmem_cache_alloc(object_cache, gfp);
> + }
> +
> if (!object) {
If the alloc must succeed then this check is no longer necessary.
On Tue, Mar 26, 2019 at 11:43:38AM -0400, Qian Cai wrote:
> Unless some brave soul reimplements kmemleak to embed its metadata
> into the tracked memory itself in the foreseeable future, this
> provides a good balance between keeping kmemleak enabled in a
> low-memory situation and not introducing too much hackiness into the
> existing code for now.
I don't understand kmemleak. Kirill pointed me at this a few days ago:
https://gist.github.com/kiryl/3225e235fea390aa2e49bf625bbe83ec
It's caused by the XArray allocating memory using GFP_NOWAIT | __GFP_NOWARN.
kmemleak then decides it needs to allocate memory to track this memory.
So it calls kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
__GFP_NORETRY | __GFP_NOMEMALLOC | \
__GFP_NOWARN | __GFP_NOFAIL)
then the page allocator gets to see GFP_NOFAIL | GFP_NOWAIT and gets angry.
But I don't understand why kmemleak needs to mess with the GFP flags at
all. Just allocate using the same flags as the caller, and fail the original
allocation if the kmemleak allocation fails. Like this:
+++ b/mm/slab.h
@@ -435,12 +435,22 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
for (i = 0; i < size; i++) {
p[i] = kasan_slab_alloc(s, p[i], flags);
/* As p[i] might get tagged, call kmemleak hook after KASAN. */
- kmemleak_alloc_recursive(p[i], s->object_size, 1,
- s->flags, flags);
+ if (kmemleak_alloc_recursive(p[i], s->object_size, 1,
+ s->flags, flags))
+ goto fail;
}
if (memcg_kmem_enabled())
memcg_kmem_put_cache(s);
+ return;
+
+fail:
+ while (i > 0) {
+ kasan_blah(...);
+ kmemleak_blah();
+ i--;
+ }
+ free_blah(p);
+ *p = NULL;
}
#ifndef CONFIG_SLOB
and if we had something like this, we wouldn't need kmemleak to have this
self-disabling or must-succeed property.
On Tue, Mar 26, 2019 at 11:43:38AM -0400, Qian Cai wrote:
> Kmemleak can quickly fail to allocate an object structure and then
> disable itself in a low-memory situation, for example when running an
> mmap() workload that triggers swapping and OOM. This is especially
> problematic when running something like the LTP test suite, where a
> single OOM test case disables kmemleak entirely and leaves the
> remaining test cases running without kmemleak watching for leaks.
>
> A kmemleak allocation can fail even though the allocation of the
> tracked memory succeeded. In that case, kmemleak can still attempt a
> direct reclaim if it is not executing in an atomic context (spinlock,
> IRQ handler, etc.), or fall back to a high-priority allocation in an
> atomic context as a last-ditch effort. Since kmemleak is a debug
> feature, it is unlikely to be used in production where memory
> resources are scarce and direct reclaim or high-priority atomic
> allocations should not be granted lightly.
>
> Unless some brave soul reimplements kmemleak to embed its metadata
> into the tracked memory itself in the foreseeable future, this
> provides a good balance between keeping kmemleak enabled in a
> low-memory situation and not introducing too much hackiness into the
> existing code for now.
Embedding the metadata would help with the slab allocations (though not
with vmalloc) but it comes with its own potential issues. There are some
bits of kmemleak that rely on deferred freeing of metadata for RCU
traversal, so this wouldn't go well with embedding it.
I wonder whether we'd be better off to replace the metadata allocator
with gen_pool. This way we'd also get rid of early logging/replaying of
the memory allocations since we can populate the gen_pool early with a
static buffer.
--
Catalin
On Tue, Mar 26, 2019 at 09:05:36AM -0700, Matthew Wilcox wrote:
> On Tue, Mar 26, 2019 at 11:43:38AM -0400, Qian Cai wrote:
> > Unless some brave soul reimplements kmemleak to embed its metadata
> > into the tracked memory itself in the foreseeable future, this
> > provides a good balance between keeping kmemleak enabled in a
> > low-memory situation and not introducing too much hackiness into
> > the existing code for now.
>
> I don't understand kmemleak. Kirill pointed me at this a few days ago:
>
> https://gist.github.com/kiryl/3225e235fea390aa2e49bf625bbe83ec
>
> It's caused by the XArray allocating memory using GFP_NOWAIT | __GFP_NOWARN.
> kmemleak then decides it needs to allocate memory to track this memory.
> So it calls kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
>
> #define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
> __GFP_NORETRY | __GFP_NOMEMALLOC | \
> __GFP_NOWARN | __GFP_NOFAIL)
>
> then the page allocator gets to see GFP_NOFAIL | GFP_NOWAIT and gets angry.
>
> But I don't understand why kmemleak needs to mess with the GFP flags at
> all.
Originally, it was just preserving GFP_KERNEL | GFP_ATOMIC. Starting
with commit 6ae4bd1f0bc4 ("kmemleak: Allow kmemleak metadata allocations
to fail"), this mask changed, aimed at making kmemleak allocation
failures less verbose (i.e. just disable it since it's a debug tool).
Commit d9570ee3bd1d ("kmemleak: allow to coexist with fault injection")
introduced __GFP_NOFAIL but this came with its own problems which have
been previously reported (the warning you mentioned is another one of
these). We didn't get to any clear conclusion on how best to allow
allocations to fail with fault injection but not for the kmemleak
metadata. Your suggestion below would probably do the trick.
> Just allocate using the same flags as the caller, and fail the original
> allocation if the kmemleak allocation fails. Like this:
>
> +++ b/mm/slab.h
> @@ -435,12 +435,22 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
> for (i = 0; i < size; i++) {
> p[i] = kasan_slab_alloc(s, p[i], flags);
> /* As p[i] might get tagged, call kmemleak hook after KASAN. */
> - kmemleak_alloc_recursive(p[i], s->object_size, 1,
> - s->flags, flags);
> + if (kmemleak_alloc_recursive(p[i], s->object_size, 1,
> + s->flags, flags))
> + goto fail;
> }
>
> if (memcg_kmem_enabled())
> memcg_kmem_put_cache(s);
> + return;
> +
> +fail:
> + while (i > 0) {
> + kasan_blah(...);
> + kmemleak_blah();
> + i--;
> + }
> + free_blah(p);
> + *p = NULL;
> }
>
> #ifndef CONFIG_SLOB
>
>
> and if we had something like this, we wouldn't need kmemleak to have this
> self-disabling or must-succeed property.
We'd still need the self-disabling in place since there are a few other
places where we call kmemleak_alloc() from.
--
Catalin
On 3/26/19 12:00 PM, Christopher Lameter wrote:
>> + */
>> + gfp = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC :
>> + gfp_kmemleak_mask(gfp) | __GFP_DIRECT_RECLAIM;
>> + object = kmem_cache_alloc(object_cache, gfp);
>> + }
>> +
>> if (!object) {
>
> If the alloc must succeed then this check is no longer necessary.
Well, GFP_ATOMIC could still fail. It looks like the only thing that will never
fail is (__GFP_DIRECT_RECLAIM | __GFP_NOFAIL) as it keeps retrying in
__alloc_pages_slowpath().
On Tue 26-03-19 16:20:41, Catalin Marinas wrote:
> On Tue, Mar 26, 2019 at 09:05:36AM -0700, Matthew Wilcox wrote:
> > On Tue, Mar 26, 2019 at 11:43:38AM -0400, Qian Cai wrote:
> > > Unless some brave soul reimplements kmemleak to embed its
> > > metadata into the tracked memory itself in the foreseeable
> > > future, this provides a good balance between keeping kmemleak
> > > enabled in a low-memory situation and not introducing too much
> > > hackiness into the existing code for now.
> >
> > I don't understand kmemleak. Kirill pointed me at this a few days ago:
> >
> > https://gist.github.com/kiryl/3225e235fea390aa2e49bf625bbe83ec
> >
> > It's caused by the XArray allocating memory using GFP_NOWAIT | __GFP_NOWARN.
> > kmemleak then decides it needs to allocate memory to track this memory.
> > So it calls kmem_cache_alloc(object_cache, gfp_kmemleak_mask(gfp));
> >
> > #define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
> > __GFP_NORETRY | __GFP_NOMEMALLOC | \
> > __GFP_NOWARN | __GFP_NOFAIL)
> >
> > then the page allocator gets to see GFP_NOFAIL | GFP_NOWAIT and gets angry.
> >
> > But I don't understand why kmemleak needs to mess with the GFP flags at
> > all.
>
> Originally, it was just preserving GFP_KERNEL | GFP_ATOMIC. Starting
> with commit 6ae4bd1f0bc4 ("kmemleak: Allow kmemleak metadata allocations
> to fail"), this mask changed, aimed at making kmemleak allocation
> failures less verbose (i.e. just disable it since it's a debug tool).
>
> Commit d9570ee3bd1d ("kmemleak: allow to coexist with fault injection")
> introduced __GFP_NOFAIL but this came with its own problems which have
> been previously reported (the warning you mentioned is another one of
> these). We didn't get to any clear conclusion on how best to allow
> allocations to fail with fault injection but not for the kmemleak
> metadata. Your suggestion below would probably do the trick.
I have objected to that on several occasions. An implicit __GFP_NOFAIL
is simply broken, and __GFP_NOWAIT allocations are a shining example of
that. You cannot loop inside the allocator for an unbounded amount of
time, potentially with locks held. I have heard that there are some
plans to deal with that, but nothing has really materialized AFAIK.
d9570ee3bd1d should be reverted, I believe.
The proper way around this is to keep a pool of objects, with spare
objects reserved for restricted allocation contexts.
--
Michal Hocko
SUSE Labs
On 3/26/19 12:06 PM, Catalin Marinas wrote:
> I wonder whether we'd be better off to replace the metadata allocator
> with gen_pool. This way we'd also get rid of early logging/replaying of
> the memory allocations since we can populate the gen_pool early with a
> static buffer.
I suppose this is not going to work well, as DMA_API_DEBUG uses a
similar approach [1], but I still saw it struggle in low-memory
situations and occasionally disable itself.
[1] https://lkml.org/lkml/2018/12/10/383