2016-03-01 11:57:16

by Andrey Ryabinin

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB



On 02/29/2016 08:12 PM, Dmitry Vyukov wrote:

>>> diff --git a/lib/Makefile b/lib/Makefile
>>> index a7c26a4..10a4ae3 100644
>>> --- a/lib/Makefile
>>> +++ b/lib/Makefile
>>> @@ -167,6 +167,13 @@ obj-$(CONFIG_SG_SPLIT) += sg_split.o
>>> obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
>>> obj-$(CONFIG_IRQ_POLL) += irq_poll.o
>>>
>>> +ifeq ($(CONFIG_KASAN),y)
>>> +ifeq ($(CONFIG_SLAB),y)
>>
>> Just try to imagine that another subsystem wants to use stackdepot. How this gonna look like?
>>
>> We have Kconfig to describe dependencies. So, this should be under CONFIG_STACKDEPOT.
>> So any user of this feature can just do 'select STACKDEPOT' in Kconfig.
>>
>>> + obj-y += stackdepot.o
>>> + KASAN_SANITIZE_slub.o := n
_stackdepot.o


>>
>>> +
>>> + stack->hash = hash;
>>> + stack->size = size;
>>> + stack->handle.slabindex = depot_index;
>>> + stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>>> + __memcpy(stack->entries, entries, size * sizeof(unsigned long));
>>
>> s/__memcpy/memcpy/
>
> memcpy should be instrumented by asan/tsan, and we would like to avoid
> that instrumentation here.

KASAN_SANITIZE_* := n already takes care of this.
__memcpy() is a special thing solely for kasan internals and some assembly code,
and it's not generally available.


>>> + if (unlikely(!smp_load_acquire(&next_slab_inited))) {
>>> + if (!preempt_count() && !in_irq()) {
>>
>> If you trying to detect atomic context here, than this doesn't work. E.g. you can't know
>> about held spinlocks in non-preemptible kernel.
>> And I'm not sure why need this. You know gfp flags here, so allocation in atomic context shouldn't be problem.
>
>
> We don't have gfp flags for kfree.
> I wonder how CONFIG_DEBUG_ATOMIC_SLEEP handles this. Maybe it has the answer.

It hasn't. It doesn't guarantee that atomic context will always be detected.

> Alternatively, we can always assume that we are in atomic context in kfree.
>

Or do this allocation in a separate context, e.g. put it in a work queue.
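
A rough sketch of that idea, with all names below purely illustrative rather than taken from the patch: depot_save_stack() would only schedule the work, and the preallocation itself would run in process context, where a sleeping GFP_KERNEL allocation is allowed.

#include <linux/workqueue.h>
#include <linux/spinlock.h>
#include <linux/gfp.h>
#include <linux/mm.h>

#define STACK_ALLOC_ORDER 4

static void *next_slab;                 /* reserve slab for atomic callers */
static DEFINE_SPINLOCK(depot_lock);

/* Runs in process context, so a sleeping allocation is fine here. */
static void depot_prealloc_workfn(struct work_struct *work)
{
        struct page *page = alloc_pages(GFP_KERNEL, STACK_ALLOC_ORDER);
        unsigned long flags;

        if (!page)
                return;

        spin_lock_irqsave(&depot_lock, flags);
        if (!next_slab) {
                next_slab = page_address(page);
                page = NULL;
        }
        spin_unlock_irqrestore(&depot_lock, flags);

        if (page)       /* someone else already refilled the reserve */
                __free_pages(page, STACK_ALLOC_ORDER);
}

static DECLARE_WORK(depot_prealloc_work, depot_prealloc_workfn);

/* Called from depot_save_stack() once the reserve slab has been consumed. */
static void depot_request_prealloc(void)
{
        schedule_work(&depot_prealloc_work);
}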

>
>
>>> + alloc_flags &= (__GFP_RECLAIM | __GFP_IO | __GFP_FS |
>>> + __GFP_NOWARN | __GFP_NORETRY |
>>> + __GFP_NOMEMALLOC | __GFP_DIRECT_RECLAIM);
>>
>> I think blacklist approach would be better here.
>>
>>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>>
>> STACK_ALLOC_ORDER = 4 - that's a lot. Do you really need that much?
>
> Part of the issue the atomic context above. When we can't allocate
> memory we still want to save the stack trace. When we have less than
> STACK_ALLOC_ORDER memory, we try to preallocate another
> STACK_ALLOC_ORDER in advance. So in the worst case, we have
> STACK_ALLOC_ORDER memory and that should be enough to handle all
> kmalloc/kfree in the atomic context. 1 page does not look enough. I
> think Alex did some measuring of the failure race (when we are out of
> memory and can't allocate more).
>

A lot of 4-order pages will lead to high fragmentation. You don't need physically contiguous memory here,
so try to use vmalloc(). It is slower, but fragmentation won't be a problem.

And one more thing. Take a look at mempool, because it's generally used to solve the problem you have here
(guaranteed allocation in atomic context).
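
For illustration, a minimal mempool-based variant could look roughly like the sketch below; the pool size, the helper names and the use of mempool_create_page_pool() are assumptions made for the sketch, not part of the patch.

#include <linux/mempool.h>
#include <linux/gfp.h>
#include <linux/init.h>

#define DEPOT_POOL_MIN          4       /* order-4 blocks kept in reserve */
#define STACK_ALLOC_ORDER       4

static mempool_t *depot_pool;

static int __init depot_pool_init(void)
{
        /* Preallocate DEPOT_POOL_MIN order-4 page blocks up front. */
        depot_pool = mempool_create_page_pool(DEPOT_POOL_MIN, STACK_ALLOC_ORDER);
        return depot_pool ? 0 : -ENOMEM;
}
core_initcall(depot_pool_init);

static struct page *depot_alloc_slab_atomic(void)
{
        /* May dip into the reserve when the page allocator cannot satisfy GFP_ATOMIC. */
        return mempool_alloc(depot_pool, GFP_ATOMIC);
}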



2016-03-04 14:52:26

by Alexander Potapenko

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB

On Tue, Mar 1, 2016 at 12:57 PM, Andrey Ryabinin <[email protected]> wrote:
>
>
> On 02/29/2016 08:12 PM, Dmitry Vyukov wrote:
>
>>>> diff --git a/lib/Makefile b/lib/Makefile
>>>> index a7c26a4..10a4ae3 100644
>>>> --- a/lib/Makefile
>>>> +++ b/lib/Makefile
>>>> @@ -167,6 +167,13 @@ obj-$(CONFIG_SG_SPLIT) += sg_split.o
>>>> obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
>>>> obj-$(CONFIG_IRQ_POLL) += irq_poll.o
>>>>
>>>> +ifeq ($(CONFIG_KASAN),y)
>>>> +ifeq ($(CONFIG_SLAB),y)
>>>
>>> Just try to imagine that another subsystem wants to use stackdepot. How this gonna look like?
>>>
>>> We have Kconfig to describe dependencies. So, this should be under CONFIG_STACKDEPOT.
>>> So any user of this feature can just do 'select STACKDEPOT' in Kconfig.
>>>
>>>> + obj-y += stackdepot.o
>>>> + KASAN_SANITIZE_slub.o := n
> _stackdepot.o
>
>
>>>
>>>> +
>>>> + stack->hash = hash;
>>>> + stack->size = size;
>>>> + stack->handle.slabindex = depot_index;
>>>> + stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>>>> + __memcpy(stack->entries, entries, size * sizeof(unsigned long));
>>>
>>> s/__memcpy/memcpy/
>>
>> memcpy should be instrumented by asan/tsan, and we would like to avoid
>> that instrumentation here.
>
> KASAN_SANITIZE_* := n already takes care about this.
> __memcpy() is a special thing solely for kasan internals and some assembly code.
> And it's not available generally.
As far as I can see, KASAN_SANITIZE_*:=n does not guarantee it.
It just removes the KASAN flags from the GCC command line; it does not
necessarily replace memcpy() calls with some kind of
non-instrumented memcpy().

We see two possible ways to deal with this problem:
1. Define "memcpy" to "__memcpy" in lib/stackdepot.c under CONFIG_KASAN
(a rough sketch of this option is given below).
2. Create a mm/kasan/kasan_stackdepot.c stub which includes
lib/stackdepot.c and defines "memcpy" to "__memcpy" in that file.
This way we'll be able to instrument the original stackdepot.c and
won't miss reports from it if someone starts using it somewhere else.
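
A minimal sketch of what option 1 could look like near the top of lib/stackdepot.c; whether __memcpy() is actually available on every architecture is exactly the concern raised above, so this is illustrative only:

#ifdef CONFIG_KASAN
/*
 * Use the non-instrumented __memcpy() so that copying a stack trace
 * into the depot does not itself trigger KASAN checks.
 */
#undef memcpy
#define memcpy(dst, src, len) __memcpy(dst, src, len)
#endif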

>>>> + if (unlikely(!smp_load_acquire(&next_slab_inited))) {
>>>> + if (!preempt_count() && !in_irq()) {
>>>
>>> If you trying to detect atomic context here, than this doesn't work. E.g. you can't know
>>> about held spinlocks in non-preemptible kernel.
>>> And I'm not sure why need this. You know gfp flags here, so allocation in atomic context shouldn't be problem.
>>
>>
>> We don't have gfp flags for kfree.
>> I wonder how CONFIG_DEBUG_ATOMIC_SLEEP handles this. Maybe it has the answer.
>
> It hasn't. It doesn't guarantee that atomic context always will be detected.
>
>> Alternatively, we can always assume that we are in atomic context in kfree.
>>
>
> Or do this allocation in separate context, put in work queue.
>
>>
>>
>>>> + alloc_flags &= (__GFP_RECLAIM | __GFP_IO | __GFP_FS |
>>>> + __GFP_NOWARN | __GFP_NORETRY |
>>>> + __GFP_NOMEMALLOC | __GFP_DIRECT_RECLAIM);
>>>
>>> I think blacklist approach would be better here.
>>>
>>>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>>>
>>> STACK_ALLOC_ORDER = 4 - that's a lot. Do you really need that much?
>>
>> Part of the issue the atomic context above. When we can't allocate
>> memory we still want to save the stack trace. When we have less than
>> STACK_ALLOC_ORDER memory, we try to preallocate another
>> STACK_ALLOC_ORDER in advance. So in the worst case, we have
>> STACK_ALLOC_ORDER memory and that should be enough to handle all
>> kmalloc/kfree in the atomic context. 1 page does not look enough. I
>> think Alex did some measuring of the failure race (when we are out of
>> memory and can't allocate more).
>>
>
> A lot of 4-order pages will lead to high fragmentation. You don't need physically contiguous memory here,
> so try to use vmalloc(). It is slower, but fragmentation won't be problem.
>
> And one more thing. Take a look at mempool, because it's generally used to solve the problem you have here
> (guaranteed allocation in atomic context).
>
>



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2016-03-04 15:05:43

by Andrey Ryabinin

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB

2016-03-04 17:52 GMT+03:00 Alexander Potapenko <[email protected]>:
> On Tue, Mar 1, 2016 at 12:57 PM, Andrey Ryabinin <[email protected]> wrote:
>>
>>
>> On 02/29/2016 08:12 PM, Dmitry Vyukov wrote:
>>
>>>>> diff --git a/lib/Makefile b/lib/Makefile
>>>>> index a7c26a4..10a4ae3 100644
>>>>> --- a/lib/Makefile
>>>>> +++ b/lib/Makefile
>>>>> @@ -167,6 +167,13 @@ obj-$(CONFIG_SG_SPLIT) += sg_split.o
>>>>> obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
>>>>> obj-$(CONFIG_IRQ_POLL) += irq_poll.o
>>>>>
>>>>> +ifeq ($(CONFIG_KASAN),y)
>>>>> +ifeq ($(CONFIG_SLAB),y)
>>>>
>>>> Just try to imagine that another subsystem wants to use stackdepot. How this gonna look like?
>>>>
>>>> We have Kconfig to describe dependencies. So, this should be under CONFIG_STACKDEPOT.
>>>> So any user of this feature can just do 'select STACKDEPOT' in Kconfig.
>>>>
>>>>> + obj-y += stackdepot.o
>>>>> + KASAN_SANITIZE_slub.o := n
>> _stackdepot.o
>>
>>
>>>>
>>>>> +
>>>>> + stack->hash = hash;
>>>>> + stack->size = size;
>>>>> + stack->handle.slabindex = depot_index;
>>>>> + stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>>>>> + __memcpy(stack->entries, entries, size * sizeof(unsigned long));
>>>>
>>>> s/__memcpy/memcpy/
>>>
>>> memcpy should be instrumented by asan/tsan, and we would like to avoid
>>> that instrumentation here.
>>
>> KASAN_SANITIZE_* := n already takes care about this.
>> __memcpy() is a special thing solely for kasan internals and some assembly code.
>> And it's not available generally.
> As far as I can see, KASAN_SANITIZE_*:=n does not guarantee it.
> It just removes KASAN flags from GCC command line, it does not
> necessarily replace memcpy() calls with some kind of a
> non-instrumented memcpy().
>

With the kasan cflags removed, '__SANITIZE_ADDRESS__' is not defined,
which enables the following defines from arch/x86/include/asm/string_64.h:

#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)

/*
* For files that not instrumented (e.g. mm/slub.c) we
* should use not instrumented version of mem* functions.
*/

#undef memcpy
#define memcpy(dst, src, len) __memcpy(dst, src, len)
#define memmove(dst, src, len) __memmove(dst, src, len)
#define memset(s, c, n) __memset(s, c, n)
#endif

2016-03-04 15:06:59

by Alexander Potapenko

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB

On Fri, Mar 4, 2016 at 4:01 PM, Andrey Ryabinin <[email protected]> wrote:
> 2016-03-04 17:52 GMT+03:00 Alexander Potapenko <[email protected]>:
>> On Tue, Mar 1, 2016 at 12:57 PM, Andrey Ryabinin <[email protected]> wrote:
>>>
>>>
>>> On 02/29/2016 08:12 PM, Dmitry Vyukov wrote:
>>>
>>>>>> diff --git a/lib/Makefile b/lib/Makefile
>>>>>> index a7c26a4..10a4ae3 100644
>>>>>> --- a/lib/Makefile
>>>>>> +++ b/lib/Makefile
>>>>>> @@ -167,6 +167,13 @@ obj-$(CONFIG_SG_SPLIT) += sg_split.o
>>>>>> obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
>>>>>> obj-$(CONFIG_IRQ_POLL) += irq_poll.o
>>>>>>
>>>>>> +ifeq ($(CONFIG_KASAN),y)
>>>>>> +ifeq ($(CONFIG_SLAB),y)
>>>>>
>>>>> Just try to imagine that another subsystem wants to use stackdepot. How this gonna look like?
>>>>>
>>>>> We have Kconfig to describe dependencies. So, this should be under CONFIG_STACKDEPOT.
>>>>> So any user of this feature can just do 'select STACKDEPOT' in Kconfig.
>>>>>
>>>>>> + obj-y += stackdepot.o
>>>>>> + KASAN_SANITIZE_slub.o := n
>>> _stackdepot.o
>>>
>>>
>>>>>
>>>>>> +
>>>>>> + stack->hash = hash;
>>>>>> + stack->size = size;
>>>>>> + stack->handle.slabindex = depot_index;
>>>>>> + stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>>>>>> + __memcpy(stack->entries, entries, size * sizeof(unsigned long));
>>>>>
>>>>> s/__memcpy/memcpy/
>>>>
>>>> memcpy should be instrumented by asan/tsan, and we would like to avoid
>>>> that instrumentation here.
>>>
>>> KASAN_SANITIZE_* := n already takes care about this.
>>> __memcpy() is a special thing solely for kasan internals and some assembly code.
>>> And it's not available generally.
>> As far as I can see, KASAN_SANITIZE_*:=n does not guarantee it.
>> It just removes KASAN flags from GCC command line, it does not
>> necessarily replace memcpy() calls with some kind of a
>> non-instrumented memcpy().
>>
>
> With removed kasan cflags '__SANITIZE_ADDRESS__' is not defined,
> hence enable the following defines from arch/x86/include/asm/string_64.h:
>
> #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
>
> /*
> * For files that not instrumented (e.g. mm/slub.c) we
> * should use not instrumented version of mem* functions.
> */
>
> #undef memcpy
> #define memcpy(dst, src, len) __memcpy(dst, src, len)
> #define memmove(dst, src, len) __memmove(dst, src, len)
> #define memset(s, c, n) __memset(s, c, n)
> #endif
Nice!
What do you think about providing stub .c files to decouple the shared
code used by the KASAN runtime from the rest of the kernel?
(This is a completely different story though and can be done separately).


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2016-03-04 16:30:32

by Andrey Ryabinin

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB

2016-03-04 18:06 GMT+03:00 Alexander Potapenko <[email protected]>:
> On Fri, Mar 4, 2016 at 4:01 PM, Andrey Ryabinin <[email protected]> wrote:
>> 2016-03-04 17:52 GMT+03:00 Alexander Potapenko <[email protected]>:
>>> On Tue, Mar 1, 2016 at 12:57 PM, Andrey Ryabinin <[email protected]> wrote:
>>>>>>> +
>>>>>>> + stack->hash = hash;
>>>>>>> + stack->size = size;
>>>>>>> + stack->handle.slabindex = depot_index;
>>>>>>> + stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>>>>>>> + __memcpy(stack->entries, entries, size * sizeof(unsigned long));
>>>>>>
>>>>>> s/__memcpy/memcpy/
>>>>>
>>>>> memcpy should be instrumented by asan/tsan, and we would like to avoid
>>>>> that instrumentation here.
>>>>
>>>> KASAN_SANITIZE_* := n already takes care about this.
>>>> __memcpy() is a special thing solely for kasan internals and some assembly code.
>>>> And it's not available generally.
>>> As far as I can see, KASAN_SANITIZE_*:=n does not guarantee it.
>>> It just removes KASAN flags from GCC command line, it does not
>>> necessarily replace memcpy() calls with some kind of a
>>> non-instrumented memcpy().
>>>
>>
>> With removed kasan cflags '__SANITIZE_ADDRESS__' is not defined,
>> hence enable the following defines from arch/x86/include/asm/string_64.h:
>>
>> #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
>>
>> /*
>> * For files that not instrumented (e.g. mm/slub.c) we
>> * should use not instrumented version of mem* functions.
>> */
>>
>> #undef memcpy
>> #define memcpy(dst, src, len) __memcpy(dst, src, len)
>> #define memmove(dst, src, len) __memmove(dst, src, len)
>> #define memset(s, c, n) __memset(s, c, n)
>> #endif
> Nice!
> What do you think about providing stub .c files to decouple the shared
> code used by KASAN runtime from the rest of kernel?

Actually, I don't quite understand why you need that at all, but your
idea will not link due to multiple definitions of the same functions.
The link problem should be easy to work around with 'objcopy
--prefix-symbols=' though.

> (This is a completely different story though and can be done separately).

2016-03-08 11:42:42

by Alexander Potapenko

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB

On Tue, Mar 1, 2016 at 12:57 PM, Andrey Ryabinin <[email protected]> wrote:
>
>
> On 02/29/2016 08:12 PM, Dmitry Vyukov wrote:
>
>>>> diff --git a/lib/Makefile b/lib/Makefile
>>>> index a7c26a4..10a4ae3 100644
>>>> --- a/lib/Makefile
>>>> +++ b/lib/Makefile
>>>> @@ -167,6 +167,13 @@ obj-$(CONFIG_SG_SPLIT) += sg_split.o
>>>> obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
>>>> obj-$(CONFIG_IRQ_POLL) += irq_poll.o
>>>>
>>>> +ifeq ($(CONFIG_KASAN),y)
>>>> +ifeq ($(CONFIG_SLAB),y)
>>>
>>> Just try to imagine that another subsystem wants to use stackdepot. How this gonna look like?
>>>
>>> We have Kconfig to describe dependencies. So, this should be under CONFIG_STACKDEPOT.
>>> So any user of this feature can just do 'select STACKDEPOT' in Kconfig.
>>>
>>>> + obj-y += stackdepot.o
>>>> + KASAN_SANITIZE_slub.o := n
> _stackdepot.o
>
>
>>>
>>>> +
>>>> + stack->hash = hash;
>>>> + stack->size = size;
>>>> + stack->handle.slabindex = depot_index;
>>>> + stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
>>>> + __memcpy(stack->entries, entries, size * sizeof(unsigned long));
>>>
>>> s/__memcpy/memcpy/
>>
>> memcpy should be instrumented by asan/tsan, and we would like to avoid
>> that instrumentation here.
>
> KASAN_SANITIZE_* := n already takes care about this.
> __memcpy() is a special thing solely for kasan internals and some assembly code.
> And it's not available generally.
>
>
>>>> + if (unlikely(!smp_load_acquire(&next_slab_inited))) {
>>>> + if (!preempt_count() && !in_irq()) {
>>>
>>> If you trying to detect atomic context here, than this doesn't work. E.g. you can't know
>>> about held spinlocks in non-preemptible kernel.
>>> And I'm not sure why need this. You know gfp flags here, so allocation in atomic context shouldn't be problem.
>>
>>
>> We don't have gfp flags for kfree.
>> I wonder how CONFIG_DEBUG_ATOMIC_SLEEP handles this. Maybe it has the answer.
>
> It hasn't. It doesn't guarantee that atomic context always will be detected.
>
>> Alternatively, we can always assume that we are in atomic context in kfree.
>>
>
> Or do this allocation in separate context, put in work queue.
>
>>
>>
>>>> + alloc_flags &= (__GFP_RECLAIM | __GFP_IO | __GFP_FS |
>>>> + __GFP_NOWARN | __GFP_NORETRY |
>>>> + __GFP_NOMEMALLOC | __GFP_DIRECT_RECLAIM);
>>>
>>> I think blacklist approach would be better here.
>>>
>>>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>>>
>>> STACK_ALLOC_ORDER = 4 - that's a lot. Do you really need that much?
>>
>> Part of the issue the atomic context above. When we can't allocate
>> memory we still want to save the stack trace. When we have less than
>> STACK_ALLOC_ORDER memory, we try to preallocate another
>> STACK_ALLOC_ORDER in advance. So in the worst case, we have
>> STACK_ALLOC_ORDER memory and that should be enough to handle all
>> kmalloc/kfree in the atomic context. 1 page does not look enough. I
>> think Alex did some measuring of the failure race (when we are out of
>> memory and can't allocate more).
>>
>
> A lot of 4-order pages will lead to high fragmentation. You don't need physically contiguous memory here,
> so try to use vmalloc(). It is slower, but fragmentation won't be problem.
I've tried using vmalloc(), but it turned out it calls the KASAN hooks
again. Dealing with reentrancy in this case sounds like overkill.
Given that we only require 9 MB most of the time, is allocating
physical pages still a problem?

> And one more thing. Take a look at mempool, because it's generally used to solve the problem you have here
> (guaranteed allocation in atomic context).
As far as I understood the docs, mempools have the drawback of
allocating too much memory which won't be available for any other use.
O'Reilly's "Linux Device Drivers" even suggests not using mempools in
any case where it's easier to deal with allocation failures (that
advice is for device drivers; not sure if it stands for other
subsystems though).


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2016-03-10 16:58:49

by Andrey Ryabinin

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB

2016-03-08 14:42 GMT+03:00 Alexander Potapenko <[email protected]>:
> On Tue, Mar 1, 2016 at 12:57 PM, Andrey Ryabinin <[email protected]> wrote:
>>>>
>>>>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>>>>
>>>> STACK_ALLOC_ORDER = 4 - that's a lot. Do you really need that much?
>>>
>>> Part of the issue the atomic context above. When we can't allocate
>>> memory we still want to save the stack trace. When we have less than
>>> STACK_ALLOC_ORDER memory, we try to preallocate another
>>> STACK_ALLOC_ORDER in advance. So in the worst case, we have
>>> STACK_ALLOC_ORDER memory and that should be enough to handle all
>>> kmalloc/kfree in the atomic context. 1 page does not look enough. I
>>> think Alex did some measuring of the failure race (when we are out of
>>> memory and can't allocate more).
>>>
>>
>> A lot of 4-order pages will lead to high fragmentation. You don't need physically contiguous memory here,
>> so try to use vmalloc(). It is slower, but fragmentation won't be problem.
> I've tried using vmalloc(), but turned out it's calling KASAN hooks
> again. Dealing with reentrancy in this case sounds like an overkill.

We'll have to deal with recursion eventually. Using stackdepot for
page owner will cause recursion.

> Given that we only require 9 Mb most of the time, is allocating
> physical pages still a problem?
>

This is not about size, this is about fragmentation. vmalloc() allows us to
utilize available low-order pages, hence reducing fragmentation.

>> And one more thing. Take a look at mempool, because it's generally used to solve the problem you have here
>> (guaranteed allocation in atomic context).
> As far as I understood the docs, mempools have a drawback of
> allocating too much memory which won't be available for any other use.

As far as I understood your code, it has a drawback of
allocating too much memory which won't be available for any other use ;)

However, now I think that mempool doesn't fit here. We never free
memory => never return it to the pool.
And this will cause 5-second delays between allocation retries in mempool_alloc().


> O'Reily's "Linux Device Drivers" even suggests not using mempools in
> any case when it's easier to deal with allocation failures (that
> advice is for device drivers, not sure if that stands for other
> subsystems though).

2016-03-11 11:18:29

by Alexander Potapenko

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB

On Thu, Mar 10, 2016 at 5:58 PM, Andrey Ryabinin <[email protected]> wrote:
> 2016-03-08 14:42 GMT+03:00 Alexander Potapenko <[email protected]>:
>> On Tue, Mar 1, 2016 at 12:57 PM, Andrey Ryabinin <[email protected]> wrote:
>>>>>
>>>>>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>>>>>
>>>>> STACK_ALLOC_ORDER = 4 - that's a lot. Do you really need that much?
>>>>
>>>> Part of the issue the atomic context above. When we can't allocate
>>>> memory we still want to save the stack trace. When we have less than
>>>> STACK_ALLOC_ORDER memory, we try to preallocate another
>>>> STACK_ALLOC_ORDER in advance. So in the worst case, we have
>>>> STACK_ALLOC_ORDER memory and that should be enough to handle all
>>>> kmalloc/kfree in the atomic context. 1 page does not look enough. I
>>>> think Alex did some measuring of the failure race (when we are out of
>>>> memory and can't allocate more).
>>>>
>>>
>>> A lot of 4-order pages will lead to high fragmentation. You don't need physically contiguous memory here,
>>> so try to use vmalloc(). It is slower, but fragmentation won't be problem.
>> I've tried using vmalloc(), but turned out it's calling KASAN hooks
>> again. Dealing with reentrancy in this case sounds like an overkill.
>
> We'll have to deal with recursion eventually. Using stackdepot for
> page owner will cause recursion.
>
>> Given that we only require 9 Mb most of the time, is allocating
>> physical pages still a problem?
>>
>
> This is not about size, this about fragmentation. vmalloc allows to
> utilize available low-order pages,
> hence reduce the fragmentation.
I've attempted to add __vmalloc(STACK_ALLOC_SIZE, alloc_flags,
PAGE_KERNEL) (also tried vmalloc(STACK_ALLOC_SIZE)) instead of
alloc_pages() and am now getting a crash in
kmem_cache_alloc_node_trace() in mm/slab.c, because it doesn't allow
the kmem_cache pointer to be NULL (it's dereferenced when calling
trace_kmalloc_node()).

Steven, do you know if this is because of my code violating some contract
(e.g. I'm calling vmalloc() too early, when kmalloc_caches[] haven't
been initialized), or is this a bug in kmem_cache_alloc_node_trace()
itself?

>>> And one more thing. Take a look at mempool, because it's generally used to solve the problem you have here
>>> (guaranteed allocation in atomic context).
>> As far as I understood the docs, mempools have a drawback of
>> allocating too much memory which won't be available for any other use.
>
> As far as I understood your code, it has a drawback of
> allocating too much memory which won't be available for any other use ;)
>
> However, now I think that mempool doesn't fit here. We never free
> memory => never return it to pool.
> And this will cause 5sec delays between allocation retries in mempool_alloc().
>
>
>> O'Reily's "Linux Device Drivers" even suggests not using mempools in
>> any case when it's easier to deal with allocation failures (that
>> advice is for device drivers, not sure if that stands for other
>> subsystems though).



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2016-03-11 11:43:50

by Andrey Ryabinin

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB



On 03/11/2016 02:18 PM, Alexander Potapenko wrote:
> On Thu, Mar 10, 2016 at 5:58 PM, Andrey Ryabinin <[email protected]> wrote:
>> 2016-03-08 14:42 GMT+03:00 Alexander Potapenko <[email protected]>:
>>> On Tue, Mar 1, 2016 at 12:57 PM, Andrey Ryabinin <[email protected]> wrote:
>>>>>>
>>>>>>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>>>>>>
>>>>>> STACK_ALLOC_ORDER = 4 - that's a lot. Do you really need that much?
>>>>>
>>>>> Part of the issue the atomic context above. When we can't allocate
>>>>> memory we still want to save the stack trace. When we have less than
>>>>> STACK_ALLOC_ORDER memory, we try to preallocate another
>>>>> STACK_ALLOC_ORDER in advance. So in the worst case, we have
>>>>> STACK_ALLOC_ORDER memory and that should be enough to handle all
>>>>> kmalloc/kfree in the atomic context. 1 page does not look enough. I
>>>>> think Alex did some measuring of the failure race (when we are out of
>>>>> memory and can't allocate more).
>>>>>
>>>>
>>>> A lot of 4-order pages will lead to high fragmentation. You don't need physically contiguous memory here,
>>>> so try to use vmalloc(). It is slower, but fragmentation won't be problem.
>>> I've tried using vmalloc(), but turned out it's calling KASAN hooks
>>> again. Dealing with reentrancy in this case sounds like an overkill.
>>
>> We'll have to deal with recursion eventually. Using stackdepot for
>> page owner will cause recursion.
>>
>>> Given that we only require 9 Mb most of the time, is allocating
>>> physical pages still a problem?
>>>
>>
>> This is not about size, this about fragmentation. vmalloc allows to
>> utilize available low-order pages,
>> hence reduce the fragmentation.
> I've attempted to add __vmalloc(STACK_ALLOC_SIZE, alloc_flags,
> PAGE_KERNEL) (also tried vmalloc(STACK_ALLOC_SIZE)) instead of
> page_alloc() and am now getting a crash in
> kmem_cache_alloc_node_trace() in mm/slab.c, because it doesn't allow
> the kmem_cache pointer to be NULL (it's dereferenced when calling
> trace_kmalloc_node()).
>
> Steven, do you know if this because of my code violating some contract
> (e.g. I'm calling vmalloc() too early, when kmalloc_caches[] haven't
> been initialized),

Probably. kmem_cache_init() goes before vmalloc_init().


> or is this a bug in kmem_cache_alloc_node_trace()
> itself?
>

2016-03-11 14:50:05

by Alexander Potapenko

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB

On Fri, Mar 11, 2016 at 12:43 PM, Andrey Ryabinin
<[email protected]> wrote:
>
>
> On 03/11/2016 02:18 PM, Alexander Potapenko wrote:
>> On Thu, Mar 10, 2016 at 5:58 PM, Andrey Ryabinin <[email protected]> wrote:
>>> 2016-03-08 14:42 GMT+03:00 Alexander Potapenko <[email protected]>:
>>>> On Tue, Mar 1, 2016 at 12:57 PM, Andrey Ryabinin <[email protected]> wrote:
>>>>>>>
>>>>>>>> + page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
>>>>>>>
>>>>>>> STACK_ALLOC_ORDER = 4 - that's a lot. Do you really need that much?
>>>>>>
>>>>>> Part of the issue the atomic context above. When we can't allocate
>>>>>> memory we still want to save the stack trace. When we have less than
>>>>>> STACK_ALLOC_ORDER memory, we try to preallocate another
>>>>>> STACK_ALLOC_ORDER in advance. So in the worst case, we have
>>>>>> STACK_ALLOC_ORDER memory and that should be enough to handle all
>>>>>> kmalloc/kfree in the atomic context. 1 page does not look enough. I
>>>>>> think Alex did some measuring of the failure race (when we are out of
>>>>>> memory and can't allocate more).
>>>>>>
>>>>>
>>>>> A lot of 4-order pages will lead to high fragmentation. You don't need physically contiguous memory here,
>>>>> so try to use vmalloc(). It is slower, but fragmentation won't be problem.
>>>> I've tried using vmalloc(), but turned out it's calling KASAN hooks
>>>> again. Dealing with reentrancy in this case sounds like an overkill.
>>>
>>> We'll have to deal with recursion eventually. Using stackdepot for
>>> page owner will cause recursion.
>>>
>>>> Given that we only require 9 Mb most of the time, is allocating
>>>> physical pages still a problem?
>>>>
>>>
>>> This is not about size, this about fragmentation. vmalloc allows to
>>> utilize available low-order pages,
>>> hence reduce the fragmentation.
>> I've attempted to add __vmalloc(STACK_ALLOC_SIZE, alloc_flags,
>> PAGE_KERNEL) (also tried vmalloc(STACK_ALLOC_SIZE)) instead of
>> page_alloc() and am now getting a crash in
>> kmem_cache_alloc_node_trace() in mm/slab.c, because it doesn't allow
>> the kmem_cache pointer to be NULL (it's dereferenced when calling
>> trace_kmalloc_node()).
>>
>> Steven, do you know if this because of my code violating some contract
>> (e.g. I'm calling vmalloc() too early, when kmalloc_caches[] haven't
>> been initialized),
>
> Probably. kmem_cache_init() goes before vmalloc_init().
The solution I'm currently testing is to introduce a per-CPU recursion
flag that depot_save_stack() checks, bailing out if it's set.
In addition, I look at |kmalloc_caches[KMALLOC_SHIFT_HIGH]| and
in_interrupt() to see whether vmalloc() is available.
In case it is not, I fall back to alloc_pages().

Right now (after 20 minutes of running Trinity) vmalloc() has been
called 490 times and alloc_pages() only 13 times.
I hope it's now much better from the fragmentation point of view.
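
Roughly, such a scheme could look like the sketch below (names and details are assumptions, not the final patch): a per-CPU flag breaks the vmalloc() -> KASAN -> depot_save_stack() recursion, and vmalloc() is only attempted when it is known to be safe.

#include <linux/percpu.h>
#include <linux/vmalloc.h>
#include <linux/slab.h>
#include <linux/gfp.h>
#include <linux/hardirq.h>
#include <linux/mm.h>

#define STACK_ALLOC_ORDER 4
#define STACK_ALLOC_SIZE  (PAGE_SIZE << STACK_ALLOC_ORDER)

static DEFINE_PER_CPU(int, depot_recursion);

static void *depot_alloc_slab(gfp_t alloc_flags)
{
        void *prealloc = NULL;

        /* Bail out if we got here from our own allocation. */
        if (this_cpu_read(depot_recursion))
                return NULL;

        this_cpu_write(depot_recursion, 1);
        /* vmalloc() may sleep and calls back into the KASAN hooks. */
        if (!in_interrupt() && kmalloc_caches[KMALLOC_SHIFT_HIGH])
                prealloc = __vmalloc(STACK_ALLOC_SIZE, alloc_flags, PAGE_KERNEL);
        if (!prealloc) {
                struct page *page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);

                if (page)
                        prealloc = page_address(page);
        }
        this_cpu_write(depot_recursion, 0);

        return prealloc;
}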
>
>> or is this a bug in kmem_cache_alloc_node_trace()
>> itself?
>>



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2016-03-11 16:10:39

by Steven Rostedt

Subject: Re: [PATCH v4 5/7] mm, kasan: Stackdepot implementation. Enable stackdepot for SLAB

On Fri, 11 Mar 2016 14:43:45 +0300
Andrey Ryabinin <[email protected]> wrote:


> >> This is not about size, this about fragmentation. vmalloc allows to
> >> utilize available low-order pages,
> >> hence reduce the fragmentation.
> > I've attempted to add __vmalloc(STACK_ALLOC_SIZE, alloc_flags,
> > PAGE_KERNEL) (also tried vmalloc(STACK_ALLOC_SIZE)) instead of
> > page_alloc() and am now getting a crash in
> > kmem_cache_alloc_node_trace() in mm/slab.c, because it doesn't allow
> > the kmem_cache pointer to be NULL (it's dereferenced when calling
> > trace_kmalloc_node()).
> >
> > Steven, do you know if this because of my code violating some contract
> > (e.g. I'm calling vmalloc() too early, when kmalloc_caches[] haven't
> > been initialized),
>
> Probably. kmem_cache_init() goes before vmalloc_init().

Agreed, that function can not be called with cachep NULL, nor can it be
called before kmem_cache is set up to point to kmem_cache_boot.

-- Steve