2016-10-22 15:17:31

by Christoph Hellwig

Subject: reduce latency in __purge_vmap_area_lazy

Hi all,

this is my spin at sorting out the long lock hold times in
__purge_vmap_area_lazy. It is based on the patch Joel sent this
week. I don't have any good numbers for it, but it survived an
xfstests run on XFS, which is a significant vmalloc user.


2016-10-22 15:17:33

by Christoph Hellwig

Subject: [PATCH 2/7] mm: remove free_unmap_vmap_area_addr

Just inline it into the only caller.

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Jisheng Zhang <[email protected]>
---
mm/vmalloc.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 45de736..cf1a5ab 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -731,16 +731,6 @@ static struct vmap_area *find_vmap_area(unsigned long addr)
return va;
}

-static void free_unmap_vmap_area_addr(unsigned long addr)
-{
- struct vmap_area *va;
-
- va = find_vmap_area(addr);
- BUG_ON(!va);
- free_unmap_vmap_area(va);
-}
-
-
/*** Per cpu kva allocator ***/

/*
@@ -1098,6 +1088,7 @@ void vm_unmap_ram(const void *mem, unsigned int count)
{
unsigned long size = (unsigned long)count << PAGE_SHIFT;
unsigned long addr = (unsigned long)mem;
+ struct vmap_area *va;

BUG_ON(!addr);
BUG_ON(addr < VMALLOC_START);
@@ -1107,10 +1098,14 @@ void vm_unmap_ram(const void *mem, unsigned int count)
debug_check_no_locks_freed(mem, size);
vmap_debug_free_range(addr, addr+size);

- if (likely(count <= VMAP_MAX_ALLOC))
+ if (likely(count <= VMAP_MAX_ALLOC)) {
vb_free(mem, size);
- else
- free_unmap_vmap_area_addr(addr);
+ return;
+ }
+
+ va = find_vmap_area(addr);
+ BUG_ON(!va);
+ free_unmap_vmap_area(va);
}
EXPORT_SYMBOL(vm_unmap_ram);

--
2.1.4

2016-10-22 15:17:41

by Christoph Hellwig

Subject: [PATCH 4/7] mm: defer vmalloc from atomic context

We want to be able to use a sleeping lock for freeing vmap to keep
latency down. For this we need to use the deferred vfree mechanisms
not only from interrupt, but from any atomic context.

Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a4e2cec..bcc1a64 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1509,7 +1509,7 @@ void vfree(const void *addr)

if (!addr)
return;
- if (unlikely(in_interrupt())) {
+ if (unlikely(in_atomic())) {
struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
if (llist_add((struct llist_node *)addr, &p->list))
schedule_work(&p->wq);
--
2.1.4
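
For context, the deferred vfree mechanism referred to in the changelog is the
existing per-cpu machinery in mm/vmalloc.c: the address is pushed onto a
per-cpu llist and a workqueue item frees everything later, in process context.
A simplified sketch of that existing machinery (abbreviated for illustration,
not a verbatim copy of the kernel source):

#include <linux/kernel.h>
#include <linux/llist.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

/* Simplified sketch of the deferred-vfree machinery in mm/vmalloc.c. */
struct vfree_deferred {
	struct llist_head list;		/* addresses queued for freeing */
	struct work_struct wq;		/* drains the list in process context */
};
static DEFINE_PER_CPU(struct vfree_deferred, vfree_deferred);

static void free_work(struct work_struct *w)
{
	struct vfree_deferred *p = container_of(w, struct vfree_deferred, wq);
	struct llist_node *t, *llnode;

	/* Pop everything queued so far and free it now that we may sleep. */
	llist_for_each_safe(llnode, t, llist_del_all(&p->list))
		__vunmap((void *)llnode, 1);
}

The patch only widens the condition that routes callers onto this path from
in_interrupt() to in_atomic(); the queueing itself is unchanged.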

2016-10-22 15:17:44

by Christoph Hellwig

Subject: [PATCH 6/7] mm: turn vmap_purge_lock into a mutex

The purge_lock spinlock causes high latencies on non-RT kernels. This
has been reported multiple times on lkml [1] [2] and affects
applications like audio.

This patch replaces it with a mutex to allow preemption while holding
the lock.

Thanks to Joel Fernandes for the detailed report and analysis as well
as an earlier attempt at fixing this issue.

[1] http://lists.openwall.net/linux-kernel/2016/03/23/29
[2] https://lkml.org/lkml/2016/10/9/59

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Jisheng Zhang <[email protected]>
---
mm/vmalloc.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0e7f523..23d6797 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -606,7 +606,7 @@ static atomic_t vmap_lazy_nr = ATOMIC_INIT(0);
* by this lock, but we want to avoid concurrent calls for performance
* reasons and to make the pcpu_get_vm_areas more deterministic.
*/
-static DEFINE_SPINLOCK(vmap_purge_lock);
+static DEFINE_MUTEX(vmap_purge_lock);

/* for per-CPU blocks */
static void purge_fragmented_blocks_allcpus(void);
@@ -660,9 +660,9 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
*/
static void try_purge_vmap_area_lazy(void)
{
- if (spin_trylock(&vmap_purge_lock)) {
+ if (mutex_trylock(&vmap_purge_lock)) {
__purge_vmap_area_lazy(ULONG_MAX, 0);
- spin_unlock(&vmap_purge_lock);
+ mutex_unlock(&vmap_purge_lock);
}
}

@@ -671,10 +671,10 @@ static void try_purge_vmap_area_lazy(void)
*/
static void purge_vmap_area_lazy(void)
{
- spin_lock(&vmap_purge_lock);
+ mutex_lock(&vmap_purge_lock);
purge_fragmented_blocks_allcpus();
__purge_vmap_area_lazy(ULONG_MAX, 0);
- spin_unlock(&vmap_purge_lock);
+ mutex_unlock(&vmap_purge_lock);
}

/*
@@ -1063,11 +1063,11 @@ void vm_unmap_aliases(void)
rcu_read_unlock();
}

- spin_lock(&vmap_purge_lock);
+ mutex_lock(&vmap_purge_lock);
purge_fragmented_blocks_allcpus();
if (!__purge_vmap_area_lazy(start, end) && flush)
flush_tlb_kernel_range(start, end);
- spin_unlock(&vmap_purge_lock);
+ mutex_unlock(&vmap_purge_lock);
}
EXPORT_SYMBOL_GPL(vm_unmap_aliases);

--
2.1.4

2016-10-22 15:17:52

by Christoph Hellwig

Subject: [PATCH 7/7] mm: add preempt points into __purge_vmap_area_lazy

From: Joel Fernandes <[email protected]>

Use cond_resched_lock to avoid holding the vmap_area_lock for a
potentially long time and thus creating bad latencies for various
workloads.

Signed-off-by: Joel Fernandes <[email protected]>
[hch: split from a larger patch by Joel, wrote the crappy changelog]
Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Jisheng Zhang <[email protected]>
---
mm/vmalloc.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 23d6797..6c8b921 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -628,7 +628,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
struct llist_node *valist;
struct vmap_area *va;
struct vmap_area *n_va;
- int nr = 0;
+ bool do_free = false;

lockdep_assert_held(&vmap_purge_lock);

@@ -638,18 +638,22 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
start = va->va_start;
if (va->va_end > end)
end = va->va_end;
- nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
+ do_free = true;
}

- if (!nr)
+ if (!do_free)
return false;

- atomic_sub(nr, &vmap_lazy_nr);
flush_tlb_kernel_range(start, end);

spin_lock(&vmap_area_lock);
- llist_for_each_entry_safe(va, n_va, valist, purge_list)
+ llist_for_each_entry_safe(va, n_va, valist, purge_list) {
+ int nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
+
__free_vmap_area(va);
+ atomic_sub(nr, &vmap_lazy_nr);
+ cond_resched_lock(&vmap_area_lock);
+ }
spin_unlock(&vmap_area_lock);
return true;
}
--
2.1.4
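
For readers not familiar with cond_resched_lock(): when a reschedule is pending
(or the lock is contended), it drops the given spinlock, lets other tasks run,
and then takes the lock again. A rough behavioural sketch, not the actual
implementation in kernel/sched/core.c:

#include <linux/sched.h>
#include <linux/spinlock.h>

/* Rough behavioural sketch of cond_resched_lock(), simplified for illustration. */
static int cond_resched_lock_sketch(spinlock_t *lock)
{
	int resched = need_resched();

	if (resched || spin_needbreak(lock)) {
		spin_unlock(lock);
		if (resched)
			cond_resched();	/* let other work run */
		spin_lock(lock);
		return 1;
	}
	return 0;
}

In __purge_vmap_area_lazy() above this means vmap_area_lock is released between
individual __free_vmap_area() calls whenever someone else needs the CPU or the
lock, which is what bounds the lock hold time.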

2016-10-22 15:17:39

by Christoph Hellwig

Subject: [PATCH 3/7] mm: refactor __purge_vmap_area_lazy

Move the purge_lock synchronization to the callers, move the call to
purge_fragmented_blocks_allcpus from the beginning of the function to
the callers that need it, move the force_flush behavior to the caller
that needs it, and pass start and end by value instead of by reference.

No change in behavior.

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Jisheng Zhang <[email protected]>
---
mm/vmalloc.c | 80 ++++++++++++++++++++++++++----------------------------------
1 file changed, 35 insertions(+), 45 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index cf1a5ab..a4e2cec 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -601,6 +601,13 @@ static unsigned long lazy_max_pages(void)

static atomic_t vmap_lazy_nr = ATOMIC_INIT(0);

+/*
+ * Serialize vmap purging. There is no actual critical section protected
+ * by this lock, but we want to avoid concurrent calls for performance
+ * reasons and to make the pcpu_get_vm_areas more deterministic.
+ */
+static DEFINE_SPINLOCK(vmap_purge_lock);
+
/* for per-CPU blocks */
static void purge_fragmented_blocks_allcpus(void);

@@ -615,59 +622,36 @@ void set_iounmap_nonlazy(void)

/*
* Purges all lazily-freed vmap areas.
- *
- * If sync is 0 then don't purge if there is already a purge in progress.
- * If force_flush is 1, then flush kernel TLBs between *start and *end even
- * if we found no lazy vmap areas to unmap (callers can use this to optimise
- * their own TLB flushing).
- * Returns with *start = min(*start, lowest purged address)
- * *end = max(*end, highest purged address)
*/
-static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
- int sync, int force_flush)
+static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
{
- static DEFINE_SPINLOCK(purge_lock);
struct llist_node *valist;
struct vmap_area *va;
struct vmap_area *n_va;
int nr = 0;

- /*
- * If sync is 0 but force_flush is 1, we'll go sync anyway but callers
- * should not expect such behaviour. This just simplifies locking for
- * the case that isn't actually used at the moment anyway.
- */
- if (!sync && !force_flush) {
- if (!spin_trylock(&purge_lock))
- return;
- } else
- spin_lock(&purge_lock);
-
- if (sync)
- purge_fragmented_blocks_allcpus();
+ lockdep_assert_held(&vmap_purge_lock);

valist = llist_del_all(&vmap_purge_list);
llist_for_each_entry(va, valist, purge_list) {
- if (va->va_start < *start)
- *start = va->va_start;
- if (va->va_end > *end)
- *end = va->va_end;
+ if (va->va_start < start)
+ start = va->va_start;
+ if (va->va_end > end)
+ end = va->va_end;
nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
}

- if (nr)
- atomic_sub(nr, &vmap_lazy_nr);
+ if (!nr)
+ return false;

- if (nr || force_flush)
- flush_tlb_kernel_range(*start, *end);
+ atomic_sub(nr, &vmap_lazy_nr);
+ flush_tlb_kernel_range(start, end);

- if (nr) {
- spin_lock(&vmap_area_lock);
- llist_for_each_entry_safe(va, n_va, valist, purge_list)
- __free_vmap_area(va);
- spin_unlock(&vmap_area_lock);
- }
- spin_unlock(&purge_lock);
+ spin_lock(&vmap_area_lock);
+ llist_for_each_entry_safe(va, n_va, valist, purge_list)
+ __free_vmap_area(va);
+ spin_unlock(&vmap_area_lock);
+ return true;
}

/*
@@ -676,9 +660,10 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
*/
static void try_purge_vmap_area_lazy(void)
{
- unsigned long start = ULONG_MAX, end = 0;
-
- __purge_vmap_area_lazy(&start, &end, 0, 0);
+ if (spin_trylock(&vmap_purge_lock)) {
+ __purge_vmap_area_lazy(ULONG_MAX, 0);
+ spin_unlock(&vmap_purge_lock);
+ }
}

/*
@@ -686,9 +671,10 @@ static void try_purge_vmap_area_lazy(void)
*/
static void purge_vmap_area_lazy(void)
{
- unsigned long start = ULONG_MAX, end = 0;
-
- __purge_vmap_area_lazy(&start, &end, 1, 0);
+ spin_lock(&vmap_purge_lock);
+ purge_fragmented_blocks_allcpus();
+ __purge_vmap_area_lazy(ULONG_MAX, 0);
+ spin_unlock(&vmap_purge_lock);
}

/*
@@ -1075,7 +1061,11 @@ void vm_unmap_aliases(void)
rcu_read_unlock();
}

- __purge_vmap_area_lazy(&start, &end, 1, flush);
+ spin_lock(&vmap_purge_lock);
+ purge_fragmented_blocks_allcpus();
+ if (!__purge_vmap_area_lazy(start, end) && flush)
+ flush_tlb_kernel_range(start, end);
+ spin_unlock(&vmap_purge_lock);
}
EXPORT_SYMBOL_GPL(vm_unmap_aliases);

--
2.1.4

2016-10-22 15:18:20

by Christoph Hellwig

Subject: [PATCH 5/7] mm: mark all calls into the vmalloc subsystem as potentially sleeping

We will take a sleeping lock later in this series, so this adds the
proper safeguards.

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Jisheng Zhang <[email protected]>
---
mm/vmalloc.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bcc1a64..0e7f523 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -365,7 +365,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
BUG_ON(offset_in_page(size));
BUG_ON(!is_power_of_2(align));

- might_sleep_if(gfpflags_allow_blocking(gfp_mask));
+ might_sleep();

va = kmalloc_node(sizeof(struct vmap_area),
gfp_mask & GFP_RECLAIM_MASK, node);
@@ -1037,6 +1037,8 @@ void vm_unmap_aliases(void)
if (unlikely(!vmap_initialized))
return;

+ might_sleep();
+
for_each_possible_cpu(cpu) {
struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
struct vmap_block *vb;
@@ -1080,6 +1082,7 @@ void vm_unmap_ram(const void *mem, unsigned int count)
unsigned long addr = (unsigned long)mem;
struct vmap_area *va;

+ might_sleep();
BUG_ON(!addr);
BUG_ON(addr < VMALLOC_START);
BUG_ON(addr > VMALLOC_END);
@@ -1431,6 +1434,8 @@ struct vm_struct *remove_vm_area(const void *addr)
{
struct vmap_area *va;

+ might_sleep();
+
va = find_vmap_area((unsigned long)addr);
if (va && va->flags & VM_VM_AREA) {
struct vm_struct *vm = va->vm;
--
2.1.4
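
To illustrate what the annotations buy us (hypothetical caller, not part of the
patch): with CONFIG_DEBUG_ATOMIC_SLEEP enabled, might_sleep() warns the moment
an annotated function is entered from atomic context, instead of the problem
only showing up once vmap_purge_lock actually has to block later in the series.

#include <linux/spinlock.h>
#include <linux/vmalloc.h>

/*
 * Hypothetical, deliberately buggy caller, shown for illustration only:
 * vm_unmap_ram() may now sleep, and the might_sleep() added above makes
 * this warn immediately when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
 */
static void bad_unmap_under_spinlock(spinlock_t *lock, void *mem,
				     unsigned int count)
{
	spin_lock(lock);
	vm_unmap_ram(mem, count);	/* might_sleep() fires a warning here */
	spin_unlock(lock);
}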

2016-10-22 15:18:57

by Christoph Hellwig

Subject: [PATCH 1/7] mm: remove free_unmap_vmap_area_noflush

Just inline it into the only caller.

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Jisheng Zhang <[email protected]>
---
mm/vmalloc.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f2481cb..45de736 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -711,22 +711,13 @@ static void free_vmap_area_noflush(struct vmap_area *va)
}

/*
- * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
- * called for the correct range previously.
- */
-static void free_unmap_vmap_area_noflush(struct vmap_area *va)
-{
- unmap_vmap_area(va);
- free_vmap_area_noflush(va);
-}
-
-/*
* Free and unmap a vmap area
*/
static void free_unmap_vmap_area(struct vmap_area *va)
{
flush_cache_vunmap(va->va_start, va->va_end);
- free_unmap_vmap_area_noflush(va);
+ unmap_vmap_area(va);
+ free_vmap_area_noflush(va);
}

static struct vmap_area *find_vmap_area(unsigned long addr)
--
2.1.4

2016-10-25 05:30:22

by Andrey Ryabinin

Subject: Re: [PATCH 4/7] mm: defer vmalloc from atomic context



On 10/22/2016 06:17 PM, Christoph Hellwig wrote:
> We want to be able to use a sleeping lock for freeing vmap to keep
> latency down. For this we need to use the deferred vfree mechanisms
> not only from interrupt, but from any atomic context.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> mm/vmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a4e2cec..bcc1a64 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1509,7 +1509,7 @@ void vfree(const void *addr)
>
> if (!addr)
> return;
> - if (unlikely(in_interrupt())) {
> + if (unlikely(in_atomic())) {

in_atomic() cannot always detect atomic context, thus it shouldn't be used here.
You can add something like vfree_in_atomic() and use it in atomic call sites.

> struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
> if (llist_add((struct llist_node *)addr, &p->list))
> schedule_work(&p->wq);
>
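
For context on why in_atomic() is unreliable here: it is defined purely in
terms of the preempt count, and on !CONFIG_PREEMPT kernels spin_lock() does not
touch the preempt count, so a plain spinlocked section is invisible to it.
Roughly, simplified from include/linux/preempt.h (illustrative, not verbatim):

/* Simplified from include/linux/preempt.h, for illustration only. */
#define in_atomic()	(preempt_count() != 0)
#define in_interrupt()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK))

/*
 * With CONFIG_PREEMPT, spin_lock() increments the preempt count, so
 * in_atomic() does catch spinlocked sections.  Without CONFIG_PREEMPT it
 * does not, so in_atomic() can return 0 while a spinlock is held -- which
 * is exactly the case vfree() would need to detect here.
 */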

2016-11-05 03:43:20

by Joel Fernandes

Subject: Re: [PATCH 4/7] mm: defer vmalloc from atomic context

On Mon, Oct 24, 2016 at 8:44 AM, Andrey Ryabinin
<[email protected]> wrote:
>
>
> On 10/22/2016 06:17 PM, Christoph Hellwig wrote:
>> We want to be able to use a sleeping lock for freeing vmap to keep
>> latency down. For this we need to use the deferred vfree mechanisms
>> not only from interrupt, but from any atomic context.
>>
>> Signed-off-by: Christoph Hellwig <[email protected]>
>> ---
>> mm/vmalloc.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index a4e2cec..bcc1a64 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1509,7 +1509,7 @@ void vfree(const void *addr)
>>
>> if (!addr)
>> return;
>> - if (unlikely(in_interrupt())) {
>> + if (unlikely(in_atomic())) {
>
> in_atomic() cannot always detect atomic context, thus it shouldn't be used here.
> You can add something like vfree_in_atomic() and use it in atomic call sites.
>

So because in_atomic doesn't work for !CONFIG_PREEMPT kernels, can we
always defer the work in these cases?

So for non-preemptible kernels, we always defer:

if (!IS_ENABLED(CONFIG_PREEMPT) || in_atomic()) {
// defer
}

Is this fine? Or any other ideas?

Thanks,
Joel

2016-11-07 15:09:51

by Christoph Hellwig

Subject: Re: [PATCH 4/7] mm: defer vmalloc from atomic context

On Mon, Nov 07, 2016 at 06:01:45PM +0300, Andrey Ryabinin wrote:
> > So because in_atomic doesn't work for !CONFIG_PREEMPT kernels, can we
> > always defer the work in these cases?
> >
> > So for non-preemptible kernels, we always defer:
> >
> > if (!IS_ENABLED(CONFIG_PREEMPT) || in_atomic()) {
> > // defer
> > }
> >
> > Is this fine? Or any other ideas?
> >
>
> What's wrong with my idea?
> We can add vfree_in_atomic() and use it to free vmapped stacks
> and for any other places where vfree() used 'in_atomic() && !in_interrupt()' context.

I somehow missed the mail, sorry. That being said, always deferring is
going to suck badly in terms of performance, so I'm not sure it's all
that good an idea.

vfree_in_atomic sounds good, but I wonder if we'll need to annotate
more callers than just the stacks. I'm fairly busy this week, do you
want to give that a spin? Otherwise I'll give it a try towards the
end of this week or next week.

2016-11-07 16:50:51

by Joel Fernandes

Subject: Re: [PATCH 4/7] mm: defer vmalloc from atomic context

On Mon, Nov 7, 2016 at 7:01 AM, Andrey Ryabinin <[email protected]> wrote:
> On 11/05/2016 06:43 AM, Joel Fernandes wrote:
>> On Mon, Oct 24, 2016 at 8:44 AM, Andrey Ryabinin
>> <[email protected]> wrote:
>>>
>>>
>>> On 10/22/2016 06:17 PM, Christoph Hellwig wrote:
>>>> We want to be able to use a sleeping lock for freeing vmap to keep
>>>> latency down. For this we need to use the deferred vfree mechanisms
>>>> not only from interrupt, but from any atomic context.
>>>>
>>>> Signed-off-by: Christoph Hellwig <[email protected]>
>>>> ---
>>>> mm/vmalloc.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>>> index a4e2cec..bcc1a64 100644
>>>> --- a/mm/vmalloc.c
>>>> +++ b/mm/vmalloc.c
>>>> @@ -1509,7 +1509,7 @@ void vfree(const void *addr)
>>>>
>>>> if (!addr)
>>>> return;
>>>> - if (unlikely(in_interrupt())) {
>>>> + if (unlikely(in_atomic())) {
>>>
>>> in_atomic() cannot always detect atomic context, thus it shouldn't be used here.
>>> You can add something like vfree_in_atomic() and use it in atomic call sites.
>>>
>>
>> So because in_atomic doesn't work for !CONFIG_PREEMPT kernels, can we
>> always defer the work in these cases?
>>
>> So for non-preemptible kernels, we always defer:
>>
>> if (!IS_ENABLED(CONFIG_PREEMPT) || in_atomic()) {
>> // defer
>> }
>>
>> Is this fine? Or any other ideas?
>>
>
> What's wrong with my idea?
> We can add vfree_in_atomic() and use it to free vmapped stacks
> and for any other places where vfree() used 'in_atomic() && !in_interrupt()' context.

Yes, this sounds like a better idea as there may not be that many
callers and my idea may hurt perf.

Thanks,

Joel

2016-11-07 18:35:00

by Andrey Ryabinin

Subject: Re: [PATCH 4/7] mm: defer vmalloc from atomic context

On 11/05/2016 06:43 AM, Joel Fernandes wrote:
> On Mon, Oct 24, 2016 at 8:44 AM, Andrey Ryabinin
> <[email protected]> wrote:
>>
>>
>> On 10/22/2016 06:17 PM, Christoph Hellwig wrote:
>>> We want to be able to use a sleeping lock for freeing vmap to keep
>>> latency down. For this we need to use the deferred vfree mechanisms
>>> not only from interrupt, but from any atomic context.
>>>
>>> Signed-off-by: Christoph Hellwig <[email protected]>
>>> ---
>>> mm/vmalloc.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index a4e2cec..bcc1a64 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -1509,7 +1509,7 @@ void vfree(const void *addr)
>>>
>>> if (!addr)
>>> return;
>>> - if (unlikely(in_interrupt())) {
>>> + if (unlikely(in_atomic())) {
>>
>> in_atomic() cannot always detect atomic context, thus it shouldn't be used here.
>> You can add something like vfree_in_atomic() and use it in atomic call sites.
>>
>
> So because in_atomic doesn't work for !CONFIG_PREEMPT kernels, can we
> always defer the work in these cases?
>
> So for non-preemptible kernels, we always defer:
>
> if (!IS_ENABLED(CONFIG_PREEMPT) || in_atomic()) {
> // defer
> }
>
> Is this fine? Or any other ideas?
>

What's wrong with my idea?
We can add vfree_in_atomic() and use it to free vmapped stacks
and in any other place where vfree() is used in 'in_atomic() && !in_interrupt()' context.

2016-11-08 15:21:53

by Andrey Ryabinin

Subject: [PATCH 1/3] mm/vmalloc: add vfree_atomic()

We are going to use a sleeping lock for freeing vmap. However, some
vfree() users want to free memory from atomic (but not from interrupt)
context. For this we add vfree_atomic() - a deferred variant of vfree()
which can be used in any atomic context (except NMIs).

Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jisheng Zhang <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: John Dias <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
---
include/linux/vmalloc.h | 1 +
mm/vmalloc.c | 36 ++++++++++++++++++++++++++++++------
2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3d9d786..d68edff 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -82,6 +82,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
const void *caller);

extern void vfree(const void *addr);
+extern void vfree_atomic(const void *addr);

extern void *vmap(struct page **pages, unsigned int count,
unsigned long flags, pgprot_t prot);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 719ced3..b0edc67 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1471,7 +1471,33 @@ static void __vunmap(const void *addr, int deallocate_pages)
kfree(area);
return;
}
-
+
+static inline void __vfree_deferred(const void *addr)
+{
+ struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
+
+ if (llist_add((struct llist_node *)addr, &p->list))
+ schedule_work(&p->wq);
+}
+
+/**
+ * vfree_atomic - release memory allocated by vmalloc()
+ * @addr: memory base address
+ *
+ * This one is just like vfree() but can be called in any atomic context
+ * except NMIs.
+ */
+void vfree_atomic(const void *addr)
+{
+ BUG_ON(in_nmi());
+
+ kmemleak_free(addr);
+
+ if (!addr)
+ return;
+ __vfree_deferred(addr);
+}
+
/**
* vfree - release memory allocated by vmalloc()
* @addr: memory base address
@@ -1494,11 +1520,9 @@ void vfree(const void *addr)

if (!addr)
return;
- if (unlikely(in_interrupt())) {
- struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
- if (llist_add((struct llist_node *)addr, &p->list))
- schedule_work(&p->wq);
- } else
+ if (unlikely(in_interrupt()))
+ __vfree_deferred(addr);
+ else
__vunmap(addr, 1);
}
EXPORT_SYMBOL(vfree);
--
2.7.3

2016-11-08 15:21:57

by Andrey Ryabinin

Subject: [PATCH 2/3] kernel/fork: use vfree_atomic() to free thread stack

vfree() is going to use a sleeping lock. The thread stack is freed in
atomic context, therefore we must use vfree_atomic() here.

Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jisheng Zhang <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: John Dias <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
---
kernel/fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index fd85c68..417e94f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -229,7 +229,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
}
local_irq_restore(flags);

- vfree(tsk->stack);
+ vfree_atomic(tsk->stack);
return;
}
#endif
--
2.7.3

2016-11-08 15:22:57

by Andrey Ryabinin

Subject: [PATCH 3/3] x86/ldt: use vfree_atomic() to free ldt entries

vfree() is going to use a sleeping lock. free_ldt_struct()
may be called with preemption disabled, therefore we must
use vfree_atomic() here.

E.g. call trace:
vfree()
free_ldt_struct()
destroy_context_ldt()
__mmdrop()
finish_task_switch()
schedule_tail()
ret_from_fork()

Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jisheng Zhang <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: John Dias <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
---
arch/x86/kernel/ldt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 6707039..4d12cdf 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -93,7 +93,7 @@ static void free_ldt_struct(struct ldt_struct *ldt)

paravirt_free_ldt(ldt->entries, ldt->size);
if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
- vfree(ldt->entries);
+ vfree_atomic(ldt->entries);
else
free_page((unsigned long)ldt->entries);
kfree(ldt);
--
2.7.3

2016-11-08 17:36:49

by Andrey Ryabinin

Subject: Re: [PATCH 4/7] mm: defer vmalloc from atomic context



On 11/07/2016 06:09 PM, Christoph Hellwig wrote:
> On Mon, Nov 07, 2016 at 06:01:45PM +0300, Andrey Ryabinin wrote:
>>> So because in_atomic doesn't work for !CONFIG_PREEMPT kernels, can we
>>> always defer the work in these cases?
>>>
>>> So for non-preemptible kernels, we always defer:
>>>
>>> if (!IS_ENABLED(CONFIG_PREEMPT) || in_atomic()) {
>>> // defer
>>> }
>>>
>>> Is this fine? Or any other ideas?
>>>
>>
>> What's wrong with my idea?
>> We can add vfree_in_atomic() and use it to free vmapped stacks
>> and for any other places where vfree() used 'in_atomic() && !in_interrupt()' context.
>
> I somehow missed the mail, sorry. That being said, always deferring is
> going to suck badly in terms of performance, so I'm not sure it's all
> that good an idea.
>
> vfree_in_atomic sounds good, but I wonder if we'll need to annotate
> more callers than just the stacks. I'm fairly busy this week, do you
> want to give that a spin? Otherwise I'll give it a try towards the
> end of this week or next week.
>

Yeah, it appears that we need more annotations. I've found another case in free_ldt_struct(),
and I bet it won't be the last.
I'll send patches.