The purge_lock spinlock causes high latencies with non-RT kernels. This has
been reported multiple times on lkml [1] [2] and affects applications such as
audio.
In this patch, I replace the spinlock with an atomic refcount so that
preemption stays enabled during purge. This is OK to do since [3] builds the
lazy free list in advance and atomically retrieves it, so each instance of
purge operates on its own list. Since the individual vmap area frees are
themselves protected by a lock, this is safe.
The only remaining concern is that the old code had trylock behavior, so
preserve that by using the refcount to track in-progress purges and aborting,
as before, if a purge is already in flight. Lastly, decrement vmap_lazy_nr
as the vmap areas are freed, rather than in advance, so that vmap_lazy_nr is
not reduced too soon, as suggested in [2].
Tests:
x86_64 quad-core machine on kernel 4.8, run: cyclictest -p 99 -n
Concurrently, in a kernel module, vmalloc and vfree an 8K buffer in a loop.
Preemption configuration: CONFIG_PREEMPT__LL=y (low-latency desktop)
Without patch, cyclictest output:
policy: fifo: loadavg: 0.05 0.01 0.00 1/85 1272 Avg: 128 Max: 1177
policy: fifo: loadavg: 0.11 0.03 0.01 2/87 1447 Avg: 122 Max: 1897
policy: fifo: loadavg: 0.10 0.03 0.01 1/89 1656 Avg: 93 Max: 2886
With patch, cyclictest output:
policy: fifo: loadavg: 1.15 0.68 0.30 1/162 8399 Avg: 92 Max: 284
policy: fifo: loadavg: 1.21 0.71 0.32 2/164 9840 Avg: 94 Max: 296
policy: fifo: loadavg: 1.18 0.72 0.32 2/166 11253 Avg: 107 Max: 321
[1] http://lists.openwall.net/linux-kernel/2016/03/23/29
[2] https://lkml.org/lkml/2016/10/9/59
[3] https://lkml.org/lkml/2016/4/15/287
[thanks Chris for the cond_resched_lock ideas]
Cc: Chris Wilson <[email protected]>
Cc: Jisheng Zhang <[email protected]>
Cc: John Dias <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
---
v2 changes:
Updated test description in commit message and fixed typos.
mm/vmalloc.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 613d1d9..ab25966 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -626,11 +626,11 @@ void set_iounmap_nonlazy(void)
static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
int sync, int force_flush)
{
- static DEFINE_SPINLOCK(purge_lock);
+ static atomic_t purging;
struct llist_node *valist;
struct vmap_area *va;
struct vmap_area *n_va;
- int nr = 0;
+ int dofree = 0;
/*
* If sync is 0 but force_flush is 1, we'll go sync anyway but callers
@@ -638,10 +638,10 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
* the case that isn't actually used at the moment anyway.
*/
if (!sync && !force_flush) {
- if (!spin_trylock(&purge_lock))
+ if (atomic_cmpxchg(&purging, 0, 1))
return;
} else
- spin_lock(&purge_lock);
+ atomic_inc(&purging);
if (sync)
purge_fragmented_blocks_allcpus();
@@ -652,22 +652,23 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
*start = va->va_start;
if (va->va_end > *end)
*end = va->va_end;
- nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
+ dofree = 1;
}
- if (nr)
- atomic_sub(nr, &vmap_lazy_nr);
-
- if (nr || force_flush)
+ if (dofree || force_flush)
flush_tlb_kernel_range(*start, *end);
- if (nr) {
+ if (dofree) {
spin_lock(&vmap_area_lock);
- llist_for_each_entry_safe(va, n_va, valist, purge_list)
+ llist_for_each_entry_safe(va, n_va, valist, purge_list) {
__free_vmap_area(va);
+ atomic_sub(((va->va_end - va->va_start) >> PAGE_SHIFT),
+ &vmap_lazy_nr);
+ cond_resched_lock(&vmap_area_lock);
+ }
spin_unlock(&vmap_area_lock);
}
- spin_unlock(&purge_lock);
+ atomic_dec(&purging);
}
/*
--
2.8.0.rc3.226.g39d4020
FYI, we noticed the following commit:
https://github.com/0day-ci/linux Joel-Fernandes/mm-vmalloc-Replace-purge_lock-spinlock-with-atomic-refcount/20161015-184633
commit b1f58b69ba586afbe70d2f3931370fc1f701d1c2 ("mm: vmalloc: Replace purge_lock spinlock with atomic refcount")
in testcase: boot
on test machine: qemu-system-x86_64 -enable-kvm -smp 2 -m 512M
caused the changes below:
+----------------------------------------------------------------------------+------------+------------+
| | fe4cd888f7 | b1f58b69ba |
+----------------------------------------------------------------------------+------------+------------+
| boot_successes | 8 | 5 |
| boot_failures | 14 | 17 |
| Kernel_panic-not_syncing:VFS:Unable_to_mount_root_fs_on_unknown-block(#,#) | 2 | |
| calltrace:prepare_namespace | 2 | |
| BUG:kernel_reboot-without-warning_in_test_stage | 12 | 3 |
| BUG:sleeping_function_called_from_invalid_context_at_mm/vmalloc.c | 0 | 14 |
| invoked_oom-killer:gfp_mask=0x | 0 | 1 |
| Mem-Info | 0 | 1 |
+----------------------------------------------------------------------------+------------+------------+
[init] Using pid_max = 32768
[init] Started watchdog process, PID is 9385
[main] Main thread is alive.
[ 13.986544] BUG: sleeping function called from invalid context at mm/vmalloc.c:667
[ 13.988089] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 13.989281] 1 lock held by swapper/0/0:
[   13.989935] #0:  (vmap_area_lock){......}, at: [<ffffffff81112aec>] __purge_vmap_area_lazy+0x28e/0x306
[ 13.992284] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-mm1-00158-gb1f58b6 #1
[ 13.993297] ffffffff81c03d08 ffffffff8131753a ffffffff81c0b540 ffff880014581360
[ 13.994371] ffffffff81c03d20 ffffffff8108d64c ffff880014581120 ffffffff81c03d88
[ 13.995439] ffffffff81112b31 ffffffff811124e6 ffffffff81c03db0 0000000081112723
[ 13.996570] Call Trace:
[ 13.996933] [<ffffffff8131753a>] dump_stack+0x79/0x95
[ 13.997631] [<ffffffff8108d64c>] ___might_sleep+0x110/0x11f
[ 13.998395] [<ffffffff81112b31>] __purge_vmap_area_lazy+0x2d3/0x306
[ 13.999242] [<ffffffff811124e6>] ? pmd_offset+0x19/0x49
[ 14.000008] [<ffffffff81112cd4>] free_vmap_area_noflush+0x6d/0x72
[ 14.000834] [<ffffffff81113e43>] remove_vm_area+0x84/0x90
[ 14.001575] [<ffffffff81113e85>] __vunmap+0x36/0xae
[ 14.002247] [<ffffffff81113f88>] vfree+0x5f/0x62
[ 14.002885] [<ffffffff8106f177>] free_thread_stack+0x54/0xcb
[ 14.003656] [<ffffffff8106f5bd>] put_task_stack+0x2a/0x4f
[ 14.004451] [<ffffffff8108cfa1>] finish_task_switch+0x188/0x1b4
[ 14.005278] [<ffffffff816bf338>] __schedule+0x3c8/0x4a7
[ 14.006014] [<ffffffff816bf49b>] schedule+0x84/0x95
[ 14.006676] [<ffffffff816bf6d4>] schedule_preempt_disabled+0x10/0x19
[ 14.007544] [<ffffffff8109cce9>] cpu_startup_entry+0x1c7/0x1cc
[ 14.008386] [<ffffffff816ba0dc>] rest_init+0xb3/0xb9
[ 14.009086] [<ffffffff81f0bf25>] start_kernel+0x43a/0x447
[ 14.009812] [<ffffffff81f0b120>] ? early_idt_handler_array+0x120/0x120
[ 14.010702] [<ffffffff81f0b2b3>] x86_64_start_reservations+0x38/0x3a
[ 14.011564] [<ffffffff81f0b3bf>] x86_64_start_kernel+0x10a/0x117
[main] Setsockopt(1 6 16ba000 5a) on fd 7 [1:2:1]
[main] Setsockopt(1 9 16ba000 e3) on fd 9 [16:3:2]
[main] Setsockopt(1 2d 16ba000 4) on fd 10 [1:1:1]
Thanks,
Xiaolong
On Sat, 15 Oct 2016 03:42:42 -0700
Joel Fernandes <[email protected]> wrote:
> The purge_lock spinlock causes high latencies with non-RT kernels. This has
> been reported multiple times on lkml [1] [2] and affects applications such
> as audio.
>
> In this patch, I replace the spinlock with an atomic refcount so that
> preemption stays enabled during purge. This is OK to do since [3] builds the
> lazy free list in advance and atomically retrieves it, so each instance of
> purge operates on its own list. Since the individual vmap area frees are
> themselves protected by a lock, this is safe.
This is a good idea, and good results, but that's not what the spinlock was
for -- it was for enforcing the sync semantics.
Going this route, you'll have to audit callers for the changed behavior
and update the documentation of the sync parameter.
I suspect a better approach would be to instead use a mutex for this, and
require that all sync=1 callers be able to sleep. I would say that most
probably already can.
Thanks,
Nick
Hi Nick,
On Sun, Oct 16, 2016 at 9:00 PM, Nicholas Piggin <[email protected]> wrote:
> On Sat, 15 Oct 2016 03:42:42 -0700
> Joel Fernandes <[email protected]> wrote:
>
>> The purge_lock spinlock causes high latencies with non-RT kernels. This has
>> been reported multiple times on lkml [1] [2] and affects applications such
>> as audio.
>>
>> In this patch, I replace the spinlock with an atomic refcount so that
>> preemption stays enabled during purge. This is OK to do since [3] builds the
>> lazy free list in advance and atomically retrieves it, so each instance of
>> purge operates on its own list. Since the individual vmap area frees are
>> themselves protected by a lock, this is safe.
>
> This is a good idea, and good results, but that's not what the spinlock was
> for -- it was for enforcing the sync semantics.
>
> Going this route, you'll have to audit callers for the changed behavior
> and update the documentation of the sync parameter.
>
> I suspect a better approach would be to instead use a mutex for this, and
> require that all sync=1 callers be able to sleep. I would say that most
> probably already can.
Thanks, I agree mutex is the right way to fix this.
Regards,
Joel