2019-10-29 10:37:09

by Kevin Hao

[permalink] [raw]
Subject: [PATCH] dump_stack: Avoid the livelock of the dump_lock

In the current code, we uses the atomic_cmpxchg() to serialize the
output of the dump_stack(), but this implementation suffers the
thundering herd problem. We have observed such kind of livelock on a
Marvell cn96xx board(24 cpus) when heavily using the dump_stack() in
a kprobe handler. Actually we can use a spinlock here and leverage the
implementation of the spinlock(either ticket or queued spinlock) to
mediate such kind of livelock. Since the dump_stack() runs with the
irq disabled, so use the raw_spinlock_t to make it safe for rt kernel.

Signed-off-by: Kevin Hao <[email protected]>
---
lib/dump_stack.c | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 5cff72f18c4a..fa971f75f1e2 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -83,37 +83,35 @@ static void __dump_stack(void)
* Architectures can override this implementation by implementing its own.
*/
#ifdef CONFIG_SMP
-static atomic_t dump_lock = ATOMIC_INIT(-1);
+static DEFINE_RAW_SPINLOCK(dump_lock);
+static int dump_cpu = -1;

asmlinkage __visible void dump_stack(void)
{
unsigned long flags;
int was_locked;
- int old;
int cpu;

/*
* Permit this cpu to perform nested stack dumps while serialising
* against other CPUs
*/
-retry:
local_irq_save(flags);
cpu = smp_processor_id();
- old = atomic_cmpxchg(&dump_lock, -1, cpu);
- if (old == -1) {
+
+ if (READ_ONCE(dump_cpu) != cpu) {
+ raw_spin_lock(&dump_lock);
+ dump_cpu = cpu;
was_locked = 0;
- } else if (old == cpu) {
+ } else
was_locked = 1;
- } else {
- local_irq_restore(flags);
- cpu_relax();
- goto retry;
- }

__dump_stack();

- if (!was_locked)
- atomic_set(&dump_lock, -1);
+ if (!was_locked) {
+ dump_cpu = -1;
+ raw_spin_unlock(&dump_lock);
+ }

local_irq_restore(flags);
}
--
2.14.4


2019-10-29 15:27:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] dump_stack: Avoid the livelock of the dump_lock

On Tue, Oct 29, 2019 at 3:09 PM Kevin Hao <[email protected]> wrote:
>
> Do you mean something like this?

Yeah, except I'd make it be something like

local_irq_restore(flags);
- cpu_relax();
+ do { cpu_relax(); } while (atomic_read(&dump_lock) != -1);
goto retry;

instead, ie doing the cpu_relax() inside that loop.

Obviously untested.

Linus

2019-10-29 19:42:13

by Kevin Hao

[permalink] [raw]
Subject: Re: [PATCH] dump_stack: Avoid the livelock of the dump_lock

On Tue, Oct 29, 2019 at 01:46:40PM +0100, Linus Torvalds wrote:
> Sorry for html crud, I'm at a conference without my laptop, so reading email on
> my phone..
>
> On Tue, Oct 29, 2019, 10:31 Kevin Hao <[email protected]> wrote:
>
> In the current code, we uses the atomic_cmpxchg() to serialize the
> output of the dump_stack(), but this implementation suffers the
> thundering herd problem. Actually we can use a spinlock here and leverage
> the
>
> implementation of the spinlock(either ticket or queued spinlock) to
> mediate such kind of livelock. Since the dump_stack() runs with the
> irq disabled, so use the raw_spinlock_t to make it safe for rt kernel.
>
>
> The problem with this is that I think it will deadlock easily with NMI's, won't
> it?
>
> There is a window between getting the spin lock and setting 'dump_cpu' where an
> incoming NMI would then deadlock because it also tries to get the lock, because
> it hasn't seen the fact that this cpu already got it.

Indeed, sorry I missed that.

>
> The cmpxchg is supposed to protect against that, and make the locking be atomic
> with the CPU setting.
>
> Can you try instead to make the retry loop not jump right back to the cmpxchg,
> but start looping just reading the CPU value, and only jumping back when it
> becomes -1?

Do you mean something like this?
diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 5cff72f18c4a..a0f1293a3638 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -107,6 +107,7 @@ asmlinkage __visible void dump_stack(void)
} else {
local_irq_restore(flags);
cpu_relax();
+ while (atomic_read(&dump_lock) != -1);
goto retry;
}

It does help. I will send a v2 if it pass more stress test.

Thanks,
Kevin

>
> ? ? ? Linus


Attachments:
(No filename) (1.78 kB)
signature.asc (499.00 B)
Download all attachments