2020-09-15 19:26:30

by Linus Torvalds

Subject: Re: Kernel Benchmarking

On Tue, Sep 15, 2020 at 12:01 PM Matthieu Baerts
<[email protected]> wrote:
>
> Earlier today, I got one trace with 'sysrq-T' but it is more than 1100
> lines. It is attached to this email also with a version from
> "decode_stacktrace.sh", I hope that's alright.

Yeah, there's nothing interesting there.

The only relevant tasks seem to be the packetdrill ones that are
blocked on the page lock. I don't see anything that looks even
*remotely* like it could be holding a page lock and be waiting for
anything else.

A couple of pipe readers, a number of parents waiting on their
children, one futex waiter, one select loop. Nothing at all
unexpected or remotely suspicious.

The packetdrill traces all look very similar to one another.

> I forgot one important thing, I was on top of David Miller's net-next
> branch by reflex. I can redo the traces on top of linux-next if needed.

Not likely an issue.

I'll go stare at the page lock code again to see if I've missed
anything. I still suspect it's a latent ABBA deadlock that is just
much *much* easier to trigger with the synchronous lock handoff, but I
don't see where it is.
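
For readers unfamiliar with the term, an "ABBA deadlock" is the case
where one task takes lock A and then wants lock B, while another task
takes lock B and then wants lock A. The sketch below is only a toy
model of that pattern, not the kernel's page lock code: `struct
toy_lock`, `toy_trylock()`, `toy_unlock()` and the two checker
functions are hypothetical names made up for illustration, and the
"tasks" are simulated by interleaving calls in a single thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock (hypothetical): owner == 0 means free, else the holder's id. */
struct toy_lock {
	int owner;
};

/* Try to take the lock for 'task'; never blocks. */
static bool toy_trylock(struct toy_lock *l, int task)
{
	if (l->owner != 0)
		return false;
	l->owner = task;
	return true;
}

/* Release a lock that 'task' currently holds. */
static void toy_unlock(struct toy_lock *l, int task)
{
	assert(l->owner == task);
	l->owner = 0;
}

/*
 * Task 1 locks A then B, task 2 locks B then A.
 * Returns true if, once each task holds its first lock, neither
 * can get its second one: both would wait forever.
 */
bool abba_interleaving_deadlocks(void)
{
	struct toy_lock a = { 0 }, b = { 0 };

	bool t1_first = toy_trylock(&a, 1);	/* task 1 holds A */
	bool t2_first = toy_trylock(&b, 2);	/* task 2 holds B */
	bool t1_second = toy_trylock(&b, 1);	/* task 1 wants B */
	bool t2_second = toy_trylock(&a, 2);	/* task 2 wants A */

	return t1_first && t2_first && !t1_second && !t2_second;
}

/*
 * Both tasks lock A then B (one agreed order).
 * Returns true only if some task still ends up stuck.
 */
bool ordered_interleaving_deadlocks(void)
{
	struct toy_lock a = { 0 }, b = { 0 };

	bool t1_done = toy_trylock(&a, 1) && toy_trylock(&b, 1);
	bool t2_blocked = !toy_trylock(&a, 2);	/* waits, holding nothing */

	/* Task 1 finishes and releases in reverse order. */
	toy_unlock(&b, 1);
	toy_unlock(&a, 1);

	/* Task 2 retries and can now take both locks. */
	bool t2_done = toy_trylock(&a, 2) && toy_trylock(&b, 2);

	return !(t1_done && t2_blocked && t2_done);
}
```

In this model `abba_interleaving_deadlocks()` reports a deadlock and
`ordered_interleaving_deadlocks()` does not, because with a single
agreed order the waiting task never holds a lock the other one needs.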

I guess this is all fairly theoretical since we apparently need to do
that hybrid "limited fairness" patch anyway, and it fixes your issue,
but I hate not understanding the problem.

Linus


2020-09-15 19:40:52

by Matthieu Baerts

Subject: Re: Kernel Benchmarking

On 15/09/2020 21:24, Linus Torvalds wrote:
> On Tue, Sep 15, 2020 at 12:01 PM Matthieu Baerts
> <[email protected]> wrote:
>
>> I forgot one important thing, I was on top of David Miller's net-next
>> branch by reflex. I can redo the traces on top of linux-next if needed.
>
> Not likely an issue.
>
> I'll go stare at the page lock code again to see if I've missed
> anything. I still suspect it's a latent ABBA deadlock that is just
> much *much* easier to trigger with the synchronous lock handoff, but I
> don't see where it is.
>
> I guess this is all fairly theoretical since we apparently need to do
> that hybrid "limited fairness" patch anyway, and it fixes your issue,
> but I hate not understanding the problem.

I understand :)
Thank you again for looking at this issue!

Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
http://www.tessares.net