Date: Mon, 6 Apr 2015 12:23:33 -0500
From: Chris J Arges
To: Linus Torvalds
Cc: Ingo Molnar, Rafael David Tinoco, Peter Anvin, Jiang Liu, Peter Zijlstra,
    LKML, Jens Axboe, Frederic Weisbecker, Gema Gomez,
    the arch/x86 maintainers
Subject: Re: smp_call_function_single lockups
Message-ID: <20150406172332.GA14555@canonical.com>

On Thu, Apr 02, 2015 at 10:31:50AM -0700, Linus Torvalds wrote:
> On Wed, Apr 1, 2015 at 2:59 PM, Chris J Arges wrote:
> >
> > Is it worthwhile to do a 'bisect' to see where on average it takes
> > longer to reproduce? Perhaps it will point to a relevant change, or it
> > may be completely useless.
>
> It's likely to be an exercise in futility. "git bisect" is really bad
> at "gray area" things, and when it's a question of "it takes hours or
> days to reproduce", it's almost certainly not worth it. Not unless
> there is some really clear cut-off that we can believably say "this
> causes it to get much slower". And in this case, I don't think it's
> that clear-cut. Judging by DaveJ's attempts at bisecting things, the
> timing just changes. And the differences might be due to entirely
> unrelated changes like cacheline alignment etc.
>
> So unless we find a real clear signature of the bug (I was hoping that
> the ISR bit would be that sign), I don't think trying to bisect it
> based on how quickly you can reproduce things is worthwhile.
>
>                  Linus
>

Linus, Ingo,

I did some testing and found that at the following patch level, the issue
was much, much more likely to reproduce within < 15m.

commit b6b8a1451fc40412c57d10c94b62e22acab28f94
Author: Jan Kiszka
Date:   Fri Mar 7 20:03:12 2014 +0100

    KVM: nVMX: Rework interception of IRQs and NMIs

    Move the check for leaving L2 on pending and intercepted IRQs or NMIs
    from the *_allowed handler into a dedicated callback. Invoke this
    callback at the relevant points before KVM checks if IRQs/NMIs can be
    injected. The callback has the task to switch from L2 to L1 if needed
    and inject the proper vmexit events.

    The rework fixes L2 wakeups from HLT and provides the foundation for
    preemption timer emulation.
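For anyone skimming, here is a rough user-space sketch of the ordering that
commit describes: a dedicated check-nested-events style callback consulted
*before* the generic injection path decides where an interrupt goes. This is
not KVM code; the names (demo_vcpu, demo_check_nested_events,
demo_inject_pending_event) are invented for illustration only.

/*
 * ROUGH SKETCH ONLY -- plain user-space C, not KVM code.  All names here
 * are invented for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

struct demo_vcpu {
	bool in_l2;              /* currently running the nested (L2) guest */
	bool l1_intercepts_irq;  /* L1 asked to intercept external IRQs */
	bool irq_pending;        /* an external interrupt is waiting */
};

/* Dedicated callback: leave L2 for L1 if the pending event is intercepted. */
static void demo_check_nested_events(struct demo_vcpu *v)
{
	if (v->in_l2 && v->irq_pending && v->l1_intercepts_irq) {
		v->in_l2 = false;                /* emulate the vmexit to L1 */
		printf("vmexit: pending IRQ handed to L1\n");
	}
}

/* Generic injection path: consult the callback before injecting anything. */
static void demo_inject_pending_event(struct demo_vcpu *v)
{
	demo_check_nested_events(v);
	if (v->irq_pending) {
		printf("IRQ injected into %s\n", v->in_l2 ? "L2" : "L1");
		v->irq_pending = false;
	}
}

int main(void)
{
	struct demo_vcpu v = {
		.in_l2 = true,
		.l1_intercepts_irq = true,
		.irq_pending = true,
	};

	demo_inject_pending_event(&v);
	return 0;
}

The point of the rework, as the commit text says, is that the L2-to-L1 switch
happens in one place before the injection decision, instead of being buried in
the *_allowed handlers.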
However, with the following patch applied the issue became much harder to
reproduce (the stress reproducer ran for hours without hitting it):

commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3
Author: Bandan Das
Date:   Tue Jul 8 00:30:23 2014 -0400

    KVM: x86: Check for nested events if there is an injectable interrupt

    With commit b6b8a1451fc40412c57d1 that introduced vmx_check_nested_events,
    checks for injectable interrupts happen at different points in time for
    L1 and L2 that could potentially cause a race. The regression occurs
    because KVM_REQ_EVENT is always set when nested_run_pending is set, even
    if there's no pending interrupt. Consequently, there could be a small
    window when check_nested_events returns without exiting to L1, but an
    interrupt comes through soon after and it incorrectly gets injected to
    L2 by inject_pending_event. Fix this by adding a call to check for
    nested events too when a check for injectable interrupt returns true.

However, we reproduced with v3.19 (which contains both of these patches), and
it did eventually hit a soft lockup with a similar backtrace. So far this
agrees with the current understanding that we may not be ACK'ing certain
interrupts (IPIs from the L1 guest), causing csd_lock_wait to spin and
triggering the soft lockup (a rough user-space sketch of that spin pattern is
appended at the end of this mail).

Hopefully this helps shed more light on this issue.

Thanks,
--chris j arges
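Appendix: the spin pattern referenced above, as a rough user-space model. This
is NOT kernel code; only the busy-wait shape mirrors csd_lock_wait() in
kernel/smp.c, and all names here (csd_locked, deliver_ipi, target_cpu,
csd_lock_wait_demo) are invented for the demo. The idea is simply that the
sender sets a lock flag, sends the IPI, and spins until the target's handler
clears the flag -- so if the IPI is never delivered or ACK'd, the sender spins
forever and the watchdog reports a soft lockup.

/* ROUGH SKETCH ONLY -- plain user-space C with pthreads, not kernel code. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int csd_locked = 1;  /* stands in for CSD_FLAG_LOCK */
static int deliver_ipi = 1;        /* set to 0 to model the lost/unacked IPI */

/* Target CPU's IPI handler: run the queued function, then unlock the csd. */
static void *target_cpu(void *arg)
{
	(void)arg;
	if (deliver_ipi) {
		printf("target: ran queued function, unlocking csd\n");
		atomic_store(&csd_locked, 0);
	}
	return NULL;
}

/* Sender side: the csd_lock_wait()-style pattern -- spin until unlocked. */
static void csd_lock_wait_demo(void)
{
	unsigned long spins = 0;

	while (atomic_load(&csd_locked))
		spins++;   /* with deliver_ipi == 0 this never terminates */

	printf("sender: csd unlocked after %lu spins\n", spins);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, target_cpu, NULL);
	csd_lock_wait_demo();
	pthread_join(t, NULL);
	return 0;
}

Build with something like "cc -pthread demo.c"; flipping deliver_ipi to 0
reproduces the "stuck forever waiting on the csd" symptom in miniature.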