2012-02-16 23:15:38

by Dave Jones

Subject: soft lockup detector & virtualisation

Lately I've noticed quite a few soft lockup bugs being reported.
In many of them, they're coming from inside virtual guests.

Is the softlockup detector fundamentally broken in this situation ?

What happens if the host doesn't schedule the guest for whatever reason,
or the user suspends the VM and resumes it later ?

Here's the most recent example:
https://bugzilla.redhat.com/attachment.cgi?id=563767

In many of these, the code where it's "stuck" isn't anything
special, which is why I think the guest just hasn't had a
timeslice in 185 seconds.

Is there some way we can perhaps detect we're running virtualised,
and disable the detector automatically ?

thoughts ?

Dave
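
One way to approach the "detect we're running virtualised" question from user space is sketched below. It is only an illustration, not what the kernel itself does: it assumes an x86 Linux guest, where /proc/cpuinfo lists a "hypervisor" flag when the CPUID hypervisor-present bit is set. A hypervisor is free not to set that bit, so a False result proves nothing. The function names are made up for this example.

```python
def has_hypervisor_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in cpuinfo-style text lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at the 'flags' field, and match whole flag names.
        if sep and key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


def running_virtualised(path: str = "/proc/cpuinfo") -> bool:
    """Best-effort check; returns False if the file cannot be read."""
    try:
        with open(path, encoding="ascii", errors="replace") as f:
            return has_hypervisor_flag(f.read())
    except OSError:
        return False
```

Note that simply disabling the detector whenever this returns True would also hide real lockups inside guests, which is part of why the question is not trivial.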


2012-02-17 01:39:41

by john stultz

Subject: Re: soft lockup detector & virtualisation

On Thu, Feb 16, 2012 at 3:15 PM, Dave Jones <[email protected]> wrote:
> Lately I've noticed quite a few soft lockup bugs being reported.
> In many of them, they're coming from inside virtual guests.
>
> Is the softlockup detector fundamentally broken in this situation ?
>
> What happens if the host doesn't schedule the guest for whatever reason,
> or the user suspends the VM and resumes it later ?
>
> Here's the most recent example:
> https://bugzilla.redhat.com/attachment.cgi?id=563767
>
> In many of these, the code where it's "stuck" isn't anything
> special, which is why I think the guest just hasn't had a
> timeslice in 185 seconds.
>
> Is there some way we can perhaps detect we're running virtualised,
> and disable the detector automatically ?

I think Eric's work (See "Add check for suspended vm in softlockup
detector" sent out today) tries to address this issue.

thanks
-john

2012-02-17 01:57:46

by Eric B Munson

Subject: Re: soft lockup detector & virtualisation

On Thu, 16 Feb 2012 17:39:38 -0800, john stultz wrote:
> On Thu, Feb 16, 2012 at 3:15 PM, Dave Jones <[email protected]> wrote:
>> Lately I've noticed quite a few soft lockup bugs being reported.
>> In many of them, they're coming from inside virtual guests.
>>
>> Is the softlockup detector fundamentally broken in this situation ?
>>
>> What happens if the host doesn't schedule the guest for whatever reason,
>> or the user suspends the VM and resumes it later ?
>>
>> Here's the most recent example:
>> https://bugzilla.redhat.com/attachment.cgi?id=563767
>>
>> In many of these, the code where it's "stuck" isn't anything
>> special, which is why I think the guest just hasn't had a
>> timeslice in 185 seconds.
>>
>> Is there some way we can perhaps detect we're running virtualised,
>> and disable the detector automatically ?
>
> I think Eric's work (See "Add check for suspended vm in softlockup
> detector" sent out today) tries to address this issue.
>
> thanks
> -john


The work I have been doing specifically handles the case where the
hypervisor suspends the guest. There is talk of extending that work to
handle preemption as well, which I think will cover your use case.

Eric
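
The idea discussed in this thread can be modelled in a few lines. The sketch below is a toy simulation, not the actual patch: the Watchdog class, the guest_was_paused flag and the 20-second threshold are all assumptions made for illustration. It shows why a guest that was not run for 185 seconds trips a purely time-based check, and how a "the hypervisor paused me" hint lets the check skip that gap.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    threshold: float = 20.0          # seconds; assumed value for illustration
    last_touch: float = 0.0          # when the watchdog was last touched
    guest_was_paused: bool = False   # hypothetical hint set by the host on resume

    def touch(self, now: float) -> None:
        """Record that the watchdog thread got to run at time `now`."""
        self.last_touch = now

    def check(self, now: float) -> bool:
        """Return True if a soft lockup should be reported at time `now`."""
        if self.guest_was_paused:
            # The gap is time the guest never got to run, so it says
            # nothing about guest code being stuck: reset and skip.
            self.guest_was_paused = False
            self.last_touch = now
            return False
        return now - self.last_touch > self.threshold


# Without the hint, a 185 second gap is reported as a lockup.
plain = Watchdog()
plain.touch(0.0)
print(plain.check(185.0))

# With the hint set, the same gap is ignored once and the flag is cleared.
hinted = Watchdog()
hinted.touch(0.0)
hinted.guest_was_paused = True
print(hinted.check(185.0))
```

Because the flag is cleared after one use, a real lockup that follows the resume is still reported once the threshold passes again. This models only the explicit suspend case; covering host preemption would need the host to account for time during which the guest was not scheduled.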