2017-11-23 16:00:40

by Radim Krčmář

Subject: Re: VMs freezing when host is running 4.14

2017-11-23 16:20+0100, Marc Haber:
> On Wed, Nov 22, 2017 at 05:43:13PM +0100, Radim Krčmář wrote:
> > 2017-11-22 16:52+0100, Marc Haber:
> > > On Wed, Nov 22, 2017 at 04:04:42PM +0100, 王金浦 wrote:
> > > > So all guest kernels are 4.14, or also other older kernel?
> > >
> > > Guest kernels are also 4.14, but the issue disappears when the host is
> > > downgraded to an older kernel. I therefore reckoned that the guest
> > > kernel doesn't matter, but that was before I saw the trace in the log.
> >
> > The two most suspicious patches since 4.13 (which I assume works) are
> >
> > 664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been
> > injected")
>
> That one does not revert cleanly; the line in question seems to have
> been removed a bit later.
>
> Reject is:
> 141 [24/5001]mh@fan:~/linux/git/linux ((v4.14.1) %) $ cat arch/x86/kvm/vmx.c.rej
> --- arch/x86/kvm/vmx.c
> +++ arch/x86/kvm/vmx.c
> @@ -2516,7 +2516,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned nr = vcpu->arch.exception.nr;
>  	bool has_error_code = vcpu->arch.exception.has_error_code;
> -	bool reinject = vcpu->arch.exception.injected;
> +	bool reinject = vcpu->arch.exception.reinject;
>  	u32 error_code = vcpu->arch.exception.error_code;
>  	u32 intr_info = nr | INTR_INFO_VALID_MASK;

This line can be deleted, as reinject isn't used in the function.

Btw., there have already been many fixes from Liran Alon for that patch,
and your case could be the one addressed in
https://www.spinics.net/lists/kvm/msg159158.html

The patch is incorrect, but you might be able to see only its benefits.

> > and
> >
> > 9a6e7c39810e ("KVM: async_pf: Fix #DF due to inject "Page not Present"
> > and "Page Ready" exceptions simultaneously")
> >
> > please try reverting them to see if it helps,
>
> That one reverted cleanly. I am now running the new kernel on the
> affected machine, and I think that a second machine has joined the
> market of being affected.

That one had much lower chances of being the culprit.

> Would this matter on the host only or on the guests as well?

Only on the host.

Thanks.
