From: "Gonglei (Arei)"
To: Liran Alon, "pbonzini@redhat.com", "rkrcmar@redhat.com"
CC: "kvm@vger.kernel.org", "linux-kernel@vger.kernel.org", "Huangweidong (C)"
Subject: RE: [PATCH] KVM: x86: ioapic: Clear IRR for rtc bit when rtc EOI gotten
Date: Thu, 14 Dec 2017 13:15:56 +0000
Message-ID: <33183CC9F5247A488A2544077AF19020DA4EA2C4@DGGEMA505-MBX.china.huawei.com>
References: <1513254206-25344-1-git-send-email-arei.gonglei@huawei.com> <5A327636.3050307@ORACLE.COM>
In-Reply-To: <5A327636.3050307@ORACLE.COM>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Liran Alon [mailto:LIRAN.ALON@ORACLE.COM]
> Sent: Thursday, December 14, 2017 9:02 PM
> To: Gonglei (Arei); pbonzini@redhat.com; rkrcmar@redhat.com
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org; Huangweidong (C)
> Subject: Re: [PATCH] KVM: x86: ioapic: Clear IRR for rtc bit when rtc
EOI gotten
>
>
>
> On 14/12/17 14:23, Gonglei wrote:
> > We hit a bug in our test while running PCMark 10 in a Windows 7 VM.
> > The VM got stuck and the wall clock hung after several minutes of running
> > PCMark 10 in it.
> > It is quite easy to reproduce the bug with upstream KVM and QEMU.
> >
> > We found that KVM cannot inject any RTC irq into the VM after the hang: it
> > fails to deliver the irq in ioapic_set_irq() because the RTC irq is still
> > pending in ioapic->irr.
> >
> > static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
> > 		int irq_level, bool line_status)
> > {
> > 	...
> > 	if (!irq_level) {
> > 		ioapic->irr &= ~mask;
> > 		ret = 1;
> > 		goto out;
> > 	}
> > 	...
> > 	if ((edge && old_irr == ioapic->irr) ||
> > 	    (!edge && entry.fields.remote_irr)) {
> > 		ret = 0;
> > 		goto out;
> > 	}
> >
> > According to the RTC spec, after the RTC raises its irq line, the OS reads
> > CMOS register C to clear the irq flag, which pulls the irq pin back down.
> >
> > QEMU emulates that read in cmos_ioport_read(), but the guest OS first issues
> > a write to select which register will be read next; QEMU records the
> > selected register in s->cmos_index.
> >
> > In our test we found a situation in which a vCPU fails to read RTC_REG_C and
> > so never clears the irq. This can happen when two vCPUs are writing/reading
> > registers at the same time. For example, vcpu0 wants to read RTC_REG_C, so it
> > writes RTC_REG_C first and s->cmos_index becomes RTC_REG_C. Before it reads
> > register C, vcpu1, which wants to read RTC_YEAR, changes s->cmos_index to
> > RTC_YEAR with its own write.
> > The next operation of vcpu0 then reads RTC_YEAR. In this case we miss the
> > call to qemu_irq_lower(s->irq) that clears the irq. After this, KVM never
> > injects an RTC irq again, and the Windows VM hangs.
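
To make the interleaving above concrete, here is a minimal sketch in C. It is not QEMU code: struct toy_rtc, toy_select() and toy_read() are hypothetical stand-ins for the index-port write and the data-port read, and the single-threaded replay only assumes the ordering described in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CMOS register indices of an MC146818-style RTC; names follow the thread. */
#define RTC_YEAR  0x09
#define RTC_REG_C 0x0c

/* Hypothetical, heavily reduced device model (not QEMU's real RTC state). */
struct toy_rtc {
	uint8_t cmos_index; /* register selected by the last index-port write */
	bool irq_raised;    /* state of the RTC irq line */
};

/* Index-port write: the guest selects the register it will read next. */
static void toy_select(struct toy_rtc *s, uint8_t reg)
{
	s->cmos_index = reg;
}

/* Data-port read: only a read of register C acknowledges the interrupt. */
static void toy_read(struct toy_rtc *s)
{
	if (s->cmos_index == RTC_REG_C)
		s->irq_raised = false; /* stands in for qemu_irq_lower(s->irq) */
}

/* Replays the reported interleaving; returns the final irq line state. */
bool racy_ack_leaves_irq_raised(void)
{
	struct toy_rtc s = { .cmos_index = 0, .irq_raised = true };

	toy_select(&s, RTC_REG_C);         /* vcpu0 selects register C */
	toy_select(&s, RTC_YEAR);          /* vcpu1 selects the year register */
	assert(s.cmos_index == RTC_YEAR);  /* vcpu0's selection is gone */
	toy_read(&s);                      /* vcpu0 reads, but hits RTC_YEAR */
	toy_read(&s);                      /* vcpu1 reads RTC_YEAR as intended */
	return s.irq_raised;
}

/* Same accesses with each select+read pair kept together, as a lock would. */
bool serialized_ack_leaves_irq_raised(void)
{
	struct toy_rtc s = { .cmos_index = 0, .irq_raised = true };

	toy_select(&s, RTC_REG_C);
	toy_read(&s);                      /* vcpu0 acknowledges via register C */
	toy_select(&s, RTC_YEAR);
	toy_read(&s);                      /* vcpu1 reads the year afterwards */
	return s.irq_raised;
}
```

In this sketch the racy replay leaves the line raised, while the serialized one lowers it, which is the difference a guest-side lock around the select+read pair would make.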
>
> If I understood correctly, this looks to me like a race-condition bug in
> the Windows guest kernel. On real hardware this race condition will also
> cause RTC_YEAR to be read instead of RTC_REG_C.
> The guest kernel should make sure that 2 CPUs do not attempt to read a
> CMOS register in parallel, as they can override each other's cmos_index.
>
> See for example how the Linux kernel avoids this kind of issue
> in rtc_cmos_read() (arch/x86/kernel/rtc.c) by grabbing a cmos_lock.
>
Yes, I knew that.

> >
> > Let's clear the RTC bit in IRR when the corresponding EOI arrives, to avoid
> > the issue.
>
> Can you elaborate a bit more on why it makes sense to put such a workaround
> in KVM code instead of declaring this a guest kernel bug?
>
I agree it's a Windows bug. The big problem is that the hang does not occur on the Xen platform.

Thanks,
-Gonglei

> Regards,
> -Liran
>
> >
> > Suggested-by: Paolo Bonzini
> > Signed-off-by: Gonglei
> > ---
> > Thanks to Paolo for providing a good solution. :)
> >
> >  arch/x86/kvm/ioapic.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> > index 4e822ad..5022d63 100644
> > --- a/arch/x86/kvm/ioapic.c
> > +++ b/arch/x86/kvm/ioapic.c
> > @@ -160,6 +160,7 @@ static void rtc_irq_eoi(struct kvm_ioapic *ioapic, struct kvm_vcpu *vcpu)
> >  {
> >  	if (test_and_clear_bit(vcpu->vcpu_id,
> >  			       ioapic->rtc_status.dest_map.map)) {
> > +		ioapic->irr &= ~(1 << RTC_GSI);
> >  		--ioapic->rtc_status.pending_eoi;
> >  		rtc_status_pending_eoi_check_valid(ioapic);
> >  	}
> >
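
For reference, the effect of the one-line change can be sketched with a reduced model of the edge-triggered path quoted from ioapic_set_irq(). This is an illustration under assumptions, not the KVM implementation: struct toy_ioapic and the toy_* functions are hypothetical, and the clear_irr flag merely switches between the behaviour before and after the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_GSI 8 /* GSI of the RTC, as used by the patch */

/* Hypothetical reduced state; the real struct kvm_ioapic has more fields. */
struct toy_ioapic {
	uint32_t irr; /* one pending bit per input pin */
};

/*
 * Edge-triggered subset of the quoted logic.
 * Returns 1 if the irq would be delivered, 0 if it is coalesced.
 */
int toy_set_irq(struct toy_ioapic *io, unsigned int irq, int level)
{
	uint32_t mask = 1u << irq;
	uint32_t old_irr = io->irr;

	if (!level) {
		io->irr &= ~mask; /* line lowered: pending bit goes away */
		return 1;
	}
	io->irr |= mask;
	if (old_irr == io->irr)
		return 0;         /* bit was already pending: coalesced */
	return 1;
}

/* EOI for the RTC irq; clear_irr models the line added by the patch. */
void toy_rtc_eoi(struct toy_ioapic *io, bool clear_irr)
{
	if (clear_irr)
		io->irr &= ~(1u << RTC_GSI);
}

/*
 * The RTC raises its line, the guest's register C read is lost (so the
 * line is never lowered), the EOI arrives, and the RTC raises the line
 * again. Returns whether that second interrupt would be delivered.
 */
int second_rtc_irq_delivered(bool clear_irr)
{
	struct toy_ioapic io = { .irr = 0 };

	toy_set_irq(&io, RTC_GSI, 1); /* first tick */
	toy_rtc_eoi(&io, clear_irr);  /* guest acknowledges at the APIC */
	return toy_set_irq(&io, RTC_GSI, 1);
}
```

In this model second_rtc_irq_delivered(false) returns 0, matching the stuck state in the report, and second_rtc_irq_delivered(true) returns 1, because the stale IRR bit no longer makes the next raise look like a duplicate.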