Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752808AbdLNO06 (ORCPT ); Thu, 14 Dec 2017 09:26:58 -0500 Received: from mail-ot0-f195.google.com ([74.125.82.195]:41507 "EHLO mail-ot0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752338AbdLNO0z (ORCPT ); Thu, 14 Dec 2017 09:26:55 -0500 X-Google-Smtp-Source: ACJfBovVq1bE2eHbJJaNUNGTh2xSoMJUkbM/U84X/xpjO8CI2F59qZUQzdzmBN03rHhJrcjDUBLBHg== Subject: Re: [PATCH] KVM: x86: ioapic: Clear IRR for rtc bit when rtc EOI gotten To: "Gonglei (Arei)" , "rkrcmar@redhat.com" Cc: Liran Alon , "pbonzini@redhat.com" , "kvm@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "Huangweidong (C)" References: <1513254206-25344-1-git-send-email-arei.gonglei@huawei.com> <5A327636.3050307@ORACLE.COM> <33183CC9F5247A488A2544077AF19020DA4EA2C4@DGGEMA505-MBX.china.huawei.com> From: Quan Xu Message-ID: <5d545d25-cf15-a93b-f1a9-88e6775ded29@gmail.com> Date: Thu, 14 Dec 2017 22:26:48 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: <33183CC9F5247A488A2544077AF19020DA4EA2C4@DGGEMA505-MBX.china.huawei.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4225 Lines: 110 On 2017/12/14 21:15, Gonglei (Arei) wrote: >> -----Original Message----- >> From: Liran Alon [mailto:LIRAN.ALON@ORACLE.COM] >> Sent: Thursday, December 14, 2017 9:02 PM >> To: Gonglei (Arei); pbonzini@redhat.com; rkrcmar@redhat.com >> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org; Huangweidong (C) >> Subject: Re: [PATCH] KVM: x86: ioapic: Clear IRR for rtc bit when rtc EOI gotten >> >> >> >> On 14/12/17 14:23, Gonglei wrote: >>> We hit a bug in our test while run PCMark 10 in a windows 7 VM, >>> The VM got stuck and the wallclock was hang after several minutes running >>> PCMark 10 in it. >>> It is quite easily to reproduce the bug with the upstream KVM and Qemu. >>> >>> We found that KVM can not inject any RTC irq to VM after it was hang, it fails >> to >>> Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr. >>> >>> static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, >>> int irq_level, bool line_status) >>> { >>> ... >>> if (!irq_level) { >>> ioapic->irr &= ~mask; >>> ret = 1; >>> goto out; >>> } >>> ... >>> if ((edge && old_irr == ioapic->irr) || >>> (!edge && entry.fields.remote_irr)) { >>> ret = 0; >>> goto out; >>> } >>> >>> According to RTC spec, after RTC injects a High level irq, OS will read CMOS's >>> register C to to clear the irq flag, and pull down the irq electric pin. >>> >>> For Qemu, we will emulate the reading operation in cmos_ioport_read(), >>> but Guest OS will fire a write operation before to tell which register will be >> read >>> after this write, where we use s->cmos_index to record the following register >> to read. >>> But in our test, we found that there is a possible situation that Vcpu fails to >> read >>> RTC_REG_C to clear irq, This could happens while two VCpus are >> writing/reading >>> registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C, >>> so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C, >>> but before it tries to read register C, another vcpu1 is going to read >> RTC_YEAR, >>> it changes s->cmos_index to RTC_YEAR by a writing action. >>> The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we >> will miss >>> calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never inject >> RTC irq, >>> and Windows VM will hang. >> If I understood correctly, this looks to me like a race-condition bug in >> the Windows guest kernel. In real-hardware this race-condition will also >> cause the RTC_YEAR to be read instead of RTC_REG_C. >> Guest kernel should make sure that 2 CPUs does not attempt to read a >> CMOS register in parallel as they can override each other's cmos_index. >> >> See for example how Linux kernel makes sure to avoid such kind of issues >> in rtc_cmos_read() (arch/x86/kernel/rtc.c) by grabbing a cmos_lock. >> > Yes, I knew that. > >>> Let's clear IRR of rtc when corresponding EOI is gotten to avoid the issue. >> Can you elaborate a bit more why it makes sense to put such workaround >> in KVM code instead of declaring this as guest kernel bug? >> > I agree it's a Windows bug. The big problem is there is not problem on Xen platform.  have you analyzed why it is not a problem on Xen? Quan > Thanks, > -Gonglei > >> Regards, >> -Liran >> >>> Suggested-by: Paolo Bonzini >>> Signed-off-by: Gonglei >>> --- >>> Thanks to Paolo provides a good solution. :) >>> >>> arch/x86/kvm/ioapic.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c >>> index 4e822ad..5022d63 100644 >>> --- a/arch/x86/kvm/ioapic.c >>> +++ b/arch/x86/kvm/ioapic.c >>> @@ -160,6 +160,7 @@ static void rtc_irq_eoi(struct kvm_ioapic *ioapic, >> struct kvm_vcpu *vcpu) >>> { >>> if (test_and_clear_bit(vcpu->vcpu_id, >>> ioapic->rtc_status.dest_map.map)) { >>> + ioapic->irr &= ~(1 << RTC_GSI); >>> --ioapic->rtc_status.pending_eoi; >>> rtc_status_pending_eoi_check_valid(ioapic); >>> } >>>