Message-ID: <5A327636.3050307@ORACLE.COM>
Date: Thu, 14 Dec 2017 15:01:42 +0200
From: Liran Alon
To: Gonglei, pbonzini@redhat.com, rkrcmar@redhat.com
CC: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, weidong.huang@huawei.com
Subject: Re: [PATCH] KVM: x86: ioapic: Clear IRR for rtc bit when rtc EOI gotten
References: <1513254206-25344-1-git-send-email-arei.gonglei@huawei.com>
In-Reply-To: <1513254206-25344-1-git-send-email-arei.gonglei@huawei.com>

On 14/12/17 14:23, Gonglei wrote:
> We hit a bug in our test while run PCMark 10 in a windows 7 VM,
> The VM got stuck and the wallclock was hang after several minutes running
> PCMark 10 in it.
> It is quite easily to reproduce the bug with the upstream KVM and Qemu.
>
> We found that KVM can not inject any RTC irq to VM after it was hang, it fails to
> Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr.
>
> static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
>                           int irq_level, bool line_status)
> {
>     ...
>         if (!irq_level) {
>                 ioapic->irr &= ~mask;
>                 ret = 1;
>                 goto out;
>         }
>     ...
>         if ((edge && old_irr == ioapic->irr) ||
>             (!edge && entry.fields.remote_irr)) {
>                 ret = 0;
>                 goto out;
>         }
>
> According to RTC spec, after RTC injects a High level irq, OS will read CMOS's
> register C to to clear the irq flag, and pull down the irq electric pin.
>
> For Qemu, we will emulate the reading operation in cmos_ioport_read(),
> but Guest OS will fire a write operation before to tell which register will be read
> after this write, where we use s->cmos_index to record the following register to read.
>
> But in our test, we found that there is a possible situation that Vcpu fails to read
> RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading
> registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C,
> so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C,
> but before it tries to read register C, another vcpu1 is going to read RTC_YEAR,
> it changes s->cmos_index to RTC_YEAR by a writing action.
> The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we will miss
> calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never inject RTC irq,
> and Windows VM will hang.

If I understood correctly, this looks to me like a race-condition bug in
the Windows guest kernel. On real hardware, this race condition would also
cause RTC_YEAR to be read instead of RTC_REG_C.
The guest kernel should make sure that two CPUs do not attempt to read a
CMOS register in parallel, as they can overwrite each other's cmos_index.

See, for example, how the Linux kernel avoids this kind of issue in
rtc_cmos_read() (arch/x86/kernel/rtc.c) by grabbing cmos_lock.

>
> Let's clear IRR of rtc when corresponding EOI is gotten to avoid the issue.

Can you elaborate a bit more on why it makes sense to put such a workaround
in KVM code instead of declaring this a guest kernel bug?

Regards,
-Liran

>
> Suggested-by: Paolo Bonzini
> Signed-off-by: Gonglei
> ---
> Thanks to Paolo provides a good solution. :)
>
>  arch/x86/kvm/ioapic.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 4e822ad..5022d63 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -160,6 +160,7 @@ static void rtc_irq_eoi(struct kvm_ioapic *ioapic, struct kvm_vcpu *vcpu)
>  {
>          if (test_and_clear_bit(vcpu->vcpu_id,
>                                 ioapic->rtc_status.dest_map.map)) {
> +                ioapic->irr &= ~(1 << RTC_GSI);
>                  --ioapic->rtc_status.pending_eoi;
>                  rtc_status_pending_eoi_check_valid(ioapic);
>          }
>
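
For reference, the interleaving described in the quoted report can be
reproduced with a small toy model. This is only a sketch: it is not QEMU,
KVM, or Windows code, and struct toy_cmos, toy_select(), toy_read() and the
two scenario functions are hypothetical names made up for illustration. The
only assumptions are the ones stated in the thread: one shared index latch
(cmos_index), and an interrupt line that is lowered only by a read of
register C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets in the standard CMOS/RTC map. */
enum { RTC_YEAR = 0x09, RTC_REG_C = 0x0c };

/* Toy model (hypothetical): one index latch shared by all vCPUs. */
struct toy_cmos {
    uint8_t cmos_index; /* last value written to the index port */
    bool irq_raised;    /* models the RTC interrupt line */
};

/* Index-port write: selects the register the next data read returns. */
static void toy_select(struct toy_cmos *s, uint8_t reg)
{
    assert(s != NULL);
    s->cmos_index = reg;
}

/* Data-port read: only a read of register C lowers the interrupt line. */
static uint8_t toy_read(struct toy_cmos *s)
{
    assert(s != NULL);
    if (s->cmos_index == RTC_REG_C)
        s->irq_raised = false;
    return s->cmos_index; /* which register was actually read */
}

/* vcpu1's select lands between vcpu0's select and vcpu0's read. */
bool irq_stuck_when_interleaved(void)
{
    struct toy_cmos s = { .cmos_index = 0, .irq_raised = true };

    toy_select(&s, RTC_REG_C); /* vcpu0 wants register C */
    toy_select(&s, RTC_YEAR);  /* vcpu1 overwrites the index */
    (void)toy_read(&s);        /* vcpu0 reads RTC_YEAR instead */
    (void)toy_read(&s);        /* vcpu1 reads RTC_YEAR */
    return s.irq_raised;       /* line was never lowered */
}

/* Each select+read pair runs to completion, as it would under a lock. */
bool irq_stuck_when_serialized(void)
{
    struct toy_cmos s = { .cmos_index = 0, .irq_raised = true };

    toy_select(&s, RTC_REG_C); /* vcpu0: select ... */
    (void)toy_read(&s);        /* ... and read register C */
    toy_select(&s, RTC_YEAR);  /* vcpu1: select ... */
    (void)toy_read(&s);        /* ... and read RTC_YEAR */
    return s.irq_raised;
}
```

In this model the interleaved sequence leaves the line raised, while the
serialized sequence lowers it. Serializing the select+read pair is what a
guest-side lock such as cmos_lock is meant to guarantee.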