Received: by 2002:ac0:e350:0:0:0:0:0 with SMTP id g16csp940322imn; Sat, 30 Jul 2022 08:49:35 -0700 (PDT) X-Google-Smtp-Source: AA6agR6lL/xBR/yuAuewjpogZdZpVh1plwFp1GhPlFptLhWL1HA5eb0XyX8ZhvUKgfTfSL6oBEq+ X-Received: by 2002:a17:902:c713:b0:16e:cbe3:29da with SMTP id p19-20020a170902c71300b0016ecbe329damr3627793plp.61.1659196175401; Sat, 30 Jul 2022 08:49:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1659196175; cv=none; d=google.com; s=arc-20160816; b=DyaB8AhwLsErto3e/3ptPX7xsRbzsjLH/yMnT5pu6Fi8KbLPPkHTNNt65eynMqENsy IF+qtrKLtpbzJg19+PzZpUoa5Vle/kFWJwtCSv+RHFgIIoiSwjhaJfd+wmOA3/t+L7w5 iUO7rc3MqNEbCDUHrFa5wRMfTXnTtBkW9oRp4t6qXLpxgozs4ks/l6HHGw6UD+f31RPF I3jL3Sa5Oha4Y+EFDRo71WVasF/e0pWHTjzTQMtYkrO0VxL5LvO3cTfBh6sXfpRx6lVo egE5EPTC7/QYOZA1P/34mcC6ri7CYekE8s8GYYBrIBWq/4hMmboBFYIKZU8jzTh/dMNd Fi2w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=Shasy3oM/UnNVI3MkMReNR3M/LSkhB3SgHfHS7qVxiE=; b=lEl+mYkEuqk/4oIGVVEp3F6hRy8PYPpxFfFyoh0UHd3QG62ls3Q/7SkZuwV35DFmDF TLJQIf/Xn6YtrbLmKkDlHuylYDxSVlU2rZ3JPHqCH1GquUXmyKusRJHaZbSMyedVTofh /v9sGlVaV6CH/+6k7GucfQxRvaVCj+Y9I/AJbDRoMlxJQ0eJsy/krMquJt25tm8B76Vn U82/wq/ZROlVe9S1zdP4/0Bdl6/Pc+LblkKI9e4sB5bn7fR0cezUYuNlZySQf62+MQd+ rrqqYN4IvcDBATUhy0j7072Mhw71a3EZbmP7Nx5HGrCu7tvxDYtFpRNqbgflGnRBATYm c8rg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@semihalf.com header.s=google header.b=R1q+yhnF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=semihalf.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id iw6-20020a170903044600b0016bdabd1eb2si6239361plb.389.2022.07.30.08.49.19; Sat, 30 Jul 2022 08:49:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@semihalf.com header.s=google header.b=R1q+yhnF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=semihalf.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235075AbiG3PaU (ORCPT + 99 others); Sat, 30 Jul 2022 11:30:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235027AbiG3PaR (ORCPT ); Sat, 30 Jul 2022 11:30:17 -0400 Received: from mail-lf1-x12c.google.com (mail-lf1-x12c.google.com [IPv6:2a00:1450:4864:20::12c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0178910BA for ; Sat, 30 Jul 2022 08:30:14 -0700 (PDT) Received: by mail-lf1-x12c.google.com with SMTP id z25so11222489lfr.2 for ; Sat, 30 Jul 2022 08:30:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=semihalf.com; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=Shasy3oM/UnNVI3MkMReNR3M/LSkhB3SgHfHS7qVxiE=; b=R1q+yhnFk9WD3jHtTF9gau37yzpt6W/+NU6hqUgUvXFpNUjeyhAAtxAc7eE6un1D5K 986SjpabDCuABTTvg/N4dNQIr7rYsDyxnd59cEFNJ4/1PwDrVbtOQAtzgnsqAS0C4vo4 Rf9BxrtKIO4SYglaIjsQoC6DlQVjeNKDA21ODAikECjqBf4Cp+wH5Rn7RZfG30WEDgzi wsyowNA8DZ2RcKA1EOHf+iKlBLaLv8CR/5gXOuia35HKa/hOUxxgqZb4vCY4I1JnuRBD RA3CFAF+faf1kUgQl92s+vt7eBLGctyLQxms6N+mARbp+BVSUWudlygIulrn6zdkl22R dpDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=Shasy3oM/UnNVI3MkMReNR3M/LSkhB3SgHfHS7qVxiE=; b=ZsPv8zIaza51skobZzTu764o6Ro6EgD4yCs1epfWdt2xiTjKEAi9aXnl8oa8KJ1GUX SZousO8xuCwWYNcjZ7AdfSXyLF/26xqH9V7u6pdZm3cyMAwKQMDcwPuR74pjLWgFYBJq p78DoHpdunox7XgznopiIlLTkWGC4DPK+F3Av+Xb4TCtgwson6cLgghFwXJBOcpiGmdH WFkDCVVGW4bLeZM6Hc72o9DMZve97qNAsSbCiy2Inf4OaNjAeZooj3XKKJWIDGHQ09gv +NNNOjO2DPaYUhSF7ORS6JI4gDF6ZHrBovNVgW+/eSHAD9RiQx9j/FnPLLIu5sRvheZh oM8w== X-Gm-Message-State: AJIora8lNNxxfNNVy6Z81r35+aJ/Z9UJM/vx3HILAFNEM9NT+inPbQi6 YnEdKXCAv2rv0dBIBfzArxJ8zg== X-Received: by 2002:a05:6512:2591:b0:48a:d133:66f3 with SMTP id bf17-20020a056512259100b0048ad13366f3mr3111959lfb.470.1659195013168; Sat, 30 Jul 2022 08:30:13 -0700 (PDT) Received: from ?IPv6:2a02:a31b:33d:9c00:463a:87e3:44fc:2b2f? ([2a02:a31b:33d:9c00:463a:87e3:44fc:2b2f]) by smtp.gmail.com with ESMTPSA id e6-20020a2ea546000000b0025dd6c8933csm1007503ljn.114.2022.07.30.08.30.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 30 Jul 2022 08:30:12 -0700 (PDT) Subject: Re: [PATCH 3/3] KVM: irqfd: Postpone resamplefd notify for oneshot interrupts To: "Liu, Rong L" , "Christopherson,, Sean" , Paolo Bonzini , "kvm@vger.kernel.org" Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "x86@kernel.org" , "H. Peter Anvin" , "linux-kernel@vger.kernel.org" , Eric Auger , Alex Williamson , Zhenyu Wang , Tomasz Nowicki , Grzegorz Jaszczyk , Dmitry Torokhov References: <20220715155928.26362-1-dmy@semihalf.com> <20220715155928.26362-4-dmy@semihalf.com> From: Dmytro Maluka Message-ID: <9054d9f9-f41e-05c7-ce8d-628a6c827c40@semihalf.com> Date: Sat, 30 Jul 2022 17:30:10 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 7/29/22 11:21 PM, Liu, Rong L wrote: > Hi Dmytro, > >> -----Original Message----- >> From: Dmytro Maluka >> Sent: Friday, July 15, 2022 8:59 AM >> To: Christopherson,, Sean ; Paolo Bonzini >> ; kvm@vger.kernel.org >> Cc: Thomas Gleixner ; Ingo Molnar >> ; Borislav Petkov ; Dave Hansen >> ; x86@kernel.org; H. Peter Anvin >> ; linux-kernel@vger.kernel.org; Eric Auger >> ; Alex Williamson >> ; Liu, Rong L ; >> Zhenyu Wang ; Tomasz Nowicki >> ; Grzegorz Jaszczyk ; Dmitry >> Torokhov ; Dmytro Maluka >> Subject: [PATCH 3/3] KVM: irqfd: Postpone resamplefd notify for oneshot >> interrupts >> >> The existing KVM mechanism for forwarding of level-triggered interrupts >> using resample eventfd doesn't work quite correctly in the case of >> interrupts that are handled in a Linux guest as oneshot interrupts >> (IRQF_ONESHOT). Such an interrupt is acked to the device in its >> threaded irq handler, i.e. later than it is acked to the interrupt >> controller (EOI at the end of hardirq), not earlier. >> >> Linux keeps such interrupt masked until its threaded handler finishes, >> to prevent the EOI from re-asserting an unacknowledged interrupt. >> However, with KVM + vfio (or whatever is listening on the resamplefd) >> we don't check that the interrupt is still masked in the guest at the >> moment of EOI. Resamplefd is notified regardless, so vfio prematurely >> unmasks the host physical IRQ, thus a new (unwanted) physical interrupt >> is generated in the host and queued for injection to the guest. >> >> The fact that the virtual IRQ is still masked doesn't prevent this new >> physical IRQ from being propagated to the guest, because: >> >> 1. It is not guaranteed that the vIRQ will remain masked by the time >> when vfio signals the trigger eventfd. >> 2. KVM marks this IRQ as pending (e.g. setting its bit in the virtual >> IRR register of IOAPIC on x86), so after the vIRQ is unmasked, this >> new pending interrupt is injected by KVM to the guest anyway. >> >> There are observed at least 2 user-visible issues caused by those >> extra erroneous pending interrupts for oneshot irq in the guest: >> >> 1. System suspend aborted due to a pending wakeup interrupt from >> ChromeOS EC (drivers/platform/chrome/cros_ec.c). >> 2. Annoying "invalid report id data" errors from ELAN0000 touchpad >> (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg >> every time the touchpad is touched. >> >> This patch fixes the issue on x86 by checking if the interrupt is >> unmasked when we receive irq ack (EOI) and, in case if it's masked, >> postponing resamplefd notify until the guest unmasks it. >> >> Important notes: >> >> 1. It doesn't fix the issue for other archs yet, due to some missing >> KVM functionality needed by this patch: >> - calling mask notifiers is implemented for x86 only >> - irqchip ->is_masked() is implemented for x86 only >> >> 2. It introduces an additional spinlock locking in the resample notify >> path, since we are no longer just traversing an RCU list of irqfds >> but also updating the resampler state. Hopefully this locking won't >> noticeably slow down anything for anyone. >> >> Regarding #2, there may be an alternative solution worth considering: >> extend KVM irqfd (userspace) API to send mask and unmask notifications >> directly to vfio/whatever, in addition to resample notifications, to >> let vfio check the irq state on its own. There is already locking on >> vfio side (see e.g. vfio_platform_unmask()), so this way we would avoid >> introducing any additional locking. Also such mask/unmask notifications >> could be useful for other cases. >> >> Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee- >> d2fde2700083@semihalf.com/ >> Suggested-by: Sean Christopherson >> Signed-off-by: Dmytro Maluka >> --- >> include/linux/kvm_irqfd.h | 14 ++++++++++++ >> virt/kvm/eventfd.c | 45 >> +++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 59 insertions(+) >> >> diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h >> index dac047abdba7..01754a1abb9e 100644 >> --- a/include/linux/kvm_irqfd.h >> +++ b/include/linux/kvm_irqfd.h >> @@ -19,6 +19,16 @@ >> * resamplefd. All resamplers on the same gsi are de-asserted >> * together, so we don't need to track the state of each individual >> * user. We can also therefore share the same irq source ID. >> + * >> + * A special case is when the interrupt is still masked at the moment >> + * an irq ack is received. That likely means that the interrupt has >> + * been acknowledged to the interrupt controller but not acknowledged >> + * to the device yet, e.g. it might be a Linux guest's threaded >> + * oneshot interrupt (IRQF_ONESHOT). In this case notifying through >> + * resamplefd is postponed until the guest unmasks the interrupt, >> + * which is detected through the irq mask notifier. This prevents >> + * erroneous extra interrupts caused by premature re-assert of an >> + * unacknowledged interrupt by the resamplefd listener. >> */ > > I don't really agree with above statement. Please correct me if I am wrong. In > all modern x86 interrupt infrastructure(lapic/ioapic), for level triggered > interrupt, the sequence is always EOI (LAPIC), which is called interrupt ack in > the context of this discussion, then unmask (IOAPIC). Oneshot interrupt is > different only because the timing of above 2 events are different from a > "normal" level-triggered interrupt. It is like for level interrupt: Hardirq -> > EOI -> Unmask but for oneshot, it is like: hardirq->EOI, then sometime later > threadedirq->unmask. So based on this, I don't think you need to keep track of > whether the interrupt is unmasked or not, just need to call the notifier at the > end of unmask, instead of EOI. And calling notifier at the end of unmask, > instead of EOI won't break non-oneshot case. The only caveat is guest ioapic > update (virq unmask) doesn't cause vmexit. But the assumption is already made > that virq unmask causes vmexit. This doesn't seem true to me. For example, looking at how Linux handles level-triggered IOAPIC interrupts in handle_fasteoi_irq(), AFAICS normally it does EOI only, without masking or unmasking. I believe for regular level-triggered interrupts it's like: hardirq -> EOI while for oneshot interrupts: mask -> hardirq -> EOI -> threadedirq -> unmask I don't quite get what would be the point of unmasking in addition to an EOI in the normal case (not in a special case like oneshot interrupt, where we mask the interrupt in the beginning of hardirq, so need to unmask it later on). Isn't the whole point of "fast EOI" to minimize latencies by using just one LAPIC write, no IOAPIC writes? Thanks, Dmytro > >> struct kvm_kernel_irqfd_resampler { >> struct kvm *kvm; >> @@ -28,6 +38,10 @@ struct kvm_kernel_irqfd_resampler { >> */ >> struct list_head list; >> struct kvm_irq_ack_notifier notifier; >> + struct kvm_irq_mask_notifier mask_notifier; >> + bool masked; >> + bool pending; >> + spinlock_t lock; >> /* >> * Entry in list of kvm->irqfd.resampler_list. Use for sharing >> * resamplers among irqfds on the same gsi. >> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c >> index 50ddb1d1a7f0..9ff47ac33790 100644 >> --- a/virt/kvm/eventfd.c >> +++ b/virt/kvm/eventfd.c >> @@ -75,6 +75,44 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier >> *kian) >> kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID, >> resampler->notifier.gsi, 0, false); >> >> + spin_lock(&resampler->lock); >> + if (resampler->masked) { >> + resampler->pending = true; >> + spin_unlock(&resampler->lock); >> + return; >> + } >> + spin_unlock(&resampler->lock); >> + >> + idx = srcu_read_lock(&kvm->irq_srcu); >> + >> + list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link, >> + srcu_read_lock_held(&kvm->irq_srcu)) >> + eventfd_signal(irqfd->resamplefd, 1); >> + >> + srcu_read_unlock(&kvm->irq_srcu, idx); >> +} >> + >> +static void >> +irqfd_resampler_mask(struct kvm_irq_mask_notifier *kimn, bool >> masked) >> +{ >> + struct kvm_kernel_irqfd_resampler *resampler; >> + struct kvm *kvm; >> + struct kvm_kernel_irqfd *irqfd; >> + int idx; >> + >> + resampler = container_of(kimn, >> + struct kvm_kernel_irqfd_resampler, mask_notifier); >> + kvm = resampler->kvm; >> + >> + spin_lock(&resampler->lock); >> + resampler->masked = masked; >> + if (masked || !resampler->pending) { >> + spin_unlock(&resampler->lock); >> + return; >> + } >> + resampler->pending = false; >> + spin_unlock(&resampler->lock); >> + >> idx = srcu_read_lock(&kvm->irq_srcu); >> >> list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link, >> @@ -98,6 +136,8 @@ irqfd_resampler_shutdown(struct >> kvm_kernel_irqfd *irqfd) >> if (list_empty(&resampler->list)) { >> list_del(&resampler->link); >> kvm_unregister_irq_ack_notifier(kvm, &resampler->notifier); >> + kvm_unregister_irq_mask_notifier(kvm, resampler- >>> mask_notifier.irq, >> + &resampler->mask_notifier); >> kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID, >> resampler->notifier.gsi, 0, false); >> kfree(resampler); >> @@ -367,11 +407,16 @@ kvm_irqfd_assign(struct kvm *kvm, struct >> kvm_irqfd *args) >> INIT_LIST_HEAD(&resampler->list); >> resampler->notifier.gsi = irqfd->gsi; >> resampler->notifier.irq_acked = irqfd_resampler_ack; >> + resampler->mask_notifier.func = irqfd_resampler_mask; >> + kvm_irq_is_masked(kvm, irqfd->gsi, &resampler- >>> masked); >> + spin_lock_init(&resampler->lock); >> INIT_LIST_HEAD(&resampler->link); >> >> list_add(&resampler->link, &kvm->irqfds.resampler_list); >> kvm_register_irq_ack_notifier(kvm, >> &resampler->notifier); >> + kvm_register_irq_mask_notifier(kvm, irqfd->gsi, >> + &resampler->mask_notifier); >> irqfd->resampler = resampler; >> } >> >> -- >> 2.37.0.170.g444d1eabd0-goog >