From: Dmytro Maluka
To: Sean Christopherson, Paolo Bonzini, kvm@vger.kernel.org
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, "H. Peter Anvin", linux-kernel@vger.kernel.org,
    Eric Auger, Alex Williamson, Rong L Liu, Zhenyu Wang,
    Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov, Dmytro Maluka
Subject: [PATCH 3/3] KVM: irqfd: Postpone resamplefd notify for oneshot interrupts
Date: Fri, 15 Jul 2022 17:59:28 +0200
Message-Id: <20220715155928.26362-4-dmy@semihalf.com>
X-Mailer: git-send-email 2.37.0.170.g444d1eabd0-goog
In-Reply-To: <20220715155928.26362-1-dmy@semihalf.com>
References: <20220715155928.26362-1-dmy@semihalf.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The existing KVM mechanism for forwarding level-triggered interrupts
using a resample eventfd doesn't work quite correctly for interrupts
that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT).
Such an interrupt is acked to the device in its threaded irq handler,
i.e. later than it is acked to the interrupt controller (EOI at the end
of hardirq), not earlier.

Linux keeps such an interrupt masked until its threaded handler
finishes, to prevent the EOI from re-asserting an unacknowledged
interrupt. However, with KVM + vfio (or whatever is listening on the
resamplefd) we don't check that the interrupt is still masked in the
guest at the moment of EOI. Resamplefd is notified regardless, so vfio
prematurely unmasks the host physical IRQ, and a new (unwanted)
physical interrupt is generated in the host and queued for injection
into the guest.

The fact that the virtual IRQ is still masked doesn't prevent this new
physical IRQ from being propagated to the guest, because:

1. It is not guaranteed that the vIRQ will remain masked by the time
   vfio signals the trigger eventfd.
2. KVM marks this IRQ as pending (e.g. setting its bit in the virtual
   IRR register of IOAPIC on x86), so after the vIRQ is unmasked, this
   new pending interrupt is injected by KVM into the guest anyway.

At least two user-visible issues caused by those extra erroneous
pending interrupts for oneshot irqs in the guest have been observed:

1. System suspend aborted due to a pending wakeup interrupt from
   ChromeOS EC (drivers/platform/chrome/cros_ec.c).
2. Annoying "invalid report id data" errors from ELAN0000 touchpad
   (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
   every time the touchpad is touched.

This patch fixes the issue on x86 by checking whether the interrupt is
unmasked when we receive the irq ack (EOI) and, if it is masked,
postponing the resamplefd notify until the guest unmasks it.

Important notes:

1. It doesn't fix the issue for other archs yet, due to some missing
   KVM functionality needed by this patch:
     - calling mask notifiers is implemented for x86 only
     - irqchip ->is_masked() is implemented for x86 only

2. It introduces additional spinlock locking in the resample notify
   path, since we are no longer just traversing an RCU list of irqfds
   but also updating the resampler state. Hopefully this locking won't
   noticeably slow down anything for anyone.

Regarding #2, there may be an alternative solution worth considering:
extend the KVM irqfd (userspace) API to send mask and unmask
notifications directly to vfio/whatever, in addition to resample
notifications, to let vfio check the irq state on its own. There is
already locking on the vfio side (see e.g. vfio_platform_unmask()), so
this way we would avoid introducing any additional locking. Such
mask/unmask notifications could also be useful for other cases.

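For illustration only, the postponement this patch implements (not the
alternative userspace API mentioned last) can be modelled in plain
user-space C. This is a sketch under stated assumptions, not the kernel
code: resampler_model, model_ack(), model_mask(), resample() and
oneshot_spurious_count() are hypothetical names, a counter stands in
for eventfd_signal() on the resamplefd, and the spinlock is left out
because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model only. Names mirror the patch but are hypothetical;
 * nothing here is kernel API. The kernel guards masked/pending with a
 * spinlock and signals eventfds instead of bumping a counter.
 */
struct resampler_model {
	bool masked;       /* guest currently masks the vIRQ */
	bool pending;      /* an EOI arrived while the vIRQ was masked */
	bool postpone;     /* true: behaviour with this patch applied */
	int notifications; /* stands in for eventfd_signal(resamplefd) */
};

/* Guest EOI reached the virtual interrupt controller. */
static void model_ack(struct resampler_model *r)
{
	if (r->postpone && r->masked) {
		r->pending = true; /* defer the notify until unmask */
		return;
	}
	r->notifications++;
}

/* Guest changed the mask state of the vIRQ. */
static void model_mask(struct resampler_model *r, bool masked)
{
	r->masked = masked;
	if (masked || !r->pending)
		return;
	r->pending = false;
	r->notifications++; /* deliver the deferred notify */
}

/*
 * The listener (vfio in the commit message) reacts to a new notify by
 * unmasking the host IRQ. Returns 1 if the device line is still high
 * at that point, i.e. a spurious interrupt would fire.
 */
static int resample(const struct resampler_model *r, int *seen,
		    bool line_high)
{
	if (r->notifications == *seen)
		return 0;
	*seen = r->notifications;
	return line_high ? 1 : 0;
}

/* Replay one IRQF_ONESHOT interrupt; return the spurious count. */
int oneshot_spurious_count(bool postpone)
{
	struct resampler_model r = { .postpone = postpone };
	int seen = 0, spurious = 0;

	model_mask(&r, true);                   /* hardirq masks the irq */
	model_ack(&r);                          /* EOI at end of hardirq */
	spurious += resample(&r, &seen, true);  /* device not acked yet */
	model_mask(&r, false);                  /* thread acked device, unmask */
	spurious += resample(&r, &seen, false); /* line is low by now */

	assert(!r.pending); /* nothing may stay deferred after unmask */
	return spurious;
}
```

Under these assumptions, oneshot_spurious_count(false) returns 1,
because the notify fires at EOI while the device still asserts its
line, and oneshot_spurious_count(true) returns 0, because the notify is
deferred until the guest unmasks the interrupt.
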
Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee-d2fde2700083@semihalf.com/
Suggested-by: Sean Christopherson
Signed-off-by: Dmytro Maluka
---
 include/linux/kvm_irqfd.h | 14 ++++++++++++
 virt/kvm/eventfd.c        | 45 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dac047abdba7..01754a1abb9e 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -19,6 +19,16 @@
  * resamplefd. All resamplers on the same gsi are de-asserted
  * together, so we don't need to track the state of each individual
  * user. We can also therefore share the same irq source ID.
+ *
+ * A special case is when the interrupt is still masked at the moment
+ * an irq ack is received. That likely means that the interrupt has
+ * been acknowledged to the interrupt controller but not acknowledged
+ * to the device yet, e.g. it might be a Linux guest's threaded
+ * oneshot interrupt (IRQF_ONESHOT). In this case notifying through
+ * resamplefd is postponed until the guest unmasks the interrupt,
+ * which is detected through the irq mask notifier. This prevents
+ * erroneous extra interrupts caused by premature re-assert of an
+ * unacknowledged interrupt by the resamplefd listener.
  */
 struct kvm_kernel_irqfd_resampler {
 	struct kvm *kvm;
@@ -28,6 +38,10 @@ struct kvm_kernel_irqfd_resampler {
 	 */
 	struct list_head list;
 	struct kvm_irq_ack_notifier notifier;
+	struct kvm_irq_mask_notifier mask_notifier;
+	bool masked;
+	bool pending;
+	spinlock_t lock;
 	/*
 	 * Entry in list of kvm->irqfd.resampler_list. Use for sharing
 	 * resamplers among irqfds on the same gsi.
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 50ddb1d1a7f0..9ff47ac33790 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -75,6 +75,44 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
 	kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
 		    resampler->notifier.gsi, 0, false);
 
+	spin_lock(&resampler->lock);
+	if (resampler->masked) {
+		resampler->pending = true;
+		spin_unlock(&resampler->lock);
+		return;
+	}
+	spin_unlock(&resampler->lock);
+
+	idx = srcu_read_lock(&kvm->irq_srcu);
+
+	list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
+	    srcu_read_lock_held(&kvm->irq_srcu))
+		eventfd_signal(irqfd->resamplefd, 1);
+
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+}
+
+static void
+irqfd_resampler_mask(struct kvm_irq_mask_notifier *kimn, bool masked)
+{
+	struct kvm_kernel_irqfd_resampler *resampler;
+	struct kvm *kvm;
+	struct kvm_kernel_irqfd *irqfd;
+	int idx;
+
+	resampler = container_of(kimn,
+			struct kvm_kernel_irqfd_resampler, mask_notifier);
+	kvm = resampler->kvm;
+
+	spin_lock(&resampler->lock);
+	resampler->masked = masked;
+	if (masked || !resampler->pending) {
+		spin_unlock(&resampler->lock);
+		return;
+	}
+	resampler->pending = false;
+	spin_unlock(&resampler->lock);
+
 	idx = srcu_read_lock(&kvm->irq_srcu);
 
 	list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
@@ -98,6 +136,8 @@ irqfd_resampler_shutdown(struct kvm_kernel_irqfd *irqfd)
 	if (list_empty(&resampler->list)) {
 		list_del(&resampler->link);
 		kvm_unregister_irq_ack_notifier(kvm, &resampler->notifier);
+		kvm_unregister_irq_mask_notifier(kvm, resampler->mask_notifier.irq,
+						 &resampler->mask_notifier);
 		kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
 			    resampler->notifier.gsi, 0, false);
 		kfree(resampler);
@@ -367,11 +407,16 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 			INIT_LIST_HEAD(&resampler->list);
 			resampler->notifier.gsi = irqfd->gsi;
 			resampler->notifier.irq_acked = irqfd_resampler_ack;
+			resampler->mask_notifier.func = irqfd_resampler_mask;
+			kvm_irq_is_masked(kvm, irqfd->gsi, &resampler->masked);
+			spin_lock_init(&resampler->lock);
 			INIT_LIST_HEAD(&resampler->link);
 
 			list_add(&resampler->link, &kvm->irqfds.resampler_list);
 			kvm_register_irq_ack_notifier(kvm,
 						      &resampler->notifier);
+			kvm_register_irq_mask_notifier(kvm, irqfd->gsi,
+						       &resampler->mask_notifier);
 			irqfd->resampler = resampler;
 		}
 
-- 
2.37.0.170.g444d1eabd0-goog