Received: by 2002:ab2:3319:0:b0:1ef:7a0f:c32d with SMTP id i25csp887148lqc; Fri, 8 Mar 2024 15:07:41 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCUMa5mHtzNx1zsm7g/RdwRfc0IHCNyED6Pz8ejNl1eVj3JxMJze1DcAGe6aUZZBGpmFfPTyi+djH9Nj/xAZR9Nr2d6Rp5Ubmwo+8lulFw== X-Google-Smtp-Source: AGHT+IFq5QY9AVrEinPYPNlPeskHGsqwy1zPtnbL1hYjGpdiy7f327imxVV/jxHK/55G6EojN868 X-Received: by 2002:a05:6214:558d:b0:690:9022:3137 with SMTP id mi13-20020a056214558d00b0069090223137mr665078qvb.29.1709939261394; Fri, 08 Mar 2024 15:07:41 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709939261; cv=pass; d=google.com; s=arc-20160816; b=gianXytw8R4xKBGSxzNYLzLQvwzb5imZCoxKUVcyI5Qbnl4XCeWxVw32kExQyXSAU0 Z5guSp9mQs+/ZXvidh38SllSKPBiq8CudUkvsMfU03Gp9jYWsNyM5UfaUCcG6rZxZCVO QPlVSkBB7iEVaFMd48ujW+/ejmTl5GYBm9jR2pUjCCw3+GInFWCwLfZo/T4bNt8CDzJq HR7OrEuSoUgeFDiE/0x2aO+9FHcUUZekf13Y/P6iM+85+tDiD0Hjh/1f8qAPWEsgqIug nT+iPNAUjqcUVioRbk/9ElHXL/I8xFMj4TigPxdoIbVTlv/xfC2OSFL5i8ESIe3rooMT HuQQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=QEMDRUYAwwblqvmSVpxrM3ioP4wjTxARdoB/48mCjPw=; fh=AON1tP1R2s0GAvYSrU1I/R6N81mWhzuaPDd8Cwaumc8=; b=nafOKq9bUpxhSxYIzbhglUfoTt/ycdoHrmEC2sFhBzzFxdKCpofYbjUJc7qh4Itp9i sei2VhaaVC5C4UtX2Ya1r9xJfrnMnNaXnpY+10dU9LFGwAGT1vYm7YTJllT+nA6RuoRu l5SMNFf+zZfzVdgRLP+LT+s3MBD/7WcmZzPX8U5uG3lcuyjJtkOtEhpzS7Prgt4C4ptc HPZh/xGlQ5kGRYSDjfmO2uHn1zO4DIraq5f5xlx/C+6GfEXVIQnyEiHZNx70O9lu5Roz 623Kv65GCWm5aMPEVdT58zTLEE1dnq8MSjJLhMeaI8r1hUFbLDI1pGrDVAJMXUDcKTVc EC5w==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dZ8KYCL4; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-97680-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-97680-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id jm5-20020ad45ec5000000b00690b92e28b8si477978qvb.243.2024.03.08.15.07.41 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Mar 2024 15:07:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-97680-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dZ8KYCL4; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-97680-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-97680-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 16F1C1C21624 for ; Fri, 8 Mar 2024 23:07:41 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7015D4E1DA; Fri, 8 Mar 2024 23:06:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dZ8KYCL4" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6C81B44C66 for ; Fri, 8 Mar 2024 23:06:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709939177; cv=none; b=S4ZzWO9ON6wHijCnHU4a6RT4rgZEn5dJSUDdw0BWsNan7eYm6iVzIcnFsTm+XamPRxlY60ddYxQOFyhxdmtQa6vLh320a6eo9hPcHYORnwia7MyJFxf7AG1b8Mskj/YU+mH5qKxScoQxm0DmGXAR6GL4kNYlkOBQo1sk2ADz5DA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709939177; c=relaxed/simple; bh=8s1hywDNEEyjxGemEE8705xYNh1NyVGgBGOpM7NxcDs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=qFj2ddIN9zL8+d8o7Zho842smywGpE4sYlgnBNlPZaudfsgjf40AYfpMX7iLR4xMHgi5vbEowQSdn8kb9sK6oL7PbtEtzTw+QX73e4/G6fq/pjhojbQ4QMnS8mJ8snD4S1NdzHg2BjjTLCFgWXgitV8PD0uARO8OsX6qvdb9LpE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=dZ8KYCL4; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1709939174; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QEMDRUYAwwblqvmSVpxrM3ioP4wjTxARdoB/48mCjPw=; b=dZ8KYCL4UlYuWtqhEB+hnQWabZQyv36voGYcy7nVexlGTXLssL9E12PeukaCfY64c+bGOT JdCYjis6gDtYW6N+F5YeTqbQc5ts0Kh2Nrs6TRV4JT3W4ro+itkYf0fDlqB2aZ+1gHW67j CvYsGPZl/J/9bcskpbScwF7cZXUTvm8= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-277-v834kelnOJiUox6BG9FYpQ-1; Fri, 08 Mar 2024 18:06:10 -0500 X-MC-Unique: v834kelnOJiUox6BG9FYpQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4A6BF3801F50; Fri, 8 Mar 2024 23:06:10 +0000 (UTC) Received: from omen.home.shazbot.org (unknown [10.22.8.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id 409ED47CF; Fri, 8 Mar 2024 23:06:08 +0000 (UTC) From: Alex Williamson To: alex.williamson@redhat.com Cc: kvm@vger.kernel.org, eric.auger@redhat.com, clg@redhat.com, reinette.chatre@intel.com, linux-kernel@vger.kernel.org, kevin.tian@intel.com, stable@vger.kernel.org Subject: [PATCH v2 4/7] vfio/pci: Create persistent INTx handler Date: Fri, 8 Mar 2024 16:05:25 -0700 Message-ID: <20240308230557.805580-5-alex.williamson@redhat.com> In-Reply-To: <20240308230557.805580-1-alex.williamson@redhat.com> References: <20240308230557.805580-1-alex.williamson@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1 A vulnerability exists where the eventfd for INTx signaling can be deconfigured, which unregisters the IRQ handler but still allows eventfds to be signaled with a NULL context through the SET_IRQS ioctl or through unmask irqfd if the device interrupt is pending. Ideally this could be solved with some additional locking; the igate mutex serializes the ioctl and config space accesses, and the interrupt handler is unregistered relative to the trigger, but the irqfd path runs asynchronous to those. The igate mutex cannot be acquired from the atomic context of the eventfd wake function. Disabling the irqfd relative to the eventfd registration is potentially incompatible with existing userspace. As a result, the solution implemented here moves configuration of the INTx interrupt handler to track the lifetime of the INTx context object and irq_type configuration, rather than registration of a particular trigger eventfd. Synchronization is added between the ioctl path and eventfd_signal() wrapper such that the eventfd trigger can be dynamically updated relative to in-flight interrupts or irqfd callbacks. Cc: stable@vger.kernel.org Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Reported-by: Reinette Chatre Reviewed-by: Kevin Tian Reviewed-by: Reinette Chatre Signed-off-by: Alex Williamson --- drivers/vfio/pci/vfio_pci_intrs.c | 145 ++++++++++++++++-------------- 1 file changed, 78 insertions(+), 67 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 75c85eec21b3..fb5392b749ff 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -90,11 +90,15 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused) if (likely(is_intx(vdev) && !vdev->virq_disabled)) { struct vfio_pci_irq_ctx *ctx; + struct eventfd_ctx *trigger; ctx = vfio_irq_ctx_get(vdev, 0); if (WARN_ON_ONCE(!ctx)) return; - eventfd_signal(ctx->trigger); + + trigger = READ_ONCE(ctx->trigger); + if (likely(trigger)) + eventfd_signal(trigger); } } @@ -253,100 +257,100 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id) return ret; } -static int vfio_intx_enable(struct vfio_pci_core_device *vdev) +static int vfio_intx_enable(struct vfio_pci_core_device *vdev, + struct eventfd_ctx *trigger) { + struct pci_dev *pdev = vdev->pdev; struct vfio_pci_irq_ctx *ctx; + unsigned long irqflags; + char *name; + int ret; if (!is_irq_none(vdev)) return -EINVAL; - if (!vdev->pdev->irq) + if (!pdev->irq) return -ENODEV; + name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev)); + if (!name) + return -ENOMEM; + ctx = vfio_irq_ctx_alloc(vdev, 0); if (!ctx) return -ENOMEM; + ctx->name = name; + ctx->trigger = trigger; + /* - * If the virtual interrupt is masked, restore it. Devices - * supporting DisINTx can be masked at the hardware level - * here, non-PCI-2.3 devices will have to wait until the - * interrupt is enabled. + * Fill the initial masked state based on virq_disabled. After + * enable, changing the DisINTx bit in vconfig directly changes INTx + * masking. igate prevents races during setup, once running masked + * is protected via irqlock. + * + * Devices supporting DisINTx also reflect the current mask state in + * the physical DisINTx bit, which is not affected during IRQ setup. + * + * Devices without DisINTx support require an exclusive interrupt. + * IRQ masking is performed at the IRQ chip. Again, igate protects + * against races during setup and IRQ handlers and irqfds are not + * yet active, therefore masked is stable and can be used to + * conditionally auto-enable the IRQ. + * + * irq_type must be stable while the IRQ handler is registered, + * therefore it must be set before request_irq(). */ ctx->masked = vdev->virq_disabled; - if (vdev->pci_2_3) - pci_intx(vdev->pdev, !ctx->masked); + if (vdev->pci_2_3) { + pci_intx(pdev, !ctx->masked); + irqflags = IRQF_SHARED; + } else { + irqflags = ctx->masked ? IRQF_NO_AUTOEN : 0; + } vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX; + ret = request_irq(pdev->irq, vfio_intx_handler, + irqflags, ctx->name, vdev); + if (ret) { + vdev->irq_type = VFIO_PCI_NUM_IRQS; + kfree(name); + vfio_irq_ctx_free(vdev, ctx, 0); + return ret; + } + return 0; } -static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd) +static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, + struct eventfd_ctx *trigger) { struct pci_dev *pdev = vdev->pdev; - unsigned long irqflags = IRQF_SHARED; struct vfio_pci_irq_ctx *ctx; - struct eventfd_ctx *trigger; - unsigned long flags; - int ret; + struct eventfd_ctx *old; ctx = vfio_irq_ctx_get(vdev, 0); if (WARN_ON_ONCE(!ctx)) return -EINVAL; - if (ctx->trigger) { - free_irq(pdev->irq, vdev); - kfree(ctx->name); - eventfd_ctx_put(ctx->trigger); - ctx->trigger = NULL; - } - - if (fd < 0) /* Disable only */ - return 0; - - ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", - pci_name(pdev)); - if (!ctx->name) - return -ENOMEM; - - trigger = eventfd_ctx_fdget(fd); - if (IS_ERR(trigger)) { - kfree(ctx->name); - return PTR_ERR(trigger); - } + old = ctx->trigger; - ctx->trigger = trigger; + WRITE_ONCE(ctx->trigger, trigger); - /* - * Devices without DisINTx support require an exclusive interrupt, - * IRQ masking is performed at the IRQ chip. The masked status is - * protected by vdev->irqlock. Setup the IRQ without auto-enable and - * unmask as necessary below under lock. DisINTx is unmodified by - * the IRQ configuration and may therefore use auto-enable. - */ - if (!vdev->pci_2_3) - irqflags = IRQF_NO_AUTOEN; - - ret = request_irq(pdev->irq, vfio_intx_handler, - irqflags, ctx->name, vdev); - if (ret) { - ctx->trigger = NULL; - kfree(ctx->name); - eventfd_ctx_put(trigger); - return ret; + /* Releasing an old ctx requires synchronizing in-flight users */ + if (old) { + synchronize_irq(pdev->irq); + vfio_virqfd_flush_thread(&ctx->unmask); + eventfd_ctx_put(old); } - spin_lock_irqsave(&vdev->irqlock, flags); - if (!vdev->pci_2_3 && !ctx->masked) - enable_irq(pdev->irq); - spin_unlock_irqrestore(&vdev->irqlock, flags); - return 0; } static void vfio_intx_disable(struct vfio_pci_core_device *vdev) { + struct pci_dev *pdev = vdev->pdev; struct vfio_pci_irq_ctx *ctx; ctx = vfio_irq_ctx_get(vdev, 0); @@ -354,10 +358,13 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev) if (ctx) { vfio_virqfd_disable(&ctx->unmask); vfio_virqfd_disable(&ctx->mask); + free_irq(pdev->irq, vdev); + if (ctx->trigger) + eventfd_ctx_put(ctx->trigger); + kfree(ctx->name); + vfio_irq_ctx_free(vdev, ctx, 0); } - vfio_intx_set_signal(vdev, -1); vdev->irq_type = VFIO_PCI_NUM_IRQS; - vfio_irq_ctx_free(vdev, ctx, 0); } /* @@ -641,19 +648,23 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev, return -EINVAL; if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { + struct eventfd_ctx *trigger = NULL; int32_t fd = *(int32_t *)data; int ret; - if (is_intx(vdev)) - return vfio_intx_set_signal(vdev, fd); + if (fd >= 0) { + trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(trigger)) + return PTR_ERR(trigger); + } - ret = vfio_intx_enable(vdev); - if (ret) - return ret; + if (is_intx(vdev)) + ret = vfio_intx_set_signal(vdev, trigger); + else + ret = vfio_intx_enable(vdev, trigger); - ret = vfio_intx_set_signal(vdev, fd); - if (ret) - vfio_intx_disable(vdev); + if (ret && trigger) + eventfd_ctx_put(trigger); return ret; } -- 2.44.0