From: Fred Griffoul
CC: Fred Griffoul, Catalin Marinas, Will Deacon, Alex Williamson,
    Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner, Mark Rutland,
    Marc Zyngier, Oliver Upton, Mark Brown, Ard Biesheuvel, Joey Gouly,
    Ryan Roberts, Jeremy Linton, Jason Gunthorpe, Yi Liu, Kevin Tian,
    Eric Auger, Stefan Hajnoczi, Christian Brauner, Ankit Agrawal,
    Reinette Chatre, Ye Bin
Subject: [PATCH v6 2/2] vfio/pci: add interrupt affinity support
Date: Tue, 11 Jun 2024 17:44:25 +0000
Message-ID: <20240611174430.90787-3-fgriffo@amazon.co.uk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240611174430.90787-1-fgriffo@amazon.co.uk>
References: <20240611174430.90787-1-fgriffo@amazon.co.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

The usual way to configure the affinity of a device interrupt from
userland is to write to the /proc/irq//smp_affinity or
smp_affinity_list files. When using vfio to implement a device driver
or a virtual machine monitor, this may not be ideal: the process
managing the vfio device interrupts may not be granted root privileges,
for security reasons. It then cannot control the interrupt affinity
directly and has to rely on an external command.

This patch extends the VFIO_DEVICE_SET_IRQS ioctl() with a new data
flag to specify the affinity of the interrupts of a vfio pci device.

The CPU affinity mask argument must be a subset of the process cpuset,
otherwise the error -EPERM is returned.

The vfio_irq_set argument shall be set up in the following way:

- the 'flags' field must have the new flag VFIO_IRQ_SET_DATA_CPUSET
  set, as well as VFIO_IRQ_SET_ACTION_TRIGGER.

- the variable-length 'data' field is a cpu_set_t structure, as for the
  sched_setaffinity() syscall, the size of which is derived from
  'argsz'.

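For illustration only, the set-up described above might look as follows
from userspace. This is a minimal sketch, not part of the patch: the
helper names build_affinity_request() and set_irq_affinity() are
hypothetical, the flag value is the one proposed here (it is not in
released uapi headers), and passing sizeof(cpu_set_t) bytes of data
assumes the kernel's cpumask_size() is not larger than that, since
smaller data is rejected with -EINVAL.

```c
/* Illustrative sketch only; helper names are hypothetical. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for cpu_set_t and the CPU_*() macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Value proposed by this patch; assumed absent from installed headers. */
#ifndef VFIO_IRQ_SET_DATA_CPUSET
#define VFIO_IRQ_SET_DATA_CPUSET	(1 << 6)
#endif

/*
 * Build a VFIO_DEVICE_SET_IRQS argument asking for vectors
 * [start, start + count) of 'index' to target a single CPU.
 * Returns a buffer the caller must free(), or NULL on error.
 */
struct vfio_irq_set *build_affinity_request(unsigned int index,
					    unsigned int start,
					    unsigned int count, int cpu)
{
	size_t argsz = sizeof(struct vfio_irq_set) + sizeof(cpu_set_t);
	struct vfio_irq_set *set;
	cpu_set_t mask;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return NULL;

	set = calloc(1, argsz);
	if (!set)
		return NULL;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	/* 'argsz' covers the header plus the cpu_set_t payload. */
	set->argsz = (__u32)argsz;
	set->flags = VFIO_IRQ_SET_DATA_CPUSET | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = index;
	set->start = start;
	set->count = count;
	memcpy(set->data, &mask, sizeof(mask));

	return set;
}

/* Issue the ioctl on an already opened vfio device file descriptor. */
int set_irq_affinity(int device_fd, unsigned int index, unsigned int start,
		     unsigned int count, int cpu)
{
	struct vfio_irq_set *set;
	int ret;

	assert(device_fd >= 0);

	set = build_affinity_request(index, start, count, cpu);
	if (!set)
		return -1;

	ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
	free(set);
	return ret;
}
```

With these helpers, set_irq_affinity(device_fd, VFIO_PCI_MSIX_IRQ_INDEX,
0, 4, 2) would request that MSI-X vectors 0 to 3 target CPU 2. The call
is expected to fail with EPERM if CPU 2 is outside the caller's cpuset.
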
Signed-off-by: Fred Griffoul
---
 drivers/vfio/pci/vfio_pci_core.c  |  2 +-
 drivers/vfio/pci/vfio_pci_intrs.c | 41 +++++++++++++++++++++++++++++++
 drivers/vfio/vfio_main.c          | 15 ++++++++---
 include/uapi/linux/vfio.h         | 15 ++++++++++-
 4 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 80cae87fff36..fbc490703031 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1174,7 +1174,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
 		return -EINVAL;
 	}
 
-	info.flags = VFIO_IRQ_INFO_EVENTFD;
+	info.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_CPUSET;
 
 	info.count = vfio_pci_get_irq_count(vdev, info.index);
 
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8382c5834335..b339c42cb1c0 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #include "vfio_pci_priv.h"
 
@@ -82,6 +83,40 @@ vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
 	return ctx;
 }
 
+static int vfio_pci_set_affinity(struct vfio_pci_core_device *vdev,
+				 unsigned int start, unsigned int count,
+				 struct cpumask *irq_mask)
+{
+	cpumask_var_t allowed_mask;
+	int irq, err = 0;
+	unsigned int i;
+
+	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpuset_cpus_allowed(current, allowed_mask);
+	if (!cpumask_subset(irq_mask, allowed_mask)) {
+		err = -EPERM;
+		goto finish;
+	}
+
+	for (i = start; i < start + count; i++) {
+		irq = pci_irq_vector(vdev->pdev, i);
+		if (irq < 0) {
+			err = -EINVAL;
+			break;
+		}
+
+		err = irq_set_affinity(irq, irq_mask);
+		if (err)
+			break;
+	}
+
+finish:
+	free_cpumask_var(allowed_mask);
+	return err;
+}
+
 /*
  * INTx
  */
@@ -665,6 +700,9 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
 	if (!is_intx(vdev))
 		return -EINVAL;
 
+	if (flags & VFIO_IRQ_SET_DATA_CPUSET)
+		return vfio_pci_set_affinity(vdev, start, count, data);
+
 	if (flags & VFIO_IRQ_SET_DATA_NONE) {
 		vfio_send_intx_eventfd(vdev, vfio_irq_ctx_get(vdev, 0));
 	} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
@@ -713,6 +751,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 	if (!irq_is(vdev, index))
 		return -EINVAL;
 
+	if (flags & VFIO_IRQ_SET_DATA_CPUSET)
+		return vfio_pci_set_affinity(vdev, start, count, data);
+
 	for (i = start; i < start + count; i++) {
 		ctx = vfio_irq_ctx_get(vdev, i);
 		if (!ctx)
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index e97d796a54fb..2e4f4e37cf89 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1505,23 +1505,30 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
 		size = 0;
 		break;
 	case VFIO_IRQ_SET_DATA_BOOL:
-		size = sizeof(uint8_t);
+		size = size_mul(hdr->count, sizeof(uint8_t));
 		break;
 	case VFIO_IRQ_SET_DATA_EVENTFD:
-		size = sizeof(int32_t);
+		size = size_mul(hdr->count, sizeof(int32_t));
+		break;
+	case VFIO_IRQ_SET_DATA_CPUSET:
+		size = hdr->argsz - minsz;
+		if (size < cpumask_size())
+			return -EINVAL;
+		if (size > cpumask_size())
+			size = cpumask_size();
 		break;
 	default:
 		return -EINVAL;
 	}
 
 	if (size) {
-		if (hdr->argsz - minsz < hdr->count * size)
+		if (hdr->argsz - minsz < size)
 			return -EINVAL;
 
 		if (!data_size)
 			return -EINVAL;
 
-		*data_size = hdr->count * size;
+		*data_size = size;
 	}
 
 	return 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2b68e6cdf190..d2edf6b725f8 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -530,6 +530,10 @@ struct vfio_region_info_cap_nvlink2_lnkspd {
  * Absence of the NORESIZE flag indicates that vectors can be enabled
  * and disabled dynamically without impacting other vectors within the
  * index.
+ *
+ * The CPUSET flag indicates the interrupt index supports setting
+ * its affinity with a cpu_set_t configured with the SET_IRQ
+ * ioctl().
  */
 struct vfio_irq_info {
 	__u32	argsz;
@@ -538,6 +542,7 @@ struct vfio_irq_info {
 #define VFIO_IRQ_INFO_MASKABLE		(1 << 1)
 #define VFIO_IRQ_INFO_AUTOMASKED	(1 << 2)
 #define VFIO_IRQ_INFO_NORESIZE		(1 << 3)
+#define VFIO_IRQ_INFO_CPUSET		(1 << 4)
 	__u32	index;		/* IRQ index */
 	__u32	count;		/* Number of IRQs within this index */
 };
@@ -580,6 +585,12 @@ struct vfio_irq_info {
  *
  * Note that ACTION_[UN]MASK specify user->kernel signaling (irqfds) while
  * ACTION_TRIGGER specifies kernel->user signaling.
+ *
+ * DATA_CPUSET specifies the affinity for the range of interrupt vectors.
+ * It must be set with ACTION_TRIGGER in 'flags'. The variable-length 'data'
+ * array is the CPU affinity mask represented as a 'cpu_set_t' structure, as
+ * for the sched_setaffinity() syscall argument: the 'argsz' field is used
+ * to check the actual cpu_set_t size.
  */
 struct vfio_irq_set {
 	__u32	argsz;
@@ -587,6 +598,7 @@ struct vfio_irq_set {
 #define VFIO_IRQ_SET_DATA_NONE		(1 << 0) /* Data not present */
 #define VFIO_IRQ_SET_DATA_BOOL		(1 << 1) /* Data is bool (u8) */
 #define VFIO_IRQ_SET_DATA_EVENTFD	(1 << 2) /* Data is eventfd (s32) */
+#define VFIO_IRQ_SET_DATA_CPUSET	(1 << 6) /* Data is cpu_set_t */
 #define VFIO_IRQ_SET_ACTION_MASK	(1 << 3) /* Mask interrupt */
 #define VFIO_IRQ_SET_ACTION_UNMASK	(1 << 4) /* Unmask interrupt */
 #define VFIO_IRQ_SET_ACTION_TRIGGER	(1 << 5) /* Trigger interrupt */
@@ -599,7 +611,8 @@ struct vfio_irq_set {
 
 #define VFIO_IRQ_SET_DATA_TYPE_MASK	(VFIO_IRQ_SET_DATA_NONE | \
 					 VFIO_IRQ_SET_DATA_BOOL | \
-					 VFIO_IRQ_SET_DATA_EVENTFD)
+					 VFIO_IRQ_SET_DATA_EVENTFD | \
+					 VFIO_IRQ_SET_DATA_CPUSET)
 #define VFIO_IRQ_SET_ACTION_TYPE_MASK	(VFIO_IRQ_SET_ACTION_MASK | \
 					 VFIO_IRQ_SET_ACTION_UNMASK | \
 					 VFIO_IRQ_SET_ACTION_TRIGGER)
-- 
2.40.1