Received: by 2002:ab2:715a:0:b0:1fd:c064:50c with SMTP id l26csp58016lqm; Mon, 10 Jun 2024 12:34:08 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCXpt9Qmoohp5ztk3unBFCAoPUW6UYogkf+un4+TVPonO3U8Xor4WlowE1bVBut/E6BCXPETgDrv1Dj/2i9TznOonqYavKKL+NAnLp72jg== X-Google-Smtp-Source: AGHT+IHxE9KcVwzkuUqLp9DBvxvkJINpQivcuYFXpewv2mIgxzl6Zg1O4dx7vJ9hieXGi8qNQTpp X-Received: by 2002:a05:6a20:12c6:b0:1b5:d143:72e7 with SMTP id adf61e73a8af0-1b5d1437426mr5570302637.32.1718048047758; Mon, 10 Jun 2024 12:34:07 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1718048047; cv=pass; d=google.com; s=arc-20160816; b=XFYN/9frqnc58uuDkkX9l1/sFNICaYh6/1C59dLECmTe5hBJlIkN7tMtHzbE3XVZRC vKJADnMTcvhM+B9TyQoYNDaB29CL1Df5eXxiETa9XcWgQ7pwGQvp0kyf3KO1ldJ559mb 5swdXYvO+vCDf9zIOdD1klNwqzvEr3ugUyk2icpepdppom6xXvm9g9JzOIR4uePQMuo1 5swhsPhNgECKRiJfWBU/18BUnJ9/JxbdEVU0Gsrin406RLNNT7Ie07zQ0In6sucuJWXw pKcGItFYWLIuVejPG1/ZRR/vxOR3/0mdSTeBaczuAVX7V9t2DmTFsOYQoPqSFrclQvyy SEyg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:organization:references :in-reply-to:message-id:subject:cc:to:from:date:dkim-signature; bh=5RyIS9264RlqB1OG3rOf8l6Ym1+Q4KsvmigQA6LMdr0=; fh=FKq0itEBZasVJTzMwma9IFpdYwln3kRSGyUj9Pjbmyw=; b=KrVKxid9Ioc1RxPztDACXxnOC81Cf+d26HtMauhue+ODX7Ep3dcdC543Qz1nVsjeN+ J9m4kd1TvvlY+r+v/B2t7+Yt9OQSrtK9gzfGMXq48BWxmIyKfhdu0UTZ3PbKz/Ct67X+ Xe1n3BoKB104EV2a4VulBX5H0sKltwyGquCiGdY6atZjYH5BuofN5vMj82co324EKqr2 C5W8+JskDWlApqEBNnZe3d939YSqLtfrwUQfJ8FPmJoEgFUGErGlvpS2HmwVB2T3OVYF zLA1JIr1K/wNjJpkOYiY3VVh3m5sbnPidEkys2QbII+q93pdjD5z5z0v4ib9rTJC8Aga LyxQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=L5OhnsNl; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-208802-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-208802-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id 41be03b00d2f7-6e555436fe9si4598535a12.598.2024.06.10.12.34.07 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Jun 2024 12:34:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-208802-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=L5OhnsNl; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-208802-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-208802-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 35119B229F9 for ; Mon, 10 Jun 2024 19:32:32 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id D914D14E2DE; Mon, 10 Jun 2024 19:32:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="L5OhnsNl" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0B3F22230F for ; Mon, 10 Jun 2024 19:32:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718047940; cv=none; b=DEZ2vR4dxO2F5mK8nMcEE89k0+QiTwqKcyEY7fw+q2MHuI3EsFrrGASdkM0ftNHqx71x6hGNDGq/IKFYJFE7e/ZcGaBVGSSmoDQT72CIDhdSzFNzE/AhM9kl7gasCGqLiyWgMZ/wH8Bk+5ZKm0tIFclWTJW6q1JzKP9bNCapVYI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718047940; c=relaxed/simple; bh=InOa6heL5Z0zQtrtZXM1t8E5jy7Fl6gS5IxHmczhyvc=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=IIgGhgGILZV84Fwa6qCUpZe8raQ6O65qIRlM+duKzGcHp++lZ7Ol5CL29sGHhzHTn9XxQjXP8XjQwO4+bGLuAQYut9OdhezngIWZzsjBl1jNwFwYclN1sxh3kjyYBKTDXUBFDpFgPXu7YybunpdnNur8AIbs6CdE5/JjXFJ3o74= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=L5OhnsNl; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1718047938; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5RyIS9264RlqB1OG3rOf8l6Ym1+Q4KsvmigQA6LMdr0=; b=L5OhnsNldxAGfxfKdTWSJ96GH956qFe7lO+SQORq8PlvsKuHO/2gi+Z2ITQrnLKzdBced8 SqU9orty5oAgKU6LLFQWAUM7GX9g5tb+DsFl1YDmGk1twovz5bPniOUZjCcyn76EOnXFNc 5vaC1bS7Y5csJoqorMhY3V3LyOrhtH4= Received: from mail-ot1-f70.google.com (mail-ot1-f70.google.com [209.85.210.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-689-rSVgpKLUP4-M-Da43Tc3zA-1; Mon, 10 Jun 2024 15:32:17 -0400 X-MC-Unique: rSVgpKLUP4-M-Da43Tc3zA-1 Received: by mail-ot1-f70.google.com with SMTP id 46e09a7af769-6f91043c447so287398a34.2 for ; Mon, 10 Jun 2024 12:32:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718047933; x=1718652733; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=5RyIS9264RlqB1OG3rOf8l6Ym1+Q4KsvmigQA6LMdr0=; b=NzbrepaYE9g8+twiJ6JWA8RqLr0PwuXNDy+RmPuGSwFyriNDpQrF+CT3guJD6hM4Fz nPlwEXQ+r17ggcXNBkvj3oJoikNcmA6GSncnBTFbP6dumWrzXRWu9Rosb+n7oRWBGGpp Wef/u2/XhZsGjZSZyh3rfFtRuXC+7GnOhiR2L3OOOPcY4P6vYlhcH14S4QhfbBrxvJK6 GisLDkazKC35QBGhg0NrPvlGGGmt+qxBTxH97X/SD8d1enQl7nytVzwtZbe1WrVJ+91Q PTcjtks6yEvsOHPFv3fFON7PDYx2lCqBV+8yWNTWRbDh6yI6zvSAAQCIzU66ciPK+P9Z Qfvg== X-Forwarded-Encrypted: i=1; AJvYcCWaV8wo19dar/cgL3VlFHQqIujZwOnM3090NWQe2D/8w3khlS0Xv0VyEV4dVZpyqmGJui12FgxEqsPCrJbuG0umovjf998yX5je3M5m X-Gm-Message-State: AOJu0YwQUpRJPH6hDZI9BhObbEJepZmBtCT+Zy5dTqMwpXKbFYgVptza DJeiMBv+KK9GgUjeAxYUemrbgrAioMDbpkcgYW3UqLdUjXYhZoP0rQZj7qVo7Yp8as1GgfRkm8V VfvYwkJ4vehpICDvzdjOquDmMhg9hr88tmSZG2nZ2Kf94quZCbjcnlMACh/OKUQ== X-Received: by 2002:a05:6870:219e:b0:24c:b878:b4fc with SMTP id 586e51a60fabf-2546441a494mr11034211fac.17.1718047932659; Mon, 10 Jun 2024 12:32:12 -0700 (PDT) X-Received: by 2002:a05:6870:219e:b0:24c:b878:b4fc with SMTP id 586e51a60fabf-2546441a494mr11034175fac.17.1718047932132; Mon, 10 Jun 2024 12:32:12 -0700 (PDT) Received: from redhat.com ([38.15.36.11]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-25447e0171dsm2438233fac.13.2024.06.10.12.32.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Jun 2024 12:32:11 -0700 (PDT) Date: Mon, 10 Jun 2024 13:32:07 -0600 From: Alex Williamson To: Fred Griffoul Cc: , Catalin Marinas , Will Deacon , Waiman Long , Zefan Li , Tejun Heo , Johannes Weiner , Mark Rutland , Marc Zyngier , Oliver Upton , Mark Brown , Ard Biesheuvel , Joey Gouly , Ryan Roberts , Jeremy Linton , Jason Gunthorpe , Yi Liu , Kevin Tian , Eric Auger , Stefan Hajnoczi , "Christian Brauner" , Ankit Agrawal , "Reinette Chatre" , Ye Bin , , , , Subject: Re: [PATCH v5 2/2] vfio/pci: add msi interrupt affinity support Message-ID: <20240610133207.7d039dab.alex.williamson@redhat.com> In-Reply-To: <20240610125713.86750-3-fgriffo@amazon.co.uk> References: <20240610125713.86750-1-fgriffo@amazon.co.uk> <20240610125713.86750-3-fgriffo@amazon.co.uk> Organization: Red Hat Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On Mon, 10 Jun 2024 12:57:08 +0000 Fred Griffoul wrote: > The usual way to configure a device interrupt from userland is to write > the /proc/irq//smp_affinity or smp_affinity_list files. When using > vfio to implement a device driver or a virtual machine monitor, this may > not be ideal: the process managing the vfio device interrupts may not be > granted root privilege, for security reasons. Thus it cannot directly > control the interrupt affinity and has to rely on an external command. > > This patch extends the VFIO_DEVICE_SET_IRQS ioctl() with a new data flag > to specify the affinity of interrupts of a vfio pci device. > > The CPU affinity mask argument must be a subset of the process cpuset, > otherwise an error -EPERM is returned. > > The vfio_irq_set argument shall be set-up in the following way: > > - the 'flags' field have the new flag VFIO_IRQ_SET_DATA_AFFINITY set > as well as VFIO_IRQ_SET_ACTION_TRIGGER. > > - the variable-length 'data' field is a cpu_set_t structure, as > for the sched_setaffinity() syscall, the size of which is derived > from 'argsz'. > > Signed-off-by: Fred Griffoul > --- > drivers/vfio/pci/vfio_pci_core.c | 27 +++++++++++++++++---- > drivers/vfio/pci/vfio_pci_intrs.c | 39 +++++++++++++++++++++++++++++++ > drivers/vfio/vfio_main.c | 13 +++++++---- > include/uapi/linux/vfio.h | 10 +++++++- > 4 files changed, 80 insertions(+), 9 deletions(-) > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > index 80cae87fff36..2e3419560480 100644 > --- a/drivers/vfio/pci/vfio_pci_core.c > +++ b/drivers/vfio/pci/vfio_pci_core.c > @@ -1192,6 +1192,7 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev, > { > unsigned long minsz = offsetofend(struct vfio_irq_set, count); > struct vfio_irq_set hdr; > + cpumask_var_t mask; > u8 *data = NULL; > int max, ret = 0; > size_t data_size = 0; > @@ -1207,9 +1208,22 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev, > return ret; > > if (data_size) { > - data = memdup_user(&arg->data, data_size); > - if (IS_ERR(data)) > - return PTR_ERR(data); > + if (hdr.flags & VFIO_IRQ_SET_DATA_AFFINITY) { > + if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) > + return -ENOMEM; > + > + if (copy_from_user(mask, &arg->data, data_size)) { > + ret = -EFAULT; > + goto out; > + } > + > + data = (u8 *)mask; Seems like this could just use the memdup_user() path, why do we care to copy it into a cpumask_var_t here? If we do care, wouldn't we implement something like get_user_cpu_mask() used by sched_setaffinity(2)? > + > + } else { > + data = memdup_user(&arg->data, data_size); > + if (IS_ERR(data)) > + return PTR_ERR(data); > + } > } > > mutex_lock(&vdev->igate); > @@ -1218,7 +1232,12 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev, > hdr.count, data); > > mutex_unlock(&vdev->igate); > - kfree(data); > + > +out: > + if (hdr.flags & VFIO_IRQ_SET_DATA_AFFINITY && data_size) > + free_cpumask_var(mask); > + else > + kfree(data); > > return ret; > } > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c > index 8382c5834335..fe01303cf94e 100644 > --- a/drivers/vfio/pci/vfio_pci_intrs.c > +++ b/drivers/vfio/pci/vfio_pci_intrs.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > #include "vfio_pci_priv.h" > > @@ -675,6 +676,41 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev, > return 0; > } > > +static int vfio_pci_set_msi_affinity(struct vfio_pci_core_device *vdev, > + unsigned int start, unsigned int count, > + struct cpumask *irq_mask) Aside from the name, what makes this unique to MSI vectors? > +{ > + struct vfio_pci_irq_ctx *ctx; > + cpumask_var_t allowed_mask; > + unsigned int i; > + int err = 0; > + > + if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL)) > + return -ENOMEM; > + > + cpuset_cpus_allowed(current, allowed_mask); > + if (!cpumask_subset(irq_mask, allowed_mask)) { > + err = -EPERM; > + goto finish; > + } > + > + for (i = start; i < start + count; i++) { > + ctx = vfio_irq_ctx_get(vdev, i); > + if (!ctx) { > + err = -EINVAL; > + break; > + } > + > + err = irq_set_affinity(ctx->producer.irq, irq_mask); > + if (err) > + break; > + } Is this typical/userful behavior to set a series of vectors to the same cpu_set_t? It's unusual behavior for this ioctl to apply the same data across multiple vectors. Should the DATA_AFFINITY case support an array of cpu_set_t? > + > +finish: > + free_cpumask_var(allowed_mask); > + return err; > +} > + > static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev, > unsigned index, unsigned start, > unsigned count, uint32_t flags, void *data) > @@ -713,6 +749,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev, > if (!irq_is(vdev, index)) > return -EINVAL; > > + if (flags & VFIO_IRQ_SET_DATA_AFFINITY) > + return vfio_pci_set_msi_affinity(vdev, start, count, data); > + > for (i = start; i < start + count; i++) { > ctx = vfio_irq_ctx_get(vdev, i); > if (!ctx) > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index e97d796a54fb..e75c5d66681c 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -1505,23 +1505,28 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs, > size = 0; > break; > case VFIO_IRQ_SET_DATA_BOOL: > - size = sizeof(uint8_t); > + size = hdr->count * sizeof(uint8_t); > break; > case VFIO_IRQ_SET_DATA_EVENTFD: > - size = sizeof(int32_t); > + size = size_mul(hdr->count, sizeof(int32_t)); Why use size_mul() in one place and not the other? > + break; > + case VFIO_IRQ_SET_DATA_AFFINITY: > + size = hdr->argsz - minsz; > + if (size > cpumask_size()) > + size = cpumask_size(); Or just set size = (hdr->argsz - minsz) / count? Generate an error if (hdr->argsz - minsz) % count? It seems like all the cpumask'items can be contained to the set affinity function. > break; > default: > return -EINVAL; > } > > if (size) { > - if (hdr->argsz - minsz < hdr->count * size) > + if (hdr->argsz - minsz < size) > return -EINVAL; > > if (!data_size) > return -EINVAL; > > - *data_size = hdr->count * size; > + *data_size = size; > } > > return 0; > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 2b68e6cdf190..5ba2ca223550 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -580,6 +580,12 @@ struct vfio_irq_info { > * > * Note that ACTION_[UN]MASK specify user->kernel signaling (irqfds) while > * ACTION_TRIGGER specifies kernel->user signaling. > + * > + * DATA_AFFINITY specifies the affinity for the range of interrupt vectors. > + * It must be set with ACTION_TRIGGER in 'flags'. The variable-length 'data' > + * array is a CPU affinity mask 'cpu_set_t' structure, as for the > + * sched_setaffinity() syscall argument: the 'argsz' field is used to check > + * the actual cpu_set_t size. > */ DATA_CPUSET? The IRQ_INFO ioctl should probably also report support for this feature. Is there any proposed userspace code that takes advantage of this interface? Thanks, Alex > struct vfio_irq_set { > __u32 argsz; > @@ -587,6 +593,7 @@ struct vfio_irq_set { > #define VFIO_IRQ_SET_DATA_NONE (1 << 0) /* Data not present */ > #define VFIO_IRQ_SET_DATA_BOOL (1 << 1) /* Data is bool (u8) */ > #define VFIO_IRQ_SET_DATA_EVENTFD (1 << 2) /* Data is eventfd (s32) */ > +#define VFIO_IRQ_SET_DATA_AFFINITY (1 << 6) /* Data is cpu_set_t */ > #define VFIO_IRQ_SET_ACTION_MASK (1 << 3) /* Mask interrupt */ > #define VFIO_IRQ_SET_ACTION_UNMASK (1 << 4) /* Unmask interrupt */ > #define VFIO_IRQ_SET_ACTION_TRIGGER (1 << 5) /* Trigger interrupt */ > @@ -599,7 +606,8 @@ struct vfio_irq_set { > > #define VFIO_IRQ_SET_DATA_TYPE_MASK (VFIO_IRQ_SET_DATA_NONE | \ > VFIO_IRQ_SET_DATA_BOOL | \ > - VFIO_IRQ_SET_DATA_EVENTFD) > + VFIO_IRQ_SET_DATA_EVENTFD | \ > + VFIO_IRQ_SET_DATA_AFFINITY) > #define VFIO_IRQ_SET_ACTION_TYPE_MASK (VFIO_IRQ_SET_ACTION_MASK | \ > VFIO_IRQ_SET_ACTION_UNMASK | \ > VFIO_IRQ_SET_ACTION_TRIGGER)