Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp877859rwr; Wed, 19 Apr 2023 14:44:55 -0700 (PDT) X-Google-Smtp-Source: AKy350Z3FkF4aV4RJAW7UgLfQUBoMoYjPULt/FIuYYiAAFDZ+8sSQt+0vc2xBPFLNjz/deDogd5j X-Received: by 2002:a17:90a:77cc:b0:246:fd0f:3811 with SMTP id e12-20020a17090a77cc00b00246fd0f3811mr4105438pjs.40.1681940695297; Wed, 19 Apr 2023 14:44:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1681940695; cv=none; d=google.com; s=arc-20160816; b=WCjXpzEuelfQG2WsT9U3+NvEGOG3I6g5wkawRF/xCBfDqQnGHP6faatYnOW7xfeIQI Wh5pk3FkZzHwZzdq795jYPYNP2foSGhNP04VkWjY2seDI4ifYhiSTpD1SXng7uOBaDZt efjDp+6aFI7TatNWnw/IEnMgF4knVZLF7ws1Gkv/18tjXa0EOKPPI+sQwYRy7qwVhUxF zAVvag7+mNs3FyoBhF08clBKbeuXFAFkzs7O0ZNP8JSogBvSxpwwhzz7P3C2ZKzCXNPf CVCe7KzZF2mlWByBnWw/AQOp4Q2G2HlRQr8laa8tw/DutZ8QLLmQN841pYrUs8OcGlkh ob7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=AqqRZHP28AKreJPHuhkGouAT+TmrF/0x0xX/UpbfWDw=; b=MvesdEQGreA8qtGOuyA36Cn1nqMPiL81/6Ex8nEsjsf9mRiMP40t8EgmtAap3dOhVR mQ6QDHoIYxa3vCDDmb9wnIIoYIsXaKfpxKwW+++Qg2ZK79i786jKny9Zk1P8PlwudZHm UxNU9vWjs2dY9xa6c1udFkypXw4z1k2nn16uyclQLMtWYWsYBxtY3UrbmAC3ElZv8Up9 CYKlsVD848+QRYL7gBOaSIzVwHsRU71wG/0kHAHIQOHAotSTsIJd2xrl6Fm/CLmbpZky ShvImeMINBNRwYORF4IuSaHB3uBcVksiee+rB8cVOfa43HQpUlyg7c/cdr+DgPlUBywg CVEw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dIUb5Dba; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id nv13-20020a17090b1b4d00b002471f613111si2663455pjb.62.2023.04.19.14.44.43; Wed, 19 Apr 2023 14:44:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dIUb5Dba; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231753AbjDSVjC (ORCPT + 99 others); Wed, 19 Apr 2023 17:39:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58438 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229542AbjDSVjA (ORCPT ); Wed, 19 Apr 2023 17:39:00 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B2621BEB for ; Wed, 19 Apr 2023 14:38:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681940290; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AqqRZHP28AKreJPHuhkGouAT+TmrF/0x0xX/UpbfWDw=; b=dIUb5Dbae62OgZ1tLvBDdW1z0E2b5fEG9w/iD3qJbuSSVV+ym1NswxIz9kPQ3N3Ee3EJf/ 2sH8CYUrERJni3kVOU8VYoKmsb6Si6XqMDfRmZeilY6N+mZFOE1srTLabjNsgUqXuEKkuc aUkTFU74b7SY4RyGNU9k43YbEJbCS6I= Received: from mail-il1-f200.google.com (mail-il1-f200.google.com [209.85.166.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-657-BvnyrE23PNCYCayXLjZEyQ-1; Wed, 19 Apr 2023 17:38:08 -0400 X-MC-Unique: BvnyrE23PNCYCayXLjZEyQ-1 Received: by mail-il1-f200.google.com with SMTP id e9e14a558f8ab-32b65428489so1374005ab.1 for ; Wed, 19 Apr 2023 14:38:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681940288; x=1684532288; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=AqqRZHP28AKreJPHuhkGouAT+TmrF/0x0xX/UpbfWDw=; b=geKYVFa6WyBU37Sle7Ag/SwmvcfCEH7eVsz0vMm3Hzyq+4jX6B5yqO7efszQOxhrz9 5C7Xo1zLxNzzt/EOBuizdE0TUFSh5CU1eZNuXUaNoJkvQz2ul4+Mwv5b5zf7iJL+GTyn sb2dxRgfSEWsdW30mYiKFGIIBGA0OIGn2hzbbQ0weQ0ivHOh2GxKQ95mwXSALgJ0IeSs e0SmS1DEQuvLPWpmVaHXYZJWmb6PwOjqgUlsPqwATAZxO9i4uJhP06YiBlAVoJzGJ4kL 9Ea9rw1DR+RaSdJj97l+8pMZ7txdiuNU6WDrACs0vWKSj3PQ12Ue+htIUBNNO+4gXm9Z LhiA== X-Gm-Message-State: AAQBX9frdD2Bc+SsdOskBMrH+yZvvEYTHbaNhnapizJCTe7tLQc896nz JkxjkBkLhKW/s/z8NdGB6aGwl6nw24UwxC+Y+oTRmDvVMgolrPq1+fzS4pDrCLuo224N8bGlsBn vJlFDcDfgI6IqW3cy6UZ/JV0H X-Received: by 2002:a05:6e02:5d2:b0:324:7d63:8622 with SMTP id l18-20020a056e0205d200b003247d638622mr1145197ils.2.1681940288049; Wed, 19 Apr 2023 14:38:08 -0700 (PDT) X-Received: by 2002:a05:6e02:5d2:b0:324:7d63:8622 with SMTP id l18-20020a056e0205d200b003247d638622mr1145181ils.2.1681940287751; Wed, 19 Apr 2023 14:38:07 -0700 (PDT) Received: from redhat.com ([38.15.36.239]) by smtp.gmail.com with ESMTPSA id n4-20020a056638210400b0040d9a3d54desm4743jaj.170.2023.04.19.14.38.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 Apr 2023 14:38:06 -0700 (PDT) Date: Wed, 19 Apr 2023 15:38:04 -0600 From: Alex Williamson To: Reinette Chatre Cc: , , , , , , , , , , , , Subject: Re: [PATCH V3 09/10] vfio/pci: Support dynamic MSI-X Message-ID: <20230419153804.7ad0b122.alex.williamson@redhat.com> In-Reply-To: References: <86cda5cf2742feff3b14954284fb509863355050.1681837892.git.reinette.chatre@intel.com> <20230418163829.149e8881.alex.williamson@redhat.com> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.35; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 19 Apr 2023 11:13:29 -0700 Reinette Chatre wrote: > Hi Alex, > > On 4/18/2023 3:38 PM, Alex Williamson wrote: > > On Tue, 18 Apr 2023 10:29:20 -0700 > > Reinette Chatre wrote: > > > >> Recently introduced pci_msix_alloc_irq_at() and pci_msix_free_irq() > >> enables an individual MSI-X interrupt to be allocated and freed after > >> MSI-X enabling. > >> > >> Use dynamic MSI-X (if supported by the device) to allocate an interrupt > >> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at > >> the time a valid eventfd is assigned. This is different behavior from > >> a range provided during MSI-X enabling where interrupts are allocated > >> for the entire range whether a valid eventfd is provided for each > >> interrupt or not. > >> > >> Do not dynamically free interrupts, leave that to when MSI-X is > >> disabled. > > > > But we do, sometimes, even if essentially only on the error path. Is > > that worthwhile? It seems like we could entirely remove > > vfio_msi_free_irq() and rely only on pci_free_irq_vectors() on MSI/X > > teardown. > > Yes, it is only on the error path where dynamic MSI-X interrupts are > removed. I do not know how to determine if it is worthwhile. On the > kernel side failure seems unlikely since it would mean memory cannot > be allocated or request_irq() failed. In these cases it may not be > worthwhile since user space may try again and having the interrupt > already allocated would be helpful. The remaining error seems to be > if user space provided an invalid eventfd. An allocation in response > to wrong user input is a concern to me. Should we consider > buggy/malicious users? I am uncertain here so would defer to your > guidance. I don't really see that a malicious user can exploit anything here, their irq allocation is bound by the device support and they're entitled to make use of the full vector set of the device by virtue of having ownership of the device. All the MSI-X allocated irqs are freed when the interrupt mode is changed or the device is released regardless. The end result is also no different than if the user had not configured the vector when enabling MSI-X or configured it and later de-configured with a -1 eventfd. The irq is allocated but not attached to a ctx. We're intentionally using this as a cache. Also, as implemented here in v3, we might be freeing from the original allocation rather than a new, dynamically allocated irq. My thinking is that if we think there's a benefit to caching any allocated irqs, we should do so consistently. We don't currently know if the irq was allocated now or previously. Tracking that would add complexity for little benefit. The user can get to the same end result of an allocated, unused irq in numerous way, the state itself is not erroneous, and is actually in support of caching irq allocations. Removing the de-allocation of a single vector further simplifies the code as there exists only one path where irqs are freed, ie. pci_free_irq_vectors(). So I'd lean towards removing vfio_msi_free_irq(). > > I'd probably also add a comment in the commit log about the theory > > behind not dynamically freeing irqs, ie. latency, reliability, and > > whatever else we used to justify it. Thanks, > > Sure. How about something like below to replace the final sentence of > the changelog: > > "When a guest disables an interrupt, user space (Qemu) does not > disable the interrupt but instead assigns it a different trigger. A > common flow is thus for the VFIO_DEVICE_SET_IRQS ioctl() to be > used to assign a new eventfd to an already enabled interrupt. Freeing > and re-allocating an interrupt in this scenario would add unnecessary > latency as well as uncertainty since the re-allocation may fail. Do > not dynamically free interrupts when an interrupt is disabled, instead > support a subsequent re-enable to draw from the initial allocation when > possible. Leave freeing of interrupts to when MSI-X is disabled." There are other means besides caching irqs that could achieve the above, for instance if a trigger is simply swapped from one eventfd to another, that all happens within vfio_msi_set_vector_signal() where we could hold the irq for the transition. I think I might justify it as: The PCI-MSIX API requires that some number of irqs are allocated for an initial set of vectors when enabling MSI-X on the device. When dynamic MSIX allocation is not supported, the vector table, and thus the allocated irq set can only be resized by disabling and re-enabling MSIX with a different range. In that case the irq allocation is essentially a cache for configuring vectors within the previously allocated vector range. When dynamic MSIX allocation is supported, the API still requires some initial set of irqs to be allocated, but also supports allocating and freeing specific irq vectors both within and beyond the initially allocated range. For consistency between modes, as well as to reduce latency and improve reliability of allocations, and also simplicity, this implementation only releases irqs via pci_free_irq_vectors() when either the interrupt mode changes or the device is released. Does that cover the key points for someone that might want to revisit this decision later? Thanks, Alex