Received: by 10.223.185.116 with SMTP id b49csp6738209wrg; Wed, 28 Feb 2018 14:46:19 -0800 (PST) X-Google-Smtp-Source: AH8x2253TUkxpV+N6Ni1zbqBPPNTIa0Mw0lnflZNn8hYtV2aGJiE9LSQm0cmws4mF4CQrOjQYgLw X-Received: by 10.99.188.2 with SMTP id q2mr15735670pge.101.1519857979153; Wed, 28 Feb 2018 14:46:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519857979; cv=none; d=google.com; s=arc-20160816; b=lHgeG22601MvHjol/EchpO76ASMNCXQR2zf8NWX8xlOEBTqu254bIfrFAtKfmfcM4C fHcnJWsUobvt3ZNapzhCW+3oX2T+O3oBk4ZgzCHqcarbWJV9sibmwkMq753vhymGwoRf AsY2Sa+Kk33+2rCl9awBqHK6YtPCmkO3TljgDBN7eh11cWAbpHEc2V3DIpEAmlCeTowq f6/6WztOAtJAYwjmMYhlFmLzCvY+G76p5qFMdmT/M5mrZoSvwlvTzw0AgRsHXlwODSJ5 P4RpfJWLPZdB4MZGZR8gmXLAn0gmkcoLm5alamcuTAEzxdR9w9UUcxKswxnfmiAiQIq+ gB2w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature :arc-authentication-results; bh=Ex8JuJ2mSs2CCIbZQhT3nwDmkcT0NcQB0cUVPoUT38A=; b=pzvRkEVv04ZHRMFk9MQq8mqW+zOIT6VKbIBszN5GVz8bIgALrLoR2N0sg+2jovVXz0 HUkYu+3b/p8J9uiyo1F97dMpsZsibRfpHmfV7ChVxLikw6lt6Lqou9psnsQe4wGPHsR9 DmAQzuHo5qj6Fd/Dj8wsUNNynSVxXSDYGyqFHevykCSnocG0GhlxyYxo5Y0BdF2xvTAO 5tugR3k61BwYivsRWq2sRSFh1MR56s9U8Rl8T6VsFYHtkdeEGnly2mgQSgNfvyj8x8tq FryVh0pfusISYHmkcF56BaI6EFYOjsnlWnIPwkysIdeuRC4PfdYk3qoT69CfAvxp6T0F iH5Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=gudBWr4Y; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u68si1563722pgb.287.2018.02.28.14.46.04; Wed, 28 Feb 2018 14:46:19 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=gudBWr4Y; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964775AbeB1WpQ (ORCPT + 99 others); Wed, 28 Feb 2018 17:45:16 -0500 Received: from mail-it0-f68.google.com ([209.85.214.68]:37195 "EHLO mail-it0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935461AbeB1WpN (ORCPT ); Wed, 28 Feb 2018 17:45:13 -0500 Received: by mail-it0-f68.google.com with SMTP id k79so5527127ita.2; Wed, 28 Feb 2018 14:45:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=Ex8JuJ2mSs2CCIbZQhT3nwDmkcT0NcQB0cUVPoUT38A=; b=gudBWr4YcZfa2379OlVfsyKiorybfUgqDziqGuHyk4/nVlKVoHuP9jHKLQWaN34Svq 3zlv5RdcbR9GNaDq23/3UoY18oRhG/rf+UhQdrp0mZVNcQg/v3QXpTQ/SKmdOo9ou1/J R9FaJyxqFzKOurgblGE3WKghpWdQX/EdL+W1UdyEXQ5VhLFEFAjIPrX9nZ+59KkMAWeV BYH8cwarcRc5r8Oshu0Dmm7pBAOR89Z/+NHzmFC5XBQzmkQ2/3JxyH5vCp7ljIVyBq+d SLsRSA+AmK0A4+ola2EqUzbjb038m1nqwoSYmwCFPYBQ/+cTZCD/v17j0Mecny4fy3vT vG4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=Ex8JuJ2mSs2CCIbZQhT3nwDmkcT0NcQB0cUVPoUT38A=; b=sUNl0QHcuT97wIJne9i23s6Lqj+W5luIJAuaUnVZFQKvGYKi+FIixpC/YJ7UniWzMv VNoGtgH1ier1SSmgS6snEI5sfn7n9KpXqJzZIUTLrTrYM9RM3XPvj0J69d4crV+rEjxH utSKxdPzxAhXAMFDEJOtRUGuKV0Xk7xZLPjgqfdio7BgurU8NNECJnMpVoJsDWGjDbfA j14/+1r6hxB/S6gpgZEV4wnmwKkpMCzUzCBFbbx57Q3nRVEUbgpYPanIdlDR/4VlKRGH Q68R6rnu6CTXMTJx7cy+GUdzXzMYGFalyQDNGdpE5Lp4uGUS3T8FiQIuN5FQfzqejQr6 uJjw== X-Gm-Message-State: APf1xPDT0YtH4UDu6AruQlaXeoMJoLx2bJZYqNYZth8W932P85FoFgdv XNnu/SgK2TsqB3BXyOHJRuo= X-Received: by 10.36.81.208 with SMTP id s199mr128089ita.46.1519857912071; Wed, 28 Feb 2018 14:45:12 -0800 (PST) Received: from [192.168.0.16] (184-100-240-187.ptld.qwest.net. [184.100.240.187]) by smtp.gmail.com with ESMTPSA id m14sm1902175iob.88.2018.02.28.14.45.08 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 28 Feb 2018 14:45:10 -0800 (PST) Subject: Re: [PATCH] pci-iov: Add support for unmanaged SR-IOV To: Alexander Duyck , Alex Williamson Cc: Bjorn Helgaas , linux-pci@vger.kernel.org, Alexander Duyck , virtio-dev@lists.oasis-open.org, kvm@vger.kernel.org, Netdev , "Daly, Dan" , LKML , Maximilian Heyne , "Wang, Liang-min" , "Rustad, Mark D" , David Woodhouse , dwmw@amazon.co.uk References: <20180227190520.3273.79728.stgit@localhost.localdomain> <20180227144015.228d4f5c@w520.home> From: Gregory Rose Message-ID: Date: Wed, 28 Feb 2018 14:45:07 -0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2/28/2018 9:49 AM, Alexander Duyck wrote: > On Tue, Feb 27, 2018 at 2:25 PM, Alexander Duyck > wrote: >> On Tue, Feb 27, 2018 at 1:40 PM, Alex Williamson >> wrote: >>> On Tue, 27 Feb 2018 11:06:54 -0800 >>> Alexander Duyck wrote: >>> >>>> From: Alexander Duyck >>>> >>>> This patch is meant to add support for SR-IOV on devices when the VFs are >>>> not managed by the kernel. Examples of recent patches attempting to do this >>>> include: >>> It appears to enable sriov when the _pf_ is not managed by the >>> kernel, but by "managed" we mean that either there is no pf driver or >>> the pf driver doesn't provide an sriov_configure callback, >>> intentionally or otherwise. >>> >>>> virto - https://patchwork.kernel.org/patch/10241225/ >>>> pci-stub - https://patchwork.kernel.org/patch/10109935/ >>>> vfio - https://patchwork.kernel.org/patch/10103353/ >>>> uio - https://patchwork.kernel.org/patch/9974031/ >>> So is the goal to get around the issues with enabling sriov on each of >>> the above drivers by doing it under the covers or are you really just >>> trying to enable sriov for a truly unmanage (no pf driver) case? For >>> example, should a driver explicitly not wanting sriov enabled implement >>> a dummy sriov_configure function? >>> >>>> Since this is quickly blowing up into a multi-driver problem it is probably >>>> best to implement this solution in one spot. >>>> >>>> This patch is an attempt to do that. What we do with this patch is provide >>>> a generic call to enable SR-IOV in the case that the PF driver is either >>>> not present, or the PF driver doesn't support configuring SR-IOV. >>>> >>>> A new sysfs value called sriov_unmanaged_autoprobe has been added. This >>>> value is used as the drivers_autoprobe setting of the VFs when they are >>>> being managed by an external entity such as userspace or device firmware >>>> instead of being managed by the kernel. >>> Documentation/ABI/testing/sysfs-bus-pci update is missing. >> I can make sure to update that in the next version. >> >>>> One side effect of this change is that the sriov_drivers_autoprobe and >>>> sriov_unmanaged_autoprobe will only apply their updates when SR-IOV is >>>> disabled. Attempts to update them when SR-IOV is in use will only update >>>> the local value and will not update sriov->autoprobe. >>> And we expect users to understand when sriov_drivers_autoprobe applies >>> vs sriov_unmanaged_autoprobe, even though they're using the same >>> interfaces to enable sriov? Are all combinations expected to work, ex. >>> unmanaged sriov is enabled, a native pf driver loads, vfs work? Not >>> only does it seems like there's opportunity to use this incorrectly, I >>> think maybe it might be difficult to use correctly. >>> >>>> I based my patch set originally on the patch by Mark Rustad but there isn't >>>> much left after going through and cleaning out the bits that were no longer >>>> needed, and after incorporating the feedback from David Miller. >>>> >>>> I have included the authors of the original 4 patches above in the Cc here. >>>> My hope is to get feedback and/or review on if this works for their use >>>> cases. >>>> >>>> Cc: Mark Rustad >>>> Cc: Maximilian Heyne >>>> Cc: Liang-Min Wang >>>> Cc: David Woodhouse >>>> Signed-off-by: Alexander Duyck >>>> --- >>>> drivers/pci/iov.c | 27 +++++++++++++++++++- >>>> drivers/pci/pci-driver.c | 2 + >>>> drivers/pci/pci-sysfs.c | 62 +++++++++++++++++++++++++++++++++++++++++----- >>>> drivers/pci/pci.h | 4 ++- >>>> include/linux/pci.h | 1 + >>>> 5 files changed, 86 insertions(+), 10 deletions(-) >>>> >>>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c >>>> index 677924ae0350..7b8858bd8d03 100644 >>>> --- a/drivers/pci/iov.c >>>> +++ b/drivers/pci/iov.c >>>> @@ -446,6 +446,7 @@ static int sriov_init(struct pci_dev *dev, int pos) >>>> pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device); >>>> iov->pgsz = pgsz; >>>> iov->self = dev; >>>> + iov->autoprobe = true; >>>> iov->drivers_autoprobe = true; >>>> pci_read_config_dword(dev, pos + PCI_SRIOV_CAP, &iov->cap); >>>> pci_read_config_byte(dev, pos + PCI_SRIOV_FUNC_LINK, &iov->link); >>>> @@ -643,8 +644,11 @@ void pci_restore_iov_state(struct pci_dev *dev) >>>> */ >>>> void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool auto_probe) >>>> { >>>> - if (dev->is_physfn) >>>> + if (dev->is_physfn) { >>>> dev->sriov->drivers_autoprobe = auto_probe; >>>> + if (!dev->sriov->num_VFs) >>>> + dev->sriov->autoprobe = auto_probe; >>> Why is dev->sriov->autoprobe set any time other than immediately prior >>> to enabling VFs? >> My concern here was drivers that are still floating around with the >> old module parameter option for enabling SR-IOV. In the unlikely event >> that somebody was to use such a driver I wanted to make certain that >> the drivers_autoprobe state was pre-populated. >> >>>> + } >>>> } >>>> >>>> /** >>>> @@ -703,6 +707,27 @@ void pci_disable_sriov(struct pci_dev *dev) >>>> EXPORT_SYMBOL_GPL(pci_disable_sriov); >>>> >>>> /** >>>> + * pci_sriov_configure_unmanaged - helper to configure unmanaged SR-IOV >>>> + * @dev: the PCI device >>>> + * @nr_virtfn: number of virtual functions to enable, 0 to disable >>>> + * >>>> + * Used to provide generic enable/disable SR-IOV option for devices >>>> + * that do not manage the VFs generated by their driver, or have no >>>> + * driver present. >>>> + */ >>>> +int pci_sriov_configure_unmanaged(struct pci_dev *dev, int nr_virtfn) >>>> +{ >>>> + int rc = 0; >>>> + >>>> + if (!nr_virtfn) >>>> + pci_disable_sriov(dev); >>>> + else >>>> + rc = pci_enable_sriov(dev, nr_virtfn); >>>> + >>>> + return rc ? rc : nr_virtfn; >>>> +} >>>> + >>>> +/** >>>> * pci_num_vf - return number of VFs associated with a PF device_release_driver >>>> * @dev: the PCI device >>>> * >>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c >>>> index 3bed6beda051..2cc68dff6130 100644 >>>> --- a/drivers/pci/pci-driver.c >>>> +++ b/drivers/pci/pci-driver.c >>>> @@ -398,7 +398,7 @@ void __weak pcibios_free_irq(struct pci_dev *dev) >>>> #ifdef CONFIG_PCI_IOV >>>> static inline bool pci_device_can_probe(struct pci_dev *pdev) >>>> { >>>> - return (!pdev->is_virtfn || pdev->physfn->sriov->drivers_autoprobe); >>>> + return (!pdev->is_virtfn || pdev->physfn->sriov->autoprobe); >>>> } >>>> #else >>>> static inline bool pci_device_can_probe(struct pci_dev *pdev) >>>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c >>>> index eb6bee8724cc..e701b6dbc267 100644 >>>> --- a/drivers/pci/pci-sysfs.c >>>> +++ b/drivers/pci/pci-sysfs.c >>>> @@ -605,6 +605,7 @@ static ssize_t sriov_numvfs_store(struct device *dev, >>>> struct device_attribute *attr, >>>> const char *buf, size_t count) >>>> { >>>> + int (*sriov_configure)(struct pci_dev *dev, int num_vfs); >>>> struct pci_dev *pdev = to_pci_dev(dev); >>>> int ret; >>>> u16 num_vfs; >>>> @@ -622,15 +623,20 @@ static ssize_t sriov_numvfs_store(struct device *dev, >>>> goto exit; >>>> >>>> /* is PF driver loaded w/callback */ >>>> - if (!pdev->driver || !pdev->driver->sriov_configure) { >>>> - pci_info(pdev, "Driver doesn't support SRIOV configuration via sysfs\n"); >>>> - ret = -ENOENT; >>>> - goto exit; >>>> - } >>>> + if (pdev->driver && pdev->driver->sriov_configure) >>>> + sriov_configure = pdev->driver->sriov_configure; >>>> + else >>>> + sriov_configure = pci_sriov_configure_unmanaged; >>> So an unwitting user is now able to enable vfs, independent of the >>> pf... the trouble being that they're probably going to expect them to >>> work and the more common case is that they won't. For instance, what >>> can you do with an igbvf when igb isn't managing the pf? >> Well the VFs wouldn't be able to do anything. Basically they would be >> sitting there with no driver loaded on them unless they are assigned >> to a guest, or the root user had enabled the unmanaged option. If you >> did load a driver on it the VF would sit there with link down unless >> either the PF driver is loaded or some user-space entity steps in to >> start managing the PF. >> >> In reality this can already happen as last I recall igb and ixgbe were >> already capable of having the PF driver removed when SR-IOV was >> enabled and VFs were assigned. Basically the VFs just report link down >> and don't do anything. Reloading the PF driver would have it take over >> in the case of igb and ixgbe since they were designed to handle that >> type of scenario. >> >>> Or what happens when vfio-pci owns the pf, sriov is enabled via the >>> unmanaged interface, and the pf user driver segfaults and gets killed, >>> causing vfio-pci to restore the pf state, including wiping the sriov >>> config? >> Wiping the config shouldn't have any effect on the allocated VF pci >> devices. It will cause the VFs to act as though they have fallen off >> of the bus though and the guests would see a surprise remove type >> behavior if I am not mistaken. The MMIO and config accesses would fail >> until the SR-IOV configuration is restored. Really this shouldn't be a >> problem as long as the SR-IOV is enabled prior to loading the vfio-pci >> driver as I would assume it would restore the state an re-enable >> SR-IOV. >> >> In the grand scheme of things how would the situation you describe be >> any different than someone using the "reset" sysfs control on the PF >> while using SR-IOV with driver supported SR-IOV? >> >> I suppose if you really wanted we could add a new call that you could >> put into the sriov_configure pointer that would just make it always >> return error. Then that way the configuration could be locked until >> the driver is unloaded. >> >>> I guess I don't understand how vfs can operate fully independent of the >>> pf and why vfio-pci wouldn't just implement a dummy sriov_configure to >>> avoid contending with such issues. >> This approach isn't without risk, but it isn't as if it is really a >> new risk we are introducing, and we could do further work to help >> smooth out issues like this. Much of this depends on the design of the >> device and the drivers involved. What is happening in this case is >> that the VFs are managed outside of the kernel itself either by some >> user-space entity or by a firmware entity. > So I was thinking about this some more. In the case of vfio-pci things > are a bit different since you are essentially taking a given device > and handing it to a VM or userspace and it doesn't guarantee a > communication between the two. > > My thought is to look at making SR-IOV configuration static or treat > it as read-only when the vfio-pci driver is loaded on a given > interface. In addition I would set the TAINT_USER flag and add a > warning about loading vfio-pci on an active PF, and provide the number > of VFs that were allocated. > > The idea with all of this being that we would at least have a partial > lock on all of this so that you can allocate some number of VFs, and > then start partitioning things up where you could assign the PF and > VFs as needed to the various interfaces. Once the PF is assigned to > the vfio-pci driver it would be locked in terms of the number of VFs > that are provided so there would be no need for an extra communication > channel between the PF driver and the host to deal with changes. > > Thoughts? I've been following this conversation and I think putting in some restrictions in order to simplify the design is definitely worth considering.  I think there's a lot of potential for this but KISS still applies. Thanks for the work on this! - Greg