2020-02-19 18:54:52

by Alex Williamson

Subject: [PATCH v2 0/7] vfio/pci: SR-IOV support

Changes since v1 are primarily to patch 3/7 where the commit log is
rewritten, along with option parsing and failure logging based on
upstream discussions. The primary user visible difference is that
option parsing is now much more strict. If a vf_token option is
provided that cannot be used, we generate an error. As a result of
this, opening a PF with a vf_token option will serve as a mechanism of
setting the vf_token. This seems like a more user friendly API than
the alternative of sometimes requiring the option (VFs in use) and
sometimes rejecting it, and upholds our desire that the option is
always either used or rejected.

This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
means of setting the VF token, which might call into question whether
we absolutely need this new ioctl. Currently I'm keeping it because I
can imagine use cases, for example if a hypervisor were to support
SR-IOV, the PF device might be opened without consideration for a VF
token and we'd require the hypervisor to close and re-open the PF in
order to set a known VF token, which is impractical.

Series overview (same as provided with v1):

The synopsis of this series is that we have an ongoing desire to drive
PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
for this with DPDK drivers and potentially interesting future use
cases in virtualization. We've been reluctant to add this support
previously due to the dependency and trust relationship between the
VF device and PF driver. Minimally the PF driver can induce a denial
of service to the VF, but depending on the specific implementation,
the PF driver might also be responsible for moving data between VFs
or have direct access to the state of the VF, including data or state
otherwise private to the VF or VF driver.

To help resolve these concerns, we introduce a VF token into the VFIO
PCI ABI, which acts as a shared secret key between drivers. The
userspace PF driver is required to set the VF token to a known value
and userspace VF drivers are required to provide the token to access
the VF device. If a PF driver is restarted with VF drivers in use, it
must also provide the current token in order to prevent a rogue
untrusted PF driver from replacing a known driver. The degree to
which this new token is considered secret is left to the userspace
drivers; the kernel intentionally provides no means to retrieve the
current token.

Note that the above token is only required for this new model where
both the PF and VF devices are usable through vfio-pci. Existing
models of VFIO drivers where the PF is used without SR-IOV enabled
or the VF is bound to a userspace driver with an in-kernel, host PF
driver are unaffected.

The latter configuration above also highlights a new inverted scenario
that is now possible: a userspace PF driver with in-kernel VF drivers.
I believe this is a scenario that should be allowed, but should not be
enabled by default. This series includes code to set a default
driver_override for VFs sourced from a vfio-pci user owned PF, such
that the VFs are also bound to vfio-pci. This model is compatible
with tools like driverctl and allows the system administrator to
decide if other bindings should be enabled. The VF token interface
above exists only between vfio-pci PF and VF drivers; once a VF is
bound to another driver, the administrator has effectively pronounced
the device as trusted. The vfio-pci driver will note any alternate
binding in dmesg for logging and debugging purposes.

Please review, comment, and test. The example QEMU implementation
provided with the RFC is still current for this version. Thanks,

Alex

RFC: https://lore.kernel.org/lkml/[email protected]/
v1: https://lore.kernel.org/lkml/[email protected]/

---

Alex Williamson (7):
vfio: Include optional device match in vfio_device_ops callbacks
vfio/pci: Implement match ops
vfio/pci: Introduce VF token
vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
vfio/pci: Add sriov_configure support
vfio/pci: Remove dev_fmt definition
vfio/pci: Cleanup .probe() exit paths


drivers/vfio/pci/vfio_pci.c | 383 +++++++++++++++++++++++++++++++++--
drivers/vfio/pci/vfio_pci_private.h | 10 +
drivers/vfio/vfio.c | 20 +-
include/linux/vfio.h | 4
include/uapi/linux/vfio.h | 37 +++
5 files changed, 426 insertions(+), 28 deletions(-)


2020-02-19 18:54:52

by Alex Williamson

Subject: [PATCH v2 1/7] vfio: Include optional device match in vfio_device_ops callbacks

Allow bus drivers to provide their own callback to match a device to
the user provided string.

Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/vfio.c | 20 ++++++++++++++++----
include/linux/vfio.h | 4 ++++
2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c8482624ca34..0bd77d6ea691 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -875,11 +875,23 @@ EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
char *buf)
{
- struct vfio_device *it, *device = NULL;
+ struct vfio_device *it, *device = ERR_PTR(-ENODEV);

mutex_lock(&group->device_lock);
list_for_each_entry(it, &group->device_list, group_next) {
- if (!strcmp(dev_name(it->dev), buf)) {
+ int ret;
+
+ if (it->ops->match) {
+ ret = it->ops->match(it->device_data, buf);
+ if (ret < 0) {
+ device = ERR_PTR(ret);
+ break;
+ }
+ } else {
+ ret = !strcmp(dev_name(it->dev), buf);
+ }
+
+ if (ret) {
device = it;
vfio_device_get(device);
break;
@@ -1430,8 +1442,8 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
return -EPERM;

device = vfio_device_get_from_name(group, buf);
- if (!device)
- return -ENODEV;
+ if (IS_ERR(device))
+ return PTR_ERR(device);

ret = device->ops->open(device->device_data);
if (ret) {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e42a711a2800..029694b977f2 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -26,6 +26,9 @@
* operations documented below
* @mmap: Perform mmap(2) on a region of the device file descriptor
* @request: Request for the bus driver to release the device
+ * @match: Optional device name match callback (return: 0 for no-match, >0 for
+ * match, -errno for abort (ex. match with insufficient or incorrect
+ * additional args)
*/
struct vfio_device_ops {
char *name;
@@ -39,6 +42,7 @@ struct vfio_device_ops {
unsigned long arg);
int (*mmap)(void *device_data, struct vm_area_struct *vma);
void (*request)(void *device_data, unsigned int count);
+ int (*match)(void *device_data, char *buf);
};

extern struct iommu_group *vfio_iommu_group_get(struct device *dev);

2020-02-19 18:54:57

by Alex Williamson

Subject: [PATCH v2 2/7] vfio/pci: Implement match ops

This currently serves the same purpose as the default implementation
but will be expanded for additional functionality.

Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 379a02c36e37..2ec6c31d0ab0 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1278,6 +1278,13 @@ static void vfio_pci_request(void *device_data, unsigned int count)
mutex_unlock(&vdev->igate);
}

+static int vfio_pci_match(void *device_data, char *buf)
+{
+ struct vfio_pci_device *vdev = device_data;
+
+ return !strcmp(pci_name(vdev->pdev), buf);
+}
+
static const struct vfio_device_ops vfio_pci_ops = {
.name = "vfio-pci",
.open = vfio_pci_open,
@@ -1287,6 +1294,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
.write = vfio_pci_write,
.mmap = vfio_pci_mmap,
.request = vfio_pci_request,
+ .match = vfio_pci_match,
};

static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);

2020-02-19 18:55:03

by Alex Williamson

Subject: [PATCH v2 3/7] vfio/pci: Introduce VF token

If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
fully isolated from the PF. The PF can always cause a denial of service
to the VF, even if by simply resetting itself. The degree to which a PF
can access the data passed through a VF or interfere with its operation
is dependent on a given SR-IOV implementation. Therefore we want to
avoid a scenario where an existing vfio-pci based userspace driver might
assume the PF driver is trusted, for example assigning a PF to one VM
and VF to another with some expectation of isolation. IOMMU grouping
could be a solution to this, but imposes an unnecessarily strong
relationship between PF and VF drivers if they need to operate with the
same IOMMU context. Instead we introduce a "VF token", which is
essentially just a shared secret between PF and VF drivers, implemented
as a UUID.

The VF token can be set by a vfio-pci based PF driver and must be known
by the vfio-pci based VF driver in order to gain access to the device.
This allows the degree to which this VF token is considered secret to be
determined by the applications and environment. For example a VM might
generate a random UUID known only internally to the hypervisor while a
userspace networking appliance might use a shared, or even well known,
UUID among the application drivers.

To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
extended to accept key=value pairs in addition to the device name. This
allows us to most easily deny user access to the device without risk
that existing userspace drivers assume region offsets, IRQs, and other
device features, leading to more elaborate error paths. These options
are expected to take the form:

"$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"

Where the device name is always provided first for compatibility and
additional options are specified in a space separated list. The
relation between and requirements for the additional options will be
vfio bus driver dependent; however, an unknown or unused option within
this schema should return an error. This allows for future use of
unknown options as well as a positive indication to the user that an
option is used.

An example VF token option would take this form:

"0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
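
As an illustration only, a userspace driver might assemble such a string
along the following lines. This is a minimal sketch: the helper name
build_vfio_device_string() and the UUID_STR_LEN constant are hypothetical
and not part of this series.

```c
#include <stdio.h>
#include <string.h>

/* Canonical UUID text: 8-4-4-4-12 hex digits plus four hyphens. */
#define UUID_STR_LEN 36

/*
 * Hypothetical helper: write "<pci_name> vf_token=<uuid>" into out.
 * Returns 0 on success, -1 if the UUID text has the wrong length or
 * the output buffer is too small.
 */
static int build_vfio_device_string(char *out, size_t outlen,
				    const char *pci_name, const char *uuid)
{
	int n;

	if (strlen(uuid) != UUID_STR_LEN)
		return -1;

	/* Device name first, then a single space, then the option. */
	n = snprintf(out, outlen, "%s vf_token=%s", pci_name, uuid);
	if (n < 0 || (size_t)n >= outlen)
		return -1;

	return 0;
}
```

The resulting buffer would then be passed as the argument of the
VFIO_GROUP_GET_DEVICE_FD ioctl in place of the bare device name.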

When accessing a VF where the PF is making use of vfio-pci, the user
MUST provide the current vf_token. When accessing a PF, the user MUST
provide the current vf_token IF there are active VF users or MAY provide
a vf_token in order to set the current VF token when no VF users are
active. The former requirement assures VF users that an unassociated
driver cannot usurp the PF device. These semantics also imply that a
VF token MUST be set by a PF driver before VF drivers can access their
device. The default token is random, and mechanisms to read the token
are not provided, in order to protect the VF token of previous users. Use of
the vf_token option outside of these cases will return an error, as
discussed above.
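
The rules above can be summarized as a small decision function. The
following is a simplified userspace model for illustration, not the
kernel code: struct token_check and check_vf_token() are hypothetical
names, and the locking and device reference handling done by the real
implementation are omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the state consulted at open time. */
struct token_check {
	bool is_vf;		/* device being opened is a VF */
	bool pf_is_vfio_pci;	/* VF only: its PF is bound to vfio-pci */
	bool is_pf;		/* device is a PF carrying a VF token */
	int vf_users;		/* PF only: number of VFs currently open */
	bool token_given;	/* user supplied a vf_token= option */
	bool token_matches;	/* supplied token equals the current one */
};

/* Returns 0 to allow the open, or a negative errno to reject it. */
static int check_vf_token(const struct token_check *c)
{
	if (c->is_vf) {
		/* PF not owned by vfio-pci: a token is never usable. */
		if (!c->pf_is_vfio_pci)
			return c->token_given ? -EINVAL : 0;

		/* PF owned by vfio-pci: the current token is mandatory. */
		if (!c->token_given || !c->token_matches)
			return -EACCES;
		return 0;
	}

	if (c->is_pf) {
		/* VFs in use: prove collaboration with the current token. */
		if (c->vf_users > 0 &&
		    (!c->token_given || !c->token_matches))
			return -EACCES;

		/* No VF users: a supplied token becomes the new token. */
		return 0;
	}

	/* Neither PF nor VF: a supplied token could never be used. */
	return c->token_given ? -EINVAL : 0;
}
```

In this model the option is always either used or rejected, which is
the property the stricter parsing is meant to uphold.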

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 198 +++++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_private.h | 8 +
2 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2ec6c31d0ab0..8dd6ef9543ca 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -466,6 +466,44 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
vfio_pci_set_power_state(vdev, PCI_D3hot);
}

+static struct pci_driver vfio_pci_driver;
+
+static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev,
+ struct vfio_device **pf_dev)
+{
+ struct pci_dev *physfn = pci_physfn(vdev->pdev);
+
+ if (!vdev->pdev->is_virtfn)
+ return NULL;
+
+ *pf_dev = vfio_device_get_from_dev(&physfn->dev);
+ if (!*pf_dev)
+ return NULL;
+
+ if (pci_dev_driver(physfn) != &vfio_pci_driver) {
+ vfio_device_put(*pf_dev);
+ return NULL;
+ }
+
+ return vfio_device_data(*pf_dev);
+}
+
+static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
+{
+ struct vfio_device *pf_dev;
+ struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
+
+ if (!pf_vdev)
+ return;
+
+ mutex_lock(&pf_vdev->vf_token->lock);
+ pf_vdev->vf_token->users += val;
+ WARN_ON(pf_vdev->vf_token->users < 0);
+ mutex_unlock(&pf_vdev->vf_token->lock);
+
+ vfio_device_put(pf_dev);
+}
+
static void vfio_pci_release(void *device_data)
{
struct vfio_pci_device *vdev = device_data;
@@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
mutex_lock(&vdev->reflck->lock);

if (!(--vdev->refcnt)) {
+ vfio_pci_vf_token_user_add(vdev, -1);
vfio_spapr_pci_eeh_release(vdev->pdev);
vfio_pci_disable(vdev);
}
@@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
goto error;

vfio_spapr_pci_eeh_open(vdev->pdev);
+ vfio_pci_vf_token_user_add(vdev, 1);
}
vdev->refcnt++;
error:
@@ -1278,11 +1318,148 @@ static void vfio_pci_request(void *device_data, unsigned int count)
mutex_unlock(&vdev->igate);
}

+static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
+ bool vf_token, uuid_t *uuid)
+{
+ /*
+ * There's always some degree of trust or collaboration between SR-IOV
+ * PF and VFs, even if just that the PF hosts the SR-IOV capability and
+ * can disrupt VFs with a reset, but often the PF has more explicit
+ * access to deny service to the VF or access data passed through the
+ * VF. We therefore require an opt-in via a shared VF token (UUID) to
+ * represent this trust. This both prevents that a VF driver might
+ * assume the PF driver is a trusted, in-kernel driver, and also that
+ * a PF driver might be replaced with a rogue driver, unknown to in-use
+ * VF drivers.
+ *
+ * Therefore when presented with a VF, if the PF is a vfio device and
+ * it is bound to the vfio-pci driver, the user needs to provide a VF
+ * token to access the device, in the form of appending a vf_token to
+ * the device name, for example:
+ *
+ * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
+ *
+ * When presented with a PF which has VFs in use, the user must also
+ * provide the current VF token to prove collaboration with existing
+ * VF users. If VFs are not in use, the VF token provided for the PF
+ * device will act to set the VF token.
+ *
+ * If the VF token is provided but unused, a fault is generated.
+ */
+ if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
+ return 0; /* No VF token provided or required */
+
+ if (vdev->pdev->is_virtfn) {
+ struct vfio_device *pf_dev;
+ struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
+ bool match;
+
+ if (!pf_vdev) {
+ if (!vf_token)
+ return 0; /* PF is not vfio-pci, no VF token */
+
+ pci_info_ratelimited(vdev->pdev,
+ "VF token incorrectly provided, PF not bound to vfio-pci\n");
+ return -EINVAL;
+ }
+
+ if (!vf_token) {
+ vfio_device_put(pf_dev);
+ pci_info_ratelimited(vdev->pdev,
+ "VF token required to access device\n");
+ return -EACCES;
+ }
+
+ mutex_lock(&pf_vdev->vf_token->lock);
+ match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
+ mutex_unlock(&pf_vdev->vf_token->lock);
+
+ vfio_device_put(pf_dev);
+
+ if (!match) {
+ pci_info_ratelimited(vdev->pdev,
+ "Incorrect VF token provided for device\n");
+ return -EACCES;
+ }
+ } else if (vdev->vf_token) {
+ mutex_lock(&vdev->vf_token->lock);
+ if (vdev->vf_token->users) {
+ if (!vf_token) {
+ mutex_unlock(&vdev->vf_token->lock);
+ pci_info_ratelimited(vdev->pdev,
+ "VF token required to access device\n");
+ return -EACCES;
+ }
+
+ if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
+ mutex_unlock(&vdev->vf_token->lock);
+ pci_info_ratelimited(vdev->pdev,
+ "Incorrect VF token provided for device\n");
+ return -EACCES;
+ }
+ } else if (vf_token) {
+ uuid_copy(&vdev->vf_token->uuid, uuid);
+ }
+
+ mutex_unlock(&vdev->vf_token->lock);
+ } else if (vf_token) {
+ pci_info_ratelimited(vdev->pdev,
+ "VF token incorrectly provided, not a PF or VF\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+#define VF_TOKEN_ARG "vf_token="
+
static int vfio_pci_match(void *device_data, char *buf)
{
struct vfio_pci_device *vdev = device_data;
+ bool vf_token = false;
+ uuid_t uuid;
+ int ret;
+
+ if (strncmp(pci_name(vdev->pdev), buf, strlen(pci_name(vdev->pdev))))
+ return 0; /* No match */
+
+ if (strlen(buf) > strlen(pci_name(vdev->pdev))) {
+ buf += strlen(pci_name(vdev->pdev));
+
+ if (*buf != ' ')
+ return 0; /* No match: non-whitespace after name */
+
+ while (*buf) {
+ if (*buf == ' ') {
+ buf++;
+ continue;
+ }
+
+ if (!vf_token && !strncmp(buf, VF_TOKEN_ARG,
+ strlen(VF_TOKEN_ARG))) {
+ buf += strlen(VF_TOKEN_ARG);
+
+ if (strlen(buf) < UUID_STRING_LEN)
+ return -EINVAL;
+
+ ret = uuid_parse(buf, &uuid);
+ if (ret)
+ return ret;

- return !strcmp(pci_name(vdev->pdev), buf);
+ vf_token = true;
+ buf += UUID_STRING_LEN;
+ } else {
+ /* Unknown/duplicate option */
+ return -EINVAL;
+ }
+ }
+ }
+
+ ret = vfio_pci_validate_vf_token(vdev, vf_token, &uuid);
+ if (ret)
+ return ret;
+
+ return 1; /* Match */
}

static const struct vfio_device_ops vfio_pci_ops = {
@@ -1354,6 +1531,19 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return ret;
}

+ if (pdev->is_physfn) {
+ vdev->vf_token = kzalloc(sizeof(*vdev->vf_token), GFP_KERNEL);
+ if (!vdev->vf_token) {
+ vfio_pci_reflck_put(vdev->reflck);
+ vfio_del_group_dev(&pdev->dev);
+ vfio_iommu_group_put(group, &pdev->dev);
+ kfree(vdev);
+ return -ENOMEM;
+ }
+ mutex_init(&vdev->vf_token->lock);
+ uuid_gen(&vdev->vf_token->uuid);
+ }
+
if (vfio_pci_is_vga(pdev)) {
vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
vga_set_legacy_decoding(pdev,
@@ -1387,6 +1577,12 @@ static void vfio_pci_remove(struct pci_dev *pdev)
if (!vdev)
return;

+ if (vdev->vf_token) {
+ WARN_ON(vdev->vf_token->users);
+ mutex_destroy(&vdev->vf_token->lock);
+ kfree(vdev->vf_token);
+ }
+
vfio_pci_reflck_put(vdev->reflck);

vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 8a2c7607d513..76c11c915949 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -12,6 +12,7 @@
#include <linux/pci.h>
#include <linux/irqbypass.h>
#include <linux/types.h>
+#include <linux/uuid.h>

#ifndef VFIO_PCI_PRIVATE_H
#define VFIO_PCI_PRIVATE_H
@@ -84,6 +85,12 @@ struct vfio_pci_reflck {
struct mutex lock;
};

+struct vfio_pci_vf_token {
+ struct mutex lock;
+ uuid_t uuid;
+ int users;
+};
+
struct vfio_pci_device {
struct pci_dev *pdev;
void __iomem *barmap[PCI_STD_NUM_BARS];
@@ -122,6 +129,7 @@ struct vfio_pci_device {
struct list_head dummy_resources_list;
struct mutex ioeventfds_lock;
struct list_head ioeventfds_list;
+ struct vfio_pci_vf_token *vf_token;
};

#define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)

2020-02-19 18:55:17

by Alex Williamson

Subject: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

With the VF Token interface we can now expect that a vfio userspace
driver must be in collaboration with the PF driver; an unwitting
userspace driver will not be able to get past the GET_DEVICE_FD step
in accessing the device. We can now move on to actually allowing
SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
enabled by default in this commit, but it does provide a module option
for this to be enabled (enable_sriov=1). Enabling VFs is rather
straightforward, except we don't want to risk that a VF might get
autoprobed and bound to other drivers, so a bus notifier is used to
"capture" VFs to vfio-pci using the driver_override support. We
assume any later action to bind the device to other drivers is
condoned by the system admin and allow it with a log warning.

vfio-pci will disable SR-IOV on a PF before releasing the device,
allowing a VF driver to be assured other drivers cannot take over the
PF and that any other userspace driver must know the shared VF token.
This support also does not provide a mechanism for the PF userspace
driver itself to manipulate SR-IOV through the vfio API. With this
patch SR-IOV can only be enabled via the host sysfs interface and the
PF driver user cannot create or remove VFs.
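
For illustration, host-side tooling could create VFs through the
standard sriov_numvfs sysfs attribute roughly as follows. This is a
sketch under assumptions: the helper names are hypothetical, the PF
address is supplied by the caller, and vfio-pci is assumed to have been
loaded with enable_sriov=1.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the sysfs path of the sriov_numvfs
 * attribute for the PF named pci_name (for example "0000:03:00.0").
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int sriov_numvfs_path(char *out, size_t outlen, const char *pci_name)
{
	int n = snprintf(out, outlen,
			 "/sys/bus/pci/devices/%s/sriov_numvfs", pci_name);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/*
 * Hypothetical helper: write the requested VF count to the attribute.
 * Writing 0 disables SR-IOV. Returns 0 on success, -1 on failure.
 */
static int write_num_vfs(const char *path, unsigned int num_vfs)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", num_vfs) > 0;

	/* stdio buffers the write, so a rejection shows up at close. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

With the module option left at its default, the write is expected to
fail, since the sriov_configure callback added here returns an error
unless enable_sriov is set.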

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 106 +++++++++++++++++++++++++++++++----
drivers/vfio/pci/vfio_pci_private.h | 2 +
2 files changed, 97 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index e4d5d26e5e71..b40ade48a844 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -54,6 +54,12 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(disable_idle_d3,
"Disable using the PCI D3 low power state for idle, unused devices");

+static bool enable_sriov;
+#ifdef CONFIG_PCI_IOV
+module_param(enable_sriov, bool, 0644);
+MODULE_PARM_DESC(enable_sriov, "Enable support for SR-IOV configuration");
+#endif
+
static inline bool vfio_vga_disabled(void)
{
#ifdef CONFIG_VFIO_PCI_VGA
@@ -1528,6 +1534,35 @@ static const struct vfio_device_ops vfio_pci_ops = {

static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
+static struct pci_driver vfio_pci_driver;
+
+static int vfio_pci_bus_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct vfio_pci_device *vdev = container_of(nb,
+ struct vfio_pci_device, nb);
+ struct device *dev = data;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct pci_dev *physfn = pci_physfn(pdev);
+
+ if (action == BUS_NOTIFY_ADD_DEVICE &&
+ pdev->is_virtfn && physfn == vdev->pdev) {
+ pci_info(vdev->pdev, "Captured SR-IOV VF %s driver_override\n",
+ pci_name(pdev));
+ pdev->driver_override = kasprintf(GFP_KERNEL, "%s",
+ vfio_pci_ops.name);
+ } else if (action == BUS_NOTIFY_BOUND_DRIVER &&
+ pdev->is_virtfn && physfn == vdev->pdev) {
+ struct pci_driver *drv = pci_dev_driver(pdev);
+
+ if (drv && drv != &vfio_pci_driver)
+ pci_warn(vdev->pdev,
+ "VF %s bound to driver %s while PF bound to vfio-pci\n",
+ pci_name(pdev), drv->name);
+ }
+
+ return 0;
+}

static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
@@ -1539,12 +1574,12 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return -EINVAL;

/*
- * Prevent binding to PFs with VFs enabled, this too easily allows
- * userspace instance with VFs and PFs from the same device, which
- * cannot work. Disabling SR-IOV here would initiate removing the
- * VFs, which would unbind the driver, which is prone to blocking
- * if that VF is also in use by vfio-pci. Just reject these PFs
- * and let the user sort it out.
+ * Prevent binding to PFs with VFs enabled, the VFs might be in use
+ * by the host or other users. We cannot capture the VFs if they
+ * already exist, nor can we track VF users. Disabling SR-IOV here
+ * would initiate removing the VFs, which would unbind the driver,
+ * which is prone to blocking if that VF is also in use by vfio-pci.
+ * Just reject these PFs and let the user sort it out.
*/
if (pci_num_vf(pdev)) {
pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
@@ -1592,6 +1627,18 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
kfree(vdev);
return -ENOMEM;
}
+
+ vdev->nb.notifier_call = vfio_pci_bus_notifier;
+ ret = bus_register_notifier(&pci_bus_type, &vdev->nb);
+ if (ret) {
+ kfree(vdev->vf_token);
+ vfio_pci_reflck_put(vdev->reflck);
+ vfio_del_group_dev(&pdev->dev);
+ vfio_iommu_group_put(group, &pdev->dev);
+ kfree(vdev);
+ return ret;
+ }
+
mutex_init(&vdev->vf_token->lock);
uuid_gen(&vdev->vf_token->uuid);
}
@@ -1625,6 +1672,8 @@ static void vfio_pci_remove(struct pci_dev *pdev)
{
struct vfio_pci_device *vdev;

+ pci_disable_sriov(pdev);
+
vdev = vfio_del_group_dev(&pdev->dev);
if (!vdev)
return;
@@ -1635,6 +1684,9 @@ static void vfio_pci_remove(struct pci_dev *pdev)
kfree(vdev->vf_token);
}

+ if (vdev->nb.notifier_call)
+ bus_unregister_notifier(&pci_bus_type, &vdev->nb);
+
vfio_pci_reflck_put(vdev->reflck);

vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
@@ -1683,16 +1735,48 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
return PCI_ERS_RESULT_CAN_RECOVER;
}

+static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
+{
+ struct vfio_pci_device *vdev;
+ struct vfio_device *device;
+ int ret = 0;
+
+ might_sleep();
+
+ if (!enable_sriov)
+ return -ENOENT;
+
+ device = vfio_device_get_from_dev(&pdev->dev);
+ if (!device)
+ return -ENODEV;
+
+ vdev = vfio_device_data(device);
+ if (!vdev) {
+ vfio_device_put(device);
+ return -ENODEV;
+ }
+
+ if (nr_virtfn == 0)
+ pci_disable_sriov(pdev);
+ else
+ ret = pci_enable_sriov(pdev, nr_virtfn);
+
+ vfio_device_put(device);
+
+ return ret < 0 ? ret : nr_virtfn;
+}
+
static const struct pci_error_handlers vfio_err_handlers = {
.error_detected = vfio_pci_aer_err_detected,
};

static struct pci_driver vfio_pci_driver = {
- .name = "vfio-pci",
- .id_table = NULL, /* only dynamic ids */
- .probe = vfio_pci_probe,
- .remove = vfio_pci_remove,
- .err_handler = &vfio_err_handlers,
+ .name = "vfio-pci",
+ .id_table = NULL, /* only dynamic ids */
+ .probe = vfio_pci_probe,
+ .remove = vfio_pci_remove,
+ .sriov_configure = vfio_pci_sriov_configure,
+ .err_handler = &vfio_err_handlers,
};

static DEFINE_MUTEX(reflck_lock);
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 76c11c915949..36ec69081ecd 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -13,6 +13,7 @@
#include <linux/irqbypass.h>
#include <linux/types.h>
#include <linux/uuid.h>
+#include <linux/notifier.h>

#ifndef VFIO_PCI_PRIVATE_H
#define VFIO_PCI_PRIVATE_H
@@ -130,6 +131,7 @@ struct vfio_pci_device {
struct mutex ioeventfds_lock;
struct list_head ioeventfds_list;
struct vfio_pci_vf_token *vf_token;
+ struct notifier_block nb;
};

#define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)

2020-02-19 18:55:26

by Alex Williamson

Subject: [PATCH v2 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user

The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
agnostic ioctl for setting, retrieving, and probing device features.
This implementation provides a 16-bit field for specifying a feature
index, where the data portion of the ioctl is determined by the
semantics for the given feature. Additional flag bits indicate the
direction and nature of the operation; SET indicates user data is
provided into the device feature, GET indicates the device feature is
written out into user data. The PROBE flag tests whether the given
feature is supported and, if GET or SET is also provided, whether the
given operation on the feature is supported.

The first user of this ioctl is for setting the vfio-pci VF token,
where the user provides a shared secret key (UUID) on a SR-IOV PF
device, which users must provide when opening associated VF devices.
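
As a rough illustration of the calling convention, a PF driver could
prepare the ioctl argument for setting the token as sketched below. The
struct and macro names carrying a _sketch or SKETCH_ marker are local
stand-ins that mirror the definitions proposed in this patch, so that
the example builds without the patched uapi header; the helper
alloc_vf_token_set() is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the proposed struct vfio_device_feature layout. */
struct vfio_device_feature_sketch {
	uint32_t argsz;
	uint32_t flags;
	uint8_t data[];
};

/* Local mirrors of the proposed flag and feature index values. */
#define SKETCH_FEATURE_SET		(1u << 17)
#define SKETCH_FEATURE_PCI_VF_TOKEN	0u

#define SKETCH_UUID_SIZE 16

/*
 * Hypothetical helper: allocate and fill the argument for a SET of the
 * PCI VF token feature. The caller frees the result. Returns NULL on
 * allocation failure.
 */
static struct vfio_device_feature_sketch *
alloc_vf_token_set(const uint8_t uuid[SKETCH_UUID_SIZE])
{
	size_t sz = sizeof(struct vfio_device_feature_sketch) +
		    SKETCH_UUID_SIZE;
	struct vfio_device_feature_sketch *f = calloc(1, sz);

	if (!f)
		return NULL;

	/* argsz covers the fixed header plus the 16-byte UUID payload. */
	f->argsz = (uint32_t)sz;

	/* Low 16 bits select the feature, upper bits the operation. */
	f->flags = SKETCH_FEATURE_SET | SKETCH_FEATURE_PCI_VF_TOKEN;
	memcpy(f->data, uuid, SKETCH_UUID_SIZE);

	return f;
}
```

The filled structure would be passed to the VFIO_DEVICE_FEATURE ioctl
on the PF device file descriptor. Adding the PROBE flag instead of
supplying data would only test for support.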

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 52 +++++++++++++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 37 +++++++++++++++++++++++++++++++
2 files changed, 89 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 8dd6ef9543ca..e4d5d26e5e71 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1180,6 +1180,58 @@ static long vfio_pci_ioctl(void *device_data,

return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
ioeventfd.data, count, ioeventfd.fd);
+ } else if (cmd == VFIO_DEVICE_FEATURE) {
+ struct vfio_device_feature feature;
+ uuid_t uuid;
+
+ minsz = offsetofend(struct vfio_device_feature, flags);
+
+ if (copy_from_user(&feature, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (feature.argsz < minsz)
+ return -EINVAL;
+
+ if (feature.flags & ~(VFIO_DEVICE_FEATURE_MASK |
+ VFIO_DEVICE_FEATURE_SET |
+ VFIO_DEVICE_FEATURE_GET |
+ VFIO_DEVICE_FEATURE_PROBE))
+ return -EINVAL;
+
+ switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) {
+ case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
+ if (!vdev->vf_token)
+ return -ENOTTY;
+
+ /*
+ * We do not support GET of the VF Token UUID as this
+ * could expose the token of the previous device user.
+ */
+ if (feature.flags & VFIO_DEVICE_FEATURE_GET)
+ return -EINVAL;
+
+ if (feature.flags & VFIO_DEVICE_FEATURE_PROBE)
+ return 0;
+
+ /* Don't SET unless told to do so */
+ if (!(feature.flags & VFIO_DEVICE_FEATURE_SET))
+ return -EINVAL;
+
+ if (feature.argsz < minsz + sizeof(uuid))
+ return -EINVAL;
+
+ if (copy_from_user(&uuid, (void __user *)(arg + minsz),
+ sizeof(uuid)))
+ return -EFAULT;
+
+ mutex_lock(&vdev->vf_token->lock);
+ uuid_copy(&vdev->vf_token->uuid, &uuid);
+ mutex_unlock(&vdev->vf_token->lock);
+
+ return 0;
+ default:
+ return -ENOTTY;
+ }
}

return -ENOTTY;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..aa37f90a2180 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -707,6 +707,43 @@ struct vfio_device_ioeventfd {

#define VFIO_DEVICE_IOEVENTFD _IO(VFIO_TYPE, VFIO_BASE + 16)

+/**
+ * VFIO_DEVICE_FEATURE - _IORW(VFIO_TYPE, VFIO_BASE + 17,
+ * struct vfio_device_feature)
+ *
+ * Get, set, or probe feature data of the device. The feature is selected
+ * using the FEATURE_MASK portion of the flags field. Support for a feature
+ * can be probed by setting both the FEATURE_MASK and PROBE bits. A probe
+ * may optionally include the GET and/or SET bits to determine read vs write
+ * access of the feature respectively. Probing a feature will return success
+ * if the feature is supported and all of the optionally indicated GET/SET
+ * methods are supported. The format of the data portion of the structure is
+ * specific to the given feature. The data portion is not required for
+ * probing.
+ *
+ * Return 0 on success, -errno on failure.
+ */
+struct vfio_device_feature {
+ __u32 argsz;
+ __u32 flags;
+#define VFIO_DEVICE_FEATURE_MASK (0xffff) /* 16-bit feature index */
+#define VFIO_DEVICE_FEATURE_GET (1 << 16) /* Get feature into data[] */
+#define VFIO_DEVICE_FEATURE_SET (1 << 17) /* Set feature from data[] */
+#define VFIO_DEVICE_FEATURE_PROBE (1 << 18) /* Probe feature support */
+ __u8 data[];
+};
+
+#define VFIO_DEVICE_FEATURE _IO(VFIO_TYPE, VFIO_BASE + 17)
+
+/*
+ * Provide support for setting a PCI VF Token, which is used as a shared
+ * secret between PF and VF drivers. This feature may only be set on a
+ * PCI SR-IOV PF when SR-IOV is enabled on the PF and there are no existing
+ * open VFs. Data provided when setting this feature is a 16-byte array
+ * (__u8 b[16]), representing a UUID.
+ */
+#define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN (0)
+
/* -------- API for Type1 VFIO IOMMU -------- */

/**

2020-02-19 18:55:26

by Alex Williamson

Subject: [PATCH v2 6/7] vfio/pci: Remove dev_fmt definition

It currently results in messages like:

"vfio-pci 0000:03:00.0: vfio_pci: ..."

Which is quite a bit redundant.

Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b40ade48a844..497ecadef2ba 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -9,7 +9,6 @@
*/

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#define dev_fmt pr_fmt

#include <linux/device.h>
#include <linux/eventfd.h>

2020-02-19 18:55:33

by Alex Williamson

Subject: [PATCH v2 7/7] vfio/pci: Cleanup .probe() exit paths

The cleanup is getting a tad long.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 54 ++++++++++++++++++++-----------------------
1 file changed, 25 insertions(+), 29 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 497ecadef2ba..7d410224343a 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1591,8 +1591,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)

vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev) {
- vfio_iommu_group_put(group, &pdev->dev);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto out_group_put;
}

vdev->pdev = pdev;
@@ -1603,43 +1603,27 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
INIT_LIST_HEAD(&vdev->ioeventfds_list);

ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
- if (ret) {
- vfio_iommu_group_put(group, &pdev->dev);
- kfree(vdev);
- return ret;
- }
+ if (ret)
+ goto out_free;

ret = vfio_pci_reflck_attach(vdev);
- if (ret) {
- vfio_del_group_dev(&pdev->dev);
- vfio_iommu_group_put(group, &pdev->dev);
- kfree(vdev);
- return ret;
- }
+ if (ret)
+ goto out_del_group_dev;

if (pdev->is_physfn) {
vdev->vf_token = kzalloc(sizeof(*vdev->vf_token), GFP_KERNEL);
if (!vdev->vf_token) {
- vfio_pci_reflck_put(vdev->reflck);
- vfio_del_group_dev(&pdev->dev);
- vfio_iommu_group_put(group, &pdev->dev);
- kfree(vdev);
- return -ENOMEM;
- }
-
- vdev->nb.notifier_call = vfio_pci_bus_notifier;
- ret = bus_register_notifier(&pci_bus_type, &vdev->nb);
- if (ret) {
- kfree(vdev->vf_token);
- vfio_pci_reflck_put(vdev->reflck);
- vfio_del_group_dev(&pdev->dev);
- vfio_iommu_group_put(group, &pdev->dev);
- kfree(vdev);
- return ret;
+ ret = -ENOMEM;
+ goto out_reflck;
}

mutex_init(&vdev->vf_token->lock);
uuid_gen(&vdev->vf_token->uuid);
+
+ vdev->nb.notifier_call = vfio_pci_bus_notifier;
+ ret = bus_register_notifier(&pci_bus_type, &vdev->nb);
+ if (ret)
+ goto out_vf_token;
}

if (vfio_pci_is_vga(pdev)) {
@@ -1665,6 +1649,18 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
}

return ret;
+
+out_vf_token:
+ kfree(vdev->vf_token);
+out_reflck:
+ vfio_pci_reflck_put(vdev->reflck);
+out_del_group_dev:
+ vfio_del_group_dev(&pdev->dev);
+out_free:
+ kfree(vdev);
+out_group_put:
+ vfio_iommu_group_put(group, &pdev->dev);
+ return ret;
}

static void vfio_pci_remove(struct pci_dev *pdev)
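
For readers less familiar with the idiom this patch adopts, here is a small standalone sketch of the same goto-based unwinding, using malloc()/free() as stand-ins for the kernel resources. struct demo_dev, demo_probe() and the fail_at parameter are hypothetical and exist only to exercise each label; the point is that labels are listed in reverse order of acquisition, so a failure at step N falls through the cleanup for steps N-1 down to 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for what vfio_pci_probe() acquires. */
struct demo_dev {
	int *group;	/* stands in for the iommu group reference */
	int *vdev;	/* stands in for the kzalloc()ed vfio_pci_device */
	int *token;	/* stands in for vdev->vf_token */
};

/*
 * Acquire three resources in order. fail_at (1..3) forces that step to
 * fail so every unwind path can be tested; 0 means no forced failure.
 */
static int demo_probe(struct demo_dev *d, int fail_at)
{
	int ret;

	d->group = NULL;
	d->vdev = NULL;
	d->token = NULL;

	d->group = (fail_at == 1) ? NULL : malloc(sizeof(int));
	if (!d->group)
		return -ENODEV;

	d->vdev = (fail_at == 2) ? NULL : malloc(sizeof(int));
	if (!d->vdev) {
		ret = -ENOMEM;
		goto out_group_put;
	}

	d->token = (fail_at == 3) ? NULL : malloc(sizeof(int));
	if (!d->token) {
		ret = -ENOMEM;
		goto out_free;
	}

	return 0;

	/* Labels run in reverse order of acquisition. */
out_free:
	free(d->vdev);
	d->vdev = NULL;
out_group_put:
	free(d->group);
	d->group = NULL;
	return ret;
}

/* Release everything a successful demo_probe() acquired. */
static void demo_remove(struct demo_dev *d)
{
	free(d->token);
	free(d->vdev);
	free(d->group);
}
```

Adding a fourth resource then only needs one new label at the top of the unwind list, which is the maintenance benefit the commit message hints at.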

2020-02-25 02:35:01

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 0/7] vfio/pci: SR-IOV support

> From: Alex Williamson
> Sent: Thursday, February 20, 2020 2:54 AM
>
> Changes since v1 are primarily to patch 3/7 where the commit log is
> rewritten, along with option parsing and failure logging based on
> upstream discussions. The primary user visible difference is that
> option parsing is now much more strict. If a vf_token option is
> provided that cannot be used, we generate an error. As a result of
> this, opening a PF with a vf_token option will serve as a mechanism of
> setting the vf_token. This seems like a more user friendly API than
> the alternative of sometimes requiring the option (VFs in use) and
> sometimes rejecting it, and upholds our desire that the option is
> always either used or rejected.
>
> This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
> means of setting the VF token, which might call into question whether
> we absolutely need this new ioctl. Currently I'm keeping it because I
> can imagine use cases, for example if a hypervisor were to support
> SR-IOV, the PF device might be opened without consideration for a VF
> token and we'd require the hypervisor to close and re-open the PF in
> order to set a known VF token, which is impractical.
>
> Series overview (same as provided with v1):

Thanks for doing this!

>
> The synopsis of this series is that we have an ongoing desire to drive
> PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
> for this with DPDK drivers and potentially interesting future use

Can you provide a link to the DPDK discussion?

> cases in virtualization. We've been reluctant to add this support
> previously due to the dependency and trust relationship between the
> VF device and PF driver. Minimally the PF driver can induce a denial
> of service to the VF, but depending on the specific implementation,
> the PF driver might also be responsible for moving data between VFs
> or have direct access to the state of the VF, including data or state
> otherwise private to the VF or VF driver.

Just thinking out loud. While the motivation for the VF token sounds reasonable
to me, I'm curious why the same concern is not raised in other usages.
For example, there is no such design in the virtio framework, where the
virtio device could also be restarted, put in a separate process (vhost-user),
or even in a separate VM (virtio-vhost-user), etc. Of course the
paravirtualized attribute of virtio implies some degree of trust, but as you
mentioned many SR-IOV implementations support VF->PF communication
which also implies some level of trust. It's perfectly fine if VFIO just tries
to do better than other sub-systems, but knowing how other people
tackle similar problems may make the whole picture clearer.

+Jason.

>
> To help resolve these concerns, we introduce a VF token into the VFIO
> PCI ABI, which acts as a shared secret key between drivers. The
> userspace PF driver is required to set the VF token to a known value
> and userspace VF drivers are required to provide the token to access
> the VF device. If a PF driver is restarted with VF drivers in use, it
> must also provide the current token in order to prevent a rogue
> untrusted PF driver from replacing a known driver. The degree to
> which this new token is considered secret is left to the userspace
> drivers, the kernel intentionally provides no means to retrieve the
> current token.

I'm wondering whether the token idea can be used beyond SR-IOV, e.g.
(1) we may allow vfio user space to manage Scalable IOV in the future,
which faces a similar challenge between the PF and mdev; (2) the
token might be used as a canonical way to replace the out-of-tree acs-override
workaround, say, allowing the admin to assign devices within the
same iommu group to different VMs which trust each other. I'm not
sure how much more complexity this would introduce, but it would be greatly
appreciated if you could think about it a bit and, if feasible, abstract some
logic into the vfio core layer for such potential usages...

>
> Note that the above token is only required for this new model where
> both the PF and VF devices are usable through vfio-pci. Existing
> models of VFIO drivers where the PF is used without SR-IOV enabled
> or the VF is bound to a userspace driver with an in-kernel, host PF
> driver are unaffected.
>
> The latter configuration above also highlights a new inverted scenario
> that is now possible, a userspace PF driver with in-kernel VF drivers.
> I believe this is a scenario that should be allowed, but should not be
> enabled by default. This series includes code to set a default
> driver_override for VFs sourced from a vfio-pci user owned PF, such
> that the VFs are also bound to vfio-pci. This model is compatible
> with tools like driverctl and allows the system administrator to
> decide if other bindings should be enabled. The VF token interface
> above exists only between vfio-pci PF and VF drivers, once a VF is
> bound to another driver, the administrator has effectively pronounced
> the device as trusted. The vfio-pci driver will note alternate
> binding in dmesg for logging and debugging purposes.
>
> Please review, comment, and test. The example QEMU implementation
> provided with the RFC is still current for this version. Thanks,
>
> Alex
>
> RFC:
> https://lore.kernel.org/lkml/158085337582.9445.17682266437583505502.stg
> [email protected]/
> v1:
> https://lore.kernel.org/lkml/158145472604.16827.15751375540102298130.st
> [email protected]/
>
> ---
>
> Alex Williamson (7):
> vfio: Include optional device match in vfio_device_ops callbacks
> vfio/pci: Implement match ops
> vfio/pci: Introduce VF token
> vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
> vfio/pci: Add sriov_configure support
> vfio/pci: Remove dev_fmt definition
> vfio/pci: Cleanup .probe() exit paths
>
>
> drivers/vfio/pci/vfio_pci.c | 383
> +++++++++++++++++++++++++++++++++--
> drivers/vfio/pci/vfio_pci_private.h | 10 +
> drivers/vfio/vfio.c | 20 +-
> include/linux/vfio.h | 4
> include/uapi/linux/vfio.h | 37 +++
> 5 files changed, 426 insertions(+), 28 deletions(-)
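
To make the token rules from the quoted cover letter easier to reason about, here is a userspace model of the decision they describe. It is only a sketch: the return codes are borrowed from vfio_pci_validate_vf_token() in patch 3/7, while enum demo_role, struct demo_token and demo_validate() are hypothetical names, and passing uuid == NULL stands for "no vf_token option provided".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum demo_role {
	DEMO_PLAIN,		/* neither an SR-IOV PF nor a VF */
	DEMO_VF_OTHER_PF,	/* VF whose PF is not bound to vfio-pci */
	DEMO_VF_VFIO_PF,	/* VF whose PF is bound to vfio-pci */
	DEMO_PF,		/* PF bound to vfio-pci */
};

struct demo_token {
	unsigned char uuid[16];
	int users;		/* number of VFs currently open */
};

/*
 * pf_token is the PF's token state; it is only consulted for
 * DEMO_VF_VFIO_PF and DEMO_PF. Returns 0 if access is allowed.
 */
static int demo_validate(enum demo_role role, struct demo_token *pf_token,
			 const unsigned char *uuid)
{
	bool provided = (uuid != NULL);

	switch (role) {
	case DEMO_PLAIN:
	case DEMO_VF_OTHER_PF:
		/* A token that cannot be used is rejected, not ignored. */
		return provided ? -EINVAL : 0;
	case DEMO_VF_VFIO_PF:
		if (!provided)
			return -EACCES;
		return memcmp(uuid, pf_token->uuid, 16) ? -EACCES : 0;
	case DEMO_PF:
		if (pf_token->users > 0) {
			/* VFs in use: the PF opener must prove it knows the token. */
			if (!provided)
				return -EACCES;
			return memcmp(uuid, pf_token->uuid, 16) ? -EACCES : 0;
		}
		/* No VF users: a provided token becomes the current token. */
		if (provided)
			memcpy(pf_token->uuid, uuid, 16);
		return 0;
	}

	return -EINVAL;
}
```

The model leaves out locking and the initial random token, both of which the real code handles.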

2020-02-25 03:00:05

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 3/7] vfio/pci: Introduce VF token

> From: Alex Williamson
> Sent: Thursday, February 20, 2020 2:54 AM
>
> If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> fully isolated from the PF. The PF can always cause a denial of service
> to the VF, even if by simply resetting itself. The degree to which a PF
> can access the data passed through a VF or interfere with its operation
> is dependent on a given SR-IOV implementation. Therefore we want to
> avoid a scenario where an existing vfio-pci based userspace driver might
> assume the PF driver is trusted, for example assigning a PF to one VM
> and VF to another with some expectation of isolation. IOMMU grouping
> could be a solution to this, but imposes an unnecessarily strong
> relationship between PF and VF drivers if they need to operate with the
> same IOMMU context. Instead we introduce a "VF token", which is
> essentially just a shared secret between PF and VF drivers, implemented
> as a UUID.
>
> The VF token can be set by a vfio-pci based PF driver and must be known
> by the vfio-pci based VF driver in order to gain access to the device.
> This allows the degree to which this VF token is considered secret to be
> determined by the applications and environment. For example a VM might
> generate a random UUID known only internally to the hypervisor while a
> userspace networking appliance might use a shared, or even well known,
> UUID among the application drivers.
>
> To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
> extended to accept key=value pairs in addition to the device name. This
> allows us to most easily deny user access to the device without risk
> that existing userspace drivers assume region offsets, IRQs, and other
> device features, leading to more elaborate error paths. The format of
> these options is expected to take the form:
>
> "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
>
> Where the device name is always provided first for compatibility and
> additional options are specified in a space separated list. The
> relation between and requirements for the additional options will be
> vfio bus driver dependent, however unknown or unused options within this
> schema should return an error. This allows for future use of unknown
> options as well as a positive indication to the user that an option is
> used.
>
> An example VF token option would take this form:
>
> "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
>
> When accessing a VF where the PF is making use of vfio-pci, the user
> MUST provide the current vf_token. When accessing a PF, the user MUST
> provide the current vf_token IF there are active VF users or MAY provide
> a vf_token in order to set the current VF token when no VF users are
> active. The former requirement assures VF users that an unassociated
> driver cannot usurp the PF device. These semantics also imply that a
> VF token MUST be set by a PF driver before VF drivers can access their
> device, the default token is random and mechanisms to read the token are
> not provided in order to protect the VF token of previous users. Use of
> the vf_token option outside of these cases will return an error, as
> discussed above.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci.c | 198
> +++++++++++++++++++++++++++++++++++
> drivers/vfio/pci/vfio_pci_private.h | 8 +
> 2 files changed, 205 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 2ec6c31d0ab0..8dd6ef9543ca 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -466,6 +466,44 @@ static void vfio_pci_disable(struct vfio_pci_device
> *vdev)
> vfio_pci_set_power_state(vdev, PCI_D3hot);
> }
>
> +static struct pci_driver vfio_pci_driver;
> +
> +static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev,
> + struct vfio_device **pf_dev)
> +{
> + struct pci_dev *physfn = pci_physfn(vdev->pdev);
> +
> + if (!vdev->pdev->is_virtfn)
> + return NULL;
> +
> + *pf_dev = vfio_device_get_from_dev(&physfn->dev);
> + if (!*pf_dev)
> + return NULL;
> +
> + if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> + vfio_device_put(*pf_dev);
> + return NULL;
> + }
> +
> + return vfio_device_data(*pf_dev);
> +}
> +
> +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
> +{
> + struct vfio_device *pf_dev;
> + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> +
> + if (!pf_vdev)
> + return;
> +
> + mutex_lock(&pf_vdev->vf_token->lock);
> + pf_vdev->vf_token->users += val;
> + WARN_ON(pf_vdev->vf_token->users < 0);
> + mutex_unlock(&pf_vdev->vf_token->lock);
> +
> + vfio_device_put(pf_dev);
> +}
> +
> static void vfio_pci_release(void *device_data)
> {
> struct vfio_pci_device *vdev = device_data;
> @@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
> mutex_lock(&vdev->reflck->lock);
>
> if (!(--vdev->refcnt)) {
> + vfio_pci_vf_token_user_add(vdev, -1);
> vfio_spapr_pci_eeh_release(vdev->pdev);
> vfio_pci_disable(vdev);
> }
> @@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
> goto error;
>
> vfio_spapr_pci_eeh_open(vdev->pdev);
> + vfio_pci_vf_token_user_add(vdev, 1);
> }
> vdev->refcnt++;
> error:
> @@ -1278,11 +1318,148 @@ static void vfio_pci_request(void *device_data,
> unsigned int count)
> mutex_unlock(&vdev->igate);
> }
>
> +static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
> + bool vf_token, uuid_t *uuid)
> +{
> + /*
> + * There's always some degree of trust or collaboration between SR-IOV
> + * PF and VFs, even if just that the PF hosts the SR-IOV capability and
> + * can disrupt VFs with a reset, but often the PF has more explicit
> + * access to deny service to the VF or access data passed through the
> + * VF. We therefore require an opt-in via a shared VF token (UUID) to
> + * represent this trust. This both prevents that a VF driver might
> + * assume the PF driver is a trusted, in-kernel driver, and also that
> + * a PF driver might be replaced with a rogue driver, unknown to in-use
> + * VF drivers.
> + *
> + * Therefore when presented with a VF, if the PF is a vfio device and
> + * it is bound to the vfio-pci driver, the user needs to provide a VF
> + * token to access the device, in the form of appending a vf_token to
> + * the device name, for example:
> + *
> + * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
> + *
> + * When presented with a PF which has VFs in use, the user must also
> + * provide the current VF token to prove collaboration with existing
> + * VF users. If VFs are not in use, the VF token provided for the PF
> + * device will act to set the VF token.
> + *
> + * If the VF token is provided but unused, a fault is generated.

fault->error; otherwise it is easy to mistake this for a CPU fault.

> + */
> + if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
> + return 0; /* No VF token provided or required */
> +
> + if (vdev->pdev->is_virtfn) {
> + struct vfio_device *pf_dev;
> + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> + bool match;
> +
> + if (!pf_vdev) {
> + if (!vf_token)
> + return 0; /* PF is not vfio-pci, no VF token */
> +
> + pci_info_ratelimited(vdev->pdev,
> + "VF token incorrectly provided, PF not bound to vfio-pci\n");
> + return -EINVAL;
> + }
> +
> + if (!vf_token) {
> + vfio_device_put(pf_dev);
> + pci_info_ratelimited(vdev->pdev,
> + "VF token required to access device\n");
> + return -EACCES;
> + }
> +
> + mutex_lock(&pf_vdev->vf_token->lock);
> + match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
> + mutex_unlock(&pf_vdev->vf_token->lock);
> +
> + vfio_device_put(pf_dev);
> +
> + if (!match) {
> + pci_info_ratelimited(vdev->pdev,
> + "Incorrect VF token provided for device\n");
> + return -EACCES;
> + }
> + } else if (vdev->vf_token) {
> + mutex_lock(&vdev->vf_token->lock);
> + if (vdev->vf_token->users) {
> + if (!vf_token) {
> + mutex_unlock(&vdev->vf_token->lock);
> + pci_info_ratelimited(vdev->pdev,
> + "VF token required to access device\n");
> + return -EACCES;
> + }
> +
> + if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
> + mutex_unlock(&vdev->vf_token->lock);
> + pci_info_ratelimited(vdev->pdev,
> + "Incorrect VF token provided for device\n");
> + return -EACCES;
> + }
> + } else if (vf_token) {
> + uuid_copy(&vdev->vf_token->uuid, uuid);
> + }

This implies that we allow the PF to be accessed without providing a VF token,
as long as no VF is currently in use, which further means no VF can
be assigned afterwards since no one knows the random uuid allocated
by vfio. Just want to confirm whether this is the desired behavior. If a
user really wants to use the PF only, he should probably disable SR-IOV
instead...

> +
> + mutex_unlock(&vdev->vf_token->lock);
> + } else if (vf_token) {
> + pci_info_ratelimited(vdev->pdev,
> + "VF token incorrectly provided, not a PF or VF\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +#define VF_TOKEN_ARG "vf_token="
> +
> static int vfio_pci_match(void *device_data, char *buf)
> {
> struct vfio_pci_device *vdev = device_data;
> + bool vf_token = false;
> + uuid_t uuid;
> + int ret;
> +
> + if (strncmp(pci_name(vdev->pdev), buf, strlen(pci_name(vdev->pdev))))
> + return 0; /* No match */
> +
> + if (strlen(buf) > strlen(pci_name(vdev->pdev))) {
> + buf += strlen(pci_name(vdev->pdev));
> +
> + if (*buf != ' ')
> + return 0; /* No match: non-whitespace after name */
> +
> + while (*buf) {
> + if (*buf == ' ') {
> + buf++;
> + continue;
> + }
> +
> + if (!vf_token && !strncmp(buf, VF_TOKEN_ARG,
> + strlen(VF_TOKEN_ARG))) {
> + buf += strlen(VF_TOKEN_ARG);
> +
> + if (strlen(buf) < UUID_STRING_LEN)
> + return -EINVAL;
> +
> + ret = uuid_parse(buf, &uuid);
> + if (ret)
> + return ret;
>
> - return !strcmp(pci_name(vdev->pdev), buf);
> + vf_token = true;
> + buf += UUID_STRING_LEN;
> + } else {
> + /* Unknown/duplicate option */
> + return -EINVAL;
> + }
> + }
> + }
> +
> + ret = vfio_pci_validate_vf_token(vdev, vf_token, &uuid);
> + if (ret)
> + return ret;
> +
> + return 1; /* Match */
> }
>
> static const struct vfio_device_ops vfio_pci_ops = {
> @@ -1354,6 +1531,19 @@ static int vfio_pci_probe(struct pci_dev *pdev,
> const struct pci_device_id *id)
> return ret;
> }
>
> + if (pdev->is_physfn) {
> + vdev->vf_token = kzalloc(sizeof(*vdev->vf_token), GFP_KERNEL);
> + if (!vdev->vf_token) {
> + vfio_pci_reflck_put(vdev->reflck);
> + vfio_del_group_dev(&pdev->dev);
> + vfio_iommu_group_put(group, &pdev->dev);
> + kfree(vdev);
> + return -ENOMEM;
> + }
> + mutex_init(&vdev->vf_token->lock);
> + uuid_gen(&vdev->vf_token->uuid);

Should we also regenerate a random uuid somewhere when SR-IOV is
disabled and then re-enabled on a PF? Although vfio does not allow userspace
to read the uuid, it is always safer to avoid caching a secret from a previous
user.

> + }
> +
> if (vfio_pci_is_vga(pdev)) {
> vga_client_register(pdev, vdev, NULL,
> vfio_pci_set_vga_decode);
> vga_set_legacy_decoding(pdev,
> @@ -1387,6 +1577,12 @@ static void vfio_pci_remove(struct pci_dev *pdev)
> if (!vdev)
> return;
>
> + if (vdev->vf_token) {
> + WARN_ON(vdev->vf_token->users);
> + mutex_destroy(&vdev->vf_token->lock);
> + kfree(vdev->vf_token);
> + }
> +
> vfio_pci_reflck_put(vdev->reflck);
>
> vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
> diff --git a/drivers/vfio/pci/vfio_pci_private.h
> b/drivers/vfio/pci/vfio_pci_private.h
> index 8a2c7607d513..76c11c915949 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -12,6 +12,7 @@
> #include <linux/pci.h>
> #include <linux/irqbypass.h>
> #include <linux/types.h>
> +#include <linux/uuid.h>
>
> #ifndef VFIO_PCI_PRIVATE_H
> #define VFIO_PCI_PRIVATE_H
> @@ -84,6 +85,12 @@ struct vfio_pci_reflck {
> struct mutex lock;
> };
>
> +struct vfio_pci_vf_token {
> + struct mutex lock;
> + uuid_t uuid;
> + int users;
> +};
> +
> struct vfio_pci_device {
> struct pci_dev *pdev;
> void __iomem *barmap[PCI_STD_NUM_BARS];
> @@ -122,6 +129,7 @@ struct vfio_pci_device {
> struct list_head dummy_resources_list;
> struct mutex ioeventfds_lock;
> struct list_head ioeventfds_list;
> + struct vfio_pci_vf_token *vf_token;
> };
>
> #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
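
As a companion to the option format described in the quoted commit log ("$DEVICE_NAME $OPTION1=$VALUE1 ..."), here is a userspace sketch of the same parsing rules. It loosely follows vfio_pci_match() from the patch but is not the kernel code: demo_match() and demo_uuid_is_valid() are hypothetical helpers, and the UUID check is a hand-rolled stand-in for the kernel's uuid_parse().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_UUID_STRING_LEN	36
#define DEMO_VF_TOKEN_ARG	"vf_token="

/* True if s starts with a canonical 8-4-4-4-12 hex UUID string. */
static bool demo_uuid_is_valid(const char *s)
{
	for (int i = 0; i < DEMO_UUID_STRING_LEN; i++) {
		bool dash = (i == 8 || i == 13 || i == 18 || i == 23);

		/* A premature NUL fails both tests, so we never read past it. */
		if (dash ? s[i] != '-' : !isxdigit((unsigned char)s[i]))
			return false;
	}
	return true;
}

/*
 * Returns 1 on match, 0 on no match, -EINVAL for a malformed, unknown or
 * duplicate option. On a match *token points at the UUID text inside buf,
 * or is NULL when no vf_token option was given.
 */
static int demo_match(const char *name, const char *buf, const char **token)
{
	size_t n = strlen(name);

	*token = NULL;

	if (strncmp(name, buf, n))
		return 0;

	buf += n;
	if (*buf && *buf != ' ')
		return 0;	/* a different, longer device name */

	while (*buf) {
		if (*buf == ' ') {
			buf++;
			continue;
		}

		if (*token || strncmp(buf, DEMO_VF_TOKEN_ARG,
				      strlen(DEMO_VF_TOKEN_ARG)))
			return -EINVAL;	/* duplicate or unknown option */

		buf += strlen(DEMO_VF_TOKEN_ARG);
		if (!demo_uuid_is_valid(buf))
			return -EINVAL;

		*token = buf;
		buf += DEMO_UUID_STRING_LEN;
	}

	return 1;
}
```

As in the patch, trailing junk directly after the UUID is caught on the next loop iteration, where it is treated as an unknown or duplicate option.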

2020-02-25 03:08:25

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

> From: Alex Williamson
> Sent: Thursday, February 20, 2020 2:54 AM
>
> With the VF Token interface we can now expect that a vfio userspace
> driver must be in collaboration with the PF driver, an unwitting
> userspace driver will not be able to get past the GET_DEVICE_FD step
> in accessing the device. We can now move on to actually allowing
> SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> enabled by default in this commit, but it does provide a module option
> for this to be enabled (enable_sriov=1). Enabling VFs is rather
> straightforward, except we don't want to risk that a VF might get
> autoprobed and bound to other drivers, so a bus notifier is used to
> "capture" VFs to vfio-pci using the driver_override support. We
> assume any later action to bind the device to other drivers is
> condoned by the system admin and allow it with a log warning.
>
> vfio-pci will disable SR-IOV on a PF before releasing the device,
> allowing a VF driver to be assured other drivers cannot take over the
> PF and that any other userspace driver must know the shared VF token.
> This support also does not provide a mechanism for the PF userspace
> driver itself to manipulate SR-IOV through the vfio API. With this
> patch SR-IOV can only be enabled via the host sysfs interface and the
> PF driver user cannot create or remove VFs.

I'm not sure how many devices can be properly configured simply
with pci_enable_sriov. It is not unusual to require the PF driver to prepare
something before turning on the PCI SR-IOV capability. If you look at the kernel
PF drivers, only two use the generic pci_sriov_configure_simple
(a simple wrapper around pci_enable_sriov), while most others
implement their own callback. However, vfio itself has no idea about this,
so I'm not sure how a user knows whether using this option can
actually meet his purpose. I may be missing something here; possibly
using DPDK as an example would make it clearer.

>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci.c | 106 +++++++++++++++++++++++++++++++--
> --
> drivers/vfio/pci/vfio_pci_private.h | 2 +
> 2 files changed, 97 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index e4d5d26e5e71..b40ade48a844 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -54,6 +54,12 @@ module_param(disable_idle_d3, bool, S_IRUGO |
> S_IWUSR);
> MODULE_PARM_DESC(disable_idle_d3,
> "Disable using the PCI D3 low power state for idle, unused
> devices");
>
> +static bool enable_sriov;
> +#ifdef CONFIG_PCI_IOV
> +module_param(enable_sriov, bool, 0644);
> +MODULE_PARM_DESC(enable_sriov, "Enable support for SR-IOV
> configuration");
> +#endif
> +
> static inline bool vfio_vga_disabled(void)
> {
> #ifdef CONFIG_VFIO_PCI_VGA
> @@ -1528,6 +1534,35 @@ static const struct vfio_device_ops vfio_pci_ops =
> {
>
> static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
> static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
> +static struct pci_driver vfio_pci_driver;
> +
> +static int vfio_pci_bus_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct vfio_pci_device *vdev = container_of(nb,
> + struct vfio_pci_device, nb);
> + struct device *dev = data;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + struct pci_dev *physfn = pci_physfn(pdev);
> +
> + if (action == BUS_NOTIFY_ADD_DEVICE &&
> + pdev->is_virtfn && physfn == vdev->pdev) {
> + pci_info(vdev->pdev, "Captured SR-IOV VF %s driver_override\n",
> + pci_name(pdev));
> + pdev->driver_override = kasprintf(GFP_KERNEL, "%s",
> + vfio_pci_ops.name);
> + } else if (action == BUS_NOTIFY_BOUND_DRIVER &&
> + pdev->is_virtfn && physfn == vdev->pdev) {
> + struct pci_driver *drv = pci_dev_driver(pdev);
> +
> + if (drv && drv != &vfio_pci_driver)
> + pci_warn(vdev->pdev,
> + "VF %s bound to driver %s while PF bound to vfio-pci\n",
> + pci_name(pdev), drv->name);
> + }
> +
> + return 0;
> +}
>
> static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> @@ -1539,12 +1574,12 @@ static int vfio_pci_probe(struct pci_dev *pdev,
> const struct pci_device_id *id)
> return -EINVAL;
>
> /*
> - * Prevent binding to PFs with VFs enabled, this too easily allows
> - * userspace instance with VFs and PFs from the same device, which
> - * cannot work. Disabling SR-IOV here would initiate removing the
> - * VFs, which would unbind the driver, which is prone to blocking
> - * if that VF is also in use by vfio-pci. Just reject these PFs
> - * and let the user sort it out.
> + * Prevent binding to PFs with VFs enabled, the VFs might be in use
> + * by the host or other users. We cannot capture the VFs if they
> + * already exist, nor can we track VF users. Disabling SR-IOV here
> + * would initiate removing the VFs, which would unbind the driver,
> + * which is prone to blocking if that VF is also in use by vfio-pci.
> + * Just reject these PFs and let the user sort it out.
> */
> if (pci_num_vf(pdev)) {
> pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
> @@ -1592,6 +1627,18 @@ static int vfio_pci_probe(struct pci_dev *pdev,
> const struct pci_device_id *id)
> kfree(vdev);
> return -ENOMEM;
> }
> +
> + vdev->nb.notifier_call = vfio_pci_bus_notifier;
> + ret = bus_register_notifier(&pci_bus_type, &vdev->nb);
> + if (ret) {
> + kfree(vdev->vf_token);
> + vfio_pci_reflck_put(vdev->reflck);
> + vfio_del_group_dev(&pdev->dev);
> + vfio_iommu_group_put(group, &pdev->dev);
> + kfree(vdev);
> + return ret;
> + }
> +
> mutex_init(&vdev->vf_token->lock);
> uuid_gen(&vdev->vf_token->uuid);
> }
> @@ -1625,6 +1672,8 @@ static void vfio_pci_remove(struct pci_dev *pdev)
> {
> struct vfio_pci_device *vdev;
>
> + pci_disable_sriov(pdev);
> +
> vdev = vfio_del_group_dev(&pdev->dev);
> if (!vdev)
> return;
> @@ -1635,6 +1684,9 @@ static void vfio_pci_remove(struct pci_dev *pdev)
> kfree(vdev->vf_token);
> }
>
> + if (vdev->nb.notifier_call)
> + bus_unregister_notifier(&pci_bus_type, &vdev->nb);
> +
> vfio_pci_reflck_put(vdev->reflck);
>
> vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
> @@ -1683,16 +1735,48 @@ static pci_ers_result_t
> vfio_pci_aer_err_detected(struct pci_dev *pdev,
> return PCI_ERS_RESULT_CAN_RECOVER;
> }
>
> +static int vfio_pci_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
> +{
> + struct vfio_pci_device *vdev;
> + struct vfio_device *device;
> + int ret = 0;
> +
> + might_sleep();
> +
> + if (!enable_sriov)
> + return -ENOENT;
> +
> + device = vfio_device_get_from_dev(&pdev->dev);
> + if (!device)
> + return -ENODEV;
> +
> + vdev = vfio_device_data(device);
> + if (!vdev) {
> + vfio_device_put(device);
> + return -ENODEV;
> + }
> +
> + if (nr_virtfn == 0)
> + pci_disable_sriov(pdev);
> + else
> + ret = pci_enable_sriov(pdev, nr_virtfn);
> +
> + vfio_device_put(device);
> +
> + return ret < 0 ? ret : nr_virtfn;
> +}
> +
> static const struct pci_error_handlers vfio_err_handlers = {
> .error_detected = vfio_pci_aer_err_detected,
> };
>
> static struct pci_driver vfio_pci_driver = {
> - .name = "vfio-pci",
> - .id_table = NULL, /* only dynamic ids */
> - .probe = vfio_pci_probe,
> - .remove = vfio_pci_remove,
> - .err_handler = &vfio_err_handlers,
> + .name = "vfio-pci",
> + .id_table = NULL, /* only dynamic ids */
> + .probe = vfio_pci_probe,
> + .remove = vfio_pci_remove,
> + .sriov_configure = vfio_pci_sriov_configure,
> + .err_handler = &vfio_err_handlers,
> };
>
> static DEFINE_MUTEX(reflck_lock);
> diff --git a/drivers/vfio/pci/vfio_pci_private.h
> b/drivers/vfio/pci/vfio_pci_private.h
> index 76c11c915949..36ec69081ecd 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -13,6 +13,7 @@
> #include <linux/irqbypass.h>
> #include <linux/types.h>
> #include <linux/uuid.h>
> +#include <linux/notifier.h>
>
> #ifndef VFIO_PCI_PRIVATE_H
> #define VFIO_PCI_PRIVATE_H
> @@ -130,6 +131,7 @@ struct vfio_pci_device {
> struct mutex ioeventfds_lock;
> struct list_head ioeventfds_list;
> struct vfio_pci_vf_token *vf_token;
> + struct notifier_block nb;
> };
>
> #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
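
Since the quoted commit log says SR-IOV can only be enabled through the host sysfs interface with this patch, here is a sketch of what that might look like from a management tool. It assumes the standard PCI sysfs attribute sriov_numvfs and that vfio-pci was loaded with enable_sriov=1 as described above; demo_numvfs_path() and demo_set_numvfs() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/bus/pci/devices/<bdf>/sriov_numvfs" into out.
 * Returns 0 on success or -ENAMETOOLONG if out is too small.
 */
static int demo_numvfs_path(const char *bdf, char *out, size_t len)
{
	int n = snprintf(out, len, "/sys/bus/pci/devices/%s/sriov_numvfs", bdf);

	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}

/*
 * Ask the PF's driver for nr_virtfn VFs (0 disables SR-IOV).
 * Returns 0 on success, -errno on failure.
 */
static int demo_set_numvfs(const char *bdf, int nr_virtfn)
{
	char path[128];
	FILE *f;
	int ret = demo_numvfs_path(bdf, path, sizeof(path));

	if (ret)
		return ret;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%d\n", nr_virtfn) < 0)
		ret = -EIO;

	/* The write reaches the driver's sriov_configure on flush/close. */
	if (fclose(f) != 0 && !ret)
		ret = -errno;

	return ret;
}
```

A call such as demo_set_numvfs("0000:03:00.0", 2) would then end up in vfio_pci_sriov_configure(), which, per the patch, returns -ENOENT when enable_sriov is not set.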

2020-02-25 06:11:08

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v2 0/7] vfio/pci: SR-IOV support


On 2020/2/25 10:33 AM, Tian, Kevin wrote:
>> From: Alex Williamson
>> Sent: Thursday, February 20, 2020 2:54 AM
>>
>> Changes since v1 are primarily to patch 3/7 where the commit log is
>> rewritten, along with option parsing and failure logging based on
>> upstream discussions. The primary user visible difference is that
>> option parsing is now much more strict. If a vf_token option is
>> provided that cannot be used, we generate an error. As a result of
>> this, opening a PF with a vf_token option will serve as a mechanism of
>> setting the vf_token. This seems like a more user friendly API than
>> the alternative of sometimes requiring the option (VFs in use) and
>> sometimes rejecting it, and upholds our desire that the option is
>> always either used or rejected.
>>
>> This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
>> means of setting the VF token, which might call into question whether
>> we absolutely need this new ioctl. Currently I'm keeping it because I
>> can imagine use cases, for example if a hypervisor were to support
>> SR-IOV, the PF device might be opened without consideration for a VF
>> token and we'd require the hypervisor to close and re-open the PF in
>> order to set a known VF token, which is impractical.
>>
>> Series overview (same as provided with v1):
> Thanks for doing this!
>
>> The synopsis of this series is that we have an ongoing desire to drive
>> PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
>> for this with DPDK drivers and potentially interesting future use
> Can you provide a link to the DPDK discussion?
>
>> cases in virtualization. We've been reluctant to add this support
>> previously due to the dependency and trust relationship between the
>> VF device and PF driver. Minimally the PF driver can induce a denial
>> of service to the VF, but depending on the specific implementation,
>> the PF driver might also be responsible for moving data between VFs
>> or have direct access to the state of the VF, including data or state
>> otherwise private to the VF or VF driver.
> Just a loud thinking. While the motivation of VF token sounds reasonable
> to me, I'm curious why the same concern is not raised in other usages.
> For example, there is no such design in virtio framework, where the
> virtio device could also be restarted, putting in separate process (vhost-user),
> and even in separate VM (virtio-vhost-user), etc.


AFAIK, the restart can only be triggered by either the VM or qemu. But
yes, the datapath could be offloaded.

But I'm not sure introducing another dedicated mechanism is better than
using the existing generic POSIX mechanisms to make sure the connection
(AF_UNIX) is secure.


> Of course the para-
> virtualized attribute of virtio implies some degree of trust, but as you
> mentioned many SR-IOV implementations support VF->PF communication
> which also implies some level of trust. It's perfectly fine if VFIO just tries
> to do better than other sub-systems, but knowing how other people
> tackle the similar problem may make the whole picture clearer.
>
> +Jason.


I'm not quite sure that, e.g., allowing a userspace PF driver with kernel
VF drivers would not break the assumptions of the kernel security model.
At the least, we should forbid an unprivileged PF driver running in
userspace.

Thanks

2020-02-27 17:34:54

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user

On Wed, 19 Feb 2020 11:54:18 -0700
Alex Williamson <[email protected]> wrote:

> The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
> agnostic ioctl for setting, retrieving, and probing device features.
> This implementation provides a 16-bit field for specifying a feature
> index, where the data portion of the ioctl is determined by the
> semantics for the given feature. Additional flag bits indicate the
> direction and nature of the operation; SET indicates user data is
> provided into the device feature, GET indicates the device feature is
> written out into user data. The PROBE flag augments determining
> whether the given feature is supported, and if provided, whether the
> given operation on the feature is supported.
>
> The first user of this ioctl is for setting the vfio-pci VF token,
> where the user provides a shared secret key (UUID) on a SR-IOV PF
> device, which users must provide when opening associated VF devices.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci.c | 52 +++++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vfio.h | 37 +++++++++++++++++++++++++++++++
> 2 files changed, 89 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 8dd6ef9543ca..e4d5d26e5e71 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1180,6 +1180,58 @@ static long vfio_pci_ioctl(void *device_data,
>
> return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
> ioeventfd.data, count, ioeventfd.fd);
> + } else if (cmd == VFIO_DEVICE_FEATURE) {
> + struct vfio_device_feature feature;
> + uuid_t uuid;
> +
> + minsz = offsetofend(struct vfio_device_feature, flags);
> +
> + if (copy_from_user(&feature, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (feature.argsz < minsz)
> + return -EINVAL;
> +
> + if (feature.flags & ~(VFIO_DEVICE_FEATURE_MASK |
> + VFIO_DEVICE_FEATURE_SET |
> + VFIO_DEVICE_FEATURE_GET |
> + VFIO_DEVICE_FEATURE_PROBE))
> + return -EINVAL;

GET|SET|PROBE is well-defined, but what about GET|SET without PROBE? Do
we want to fence this in the generic ioctl handler part? Or is there
any sane way to implement that (read and then write back something?)
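
One way such a fence in the generic handler could look is sketched below as a standalone userspace model, not as kernel code. The flag values are assumptions that mirror the uapi additions of this patch (the uapi hunk is not quoted here), and feature_check_flags() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed values, mirroring the uapi additions of this patch. */
#define FEATURE_MASK	0xffffu		/* 16-bit feature index */
#define FEATURE_GET	(1u << 16)
#define FEATURE_SET	(1u << 17)
#define FEATURE_PROBE	(1u << 18)

/* Hypothetical helper: generic check run before the per-feature switch. */
static int feature_check_flags(uint32_t flags)
{
	/* Reject any bit outside the index and the three known flags. */
	if (flags & ~(FEATURE_MASK | FEATURE_GET | FEATURE_SET | FEATURE_PROBE))
		return -EINVAL;

	/* GET and SET together are only meaningful when probing. */
	if (!(flags & FEATURE_PROBE) &&
	    (flags & FEATURE_GET) && (flags & FEATURE_SET))
		return -EINVAL;

	return 0;
}
```

With a check like this in the common path, each feature would only need to handle GET, SET, or a PROBE of either direction.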

> +
> + switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) {
> + case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
> + if (!vdev->vf_token)
> + return -ENOTTY;
> +
> + /*
> + * We do not support GET of the VF Token UUID as this
> + * could expose the token of the previous device user.
> + */
> + if (feature.flags & VFIO_DEVICE_FEATURE_GET)
> + return -EINVAL;
> +
> + if (feature.flags & VFIO_DEVICE_FEATURE_PROBE)
> + return 0;
> +
> + /* Don't SET unless told to do so */
> + if (!(feature.flags & VFIO_DEVICE_FEATURE_SET))
> + return -EINVAL;
> +
> + if (feature.argsz < minsz + sizeof(uuid))
> + return -EINVAL;
> +
> + if (copy_from_user(&uuid, (void __user *)(arg + minsz),
> + sizeof(uuid)))
> + return -EFAULT;
> +
> + mutex_lock(&vdev->vf_token->lock);
> + uuid_copy(&vdev->vf_token->uuid, &uuid);
> + mutex_unlock(&vdev->vf_token->lock);
> +
> + return 0;
> + default:
> + return -ENOTTY;
> + }
> }
>
> return -ENOTTY;
(...)
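
For reference, the SET path quoted above expects the header (argsz and flags) followed directly by the 16-byte UUID. A minimal sketch of how a caller might lay out that buffer follows. The struct vf_token_req type and vf_token_req_init() are hypothetical local stand-ins, and the flag and index values are assumptions mirroring this patch's uapi additions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, mirroring the uapi additions of this patch. */
#define FEATURE_SET		(1u << 17)
#define FEATURE_PCI_VF_TOKEN	0u
#define UUID_LEN		16

/* Local stand-in for the ioctl argument: header, then the raw UUID. */
struct vf_token_req {
	uint32_t argsz;			/* total size, header plus payload */
	uint32_t flags;			/* feature index ORed with SET */
	uint8_t  uuid[UUID_LEN];	/* payload starts right after flags */
};

/* Hypothetical helper: fill in a request that sets the VF token. */
static void vf_token_req_init(struct vf_token_req *req,
			      const uint8_t uuid[UUID_LEN])
{
	assert(req != NULL && uuid != NULL);

	req->argsz = sizeof(*req);
	req->flags = FEATURE_SET | FEATURE_PCI_VF_TOKEN;
	memcpy(req->uuid, uuid, UUID_LEN);
}
```

The filled structure would then be passed as the argument of the VFIO_DEVICE_FEATURE ioctl on the PF device file descriptor.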

2020-03-05 06:39:45

by Vamsi Krishna Attunuru

[permalink] [raw]
Subject: RE: [dpdk-dev] [PATCH v2 0/7] vfio/pci: SR-IOV support


> -----Original Message-----
> From: dev <[email protected]> On Behalf Of Alex Williamson
> Sent: Thursday, February 20, 2020 12:24 AM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: [dpdk-dev] [PATCH v2 0/7] vfio/pci: SR-IOV support
>
> Changes since v1 are primarily to patch 3/7 where the commit log is
> rewritten, along with option parsing and failure logging based on upstream
> discussions. The primary user visible difference is that option parsing is now
> much more strict. If a vf_token option is provided that cannot be used, we
> generate an error. As a result of this, opening a PF with a vf_token option
> will serve as a mechanism of setting the vf_token. This seems like a more
> user friendly API than the alternative of sometimes requiring the option (VFs
> in use) and sometimes rejecting it, and upholds our desire that the option is
> always either used or rejected.
>
> This also means that the VFIO_DEVICE_FEATURE ioctl is not the only means
> of setting the VF token, which might call into question whether we absolutely
> need this new ioctl. Currently I'm keeping it because I can imagine use cases,
> for example if a hypervisor were to support SR-IOV, the PF device might be
> opened without consideration for a VF token and we'd require the
> hypervisor to close and re-open the PF in order to set a known VF token,
> which is impractical.
>
> Series overview (same as provided with v1):
>
> The synopsis of this series is that we have an ongoing desire to drive PCIe
> SR-IOV PFs from userspace with VFIO. There's an immediate need for this with
> DPDK drivers and potentially interesting future use cases in virtualization.
> We've been reluctant to add this support previously due to the dependency
> and trust relationship between the VF device and PF driver. Minimally the PF
> driver can induce a denial of service to the VF, but depending on the specific
> implementation, the PF driver might also be responsible for moving data
> between VFs or have direct access to the state of the VF, including data or
> state otherwise private to the VF or VF driver.
>
> To help resolve these concerns, we introduce a VF token into the VFIO PCI
> ABI, which acts as a shared secret key between drivers. The userspace PF
> driver is required to set the VF token to a known value and userspace VF
> drivers are required to provide the token to access the VF device. If a PF
> driver is restarted with VF drivers in use, it must also provide the current
> token in order to prevent a rogue untrusted PF driver from replacing a known
> driver. The degree to which this new token is considered secret is left to the
> userspace drivers, the kernel intentionally provides no means to retrieve the
> current token.
>
> Note that the above token is only required for this new model where both
> the PF and VF devices are usable through vfio-pci. Existing models of VFIO
> drivers where the PF is used without SR-IOV enabled or the VF is bound to a
> userspace driver with an in-kernel, host PF driver are unaffected.
>
> The latter configuration above also highlights a new inverted scenario that is
> now possible, a userspace PF driver with in-kernel VF drivers.
> I believe this is a scenario that should be allowed, but should not be enabled
> by default. This series includes code to set a default driver_override for VFs
> sourced from a vfio-pci user owned PF, such that the VFs are also bound to
> vfio-pci. This model is compatible with tools like driverctl and allows the
> system administrator to decide if other bindings should be enabled. The VF
> token interface above exists only between vfio-pci PF and VF drivers, once a
> VF is bound to another driver, the administrator has effectively pronounced
> the device as trusted. The vfio-pci driver will note alternate binding in dmesg
> for logging and debugging purposes.
>
> Please review, comment, and test. The example QEMU implementation
> provided with the RFC is still current for this version. Thanks,
>
> Alex

Hi Alex,

Thanks for enabling this feature support.

Tested-by: Vamsi Attunuru <[email protected]>

Tested the v2 patch set with the DPDK patch below:
http://patches.dpdk.org/patch/66281/

Regards
A Vamsi

>
> RFC: https://lore.kernel.org/lkml/158085337582.9445.17682266437583505502.stgit@gimli.home/
> v1: https://lore.kernel.org/lkml/158145472604.16827.15751375540102298130.stgit@gimli.home/
>
> ---
>
> Alex Williamson (7):
> vfio: Include optional device match in vfio_device_ops callbacks
> vfio/pci: Implement match ops
> vfio/pci: Introduce VF token
> vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
> vfio/pci: Add sriov_configure support
> vfio/pci: Remove dev_fmt definition
> vfio/pci: Cleanup .probe() exit paths
>
>
> drivers/vfio/pci/vfio_pci.c | 383 +++++++++++++++++++++++++++++++++--
> drivers/vfio/pci/vfio_pci_private.h | 10 +
> drivers/vfio/vfio.c | 20 +-
> include/linux/vfio.h | 4
> include/uapi/linux/vfio.h | 37 +++
> 5 files changed, 426 insertions(+), 28 deletions(-)

2020-03-05 17:22:09

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 0/7] vfio/pci: SR-IOV support

On Tue, 25 Feb 2020 14:09:07 +0800
Jason Wang <[email protected]> wrote:

> On 2020/2/25 10:33 AM, Tian, Kevin wrote:
> >> From: Alex Williamson
> >> Sent: Thursday, February 20, 2020 2:54 AM
> >>
> >> Changes since v1 are primarily to patch 3/7 where the commit log is
> >> rewritten, along with option parsing and failure logging based on
> >> upstream discussions. The primary user visible difference is that
> >> option parsing is now much more strict. If a vf_token option is
> >> provided that cannot be used, we generate an error. As a result of
> >> this, opening a PF with a vf_token option will serve as a mechanism of
> >> setting the vf_token. This seems like a more user friendly API than
> >> the alternative of sometimes requiring the option (VFs in use) and
> >> sometimes rejecting it, and upholds our desire that the option is
> >> always either used or rejected.
> >>
> >> This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
> >> means of setting the VF token, which might call into question whether
> >> we absolutely need this new ioctl. Currently I'm keeping it because I
> >> can imagine use cases, for example if a hypervisor were to support
> >> SR-IOV, the PF device might be opened without consideration for a VF
> >> token and we'd require the hypervisor to close and re-open the PF in
> >> order to set a known VF token, which is impractical.
> >>
> >> Series overview (same as provided with v1):
> > Thanks for doing this!
> >
> >> The synopsis of this series is that we have an ongoing desire to drive
> >> PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
> >> for this with DPDK drivers and potentially interesting future use
> > Can you provide a link to the DPDK discussion?
> >
> >> cases in virtualization. We've been reluctant to add this support
> >> previously due to the dependency and trust relationship between the
> >> VF device and PF driver. Minimally the PF driver can induce a denial
> >> of service to the VF, but depending on the specific implementation,
> >> the PF driver might also be responsible for moving data between VFs
> >> or have direct access to the state of the VF, including data or state
> >> otherwise private to the VF or VF driver.
> > Just a loud thinking. While the motivation of VF token sounds reasonable
> > to me, I'm curious why the same concern is not raised in other usages.
> > For example, there is no such design in virtio framework, where the
> > virtio device could also be restarted, putting in separate process (vhost-user),
> > and even in separate VM (virtio-vhost-user), etc.
>
>
> AFAIK, the restart could only be triggered by either VM or qemu. But
> yes, the datapath could be offloaded.
>
> But I'm not sure introducing another dedicated mechanism is better than
> using the existing generic POSIX mechanism to make sure the connection
> (AF_UNIX) is secure.
>
>
> > Of course the para-
> > virtualized attribute of virtio implies some degree of trust, but as you
> > mentioned many SR-IOV implementations support VF->PF communication
> > which also implies some level of trust. It's perfectly fine if VFIO just tries
> > to do better than other sub-systems, but knowing how other people
> > tackle the similar problem may make the whole picture clearer.
> >
> > +Jason.
>
>
> I'm not quite sure that, e.g., allowing a userspace PF driver with kernel
> VF drivers would not break the assumptions of the kernel security model.
> At the least, we should forbid an unprivileged PF driver running in
> userspace.

It might be useful to have your opinion on this series, because that's
exactly what we're trying to do here. Various environments, DPDK
specifically, want a userspace PF driver. This series takes steps to
mitigate the risk of having such a driver, such as requiring this VF
token interface to extend the VFIO interface and validate participation
around a PF that is not considered trusted by the kernel. We also set
a driver_override to try to make sure no host kernel driver can
automatically bind to a VF of a user owned PF, only vfio-pci, but we
don't prevent the admin from creating configurations where the VFs are
used by other host kernel drivers.

I think the question Kevin is inquiring about is whether virtio devices
are susceptible to the type of collaborative, shared key environment
we're creating here. For example, can a VM or qemu have access to
reset a virtio device in a way that could affect other devices, ex. FLR
on a PF that could interfere with VF operation. Thanks,

Alex

2020-03-05 17:35:12

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 0/7] vfio/pci: SR-IOV support

Hi Kevin,

Sorry for the delay, I've been out on PTO...

On Tue, 25 Feb 2020 02:33:27 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Alex Williamson
> > Sent: Thursday, February 20, 2020 2:54 AM
> >
> > Changes since v1 are primarily to patch 3/7 where the commit log is
> > rewritten, along with option parsing and failure logging based on
> > upstream discussions. The primary user visible difference is that
> > option parsing is now much more strict. If a vf_token option is
> > provided that cannot be used, we generate an error. As a result of
> > this, opening a PF with a vf_token option will serve as a mechanism of
> > setting the vf_token. This seems like a more user friendly API than
> > the alternative of sometimes requiring the option (VFs in use) and
> > sometimes rejecting it, and upholds our desire that the option is
> > always either used or rejected.
> >
> > This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
> > means of setting the VF token, which might call into question whether
> > we absolutely need this new ioctl. Currently I'm keeping it because I
> > can imagine use cases, for example if a hypervisor were to support
> > SR-IOV, the PF device might be opened without consideration for a VF
> > token and we'd require the hypervisor to close and re-open the PF in
> > order to set a known VF token, which is impractical.
> >
> > Series overview (same as provided with v1):
>
> Thanks for doing this!
>
> >
> > The synopsis of this series is that we have an ongoing desire to drive
> > PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
> > for this with DPDK drivers and potentially interesting future use
>
> Can you provide a link to the DPDK discussion?

There's a thread here which proposed an out-of-tree driver that provides
a parallel SR-IOV enabling interface for a vfio-pci owned device.
Clearly I felt strongly about it ;)

https://patches.dpdk.org/patch/58810/

Also, documentation for making use of an Intel FPGA device with DPDK
requires the PF bound to igb_uio to support enabling SR-IOV:

https://doc.dpdk.org/guides/bbdevs/fpga_lte_fec.html

> > cases in virtualization. We've been reluctant to add this support
> > previously due to the dependency and trust relationship between the
> > VF device and PF driver. Minimally the PF driver can induce a denial
> > of service to the VF, but depending on the specific implementation,
> > the PF driver might also be responsible for moving data between VFs
> > or have direct access to the state of the VF, including data or state
> > otherwise private to the VF or VF driver.
>
> Just a loud thinking. While the motivation of VF token sounds reasonable
> to me, I'm curious why the same concern is not raised in other usages.
> For example, there is no such design in virtio framework, where the
> virtio device could also be restarted, putting in separate process (vhost-user),
> and even in separate VM (virtio-vhost-user), etc. Of course the para-
> virtualized attribute of virtio implies some degree of trust, but as you
> mentioned many SR-IOV implementations support VF->PF communication
> which also implies some level of trust. It's perfectly fine if VFIO just tries
> to do better than other sub-systems, but knowing how other people
> tackle the similar problem may make the whole picture clearer.
>
> +Jason.

We can follow the thread with Jason, but I can't really speak to
whether virtio needs something similar or doesn't provide enough PF
access to be concerned. If they need a similar solution, we can
collaborate, but the extension we're defining here is specifically part
of the vfio-pci ABI, so it might not be easily portable to virtio.

> > To help resolve these concerns, we introduce a VF token into the VFIO
> > PCI ABI, which acts as a shared secret key between drivers. The
> > userspace PF driver is required to set the VF token to a known value
> > and userspace VF drivers are required to provide the token to access
> > the VF device. If a PF driver is restarted with VF drivers in use, it
> > must also provide the current token in order to prevent a rogue
> > untrusted PF driver from replacing a known driver. The degree to
> > which this new token is considered secret is left to the userspace
> > drivers, the kernel intentionally provides no means to retrieve the
> > current token.
>
> I'm wondering whether the token idea can be used beyond SR-IOV, e.g.
> (1) we may allow vfio user space to manage Scalable IOV in the future,
> which faces the similar challenge between the PF and mdev; (2) the
> token might be used as a canonical way to replace off-tree acs-override
> workaround, say, allowing the admin to assign devices within the
> same iommu group to different VMs which trust each other. I'm not
> sure how much complexity will be further introduced, but it's greatly
> appreciated if you can help think a bit and if feasible abstract some
> logic in vfio core layer for such potential usages...

I don't see how this can be used for ACS override. Lacking ACS, we
must assume lack of DMA isolation, which results in our IOMMU grouping.
If we split IOMMU groups, that implies something that doesn't exist. A
user can already create a process that can own the vfio group and pass
vfio devices to other tasks, with the restriction of having a single
DMA address space. If there is DMA isolation, then an mdev solution
might be better, but given the IOMMU integration of SIOV, I'm not sure
why the devices wouldn't simply be placed in separate groups by the
IOMMU driver. Thanks,

Alex

> > Note that the above token is only required for this new model where
> > both the PF and VF devices are usable through vfio-pci. Existing
> > models of VFIO drivers where the PF is used without SR-IOV enabled
> > or the VF is bound to a userspace driver with an in-kernel, host PF
> > driver are unaffected.
> >
> > The latter configuration above also highlights a new inverted scenario
> > that is now possible, a userspace PF driver with in-kernel VF drivers.
> > I believe this is a scenario that should be allowed, but should not be
> > enabled by default. This series includes code to set a default
> > driver_override for VFs sourced from a vfio-pci user owned PF, such
> > that the VFs are also bound to vfio-pci. This model is compatible
> > with tools like driverctl and allows the system administrator to
> > decide if other bindings should be enabled. The VF token interface
> > above exists only between vfio-pci PF and VF drivers, once a VF is
> > bound to another driver, the administrator has effectively pronounced
> > the device as trusted. The vfio-pci driver will note alternate
> > binding in dmesg for logging and debugging purposes.
> >
> > Please review, comment, and test. The example QEMU implementation
> > provided with the RFC is still current for this version. Thanks,
> >
> > Alex
> >
> > RFC:
> > https://lore.kernel.org/lkml/158085337582.9445.17682266437583505502.stgit@gimli.home/
> > v1:
> > https://lore.kernel.org/lkml/158145472604.16827.15751375540102298130.stgit@gimli.home/
> >
> > ---
> >
> > Alex Williamson (7):
> > vfio: Include optional device match in vfio_device_ops callbacks
> > vfio/pci: Implement match ops
> > vfio/pci: Introduce VF token
> > vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
> > vfio/pci: Add sriov_configure support
> > vfio/pci: Remove dev_fmt definition
> > vfio/pci: Cleanup .probe() exit paths
> >
> >
> > drivers/vfio/pci/vfio_pci.c | 383 +++++++++++++++++++++++++++++++++--
> > drivers/vfio/pci/vfio_pci_private.h | 10 +
> > drivers/vfio/vfio.c | 20 +-
> > include/linux/vfio.h | 4
> > include/uapi/linux/vfio.h | 37 +++
> > 5 files changed, 426 insertions(+), 28 deletions(-)
>

2020-03-05 18:19:38

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 3/7] vfio/pci: Introduce VF token

On Tue, 25 Feb 2020 02:59:37 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Alex Williamson
> > Sent: Thursday, February 20, 2020 2:54 AM
> >
> > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > fully isolated from the PF. The PF can always cause a denial of service
> > to the VF, even if by simply resetting itself. The degree to which a PF
> > can access the data passed through a VF or interfere with its operation
> > is dependent on a given SR-IOV implementation. Therefore we want to
> > avoid a scenario where an existing vfio-pci based userspace driver might
> > assume the PF driver is trusted, for example assigning a PF to one VM
> > and VF to another with some expectation of isolation. IOMMU grouping
> > could be a solution to this, but imposes an unnecessarily strong
> > relationship between PF and VF drivers if they need to operate with the
> > same IOMMU context. Instead we introduce a "VF token", which is
> > essentially just a shared secret between PF and VF drivers, implemented
> > as a UUID.
> >
> > The VF token can be set by a vfio-pci based PF driver and must be known
> > by the vfio-pci based VF driver in order to gain access to the device.
> > This allows the degree to which this VF token is considered secret to be
> > determined by the applications and environment. For example a VM might
> > generate a random UUID known only internally to the hypervisor while a
> > userspace networking appliance might use a shared, or even well known,
> > UUID among the application drivers.
> >
> > To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
> > extended to accept key=value pairs in addition to the device name. This
> > allows us to most easily deny user access to the device without risk
> > that existing userspace drivers assume region offsets, IRQs, and other
> > device features, leading to more elaborate error paths. The format of
> > these options are expected to take the form:
> >
> > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
> >
> > Where the device name is always provided first for compatibility and
> > additional options are specified in a space separated list. The
> > relation between and requirements for the additional options will be
> > vfio bus driver dependent, however an unknown or unused option within this
> > schema should return an error. This allows for future use of unknown
> > options as well as a positive indication to the user that an option is
> > used.
> >
> > An example VF token option would take this form:
> >
> > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> >
> > When accessing a VF where the PF is making use of vfio-pci, the user
> > MUST provide the current vf_token. When accessing a PF, the user MUST
> > provide the current vf_token IF there are active VF users or MAY provide
> > a vf_token in order to set the current VF token when no VF users are
> > active. The former requirement assures VF users that an unassociated
> > driver cannot usurp the PF device. These semantics also imply that a
> > VF token MUST be set by a PF driver before VF drivers can access their
> > device, the default token is random and mechanisms to read the token are
> > not provided in order to protect the VF token of previous users. Use of
> > the vf_token option outside of these cases will return an error, as
> > discussed above.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
> > ---
> > drivers/vfio/pci/vfio_pci.c | 198 +++++++++++++++++++++++++++++++++++
> > drivers/vfio/pci/vfio_pci_private.h | 8 +
> > 2 files changed, 205 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 2ec6c31d0ab0..8dd6ef9543ca 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -466,6 +466,44 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> > vfio_pci_set_power_state(vdev, PCI_D3hot);
> > }
> >
> > +static struct pci_driver vfio_pci_driver;
> > +
> > +static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev,
> > + struct vfio_device **pf_dev)
> > +{
> > + struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > +
> > + if (!vdev->pdev->is_virtfn)
> > + return NULL;
> > +
> > + *pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > + if (!*pf_dev)
> > + return NULL;
> > +
> > + if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > + vfio_device_put(*pf_dev);
> > + return NULL;
> > + }
> > +
> > + return vfio_device_data(*pf_dev);
> > +}
> > +
> > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
> > +{
> > + struct vfio_device *pf_dev;
> > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > +
> > + if (!pf_vdev)
> > + return;
> > +
> > + mutex_lock(&pf_vdev->vf_token->lock);
> > + pf_vdev->vf_token->users += val;
> > + WARN_ON(pf_vdev->vf_token->users < 0);
> > + mutex_unlock(&pf_vdev->vf_token->lock);
> > +
> > + vfio_device_put(pf_dev);
> > +}
> > +
> > static void vfio_pci_release(void *device_data)
> > {
> > struct vfio_pci_device *vdev = device_data;
> > @@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
> > mutex_lock(&vdev->reflck->lock);
> >
> > if (!(--vdev->refcnt)) {
> > + vfio_pci_vf_token_user_add(vdev, -1);
> > vfio_spapr_pci_eeh_release(vdev->pdev);
> > vfio_pci_disable(vdev);
> > }
> > @@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
> > goto error;
> >
> > vfio_spapr_pci_eeh_open(vdev->pdev);
> > + vfio_pci_vf_token_user_add(vdev, 1);
> > }
> > vdev->refcnt++;
> > error:
> > @@ -1278,11 +1318,148 @@ static void vfio_pci_request(void *device_data, unsigned int count)
> > mutex_unlock(&vdev->igate);
> > }
> >
> > +static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
> > + bool vf_token, uuid_t *uuid)
> > +{
> > + /*
> > + * There's always some degree of trust or collaboration between SR-IOV
> > + * PF and VFs, even if just that the PF hosts the SR-IOV capability and
> > + * can disrupt VFs with a reset, but often the PF has more explicit
> > + * access to deny service to the VF or access data passed through the
> > + * VF. We therefore require an opt-in via a shared VF token (UUID) to
> > + * represent this trust. This both prevents that a VF driver might
> > + * assume the PF driver is a trusted, in-kernel driver, and also that
> > + * a PF driver might be replaced with a rogue driver, unknown to in-use
> > + * VF drivers.
> > + *
> > + * Therefore when presented with a VF, if the PF is a vfio device and
> > + * it is bound to the vfio-pci driver, the user needs to provide a VF
> > + * token to access the device, in the form of appending a vf_token to
> > + * the device name, for example:
> > + *
> > + * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
> > + *
> > + * When presented with a PF which has VFs in use, the user must also
> > + * provide the current VF token to prove collaboration with existing
> > + * VF users. If VFs are not in use, the VF token provided for the PF
> > + * device will act to set the VF token.
> > + *
> > + * If the VF token is provided but unused, a fault is generated.
>
> fault->error, otherwise it is easy to consider a CPU fault.

Ok, I can make that change, but I think you might have a unique
background to make a leap that a userspace ioctl can trigger a CPU
fault ;)

> > + */
> > + if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
> > + return 0; /* No VF token provided or required */
> > +
> > + if (vdev->pdev->is_virtfn) {
> > + struct vfio_device *pf_dev;
> > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > + bool match;
> > +
> > + if (!pf_vdev) {
> > + if (!vf_token)
> > + return 0; /* PF is not vfio-pci, no VF token */
> > +
> > + pci_info_ratelimited(vdev->pdev,
> > + "VF token incorrectly provided, PF not bound to vfio-pci\n");
> > + return -EINVAL;
> > + }
> > +
> > + if (!vf_token) {
> > + vfio_device_put(pf_dev);
> > + pci_info_ratelimited(vdev->pdev,
> > + "VF token required to access device\n");
> > + return -EACCES;
> > + }
> > +
> > + mutex_lock(&pf_vdev->vf_token->lock);
> > + match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
> > + mutex_unlock(&pf_vdev->vf_token->lock);
> > +
> > + vfio_device_put(pf_dev);
> > +
> > + if (!match) {
> > + pci_info_ratelimited(vdev->pdev,
> > + "Incorrect VF token provided for device\n");
> > + return -EACCES;
> > + }
> > + } else if (vdev->vf_token) {
> > + mutex_lock(&vdev->vf_token->lock);
> > + if (vdev->vf_token->users) {
> > + if (!vf_token) {
> > + mutex_unlock(&vdev->vf_token->lock);
> > + pci_info_ratelimited(vdev->pdev,
> > + "VF token required to access device\n");
> > + return -EACCES;
> > + }
> > +
> > + if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
> > + mutex_unlock(&vdev->vf_token->lock);
> > + pci_info_ratelimited(vdev->pdev,
> > + "Incorrect VF token provided for device\n");
> > + return -EACCES;
> > + }
> > + } else if (vf_token) {
> > + uuid_copy(&vdev->vf_token->uuid, uuid);
> > + }
>
> It implies that we allow PF to be accessed w/o providing a VF token,
> as long as no VF is currently in-use, which further means no VF can
> be further assigned since no one knows the random uuid allocated
> by vfio. Just want to confirm whether it is the desired flavor. If an
> user really wants to use PF-only, possibly he should disable SR-IOV
> instead...

Yes, this is the behavior I'm intending. Are you suggesting that we
should require a VF token in order to access a PF that has SR-IOV
already enabled? This introduces an inconsistency in that SR-IOV can be
enabled via sysfs asynchronously to the GET_DEVICE_FD ioctl, so we'd need
to restrict the sysfs interface so that, when the PF is already opened,
SR-IOV can only be enabled in cases where the VF token is already set? Thus
SR-IOV could be pre-enabled, but the user must provide a vf_token
option on GET_DEVICE_FD, otherwise SR-IOV could only be enabled after
the user sets a VF token. But then do we need to invalidate the token
at some point? Otherwise it seems like we have the same scenario when the
next user comes along. We believe there are PFs that require no
special VF support other than sriov_configure, so those drivers could
theoretically close the PF after setting a VF token. That makes it
difficult to determine the lifetime of a VF token and leads to the
interface proposed here of an initial random token, then the user-set
token persisting indefinitely.

I've tended to consider all of these to be mechanisms by which a user can
shoot themselves in the foot. Yes, the user and admin can do things
that will fail to work with this interface, for example my testing
involves QEMU, where we don't expose SR-IOV to the guest yet and the
igb driver for the PF will encounter problems running a device with
SR-IOV enabled that it doesn't know about. Do we want to try to play
nanny and require specific semantics? I've opted for the simpler
code here.
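
For illustration, the PF-side rule discussed above (a token is required only while VFs are in use, and otherwise a provided token replaces the current one) can be modelled in user space. This is a minimal sketch rather than the patch's code: `ex_vf_token` and `ex_pf_check_token` are hypothetical names, the UUID is a plain 16-byte array, and the mutex is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct vfio_pci_vf_token (no lock). */
struct ex_vf_token {
	unsigned char uuid[16];
	int users;		/* number of VF devices currently open */
};

/*
 * Models only the PF branch of vfio_pci_validate_vf_token().
 * Returns 0 on success, -EACCES if a required token is missing or wrong.
 */
static int ex_pf_check_token(struct ex_vf_token *tok, bool have_token,
			     const unsigned char *uuid)
{
	if (tok->users) {
		/* VFs in use: the opener must prove it knows the token. */
		if (!have_token)
			return -EACCES;
		if (memcmp(uuid, tok->uuid, sizeof(tok->uuid)) != 0)
			return -EACCES;
	} else if (have_token) {
		/* No VF users: a provided token becomes the current token. */
		memcpy(tok->uuid, uuid, sizeof(tok->uuid));
	}

	return 0;	/* Includes the "no users, no token" case. */
}
```

Under this model, a PF opened without a token while no VF is in use keeps whatever token was there before, which is the case Kevin is asking about.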

> > +
> > + mutex_unlock(&vdev->vf_token->lock);
> > + } else if (vf_token) {
> > + pci_info_ratelimited(vdev->pdev,
> > + "VF token incorrectly provided, not a PF or VF\n");
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +#define VF_TOKEN_ARG "vf_token="
> > +
> > static int vfio_pci_match(void *device_data, char *buf)
> > {
> > struct vfio_pci_device *vdev = device_data;
> > + bool vf_token = false;
> > + uuid_t uuid;
> > + int ret;
> > +
> > + if (strncmp(pci_name(vdev->pdev), buf, strlen(pci_name(vdev->pdev))))
> > + return 0; /* No match */
> > +
> > + if (strlen(buf) > strlen(pci_name(vdev->pdev))) {
> > + buf += strlen(pci_name(vdev->pdev));
> > +
> > + if (*buf != ' ')
> > + return 0; /* No match: non-whitespace after name */
> > +
> > + while (*buf) {
> > + if (*buf == ' ') {
> > + buf++;
> > + continue;
> > + }
> > +
> > + if (!vf_token && !strncmp(buf, VF_TOKEN_ARG,
> > + strlen(VF_TOKEN_ARG))) {
> > + buf += strlen(VF_TOKEN_ARG);
> > +
> > + if (strlen(buf) < UUID_STRING_LEN)
> > + return -EINVAL;
> > +
> > + ret = uuid_parse(buf, &uuid);
> > + if (ret)
> > + return ret;
> >
> > - return !strcmp(pci_name(vdev->pdev), buf);
> > + vf_token = true;
> > + buf += UUID_STRING_LEN;
> > + } else {
> > + /* Unknown/duplicate option */
> > + return -EINVAL;
> > + }
> > + }
> > + }
> > +
> > + ret = vfio_pci_validate_vf_token(vdev, vf_token, &uuid);
> > + if (ret)
> > + return ret;
> > +
> > + return 1; /* Match */
> > }
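
As a reading aid, the option parsing in vfio_pci_match() above can be approximated in user space. This is a sketch under stated assumptions: `ex_match` and `ex_uuid_is_valid` are hypothetical names, the UUID check is a simple format test standing in for the kernel's uuid_parse(), and token validation against the device is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define EX_VF_TOKEN_ARG		"vf_token="
#define EX_UUID_STRING_LEN	36

/* Accepts the 8-4-4-4-12 hex form; a stand-in for the kernel's uuid_parse(). */
static bool ex_uuid_is_valid(const char *s)
{
	for (int i = 0; i < EX_UUID_STRING_LEN; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')
				return false;
		} else if (!isxdigit((unsigned char)s[i])) {
			return false;
		}
	}
	return true;
}

/*
 * Returns 1 on match, 0 on no match, and -EINVAL for a malformed, unknown
 * or duplicate option. On a match, *token points at the UUID text inside
 * buf, or is NULL if no vf_token option was given.
 */
static int ex_match(const char *name, const char *buf, const char **token)
{
	size_t n = strlen(name);

	*token = NULL;
	if (strncmp(name, buf, n))
		return 0;		/* Device name differs */

	buf += n;
	if (*buf && *buf != ' ')
		return 0;		/* Non-space right after the name */

	while (*buf) {
		if (*buf == ' ') {
			buf++;
			continue;
		}
		/* Only vf_token= is known, and it may appear once. */
		if (*token || strncmp(buf, EX_VF_TOKEN_ARG, strlen(EX_VF_TOKEN_ARG)))
			return -EINVAL;

		buf += strlen(EX_VF_TOKEN_ARG);
		if (strlen(buf) < EX_UUID_STRING_LEN || !ex_uuid_is_valid(buf))
			return -EINVAL;

		*token = buf;
		buf += EX_UUID_STRING_LEN;
	}

	return 1;
}
```

Note that trailing characters directly after the UUID are rejected as an unknown option on the next loop iteration, which mirrors the strictness described in the cover letter.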
> >
> > static const struct vfio_device_ops vfio_pci_ops = {
> > @@ -1354,6 +1531,19 @@ static int vfio_pci_probe(struct pci_dev *pdev,
> > const struct pci_device_id *id)
> > return ret;
> > }
> >
> > + if (pdev->is_physfn) {
> > + vdev->vf_token = kzalloc(sizeof(*vdev->vf_token),
> > GFP_KERNEL);
> > + if (!vdev->vf_token) {
> > + vfio_pci_reflck_put(vdev->reflck);
> > + vfio_del_group_dev(&pdev->dev);
> > + vfio_iommu_group_put(group, &pdev->dev);
> > + kfree(vdev);
> > + return -ENOMEM;
> > + }
> > + mutex_init(&vdev->vf_token->lock);
> > + uuid_gen(&vdev->vf_token->uuid);
>
> should we also regenerate a random uuid somewhere when SR-IOV is
> disabled and then re-enabled on a PF? Although vfio disallows userspace
> to read uuid, it is always safer to avoid caching a secret from previous
> user.

What if our user is QEMU emulating SR-IOV to the guest? Do we want to
force a new VF token to be set every time we bounce the VFs? Why? As
above, the session lifetime of the VF token might be difficult to
determine and I'm not sure paranoia is a sufficient reason to try to
create boundaries for it. Thanks,

Alex

> > + }
> > +
> > if (vfio_pci_is_vga(pdev)) {
> > vga_client_register(pdev, vdev, NULL,
> > vfio_pci_set_vga_decode);
> > vga_set_legacy_decoding(pdev,
> > @@ -1387,6 +1577,12 @@ static void vfio_pci_remove(struct pci_dev *pdev)
> > if (!vdev)
> > return;
> >
> > + if (vdev->vf_token) {
> > + WARN_ON(vdev->vf_token->users);
> > + mutex_destroy(&vdev->vf_token->lock);
> > + kfree(vdev->vf_token);
> > + }
> > +
> > vfio_pci_reflck_put(vdev->reflck);
> >
> > vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
> > diff --git a/drivers/vfio/pci/vfio_pci_private.h
> > b/drivers/vfio/pci/vfio_pci_private.h
> > index 8a2c7607d513..76c11c915949 100644
> > --- a/drivers/vfio/pci/vfio_pci_private.h
> > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > @@ -12,6 +12,7 @@
> > #include <linux/pci.h>
> > #include <linux/irqbypass.h>
> > #include <linux/types.h>
> > +#include <linux/uuid.h>
> >
> > #ifndef VFIO_PCI_PRIVATE_H
> > #define VFIO_PCI_PRIVATE_H
> > @@ -84,6 +85,12 @@ struct vfio_pci_reflck {
> > struct mutex lock;
> > };
> >
> > +struct vfio_pci_vf_token {
> > + struct mutex lock;
> > + uuid_t uuid;
> > + int users;
> > +};
> > +
> > struct vfio_pci_device {
> > struct pci_dev *pdev;
> > void __iomem *barmap[PCI_STD_NUM_BARS];
> > @@ -122,6 +129,7 @@ struct vfio_pci_device {
> > struct list_head dummy_resources_list;
> > struct mutex ioeventfds_lock;
> > struct list_head ioeventfds_list;
> > + struct vfio_pci_vf_token *vf_token;
> > };
> >
> > #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
>

2020-03-05 18:23:11

by Alex Williamson

Subject: Re: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

On Tue, 25 Feb 2020 03:08:00 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Alex Williamson
> > Sent: Thursday, February 20, 2020 2:54 AM
> >
> > With the VF Token interface we can now expect that a vfio userspace
> > driver must be in collaboration with the PF driver, an unwitting
> > userspace driver will not be able to get past the GET_DEVICE_FD step
> > in accessing the device. We can now move on to actually allowing
> > SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> > enabled by default in this commit, but it does provide a module option
> > for this to be enabled (enable_sriov=1). Enabling VFs is rather
> > straightforward, except we don't want to risk that a VF might get
> > autoprobed and bound to other drivers, so a bus notifier is used to
> > "capture" VFs to vfio-pci using the driver_override support. We
> > assume any later action to bind the device to other drivers is
> > condoned by the system admin and allow it with a log warning.
> >
> > vfio-pci will disable SR-IOV on a PF before releasing the device,
> > allowing a VF driver to be assured other drivers cannot take over the
> > PF and that any other userspace driver must know the shared VF token.
> > This support also does not provide a mechanism for the PF userspace
> > driver itself to manipulate SR-IOV through the vfio API. With this
> > patch SR-IOV can only be enabled via the host sysfs interface and the
> > PF driver user cannot create or remove VFs.
>
> I'm not sure how many devices can be properly configured simply
> with pci_enable_sriov. It is not unusual to require PF driver prepare
> something before turning PCI SR-IOV capability. If you look kernel
> PF drivers, there are only two using generic pci_sriov_configure_
> simple (simple wrapper like pci_enable_sriov), while most others
> implementing their own callback. However vfio itself has no idea
> thus I'm not sure how an user knows whether using this option can
> actually meet his purpose. I may miss something here, possibly
> using DPDK as an example will make it clearer.

There is still the entire vfio userspace driver interface. Imagine for
example that QEMU emulates the SR-IOV capability and makes a call out
to libvirt (or maybe runs with privs for the PF SR-IOV sysfs attribs)
when the guest enables SR-IOV. Can't we assume that any PF specific
support can still be performed in the userspace/guest driver, leaving
us with a very simple and generic sriov_configure callback in vfio-pci?
Thanks,

Alex
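
To make the sysfs path mentioned above concrete, here is a minimal user-space sketch of how a privileged helper might request VFs for a PF. It assumes the standard PCI `sriov_numvfs` sysfs attribute; the helper names `ex_sriov_numvfs_path` and `ex_set_numvfs` are hypothetical, and nothing here is specific to vfio-pci.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds "/sys/bus/pci/devices/<bdf>/sriov_numvfs" into out.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int ex_sriov_numvfs_path(const char *bdf, char *out, size_t len)
{
	int n = snprintf(out, len, "/sys/bus/pci/devices/%s/sriov_numvfs", bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Writes the requested VF count; returns 0 on success, -1 on any failure. */
static int ex_set_numvfs(const char *bdf, unsigned int num_vfs)
{
	char path[128];
	FILE *f;
	int ok;

	if (ex_sriov_numvfs_path(bdf, path, sizeof(path)))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", num_vfs) > 0;
	/* Buffered data reaches the kernel at flush time, so check fclose(). */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would typically write 0 before changing a non-zero VF count, since the PCI core generally refuses to move directly between two non-zero values.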

2020-03-05 20:52:20

by Alex Williamson

Subject: Re: [PATCH v2 4/7] vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user

On Thu, 27 Feb 2020 18:34:07 +0100
Cornelia Huck <[email protected]> wrote:

> On Wed, 19 Feb 2020 11:54:18 -0700
> Alex Williamson <[email protected]> wrote:
>
> > The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
> > agnostic ioctl for setting, retrieving, and probing device features.
> > This implementation provides a 16-bit field for specifying a feature
> > index, where the data portion of the ioctl is determined by the
> > semantics for the given feature. Additional flag bits indicate the
> > direction and nature of the operation; SET indicates user data is
> > provided into the device feature, GET indicates the device feature is
> > written out into user data. The PROBE flag augments determining
> > whether the given feature is supported, and if provided, whether the
> > given operation on the feature is supported.
> >
> > The first user of this ioctl is for setting the vfio-pci VF token,
> > where the user provides a shared secret key (UUID) on a SR-IOV PF
> > device, which users must provide when opening associated VF devices.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
> > ---
> > drivers/vfio/pci/vfio_pci.c | 52 +++++++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/vfio.h | 37 +++++++++++++++++++++++++++++++
> > 2 files changed, 89 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 8dd6ef9543ca..e4d5d26e5e71 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -1180,6 +1180,58 @@ static long vfio_pci_ioctl(void *device_data,
> >
> > return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
> > ioeventfd.data, count, ioeventfd.fd);
> > + } else if (cmd == VFIO_DEVICE_FEATURE) {
> > + struct vfio_device_feature feature;
> > + uuid_t uuid;
> > +
> > + minsz = offsetofend(struct vfio_device_feature, flags);
> > +
> > + if (copy_from_user(&feature, (void __user *)arg, minsz))
> > + return -EFAULT;
> > +
> > + if (feature.argsz < minsz)
> > + return -EINVAL;
> > +
> > + if (feature.flags & ~(VFIO_DEVICE_FEATURE_MASK |
> > + VFIO_DEVICE_FEATURE_SET |
> > + VFIO_DEVICE_FEATURE_GET |
> > + VFIO_DEVICE_FEATURE_PROBE))
> > + return -EINVAL;
>
> GET|SET|PROBE is well-defined, but what about GET|SET without PROBE? Do
> we want to fence this in the generic ioctl handler part? Or is there
> any sane way to implement that (read and then write back something?)

I'd be ok with discouraging combinations of GET|SET|!PROBE generically.
I don't think there's an intuitive answer to whether it should be
applied as GET|SET or SET|GET. If some future feature wanted an atomic
op we could add something like a test-and-set. Thanks,

Alex
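
One way the generic check discussed above could look, as a sketch only: unknown flag bits are rejected as in the patch, and GET|SET without PROBE is additionally refused. The `EX_` constants are illustrative stand-ins (a 16-bit feature index plus three flag bits); the authoritative definitions are the ones in include/uapi/linux/vfio.h, and `ex_check_feature_flags` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative layout: low 16 bits select the feature, then flag bits. */
#define EX_FEATURE_MASK		0xffffu
#define EX_FEATURE_GET		(1u << 16)
#define EX_FEATURE_SET		(1u << 17)
#define EX_FEATURE_PROBE	(1u << 18)

/* Returns 0 if the flag combination is acceptable, -EINVAL otherwise. */
static int ex_check_feature_flags(uint32_t flags)
{
	const uint32_t known = EX_FEATURE_MASK | EX_FEATURE_GET |
			       EX_FEATURE_SET | EX_FEATURE_PROBE;

	/* Reject any bit outside the defined set. */
	if (flags & ~known)
		return -EINVAL;

	/* GET|SET has no obvious ordering, so allow it only with PROBE. */
	if ((flags & EX_FEATURE_GET) && (flags & EX_FEATURE_SET) &&
	    !(flags & EX_FEATURE_PROBE))
		return -EINVAL;

	return 0;
}
```

Per-feature handlers would still apply their own restrictions afterwards, for example the VF token feature refusing GET entirely.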

> > +
> > + switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) {
> > + case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
> > + if (!vdev->vf_token)
> > + return -ENOTTY;
> > +
> > + /*
> > + * We do not support GET of the VF Token UUID as this
> > + * could expose the token of the previous device user.
> > + */
> > + if (feature.flags & VFIO_DEVICE_FEATURE_GET)
> > + return -EINVAL;
> > +
> > + if (feature.flags & VFIO_DEVICE_FEATURE_PROBE)
> > + return 0;
> > +
> > + /* Don't SET unless told to do so */
> > + if (!(feature.flags & VFIO_DEVICE_FEATURE_SET))
> > + return -EINVAL;
> > +
> > + if (feature.argsz < minsz + sizeof(uuid))
> > + return -EINVAL;
> > +
> > + if (copy_from_user(&uuid, (void __user *)(arg + minsz),
> > + sizeof(uuid)))
> > + return -EFAULT;
> > +
> > + mutex_lock(&vdev->vf_token->lock);
> > + uuid_copy(&vdev->vf_token->uuid, &uuid);
> > + mutex_unlock(&vdev->vf_token->lock);
> > +
> > + return 0;
> > + default:
> > + return -ENOTTY;
> > + }
> > }
> >
> > return -ENOTTY;
> (...)

2020-03-06 03:37:17

by Jason Wang

Subject: Re: [PATCH v2 0/7] vfio/pci: SR-IOV support


On 2020/3/6 1:14 AM, Alex Williamson wrote:
> On Tue, 25 Feb 2020 14:09:07 +0800
> Jason Wang <[email protected]> wrote:
>
>> On 2020/2/25 10:33 AM, Tian, Kevin wrote:
>>>> From: Alex Williamson
>>>> Sent: Thursday, February 20, 2020 2:54 AM
>>>>
>>>> Changes since v1 are primarily to patch 3/7 where the commit log is
>>>> rewritten, along with option parsing and failure logging based on
>>>> upstream discussions. The primary user visible difference is that
>>>> option parsing is now much more strict. If a vf_token option is
>>>> provided that cannot be used, we generate an error. As a result of
>>>> this, opening a PF with a vf_token option will serve as a mechanism of
>>>> setting the vf_token. This seems like a more user friendly API than
>>>> the alternative of sometimes requiring the option (VFs in use) and
>>>> sometimes rejecting it, and upholds our desire that the option is
>>>> always either used or rejected.
>>>>
>>>> This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
>>>> means of setting the VF token, which might call into question whether
>>>> we absolutely need this new ioctl. Currently I'm keeping it because I
>>>> can imagine use cases, for example if a hypervisor were to support
>>>> SR-IOV, the PF device might be opened without consideration for a VF
>>>> token and we'd require the hypervisor to close and re-open the PF in
>>>> order to set a known VF token, which is impractical.
>>>>
>>>> Series overview (same as provided with v1):
>>> Thanks for doing this!
>>>
>>>> The synopsis of this series is that we have an ongoing desire to drive
>>>> PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
>>>> for this with DPDK drivers and potentially interesting future use
>>> Can you provide a link to the DPDK discussion?
>>>
>>>> cases in virtualization. We've been reluctant to add this support
>>>> previously due to the dependency and trust relationship between the
>>>> VF device and PF driver. Minimally the PF driver can induce a denial
>>>> of service to the VF, but depending on the specific implementation,
>>>> the PF driver might also be responsible for moving data between VFs
>>>> or have direct access to the state of the VF, including data or state
>>>> otherwise private to the VF or VF driver.
>>> Just thinking out loud. While the motivation of VF token sounds reasonable
>>> to me, I'm curious why the same concern is not raised in other usages.
>>> For example, there is no such design in virtio framework, where the
>>> virtio device could also be restarted, putting in separate process (vhost-user),
>>> and even in separate VM (virtio-vhost-user), etc.
>>
>> AFAIK, the restart could only be triggered by either VM or qemu. But
>> yes, the datapath could be offloaded.
>>
>> But I'm not sure introducing another dedicated mechanism is better than
>> using the existing generic POSIX mechanism to make sure the connection
>> (AF_UNIX) is secure.
>>
>>
>>> Of course the para-
>>> virtualized attribute of virtio implies some degree of trust, but as you
>>> mentioned many SR-IOV implementations support VF->PF communication
>>> which also implies some level of trust. It's perfectly fine if VFIO just tries
>>> to do better than other sub-systems, but knowing how other people
>>> tackle the similar problem may make the whole picture clearer.
>>>
>>> +Jason.
>>
>> I'm not quite sure that e.g. allowing a userspace PF driver with a kernel VF
>> driver would not break the assumptions of the kernel security model. At least
>> we should forbid an unprivileged PF driver running in userspace.
> It might be useful to have your opinion on this series, because that's
> exactly what we're trying to do here. Various environments, DPDK
> specifically, want a userspace PF driver. This series takes steps to
> mitigate the risk of having such a driver, such as requiring this VF
> token interface to extend the VFIO interface and validate participation
> around a PF that is not considered trusted by the kernel.


I may be missing something, but what happens if:

- The PF driver is run by an unprivileged user
- The PF is programmed to send translated DMA requests
- The unprivileged user can then mangle kernel data

> We also set
> a driver_override to try to make sure no host kernel driver can
> automatically bind to a VF of a user owned PF, only vfio-pci, but we
> don't prevent the admin from creating configurations where the VFs are
> used by other host kernel drivers.
>
> I think the question Kevin is inquiring about is whether virtio devices
> are susceptible to the type of collaborative, shared key environment
> we're creating here. For example, can a VM or qemu have access to
> reset a virtio device in a way that could affect other devices, ex. FLR
> on a PF that could interfere with VF operation. Thanks,


Right, but I'm not sure whether it can be done via virtio alone or needs
support from the transport (e.g. PCI).

Thanks


>
> Alex
>

2020-03-06 07:57:52

by Tian, Kevin

Subject: RE: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

> From: Alex Williamson <[email protected]>
> Sent: Friday, March 6, 2020 2:23 AM
>
> On Tue, 25 Feb 2020 03:08:00 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Alex Williamson
> > > Sent: Thursday, February 20, 2020 2:54 AM
> > >
> > > With the VF Token interface we can now expect that a vfio userspace
> > > driver must be in collaboration with the PF driver, an unwitting
> > > userspace driver will not be able to get past the GET_DEVICE_FD step
> > > in accessing the device. We can now move on to actually allowing
> > > SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> > > enabled by default in this commit, but it does provide a module option
> > > for this to be enabled (enable_sriov=1). Enabling VFs is rather
> > > straightforward, except we don't want to risk that a VF might get
> > > autoprobed and bound to other drivers, so a bus notifier is used to
> > > "capture" VFs to vfio-pci using the driver_override support. We
> > > assume any later action to bind the device to other drivers is
> > > condoned by the system admin and allow it with a log warning.
> > >
> > > vfio-pci will disable SR-IOV on a PF before releasing the device,
> > > allowing a VF driver to be assured other drivers cannot take over the
> > > PF and that any other userspace driver must know the shared VF token.
> > > This support also does not provide a mechanism for the PF userspace
> > > driver itself to manipulate SR-IOV through the vfio API. With this
> > > patch SR-IOV can only be enabled via the host sysfs interface and the
> > > PF driver user cannot create or remove VFs.
> >
> > I'm not sure how many devices can be properly configured simply
> > with pci_enable_sriov. It is not unusual to require PF driver prepare
> > something before turning PCI SR-IOV capability. If you look kernel
> > PF drivers, there are only two using generic pci_sriov_configure_
> > simple (simple wrapper like pci_enable_sriov), while most others
> > implementing their own callback. However vfio itself has no idea
> > thus I'm not sure how an user knows whether using this option can
> > actually meet his purpose. I may miss something here, possibly
> > using DPDK as an example will make it clearer.
>
> There is still the entire vfio userspace driver interface. Imagine for
> example that QEMU emulates the SR-IOV capability and makes a call out
> to libvirt (or maybe runs with privs for the PF SR-IOV sysfs attribs)
> when the guest enables SR-IOV. Can't we assume that any PF specific
> support can still be performed in the userspace/guest driver, leaving
> us with a very simple and generic sriov_configure callback in vfio-pci?

Makes sense. One concern, though, is how a user could be warned
if they inadvertently use sysfs to enable SR-IOV on a vfio device whose
userspace driver is incapable of handling it. Note that any VFIO device,
if SR-IOV capable, will allow the user to do so once the module option is
turned on and the callback is registered. I felt such uncertainty could be
contained by toggling SR-IOV through a vfio API, but from your description
that is obviously what you want to avoid. Is it due to a sequencing reason,
e.g. that SR-IOV must be enabled before the userspace PF driver sets the
token?

Thanks
Kevin

2020-03-06 08:34:24

by Tian, Kevin

Subject: RE: [PATCH v2 3/7] vfio/pci: Introduce VF token

> From: Alex Williamson <[email protected]>
> Sent: Friday, March 6, 2020 2:18 AM
>
> On Tue, 25 Feb 2020 02:59:37 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Alex Williamson
> > > Sent: Thursday, February 20, 2020 2:54 AM
> > >
> > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > fully isolated from the PF. The PF can always cause a denial of service
> > > to the VF, even if by simply resetting itself. The degree to which a PF
> > > can access the data passed through a VF or interfere with its operation
> > > is dependent on a given SR-IOV implementation. Therefore we want to
> > > avoid a scenario where an existing vfio-pci based userspace driver might
> > > assume the PF driver is trusted, for example assigning a PF to one VM
> > > and VF to another with some expectation of isolation. IOMMU grouping
> > > could be a solution to this, but imposes an unnecessarily strong
> > > relationship between PF and VF drivers if they need to operate with the
> > > same IOMMU context. Instead we introduce a "VF token", which is
> > > essentially just a shared secret between PF and VF drivers, implemented
> > > as a UUID.
> > >
> > > The VF token can be set by a vfio-pci based PF driver and must be known
> > > by the vfio-pci based VF driver in order to gain access to the device.
> > > This allows the degree to which this VF token is considered secret to be
> > > determined by the applications and environment. For example a VM might
> > > generate a random UUID known only internally to the hypervisor while a
> > > userspace networking appliance might use a shared, or even well known,
> > > UUID among the application drivers.
> > >
> > > To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
> > > extended to accept key=value pairs in addition to the device name. This
> > > allows us to most easily deny user access to the device without risk
> > > that existing userspace drivers assume region offsets, IRQs, and other
> > > device features, leading to more elaborate error paths. The format of
> > > these options are expected to take the form:
> > >
> > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
> > >
> > > Where the device name is always provided first for compatibility and
> > > additional options are specified in a space separated list. The
> > > relation between and requirements for the additional options will be
> > > vfio bus driver dependent, however unknown or unused options within this
> > > schema should return an error. This allows for future use of unknown
> > > options as well as a positive indication to the user that an option is
> > > used.
> > >
> > > An example VF token option would take this form:
> > >
> > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > >
> > > When accessing a VF where the PF is making use of vfio-pci, the user
> > > MUST provide the current vf_token. When accessing a PF, the user MUST
> > > provide the current vf_token IF there are active VF users or MAY provide
> > > a vf_token in order to set the current VF token when no VF users are
> > > active. The former requirement assures VF users that an unassociated
> > > driver cannot usurp the PF device. These semantics also imply that a
> > > VF token MUST be set by a PF driver before VF drivers can access their
> > > device, the default token is random and mechanisms to read the token are
> > > not provided in order to protect the VF token of previous users. Use of
> > > the vf_token option outside of these cases will return an error, as
> > > discussed above.
> > >
> > > Signed-off-by: Alex Williamson <[email protected]>
> > > ---
> > > drivers/vfio/pci/vfio_pci.c | 198
> > > +++++++++++++++++++++++++++++++++++
> > > drivers/vfio/pci/vfio_pci_private.h | 8 +
> > > 2 files changed, 205 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index 2ec6c31d0ab0..8dd6ef9543ca 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -466,6 +466,44 @@ static void vfio_pci_disable(struct
> vfio_pci_device
> > > *vdev)
> > > vfio_pci_set_power_state(vdev, PCI_D3hot);
> > > }
> > >
> > > +static struct pci_driver vfio_pci_driver;
> > > +
> > > +static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev,
> > > + struct vfio_device **pf_dev)
> > > +{
> > > + struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > +
> > > + if (!vdev->pdev->is_virtfn)
> > > + return NULL;
> > > +
> > > + *pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > + if (!*pf_dev)
> > > + return NULL;
> > > +
> > > + if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > + vfio_device_put(*pf_dev);
> > > + return NULL;
> > > + }
> > > +
> > > + return vfio_device_data(*pf_dev);
> > > +}
> > > +
> > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int
> val)
> > > +{
> > > + struct vfio_device *pf_dev;
> > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > +
> > > + if (!pf_vdev)
> > > + return;
> > > +
> > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > + pf_vdev->vf_token->users += val;
> > > + WARN_ON(pf_vdev->vf_token->users < 0);
> > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > +
> > > + vfio_device_put(pf_dev);
> > > +}
> > > +
> > > static void vfio_pci_release(void *device_data)
> > > {
> > > struct vfio_pci_device *vdev = device_data;
> > > @@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
> > > mutex_lock(&vdev->reflck->lock);
> > >
> > > if (!(--vdev->refcnt)) {
> > > + vfio_pci_vf_token_user_add(vdev, -1);
> > > vfio_spapr_pci_eeh_release(vdev->pdev);
> > > vfio_pci_disable(vdev);
> > > }
> > > @@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
> > > goto error;
> > >
> > > vfio_spapr_pci_eeh_open(vdev->pdev);
> > > + vfio_pci_vf_token_user_add(vdev, 1);
> > > }
> > > vdev->refcnt++;
> > > error:
> > > @@ -1278,11 +1318,148 @@ static void vfio_pci_request(void
> *device_data,
> > > unsigned int count)
> > > mutex_unlock(&vdev->igate);
> > > }
> > >
> > > +static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
> > > + bool vf_token, uuid_t *uuid)
> > > +{
> > > + /*
> > > > + * There's always some degree of trust or collaboration between SR-IOV
> > > > + * PF and VFs, even if just that the PF hosts the SR-IOV capability and
> > > > + * can disrupt VFs with a reset, but often the PF has more explicit
> > > > + * access to deny service to the VF or access data passed through the
> > > > + * VF. We therefore require an opt-in via a shared VF token (UUID) to
> > > > + * represent this trust. This both prevents that a VF driver might
> > > > + * assume the PF driver is a trusted, in-kernel driver, and also that
> > > > + * a PF driver might be replaced with a rogue driver, unknown to in-use
> > > > + * VF drivers.
> > > + *
> > > + * Therefore when presented with a VF, if the PF is a vfio device and
> > > + * it is bound to the vfio-pci driver, the user needs to provide a VF
> > > + * token to access the device, in the form of appending a vf_token to
> > > + * the device name, for example:
> > > + *
> > > + * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
> > > + *
> > > + * When presented with a PF which has VFs in use, the user must also
> > > + * provide the current VF token to prove collaboration with existing
> > > + * VF users. If VFs are not in use, the VF token provided for the PF
> > > + * device will act to set the VF token.
> > > + *
> > > + * If the VF token is provided but unused, a fault is generated.
> >
> > > fault->error, otherwise it is easy to consider a CPU fault.
>
> Ok, I can make that change, but I think you might have a unique
> background to make a leap that a userspace ioctl can trigger a CPU
> fault ;)
>
> > > + */
> > > + if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
> > > + return 0; /* No VF token provided or required */
> > > +
> > > + if (vdev->pdev->is_virtfn) {
> > > + struct vfio_device *pf_dev;
> > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev,
> > > &pf_dev);
> > > + bool match;
> > > +
> > > + if (!pf_vdev) {
> > > + if (!vf_token)
> > > + return 0; /* PF is not vfio-pci, no VF token */
> > > +
> > > > + pci_info_ratelimited(vdev->pdev,
> > > > + "VF token incorrectly provided, PF not bound to vfio-pci\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + if (!vf_token) {
> > > + vfio_device_put(pf_dev);
> > > + pci_info_ratelimited(vdev->pdev,
> > > + "VF token required to access device\n");
> > > + return -EACCES;
> > > + }
> > > +
> > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > + match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
> > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > +
> > > + vfio_device_put(pf_dev);
> > > +
> > > + if (!match) {
> > > + pci_info_ratelimited(vdev->pdev,
> > > + "Incorrect VF token provided for device\n");
> > > + return -EACCES;
> > > + }
> > > + } else if (vdev->vf_token) {
> > > + mutex_lock(&vdev->vf_token->lock);
> > > + if (vdev->vf_token->users) {
> > > + if (!vf_token) {
> > > + mutex_unlock(&vdev->vf_token->lock);
> > > > + pci_info_ratelimited(vdev->pdev,
> > > > + "VF token required to access device\n");
> > > + return -EACCES;
> > > + }
> > > +
> > > + if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
> > > + mutex_unlock(&vdev->vf_token->lock);
> > > > + pci_info_ratelimited(vdev->pdev,
> > > > + "Incorrect VF token provided for device\n");
> > > + return -EACCES;
> > > + }
> > > + } else if (vf_token) {
> > > + uuid_copy(&vdev->vf_token->uuid, uuid);
> > > + }
> >
> > It implies that we allow PF to be accessed w/o providing a VF token,
> > as long as no VF is currently in-use, which further means no VF can
> > be further assigned since no one knows the random uuid allocated
> > by vfio. Just want to confirm whether it is the desired flavor. If an
> > user really wants to use PF-only, possibly he should disable SR-IOV
> > instead...
>
> Yes, this is the behavior I'm intending. Are you suggesting that we
> should require a VF token in order to access a PF that has SR-IOV
> already enabled? This introduces an inconsistency that SR-IOV can be

Yes. Otherwise it seems meaningless for a user who has no intention of
managing SR-IOV to still leave it enabled. In many cases, enabling
SR-IOV may reserve some resources in the hardware, thus simply hurting
PF performance.

> enabled via sysfs asynchronous to the GET_DEVICE_FD ioctl, so we'd need
> to secure the sysfs interface to only allow enabling SR-IOV when the PF
> is already opened to cases where the VF token is already set? Thus

Yes, the PF is assigned to the userspace driver, thus it's reasonable to
have the userspace driver decide whether to enable or disable SR-IOV
when the PF is under its control. As I replied to patch [5/7], the sysfs
interface alone looks problematic without knowing whether the userspace
driver is willing to manage VFs (by setting a token)...

> SR-IOV could be pre-enabled, but the user must provide a vf_token
> option on GET_DEVICE_FD, otherwise SR-IOV could only be enabled after
> the user sets a VF token. But then do we need to invalidate the token
> at some point, or else it seems like we have the same scenario when the
> next user comes along. We believe there are PFs that require no

I think so, e.g. when SR-IOV is being disabled, or when the fd is closed.

> special VF support other than sriov_configure, so those driver could
> theoretically close the PF after setting a VF token. That makes it

Theoretically yes, but I'm not sure what the real gain of supporting
such usage is.

Btw, your question made me realize another potential open issue. A user
could also use sysfs to reset the PF, which definitely affects the
state of VFs. Do we want a token match on that path? Or is such an
intention assumed to be trusted by VF drivers, given that only
privileged users can do it?

> difficult to determine the lifetime of a VF token and leads to the
> interface proposed here of an initial random token, then the user set
> token persisting indefinitely.
>
> I've tended consider all of these to be mechanisms that a user can
> shoot themselves in the foot. Yes, the user and admin can do things
> that will fail to work with this interface, for example my testing
> involves QEMU, where we don't expose SR-IOV to the guest yet and the
> igb driver for the PF will encounter problems running a device with
> SR-IOV enabled that it doesn't know about. Do we want to try to play
> nanny and require specific semantics? I've opt'd for the more simple
> code here.
>
> > > +
> > > + mutex_unlock(&vdev->vf_token->lock);
> > > + } else if (vf_token) {
> > > + pci_info_ratelimited(vdev->pdev,
> > > + "VF token incorrectly provided, not a PF or VF\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +#define VF_TOKEN_ARG "vf_token="
> > > +
> > > static int vfio_pci_match(void *device_data, char *buf)
> > > {
> > > struct vfio_pci_device *vdev = device_data;
> > > + bool vf_token = false;
> > > + uuid_t uuid;
> > > + int ret;
> > > +
> > > + if (strncmp(pci_name(vdev->pdev), buf, strlen(pci_name(vdev-
> > > >pdev))))
> > > + return 0; /* No match */
> > > +
> > > + if (strlen(buf) > strlen(pci_name(vdev->pdev))) {
> > > + buf += strlen(pci_name(vdev->pdev));
> > > +
> > > + if (*buf != ' ')
> > > + return 0; /* No match: non-whitespace after name */
> > > +
> > > + while (*buf) {
> > > + if (*buf == ' ') {
> > > + buf++;
> > > + continue;
> > > + }
> > > +
> > > + if (!vf_token && !strncmp(buf, VF_TOKEN_ARG,
> > > + strlen(VF_TOKEN_ARG))) {
> > > + buf += strlen(VF_TOKEN_ARG);
> > > +
> > > + if (strlen(buf) < UUID_STRING_LEN)
> > > + return -EINVAL;
> > > +
> > > + ret = uuid_parse(buf, &uuid);
> > > + if (ret)
> > > + return ret;
> > >
> > > - return !strcmp(pci_name(vdev->pdev), buf);
> > > + vf_token = true;
> > > + buf += UUID_STRING_LEN;
> > > + } else {
> > > + /* Unknown/duplicate option */
> > > + return -EINVAL;
> > > + }
> > > + }
> > > + }
> > > +
> > > + ret = vfio_pci_validate_vf_token(vdev, vf_token, &uuid);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + return 1; /* Match */
> > > }
> > >
> > > static const struct vfio_device_ops vfio_pci_ops = {
> > > @@ -1354,6 +1531,19 @@ static int vfio_pci_probe(struct pci_dev *pdev,
> > > const struct pci_device_id *id)
> > > return ret;
> > > }
> > >
> > > + if (pdev->is_physfn) {
> > > + vdev->vf_token = kzalloc(sizeof(*vdev->vf_token),
> > > GFP_KERNEL);
> > > + if (!vdev->vf_token) {
> > > + vfio_pci_reflck_put(vdev->reflck);
> > > + vfio_del_group_dev(&pdev->dev);
> > > + vfio_iommu_group_put(group, &pdev->dev);
> > > + kfree(vdev);
> > > + return -ENOMEM;
> > > + }
> > > + mutex_init(&vdev->vf_token->lock);
> > > + uuid_gen(&vdev->vf_token->uuid);
> >
> > should we also regenerate a random uuid somewhere when SR-IOV is
> > disabled and then re-enabled on a PF? Although vfio disallows userspace
> > to read uuid, it is always safer to avoid caching a secret from previous
> > user.
>
> What if our user is QEMU emulating SR-IOV to the guest. Do we want to
> force a new VF token is set every time we bounce the VFs? Why? As
> above, the session lifetime of the VF token might be difficult to
> determine and I'm not sure paranoia is a sufficient reason to try to
> create boundaries for it. Thanks,
>
> Alex
>
> > > + }
> > > +
> > > if (vfio_pci_is_vga(pdev)) {
> > > vga_client_register(pdev, vdev, NULL,
> > > vfio_pci_set_vga_decode);
> > > vga_set_legacy_decoding(pdev,
> > > @@ -1387,6 +1577,12 @@ static void vfio_pci_remove(struct pci_dev
> *pdev)
> > > if (!vdev)
> > > return;
> > >
> > > + if (vdev->vf_token) {
> > > + WARN_ON(vdev->vf_token->users);
> > > + mutex_destroy(&vdev->vf_token->lock);
> > > + kfree(vdev->vf_token);
> > > + }
> > > +
> > > vfio_pci_reflck_put(vdev->reflck);
> > >
> > > vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
> > > diff --git a/drivers/vfio/pci/vfio_pci_private.h
> > > b/drivers/vfio/pci/vfio_pci_private.h
> > > index 8a2c7607d513..76c11c915949 100644
> > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > @@ -12,6 +12,7 @@
> > > #include <linux/pci.h>
> > > #include <linux/irqbypass.h>
> > > #include <linux/types.h>
> > > +#include <linux/uuid.h>
> > >
> > > #ifndef VFIO_PCI_PRIVATE_H
> > > #define VFIO_PCI_PRIVATE_H
> > > @@ -84,6 +85,12 @@ struct vfio_pci_reflck {
> > > struct mutex lock;
> > > };
> > >
> > > +struct vfio_pci_vf_token {
> > > + struct mutex lock;
> > > + uuid_t uuid;
> > > + int users;
> > > +};
> > > +
> > > struct vfio_pci_device {
> > > struct pci_dev *pdev;
> > > void __iomem *barmap[PCI_STD_NUM_BARS];
> > > @@ -122,6 +129,7 @@ struct vfio_pci_device {
> > > struct list_head dummy_resources_list;
> > > struct mutex ioeventfds_lock;
> > > struct list_head ioeventfds_list;
> > > + struct vfio_pci_vf_token *vf_token;
> > > };
> > >
> > > #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> >
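
For readers following the option parsing in the quoted vfio_pci_match()
hunk, here is a rough userspace approximation of the same rules. It is
only a sketch: match_device() and looks_like_uuid() are hypothetical
names, the UUID check is a simple syntactic stand-in for the kernel's
uuid_parse(), and a return of -1 stands in for -EINVAL.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define VF_TOKEN_ARG "vf_token="
#define UUID_STR_LEN 36 /* 8-4-4-4-12 hex digits plus four dashes */

/* Return true if s starts with a canonical textual UUID. */
static bool looks_like_uuid(const char *s)
{
	for (int i = 0; i < UUID_STR_LEN; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')
				return false;
		} else if (!isxdigit((unsigned char)s[i])) {
			return false; /* also catches a short string's NUL */
		}
	}
	return true;
}

/*
 * Hypothetical helper: returns 1 on match, 0 on no match, -1 on a
 * malformed option string. On a match, *token points at the UUID text
 * inside buf, or is NULL when no vf_token option was given.
 */
static int match_device(const char *name, const char *buf, const char **token)
{
	size_t len = strlen(name);

	assert(token != NULL);
	*token = NULL;

	if (strncmp(name, buf, len))
		return 0;	/* different device name */

	buf += len;
	if (*buf == '\0')
		return 1;	/* bare name, no options */
	if (*buf != ' ')
		return 0;	/* non-whitespace directly after the name */

	while (*buf) {
		if (*buf == ' ') {
			buf++;
			continue;
		}
		if (!*token && !strncmp(buf, VF_TOKEN_ARG, strlen(VF_TOKEN_ARG))) {
			buf += strlen(VF_TOKEN_ARG);
			if (!looks_like_uuid(buf))
				return -1;
			*token = buf;
			buf += UUID_STR_LEN;
		} else {
			return -1;	/* unknown or duplicate option */
		}
	}
	return 1;
}
```

The strictness is the point of the v2 change: an option that cannot be
parsed or is not recognized yields an error rather than being ignored.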

2020-03-06 09:22:21

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 0/7] vfio/pci: SR-IOV support

> From: Alex Williamson
> Sent: Friday, March 6, 2020 1:34 AM
>
> Hi Kevin,
>
> Sorry for the delay, I've been out on PTO...
>
> On Tue, 25 Feb 2020 02:33:27 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Alex Williamson
> > > Sent: Thursday, February 20, 2020 2:54 AM
> > >
> > > Changes since v1 are primarily to patch 3/7 where the commit log is
> > > rewritten, along with option parsing and failure logging based on
> > > upstream discussions. The primary user visible difference is that
> > > option parsing is now much more strict. If a vf_token option is
> > > provided that cannot be used, we generate an error. As a result of
> > > this, opening a PF with a vf_token option will serve as a mechanism of
> > > setting the vf_token. This seems like a more user friendly API than
> > > the alternative of sometimes requiring the option (VFs in use) and
> > > sometimes rejecting it, and upholds our desire that the option is
> > > always either used or rejected.
> > >
> > > This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
> > > means of setting the VF token, which might call into question whether
> > > we absolutely need this new ioctl. Currently I'm keeping it because I
> > > can imagine use cases, for example if a hypervisor were to support
> > > SR-IOV, the PF device might be opened without consideration for a VF
> > > token and we'd require the hypservisor to close and re-open the PF in
> > > order to set a known VF token, which is impractical.
> > >
> > > Series overview (same as provided with v1):
> >
> > Thanks for doing this!
> >
> > >
> > > The synopsis of this series is that we have an ongoing desire to drive
> > > PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
> > > for this with DPDK drivers and potentially interesting future use
> >
> > Can you provide a link to the DPDK discussion?
>
> There's a thread here which proposed an out-of-tree driver that enables
> a parallel sr-iov enabling interface for a vfio-pci own device.
> Clearly I felt strongly about it ;)
>
> https://patches.dpdk.org/patch/58810/
>
> Also, documentation for making use of an Intel FPGA device with DPDK
> requires the PF bound to igb_uio to support enabling SR-IOV:
>
> https://doc.dpdk.org/guides/bbdevs/fpga_lte_fec.html

Thanks, this is useful.

>
> > > cases in virtualization. We've been reluctant to add this support
> > > previously due to the dependency and trust relationship between the
> > > VF device and PF driver. Minimally the PF driver can induce a denial
> > > of service to the VF, but depending on the specific implementation,
> > > the PF driver might also be responsible for moving data between VFs
> > > or have direct access to the state of the VF, including data or state
> > > otherwise private to the VF or VF driver.
> >
> > Just a loud thinking. While the motivation of VF token sounds reasonable
> > to me, I'm curious why the same concern is not raised in other usages.
> > For example, there is no such design in virtio framework, where the
> > > virtio device could also be restarted, putting in separate process
> > > (vhost-user), and even in separate VM (virtio-vhost-user), etc. Of
> > > course the para-virtualized attribute of virtio implies some degree of
> > > trust, but as you
> > mentioned many SR-IOV implementations support VF->PF communication
> > which also implies some level of trust. It's perfectly fine if VFIO just tries
> > to do better than other sub-systems, but knowing how other people
> > > tackle the similar problem may make the whole picture clearer.
> >
> > +Jason.
>
> We can follow the thread with Jason, but I can't really speak to
> whether virtio needs something similar or doesn't provide enough PF
> access to be concerned. If they need a similar solution, we can
> collaborate, but the extension we're defining here is specifically part
> of the vfio-pci ABI, so it might not be easily portable to virtio.
>
> > > To help resolve these concerns, we introduce a VF token into the VFIO
> > > PCI ABI, which acts as a shared secret key between drivers. The
> > > userspace PF driver is required to set the VF token to a known value
> > > and userspace VF drivers are required to provide the token to access
> > > the VF device. If a PF driver is restarted with VF drivers in use, it
> > > must also provide the current token in order to prevent a rogue
> > > untrusted PF driver from replacing a known driver. The degree to
> > > which this new token is considered secret is left to the userspace
> > > drivers, the kernel intentionally provides no means to retrieve the
> > > current token.
> >
> > I'm wondering whether the token idea can be used beyond SR-IOV, e.g.
> > (1) we may allow vfio user space to manage Scalable IOV in the future,
> > which faces the similar challenge between the PF and mdev; (2) the
> > token might be used as a canonical way to replace off-tree acs-override
> > workaround, say, allowing the admin to assign devices within the
> > same iommu group to different VMs which trust each other. I'm not
> > sure how much complexity will be further introduced, but it's greatly
> > appreciated if you can help think a bit and if feasible abstract some
> > logic in vfio core layer for such potential usages...
>
> I don't see how this can be used for ACS override. Lacking ACS, we
> must assume lack of DMA isolation, which results in our IOMMU grouping.
> If we split IOMMU groups, that implies something that doesn't exist. A
> user can already create a process that can own the vfio group and pass
> vfio devices to other tasks, with the restriction of having a single
> DMA address space. If there is DMA isolation, then an mdev solution
> might be better, but given the IOMMU integration of SIOV, I'm not sure
> why the devices wouldn't simply be placed in separate groups by the
> IOMMU driver. Thanks,

You are right. I overlooked the single DMA address space limitation.

>
> Alex
>
> > > Note that the above token is only required for this new model where
> > > both the PF and VF devices are usable through vfio-pci. Existing
> > > models of VFIO drivers where the PF is used without SR-IOV enabled
> > > or the VF is bound to a userspace driver with an in-kernel, host PF
> > > driver are unaffected.
> > >
> > > The latter configuration above also highlights a new inverted scenario
> > > that is now possible, a userspace PF driver with in-kernel VF drivers.
> > > I believe this is a scenario that should be allowed, but should not be
> > > enabled by default. This series includes code to set a default
> > > driver_override for VFs sourced from a vfio-pci user owned PF, such
> > > that the VFs are also bound to vfio-pci. This model is compatible
> > > with tools like driverctl and allows the system administrator to
> > > decide if other bindings should be enabled. The VF token interface
> > > above exists only between vfio-pci PF and VF drivers, once a VF is
> > > bound to another driver, the administrator has effectively pronounced
> > > the device as trusted. The vfio-pci driver will note alternate
> > > binding in dmesg for logging and debugging purposes.
> > >
> > > Please review, comment, and test. The example QEMU implementation
> > > provided with the RFC is still current for this version. Thanks,
> > >
> > > Alex
> > >
> > > RFC:
> > >
> https://lore.kernel.org/lkml/158085337582.9445.17682266437583505502.stg
> > > [email protected]/
> > > v1:
> > >
> https://lore.kernel.org/lkml/158145472604.16827.15751375540102298130.st
> > > [email protected]/
> > >
> > > ---
> > >
> > > Alex Williamson (7):
> > > vfio: Include optional device match in vfio_device_ops callbacks
> > > vfio/pci: Implement match ops
> > > vfio/pci: Introduce VF token
> > > vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user
> > > vfio/pci: Add sriov_configure support
> > > vfio/pci: Remove dev_fmt definition
> > > vfio/pci: Cleanup .probe() exit paths
> > >
> > >
> > > drivers/vfio/pci/vfio_pci.c | 383
> > > +++++++++++++++++++++++++++++++++--
> > > drivers/vfio/pci/vfio_pci_private.h | 10 +
> > > drivers/vfio/vfio.c | 20 +-
> > > include/linux/vfio.h | 4
> > > include/uapi/linux/vfio.h | 37 +++
> > > 5 files changed, 426 insertions(+), 28 deletions(-)
> >
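
To make the token usage from the cover letter concrete, the following is
a minimal sketch of how a userspace driver might build the device string
that carries the token. build_device_arg() is a hypothetical helper, not
part of the series; only the "<device> vf_token=<uuid>" format comes
from the patch 3/7 commit log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "<device> vf_token=<uuid>" into out, or
 * just "<device>" when uuid is NULL.
 * Returns 0 on success, -1 on a bad UUID length or if out is too small.
 */
static int build_device_arg(char *out, size_t outlen,
			    const char *bdf, const char *uuid)
{
	int n;

	assert(out != NULL && bdf != NULL);

	/* A textual UUID is always 36 characters long. */
	if (uuid && strlen(uuid) != 36)
		return -1;

	if (uuid)
		n = snprintf(out, outlen, "%s vf_token=%s", bdf, uuid);
	else
		n = snprintf(out, outlen, "%s", bdf);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting string would then be passed as the argument of the
VFIO_GROUP_GET_DEVICE_FD ioctl on the group file descriptor, in place of
the bare device name used today.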

2020-03-06 09:46:26

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

> From: Tian, Kevin
> Sent: Friday, March 6, 2020 3:57 PM
>
> > From: Alex Williamson <[email protected]>
> > Sent: Friday, March 6, 2020 2:23 AM
> >
> > On Tue, 25 Feb 2020 03:08:00 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > From: Alex Williamson
> > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > >
> > > > With the VF Token interface we can now expect that a vfio userspace
> > > > driver must be in collaboration with the PF driver, an unwitting
> > > > userspace driver will not be able to get past the GET_DEVICE_FD step
> > > > in accessing the device. We can now move on to actually allowing
> > > > SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> > > > enabled by default in this commit, but it does provide a module option
> > > > for this to be enabled (enable_sriov=1). Enabling VFs is rather
> > > > straightforward, except we don't want to risk that a VF might get
> > > > autoprobed and bound to other drivers, so a bus notifier is used to
> > > > "capture" VFs to vfio-pci using the driver_override support. We
> > > > assume any later action to bind the device to other drivers is
> > > > condoned by the system admin and allow it with a log warning.
> > > >
> > > > vfio-pci will disable SR-IOV on a PF before releasing the device,
> > > > allowing a VF driver to be assured other drivers cannot take over the
> > > > PF and that any other userspace driver must know the shared VF token.
> > > > This support also does not provide a mechanism for the PF userspace
> > > > driver itself to manipulate SR-IOV through the vfio API. With this
> > > > patch SR-IOV can only be enabled via the host sysfs interface and the
> > > > PF driver user cannot create or remove VFs.
> > >
> > > I'm not sure how many devices can be properly configured simply
> > > with pci_enable_sriov. It is not unusual to require PF driver prepare
> > > something before turning PCI SR-IOV capability. If you look kernel
> > > PF drivers, there are only two using generic pci_sriov_configure_
> > > simple (simple wrapper like pci_enable_sriov), while most others
> > > implementing their own callback. However vfio itself has no idea
> > > thus I'm not sure how an user knows whether using this option can
> > > actually meet his purpose. I may miss something here, possibly
> > > using DPDK as an example will make it clearer.
> >
> > There is still the entire vfio userspace driver interface. Imagine for
> > example that QEMU emulates the SR-IOV capability and makes a call out
> > to libvirt (or maybe runs with privs for the PF SR-IOV sysfs attribs)
> > when the guest enables SR-IOV. Can't we assume that any PF specific
> > support can still be performed in the userspace/guest driver, leaving
> > us with a very simple and generic sriov_configure callback in vfio-pci?
>
> Makes sense. One concern, though, is how an user could be warned
> if he inadvertently uses sysfs to enable SR-IOV on a vfio device whose
> userspace driver is incapable of handling it. Note any VFIO device,
> if SR-IOV capable, will allow user to do so once the module option is
> turned on and the callback is registered. I felt such uncertainty can be
> contained by toggling SR-IOV through a vfio api, but from your description
> obviously it is what you want to avoid. Is it due to the sequence reason,
> e.g. that SR-IOV must be enabled before userspace PF driver sets the
> token?
>

Reading again, I found that you specifically mentioned "the PF driver user
cannot create or remove VFs". However, I failed to get the rationale
behind it. If the VF drivers have built trust with the PF driver through
the token, what is the problem with allowing the PF driver to further
manage SR-IOV itself? Presumably any VF removal would be done in a
cooperative way to avoid surprising the related VF drivers. Then possibly
a new vfio ioctl for setting the number of VFs plus a token from the
userspace driver could also serve the purpose of this patch series
(GET_DEVICE_FD + sysfs)?

Thanks
Kevin

2020-03-06 15:40:07

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 3/7] vfio/pci: Introduce VF token

On Fri, 6 Mar 2020 08:32:40 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Alex Williamson <[email protected]>
> > Sent: Friday, March 6, 2020 2:18 AM
> >
> > On Tue, 25 Feb 2020 02:59:37 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > From: Alex Williamson
> > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > >
> > > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > > fully isolated from the PF. The PF can always cause a denial of service
> > > > to the VF, even if by simply resetting itself. The degree to which a PF
> > > > can access the data passed through a VF or interfere with its operation
> > > > is dependent on a given SR-IOV implementation. Therefore we want to
> > > > avoid a scenario where an existing vfio-pci based userspace driver might
> > > > assume the PF driver is trusted, for example assigning a PF to one VM
> > > > and VF to another with some expectation of isolation. IOMMU grouping
> > > > could be a solution to this, but imposes an unnecessarily strong
> > > > relationship between PF and VF drivers if they need to operate with the
> > > > same IOMMU context. Instead we introduce a "VF token", which is
> > > > essentially just a shared secret between PF and VF drivers, implemented
> > > > as a UUID.
> > > >
> > > > The VF token can be set by a vfio-pci based PF driver and must be known
> > > > by the vfio-pci based VF driver in order to gain access to the device.
> > > > This allows the degree to which this VF token is considered secret to be
> > > > determined by the applications and environment. For example a VM
> > might
> > > > generate a random UUID known only internally to the hypervisor while a
> > > > userspace networking appliance might use a shared, or even well know,
> > > > UUID among the application drivers.
> > > >
> > > > To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface
> > is
> > > > extended to accept key=value pairs in addition to the device name. This
> > > > allows us to most easily deny user access to the device without risk
> > > > that existing userspace drivers assume region offsets, IRQs, and other
> > > > device features, leading to more elaborate error paths. The format of
> > > > these options are expected to take the form:
> > > >
> > > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
> > > >
> > > > Where the device name is always provided first for compatibility and
> > > > additional options are specified in a space separated list. The
> > > > relation between and requirements for the additional options will be
> > > > vfio bus driver dependent, however unknown or unused option within
> > this
> > > > schema should return error. This allow for future use of unknown
> > > > options as well as a positive indication to the user that an option is
> > > > used.
> > > >
> > > > An example VF token option would take this form:
> > > >
> > > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > > >
> > > > When accessing a VF where the PF is making use of vfio-pci, the user
> > > > MUST provide the current vf_token. When accessing a PF, the user MUST
> > > > provide the current vf_token IF there are active VF users or MAY provide
> > > > a vf_token in order to set the current VF token when no VF users are
> > > > active. The former requirement assures VF users that an unassociated
> > > > driver cannot usurp the PF device. These semantics also imply that a
> > > > VF token MUST be set by a PF driver before VF drivers can access their
> > > > device, the default token is random and mechanisms to read the token
> > are
> > > > not provided in order to protect the VF token of previous users. Use of
> > > > the vf_token option outside of these cases will return an error, as
> > > > discussed above.
> > > >
> > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > ---
> > > > drivers/vfio/pci/vfio_pci.c | 198
> > > > +++++++++++++++++++++++++++++++++++
> > > > drivers/vfio/pci/vfio_pci_private.h | 8 +
> > > > 2 files changed, 205 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > index 2ec6c31d0ab0..8dd6ef9543ca 100644
> > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > @@ -466,6 +466,44 @@ static void vfio_pci_disable(struct
> > vfio_pci_device
> > > > *vdev)
> > > > vfio_pci_set_power_state(vdev, PCI_D3hot);
> > > > }
> > > >
> > > > +static struct pci_driver vfio_pci_driver;
> > > > +
> > > > +static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev,
> > > > + struct vfio_device **pf_dev)
> > > > +{
> > > > + struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > > +
> > > > + if (!vdev->pdev->is_virtfn)
> > > > + return NULL;
> > > > +
> > > > + *pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > > + if (!*pf_dev)
> > > > + return NULL;
> > > > +
> > > > + if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > > + vfio_device_put(*pf_dev);
> > > > + return NULL;
> > > > + }
> > > > +
> > > > + return vfio_device_data(*pf_dev);
> > > > +}
> > > > +
> > > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int
> > val)
> > > > +{
> > > > + struct vfio_device *pf_dev;
> > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > +
> > > > + if (!pf_vdev)
> > > > + return;
> > > > +
> > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > + pf_vdev->vf_token->users += val;
> > > > + WARN_ON(pf_vdev->vf_token->users < 0);
> > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > +
> > > > + vfio_device_put(pf_dev);
> > > > +}
> > > > +
> > > > static void vfio_pci_release(void *device_data)
> > > > {
> > > > struct vfio_pci_device *vdev = device_data;
> > > > @@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
> > > > mutex_lock(&vdev->reflck->lock);
> > > >
> > > > if (!(--vdev->refcnt)) {
> > > > + vfio_pci_vf_token_user_add(vdev, -1);
> > > > vfio_spapr_pci_eeh_release(vdev->pdev);
> > > > vfio_pci_disable(vdev);
> > > > }
> > > > @@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
> > > > goto error;
> > > >
> > > > vfio_spapr_pci_eeh_open(vdev->pdev);
> > > > + vfio_pci_vf_token_user_add(vdev, 1);
> > > > }
> > > > vdev->refcnt++;
> > > > error:
> > > > @@ -1278,11 +1318,148 @@ static void vfio_pci_request(void
> > *device_data,
> > > > unsigned int count)
> > > > mutex_unlock(&vdev->igate);
> > > > }
> > > >
> > > > +static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
> > > > + bool vf_token, uuid_t *uuid)
> > > > +{
> > > > + /*
> > > > + * There's always some degree of trust or collaboration between SR-
> > > > IOV
> > > > + * PF and VFs, even if just that the PF hosts the SR-IOV capability and
> > > > + * can disrupt VFs with a reset, but often the PF has more explicit
> > > > + * access to deny service to the VF or access data passed through the
> > > > + * VF. We therefore require an opt-in via a shared VF token (UUID)
> > > > to
> > > > + * represent this trust. This both prevents that a VF driver might
> > > > + * assume the PF driver is a trusted, in-kernel driver, and also that
> > > > + * a PF driver might be replaced with a rogue driver, unknown to in-
> > > > use
> > > > + * VF drivers.
> > > > + *
> > > > + * Therefore when presented with a VF, if the PF is a vfio device and
> > > > + * it is bound to the vfio-pci driver, the user needs to provide a VF
> > > > + * token to access the device, in the form of appending a vf_token to
> > > > + * the device name, for example:
> > > > + *
> > > > + * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
> > > > + *
> > > > + * When presented with a PF which has VFs in use, the user must also
> > > > + * provide the current VF token to prove collaboration with existing
> > > > + * VF users. If VFs are not in use, the VF token provided for the PF
> > > > + * device will act to set the VF token.
> > > > + *
> > > > + * If the VF token is provided but unused, a fault is generated.
> > >
> > > > fault->error, otherwise it is easy to consider a CPU fault.
> >
> > Ok, I can make that change, but I think you might have a unique
> > background to make a leap that a userspace ioctl can trigger a CPU
> > fault ;)
> >
> > > > + */
> > > > + if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
> > > > + return 0; /* No VF token provided or required */
> > > > +
> > > > + if (vdev->pdev->is_virtfn) {
> > > > + struct vfio_device *pf_dev;
> > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev,
> > > > &pf_dev);
> > > > + bool match;
> > > > +
> > > > + if (!pf_vdev) {
> > > > + if (!vf_token)
> > > > + return 0; /* PF is not vfio-pci, no VF token */
> > > > +
> > > > + pci_info_ratelimited(vdev->pdev,
> > > > + "VF token incorrectly provided, PF not bound
> > > > to vfio-pci\n");
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + if (!vf_token) {
> > > > + vfio_device_put(pf_dev);
> > > > + pci_info_ratelimited(vdev->pdev,
> > > > + "VF token required to access device\n");
> > > > + return -EACCES;
> > > > + }
> > > > +
> > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > + match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
> > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > +
> > > > + vfio_device_put(pf_dev);
> > > > +
> > > > + if (!match) {
> > > > + pci_info_ratelimited(vdev->pdev,
> > > > + "Incorrect VF token provided for device\n");
> > > > + return -EACCES;
> > > > + }
> > > > + } else if (vdev->vf_token) {
> > > > + mutex_lock(&vdev->vf_token->lock);
> > > > + if (vdev->vf_token->users) {
> > > > + if (!vf_token) {
> > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > + pci_info_ratelimited(vdev->pdev,
> > > > + "VF token required to access
> > > > device\n");
> > > > + return -EACCES;
> > > > + }
> > > > +
> > > > + if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
> > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > + pci_info_ratelimited(vdev->pdev,
> > > > + "Incorrect VF token provided for
> > > > device\n");
> > > > + return -EACCES;
> > > > + }
> > > > + } else if (vf_token) {
> > > > + uuid_copy(&vdev->vf_token->uuid, uuid);
> > > > + }
> > >
> > > It implies that we allow PF to be accessed w/o providing a VF token,
> > > as long as no VF is currently in-use, which further means no VF can
> > > be further assigned since no one knows the random uuid allocated
> > > by vfio. Just want to confirm whether it is the desired flavor. If an
> > > user really wants to use PF-only, possibly he should disable SR-IOV
> > > instead...
> >
> > Yes, this is the behavior I'm intending. Are you suggesting that we
> > should require a VF token in order to access a PF that has SR-IOV
> > already enabled? This introduces an inconsistency that SR-IOV can be
>
> yes. I felt that it's meaningless otherwise if an user has no attempt to
> manage SR-IOV but still leaving it enabled. In many cases, enabling of
> SR-IOV may reserve some resource in the hardware, thus simply hurting
> PF performance.

But a user needs to be granted access to a device by a privileged
entity and the privileged entity may also enable SR-IOV, so it seems
you're assuming the privileged entity is operating independently and
not in the best interest of enabling the specific use case.

> > enabled via sysfs asynchronous to the GET_DEVICE_FD ioctl, so we'd need
> > to secure the sysfs interface to only allow enabling SR-IOV when the PF
> > is already opened to cases where the VF token is already set? Thus
>
> yes, the PF is assigned to the userspace driver, thus it's reasonable to
> have the userspace driver decide whether to enable or disable SR-IOV
> when the PF is under its control. as I replied to patch [5/7], the sysfs
> interface alone looks problematic w/o knowing whether the userspace
> driver is willing to manage VFs (by setting a token)...

As I replied in patch [5/7] the operations don't need to happen
independently, configuring SR-IOV in advance of the user driver
attaching or in collaboration with the user driver can also be enabled
with this series as is. Allowing the user driver to directly enable
SR-IOV and create VFs in the host is something I've avoided here, but
not precluded for later extensions. I think that allowing a user to
perform these operations represents a degree of privilege beyond
ownership of the PF itself, which is why I'm currently only enabling
the sysfs sriov_configure interface. The user driver needs to work in
collaboration with a privileged entity to enable SR-IOV, or be granted
access to operate on the sysfs interface directly.
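
As a rough illustration of that privileged step, the sketch below writes
a VF count to the PF's sriov_numvfs attribute in pci-sysfs. The helper
names are hypothetical, and it assumes the caller has the required
privilege and that vfio-pci was loaded with enable_sriov=1 as described
in patch 5/7.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the sysfs path of the PF's sriov_numvfs attribute. */
static int sriov_numvfs_path(char *out, size_t outlen, const char *pf_bdf)
{
	int n;

	assert(out != NULL && pf_bdf != NULL);

	n = snprintf(out, outlen, "/sys/bus/pci/devices/%s/sriov_numvfs",
		     pf_bdf);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Write the requested VF count; returns 0 on success, -1 on failure. */
static int set_num_vfs(const char *path, unsigned int num_vfs)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", num_vfs) > 0;
	/* A rejected write is typically reported when the stream is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Whether this is run by an administrator ahead of time or by a management
daemon on behalf of the user driver, the PF user itself needs no new
vfio interface for it.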

> > SR-IOV could be pre-enabled, but the user must provide a vf_token
> > option on GET_DEVICE_FD, otherwise SR-IOV could only be enabled after
> > the user sets a VF token. But then do we need to invalidate the token
> > at some point, or else it seems like we have the same scenario when the
> > next user comes along. We believe there are PFs that require no
>
> I think so, e.g. when SR-IOV is being disabled, or when the fd is closed.

Can you articulate a specific risk that this would resolve? If we have
devices like the one supported by pci-pf-stub, where it's apparently
sufficient to provide no device access other than to enable SR-IOV on
the PF, re-implementing that in vfio-pci would require that the
userspace driver is notified when the SR-IOV configuration is changed
such that a VF token can be re-inserted. For what gain?

> > special VF support other than sriov_configure, so those driver could
> > theoretically close the PF after setting a VF token. That makes it
>
> theoretically yes, but I'm not sure the real gain of supporting such
> usage.

Likewise I don't see the gain of restricting it.

> btw with your question I realize another potential open. Now an
> user could also use sysfs to reset the PF, which definitely affects the
> state of VFs. Do we want a token match with that path? or such
> intention is assumed to be trusted by VF drivers given that only
> privileged users can do it?

I think we're going into the weeds here, a privileged user can use the
pci-sysfs reset interface to break all sorts of things. I'm certainly
not going to propose any sort of VF token interface to restrict it.
Privileged users can do bad things via sysfs. Privileged users can
configure PFs in ways that may not be compatible with any given
userspace VF driver. I'm assuming collaboration in the best interest
of enabling the user driver. Thanks,

Alex

> > difficult to determine the lifetime of a VF token and leads to the
> > interface proposed here of an initial random token, then the user set
> > token persisting indefinitely.
> >
> > I've tended to consider all of these to be mechanisms that a user can
> > shoot themselves in the foot. Yes, the user and admin can do things
> > that will fail to work with this interface, for example my testing
> > involves QEMU, where we don't expose SR-IOV to the guest yet and the
> > igb driver for the PF will encounter problems running a device with
> > SR-IOV enabled that it doesn't know about. Do we want to try to play
> > nanny and require specific semantics? I've opted for the simpler
> > code here.
> >
> > > > +
> > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > + } else if (vf_token) {
> > > > + pci_info_ratelimited(vdev->pdev,
> > > > + "VF token incorrectly provided, not a PF or VF\n");
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +#define VF_TOKEN_ARG "vf_token="
> > > > +
> > > > static int vfio_pci_match(void *device_data, char *buf)
> > > > {
> > > > struct vfio_pci_device *vdev = device_data;
> > > > + bool vf_token = false;
> > > > + uuid_t uuid;
> > > > + int ret;
> > > > +
> > > > + if (strncmp(pci_name(vdev->pdev), buf, strlen(pci_name(vdev-
> > > > >pdev))))
> > > > + return 0; /* No match */
> > > > +
> > > > + if (strlen(buf) > strlen(pci_name(vdev->pdev))) {
> > > > + buf += strlen(pci_name(vdev->pdev));
> > > > +
> > > > + if (*buf != ' ')
> > > > + return 0; /* No match: non-whitespace after name */
> > > > +
> > > > + while (*buf) {
> > > > + if (*buf == ' ') {
> > > > + buf++;
> > > > + continue;
> > > > + }
> > > > +
> > > > + if (!vf_token && !strncmp(buf, VF_TOKEN_ARG,
> > > > + strlen(VF_TOKEN_ARG))) {
> > > > + buf += strlen(VF_TOKEN_ARG);
> > > > +
> > > > + if (strlen(buf) < UUID_STRING_LEN)
> > > > + return -EINVAL;
> > > > +
> > > > + ret = uuid_parse(buf, &uuid);
> > > > + if (ret)
> > > > + return ret;
> > > >
> > > > - return !strcmp(pci_name(vdev->pdev), buf);
> > > > + vf_token = true;
> > > > + buf += UUID_STRING_LEN;
> > > > + } else {
> > > > + /* Unknown/duplicate option */
> > > > + return -EINVAL;
> > > > + }
> > > > + }
> > > > + }
> > > > +
> > > > + ret = vfio_pci_validate_vf_token(vdev, vf_token, &uuid);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + return 1; /* Match */
> > > > }
> > > >
> > > > static const struct vfio_device_ops vfio_pci_ops = {
> > > > @@ -1354,6 +1531,19 @@ static int vfio_pci_probe(struct pci_dev *pdev,
> > > > const struct pci_device_id *id)
> > > > return ret;
> > > > }
> > > >
> > > > + if (pdev->is_physfn) {
> > > > + vdev->vf_token = kzalloc(sizeof(*vdev->vf_token),
> > > > GFP_KERNEL);
> > > > + if (!vdev->vf_token) {
> > > > + vfio_pci_reflck_put(vdev->reflck);
> > > > + vfio_del_group_dev(&pdev->dev);
> > > > + vfio_iommu_group_put(group, &pdev->dev);
> > > > + kfree(vdev);
> > > > + return -ENOMEM;
> > > > + }
> > > > + mutex_init(&vdev->vf_token->lock);
> > > > + uuid_gen(&vdev->vf_token->uuid);
> > >
> > > should we also regenerate a random uuid somewhere when SR-IOV is
> > > disabled and then re-enabled on a PF? Although vfio disallows userspace
> > > to read uuid, it is always safer to avoid caching a secret from previous
> > > user.
> >
> > What if our user is QEMU emulating SR-IOV to the guest? Do we want to
> > force a new VF token to be set every time we bounce the VFs? Why? As
> > above, the session lifetime of the VF token might be difficult to
> > determine and I'm not sure paranoia is a sufficient reason to try to
> > create boundaries for it. Thanks,
> >
> > Alex
> >
> > > > + }
> > > > +
> > > > if (vfio_pci_is_vga(pdev)) {
> > > > vga_client_register(pdev, vdev, NULL,
> > > > vfio_pci_set_vga_decode);
> > > > vga_set_legacy_decoding(pdev,
> > > > @@ -1387,6 +1577,12 @@ static void vfio_pci_remove(struct pci_dev
> > *pdev)
> > > > if (!vdev)
> > > > return;
> > > >
> > > > + if (vdev->vf_token) {
> > > > + WARN_ON(vdev->vf_token->users);
> > > > + mutex_destroy(&vdev->vf_token->lock);
> > > > + kfree(vdev->vf_token);
> > > > + }
> > > > +
> > > > vfio_pci_reflck_put(vdev->reflck);
> > > >
> > > > vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
> > > > diff --git a/drivers/vfio/pci/vfio_pci_private.h
> > > > b/drivers/vfio/pci/vfio_pci_private.h
> > > > index 8a2c7607d513..76c11c915949 100644
> > > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > > @@ -12,6 +12,7 @@
> > > > #include <linux/pci.h>
> > > > #include <linux/irqbypass.h>
> > > > #include <linux/types.h>
> > > > +#include <linux/uuid.h>
> > > >
> > > > #ifndef VFIO_PCI_PRIVATE_H
> > > > #define VFIO_PCI_PRIVATE_H
> > > > @@ -84,6 +85,12 @@ struct vfio_pci_reflck {
> > > > struct mutex lock;
> > > > };
> > > >
> > > > +struct vfio_pci_vf_token {
> > > > + struct mutex lock;
> > > > + uuid_t uuid;
> > > > + int users;
> > > > +};
> > > > +
> > > > struct vfio_pci_device {
> > > > struct pci_dev *pdev;
> > > > void __iomem *barmap[PCI_STD_NUM_BARS];
> > > > @@ -122,6 +129,7 @@ struct vfio_pci_device {
> > > > struct list_head dummy_resources_list;
> > > > struct mutex ioeventfds_lock;
> > > > struct list_head ioeventfds_list;
> > > > + struct vfio_pci_vf_token *vf_token;
> > > > };
> > > >
> > > > #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> > >
>
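
For readers following the option parsing quoted above, here is a minimal userspace sketch of the same "name, then space separated options" rule. It is a model of the quoted vfio_pci_match() logic, not the kernel code: match_device_string() and uuid_text_valid() are hypothetical names, and the UUID check is a simple stand-in for the kernel's uuid_parse().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define VF_TOKEN_ARG "vf_token="
#define UUID_STRING_LEN 36

/* Stand-in for the kernel's UUID check: 8-4-4-4-12 hex digits with dashes. */
bool uuid_text_valid(const char *s)
{
    for (int i = 0; i < UUID_STRING_LEN; i++) {
        bool dash = (i == 8 || i == 13 || i == 18 || i == 23);

        /* A short string ends here too: NUL is neither '-' nor a hex digit. */
        if (dash ? s[i] != '-' : !isxdigit((unsigned char)s[i]))
            return false;
    }
    return true;
}

/*
 * Returns 1 on match, 0 on no match, -EINVAL on a malformed, unknown or
 * duplicate option. On a match, *token points at the UUID text, or is NULL
 * if no vf_token option was given.
 */
int match_device_string(const char *name, const char *buf, const char **token)
{
    size_t nlen;

    assert(name && buf && token);
    nlen = strlen(name);
    *token = NULL;

    if (strncmp(name, buf, nlen))
        return 0;               /* different device */
    buf += nlen;
    if (*buf && *buf != ' ')
        return 0;               /* non-whitespace directly after the name */

    while (*buf) {
        if (*buf == ' ') {
            buf++;
            continue;
        }
        /* Only vf_token= is known, and it may appear once. */
        if (*token || strncmp(buf, VF_TOKEN_ARG, strlen(VF_TOKEN_ARG)))
            return -EINVAL;
        buf += strlen(VF_TOKEN_ARG);
        if (!uuid_text_valid(buf))
            return -EINVAL;
        *token = buf;
        buf += UUID_STRING_LEN;
    }
    return 1;
}
```

The strictness is the point of the v2 change: an option that cannot be used is an error rather than being silently ignored.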

2020-03-06 15:50:52

by Alex Williamson

Subject: Re: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

On Fri, 6 Mar 2020 09:45:40 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Tian, Kevin
> > Sent: Friday, March 6, 2020 3:57 PM
> >
> > > From: Alex Williamson <[email protected]>
> > > Sent: Friday, March 6, 2020 2:23 AM
> > >
> > > On Tue, 25 Feb 2020 03:08:00 +0000
> > > "Tian, Kevin" <[email protected]> wrote:
> > >
> > > > > From: Alex Williamson
> > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > >
> > > > > With the VF Token interface we can now expect that a vfio userspace
> > > > > driver must be in collaboration with the PF driver, an unwitting
> > > > > userspace driver will not be able to get past the GET_DEVICE_FD step
> > > > > in accessing the device. We can now move on to actually allowing
> > > > > SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> > > > > enabled by default in this commit, but it does provide a module option
> > > > > for this to be enabled (enable_sriov=1). Enabling VFs is rather
> > > > > straightforward, except we don't want to risk that a VF might get
> > > > > autoprobed and bound to other drivers, so a bus notifier is used to
> > > > > "capture" VFs to vfio-pci using the driver_override support. We
> > > > > assume any later action to bind the device to other drivers is
> > > > > condoned by the system admin and allow it with a log warning.
> > > > >
> > > > > vfio-pci will disable SR-IOV on a PF before releasing the device,
> > > > > allowing a VF driver to be assured other drivers cannot take over the
> > > > > PF and that any other userspace driver must know the shared VF token.
> > > > > This support also does not provide a mechanism for the PF userspace
> > > > > driver itself to manipulate SR-IOV through the vfio API. With this
> > > > > patch SR-IOV can only be enabled via the host sysfs interface and the
> > > > > PF driver user cannot create or remove VFs.
> > > >
> > > > I'm not sure how many devices can be properly configured simply
> > > > with pci_enable_sriov. It is not unusual to require PF driver prepare
> > > > something before turning PCI SR-IOV capability. If you look kernel
> > > > PF drivers, there are only two using generic pci_sriov_configure_
> > > > simple (simple wrapper like pci_enable_sriov), while most others
> > > > implementing their own callback. However vfio itself has no idea
> > > > thus I'm not sure how a user knows whether using this option can
> > > > actually meet his purpose. I may miss something here, possibly
> > > > using DPDK as an example will make it clearer.
> > >
> > > There is still the entire vfio userspace driver interface. Imagine for
> > > example that QEMU emulates the SR-IOV capability and makes a call out
> > > to libvirt (or maybe runs with privs for the PF SR-IOV sysfs attribs)
> > > when the guest enables SR-IOV. Can't we assume that any PF specific
> > > support can still be performed in the userspace/guest driver, leaving
> > > us with a very simple and generic sriov_configure callback in vfio-pci?
> >
> > Makes sense. One concern, though, is how a user could be warned
> > if he inadvertently uses sysfs to enable SR-IOV on a vfio device whose
> > userspace driver is incapable of handling it. Note any VFIO device,
> > if SR-IOV capable, will allow user to do so once the module option is
> > turned on and the callback is registered. I felt such uncertainty can be
> > contained by toggling SR-IOV through a vfio api, but from your description
> > obviously it is what you want to avoid. Is it due to the sequence reason,
> > e.g. that SR-IOV must be enabled before userspace PF driver sets the
> > token?
> >
>
> reading again I found that you specifically mentioned "the PF driver user
> cannot create or remove VFs.". However I failed to get the rationale
> behind. If the VF drivers have built the trust with the PF driver through
> the token, what is the problem of allowing the PF driver to further manage
> SR-IOV itself? suppose any VF removal will be done in a cooperate way
> to avoid surprise impact to related VF drivers. then possibly a new vfio
> ioctl for setting the VF numbers plus a token from the userspace driver
> could also serve the purpose of this patch series (GET_DEVICE_FD + sysfs)?

If a user is allowed to create VFs, does that user automatically get
ownership of those devices? How is that accomplished? What if we want
to make use of the VF via a separate process? How do we coordinate
that with the PF driver? All of these problems are resolved if we
assume the userspace PF driver needs to operate in collaboration with a
privileged entity to interact with sysfs to configure SR-IOV and manage
the resulting VFs. I have no desire to take on that responsibility
within vfio-pci and I also feel that a user owning a PF device should
not inherently grant that user the ability to create and remove other
devices on the host, even if they are sourced from the PF. Thanks,

Alex
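
As a sketch of what the unprivileged side then does, the userspace driver only needs to append the shared token to the device name it passes to VFIO_GROUP_GET_DEVICE_FD. The helper below is hypothetical (build_device_string() is not an existing API) and only formats the string described in patch 3/7.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<device> vf_token=<uuid>", or just "<device>" when uuid is NULL.
 * Returns 0 on success, -1 if the UUID is not 36 characters long or the
 * result does not fit in out.
 */
int build_device_string(char *out, size_t len, const char *bdf, const char *uuid)
{
    int n;

    assert(out && bdf);
    if (uuid && strlen(uuid) != 36)
        return -1;
    if (uuid)
        n = snprintf(out, len, "%s vf_token=%s", bdf, uuid);
    else
        n = snprintf(out, len, "%s", bdf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would be handed to ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, str); with this series applied, a missing or wrong token on a VF is expected to fail that call rather than return a device fd.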

2020-03-06 16:26:05

by Alex Williamson

Subject: Re: [PATCH v2 0/7] vfio/pci: SR-IOV support

On Fri, 6 Mar 2020 11:35:21 +0800
Jason Wang <[email protected]> wrote:

> On 2020/3/6 上午1:14, Alex Williamson wrote:
> > On Tue, 25 Feb 2020 14:09:07 +0800
> > Jason Wang <[email protected]> wrote:
> >
> >> On 2020/2/25 上午10:33, Tian, Kevin wrote:
> >>>> From: Alex Williamson
> >>>> Sent: Thursday, February 20, 2020 2:54 AM
> >>>>
> >>>> Changes since v1 are primarily to patch 3/7 where the commit log is
> >>>> rewritten, along with option parsing and failure logging based on
> >>>> upstream discussions. The primary user visible difference is that
> >>>> option parsing is now much more strict. If a vf_token option is
> >>>> provided that cannot be used, we generate an error. As a result of
> >>>> this, opening a PF with a vf_token option will serve as a mechanism of
> >>>> setting the vf_token. This seems like a more user friendly API than
> >>>> the alternative of sometimes requiring the option (VFs in use) and
> >>>> sometimes rejecting it, and upholds our desire that the option is
> >>>> always either used or rejected.
> >>>>
> >>>> This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
> >>>> means of setting the VF token, which might call into question whether
> >>>> we absolutely need this new ioctl. Currently I'm keeping it because I
> >>>> can imagine use cases, for example if a hypervisor were to support
> >>>> SR-IOV, the PF device might be opened without consideration for a VF
> >>>> token and we'd require the hypervisor to close and re-open the PF in
> >>>> order to set a known VF token, which is impractical.
> >>>>
> >>>> Series overview (same as provided with v1):
> >>> Thanks for doing this!
> >>>
> >>>> The synopsis of this series is that we have an ongoing desire to drive
> >>>> PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
> >>>> for this with DPDK drivers and potentially interesting future use
> >>> Can you provide a link to the DPDK discussion?
> >>>
> >>>> cases in virtualization. We've been reluctant to add this support
> >>>> previously due to the dependency and trust relationship between the
> >>>> VF device and PF driver. Minimally the PF driver can induce a denial
> >>>> of service to the VF, but depending on the specific implementation,
> >>>> the PF driver might also be responsible for moving data between VFs
> >>>> or have direct access to the state of the VF, including data or state
> >>>> otherwise private to the VF or VF driver.
> >>> Just thinking out loud. While the motivation of VF token sounds reasonable
> >>> to me, I'm curious why the same concern is not raised in other usages.
> >>> For example, there is no such design in virtio framework, where the
> >>> virtio device could also be restarted, putting in separate process (vhost-user),
> >>> and even in separate VM (virtio-vhost-user), etc.
> >>
> >> AFAIK, the restart could only be triggered by either VM or qemu. But
> >> yes, the datapath could be offloaded.
> >>
> >> But I'm not sure introducing another dedicated mechanism is better than
> >> using the existing generic POSIX mechanism to make sure the connection
> >> (AF_UNIX) is secure.
> >>
> >>
> >>> Of course the para-
> >>> virtualized attribute of virtio implies some degree of trust, but as you
> >>> mentioned many SR-IOV implementations support VF->PF communication
> >>> which also implies some level of trust. It's perfectly fine if VFIO just tries
> >>> to do better than other sub-systems, but knowing how other people
> >>> tackle the similar problem may make the whole picture clearer.
> >>>
> >>> +Jason.
> >>
> >> I'm not quite sure e.g allowing userspace PF driver with kernel VF
> >> driver would not break the assumption of kernel security model. At least
> >> we should forbid an unprivileged PF driver running in userspace.
> > It might be useful to have your opinion on this series, because that's
> > exactly what we're trying to do here. Various environments, DPDK
> > specifically, want a userspace PF driver. This series takes steps to
> > mitigate the risk of having such a driver, such as requiring this VF
> > token interface to extend the VFIO interface and validate participation
> > around a PF that is not considered trusted by the kernel.
>
>
> I may miss something. But what happens if:
>
> - PF driver is run by an unprivileged user
> - PF is programmed to send translated DMA request
> - Then unprivileged user can mangle the kernel data

ATS is a security risk regardless of SR-IOV, how does this change it?
Thanks,

Alex

> > We also set
> > a driver_override to try to make sure no host kernel driver can
> > automatically bind to a VF of a user owned PF, only vfio-pci, but we
> > don't prevent the admin from creating configurations where the VFs are
> > used by other host kernel drivers.
> >
> > I think the question Kevin is inquiring about is whether virtio devices
> > are susceptible to the type of collaborative, shared key environment
> > we're creating here. For example, can a VM or qemu have access to
> > reset a virtio device in a way that could affect other devices, ex. FLR
> > on a PF that could interfere with VF operation. Thanks,
>
>
> Right, but I'm not sure it can be done only via virtio or need support
> from transport (e.g PCI).
>
> Thanks
>
>
> >
> > Alex
> >

2020-03-06 22:18:22

by Alex Williamson

Subject: Re: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

On Fri, 6 Mar 2020 07:57:19 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Alex Williamson <[email protected]>
> > Sent: Friday, March 6, 2020 2:23 AM
> >
> > On Tue, 25 Feb 2020 03:08:00 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > From: Alex Williamson
> > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > >
> > > > With the VF Token interface we can now expect that a vfio userspace
> > > > driver must be in collaboration with the PF driver, an unwitting
> > > > userspace driver will not be able to get past the GET_DEVICE_FD step
> > > > in accessing the device. We can now move on to actually allowing
> > > > SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> > > > enabled by default in this commit, but it does provide a module option
> > > > for this to be enabled (enable_sriov=1). Enabling VFs is rather
> > > > straightforward, except we don't want to risk that a VF might get
> > > > autoprobed and bound to other drivers, so a bus notifier is used to
> > > > "capture" VFs to vfio-pci using the driver_override support. We
> > > > assume any later action to bind the device to other drivers is
> > > > condoned by the system admin and allow it with a log warning.
> > > >
> > > > vfio-pci will disable SR-IOV on a PF before releasing the device,
> > > > allowing a VF driver to be assured other drivers cannot take over the
> > > > PF and that any other userspace driver must know the shared VF token.
> > > > This support also does not provide a mechanism for the PF userspace
> > > > driver itself to manipulate SR-IOV through the vfio API. With this
> > > > patch SR-IOV can only be enabled via the host sysfs interface and the
> > > > PF driver user cannot create or remove VFs.
> > >
> > > I'm not sure how many devices can be properly configured simply
> > > with pci_enable_sriov. It is not unusual to require PF driver prepare
> > > something before turning PCI SR-IOV capability. If you look kernel
> > > PF drivers, there are only two using generic pci_sriov_configure_
> > > simple (simple wrapper like pci_enable_sriov), while most others
> > > implementing their own callback. However vfio itself has no idea
> > > thus I'm not sure how a user knows whether using this option can
> > > actually meet his purpose. I may miss something here, possibly
> > > using DPDK as an example will make it clearer.
> >
> > There is still the entire vfio userspace driver interface. Imagine for
> > example that QEMU emulates the SR-IOV capability and makes a call out
> > to libvirt (or maybe runs with privs for the PF SR-IOV sysfs attribs)
> > when the guest enables SR-IOV. Can't we assume that any PF specific
> > support can still be performed in the userspace/guest driver, leaving
> > us with a very simple and generic sriov_configure callback in vfio-pci?
>
> Makes sense. One concern, though, is how a user could be warned
> if he inadvertently uses sysfs to enable SR-IOV on a vfio device whose
> userspace driver is incapable of handling it. Note any VFIO device,
> if SR-IOV capable, will allow user to do so once the module option is
> turned on and the callback is registered. I felt such uncertainty can be
> contained by toggling SR-IOV through a vfio api, but from your description
> obviously it is what you want to avoid. Is it due to the sequence reason,
> e.g. that SR-IOV must be enabled before userspace PF driver sets the
> token?

As in my other reply, enabling SR-IOV via a vfio API suggests that
we're not only granting the user owning the PF device access to the
device itself, but also the ability to create and remove subordinate
devices on the host. That implies an extended degree of trust in the
user beyond the PF device itself and raises questions about whether a
user who is allowed to create VF devices should automatically be
granted access to those VF devices, what the mechanism would be for
that, and how we might re-assign those devices to other users,
potentially including host kernel usage. What I'm proposing here
doesn't preclude some future extension in that direction, but instead
tries to simplify a first step towards enabling SR-IOV by leaving the
SR-IOV enablement and VF assignment in the realm of a privileged system
entity.

So, what I think you're suggesting here is that we should restrict
vfio_pci_sriov_configure() to reject enabling SR-IOV until a user
driver has configured a VF token. That requires both that the
userspace driver has initialized to this point before SR-IOV can be
enabled and that we would be forced to define a termination point for
the user set VF token. Logically, this would need to be when the
userspace driver exits or closes the PF device, which implies that we
need to disable SR-IOV on the PF at this point, or we're left in an
inconsistent state where VFs are enabled but cannot be disabled because
we don't have a valid VF token. Now we're back to nearly a state where
the user has control of not creating devices on the host, but removing
them by closing the device, which will necessarily require that any VF
driver release the device, whether userspace or kernel.

I'm not sure what we're gaining by doing this though. I agree that
there will be users that enable SR-IOV on a PF and then try to, for
example, assign the PF and all the VFs to a VM. The VFs will fail due
to lacking VF token support, unless they've patched QEMU with my test
code, but depending on the PF driver in the guest, it may, or more
likely won't work. But don't you think the VFs and probably PF not
working is a sufficient clue that the configuration is invalid? OTOH,
from what I've heard of the device in the ID table of the pci-pf-stub
driver, they might very well be able to work with both PF and VFs in
QEMU using only my test code to set the VF token.

Therefore, I'm afraid what you're asking for here is to impose a usage
restriction as a sanity test, when we don't really know what might be
sane for this particular piece of hardware or use case. There are
infinite ways that a vfio based userspace driver can fail to configure
their hardware and make it work correctly, many of them are device
specific. Isn't this just one of those cases? Thanks,

Alex
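
For reference, the token rules from patch 3/7 that this exchange relies on can be modeled in a few lines. This is a simplified sketch, not the kernel implementation: struct pf_state, open_vf() and open_pf() are hypothetical names, tokens are compared as text, locking is omitted, and the case of a VF whose PF is not bound to vfio-pci is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct pf_state {
    char token[37];     /* current VF token as text; random until set */
    int vf_users;       /* number of open VFs below this PF */
};

/* Opening a VF whose PF is bound to vfio-pci: the token is mandatory. */
int open_vf(const struct pf_state *pf, const char *token)
{
    assert(pf);
    if (!token || strcmp(token, pf->token))
        return -EACCES;
    return 0;
}

/*
 * Opening the PF: with VFs in use the token must match, otherwise a
 * provided token replaces the current one and no token is also accepted.
 */
int open_pf(struct pf_state *pf, const char *token)
{
    assert(pf);
    if (pf->vf_users) {
        if (!token || strcmp(token, pf->token))
            return -EACCES;
        return 0;
    }
    if (token)
        snprintf(pf->token, sizeof(pf->token), "%s", token);
    return 0;
}
```

Note that nothing in this model ever clears the token, which matches the "user set token persisting indefinitely" behavior discussed elsewhere in the thread.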

2020-03-07 01:05:58

by Tian, Kevin

Subject: RE: [PATCH v2 3/7] vfio/pci: Introduce VF token

> From: Alex Williamson <[email protected]>
> Sent: Friday, March 6, 2020 11:39 PM
>
> On Fri, 6 Mar 2020 08:32:40 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Alex Williamson <[email protected]>
> > > Sent: Friday, March 6, 2020 2:18 AM
> > >
> > > On Tue, 25 Feb 2020 02:59:37 +0000
> > > "Tian, Kevin" <[email protected]> wrote:
> > >
> > > > > From: Alex Williamson
> > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > >
> > > > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > > > fully isolated from the PF. The PF can always cause a denial of service
> > > > > to the VF, even if by simply resetting itself. The degree to which a PF
> > > > > can access the data passed through a VF or interfere with its
> operation
> > > > > is dependent on a given SR-IOV implementation. Therefore we want
> to
> > > > > avoid a scenario where an existing vfio-pci based userspace driver
> might
> > > > > assume the PF driver is trusted, for example assigning a PF to one VM
> > > > > and VF to another with some expectation of isolation. IOMMU
> grouping
> > > > > could be a solution to this, but imposes an unnecessarily strong
> > > > > relationship between PF and VF drivers if they need to operate with
> the
> > > > > same IOMMU context. Instead we introduce a "VF token", which is
> > > > > essentially just a shared secret between PF and VF drivers,
> implemented
> > > > > as a UUID.
> > > > >
> > > > > The VF token can be set by a vfio-pci based PF driver and must be
> known
> > > > > by the vfio-pci based VF driver in order to gain access to the device.
> > > > > This allows the degree to which this VF token is considered secret to
> be
> > > > > determined by the applications and environment. For example a VM
> > > might
> > > > > generate a random UUID known only internally to the hypervisor
> while a
> > > > > userspace networking appliance might use a shared, or even well
> > known,
> > > > > UUID among the application drivers.
> > > > >
> > > > > To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD
> interface
> > > is
> > > > > extended to accept key=value pairs in addition to the device name.
> This
> > > > > allows us to most easily deny user access to the device without risk
> > > > > that existing userspace drivers assume region offsets, IRQs, and other
> > > > > device features, leading to more elaborate error paths. The format of
> > > > > these options are expected to take the form:
> > > > >
> > > > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
> > > > >
> > > > > Where the device name is always provided first for compatibility and
> > > > > additional options are specified in a space separated list. The
> > > > > relation between and requirements for the additional options will be
> > > > > vfio bus driver dependent, however unknown or unused option
> within
> > > this
> > > > > > schema should return error. This allows for future use of unknown
> > > > > options as well as a positive indication to the user that an option is
> > > > > used.
> > > > >
> > > > > An example VF token option would take this form:
> > > > >
> > > > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > > > >
> > > > > When accessing a VF where the PF is making use of vfio-pci, the user
> > > > > MUST provide the current vf_token. When accessing a PF, the user
> MUST
> > > > > provide the current vf_token IF there are active VF users or MAY
> provide
> > > > > a vf_token in order to set the current VF token when no VF users are
> > > > > active. The former requirement assures VF users that an
> unassociated
> > > > > driver cannot usurp the PF device. These semantics also imply that a
> > > > > VF token MUST be set by a PF driver before VF drivers can access their
> > > > > device, the default token is random and mechanisms to read the
> token
> > > are
> > > > > not provided in order to protect the VF token of previous users. Use
> of
> > > > > the vf_token option outside of these cases will return an error, as
> > > > > discussed above.
> > > > >
> > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > ---
> > > > > drivers/vfio/pci/vfio_pci.c | 198
> > > > > +++++++++++++++++++++++++++++++++++
> > > > > drivers/vfio/pci/vfio_pci_private.h | 8 +
> > > > > 2 files changed, 205 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > > index 2ec6c31d0ab0..8dd6ef9543ca 100644
> > > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > > @@ -466,6 +466,44 @@ static void vfio_pci_disable(struct
> > > vfio_pci_device
> > > > > *vdev)
> > > > > vfio_pci_set_power_state(vdev, PCI_D3hot);
> > > > > }
> > > > >
> > > > > +static struct pci_driver vfio_pci_driver;
> > > > > +
> > > > > +static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device
> *vdev,
> > > > > + struct vfio_device **pf_dev)
> > > > > +{
> > > > > + struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > > > +
> > > > > + if (!vdev->pdev->is_virtfn)
> > > > > + return NULL;
> > > > > +
> > > > > + *pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > > > + if (!*pf_dev)
> > > > > + return NULL;
> > > > > +
> > > > > + if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > > > + vfio_device_put(*pf_dev);
> > > > > + return NULL;
> > > > > + }
> > > > > +
> > > > > + return vfio_device_data(*pf_dev);
> > > > > +}
> > > > > +
> > > > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev,
> int
> > > val)
> > > > > +{
> > > > > + struct vfio_device *pf_dev;
> > > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev,
> &pf_dev);
> > > > > +
> > > > > + if (!pf_vdev)
> > > > > + return;
> > > > > +
> > > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > > + pf_vdev->vf_token->users += val;
> > > > > + WARN_ON(pf_vdev->vf_token->users < 0);
> > > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > +
> > > > > + vfio_device_put(pf_dev);
> > > > > +}
> > > > > +
> > > > > static void vfio_pci_release(void *device_data)
> > > > > {
> > > > > struct vfio_pci_device *vdev = device_data;
> > > > > @@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
> > > > > mutex_lock(&vdev->reflck->lock);
> > > > >
> > > > > if (!(--vdev->refcnt)) {
> > > > > + vfio_pci_vf_token_user_add(vdev, -1);
> > > > > vfio_spapr_pci_eeh_release(vdev->pdev);
> > > > > vfio_pci_disable(vdev);
> > > > > }
> > > > > @@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
> > > > > goto error;
> > > > >
> > > > > vfio_spapr_pci_eeh_open(vdev->pdev);
> > > > > + vfio_pci_vf_token_user_add(vdev, 1);
> > > > > }
> > > > > vdev->refcnt++;
> > > > > error:
> > > > > @@ -1278,11 +1318,148 @@ static void vfio_pci_request(void
> > > *device_data,
> > > > > unsigned int count)
> > > > > mutex_unlock(&vdev->igate);
> > > > > }
> > > > >
> > > > > +static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
> > > > > + bool vf_token, uuid_t *uuid)
> > > > > +{
> > > > > + /*
> > > > > + * There's always some degree of trust or collaboration
> between SR-
> > > > > IOV
> > > > > + * PF and VFs, even if just that the PF hosts the SR-IOV
> capability and
> > > > > + * can disrupt VFs with a reset, but often the PF has more
> explicit
> > > > > + * access to deny service to the VF or access data passed
> through the
> > > > > + * VF. We therefore require an opt-in via a shared VF token
> (UUID)
> > > > > to
> > > > > + * represent this trust. This both prevents that a VF driver
> might
> > > > > + * assume the PF driver is a trusted, in-kernel driver, and also
> that
> > > > > + * a PF driver might be replaced with a rogue driver, unknown
> to in-
> > > > > use
> > > > > + * VF drivers.
> > > > > + *
> > > > > + * Therefore when presented with a VF, if the PF is a vfio
> device and
> > > > > + * it is bound to the vfio-pci driver, the user needs to provide
> a VF
> > > > > + * token to access the device, in the form of appending a
> vf_token to
> > > > > + * the device name, for example:
> > > > > + *
> > > > > + * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-
> f591514ba1f3"
> > > > > + *
> > > > > + * When presented with a PF which has VFs in use, the user
> must also
> > > > > + * provide the current VF token to prove collaboration with
> existing
> > > > > + * VF users. If VFs are not in use, the VF token provided for
> the PF
> > > > > + * device will act to set the VF token.
> > > > > + *
> > > > > + * If the VF token is provided but unused, a fault is generated.
> > > >
> > > > fault->error, otherwise it is easy to consider a CPU fault.
> > >
> > > Ok, I can make that change, but I think you might have a unique
> > > background to make a leap that a userspace ioctl can trigger a CPU
> > > fault ;)
> > >
> > > > > + */
> > > > > + if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
> > > > > + return 0; /* No VF token provided or required */
> > > > > +
> > > > > + if (vdev->pdev->is_virtfn) {
> > > > > + struct vfio_device *pf_dev;
> > > > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > > > + bool match;
> > > > > > +
> > > > > > + if (!pf_vdev) {
> > > > > > + if (!vf_token)
> > > > > > + return 0; /* PF is not vfio-pci, no VF token */
> > > > > > +
> > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > + "VF token incorrectly provided, PF not bound to vfio-pci\n");
> > > > > > + return -EINVAL;
> > > > > > + }
> > > > > > +
> > > > > > + if (!vf_token) {
> > > > > > + vfio_device_put(pf_dev);
> > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > + "VF token required to access device\n");
> > > > > > + return -EACCES;
> > > > > > + }
> > > > > > +
> > > > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > > > + match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
> > > > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > > +
> > > > > > + vfio_device_put(pf_dev);
> > > > > > +
> > > > > > + if (!match) {
> > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > + "Incorrect VF token provided for device\n");
> > > > > > + return -EACCES;
> > > > > > + }
> > > > > > + } else if (vdev->vf_token) {
> > > > > > + mutex_lock(&vdev->vf_token->lock);
> > > > > > + if (vdev->vf_token->users) {
> > > > > > + if (!vf_token) {
> > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > + "VF token required to access device\n");
> > > > > > + return -EACCES;
> > > > > > + }
> > > > > > +
> > > > > > + if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
> > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > + "Incorrect VF token provided for device\n");
> > > > > > + return -EACCES;
> > > > > > + }
> > > > > > + } else if (vf_token) {
> > > > > > + uuid_copy(&vdev->vf_token->uuid, uuid);
> > > > > > + }
> > > >
> > > > It implies that we allow PF to be accessed w/o providing a VF token,
> > > > as long as no VF is currently in-use, which further means no VF can
> > > > be further assigned since no one knows the random uuid allocated
> > > > by vfio. Just want to confirm whether it is the desired flavor. If an
> > > > user really wants to use PF-only, possibly he should disable SR-IOV
> > > > instead...
> > >
> > > Yes, this is the behavior I'm intending. Are you suggesting that we
> > > should require a VF token in order to access a PF that has SR-IOV
> > > already enabled? This introduces an inconsistency that SR-IOV can be
> >
> > yes. I felt that it's meaningless otherwise if an user has no attempt to
> > manage SR-IOV but still leaving it enabled. In many cases, enabling of
> > SR-IOV may reserve some resource in the hardware, thus simply hurting
> > PF performance.
>
> But a user needs to be granted access to a device by a privileged
> entity and the privileged entity may also enable SR-IOV, so it seems
> you're assuming the privileged entity is operating independently and
> not in the best interest of enabling the specific user case.

What about emitting a warning in such a situation, so that userspace
knows some collaboration is missing before it accesses the device?

>
> > > enabled via sysfs asynchronous to the GET_DEVICE_FD ioctl, so we'd need
> > > to secure the sysfs interface to only allow enabling SR-IOV when the PF
> > > is already opened to cases where the VF token is already set? Thus
> >
> > yes, the PF is assigned to the userspace driver, thus it's reasonable to
> > have the userspace driver decide whether to enable or disable SR-IOV
> > when the PF is under its control. as I replied to patch [5/7], the sysfs
> > interface alone looks problematic w/o knowing whether the userspace
> > driver is willing to manage VFs (by setting a token)...
>
> As I replied in patch [5/7] the operations don't need to happen
> independently, configuring SR-IOV in advance of the user driver
> attaching or in collaboration with the user driver can also be enabled
> with this series as is. Allowing the user driver to directly enable
> SR-IOV and create VFs in the host is something I've avoided here, but
> not precluded for later extensions. I think that allowing a user to
> perform these operations represents a degree of privilege beyond
> ownership of the PF itself, which is why I'm currently only enabling
> the sysfs sriov_configure interface. The user driver needs to work in
> collaboration with a privileged entity to enable SR-IOV, or be granted
> access to operate on the sysfs interface directly.

Thanks. I clearly overlooked this assumption in my previous thinking.

>
> > > SR-IOV could be pre-enabled, but the user must provide a vf_token
> > > option on GET_DEVICE_FD, otherwise SR-IOV could only be enabled after
> > > the user sets a VF token. But then do we need to invalidate the token
> > > at some point, or else it seems like we have the same scenario when the
> > > next user comes along. We believe there are PFs that require no
> >
> > I think so, e.g. when SR-IOV is being disabled, or when the fd is closed.
>
> Can you articulate a specific risk that this would resolve? If we have
> devices like the one supported by pci-pf-stub, where it's apparently
> sufficient to provide no device access other than to enable SR-IOV on
> the PF, re-implementing that in vfio-pci would require that the
> userspace driver is notified when the SR-IOV configuration is changed
> such that a VF token can be re-inserted. For what gain?
>
> > > special VF support other than sriov_configure, so those driver could
> > > theoretically close the PF after setting a VF token. That makes it
> >
> > theoretically yes, but I'm not sure the real gain of supporting such
> > usage.
>
> Likewise I don't see the gain of restricting it.
>
> > btw with your question I realize another potential open. Now an
> > user could also use sysfs to reset the PF, which definitely affects the
> > state of VFs. Do we want a token match with that path? or such
> > intention is assumed to be trusted by VF drivers given that only
> > privileged users can do it?
>
> I think we're going into the weeds here, a privileged user can use the
> pci-sysfs reset interface to break all sorts of things. I'm certainly
> not going to propose any sort of VF token interface to restrict it.
> Privileged users can do bad things via sysfs. Privileged users can
> configure PFs in ways that may not be compatible with any given
> userspace VF driver. I'm assuming collaboration in the best interest
> of enabling the user driver. Thanks,
>
> Alex
>
> > > difficult to determine the lifetime of a VF token and leads to the
> > > interface proposed here of an initial random token, then the user set
> > > token persisting indefinitely.
> > >
> > > I've tended to consider all of these to be mechanisms by which a user can
> > > shoot themselves in the foot. Yes, the user and admin can do things
> > > that will fail to work with this interface, for example my testing
> > > involves QEMU, where we don't expose SR-IOV to the guest yet and the
> > > igb driver for the PF will encounter problems running a device with
> > > SR-IOV enabled that it doesn't know about. Do we want to try to play
> > > nanny and require specific semantics? I've opted for the simpler
> > > code here.
> > >
> > > > > +
> > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > + } else if (vf_token) {
> > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > + "VF token incorrectly provided, not a PF or VF\n");
> > > > > + return -EINVAL;
> > > > > + }
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +#define VF_TOKEN_ARG "vf_token="
> > > > > +
> > > > > static int vfio_pci_match(void *device_data, char *buf)
> > > > > {
> > > > > struct vfio_pci_device *vdev = device_data;
> > > > > + bool vf_token = false;
> > > > > + uuid_t uuid;
> > > > > + int ret;
> > > > > +
> > > > > > + if (strncmp(pci_name(vdev->pdev), buf, strlen(pci_name(vdev->pdev))))
> > > > > > + return 0; /* No match */
> > > > > > +
> > > > > > + if (strlen(buf) > strlen(pci_name(vdev->pdev))) {
> > > > > > + buf += strlen(pci_name(vdev->pdev));
> > > > > > +
> > > > > > + if (*buf != ' ')
> > > > > > + return 0; /* No match: non-whitespace after name */
> > > > > > +
> > > > > > + while (*buf) {
> > > > > > + if (*buf == ' ') {
> > > > > > + buf++;
> > > > > > + continue;
> > > > > > + }
> > > > > > +
> > > > > > + if (!vf_token && !strncmp(buf, VF_TOKEN_ARG,
> > > > > > + strlen(VF_TOKEN_ARG))) {
> > > > > + buf += strlen(VF_TOKEN_ARG);
> > > > > +
> > > > > + if (strlen(buf) < UUID_STRING_LEN)
> > > > > + return -EINVAL;
> > > > > +
> > > > > + ret = uuid_parse(buf, &uuid);
> > > > > + if (ret)
> > > > > + return ret;
> > > > >
> > > > > - return !strcmp(pci_name(vdev->pdev), buf);
> > > > > + vf_token = true;
> > > > > + buf += UUID_STRING_LEN;
> > > > > + } else {
> > > > > + /* Unknown/duplicate option */
> > > > > + return -EINVAL;
> > > > > + }
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + ret = vfio_pci_validate_vf_token(vdev, vf_token, &uuid);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > + return 1; /* Match */
> > > > > }
> > > > >
> > > > > static const struct vfio_device_ops vfio_pci_ops = {
> > > > > @@ -1354,6 +1531,19 @@ static int vfio_pci_probe(struct pci_dev
> *pdev,
> > > > > const struct pci_device_id *id)
> > > > > return ret;
> > > > > }
> > > > >
> > > > > + if (pdev->is_physfn) {
> > > > > + vdev->vf_token = kzalloc(sizeof(*vdev->vf_token),
> > > > > GFP_KERNEL);
> > > > > + if (!vdev->vf_token) {
> > > > > + vfio_pci_reflck_put(vdev->reflck);
> > > > > + vfio_del_group_dev(&pdev->dev);
> > > > > + vfio_iommu_group_put(group, &pdev->dev);
> > > > > + kfree(vdev);
> > > > > + return -ENOMEM;
> > > > > + }
> > > > > + mutex_init(&vdev->vf_token->lock);
> > > > > + uuid_gen(&vdev->vf_token->uuid);
> > > >
> > > > > should we also regenerate a random uuid somewhere when SR-IOV is
> > > > > disabled and then re-enabled on a PF? Although vfio disallows userspace
> > > > > to read uuid, it is always safer to avoid caching a secret from previous
> > > > > user.
> > >
> > > What if our user is QEMU emulating SR-IOV to the guest. Do we want to
> > > force a new VF token is set every time we bounce the VFs? Why? As
> > > above, the session lifetime of the VF token might be difficult to
> > > determine and I'm not sure paranoia is a sufficient reason to try to
> > > create boundaries for it. Thanks,
> > >
> > > Alex
> > >
> > > > > + }
> > > > > +
> > > > > if (vfio_pci_is_vga(pdev)) {
> > > > > vga_client_register(pdev, vdev, NULL,
> > > > > vfio_pci_set_vga_decode);
> > > > > vga_set_legacy_decoding(pdev,
> > > > > @@ -1387,6 +1577,12 @@ static void vfio_pci_remove(struct pci_dev
> > > *pdev)
> > > > > if (!vdev)
> > > > > return;
> > > > >
> > > > > + if (vdev->vf_token) {
> > > > > + WARN_ON(vdev->vf_token->users);
> > > > > + mutex_destroy(&vdev->vf_token->lock);
> > > > > + kfree(vdev->vf_token);
> > > > > + }
> > > > > +
> > > > > vfio_pci_reflck_put(vdev->reflck);
> > > > >
> > > > > vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
> > > > > > diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> > > > > index 8a2c7607d513..76c11c915949 100644
> > > > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > > > @@ -12,6 +12,7 @@
> > > > > #include <linux/pci.h>
> > > > > #include <linux/irqbypass.h>
> > > > > #include <linux/types.h>
> > > > > +#include <linux/uuid.h>
> > > > >
> > > > > #ifndef VFIO_PCI_PRIVATE_H
> > > > > #define VFIO_PCI_PRIVATE_H
> > > > > @@ -84,6 +85,12 @@ struct vfio_pci_reflck {
> > > > > struct mutex lock;
> > > > > };
> > > > >
> > > > > +struct vfio_pci_vf_token {
> > > > > + struct mutex lock;
> > > > > + uuid_t uuid;
> > > > > + int users;
> > > > > +};
> > > > > +
> > > > > struct vfio_pci_device {
> > > > > struct pci_dev *pdev;
> > > > > void __iomem *barmap[PCI_STD_NUM_BARS];
> > > > > @@ -122,6 +129,7 @@ struct vfio_pci_device {
> > > > > struct list_head dummy_resources_list;
> > > > > struct mutex ioeventfds_lock;
> > > > > struct list_head ioeventfds_list;
> > > > > + struct vfio_pci_vf_token *vf_token;
> > > > > };
> > > > >
> > > > > #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> > > >
> >
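
For illustration only (this is not code from the posted series): a minimal userspace sketch of how a caller might build the string handed to VFIO_GROUP_GET_DEVICE_FD under the proposed scheme, i.e. the device name followed by a space and a vf_token=UUID option. The helper name build_device_arg() and the UUID_STR_LEN constant are assumptions made for this example; the ioctl itself appears only in a comment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UUID_STR_LEN 36 /* canonical 8-4-4-4-12 text form */

/*
 * Hypothetical helper: format "<device name> vf_token=<uuid>" into buf.
 * If uuid is NULL, only the device name is written (no token supplied).
 * Returns 0 on success, -1 on bad input or if the buffer is too small.
 */
static int build_device_arg(char *buf, size_t len,
                            const char *name, const char *uuid)
{
    int n;

    if (!buf || !name || len == 0)
        return -1;

    if (uuid) {
        /* Only a length check here; the kernel side parses the UUID. */
        if (strlen(uuid) != UUID_STR_LEN)
            return -1;
        n = snprintf(buf, len, "%s vf_token=%s", name, uuid);
    } else {
        n = snprintf(buf, len, "%s", name);
    }

    if (n < 0 || (size_t)n >= len)
        return -1;

    /*
     * A caller holding an open VFIO group fd would then pass buf on:
     *     device_fd = ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, buf);
     */
    return 0;
}
```

With the strict parsing described in the cover letter, a string carrying a vf_token that cannot be used is expected to fail the ioctl, so a caller would probably only append the option when it actually participates in the PF/VF relationship.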

2020-03-07 01:36:11

by Tian, Kevin

Subject: RE: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

> From: Alex Williamson
> Sent: Saturday, March 7, 2020 6:18 AM
>
> On Fri, 6 Mar 2020 07:57:19 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Alex Williamson <[email protected]>
> > > Sent: Friday, March 6, 2020 2:23 AM
> > >
> > > On Tue, 25 Feb 2020 03:08:00 +0000
> > > "Tian, Kevin" <[email protected]> wrote:
> > >
> > > > > From: Alex Williamson
> > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > >
> > > > > With the VF Token interface we can now expect that a vfio userspace
> > > > > driver must be in collaboration with the PF driver, an unwitting
> > > > > userspace driver will not be able to get past the GET_DEVICE_FD step
> > > > > in accessing the device. We can now move on to actually allowing
> > > > > SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> > > > > enabled by default in this commit, but it does provide a module option
> > > > > for this to be enabled (enable_sriov=1). Enabling VFs is rather
> > > > > straightforward, except we don't want to risk that a VF might get
> > > > > autoprobed and bound to other drivers, so a bus notifier is used to
> > > > > "capture" VFs to vfio-pci using the driver_override support. We
> > > > > assume any later action to bind the device to other drivers is
> > > > > condoned by the system admin and allow it with a log warning.
> > > > >
> > > > > vfio-pci will disable SR-IOV on a PF before releasing the device,
> > > > > allowing a VF driver to be assured other drivers cannot take over the
> > > > > PF and that any other userspace driver must know the shared VF token.
> > > > > This support also does not provide a mechanism for the PF userspace
> > > > > driver itself to manipulate SR-IOV through the vfio API. With this
> > > > > patch SR-IOV can only be enabled via the host sysfs interface and the
> > > > > PF driver user cannot create or remove VFs.
> > > >
> > > > I'm not sure how many devices can be properly configured simply
> > > > with pci_enable_sriov. It is not unusual to require that the PF driver
> > > > prepare something before turning on the PCI SR-IOV capability. If you
> > > > look at kernel PF drivers, only two use the generic
> > > > pci_sriov_configure_simple (a simple wrapper around pci_enable_sriov),
> > > > while most others implement their own callback. However vfio itself has
> > > > no idea, thus I'm not sure how a user knows whether using this option
> > > > can actually meet his purpose. I may be missing something here; possibly
> > > > using DPDK as an example will make it clearer.
> > >
> > > There is still the entire vfio userspace driver interface. Imagine for
> > > example that QEMU emulates the SR-IOV capability and makes a call out
> > > to libvirt (or maybe runs with privs for the PF SR-IOV sysfs attribs)
> > > when the guest enables SR-IOV. Can't we assume that any PF specific
> > > support can still be performed in the userspace/guest driver, leaving
> > > us with a very simple and generic sriov_configure callback in vfio-pci?
> >
> > Makes sense. One concern, though, is how an user could be warned
> > if he inadvertently uses sysfs to enable SR-IOV on a vfio device whose
> > userspace driver is incapable of handling it. Note any VFIO device,
> > if SR-IOV capable, will allow user to do so once the module option is
> > turned on and the callback is registered. I felt such uncertainty can be
> > contained by toggling SR-IOV through a vfio api, but from your description
> > obviously it is what you want to avoid. Is it due to the sequence reason,
> > e.g. that SR-IOV must be enabled before userspace PF driver sets the
> > token?
>
> As in my other reply, enabling SR-IOV via a vfio API suggests that
> we're not only granting the user owning the PF device access to the
> device itself, but also the ability to create and remove subordinate
> devices on the host. That implies an extended degree of trust in the
> user beyond the PF device itself and raises questions about whether a
> user who is allowed to create VF devices should automatically be
> granted access to those VF devices, what the mechanism would be for
> that, and how we might re-assign those devices to other users,
> potentially including host kernel usage. What I'm proposing here
> doesn't preclude some future extension in that direction, but instead
> tries to simplify a first step towards enabling SR-IOV by leaving the
> SR-IOV enablement and VF assignment in the realm of a privileged system
> entity.

The intention is clear to me now.

>
> So, what I think you're suggesting here is that we should restrict
> vfio_pci_sriov_configure() to reject enabling SR-IOV until a user
> driver has configured a VF token. That requires both that the
> userspace driver has initialized to this point before SR-IOV can be
> enabled and that we would be forced to define a termination point for
> the user set VF token. Logically, this would need to be when the
> userspace driver exits or closes the PF device, which implies that we
> need to disable SR-IOV on the PF at this point, or we're left in an
> inconsistent state where VFs are enabled but cannot be disabled because
> we don't have a valid VF token. Now we're back to nearly a state where
> the user has control of not creating devices on the host, but removing
> them by closing the device, which will necessarily require that any VF
> driver release the device, whether userspace or kernel.
>
> I'm not sure what we're gaining by doing this though. I agree that
> there will be users that enable SR-IOV on a PF and then try to, for
> example, assign the PF and all the VFs to a VM. The VFs will fail due
> to lacking VF token support, unless they've patched QEMU with my test
> code, but depending on the PF driver in the guest, it may, or more
> likely won't work. But don't you think the VFs and probably PF not
> working is a sufficient clue that the configuration is invalid? OTOH,
> from what I've heard of the device in the ID table of the pci-pf-stub
> driver, they might very well be able to work with both PF and VFs in
> QEMU using only my test code to set the VF token.
>
> Therefore, I'm afraid what you're asking for here is to impose a usage
> restriction as a sanity test, when we don't really know what might be
> sane for this particular piece of hardware or use case. There are
> infinite ways that a vfio based userspace driver can fail to configure
> their hardware and make it work correctly, many of them are device
> specific. Isn't this just one of those cases? Thanks,
>

What you said all makes sense, so I withdraw the idea of manipulating
SR-IOV through a vfio ioctl. However, I still feel that simply registering
the sriov_configure callback in vfio-pci somehow violates the typical
expectation of the sysfs interface. Before this patch, a successful write
of a non-zero value to sriov_numvfs implied that the VFs are in a sane
state and functionally ready for immediate use. Now the meaning of a
successful return becomes undefined for vfio devices, since even vfio-pci
itself doesn't know whether VFs are functional for a random device
(it may know for some, if they carry the same device IDs as in pci-pf-stub).
It simply relies on the privileged entity knowing exactly what such a
write implies, and there is no way to warn inadvertent users, which to me
is not a good design from a kernel API p.o.v. Of course we may
document such a restriction, and the driver_override may also be an
indirect way to warn such a user if he wants to use VFs for another purpose.
But it is still less elegant than reporting it in the first place. Maybe
what we really require is a new sysfs attribute purely for enabling the
PCI SR-IOV capability, which doesn't imply making VFs actually
functional the way the existing sriov_numvfs does?

Thanks
Kevin
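
For context, the sysfs write discussed in this message goes through the PF's sriov_numvfs attribute under /sys/bus/pci/devices/. Below is a minimal sketch of what a privileged helper performing that write might look like; the function names sriov_numvfs_path() and set_numvfs() are made up for this example, and nothing here is specific to vfio-pci. As the discussion notes, a successful write only means the kernel accepted the request, not that the resulting VFs are usable by any particular driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the sysfs path of the sriov_numvfs attribute
 * for the PF with the given PCI address (e.g. "0000:04:00.0").
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int sriov_numvfs_path(char *buf, size_t len, const char *bdf)
{
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/sriov_numvfs", bdf);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: request `count` VFs on the PF (0 disables SR-IOV).
 * Needs write permission on the attribute. Returns 0 if the write was
 * accepted, -1 otherwise.
 */
static int set_numvfs(const char *bdf, unsigned int count)
{
    char path[128];
    FILE *f;
    int ok;

    if (sriov_numvfs_path(path, sizeof(path), bdf))
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%u\n", count) > 0;
    /* A rejected sysfs write may only be reported when the stream is flushed. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```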

2020-03-09 00:46:45

by Alex Williamson

Subject: Re: [PATCH v2 3/7] vfio/pci: Introduce VF token

On Sat, 7 Mar 2020 01:04:41 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Alex Williamson <[email protected]>
> > Sent: Friday, March 6, 2020 11:39 PM
> >
> > On Fri, 6 Mar 2020 08:32:40 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > From: Alex Williamson <[email protected]>
> > > > Sent: Friday, March 6, 2020 2:18 AM
> > > >
> > > > On Tue, 25 Feb 2020 02:59:37 +0000
> > > > "Tian, Kevin" <[email protected]> wrote:
> > > >
> > > > > > From: Alex Williamson
> > > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > > >
> > > > > > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > > > > > fully isolated from the PF. The PF can always cause a denial of service
> > > > > > > to the VF, even if by simply resetting itself. The degree to which a PF
> > > > > > > can access the data passed through a VF or interfere with its operation
> > > > > > > is dependent on a given SR-IOV implementation. Therefore we want to
> > > > > > > avoid a scenario where an existing vfio-pci based userspace driver might
> > > > > > > assume the PF driver is trusted, for example assigning a PF to one VM
> > > > > > > and VF to another with some expectation of isolation. IOMMU grouping
> > > > > > > could be a solution to this, but imposes an unnecessarily strong
> > > > > > > relationship between PF and VF drivers if they need to operate with the
> > > > > > > same IOMMU context. Instead we introduce a "VF token", which is
> > > > > > > essentially just a shared secret between PF and VF drivers, implemented
> > > > > > > as a UUID.
> > > > > > >
> > > > > > > The VF token can be set by a vfio-pci based PF driver and must be known
> > > > > > > by the vfio-pci based VF driver in order to gain access to the device.
> > > > > > > This allows the degree to which this VF token is considered secret to be
> > > > > > > determined by the applications and environment. For example a VM might
> > > > > > > generate a random UUID known only internally to the hypervisor while a
> > > > > > > userspace networking appliance might use a shared, or even well known,
> > > > > > > UUID among the application drivers.
> > > > > > >
> > > > > > > To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
> > > > > > > extended to accept key=value pairs in addition to the device name. This
> > > > > > > allows us to most easily deny user access to the device without risk
> > > > > > > that existing userspace drivers assume region offsets, IRQs, and other
> > > > > > > device features, leading to more elaborate error paths. The format of
> > > > > > > these options is expected to take the form:
> > > > > > >
> > > > > > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
> > > > > > >
> > > > > > > Where the device name is always provided first for compatibility and
> > > > > > > additional options are specified in a space separated list. The
> > > > > > > relation between and requirements for the additional options will be
> > > > > > > vfio bus driver dependent, however unknown or unused options within this
> > > > > > > schema should return an error. This allows for future use of unknown
> > > > > > > options as well as a positive indication to the user that an option is
> > > > > > > used.
> > > > > > >
> > > > > > > An example VF token option would take this form:
> > > > > > >
> > > > > > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > > > > > >
> > > > > > > When accessing a VF where the PF is making use of vfio-pci, the user
> > > > > > > MUST provide the current vf_token. When accessing a PF, the user MUST
> > > > > > > provide the current vf_token IF there are active VF users or MAY provide
> > > > > > > a vf_token in order to set the current VF token when no VF users are
> > > > > > > active. The former requirement assures VF users that an unassociated
> > > > > > > driver cannot usurp the PF device. These semantics also imply that a
> > > > > > > VF token MUST be set by a PF driver before VF drivers can access their
> > > > > > > device, the default token is random and mechanisms to read the token are
> > > > > > > not provided in order to protect the VF token of previous users. Use of
> > > > > > > the vf_token option outside of these cases will return an error, as
> > > > > > > discussed above.
> > > > > >
> > > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > > ---
> > > > > > > drivers/vfio/pci/vfio_pci.c | 198 +++++++++++++++++++++++++++++++++++
> > > > > > drivers/vfio/pci/vfio_pci_private.h | 8 +
> > > > > > 2 files changed, 205 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > > > index 2ec6c31d0ab0..8dd6ef9543ca 100644
> > > > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > > > @@ -466,6 +466,44 @@ static void vfio_pci_disable(struct
> > > > vfio_pci_device
> > > > > > *vdev)
> > > > > > vfio_pci_set_power_state(vdev, PCI_D3hot);
> > > > > > }
> > > > > >
> > > > > > +static struct pci_driver vfio_pci_driver;
> > > > > > +
> > > > > > > +static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev,
> > > > > > > + struct vfio_device **pf_dev)
> > > > > > +{
> > > > > > + struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > > > > +
> > > > > > + if (!vdev->pdev->is_virtfn)
> > > > > > + return NULL;
> > > > > > +
> > > > > > + *pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > > > > + if (!*pf_dev)
> > > > > > + return NULL;
> > > > > > +
> > > > > > + if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > > > > + vfio_device_put(*pf_dev);
> > > > > > + return NULL;
> > > > > > + }
> > > > > > +
> > > > > > + return vfio_device_data(*pf_dev);
> > > > > > +}
> > > > > > +
> > > > > > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
> > > > > > > +{
> > > > > > > + struct vfio_device *pf_dev;
> > > > > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > > > +
> > > > > > + if (!pf_vdev)
> > > > > > + return;
> > > > > > +
> > > > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > > > + pf_vdev->vf_token->users += val;
> > > > > > + WARN_ON(pf_vdev->vf_token->users < 0);
> > > > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > > +
> > > > > > + vfio_device_put(pf_dev);
> > > > > > +}
> > > > > > +
> > > > > > static void vfio_pci_release(void *device_data)
> > > > > > {
> > > > > > struct vfio_pci_device *vdev = device_data;
> > > > > > @@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
> > > > > > mutex_lock(&vdev->reflck->lock);
> > > > > >
> > > > > > if (!(--vdev->refcnt)) {
> > > > > > + vfio_pci_vf_token_user_add(vdev, -1);
> > > > > > vfio_spapr_pci_eeh_release(vdev->pdev);
> > > > > > vfio_pci_disable(vdev);
> > > > > > }
> > > > > > @@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
> > > > > > goto error;
> > > > > >
> > > > > > vfio_spapr_pci_eeh_open(vdev->pdev);
> > > > > > + vfio_pci_vf_token_user_add(vdev, 1);
> > > > > > }
> > > > > > vdev->refcnt++;
> > > > > > error:
> > > > > > @@ -1278,11 +1318,148 @@ static void vfio_pci_request(void
> > > > *device_data,
> > > > > > unsigned int count)
> > > > > > mutex_unlock(&vdev->igate);
> > > > > > }
> > > > > >
> > > > > > +static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
> > > > > > + bool vf_token, uuid_t *uuid)
> > > > > > +{
> > > > > > + /*
> > > > > > + * There's always some degree of trust or collaboration
> > between SR-
> > > > > > IOV
> > > > > > + * PF and VFs, even if just that the PF hosts the SR-IOV
> > capability and
> > > > > > + * can disrupt VFs with a reset, but often the PF has more
> > explicit
> > > > > > + * access to deny service to the VF or access data passed
> > through the
> > > > > > + * VF. We therefore require an opt-in via a shared VF token
> > (UUID)
> > > > > > to
> > > > > > + * represent this trust. This both prevents that a VF driver
> > might
> > > > > > + * assume the PF driver is a trusted, in-kernel driver, and also
> > that
> > > > > > + * a PF driver might be replaced with a rogue driver, unknown
> > to in-
> > > > > > use
> > > > > > + * VF drivers.
> > > > > > + *
> > > > > > + * Therefore when presented with a VF, if the PF is a vfio
> > device and
> > > > > > + * it is bound to the vfio-pci driver, the user needs to provide
> > a VF
> > > > > > + * token to access the device, in the form of appending a
> > vf_token to
> > > > > > + * the device name, for example:
> > > > > > + *
> > > > > > + * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-
> > f591514ba1f3"
> > > > > > + *
> > > > > > + * When presented with a PF which has VFs in use, the user
> > must also
> > > > > > + * provide the current VF token to prove collaboration with
> > existing
> > > > > > + * VF users. If VFs are not in use, the VF token provided for
> > the PF
> > > > > > + * device will act to set the VF token.
> > > > > > + *
> > > > > > + * If the VF token is provided but unused, a fault is generated.
> > > > >
> > > > > > fault->error, otherwise it is easy to consider a CPU fault.
> > > >
> > > > Ok, I can make that change, but I think you might have a unique
> > > > background to make a leap that a userspace ioctl can trigger a CPU
> > > > fault ;)
> > > >
> > > > > > + */
> > > > > > + if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
> > > > > > + return 0; /* No VF token provided or required */
> > > > > > +
> > > > > > + if (vdev->pdev->is_virtfn) {
> > > > > > + struct vfio_device *pf_dev;
> > > > > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > > > > + bool match;
> > > > > > > +
> > > > > > > + if (!pf_vdev) {
> > > > > > > + if (!vf_token)
> > > > > > > + return 0; /* PF is not vfio-pci, no VF token */
> > > > > > > +
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "VF token incorrectly provided, PF not bound to vfio-pci\n");
> > > > > > > + return -EINVAL;
> > > > > > > + }
> > > > > > > +
> > > > > > > + if (!vf_token) {
> > > > > > > + vfio_device_put(pf_dev);
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "VF token required to access device\n");
> > > > > > > + return -EACCES;
> > > > > > > + }
> > > > > > > +
> > > > > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > > > > + match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
> > > > > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > > > +
> > > > > > > + vfio_device_put(pf_dev);
> > > > > > > +
> > > > > > > + if (!match) {
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "Incorrect VF token provided for device\n");
> > > > > > > + return -EACCES;
> > > > > > > + }
> > > > > > > + } else if (vdev->vf_token) {
> > > > > > > + mutex_lock(&vdev->vf_token->lock);
> > > > > > > + if (vdev->vf_token->users) {
> > > > > > > + if (!vf_token) {
> > > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "VF token required to access device\n");
> > > > > > > + return -EACCES;
> > > > > > > + }
> > > > > > > +
> > > > > > > + if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
> > > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "Incorrect VF token provided for device\n");
> > > > > > > + return -EACCES;
> > > > > > > + }
> > > > > > > + } else if (vf_token) {
> > > > > > > + uuid_copy(&vdev->vf_token->uuid, uuid);
> > > > > > > + }
> > > > >
> > > > > It implies that we allow PF to be accessed w/o providing a VF token,
> > > > > as long as no VF is currently in-use, which further means no VF can
> > > > > be further assigned since no one knows the random uuid allocated
> > > > > by vfio. Just want to confirm whether it is the desired flavor. If an
> > > > > user really wants to use PF-only, possibly he should disable SR-IOV
> > > > > instead...
> > > >
> > > > Yes, this is the behavior I'm intending. Are you suggesting that we
> > > > should require a VF token in order to access a PF that has SR-IOV
> > > > already enabled? This introduces an inconsistency that SR-IOV can be
> > >
> > > yes. I felt that it's meaningless otherwise if an user has no attempt to
> > > manage SR-IOV but still leaving it enabled. In many cases, enabling of
> > > SR-IOV may reserve some resource in the hardware, thus simply hurting
> > > PF performance.
> >
> > But a user needs to be granted access to a device by a privileged
> > entity and the privileged entity may also enable SR-IOV, so it seems
> > you're assuming the privileged entity is operating independently and
> > not in the best interest of enabling the specific user case.
>
> what about throwing out a warning for such situation? so the userspace
> knows some collaboration is missing before its access to the device.

This seems arbitrary. pci-pf-stub proves to us that there are devices
that need no special setup for SR-IOV, we don't know that we don't have
such a device. Enabling SR-IOV after the user opens the device also
doesn't indicate there's necessarily collaboration between the two, so
if we generate a warning on one, how do we assume the other is ok? I
don't really understand why this is generating such concern. Thanks,

Alex

> > > > enabled via sysfs asynchronous to the GET_DEVICE_FD ioctl, so we'd need
> > > > to secure the sysfs interface to only allow enabling SR-IOV when the PF
> > > > is already opened to cases where the VF token is already set? Thus
> > >
> > > yes, the PF is assigned to the userspace driver, thus it's reasonable to
> > > have the userspace driver decide whether to enable or disable SR-IOV
> > > when the PF is under its control. as I replied to patch [5/7], the sysfs
> > > interface alone looks problematic w/o knowing whether the userspace
> > > driver is willing to manage VFs (by setting a token)...
> >
> > As I replied in patch [5/7] the operations don't need to happen
> > independently, configuring SR-IOV in advance of the user driver
> > attaching or in collaboration with the user driver can also be enabled
> > with this series as is. Allowing the user driver to directly enable
> > SR-IOV and create VFs in the host is something I've avoided here, but
> > not precluded for later extensions. I think that allowing a user to
> > perform these operations represents a degree of privilege beyond
> > ownership of the PF itself, which is why I'm currently only enabling
> > the sysfs sriov_configure interface. The user driver needs to work in
> > collaboration with a privileged entity to enable SR-IOV, or be granted
> > access to operate on the sysfs interface directly.
>
> Thanks. this assumption was clearly overlooked in my previous thinking.
>
> >
> > > > SR-IOV could be pre-enabled, but the user must provide a vf_token
> > > > option on GET_DEVICE_FD, otherwise SR-IOV could only be enabled after
> > > > the user sets a VF token. But then do we need to invalidate the token
> > > > at some point, or else it seems like we have the same scenario when the
> > > > next user comes along. We believe there are PFs that require no
> > >
> > > I think so, e.g. when SR-IOV is being disabled, or when the fd is closed.
> >
> > Can you articulate a specific risk that this would resolve? If we have
> > devices like the one supported by pci-pf-stub, where it's apparently
> > sufficient to provide no device access other than to enable SR-IOV on
> > the PF, re-implementing that in vfio-pci would require that the
> > userspace driver is notified when the SR-IOV configuration is changed
> > such that a VF token can be re-inserted. For what gain?
> >
> > > > special VF support other than sriov_configure, so those driver could
> > > > theoretically close the PF after setting a VF token. That makes it
> > >
> > > theoretically yes, but I'm not sure the real gain of supporting such
> > > usage.
> >
> > Likewise I don't see the gain of restricting it.
> >
> > > btw with your question I realize another potential open. Now an
> > > user could also use sysfs to reset the PF, which definitely affects the
> > > state of VFs. Do we want a token match with that path? or such
> > > intention is assumed to be trusted by VF drivers given that only
> > > privileged users can do it?
> >
> > I think we're going into the weeds here, a privileged user can use the
> > pci-sysfs reset interface to break all sorts of things. I'm certainly
> > not going to propose any sort of VF token interface to restrict it.
> > Privileged users can do bad things via sysfs. Privileged users can
> > configure PFs in ways that may not be compatible with any given
> > userspace VF driver. I'm assuming collaboration in the best interest
> > of enabling the user driver. Thanks,
> >
> > Alex
> >
> > > > difficult to determine the lifetime of a VF token and leads to the
> > > > interface proposed here of an initial random token, then the user set
> > > > token persisting indefinitely.
> > > >
> > > > I've tended to consider all of these to be mechanisms that a user can
> > > > shoot themselves in the foot. Yes, the user and admin can do things
> > > > that will fail to work with this interface, for example my testing
> > > > involves QEMU, where we don't expose SR-IOV to the guest yet and the
> > > > igb driver for the PF will encounter problems running a device with
> > > > SR-IOV enabled that it doesn't know about. Do we want to try to play
> > > > nanny and require specific semantics? I've opted for the simpler
> > > > code here.
> > > >
> > > > > > +
> > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > + } else if (vf_token) {
> > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > + "VF token incorrectly provided, not a PF or VF\n");
> > > > > > + return -EINVAL;
> > > > > > + }
> > > > > > +
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +#define VF_TOKEN_ARG "vf_token="
> > > > > > +
> > > > > > static int vfio_pci_match(void *device_data, char *buf)
> > > > > > {
> > > > > > struct vfio_pci_device *vdev = device_data;
> > > > > > + bool vf_token = false;
> > > > > > + uuid_t uuid;
> > > > > > + int ret;
> > > > > > +
> > > > > > + if (strncmp(pci_name(vdev->pdev), buf, strlen(pci_name(vdev->pdev))))
> > > > > > + return 0; /* No match */
> > > > > > +
> > > > > > + if (strlen(buf) > strlen(pci_name(vdev->pdev))) {
> > > > > > + buf += strlen(pci_name(vdev->pdev));
> > > > > > +
> > > > > > + if (*buf != ' ')
> > > > > > + return 0; /* No match: non-whitespace after name */
> > > > > > +
> > > > > > + while (*buf) {
> > > > > > + if (*buf == ' ') {
> > > > > > + buf++;
> > > > > > + continue;
> > > > > > + }
> > > > > > +
> > > > > > + if (!vf_token && !strncmp(buf, VF_TOKEN_ARG,
> > > > > > + strlen(VF_TOKEN_ARG))) {
> > > > > > + buf += strlen(VF_TOKEN_ARG);
> > > > > > +
> > > > > > + if (strlen(buf) < UUID_STRING_LEN)
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + ret = uuid_parse(buf, &uuid);
> > > > > > + if (ret)
> > > > > > + return ret;
> > > > > >
> > > > > > - return !strcmp(pci_name(vdev->pdev), buf);
> > > > > > + vf_token = true;
> > > > > > + buf += UUID_STRING_LEN;
> > > > > > + } else {
> > > > > > + /* Unknown/duplicate option */
> > > > > > + return -EINVAL;
> > > > > > + }
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > + ret = vfio_pci_validate_vf_token(vdev, vf_token, &uuid);
> > > > > > + if (ret)
> > > > > > + return ret;
> > > > > > +
> > > > > > + return 1; /* Match */
> > > > > > }
> > > > > >
> > > > > > static const struct vfio_device_ops vfio_pci_ops = {
> > > > > > @@ -1354,6 +1531,19 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > > > > return ret;
> > > > > > }
> > > > > >
> > > > > > + if (pdev->is_physfn) {
> > > > > > + vdev->vf_token = kzalloc(sizeof(*vdev->vf_token), GFP_KERNEL);
> > > > > > + if (!vdev->vf_token) {
> > > > > > + vfio_pci_reflck_put(vdev->reflck);
> > > > > > + vfio_del_group_dev(&pdev->dev);
> > > > > > + vfio_iommu_group_put(group, &pdev->dev);
> > > > > > + kfree(vdev);
> > > > > > + return -ENOMEM;
> > > > > > + }
> > > > > > + mutex_init(&vdev->vf_token->lock);
> > > > > > + uuid_gen(&vdev->vf_token->uuid);
> > > > >
> > > > > should we also regenerate a random uuid somewhere when SR-IOV is
> > > > > disabled and then re-enabled on a PF? Although vfio disallows userspace
> > > > > to read uuid, it is always safer to avoid caching a secret from previous
> > > > > user.
> > > >
> > > > What if our user is QEMU emulating SR-IOV to the guest. Do we want to
> > > > force a new VF token is set every time we bounce the VFs? Why? As
> > > > above, the session lifetime of the VF token might be difficult to
> > > > determine and I'm not sure paranoia is a sufficient reason to try to
> > > > create boundaries for it. Thanks,
> > > >
> > > > Alex
> > > >
> > > > > > + }
> > > > > > +
> > > > > > if (vfio_pci_is_vga(pdev)) {
> > > > > > vga_client_register(pdev, vdev, NULL,
> > > > > > vfio_pci_set_vga_decode);
> > > > > > vga_set_legacy_decoding(pdev,
> > > > > > @@ -1387,6 +1577,12 @@ static void vfio_pci_remove(struct pci_dev *pdev)
> > > > > > if (!vdev)
> > > > > > return;
> > > > > >
> > > > > > + if (vdev->vf_token) {
> > > > > > + WARN_ON(vdev->vf_token->users);
> > > > > > + mutex_destroy(&vdev->vf_token->lock);
> > > > > > + kfree(vdev->vf_token);
> > > > > > + }
> > > > > > +
> > > > > > vfio_pci_reflck_put(vdev->reflck);
> > > > > >
> > > > > > vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
> > > > > > diff --git a/drivers/vfio/pci/vfio_pci_private.h
> > > > > > b/drivers/vfio/pci/vfio_pci_private.h
> > > > > > index 8a2c7607d513..76c11c915949 100644
> > > > > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > > > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > > > > @@ -12,6 +12,7 @@
> > > > > > #include <linux/pci.h>
> > > > > > #include <linux/irqbypass.h>
> > > > > > #include <linux/types.h>
> > > > > > +#include <linux/uuid.h>
> > > > > >
> > > > > > #ifndef VFIO_PCI_PRIVATE_H
> > > > > > #define VFIO_PCI_PRIVATE_H
> > > > > > @@ -84,6 +85,12 @@ struct vfio_pci_reflck {
> > > > > > struct mutex lock;
> > > > > > };
> > > > > >
> > > > > > +struct vfio_pci_vf_token {
> > > > > > + struct mutex lock;
> > > > > > + uuid_t uuid;
> > > > > > + int users;
> > > > > > +};
> > > > > > +
> > > > > > struct vfio_pci_device {
> > > > > > struct pci_dev *pdev;
> > > > > > void __iomem *barmap[PCI_STD_NUM_BARS];
> > > > > > @@ -122,6 +129,7 @@ struct vfio_pci_device {
> > > > > > struct list_head dummy_resources_list;
> > > > > > struct mutex ioeventfds_lock;
> > > > > > struct list_head ioeventfds_list;
> > > > > > + struct vfio_pci_vf_token *vf_token;
> > > > > > };
> > > > > >
> > > > > > #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> > > > >
> > >
>
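
For readers who want to experiment with the semantics debated in this message, the access rules can be modelled in plain userspace C. This is a sketch under stated assumptions, not the kernel code: struct token_state, struct dev_model and validate_vf_token() are hypothetical stand-ins for the vfio-pci structures in the patch, the token is a raw 16-byte array rather than a uuid_t, and locking and logging are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct vfio_pci_vf_token (no mutex here). */
struct token_state {
	unsigned char uuid[16];	/* current VF token */
	int users;		/* number of VFs currently opened */
};

/* Hypothetical stand-in for the parts of vfio_pci_device that matter. */
struct dev_model {
	bool is_virtfn;			/* device is a VF */
	struct token_state *pf_token;	/* VF only: token of a vfio-pci bound PF, else NULL */
	struct token_state *own_token;	/* PF bound to vfio-pci: its token, else NULL */
};

/*
 * Returns 0 to allow the open, -EACCES when a required token is missing or
 * wrong, and -EINVAL when a token is supplied but cannot be used.
 * uuid is only read when have_token is true.
 */
static int validate_vf_token(struct dev_model *d, bool have_token,
			     const unsigned char *uuid)
{
	if (d->is_virtfn) {
		if (!d->pf_token)	/* PF is not driven by vfio-pci */
			return have_token ? -EINVAL : 0;
		if (!have_token)
			return -EACCES;
		return memcmp(uuid, d->pf_token->uuid, 16) ? -EACCES : 0;
	}

	if (d->own_token) {
		if (d->own_token->users) {
			/* VFs are open: the PF user must prove collaboration. */
			if (!have_token)
				return -EACCES;
			if (memcmp(uuid, d->own_token->uuid, 16))
				return -EACCES;
		} else if (have_token) {
			/* No VF users: a supplied token replaces the current one. */
			memcpy(d->own_token->uuid, uuid, 16);
		}
		return 0;
	}

	/* Neither a VF nor a PF with token state: reject a stray option. */
	return have_token ? -EINVAL : 0;
}
```

In this model, opening a PF without a token while no VFs are open returns 0, which is the case questioned above; once users is non-zero the same open returns -EACCES.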

2020-03-09 00:47:32

by Alex Williamson

Subject: Re: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

On Sat, 7 Mar 2020 01:35:23 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Alex Williamson
> > Sent: Saturday, March 7, 2020 6:18 AM
> >
> > On Fri, 6 Mar 2020 07:57:19 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > From: Alex Williamson <[email protected]>
> > > > Sent: Friday, March 6, 2020 2:23 AM
> > > >
> > > > On Tue, 25 Feb 2020 03:08:00 +0000
> > > > "Tian, Kevin" <[email protected]> wrote:
> > > >
> > > > > > From: Alex Williamson
> > > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > > >
> > > > > > With the VF Token interface we can now expect that a vfio userspace
> > > > > > driver must be in collaboration with the PF driver, an unwitting
> > > > > > userspace driver will not be able to get past the GET_DEVICE_FD step
> > > > > > in accessing the device. We can now move on to actually allowing
> > > > > > SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> > > > > > enabled by default in this commit, but it does provide a module option
> > > > > > for this to be enabled (enable_sriov=1). Enabling VFs is rather
> > > > > > straightforward, except we don't want to risk that a VF might get
> > > > > > autoprobed and bound to other drivers, so a bus notifier is used to
> > > > > > "capture" VFs to vfio-pci using the driver_override support. We
> > > > > > assume any later action to bind the device to other drivers is
> > > > > > condoned by the system admin and allow it with a log warning.
> > > > > >
> > > > > > vfio-pci will disable SR-IOV on a PF before releasing the device,
> > > > > > allowing a VF driver to be assured other drivers cannot take over the
> > > > > > PF and that any other userspace driver must know the shared VF token.
> > > > > > This support also does not provide a mechanism for the PF userspace
> > > > > > driver itself to manipulate SR-IOV through the vfio API. With this
> > > > > > patch SR-IOV can only be enabled via the host sysfs interface and the
> > > > > > PF driver user cannot create or remove VFs.
> > > > >
> > > > > I'm not sure how many devices can be properly configured simply
> > > > > with pci_enable_sriov. It is not unusual to require PF driver prepare
> > > > > something before turning PCI SR-IOV capability. If you look kernel
> > > > > PF drivers, there are only two using generic pci_sriov_configure_
> > > > > simple (simple wrapper like pci_enable_sriov), while most others
> > > > > implementing their own callback. However vfio itself has no idea
> > > > > thus I'm not sure how an user knows whether using this option can
> > > > > actually meet his purpose. I may miss something here, possibly
> > > > > using DPDK as an example will make it clearer.
> > > >
> > > > There is still the entire vfio userspace driver interface. Imagine for
> > > > example that QEMU emulates the SR-IOV capability and makes a call out
> > > > to libvirt (or maybe runs with privs for the PF SR-IOV sysfs attribs)
> > > > when the guest enables SR-IOV. Can't we assume that any PF specific
> > > > support can still be performed in the userspace/guest driver, leaving
> > > > us with a very simple and generic sriov_configure callback in vfio-pci?
> > >
> > > Makes sense. One concern, though, is how an user could be warned
> > > if he inadvertently uses sysfs to enable SR-IOV on a vfio device whose
> > > userspace driver is incapable of handling it. Note any VFIO device,
> > > if SR-IOV capable, will allow user to do so once the module option is
> > > turned on and the callback is registered. I felt such uncertainty can be
> > > contained by toggling SR-IOV through a vfio api, but from your description
> > > obviously it is what you want to avoid. Is it due to the sequence reason,
> > > e.g. that SR-IOV must be enabled before userspace PF driver sets the
> > > token?
> >
> > As in my other reply, enabling SR-IOV via a vfio API suggests that
> > we're not only granting the user owning the PF device access to the
> > device itself, but also the ability to create and remove subordinate
> > devices on the host. That implies an extended degree of trust in the
> > user beyond the PF device itself and raises questions about whether a
> > user who is allowed to create VF devices should automatically be
> > granted access to those VF devices, what the mechanism would be for
> > that, and how we might re-assign those devices to other users,
> > potentially including host kernel usage. What I'm proposing here
> > doesn't preclude some future extension in that direction, but instead
> > tries to simplify a first step towards enabling SR-IOV by leaving the
> > SR-IOV enablement and VF assignment in the realm of a privileged system
> > entity.
>
> the intention is clear to me now.
>
> >
> > So, what I think you're suggesting here is that we should restrict
> > vfio_pci_sriov_configure() to reject enabling SR-IOV until a user
> > driver has configured a VF token. That requires both that the
> > userspace driver has initialized to this point before SR-IOV can be
> > enabled and that we would be forced to define a termination point for
> > the user set VF token. Logically, this would need to be when the
> > userspace driver exits or closes the PF device, which implies that we
> > need to disable SR-IOV on the PF at this point, or we're left in an
> > inconsistent state where VFs are enabled but cannot be disabled because
> > we don't have a valid VF token. Now we're back to nearly a state where
> > the user has control of not creating devices on the host, but removing
> > them by closing the device, which will necessarily require that any VF
> > driver release the device, whether userspace or kernel.
> >
> > I'm not sure what we're gaining by doing this though. I agree that
> > there will be users that enable SR-IOV on a PF and then try to, for
> > example, assign the PF and all the VFs to a VM. The VFs will fail due
> > to lacking VF token support, unless they've patched QEMU with my test
> > code, but depending on the PF driver in the guest, it may, or more
> > likely won't work. But don't you think the VFs and probably PF not
> > working is a sufficient clue that the configuration is invalid? OTOH,
> > from what I've heard of the device in the ID table of the pci-pf-stub
> > driver, they might very well be able to work with both PF and VFs in
> > QEMU using only my test code to set the VF token.
> >
> > Therefore, I'm afraid what you're asking for here is to impose a usage
> > restriction as a sanity test, when we don't really know what might be
> > sane for this particular piece of hardware or use case. There are
> > infinite ways that a vfio based userspace driver can fail to configure
> > their hardware and make it work correctly, many of them are device
> > specific. Isn't this just one of those cases? Thanks,
> >
>
> what you said all makes sense. so I withdraw the idea of manipulating
> SR-IOV through vfio ioctl. However I still feel that simply registering
> sriov_configure callback by vfio-pci somehow violates the typical
> expectation of the sysfs interface. Before this patch, the success return
> of writing non-zero value to numvfs implies VFs are in sane state and
> functionally ready for immediate use. However now the behavior of
> success return becomes undefined for vfio devices, since even vfio-pci
> itself doesn't know whether VFs are functional for a random device
> (may know some if carrying the same device IDs from pci-pf-stub). It
> simply relies on the privileged entity who knows exactly the implication
> of such write, while there is no way to warn inadvertent users which
> to me is not a good design from kernel API p.o.v. Of course we may
> document such restriction and the driver_override may also be an
> indirect way to warn such user if he wants to use VFs for other purpose.
> But it is still less elegant than reporting it in the first place. Maybe
> what we really require is a new sysfs attribute purely for enabling
> PCI SR-IOV capability, which doesn't imply making VFs actually
> functional as did through the existing numvfs?

I don't read the same guarantee into the sysfs SR-IOV interface. If
such a guarantee exists, it's already broken by pci-pf-stub, which like
vfio-pci allows dynamic IDs and driver_override to bind to any PF device
allowing the ability to create (potentially) non-functional VFs. I
think it would be a really bad decision to fork a new sysfs interface
for this. I've already made SR-IOV support in vfio-pci an opt-in via a
module option, would it ease your concerns if I elaborate in the text
for the option that enabling SR-IOV may depend on support provided by a
vfio-pci userspace driver?

I think that without absolutely knowing that an operation is incorrect,
we're just generating noise and confusion by triggering warnings or
developing alternate interfaces. Unfortunately, we have no generic
means of knowing that an operation is incorrect, so I assume the best.
Thanks,

Alex
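
As a concrete illustration of the sysfs path discussed here, a privileged helper might enable VFs roughly as follows. This is a sketch, assuming the standard PCI sysfs attribute sriov_numvfs and a vfio-pci module loaded with the enable_sriov=1 option from this patch; sriov_numvfs_path() and write_numvfs() are hypothetical helper names, and nothing here checks whether the resulting VFs are functional.

```c
#include <stdio.h>

/*
 * Build the sysfs path of the sriov_numvfs attribute for the PCI device
 * at address bdf (for example "0000:03:00.0").
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int sriov_numvfs_path(char *out, size_t len, const char *bdf)
{
	int n = snprintf(out, len, "/sys/bus/pci/devices/%s/sriov_numvfs", bdf);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write the requested VF count to the attribute at path.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int write_numvfs(const char *path, unsigned int numvfs)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", numvfs) > 0;
	/* Errors from the driver typically surface when the buffer is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would build the path for the PF's address and write the VF count, then write 0 to disable the VFs again; the write fails if the caller lacks permission or the driver rejects the request.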

2020-03-09 01:35:25

by Tian, Kevin

Subject: RE: [PATCH v2 3/7] vfio/pci: Introduce VF token

> From: Tian, Kevin
> Sent: Monday, March 9, 2020 9:22 AM
>
> > From: Alex Williamson <[email protected]>
> > Sent: Monday, March 9, 2020 8:46 AM
> >
> > On Sat, 7 Mar 2020 01:04:41 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > From: Alex Williamson <[email protected]>
> > > > Sent: Friday, March 6, 2020 11:39 PM
> > > >
> > > > On Fri, 6 Mar 2020 08:32:40 +0000
> > > > "Tian, Kevin" <[email protected]> wrote:
> > > >
> > > > > > From: Alex Williamson <[email protected]>
> > > > > > Sent: Friday, March 6, 2020 2:18 AM
> > > > > >
> > > > > > On Tue, 25 Feb 2020 02:59:37 +0000
> > > > > > "Tian, Kevin" <[email protected]> wrote:
> > > > > >
> > > > > > > > From: Alex Williamson
> > > > > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > > > > >
> > > > > > > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > > > > > > fully isolated from the PF. The PF can always cause a denial of service
> > > > > > > > to the VF, even if by simply resetting itself. The degree to which a PF
> > > > > > > > can access the data passed through a VF or interfere with its operation
> > > > > > > > is dependent on a given SR-IOV implementation. Therefore we want to
> > > > > > > > avoid a scenario where an existing vfio-pci based userspace driver might
> > > > > > > > assume the PF driver is trusted, for example assigning a PF to one VM
> > > > > > > > and VF to another with some expectation of isolation. IOMMU grouping
> > > > > > > > could be a solution to this, but imposes an unnecessarily strong
> > > > > > > > relationship between PF and VF drivers if they need to operate with the
> > > > > > > > same IOMMU context. Instead we introduce a "VF token", which is
> > > > > > > > essentially just a shared secret between PF and VF drivers, implemented
> > > > > > > > as a UUID.
> > > > > > > >
> > > > > > > > The VF token can be set by a vfio-pci based PF driver and must be known
> > > > > > > > by the vfio-pci based VF driver in order to gain access to the device.
> > > > > > > > This allows the degree to which this VF token is considered secret to be
> > > > > > > > determined by the applications and environment. For example a VM might
> > > > > > > > generate a random UUID known only internally to the hypervisor while a
> > > > > > > > userspace networking appliance might use a shared, or even well known,
> > > > > > > > UUID among the application drivers.
> > > > > > > >
> > > > > > > > To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
> > > > > > > > extended to accept key=value pairs in addition to the device name. This
> > > > > > > > allows us to most easily deny user access to the device without risk
> > > > > > > > that existing userspace drivers assume region offsets, IRQs, and other
> > > > > > > > device features, leading to more elaborate error paths. The format of
> > > > > > > > these options is expected to take the form:
> > > > > > > >
> > > > > > > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
> > > > > > > >
> > > > > > > > Where the device name is always provided first for compatibility and
> > > > > > > > additional options are specified in a space separated list. The
> > > > > > > > relation between and requirements for the additional options will be
> > > > > > > > vfio bus driver dependent, however unknown or unused option within this
> > > > > > > > schema should return error. This allows for future use of unknown
> > > > > > > > options as well as a positive indication to the user that an option is
> > > > > > > > used.
> > > > > > > >
> > > > > > > > An example VF token option would take this form:
> > > > > > > >
> > > > > > > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > > > > > > >
> > > > > > > > When accessing a VF where the PF is making use of vfio-pci, the user
> > > > > > > > MUST provide the current vf_token. When accessing a PF, the user MUST
> > > > > > > > provide the current vf_token IF there are active VF users or MAY provide
> > > > > > > > a vf_token in order to set the current VF token when no VF users are
> > > > > > > > active. The former requirement assures VF users that an unassociated
> > > > > > > > driver cannot usurp the PF device. These semantics also imply that a
> > > > > > > > VF token MUST be set by a PF driver before VF drivers can access their
> > > > > > > > device, the default token is random and mechanisms to read the token are
> > > > > > > > not provided in order to protect the VF token of previous users. Use of
> > > > > > > > the vf_token option outside of these cases will return an error, as
> > > > > > > > discussed above.
> > > > > > > >
> > > > > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > > > > ---
> > > > > > > > drivers/vfio/pci/vfio_pci.c | 198 +++++++++++++++++++++++++++++++++++
> > > > > > > > drivers/vfio/pci/vfio_pci_private.h | 8 +
> > > > > > > > 2 files changed, 205 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > > > > > index 2ec6c31d0ab0..8dd6ef9543ca 100644
> > > > > > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > > > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > > > > > @@ -466,6 +466,44 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> > > > > > > > vfio_pci_set_power_state(vdev, PCI_D3hot);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static struct pci_driver vfio_pci_driver;
> > > > > > > > +
> > > > > > > > +static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev,
> > > > > > > > + struct vfio_device **pf_dev)
> > > > > > > > +{
> > > > > > > > + struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > > > > > > +
> > > > > > > > + if (!vdev->pdev->is_virtfn)
> > > > > > > > + return NULL;
> > > > > > > > +
> > > > > > > > + *pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > > > > > > + if (!*pf_dev)
> > > > > > > > + return NULL;
> > > > > > > > +
> > > > > > > > + if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > > > > > > + vfio_device_put(*pf_dev);
> > > > > > > > + return NULL;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + return vfio_device_data(*pf_dev);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
> > > > > > > > +{
> > > > > > > > + struct vfio_device *pf_dev;
> > > > > > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > > > > > +
> > > > > > > > + if (!pf_vdev)
> > > > > > > > + return;
> > > > > > > > +
> > > > > > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > > > > > + pf_vdev->vf_token->users += val;
> > > > > > > > + WARN_ON(pf_vdev->vf_token->users < 0);
> > > > > > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > > > > +
> > > > > > > > + vfio_device_put(pf_dev);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > static void vfio_pci_release(void *device_data)
> > > > > > > > {
> > > > > > > > struct vfio_pci_device *vdev = device_data;
> > > > > > > > @@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
> > > > > > > > mutex_lock(&vdev->reflck->lock);
> > > > > > > >
> > > > > > > > if (!(--vdev->refcnt)) {
> > > > > > > > + vfio_pci_vf_token_user_add(vdev, -1);
> > > > > > > > vfio_spapr_pci_eeh_release(vdev->pdev);
> > > > > > > > vfio_pci_disable(vdev);
> > > > > > > > }
> > > > > > > > @@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
> > > > > > > > goto error;
> > > > > > > >
> > > > > > > > vfio_spapr_pci_eeh_open(vdev->pdev);
> > > > > > > > + vfio_pci_vf_token_user_add(vdev, 1);
> > > > > > > > }
> > > > > > > > vdev->refcnt++;
> > > > > > > > error:
> > > > > > > > @@ -1278,11 +1318,148 @@ static void vfio_pci_request(void *device_data, unsigned int count)
> > > > > > > > mutex_unlock(&vdev->igate);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
> > > > > > > > + bool vf_token, uuid_t *uuid)
> > > > > > > > +{
> > > > > > > > + /*
> > > > > > > > + * There's always some degree of trust or collaboration between SR-IOV
> > > > > > > > + * PF and VFs, even if just that the PF hosts the SR-IOV capability and
> > > > > > > > + * can disrupt VFs with a reset, but often the PF has more explicit
> > > > > > > > + * access to deny service to the VF or access data passed through the
> > > > > > > > + * VF. We therefore require an opt-in via a shared VF token (UUID) to
> > > > > > > > + * represent this trust. This both prevents that a VF driver might
> > > > > > > > + * assume the PF driver is a trusted, in-kernel driver, and also that
> > > > > > > > + * a PF driver might be replaced with a rogue driver, unknown to in-use
> > > > > > > > + * VF drivers.
> > > > > > > > + *
> > > > > > > > + * Therefore when presented with a VF, if the PF is a vfio device and
> > > > > > > > + * it is bound to the vfio-pci driver, the user needs to provide a VF
> > > > > > > > + * token to access the device, in the form of appending a vf_token to
> > > > > > > > + * the device name, for example:
> > > > > > > > + *
> > > > > > > > + * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
> > > > > > > > + *
> > > > > > > > + * When presented with a PF which has VFs in use, the user must also
> > > > > > > > + * provide the current VF token to prove collaboration with existing
> > > > > > > > + * VF users. If VFs are not in use, the VF token provided for the PF
> > > > > > > > + * device will act to set the VF token.
> > > > > > > > + *
> > > > > > > > + * If the VF token is provided but unused, a fault is generated.
> > > > > > >
> > > > > > > fault->error, otherwise it is easy to consider a CPU fault.
> > > > > >
> > > > > > Ok, I can make that change, but I think you might have a unique
> > > > > > background to make a leap that a userspace ioctl can trigger a CPU
> > > > > > fault ;)
> > > > > >
> > > > > > > > + */
> > > > > > > > + if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
> > > > > > > > + return 0; /* No VF token provided or required */
> > > > > > > > +
> > > > > > > > + if (vdev->pdev->is_virtfn) {
> > > > > > > > + struct vfio_device *pf_dev;
> > > > > > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > > > > > + bool match;
> > > > > > > > +
> > > > > > > > + if (!pf_vdev) {
> > > > > > > > + if (!vf_token)
> > > > > > > > + return 0; /* PF is not vfio-pci, no VF token */
> > > > > > > > +
> > > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > > + "VF token incorrectly provided, PF not bound to vfio-pci\n");
> > > > > > > > + return -EINVAL;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + if (!vf_token) {
> > > > > > > > + vfio_device_put(pf_dev);
> > > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > > + "VF token required to access device\n");
> > > > > > > > + return -EACCES;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > > > > > + match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
> > > > > > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > > > > +
> > > > > > > > + vfio_device_put(pf_dev);
> > > > > > > > +
> > > > > > > > + if (!match) {
> > > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > > + "Incorrect VF token provided for device\n");
> > > > > > > > + return -EACCES;
> > > > > > > > + }
> > > > > > > > + } else if (vdev->vf_token) {
> > > > > > > > + mutex_lock(&vdev->vf_token->lock);
> > > > > > > > + if (vdev->vf_token->users) {
> > > > > > > > + if (!vf_token) {
> > > > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > > + "VF token required to access device\n");
> > > > > > > > + return -EACCES;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
> > > > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > > + "Incorrect VF token provided for device\n");
> > > > > > > > + return -EACCES;
> > > > > > > > + }
> > > > > > > > + } else if (vf_token) {
> > > > > > > > + uuid_copy(&vdev->vf_token->uuid, uuid);
> > > > > > > > + }
> > > > > > >
> > > > > > > It implies that we allow PF to be accessed w/o providing a VF token,
> > > > > > > as long as no VF is currently in-use, which further means no VF can
> > > > > > > be further assigned since no one knows the random uuid allocated
> > > > > > > by vfio. Just want to confirm whether it is the desired flavor. If an
> > > > > > > user really wants to use PF-only, possibly he should disable SR-IOV
> > > > > > > instead...
> > > > > >
> > > > > > Yes, this is the behavior I'm intending. Are you suggesting that we
> > > > > > should require a VF token in order to access a PF that has SR-IOV
> > > > > > already enabled? This introduces an inconsistency that SR-IOV can be
> > > > >
> > > > > yes. I felt that it's meaningless otherwise if an user has no attempt to
> > > > > manage SR-IOV but still leaving it enabled. In many cases, enabling of
> > > > > SR-IOV may reserve some resource in the hardware, thus simply hurting
> > > > > PF performance.
> > > >
> > > > But a user needs to be granted access to a device by a privileged
> > > > entity and the privileged entity may also enable SR-IOV, so it seems
> > > > you're assuming the privileged entity is operating independently and
> > > > not in the best interest of enabling the specific user case.
> > >
> > > what about throwing out a warning for such situation? so the userspace
> > > knows some collaboration is missing before its access to the device.
> >
> > This seems arbitrary. pci-pf-stub proves to us that there are devices
> > that need no special setup for SR-IOV, we don't know that we don't have
> > such a device. Enabling SR-IOV after the user opens the device also

btw, no special setup doesn't mean that a PF driver cannot do bad things to
VFs. In such a case, I think the whole token idea should still apply.

> > doesn't indicate there's necessarily collaboration between the two, so
> > if we generate a warning on one, how do we assume the other is ok? I
> > don't really understand why this is generating such concern. Thanks,

Specifically, I feel we should warn in both of these cases:

1) the userspace driver calls GET_DEVICE_FD w/o providing a token on a PF
which has SR-IOV already enabled
2) the admin writes a non-zero numvfs to a PF which is already bound to a
userspace driver that didn't provide a token

in both cases VFs are enabled but cannot be used (if you agree that
the token idea should be also applied to 'no special setup' case)
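
To make the two cases concrete, here is a minimal userspace sketch of the
conditions under which such a warning would fire. It is only a model of the
proposal, not kernel code: struct pf_model and both helper names are made up
for illustration, and nothing like them exists in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the PF state relevant to the two warnings. */
struct pf_model {
	bool opened;      /* a userspace driver holds the PF device fd */
	bool user_token;  /* that driver supplied vf_token= when opening */
	int num_vfs;      /* current number of enabled VFs */
};

/* Case 1: GET_DEVICE_FD on a PF w/o a token while SR-IOV is enabled. */
static bool warn_on_open(const struct pf_model *pf, bool token_provided)
{
	assert(pf != NULL);
	return pf->num_vfs > 0 && !token_provided;
}

/* Case 2: non-zero numvfs written while the PF is open w/o a user token. */
static bool warn_on_sriov_configure(const struct pf_model *pf, int new_numvfs)
{
	assert(pf != NULL);
	return new_numvfs > 0 && pf->opened && !pf->user_token;
}
```

In both helpers the warning is advisory only; neither case changes whether
the open or the numvfs write itself succeeds.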

> >
> > Alex
>
> I meant to warn the suboptimal case where the userspace driver doesn't
> provide a token when accessing a PF which has SR-IOV already enabled.
> I don't think a sane configuration/coordination should do this since all
> VFs are simply wasted and instead may hurt the PF performance...
>
> >
> > > > > > enabled via sysfs asynchronous to the GET_DEVICE_FD ioctl, so we'd need
> > > > > > to secure the sysfs interface to only allow enabling SR-IOV when the PF
> > > > > > is already opened to cases where the VF token is already set? Thus
> > > > >
> > > > > yes, the PF is assigned to the userspace driver, thus it's reasonable to
> > > > > have the userspace driver decide whether to enable or disable SR-IOV
> > > > > when the PF is under its control. as I replied to patch [5/7], the sysfs
> > > > > interface alone looks problematic w/o knowing whether the userspace
> > > > > driver is willing to manage VFs (by setting a token)...
> > > >
> > > > As I replied in patch [5/7] the operations don't need to happen
> > > > independently, configuring SR-IOV in advance of the user driver
> > > > attaching or in collaboration with the user driver can also be enabled
> > > > with this series as is. Allowing the user driver to directly enable
> > > > SR-IOV and create VFs in the host is something I've avoided here, but
> > > > not precluded for later extensions. I think that allowing a user to
> > > > perform these operations represents a degree of privilege beyond
> > > > ownership of the PF itself, which is why I'm currently only enabling
> > > > the sysfs sriov_configure interface. The user driver needs to work in
> > > > collaboration with a privileged entity to enable SR-IOV, or be granted
> > > > access to operate on the sysfs interface directly.
> > >
> > > Thanks. this assumption was clearly overlooked in my previous thinking.
> > >
> > > >
> > > > > > SR-IOV could be pre-enabled, but the user must provide a vf_token
> > > > > > option on GET_DEVICE_FD, otherwise SR-IOV could only be enabled after
> > > > > > the user sets a VF token. But then do we need to invalidate the token
> > > > > > at some point, or else it seems like we have the same scenario when the
> > > > > > next user comes along. We believe there are PFs that require no
> > > > >
> > > > > I think so, e.g. when SR-IOV is being disabled, or when the fd is closed.
> > > >
> > > > Can you articulate a specific risk that this would resolve? If we have
> > > > devices like the one supported by pci-pf-stub, where it's apparently
> > > > sufficient to provide no device access other than to enable SR-IOV on
> > > > the PF, re-implementing that in vfio-pci would require that the
> > > > userspace driver is notified when the SR-IOV configuration is changed
> > > > such that a VF token can be re-inserted. For what gain?
> > > >
> > > > > > special VF support other than sriov_configure, so those driver could
> > > > > > theoretically close the PF after setting a VF token. That makes it
> > > > >
> > > > > theoretically yes, but I'm not sure the real gain of supporting such
> > > > > usage.
> > > >
> > > > Likewise I don't see the gain of restricting it.
> > > >
> > > > > btw with your question I realize another potential open. Now an
> > > > > user could also use sysfs to reset the PF, which definitely affects the
> > > > > state of VFs. Do we want a token match with that path? or such
> > > > > intention is assumed to be trusted by VF drivers given that only
> > > > > privileged users can do it?
> > > >
> > > > I think we're going into the weeds here, a privileged user can use the
> > > > pci-sysfs reset interface to break all sorts of things. I'm certainly
> > > > not going to propose any sort of VF token interface to restrict it.
> > > > Privileged users can do bad things via sysfs. Privileged users can
> > > > configure PFs in ways that may not be compatible with any given
> > > > userspace VF driver. I'm assuming collaboration in the best interest
> > > > of enabling the user driver. Thanks,
> > > >
> > > > Alex
> > > >
> > > > > > difficult to determine the lifetime of a VF token and leads to the
> > > > > > interface proposed here of an initial random token, then the user set
> > > > > > token persisting indefinitely.
> > > > > >
> > > > > > I've tended consider all of these to be mechanisms that a user can
> > > > > > shoot themselves in the foot. Yes, the user and admin can do things
> > > > > > that will fail to work with this interface, for example my testing
> > > > > > involves QEMU, where we don't expose SR-IOV to the guest yet and the
> > > > > > igb driver for the PF will encounter problems running a device with
> > > > > > SR-IOV enabled that it doesn't know about. Do we want to try to play
> > > > > > nanny and require specific semantics? I've opt'd for the more simple
> > > > > > code here.
> > > > > >
> > > > > > > > +
> > > > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > > > + } else if (vf_token) {
> > > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > > + "VF token incorrectly provided, not a PF or VF\n");
> > > > > > > > + return -EINVAL;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + return 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +#define VF_TOKEN_ARG "vf_token="
> > > > > > > > +
> > > > > > > > static int vfio_pci_match(void *device_data, char *buf)
> > > > > > > > {
> > > > > > > > struct vfio_pci_device *vdev = device_data;
> > > > > > > > + bool vf_token = false;
> > > > > > > > + uuid_t uuid;
> > > > > > > > + int ret;
> > > > > > > > +
> > > > > > > > + if (strncmp(pci_name(vdev->pdev), buf, strlen(pci_name(vdev->pdev))))
> > > > > > > > + return 0; /* No match */
> > > > > > > > +
> > > > > > > > + if (strlen(buf) > strlen(pci_name(vdev->pdev))) {
> > > > > > > > + buf += strlen(pci_name(vdev->pdev));
> > > > > > > > +
> > > > > > > > + if (*buf != ' ')
> > > > > > > > + return 0; /* No match: non-whitespace after name */
> > > > > > > > +
> > > > > > > > + while (*buf) {
> > > > > > > > + if (*buf == ' ') {
> > > > > > > > + buf++;
> > > > > > > > + continue;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + if (!vf_token && !strncmp(buf, VF_TOKEN_ARG,
> > > > > > > > + strlen(VF_TOKEN_ARG))) {
> > > > > > > > + buf += strlen(VF_TOKEN_ARG);
> > > > > > > > +
> > > > > > > > + if (strlen(buf) < UUID_STRING_LEN)
> > > > > > > > + return -EINVAL;
> > > > > > > > +
> > > > > > > > + ret = uuid_parse(buf, &uuid);
> > > > > > > > + if (ret)
> > > > > > > > + return ret;
> > > > > > > >
> > > > > > > > - return !strcmp(pci_name(vdev->pdev), buf);
> > > > > > > > + vf_token = true;
> > > > > > > > + buf += UUID_STRING_LEN;
> > > > > > > > + } else {
> > > > > > > > + /* Unknown/duplicate option */
> > > > > > > > + return -EINVAL;
> > > > > > > > + }
> > > > > > > > + }
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + ret = vfio_pci_validate_vf_token(vdev, vf_token, &uuid);
> > > > > > > > + if (ret)
> > > > > > > > + return ret;
> > > > > > > > +
> > > > > > > > + return 1; /* Match */
> > > > > > > > }
> > > > > > > >
> > > > > > > > static const struct vfio_device_ops vfio_pci_ops = {
> > > > > > > > @@ -1354,6 +1531,19 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > > > > > > return ret;
> > > > > > > > }
> > > > > > > >
> > > > > > > > + if (pdev->is_physfn) {
> > > > > > > > + vdev->vf_token = kzalloc(sizeof(*vdev->vf_token), GFP_KERNEL);
> > > > > > > > + if (!vdev->vf_token) {
> > > > > > > > + vfio_pci_reflck_put(vdev->reflck);
> > > > > > > > + vfio_del_group_dev(&pdev->dev);
> > > > > > > > + vfio_iommu_group_put(group, &pdev->dev);
> > > > > > > > + kfree(vdev);
> > > > > > > > + return -ENOMEM;
> > > > > > > > + }
> > > > > > > > + mutex_init(&vdev->vf_token->lock);
> > > > > > > > + uuid_gen(&vdev->vf_token->uuid);
> > > > > > >
> > > > > > > should we also regenerate a random uuid somewhere when SR-IOV is
> > > > > > > disabled and then re-enabled on a PF? Although vfio disallows userspace
> > > > > > > to read uuid, it is always safer to avoid caching a secret from previous
> > > > > > > user.
> > > > > >
> > > > > > What if our user is QEMU emulating SR-IOV to the guest. Do we want to
> > > > > > force a new VF token to be set every time we bounce the VFs? Why? As
> > > > > > above, the session lifetime of the VF token might be difficult to
> > > > > > determine and I'm not sure paranoia is a sufficient reason to try to
> > > > > > create boundaries for it. Thanks,
> > > > > >
> > > > > > Alex
> > > > > >
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > if (vfio_pci_is_vga(pdev)) {
> > > > > > > > vga_client_register(pdev, vdev, NULL,
> > > > > > > > vfio_pci_set_vga_decode);
> > > > > > > > vga_set_legacy_decoding(pdev,
> > > > > > > > @@ -1387,6 +1577,12 @@ static void vfio_pci_remove(struct pci_dev *pdev)
> > > > > > > > if (!vdev)
> > > > > > > > return;
> > > > > > > >
> > > > > > > > + if (vdev->vf_token) {
> > > > > > > > + WARN_ON(vdev->vf_token->users);
> > > > > > > > + mutex_destroy(&vdev->vf_token->lock);
> > > > > > > > + kfree(vdev->vf_token);
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > vfio_pci_reflck_put(vdev->reflck);
> > > > > > > >
> > > > > > > > vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
> > > > > > > > diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> > > > > > > > index 8a2c7607d513..76c11c915949 100644
> > > > > > > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > > > > > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > > > > > > @@ -12,6 +12,7 @@
> > > > > > > > #include <linux/pci.h>
> > > > > > > > #include <linux/irqbypass.h>
> > > > > > > > #include <linux/types.h>
> > > > > > > > +#include <linux/uuid.h>
> > > > > > > >
> > > > > > > > #ifndef VFIO_PCI_PRIVATE_H
> > > > > > > > #define VFIO_PCI_PRIVATE_H
> > > > > > > > @@ -84,6 +85,12 @@ struct vfio_pci_reflck {
> > > > > > > > struct mutex lock;
> > > > > > > > };
> > > > > > > >
> > > > > > > > +struct vfio_pci_vf_token {
> > > > > > > > + struct mutex lock;
> > > > > > > > + uuid_t uuid;
> > > > > > > > + int users;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > struct vfio_pci_device {
> > > > > > > > struct pci_dev *pdev;
> > > > > > > > void __iomem *barmap[PCI_STD_NUM_BARS];
> > > > > > > > @@ -122,6 +129,7 @@ struct vfio_pci_device {
> > > > > > > > struct list_head dummy_resources_list;
> > > > > > > > struct mutex ioeventfds_lock;
> > > > > > > > struct list_head ioeventfds_list;
> > > > > > > > + struct vfio_pci_vf_token *vf_token;
> > > > > > > > };
> > > > > > > >
> > > > > > > > #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> > > > > > >
> > > > >
> > >

2020-03-09 02:15:44

by Tian, Kevin

Subject: RE: [PATCH v2 3/7] vfio/pci: Introduce VF token

> From: Alex Williamson <[email protected]>
> Sent: Monday, March 9, 2020 8:46 AM
>
> On Sat, 7 Mar 2020 01:04:41 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Alex Williamson <[email protected]>
> > > Sent: Friday, March 6, 2020 11:39 PM
> > >
> > > On Fri, 6 Mar 2020 08:32:40 +0000
> > > "Tian, Kevin" <[email protected]> wrote:
> > >
> > > > > From: Alex Williamson <[email protected]>
> > > > > Sent: Friday, March 6, 2020 2:18 AM
> > > > >
> > > > > On Tue, 25 Feb 2020 02:59:37 +0000
> > > > > "Tian, Kevin" <[email protected]> wrote:
> > > > >
> > > > > > > From: Alex Williamson
> > > > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > > > >
> > > > > > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > > > > > fully isolated from the PF. The PF can always cause a denial of service
> > > > > > > to the VF, even if by simply resetting itself. The degree to which a PF
> > > > > > > can access the data passed through a VF or interfere with its operation
> > > > > > > is dependent on a given SR-IOV implementation. Therefore we want to
> > > > > > > avoid a scenario where an existing vfio-pci based userspace driver might
> > > > > > > assume the PF driver is trusted, for example assigning a PF to one VM
> > > > > > > and VF to another with some expectation of isolation. IOMMU grouping
> > > > > > > could be a solution to this, but imposes an unnecessarily strong
> > > > > > > relationship between PF and VF drivers if they need to operate with the
> > > > > > > same IOMMU context. Instead we introduce a "VF token", which is
> > > > > > > essentially just a shared secret between PF and VF drivers, implemented
> > > > > > > as a UUID.
> > > > > > >
> > > > > > > The VF token can be set by a vfio-pci based PF driver and must be known
> > > > > > > by the vfio-pci based VF driver in order to gain access to the device.
> > > > > > > This allows the degree to which this VF token is considered secret to be
> > > > > > > determined by the applications and environment. For example a VM might
> > > > > > > generate a random UUID known only internally to the hypervisor while a
> > > > > > > userspace networking appliance might use a shared, or even well known,
> > > > > > > UUID among the application drivers.
> > > > > > >
> > > > > > > To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
> > > > > > > extended to accept key=value pairs in addition to the device name. This
> > > > > > > allows us to most easily deny user access to the device without risk
> > > > > > > that existing userspace drivers assume region offsets, IRQs, and other
> > > > > > > device features, leading to more elaborate error paths. The format of
> > > > > > > these options is expected to take the form:
> > > > > > >
> > > > > > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
> > > > > > >
> > > > > > > Where the device name is always provided first for compatibility and
> > > > > > > additional options are specified in a space separated list. The
> > > > > > > relation between and requirements for the additional options will be
> > > > > > > vfio bus driver dependent, however unknown or unused options within this
> > > > > > > schema should return an error. This allows for future use of unknown
> > > > > > > options as well as a positive indication to the user that an option is
> > > > > > > used.
> > > > > > >
> > > > > > > An example VF token option would take this form:
> > > > > > >
> > > > > > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > > > > > >
> > > > > > > When accessing a VF where the PF is making use of vfio-pci, the user
> > > > > > > MUST provide the current vf_token. When accessing a PF, the user MUST
> > > > > > > provide the current vf_token IF there are active VF users or MAY provide
> > > > > > > a vf_token in order to set the current VF token when no VF users are
> > > > > > > active. The former requirement assures VF users that an unassociated
> > > > > > > driver cannot usurp the PF device. These semantics also imply that a
> > > > > > > VF token MUST be set by a PF driver before VF drivers can access their
> > > > > > > device, the default token is random and mechanisms to read the token are
> > > > > > > not provided in order to protect the VF token of previous users. Use of
> > > > > > > the vf_token option outside of these cases will return an error, as
> > > > > > > discussed above.
> > > > > > >
> > > > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > > > ---
> > > > > > > drivers/vfio/pci/vfio_pci.c | 198
> > > > > > > +++++++++++++++++++++++++++++++++++
> > > > > > > drivers/vfio/pci/vfio_pci_private.h | 8 +
> > > > > > > 2 files changed, 205 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > > > > index 2ec6c31d0ab0..8dd6ef9543ca 100644
> > > > > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > > > > @@ -466,6 +466,44 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> > > > > > > vfio_pci_set_power_state(vdev, PCI_D3hot);
> > > > > > > }
> > > > > > >
> > > > > > > +static struct pci_driver vfio_pci_driver;
> > > > > > > +
> > > > > > > +static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev,
> > > > > > > + struct vfio_device **pf_dev)
> > > > > > > +{
> > > > > > > + struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > > > > > +
> > > > > > > + if (!vdev->pdev->is_virtfn)
> > > > > > > + return NULL;
> > > > > > > +
> > > > > > > + *pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > > > > > + if (!*pf_dev)
> > > > > > > + return NULL;
> > > > > > > +
> > > > > > > + if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > > > > > + vfio_device_put(*pf_dev);
> > > > > > > + return NULL;
> > > > > > > + }
> > > > > > > +
> > > > > > > + return vfio_device_data(*pf_dev);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
> > > > > > > +{
> > > > > > > + struct vfio_device *pf_dev;
> > > > > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > > > > +
> > > > > > > + if (!pf_vdev)
> > > > > > > + return;
> > > > > > > +
> > > > > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > > > > + pf_vdev->vf_token->users += val;
> > > > > > > + WARN_ON(pf_vdev->vf_token->users < 0);
> > > > > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > > > +
> > > > > > > + vfio_device_put(pf_dev);
> > > > > > > +}
> > > > > > > +
> > > > > > > static void vfio_pci_release(void *device_data)
> > > > > > > {
> > > > > > > struct vfio_pci_device *vdev = device_data;
> > > > > > > @@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
> > > > > > > mutex_lock(&vdev->reflck->lock);
> > > > > > >
> > > > > > > if (!(--vdev->refcnt)) {
> > > > > > > + vfio_pci_vf_token_user_add(vdev, -1);
> > > > > > > vfio_spapr_pci_eeh_release(vdev->pdev);
> > > > > > > vfio_pci_disable(vdev);
> > > > > > > }
> > > > > > > @@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
> > > > > > > goto error;
> > > > > > >
> > > > > > > vfio_spapr_pci_eeh_open(vdev->pdev);
> > > > > > > + vfio_pci_vf_token_user_add(vdev, 1);
> > > > > > > }
> > > > > > > vdev->refcnt++;
> > > > > > > error:
> > > > > > > @@ -1278,11 +1318,148 @@ static void vfio_pci_request(void *device_data, unsigned int count)
> > > > > > > mutex_unlock(&vdev->igate);
> > > > > > > }
> > > > > > >
> > > > > > > +static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
> > > > > > > + bool vf_token, uuid_t *uuid)
> > > > > > > +{
> > > > > > > + /*
> > > > > > > + * There's always some degree of trust or collaboration between SR-IOV
> > > > > > > + * PF and VFs, even if just that the PF hosts the SR-IOV capability and
> > > > > > > + * can disrupt VFs with a reset, but often the PF has more explicit
> > > > > > > + * access to deny service to the VF or access data passed through the
> > > > > > > + * VF. We therefore require an opt-in via a shared VF token (UUID) to
> > > > > > > + * represent this trust. This both prevents that a VF driver might
> > > > > > > + * assume the PF driver is a trusted, in-kernel driver, and also that
> > > > > > > + * a PF driver might be replaced with a rogue driver, unknown to in-use
> > > > > > > + * VF drivers.
> > > > > > > + *
> > > > > > > + * Therefore when presented with a VF, if the PF is a vfio device and
> > > > > > > + * it is bound to the vfio-pci driver, the user needs to provide a VF
> > > > > > > + * token to access the device, in the form of appending a vf_token to
> > > > > > > + * the device name, for example:
> > > > > > > + *
> > > > > > > + * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
> > > > > > > + *
> > > > > > > + * When presented with a PF which has VFs in use, the user must also
> > > > > > > + * provide the current VF token to prove collaboration with existing
> > > > > > > + * VF users. If VFs are not in use, the VF token provided for the PF
> > > > > > > + * device will act to set the VF token.
> > > > > > > + *
> > > > > > > + * If the VF token is provided but unused, a fault is generated.
> > > > > >
> > > > > > fault->error, otherwise it is easy to consider a CPU fault.
> > > > >
> > > > > Ok, I can make that change, but I think you might have a unique
> > > > > background to make a leap that a userspace ioctl can trigger a CPU
> > > > > fault ;)
> > > > >
> > > > > > > + */
> > > > > > > + if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
> > > > > > > + return 0; /* No VF token provided or required */
> > > > > > > +
> > > > > > > + if (vdev->pdev->is_virtfn) {
> > > > > > > + struct vfio_device *pf_dev;
> > > > > > > + struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > > > > + bool match;
> > > > > > > +
> > > > > > > + if (!pf_vdev) {
> > > > > > > + if (!vf_token)
> > > > > > > + return 0; /* PF is not vfio-pci, no VF token */
> > > > > > > +
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "VF token incorrectly provided, PF not bound to vfio-pci\n");
> > > > > > > + return -EINVAL;
> > > > > > > + }
> > > > > > > +
> > > > > > > + if (!vf_token) {
> > > > > > > + vfio_device_put(pf_dev);
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "VF token required to access device\n");
> > > > > > > + return -EACCES;
> > > > > > > + }
> > > > > > > +
> > > > > > > + mutex_lock(&pf_vdev->vf_token->lock);
> > > > > > > + match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
> > > > > > > + mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > > > +
> > > > > > > + vfio_device_put(pf_dev);
> > > > > > > +
> > > > > > > + if (!match) {
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "Incorrect VF token provided for device\n");
> > > > > > > + return -EACCES;
> > > > > > > + }
> > > > > > > + } else if (vdev->vf_token) {
> > > > > > > + mutex_lock(&vdev->vf_token->lock);
> > > > > > > + if (vdev->vf_token->users) {
> > > > > > > + if (!vf_token) {
> > > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "VF token required to access device\n");
> > > > > > > + return -EACCES;
> > > > > > > + }
> > > > > > > +
> > > > > > > + if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
> > > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > + "Incorrect VF token provided for device\n");
> > > > > > > + return -EACCES;
> > > > > > > + }
> > > > > > > + } else if (vf_token) {
> > > > > > > + uuid_copy(&vdev->vf_token->uuid, uuid);
> > > > > > > + }
> > > > > >
> > > > > > It implies that we allow PF to be accessed w/o providing a VF token,
> > > > > > as long as no VF is currently in-use, which further means no VF can
> > > > > > be further assigned since no one knows the random uuid allocated
> > > > > > by vfio. Just want to confirm whether it is the desired flavor. If an
> > > > > > user really wants to use PF-only, possibly he should disable SR-IOV
> > > > > > instead...
> > > > >
> > > > > Yes, this is the behavior I'm intending. Are you suggesting that we
> > > > > should require a VF token in order to access a PF that has SR-IOV
> > > > > already enabled? This introduces an inconsistency that SR-IOV can be
> > > >
> > > > yes. I felt that it's meaningless otherwise if an user has no attempt to
> > > > manage SR-IOV but still leaving it enabled. In many cases, enabling of
> > > > SR-IOV may reserve some resource in the hardware, thus simply hurting
> > > > PF performance.
> > >
> > > But a user needs to be granted access to a device by a privileged
> > > entity and the privileged entity may also enable SR-IOV, so it seems
> > > you're assuming the privileged entity is operating independently and
> > > not in the best interest of enabling the specific user case.
> >
> > what about throwing out a warning for such situation? so the userspace
> > knows some collaboration is missing before its access to the device.
>
> This seems arbitrary. pci-pf-stub proves to us that there are devices
> that need no special setup for SR-IOV, we don't know that we don't have
> such a device. Enabling SR-IOV after the user opens the device also
> doesn't indicate there's necessarily collaboration between the two, so
> if we generate a warning on one, how do we assume the other is ok? I
> don't really understand why this is generating such concern. Thanks,
>
> Alex

I meant to warn about the suboptimal case where the userspace driver doesn't
provide a token when accessing a PF which has SR-IOV already enabled.
I don't think a sane configuration/coordination would do this, since all
VFs are simply wasted and may instead hurt the PF performance...
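
For reference, a userspace driver that does intend to coordinate with VF
users would supply the token when it opens the PF, using the
"$DEVICE_NAME vf_token=$UUID" format from the commit log. A minimal sketch
of building that string follows; build_device_arg() and UUID_STR_LEN are
names invented here for illustration and are not part of any vfio header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UUID_STR_LEN 36 /* canonical 8-4-4-4-12 text form */

/*
 * Build "<device name> vf_token=<uuid>" into out.
 * Returns 0 on success, -1 if the UUID string is not 36 characters long
 * or the result does not fit in the buffer. Only the length of the UUID
 * is checked here; its contents are left for the kernel to parse.
 */
static int build_device_arg(char *out, size_t len,
			    const char *dev, const char *uuid)
{
	int n;

	assert(out && dev && uuid);

	if (strlen(uuid) != UUID_STR_LEN)
		return -1;

	n = snprintf(out, len, "%s vf_token=%s", dev, uuid);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

The resulting buffer would then be passed as the argument of the
VFIO_GROUP_GET_DEVICE_FD ioctl on the group fd, in place of the bare
device name.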

>
> > > > > enabled via sysfs asynchronous to the GET_DEVICE_FD ioctl, so we'd need
> > > > > to secure the sysfs interface to only allow enabling SR-IOV when the PF
> > > > > is already opened to cases where the VF token is already set? Thus
> > > >
> > > > yes, the PF is assigned to the userspace driver, thus it's reasonable to
> > > > have the userspace driver decide whether to enable or disable SR-IOV
> > > > when the PF is under its control. as I replied to patch [5/7], the sysfs
> > > > interface alone looks problematic w/o knowing whether the userspace
> > > > driver is willing to manage VFs (by setting a token)...
> > >
> > > As I replied in patch [5/7] the operations don't need to happen
> > > independently, configuring SR-IOV in advance of the user driver
> > > attaching or in collaboration with the user driver can also be enabled
> > > with this series as is. Allowing the user driver to directly enable
> > > SR-IOV and create VFs in the host is something I've avoided here, but
> > > not precluded for later extensions. I think that allowing a user to
> > > perform these operations represents a degree of privilege beyond
> > > ownership of the PF itself, which is why I'm currently only enabling
> > > the sysfs sriov_configure interface. The user driver needs to work in
> > > collaboration with a privileged entity to enable SR-IOV, or be granted
> > > access to operate on the sysfs interface directly.
> >
> > Thanks. this assumption was clearly overlooked in my previous thinking.
> >
> > >
> > > > > SR-IOV could be pre-enabled, but the user must provide a vf_token
> > > > > option on GET_DEVICE_FD, otherwise SR-IOV could only be enabled after
> > > > > the user sets a VF token. But then do we need to invalidate the token
> > > > > at some point, or else it seems like we have the same scenario when the
> > > > > next user comes along. We believe there are PFs that require no
> > > >
> > > > I think so, e.g. when SR-IOV is being disabled, or when the fd is closed.
> > >
> > > Can you articulate a specific risk that this would resolve? If we have
> > > devices like the one supported by pci-pf-stub, where it's apparently
> > > sufficient to provide no device access other than to enable SR-IOV on
> > > the PF, re-implementing that in vfio-pci would require that the
> > > userspace driver is notified when the SR-IOV configuration is changed
> > > such that a VF token can be re-inserted. For what gain?
> > >
> > > > > special VF support other than sriov_configure, so those driver could
> > > > > theoretically close the PF after setting a VF token. That makes it
> > > >
> > > > theoretically yes, but I'm not sure the real gain of supporting such
> > > > usage.
> > >
> > > Likewise I don't see the gain of restricting it.
> > >
> > > > > Btw, your question made me realize another potential open issue. A
> > > > > user could also use sysfs to reset the PF, which definitely affects the
> > > > > state of VFs. Do we want a token match on that path, or is such an
> > > > > action assumed to be trusted by VF drivers, given that only
> > > > > privileged users can do it?
> > >
> > > I think we're going into the weeds here, a privileged user can use the
> > > pci-sysfs reset interface to break all sorts of things. I'm certainly
> > > not going to propose any sort of VF token interface to restrict it.
> > > Privileged users can do bad things via sysfs. Privileged users can
> > > configure PFs in ways that may not be compatible with any given
> > > userspace VF driver. I'm assuming collaboration in the best interest
> > > of enabling the user driver. Thanks,
> > >
> > > Alex
> > >
> > > > > difficult to determine the lifetime of a VF token and leads to the
> > > > > interface proposed here of an initial random token, then the user set
> > > > > token persisting indefinitely.
> > > > >
> > > > > > I've tended to consider all of these to be mechanisms by which a user
> > > > > > can shoot themselves in the foot. Yes, the user and admin can do things
> > > > > > that will fail to work with this interface, for example my testing
> > > > > > involves QEMU, where we don't expose SR-IOV to the guest yet and the
> > > > > > igb driver for the PF will encounter problems running a device with
> > > > > > SR-IOV enabled that it doesn't know about. Do we want to try to play
> > > > > > nanny and require specific semantics? I've opted for the simpler
> > > > > > code here.
> > > > >
> > > > > > > +
> > > > > > > + mutex_unlock(&vdev->vf_token->lock);
> > > > > > > + } else if (vf_token) {
> > > > > > > > + pci_info_ratelimited(vdev->pdev,
> > > > > > > > + "VF token incorrectly provided, not a PF or VF\n");
> > > > > > > + return -EINVAL;
> > > > > > > + }
> > > > > > > +
> > > > > > > + return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +#define VF_TOKEN_ARG "vf_token="
> > > > > > > +
> > > > > > > static int vfio_pci_match(void *device_data, char *buf)
> > > > > > > {
> > > > > > > struct vfio_pci_device *vdev = device_data;
> > > > > > > + bool vf_token = false;
> > > > > > > + uuid_t uuid;
> > > > > > > + int ret;
> > > > > > > +
> > > > > > > > + if (strncmp(pci_name(vdev->pdev), buf,
> > > > > > > > + strlen(pci_name(vdev->pdev))))
> > > > > > > + return 0; /* No match */
> > > > > > > +
> > > > > > > + if (strlen(buf) > strlen(pci_name(vdev->pdev))) {
> > > > > > > + buf += strlen(pci_name(vdev->pdev));
> > > > > > > +
> > > > > > > + if (*buf != ' ')
> > > > > > > > + return 0; /* No match: non-whitespace after name */
> > > > > > > +
> > > > > > > + while (*buf) {
> > > > > > > + if (*buf == ' ') {
> > > > > > > + buf++;
> > > > > > > + continue;
> > > > > > > + }
> > > > > > > +
> > > > > > > > + if (!vf_token && !strncmp(buf, VF_TOKEN_ARG,
> > > > > > > > + strlen(VF_TOKEN_ARG))) {
> > > > > > > + buf += strlen(VF_TOKEN_ARG);
> > > > > > > +
> > > > > > > + if (strlen(buf) < UUID_STRING_LEN)
> > > > > > > + return -EINVAL;
> > > > > > > +
> > > > > > > + ret = uuid_parse(buf, &uuid);
> > > > > > > + if (ret)
> > > > > > > + return ret;
> > > > > > >
> > > > > > > - return !strcmp(pci_name(vdev->pdev), buf);
> > > > > > > + vf_token = true;
> > > > > > > + buf += UUID_STRING_LEN;
> > > > > > > + } else {
> > > > > > > + /* Unknown/duplicate option */
> > > > > > > + return -EINVAL;
> > > > > > > + }
> > > > > > > + }
> > > > > > > + }
> > > > > > > +
> > > > > > > + ret = vfio_pci_validate_vf_token(vdev, vf_token, &uuid);
> > > > > > > + if (ret)
> > > > > > > + return ret;
> > > > > > > +
> > > > > > > + return 1; /* Match */
> > > > > > > }
> > > > > > >
> > > > > > > static const struct vfio_device_ops vfio_pci_ops = {
> > > > > > > @@ -1354,6 +1531,19 @@ static int vfio_pci_probe(struct pci_dev
> > > *pdev,
> > > > > > > const struct pci_device_id *id)
> > > > > > > return ret;
> > > > > > > }
> > > > > > >
> > > > > > > + if (pdev->is_physfn) {
> > > > > > > > + vdev->vf_token = kzalloc(sizeof(*vdev->vf_token), GFP_KERNEL);
> > > > > > > + if (!vdev->vf_token) {
> > > > > > > + vfio_pci_reflck_put(vdev->reflck);
> > > > > > > + vfio_del_group_dev(&pdev->dev);
> > > > > > > + vfio_iommu_group_put(group, &pdev->dev);
> > > > > > > + kfree(vdev);
> > > > > > > + return -ENOMEM;
> > > > > > > + }
> > > > > > > + mutex_init(&vdev->vf_token->lock);
> > > > > > > + uuid_gen(&vdev->vf_token->uuid);
> > > > > >
> > > > > > > Should we also regenerate a random uuid somewhere when SR-IOV is
> > > > > > > disabled and then re-enabled on a PF? Although vfio does not allow
> > > > > > > userspace to read the uuid, it is always safer to avoid caching a
> > > > > > > secret from the previous user.
> > > > >
> > > > > > What if our user is QEMU emulating SR-IOV to the guest? Do we want to
> > > > > > force a new VF token to be set every time we bounce the VFs? Why? As
> > > > > above, the session lifetime of the VF token might be difficult to
> > > > > determine and I'm not sure paranoia is a sufficient reason to try to
> > > > > create boundaries for it. Thanks,
> > > > >
> > > > > Alex
> > > > >
> > > > > > > + }
> > > > > > > +
> > > > > > > if (vfio_pci_is_vga(pdev)) {
> > > > > > > vga_client_register(pdev, vdev, NULL,
> > > > > > > vfio_pci_set_vga_decode);
> > > > > > > vga_set_legacy_decoding(pdev,
> > > > > > > @@ -1387,6 +1577,12 @@ static void vfio_pci_remove(struct
> pci_dev
> > > > > *pdev)
> > > > > > > if (!vdev)
> > > > > > > return;
> > > > > > >
> > > > > > > + if (vdev->vf_token) {
> > > > > > > + WARN_ON(vdev->vf_token->users);
> > > > > > > + mutex_destroy(&vdev->vf_token->lock);
> > > > > > > + kfree(vdev->vf_token);
> > > > > > > + }
> > > > > > > +
> > > > > > > vfio_pci_reflck_put(vdev->reflck);
> > > > > > >
> > > > > > > > vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
> > > > > > > > diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> > > > > > > index 8a2c7607d513..76c11c915949 100644
> > > > > > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > > > > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > > > > > @@ -12,6 +12,7 @@
> > > > > > > #include <linux/pci.h>
> > > > > > > #include <linux/irqbypass.h>
> > > > > > > #include <linux/types.h>
> > > > > > > +#include <linux/uuid.h>
> > > > > > >
> > > > > > > #ifndef VFIO_PCI_PRIVATE_H
> > > > > > > #define VFIO_PCI_PRIVATE_H
> > > > > > > @@ -84,6 +85,12 @@ struct vfio_pci_reflck {
> > > > > > > struct mutex lock;
> > > > > > > };
> > > > > > >
> > > > > > > +struct vfio_pci_vf_token {
> > > > > > > + struct mutex lock;
> > > > > > > + uuid_t uuid;
> > > > > > > + int users;
> > > > > > > +};
> > > > > > > +
> > > > > > > struct vfio_pci_device {
> > > > > > > struct pci_dev *pdev;
> > > > > > > void __iomem *barmap[PCI_STD_NUM_BARS];
> > > > > > > @@ -122,6 +129,7 @@ struct vfio_pci_device {
> > > > > > > struct list_head dummy_resources_list;
> > > > > > > struct mutex ioeventfds_lock;
> > > > > > > struct list_head ioeventfds_list;
> > > > > > > + struct vfio_pci_vf_token *vf_token;
> > > > > > > };
> > > > > > >
> > > > > > > > #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
> > > > > >
> > > >
> >
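
The option-string rules in the quoted vfio_pci_match() hunk can be tried out
in userspace. What follows is only a sketch under stated assumptions, not the
kernel code: match_device_string() and check_uuid_string() are hypothetical
names, check_uuid_string() merely checks the textual 8-4-4-4-12 hex layout
where the patch calls uuid_parse(), and the final call to
vfio_pci_validate_vf_token() is left out entirely. The return convention
follows the quoted patch: 1 for a match, 0 for no match, -EINVAL for a
malformed, unknown, or duplicate option.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define VF_TOKEN_ARG "vf_token="
#define UUID_STRING_LEN 36

/* Hypothetical stand-in for uuid_parse(): validates the text layout only. */
static int check_uuid_string(const char *s)
{
	for (int i = 0; i < UUID_STRING_LEN; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')
				return -EINVAL;
		} else if (!isxdigit((unsigned char)s[i])) {
			return -EINVAL;
		}
	}
	return 0;
}

/*
 * Returns 1 on match, 0 on no match, -EINVAL on a bad option string.
 * *has_token reports whether a vf_token= option was parsed.
 */
static int match_device_string(const char *name, const char *buf,
			       bool *has_token)
{
	size_t name_len;

	assert(name && buf && has_token);
	name_len = strlen(name);
	*has_token = false;

	if (strncmp(name, buf, name_len))
		return 0;		/* No match */

	buf += name_len;
	if (*buf && *buf != ' ')
		return 0;		/* No match: non-whitespace after name */

	while (*buf) {
		if (*buf == ' ') {
			buf++;
			continue;
		}

		if (!*has_token &&
		    !strncmp(buf, VF_TOKEN_ARG, strlen(VF_TOKEN_ARG))) {
			buf += strlen(VF_TOKEN_ARG);

			if (strlen(buf) < UUID_STRING_LEN)
				return -EINVAL;
			if (check_uuid_string(buf))
				return -EINVAL;

			*has_token = true;
			buf += UUID_STRING_LEN;
		} else {
			return -EINVAL;	/* Unknown/duplicate option */
		}
	}

	return 1;			/* Match (token validation not modeled) */
}
```

Passing "0000:03:00.0 vf_token=<uuid>" yields a match with the token flag set,
while a second vf_token= option or any other trailing text is rejected with
-EINVAL, which is the stricter parsing described in the cover letter.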

2020-03-09 03:06:57

by Tian, Kevin

Subject: RE: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

> From: Alex Williamson
> Sent: Monday, March 9, 2020 8:46 AM
>
> On Sat, 7 Mar 2020 01:35:23 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Alex Williamson
> > > Sent: Saturday, March 7, 2020 6:18 AM
> > >
> > > On Fri, 6 Mar 2020 07:57:19 +0000
> > > "Tian, Kevin" <[email protected]> wrote:
> > >
> > > > > From: Alex Williamson <[email protected]>
> > > > > Sent: Friday, March 6, 2020 2:23 AM
> > > > >
> > > > > On Tue, 25 Feb 2020 03:08:00 +0000
> > > > > "Tian, Kevin" <[email protected]> wrote:
> > > > >
> > > > > > > From: Alex Williamson
> > > > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > > > >
> > > > > > > > With the VF Token interface we can now expect that a vfio userspace
> > > > > > > > driver must be in collaboration with the PF driver, an unwitting
> > > > > > > > userspace driver will not be able to get past the GET_DEVICE_FD step
> > > > > > > > in accessing the device. We can now move on to actually allowing
> > > > > > > > SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> > > > > > > > enabled by default in this commit, but it does provide a module option
> > > > > > > > for this to be enabled (enable_sriov=1). Enabling VFs is rather
> > > > > > > > straightforward, except we don't want to risk that a VF might get
> > > > > > > > autoprobed and bound to other drivers, so a bus notifier is used to
> > > > > > > > "capture" VFs to vfio-pci using the driver_override support. We
> > > > > > > > assume any later action to bind the device to other drivers is
> > > > > > > > condoned by the system admin and allow it with a log warning.
> > > > > > > >
> > > > > > > > vfio-pci will disable SR-IOV on a PF before releasing the device,
> > > > > > > > allowing a VF driver to be assured other drivers cannot take over the
> > > > > > > > PF and that any other userspace driver must know the shared VF token.
> > > > > > > > This support also does not provide a mechanism for the PF userspace
> > > > > > > > driver itself to manipulate SR-IOV through the vfio API. With this
> > > > > > > > patch SR-IOV can only be enabled via the host sysfs interface and the
> > > > > > > > PF driver user cannot create or remove VFs.
> > > > > >
> > > > > > > I'm not sure how many devices can be properly configured simply
> > > > > > > with pci_enable_sriov. It is not unusual for a PF driver to have to
> > > > > > > prepare something before turning on the PCI SR-IOV capability. If you
> > > > > > > look at the kernel PF drivers, only two use the generic
> > > > > > > pci_sriov_configure_simple (a simple wrapper around pci_enable_sriov),
> > > > > > > while most others implement their own callback. However, vfio itself
> > > > > > > has no idea which case applies, so I'm not sure how a user knows
> > > > > > > whether using this option can actually meet their purpose. I may be
> > > > > > > missing something here; possibly using DPDK as an example would make
> > > > > > > it clearer.
> > > > >
> > > > > There is still the entire vfio userspace driver interface. Imagine for
> > > > > > example that QEMU emulates the SR-IOV capability and makes a call out
> > > > > to libvirt (or maybe runs with privs for the PF SR-IOV sysfs attribs)
> > > > > when the guest enables SR-IOV. Can't we assume that any PF specific
> > > > > support can still be performed in the userspace/guest driver, leaving
> > > > > us with a very simple and generic sriov_configure callback in vfio-pci?
> > > >
> > > > > Makes sense. One concern, though, is how a user could be warned
> > > > > if they inadvertently use sysfs to enable SR-IOV on a vfio device whose
> > > > > userspace driver is incapable of handling it. Note that any VFIO device,
> > > > > if SR-IOV capable, will allow the user to do so once the module option is
> > > > > turned on and the callback is registered. I felt such uncertainty could be
> > > > > contained by toggling SR-IOV through a vfio API, but from your description
> > > > > that is obviously what you want to avoid. Is it due to ordering,
> > > > > e.g. that SR-IOV must be enabled before the userspace PF driver sets the
> > > > > token?
> > >
> > > As in my other reply, enabling SR-IOV via a vfio API suggests that
> > > we're not only granting the user owning the PF device access to the
> > > device itself, but also the ability to create and remove subordinate
> > > devices on the host. That implies an extended degree of trust in the
> > > user beyond the PF device itself and raises questions about whether a
> > > user who is allowed to create VF devices should automatically be
> > > granted access to those VF devices, what the mechanism would be for
> > > that, and how we might re-assign those devices to other users,
> > > potentially including host kernel usage. What I'm proposing here
> > > doesn't preclude some future extension in that direction, but instead
> > > tries to simplify a first step towards enabling SR-IOV by leaving the
> > > SR-IOV enablement and VF assignment in the realm of a privileged system
> > > entity.
> >
> > the intention is clear to me now.
> >
> > >
> > > So, what I think you're suggesting here is that we should restrict
> > > vfio_pci_sriov_configure() to reject enabling SR-IOV until a user
> > > driver has configured a VF token. That requires both that the
> > > userspace driver has initialized to this point before SR-IOV can be
> > > enabled and that we would be forced to define a termination point for
> > > the user set VF token. Logically, this would need to be when the
> > > userspace driver exits or closes the PF device, which implies that we
> > > need to disable SR-IOV on the PF at this point, or we're left in an
> > > inconsistent state where VFs are enabled but cannot be disabled because
> > > we don't have a valid VF token. Now we're back to nearly a state where
> > > the user has control of not creating devices on the host, but removing
> > > them by closing the device, which will necessarily require that any VF
> > > driver release the device, whether userspace or kernel.
> > >
> > > I'm not sure what we're gaining by doing this though. I agree that
> > > there will be users that enable SR-IOV on a PF and then try to, for
> > > example, assign the PF and all the VFs to a VM. The VFs will fail due
> > > to lacking VF token support, unless they've patched QEMU with my test
> > > code, but depending on the PF driver in the guest, it may, or more
> > > likely won't work. But don't you think the VFs and probably PF not
> > > working is a sufficient clue that the configuration is invalid? OTOH,
> > > from what I've heard of the device in the ID table of the pci-pf-stub
> > > driver, they might very well be able to work with both PF and VFs in
> > > QEMU using only my test code to set the VF token.
> > >
> > > Therefore, I'm afraid what you're asking for here is to impose a usage
> > > restriction as a sanity test, when we don't really know what might be
> > > sane for this particular piece of hardware or use case. There are
> > > infinite ways that a vfio based userspace driver can fail to configure
> > > their hardware and make it work correctly, many of them are device
> > > specific. Isn't this just one of those cases? Thanks,
> > >
> >
> > What you said all makes sense, so I withdraw the idea of manipulating
> > SR-IOV through a vfio ioctl. However, I still feel that simply registering
> > the sriov_configure callback in vfio-pci somehow violates the typical
> > expectation of the sysfs interface. Before this patch, a successful return
> > from writing a non-zero value to numvfs implied that the VFs are in a sane
> > state and functionally ready for immediate use. Now the meaning of a
> > successful return becomes undefined for vfio devices, since even vfio-pci
> > itself doesn't know whether VFs are functional for a random device
> > (it may know for some, if they carry the same device IDs as pci-pf-stub).
> > It simply relies on the privileged entity knowing exactly the implication
> > of such a write, while there is no way to warn inadvertent users, which
> > to me is not a good design from a kernel API p.o.v. Of course we may
> > document this restriction, and the driver_override may also be an
> > indirect way to warn such a user if they want to use VFs for other purposes.
> > But it is still less elegant than reporting it in the first place. Maybe
> > what we really require is a new sysfs attribute purely for enabling the
> > PCI SR-IOV capability, which doesn't imply making VFs actually
> > functional as the existing numvfs does?
>
> I don't read the same guarantee into the sysfs SR-IOV interface. If
> such a guarantee exists, it's already broken by pci-pf-stub, which like
> vfio-pci allows dynamic IDs and driver_override to bind to any PF device
> allowing the ability to create (potentially) non-functional VFs. I

I don't know whether others have raised a similar concern, or how it
was addressed for pci-pf-stub. Many places describe numvfs as the
preferred interface to enable/disable VFs, and 'enable' reads to me
as implying that the VFs are functional.

> think it would be a really bad decision to fork a new sysfs interface
> for this. I've already made SR-IOV support in vfio-pci an opt-in via a
> module option, would it ease your concerns if I elaborate in the text
> for the option that enabling SR-IOV may depend on support provided by a
> vfio-pci userspace driver?

Sure.

>
> I think that without absolutely knowing that an operation is incorrect,
> we're just generating noise and confusion by triggering warnings or
> developing alternate interfaces. Unfortunately, we have no generic
> means of knowing that an operation is incorrect, so I assume the best.
> Thanks,
>
> Alex
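
For context on the "numvfs" interface debated above: on a typical host this is
the sriov_numvfs attribute under the PF's PCI sysfs directory, written by a
privileged entity. The following is a minimal sketch under stated assumptions
rather than anything from the series itself: sriov_numvfs_path() and
set_numvfs() are hypothetical helpers, and the sysfs root is passed as a
parameter (normally "/sys") only so the path handling can be exercised
without real hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "<root>/bus/pci/devices/<bdf>/sriov_numvfs" into out. */
static int sriov_numvfs_path(char *out, size_t len, const char *root,
			     const char *bdf)
{
	int n;

	assert(out && root && bdf);
	n = snprintf(out, len, "%s/bus/pci/devices/%s/sriov_numvfs", root, bdf);
	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}

/*
 * Write the requested VF count to the PF's sriov_numvfs attribute.
 * Returns 0 on success or a negative errno value. Whether the resulting
 * VFs are actually usable is not something this write can tell you,
 * which is the point under discussion in the thread.
 */
static int set_numvfs(const char *root, const char *bdf, unsigned int count)
{
	char path[256];
	FILE *f;
	int ret;

	ret = sriov_numvfs_path(path, sizeof(path), root, bdf);
	if (ret)
		return ret;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%u\n", count) < 0)
		ret = -EIO;

	/* The buffered write reaches the kernel on close, so check it too. */
	if (fclose(f) != 0 && !ret)
		ret = -errno;

	return ret;
}
```

A privileged helper would call something like set_numvfs("/sys",
"0000:03:00.0", 2) to create two VFs and pass 0 to remove them again; the
device address here is purely illustrative.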

2020-03-09 03:54:41

by Jason Wang

Subject: Re: [PATCH v2 0/7] vfio/pci: SR-IOV support


On 2020/3/7 12:24 AM, Alex Williamson wrote:
> On Fri, 6 Mar 2020 11:35:21 +0800
> Jason Wang <[email protected]> wrote:
>
>> On 2020/3/6 1:14 AM, Alex Williamson wrote:
>>> On Tue, 25 Feb 2020 14:09:07 +0800
>>> Jason Wang <[email protected]> wrote:
>>>
>>>> On 2020/2/25 10:33 AM, Tian, Kevin wrote:
>>>>>> From: Alex Williamson
>>>>>> Sent: Thursday, February 20, 2020 2:54 AM
>>>>>>
>>>>>> Changes since v1 are primarily to patch 3/7 where the commit log is
>>>>>> rewritten, along with option parsing and failure logging based on
>>>>>> upstream discussions. The primary user visible difference is that
>>>>>> option parsing is now much more strict. If a vf_token option is
>>>>>> provided that cannot be used, we generate an error. As a result of
>>>>>> this, opening a PF with a vf_token option will serve as a mechanism of
>>>>>> setting the vf_token. This seems like a more user friendly API than
>>>>>> the alternative of sometimes requiring the option (VFs in use) and
>>>>>> sometimes rejecting it, and upholds our desire that the option is
>>>>>> always either used or rejected.
>>>>>>
>>>>>> This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
>>>>>> means of setting the VF token, which might call into question whether
>>>>>> we absolutely need this new ioctl. Currently I'm keeping it because I
>>>>>> can imagine use cases, for example if a hypervisor were to support
>>>>>> SR-IOV, the PF device might be opened without consideration for a VF
>>>>>> token and we'd require the hypervisor to close and re-open the PF in
>>>>>> order to set a known VF token, which is impractical.
>>>>>>
>>>>>> Series overview (same as provided with v1):
>>>>> Thanks for doing this!
>>>>>
>>>>>> The synopsis of this series is that we have an ongoing desire to drive
>>>>>> PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
>>>>>> for this with DPDK drivers and potentially interesting future use
>>>>> Can you provide a link to the DPDK discussion?
>>>>>
>>>>>> cases in virtualization. We've been reluctant to add this support
>>>>>> previously due to the dependency and trust relationship between the
>>>>>> VF device and PF driver. Minimally the PF driver can induce a denial
>>>>>> of service to the VF, but depending on the specific implementation,
>>>>>> the PF driver might also be responsible for moving data between VFs
>>>>>> or have direct access to the state of the VF, including data or state
>>>>>> otherwise private to the VF or VF driver.
>>>>> Just thinking out loud. While the motivation for the VF token sounds reasonable
>>>>> to me, I'm curious why the same concern is not raised in other usages.
>>>>> For example, there is no such design in the virtio framework, where the
>>>>> virtio device could also be restarted, put in a separate process (vhost-user),
>>>>> or even in a separate VM (virtio-vhost-user), etc.
>>>> AFAIK, the restart could only be triggered by either VM or qemu. But
>>>> yes, the datapath could be offloaded.
>>>>
>>>> But I'm not sure introducing another dedicated mechanism is better than
>>>> using the existing generic POSIX mechanism to make sure the connection
>>>> (AF_UNIX) is secure.
>>>>
>>>>
>>>>> Of course the para-
>>>>> virtualized attribute of virtio implies some degree of trust, but as you
>>>>> mentioned many SR-IOV implementations support VF->PF communication
>>>>> which also implies some level of trust. It's perfectly fine if VFIO just tries
>>>>> to do better than other sub-systems, but knowing how other people
>>>>> tackle the similar problem may make the whole picture clearer.
>>>>>
>>>>> +Jason.
>>>> I'm not quite sure that, e.g., allowing a userspace PF driver with a kernel
>>>> VF driver would not break the assumptions of the kernel security model. At
>>>> least we should forbid an unprivileged PF driver running in userspace.
>>> It might be useful to have your opinion on this series, because that's
>>> exactly what we're trying to do here. Various environments, DPDK
>>> specifically, want a userspace PF driver. This series takes steps to
>>> mitigate the risk of having such a driver, such as requiring this VF
>>> token interface to extend the VFIO interface and validate participation
>>> around a PF that is not considered trusted by the kernel.
>>
>> I may be missing something. But what happens if:
>>
>> - The PF driver is run by an unprivileged user
>> - The PF is programmed to send translated DMA requests
>> - Then the unprivileged user can mangle kernel data
> ATS is a security risk regardless of SR-IOV, how does this change it?
> Thanks,


My understanding is that the ATS issue only happens with some buggy
devices. Some hardware has an on-chip IOMMU, which probably means an
unprivileged userspace PF driver can control the on-chip IOMMU in this case.

Thanks


>
> Alex
>
>>> We also set
>>> a driver_override to try to make sure no host kernel driver can
>>> automatically bind to a VF of a user owned PF, only vfio-pci, but we
>>> don't prevent the admin from creating configurations where the VFs are
>>> used by other host kernel drivers.
>>>
>>> I think the question Kevin is inquiring about is whether virtio devices
>>> are susceptible to the type of collaborative, shared key environment
>>> we're creating here. For example, can a VM or qemu have access to
>>> reset a virtio device in a way that could affect other devices, ex. FLR
>>> on a PF that could interfere with VF operation. Thanks,
>>
>> Right, but I'm not sure whether it can be done via virtio alone or needs
>> support from the transport (e.g. PCI).
>>
>> Thanks
>>
>>
>>> Alex
>>>

2020-03-09 14:47:39

by Alex Williamson

Subject: Re: [PATCH v2 0/7] vfio/pci: SR-IOV support

On Mon, 9 Mar 2020 11:36:46 +0800
Jason Wang <[email protected]> wrote:

> On 2020/3/7 12:24 AM, Alex Williamson wrote:
> > On Fri, 6 Mar 2020 11:35:21 +0800
> > Jason Wang <[email protected]> wrote:
> >
> >> On 2020/3/6 1:14 AM, Alex Williamson wrote:
> >>> On Tue, 25 Feb 2020 14:09:07 +0800
> >>> Jason Wang <[email protected]> wrote:
> >>>
> >>>> On 2020/2/25 10:33 AM, Tian, Kevin wrote:
> >>>>>> From: Alex Williamson
> >>>>>> Sent: Thursday, February 20, 2020 2:54 AM
> >>>>>>
> >>>>>> Changes since v1 are primarily to patch 3/7 where the commit log is
> >>>>>> rewritten, along with option parsing and failure logging based on
> >>>>>> upstream discussions. The primary user visible difference is that
> >>>>>> option parsing is now much more strict. If a vf_token option is
> >>>>>> provided that cannot be used, we generate an error. As a result of
> >>>>>> this, opening a PF with a vf_token option will serve as a mechanism of
> >>>>>> setting the vf_token. This seems like a more user friendly API than
> >>>>>> the alternative of sometimes requiring the option (VFs in use) and
> >>>>>> sometimes rejecting it, and upholds our desire that the option is
> >>>>>> always either used or rejected.
> >>>>>>
> >>>>>> This also means that the VFIO_DEVICE_FEATURE ioctl is not the only
> >>>>>> means of setting the VF token, which might call into question whether
> >>>>>> we absolutely need this new ioctl. Currently I'm keeping it because I
> >>>>>> can imagine use cases, for example if a hypervisor were to support
> >>>>>> SR-IOV, the PF device might be opened without consideration for a VF
> >>>>>> token and we'd require the hypervisor to close and re-open the PF in
> >>>>>> order to set a known VF token, which is impractical.
> >>>>>>
> >>>>>> Series overview (same as provided with v1):
> >>>>> Thanks for doing this!
> >>>>>
> >>>>>> The synopsis of this series is that we have an ongoing desire to drive
> >>>>>> PCIe SR-IOV PFs from userspace with VFIO. There's an immediate need
> >>>>>> for this with DPDK drivers and potentially interesting future use
> >>>>> Can you provide a link to the DPDK discussion?
> >>>>>
> >>>>>> cases in virtualization. We've been reluctant to add this support
> >>>>>> previously due to the dependency and trust relationship between the
> >>>>>> VF device and PF driver. Minimally the PF driver can induce a denial
> >>>>>> of service to the VF, but depending on the specific implementation,
> >>>>>> the PF driver might also be responsible for moving data between VFs
> >>>>>> or have direct access to the state of the VF, including data or state
> >>>>>> otherwise private to the VF or VF driver.
> >>>>> Just thinking out loud. While the motivation for the VF token sounds reasonable
> >>>>> to me, I'm curious why the same concern is not raised in other usages.
> >>>>> For example, there is no such design in the virtio framework, where the
> >>>>> virtio device could also be restarted, put in a separate process (vhost-user),
> >>>>> or even in a separate VM (virtio-vhost-user), etc.
> >>>> AFAIK, the restart could only be triggered by either VM or qemu. But
> >>>> yes, the datapath could be offloaded.
> >>>>
> >>>> But I'm not sure introducing another dedicated mechanism is better than
> >>>> using the existing generic POSIX mechanism to make sure the connection
> >>>> (AF_UNIX) is secure.
> >>>>
> >>>>
> >>>>> Of course the para-
> >>>>> virtualized attribute of virtio implies some degree of trust, but as you
> >>>>> mentioned many SR-IOV implementations support VF->PF communication
> >>>>> which also implies some level of trust. It's perfectly fine if VFIO just tries
> >>>>> to do better than other sub-systems, but knowing how other people
> >>>>> tackle the similar problem may make the whole picture clearer.
> >>>>>
> >>>>> +Jason.
> >>>> I'm not quite sure that, e.g., allowing a userspace PF driver with a kernel
> >>>> VF driver would not break the assumptions of the kernel security model. At
> >>>> least we should forbid an unprivileged PF driver running in userspace.
> >>> It might be useful to have your opinion on this series, because that's
> >>> exactly what we're trying to do here. Various environments, DPDK
> >>> specifically, want a userspace PF driver. This series takes steps to
> >>> mitigate the risk of having such a driver, such as requiring this VF
> >>> token interface to extend the VFIO interface and validate participation
> >>> around a PF that is not considered trusted by the kernel.
> >>
> >> I may be missing something. But what happens if:
> >>
> >> - The PF driver is run by an unprivileged user
> >> - The PF is programmed to send translated DMA requests
> >> - Then the unprivileged user can mangle kernel data
> > ATS is a security risk regardless of SR-IOV, how does this change it?
> > Thanks,
>
>
> My understanding is that the ATS issue only happens with some buggy
> devices. Some hardware has an on-chip IOMMU, which probably means an
> unprivileged userspace PF driver can control the on-chip IOMMU in this case.

Again, how does this relate to SR-IOV? A PF is currently assignable
regardless of the support in this series. Thanks,

Alex

2020-03-09 14:57:20

by Alex Williamson

Subject: Re: [PATCH v2 5/7] vfio/pci: Add sriov_configure support

On Mon, 9 Mar 2020 01:48:11 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Alex Williamson
> > Sent: Monday, March 9, 2020 8:46 AM
> >
> > On Sat, 7 Mar 2020 01:35:23 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > From: Alex Williamson
> > > > Sent: Saturday, March 7, 2020 6:18 AM
> > > >
> > > > On Fri, 6 Mar 2020 07:57:19 +0000
> > > > "Tian, Kevin" <[email protected]> wrote:
> > > >
> > > > > > From: Alex Williamson <[email protected]>
> > > > > > Sent: Friday, March 6, 2020 2:23 AM
> > > > > >
> > > > > > On Tue, 25 Feb 2020 03:08:00 +0000
> > > > > > "Tian, Kevin" <[email protected]> wrote:
> > > > > >
> > > > > > > > From: Alex Williamson
> > > > > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > > > > >
> > > > > > > > > With the VF Token interface we can now expect that a vfio userspace
> > > > > > > > > driver must be in collaboration with the PF driver, an unwitting
> > > > > > > > > userspace driver will not be able to get past the GET_DEVICE_FD step
> > > > > > > > > in accessing the device. We can now move on to actually allowing
> > > > > > > > > SR-IOV to be enabled by vfio-pci on the PF. Support for this is not
> > > > > > > > > enabled by default in this commit, but it does provide a module option
> > > > > > > > > for this to be enabled (enable_sriov=1). Enabling VFs is rather
> > > > > > > > > straightforward, except we don't want to risk that a VF might get
> > > > > > > > > autoprobed and bound to other drivers, so a bus notifier is used to
> > > > > > > > > "capture" VFs to vfio-pci using the driver_override support. We
> > > > > > > > > assume any later action to bind the device to other drivers is
> > > > > > > > > condoned by the system admin and allow it with a log warning.
> > > > > > > > >
> > > > > > > > > vfio-pci will disable SR-IOV on a PF before releasing the device,
> > > > > > > > > allowing a VF driver to be assured other drivers cannot take over the
> > > > > > > > > PF and that any other userspace driver must know the shared VF token.
> > > > > > > > > This support also does not provide a mechanism for the PF userspace
> > > > > > > > > driver itself to manipulate SR-IOV through the vfio API. With this
> > > > > > > > > patch SR-IOV can only be enabled via the host sysfs interface and the
> > > > > > > > > PF driver user cannot create or remove VFs.
> > > > > > >
> > > > > > > > I'm not sure how many devices can be properly configured simply
> > > > > > > > with pci_enable_sriov. It is not unusual for a PF driver to have to
> > > > > > > > prepare something before turning on the PCI SR-IOV capability. If you
> > > > > > > > look at the kernel PF drivers, only two use the generic
> > > > > > > > pci_sriov_configure_simple (a simple wrapper around pci_enable_sriov),
> > > > > > > > while most others implement their own callback. However, vfio itself
> > > > > > > > has no idea which case applies, so I'm not sure how a user knows
> > > > > > > > whether using this option can actually meet their purpose. I may be
> > > > > > > > missing something here; possibly using DPDK as an example would make
> > > > > > > > it clearer.
> > > > > >
> > > > > > There is still the entire vfio userspace driver interface. Imagine for
> > > > > > > example that QEMU emulates the SR-IOV capability and makes a call out
> > > > > > to libvirt (or maybe runs with privs for the PF SR-IOV sysfs attribs)
> > > > > > when the guest enables SR-IOV. Can't we assume that any PF specific
> > > > > > support can still be performed in the userspace/guest driver, leaving
> > > > > > us with a very simple and generic sriov_configure callback in vfio-pci?
> > > > >
> > > > > > Makes sense. One concern, though, is how a user could be warned
> > > > > > if they inadvertently use sysfs to enable SR-IOV on a vfio device whose
> > > > > > userspace driver is incapable of handling it. Note that any VFIO device,
> > > > > > if SR-IOV capable, will allow the user to do so once the module option is
> > > > > > turned on and the callback is registered. I felt such uncertainty could be
> > > > > > contained by toggling SR-IOV through a vfio API, but from your description
> > > > > > that is obviously what you want to avoid. Is it due to ordering,
> > > > > > e.g. that SR-IOV must be enabled before the userspace PF driver sets the
> > > > > > token?
> > > >
> > > > As in my other reply, enabling SR-IOV via a vfio API suggests that
> > > > we're not only granting the user owning the PF device access to the
> > > > device itself, but also the ability to create and remove subordinate
> > > > devices on the host. That implies an extended degree of trust in the
> > > > user beyond the PF device itself and raises questions about whether a
> > > > user who is allowed to create VF devices should automatically be
> > > > granted access to those VF devices, what the mechanism would be for
> > > > that, and how we might re-assign those devices to other users,
> > > > potentially including host kernel usage. What I'm proposing here
> > > > doesn't preclude some future extension in that direction, but instead
> > > > tries to simplify a first step towards enabling SR-IOV by leaving the
> > > > SR-IOV enablement and VF assignment in the realm of a privileged system
> > > > entity.
> > >
> > > the intention is clear to me now.
> > >
> > > >
> > > > So, what I think you're suggesting here is that we should restrict
> > > > vfio_pci_sriov_configure() to reject enabling SR-IOV until a user
> > > > driver has configured a VF token. That requires both that the
> > > > userspace driver has initialized to this point before SR-IOV can be
> > > > enabled and that we would be forced to define a termination point for
> > > > the user set VF token. Logically, this would need to be when the
> > > > userspace driver exits or closes the PF device, which implies that we
> > > > need to disable SR-IOV on the PF at this point, or we're left in an
> > > > inconsistent state where VFs are enabled but cannot be disabled because
> > > > we don't have a valid VF token. Now we're back to nearly a state where
> > > > the user has control of not creating devices on the host, but removing
> > > > them by closing the device, which will necessarily require that any VF
> > > > driver release the device, whether userspace or kernel.
> > > >
> > > > I'm not sure what we're gaining by doing this though. I agree that
> > > > there will be users that enable SR-IOV on a PF and then try to, for
> > > > example, assign the PF and all the VFs to a VM. The VFs will fail due
> > > > to lacking VF token support, unless they've patched QEMU with my test
> > > > code, but depending on the PF driver in the guest, it may, or more
> > > > likely won't work. But don't you think the VFs and probably PF not
> > > > working is a sufficient clue that the configuration is invalid? OTOH,
> > > > from what I've heard of the device in the ID table of the pci-pf-stub
> > > > driver, they might very well be able to work with both PF and VFs in
> > > > QEMU using only my test code to set the VF token.
> > > >
> > > > Therefore, I'm afraid what you're asking for here is to impose a usage
> > > > restriction as a sanity test, when we don't really know what might be
> > > > sane for this particular piece of hardware or use case. There are
> > > > infinite ways that a vfio based userspace driver can fail to configure
> > > > their hardware and make it work correctly, many of them are device
> > > > specific. Isn't this just one of those cases? Thanks,
> > > >
> > >
> > > what you said all makes sense. so I withdraw the idea of manipulating
> > > SR-IOV through vfio ioctl. However I still feel that simply registering
> > > sriov_configure callback by vfio-pci somehow violates the typical
> > > expectation of the sysfs interface. Before this patch, the success return
> > > of writing non-zero value to numvfs implies VFs are in sane state and
> > > functionally ready for immediate use. However now the behavior of
> > > success return becomes undefined for vfio devices, since even vfio-pci
> > > itself doesn't know whether VFs are functional for a random device
> > > (may know some if carrying the same device IDs from pci-pf-stub). It
> > > simply relies on the privileged entity who knows exactly the implication
> > > of such write, while there is no way to warn inadvertent users which
> > > to me is not a good design from kernel API p.o.v. Of course we may
> > > document such restriction and the driver_override may also be an
> > > indirect way to warn such user if he wants to use VFs for other purpose.
> > > But it is still less elegant than reporting it in the first place. Maybe
> > > what we really require is a new sysfs attribute purely for enabling
> > > PCI SR-IOV capability, which doesn't imply making VFs actually
> > > functional as did through the existing numvfs?
> >
> > I don't read the same guarantee into the sysfs SR-IOV interface. If
> > such a guarantee exists, it's already broken by pci-pf-stub, which like
> > vfio-pci allows dynamic IDs and driver_override to bind to any PF device
> > allowing the ability to create (potentially) non-functional VFs. I
>
> I don't know whether others raised the similar concern and how
> it was addressed for pci-pf-stub before. Many places describe
> numvfs as the preferred interface to enable/disable VFs while
> 'enable' just reads functional to me.

From a PCI perspective, they are functional. We've enabled them in the
sense that they appear on the bus. Whether they are functional or not
depends on device specific configuration. If I take your definition to
an extreme, it seems that we might for example not allow SR-IOV to be
enabled unless an 82576 PF has the network link up because the VF
wouldn't be able to route packets until that point. Do we require that
the igb PF driver generates a warning that VFs might not be functional
if the link is down when SR-IOV is enabled? I'm absolutely not
recommending we do this, I'm just pointing out that I think a different
standard is being suggested here than actually exists. Thanks,

Alex

2020-03-09 15:36:44

by Alex Williamson

Subject: Re: [PATCH v2 3/7] vfio/pci: Introduce VF token

On Mon, 9 Mar 2020 01:33:48 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Tian, Kevin
> > Sent: Monday, March 9, 2020 9:22 AM
> >
> > > From: Alex Williamson <[email protected]>
> > > Sent: Monday, March 9, 2020 8:46 AM
> > >
> > > On Sat, 7 Mar 2020 01:04:41 +0000
> > > "Tian, Kevin" <[email protected]> wrote:
> > >
> > > > > From: Alex Williamson <[email protected]>
> > > > > Sent: Friday, March 6, 2020 11:39 PM
> > > > >
> > > > > On Fri, 6 Mar 2020 08:32:40 +0000
> > > > > "Tian, Kevin" <[email protected]> wrote:
> > > > >
> > > > > > > From: Alex Williamson <[email protected]>
> > > > > > > Sent: Friday, March 6, 2020 2:18 AM
> > > > > > >
> > > > > > > On Tue, 25 Feb 2020 02:59:37 +0000
> > > > > > > "Tian, Kevin" <[email protected]> wrote:
> > > > > > >
> > > > > > > > > From: Alex Williamson
> > > > > > > > > Sent: Thursday, February 20, 2020 2:54 AM
> > > > > > > > >
> > > > > > > > > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > > > > > > > > fully isolated from the PF. The PF can always cause a denial of service
> > > > > > > > > > to the VF, even if by simply resetting itself. The degree to which a PF
> > > > > > > > > > can access the data passed through a VF or interfere with its operation
> > > > > > > > > > is dependent on a given SR-IOV implementation. Therefore we want to
> > > > > > > > > > avoid a scenario where an existing vfio-pci based userspace driver might
> > > > > > > > > > assume the PF driver is trusted, for example assigning a PF to one VM
> > > > > > > > > > and VF to another with some expectation of isolation. IOMMU grouping
> > > > > > > > > > could be a solution to this, but imposes an unnecessarily strong
> > > > > > > > > > relationship between PF and VF drivers if they need to operate with the
> > > > > > > > > > same IOMMU context. Instead we introduce a "VF token", which is
> > > > > > > > > > essentially just a shared secret between PF and VF drivers, implemented
> > > > > > > > > > as a UUID.
> > > > > > > > > >
> > > > > > > > > > The VF token can be set by a vfio-pci based PF driver and must be known
> > > > > > > > > > by the vfio-pci based VF driver in order to gain access to the device.
> > > > > > > > > > This allows the degree to which this VF token is considered secret to be
> > > > > > > > > > determined by the applications and environment. For example a VM might
> > > > > > > > > > generate a random UUID known only internally to the hypervisor while a
> > > > > > > > > > userspace networking appliance might use a shared, or even well known,
> > > > > > > > > > UUID among the application drivers.
> > > > > > > > > >
> > > > > > > > > > To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
> > > > > > > > > > extended to accept key=value pairs in addition to the device name. This
> > > > > > > > > > allows us to most easily deny user access to the device without risk
> > > > > > > > > > that existing userspace drivers assume region offsets, IRQs, and other
> > > > > > > > > > device features, leading to more elaborate error paths. The format of
> > > > > > > > > > these options is expected to take the form:
> > > > > > > > > >
> > > > > > > > > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
> > > > > > > > > >
> > > > > > > > > > Where the device name is always provided first for compatibility and
> > > > > > > > > > additional options are specified in a space separated list. The
> > > > > > > > > > relation between and requirements for the additional options will be
> > > > > > > > > > vfio bus driver dependent, however unknown or unused options within this
> > > > > > > > > > schema should return an error. This allows for future use of unknown
> > > > > > > > > > options as well as a positive indication to the user that an option is
> > > > > > > > > > used.
> > > > > > > > > >
> > > > > > > > > > An example VF token option would take this form:
> > > > > > > > > >
> > > > > > > > > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > > > > > > > > >
> > > > > > > > > > When accessing a VF where the PF is making use of vfio-pci, the user
> > > > > > > > > > MUST provide the current vf_token. When accessing a PF, the user MUST
> > > > > > > > > > provide the current vf_token IF there are active VF users or MAY provide
> > > > > > > > > > a vf_token in order to set the current VF token when no VF users are
> > > > > > > > > > active. The former requirement assures VF users that an unassociated
> > > > > > > > > > driver cannot usurp the PF device. These semantics also imply that a
> > > > > > > > > > VF token MUST be set by a PF driver before VF drivers can access their
> > > > > > > > > > device; the default token is random and mechanisms to read the token are
> > > > > > > > > > not provided in order to protect the VF token of previous users. Use of
> > > > > > > > > > the vf_token option outside of these cases will return an error, as
> > > > > > > > > > discussed above.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > > > > > ---
> > > > > > > > > > drivers/vfio/pci/vfio_pci.c | 198 +++++++++++++++++++++++++++++++++++
> > > > > > > > > > drivers/vfio/pci/vfio_pci_private.h | 8 +
> > > > > > > > > > 2 files changed, 205 insertions(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > > > > > > > index 2ec6c31d0ab0..8dd6ef9543ca 100644
> > > > > > > > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > > > > > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > > > > > > > @@ -466,6 +466,44 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> > > > > > > > > vfio_pci_set_power_state(vdev, PCI_D3hot);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +static struct pci_driver vfio_pci_driver;
> > > > > > > > > +
> > > > > > > > > > +static struct vfio_pci_device *get_pf_vdev(struct vfio_pci_device *vdev,
> > > > > > > > > > +                                           struct vfio_device **pf_dev)
> > > > > > > > > > +{
> > > > > > > > > > +        struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > > > > > > > > +
> > > > > > > > > > +        if (!vdev->pdev->is_virtfn)
> > > > > > > > > > +                return NULL;
> > > > > > > > > > +
> > > > > > > > > > +        *pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > > > > > > > > +        if (!*pf_dev)
> > > > > > > > > > +                return NULL;
> > > > > > > > > > +
> > > > > > > > > > +        if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > > > > > > > > +                vfio_device_put(*pf_dev);
> > > > > > > > > > +                return NULL;
> > > > > > > > > > +        }
> > > > > > > > > > +
> > > > > > > > > > +        return vfio_device_data(*pf_dev);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
> > > > > > > > > > +{
> > > > > > > > > > +        struct vfio_device *pf_dev;
> > > > > > > > > > +        struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > > > > > > > +
> > > > > > > > > > +        if (!pf_vdev)
> > > > > > > > > > +                return;
> > > > > > > > > > +
> > > > > > > > > > +        mutex_lock(&pf_vdev->vf_token->lock);
> > > > > > > > > > +        pf_vdev->vf_token->users += val;
> > > > > > > > > > +        WARN_ON(pf_vdev->vf_token->users < 0);
> > > > > > > > > > +        mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > > > > > > +
> > > > > > > > > > +        vfio_device_put(pf_dev);
> > > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > >  static void vfio_pci_release(void *device_data)
> > > > > > > > > >  {
> > > > > > > > > >          struct vfio_pci_device *vdev = device_data;
> > > > > > > > > > @@ -473,6 +511,7 @@ static void vfio_pci_release(void *device_data)
> > > > > > > > > >          mutex_lock(&vdev->reflck->lock);
> > > > > > > > > >
> > > > > > > > > >          if (!(--vdev->refcnt)) {
> > > > > > > > > > +                vfio_pci_vf_token_user_add(vdev, -1);
> > > > > > > > > >                  vfio_spapr_pci_eeh_release(vdev->pdev);
> > > > > > > > > >                  vfio_pci_disable(vdev);
> > > > > > > > > >          }
> > > > > > > > > > @@ -498,6 +537,7 @@ static int vfio_pci_open(void *device_data)
> > > > > > > > > >                          goto error;
> > > > > > > > > >
> > > > > > > > > >                  vfio_spapr_pci_eeh_open(vdev->pdev);
> > > > > > > > > > +                vfio_pci_vf_token_user_add(vdev, 1);
> > > > > > > > > >          }
> > > > > > > > > >          vdev->refcnt++;
> > > > > > > > > >  error:
> > > > > > > > > > @@ -1278,11 +1318,148 @@ static void vfio_pci_request(void *device_data, unsigned int count)
> > > > > > > > > >          mutex_unlock(&vdev->igate);
> > > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > > +static int vfio_pci_validate_vf_token(struct vfio_pci_device *vdev,
> > > > > > > > > > +                                      bool vf_token, uuid_t *uuid)
> > > > > > > > > > +{
> > > > > > > > > > +        /*
> > > > > > > > > > +         * There's always some degree of trust or collaboration between SR-IOV
> > > > > > > > > > +         * PF and VFs, even if just that the PF hosts the SR-IOV capability and
> > > > > > > > > > +         * can disrupt VFs with a reset, but often the PF has more explicit
> > > > > > > > > > +         * access to deny service to the VF or access data passed through the
> > > > > > > > > > +         * VF. We therefore require an opt-in via a shared VF token (UUID) to
> > > > > > > > > > +         * represent this trust. This both prevents that a VF driver might
> > > > > > > > > > +         * assume the PF driver is a trusted, in-kernel driver, and also that
> > > > > > > > > > +         * a PF driver might be replaced with a rogue driver, unknown to in-use
> > > > > > > > > > +         * VF drivers.
> > > > > > > > > > +         *
> > > > > > > > > > +         * Therefore when presented with a VF, if the PF is a vfio device and
> > > > > > > > > > +         * it is bound to the vfio-pci driver, the user needs to provide a VF
> > > > > > > > > > +         * token to access the device, in the form of appending a vf_token to
> > > > > > > > > > +         * the device name, for example:
> > > > > > > > > > +         *
> > > > > > > > > > +         * "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
> > > > > > > > > > +         *
> > > > > > > > > > +         * When presented with a PF which has VFs in use, the user must also
> > > > > > > > > > +         * provide the current VF token to prove collaboration with existing
> > > > > > > > > > +         * VF users. If VFs are not in use, the VF token provided for the PF
> > > > > > > > > > +         * device will act to set the VF token.
> > > > > > > > > > +         *
> > > > > > > > > > +         * If the VF token is provided but unused, a fault is generated.
> > > > > > > >
> > > > > > > > > fault->error, otherwise it is easy to consider a CPU fault.
> > > > > > >
> > > > > > > Ok, I can make that change, but I think you might have a unique
> > > > > > > background to make a leap that a userspace ioctl can trigger a CPU
> > > > > > > fault ;)
> > > > > > >
> > > > > > > > > + */
> > > > > > > > > > +        if (!vdev->pdev->is_virtfn && !vdev->vf_token && !vf_token)
> > > > > > > > > > +                return 0; /* No VF token provided or required */
> > > > > > > > > > +
> > > > > > > > > > +        if (vdev->pdev->is_virtfn) {
> > > > > > > > > > +                struct vfio_device *pf_dev;
> > > > > > > > > > +                struct vfio_pci_device *pf_vdev = get_pf_vdev(vdev, &pf_dev);
> > > > > > > > > > +                bool match;
> > > > > > > > > > +
> > > > > > > > > > +                if (!pf_vdev) {
> > > > > > > > > > +                        if (!vf_token)
> > > > > > > > > > +                                return 0; /* PF is not vfio-pci, no VF token */
> > > > > > > > > > +
> > > > > > > > > > +                        pci_info_ratelimited(vdev->pdev,
> > > > > > > > > > +                                "VF token incorrectly provided, PF not bound to vfio-pci\n");
> > > > > > > > > > +                        return -EINVAL;
> > > > > > > > > > +                }
> > > > > > > > > > +
> > > > > > > > > > +                if (!vf_token) {
> > > > > > > > > > +                        vfio_device_put(pf_dev);
> > > > > > > > > > +                        pci_info_ratelimited(vdev->pdev,
> > > > > > > > > > +                                "VF token required to access device\n");
> > > > > > > > > > +                        return -EACCES;
> > > > > > > > > > +                }
> > > > > > > > > > +
> > > > > > > > > > +                mutex_lock(&pf_vdev->vf_token->lock);
> > > > > > > > > > +                match = uuid_equal(uuid, &pf_vdev->vf_token->uuid);
> > > > > > > > > > +                mutex_unlock(&pf_vdev->vf_token->lock);
> > > > > > > > > > +
> > > > > > > > > > +                vfio_device_put(pf_dev);
> > > > > > > > > > +
> > > > > > > > > > +                if (!match) {
> > > > > > > > > > +                        pci_info_ratelimited(vdev->pdev,
> > > > > > > > > > +                                "Incorrect VF token provided for device\n");
> > > > > > > > > > +                        return -EACCES;
> > > > > > > > > > +                }
> > > > > > > > > > +        } else if (vdev->vf_token) {
> > > > > > > > > > +                mutex_lock(&vdev->vf_token->lock);
> > > > > > > > > > +                if (vdev->vf_token->users) {
> > > > > > > > > > +                        if (!vf_token) {
> > > > > > > > > > +                                mutex_unlock(&vdev->vf_token->lock);
> > > > > > > > > > +                                pci_info_ratelimited(vdev->pdev,
> > > > > > > > > > +                                        "VF token required to access device\n");
> > > > > > > > > > +                                return -EACCES;
> > > > > > > > > > +                        }
> > > > > > > > > > +
> > > > > > > > > > +                        if (!uuid_equal(uuid, &vdev->vf_token->uuid)) {
> > > > > > > > > > +                                mutex_unlock(&vdev->vf_token->lock);
> > > > > > > > > > +                                pci_info_ratelimited(vdev->pdev,
> > > > > > > > > > +                                        "Incorrect VF token provided for device\n");
> > > > > > > > > > +                                return -EACCES;
> > > > > > > > > > +                        }
> > > > > > > > > > +                } else if (vf_token) {
> > > > > > > > > > +                        uuid_copy(&vdev->vf_token->uuid, uuid);
> > > > > > > > > > +                }
> > > > > > > >
> > > > > > > > It implies that we allow PF to be accessed w/o providing a VF token,
> > > > > > > > as long as no VF is currently in-use, which further means no VF can
> > > > > > > > be further assigned since no one knows the random uuid allocated
> > > > > > > > by vfio. Just want to confirm whether it is the desired flavor. If an
> > > > > > > > user really wants to use PF-only, possibly he should disable SR-IOV
> > > > > > > > instead...
> > > > > > >
> > > > > > > Yes, this is the behavior I'm intending. Are you suggesting that we
> > > > > > > should require a VF token in order to access a PF that has SR-IOV
> > > > > > > already enabled? This introduces an inconsistency that SR-IOV can be
> > > > > >
> > > > > > yes. I felt that it's meaningless otherwise if an user has no attempt to
> > > > > > manage SR-IOV but still leaving it enabled. In many cases, enabling of
> > > > > > SR-IOV may reserve some resource in the hardware, thus simply hurting
> > > > > > PF performance.
> > > > >
> > > > > But a user needs to be granted access to a device by a privileged
> > > > > entity and the privileged entity may also enable SR-IOV, so it seems
> > > > > you're assuming the privileged entity is operating independently and
> > > > > not in the best interest of enabling the specific user case.
> > > >
> > > > what about throwing out a warning for such situation? so the userspace
> > > > knows some collaboration is missing before its access to the device.
> > >
> > > This seems arbitrary. pci-pf-stub proves to us that there are devices
> > > that need no special setup for SR-IOV, we don't know that we don't have
> > > such a device. Enabling SR-IOV after the user opens the device also
>
> btw no special setup doesn't mean that a PF driver cannot do bad thing to
> VFs. In such case, I think the whole token idea should be still applied.

pci-pf-stub is a native host driver that does not provide access to the
device to userspace. We trust native host drivers, whether they be
pci-pf-stub, igb, ixgbe, i40e, etc. We don't require a token to attach
a userspace driver to a VF of one of these PF drivers because we trust
them. The VF token idea is exclusively for cases where the PF driver
is an untrusted userspace driver.
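
To make that rule concrete, here is a minimal user-space sketch of the VF
branch of vfio_pci_validate_vf_token() from the quoted patch. It is only a
model: struct model_uuid, struct model_vf and model_validate_vf() are
hypothetical names invented for illustration, the locking and reference
counting of the real code are left out, and a 16-byte array stands in for
uuid_t.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's uuid_t. */
struct model_uuid { unsigned char b[16]; };

/* Hypothetical, simplified view of a VF and the state of its PF. */
struct model_vf {
	bool pf_is_vfio_pci;        /* is the PF bound to vfio-pci? */
	struct model_uuid pf_token; /* token currently held by the PF */
};

/*
 * Returns 0 if the VF may be opened, otherwise a negative errno.
 * 'token' is NULL when the user supplied no vf_token option.
 */
static int model_validate_vf(const struct model_vf *vf,
			     const struct model_uuid *token)
{
	if (!vf->pf_is_vfio_pci) {
		/*
		 * Trusted native host PF driver: no token is needed, and a
		 * supplied token would go unused, so it is rejected.
		 */
		return token ? -EINVAL : 0;
	}

	/* Untrusted userspace PF driver: the shared secret is mandatory. */
	if (!token)
		return -EACCES;
	if (memcmp(token->b, vf->pf_token.b, sizeof(token->b)) != 0)
		return -EACCES;

	return 0;
}
```

With a native PF driver such as igb the model accepts an open without a
token and rejects one that carries a token; with a vfio-pci PF it accepts
only the matching token.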

> > > doesn't indicate there's necessarily collaboration between the two, so
> > > if we generate a warning on one, how do we assume the other is ok? I
> > > don't really understand why this is generating such concern. Thanks,
>
> specifically I feel we should warn both:
>
> 1) userspace driver GET_DEVICE_FD w/o providing a token on a PF
> which has SR-IOV already enabled
> 2) admin writes non-zero numvfs to a PF which has already bound to
> userspace driver which doesn't provide a token
>
> in both cases VFs are enabled but cannot be used (if you agree that
> the token idea should be also applied to 'no special setup' case)

Both of these seem to be imposing an ordering requirement simply for the
sake of generating a warning. Nothing else requires that ordering. In
case 1), it's just as legitimate for the user to call
VFIO_DEVICE_FEATURE to set the token after they've opened the device.
Perhaps the user would even look at config space via the vfio-pci API
in order to determine that SR-IOV is enabled and set a token. In case
2), the user driver may choose not to set a token until VFs have been
successfully created.
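
As a rough illustration of why no ordering is needed, the PF side of the
quoted validation code can be modelled in user space as below. All names
(model_pf, model_open_pf, model_set_token_later) are hypothetical, and
model_set_token_later() is only an assumed stand-in for the
VFIO_DEVICE_FEATURE path, whose actual arguments are not shown in this
thread.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's uuid_t. */
struct model_uuid { unsigned char b[16]; };

/* Hypothetical, simplified PF state. */
struct model_pf {
	int vf_users;            /* VFs currently opened by users */
	struct model_uuid token; /* starts out random, never readable */
};

/* PF open: 'token' is NULL when no vf_token option was given. */
static int model_open_pf(struct model_pf *pf, const struct model_uuid *token)
{
	if (pf->vf_users > 0) {
		/* VFs in use: prove collaboration with the current token. */
		if (!token)
			return -EACCES;
		if (memcmp(token->b, pf->token.b, sizeof(token->b)) != 0)
			return -EACCES;
		return 0;
	}

	/* No VF users: a supplied option simply sets the token. */
	if (token)
		pf->token = *token;
	return 0;
}

/* Assumed model of setting the token after the PF is already open. */
static void model_set_token_later(struct model_pf *pf,
				  const struct model_uuid *token)
{
	pf->token = *token;
}
```

In this model a PF can be opened without a token, have its token set
afterwards, and only then have VFs opened against it; nothing depends on
whether SR-IOV was enabled before or after the PF was opened.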

> > I meant to warn the suboptimal case where the userspace driver doesn't
> > provide a token when accessing a PF which has SR-IOV already enabled.
> > I don't think a sane configuration/coordination should do this since all
> > VFs are simply wasted and instead may hurt the PF performance...

*May* hurt performance, but we don't know. Some designs might have
resources dedicated to VFs that aren't used by the PF at all. As I've
experimented with this patch series, I find that an igb PF with SR-IOV
enabled assigned to a VM doesn't work at all; it's not simply a
performance issue. I suspect that's going to be a clue to the user
that their configuration is invalid. I'm sure we'll take some support
overhead as a result of that, but I don't see that we can generate an
arbitrary advisory warning when it very well might be supported on
other devices. This is the nature of a meta driver that supports any
device bound to it. Thanks,

Alex