2014-12-03 21:41:12

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5] Fixes for PCI backend for 3.19.


Since v4 (http://lists.xen.org/archives/html/xen-devel/2014-11/msg02130.html):
- Per David's review altered one of the patches.
v3 (https://lkml.org/lkml/2014/7/8/533):
- Epic discussion.

These patches fix some issues with PCI back and also add proper
bus/slot reset.


Documentation/ABI/testing/sysfs-driver-pciback | 12 ++
drivers/pci/pci.c | 5 +-
drivers/xen/xen-pciback/passthrough.c | 14 +-
drivers/xen/xen-pciback/pci_stub.c | 236 +++++++++++++++++++++----
drivers/xen/xen-pciback/pciback.h | 7 +-
drivers/xen/xen-pciback/vpci.c | 14 +-
drivers/xen/xen-pciback/xenbus.c | 4 +-
include/linux/device.h | 5 +
include/linux/pci.h | 2 +
9 files changed, 254 insertions(+), 45 deletions(-)


Jan Beulich (1):
xen-pciback: drop SR-IOV VFs when PF driver unloads

Konrad Rzeszutek Wilk (8):
xen/pciback: Don't deadlock when unbinding.
driver core: Provide an wrapper around the mutex to do lockdep warnings
xen/pciback: Include the domain id if removing the device whilst still in use
xen/pciback: Print out the domain owning the device.
xen/pciback: Remove tons of dereferences
PCI: Expose pci_load_saved_state for public consumption.
xen/pciback: Restore configuration space when detaching from a guest.
xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute


2014-12-03 21:41:16

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5 1/9] xen/pciback: Don't deadlock when unbinding.

As commit 0a9fd0152929db372ff61b0d6c280fdd34ae8bdb
'xen/pciback: Document the entry points for 'pcistub_put_pci_dev''
explained there are four entry points in this function.
Two of them are when the user fiddles in the SysFS to
unbind a device which might be in use by a guest or not.

Both 'unbind' states will cause a deadlock as the the PCI lock has
already been taken, which then pci_device_reset tries to take.

We can simplify this by requiring that all callers of
pcistub_put_pci_dev MUST hold the device lock. And then
we can just call the lockless version of pci_device_reset.

To make it even simpler we will modify xen_pcibk_release_pci_dev
to quality whether it should take a lock or not - as it ends
up calling xen_pcibk_release_pci_dev and needs to hold the lock.

Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/xen-pciback/passthrough.c | 14 +++++++++++---
drivers/xen/xen-pciback/pci_stub.c | 12 ++++++------
drivers/xen/xen-pciback/pciback.h | 7 ++++---
drivers/xen/xen-pciback/vpci.c | 14 +++++++++++---
drivers/xen/xen-pciback/xenbus.c | 2 +-
5 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/xen/xen-pciback/passthrough.c b/drivers/xen/xen-pciback/passthrough.c
index 828dddc..f16a30e 100644
--- a/drivers/xen/xen-pciback/passthrough.c
+++ b/drivers/xen/xen-pciback/passthrough.c
@@ -69,7 +69,7 @@ static int __xen_pcibk_add_pci_dev(struct xen_pcibk_device *pdev,
}

static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,
- struct pci_dev *dev)
+ struct pci_dev *dev, bool lock)
{
struct passthrough_dev_data *dev_data = pdev->pci_dev_data;
struct pci_dev_entry *dev_entry, *t;
@@ -87,8 +87,13 @@ static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,

mutex_unlock(&dev_data->lock);

- if (found_dev)
+ if (found_dev) {
+ if (lock)
+ device_lock(&found_dev->dev);
pcistub_put_pci_dev(found_dev);
+ if (lock)
+ device_unlock(&found_dev->dev);
+ }
}

static int __xen_pcibk_init_devices(struct xen_pcibk_device *pdev)
@@ -156,8 +161,11 @@ static void __xen_pcibk_release_devices(struct xen_pcibk_device *pdev)
struct pci_dev_entry *dev_entry, *t;

list_for_each_entry_safe(dev_entry, t, &dev_data->dev_list, list) {
+ struct pci_dev *dev = dev_entry->dev;
list_del(&dev_entry->list);
- pcistub_put_pci_dev(dev_entry->dev);
+ device_lock(&dev->dev);
+ pcistub_put_pci_dev(dev);
+ device_unlock(&dev->dev);
kfree(dev_entry);
}

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 017069a..9cbe1a3 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -250,6 +250,8 @@ struct pci_dev *pcistub_get_pci_dev(struct xen_pcibk_device *pdev,
* - 'echo BDF > unbind' with a guest still using it. See pcistub_remove
*
* As such we have to be careful.
+ *
+ * To make this easier, the caller has to hold the device lock.
*/
void pcistub_put_pci_dev(struct pci_dev *dev)
{
@@ -276,11 +278,8 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
/* Cleanup our device
* (so it's ready for the next domain)
*/
-
- /* This is OK - we are running from workqueue context
- * and want to inhibit the user from fiddling with 'reset'
- */
- pci_reset_function(dev);
+ lockdep_assert_held(&dev->dev.mutex);
+ __pci_reset_function_locked(dev);
pci_restore_state(dev);

/* This disables the device. */
@@ -567,7 +566,8 @@ static void pcistub_remove(struct pci_dev *dev)
/* N.B. This ends up calling pcistub_put_pci_dev which ends up
* doing the FLR. */
xen_pcibk_release_pci_dev(found_psdev->pdev,
- found_psdev->dev);
+ found_psdev->dev,
+ false /* caller holds the lock. */);
}

spin_lock_irqsave(&pcistub_devices_lock, flags);
diff --git a/drivers/xen/xen-pciback/pciback.h b/drivers/xen/xen-pciback/pciback.h
index f72af87..58e38d5 100644
--- a/drivers/xen/xen-pciback/pciback.h
+++ b/drivers/xen/xen-pciback/pciback.h
@@ -99,7 +99,8 @@ struct xen_pcibk_backend {
unsigned int *domain, unsigned int *bus,
unsigned int *devfn);
int (*publish)(struct xen_pcibk_device *pdev, publish_pci_root_cb cb);
- void (*release)(struct xen_pcibk_device *pdev, struct pci_dev *dev);
+ void (*release)(struct xen_pcibk_device *pdev, struct pci_dev *dev,
+ bool lock);
int (*add)(struct xen_pcibk_device *pdev, struct pci_dev *dev,
int devid, publish_pci_dev_cb publish_cb);
struct pci_dev *(*get)(struct xen_pcibk_device *pdev,
@@ -122,10 +123,10 @@ static inline int xen_pcibk_add_pci_dev(struct xen_pcibk_device *pdev,
}

static inline void xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,
- struct pci_dev *dev)
+ struct pci_dev *dev, bool lock)
{
if (xen_pcibk_backend && xen_pcibk_backend->release)
- return xen_pcibk_backend->release(pdev, dev);
+ return xen_pcibk_backend->release(pdev, dev, lock);
}

static inline struct pci_dev *
diff --git a/drivers/xen/xen-pciback/vpci.c b/drivers/xen/xen-pciback/vpci.c
index 51afff9..c99f8bb 100644
--- a/drivers/xen/xen-pciback/vpci.c
+++ b/drivers/xen/xen-pciback/vpci.c
@@ -145,7 +145,7 @@ out:
}

static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,
- struct pci_dev *dev)
+ struct pci_dev *dev, bool lock)
{
int slot;
struct vpci_dev_data *vpci_dev = pdev->pci_dev_data;
@@ -169,8 +169,13 @@ static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,
out:
mutex_unlock(&vpci_dev->lock);

- if (found_dev)
+ if (found_dev) {
+ if (lock)
+ device_lock(&found_dev->dev);
pcistub_put_pci_dev(found_dev);
+ if (lock)
+ device_unlock(&found_dev->dev);
+ }
}

static int __xen_pcibk_init_devices(struct xen_pcibk_device *pdev)
@@ -208,8 +213,11 @@ static void __xen_pcibk_release_devices(struct xen_pcibk_device *pdev)
struct pci_dev_entry *e, *tmp;
list_for_each_entry_safe(e, tmp, &vpci_dev->dev_list[slot],
list) {
+ struct pci_dev *dev = e->dev;
list_del(&e->list);
- pcistub_put_pci_dev(e->dev);
+ device_lock(&dev->dev);
+ pcistub_put_pci_dev(dev);
+ device_unlock(&dev->dev);
kfree(e);
}
}
diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index ad8d30c..eeee499 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -291,7 +291,7 @@ static int xen_pcibk_remove_device(struct xen_pcibk_device *pdev,

/* N.B. This ends up calling pcistub_put_pci_dev which ends up
* doing the FLR. */
- xen_pcibk_release_pci_dev(pdev, dev);
+ xen_pcibk_release_pci_dev(pdev, dev, true /* use the lock. */);

out:
return err;
--
1.9.3

2014-12-03 21:41:14

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5 3/9] xen/pciback: Include the domain id if removing the device whilst still in use

Cleanup the function a bit - also include the id of the
domain that is using the device.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
---
drivers/xen/xen-pciback/pci_stub.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 8b77089..e5ff09d 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -553,12 +553,14 @@ static void pcistub_remove(struct pci_dev *dev)
spin_unlock_irqrestore(&pcistub_devices_lock, flags);

if (found_psdev) {
- dev_dbg(&dev->dev, "found device to remove - in use? %p\n",
- found_psdev->pdev);
+ dev_dbg(&dev->dev, "found device to remove %s\n",
+ found_psdev->pdev ? "- in-use" : "");

if (found_psdev->pdev) {
- pr_warn("****** removing device %s while still in-use! ******\n",
- pci_name(found_psdev->dev));
+ int domid = xen_find_device_domain_owner(dev);
+
+ pr_warn("****** removing device %s while still in-use by domain %d! ******\n",
+ pci_name(found_psdev->dev), domid);
pr_warn("****** driver domain may still access this device's i/o resources!\n");
pr_warn("****** shutdown driver domain before binding device\n");
pr_warn("****** to other drivers or domains\n");
--
1.9.3

2014-12-03 21:41:58

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5 8/9] xen-pciback: drop SR-IOV VFs when PF driver unloads

From: Jan Beulich <[email protected]>

When a PF driver unloads, it may find it necessary to leave the VFs
around simply because of pciback having marked them as assigned to a
guest. Utilize a suitable notification to let go of the VFs, thus
allowing the PF to go back into the state it was before its driver
loaded (which in particular allows the driver to be loaded again with
it being able to create the VFs anew, but which also allows to then
pass through the PF instead of the VFs).

Don't do this however for any VFs currently in active use by a guest.

Signed-off-by: Jan Beulich <[email protected]>
[v2: Removed the switch statement, moved it about]
[v3: Redid it a bit differently]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/xen-pciback/pci_stub.c | 54 ++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 8580e53..cc3cbb4 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -1518,6 +1518,53 @@ parse_error:
fs_initcall(pcistub_init);
#endif

+#ifdef CONFIG_PCI_IOV
+static struct pcistub_device *find_vfs(const struct pci_dev *pdev)
+{
+ struct pcistub_device *psdev = NULL;
+ unsigned long flags;
+ bool found = false;
+
+ spin_lock_irqsave(&pcistub_devices_lock, flags);
+ list_for_each_entry(psdev, &pcistub_devices, dev_list) {
+ if (!psdev->pdev && psdev->dev != pdev
+ && pci_physfn(psdev->dev) == pdev) {
+ found = true;
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&pcistub_devices_lock, flags);
+ if (found)
+ return psdev;
+ return NULL;
+}
+
+static int pci_stub_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct device *dev = data;
+ const struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (action != BUS_NOTIFY_UNBIND_DRIVER)
+ return NOTIFY_DONE;
+
+ if (!pdev->is_physfn)
+ return NOTIFY_DONE;
+
+ for (;;) {
+ struct pcistub_device *psdev = find_vfs(pdev);
+ if (!psdev)
+ break;
+ device_release_driver(&psdev->dev->dev);
+ }
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block pci_stub_nb = {
+ .notifier_call = pci_stub_notifier,
+};
+#endif
+
static int __init xen_pcibk_init(void)
{
int err;
@@ -1539,12 +1586,19 @@ static int __init xen_pcibk_init(void)
err = xen_pcibk_xenbus_register();
if (err)
pcistub_exit();
+#ifdef CONFIG_PCI_IOV
+ else
+ bus_register_notifier(&pci_bus_type, &pci_stub_nb);
+#endif

return err;
}

static void __exit xen_pcibk_cleanup(void)
{
+#ifdef CONFIG_PCI_IOV
+ bus_unregister_notifier(&pci_bus_type, &pci_stub_nb);
+#endif
xen_pcibk_xenbus_unregister();
pcistub_exit();
}
--
1.9.3

2014-12-03 21:41:09

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5 6/9] PCI: Expose pci_load_saved_state for public consumption.

We have the pci_load_and_free_saved_state, and pci_store_saved_state
but are missing the functionality to just load the state
multiple times in the PCI device without having to free/save
the state.

This patch makes it possible to use this function.

CC: Bjorn Helgaas <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/pci/pci.c | 5 +++--
include/linux/pci.h | 2 ++
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..f00a9d6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1142,8 +1142,8 @@ EXPORT_SYMBOL_GPL(pci_store_saved_state);
* @dev: PCI device that we're dealing with
* @state: Saved state returned from pci_store_saved_state()
*/
-static int pci_load_saved_state(struct pci_dev *dev,
- struct pci_saved_state *state)
+int pci_load_saved_state(struct pci_dev *dev,
+ struct pci_saved_state *state)
{
struct pci_cap_saved_data *cap;

@@ -1171,6 +1171,7 @@ static int pci_load_saved_state(struct pci_dev *dev,
dev->state_saved = true;
return 0;
}
+EXPORT_SYMBOL_GPL(pci_load_saved_state);

/**
* pci_load_and_free_saved_state - Reload the save state pointed to by state,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5be8db4..08088cb1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1003,6 +1003,8 @@ void __iomem __must_check *pci_platform_rom(struct pci_dev *pdev, size_t *size);
int pci_save_state(struct pci_dev *dev);
void pci_restore_state(struct pci_dev *dev);
struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev);
+int pci_load_saved_state(struct pci_dev *dev,
+ struct pci_saved_state *state);
int pci_load_and_free_saved_state(struct pci_dev *dev,
struct pci_saved_state **state);
struct pci_cap_saved_state *pci_find_saved_cap(struct pci_dev *dev, char cap);
--
1.9.3

2014-12-03 21:42:29

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

The life-cycle of a PCI device in Xen pciback is complex
and is constrained by the PCI generic locking mechanism.

It starts with the device being binded to us - for which
we do a device function reset (and done via SysFS
so the PCI lock is held)

If the device is unbinded from us - we also do a function
reset (also done via SysFS so the PCI lock is held).

If the device is un-assigned from a guest - we do a function
reset (no PCI lock).

All on the individual PCI function level (so bus:device:function).

Unfortunatly a function reset is not adequate for certain
PCIe devices. The reset for an individual PCI function "means
device must support FLR (PCIe or AF), PM reset on D3hot->D0
device specific reset, or be a singleton device on a bus
a secondary bus reset. FLR does not have widespread support,
reset is not very reliable, and bus topology is dictated by the
and device design. We need to provide a means for a user to
a bus reset in cases where the existing mechanisms are not
or not reliable. " (Adam Williamson, 'vfio-pci: PCI hot reset
interface' commit 8b27ee60bfd6bbb84d2df28fa706c5c5081066ca).

As such to do a slot or a bus reset is we need another mechanism.
This is not exposed SysFS as there is no good way of exposing
a bus topology there.

This is due to the complexity - we MUST know that the different
functions off a PCIe device are not in use by other drivers, or
if they are in use (say one of them is assigned to a guest
and the other is idle) - it is still OK to reset the slot
(assuming both of them are owned by Xen pciback).

This patch does that by doing an slot or bus reset (if
slot not supported) if all of the functions of a PCIe
device belong to Xen PCIback. We do not care if the device is
in-use as we depend on the toolstack to be aware of this -
however if it is we will WARN the user.

Due to the complexity with the PCI lock we cannot do
the reset when a device is binded ('echo $BDF > bind')
or when unbinded ('echo $BDF > unbind') as the pci_[slot|bus]_reset
also take the same lock resulting in a dead-lock.

Putting the reset function in a workqueue or thread
won't work either - as we have to do the reset function
outside the 'unbind' context (it holds the PCI lock).
But once you 'unbind' a device the device is no longer
under the ownership of Xen pciback and the pci_set_drvdata
has been reset so we cannot use a thread for this.

Instead of doing all this complex dance, we depend on the toolstack
doing the right thing. As such implement the 'do_flr' SysFS attribute
which 'xl' uses when a device is detached or attached from/to a guest.
It bypasses the need to worry about the PCI lock.

To not inadvertly do a bus reset that would affect devices that
are in use by other drivers (other than Xen pciback) prior
to the reset we check that all of the devices under the bridge
are owned by Xen pciback. If they are not we do not do
the bus (or slot) reset.

We also warn the user if the device is in use - but still
continue with the reset. This should not happen as the toolstack
also does the check.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
Documentation/ABI/testing/sysfs-driver-pciback | 12 +++
drivers/xen/xen-pciback/pci_stub.c | 124 ++++++++++++++++++++++---
2 files changed, 125 insertions(+), 11 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-pciback b/Documentation/ABI/testing/sysfs-driver-pciback
index 6a733bf..2d4e32f 100644
--- a/Documentation/ABI/testing/sysfs-driver-pciback
+++ b/Documentation/ABI/testing/sysfs-driver-pciback
@@ -11,3 +11,15 @@ Description:
#echo 00:19.0-E0:2:FF > /sys/bus/pci/drivers/pciback/quirks
will allow the guest to read and write to the configuration
register 0x0E.
+
+
+What: /sys/bus/pci/drivers/pciback/do_flr
+Date: December 2014
+KernelVersion: 3.19
+Contact: [email protected]
+Description:
+ An option to slot or bus reset an PCI device owned by
+ Xen PCI backend. Writing a string of DDDD:BB:DD.F will cause
+ the driver to perform an slot or bus reset if the device
+ supports. It also checks to make sure that all of the devices
+ under the bridge are owned by Xen PCI backend.
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index cc3cbb4..f830bf4 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -100,14 +100,9 @@ static void pcistub_device_release(struct kref *kref)

xen_unregister_device_domain_owner(dev);

- /* Call the reset function which does not take lock as this
- * is called from "unbind" which takes a device_lock mutex.
- */
- __pci_reset_function_locked(dev);
+ /* Reset is done by the toolstack by using 'do_flr' on the SysFS. */
if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
dev_info(&dev->dev, "Could not reload PCI state\n");
- else
- pci_restore_state(dev);

if (dev->msix_cap) {
struct physdev_pci_device ppdev = {
@@ -123,9 +118,6 @@ static void pcistub_device_release(struct kref *kref)
err);
}

- /* Disable the device */
- xen_pcibk_reset_device(dev);
-
kfree(dev_data);
pci_set_drvdata(dev, NULL);

@@ -242,6 +234,87 @@ struct pci_dev *pcistub_get_pci_dev(struct xen_pcibk_device *pdev,
return found_dev;
}

+struct wrapper_args {
+ struct pci_dev *dev;
+ int in_use;
+};
+
+static int pcistub_pci_walk_wrapper(struct pci_dev *dev, void *data)
+{
+ struct pcistub_device *psdev, *found_psdev = NULL;
+ struct wrapper_args *arg = data;
+ unsigned long flags;
+
+ spin_lock_irqsave(&pcistub_devices_lock, flags);
+ list_for_each_entry(psdev, &pcistub_devices, dev_list) {
+ if (psdev->dev == dev) {
+ found_psdev = psdev;
+ if (psdev->pdev)
+ arg->in_use++;
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&pcistub_devices_lock, flags);
+ dev_dbg(&dev->dev, "%s\n", found_psdev ? "OK" : "not owned by us!");
+
+ if (!found_psdev)
+ arg->dev = dev;
+ return found_psdev ? 0 : 1;
+}
+
+static int pcistub_reset_pci_dev(struct pci_dev *dev)
+{
+ struct xen_pcibk_dev_data *dev_data;
+ struct wrapper_args arg = { .dev = NULL, .in_use = 0 };
+ bool slot = false, bus = false;
+ int rc;
+
+ if (!dev)
+ return -EINVAL;
+
+ if (!pci_probe_reset_slot(dev->slot))
+ slot = true;
+ else if (!pci_probe_reset_bus(dev->bus)) {
+ /* We won't attempt to reset a root bridge. */
+ if (!pci_is_root_bus(dev->bus))
+ bus = true;
+ }
+ dev_dbg(&dev->dev, "resetting (FLR, D3, %s %s) the device\n",
+ slot ? "slot" : "", bus ? "bus" : "");
+
+ pci_walk_bus(dev->bus, pcistub_pci_walk_wrapper, &arg);
+
+ if (arg.in_use)
+ dev_err(&dev->dev, "is in use!\n");
+
+ /*
+ * Takes the PCI lock. OK to do it as we are never called
+ * from 'unbind' state and don't deadlock.
+ */
+ dev_data = pci_get_drvdata(dev);
+ if (!pci_load_saved_state(dev, dev_data->pci_saved_state))
+ pci_restore_state(dev);
+
+ pci_reset_function(dev);
+
+ /* This disables the device. */
+ xen_pcibk_reset_device(dev);
+
+ /* And cleanup up our emulated fields. */
+ xen_pcibk_config_reset_dev(dev);
+
+ if (!bus && !slot)
+ return 0;
+
+ /* All slots or devices under the bus should be part of pcistub! */
+ if (arg.dev) {
+ dev_err(&dev->dev, "depends on: %s!\n", pci_name(arg.dev));
+ return -EBUSY;
+ }
+ return slot ? pci_try_reset_slot(dev->slot) :
+ pci_try_reset_bus(dev->bus);
+}
+
/*
* Called when:
* - XenBus state has been reconfigure (pci unplug). See xen_pcibk_remove_device
@@ -277,8 +350,9 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
* pcistub and xen_pcibk when AER is in processing
*/
down_write(&pcistub_sem);
- /* Cleanup our device
- * (so it's ready for the next domain)
+ /* Cleanup our device (so it's ready for the next domain)
+ * That is the job of the toolstack which has to call 'do_flr' before
+ * providing the PCI device to a guest (see pcistub_do_flr).
*/
device_lock_assert(&dev->dev);
__pci_reset_function_locked(dev);
@@ -1389,6 +1463,29 @@ static ssize_t permissive_show(struct device_driver *drv, char *buf)
static DRIVER_ATTR(permissive, S_IRUSR | S_IWUSR, permissive_show,
permissive_add);

+static ssize_t pcistub_do_flr(struct device_driver *drv, const char *buf,
+ size_t count)
+{
+ int domain, bus, slot, func;
+ int err;
+ struct pcistub_device *psdev;
+
+ err = str_to_slot(buf, &domain, &bus, &slot, &func);
+ if (err)
+ goto out;
+
+ psdev = pcistub_device_find(domain, bus, slot, func);
+ if (psdev) {
+ err = pcistub_reset_pci_dev(psdev->dev);
+ pcistub_device_put(psdev);
+ } else
+ err = -ENODEV;
+out:
+ if (!err)
+ err = count;
+ return err;
+}
+static DRIVER_ATTR(do_flr, S_IWUSR, NULL, pcistub_do_flr);
static void pcistub_exit(void)
{
driver_remove_file(&xen_pcibk_pci_driver.driver, &driver_attr_new_slot);
@@ -1402,6 +1499,8 @@ static void pcistub_exit(void)
&driver_attr_irq_handlers);
driver_remove_file(&xen_pcibk_pci_driver.driver,
&driver_attr_irq_handler_state);
+ driver_remove_file(&xen_pcibk_pci_driver.driver,
+ &driver_attr_do_flr);
pci_unregister_driver(&xen_pcibk_pci_driver);
}

@@ -1495,6 +1594,9 @@ static int __init pcistub_init(void)
if (!err)
err = driver_create_file(&xen_pcibk_pci_driver.driver,
&driver_attr_irq_handler_state);
+ if (!err)
+ err = driver_create_file(&xen_pcibk_pci_driver.driver,
+ &driver_attr_do_flr);
if (err)
pcistub_exit();

--
1.9.3

2014-12-03 21:42:54

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5 4/9] xen/pciback: Print out the domain owning the device.

We had been printing it only if the device was built with
debug enabled. But this information is useful in the field
to troubleshoot.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
---
drivers/xen/xen-pciback/xenbus.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index eeee499..fe17c80 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -247,7 +247,7 @@ static int xen_pcibk_export_device(struct xen_pcibk_device *pdev,
if (err)
goto out;

- dev_dbg(&dev->dev, "registering for %d\n", pdev->xdev->otherend_id);
+ dev_info(&dev->dev, "registering for %d\n", pdev->xdev->otherend_id);
if (xen_register_device_domain_owner(dev,
pdev->xdev->otherend_id) != 0) {
dev_err(&dev->dev, "Stealing ownership from dom%d.\n",
--
1.9.3

2014-12-03 21:41:08

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5 2/9] driver core: Provide an wrapper around the mutex to do lockdep warnings

Instead of open-coding it in drivers that want to double check
that their functions are indeed holding the device lock.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Suggested-by: David Vrabel <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
---
drivers/xen/xen-pciback/pci_stub.c | 2 +-
include/linux/device.h | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 9cbe1a3..8b77089 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -278,7 +278,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
/* Cleanup our device
* (so it's ready for the next domain)
*/
- lockdep_assert_held(&dev->dev.mutex);
+ device_lock_assert(&dev->dev);
__pci_reset_function_locked(dev);
pci_restore_state(dev);

diff --git a/include/linux/device.h b/include/linux/device.h
index ce1f2160..41d6a75 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -911,6 +911,11 @@ static inline void device_unlock(struct device *dev)
mutex_unlock(&dev->mutex);
}

+static inline void device_lock_assert(struct device *dev)
+{
+ lockdep_assert_held(&dev->mutex);
+}
+
void driver_init(void);

/*
--
1.9.3

2014-12-03 21:43:41

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5 7/9] xen/pciback: Restore configuration space when detaching from a guest.

The commit "xen/pciback: Don't deadlock when unbinding." was using
the version of pci_reset_function which would lock the device lock.
That is no good as we can dead-lock. As such we swapped to using
the lock-less version and requiring that the callers
of 'pcistub_put_pci_dev' take the device lock. And as such
this bug got exposed.

Using the lock-less version is OK, except that we tried to
use 'pci_restore_state' after the lock-less version of
__pci_reset_function_locked - which won't work as 'state_saved'
is set to false. Said 'state_saved' is a toggle boolean that
is to be used by the sequence of a) pci_save_state/pci_restore_state
or b) pci_load_and_free_saved_state/pci_restore_state. We don't
want to use a) as the guest might have messed up the PCI
configuration space and we want it to revert to the state
when the PCI device was binded to us. Therefore we pick
b) to restore the configuration space.

We restore from our 'golden' version of PCI configuration space, when an:
- Device is unbinded from pciback
- Device is detached from a guest.

Reported-by: Sander Eikelenboom <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
v2: Always FLR reset
---
drivers/xen/xen-pciback/pci_stub.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 843a2ba..8580e53 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -105,7 +105,7 @@ static void pcistub_device_release(struct kref *kref)
*/
__pci_reset_function_locked(dev);
if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
- dev_dbg(&dev->dev, "Could not reload PCI state\n");
+ dev_info(&dev->dev, "Could not reload PCI state\n");
else
pci_restore_state(dev);

@@ -257,6 +257,8 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
{
struct pcistub_device *psdev, *found_psdev = NULL;
unsigned long flags;
+ struct xen_pcibk_dev_data *dev_data;
+ int ret;

spin_lock_irqsave(&pcistub_devices_lock, flags);

@@ -280,8 +282,18 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
*/
device_lock_assert(&dev->dev);
__pci_reset_function_locked(dev);
- pci_restore_state(dev);

+ dev_data = pci_get_drvdata(dev);
+ ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
+ if (!ret) {
+ /*
+ * The usual sequence is pci_save_state & pci_restore_state
+ * but the guest might have messed the configuration space up.
+ * Use the initial version (when device was bound to us).
+ */
+ pci_restore_state(dev);
+ } else
+ dev_info(&dev->dev, "Could not reload PCI state\n");
/* This disables the device. */
xen_pcibk_reset_device(dev);

--
1.9.3

2014-12-03 21:44:22

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH v5 5/9] xen/pciback: Remove tons of dereferences

A little cleanup. No functional difference.

Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/xen-pciback/pci_stub.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e5ff09d..843a2ba 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -631,10 +631,12 @@ static pci_ers_result_t common_process(struct pcistub_device *psdev,
{
pci_ers_result_t res = result;
struct xen_pcie_aer_op *aer_op;
+ struct xen_pcibk_device *pdev = psdev->pdev;
+ struct xen_pci_sharedinfo *sh_info = pdev->sh_info;
int ret;

/*with PV AER drivers*/
- aer_op = &(psdev->pdev->sh_info->aer_op);
+ aer_op = &(sh_info->aer_op);
aer_op->cmd = aer_cmd ;
/*useful for error_detected callback*/
aer_op->err = state;
@@ -655,36 +657,36 @@ static pci_ers_result_t common_process(struct pcistub_device *psdev,
* this flag to judge whether we need to check pci-front give aer
* service ack signal
*/
- set_bit(_PCIB_op_pending, (unsigned long *)&psdev->pdev->flags);
+ set_bit(_PCIB_op_pending, (unsigned long *)&pdev->flags);

/*It is possible that a pcifront conf_read_write ops request invokes
* the callback which cause the spurious execution of wake_up.
* Yet it is harmless and better than a spinlock here
*/
set_bit(_XEN_PCIB_active,
- (unsigned long *)&psdev->pdev->sh_info->flags);
+ (unsigned long *)&sh_info->flags);
wmb();
- notify_remote_via_irq(psdev->pdev->evtchn_irq);
+ notify_remote_via_irq(pdev->evtchn_irq);

ret = wait_event_timeout(xen_pcibk_aer_wait_queue,
!(test_bit(_XEN_PCIB_active, (unsigned long *)
- &psdev->pdev->sh_info->flags)), 300*HZ);
+ &sh_info->flags)), 300*HZ);

if (!ret) {
if (test_bit(_XEN_PCIB_active,
- (unsigned long *)&psdev->pdev->sh_info->flags)) {
+ (unsigned long *)&sh_info->flags)) {
dev_err(&psdev->dev->dev,
"pcifront aer process not responding!\n");
clear_bit(_XEN_PCIB_active,
- (unsigned long *)&psdev->pdev->sh_info->flags);
+ (unsigned long *)&sh_info->flags);
aer_op->err = PCI_ERS_RESULT_NONE;
return res;
}
}
- clear_bit(_PCIB_op_pending, (unsigned long *)&psdev->pdev->flags);
+ clear_bit(_PCIB_op_pending, (unsigned long *)&pdev->flags);

if (test_bit(_XEN_PCIF_active,
- (unsigned long *)&psdev->pdev->sh_info->flags)) {
+ (unsigned long *)&sh_info->flags)) {
dev_dbg(&psdev->dev->dev,
"schedule pci_conf service in " DRV_NAME "\n");
xen_pcibk_test_and_schedule_op(psdev->pdev);
--
1.9.3

2014-12-03 23:49:54

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v5 6/9] PCI: Expose pci_load_saved_state for public consumption.

On Wed, Dec 3, 2014 at 2:40 PM, Konrad Rzeszutek Wilk
<[email protected]> wrote:
> We have the pci_load_and_free_saved_state, and pci_store_saved_state
> but are missing the functionality to just load the state
> multiple times in the PCI device without having to free/save
> the state.
>
> This patch makes it possible to use this function.
>
> CC: Bjorn Helgaas <[email protected]>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>

Acked-by: Bjorn Helgaas <[email protected]>

I assume you'll merge this whole series through your tree. Let me
know if you want me to do anything else.

> ---
> drivers/pci/pci.c | 5 +++--
> include/linux/pci.h | 2 ++
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 625a4ac..f00a9d6 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1142,8 +1142,8 @@ EXPORT_SYMBOL_GPL(pci_store_saved_state);
> * @dev: PCI device that we're dealing with
> * @state: Saved state returned from pci_store_saved_state()
> */
> -static int pci_load_saved_state(struct pci_dev *dev,
> - struct pci_saved_state *state)
> +int pci_load_saved_state(struct pci_dev *dev,
> + struct pci_saved_state *state)
> {
> struct pci_cap_saved_data *cap;
>
> @@ -1171,6 +1171,7 @@ static int pci_load_saved_state(struct pci_dev *dev,
> dev->state_saved = true;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(pci_load_saved_state);
>
> /**
> * pci_load_and_free_saved_state - Reload the save state pointed to by state,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 5be8db4..08088cb1 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1003,6 +1003,8 @@ void __iomem __must_check *pci_platform_rom(struct pci_dev *pdev, size_t *size);
> int pci_save_state(struct pci_dev *dev);
> void pci_restore_state(struct pci_dev *dev);
> struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev);
> +int pci_load_saved_state(struct pci_dev *dev,
> + struct pci_saved_state *state);
> int pci_load_and_free_saved_state(struct pci_dev *dev,
> struct pci_saved_state **state);
> struct pci_cap_saved_state *pci_find_saved_cap(struct pci_dev *dev, char cap);
> --
> 1.9.3
>

2014-12-04 11:08:27

by David Vrabel

[permalink] [raw]
Subject: Re: [PATCH v5 8/9] xen-pciback: drop SR-IOV VFs when PF driver unloads

On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
> From: Jan Beulich <[email protected]>
>
> When a PF driver unloads, it may find it necessary to leave the VFs
> around simply because of pciback having marked them as assigned to a
> guest. Utilize a suitable notification to let go of the VFs, thus
> allowing the PF to go back into the state it was before its driver
> loaded (which in particular allows the driver to be loaded again with
> it being able to create the VFs anew, but which also allows to then
> pass through the PF instead of the VFs).
>
> Don't do this however for any VFs currently in active use by a guest.
>
> Signed-off-by: Jan Beulich <[email protected]>
> [v2: Removed the switch statement, moved it about]
> [v3: Redid it a bit differently]

Jan, are you happy with Konrad's change now?

David

> --- a/drivers/xen/xen-pciback/pci_stub.c
> +++ b/drivers/xen/xen-pciback/pci_stub.c
> @@ -1518,6 +1518,53 @@ parse_error:
> fs_initcall(pcistub_init);
> #endif
>
> +#ifdef CONFIG_PCI_IOV
> +static struct pcistub_device *find_vfs(const struct pci_dev *pdev)
> +{
> + struct pcistub_device *psdev = NULL;
> + unsigned long flags;
> + bool found = false;
> +
> + spin_lock_irqsave(&pcistub_devices_lock, flags);
> + list_for_each_entry(psdev, &pcistub_devices, dev_list) {
> + if (!psdev->pdev && psdev->dev != pdev
> + && pci_physfn(psdev->dev) == pdev) {
> + found = true;
> + break;
> + }
> + }
> + spin_unlock_irqrestore(&pcistub_devices_lock, flags);
> + if (found)
> + return psdev;
> + return NULL;
> +}
> +
> +static int pci_stub_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct device *dev = data;
> + const struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (action != BUS_NOTIFY_UNBIND_DRIVER)
> + return NOTIFY_DONE;
> +
> + if (!pdev->is_physfn)
> + return NOTIFY_DONE;
> +
> + for (;;) {
> + struct pcistub_device *psdev = find_vfs(pdev);
> + if (!psdev)
> + break;
> + device_release_driver(&psdev->dev->dev);
> + }
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block pci_stub_nb = {
> + .notifier_call = pci_stub_notifier,
> +};
> +#endif
> +
> static int __init xen_pcibk_init(void)
> {
> int err;
> @@ -1539,12 +1586,19 @@ static int __init xen_pcibk_init(void)
> err = xen_pcibk_xenbus_register();
> if (err)
> pcistub_exit();
> +#ifdef CONFIG_PCI_IOV
> + else
> + bus_register_notifier(&pci_bus_type, &pci_stub_nb);
> +#endif
>
> return err;
> }
>
> static void __exit xen_pcibk_cleanup(void)
> {
> +#ifdef CONFIG_PCI_IOV
> + bus_unregister_notifier(&pci_bus_type, &pci_stub_nb);
> +#endif
> xen_pcibk_xenbus_unregister();
> pcistub_exit();
> }
>

2014-12-04 11:30:20

by David Vrabel

[permalink] [raw]
Subject: Re: [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>
> Instead of doing all this complex dance, we depend on the toolstack
> doing the right thing. As such implement the 'do_flr' SysFS attribute
> which 'xl' uses when a device is detached or attached from/to a guest.
> It bypasses the need to worry about the PCI lock.

No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
proposed).

David

2014-12-04 11:36:37

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH v5 8/9] xen-pciback: drop SR-IOV VFs when PF driver unloads

>>> On 04.12.14 at 12:07, <[email protected]> wrote:
> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>> From: Jan Beulich <[email protected]>
>>
>> When a PF driver unloads, it may find it necessary to leave the VFs
>> around simply because of pciback having marked them as assigned to a
>> guest. Utilize a suitable notification to let go of the VFs, thus
>> allowing the PF to go back into the state it was before its driver
>> loaded (which in particular allows the driver to be loaded again with
>> it being able to create the VFs anew, but which also allows to then
>> pass through the PF instead of the VFs).
>>
>> Don't do this however for any VFs currently in active use by a guest.
>>
>> Signed-off-by: Jan Beulich <[email protected]>
>> [v2: Removed the switch statement, moved it about]
>> [v3: Redid it a bit differently]
>
> Jan, are you happy with Konrad's change now?

Yes: http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg00267.html

Jan

2014-12-04 12:06:48

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute


On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
>
> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
> >
> > Instead of doing all this complex dance, we depend on the toolstack
> > doing the right thing. As such implement the 'do_flr' SysFS attribute
> > which 'xl' uses when a device is detached or attached from/to a guest.
> > It bypasses the need to worry about the PCI lock.
>
> No.  Get pciback to add its own "reset" sysfs file (as I have repeatedly
> proposed).
>

Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).

Unless you mean an different name (reset2).

> David
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?Ý¢j"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-12-04 12:25:26

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>
> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
>>
>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>>>
>>> Instead of doing all this complex dance, we depend on the toolstack
>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>>> which 'xl' uses when a device is detached or attached from/to a guest.
>>> It bypasses the need to worry about the PCI lock.
>>
>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
>> proposed).
>>
>
> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).

It is only needed if the core won't provide one.

+static int pcistub_try_create_reset_file(struct pci_dev *pci)
+{
+ struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
+ struct device *dev = &pci->dev;
+ int ret;
+
+ /* Already have a per-function reset? */
+ if (pci_probe_reset_function(pci) == 0)
+ return 0;
+
+ ret = device_create_file(dev, &dev_attr_reset);
+ if (ret < 0)
+ return ret;
+ dev_data->created_reset_file = true;
+ return 0;
+}

David



Attachments:
0002-xen-pciback-provide-a-reset-sysfs-file-to-try-harder.patch (6.50 kB)

2014-12-04 13:16:51

by Sander Eikelenboom

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute


Thursday, December 4, 2014, 1:24:47 PM, you wrote:

> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>
>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
>>>
>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>>>>
>>>> Instead of doing all this complex dance, we depend on the toolstack
>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>>>> which 'xl' uses when a device is detached or attached from/to a guest.
>>>> It bypasses the need to worry about the PCI lock.
>>>
>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
>>> proposed).
>>>
>>
>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).

> It is only needed if the core won't provide one.

> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> +{
> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> + struct device *dev = &pci->dev;
> + int ret;
> +
> + /* Already have a per-function reset? */
> + if (pci_probe_reset_function(pci) == 0)
> + return 0;
> +
> + ret = device_create_file(dev, &dev_attr_reset);
> + if (ret < 0)
> + return ret;
+ dev_data->>created_reset_file = true;
> + return 0;
> +}

Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
"pci.c:__pci_dev_reset" ?

The problem with that function is that from my testing it seems that the
first option "pci_dev_specific_reset" always seems to return succes, so all the
other options are skipped (flr, pm, slot, bus). However the device it self is
not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
none virtualization purposes and it's probably the least intrusive. For
virtualization however it would be nice to be sure it resets properly, or have a
way to force a specific reset routine.)

So it's the ordering and skipping of the other resets that seems to make
this workaround necessary in the first place.

> David




2014-12-04 13:43:12

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On 04/12/14 13:10, Sander Eikelenboom wrote:
>
> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>
>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>
>>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
>>>>
>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>>>>>
>>>>> Instead of doing all this complex dance, we depend on the toolstack
>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
>>>>> It bypasses the need to worry about the PCI lock.
>>>>
>>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
>>>> proposed).
>>>>
>>>
>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>
>> It is only needed if the core won't provide one.
>
>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>> +{
>> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>> + struct device *dev = &pci->dev;
>> + int ret;
>> +
>> + /* Already have a per-function reset? */
>> + if (pci_probe_reset_function(pci) == 0)
>> + return 0;
>> +
>> + ret = device_create_file(dev, &dev_attr_reset);
>> + if (ret < 0)
>> + return ret;
> + dev_data->>created_reset_file = true;
>> + return 0;
>> +}
>
> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
> "pci.c:__pci_dev_reset" ?
>
> The problem with that function is that from my testing it seems that the
> first option "pci_dev_specific_reset" always seems to return succes, so all the
> other options are skipped (flr, pm, slot, bus). However the device it self is
> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
> none virtualization purposes and it's probably the least intrusive. For
> virtualization however it would be nice to be sure it resets properly, or have a
> way to force a specific reset routine.)

Then you need work with the maintainer for those specific devices or
drivers to fix their specific reset function.

I'm not adding stuff to pciback to workaround broken quirks.

David

2014-12-04 14:09:15

by Sander Eikelenboom

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute


Thursday, December 4, 2014, 2:43:06 PM, you wrote:

> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>
>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>
>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>
>>>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
>>>>>
>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>>>>>>
>>>>>> Instead of doing all this complex dance, we depend on the toolstack
>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
>>>>>> It bypasses the need to worry about the PCI lock.
>>>>>
>>>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
>>>>> proposed).
>>>>>
>>>>
>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>>
>>> It is only needed if the core won't provide one.
>>
>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>> +{
>>> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>> + struct device *dev = &pci->dev;
>>> + int ret;
>>> +
>>> + /* Already have a per-function reset? */
>>> + if (pci_probe_reset_function(pci) == 0)
>>> + return 0;
>>> +
>>> + ret = device_create_file(dev, &dev_attr_reset);
>>> + if (ret < 0)
>>> + return ret;
>> + dev_data->>created_reset_file = true;
>>> + return 0;
>>> +}
>>
>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
>> "pci.c:__pci_dev_reset" ?
>>
>> The problem with that function is that from my testing it seems that the
>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>> other options are skipped (flr, pm, slot, bus). However the device it self is
>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
>> none virtualization purposes and it's probably the least intrusive. For
>> virtualization however it would be nice to be sure it resets properly, or have a
>> way to force a specific reset routine.)

> Then you need work with the maintainer for those specific devices or
> drivers to fix their specific reset function.

> I'm not adding stuff to pciback to workaround broken quirks.

OK that's a pretty clear message there, so if one wants to use pci and vga
passthrough one should better use KVM and vfio-pci.

vfio-pci has:
- logic to do the try-slot-bus-reset logic
- it has quirks specific to vga passthrough
implemented in from the looks of it a quite clean driver.
(the main issue with it so far was you could only seize devices based on
vendor and device id, which can be a problem when you have multiple devices.
However that was resolved recently if i am correct.)

And neither of those will be supported by xen-pciback if i get your message
right ?

--
Sander

> David

2014-12-04 14:14:37

by Sander Eikelenboom

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

Hello Sander,

Thursday, December 4, 2014, 3:09:09 PM, you wrote:


> Thursday, December 4, 2014, 2:43:06 PM, you wrote:

>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>>
>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>>
>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>>
>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
>>>>>>
>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>>>>>>>
>>>>>>> Instead of doing all this complex dance, we depend on the toolstack
>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
>>>>>>> It bypasses the need to worry about the PCI lock.
>>>>>>
>>>>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
>>>>>> proposed).
>>>>>>
>>>>>
>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>>>
>>>> It is only needed if the core won't provide one.
>>>
>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>>> +{
>>>> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>>> + struct device *dev = &pci->dev;
>>>> + int ret;
>>>> +
>>>> + /* Already have a per-function reset? */
>>>> + if (pci_probe_reset_function(pci) == 0)
>>>> + return 0;
>>>> +
>>>> + ret = device_create_file(dev, &dev_attr_reset);
>>>> + if (ret < 0)
>>>> + return ret;
>>> + dev_data->>created_reset_file = true;
>>>> + return 0;
>>>> +}
>>>
>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
>>> "pci.c:__pci_dev_reset" ?
>>>
>>> The problem with that function is that from my testing it seems that the
>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>>> other options are skipped (flr, pm, slot, bus). However the device it self is
>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
>>> none virtualization purposes and it's probably the least intrusive. For
>>> virtualization however it would be nice to be sure it resets properly, or have a
>>> way to force a specific reset routine.)

>> Then you need work with the maintainer for those specific devices or
>> drivers to fix their specific reset function.

>> I'm not adding stuff to pciback to workaround broken quirks.

> OK that's a pretty clear message there, so if one wants to use pci and vga
> passthrough one should better use KVM and vfio-pci.

> vfio-pci has:
> - logic to do the try-slot-bus-reset logic
> - it has quirks specific to vga passthrough
Hrmm have to correct my self because the vga-pt quirks are part of the vfio-pci
part in qemu.

The try-slot-bus-reset logic is part of the kernel vfio-pci driver though and they faced the same locking issue it
seems:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61cf16d8bd38c3dc52033ea75d5b1f8368514a17
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=890ed578df82f5b7b5a874f9f2fa4f117305df5f

> implemented in from the looks of it a quite clean driver.
> (the main issue with it so far was you could only seize devices based on
> vendor and device id, which can be a problem when you have multiple devices.
> However that was resolved recently if i am correct.)

> And neither of those will be supported by xen-pciback if i get your message
> right ?

> --
> Sander

>> David

2014-12-04 14:31:29

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On 04/12/14 14:09, Sander Eikelenboom wrote:
>
> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
>
>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>>
>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>>
>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>>
>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
>>>>>>
>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>>>>>>>
>>>>>>> Instead of doing all this complex dance, we depend on the toolstack
>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
>>>>>>> It bypasses the need to worry about the PCI lock.
>>>>>>
>>>>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
>>>>>> proposed).
>>>>>>
>>>>>
>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>>>
>>>> It is only needed if the core won't provide one.
>>>
>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>>> +{
>>>> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>>> + struct device *dev = &pci->dev;
>>>> + int ret;
>>>> +
>>>> + /* Already have a per-function reset? */
>>>> + if (pci_probe_reset_function(pci) == 0)
>>>> + return 0;
>>>> +
>>>> + ret = device_create_file(dev, &dev_attr_reset);
>>>> + if (ret < 0)
>>>> + return ret;
>>> + dev_data->>created_reset_file = true;
>>>> + return 0;
>>>> +}
>>>
>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
>>> "pci.c:__pci_dev_reset" ?
>>>
>>> The problem with that function is that from my testing it seems that the
>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>>> other options are skipped (flr, pm, slot, bus). However the device it self is
>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
>>> none virtualization purposes and it's probably the least intrusive. For
>>> virtualization however it would be nice to be sure it resets properly, or have a
>>> way to force a specific reset routine.)
>
>> Then you need work with the maintainer for those specific devices or
>> drivers to fix their specific reset function.
>
>> I'm not adding stuff to pciback to workaround broken quirks.
>
> OK that's a pretty clear message there, so if one wants to use pci and vga
> passthrough one should better use KVM and vfio-pci.

Have you (or anyone else) ever raised the problem with the broken reset
quirk for certain devices with the relevant maintainer?

> vfio-pci has:
> - logic to do the try-slot-bus-reset logic

Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
as well.

It makes no sense for both pciback and vfio-pci to workaround problems
with pci_function_reset() in different ways -- it should be fixed in the
core PCI code so both can benefit and make use of the same code.

David

2014-12-04 14:50:32

by Sander Eikelenboom

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute


Thursday, December 4, 2014, 3:31:11 PM, you wrote:

> On 04/12/14 14:09, Sander Eikelenboom wrote:
>>
>> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
>>
>>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>>>>
>>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>>>>
>>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>>>>>>
>>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
>>>>>>>
>>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>>>>>>>>
>>>>>>>> Instead of doing all this complex dance, we depend on the toolstack
>>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
>>>>>>>> It bypasses the need to worry about the PCI lock.
>>>>>>>
>>>>>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
>>>>>>> proposed).
>>>>>>>
>>>>>>
>>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>>>>
>>>>> It is only needed if the core won't provide one.
>>>>
>>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>>>>> +{
>>>>> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>>>>> + struct device *dev = &pci->dev;
>>>>> + int ret;
>>>>> +
>>>>> + /* Already have a per-function reset? */
>>>>> + if (pci_probe_reset_function(pci) == 0)
>>>>> + return 0;
>>>>> +
>>>>> + ret = device_create_file(dev, &dev_attr_reset);
>>>>> + if (ret < 0)
>>>>> + return ret;
>>>> + dev_data->>created_reset_file = true;
>>>>> + return 0;
>>>>> +}
>>>>
>>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
>>>> "pci.c:__pci_dev_reset" ?
>>>>
>>>> The problem with that function is that from my testing it seems that the
>>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>>>> other options are skipped (flr, pm, slot, bus). However the device it self is
>>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
>>>> none virtualization purposes and it's probably the least intrusive. For
>>>> virtualization however it would be nice to be sure it resets properly, or have a
>>>> way to force a specific reset routine.)
>>
>>> Then you need work with the maintainer for those specific devices or
>>> drivers to fix their specific reset function.
>>
>>> I'm not adding stuff to pciback to workaround broken quirks.
>>
>> OK that's a pretty clear message there, so if one wants to use pci and vga
>> passthrough one should better use KVM and vfio-pci.

> Have you (or anyone else) ever raised the problem with the broken reset
> quirk for certain devices with the relevant maintainer?

>> vfio-pci has:
>> - logic to do the try-slot-bus-reset logic

> Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> as well.

Depends on what you call an "incorrect fix" .. it fixes a quirk ..
you can say that's incorrect, but then you would have to remove 50% of
the kernel and Xen code as well.

(i do in general agree it's better to strive for a generic solution though,
that's exactly why i brought up that that function doesn't seem to work perfect
for virtualization purposes)

> It makes no sense for both pciback and vfio-pci to workaround problems
> with pci_function_reset() in different ways -- it should be fixed in the
> core PCI code so both can benefit and make use of the same code.

Well perhaps Bjorn knows why the order of resets and skipping the rest as
implemented in "pci.c:__pci_dev_reset" was implemented in that way ?

Especially what is the expectation about pci_dev_specific_reset doing a proper
reset for say a vga-card:
- i know it doesn't work on a radeon card (doesn't blank screen, on next guest
boot reports it's already posted, powermanagement doesn't work).
- while with a slot/bus reset, that all just works fine, screen blanks
immediately and everything else also works.

Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps he knows why
he introduced the workaround in vfio-pci instead of trying to fix it in core pci
code ?

--
Sander


> David


2014-12-04 15:39:21

by Alex Williamson

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
>
> > On 04/12/14 14:09, Sander Eikelenboom wrote:
> >>
> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> >>
> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
> >>>>
> >>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> >>>>
> >>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> >>>>>>
> >>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
> >>>>>>>
> >>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
> >>>>>>>>
> >>>>>>>> Instead of doing all this complex dance, we depend on the toolstack
> >>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
> >>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
> >>>>>>>> It bypasses the need to worry about the PCI lock.
> >>>>>>>
> >>>>>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
> >>>>>>> proposed).
> >>>>>>>
> >>>>>>
> >>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
> >>>>
> >>>>> It is only needed if the core won't provide one.
> >>>>
> >>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> >>>>> +{
> >>>>> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> >>>>> + struct device *dev = &pci->dev;
> >>>>> + int ret;
> >>>>> +
> >>>>> + /* Already have a per-function reset? */
> >>>>> + if (pci_probe_reset_function(pci) == 0)
> >>>>> + return 0;
> >>>>> +
> >>>>> + ret = device_create_file(dev, &dev_attr_reset);
> >>>>> + if (ret < 0)
> >>>>> + return ret;
> >>>> + dev_data->>created_reset_file = true;
> >>>>> + return 0;
> >>>>> +}
> >>>>
> >>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
> >>>> "pci.c:__pci_dev_reset" ?
> >>>>
> >>>> The problem with that function is that from my testing it seems that the
> >>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
> >>>> other options are skipped (flr, pm, slot, bus). However the device it self is
> >>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
> >>>> none virtualization purposes and it's probably the least intrusive. For
> >>>> virtualization however it would be nice to be sure it resets properly, or have a
> >>>> way to force a specific reset routine.)
> >>
> >>> Then you need work with the maintainer for those specific devices or
> >>> drivers to fix their specific reset function.
> >>
> >>> I'm not adding stuff to pciback to workaround broken quirks.
> >>
> >> OK that's a pretty clear message there, so if one wants to use pci and vga
> >> passthrough one should better use KVM and vfio-pci.
>
> > Have you (or anyone else) ever raised the problem with the broken reset
> > quirk for certain devices with the relevant maintainer?
>
> >> vfio-pci has:
> >> - logic to do the try-slot-bus-reset logic
>
> > Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> > as well.
>
> Depends on what you call an "incorrect fix" .. it fixes a quirk ..
> you can say that's incorrect, but then you would have to remove 50% of
> the kernel and Xen code as well.
>
> (i do in general agree it's better to strive for a generic solution though,
> that's exactly why i brought up that that function doesn't seem to work perfect
> for virtualization purposes)
>
> > It makes no sense for both pciback and vfio-pci to workaround problems
> > with pci_function_reset() in different ways -- it should be fixed in the
> > core PCI code so both can benefit and make use of the same code.
>
> Well perhaps Bjorn knows why the order of resets and skipping the rest as
> implemented in "pci.c:__pci_dev_reset" was implemented in that way ?
>
> Especially what is the expectation about pci_dev_specific_reset doing a proper
> reset for say a vga-card:
> - i know it doesn't work on a radeon card (doesn't blank screen, on next guest
> boot reports it's already posted, powermanagement doesn't work).
> - while with a slot/bus reset, that all just works fine, screen blanks
> immediately and everything else also works.
>
> Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps he knows why
> he introduced the workaround in vfio-pci instead of trying to fix it in core pci
> code ?

I don't know what workaround you're talking about. As devices are
released from the user, vfio-pci attempts to reset them. If
pci_reset_function() returns success we mark the device clean, otherwise
it gets marked dirty. Each time a device is released, if there are
dirty devices we test whether we can try a bus/slot reset to clean them.
In the case of assigning a GPU this typically means that the GPU or
audio function come through first, there's no reset mechanism so it gets
marked dirty, the next device comes through and we manage to try a bus
reset. vfio-pci does not have any device specific resets, all
functionality is added to the PCI-core, thank-you-very-much. I even
posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
bad so that pci_reset_function() won't claim that worked. All VGA
access quirks are done in QEMU, the kernel doesn't have any business in
remapping config space over MMIO regions or trapping other config space
backdoors.

I have never heard of problems with the dev specific reset claiming to
work and not doing anything, there are only a few of these, it should be
easy to debug.

I didn't read the original patch, but the title alone of this patch is
quite confusing. FLR is specifically a function-level-reset, so one
would expect 'do_flr' to be function specific. The pci-sysfs 'reset'
attribute is already function specific. If pci_reset_function() isn't
doing the job and we need to use bus/slot reset, it's clearly not an
FLR. Thanks,

Alex

2014-12-04 15:46:53

by David Vrabel

[permalink] [raw]
Subject: Re: [PATCH v5] Fixes for PCI backend for 3.19.

On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>
> These patches fix some issues with PCI back and also add proper
> bus/slot reset.

Applied 1-8 to devel/for-linus-3.19. I did not apply #9.

Thanks.

David

2014-12-04 16:26:07

by Sander Eikelenboom

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute


Thursday, December 4, 2014, 4:39:06 PM, you wrote:

> On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
>> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
>>
>> > On 04/12/14 14:09, Sander Eikelenboom wrote:
>> >>
>> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
>> >>
>> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
>> >>>>
>> >>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
>> >>>>
>> >>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
>> >>>>>>
>> >>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
>> >>>>>>>
>> >>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
>> >>>>>>>>
>> >>>>>>>> Instead of doing all this complex dance, we depend on the toolstack
>> >>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
>> >>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
>> >>>>>>>> It bypasses the need to worry about the PCI lock.
>> >>>>>>>
>> >>>>>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
>> >>>>>>> proposed).
>> >>>>>>>
>> >>>>>>
>> >>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
>> >>>>
>> >>>>> It is only needed if the core won't provide one.
>> >>>>
>> >>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
>> >>>>> +{
>> >>>>> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
>> >>>>> + struct device *dev = &pci->dev;
>> >>>>> + int ret;
>> >>>>> +
>> >>>>> + /* Already have a per-function reset? */
>> >>>>> + if (pci_probe_reset_function(pci) == 0)
>> >>>>> + return 0;
>> >>>>> +
>> >>>>> + ret = device_create_file(dev, &dev_attr_reset);
>> >>>>> + if (ret < 0)
>> >>>>> + return ret;
>> >>>> + dev_data->>created_reset_file = true;
>> >>>>> + return 0;
>> >>>>> +}
>> >>>>
>> >>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
>> >>>> "pci.c:__pci_dev_reset" ?
>> >>>>
>> >>>> The problem with that function is that from my testing it seems that the
>> >>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
>> >>>> other options are skipped (flr, pm, slot, bus). However the device it self is
>> >>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
>> >>>> none virtualization purposes and it's probably the least intrusive. For
>> >>>> virtualization however it would be nice to be sure it resets properly, or have a
>> >>>> way to force a specific reset routine.)
>> >>
>> >>> Then you need work with the maintainer for those specific devices or
>> >>> drivers to fix their specific reset function.
>> >>
>> >>> I'm not adding stuff to pciback to workaround broken quirks.
>> >>
>> >> OK that's a pretty clear message there, so if one wants to use pci and vga
>> >> passthrough one should better use KVM and vfio-pci.
>>
>> > Have you (or anyone else) ever raised the problem with the broken reset
>> > quirk for certain devices with the relevant maintainer?
>>
>> >> vfio-pci has:
>> >> - logic to do the try-slot-bus-reset logic
>>
>> > Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
>> > as well.
>>
>> Depends on what you call an "incorrect fix" .. it fixes a quirk ..
>> you can say that's incorrect, but then you would have to remove 50% of
>> the kernel and Xen code as well.
>>
>> (i do in general agree it's better to strive for a generic solution though,
>> that's exactly why i brought up that that function doesn't seem to work perfect
>> for virtualization purposes)
>>
>> > It makes no sense for both pciback and vfio-pci to workaround problems
>> > with pci_function_reset() in different ways -- it should be fixed in the
>> > core PCI code so both can benefit and make use of the same code.
>>
>> Well perhaps Bjorn knows why the order of resets and skipping the rest as
>> implemented in "pci.c:__pci_dev_reset" was implemented in that way ?
>>
>> Especially what is the expectation about pci_dev_specific_reset doing a proper
>> reset for say a vga-card:
>> - i know it doesn't work on a radeon card (doesn't blank screen, on next guest
>> boot reports it's already posted, powermanagement doesn't work).
>> - while with a slot/bus reset, that all just works fine, screen blanks
>> immediately and everything else also works.
>>
>> Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps he knows why
>> he introduced the workaround in vfio-pci instead of trying to fix it in core pci
>> code ?

> I don't know what workaround you're talking about. As devices are
> released from the user, vfio-pci attempts to reset them. If
> pci_reset_function() returns success we mark the device clean, otherwise
> it gets marked dirty. Each time a device is released, if there are
> dirty devices we test whether we can try a bus/slot reset to clean them.
> In the case of assigning a GPU this typically means that the GPU or
> audio function come through first, there's no reset mechanism so it gets
> marked dirty, the next device comes through and we manage to try a bus
> reset. vfio-pci does not have any device specific resets, all
> functionality is added to the PCI-core, thank-you-very-much. I even
> posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> bad so that pci_reset_function() won't claim that worked. All VGA
> access quirks are done in QEMU, the kernel doesn't have any business in
> remapping config space over MMIO regions or trapping other config space
> backdoors.

Thanks for your insightful reply!

With "workaround" I was trying to refer to "vfio_pci_try_bus_reset()" which
implements how to reset the devices, it indeed uses function you introduced in
pci core code (with a solution for locking issues Konrad also seems to have
ran into:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61cf16d8bd38c3dc52033ea75d5b1f8368514a17

David seems to be arguing the whole "vfio_pci_try_bus_reset()" should be not
needed and just doing calling "pci_reset_function()" (directly or by
echo "1" > /sys/bus/pci/devices/BDF/reset shoud always magically do the
right thing.
(Which in my opinion seems the contradict with the mere existence
of "vfio_pci_try_bus_reset()" (i don't think you would have implemented it
when you would have deemed it unnecessary))

> I have never heard of problems with the dev specific reset claiming to
> work and not doing anything, there are only a few of these, it should be
> easy to debug.

> I didn't read the original patch, but the title alone of this patch is
> quite confusing. FLR is specifically a function-level-reset, so one
> would expect 'do_flr' to be function specific. The pci-sysfs 'reset'
> attribute is already function specific. If pci_reset_function() isn't
> doing the job and we need to use bus/slot reset, it's clearly not an
> FLR. Thanks,
> Alex

The name "do_flr" is coming from the Xen xl toolstack which historically has
code that tries to reset devices using a echo "BDF" > /sys/bus/pci/drivers/pciback/do_flr

But the name "do_flr" and the debug messages indeed are incorrect (it's not
doing a flr nor a D3/PM reset), confusing and should not be used.

And as you seem to have solved the locking issue for vfio-pci, it is probably
possible for xen-pciback to do the same. Instead of letting xen-pciback
work around the locking problem by deferring to the xl toolstack the resetting
logic could be kept into xen-pciback it self.
That would also mean that the sysfs attribute would be unnecessary and make
the naming issue moot.

--
Sander

2014-12-04 16:55:29

by Alex Williamson

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On Thu, 2014-12-04 at 17:25 +0100, Sander Eikelenboom wrote:
> Thursday, December 4, 2014, 4:39:06 PM, you wrote:
>
> > On Thu, 2014-12-04 at 15:50 +0100, Sander Eikelenboom wrote:
> >> Thursday, December 4, 2014, 3:31:11 PM, you wrote:
> >>
> >> > On 04/12/14 14:09, Sander Eikelenboom wrote:
> >> >>
> >> >> Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> >> >>
> >> >>> On 04/12/14 13:10, Sander Eikelenboom wrote:
> >> >>>>
> >> >>>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> >> >>>>
> >> >>>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> >> >>>>>>
> >> >>>>>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
> >> >>>>>>>
> >> >>>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
> >> >>>>>>>>
> >> >>>>>>>> Instead of doing all this complex dance, we depend on the toolstack
> >> >>>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
> >> >>>>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
> >> >>>>>>>> It bypasses the need to worry about the PCI lock.
> >> >>>>>>>
> >> >>>>>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
> >> >>>>>>> proposed).
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
> >> >>>>
> >> >>>>> It is only needed if the core won't provide one.
> >> >>>>
> >> >>>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> >> >>>>> +{
> >> >>>>> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> >> >>>>> + struct device *dev = &pci->dev;
> >> >>>>> + int ret;
> >> >>>>> +
> >> >>>>> + /* Already have a per-function reset? */
> >> >>>>> + if (pci_probe_reset_function(pci) == 0)
> >> >>>>> + return 0;
> >> >>>>> +
> >> >>>>> + ret = device_create_file(dev, &dev_attr_reset);
> >> >>>>> + if (ret < 0)
> >> >>>>> + return ret;
> >> >>>> + dev_data->>created_reset_file = true;
> >> >>>>> + return 0;
> >> >>>>> +}
> >> >>>>
> >> >>>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
> >> >>>> "pci.c:__pci_dev_reset" ?
> >> >>>>
> >> >>>> The problem with that function is that from my testing it seems that the
> >> >>>> first option "pci_dev_specific_reset" always seems to return succes, so all the
> >> >>>> other options are skipped (flr, pm, slot, bus). However the device it self is
> >> >>>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
> >> >>>> none virtualization purposes and it's probably the least intrusive. For
> >> >>>> virtualization however it would be nice to be sure it resets properly, or have a
> >> >>>> way to force a specific reset routine.)
> >> >>
> >> >>> Then you need work with the maintainer for those specific devices or
> >> >>> drivers to fix their specific reset function.
> >> >>
> >> >>> I'm not adding stuff to pciback to workaround broken quirks.
> >> >>
> >> >> OK that's a pretty clear message there, so if one wants to use pci and vga
> >> >> passthrough one should better use KVM and vfio-pci.
> >>
> >> > Have you (or anyone else) ever raised the problem with the broken reset
> >> > quirk for certain devices with the relevant maintainer?
> >>
> >> >> vfio-pci has:
> >> >> - logic to do the try-slot-bus-reset logic
> >>
> >> > Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> >> > as well.
> >>
> >> Depends on what you call an "incorrect fix" .. it fixes a quirk ..
> >> you can say that's incorrect, but then you would have to remove 50% of
> >> the kernel and Xen code as well.
> >>
> >> (i do in general agree it's better to strive for a generic solution though,
> >> that's exactly why i brought up that that function doesn't seem to work perfect
> >> for virtualization purposes)
> >>
> >> > It makes no sense for both pciback and vfio-pci to workaround problems
> >> > with pci_function_reset() in different ways -- it should be fixed in the
> >> > core PCI code so both can benefit and make use of the same code.
> >>
> >> Well perhaps Bjorn knows why the order of resets and skipping the rest as
> >> implemented in "pci.c:__pci_dev_reset" was implemented in that way ?
> >>
> >> Especially what is the expectation about pci_dev_specific_reset doing a proper
> >> reset for say a vga-card:
> >> - i know it doesn't work on a radeon card (doesn't blank screen, on next guest
> >> boot reports it's already posted, powermanagement doesn't work).
> >> - while with a slot/bus reset, that all just works fine, screen blanks
> >> immediately and everything else also works.
> >>
> >> Added Alex as well since he added this workaround for KVM/vfio-pci, perhaps he knows why
> >> he introduced the workaround in vfio-pci instead of trying to fix it in core pci
> >> code ?
>
> > I don't know what workaround you're talking about. As devices are
> > released from the user, vfio-pci attempts to reset them. If
> > pci_reset_function() returns success we mark the device clean, otherwise
> > it gets marked dirty. Each time a device is released, if there are
> > dirty devices we test whether we can try a bus/slot reset to clean them.
> > In the case of assigning a GPU this typically means that the GPU or
> > audio function come through first, there's no reset mechanism so it gets
> > marked dirty, the next device comes through and we manage to try a bus
> > reset. vfio-pci does not have any device specific resets, all
> > functionality is added to the PCI-core, thank-you-very-much. I even
> > posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> > bad so that pci_reset_function() won't claim that worked. All VGA
> > access quirks are done in QEMU, the kernel doesn't have any business in
> > remapping config space over MMIO regions or trapping other config space
> > backdoors.
>
> Thanks for your insightful reply!
>
> With "workaround" I was trying to refer to "vfio_pci_try_bus_reset()" which
> implements how to reset the devices, it indeed uses function you introduced in
> pci core code (with a solution for locking issues Konrad also seems to have
> ran into:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61cf16d8bd38c3dc52033ea75d5b1f8368514a17
>
> David seems to be arguing the whole "vfio_pci_try_bus_reset()" should be not
> needed and just doing calling "pci_reset_function()" (directly or by
> echo "1" > /sys/bus/pci/devices/BDF/reset shoud always magically do the
> right thing.
> (Which in my opinion seems the contradict with the mere existence
> of "vfio_pci_try_bus_reset()" (i don't think you would have implemented it
> when you would have deemed it unnecessary))

That truly would be magic because a bus/slot reset and function reset
are completely different beasts. QEMU, through vfio-pci, makes use of
both. Take for instance hot-plugging the second port of a dual-port NIC
to a guest, where the first port may be (a) assigned to the same guest,
(b) assigned to a different guest, (c) in-use by the host, or (d)
not-in-use. For a hotplug I can only make use of a bus/slot reset in
one of those cases (d). For a cold-plug or VM reset, only two (a,d). I
don't see how pci_reset_function() can have that sort of visibility to
the ownership and usage of a given device. vfio-pci doesn't even have
this visibility, which is why the distinction is made in QEMU. vfio-pci
is just a conduit and gatekeeper to the PCI-core interfaces, for
instance preventing QEMU from doing a reset in the (b) and (c) cases.
What prevents that in the Xen case? Userspace?

> > I have never heard of problems with the dev specific reset claiming to
> > work and not doing anything, there are only a few of these, it should be
> > easy to debug.
>
> > I didn't read the original patch, but the title alone of this patch is
> > quite confusing. FLR is specifically a function-level-reset, so one
> > would expect 'do_flr' to be function specific. The pci-sysfs 'reset'
> > attribute is already function specific. If pci_reset_function() isn't
> > doing the job and we need to use bus/slot reset, it's clearly not an
> > FLR. Thanks,
> > Alex
>
> The name "do_flr" is coming from the Xen xl toolstack which historically has
> code that tries to reset devices using a echo "BDF" > /sys/bus/pci/drivers/pciback/do_flr

Redundant to /sys/bus/pci/devices/DDDD:BB:DD.F/reset

> But the name "do_flr" and the debug messages indeed are incorrect (it's not
> doing a flr nor a D3/PM reset), confusing and should not be used.
>
> And as you seem to have solved the locking issue for vfio-pci, it is probably
> possible for xen-pciback to do the same. Instead of letting xen-pciback
> work around the locking problem by deferring to the xl toolstack the resetting
> logic could be kept into xen-pciback it self.
> That would also mean that the sysfs attribute would be unnecessary and make
> the naming issue moot.

I would consider the try_*_reset() interfaces to be a workaround for
existing locking issues which are much harder to solve. It makes the
vfio-pci reset-on-release a best effort approach, which is generally
fine. For vfio I can't rely on a toolstack, nor maybe should you.
There's always a chance that the VM/user is sent a kill -9 and it's the
kernel's job to protect itself and return things to a quiescent state.
This is why I don't simply have QEMU send a bus reset on shutdown or put
reset policy that can affect other users or the host in userspace.
Thanks,

Alex

2014-12-04 18:00:04

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v5] Fixes for PCI backend for 3.19.

On Thu, Dec 04, 2014 at 03:46:42PM +0000, David Vrabel wrote:
> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
> >
> > These patches fix some issues with PCI back and also add proper
> > bus/slot reset.
>
> Applied 1-8 to devel/for-linus-3.19. I did not apply #9.

Excellent. Thank you!
>
> Thanks.
>
> David

2014-12-04 19:05:32

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On Thu, Dec 04, 2014 at 02:31:11PM +0000, David Vrabel wrote:
> On 04/12/14 14:09, Sander Eikelenboom wrote:
> >
> > Thursday, December 4, 2014, 2:43:06 PM, you wrote:
> >
> >> On 04/12/14 13:10, Sander Eikelenboom wrote:
> >>>
> >>> Thursday, December 4, 2014, 1:24:47 PM, you wrote:
> >>>
> >>>> On 04/12/14 12:06, Konrad Rzeszutek Wilk wrote:
> >>>>>
> >>>>> On Dec 4, 2014 6:30 AM, David Vrabel <[email protected]> wrote:
> >>>>>>
> >>>>>> On 03/12/14 21:40, Konrad Rzeszutek Wilk wrote:
> >>>>>>>
> >>>>>>> Instead of doing all this complex dance, we depend on the toolstack
> >>>>>>> doing the right thing. As such implement the 'do_flr' SysFS attribute
> >>>>>>> which 'xl' uses when a device is detached or attached from/to a guest.
> >>>>>>> It bypasses the need to worry about the PCI lock.
> >>>>>>
> >>>>>> No. Get pciback to add its own "reset" sysfs file (as I have repeatedly
> >>>>>> proposed).
> >>>>>>
> >>>>>
> >>>>> Which does not work as the kobj will complain (as there is already an 'reset' associated with the PCI device).
> >>>
> >>>> It is only needed if the core won't provide one.
> >>>
> >>>> +static int pcistub_try_create_reset_file(struct pci_dev *pci)
> >>>> +{
> >>>> + struct xen_pcibk_dev_data *dev_data = pci_get_drvdata(pci);
> >>>> + struct device *dev = &pci->dev;
> >>>> + int ret;
> >>>> +
> >>>> + /* Already have a per-function reset? */
> >>>> + if (pci_probe_reset_function(pci) == 0)
> >>>> + return 0;
> >>>> +
> >>>> + ret = device_create_file(dev, &dev_attr_reset);
> >>>> + if (ret < 0)
> >>>> + return ret;
> >>> + dev_data->>created_reset_file = true;
> >>>> + return 0;
> >>>> +}
> >>>
> >>> Wouldn't the "core-reset-sysfs-file" be still wired to the end up calling
> >>> "pci.c:__pci_dev_reset" ?
> >>>
> >>> The problem with that function is that from my testing it seems that the
> >>> first option "pci_dev_specific_reset" always seems to return succes, so all the
> >>> other options are skipped (flr, pm, slot, bus). However the device it self is
> >>> not properly reset enough (perhaps the pci_dev_specific_reset is good enough for
> >>> none virtualization purposes and it's probably the least intrusive. For
> >>> virtualization however it would be nice to be sure it resets properly, or have a
> >>> way to force a specific reset routine.)
> >
> >> Then you need work with the maintainer for those specific devices or
> >> drivers to fix their specific reset function.
> >
> >> I'm not adding stuff to pciback to workaround broken quirks.
> >
> > OK that's a pretty clear message there, so if one wants to use pci and vga
> > passthrough one should better use KVM and vfio-pci.
>
> Have you (or anyone else) ever raised the problem with the broken reset
> quirk for certain devices with the relevant maintainer?
>
> > vfio-pci has:
> > - logic to do the try-slot-bus-reset logic
>
> Just because vfio-pci fixed it incorrectly doesn't mean pciback has to
> as well.
>
> It makes no sense for both pciback and vfio-pci to workaround problems
> with pci_function_reset() in different ways -- it should be fixed in the
> core PCI code so both can benefit and make use of the same code.

We seem to be going in circles.

This thread: http://patchwork.ozlabs.org/patch/368668/

has an interesting discussion that pretty much touches all of this
and I believe it ends with David agreeing with Alex that an
bus-reset is a perfect use-case.

I believe the contention was on how to expose this interface
to the user-space. David's was thinking to use 'reset' while
I used 'do_flr' (which is a misleading name but hte toolstack
already does it). Perhaps we should just have (as David suggested)
an 'bus_reset' on the devices.

I am going to go on a limb and presume that David was thinking
that this 'bus_reset' would be exposed in the generic PCI land?

>
> David
>

2014-12-05 10:30:13

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On 04/12/14 15:39, Alex Williamson wrote:
>
> I don't know what workaround you're talking about. As devices are
> released from the user, vfio-pci attempts to reset them. If
> pci_reset_function() returns success we mark the device clean, otherwise
> it gets marked dirty. Each time a device is released, if there are
> dirty devices we test whether we can try a bus/slot reset to clean them.
> In the case of assigning a GPU this typically means that the GPU or
> audio function come through first, there's no reset mechanism so it gets
> marked dirty, the next device comes through and we manage to try a bus
> reset. vfio-pci does not have any device specific resets, all
> functionality is added to the PCI-core, thank-you-very-much. I even
> posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> bad so that pci_reset_function() won't claim that worked. All VGA
> access quirks are done in QEMU, the kernel doesn't have any business in
> remapping config space over MMIO regions or trapping other config space
> backdoors.

Thanks for the info Alex, I hadn't got around to actually looking and
the vfio-pci code and was just going to what Sander said.

We probably do need to have a more in depth look at now PCI devices and
handled by both the toolstack and pciback but in the short term I would
like a simple solution that does not extend the ABI.

David

2014-12-05 17:23:09

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On Fri, Dec 05, 2014 at 10:30:01AM +0000, David Vrabel wrote:
> On 04/12/14 15:39, Alex Williamson wrote:
> >
> > I don't know what workaround you're talking about. As devices are
> > released from the user, vfio-pci attempts to reset them. If
> > pci_reset_function() returns success we mark the device clean, otherwise
> > it gets marked dirty. Each time a device is released, if there are
> > dirty devices we test whether we can try a bus/slot reset to clean them.
> > In the case of assigning a GPU this typically means that the GPU or
> > audio function come through first, there's no reset mechanism so it gets
> > marked dirty, the next device comes through and we manage to try a bus
> > reset. vfio-pci does not have any device specific resets, all
> > functionality is added to the PCI-core, thank-you-very-much. I even
> > posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> > bad so that pci_reset_function() won't claim that worked. All VGA
> > access quirks are done in QEMU, the kernel doesn't have any business in
> > remapping config space over MMIO regions or trapping other config space
> > backdoors.
>
> Thanks for the info Alex, I hadn't got around to actually looking and
> the vfio-pci code and was just going to what Sander said.
>
> We probably do need to have a more in depth look at now PCI devices and
> handled by both the toolstack and pciback but in the short term I would
> like a simple solution that does not extend the ABI.

Could you enumerate the 'simple solution' then please? I am having
a frustrating time figuring out what it is that you are proposing.


>
> David

2014-12-08 10:38:14

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On 05/12/14 17:22, Konrad Rzeszutek Wilk wrote:
> On Fri, Dec 05, 2014 at 10:30:01AM +0000, David Vrabel wrote:
>> On 04/12/14 15:39, Alex Williamson wrote:
>>>
>>> I don't know what workaround you're talking about. As devices are
>>> released from the user, vfio-pci attempts to reset them. If
>>> pci_reset_function() returns success we mark the device clean, otherwise
>>> it gets marked dirty. Each time a device is released, if there are
>>> dirty devices we test whether we can try a bus/slot reset to clean them.
>>> In the case of assigning a GPU this typically means that the GPU or
>>> audio function come through first, there's no reset mechanism so it gets
>>> marked dirty, the next device comes through and we manage to try a bus
>>> reset. vfio-pci does not have any device specific resets, all
>>> functionality is added to the PCI-core, thank-you-very-much. I even
>>> posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
>>> bad so that pci_reset_function() won't claim that worked. All VGA
>>> access quirks are done in QEMU, the kernel doesn't have any business in
>>> remapping config space over MMIO regions or trapping other config space
>>> backdoors.
>>
>> Thanks for the info Alex, I hadn't got around to actually looking and
>> the vfio-pci code and was just going to what Sander said.
>>
>> We probably do need to have a more in depth look at now PCI devices and
>> handled by both the toolstack and pciback but in the short term I would
>> like a simple solution that does not extend the ABI.
>
> Could you enumerate the 'simple solution' then please? I am having
> a frustrating time figuring out what it is that you are proposing.

I've posted it repeatedly.

David

2014-12-08 16:04:20

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH v5 9/9] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute

On Mon, Dec 08, 2014 at 10:38:09AM +0000, David Vrabel wrote:
> On 05/12/14 17:22, Konrad Rzeszutek Wilk wrote:
> > On Fri, Dec 05, 2014 at 10:30:01AM +0000, David Vrabel wrote:
> >> On 04/12/14 15:39, Alex Williamson wrote:
> >>>
> >>> I don't know what workaround you're talking about. As devices are
> >>> released from the user, vfio-pci attempts to reset them. If
> >>> pci_reset_function() returns success we mark the device clean, otherwise
> >>> it gets marked dirty. Each time a device is released, if there are
> >>> dirty devices we test whether we can try a bus/slot reset to clean them.
> >>> In the case of assigning a GPU this typically means that the GPU or
> >>> audio function come through first, there's no reset mechanism so it gets
> >>> marked dirty, the next device comes through and we manage to try a bus
> >>> reset. vfio-pci does not have any device specific resets, all
> >>> functionality is added to the PCI-core, thank-you-very-much. I even
> >>> posted a generic PCI quirk patch recently that marks AMD VGA PM reset as
> >>> bad so that pci_reset_function() won't claim that worked. All VGA
> >>> access quirks are done in QEMU, the kernel doesn't have any business in
> >>> remapping config space over MMIO regions or trapping other config space
> >>> backdoors.
> >>
> >> Thanks for the info Alex, I hadn't got around to actually looking and
> >> the vfio-pci code and was just going to what Sander said.
> >>
> >> We probably do need to have a more in depth look at now PCI devices and
> >> handled by both the toolstack and pciback but in the short term I would
> >> like a simple solution that does not extend the ABI.
> >
> > Could you enumerate the 'simple solution' then please? I am having
> > a frustrating time figuring out what it is that you are proposing.
>
> I've posted it repeatedly.

Are you referring to http://lists.xenproject.org/archives/html/xen-devel/2014-07/msg01270.html
which is still waiting for your feedback?