From: Amey Narkhede <[email protected]>
PCI and PCIe devices may support a number of possible reset mechanisms,
for example Function Level Reset (FLR) provided via the Advanced Features
or PCIe capabilities, Power Management reset, bus reset, or a device-specific
reset. Currently, the PCI subsystem applies a fixed policy prioritizing these
reset methods, which provides neither visibility nor control to userspace.
Expose the reset methods available per device to userspace via sysfs,
and allow an administrative user or device owner to manage per-device
reset method priorities or exclusions.
This feature aims to allow greater control of a device for use cases
such as device assignment, where specific device or platform issues may
interact poorly with a given reset method, and for which device-specific
quirks have not been developed.
Suggested-by: Alex Williamson <[email protected]>
Reviewed-by: Alex Williamson <[email protected]>
Reviewed-by: Raphael Norwitz <[email protected]>
Amey Narkhede (4):
PCI: Refactor pcie_flr to follow calling convention of other reset
methods
PCI: Add new bitmap for keeping track of supported reset mechanisms
PCI: Remove reset_fn field from pci_dev
PCI/sysfs: Allow userspace to query and set device reset mechanism
Documentation/ABI/testing/sysfs-bus-pci | 15 ++
drivers/crypto/cavium/nitrox/nitrox_main.c | 4 +-
drivers/crypto/qat/qat_common/adf_aer.c | 2 +-
drivers/infiniband/hw/hfi1/chip.c | 4 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
.../ethernet/cavium/liquidio/lio_vf_main.c | 4 +-
.../ethernet/cavium/liquidio/octeon_mailbox.c | 2 +-
drivers/net/ethernet/freescale/enetc/enetc.c | 2 +-
.../ethernet/freescale/enetc/enetc_pci_mdio.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 +-
drivers/pci/pci-sysfs.c | 68 +++++++-
drivers/pci/pci.c | 160 ++++++++++--------
drivers/pci/pci.h | 11 +-
drivers/pci/pcie/aer.c | 12 +-
drivers/pci/probe.c | 4 +-
drivers/pci/quirks.c | 17 +-
include/linux/pci.h | 17 +-
17 files changed, 213 insertions(+), 117 deletions(-)
--
2.30.2
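In patch 4 of this series, reading the reset_method attribute lists the supported methods separated by spaces and puts the enabled ones in brackets. The sketch below is a hedged illustration of how a userspace tool might consume that output by classifying a single method name. The names `classify_method()` and `enum method_state` are hypothetical and do not come from the series or from any existing tool. The sketch assumes the output format of patch 4's `reset_method_show()`, for example `[flr] pm bus `.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of one method name in reset_method output. */
enum method_state {
	METHOD_ABSENT = -1,    /* device does not support the method */
	METHOD_AVAILABLE = 0,  /* supported but currently disabled */
	METHOD_ENABLED = 1,    /* supported and enabled, printed as "[name]" */
};

/*
 * Scan space/newline-separated tokens such as "[flr] pm bus " and
 * report whether @name appears bracketed, bare, or not at all.
 */
static enum method_state classify_method(const char *attr, const char *name)
{
	size_t nlen, len;
	const char *p, *start;

	assert(attr != NULL && name != NULL);
	nlen = strlen(name);
	p = attr;

	while (*p) {
		/* Skip separators between tokens. */
		while (*p == ' ' || *p == '\n')
			p++;
		if (!*p)
			break;

		/* Find the end of the current token. */
		start = p;
		while (*p && *p != ' ' && *p != '\n')
			p++;
		len = (size_t)(p - start);

		if (len == nlen + 2 && start[0] == '[' && start[len - 1] == ']' &&
		    strncmp(start + 1, name, nlen) == 0)
			return METHOD_ENABLED;
		if (len == nlen && strncmp(start, name, nlen) == 0)
			return METHOD_AVAILABLE;
	}
	return METHOD_ABSENT;
}
```

For example, a script could check for `METHOD_AVAILABLE` or `METHOD_ENABLED` before it writes a method name back to the attribute. The store handler in patch 4 rejects names that the device does not support with -EINVAL.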
From: Amey Narkhede <[email protected]>
Introduce a new bitmap, reset_methods, in struct pci_dev
to keep track of the reset mechanisms supported by the
device. Also refactor the probing and reset functions
to take advantage of the common calling convention of
the reset functions.
Signed-off-by: Amey Narkhede <[email protected]>
---
Reviewed-by: Alex Williamson <[email protected]>
Reviewed-by: Raphael Norwitz <[email protected]>
drivers/pci/pci.c | 106 ++++++++++++++++++++++++--------------------
drivers/pci/pci.h | 11 ++++-
drivers/pci/probe.c | 5 +--
include/linux/pci.h | 10 +++++
4 files changed, 79 insertions(+), 53 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4a7c084a3..407b44e85 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -40,6 +40,26 @@ const char *pci_power_names[] = {
};
EXPORT_SYMBOL_GPL(pci_power_names);
+static int pci_af_flr(struct pci_dev *dev, int probe);
+static int pci_pm_reset(struct pci_dev *dev, int probe);
+static int pci_dev_reset_slot_function(struct pci_dev *dev, int probe);
+static int pci_parent_bus_reset(struct pci_dev *dev, int probe);
+
+/*
+ * The order of entries in pci_reset_fn_methods defines both the bit
+ * positions in the reset_methods bitmap of struct pci_dev and the
+ * order in which the reset methods are tried.
+ */
+const struct pci_reset_fn_method pci_reset_fn_methods[] = {
+ { .reset_fn = &pci_dev_specific_reset, .name = "device_specific" },
+ { .reset_fn = &pcie_flr, .name = "flr" },
+ { .reset_fn = &pci_af_flr, .name = "af_flr" },
+ { .reset_fn = &pci_pm_reset, .name = "pm" },
+ { .reset_fn = &pci_dev_reset_slot_function, .name = "slot" },
+ { .reset_fn = &pci_parent_bus_reset, .name = "bus" },
+ { 0 },
+};
+
int isa_dma_bridge_buggy;
EXPORT_SYMBOL(isa_dma_bridge_buggy);
@@ -5080,71 +5100,59 @@ static void pci_dev_restore(struct pci_dev *dev)
*/
int __pci_reset_function_locked(struct pci_dev *dev)
{
- int rc;
+ int i, rc = -ENOTTY;
+ const struct pci_reset_fn_method *reset;
might_sleep();
- /*
- * A reset method returns -ENOTTY if it doesn't support this device
- * and we should try the next method.
- *
- * If it returns 0 (success), we're finished. If it returns any
- * other error, we're also finished: this indicates that further
- * reset mechanisms might be broken on the device.
- */
- rc = pci_dev_specific_reset(dev, 0);
- if (rc != -ENOTTY)
- return rc;
- rc = pcie_flr(dev, 0);
- if (rc != -ENOTTY)
- return rc;
- rc = pci_af_flr(dev, 0);
- if (rc != -ENOTTY)
- return rc;
- rc = pci_pm_reset(dev, 0);
- if (rc != -ENOTTY)
- return rc;
- rc = pci_dev_reset_slot_function(dev, 0);
- if (rc != -ENOTTY)
- return rc;
- return pci_parent_bus_reset(dev, 0);
+ for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
+ if (!(dev->reset_methods & (1 << i)))
+ continue;
+
+ /*
+ * A reset method returns -ENOTTY if it doesn't support this device
+ * and we should try the next method.
+ *
+ * If it returns 0 (success), we're finished. If it returns any
+ * other error, we're also finished: this indicates that further
+ * reset mechanisms might be broken on the device.
+ */
+ rc = reset->reset_fn(dev, 0);
+ if (rc != -ENOTTY)
+ return rc;
+ }
+ return rc;
}
EXPORT_SYMBOL_GPL(__pci_reset_function_locked);
/**
- * pci_probe_reset_function - check whether the device can be safely reset
- * @dev: PCI device to reset
+ * pci_init_reset_methods - check whether device can be safely reset
+ * and store supported reset mechanisms.
+ * @dev: PCI device to check for reset mechanisms
*
* Some devices allow an individual function to be reset without affecting
* other functions in the same device. The PCI device must be responsive
- * to PCI config space in order to use this function.
+ * to reads and writes to its PCI config space in order to use this function.
*
- * Returns 0 if the device function can be reset or negative if the
- * device doesn't support resetting a single function.
+ * Stores reset mechanisms supported by device in reset_methods bitmap
+ * field of struct pci_dev
*/
-int pci_probe_reset_function(struct pci_dev *dev)
+void pci_init_reset_methods(struct pci_dev *dev)
{
- int rc;
+ int i, rc;
+ const struct pci_reset_fn_method *reset;
- might_sleep();
+ dev->reset_methods = 0;
- rc = pci_dev_specific_reset(dev, 1);
- if (rc != -ENOTTY)
- return rc;
- rc = pcie_flr(dev, 1);
- if (rc != -ENOTTY)
- return rc;
- rc = pci_af_flr(dev, 1);
- if (rc != -ENOTTY)
- return rc;
- rc = pci_pm_reset(dev, 1);
- if (rc != -ENOTTY)
- return rc;
- rc = pci_dev_reset_slot_function(dev, 1);
- if (rc != -ENOTTY)
- return rc;
+ might_sleep();
- return pci_parent_bus_reset(dev, 1);
+ for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
+ rc = reset->reset_fn(dev, 1);
+ if (!rc)
+ dev->reset_methods |= (1 << i);
+ else if (rc != -ENOTTY)
+ break;
+ }
}
/**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index ef7c46613..ec093efdc 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -39,7 +39,7 @@ enum pci_mmap_api {
int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai,
enum pci_mmap_api mmap_api);
-int pci_probe_reset_function(struct pci_dev *dev);
+void pci_init_reset_methods(struct pci_dev *dev);
int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
int pci_bus_error_reset(struct pci_dev *dev);
@@ -612,6 +612,15 @@ struct pci_dev_reset_methods {
int (*reset)(struct pci_dev *dev, int probe);
};
+typedef int (*pci_reset_fn_t)(struct pci_dev *, int);
+
+struct pci_reset_fn_method {
+ pci_reset_fn_t reset_fn;
+ char *name;
+};
+
+extern const struct pci_reset_fn_method pci_reset_fn_methods[];
+
#ifdef CONFIG_PCI_QUIRKS
int pci_dev_specific_reset(struct pci_dev *dev, int probe);
#else
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 953f15abc..01dd037bd 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2403,9 +2403,8 @@ static void pci_init_capabilities(struct pci_dev *dev)
pci_rcec_init(dev); /* Root Complex Event Collector */
pcie_report_downtraining(dev);
-
- if (pci_probe_reset_function(dev) == 0)
- dev->reset_fn = 1;
+ pci_init_reset_methods(dev);
+ dev->reset_fn = !!dev->reset_methods;
}
/*
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 621ff5224..56d6e4750 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -325,6 +325,16 @@ struct pci_dev {
unsigned int class; /* 3 bytes: (base,sub,prog-if) */
u8 revision; /* PCI revision, low byte of class word */
u8 hdr_type; /* PCI header type (`multi' flag masked out) */
+ /*
+ * bit 0 -> device_specific
+ * bit 1 -> flr
+ * bit 2 -> af_flr
+ * bit 3 -> pm
+ * bit 4 -> slot
+ * bit 5 -> bus
+ * See pci_reset_fn_methods array in pci.c
+ */
+ u8 __bitwise reset_methods; /* bitmap for device supported reset capabilities */
#ifdef CONFIG_PCIEAER
u16 aer_cap; /* AER capability offset */
struct aer_stats *aer_stats; /* AER stats for this device */
--
2.30.2
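You can try the table-driven dispatch from this patch outside the kernel. The sketch below is a userspace model built on stated assumptions. `struct fake_dev`, `stub_reset()` and the six `m_*` wrappers are hypothetical stand-ins for `struct pci_dev` and the real reset methods, and each stub only checks a "supported" mask. The probe and reset loops follow `pci_init_reset_methods()` and `__pci_reset_function_locked()` as posted. A probe result of 0 sets the method's bit, -ENOTTY moves on to the next method, and any other error stops probing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev with only what the model needs. */
struct fake_dev {
	uint8_t reset_methods;  /* bit i set => method i probed successfully */
	uint8_t supported;      /* stub behaviour: which methods "work" */
	int probe_error_at;     /* method index whose probe fails hard, or -1 */
	int last_reset;         /* index of the method that performed a reset */
};

typedef int (*reset_fn_t)(struct fake_dev *dev, int probe);

/* Shared stub: -EIO on a hard probe error, -ENOTTY if unsupported, else 0. */
static int stub_reset(struct fake_dev *dev, int probe, int idx)
{
	if (probe && idx == dev->probe_error_at)
		return -EIO;
	if (!(dev->supported & (1u << idx)))
		return -ENOTTY;
	if (!probe)
		dev->last_reset = idx;
	return 0;
}

static int m_dev_specific(struct fake_dev *d, int p) { return stub_reset(d, p, 0); }
static int m_flr(struct fake_dev *d, int p) { return stub_reset(d, p, 1); }
static int m_af_flr(struct fake_dev *d, int p) { return stub_reset(d, p, 2); }
static int m_pm(struct fake_dev *d, int p) { return stub_reset(d, p, 3); }
static int m_slot(struct fake_dev *d, int p) { return stub_reset(d, p, 4); }
static int m_bus(struct fake_dev *d, int p) { return stub_reset(d, p, 5); }

struct reset_method {
	reset_fn_t reset_fn;
	const char *name;
};

/* Same order as pci_reset_fn_methods: index == bit position == priority. */
static const struct reset_method methods[] = {
	{ m_dev_specific, "device_specific" },
	{ m_flr, "flr" },
	{ m_af_flr, "af_flr" },
	{ m_pm, "pm" },
	{ m_slot, "slot" },
	{ m_bus, "bus" },
	{ NULL, NULL },  /* sentinel */
};

/* A u8 bitmap can describe at most 8 methods (sentinel excluded). */
static_assert(sizeof(methods) / sizeof(methods[0]) - 1 <= 8,
	      "too many reset methods for a u8 bitmap");

static void init_reset_methods(struct fake_dev *dev)
{
	const struct reset_method *m;
	int i, rc;

	dev->reset_methods = 0;
	for (i = 0, m = methods; m->reset_fn; i++, m++) {
		rc = m->reset_fn(dev, 1);
		if (rc == 0)
			dev->reset_methods |= (uint8_t)(1u << i);
		else if (rc != -ENOTTY)
			break;  /* hard error: later methods are not probed */
	}
}

static int reset_function(struct fake_dev *dev)
{
	const struct reset_method *m;
	int i, rc = -ENOTTY;

	for (i = 0, m = methods; m->reset_fn; i++, m++) {
		if (!(dev->reset_methods & (1u << i)))
			continue;
		rc = m->reset_fn(dev, 0);
		if (rc != -ENOTTY)
			return rc;  /* success or a real failure ends the walk */
	}
	return rc;
}
```

The model shows one consequence of this layout. A hard probe error at one index leaves every later bit clear, so the reset path never tries those methods either.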
From: Amey Narkhede <[email protected]>
The reset_fn field is used to indicate whether the
device supports any reset mechanism. Remove reset_fn
in favor of the new reset_methods bitmap, which keeps
track of all reset mechanisms supported by a device.
Signed-off-by: Amey Narkhede <[email protected]>
---
Reviewed-by: Alex Williamson <[email protected]>
Reviewed-by: Raphael Norwitz <[email protected]>
drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
drivers/pci/pci-sysfs.c | 6 ++----
drivers/pci/pci.c | 6 +++---
drivers/pci/probe.c | 1 -
drivers/pci/quirks.c | 2 +-
include/linux/pci.h | 1 -
6 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 9b9d305c6..3e2c49e08 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -526,7 +526,7 @@ static void octeon_destroy_resources(struct octeon_device *oct)
oct->irq_name_storage = NULL;
}
/* Soft reset the octeon device before exiting */
- if (oct->pci_dev->reset_fn)
+ if (oct->pci_dev->reset_methods)
octeon_pci_flr(oct);
else
cn23xx_vf_ask_pf_to_do_flr(oct);
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index f8afd54ca..78d2c130c 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1334,7 +1334,7 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
pcie_vpd_create_sysfs_dev_files(dev);
- if (dev->reset_fn) {
+ if (dev->reset_methods) {
retval = device_create_file(&dev->dev, &dev_attr_reset);
if (retval)
goto error;
@@ -1417,10 +1417,8 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
{
pcie_vpd_remove_sysfs_dev_files(dev);
- if (dev->reset_fn) {
+ if (dev->reset_methods)
device_remove_file(&dev->dev, &dev_attr_reset);
- dev->reset_fn = 0;
- }
}
/**
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 407b44e85..b7f6c6588 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5175,7 +5175,7 @@ int pci_reset_function(struct pci_dev *dev)
{
int rc;
- if (!dev->reset_fn)
+ if (!dev->reset_methods)
return -ENOTTY;
pci_dev_lock(dev);
@@ -5211,7 +5211,7 @@ int pci_reset_function_locked(struct pci_dev *dev)
{
int rc;
- if (!dev->reset_fn)
+ if (!dev->reset_methods)
return -ENOTTY;
pci_dev_save_and_disable(dev);
@@ -5234,7 +5234,7 @@ int pci_try_reset_function(struct pci_dev *dev)
{
int rc;
- if (!dev->reset_fn)
+ if (!dev->reset_methods)
return -ENOTTY;
if (!pci_dev_trylock(dev))
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 01dd037bd..4764e031a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2404,7 +2404,6 @@ static void pci_init_capabilities(struct pci_dev *dev)
pcie_report_downtraining(dev);
pci_init_reset_methods(dev);
- dev->reset_fn = !!dev->reset_methods;
}
/*
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 0a3df84c9..20a81b1bc 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5535,7 +5535,7 @@ static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
pdev->subsystem_device != 0x222e ||
- !pdev->reset_fn)
+ !pdev->reset_methods)
return;
if (pci_enable_device_mem(pdev))
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 56d6e4750..a2f003f4e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -437,7 +437,6 @@ struct pci_dev {
unsigned int state_saved:1;
unsigned int is_physfn:1;
unsigned int is_virtfn:1;
- unsigned int reset_fn:1;
unsigned int is_hotplug_bridge:1;
unsigned int shpc_managed:1; /* SHPC owned by shpchp */
unsigned int is_thunderbolt:1; /* Thunderbolt controller */
--
2.30.2
From: Amey Narkhede <[email protected]>
Add a reset_methods_enabled bitmap to struct pci_dev to
keep track of the user-preferred device reset mechanisms.
Add a reset_method sysfs attribute to query and set the
user-preferred device reset mechanisms.
Signed-off-by: Amey Narkhede <[email protected]>
---
Reviewed-by: Alex Williamson <[email protected]>
Reviewed-by: Raphael Norwitz <[email protected]>
Documentation/ABI/testing/sysfs-bus-pci | 15 ++++++
drivers/pci/pci-sysfs.c | 66 +++++++++++++++++++++++--
drivers/pci/pci.c | 3 +-
include/linux/pci.h | 2 +
4 files changed, 82 insertions(+), 4 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 25c9c3977..ae53ecd2e 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -121,6 +121,21 @@ Description:
child buses, and re-discover devices removed earlier
from this part of the device tree.
+What: /sys/bus/pci/devices/.../reset_method
+Date: March 2021
+Contact: Amey Narkhede <[email protected]>
+Description:
+ Some devices allow an individual function to be reset
+ without affecting other functions in the same slot.
+ For devices that have this support, a file named reset_method
+ is present in sysfs. Reading this file gives the names
+ of the reset methods supported by the device; the currently
+ enabled methods are enclosed in brackets. Writing the name of
+ one of the supported reset methods to this file sets the
+ reset method used when resetting the device. Writing "none"
+ disables the ability to reset the device, and writing "default"
+ restores the original set of reset methods.
+
What: /sys/bus/pci/devices/.../reset
Date: July 2009
Contact: Michael S. Tsirkin <[email protected]>
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 78d2c130c..3cd06d1c0 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1304,6 +1304,59 @@ static const struct bin_attribute pcie_config_attr = {
.write = pci_write_config,
};
+static ssize_t reset_method_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ const struct pci_reset_fn_method *reset;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ ssize_t len = 0;
+ int i;
+
+ for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
+ if (pdev->reset_methods_enabled & (1 << i))
+ len += sysfs_emit_at(buf, len, "[%s] ", reset->name);
+ else if (pdev->reset_methods & (1 << i))
+ len += sysfs_emit_at(buf, len, "%s ", reset->name);
+ }
+
+ return len;
+}
+
+static ssize_t reset_method_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ const struct pci_reset_fn_method *reset = pci_reset_fn_methods;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ u8 reset_mechanism;
+ int i = 0;
+
+ /* Writing none disables reset */
+ if (sysfs_streq(buf, "none")) {
+ reset_mechanism = 0;
+ } else if (sysfs_streq(buf, "default")) {
+ /* Writing default returns to initial value */
+ reset_mechanism = pdev->reset_methods;
+ } else {
+ reset_mechanism = 0;
+ for (; reset->reset_fn; i++, reset++) {
+ if (sysfs_streq(buf, reset->name)) {
+ reset_mechanism = 1 << i;
+ break;
+ }
+ }
+ if (!reset_mechanism || !(pdev->reset_methods & reset_mechanism))
+ return -EINVAL;
+ }
+
+ pdev->reset_methods_enabled = reset_mechanism;
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(reset_method);
+
static ssize_t reset_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
@@ -1337,11 +1390,16 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
if (dev->reset_methods) {
retval = device_create_file(&dev->dev, &dev_attr_reset);
if (retval)
- goto error;
+ goto err_reset;
+ retval = device_create_file(&dev->dev, &dev_attr_reset_method);
+ if (retval)
+ goto err_method;
}
return 0;
-error:
+err_method:
+ device_remove_file(&dev->dev, &dev_attr_reset);
+err_reset:
pcie_vpd_remove_sysfs_dev_files(dev);
return retval;
}
@@ -1417,8 +1475,10 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
{
pcie_vpd_remove_sysfs_dev_files(dev);
- if (dev->reset_methods)
+ if (dev->reset_methods) {
device_remove_file(&dev->dev, &dev_attr_reset);
+ device_remove_file(&dev->dev, &dev_attr_reset_method);
+ }
}
/**
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b7f6c6588..81cebea56 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5106,7 +5106,7 @@ int __pci_reset_function_locked(struct pci_dev *dev)
might_sleep();
for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
- if (!(dev->reset_methods & (1 << i)))
+ if (!(dev->reset_methods_enabled & (1 << i)))
continue;
/*
@@ -5153,6 +5153,7 @@ void pci_init_reset_methods(struct pci_dev *dev)
else if (rc != -ENOTTY)
break;
}
+ dev->reset_methods_enabled = dev->reset_methods;
}
/**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a2f003f4e..400f614e0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -335,6 +335,8 @@ struct pci_dev {
* See pci_reset_fn_methods array in pci.c
*/
u8 __bitwise reset_methods; /* bitmap for device supported reset capabilities */
+ /* bitmap for user enabled and device supported reset capabilities */
+ u8 __bitwise reset_methods_enabled;
#ifdef CONFIG_PCIEAER
u16 aer_cap; /* AER capability offset */
struct aer_stats *aer_stats; /* AER stats for this device */
--
2.30.2
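The show/store pair that this patch adds can also be modelled in userspace. In the sketch below, `struct fake_dev`, `show_methods()`, `store_method()` and `streq_nl()` are hypothetical names. `streq_nl()` is a simplified stand-in for the kernel's `sysfs_streq()`, which tolerates a trailing newline, and `snprintf()` takes the place of `sysfs_emit_at()`. The selection rules follow `reset_method_store()` as posted:

- "none" clears the enabled mask.
- "default" restores the probed mask.
- Any other input must name a method that the device supports.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Names in bit order, matching pci_reset_fn_methods in the patch. */
static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "slot", "bus",
};
#define NUM_METHODS (sizeof(method_names) / sizeof(method_names[0]))

/* Hypothetical stand-in for the two bitmaps in struct pci_dev. */
struct fake_dev {
	uint8_t reset_methods;          /* supported, set once at probe time */
	uint8_t reset_methods_enabled;  /* subset selected via sysfs */
};

/* Simplified sysfs_streq(): equal, ignoring one trailing newline. */
static int streq_nl(const char *a, const char *b)
{
	while (*a && *b && *a == *b) {
		a++;
		b++;
	}
	if (*a == *b)
		return 1;
	if (*a == '\n' && a[1] == '\0' && *b == '\0')
		return 1;
	return *b == '\n' && b[1] == '\0' && *a == '\0';
}

/* Enabled methods print as "[name] ", supported-but-disabled as "name ". */
static size_t show_methods(const struct fake_dev *dev, char *buf, size_t size)
{
	size_t i, len = 0;
	int n;

	assert(size > 0);
	buf[0] = '\0';
	for (i = 0; i < NUM_METHODS; i++) {
		unsigned int bit = 1u << i;

		if (dev->reset_methods_enabled & bit)
			n = snprintf(buf + len, size - len, "[%s] ", method_names[i]);
		else if (dev->reset_methods & bit)
			n = snprintf(buf + len, size - len, "%s ", method_names[i]);
		else
			continue;
		if (n < 0 || (size_t)n >= size - len)
			break;  /* output truncated; stop here */
		len += (size_t)n;
	}
	return len;
}

/* Returns 0 on success or -EINVAL, like the store handler's error path. */
static int store_method(struct fake_dev *dev, const char *buf)
{
	uint8_t mask = 0;
	size_t i;

	if (streq_nl(buf, "none")) {
		mask = 0;
	} else if (streq_nl(buf, "default")) {
		mask = dev->reset_methods;
	} else {
		for (i = 0; i < NUM_METHODS; i++) {
			if (streq_nl(buf, method_names[i])) {
				mask = (uint8_t)(1u << i);
				break;
			}
		}
		if (!mask || !(dev->reset_methods & mask))
			return -EINVAL;
	}
	dev->reset_methods_enabled = mask;
	return 0;
}
```

A write selects exactly one method, or all supported methods through "default". This version of the interface can therefore express exclusions but not a user-chosen ordering, and the relative priority still comes from the table order. As posted, the attribute output also ends with a trailing space rather than a newline.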
On 21/03/12 11:20AM, Alex Williamson wrote:
> On Fri, 12 Mar 2021 23:04:48 +0530
> [email protected] wrote:
>
> > From: Amey Narkhede <[email protected]>
> >
> > PCI and PCIe devices may support a number of possible reset mechanisms
> > for example Function Level Reset (FLR) provided via Advanced Feature or
> > PCIe capabilities, Power Management reset, bus reset, or device specific reset.
> > Currently the PCI subsystem creates a policy prioritizing these reset methods
> > which provides neither visibility nor control to userspace.
> >
> > Expose the reset methods available per device to userspace, via sysfs
> > and allow an administrative user or device owner to have ability to
> > manage per device reset method priorities or exclusions.
> > This feature aims to allow greater control of a device for use cases
> > as device assignment, where specific device or platform issues may
> > interact poorly with a given reset method, and for which device specific
> > quirks have not been developed.
> >
> > Suggested-by: Alex Williamson <[email protected]>
> > Reviewed-by: Alex Williamson <[email protected]>
> > Reviewed-by: Raphael Norwitz <[email protected]>
>
> Reviews/Acks/Sign-off-by from others (aside from Tested/Reported-by)
> really need to be explicit, IMO. This is a common issue for new
> developers, but it really needs to be more formal. I wouldn't claim to
> be able to speak for Raphael and interpret his comments so far as his
> final seal of approval.
>
> Also in the patches, all Sign-offs/Reviews/Acks need to be above the
> triple dash '---' line. Anything between that line and the beginning
> of the diff is discarded by tools. People will often use that for
> difference between version since it will be discarded on commit.
> Likewise, the cover letter is not committed, so Review-by there are
> generally not done. I generally make my Sign-off last in the chain and
> maintainers will generally add theirs after that. This makes for a
> chain where someone can read up from the bottom to see how this commit
> entered the kernel. Reviews, Acks, and whatnot will therefore usually
> be collected above the author posting the patch.
>
> Since this is a v1 patch and it's likely there will be more revisions,
> rather than send a v2 immediately with corrections, I'd probably just
> reply to the cover letter retracting Raphael's Review-by for him to
> send his own and noting that you'll fix the commit reviews formatting,
> but will wait for a bit for further comments before sending a new
> version.
>
> No big deal, nice work getting it sent out. Thanks,
>
> Alex
>
Raphael sent me the email with
Reviewed-by: Raphael Norwitz <[email protected]> that
is why I included it.
So basically in v2 I should reorder tags such that Sign-off will be
the last. Did I get that right? Or am I missing something?
Thanks,
Amey
Hi Amey,
Thank you for sending the series over!
[...]
> > Reviews/Acks/Sign-off-by from others (aside from Tested/Reported-by)
> > really need to be explicit, IMO. This is a common issue for new
> > developers, but it really needs to be more formal. I wouldn't claim to
> > be able to speak for Raphael and interpret his comments so far as his
> > final seal of approval.
> >
> > Also in the patches, all Sign-offs/Reviews/Acks need to be above the
> > triple dash '---' line. Anything between that line and the beginning
> > of the diff is discarded by tools. People will often use that for
> > difference between version since it will be discarded on commit.
> > Likewise, the cover letter is not committed, so Review-by there are
> > generally not done. I generally make my Sign-off last in the chain and
> > maintainers will generally add theirs after that. This makes for a
> > chain where someone can read up from the bottom to see how this commit
> > entered the kernel. Reviews, Acks, and whatnot will therefore usually
> > be collected above the author posting the patch.
> >
> > Since this is a v1 patch and it's likely there will be more revisions,
> > rather than send a v2 immediately with corrections, I'd probably just
> > reply to the cover letter retracting Raphael's Review-by for him to
> > send his own and noting that you'll fix the commit reviews formatting,
> > but will wait for a bit for further comments before sending a new
> > version.
> >
> > No big deal, nice work getting it sent out. Thanks,
> >
> > Alex
> >
> Raphael sent me the email with
> Reviewed-by: Raphael Norwitz <[email protected]> that
> is why I included it.
> So basically in v2 I should reorder tags such that Sign-off will be
> the last. Did I get that right? Or am I missing something?
[...]
I am not sure about the messages outside of the mailing list between
you, Alex and Raphael, as normally conversations and any reviews would
happen here (on the mailing list, that is), but as long as everyone
involved is on the same page, then everything should be fine.
In terms of how to format the patch, have a look at the following,
especially before you send another version, as there are some good tips
and recommendations there (including how to order things):
https://lore.kernel.org/linux-pci/[email protected]/
Krzysztof
On 21/03/12 07:58PM, Krzysztof Wilczyński wrote:
> Hi Amey,
>
> Thank you for sending the series over!
>
> [...]
> > > Reviews/Acks/Sign-off-by from others (aside from Tested/Reported-by)
> > > really need to be explicit, IMO. This is a common issue for new
> > > developers, but it really needs to be more formal. I wouldn't claim to
> > > be able to speak for Raphael and interpret his comments so far as his
> > > final seal of approval.
> > >
> > > Also in the patches, all Sign-offs/Reviews/Acks need to be above the
> > > triple dash '---' line. Anything between that line and the beginning
> > > of the diff is discarded by tools. People will often use that for
> > > difference between version since it will be discarded on commit.
> > > Likewise, the cover letter is not committed, so Review-by there are
> > > generally not done. I generally make my Sign-off last in the chain and
> > > maintainers will generally add theirs after that. This makes for a
> > > chain where someone can read up from the bottom to see how this commit
> > > entered the kernel. Reviews, Acks, and whatnot will therefore usually
> > > be collected above the author posting the patch.
> > >
> > > Since this is a v1 patch and it's likely there will be more revisions,
> > > rather than send a v2 immediately with corrections, I'd probably just
> > > reply to the cover letter retracting Raphael's Review-by for him to
> > > send his own and noting that you'll fix the commit reviews formatting,
> > > but will wait for a bit for further comments before sending a new
> > > version.
> > >
> > > No big deal, nice work getting it sent out. Thanks,
> > >
> > > Alex
> > >
> > Raphael sent me the email with
> > Reviewed-by: Raphael Norwitz <[email protected]> that
> > is why I included it.
> > So basically in v2 I should reorder tags such that Sign-off will be
> > the last. Did I get that right? Or am I missing something?
> [...]
>
> I am not sure about the messages outside of the mailing list between
> you, Alex and Raphael, as normally conversation and any reviews would
> happen here (on the mailing list, that is), but as long as everyone
> involved is on the same page, then everything should be fine.
>
> In terms of how to format the patch, have a look at the following,
> especially before you send another version, as there are some good tips
> and recommendations there (including how to order things):
>
> https://lore.kernel.org/linux-pci/[email protected]/
>
> Krzysztof
Basically, the whole thing boils down to the fact that I'm not good at
handling terminal email clients. I'll certainly keep the points mentioned
by Bjorn in mind.
Thanks,
Amey
Hi Amey,
[...]
> Basically, the whole thing boils down to the fact that I'm not good at
> handling terminal email clients. I'll certainly keep the points mentioned
> by Bjorn in mind.
[...]
No worries. Thunderbird works fine with Google Mail and can send plain
text e-mails too, if you get tired of Mutt etc.
By the way, don't send v2 quite yet. Allow people some time
to review the first version. Well, unless you deem it necessary,
that is.
Krzysztof
On Sat, Mar 13, 2021 at 12:10:38AM +0530, Amey Narkhede wrote:
> On 21/03/12 11:20AM, Alex Williamson wrote:
> > On Fri, 12 Mar 2021 23:04:48 +0530
> > [email protected] wrote:
> >
> > > From: Amey Narkhede <[email protected]>
> > >
> > > PCI and PCIe devices may support a number of possible reset mechanisms
> > > for example Function Level Reset (FLR) provided via Advanced Feature or
> > > PCIe capabilities, Power Management reset, bus reset, or device specific reset.
> > > Currently the PCI subsystem creates a policy prioritizing these reset methods
> > > which provides neither visibility nor control to userspace.
> > >
> > > Expose the reset methods available per device to userspace, via sysfs
> > > and allow an administrative user or device owner to have ability to
> > > manage per device reset method priorities or exclusions.
> > > This feature aims to allow greater control of a device for use cases
> > > as device assignment, where specific device or platform issues may
> > > interact poorly with a given reset method, and for which device specific
> > > quirks have not been developed.
> > >
> > > Suggested-by: Alex Williamson <[email protected]>
> > > Reviewed-by: Alex Williamson <[email protected]>
> > > Reviewed-by: Raphael Norwitz <[email protected]>
> >
> > Reviews/Acks/Sign-off-by from others (aside from Tested/Reported-by)
> > really need to be explicit, IMO. This is a common issue for new
> > developers, but it really needs to be more formal. I wouldn't claim to
> > be able to speak for Raphael and interpret his comments so far as his
> > final seal of approval.
> >
> > Also in the patches, all Sign-offs/Reviews/Acks need to be above the
> > triple dash '---' line. Anything between that line and the beginning
> > of the diff is discarded by tools. People will often use that for
> > difference between version since it will be discarded on commit.
> > Likewise, the cover letter is not committed, so Review-by there are
> > generally not done. I generally make my Sign-off last in the chain and
> > maintainers will generally add theirs after that. This makes for a
> > chain where someone can read up from the bottom to see how this commit
> > entered the kernel. Reviews, Acks, and whatnot will therefore usually
> > be collected above the author posting the patch.
> >
> > Since this is a v1 patch and it's likely there will be more revisions,
> > rather than send a v2 immediately with corrections, I'd probably just
> > reply to the cover letter retracting Raphael's Review-by for him to
> > send his own and noting that you'll fix the commit reviews formatting,
> > but will wait for a bit for further comments before sending a new
> > version.
> >
> > No big deal, nice work getting it sent out. Thanks,
> >
> > Alex
> >
> Raphael sent me the email with
> Reviewed-by: Raphael Norwitz <[email protected]> that
> is why I included it.
> So basically in v2 I should reorder tags such that Sign-off will be
> the last. Did I get that right? Or am I missing something?
>
Just to confirm, I did send
Reviewed-by: Raphael Norwitz <[email protected]>
for the latest version and I'm happy to have it on this series.
> Thanks,
> Amey
On Fri, Mar 12, 2021 at 11:04:48PM +0530, [email protected] wrote:
> From: Amey Narkhede <[email protected]>
>
> PCI and PCIe devices may support a number of possible reset mechanisms,
> for example Function Level Reset (FLR) provided via Advanced Features or
> PCIe capabilities, Power Management reset, bus reset, or device-specific reset.
> Currently the PCI subsystem creates a policy prioritizing these reset methods,
> which provides neither visibility nor control to userspace.
>
> Expose the reset methods available per device to userspace via sysfs,
> and allow an administrative user or device owner to
> manage per-device reset method priorities or exclusions.
> This feature aims to allow greater control of a device for use cases
> such as device assignment, where specific device or platform issues may
> interact poorly with a given reset method, and for which device-specific
> quirks have not been developed.
Sorry, are we talking about specific devices/flows/applications that
must have this functionality, or about a theoretical use case?
Thanks
On Friday 12 March 2021 23:04:50 [email protected] wrote:
> From: Amey Narkhede <[email protected]>
>
> Introduce a new bitmap, reset_methods, in struct pci_dev
> to keep track of the reset mechanisms supported by the
> device. Also refactor the probing and reset functions
> to take advantage of the common calling convention of
> the reset functions.
>
> Signed-off-by: Amey Narkhede <[email protected]>
> ---
> Reviewed-by: Alex Williamson <[email protected]>
> Reviewed-by: Raphael Norwitz <[email protected]>
>
> drivers/pci/pci.c | 106 ++++++++++++++++++++++++--------------------
> drivers/pci/pci.h | 11 ++++-
> drivers/pci/probe.c | 5 +--
> include/linux/pci.h | 10 +++++
> 4 files changed, 79 insertions(+), 53 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 4a7c084a3..407b44e85 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -40,6 +40,26 @@ const char *pci_power_names[] = {
> };
> EXPORT_SYMBOL_GPL(pci_power_names);
>
> +static int pci_af_flr(struct pci_dev *dev, int probe);
> +static int pci_pm_reset(struct pci_dev *dev, int probe);
> +static int pci_dev_reset_slot_function(struct pci_dev *dev, int probe);
> +static int pci_parent_bus_reset(struct pci_dev *dev, int probe);
> +
> +/*
> + * The ordering for functions in pci_reset_fn_methods
> + * is required for bitmap positions defined
> + * in reset_methods in struct pci_dev
> + */
> +const struct pci_reset_fn_method pci_reset_fn_methods[] = {
> + { .reset_fn = &pci_dev_specific_reset, .name = "device_specific" },
> + { .reset_fn = &pcie_flr, .name = "flr" },
> + { .reset_fn = &pci_af_flr, .name = "af_flr" },
> + { .reset_fn = &pci_pm_reset, .name = "pm" },
> + { .reset_fn = &pci_dev_reset_slot_function, .name = "slot" },
> + { .reset_fn = &pci_parent_bus_reset, .name = "bus" },
Hello Amey! PCIe Warm Reset is missing from the list of reset methods.
Could you extend the API to cover PCIe Warm Reset as well? According to
the PCI Express Mini Card and M.2 electromechanical specifications, PCIe
Warm Reset can be triggered by the PERST# signal, and several kernel drivers
can control PERST# internally. But there is no kernel API for it, so
neither PCIe Warm Reset nor PERST# signal handling is unified.
> + { 0 },
> +};
> +
> int isa_dma_bridge_buggy;
> EXPORT_SYMBOL(isa_dma_bridge_buggy);
>
> @@ -5080,71 +5100,59 @@ static void pci_dev_restore(struct pci_dev *dev)
> */
> int __pci_reset_function_locked(struct pci_dev *dev)
> {
> - int rc;
> + int i, rc = -ENOTTY;
> + const struct pci_reset_fn_method *reset;
>
> might_sleep();
>
> - /*
> - * A reset method returns -ENOTTY if it doesn't support this device
> - * and we should try the next method.
> - *
> - * If it returns 0 (success), we're finished. If it returns any
> - * other error, we're also finished: this indicates that further
> - * reset mechanisms might be broken on the device.
> - */
> - rc = pci_dev_specific_reset(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pcie_flr(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pci_af_flr(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pci_pm_reset(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pci_dev_reset_slot_function(dev, 0);
> - if (rc != -ENOTTY)
> - return rc;
> - return pci_parent_bus_reset(dev, 0);
> + for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
> + if (!(dev->reset_methods & (1 << i)))
> + continue;
> +
> + /*
> + * A reset method returns -ENOTTY if it doesn't support this device
> + * and we should try the next method.
> + *
> + * If it returns 0 (success), we're finished. If it returns any
> + * other error, we're also finished: this indicates that further
> + * reset mechanisms might be broken on the device.
> + */
> + rc = reset->reset_fn(dev, 0);
> + if (rc != -ENOTTY)
> + return rc;
> + }
> + return rc;
> }
> EXPORT_SYMBOL_GPL(__pci_reset_function_locked);
>
> /**
> - * pci_probe_reset_function - check whether the device can be safely reset
> - * @dev: PCI device to reset
> + * pci_init_reset_methods - check whether device can be safely reset
> + * and store supported reset mechanisms.
> + * @dev: PCI device to check for reset mechanisms
> *
> * Some devices allow an individual function to be reset without affecting
> * other functions in the same device. The PCI device must be responsive
> - * to PCI config space in order to use this function.
> + * to reads and writes to its PCI config space in order to use this function.
> *
> - * Returns 0 if the device function can be reset or negative if the
> - * device doesn't support resetting a single function.
> + * Stores reset mechanisms supported by device in reset_methods bitmap
> + * field of struct pci_dev
> */
> -int pci_probe_reset_function(struct pci_dev *dev)
> +void pci_init_reset_methods(struct pci_dev *dev)
> {
> - int rc;
> + int i, rc;
> + const struct pci_reset_fn_method *reset;
>
> - might_sleep();
> + dev->reset_methods = 0;
>
> - rc = pci_dev_specific_reset(dev, 1);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pcie_flr(dev, 1);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pci_af_flr(dev, 1);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pci_pm_reset(dev, 1);
> - if (rc != -ENOTTY)
> - return rc;
> - rc = pci_dev_reset_slot_function(dev, 1);
> - if (rc != -ENOTTY)
> - return rc;
> + might_sleep();
>
> - return pci_parent_bus_reset(dev, 1);
> + for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
> + rc = reset->reset_fn(dev, 1);
> + if (!rc)
> + dev->reset_methods |= (1 << i);
> + else if (rc != -ENOTTY)
> + break;
> + }
> }
>
> /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index ef7c46613..ec093efdc 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -39,7 +39,7 @@ enum pci_mmap_api {
> int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai,
> enum pci_mmap_api mmap_api);
>
> -int pci_probe_reset_function(struct pci_dev *dev);
> +void pci_init_reset_methods(struct pci_dev *dev);
> int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
> int pci_bus_error_reset(struct pci_dev *dev);
>
> @@ -612,6 +612,15 @@ struct pci_dev_reset_methods {
> int (*reset)(struct pci_dev *dev, int probe);
> };
>
> +typedef int (*pci_reset_fn_t)(struct pci_dev *, int);
> +
> +struct pci_reset_fn_method {
> + pci_reset_fn_t reset_fn;
> + char *name;
> +};
> +
> +extern const struct pci_reset_fn_method pci_reset_fn_methods[];
> +
> #ifdef CONFIG_PCI_QUIRKS
> int pci_dev_specific_reset(struct pci_dev *dev, int probe);
> #else
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 953f15abc..01dd037bd 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2403,9 +2403,8 @@ static void pci_init_capabilities(struct pci_dev *dev)
> pci_rcec_init(dev); /* Root Complex Event Collector */
>
> pcie_report_downtraining(dev);
> -
> - if (pci_probe_reset_function(dev) == 0)
> - dev->reset_fn = 1;
> + pci_init_reset_methods(dev);
> + dev->reset_fn = !!dev->reset_methods;
> }
>
> /*
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 621ff5224..56d6e4750 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -325,6 +325,16 @@ struct pci_dev {
> unsigned int class; /* 3 bytes: (base,sub,prog-if) */
> u8 revision; /* PCI revision, low byte of class word */
> u8 hdr_type; /* PCI header type (`multi' flag masked out) */
> + /*
> + * bit 0 -> dev_specific
> + * bit 1 -> flr
> + * bit 2 -> af_flr
> + * bit 3 -> pm
> + * bit 4 -> slot
> + * bit 5 -> bus
> + * See pci_reset_fn_methods array in pci.c
> + */
> + u8 __bitwise reset_methods; /* bitmap for device supported reset capabilities */
> #ifdef CONFIG_PCIEAER
> u16 aer_cap; /* AER capability offset */
> struct aer_stats *aer_stats; /* AER stats for this device */
> --
> 2.30.2
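To make the bitmap convention of this patch concrete, here is a small user-space model of it. It is only a sketch: struct fake_dev, the mock_* methods and the pm_broken knob are hypothetical stand-ins, not kernel code. It mirrors the two loops from the patch: probing sets bit i when method i returns 0 and stops at the first error other than -ENOTTY, and a reset walks the set bits in table order, moving on to the next method only on -ENOTTY.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the bitmap matters here. */
struct fake_dev {
	unsigned char reset_methods; /* bit i set => fake_methods[i] probed OK */
	int pm_broken;               /* hypothetical: make the PM reset fail */
};

typedef int (*fake_reset_fn_t)(struct fake_dev *dev, int probe);

struct fake_reset_method {
	fake_reset_fn_t reset_fn;
	const char *name;
};

/*
 * Mock methods using the same calling convention as the patch: with
 * probe != 0 they only report support (0) or lack of it (-ENOTTY); with
 * probe == 0 they perform the reset and may fail with another error.
 */
static int mock_flr(struct fake_dev *dev, int probe)
{
	(void)dev;
	(void)probe;
	return 0;
}

static int mock_af_flr(struct fake_dev *dev, int probe)
{
	(void)dev;
	(void)probe;
	return -ENOTTY; /* capability not present on this mock device */
}

static int mock_pm(struct fake_dev *dev, int probe)
{
	if (probe)
		return 0;
	return dev->pm_broken ? -EIO : 0;
}

/* Table order defines bit positions, as pci_reset_fn_methods[] does. */
static const struct fake_reset_method fake_methods[] = {
	{ mock_flr, "flr" },
	{ mock_af_flr, "af_flr" },
	{ mock_pm, "pm" },
	{ NULL, NULL },
};

/* Same loop shape as pci_init_reset_methods() in the patch. */
static void fake_init_reset_methods(struct fake_dev *dev)
{
	int i, rc;

	dev->reset_methods = 0;
	for (i = 0; fake_methods[i].reset_fn; i++) {
		rc = fake_methods[i].reset_fn(dev, 1);
		if (!rc)
			dev->reset_methods |= 1 << i;
		else if (rc != -ENOTTY)
			break;
	}
}

/* Same loop shape as __pci_reset_function_locked() in the patch. */
static int fake_reset(struct fake_dev *dev)
{
	int i, rc = -ENOTTY;

	for (i = 0; fake_methods[i].reset_fn; i++) {
		if (!(dev->reset_methods & (1 << i)))
			continue;
		rc = fake_methods[i].reset_fn(dev, 0);
		if (rc != -ENOTTY)
			return rc;
	}
	return rc;
}
```

Because the table order fixes the bit numbers (as the comment above pci_reset_fn_methods[] says), adding a method in the middle of the array would renumber every later bit, so new methods are easiest to append.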
On Friday 12 March 2021 23:04:51 [email protected] wrote:
> From: Amey Narkhede <[email protected]>
>
> The reset_fn field is used to indicate whether the
> device supports any reset mechanism or not.
> Deprecate use of reset_fn in favor of the new
> reset_methods bitmap, which can be used to keep
> track of all supported reset mechanisms of a device.
Hello Amey!
You cannot trigger PCIe Hot Reset (PCI secondary bus reset) in this
simple way from sysfs via the new reset methods.
I proposed very similar functionality just a few days ago:
https://lore.kernel.org/linux-pci/20210301171221.3d42a55i7h5ubqsb@pali/T/#u
And I realized that more steps are needed: at least a remove-reset-rescan
procedure done atomically is required.
> Signed-off-by: Amey Narkhede <[email protected]>
> ---
> Reviewed-by: Alex Williamson <[email protected]>
> Reviewed-by: Raphael Norwitz <[email protected]>
>
> drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
> drivers/pci/pci-sysfs.c | 6 ++----
> drivers/pci/pci.c | 6 +++---
> drivers/pci/probe.c | 1 -
> drivers/pci/quirks.c | 2 +-
> include/linux/pci.h | 1 -
> 6 files changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> index 9b9d305c6..3e2c49e08 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
> @@ -526,7 +526,7 @@ static void octeon_destroy_resources(struct octeon_device *oct)
> oct->irq_name_storage = NULL;
> }
> /* Soft reset the octeon device before exiting */
> - if (oct->pci_dev->reset_fn)
> + if (oct->pci_dev->reset_methods)
> octeon_pci_flr(oct);
> else
> cn23xx_vf_ask_pf_to_do_flr(oct);
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index f8afd54ca..78d2c130c 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1334,7 +1334,7 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
>
> pcie_vpd_create_sysfs_dev_files(dev);
>
> - if (dev->reset_fn) {
> + if (dev->reset_methods) {
> retval = device_create_file(&dev->dev, &dev_attr_reset);
> if (retval)
> goto error;
> @@ -1417,10 +1417,8 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
> static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
> {
> pcie_vpd_remove_sysfs_dev_files(dev);
> - if (dev->reset_fn) {
> + if (dev->reset_methods)
> device_remove_file(&dev->dev, &dev_attr_reset);
> - dev->reset_fn = 0;
> - }
> }
>
> /**
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 407b44e85..b7f6c6588 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5175,7 +5175,7 @@ int pci_reset_function(struct pci_dev *dev)
> {
> int rc;
>
> - if (!dev->reset_fn)
> + if (!dev->reset_methods)
> return -ENOTTY;
>
> pci_dev_lock(dev);
> @@ -5211,7 +5211,7 @@ int pci_reset_function_locked(struct pci_dev *dev)
> {
> int rc;
>
> - if (!dev->reset_fn)
> + if (!dev->reset_methods)
> return -ENOTTY;
>
> pci_dev_save_and_disable(dev);
> @@ -5234,7 +5234,7 @@ int pci_try_reset_function(struct pci_dev *dev)
> {
> int rc;
>
> - if (!dev->reset_fn)
> + if (!dev->reset_methods)
> return -ENOTTY;
>
> if (!pci_dev_trylock(dev))
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 01dd037bd..4764e031a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2404,7 +2404,6 @@ static void pci_init_capabilities(struct pci_dev *dev)
>
> pcie_report_downtraining(dev);
> pci_init_reset_methods(dev);
> - dev->reset_fn = !!dev->reset_methods;
> }
>
> /*
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 0a3df84c9..20a81b1bc 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5535,7 +5535,7 @@ static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
>
> if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
> pdev->subsystem_device != 0x222e ||
> - !pdev->reset_fn)
> + !pdev->reset_methods)
> return;
>
> if (pci_enable_device_mem(pdev))
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 56d6e4750..a2f003f4e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -437,7 +437,6 @@ struct pci_dev {
> unsigned int state_saved:1;
> unsigned int is_physfn:1;
> unsigned int is_virtfn:1;
> - unsigned int reset_fn:1;
> unsigned int is_hotplug_bridge:1;
> unsigned int shpc_managed:1; /* SHPC owned by shpchp */
> unsigned int is_thunderbolt:1; /* Thunderbolt controller */
> --
> 2.30.2
On Friday 12 March 2021 23:04:52 [email protected] wrote:
> From: Amey Narkhede <[email protected]>
>
> Add reset_methods_enabled bitmap to struct pci_dev to
> keep track of user preferred device reset mechanisms.
> Add reset_method sysfs attribute to query and set
> user preferred device reset mechanisms.
>
> Signed-off-by: Amey Narkhede <[email protected]>
> ---
> Reviewed-by: Alex Williamson <[email protected]>
> Reviewed-by: Raphael Norwitz <[email protected]>
>
> Documentation/ABI/testing/sysfs-bus-pci | 15 ++++++
> drivers/pci/pci-sysfs.c | 66 +++++++++++++++++++++++--
> drivers/pci/pci.c | 3 +-
> include/linux/pci.h | 2 +
> 4 files changed, 82 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index 25c9c3977..ae53ecd2e 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -121,6 +121,21 @@ Description:
> child buses, and re-discover devices removed earlier
> from this part of the device tree.
>
> +What: /sys/bus/pci/devices/.../reset_method
> +Date: March 2021
> +Contact: Amey Narkhede <[email protected]>
> +Description:
> + Some devices allow an individual function to be reset
> + without affecting other functions in the same slot.
> + For devices that have this support, a file named reset_method
> + will be present in sysfs. Reading this file gives the names
> + of the reset methods supported by the device. Currently used methods
> + are enclosed in brackets. Writing the name of any of the device's
> + supported reset methods to this file sets the reset method to
> + be used when resetting the device. Writing "none" to this file
> + disables the ability to reset the device, and writing "default"
> + restores the original value.
> +
Hello Amey!
I think that this API does not work for PCIe Hot Reset (= PCI secondary
bus reset) or PCIe Warm Reset.
The first reset method is bound to the bus, not to a device, so the kernel
may not see any registered device. Then there would be no
"reset_method" sysfs file, and no "reset" sysfs file either. But PCIe Hot
Reset is in most cases needed precisely when a buggy card is not registered
on the bus, and with this API the reset cannot be triggered then.
PCIe Warm Reset is done by the PERST# signal. While the signal is asserted,
the device is in the reset state and therefore is not registered. So again
the kernel may not see a registered device.
Moreover, for mPCIe form factor cards, boards can share one PERST# signal
among several PCIe cards and control it via a GPIO. So asserting the
PERST# GPIO can trigger Warm Reset for several PCIe cards, not just one. It
depends on the board or topology.
So... I do not think that the current approach, with a "reset_method" sysfs
entry bound to the PCI device, works for PCI secondary bus reset,
and it also cannot be used for implementing PCIe Warm Reset.
I would rather suggest redesigning and preparing a new API which would
also work with PCIe Hot Reset and PCIe Warm Reset.
This "reset" sysfs file can work only with PCI Function Level Reset or
some PM or device-specific reset, but not with reset types which are
more slot- or bus-oriented.
> What: /sys/bus/pci/devices/.../reset
> Date: July 2009
> Contact: Michael S. Tsirkin <[email protected]>
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 78d2c130c..3cd06d1c0 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1304,6 +1304,59 @@ static const struct bin_attribute pcie_config_attr = {
> .write = pci_write_config,
> };
>
> +static ssize_t reset_method_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + const struct pci_reset_fn_method *reset;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + ssize_t len = 0;
> + int i;
> +
> + for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
> + if (pdev->reset_methods_enabled & (1 << i))
> + len += sysfs_emit_at(buf, len, "[%s] ", reset->name);
> + else if (pdev->reset_methods & (1 << i))
> + len += sysfs_emit_at(buf, len, "%s ", reset->name);
> + }
> +
> + return len;
> +}
> +
> +static ssize_t reset_method_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + const struct pci_reset_fn_method *reset = pci_reset_fn_methods;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + u8 reset_mechanism;
> + int i = 0;
> +
> + /* Writing none disables reset */
> + if (sysfs_streq(buf, "none")) {
> + reset_mechanism = 0;
> + } else if (sysfs_streq(buf, "default")) {
> + /* Writing default returns to initial value */
> + reset_mechanism = pdev->reset_methods;
> + } else {
> + reset_mechanism = 0;
> + for (; reset->reset_fn; i++, reset++) {
> + if (sysfs_streq(buf, reset->name)) {
> + reset_mechanism = 1 << i;
> + break;
> + }
> + }
> + if (!reset_mechanism || !(pdev->reset_methods & reset_mechanism))
> + return -EINVAL;
> + }
> +
> + pdev->reset_methods_enabled = reset_mechanism;
> +
> + return count;
> +}
> +
> +static DEVICE_ATTR_RW(reset_method);
> +
> static ssize_t reset_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
> {
> @@ -1337,11 +1390,16 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
> if (dev->reset_methods) {
> retval = device_create_file(&dev->dev, &dev_attr_reset);
> if (retval)
> - goto error;
> + goto err_reset;
> + retval = device_create_file(&dev->dev, &dev_attr_reset_method);
> + if (retval)
> + goto err_method;
> }
> return 0;
>
> -error:
> +err_method:
> + device_remove_file(&dev->dev, &dev_attr_reset);
> +err_reset:
> pcie_vpd_remove_sysfs_dev_files(dev);
> return retval;
> }
> @@ -1417,8 +1475,10 @@ int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
> static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
> {
> pcie_vpd_remove_sysfs_dev_files(dev);
> - if (dev->reset_methods)
> + if (dev->reset_methods) {
> device_remove_file(&dev->dev, &dev_attr_reset);
> + device_remove_file(&dev->dev, &dev_attr_reset_method);
> + }
> }
>
> /**
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b7f6c6588..81cebea56 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5106,7 +5106,7 @@ int __pci_reset_function_locked(struct pci_dev *dev)
> might_sleep();
>
> for (i = 0, reset = pci_reset_fn_methods; reset->reset_fn; i++, reset++) {
> - if (!(dev->reset_methods & (1 << i)))
> + if (!(dev->reset_methods_enabled & (1 << i)))
> continue;
>
> /*
> @@ -5153,6 +5153,7 @@ void pci_init_reset_methods(struct pci_dev *dev)
> else if (rc != -ENOTTY)
> break;
> }
> + dev->reset_methods_enabled = dev->reset_methods;
> }
>
> /**
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index a2f003f4e..400f614e0 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -335,6 +335,8 @@ struct pci_dev {
> * See pci_reset_fn_methods array in pci.c
> */
> u8 __bitwise reset_methods; /* bitmap for device supported reset capabilities */
> + /* bitmap for user enabled and device supported reset capabilities */
> + u8 __bitwise reset_methods_enabled;
> #ifdef CONFIG_PCIEAER
> u16 aer_cap; /* AER capability offset */
> struct aer_stats *aer_stats; /* AER stats for this device */
> --
> 2.30.2
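The store/show rules of the new reset_method attribute can likewise be modelled outside the kernel. This sketch is not the kernel code: streq_modulo_newline() is a hypothetical re-implementation of what sysfs_streq() does (string equality ignoring one trailing newline), and plain unsigned char bitmaps stand in for the reset_methods and reset_methods_enabled fields. It follows reset_method_store() and reset_method_show() as posted, including two details: a write selects exactly one method, and the shown list has no trailing newline.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Bit positions follow the order of pci_reset_fn_methods[] in the patch. */
static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "slot", "bus", NULL,
};

/* True if a and b are equal, ignoring one trailing newline on either side. */
static int streq_modulo_newline(const char *a, const char *b)
{
	while (*a && *a == *b) {
		a++;
		b++;
	}
	if (*a == *b)
		return 1;
	if (!*a && *b == '\n' && !b[1])
		return 1;
	if (*a == '\n' && !a[1] && !*b)
		return 1;
	return 0;
}

/* Models reset_method_store(): 0 on success, -EINVAL if unknown/unsupported. */
static int set_reset_method(unsigned char supported, unsigned char *enabled,
			    const char *buf)
{
	unsigned char mask = 0;
	int i;

	if (streq_modulo_newline(buf, "none")) {
		mask = 0;               /* disable reset entirely */
	} else if (streq_modulo_newline(buf, "default")) {
		mask = supported;       /* back to everything that probed OK */
	} else {
		for (i = 0; method_names[i]; i++) {
			if (streq_modulo_newline(buf, method_names[i])) {
				mask = 1 << i;
				break;
			}
		}
		if (!mask || !(supported & mask))
			return -EINVAL;
	}
	*enabled = mask;
	return 0;
}

/* Models reset_method_show(): enabled methods in brackets, others bare. */
static int show_reset_method(unsigned char supported, unsigned char enabled,
			     char *buf, size_t size)
{
	int i, len = 0;

	buf[0] = '\0';
	for (i = 0; method_names[i]; i++) {
		if ((size_t)len >= size)
			break;          /* output would be truncated */
		if (enabled & (1 << i))
			len += snprintf(buf + len, size - len, "[%s] ", method_names[i]);
		else if (supported & (1 << i))
			len += snprintf(buf + len, size - len, "%s ", method_names[i]);
	}
	return len;
}
```

For example, `echo pm > /sys/bus/pci/devices/.../reset_method` would correspond to set_reset_method(supported, &enabled, "pm\n"), since echo appends a newline.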
On 21/03/15 12:55AM, Pali Rohár wrote:
> On Friday 12 March 2021 23:04:52 [email protected] wrote:
> > From: Amey Narkhede <[email protected]>
> >
> > Add reset_methods_enabled bitmap to struct pci_dev to
> > keep track of user preferred device reset mechanisms.
> > Add reset_method sysfs attribute to query and set
> > user preferred device reset mechanisms.
> >
> > Signed-off-by: Amey Narkhede <[email protected]>
> > ---
> > Reviewed-by: Alex Williamson <[email protected]>
> > Reviewed-by: Raphael Norwitz <[email protected]>
> >
> > Documentation/ABI/testing/sysfs-bus-pci | 15 ++++++
> > drivers/pci/pci-sysfs.c | 66 +++++++++++++++++++++++--
> > drivers/pci/pci.c | 3 +-
> > include/linux/pci.h | 2 +
> > 4 files changed, 82 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> > index 25c9c3977..ae53ecd2e 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-pci
> > +++ b/Documentation/ABI/testing/sysfs-bus-pci
> > @@ -121,6 +121,21 @@ Description:
> > child buses, and re-discover devices removed earlier
> > from this part of the device tree.
> >
> > +What: /sys/bus/pci/devices/.../reset_method
> > +Date: March 2021
> > +Contact: Amey Narkhede <[email protected]>
> > +Description:
> > + Some devices allow an individual function to be reset
> > + without affecting other functions in the same slot.
> > + For devices that have this support, a file named reset_method
> > + will be present in sysfs. Reading this file gives the names
> > + of the reset methods supported by the device. Currently used methods
> > + are enclosed in brackets. Writing the name of any of the device's
> > + supported reset methods to this file sets the reset method to
> > + be used when resetting the device. Writing "none" to this file
> > + disables the ability to reset the device, and writing "default"
> > + restores the original value.
> > +
>
> Hello Amey!
>
> I think that this API does not work for PCIe Hot Reset (= PCI secondary
> bus reset) or PCIe Warm Reset.
>
> The first reset method is bound to the bus, not to a device, so the kernel
> may not see any registered device. Then there would be no
> "reset_method" sysfs file, and no "reset" sysfs file either. But PCIe Hot
> Reset is in most cases needed precisely when a buggy card is not registered
> on the bus, and with this API the reset cannot be triggered then.
>
> PCIe Warm Reset is done by the PERST# signal. While the signal is asserted,
> the device is in the reset state and therefore is not registered. So again
> the kernel may not see a registered device.
>
> Moreover, for mPCIe form factor cards, boards can share one PERST# signal
> among several PCIe cards and control it via a GPIO. So asserting the
> PERST# GPIO can trigger Warm Reset for several PCIe cards, not just one. It
> depends on the board or topology.
>
> So... I do not think that the current approach, with a "reset_method" sysfs
> entry bound to the PCI device, works for PCI secondary bus reset,
> and it also cannot be used for implementing PCIe Warm Reset.
>
> I would rather suggest redesigning and preparing a new API which would
> also work with PCIe Hot Reset and PCIe Warm Reset.
>
> This "reset" sysfs file can work only with PCI Function Level Reset or
> some PM or device-specific reset, but not with reset types which are
> more slot- or bus-oriented.
>
The scope of this patch was to expose the current reset methods
to userspace. Also, reset methods are available
only for devices that allow an individual function to be reset
without affecting other functions in the same device.
So if the device satisfies those conditions, it can
use slot reset (pci_dev_reset_slot_function) and secondary bus
reset (pci_parent_bus_reset), which I think are hot reset and
warm reset respectively.
Thanks,
Amey
[...]
On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> slot reset (pci_dev_reset_slot_function) and secondary bus
> reset (pci_parent_bus_reset), which I think are hot reset and
> warm reset respectively.
No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
type of reset, which is currently implemented only for PCIe hotplug
bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
bus reset with some additional hook. PCIe Warm Reset does not have an API in
the kernel, and therefore drivers do not export this type of reset via any
kernel function (yet).
On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> On Mon, 15 Mar 2021 14:52:26 +0100
> Pali Rohár <[email protected]> wrote:
>
> > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > warm reset respectively.
> >
> > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > type of reset, which is currently implemented only for PCIe hotplug
> > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > bus reset with some additional hook. PCIe Warm Reset does not have an API in
> > the kernel, and therefore drivers do not export this type of reset via any
> > kernel function (yet).
>
> Warm reset is beyond the scope of this series, but could be implemented
> in a compatible way to fit within the pci_reset_fn_methods[] array
> defined here.
Ok!
> Note that with this series the resets available through
> pci_reset_function() and the per-device reset attribute in sysfs remain
> exactly the same as they are currently. The bus and slot reset
> methods used here are limited to devices where only a single function is
> affected by the reset, therefore it is not like the patch you proposed
> which performed a reset irrespective of the downstream devices. This
> series only enables selection of the existing methods. Thanks,
>
> Alex
>
But with this patch series there is still an issue with the PCI secondary
bus reset mechanism: the exported sysfs attribute does not perform the
remove-reset-rescan procedure. As discussed in the other thread, this reset
leaves the device in an unconfigured / broken state.
On Mon, 15 Mar 2021 15:52:38 +0100
Pali Rohár <[email protected]> wrote:
> On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > On Mon, 15 Mar 2021 14:52:26 +0100
> > Pali Rohár <[email protected]> wrote:
> >
> > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > > warm reset respectively.
> > >
> > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > type of reset, which is currently implemented only for PCIe hotplug
> > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > bus reset with some additional hook. PCIe Warm Reset does not have an API in
> > > the kernel, and therefore drivers do not export this type of reset via any
> > > kernel function (yet).
> >
> > Warm reset is beyond the scope of this series, but could be implemented
> > in a compatible way to fit within the pci_reset_fn_methods[] array
> > defined here.
>
> Ok!
>
> > Note that with this series the resets available through
> > pci_reset_function() and the per-device reset attribute in sysfs remain
> > exactly the same as they are currently. The bus and slot reset
> > methods used here are limited to devices where only a single function is
> > affected by the reset, therefore it is not like the patch you proposed
> > which performed a reset irrespective of the downstream devices. This
> > series only enables selection of the existing methods. Thanks,
> >
> > Alex
> >
>
> But with this patch series there is still an issue with the PCI secondary
> bus reset mechanism: the exported sysfs attribute does not perform the
> remove-reset-rescan procedure. As discussed in the other thread, this reset
> leaves the device in an unconfigured / broken state.
No, there's not:
int pci_reset_function(struct pci_dev *dev)
{
	int rc;

	if (!dev->reset_fn)
		return -ENOTTY;

	pci_dev_lock(dev);
	pci_dev_save_and_disable(dev);		/* <<< */
	rc = __pci_reset_function_locked(dev);
	pci_dev_restore(dev);			/* <<< */
	pci_dev_unlock(dev);

	return rc;
}
The remove/re-scan was discussed primarily because your patch performed
a bus reset regardless of what devices were affected by that reset and
it's difficult to manage the scope where multiple devices are affected.
Here, the bus and slot reset functions will fail unless the scope is
limited to the single device triggering this reset. Thanks,
Alex
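Alex's two points, the save/restore wrapper and the single-device scope check, can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: sim_bus, sim_dev and the cfg/saved fields are hypothetical. sim_parent_bus_reset() only models the "is any other device on this bus?" check of pci_parent_bus_reset() (the real function has further checks, e.g. for root buses and bridges), the cfg copy stands in for pci_dev_save_and_disable()/pci_dev_restore(), and locking is omitted.

```c
#include <errno.h>
#include <stddef.h>

#define SIM_MAX_DEVS 8

struct sim_bus;

/* Hypothetical device: cfg stands in for config state a bus reset clobbers. */
struct sim_dev {
	struct sim_bus *bus;
	int is_bridge;      /* has a subordinate bus of its own */
	unsigned int cfg;   /* e.g. BARs and command register */
	unsigned int saved; /* state saved before the reset */
};

/* Hypothetical bus: the devices sitting below one upstream bridge. */
struct sim_bus {
	int is_root;        /* no upstream bridge to pulse */
	struct sim_dev *devs[SIM_MAX_DEVS];
	size_t ndevs;
	int resets;         /* number of secondary bus resets issued */
};

/* Models the scope check: refuse if anything else would be reset too. */
static int sim_parent_bus_reset(struct sim_dev *dev, int probe)
{
	struct sim_bus *bus = dev->bus;
	size_t i;

	if (bus->is_root || dev->is_bridge)
		return -ENOTTY;
	for (i = 0; i < bus->ndevs; i++)
		if (bus->devs[i] != dev)
			return -ENOTTY;
	if (probe)
		return 0;
	bus->resets++;
	dev->cfg = 0;       /* the reset returns config state to defaults */
	return 0;
}

/* Models pci_reset_function(): save, reset, then restore unconditionally. */
static int sim_reset_function(struct sim_dev *dev)
{
	int rc;

	dev->saved = dev->cfg;              /* pci_dev_save_and_disable() */
	rc = sim_parent_bus_reset(dev, 0);
	dev->cfg = dev->saved;              /* pci_dev_restore() */
	return rc;
}
```

With a single device on the bus the model resets it and restores its state afterwards; once a second device appears, it declines with -ENOTTY instead of resetting the neighbour, which is the difference from the earlier bus-wide reset patch.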
On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> On Mon, 15 Mar 2021 14:52:26 +0100
> Pali Rohár <[email protected]> wrote:
>
> > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > warm reset respectively.
> >
> > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > type of reset, which is currently implemented only for PCIe hotplug
> > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > bus reset with some additional hook. PCIe Warm Reset does not have an API in
> > the kernel, and therefore drivers do not export this type of reset via any
> > kernel function (yet).
>
> Warm reset is beyond the scope of this series, but could be implemented
> in a compatible way to fit within the pci_reset_fn_methods[] array
> defined here. Note that with this series the resets available through
> pci_reset_function() and the per-device reset attribute in sysfs remain
> exactly the same as they are currently. The bus and slot reset
> methods used here are limited to devices where only a single function is
> affected by the reset, therefore it is not like the patch you proposed
> which performed a reset irrespective of the downstream devices. This
> series only enables selection of the existing methods. Thanks,
Alex,
I asked the patch author here [1] but didn't get any response, so maybe
you can answer me: what is the use case scenario for this functionality?
Thanks
[1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal
>
> Alex
>
On 21/03/15 05:07PM, Leon Romanovsky wrote:
> On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > On Mon, 15 Mar 2021 14:52:26 +0100
> > Pali Rohár <[email protected]> wrote:
> >
> > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > > warm reset respectively.
> > >
> > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > type of reset, which is currently implemented only for PCIe hotplug
> > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > bus reset with some additional hook. PCIe Warm Reset does not have an API in
> > > the kernel, and therefore drivers do not export this type of reset via any
> > > kernel function (yet).
> >
> > Warm reset is beyond the scope of this series, but could be implemented
> > in a compatible way to fit within the pci_reset_fn_methods[] array
> > defined here. Note that with this series the resets available through
> > pci_reset_function() and the per-device reset attribute in sysfs remain
> > exactly the same as they are currently. The bus and slot reset
> > methods used here are limited to devices where only a single function is
> > affected by the reset, therefore it is not like the patch you proposed
> > which performed a reset irrespective of the downstream devices. This
> > series only enables selection of the existing methods. Thanks,
>
> Alex,
>
> I asked the patch author here [1] but didn't get any response; maybe
> you can answer me. What is the use case scenario for this functionality?
>
> Thanks
>
> [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal
>
Sorry for not responding immediately. There were some buggy wifi cards
which explicitly needed FLR; I'm not sure if that behavior is fixed in
drivers. There is also a use case at Nutanix, but the engineer who
is involved is on PTO, which is why I did not respond immediately, as
I don't know the details yet.
Thanks,
Amey
On Mon, 15 Mar 2021 21:03:41 +0530
Amey Narkhede <[email protected]> wrote:
> On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > Pali Rohár <[email protected]> wrote:
> > >
> > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > warm reset respectively.
> > > >
> > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > bus reset with some other hook. PCIe Warm Reset does not have an API in
> > > > the kernel, and therefore drivers do not export this type of reset via any
> > > > kernel function (yet).
> > >
> > > Warm reset is beyond the scope of this series, but could be implemented
> > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > defined here. Note that with this series the resets available through
> > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > exactly the same as they are currently. The bus and slot reset
> > > methods used here are limited to devices where only a single function is
> > > affected by the reset; therefore, it is not like the patch you proposed
> > > which performed a reset irrespective of the downstream devices. This
> > > series only enables selection of the existing methods. Thanks,
> >
> > Alex,
> >
> > I asked the patch author here [1] but didn't get any response; maybe
> > you can answer me. What is the use case scenario for this functionality?
> >
> > Thanks
> >
> > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal
> >
> Sorry for not responding immediately. There were some buggy wifi cards
> which explicitly needed FLR; I'm not sure if that behavior is fixed in
> drivers. There is also a use case at Nutanix, but the engineer who
> is involved is on PTO, which is why I did not respond immediately, as
> I don't know the details yet.
And more generally, devices continue to have reset issues and we
impose a fixed priority in our ordering. We can and probably should
continue to quirk devices when we find broken resets so that we have
the best default behavior, but it's currently not easy for an end user
to experiment, i.e., this reset works, that one doesn't. We might also
have platform issues where a given reset works better on a certain
platform. Exposing a way to test these things might lead to better
quirks. In the case I think Pali was looking for, they wanted a
mechanism to force a bus reset. If this was in reference to a
single-function device, this could be accomplished by setting a priority for
that mechanism, which would translate to not only the sysfs reset
attribute, but also the reset mechanism used by vfio-pci. Thanks,
Alex
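Alex's suggestion, where a user sets a priority for a mechanism (for example to force a bus reset) and that order then governs both the sysfs reset attribute and vfio-pci, can be pictured as a small parser that turns a user-supplied list into an ordering. This is a hypothetical userspace illustration: the `toy_*` names, the space-separated `"bus flr"` input format and the 64-byte buffer limit are assumptions made for the sketch, not the syntax or code of this series.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical subset of reset methods; names are illustrative only. */
enum toy_method { TOY_FLR, TOY_PM, TOY_BUS, TOY_NR_METHODS };

static const char *const toy_method_names[TOY_NR_METHODS] = {
	"flr", "pm", "bus",
};

/*
 * Parse a user-supplied, space-separated priority list such as "bus flr"
 * into order[].  Unknown names, methods the device does not support and
 * duplicates are rejected with -EINVAL.  Methods left out of the list are
 * thereby excluded.  Returns the number of methods in the new order.
 */
static int toy_parse_order(const char *buf,
			   const int supported[TOY_NR_METHODS],
			   int order[TOY_NR_METHODS])
{
	char tmp[64];
	int seen[TOY_NR_METHODS] = { 0 };
	int n = 0;

	assert(buf && supported && order);
	if (strlen(buf) >= sizeof(tmp))
		return -EINVAL;
	strcpy(tmp, buf);

	/* A trailing newline, as left by "echo", is treated as a separator. */
	for (char *tok = strtok(tmp, " \n"); tok; tok = strtok(NULL, " \n")) {
		int m;

		for (m = 0; m < TOY_NR_METHODS; m++)
			if (strcmp(tok, toy_method_names[m]) == 0)
				break;
		if (m == TOY_NR_METHODS || !supported[m] || seen[m])
			return -EINVAL;
		seen[m] = 1;
		order[n++] = m;
	}
	return n;
}
```

After parsing `"bus flr"`, a reset would try bus reset first and FLR second; PM is excluded simply by being absent, which covers both the "priorities" and "exclusions" described in the cover letter.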
On Mon, 15 Mar 2021 14:52:26 +0100
Pali Rohár <[email protected]> wrote:
> On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > slot reset (pci_dev_reset_slot_function) and secondary bus
> > reset(pci_parent_bus_reset) which I think are hot reset and
> > warm reset respectively.
>
> No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> type of reset, which is currently implemented only for PCIe hot plug
> bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> bus reset with some other hook. PCIe Warm Reset does not have an API in
> the kernel, and therefore drivers do not export this type of reset via any
> kernel function (yet).
Warm reset is beyond the scope of this series, but could be implemented
in a compatible way to fit within the pci_reset_fn_methods[] array
defined here. Note that with this series the resets available through
pci_reset_function() and the per-device reset attribute in sysfs remain
exactly the same as they are currently. The bus and slot reset
methods used here are limited to devices where only a single function is
affected by the reset; therefore, it is not like the patch you proposed
which performed a reset irrespective of the downstream devices. This
series only enables selection of the existing methods. Thanks,
Alex
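To make "selection of the existing methods" concrete: in kernels of this period, `__pci_reset_function_locked()` tries device-specific, PCIe FLR, AF FLR, PM, slot and bus resets in turn, moving on only when a method returns `-ENOTTY`; success or any other error ends the walk. The sketch below models that convention as a table, in the spirit of the `pci_reset_fn_methods[]` array mentioned above. It is a userspace toy, not kernel code: `struct toy_dev`, the three methods and their names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy device model: which reset mechanisms this function supports. */
struct toy_dev {
	int has_flr;
	int has_pm;
	int alone_on_bus;	/* bus reset allowed only if no other function is affected */
	const char *last_reset;	/* name of the method that actually ran */
};

/* Each method returns 0 on success or -ENOTTY if it does not apply. */
static int toy_flr(struct toy_dev *dev) { return dev->has_flr ? 0 : -ENOTTY; }
static int toy_pm(struct toy_dev *dev)  { return dev->has_pm ? 0 : -ENOTTY; }
static int toy_bus(struct toy_dev *dev) { return dev->alone_on_bus ? 0 : -ENOTTY; }

struct toy_reset_method {
	int (*reset_fn)(struct toy_dev *dev);
	const char *name;
};

/* Default priority: earlier entries are tried first. */
static const struct toy_reset_method toy_reset_methods[] = {
	{ toy_flr, "flr" },
	{ toy_pm,  "pm"  },
	{ toy_bus, "bus" },
};

#define TOY_NR_METHODS (sizeof(toy_reset_methods) / sizeof(toy_reset_methods[0]))

/*
 * Walk the table in order.  -ENOTTY means "not supported here, try the
 * next one"; any other result (success or a real error) ends the walk.
 */
static int toy_reset_function(struct toy_dev *dev)
{
	assert(dev != NULL);
	for (size_t i = 0; i < TOY_NR_METHODS; i++) {
		int rc = toy_reset_methods[i].reset_fn(dev);

		if (rc == -ENOTTY)
			continue;
		if (rc == 0)
			dev->last_reset = toy_reset_methods[i].name;
		return rc;
	}
	return -ENOTTY;
}
```

A device without FLR or PM support that sits alone on its bus falls through to the bus reset; a device supporting none of the three gets `-ENOTTY`, which is the point at which no reset is offered at all.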
On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> On Mon, 15 Mar 2021 21:03:41 +0530
> Amey Narkhede <[email protected]> wrote:
>
> > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > Pali Rohár <[email protected]> wrote:
> > > >
> > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > warm reset respectively.
> > > > >
> > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in
> > > > > the kernel, and therefore drivers do not export this type of reset via any
> > > > > kernel function (yet).
> > > >
> > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > defined here. Note that with this series the resets available through
> > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > exactly the same as they are currently. The bus and slot reset
> > > > methods used here are limited to devices where only a single function is
> > > > affected by the reset; therefore, it is not like the patch you proposed
> > > > which performed a reset irrespective of the downstream devices. This
> > > > series only enables selection of the existing methods. Thanks,
> > >
> > > Alex,
> > >
> > > I asked the patch author here [1] but didn't get any response; maybe
> > > you can answer me. What is the use case scenario for this functionality?
> > >
> > > Thanks
> > >
> > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > >
> > Sorry for not responding immediately. There were some buggy wifi cards
> > which explicitly needed FLR; I'm not sure if that behavior is fixed in
> > drivers. There is also a use case at Nutanix, but the engineer who
> > is involved is on PTO, which is why I did not respond immediately, as
> > I don't know the details yet.
>
> And more generally, devices continue to have reset issues and we
> impose a fixed priority in our ordering. We can and probably should
> continue to quirk devices when we find broken resets so that we have
> the best default behavior, but it's currently not easy for an end user
> to experiment, i.e., this reset works, that one doesn't. We might also
> have platform issues where a given reset works better on a certain
> platform. Exposing a way to test these things might lead to better
> quirks. In the case I think Pali was looking for, they wanted a
> mechanism to force a bus reset. If this was in reference to a
> single-function device, this could be accomplished by setting a priority for
> that mechanism, which would translate to not only the sysfs reset
> attribute, but also the reset mechanism used by vfio-pci. Thanks,
>
> Alex
>
To confirm from our end - we have seen many such instances where default
reset methods have not worked well on our platform. Debugging these
issues is painful in practice, and this interface would make it far
easier.
Having an interface like this would also help us better communicate the
issues we find with upstream. Allowing others to more easily test our
(or other entities') findings should give better visibility into
which issues apply to the device in general and which are platform
specific. In disambiguating the former from the latter, we should be
able to better quirk devices for everyone, and in the latter cases, this
interface allows for a safer and more elegant solution than any of the
current alternatives.
CC Alay, Suresh, Shyam and Felipe in case they have anything to add.
On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > On Mon, 15 Mar 2021 21:03:41 +0530
> > Amey Narkhede <[email protected]> wrote:
> >
> > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > Pali Rohár <[email protected]> wrote:
> > > > >
> > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > warm reset respectively.
> > > > > >
> > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in
> > > > > > the kernel, and therefore drivers do not export this type of reset via any
> > > > > > kernel function (yet).
> > > > >
> > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > defined here. Note that with this series the resets available through
> > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > exactly the same as they are currently. The bus and slot reset
> > > > > methods used here are limited to devices where only a single function is
> > > > > affected by the reset; therefore, it is not like the patch you proposed
> > > > > which performed a reset irrespective of the downstream devices. This
> > > > > series only enables selection of the existing methods. Thanks,
> > > >
> > > > Alex,
> > > >
> > > > I asked the patch author here [1] but didn't get any response; maybe
> > > > you can answer me. What is the use case scenario for this functionality?
> > > >
> > > > Thanks
> > > >
> > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > >
> > > Sorry for not responding immediately. There were some buggy wifi cards
> > > which explicitly needed FLR; I'm not sure if that behavior is fixed in
> > > drivers. There is also a use case at Nutanix, but the engineer who
> > > is involved is on PTO, which is why I did not respond immediately, as
> > > I don't know the details yet.
> >
> > And more generally, devices continue to have reset issues and we
> > impose a fixed priority in our ordering. We can and probably should
> > continue to quirk devices when we find broken resets so that we have
> > the best default behavior, but it's currently not easy for an end user
> > to experiment, i.e., this reset works, that one doesn't. We might also
> > have platform issues where a given reset works better on a certain
> > platform. Exposing a way to test these things might lead to better
> > quirks. In the case I think Pali was looking for, they wanted a
> > mechanism to force a bus reset. If this was in reference to a
> > single-function device, this could be accomplished by setting a priority for
> > that mechanism, which would translate to not only the sysfs reset
> > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> >
> > Alex
> >
>
> To confirm from our end - we have seen many such instances where default
> reset methods have not worked well on our platform. Debugging these
> issues is painful in practice, and this interface would make it far
> easier.
>
> Having an interface like this would also help us better communicate the
> issues we find with upstream. Allowing others to more easily test our
> (or other entities') findings should give better visibility into
> which issues apply to the device in general and which are platform
> specific. In disambiguating the former from the latter, we should be
> able to better quirk devices for everyone, and in the latter cases, this
> interface allows for a safer and more elegant solution than any of the
> current alternatives.
So to summarize, we are talking about a test and debug interface to
overcome HW bugs, am I right?
My personal experience shows that once an easy workaround exists
(and a write to a generally available sysfs file is very simple), vendors'
and users' desire for a proper fix decreases drastically. IMHO, we will
see an increase in copy/paste on SO and blog posts, but a reduction in quirks.
My 2-cents.
>
> CC Alay, Suresh, Shyam and Felipe in case they have anything to add.
On 21/03/17 06:20AM, Leon Romanovsky wrote:
> On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > Amey Narkhede <[email protected]> wrote:
> > >
> > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > Pali Rohár <[email protected]> wrote:
> > > > > >
> > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > warm reset respectively.
> > > > > > >
> > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in
> > > > > > > the kernel, and therefore drivers do not export this type of reset via any
> > > > > > > kernel function (yet).
> > > > > >
> > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > defined here. Note that with this series the resets available through
> > > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > methods used here are limited to devices where only a single function is
> > > > > > affected by the reset; therefore, it is not like the patch you proposed
> > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > series only enables selection of the existing methods. Thanks,
> > > > >
> > > > > Alex,
> > > > >
> > > > > I asked the patch author here [1] but didn't get any response; maybe
> > > > > you can answer me. What is the use case scenario for this functionality?
> > > > >
> > > > > Thanks
> > > > >
> > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > >
> > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > which explicitly needed FLR; I'm not sure if that behavior is fixed in
> > > > drivers. There is also a use case at Nutanix, but the engineer who
> > > > is involved is on PTO, which is why I did not respond immediately, as
> > > > I don't know the details yet.
> > >
> > > And more generally, devices continue to have reset issues and we
> > > impose a fixed priority in our ordering. We can and probably should
> > > continue to quirk devices when we find broken resets so that we have
> > > the best default behavior, but it's currently not easy for an end user
> > > to experiment, i.e., this reset works, that one doesn't. We might also
> > > have platform issues where a given reset works better on a certain
> > > platform. Exposing a way to test these things might lead to better
> > > quirks. In the case I think Pali was looking for, they wanted a
> > > mechanism to force a bus reset. If this was in reference to a
> > > single-function device, this could be accomplished by setting a priority for
> > > that mechanism, which would translate to not only the sysfs reset
> > > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> > >
> > > Alex
> > >
> >
> > To confirm from our end - we have seen many such instances where default
> > reset methods have not worked well on our platform. Debugging these
> > issues is painful in practice, and this interface would make it far
> > easier.
> >
> > Having an interface like this would also help us better communicate the
> > issues we find with upstream. Allowing others to more easily test our
> > (or other entities') findings should give better visibility into
> > which issues apply to the device in general and which are platform
> > specific. In disambiguating the former from the latter, we should be
> > able to better quirk devices for everyone, and in the latter cases, this
> > interface allows for a safer and more elegant solution than any of the
> > current alternatives.
>
> So to summarize, we are talking about a test and debug interface to
> overcome HW bugs, am I right?
>
> My personal experience shows that once an easy workaround exists
> (and a write to a generally available sysfs file is very simple), vendors'
> and users' desire for a proper fix decreases drastically. IMHO, we will
> see an increase in copy/paste on SO and blog posts, but a reduction in quirks.
>
> My 2-cents.
>
I agree with your point, but at least it gives userspace the ability
to use a broken device until the bug is fixed upstream.
This is also applicable to obscure devices without upstream
drivers, for example custom FPGA-based devices.
Another main application, which I forgot to mention, is virtualization,
where the VMM wants to reset the device when the guest is reset,
to emulate a machine reboot as closely as possible.
Thanks,
Amey
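For the virtualization case, the simplest existing hook is the per-function sysfs `reset` attribute, which performs a reset when `1` is written to it. The helper below is a minimal sketch assuming the caller already knows the attribute path (the address in the comment is only an example) and runs as root; the name `sysfs_pci_reset` is hypothetical. A VFIO-based VMM would more typically issue the `VFIO_DEVICE_RESET` ioctl on its device file descriptor instead, which, as Alex notes above, would follow the same selected reset mechanism.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel to reset one PCI function through the existing sysfs
 * "reset" attribute, e.g. "/sys/bus/pci/devices/0000:01:00.0/reset"
 * (example address).  Needs root, and the attribute is only present for
 * functions that have a usable reset method.
 * Returns 0 on success or a negative errno value.
 */
static int sysfs_pci_reset(const char *reset_attr)
{
	int fd, err = 0;
	ssize_t n;

	assert(reset_attr != NULL);
	fd = open(reset_attr, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, "1", 1);	/* the attribute accepts only the value 1 */
	if (n < 0)
		err = -errno;	/* capture errno before close() can change it */
	else if (n != 1)
		err = -EIO;
	close(fd);
	return err;
}
```

If the function has no `reset` attribute, `open()` fails and the helper returns `-ENOENT`; a failed reset is reported through the `write()` error instead.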
On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > Amey Narkhede <[email protected]> wrote:
> > > >
> > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > >
> > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > warm reset respectively.
> > > > > > > >
> > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in
> > > > > > > > the kernel, and therefore drivers do not export this type of reset via any
> > > > > > > > kernel function (yet).
> > > > > > >
> > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > defined here. Note that with this series the resets available through
> > > > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > affected by the reset; therefore, it is not like the patch you proposed
> > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > >
> > > > > > Alex,
> > > > > >
> > > > > > I asked the patch author here [1] but didn't get any response; maybe
> > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > >
> > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > which explicitly needed FLR; I'm not sure if that behavior is fixed in
> > > > > drivers. There is also a use case at Nutanix, but the engineer who
> > > > > is involved is on PTO, which is why I did not respond immediately, as
> > > > > I don't know the details yet.
> > > >
> > > > And more generally, devices continue to have reset issues and we
> > > > impose a fixed priority in our ordering. We can and probably should
> > > > continue to quirk devices when we find broken resets so that we have
> > > > the best default behavior, but it's currently not easy for an end user
> > > > to experiment, i.e., this reset works, that one doesn't. We might also
> > > > have platform issues where a given reset works better on a certain
> > > > platform. Exposing a way to test these things might lead to better
> > > > quirks. In the case I think Pali was looking for, they wanted a
> > > > mechanism to force a bus reset. If this was in reference to a
> > > > single-function device, this could be accomplished by setting a priority for
> > > > that mechanism, which would translate to not only the sysfs reset
> > > > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> > > >
> > > > Alex
> > > >
> > >
> > > To confirm from our end - we have seen many such instances where default
> > > reset methods have not worked well on our platform. Debugging these
> > > issues is painful in practice, and this interface would make it far
> > > easier.
> > >
> > > Having an interface like this would also help us better communicate the
> > > issues we find with upstream. Allowing others to more easily test our
> > > (or other entities') findings should give better visibility into
> > > which issues apply to the device in general and which are platform
> > > specific. In disambiguating the former from the latter, we should be
> > > able to better quirk devices for everyone, and in the latter cases, this
> > > interface allows for a safer and more elegant solution than any of the
> > > current alternatives.
> >
> > So to summarize, we are talking about a test and debug interface to
> > overcome HW bugs, am I right?
> >
> > My personal experience shows that once an easy workaround exists
> > (and a write to a generally available sysfs file is very simple), vendors'
> > and users' desire for a proper fix decreases drastically. IMHO, we will
> > see an increase in copy/paste on SO and blog posts, but a reduction in quirks.
> >
> > My 2-cents.
> >
> I agree with your point, but at least it gives userspace the ability
> to use a broken device until the bug is fixed upstream.
As I said, I don't expect many fixes once "userspace" is able to
use a cheap workaround. There is no incentive to fix it.
> This is also applicable to obscure devices without upstream
> drivers, for example custom FPGA-based devices.
This is not relevant to the upstream kernel. Those vendors ship everything
custom; they don't need upstream, and we don't need them :)
> Another main application, which I forgot to mention, is virtualization,
> where the VMM wants to reset the device when the guest is reset,
> to emulate a machine reboot as closely as possible.
It can only work in very narrow cases, because a reset will cause the
device to be reprobed, and most likely the driver will be different from
the one that started the reset. I can imagine that net devices will lose
their state and config after such a reset too.
IMHO, it will be saner for everyone if virtualization doesn't try such resets.
Thanks
>
> Thanks,
> Amey
On 21/03/17 01:02PM, Leon Romanovsky wrote:
> On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > Amey Narkhede <[email protected]> wrote:
> > > > >
> > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > warm reset respectively.
> > > > > > > > >
> > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in
> > > > > > > > > the kernel, and therefore drivers do not export this type of reset via any
> > > > > > > > > kernel function (yet).
> > > > > > > >
> > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > defined here. Note that with this series the resets available through
> > > > > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > affected by the reset; therefore, it is not like the patch you proposed
> > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > >
> > > > > > > Alex,
> > > > > > >
> > > > > > > I asked the patch author here [1] but didn't get any response; maybe
> > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > >
> > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > which explicitly needed FLR; I'm not sure if that behavior is fixed in
> > > > > > drivers. There is also a use case at Nutanix, but the engineer who
> > > > > > is involved is on PTO, which is why I did not respond immediately, as
> > > > > > I don't know the details yet.
> > > > >
> > > > > And more generally, devices continue to have reset issues and we
> > > > > impose a fixed priority in our ordering. We can and probably should
> > > > > continue to quirk devices when we find broken resets so that we have
> > > > > the best default behavior, but it's currently not easy for an end user
> > > > > to experiment, i.e., this reset works, that one doesn't. We might also
> > > > > have platform issues where a given reset works better on a certain
> > > > > platform. Exposing a way to test these things might lead to better
> > > > > quirks. In the case I think Pali was looking for, they wanted a
> > > > > mechanism to force a bus reset. If this was in reference to a
> > > > > single-function device, this could be accomplished by setting a priority for
> > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> > > > >
> > > > > Alex
> > > > >
> > > >
> > > > To confirm from our end - we have seen many such instances where default
> > > > reset methods have not worked well on our platform. Debugging these
> > > > issues is painful in practice, and this interface would make it far
> > > > easier.
> > > >
> > > > Having an interface like this would also help us better communicate the
> > > > issues we find with upstream. Allowing others to more easily test our
> > > > (or other entities') findings should give better visibility into
> > > > which issues apply to the device in general and which are platform
> > > > specific. In disambiguating the former from the latter, we should be
> > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > interface allows for a safer and more elegant solution than any of the
> > > > current alternatives.
> > >
> > > So to summarize, we are talking about a test and debug interface to
> > > overcome HW bugs, am I right?
> > >
> > > My personal experience shows that once an easy workaround exists
> > > (and a write to a generally available sysfs file is very simple), vendors'
> > > and users' desire for a proper fix decreases drastically. IMHO, we will
> > > see an increase in copy/paste on SO and blog posts, but a reduction in quirks.
> > >
> > > My 2-cents.
> > >
> > I agree with your point, but at least it gives userspace the ability
> > to use a broken device until the bug is fixed upstream.
>
> As I said, I don't expect many fixes once "userspace" is able to
> use a cheap workaround. There is no incentive to fix it.
>
> > This is also applicable to obscure devices without upstream
> > drivers, for example custom FPGA-based devices.
>
> This is not relevant to the upstream kernel. Those vendors ship everything
> custom; they don't need upstream, and we don't need them :)
>
By custom, I meant hobbyists who could tinker with their own FPGAs.
> > Another main application, which I forgot to mention, is virtualization,
> > where the VMM wants to reset the device when the guest is reset,
> > to emulate a machine reboot as closely as possible.
>
> It can only work in very narrow cases, because a reset will cause the
> device to be reprobed, and most likely the driver will be different from
> the one that started the reset. I can imagine that net devices will lose
> their state and config after such a reset too.
>
Not sure if I got that 100% right, but pci_reset_function()
saves and restores the device state across the reset.
> IMHO, it will be saner for everyone if virtualization doesn't try such resets.
>
> Thanks
>
The existing reset sysfs attribute was added for exactly this case,
though.
Thanks,
Amey
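Amey's point that `pci_reset_function()` saves and restores device state across the reset can be pictured with a toy model. In the kernel, the save and restore cover config-space state (via `pci_save_state()` and `pci_restore_state()`), and a bound driver is notified through the `reset_prepare`/`reset_done` error-handler callbacks; the sketch below only mimics the save, reset, restore shape. `struct toy_fn`, its fields and both functions are hypothetical, and the `internal` field stands in for device-internal state (for example queues a driver programmed) that a config-space save does not preserve, which is the kind of state Leon is concerned about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_CFG_DWORDS 16

/* Toy stand-in for one function: its config registers plus a saved copy. */
struct toy_fn {
	uint32_t cfg[TOY_CFG_DWORDS];	/* registers visible to software */
	uint32_t saved[TOY_CFG_DWORDS];	/* snapshot taken before the reset */
	uint32_t internal;		/* device-internal state the save does not cover */
};

/* The reset itself: everything returns to power-on defaults (zero here). */
static void toy_hw_reset(struct toy_fn *fn)
{
	memset(fn->cfg, 0, sizeof(fn->cfg));
	fn->internal = 0;
}

/* Save config state, reset, then restore it: the shape of pci_reset_function(). */
static void toy_save_reset_restore(struct toy_fn *fn)
{
	assert(fn != NULL);
	memcpy(fn->saved, fn->cfg, sizeof(fn->cfg));
	toy_hw_reset(fn);
	memcpy(fn->cfg, fn->saved, sizeof(fn->cfg));
}
```

After `toy_save_reset_restore(&f)`, `f.cfg` holds its pre-reset values while `f.internal` is back to zero: config state survives, but anything the driver set up inside the device has to be re-initialized.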
On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > Amey Narkhede <[email protected]> wrote:
> > > > > >
> > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > > > > > > > > > warm reset respectively.
> > > > > > > > > >
> > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in the
> > > > > > > > > > kernel, and therefore drivers do not export this type of reset via any
> > > > > > > > > > kernel function (yet).
> > > > > > > > >
> > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > defined here. Note that with this series the resets available through
> > > > > > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > >
> > > > > > > > Alex,
> > > > > > > >
> > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > >
> > > > > > > Sorry for not responding immediately. There were some buggy Wi-Fi cards
> > > > > > > which needed FLR explicitly; I'm not sure if that behavior has been fixed
> > > > > > > in the drivers. There is also a use case at Nutanix, but the engineer who
> > > > > > > is involved is on PTO, which is why I did not respond immediately, as
> > > > > > > I don't know the details yet.
> > > > > >
> > > > > > And more generally, devices continue to have reset issues and we
> > > > > > impose a fixed priority in our ordering. We can and probably should
> > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > to experiment, i.e., this reset works, that one doesn't. We might also
> > > > > > have platform issues where a given reset works better on a certain
> > > > > > platform. Exposing a way to test these things might lead to better
> > > > > > quirks. In the case I think Pali was looking for, they wanted a
> > > > > > mechanism to force a bus reset; if this was in reference to a
> > > > > > single-function device, this could be accomplished by setting a priority for
> > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> > > > > >
> > > > > > Alex
> > > > > >
> > > > >
> > > > > To confirm from our end - we have seen many such instances where default
> > > > > reset methods have not worked well on our platform. Debugging these
> > > > > issues is painful in practice, and this interface would make it far
> > > > > easier.
> > > > >
> > > > > Having an interface like this would also help us better communicate the
> > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > (or other entities') findings should give better visibility into
> > > > > which issues apply to the device in general and which are platform
> > > > > specific. In disambiguating the former from the latter, we should be
> > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > interface allows for a safer and more elegant solution than any of the
> > > > > current alternatives.
> > > >
> > > > So to summarize, we are talking about a test and debug interface to
> > > > overcome HW bugs, am I right?
> > > >
> > > > My personal experience shows that once an easy workaround exists
> > > > (and a write to a generally available sysfs file is very simple), the
> > > > desire of vendors and users for a proper fix decreases drastically. IMHO,
> > > > we will see an increase in copy/paste on SO and blog posts, but a reduction in quirks.
> > > >
> > > > My 2-cents.
> > > >
> > > I agree with your point, but at least it gives userspace the ability
> > > to use a broken device until the bug is fixed upstream.
> >
> > As I said, I don't expect many fixes once "userspace" will be able to
> > use a cheap workaround. There is no incentive to fix it.
> >
> > > This is also applicable to obscure devices without upstream
> > > drivers, for example custom FPGA-based devices.
> >
> > This is not relevant to the upstream kernel. Those vendors ship everything
> > custom, they don't need upstream, we don't need them :)
> >
> By custom I meant hobbyists who could tinker with their custom FPGA.
I invite such hobbyists to send patches and include their FPGA in
the upstream kernel.
>
> > > Another main application, which I forgot to mention, is virtualization,
> > > where the VMM wants to reset the device when the guest is reset
> > > to emulate a machine reboot as closely as possible.
> >
> > It can work only in a very narrow case, because the reset will cause the
> > device to be reprobed, and most likely the driver will be different from
> > the one that started the reset. I can imagine that net devices will lose
> > their state and config after such a reset too.
> >
> Not sure if I got that 100% right. The pci_reset_function() function
> saves and restores device state over the reset.
I'm talking about netdev state, but whatever, given the existence of
the sysfs reset knob.
>
> > IMHO, it will be saner for everyone if virtualization doesn't try such resets.
> >
> > Thanks
> >
> The existing reset sysfs attribute was added for exactly this case,
> though.
I didn't know the rationale behind that file until you mentioned it and I
googled the libvirt discussion, so OK. Are you proposing that libvirt
manage a database of devices and their working reset types?
I'm not against this patch; I just want to draw attention to the fact that
its outcome will be a decrease in fixes for broken devices.
Thanks
>
> Thanks,
> Amey
On 21/03/17 01:47PM, Leon Romanovsky wrote:
> On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > I agree with your point, but at least it gives userspace the ability
> > > > to use a broken device until the bug is fixed upstream.
> > >
> > > As I said, I don't expect many fixes once "userspace" will be able to
> > > use a cheap workaround. There is no incentive to fix it.
> > >
> > > > This is also applicable to obscure devices without upstream
> > > > drivers, for example custom FPGA-based devices.
> > >
> > > This is not relevant to the upstream kernel. Those vendors ship everything
> > > custom, they don't need upstream, we don't need them :)
> > >
> > By custom I meant hobbyists who could tinker with their custom FPGA.
>
> I invite such hobbyists to send patches and include their FPGA in
> the upstream kernel.
>
> >
> > > > Another main application, which I forgot to mention, is virtualization,
> > > > where the VMM wants to reset the device when the guest is reset
> > > > to emulate a machine reboot as closely as possible.
> > >
> > > It can work only in a very narrow case, because the reset will cause the
> > > device to be reprobed, and most likely the driver will be different from
> > > the one that started the reset. I can imagine that net devices will lose
> > > their state and config after such a reset too.
> > >
> > Not sure if I got that 100% right. The pci_reset_function() function
> > saves and restores device state over the reset.
>
> I'm talking about netdev state, but whatever, given the existence of
> the sysfs reset knob.
>
> >
> > > IMHO, it will be saner for everyone if virtualization doesn't try such resets.
> > >
> > > Thanks
> > >
> > The existing reset sysfs attribute was added for exactly this case,
> > though.
>
> I didn't know the rationale behind that file until you mentioned it and I
> googled the libvirt discussion, so OK. Are you proposing that libvirt
> manage a database of devices and their working reset types?
>
I don't know much about libvirt internals, but why would
it need to manage a database of working reset types? It could just
read the new reset_methods attribute to get the list of supported reset
methods.
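For illustration, a management tool reading such an attribute would mostly need to split it into method names. The following is a minimal userspace sketch under stated assumptions: it assumes the attribute holds a single line of space-separated method names (for example "flr bus"), since the exact format of the proposed reset_methods attribute is not spelled out in this thread. parse_reset_methods() and the size limits are hypothetical names, not kernel or libvirt API.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r() */
#include <stdio.h>
#include <string.h>

/* Hypothetical limits for this sketch only. */
#define MAX_RESET_METHODS 8
#define MAX_NAME_LEN 16

/*
 * Hypothetical helper: split the text read from a per-device reset-method
 * attribute (assumed format: "name1 name2 ...\n") into an array of names.
 * Input longer than the internal buffer is silently truncated.
 * Returns the number of names stored, at most max.
 */
static int parse_reset_methods(const char *buf,
                               char names[][MAX_NAME_LEN], int max)
{
    char tmp[128];
    char *tok, *save = NULL;
    int n = 0;

    /* Work on a copy because strtok_r() modifies its input. */
    snprintf(tmp, sizeof(tmp), "%s", buf);

    /* Spaces separate names; the trailing newline is also a delimiter. */
    for (tok = strtok_r(tmp, " \n", &save); tok && n < max;
         tok = strtok_r(NULL, " \n", &save)) {
        snprintf(names[n], MAX_NAME_LEN, "%s", tok);
        n++;
    }
    return n;
}
```

A caller would read the attribute file into a buffer and pass it in; an empty line yields zero names, which a tool could treat as "no selectable reset method".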
> I'm not against this patch; I just want to draw attention to the fact that
> its outcome will be a decrease in fixes for broken devices.
>
> Thanks
>
That makes sense, but that isn't any different from the existing reset
attribute. This patch enhances it and allows selecting a device-supported
reset method instead of using the first available reset method according to
the existing hardcoded policy.
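To make "selecting a device-supported reset method" concrete, the sketch below models a per-device bitmap of supported methods (the series title mentions adding such a bitmap) and checks a user-written method name against it. This is a hypothetical userspace model, not the series' code: the enum, the name strings, and reset_method_allowed() are assumptions, loosely following the kernel's default ordering of that era (device-specific, FLR, AF FLR, PM, slot, bus), which should be checked against the tree in question.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical method indices; real kernel names and order may differ. */
enum reset_method {
    RESET_DEVICE_SPECIFIC,
    RESET_FLR,
    RESET_AF_FLR,
    RESET_PM,
    RESET_SLOT,
    RESET_BUS,
    RESET_METHOD_COUNT
};

/* Hypothetical user-visible names, indexed by enum reset_method. */
static const char *const reset_method_names[RESET_METHOD_COUNT] = {
    "device_specific", "flr", "af_flr", "pm", "slot", "bus",
};

/* Map a method name to its index, or -1 if the name is unknown. */
static int reset_method_index(const char *name)
{
    for (int i = 0; i < RESET_METHOD_COUNT; i++)
        if (strcmp(name, reset_method_names[i]) == 0)
            return i;
    return -1;
}

/*
 * Accept a user-requested method only if the name is known and its bit is
 * set in the device's supported-methods bitmap (assumed to be probed once,
 * e.g. at enumeration time).
 */
static bool reset_method_allowed(unsigned int supported, const char *name)
{
    int idx = reset_method_index(name);

    return idx >= 0 && (supported & (1u << idx)) != 0;
}
```

Keeping support as a bitmap means the expensive probing happens once, and a later write from userspace only needs a cheap lookup to reject methods the device cannot use.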
Thanks,
Amey
On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > I agree with your point, but at least it gives userspace the ability
> > > > > to use a broken device until the bug is fixed upstream.
> > > >
> > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > use a cheap workaround. There is no incentive to fix it.
> > > >
> > > > > This is also applicable to obscure devices without upstream
> > > > > drivers, for example custom FPGA-based devices.
> > > >
> > > > This is not relevant to the upstream kernel. Those vendors ship everything
> > > > custom, they don't need upstream, we don't need them :)
> > > >
> > > By custom I meant hobbyists who could tinker with their custom FPGA.
> >
> > I invite such hobbyists to send patches and include their FPGA in
> > the upstream kernel.
> >
> > >
> > > > > Another main application, which I forgot to mention, is virtualization,
> > > > > where the VMM wants to reset the device when the guest is reset
> > > > > to emulate a machine reboot as closely as possible.
> > > >
> > > > It can work only in a very narrow case, because the reset will cause the
> > > > device to be reprobed, and most likely the driver will be different from
> > > > the one that started the reset. I can imagine that net devices will lose
> > > > their state and config after such a reset too.
> > > >
> > > Not sure if I got that 100% right. The pci_reset_function() function
> > > saves and restores device state over the reset.
> >
> > I'm talking about netdev state, but whatever, given the existence of
> > the sysfs reset knob.
> >
> > >
> > > > IMHO, it will be saner for everyone if virtualization doesn't try such resets.
> > > >
> > > > Thanks
> > > >
> > > The existing reset sysfs attribute was added for exactly this case,
> > > though.
> >
> > I didn't know the rationale behind that file until you mentioned it and I
> > googled the libvirt discussion, so OK. Are you proposing that libvirt
> > manage a database of devices and their working reset types?
> >
> I don't know much about libvirt internals, but why would
> it need to manage a database of working reset types? It could just
> read the new reset_methods attribute to get the list of supported reset
> methods.
Because the idea of this patch is to read all supported reset types and
allow the user to choose the working one. The user will do it with
help from StackOverflow, but libvirt will need to have some sort of
database; otherwise it won't be different from a simple "echo 1 > reset",
which will iterate over all supported resets anyway.
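The fixed-priority behaviour behind "echo 1 > reset" can be modeled as a loop over an ordered table of methods. The sketch assumes the convention used by the kernel's reset path of that era, as best understood here: -ENOTTY means "this method does not apply, try the next one", while any other result (success or a real error) is returned immediately. The reset_fn type, try_resets(), and the stub methods are hypothetical stand-ins, not kernel functions.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical method signature: returns 0 on success, -ENOTTY if the
 * method does not apply to the device, or another negative errno if it
 * was attempted and failed.
 */
typedef int (*reset_fn)(void *dev);

/*
 * Walk the methods in priority order and stop at the first one that
 * applies. Returns that method's result, or -ENOTTY if none applies.
 */
static int try_resets(reset_fn const *methods, size_t n, void *dev)
{
    for (size_t i = 0; i < n; i++) {
        int rc = methods[i](dev);

        if (rc != -ENOTTY)
            return rc;
    }
    return -ENOTTY;
}

/* Stub methods for illustration only. */
static int stub_not_supported(void *dev) { (void)dev; return -ENOTTY; }
static int stub_ok(void *dev) { (void)dev; return 0; }
static int stub_fails(void *dev) { (void)dev; return -EIO; }
```

Under this model, reordering the table is all that a user-selected priority changes; note that a method which applies but fails (the -EIO stub) ends the walk even if a later method might have worked.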
> > I'm not against this patch; I just want to draw attention to the fact that
> > its outcome will be a decrease in fixes for broken devices.
> >
> > Thanks
> >
> That makes sense, but that isn't any different from the existing reset
> attribute. This patch enhances it and allows selecting a device-supported
> reset method instead of using the first available reset method according to
> the existing hardcoded policy.
The difference here is that this is a workaround to solve bugs that
should be fixed in the kernel.
Thanks
>
> Thanks,
> Amey
On Wed, 17 Mar 2021 15:58:40 +0200
Leon Romanovsky <[email protected]> wrote:
> On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > I agree with your point, but at least it gives userspace the ability
> > > > > > to use a broken device until the bug is fixed upstream.
> > > > >
> > > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > > use a cheap workaround. There is no incentive to fix it.
We can increase the annoyance factor of using a modified set of reset
methods, but ultimately we can only control what goes into our kernel;
other kernels might take v1 of this series and incorporate it
regardless of what happens here.
> > > > > > This is also applicable to obscure devices without upstream
> > > > > > drivers, for example custom FPGA-based devices.
> > > > >
> > > > > This is not relevant to the upstream kernel. Those vendors ship everything
> > > > > custom, they don't need upstream, we don't need them :)
> > > > >
> > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > >
> > > I invite such hobbyists to send patches and include their FPGA in
> > > the upstream kernel.
This is potentially another good use case: how receptive are we going
to be to an FPGA design that botches a reset? Do they have a valid
device ID for us to base a quirk on, are they just squatting on one, or
are they using the default from a library? Maybe the next bitstream will
resolve it, maybe without any external indication. IOW, what would the
quality level be for that quirk versus using this as a workaround,
where the user probably wouldn't mind a kernel nag?
> > > > > > Another main application, which I forgot to mention, is virtualization,
> > > > > > where the VMM wants to reset the device when the guest is reset
> > > > > > to emulate a machine reboot as closely as possible.
> > > > >
> > > > > It can work only in a very narrow case, because the reset will cause the
> > > > > device to be reprobed, and most likely the driver will be different from
> > > > > the one that started the reset. I can imagine that net devices will lose
> > > > > their state and config after such a reset too.
> > > > >
> > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > saves and restores device state over the reset.
> > >
> > > I'm talking about netdev state, but whatever, given the existence of
> > > the sysfs reset knob.
> > >
> > > >
> > > > > IMHO, it will be saner for everyone if virtualization doesn't try such resets.
That would cause a massive regression in device assignment support. As
with other sysfs attributes, triggering them alongside a running driver
is probably not going to end well. However, pci_reset_function() is
extremely useful for stopping devices and returning them to a default
state, when either rebooting a VM or returning the device to the host.
The device is not removed and re-probed when this occurs; vfio-pci is
able to hold onto the device across these actions. Sure, don't reset a
netdev device when it's in use; that's not what these are used for.
> > > > The existing reset sysfs attribute was added for exactly this case
> > > > though.
> > >
> > > I didn't know the rationale behind that file until you mentioned it and I
> > > googled the libvirt discussion, so ok. Do you propose that libvirt
> > > will manage a database of devices and their working reset types?
> > >
> > I don't have much idea about the internals of libvirt, but why would
> > it need to manage a database of working reset types? It could just
> > read the new reset_methods attribute to get the list of supported reset
> > methods.
>
> Because the idea of this patch is to read all supported reset types and
> allow the user to choose the working one. The user will do it with
> help from StackOverflow, but libvirt will need to have some sort of
> database; otherwise it won't be different from a simple "echo 1 > reset",
> which will iterate over all supported resets anyway.
AFAIK, libvirt no longer attempts to do resets itself, or is at least
moving in that direction. vfio-pci will reset a device when it's
opened by a user (when available) or when triggered via the API.
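For readers less familiar with the code, the "echo 1 > reset" behaviour discussed above amounts to trying each enabled method in priority order and stopping at the first one that does not report "not applicable" (-ENOTTY, the convention the existing reset helpers already use). A method that applies but fails ends the attempt with its error. The following is a minimal userspace model under that assumption; struct model_dev, reset_method_fn and the method_* functions are hypothetical stand-ins, not kernel API, and the corresponding kernel logic lives in __pci_reset_function_locked().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a reset method: returns 0 on success, -ENOTTY if
 * the method does not apply to this device, or another negative errno if
 * it applies but fails. */
typedef int (*reset_method_fn)(void *dev);

struct model_dev {
	reset_method_fn methods[8]; /* enabled methods, highest priority first */
	size_t nr_methods;
	int last_used;              /* index of the method that ran, or -1 */
};

/* Try each enabled method in order. -ENOTTY means "not applicable, try the
 * next one"; any other result (success or a real error) ends the attempt. */
int model_reset(struct model_dev *dev)
{
	dev->last_used = -1;
	for (size_t i = 0; i < dev->nr_methods; i++) {
		int rc = dev->methods[i](dev);

		if (rc != -ENOTTY) {
			dev->last_used = (int)i;
			return rc;
		}
	}
	return -ENOTTY;
}

/* Example methods, for illustration only. */
int method_unsupported(void *dev) { (void)dev; return -ENOTTY; }
int method_ok(void *dev)          { (void)dev; return 0; }
int method_fails(void *dev)       { (void)dev; return -EIO; }
```

Reordering or trimming the methods array is, in effect, what the proposed per-device selection would let an administrator do.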
> > > I'm not against this patch, I just want to draw attention to the fact
> > > that the outcome of this patch will be a decrease in fixes for broken devices.
> > >
> > > Thanks
> > >
> > That makes sense, but that isn't any different from the existing reset
> > attribute. This patch enhances it and allows selecting a device-supported
> > reset method instead of using the first available reset method according to
> > the existing hardcoded policy.
>
> The difference here is that this is a workaround to solve bugs that
> should be fixed in the kernel.
If we want to discourage using this as a primary means to resolve reset
issues on a device, then we can log warnings any time it's used.
Downstreams that really want this functionality are going to take this
patch from the list whether we accept it or not. As above, it seems
there are valid use cases. Even with mainstream vfio in QEMU, I jump
through some hoops trying to determine if I can do a secondary bus
reset rather than a PM reset, because it's not specified anywhere what a
"soft reset" means for any given device. This sort of interface could
make it easier to apply a system policy that a pci_reset_function()
should always perform a secondary bus reset if the only other option is
a PM reset. Maybe that policy mostly makes sense for a VM use case, so
we'd want one policy by default and another when the device is used for
this functionality. How could we accomplish that with a quirk? Thanks,
Alex
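The VM-oriented policy described above (use a secondary bus reset whenever the only alternative is a PM reset) could be expressed as a reordering of the supported methods. This is a sketch under assumptions: the enum values and their default order are hypothetical and only loosely follow the order discussed in this series, and a real implementation would act on the per-device method list the series adds rather than on a bare bitmask.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical method identifiers, listed in an assumed default priority
 * order (most preferred first). */
enum model_method { M_DEV_SPECIFIC, M_FLR, M_AF_FLR, M_PM, M_BUS, M_NR };

/* Write the supported methods (bitmask 'supported') into 'order' in the
 * order they should be tried and return how many were written. When the
 * only candidates are a PM reset and a bus reset, prefer the bus reset. */
size_t vm_policy_order(unsigned int supported, enum model_method order[M_NR])
{
	unsigned int pm_bus = (1u << M_PM) | (1u << M_BUS);
	bool prefer_bus = (supported & pm_bus) == pm_bus &&
			  (supported & ~pm_bus) == 0;
	size_t n = 0;

	if (prefer_bus)
		order[n++] = M_BUS;
	for (int m = 0; m < M_NR; m++) {
		if (!(supported & (1u << m)))
			continue;
		if (prefer_bus && m == M_BUS)
			continue; /* already placed first */
		order[n++] = (enum model_method)m;
	}
	return n;
}
```

When any stronger method (for example FLR) is available, the default order is left untouched, which keeps the policy limited to the PM-versus-bus case.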
On Wed, 17 Mar 2021 20:02:06 +0100
Pali Rohár <[email protected]> wrote:
> On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > On Mon, 15 Mar 2021 15:52:38 +0100
> > Pali Rohár <[email protected]> wrote:
> >
> > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > Pali Rohár <[email protected]> wrote:
> > > >
> > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > > > > warm reset, respectively.
> > > > >
> > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > type of reset, which is currently implemented only for PCIe hotplug
> > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in
> > > > > the kernel and therefore drivers do not export this type of reset via any
> > > > > kernel function (yet).
> > > >
> > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > defined here.
> > >
> > > Ok!
> > >
> > > > Note that with this series the resets available through
> > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > exactly the same as they are currently. The bus and slot reset
> > > > methods used here are limited to devices where only a single function is
> > > > affected by the reset; therefore it is not like the patch you proposed,
> > > > which performed a reset irrespective of the downstream devices. This
> > > > series only enables selection of the existing methods. Thanks,
> > > >
> > > > Alex
> > > >
> > >
> > > But with this patch series, there is still an issue with the PCI secondary
> > > bus reset mechanism, as the exported sysfs attribute does not perform the
> > > remove-reset-rescan procedure. As discussed in the other thread, this reset
> > > leaves the device in an unconfigured / broken state.
> >
> > No, there's not:
> >
> > int pci_reset_function(struct pci_dev *dev)
> > {
> > 	int rc;
> >
> > 	if (!dev->reset_fn)
> > 		return -ENOTTY;
> >
> > 	pci_dev_lock(dev);
> > >>>	pci_dev_save_and_disable(dev);
> >
> > 	rc = __pci_reset_function_locked(dev);
> >
> > >>>	pci_dev_restore(dev);
> > 	pci_dev_unlock(dev);
> >
> > 	return rc;
> > }
> >
> > The remove/re-scan was discussed primarily because your patch performed
> > a bus reset regardless of what devices were affected by that reset and
> > it's difficult to manage the scope where multiple devices are affected.
> > Here, the bus and slot reset functions will fail unless the scope is
> > limited to the single device triggering this reset. Thanks,
> >
> > Alex
> >
>
> I was thinking a bit more about it and I'm not really sure how it would
> behave with a hotplug-capable PCIe bridge.
>
> On the aardvark PCIe controller I have already tested that setting the
> secondary bus reset bit triggers a Hot Reset event and then also a Link
> Down event. These events are not handled by the aardvark driver yet (this
> needs to be implemented in the kernel's emulated root bridge code).
>
> But I'm not sure how it would behave on a real HW PCIe hotplug bridge.
> The kernel already has code which removes a PCIe device if its presence
> bit changes (signaled via an interrupt), and a Link Down event triggers
> this change.
This is the difference between slot and bus resets: the slot reset is
implemented by the hotplug controller and disables presence detection
around the bus reset. Thanks,
Alex
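A deliberately simplified model of the difference described above: both paths perform the same secondary bus reset, but the slot reset masks presence detection around it, so the link-down event does not lead the hotplug driver to eject the device. struct model_port and the model_* functions are hypothetical; real hotplug drivers also handle link-state change events, timing and link retraining, all of which are omitted here.

```c
#include <stdbool.h>

/* Toy model of a hotplug-capable downstream port (names are hypothetical).
 * A secondary bus reset takes the link down; if presence detection is still
 * enabled, the hotplug driver treats that as a removal and ejects the
 * device. */
struct model_port {
	bool presence_detect_enabled;
	bool device_present; /* becomes false once the device is ejected */
};

static void model_link_down(struct model_port *port)
{
	if (port->presence_detect_enabled)
		port->device_present = false; /* hotplug driver removes device */
}

/* Plain bus reset: nothing masks the link-down event. */
void model_bus_reset(struct model_port *port)
{
	model_link_down(port);
}

/* Slot reset: the same bus reset, bracketed by disabling and restoring
 * presence detection. */
void model_slot_reset(struct model_port *port)
{
	bool saved = port->presence_detect_enabled;

	port->presence_detect_enabled = false;
	model_bus_reset(port);
	port->presence_detect_enabled = saved;
}
```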
On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> This is the difference between slot and bus resets: the slot reset is
> implemented by the hotplug controller and disables presence detection
> around the bus reset. Thanks,
Yes, but I'm talking about bus reset, not slot reset.
I mean using bus reset via sysfs on hardware which supports slots and
hotplugging.
And if I'm reading the code correctly, this combination is allowed, right?
With these new patches it is possible to disable slot reset and enable
bus reset.
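To illustrate the point above: if the attribute accepts a user-written list of method names, nothing in the parsing itself prevents enabling "bus" while leaving "slot" out. The sketch assumes a space-separated list of names checked against a bitmask of supported methods; the names, the 128-byte buffer limit and model_parse_methods() are illustrative assumptions, not the series' actual sysfs format or code.

```c
#include <string.h>

/* Hypothetical method names; the real attribute's names and format may
 * differ. */
static const char *const model_names[] = {
	"device_specific", "flr", "af_flr", "pm", "slot", "bus",
};
#define MODEL_NR_METHODS (sizeof(model_names) / sizeof(model_names[0]))

/* Parse a space-separated list such as "flr bus" written by the user.
 * Each name must be supported (bit set in 'supported') and may appear only
 * once. On success, store the method indices in priority order in 'order'
 * and return the count; return -1 on any unknown, unsupported or repeated
 * name. */
int model_parse_methods(const char *buf, unsigned int supported,
			int order[MODEL_NR_METHODS])
{
	char copy[128];
	unsigned int seen = 0;
	int n = 0;

	if (strlen(buf) >= sizeof(copy))
		return -1;
	strcpy(copy, buf);

	for (char *tok = strtok(copy, " \n"); tok; tok = strtok(NULL, " \n")) {
		int idx = -1;

		for (size_t i = 0; i < MODEL_NR_METHODS; i++)
			if (strcmp(tok, model_names[i]) == 0)
				idx = (int)i;
		if (idx < 0 || !(supported & (1u << idx)) || (seen & (1u << idx)))
			return -1;
		seen |= 1u << idx;
		order[n++] = idx;
	}
	return n;
}
```

With "bus" written alone, the slot method is simply absent from the resulting order, which is the combination being asked about.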
On Wed, 17 Mar 2021 20:24:24 +0100
Pali Rohár <[email protected]> wrote:
> Yes, but I'm talking about bus reset, not slot reset.
>
> I mean using bus reset via sysfs on hardware which supports slots and
> hotplugging.
>
> And if I'm reading the code correctly, this combination is allowed, right?
> With these new patches it is possible to disable slot reset and enable
> bus reset.
That's true; a slot reset is simply a bus reset wrapped around code
that prevents the device from getting ejected. Maybe it would make
sense to combine the two as far as this interface is concerned, i.e., a
single "bus" reset method that will always use slot reset when
available. Thanks,
Alex
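A sketch of the combined method suggested above, using hypothetical types: a single user-visible "bus" method that takes the hotplug-aware slot path when the slot driver provides a reset_slot hook, and otherwise falls back to a plain secondary bus reset. model_last_path and the demo_* hooks exist only so the example can be exercised; none of these names are kernel API.

```c
#include <stddef.h>

/* Hypothetical types: a hotplug slot may or may not provide a reset_slot
 * hook (not every hotplug driver implements one). */
struct model_slot_ops {
	int (*reset_slot)(void *ctx); /* may be NULL */
};

struct model_bridge {
	const struct model_slot_ops *slot_ops; /* NULL if not hotplug capable */
	void *ctx;
	int (*secondary_bus_reset)(void *ctx);
};

/* Single "bus" method: prefer the hotplug-aware slot reset when available,
 * else fall back to a plain secondary bus reset. */
int model_bus_method(struct model_bridge *br)
{
	if (br->slot_ops && br->slot_ops->reset_slot)
		return br->slot_ops->reset_slot(br->ctx);
	return br->secondary_bus_reset(br->ctx);
}

/* Demo hooks that record which path ran: 1 = slot reset, 2 = bus reset. */
int model_last_path;
int demo_reset_slot(void *ctx) { (void)ctx; model_last_path = 1; return 0; }
int demo_sbr(void *ctx)        { (void)ctx; model_last_path = 2; return 0; }
```

The fallback branch is exactly the "unsafe" case raised in the reply below, since nothing masks presence detection there.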
On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> That's true, a slot reset is simply a bus reset wrapped around code
> that prevents the device from getting ejected.
Yes, this makes slot reset "safe". But bus reset is "unsafe".
> Maybe it would make
> sense to combine the two as far as this interface is concerned, i.e., a
> single "bus" reset method that will always use slot reset when
> available. Thanks,
That should work when slot reset is available.
Another option is the remove-reset-rescan procedure mentioned earlier.
But a quick search in drivers/pci/hotplug/ shows that not all hotplug
drivers implement the reset_slot method.
So is there a possible issue with a hotplug driver which may eject the
device during a bus reset (e.g. because slot reset is not implemented)?
On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> On Wed, 17 Mar 2021 20:40:24 +0100
> Pali Rohár <[email protected]> wrote:
>
> > Another option is the remove-reset-rescan procedure mentioned earlier.
>
> That's not something we can introduce to the pci_reset_function() path
> without a fair bit of collateral in using it through vfio-pci.
>
> > But a quick search in drivers/pci/hotplug/ shows that not all hotplug
> > drivers implement the reset_slot method.
> >
> > So is there a possible issue with a hotplug driver which may eject the
> > device during a bus reset (e.g. because slot reset is not implemented)?
>
> People aren't reporting it, so maybe those controllers aren't being
> used for this use case. Or maybe introducing this patch will make
> these reset methods more readily accessible for testing. We can fix or
> blacklist those controllers for bus reset when reports come in. Thanks,
Ok! I don't know either whether those controllers are used, but it looks
like there are still changes in the hotplug code.
So I guess with these patches people can test it and report issues when
such things happen.
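If reports do come in, the "fix or blacklist" option mentioned above could take roughly this shape: a per-controller denylist consulted only when the slot driver has no reset_slot hook. The ID table contains placeholder values, not real hardware, and model_bus_reset_allowed() is a hypothetical model rather than an existing kernel quirk interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bridge identity (vendor/device ID). Hypothetical model type. */
struct model_pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Hypothetical denylist of hotplug controllers on which a plain bus reset
 * is known to eject the device. Placeholder IDs, not real hardware. */
static const struct model_pci_id bus_reset_denylist[] = {
	{ 0xdead, 0xbeef },
};

/* A bus reset is allowed if the slot driver provides a reset_slot hook
 * (so the safe slot path is used) or the bridge is not on the denylist. */
bool model_bus_reset_allowed(struct model_pci_id bridge, bool has_reset_slot)
{
	size_t n = sizeof(bus_reset_denylist) / sizeof(bus_reset_denylist[0]);

	if (has_reset_slot)
		return true;
	for (size_t i = 0; i < n; i++)
		if (bus_reset_denylist[i].vendor == bridge.vendor &&
		    bus_reset_denylist[i].device == bridge.device)
			return false;
	return true;
}
```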
On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> The remove/re-scan was discussed primarily because your patch performed
> a bus reset regardless of what devices were affected by that reset and
> it's difficult to manage the scope where multiple devices are affected.
> Here, the bus and slot reset functions will fail unless the scope is
> limited to the single device triggering this reset. Thanks,
>
> Alex
>
I was thinking a bit more about it and I'm not really sure how it would
behave with a hotplug-capable PCIe bridge.
On the aardvark PCIe controller I have already tested that setting the
secondary bus reset bit triggers a Hot Reset event and then also a Link
Down event. These events are not handled by the aardvark driver yet (this
needs to be implemented in the kernel's emulated root bridge code).
But I'm not sure how it would behave on a real HW PCIe hotplug bridge.
The kernel already has code which removes a PCIe device if its presence
bit changes (signaled via an interrupt), and a Link Down event triggers
this change.
Can somebody test on a PCIe hotplug controller what a secondary bus reset
via sysfs would do with these changes? Currently it is not exported as a
reset method, and there may be race conditions and maybe errors (?) if
the hotplug code removes a device on which the user triggered a bus reset
via sysfs.
In my opinion this can also happen when only one device is on the bus,
which exactly matches the conditions under which sysfs can use a bus
reset for a single device.
I can try to implement hotplug support in the aardvark driver and root
bridge emulator to test how this patch would behave. But it would take
some time...
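For reference, the single-device scope condition mentioned above (the bus and slot reset functions fail unless the reset is limited to the one device that requested it) can be modeled roughly as below. The fields are hypothetical and the check is only loosely patterned on the kernel's parent-bus reset checks; the point is that a lone device in a hotplug slot passes it, which is exactly the case where the hotplug driver might eject the device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of where a function sits in the topology. */
struct model_dev_pos {
	bool on_root_bus;        /* no parent bridge to toggle the reset bit */
	bool is_bridge;          /* a reset would also hit devices below it */
	size_t functions_on_bus; /* including the device itself */
};

/* A bus/slot reset is offered only when it can affect nothing but the
 * requesting function. */
bool model_bus_reset_in_scope(const struct model_dev_pos *pos)
{
	return !pos->on_root_bus && !pos->is_bridge &&
	       pos->functions_on_bus == 1;
}
```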
On Wed, 17 Mar 2021 20:40:24 +0100
Pali Rohár <[email protected]> wrote:
> On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > On Wed, 17 Mar 2021 20:24:24 +0100
> > Pali Rohár <[email protected]> wrote:
> >
> > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > Pali Rohár <[email protected]> wrote:
> > > >
> > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > Pali Rohár <[email protected]> wrote:
> > > > > >
> > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > > > > > > > > warm reset respectively.
> > > > > > > > >
> > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in the
> > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > kernel function (yet).
> > > > > > > >
> > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > defined here.
> > > > > > >
> > > > > > > Ok!
> > > > > > >
> > > > > > > > Note that with this series the resets available through
> > > > > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > >
> > > > > > > > Alex
> > > > > > > >
> > > > > > >
> > > > > > > But with this patch series, there is still an issue with the PCI secondary
> > > > > > > bus reset mechanism, as the exported sysfs attribute does not do the
> > > > > > > remove-reset-rescan procedure. As discussed in the other thread, this reset
> > > > > > > leaves the device in an unconfigured / broken state.
> > > > > >
> > > > > > No, there's not:
> > > > > >
> > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > {
> > > > > > 	int rc;
> > > > > >
> > > > > > 	if (!dev->reset_fn)
> > > > > > 		return -ENOTTY;
> > > > > >
> > > > > > 	pci_dev_lock(dev);
> > > > > > >>>	pci_dev_save_and_disable(dev);
> > > > > >
> > > > > > 	rc = __pci_reset_function_locked(dev);
> > > > > >
> > > > > > >>>	pci_dev_restore(dev);
> > > > > > 	pci_dev_unlock(dev);
> > > > > >
> > > > > > 	return rc;
> > > > > > }
> > > > > >
> > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > limited to the single device triggering this reset. Thanks,
> > > > > >
> > > > > > Alex
> > > > > >
> > > > >
> > > > > I was thinking a bit more about it and I'm not really sure how it would
> > > > > behave with a hotplug-capable PCIe bridge.
> > > > >
> > > > > On the aardvark PCIe controller I have already tested that the secondary bus
> > > > > reset bit triggers a Hot Reset event and then also a Link Down event.
> > > > > These events are not handled by the aardvark driver yet (this needs to be
> > > > > implemented in the kernel's emulated root bridge code).
> > > > >
> > > > > But I'm not sure how it would behave on a real hardware PCIe hotplug bridge.
> > > > > The kernel already has code which removes a PCIe device when its presence
> > > > > bit changes (signaled via an interrupt), and a Link Down event triggers this
> > > > > change.
> > > >
> > > > This is the difference between slot and bus resets: the slot reset is
> > > > implemented by the hotplug controller and disables presence detection
> > > > around the bus reset. Thanks,
> > >
> > > Yes, but I'm talking about bus reset, not about slot reset.
> > >
> > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > hotplugging.
> > >
> > > And if I'm reading code correctly, this combination is allowed, right?
> > > Via these new patches it is possible to disable slot reset and enable
> > > bus reset.
> >
> > That's true, a slot reset is simply a bus reset wrapped around code
> > that prevents the device from getting ejected.
>
> Yes, this makes slot reset "safe". But bus reset is "unsafe".
>
> > Maybe it would make
> > sense to combine the two as far as this interface is concerned, ie. a
> > single "bus" reset method that will always use slot reset when
> > available. Thanks,
>
> That should work when slot reset is available.
>
> The other option is the remove-reset-rescan procedure mentioned earlier.
That's not something we can introduce to the pci_reset_function() path
without a fair bit of collateral damage when it is used through vfio-pci.
> But a quick search in drivers/pci/hotplug/ shows that not all hotplug
> drivers implement the reset_slot method.
>
> So is there a possible issue with a hotplug driver that may eject the device
> during a bus reset (e.g. because slot reset is not implemented)?
People aren't reporting it, so maybe those controllers aren't being
used for this use case. Or maybe introducing this patch will make
these reset methods more readily accessible for testing. We can fix or
blacklist those controllers for bus reset when reports come in. Thanks,
Alex
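The suggestion of a single "bus" method that always uses the slot reset when one is available could be sketched as follows. This is a user-space model under stated assumptions: `has_reset_slot` stands in for a hotplug driver that implements `reset_slot`, `no_bus_reset` stands in for a quirk that blacklists a device for bus reset, and the applicability checks loosely follow the rule that bus and slot resets are refused unless only the single function is affected. It keeps the kernel convention that `-ENOTTY` means "not applicable" and that a probe call only checks applicability. All `model_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel structures. */
struct model_bus {
	size_t nr_devices;    /* functions on this bus */
	bool is_root;         /* root bus: no upstream bridge to reset */
	bool has_reset_slot;  /* hotplug driver implements reset_slot */
};

struct model_dev {
	struct model_bus *bus;
	bool has_subordinate; /* device is itself a bridge */
	bool no_bus_reset;    /* blacklisted for bus reset (quirk) */
};

enum model_action { DID_NOTHING, DID_SLOT_RESET, DID_BUS_RESET };

/* Single "bus" method: refuse unless only this function is affected,
 * prefer the hotplug-safe slot reset, else fall back to a plain bus reset.
 * With probe set, only report whether the method is applicable. */
int model_bus_method(const struct model_dev *dev, bool probe,
		     enum model_action *done)
{
	*done = DID_NOTHING;

	if (dev->bus->is_root || dev->has_subordinate || dev->no_bus_reset)
		return -ENOTTY;
	if (dev->bus->nr_devices != 1)  /* reset would hit other functions */
		return -ENOTTY;
	if (probe)
		return 0;

	*done = dev->bus->has_reset_slot ? DID_SLOT_RESET : DID_BUS_RESET;
	return 0;
}
```

Preferring the slot path whenever it exists means a controller without `reset_slot` still falls back to the plain bus reset, which is the case raised above; the `no_bus_reset` flag is where a report-driven blacklist entry would take effect.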
On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> On Wed, 17 Mar 2021 15:58:40 +0200
> Leon Romanovsky <[email protected]> wrote:
>
> > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > Amey Narkhede <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in the
> > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > defined here. Note that with this series the resets available through
> > > > > > > > > > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Alex,
> > > > > > > > > > > >
> > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > >
> > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > which needed FLR explicitly; not sure if that behavior is fixed in the
> > > > > > > > > > > drivers. There is also a use case at Nutanix, but the engineer who
> > > > > > > > > > > is involved is on PTO, which is why I did not respond immediately, as
> > > > > > > > > > > I don't know the details yet.
> > > > > > > > > >
> > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > impose a fixed priority in our ordering. We can and probably should
> > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > to experiment, ie. this reset works, that one doesn't. We might also
> > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > platform. Exposing a way to test these things might lead to better
> > > > > > > > > > quirks. In the case I think Pali was looking for, they wanted a
> > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> > > > > > > > > >
> > > > > > > > > > Alex
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > easier.
> > > > > > > > >
> > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > current alternatives.
> > > > > > > >
> > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > overcome HW bugs, am I right?
> > > > > > > >
> > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > (and a write to a generally available sysfs file is very simple), vendors'
> > > > > > > > and users' desire for a proper fix decreases drastically. IMHO, we will
> > > > > > > > see an increase in copy/paste on SO and blog posts, but a reduction in quirks.
> > > > > > > >
> > > > > > > > My 2-cents.
> > > > > > > >
> > > > > > > I agree with your point, but at least it gives userspace the ability
> > > > > > > to use a broken device until the bug is fixed upstream.
> > > > > >
> > > > > > As I said, I don't expect many fixes once "userspace" is able to
> > > > > > use a cheap workaround. There is no incentive to fix it.
>
> We can increase the annoyance factor of using a modified set of reset
> methods, but ultimately we can only control what goes into our kernel,
> other kernels might take v1 of this series and incorporate it
> regardless of what happens here.
>
> > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > drivers for example custom FPGA based devices.
> > > > > >
> > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > custom, they don't need upstream, we don't need them :)
> > > > > >
> > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > >
> > > > I invite such hobbyists to send patches and include their FPGA in
> > > > upstream kernel.
>
> This is potentially another good use case, how receptive are we going
> to be to an FPGA design that botches a reset. Do they have a valid
> device ID for us to base a quirk on, are they just squatting on one, or
> using the default from a library. Maybe the next bitstream will
> resolve it, maybe without any external indication. IOW, what would the
> quality level be for that quirk versus using this as a workaround,
> where the user probably wouldn't mind a kernel nag?
It is worth solving it when the need arises.
>
> > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > to emulate machine reboot as closely as possible.
> > > > > >
> > > > > > It can work only in very narrow cases, because a reset will cause the device
> > > > > > to be reprobed, and most likely the driver will be different from the one that
> > > > > > started the reset. I can imagine that net devices will lose their state and
> > > > > > config after such a reset too.
> > > > > >
> > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > saves and restores device state over the reset.
> > > >
> > > > I'm talking about netdev state, but whatever, given the existence of the
> > > > sysfs reset knob.
> > > >
> > > > >
> > > > > > IMHO, it will be saner for everyone if virtualization doesn't try such resets.
>
> That would cause a massive regression in device assignment support. As
> with other sysfs attributes, triggering them alongside a running driver
> is probably not going to end well. However, pci_reset_function() is
> extremely useful for stopping devices and returning them to a default
> state, when either rebooting a VM or returning the device to the host.
> The device is not removed and re-probed when this occurs, vfio-pci is
> able to hold onto the device across these actions. Sure, don't reset a
> netdev device when it's in use, that's not what these are used for.
>
> > > > > The existing reset sysfs attribute was added for exactly this case
> > > > > though.
> > > >
> > > > I didn't know the rationale behind that file until you said so and I
> > > > googled the libvirt discussion, so ok. Do you propose that libvirt
> > > > will manage a database of devices and their working reset types?
> > > >
> > > I don't know much about the internals of libvirt, but why would
> > > it need to manage a database of working reset types? It could just
> > > read the new reset_methods attribute to get the list of supported reset
> > > methods.
> >
> > Because the idea of this patch is to read all supported reset types and
> > allow the user to choose the working one. The user will do it with
> > help from StackOverflow, but libvirt will need to have some sort of
> > database; otherwise it won't be different from a simple "echo 1 > reset",
> > which will iterate over all supported resets anyway.
>
> AFAIK, libvirt no longer attempts to do resets itself, or is at least
> moving in that direction. vfio-pci will reset a device when it's
> opened by a user (when available) or when triggered via the API.
<...>
> > The difference here is that this is a workaround to solve bugs that
> > should be fixed in the kernel.
>
> If we want to discourage using this as a primary means to resolve reset
> issues on a device then we can create log warnings any time it's used.
> Downstreams that really want this functionality are going to take this
> patch from the list whether we accept it or not. As above, it seems
> there are valid use cases. Even with mainstream vfio in QEMU, I go
> through some hoops trying to determine if I can do a secondary bus
> reset rather than a PM reset because it's not specified anywhere what a
> "soft reset" means for any given device. This sort of interface could
> make it easier to apply a system policy that a pci_reset_function()
> should always perform a secondary bus reset if the only other option is
> a PM reset. Maybe that policy mostly makes sense for a VM use case, so
> we'd want one policy by default and another when the device is used for
> this functionality. How could we accomplish that with a quirk? Thanks,
I'm lost here: does vfio-pci use the sysfs interface or the in-kernel API?
If it is the latter, then we don't really need sysfs; if not, we still need
some sort of DB to create a second policy, because "supported != working".
What am I missing?
Thanks
>
> Alex
>
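On the in-kernel side, vfio-pci resets devices through the PCI core's reset functions (for example `pci_try_reset_function()`) rather than through sysfs, so a per-device method order, assuming it is stored per device as this series proposes, would apply to both the sysfs `reset` attribute and vfio-pci, as Alex described earlier in the thread. The sketch below models that selection in user space: methods are tried in a given order, `-ENOTTY` (not applicable) is skipped, and the result of the first applicable method is returned, mirroring how the current `__pci_reset_function_locked()` walks device-specific, PCIe FLR, AF FLR, PM, slot and bus resets. A failing method is therefore not followed by another attempt. The `model_*` names and per-method results are hypothetical; the second order array illustrates Alex's example policy of preferring a secondary bus reset over a PM reset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical method identifiers, in the kernel's default priority order. */
enum model_method {
	M_DEVICE_SPECIFIC, M_FLR, M_AF_FLR, M_PM, M_SLOT, M_BUS, M_COUNT
};

/* Hypothetical per-method results for one device: -ENOTTY = not applicable,
 * 0 = success, any other negative errno = the method ran and failed. */
struct model_reset_dev {
	int result[M_COUNT];
};

/* Try methods in the given order; skip -ENOTTY and return the first other
 * result (success or error), recording which method produced it.
 * *used is left untouched if no method is applicable. */
int model_reset(const struct model_reset_dev *dev,
		const enum model_method *order, size_t n,
		enum model_method *used)
{
	for (size_t i = 0; i < n; i++) {
		int rc = dev->result[order[i]];

		if (rc != -ENOTTY) {
			*used = order[i];
			return rc;
		}
	}
	return -ENOTTY;  /* no applicable method */
}
```

The policy is just a different `order` array handed to the same selection routine, which is why one attribute could steer both the sysfs reset and the resets vfio-pci requests.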
On 21/03/18 11:09AM, Leon Romanovsky wrote:
> On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > On Wed, 17 Mar 2021 15:58:40 +0200
> > Leon Romanovsky <[email protected]> wrote:
> >
> > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > > Amey Narkhede <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in the
> > > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > > defined here. Note that with this series the resets available through
> > > > > > > > > > > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > > > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > > >
> > > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > > which needed FLR explicitly; not sure if that behavior is fixed in the
> > > > > > > > > > > > drivers. There is also a use case at Nutanix, but the engineer who
> > > > > > > > > > > > is involved is on PTO, which is why I did not respond immediately, as
> > > > > > > > > > > > I don't know the details yet.
> > > > > > > > > > >
> > > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > > impose a fixed priority in our ordering. We can and probably should
> > > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > > to experiment, ie. this reset works, that one doesn't. We might also
> > > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > > platform. Exposing a way to test these things might lead to better
> > > > > > > > > > > quirks. In the case I think Pali was looking for, they wanted a
> > > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > > easier.
> > > > > > > > > >
> > > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > > current alternatives.
> > > > > > > > >
> > > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > > overcome HW bugs, am I right?
> > > > > > > > >
> > > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > > (and a write to a generally available sysfs file is very simple), vendors'
> > > > > > > > > and users' desire for a proper fix decreases drastically. IMHO, we will
> > > > > > > > > see an increase in copy/paste on SO and blog posts, but a reduction in quirks.
> > > > > > > > >
> > > > > > > > > My 2-cents.
> > > > > > > > >
> > > > > > > > I agree with your point, but at least it gives userspace the ability
> > > > > > > > to use a broken device until the bug is fixed upstream.
> > > > > > >
> > > > > > > As I said, I don't expect many fixes once "userspace" is able to
> > > > > > > use a cheap workaround. There is no incentive to fix it.
> >
> > We can increase the annoyance factor of using a modified set of reset
> > methods, but ultimately we can only control what goes into our kernel,
> > other kernels might take v1 of this series and incorporate it
> > regardless of what happens here.
> >
> > > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > > drivers for example custom FPGA based devices.
> > > > > > >
> > > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > > custom, they don't need upstream, we don't need them :)
> > > > > > >
> > > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > > >
> > > > > I invite such hobbyists to send patches and include their FPGA in
> > > > > upstream kernel.
> >
> > This is potentially another good use case, how receptive are we going
> > to be to an FPGA design that botches a reset. Do they have a valid
> > device ID for us to base a quirk on, are they just squatting on one, or
> > using the default from a library. Maybe the next bitstream will
> > resolve it, maybe without any external indication. IOW, what would the
> > quality level be for that quirk versus using this as a workaround,
> > where the user probably wouldn't mind a kernel nag?
>
> It is worth solving it when the need arises.
>
> >
> > > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > > to emulate machine reboot as closely as possible.
> > > > > > >
> > > > > > > It can work only in very narrow cases, because a reset will cause the device
> > > > > > > to be reprobed, and most likely the driver will be different from the one that
> > > > > > > started the reset. I can imagine that net devices will lose their state and
> > > > > > > config after such a reset too.
> > > > > > >
> > > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > > saves and restores device state over the reset.
> > > > >
> > > > > I'm talking about netdev state, but whatever, given the existence of the
> > > > > sysfs reset knob.
> > > > >
> > > > > >
> > > > > > > IMHO, it will be saner for everyone if virtualization doesn't try such resets.
> >
> > That would cause a massive regression in device assignment support. As
> > with other sysfs attributes, triggering them alongside a running driver
> > is probably not going to end well. However, pci_reset_function() is
> > extremely useful for stopping devices and returning them to a default
> > state, when either rebooting a VM or returning the device to the host.
> > The device is not removed and re-probed when this occurs, vfio-pci is
> > able to hold onto the device across these actions. Sure, don't reset a
> > netdev device when it's in use, that's not what these are used for.
> >
> > > > > > The existing reset sysfs attribute was added for exactly this case
> > > > > > though.
> > > > >
> > > > > I didn't know the rationale behind that file until you said so and I
> > > > > googled the libvirt discussion, so ok. Do you propose that libvirt
> > > > > will manage a database of devices and their working reset types?
> > > > >
> > > > I don't know much about the internals of libvirt, but why would
> > > > it need to manage a database of working reset types? It could just
> > > > read the new reset_methods attribute to get the list of supported reset
> > > > methods.
> > >
> > > Because the idea of this patch is to read all supported reset types and
> > > allow the user to choose the working one. The user will do it with
> > > help from StackOverflow, but libvirt will need to have some sort of
> > > database; otherwise it won't be different from a simple "echo 1 > reset",
> > > which will iterate over all supported resets anyway.
> >
> > AFAIK, libvirt no longer attempts to do resets itself, or is at least
> > moving in that direction. vfio-pci will reset a device when it's
> > opened by a user (when available) or when triggered via the API.
>
> <...>
>
> > > The difference here is that this is a workaround to solve bugs that
> > > should be fixed in the kernel.
> >
> > If we want to discourage using this as a primary means to resolve reset
> > issues on a device then we can create log warnings any time it's used.
> > Downstreams that really want this functionality are going to take this
> > patch from the list whether we accept it or not. As above, it seems
> > there are valid use cases. Even with mainstream vfio in QEMU, I go
> > through some hoops trying to determine if I can do a secondary bus
> > reset rather than a PM reset because it's not specified anywhere what a
> > "soft reset" means for any given device. This sort of interface could
> > make it easier to apply a system policy that a pci_reset_function()
> > should always perform a secondary bus reset if the only other option is
> > a PM reset. Maybe that policy mostly makes sense for a VM use case, so
> > we'd want one policy by default and another when the device is used for
> > this functionality. How could we accomplish that with a quirk? Thanks,
>
> I'm lost here: does vfio-pci use the sysfs interface or the in-kernel API?
>
> If it is the latter, then we don't really need sysfs; if not, we still need
> some sort of DB to create a second policy, because "supported != working".
> What am I missing?
>
> Thanks
>
Can you explain a bit more about why supported != working?
Why would hardware indicate that it supports a specific reset
method if it doesn't work? There is only an unusual quirk for the Intel
82599, which supports FLR but only reports it in the PF DEVCAP, not in
the VF DEVCAP, so we need to call FLR directly without checking whether it
is supported.
Thanks,
Amey
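The capability check referred to above can be sketched like this. It assumes the standard layout of the PCIe Device Capabilities register, where bit 28 (`PCI_EXP_DEVCAP_FLR`, value 0x10000000 in `pci_regs.h`) advertises Function Level Reset. The hypothetical `flr_quirk` flag stands in for a device quirk like the 82599 VF case, where FLR works but is not advertised, and the hypothetical `flr_broken` flag stands in for the opposite case raised in the thread, an advertised reset that does not work. All `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device Capabilities bit 28: Function Level Reset capability
 * (same value as PCI_EXP_DEVCAP_FLR). */
#define MODEL_DEVCAP_FLR 0x10000000u

/* Hypothetical, simplified view of a function's FLR-related state. */
struct model_fn {
	uint32_t devcap;  /* value read from the Device Capabilities register */
	bool flr_quirk;   /* quirk: FLR works but is not advertised (82599 VF case) */
	bool flr_broken;  /* quirk: FLR advertised but known not to work */
};

/* Decide whether FLR may be used: quirks override the advertised bit in
 * both directions, otherwise trust DEVCAP. */
bool model_flr_usable(const struct model_fn *fn)
{
	if (fn->flr_broken)
		return false;
	if (fn->flr_quirk)
		return true;
	return (fn->devcap & MODEL_DEVCAP_FLR) != 0;
}
```

The two flags show that "supported" as read from config space and "usable" can diverge in either direction, which is the distinction being debated.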
On 21/03/17 09:13PM, Pali Rohár wrote:
> On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> > On Wed, 17 Mar 2021 20:40:24 +0100
> > Pali Rohár <[email protected]> wrote:
> >
> > > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > > Pali Rohár <[email protected]> wrote:
> > > >
> > > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > > Pali Rohár <[email protected]> wrote:
> > > > > >
> > > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > reset (pci_parent_bus_reset), which I think are hot reset and
> > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > >
> > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in the
> > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > kernel function (yet).
> > > > > > > > > >
> > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > defined here.
> > > > > > > > >
> > > > > > > > > Ok!
> > > > > > > > >
> > > > > > > > > > Note that with this series the resets available through
> > > > > > > > > > pci_reset_function() and the per device reset attribute in sysfs remain
> > > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > > > >
> > > > > > > > > > Alex
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > > > bus reset mechanism, as the exported sysfs attribute does not do that
> > > > > > > > > remove-reset-rescan procedure. As discussed in the other thread, this reset
> > > > > > > > > leaves the device in an unconfigured / broken state.
> > > > > > > >
> > > > > > > > No, there's not:
> > > > > > > >
> > > > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > > > {
> > > > > > > > 	int rc;
> > > > > > > >
> > > > > > > > 	if (!dev->reset_fn)
> > > > > > > > 		return -ENOTTY;
> > > > > > > >
> > > > > > > > 	pci_dev_lock(dev);
> > > > > > > > >>>	pci_dev_save_and_disable(dev);
> > > > > > > >
> > > > > > > > 	rc = __pci_reset_function_locked(dev);
> > > > > > > >
> > > > > > > > >>>	pci_dev_restore(dev);
> > > > > > > > 	pci_dev_unlock(dev);
> > > > > > > >
> > > > > > > > 	return rc;
> > > > > > > > }
> > > > > > > >
> > > > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > > > limited to the single device triggering this reset. Thanks,
> > > > > > > >
> > > > > > > > Alex
> > > > > > > >
> > > > > > >
> > > > > > > I was thinking a bit more about it and I'm not really sure how it would
> > > > > > > behave with a hotplugging PCIe bridge.
> > > > > > >
> > > > > > > On aardvark PCIe controller I have already tested that secondary bus
> > > > > > > reset bit is triggering Hot Reset event and then also Link Down event.
> > > > > > > These events are not handled by the aardvark driver yet (this needs to be
> > > > > > > implemented in the kernel's emulated root bridge code).
> > > > > > >
> > > > > > > But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
> > > > > > > Kernel has already code which removes PCIe device if it changes presence
> > > > > > > bit (and inform via interrupt). And Link Down event triggers this
> > > > > > > change.
> > > > > >
> > > > > > This is the difference between slot and bus resets, the slot reset is
> > > > > > implemented by the hotplug controller and disables presence detection
> > > > > > around the bus reset. Thanks,
> > > > >
> > > > > Yes, but I'm talking about bus reset, not about slot reset.
> > > > >
> > > > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > > > hotplugging.
> > > > >
> > > > > And if I'm reading code correctly, this combination is allowed, right?
> > > > > Via these new patches it is possible to disable slot reset and enable
> > > > > bus reset.
> > > >
> > > > That's true, a slot reset is simply a bus reset wrapped around code
> > > > that prevents the device from getting ejected.
> > >
> > > Yes, this makes slot reset "safe". But bus reset is "unsafe".
> > >
> > > > Maybe it would make
> > > > sense to combine the two as far as this interface is concerned, ie. a
> > > > single "bus" reset method that will always use slot reset when
> > > > available. Thanks,
> > >
> > > That should work when slot reset is available.
> > >
> > > Other option is that mentioned remove-reset-rescan procedure.
> >
> > That's not something we can introduce to the pci_reset_function() path
> > without a fair bit of collateral in using it through vfio-pci.
> >
> > > But quick search in drivers/pci/hotplug/ results that not all hotplug
> > > drivers implement reset_slot method.
> > >
> > > So there is a possible issue with hotplug driver which may eject device
> > > during bus reset (because e.g. slot reset is not implemented)?
> >
> > People aren't reporting it, so maybe those controllers aren't being
> > used for this use case. Or maybe introducing this patch will make
> > these reset methods more readily accessible for testing. We can fix or
> > blacklist those controllers for bus reset when reports come in. Thanks,
>
> Ok! I do not know either whether those controllers are used, but it looks
> like there are still changes in hotplug code.
>
> So I guess with these patches people can test it and report issues when
> such thing happen.
So after a bit of research, as I understand it, we need to group slot
and bus reset together into a single category of reset method and
then implicitly use slot reset, if it is available, when the user
enables bus reset.
Is that right?
Thanks,
Amey
On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > Leon Romanovsky <[email protected]> wrote:
> > >
> > > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > > > Amey Narkhede <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just calls PCI secondary
> > > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > > > defined here. Note that with this series the resets available through
> > > > > > > > > > > > > > > pci_reset_function() and the per device reset attribute in sysfs remain
> > > > > > > > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Alex,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > > > which needed FLR explicitly; not sure if that behavior is fixed in
> > > > > > > > > > > > > drivers. Also there is a use case at Nutanix, but the engineer who
> > > > > > > > > > > > > is involved is on PTO, which is why I did not respond immediately, as
> > > > > > > > > > > > > I don't know the details yet.
> > > > > > > > > > > >
> > > > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > > > impose a fixed priority in our ordering. We can and probably should
> > > > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > > > to experiment, ie. this reset works, that one doesn't. We might also
> > > > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > > > platform. Exposing a way to test these things might lead to better
> > > > > > > > > > > > quirks. In the case I think Pali was looking for, they wanted a
> > > > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Alex
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > > > easier.
> > > > > > > > > > >
> > > > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > > > current alternatives.
> > > > > > > > > >
> > > > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > > > overcome HW bugs, am I right?
> > > > > > > > > >
> > > > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > > > (and writing to a generally available sysfs file is very simple), vendors'
> > > > > > > > > > and users' desire for a proper fix decreases drastically. IMHO, we will
> > > > > > > > > > see an increase in copy/paste in SO and blog posts, but a reduction in quirks.
> > > > > > > > > >
> > > > > > > > > > My 2-cents.
> > > > > > > > > >
> > > > > > > > > I agree with your point but at least it gives the userspace ability
> > > > > > > > > to use broken device until bug is fixed in upstream.
> > > > > > > >
> > > > > > > > As I said, I don't expect many fixes once "userspace" is able to
> > > > > > > > use a cheap workaround. There is no incentive to fix it.
> > >
> > > We can increase the annoyance factor of using a modified set of reset
> > > methods, but ultimately we can only control what goes into our kernel,
> > > other kernels might take v1 of this series and incorporate it
> > > regardless of what happens here.
> > >
> > > > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > > > drivers for example custom FPGA based devices.
> > > > > > > >
> > > > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > > > custom, they don't need upstream, we don't need them :)
> > > > > > > >
> > > > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > > > >
> > > > > > I invite such hobbyists to send patches and include their FPGA in
> > > > > > upstream kernel.
> > >
> > > This is potentially another good use case, how receptive are we going
> > > to be to an FPGA design that botches a reset. Do they have a valid
> > > device ID for us to base a quirk on, are they just squatting on one, or
> > > using the default from a library. Maybe the next bitstream will
> > > resolve it, maybe without any external indication. IOW, what would the
> > > quality level be for that quirk versus using this as a workaround,
> > > where the user probably wouldn't mind a kernel nag?
> >
> > It is worth solving when the need arises.
> >
> > >
> > > > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > > > to emulate machine reboot as closely as possible.
> > > > > > > >
> > > > > > > > It can work in a very narrow case, because reset will cause the device to be
> > > > > > > > reprobed and most likely the driver will be different from the one that
> > > > > > > > started reset. I can imagine that net devices will lose their state and
> > > > > > > > config after such reset too.
> > > > > > > >
> > > > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > > > saves and restores device state over the reset.
> > > > > >
> > > > > > I'm talking about netdev state, but whatever given the existence of
> > > > > > sysfs reset knob.
> > > > > >
> > > > > > >
> > > > > > > > IMHO, it will be saner for everyone if virtualization doesn't try such resets.
> > >
> > > That would cause a massive regression in device assignment support. As
> > > with other sysfs attributes, triggering them alongside a running driver
> > > is probably not going to end well. However, pci_reset_function() is
> > > extremely useful for stopping devices and returning them to a default
> > > state, when either rebooting a VM or returning the device to the host.
> > > The device is not removed and re-probed when this occurs, vfio-pci is
> > > able to hold onto the device across these actions. Sure, don't reset a
> > > netdev device when it's in use, that's not what these are used for.
> > >
> > > > > > > The existing reset sysfs attribute was added for exactly this case
> > > > > > > though.
> > > > > >
> > > > > > I didn't know the rationale behind that file till you said and I
> > > > > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > > > > will manage database of devices and their working reset types?
> > > > > >
> > > > > I don't have much idea about internals of libvirt but why would
> > > > > it need to manage database of working reset types? It could just
> > > > > read new reset_methods attribute to get the list of supported reset
> > > > > methods.
> > > >
> > > > Because the idea of this patch is to read all supported reset types and
> > > > allow the user to choose the working one. The user will do it with
> > > > help from StackOverflow, but libvirt will need to have some sort of
> > > > database, otherwise it won't be different from simple "echo 1 > reset"
> > > > which will iterate over all supported resets anyway.
> > >
> > > AFAIK, libvirt no longer attempts to do resets itself, or is at least
> > > moving in that direction. vfio-pci will reset devices when they're
> > > opened by a user (when available) or triggered via the API.
> >
> > <...>
> >
> > > > The difference here is that this is a workaround to solve bugs that
> > > > should be fixed in the kernel.
> > >
> > > If we want to discourage using this as a primary means to resolve reset
> > > issues on a device then we can create log warnings any time it's used.
> > > Downstreams that really want this functionality are going to take this
> > > patch from the list whether we accept it or not. As above, it seems
> > > there are valid use cases. Even with mainstream vfio in QEMU, I go
> > > through some hoops trying to determine if I can do a secondary bus
> > > reset rather than a PM reset because it's not specified anywhere what a
> > > "soft reset" means for any given device. This sort of interface could
> > > make it easier to apply a system policy that a pci_reset_function()
> > > should always perform a secondary bus reset if the only other option is
> > > a PM reset. Maybe that policy mostly makes sense for a VM use case, so
> > > we'd want one policy by default and another when the device is used for
> > > this functionality. How could we accomplish that with a quirk? Thanks,
> >
> > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> >
> > If it is latter then we don't really need sysfs, if not, we still need
> > some sort of DB to create second policy, because "supported != working".
> > What am I missing?
> >
> > Thanks
> >
> Can you explain a bit more about why supported != working?
It is written in the commit message of this patch.
https://lore.kernel.org/lkml/[email protected]/
"This feature aims to allow greater control of a device for use cases
as device assignment, where specific device or platform issues may
interact poorly with a given reset method, and for which device specific
quirks have not been developed."
You wrote it and also repeated it a couple of times during the discussion.
If the device could tell that a specific reset doesn't work, it wouldn't
perform it in the first place.
Thanks
On Thu, 18 Mar 2021 11:09:34 +0200
Leon Romanovsky <[email protected]> wrote:
> On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > On Wed, 17 Mar 2021 15:58:40 +0200
> > Leon Romanovsky <[email protected]> wrote:
> >
> > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > > Amey Narkhede <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just calls PCI secondary
> > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > > defined here. Note that with this series the resets available through
> > > > > > > > > > > > > > pci_reset_function() and the per device reset attribute in sysfs remain
> > > > > > > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > > >
> > > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > > which needed FLR explicitly; not sure if that behavior is fixed in
> > > > > > > > > > > > drivers. Also there is a use case at Nutanix, but the engineer who
> > > > > > > > > > > > is involved is on PTO, which is why I did not respond immediately, as
> > > > > > > > > > > > I don't know the details yet.
> > > > > > > > > > >
> > > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > > impose a fixed priority in our ordering. We can and probably should
> > > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > > to experiment, ie. this reset works, that one doesn't. We might also
> > > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > > platform. Exposing a way to test these things might lead to better
> > > > > > > > > > > quirks. In the case I think Pali was looking for, they wanted a
> > > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > > easier.
> > > > > > > > > >
> > > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > > current alternatives.
> > > > > > > > >
> > > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > > overcome HW bugs, am I right?
> > > > > > > > >
> > > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > > (and writing to a generally available sysfs file is very simple), vendors'
> > > > > > > > > and users' desire for a proper fix decreases drastically. IMHO, we will
> > > > > > > > > see an increase in copy/paste in SO and blog posts, but a reduction in quirks.
> > > > > > > > >
> > > > > > > > > My 2-cents.
> > > > > > > > >
> > > > > > > > I agree with your point but at least it gives the userspace ability
> > > > > > > > to use broken device until bug is fixed in upstream.
> > > > > > >
> > > > > > > As I said, I don't expect many fixes once "userspace" is able to
> > > > > > > use a cheap workaround. There is no incentive to fix it.
> >
> > We can increase the annoyance factor of using a modified set of reset
> > methods, but ultimately we can only control what goes into our kernel,
> > other kernels might take v1 of this series and incorporate it
> > regardless of what happens here.
> >
> > > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > > drivers for example custom FPGA based devices.
> > > > > > >
> > > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > > custom, they don't need upstream, we don't need them :)
> > > > > > >
> > > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > > >
> > > > > I invite such hobbyists to send patches and include their FPGA in
> > > > > upstream kernel.
> >
> > This is potentially another good use case, how receptive are we going
> > to be to an FPGA design that botches a reset. Do they have a valid
> > device ID for us to base a quirk on, are they just squatting on one, or
> > using the default from a library. Maybe the next bitstream will
> > resolve it, maybe without any external indication. IOW, what would the
> > quality level be for that quirk versus using this as a workaround,
> > where the user probably wouldn't mind a kernel nag?
>
> It is worth solving when the need arises.
>
> >
> > > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > > to emulate machine reboot as closely as possible.
> > > > > > >
> > > > > > > It can work in a very narrow case, because reset will cause the device to be
> > > > > > > reprobed and most likely the driver will be different from the one that
> > > > > > > started reset. I can imagine that net devices will lose their state and
> > > > > > > config after such reset too.
> > > > > > >
> > > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > > saves and restores device state over the reset.
> > > > >
> > > > > I'm talking about netdev state, but whatever given the existence of
> > > > > sysfs reset knob.
> > > > >
> > > > > >
> > > > > > > IMHO, it will be saner for everyone if virtualization doesn't try such resets.
> >
> > That would cause a massive regression in device assignment support. As
> > with other sysfs attributes, triggering them alongside a running driver
> > is probably not going to end well. However, pci_reset_function() is
> > extremely useful for stopping devices and returning them to a default
> > state, when either rebooting a VM or returning the device to the host.
> > The device is not removed and re-probed when this occurs, vfio-pci is
> > able to hold onto the device across these actions. Sure, don't reset a
> > netdev device when it's in use, that's not what these are used for.
> >
> > > > > > The existing reset sysfs attribute was added for exactly this case
> > > > > > though.
> > > > >
> > > > > I didn't know the rationale behind that file till you said and I
> > > > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > > > will manage database of devices and their working reset types?
> > > > >
> > > > I don't have much idea about internals of libvirt but why would
> > > > it need to manage database of working reset types? It could just
> > > > read new reset_methods attribute to get the list of supported reset
> > > > methods.
> > >
> > > Because the idea of this patch is to read all supported reset types and
> > > allow the user to choose the working one. The user will do it with
> > > help from StackOverflow, but libvirt will need to have some sort of
> > > database, otherwise it won't be different from simple "echo 1 > reset"
> > > which will iterate over all supported resets anyway.
> >
> > AFAIK, libvirt no longer attempts to do resets itself, or is at least
> > moving in that direction. vfio-pci will reset devices when they're
> > opened by a user (when available) or triggered via the API.
>
> <...>
>
> > > The difference here is that this is a workaround to solve bugs that
> > > should be fixed in the kernel.
> >
> > If we want to discourage using this as a primary means to resolve reset
> > issues on a device then we can create log warnings any time it's used.
> > Downstreams that really want this functionality are going to take this
> > patch from the list whether we accept it or not. As above, it seems
> > there are valid use cases. Even with mainstream vfio in QEMU, I go
> > through some hoops trying to determine if I can do a secondary bus
> > reset rather than a PM reset because it's not specified anywhere what a
> > "soft reset" means for any given device. This sort of interface could
> > make it easier to apply a system policy that a pci_reset_function()
> > should always perform a secondary bus reset if the only other option is
> > a PM reset. Maybe that policy mostly makes sense for a VM use case, so
> > we'd want one policy by default and another when the device is used for
> > this functionality. How could we accomplish that with a quirk? Thanks,
>
> I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
>
> If it is latter then we don't really need sysfs, if not, we still need
> some sort of DB to create second policy, because "supported != working".
> What am I missing?
vfio-pci uses the internal kernel API, i.e. the variants of
pci_reset_function(), which is the same interface used by the existing
sysfs reset mechanism. The proposed configuration of the reset method
would affect any driver using that same core infrastructure, and from my
perspective that's really the goal. Where a supported reset mechanism
fails for a device, continuing to quirk it out for the best default
behavior makes sense; I'd be disappointed if a vendor did not pursue
improving the default behavior where it clearly makes sense. However,
there's also a policy decision: the kernel imposes a preferential
ordering of reset mechanisms. Is that ordering the best case for all
users? I've presented above a case where userspace may prefer a bus
reset over a PM reset. So the question is not only whether there are
supported mechanisms that don't work, which this interface lets
userspace more readily identify and work around; the interface also
enables user preference and makes it easier to evaluate whether all of
the supported reset mechanisms work, rather than just the first one we
encounter in the ordering we've decided to impose today. Thanks,
Alex
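A minimal userspace sketch of the policy Alex describes, under stated assumptions: the enum labels, default_order[] and build_order() are hypothetical and are not the kernel's data structures, and the default order is modeled on the sequence pci_reset_function() tried at the time of this thread (device-specific, FLR, AF FLR, PM, slot, bus). It shows how a per-device preference could move a supported bus reset ahead of a PM reset while keeping the rest of the kernel's ordering.

```c
#include <stdbool.h>

/* Hypothetical labels for the reset methods discussed in this thread. */
enum reset_method {
	RM_DEV_SPECIFIC,
	RM_FLR,
	RM_AF_FLR,
	RM_PM,
	RM_SLOT,
	RM_BUS,
	RM_COUNT
};

/* Assumed kernel default priority, highest first (modeled, not copied). */
static const enum reset_method default_order[RM_COUNT] = {
	RM_DEV_SPECIFIC, RM_FLR, RM_AF_FLR, RM_PM, RM_SLOT, RM_BUS,
};

/*
 * Build the effective order for one device. Unsupported methods are
 * skipped. If prefer_bus_over_pm is set and both methods are supported,
 * the bus reset is placed immediately ahead of the PM reset; everything
 * else keeps the default relative order. Returns the number of entries
 * written to out[].
 */
static int build_order(const bool supported[RM_COUNT], bool prefer_bus_over_pm,
		       enum reset_method out[RM_COUNT])
{
	bool hoist = prefer_bus_over_pm && supported[RM_PM] && supported[RM_BUS];
	int n = 0;

	for (int i = 0; i < RM_COUNT; i++) {
		enum reset_method m = default_order[i];

		if (!supported[m])
			continue;
		if (hoist && m == RM_BUS)
			continue;		/* already emitted ahead of PM */
		if (hoist && m == RM_PM)
			out[n++] = RM_BUS;	/* bus reset wins over PM reset */
		out[n++] = m;
	}
	return n;
}
```

With only PM and bus reset supported, the default order is PM then bus, and the preference flips it to bus then PM. If FLR is also supported it still comes first, so the preference only matters where a PM reset would otherwise be chosen over a bus reset.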
On 21/03/18 04:57PM, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > Leon Romanovsky <[email protected]> wrote:
> > > >
> > > > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > > > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > > > > Amey Narkhede <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > > > > defined here. Note that with this series the resets available through
> > > > > > > > > > > > > > > > pci_reset_function() and the per device reset attribute in sysfs remain
> > > > > > > > > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Alex,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > > > > which needed FLR explicitly not sure if that behavior is fixed in
> > > > > > > > > > > > > > drivers. Also there is use a case at Nutanix but the engineer who
> > > > > > > > > > > > > > is involved is on PTO that is why I did not respond immediately as
> > > > > > > > > > > > > > I don't know the details yet.
> > > > > > > > > > > > >
> > > > > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > > > > impose a fixed priority in our ordering. We can and probably should
> > > > > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > > > > to experiment, ie. this reset works, that one doesn't. We might also
> > > > > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > > > > platform. Exposing a way to test these things might lead to better
> > > > > > > > > > > > > quirks. In the case I think Pali was looking for, they wanted a
> > > > > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci. Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > > > > easier.
> > > > > > > > > > > >
> > > > > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > > > > current alternatives.
> > > > > > > > > > >
> > > > > > > > > > > So to summarize, we are talking about test and debug interface to
> > > > > > > > > > > overcome HW bugs, am I right?
> > > > > > > > > > >
> > > > > > > > > > > My personal experience shows that once the easy workaround exists
> > > > > > > > > > > (and write to generally available sysfs is very simple), the vendors
> > > > > > > > > > > and users desire for proper fix decreases drastically. IMHO, we will
> > > > > > > > > > > see increase of copy/paste in SO and blog posts, but reduce in quirks.
> > > > > > > > > > >
> > > > > > > > > > > My 2-cents.
> > > > > > > > > > >
> > > > > > > > > > I agree with your point but at least it gives the userspace ability
> > > > > > > > > > to use broken device until bug is fixed in upstream.
> > > > > > > > >
> > > > > > > > > As I said, I don't expect many fixes once "userspace" will be able to
> > > > > > > > > use cheap workaround. There is no incentive to fix it.
> > > >
> > > > We can increase the annoyance factor of using a modified set of reset
> > > > methods, but ultimately we can only control what goes into our kernel,
> > > > other kernels might take v1 of this series and incorporate it
> > > > regardless of what happens here.
> > > >
> > > > > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > > > > drivers for example custom FPGA based devices.
> > > > > > > > >
> > > > > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > > > > custom, they don't need upstream, we don't need them :)
> > > > > > > > >
> > > > > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > > > > >
> > > > > > > I invite such hobbyists to send patches and include their FPGA in
> > > > > > > upstream kernel.
> > > >
> > > > This is potentially another good use case, how receptive are we going
> > > > to be to an FPGA design that botches a reset. Do they have a valid
> > > > device ID for us to base a quirk on, are they just squatting on one, or
> > > > using the default from a library. Maybe the next bitstream will
> > > > resolve it, maybe without any external indication. IOW, what would the
> > > > quality level be for that quirk versus using this as a workaround,
> > > > where the user probably wouldn't mind a kernel nag?
> > >
> > > It is worth to solve it when the need arises.
> > >
> > > >
> > > > > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > > > > to emulate machine reboot as closely as possible.
> > > > > > > > >
> > > > > > > > > It can work in very narrow case, because reset will cause to device
> > > > > > > > > reprobe and most likely the driver will be different from the one that
> > > > > > > > > started reset. I can imagine that net devices will lose their state and
> > > > > > > > > config after such reset too.
> > > > > > > > >
> > > > > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > > > > saves and restores device state over the reset.
> > > > > > >
> > > > > > > I'm talking about netdev state, but whatever given the existence of
> > > > > > > sysfs reset knob.
> > > > > > >
> > > > > > > >
> > > > > > > > > IMHO, it will be saner for everyone if virtualization don't try such resets.
> > > >
> > > > That would cause a massive regression in device assignment support. As
> > > > with other sysfs attributes, triggering them alongside a running driver
> > > > is probably not going to end well. However, pci_reset_function() is
> > > > extremely useful for stopping devices and returning them to a default
> > > > state, when either rebooting a VM or returning the device to the host.
> > > > The device is not removed and re-probed when this occurs, vfio-pci is
> > > > able to hold onto the device across these actions. Sure, don't reset a
> > > > netdev device when it's in use, that's not what these are used for.
> > > >
> > > > > > > > The existing reset sysfs attribute was added for exactly this case
> > > > > > > > though.
> > > > > > >
> > > > > > > I didn't know the rationale behind that file till you said and I
> > > > > > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > > > > > will manage database of devices and their working reset types?
> > > > > > >
> > > > > > I don't have much idea about internals of libvirt but why would
> > > > > > it need to manage database of working reset types? It could just
> > > > > > read new reset_methods attribute to get the list of supported reset
> > > > > > methods.
> > > > >
> > > > > Because the idea of this patch is to read all supported reset types and
> > > > > allow to the user to chose the working one. The user will do it with
> > > > > help from StackOverflow, but libvirt will need to have some sort of
> > > > > database, otherwise it won't be different from simple "echo 1 > reset"
> > > > > which will iterate over all supported resets anyway.
> > > >
> > > > AFAIK, libvirt no longer attempts to do resets itself, or is at least
> > > > moving in that direction. vfio-pci will reset devices when they're
> > > > opened by a user (when available) or when triggered via the API.
> > >
> > > <...>
> > >
> > > > > The difference here is that this is a workaround to solve bugs that
> > > > > should be fixed in the kernel.
> > > >
> > > > If we want to discourage using this as a primary means to resolve reset
> > > > issues on a device then we can create log warnings any time it's used.
> > > > Downstreams that really want this functionality are going to take this
> > > > patch from the list whether we accept it or not. As above, it seems
> > > > there are valid use cases. Even with mainstream vfio in QEMU, I go
> > > > through some hoops trying to determine if I can do a secondary bus
> > > > reset rather than a PM reset because it's not specified anywhere what a
> > > > "soft reset" means for any given device. This sort of interface could
> > > > make it easier to apply a system policy that a pci_reset_function()
> > > > should always perform a secondary bus reset if the only other option is
> > > > a PM reset. Maybe that policy mostly makes sense for a VM use case, so
> > > > we'd want one policy by default and another when the device is used for
> > > > this functionality. How could we accomplish that with a quirk? Thanks,
> > >
> > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > >
> > > If it is latter then we don't really need sysfs, if not, we still need
> > > some sort of DB to create second policy, because "supported != working".
> > > What am I missing?
> > >
> > > Thanks
> > >
> > Can you explain bit more about why supported != working?
>
> It is written in the commit message of this patch.
> https://lore.kernel.org/lkml/[email protected]/
> "This feature aims to allow greater control of a device for use cases
> as device assignment, where specific device or platform issues may
> interact poorly with a given reset method, and for which device specific
> quirks have not been developed."
>
> You wrote it and also repeated it a couple of times during the discussion.
>
> If device can understand that specific reset doesn't work, it won't
> perform it in first place.
>
> Thanks
Is it possible for a device to know, before performing a reset, whether
a specific reset method it advertises will actually work? Maybe there's
a problem with that particular piece of hardware in that machine.
How can a database be maintained if particular machines have
particular pieces of faulty HW?
If for some reason the reset doesn't work, it will just return -ENOTTY.
This isn't any different from existing behavior. Actually, it informs the
user that the reset method didn't reset the device, so the user can pick a
different reset method explicitly instead of the kernel implicitly
falling back to a different one.
If the user doesn't explicitly set a preferred reset method, then
we keep the existing implicit fall-through behavior, which tries each
available reset method until one of them works.
If you have a device that doesn't support reset at all, then you have
the option to disable reset completely, unlike the existing reset attribute,
where you cannot disable reset. So it gives greater control: you can
disable reset altogether when a quirk hasn't been developed yet.
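Amey's three cases (no user selection, an explicit selection, reset disabled) can be modeled in a few lines of userspace C. This is a sketch, not the patch: reset_fn and model_reset() are hypothetical names, and the only convention borrowed from the kernel is that, to our understanding of pci_reset_function() when this thread was written, a method returns -ENOTTY when it cannot reset the device, while any other return value (success or a real error) ends the search.

```c
#include <errno.h>
#include <stddef.h>

/* A modeled reset method: 0 on success, -ENOTTY if it cannot be used. */
typedef int (*reset_fn)(void *dev);

/*
 * Hypothetical model of the three modes:
 *  - user_sel == NULL: default mode, walk the kernel's list in order;
 *  - user_sel != NULL, n_sel > 0: walk only the user's list, in the
 *    user's order, with no implicit fallback to other methods;
 *  - user_sel != NULL, n_sel == 0: reset disabled, always -ENOTTY.
 */
static int model_reset(void *dev, reset_fn const *defaults, size_t n_def,
		       reset_fn const *user_sel, size_t n_sel)
{
	reset_fn const *list = user_sel ? user_sel : defaults;
	size_t n = user_sel ? n_sel : n_def;

	for (size_t i = 0; i < n; i++) {
		int rc = list[i](dev);

		if (rc != -ENOTTY)
			return rc;	/* success or a real error: stop here */
	}
	return -ENOTTY;			/* nothing usable, or reset disabled */
}
```

The explicit list simply replaces the default list, so a selection containing only a broken method surfaces -ENOTTY to the user instead of silently falling through to a method they did not pick, and an empty selection disables reset.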
We can't expect to develop a quirk for every device in existence.
For example, the Elantech touchpad on my laptop still doesn't work in 2021
with a vanilla kernel; Arch Linux applies a patch that was reverted in the
mainline kernel for some reason.
Thanks,
Amey
On Thu, Mar 18, 2021 at 10:39:35AM -0600, Alex Williamson wrote:
> On Thu, 18 Mar 2021 11:09:34 +0200
> Leon Romanovsky <[email protected]> wrote:
<...>
> > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> >
> > If it is latter then we don't really need sysfs, if not, we still need
> > some sort of DB to create second policy, because "supported != working".
> > What am I missing?
>
> vfio-pci uses the internal kernel API, ie. the variants of
> pci_reset_function(), which is the same interface used by the existing
> sysfs reset mechanism. This proposed configuration of the reset method
> would affect any driver using that same core infrastructure and from my
> perspective that's really the goal. In the case where a supported
> reset mechanism fails for a device, continuing to quirk those out for
> the best default behavior makes sense, I'd be disappointed for a vendor
> to not pursue improving the default behavior where it clearly makes
> sense. However, there's also a policy decision, the kernel imposes a
> preferential ordering of reset mechanism. Is that ordering the best
> case for all users? I've presented above a case where a userspace may
> prefer a policy of preferring a bus reset to a PM reset. So I think
> the question is not only are there supported mechanisms that don't
> work, where this interface allows userspace to more readily identify
> and work around those sorts of issues, but it also enables user
> preference and easier evaluation whether all of the supported reset
> mechanisms work rather than just the first one we encounter in the
> ordering we've decided to impose today. Thanks,
Alex,
Which email client do you use?
Your responses are grouped into one huge block, with no chance to respond
to you on a specific point or to answer your questions.
I see your flow and understand your position, but I will repeat my
position: we need to make sure that vendors have an incentive to
supply quirks.
And regarding vendors, see Amey's response below about his touchpad troubles.
The cheap electronics vendors don't care about their users.
Thanks
>
> Alex
>
On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > Leon Romanovsky <[email protected]> wrote:
<...>
> > > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > > >
> > > > If it is latter then we don't really need sysfs, if not, we still need
> > > > some sort of DB to create second policy, because "supported != working".
> > > > What am I missing?
> > > >
> > > > Thanks
> > > >
> > > Can you explain bit more about why supported != working?
> >
> > It is written in the commit message of this patch.
> > https://lore.kernel.org/lkml/[email protected]/
> > "This feature aims to allow greater control of a device for use cases
> > as device assignment, where specific device or platform issues may
> > interact poorly with a given reset method, and for which device specific
> > quirks have not been developed."
> >
> > You wrote it and also repeated it a couple of times during the discussion.
> >
> > If device can understand that specific reset doesn't work, it won't
> > perform it in first place.
> >
> > Thanks
> Is it possible for device to understand whether or not specific reset
> will work or not prior to performing reset and after it indicates
> support for that reset method? Maybe theres problem with that particular
> piece of hardware in that machine.
> How can database be maintained if a particular machines have
> particular piece of faulty HW?
That is exactly why I think the VM use case you presented
is not viable.
> If for some reason reset doesn't work it will just give -ENOTTY.
> This isn't any different from existing behavior.Actually it informs user
> that the reset method didn't reset the device and user can use different
> reset method instead of implicitly using different reset method.
> If user doesn't explicitly set preferred reset method then
> we go ahead with existing implicit fall through behavior which will try all
> available reset methods until any one of them works.
> If you have device that doesn't support reset at all then you have
> option to completely disable it unlike existing reset attribute where
> you cannot disable reset. So it gives greater control where you can
> disable the reset altogether when quirk isn't developed yet.
I explicitly asked to hear a use case. So far, I got an explanation from
Alex about a policy decision (which doesn't need sysfs) and one from you
about overcoming HW bugs, with the expectation that the user will be a guru
of PCI reset methods.
>
> We can't expect to develop quirk for every device in existence.
That doesn't give us an excuse not to try.
> For example on my laptop elantech touchpad still doesn't work in 2021
> with vanilla kernel, arch linux applies the patch which was reverted in
> mainline kernel for some reason.
I see it as a good example of a cheap solution. The vendor won't fix your
touchpad because distros provide a workaround. The same will happen with reset.
Thanks
>
> Thanks,
> Amey
On 21/03/18 07:22PM, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 10:39:35AM -0600, Alex Williamson wrote:
> > On Thu, 18 Mar 2021 11:09:34 +0200
> > Leon Romanovsky <[email protected]> wrote:
>
> <...>
>
> > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > >
> > > If it is latter then we don't really need sysfs, if not, we still need
> > > some sort of DB to create second policy, because "supported != working".
> > > What am I missing?
> >
> > vfio-pci uses the internal kernel API, ie. the variants of
> > pci_reset_function(), which is the same interface used by the existing
> > sysfs reset mechanism. This proposed configuration of the reset method
> > would affect any driver using that same core infrastructure and from my
> > perspective that's really the goal. In the case where a supported
> > reset mechanism fails for a device, continuing to quirk those out for
> > the best default behavior makes sense, I'd be disappointed for a vendor
> > to not pursue improving the default behavior where it clearly makes
> > sense. However, there's also a policy decision, the kernel imposes a
> > preferential ordering of reset mechanism. Is that ordering the best
> > case for all users? I've presented above a case where a userspace may
> > prefer a policy of preferring a bus reset to a PM reset. So I think
> > the question is not only are there supported mechanisms that don't
> > work, where this interface allows userspace to more readily identify
> > and work around those sorts of issues, but it also enables user
> > preference and easier evaluation whether all of the supported reset
> > mechanisms work rather than just the first one we encounter in the
> > ordering we've decided to impose today. Thanks,
>
>
[...]
> And regarding vendors, see Amey response below about his touchpad troubles.
> The cheap electronics vendors don't care about their users.
>
> Thanks
>
On a side note, that vendor probably doesn't care about
Linux users, because even the reverted patch was submitted
by a community member.
Many vendors are satisfied with Windows-only drivers.
They don't have any reason to support Linux. That doesn't
mean we should abandon those users too.
Thanks,
Amey
On 21/03/18 07:35PM, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> > On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > > Leon Romanovsky <[email protected]> wrote:
>
> <...>
>
> > > > > I'm lost here, does vfio-pci use sysfs interface or internal to the kernel API?
> > > > >
> > > > > If it is latter then we don't really need sysfs, if not, we still need
> > > > > some sort of DB to create second policy, because "supported != working".
> > > > > What am I missing?
> > > > >
> > > > > Thanks
> > > > >
> > > > Can you explain bit more about why supported != working?
> > >
> > > It is written in the commit message of this patch.
> > > https://lore.kernel.org/lkml/[email protected]/
> > > "This feature aims to allow greater control of a device for use cases
> > > as device assignment, where specific device or platform issues may
> > > interact poorly with a given reset method, and for which device specific
> > > quirks have not been developed."
> > >
> > > You wrote it and also repeated it a couple of times during the discussion.
> > >
> > > If device can understand that specific reset doesn't work, it won't
> > > perform it in first place.
> > >
> > > Thanks
> > Is it possible for device to understand whether or not specific reset
> > will work or not prior to performing reset and after it indicates
> > support for that reset method? Maybe theres problem with that particular
> > piece of hardware in that machine.
> > How can database be maintained if a particular machines have
> > particular piece of faulty HW?
>
> It was exactly the reason why I think that VM usecase presented by
> you is not viable.
>
Well, I didn't present it as a new use case. I just described an existing
use case based on the existing reset attribute. Nothing new here;
nothing really changes with respect to that use case.
> > If for some reason reset doesn't work it will just give -ENOTTY.
> > This isn't any different from existing behavior.Actually it informs user
> > that the reset method didn't reset the device and user can use different
> > reset method instead of implicitly using different reset method.
> > If user doesn't explicitly set preferred reset method then
> > we go ahead with existing implicit fall through behavior which will try all
> > available reset methods until any one of them works.
> > If you have device that doesn't support reset at all then you have
> > option to completely disable it unlike existing reset attribute where
> > you cannot disable reset. So it gives greater control where you can
> > disable the reset altogether when quirk isn't developed yet.
>
> I explicitly asked to hear usecase, right now, I got an explanation from
> Alex for policy decision (which doesn't need sysfs) and from you about
> overcoming HW bugs with expectation that user will be guru of PCI reset
> methods.
>
> >
> > We can't expect to develop quirk for every device in existence.
>
> It doesn't give us an excuse do not try.
>
> > For example on my laptop elantech touchpad still doesn't work in 2021
> > with vanilla kernel, arch linux applies the patch which was reverted in
> > mainline kernel for some reason.
>
> I see it as a good example of cheap solution. Vendor won't fix your
> touchpad because distros provide workaround. The same will be with reset.
>
> Thanks
>
As mentioned earlier, not all vendors care about Linux, and not
everyone can afford to buy new HW just to run Linux.
Thanks,
Amey
On 15.03.21 00:55, Pali Rohár wrote:
> Moreover for mPCIe form factor cards, boards can share one PERST# signal
> with more PCIe cards and control this signal via GPIO. So asserting
> PERST# GPIO can trigger Warm reset for more PCIe cards, not just one. It
> depends on board or topology.
The pcengines apu* boards happen to be such candidates: they've got
three M.2 slots, but not all are wired in the same way (depending on the
actual model, not all have PCIe wired). Reset lines are driven via GPIO, and
some devices (I recall some LTE basebands) sometimes need an explicit
reset in order to come up properly.
I have to check the schematics for the different models to see how exactly
these GPIOs are wired. (I've got reports that some production lines
don't have them wired at all, but couldn't confirm this on my own.)
BTW: any idea how to inject board-specific reset methods after the
host bridge driver is already active? In my case (apu boards), the
PCI host bridge is probed via ACPI, and the apu board driver (which sets
up GPIOs, LEDs, keys, ...) comes much later.
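Pali's point about a shared PERST# line can be made concrete with a small check that a board-specific reset would need before assuming it touches only one card. This is a minimal sketch with a made-up board description: struct slot_desc, PERST_NOT_WIRED, the helper names, and any slot names or GPIO numbers used with them are illustrative and are not taken from the apu schematics or from any kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

#define PERST_NOT_WIRED (-1)

/* Hypothetical board description: which GPIO drives PERST# for each slot. */
struct slot_desc {
	const char *name;
	int perst_gpio;		/* GPIO number, or PERST_NOT_WIRED */
};

/* Count the slots whose PERST# is driven by the given GPIO line. */
static size_t slots_on_gpio(const struct slot_desc *slots, size_t n, int gpio)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (slots[i].perst_gpio == gpio)
			count++;
	return count;
}

/*
 * Asserting PERST# for slot idx affects only that one card if the slot's
 * PERST# is wired to a GPIO and no other slot shares that line.
 */
static bool perst_reset_is_isolated(const struct slot_desc *slots, size_t n,
				    size_t idx)
{
	int gpio = slots[idx].perst_gpio;

	if (gpio == PERST_NOT_WIRED)
		return false;	/* no way to drive PERST# for this slot */
	return slots_on_gpio(slots, n, gpio) == 1;
}
```

Returning false for an unwired slot also covers the reported case where some production lines may not have the reset GPIOs wired at all.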
--mtx
--
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your
GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
[email protected] -- +49-151-27565287
On 18.03.21 18:35, Leon Romanovsky wrote:
> I see it as a good example of cheap solution. Vendor won't fix your
> touchpad because distros provide workaround. The same will be with reset.
Usually, the vendor won't fix it anyway, regardless of any kernel
workarounds.
Most vendors are already completely overstrained with anything
software-related. That's a good reason why we should try to get rid of
firmware as much as we can.
It's really sad. A *decent* vendor would just provide a clean DT and
(actually matching!) schematics. But that's really hard to find these
days :(
--mtx
On 18.03.21 18:43, Amey Narkhede wrote:
> Well I didn't present it as new use case. I just gave existing
> usecase based on existing reset attribute. Nothing new here.
> Nothing really changes wrt that use case.
As a board driver maintainer, I fully support your case, at least as a
development/debugging aid. And even if people out there play around and find
their own workarounds, these can give us maintainers valuable insights
and save us a lot of time.
> As mentioned earlier not all vendors care about Linux and not
> all of the population can afford to buy new HW just to run Linux.
At least in the x86 world (ARM is *much* better here), even the
(supposedly) Linux-friendly ones often don't really care, especially if
the board isn't the newest model anymore.
Unfortunately, what we do or don't do in the kernel has practically no
influence on board vendor decisions. The best we can practically achieve
on their side is slowing them down in smearing bullshit into FW and ACPI
tables. Even getting some useful documentation from vendors is a really
rare thing.
The ARM world with device tree, of course, is much better (except for closed
consumer devices like "smartphones" or ACPI-poisoned arm64 boxes), at
least for professional embedded boards.
--mtx
On 18.03.21 18:22, Leon Romanovsky wrote:
> Which email client do you use?
> Your responses are grouped as one huge block without any chance to respond
> to you on specific point or answer to your question.
I'm reading this thread in Tbird, and threading / quoting all looks
nice.
> I see your flow and understand your position, but will repeat my
> position. We need to make sure that vendors will have incentive to
> supply quirks.
I really doubt we can influence that by any technical decision here in
the kernel.
> And regarding vendors, see Amey response below about his touchpad troubles.
> The cheap electronics vendors don't care about their users.
IMHO, the expensive ones don't care either.
Does eg. Dell publish board schematics ? Do they even publish exact part
lists (exact chipsets) along with their brochures, so customers can
check wether their HW is supported, before buying and trying out ?
Doesn't seem so. I've personally seen a lot cases where some supposedly
supported HW turned out to be some completely different and unsupported
HW that's sold under exactly the same product ID. One of many reasons
for not giving them a single penny anymore.
IMHO, there are only very few chances of convincing a HW vendor to
do a better job on the driver side:
a) the product is targeted at a niche that can't live without Linux
(e.g. embedded)
b) it's really *dangerous* for their market share if anything doesn't
work properly on Linux (e.g. certain server machines)
c) somebody *really* big (like Google) is holding a gun to some supplier
who's got a lot to lose
d) a *massive* worldwide shitstorm against the vendor
[ And often, even a combination of them isn't enough. Did you know that
even Google doesn't get all the specs necessary to replace the ugly
FSP blob? (It's the same with AMD, but meanwhile I'm pissed enough to
reverse engineer their AGESA blob.) ]
You see, what we do here in the kernel has no practical influence on
those hw vendors.
--mtx
On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> On 18.03.21 18:22, Leon Romanovsky wrote:
>
> > Which email client do you use?
> > Your responses are grouped as one huge block without any chance to respond
> > to you on a specific point or answer to your question.
>
> I'm reading this thread in Tbird, and threading / quoting all looks
> nice.
I'm not talking about threading or quoting but about the response itself.
See it here https://lore.kernel.org/lkml/[email protected]/
Alex's response is one big chunk without any paragraph breaks.
>
> > I see your flow and understand your position, but will repeat my
> > position. We need to make sure that vendors will have an incentive to
> > supply quirks.
>
> I really doubt we can influence that by any technical decision here in
> the kernel.
There are subsystems that have succeeded in doing it, for example netdev, RDMA, etc.
>
> > And regarding vendors, see Amey's response below about his touchpad troubles.
> > The cheap electronics vendors don't care about their users.
>
> IMHO, the expensive ones don't care either.
>
> Does, e.g., Dell publish board schematics? Do they even publish exact part
> lists (exact chipsets) along with their brochures, so customers can
> check whether their HW is supported before buying and trying it out?
They do it because they are allowed to, not because they
explicitly want to annoy their customers.
>
> Doesn't seem so. I've personally seen a lot of cases where some supposedly
> supported HW turned out to be completely different and unsupported
> HW sold under exactly the same product ID. One of many reasons
> for not giving them a single penny anymore.
>
> IMHO, there are only very few chances of convincing a HW vendor to
> do a better job on the driver side:
>
> a) the product is targeted at a niche that can't live without Linux
> (e.g. embedded)
> b) it's really *dangerous* for their market share if anything doesn't
> work properly on Linux (e.g. certain server machines)
> c) somebody *really* big (like Google) is holding a gun to some supplier
> who's got a lot to lose
> d) a *massive* worldwide shitstorm against the vendor
>
> [ And often, even a combination of them isn't enough. Did you know that
> even Google doesn't get all the specs necessary to replace the ugly
> FSP blob? (It's the same with AMD, but meanwhile I'm pissed enough to
> reverse engineer their AGESA blob.) ]
I don't know about this specific Google case, but in my previous experience,
the reasons why a vendor says no to Google are usually licensing and legal
issues, not open source vs. proprietary.
>
> You see, what we do here in the kernel has no practical influence on
> those hw vendors.
I see it differently, but it doesn't matter. This discussion is too
theoretical for my taste.
On Thu, Mar 18, 2021 at 11:13:44PM +0530, Amey Narkhede wrote:
> On 21/03/18 07:35PM, Leon Romanovsky wrote:
> > On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> > > On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > > > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > > > Leon Romanovsky <[email protected]> wrote:
> >
> > <...>
> >
> > > > > > I'm lost here: does vfio-pci use the sysfs interface or the in-kernel API?
> > > > > >
> > > > > > If it is the latter, then we don't really need sysfs; if not, we still need
> > > > > > some sort of DB to create a second policy, because "supported != working".
> > > > > > What am I missing?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > Can you explain a bit more about why supported != working?
> > > >
> > > > It is written in the commit message of this patch.
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > > "This feature aims to allow greater control of a device for use cases
> > > > as device assignment, where specific device or platform issues may
> > > > interact poorly with a given reset method, and for which device specific
> > > > quirks have not been developed."
> > > >
> > > > You wrote it and also repeated it a couple of times during the discussion.
> > > >
> > > > If the device could tell that a specific reset doesn't work, it wouldn't
> > > > perform it in the first place.
> > > >
> > > > Thanks
> > > Is it possible for a device to know whether a specific reset
> > > will work before performing it, once it has advertised
> > > support for that reset method? Maybe there's a problem with that particular
> > > piece of hardware in that machine.
> > > How can a database be maintained if particular machines have
> > > particular pieces of faulty HW?
> >
> > That is exactly why I think the VM use case presented by
> > you is not viable.
> >
> Well, I didn't present it as a new use case. I just gave an existing
> use case based on the existing reset attribute. Nothing new here.
> Nothing really changes with respect to that use case.
Of course it is new; please see Alex's response, he said that vfio uses
the in-kernel API and not sysfs.
> > > If for some reason the reset doesn't work, it will just return -ENOTTY.
> > > This isn't any different from the existing behavior. Actually, it informs the user
> > > that the reset method didn't reset the device, and the user can pick a different
> > > reset method instead of the kernel implicitly using a different one.
> > > If the user doesn't explicitly set a preferred reset method, then
> > > we go ahead with the existing implicit fall-through behavior, which tries all
> > > available reset methods until one of them works.
> > > If you have a device that doesn't support reset at all, then you have the
> > > option to disable it completely, unlike the existing reset attribute where
> > > you cannot disable reset. So it gives greater control: you can
> > > disable reset altogether when a quirk hasn't been developed yet.
> >
> > I explicitly asked to hear a use case. So far, I got an explanation from
> > Alex for a policy decision (which doesn't need sysfs) and from you about
> > overcoming HW bugs with the expectation that the user will be a guru of PCI reset
> > methods.
> >
> > >
> > > We can't expect to develop a quirk for every device in existence.
> >
> > It doesn't give us an excuse not to try.
> >
> > > For example, the Elantech touchpad on my laptop still doesn't work in 2021
> > > with a vanilla kernel; Arch Linux applies the patch which was reverted in
> > > the mainline kernel for some reason.
> >
> > I see it as a good example of a cheap solution. The vendor won't fix your
> > touchpad because distros provide a workaround. The same will happen with reset.
> >
> > Thanks
> >
> As mentioned earlier, not all vendors care about Linux, and not
> everyone can afford to buy new HW just to run Linux.
Sorry, but you are not consistent. At the beginning, we talked about new HW
that has bugs but doesn't have quirks yet. Here we are talking about old HW
that still doesn't have quirks.
Thanks
>
> Thanks,
> Amey
On Thu, Mar 18, 2021 at 06:58:25PM +0100, Enrico Weigelt, metux IT consult wrote:
> On 18.03.21 18:35, Leon Romanovsky wrote:
>
> > I see it as a good example of a cheap solution. The vendor won't fix your
> > touchpad because distros provide a workaround. The same will happen with reset.
>
> Usually, the vendor won't fix it anyway, regardless of any kernel
> workarounds.
It's not only vendors; enthusiasts won't fix it either, because their
distro works.
Thanks
On 19.03.21 13:59, Leon Romanovsky wrote:
>> I really doubt we can influence that by any technical decision here in
>> the kernel.
>
> There are subsystems that have succeeded in doing it, for example netdev, RDMA, etc.
I'd guess those are either high-end/server or embedded products; I already
mentioned that these are different fields. I've been talking about
average consumer products.
OTOH, there are also very expensive vendors that are exceptionally bad,
e.g. National Instruments (who are even capable of breaking rpm so badly
with their proprietary packages that they open up 0day holes; I once
filed a report @FD on such a case).
>> IMHO, the expensive ones don't care either.
>>
>> Does, e.g., Dell publish board schematics? Do they even publish exact part
>> lists (exact chipsets) along with their brochures, so customers can
>> check whether their HW is supported before buying and trying it out?
>
> They do it because they are allowed to, not because they
> explicitly want to annoy their customers.
Yes, they're just ignorant. They can still do that because people buy their
pretty expensive cheap hardware. And that's mostly driven by purchasing
people inside the customer organisations, who just don't care how much
damage they do to their own employers by dictating the purchase of
expensive broken-by-design hardware. ... but that's nothing we here have
any influence on, except for dissuasion and purchase boycotts ...
In any case, I still fail to see why giving operators a debug knob
should make anything worse.
>> [ And often, even a combination of them isn't enough. Did you know that
>> even Google doesn't get all the specs necessary to replace the ugly
>> FSP blob? (It's the same with AMD, but meanwhile I'm pissed enough to
>> reverse engineer their AGESA blob.) ]
>
> I don't know about this specific Google case, but in my previous experience,
> the reasons why a vendor says no to Google are usually licensing and legal
> issues, not open source vs. proprietary.
In short: Google did (still does?) build their own mainboards and
FW (IIRC that's where LinuxBoot came from), but even with their HUGE
quantities (they buy CPUs by the truckload) they still did
not manage to get the specs for writing their own early init without the
proprietary FSP.
The licensing / legal issues can be one of:
a) we, the mighty Intel Corp., have been so extremely stupid as to
license some vital IP (what exactly could that be, in exactly
the prime domain of Intel?) and sign such insane contracts that
we're not allowed to tell anybody how to actually use our own
products (yes: initializing the CPU and built-in interfaces belongs
exactly in that category)
b) we, the mighty Intel Corp., couldn't build something on our own but
just stole IP (in our primary domain) and are scared that anybody
could find out just by reading some early setup code.
c) we, the mighty Intel Corp., rule the world and we don't give a phrack about
what some tiny customers like Google want from us.
d) we, the mighty Intel Corp., did what our name says: INTEL,
and we don't want anybody to raise unpleasant questions.
choose your poison :P
--mtx
On 21/03/19 03:05PM, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 11:13:44PM +0530, Amey Narkhede wrote:
> > On 21/03/18 07:35PM, Leon Romanovsky wrote:
> > > On Thu, Mar 18, 2021 at 10:31:43PM +0530, Amey Narkhede wrote:
> > > > On 21/03/18 04:57PM, Leon Romanovsky wrote:
> > > > > On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> > > > > > On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > > > > > > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > > > > > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > > > > > > Leon Romanovsky <[email protected]> wrote:
> > >
> > > <...>
> > >
> > > > > > > I'm lost here: does vfio-pci use the sysfs interface or the in-kernel API?
> > > > > > >
> > > > > > > If it is the latter, then we don't really need sysfs; if not, we still need
> > > > > > > some sort of DB to create a second policy, because "supported != working".
> > > > > > > What am I missing?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > Can you explain a bit more about why supported != working?
> > > > >
> > > > > It is written in the commit message of this patch.
> > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > > "This feature aims to allow greater control of a device for use cases
> > > > > as device assignment, where specific device or platform issues may
> > > > > interact poorly with a given reset method, and for which device specific
> > > > > quirks have not been developed."
> > > > >
> > > > > You wrote it and also repeated it a couple of times during the discussion.
> > > > >
> > > > > If the device could tell that a specific reset doesn't work, it wouldn't
> > > > > perform it in the first place.
> > > > >
> > > > > Thanks
> > > > Is it possible for a device to know whether a specific reset
> > > > will work before performing it, once it has advertised
> > > > support for that reset method? Maybe there's a problem with that particular
> > > > piece of hardware in that machine.
> > > > How can a database be maintained if particular machines have
> > > > particular pieces of faulty HW?
> > >
> > > That is exactly why I think the VM use case presented by
> > > you is not viable.
> > >
> > Well, I didn't present it as a new use case. I just gave an existing
> > use case based on the existing reset attribute. Nothing new here.
> > Nothing really changes with respect to that use case.
>
> Of course it is new; please see Alex's response, he said that vfio uses
> the in-kernel API and not sysfs.
>
Still, it doesn't change the in-kernel API either.
> > > > If for some reason the reset doesn't work, it will just return -ENOTTY.
> > > > This isn't any different from the existing behavior. Actually, it informs the user
> > > > that the reset method didn't reset the device, and the user can pick a different
> > > > reset method instead of the kernel implicitly using a different one.
> > > > If the user doesn't explicitly set a preferred reset method, then
> > > > we go ahead with the existing implicit fall-through behavior, which tries all
> > > > available reset methods until one of them works.
> > > > If you have a device that doesn't support reset at all, then you have the
> > > > option to disable it completely, unlike the existing reset attribute where
> > > > you cannot disable reset. So it gives greater control: you can
> > > > disable reset altogether when a quirk hasn't been developed yet.
> > >
> > > I explicitly asked to hear a use case. So far, I got an explanation from
> > > Alex for a policy decision (which doesn't need sysfs) and from you about
> > > overcoming HW bugs with the expectation that the user will be a guru of PCI reset
> > > methods.
> > >
> > > >
> > > > We can't expect to develop a quirk for every device in existence.
> > >
> > > It doesn't give us an excuse not to try.
> > >
> > > > For example, the Elantech touchpad on my laptop still doesn't work in 2021
> > > > with a vanilla kernel; Arch Linux applies the patch which was reverted in
> > > > the mainline kernel for some reason.
> > >
> > > I see it as a good example of a cheap solution. The vendor won't fix your
> > > touchpad because distros provide a workaround. The same will happen with reset.
> > >
> > > Thanks
> > >
> > As mentioned earlier, not all vendors care about Linux, and not
> > everyone can afford to buy new HW just to run Linux.
>
> Sorry, but you are not consistent. At the beginning, we talked about new HW
> that has bugs but doesn't have quirks yet. Here we are talking about old HW
> that still doesn't have quirks.
>
> Thanks
>
Does it really matter whether HW is old or new?
If old HW doesn't have quirks yet, how can we expect
new HW to have quirks? What if new HW is made by the same vendors
who don't have any interest in Linux?
Thanks,
Amey
On Fri, Mar 19, 2021 at 08:53:17PM +0530, Amey Narkhede wrote:
> On 21/03/19 03:05PM, Leon Romanovsky wrote:
<...>
> > > > That is exactly why I think the VM use case presented by
> > > > you is not viable.
> > > >
> > > Well, I didn't present it as a new use case. I just gave an existing
> > > use case based on the existing reset attribute. Nothing new here.
> > > Nothing really changes with respect to that use case.
> >
> > Of course it is new; please see Alex's response, he said that vfio uses
> > the in-kernel API and not sysfs.
> >
> Still, it doesn't change the in-kernel API either.
Right, but the issue is with the user-space part of this proposal and not
the in-kernel API.
<...>
> > > As mentioned earlier, not all vendors care about Linux, and not
> > > everyone can afford to buy new HW just to run Linux.
> >
> > Sorry, but you are not consistent. At the beginning, we talked about new HW
> > that has bugs but doesn't have quirks yet. Here we are talking about old HW
> > that still doesn't have quirks.
> >
> > Thanks
> >
> Does it really matter whether HW is old or new?
> If old HW doesn't have quirks yet, how can we expect
> new HW to have quirks? What if new HW is made by the same vendors
> who don't have any interest in Linux?
It is pretty clear that this sysfs attribute won't improve the quirks situation but
has every potential to reduce their number even more.
Let's stop this discussion here.
Thanks
>
> Thanks,
> Amey
On Fri, Mar 19, 2021 at 02:48:12PM +0100, Enrico Weigelt, metux IT consult wrote:
> On 19.03.21 13:59, Leon Romanovsky wrote:
<...>
> In any case, I still fail to see why giving operators a debug knob
> should make anything worse.
I see this patch as a workaround that will stop people from providing quirks for reset issues.
As a way forward, we can make this sysfs attribute visible only in DEBUG/EXPERT .config builds.
What do you think?
>
> > > [ And often, even a combination of them isn't enough. Did you know that
> > > even Google doesn't get all the specs necessary to replace the ugly
> > > FSP blob? (It's the same with AMD, but meanwhile I'm pissed enough to
> > > reverse engineer their AGESA blob.) ]
> >
> > I don't know about this specific Google case, but in my previous experience,
> > the reasons why a vendor says no to Google are usually licensing and legal
> > issues, not open source vs. proprietary.
>
> In short: Google did (still does?) build their own mainboards and
> FW (IIRC that's where LinuxBoot came from), but even with their HUGE
> quantities (they buy CPUs by the truckload) they still did
> not manage to get the specs for writing their own early init without the
> proprietary FSP.
>
> The licensing / legal issues can be one of:
>
> a) we, the mighty Intel Corp., have been so extremely stupid as to
> license some vital IP (what exactly could that be, in exactly
> the prime domain of Intel?) and sign such insane contracts that
> we're not allowed to tell anybody how to actually use our own
> products (yes: initializing the CPU and built-in interfaces belongs
> exactly in that category)
> b) we, the mighty Intel Corp., couldn't build something on our own but
> just stole IP (in our primary domain) and are scared that anybody
> could find out just by reading some early setup code.
> c) we, the mighty Intel Corp., rule the world and we don't give a phrack about
> what some tiny customers like Google want from us.
> d) we, the mighty Intel Corp., did what our name says: INTEL,
> and we don't want anybody to raise unpleasant questions.
I would say:
e) We, Intel, have fixes and optimization logic (patented or specific to different
customers) that is applicable to our HW, and we can't open it to Google because it
would be used against us in procurement and development. See the recent article about
an ex-Intel employee who used this information when placing bids at Microsoft.
https://www.usnews.com/news/best-states/oregon/articles/2021-02-08/intel-sues-engineer-who-went-to-microsoft-over-trade-secrets
>
>
> choose your poison :P
On 21/03/19 05:37PM, Leon Romanovsky wrote:
> On Fri, Mar 19, 2021 at 08:53:17PM +0530, Amey Narkhede wrote:
> > On 21/03/19 03:05PM, Leon Romanovsky wrote:
>
> <...>
>
> > > > > That is exactly why I think the VM use case presented by
> > > > > you is not viable.
> > > > >
> > > > Well, I didn't present it as a new use case. I just gave an existing
> > > > use case based on the existing reset attribute. Nothing new here.
> > > > Nothing really changes with respect to that use case.
> > >
> > > Of course it is new; please see Alex's response, he said that vfio uses
> > > the in-kernel API and not sysfs.
> > >
> > Still, it doesn't change the in-kernel API either.
>
> Right, but the issue is with the user-space part of this proposal and not
> the in-kernel API.
The userspace part just enhances the existing reset attribute; still no
significant changes there.
>
>
> <...>
>
> > > > As mentioned earlier, not all vendors care about Linux, and not
> > > > everyone can afford to buy new HW just to run Linux.
> > >
> > > Sorry, but you are not consistent. At the beginning, we talked about new HW
> > > that has bugs but doesn't have quirks yet. Here we are talking about old HW
> > > that still doesn't have quirks.
> > >
> > > Thanks
> > >
> > Does it really matter whether HW is old or new?
> > If old HW doesn't have quirks yet, how can we expect
> > new HW to have quirks? What if new HW is made by the same vendors
> > who don't have any interest in Linux?
>
> It is pretty clear that this sysfs attribute won't improve the quirks situation but
> has every potential to reduce their number even more.
>
> Let's stop this discussion here.
>
> Thanks
>
IMO it does improve the usability of devices, which I consider to be more
important than developing quirks, which are just bandages in the end,
not HW fixes. There's no point in using Linux if
I can't use the device in the first place, and expecting to wait
for some community member to develop a quirk without vendor support
is simply unrealistic.
So let's stop this discussion here.
Thanks,
Amey
On Fri, Mar 19, 2021 at 02:59:47PM +0200, Leon Romanovsky wrote:
> On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> > On 18.03.21 18:22, Leon Romanovsky wrote:
> >
> > > Which email client do you use? Your responses are grouped as
> > > one huge block without any chance to respond to you on a specific
> > > point or answer to your question.
> >
> > I'm reading this thread in Tbird, and threading / quoting all
> > looks nice.
>
> I'm not talking about threading or quoting but about the response
> itself. See it here
> https://lore.kernel.org/lkml/[email protected]/
> Alex's response is one big chunk without any paragraph
> breaks.
Don't make this harder than it needs to be. I think it's totally
acceptable to just split Alex's text where you need to respond. For
example, Alex wrote this:
vfio-pci uses the internal kernel API, ie. the variants of
pci_reset_function(), which is the same interface used by the existing
sysfs reset mechanism. This proposed configuration of the reset method
would affect any driver using that same core infrastructure and from my
perspective that's really the goal. ...
If I wanted to respond to the first sentence, I would just do this:
aw> vfio-pci uses the internal kernel API, ie. the variants of
aw> pci_reset_function(), which is the same interface used by the existing
aw> sysfs reset mechanism.
I would write my response to the above here. The rest of the quote
continues on below. If the rest of Alex's message isn't relevant to
my response, I would remove it completely.
aw> This proposed configuration of the reset method
aw> would affect any driver using that same core infrastructure and from my
aw> perspective that's really the goal. ...
Bjorn
On Fri, 19 Mar 2021 14:59:47 +0200
Leon Romanovsky <[email protected]> wrote:
> On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> > On 18.03.21 18:22, Leon Romanovsky wrote:
> >
> > > Which email client do you use?
> > > Your responses are grouped as one huge block without any chance to respond
> > > to you on a specific point or answer to your question.
> >
> > I'm reading this thread in Tbird, and threading / quoting all looks
> > nice.
>
> I'm not talking about threading or quoting but about the response itself.
> See it here https://lore.kernel.org/lkml/[email protected]/
> Alex's response is one big chunk without any paragraph breaks.
I've never known paragraph breaks to be required to interject a reply.
Back on topic...
> >
> > > I see your flow and understand your position, but will repeat my
> > > position. We need to make sure that vendors will have an incentive to
> > > supply quirks.
What if we taint the kernel or pci_warn() for cases where either all
the reset methods are disabled, ie. 'echo none > reset_method', or any
time a device specific method is disabled?
I'd almost go so far as to prevent disabling a device specific reset
altogether, but for example should a device specific reset that fixes
an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
case if direct FLR were disabled via a device flag introduced with the
quirk and the remaining resets can still be selected by preference.
Theoretically all the other reset methods work and are available, it's
only a policy decision which to use, right?
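The check floated above could look roughly like the following userspace sketch. It assumes the per-device state can be reduced to two bitmaps, the methods the device supports and the methods the user left enabled. The bit names, `check_reset_selection()`, and the use of `fprintf()` in place of `pci_warn()` are all hypothetical, and the taint alternative is not modelled.

```c
#include <stdio.h>

/* Hypothetical reset-method bits; illustrative only, not the names used
 * by the patch series. */
enum reset_method {
	RESET_DEV_SPECIFIC = 1u << 0,
	RESET_PCIE_FLR     = 1u << 1,
	RESET_AF_FLR       = 1u << 2,
	RESET_PM           = 1u << 3,
	RESET_BUS          = 1u << 4,
};

/* Hypothetical warning flags returned by check_reset_selection(). */
enum reset_warning {
	WARN_NONE         = 0,
	WARN_ALL_DISABLED = 1u << 0,  /* e.g. after 'echo none > reset_method' */
	WARN_QUIRK_BYPASS = 1u << 1,  /* a device-specific method was turned off */
};

/*
 * Compare what the device supports with what the user left enabled and
 * print a warning (standing in for pci_warn()) when every method is
 * disabled or when a supported device-specific method is disabled.
 */
static unsigned int check_reset_selection(const char *dev,
					  unsigned int supported,
					  unsigned int enabled)
{
	unsigned int warn = WARN_NONE;

	if (enabled == 0) {
		fprintf(stderr, "%s: all reset methods disabled\n", dev);
		warn |= WARN_ALL_DISABLED;
	}
	if ((supported & RESET_DEV_SPECIFIC) && !(enabled & RESET_DEV_SPECIFIC)) {
		fprintf(stderr, "%s: device-specific reset disabled by user\n", dev);
		warn |= WARN_QUIRK_BYPASS;
	}
	return warn;
}
```

Warning instead of refusing keeps the override usable for debugging while still leaving a trace in the log that a quirk may be missing.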
If a device probes for a reset that's broken and distros start
including systemd scripts to apply a preference to avoid it, (a) that
enables them to work with existing kernels, and (b) indicates to us to
add the trivial quirk to flag that reset as broken.
The other side of the argument that this discourages quirks is that
this interface actually makes it significantly easier to report specific
reset methods as broken for a given device.
Thanks,
Alex
On Fri, Mar 19, 2021 at 10:57:11AM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 19, 2021 at 02:59:47PM +0200, Leon Romanovsky wrote:
> > On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> > > On 18.03.21 18:22, Leon Romanovsky wrote:
> > >
> > > > Which email client do you use? Your responses are grouped as
> > > > one huge block without any chance to respond to you on a specific
> > > > point or answer to your question.
> > >
> > > I'm reading this thread in Tbird, and threading / quoting all
> > > looks nice.
> >
> > I'm not talking about threading or quoting but about the response
> > itself. See it here
> > https://lore.kernel.org/lkml/[email protected]/
> > Alex's response is one big chunk without any paragraph
> > breaks.
>
> Don't make this harder than it needs to be. I think it's totally
> acceptable to just split Alex's text where you need to respond. For
> example, Alex wrote this:
>
> vfio-pci uses the internal kernel API, ie. the variants of
> pci_reset_function(), which is the same interface used by the existing
> sysfs reset mechanism. This proposed configuration of the reset method
> would affect any driver using that same core infrastructure and from my
> perspective that's really the goal. ...
>
> If I wanted to respond to the first sentence, I would just do this:
>
> aw> vfio-pci uses the internal kernel API, ie. the variants of
> aw> pci_reset_function(), which is the same interface used by the existing
> aw> sysfs reset mechanism.
>
> I would write my response to the above here. The rest of the quote
> continues on below. If the rest of Alex's message isn't relevant to
> my response, I would remove it completely.
>
> aw> This proposed configuration of the reset method
> aw> would affect any driver using that same core infrastructure and from my
> aw> perspective that's really the goal. ...
>
> Bjorn
Thanks, Bjorn, you showed me how to respond to such messages; however,
I was more worried that my setup needed some adjustment and that I was the only one
who saw it as one chunk.
Thanks
On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> On Fri, 19 Mar 2021 14:59:47 +0200
> Leon Romanovsky <[email protected]> wrote:
>
> > On Thu, Mar 18, 2021 at 07:34:56PM +0100, Enrico Weigelt, metux IT consult wrote:
> > > On 18.03.21 18:22, Leon Romanovsky wrote:
> > >
> > > > Which email client do you use?
> > > > Your responses are grouped as one huge block without any chance to respond
> > > > to you on a specific point or answer to your question.
> > >
> > > I'm reading this thread in Tbird, and threading / quoting all looks
> > > nice.
> >
> > I'm not talking about threading or quoting but about the response itself.
> > See it here https://lore.kernel.org/lkml/[email protected]/
> > Alex's response is one big chunk without any paragraph breaks.
>
> I've never known paragraph breaks to be required to interject a reply.
Of course not, but as Bjorn said, if you don't use paragraphs, we will
need to manually break up your message and fix the ">" quotation marks and half
sentences.
I just wanted to be sure that this is not my mail client.
>
> Back on topic...
>
> > >
> > > > I see your flow and understand your position, but will repeat my
> > > > position. We need to make sure that vendors will have an incentive to
> > > > supply quirks.
>
> What if we taint the kernel or pci_warn() for cases where either all
> the reset methods are disabled, ie. 'echo none > reset_method', or any
> time a device specific method is disabled?
What does "none" mean? Does it mean nothing is supported? If yes, I think that
pci_warn() will be enough. At least for me, taint is useful during debug stages;
if the device doesn't crash, probably no one will look at /proc/sys/kernel/tainted.
>
> I'd almost go so far as to prevent disabling a device specific reset
> altogether, but for example should a device specific reset that fixes
> an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> case if direct FLR were disabled via a device flag introduced with the
> quirk and the remaining resets can still be selected by preference.
I don't know enough to discuss the PCI details, but you raised a good point.
This sysfs attribute is a user-visible API that is presented as-is from the device's point
of view. It can easily run into problems if PCI/core doesn't work with
the user's choice.
>
> Theoretically all the other reset methods work and are available, it's
> only a policy decision which to use, right?
But this patch was presented as a way to overcome situations where
supported != working and the user magically knows which reset type to set.
If you want this patch to be a policy decision tool,
it will need to accept "reset_type1,reset_type2,..." sort of input,
so fallback will work natively.
I think that would be a much more robust and cleaner solution than the current one.
Something like this:
cat /sys/..../reset_policy
reset_type1,reset_type2,...,reset_typeX
echo "reset_type3,reset_type1" > /sys/..../reset_policy
cat /sys/..../reset_policy
reset_type3,reset_type1
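As a rough illustration of the list-style input suggested here, the userspace sketch below parses a comma-separated policy into an ordered list of method indices. The method names, `MAX_METHODS`, `lookup_method()` and `parse_reset_policy()` are hypothetical; nothing here reflects the attribute format actually implemented by the patch.

```c
#include <string.h>

#define MAX_METHODS 5

/* Illustrative method names only; the real attribute may spell them differently. */
static const char *const method_names[MAX_METHODS] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};

/* Return the index of the method whose name is exactly name[0..len), or -1. */
static int lookup_method(const char *name, size_t len)
{
	for (int i = 0; i < MAX_METHODS; i++)
		if (strlen(method_names[i]) == len &&
		    strncmp(name, method_names[i], len) == 0)
			return i;
	return -1;
}

/*
 * Parse e.g. "bus,flr\n" into order[] = { 4, 1 }. Returns the number of
 * entries, or -1 for an empty, unknown or repeated name. Parsing stops at
 * the end of the string or at a newline (as written by echo).
 */
static int parse_reset_policy(const char *input, int order[MAX_METHODS])
{
	const char *p = input;
	int n = 0;

	for (;;) {
		size_t len = strcspn(p, ",\n");
		int idx = lookup_method(p, len);

		if (idx < 0)
			return -1;              /* empty or unknown name */
		for (int i = 0; i < n; i++)
			if (order[i] == idx)
				return -1;      /* listed twice */
		order[n++] = idx;               /* at most MAX_METHODS distinct */

		if (p[len] != ',')
			break;
		p += len + 1;
	}
	return n;
}
```

Rejecting unknown and repeated names up front lets a malformed write fail cleanly instead of leaving a half-applied order; the caller would then try the listed methods in that order.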
Thanks
On Sat, 20 Mar 2021 11:10:08 +0200
Leon Romanovsky <[email protected]> wrote:
> On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> >
> > What if we taint the kernel or pci_warn() for cases where either all
> > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > time a device specific method is disabled?
>
> What does it mean "none"? Does it mean nothing supported? If yes, I think that
> pci_warn() will be enough. At least for me, taint is usable during debug stages,
> probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
"none" as implemented in this patch, clearing the enabled function
reset methods.
> > I'd almost go so far as to prevent disabling a device specific reset
> > altogether, but for example should a device specific reset that fixes
> > an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> > case if direct FLR were disabled via a device flag introduced with the
> > quirk and the remaining resets can still be selected by preference.
>
> I don't know enough to discuss the PCI details, but you raised good point.
> This sysfs is user visible API that is presented as is from device point
> of view. It can be easily run into problems if PCI/core doesn't work with
> user's choice.
>
> >
> > Theoretically all the other reset methods work and are available, it's
> > only a policy decision which to use, right?
>
> But this patch was presented as a way to overcome situations where
> supported != working and user magically knows which reset type to set.
It's not magic, the new sysfs attributes expose which resets are
enabled and the order that they're used, the user can simply select the
next one. Being able to bypass a broken reset method is a helpful side
effect of getting to select a preferred reset method.
> If you want to take this patch to be policy decision tool,
> it will need to accept "reset_type1,reset_type2,..." sort of input,
> so fallback will work natively.
I don't see that as a requirement. We have fall-through support in the
kernel, but for a given device we're really only ever going to make use
of one of those methods. If a user knows enough about a device to have
a preference, I think it can be singular. That also significantly
simplifies the interface and supporting code. Thanks,
Alex
On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> On Sat, 20 Mar 2021 11:10:08 +0200
> Leon Romanovsky <[email protected]> wrote:
> > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> > >
> > > What if we taint the kernel or pci_warn() for cases where either all
> > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > time a device specific method is disabled?
> >
> > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
>
> "none" as implemented in this patch, clearing the enabled function
> reset methods.
It is far from intuitive, the empty string will be easier to understand,
because "none" means no reset at all.
>
> > > I'd almost go so far as to prevent disabling a device specific reset
> > > altogether, but for example should a device specific reset that fixes
> > > an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> > > case if direct FLR were disabled via a device flag introduced with the
> > > quirk and the remaining resets can still be selected by preference.
> >
> > I don't know enough to discuss the PCI details, but you raised good point.
> > This sysfs is user visible API that is presented as is from device point
> > of view. It can be easily run into problems if PCI/core doesn't work with
> > user's choice.
> >
> > >
> > > Theoretically all the other reset methods work and are available, it's
> > > only a policy decision which to use, right?
> >
> > But this patch was presented as a way to overcome situations where
> > supported != working and user magically knows which reset type to set.
>
> It's not magic, the new sysfs attributes expose which resets are
> enabled and the order that they're used, the user can simply select the
> next one. Being able to bypass a broken reset method is a helpful side
> effect of getting to select a preferred reset method.
Magic in a sense that user has no idea what those resets mean, the
expectation is that he will blindly iterate till something works.
>
> > If you want to take this patch to be policy decision tool,
> > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > so fallback will work natively.
>
> I don't see that as a requirement. We have fall-through support in the
> kernel, but for a given device we're really only ever going to make use
> of one of those methods. If a user knows enough about a device to have
> a preference, I think it can be singular. That also significantly
> simplifies the interface and supporting code. Thanks,
I'm struggling to get requirements from this thread. You talked about
policy decision to overtake fallback mechanism, Amey wanted to avoid
quirks.
Do you have an example of such devices or we are talking about
theoretical case?
And I don't see why simple line parser with loop iterator over strchr()
suddenly becomes complicated code.
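Below is a sketch of the kind of strchr()-based loop meant here. It parses a comma-separated list into an ordered array of method indices. The method names and parse_reset_order() are illustrative assumptions, not the patch's code. The input is assumed to have had the trailing newline from echo already stripped. Unknown names, empty tokens, duplicates and overflow are rejected.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative method names; the real list is whatever the kernel defines. */
static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};
#define NUM_METHODS (sizeof(method_names) / sizeof(method_names[0]))

/* Match a token of length len (not NUL-terminated) against the names. */
static int method_index(const char *name, size_t len)
{
	for (size_t i = 0; i < NUM_METHODS; i++)
		if (strlen(method_names[i]) == len &&
		    strncmp(method_names[i], name, len) == 0)
			return (int)i;
	return -1;
}

/*
 * Parse e.g. "pm,flr" into order[] = {3, 1}. Returns the number of entries,
 * or -1 on an unknown name, empty token, trailing comma, duplicate or overflow.
 */
static int parse_reset_order(const char *buf, int *order, size_t max)
{
	size_t n = 0;
	const char *p = buf;

	while (*p) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);
		int idx = method_index(p, len);

		if (idx < 0 || n == max)
			return -1;
		for (size_t i = 0; i < n; i++)
			if (order[i] == idx)
				return -1;	/* duplicate entry */
		order[n++] = idx;
		if (!comma)
			break;
		p = comma + 1;
		if (*p == '\0')
			return -1;		/* trailing comma */
	}
	return (int)n;
}
```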
Thanks
>
> Alex
>
On 21/03/21 10:40AM, Leon Romanovsky wrote:
> On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > On Sat, 20 Mar 2021 11:10:08 +0200
> > Leon Romanovsky <[email protected]> wrote:
> > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> > > >
> > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > time a device specific method is disabled?
> > >
> > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
> >
> > "none" as implemented in this patch, clearing the enabled function
> > reset methods.
>
> It is far from intuitive, the empty string will be easier to understand,
> because "none" means no reset at all.
>
> >
> > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > altogether, but for example should a device specific reset that fixes
> > > > an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> > > > case if direct FLR were disabled via a device flag introduced with the
> > > > quirk and the remaining resets can still be selected by preference.
> > >
> > > I don't know enough to discuss the PCI details, but you raised good point.
> > > This sysfs is user visible API that is presented as is from device point
> > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > user's choice.
> > >
> > > >
> > > > Theoretically all the other reset methods work and are available, it's
> > > > only a policy decision which to use, right?
> > >
> > > But this patch was presented as a way to overcome situations where
> > > supported != working and user magically knows which reset type to set.
> >
> > It's not magic, the new sysfs attributes expose which resets are
> > enabled and the order that they're used, the user can simply select the
> > next one. Being able to bypass a broken reset method is a helpful side
> > effect of getting to select a preferred reset method.
>
> Magic in a sense that user has no idea what those resets mean, the
> expectation is that he will blindly iterate till something works.
>
> >
> > > If you want to take this patch to be policy decision tool,
> > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > so fallback will work natively.
> >
> > I don't see that as a requirement. We have fall-through support in the
> > kernel, but for a given device we're really only ever going to make use
> > of one of those methods. If a user knows enough about a device to have
> > a preference, I think it can be singular. That also significantly
> > simplifies the interface and supporting code. Thanks,
>
> I'm struggling to get requirements from this thread. You talked about
> policy decision to overtake fallback mechanism, Amey wanted to avoid
> quirks.
Just to clarify, I don't want to avoid quirks. I just want the device
to be usable even if it doesn't have a quirk, since a quirk for that
particular device may never be developed, for the reasons
mentioned earlier.
[...]
Thanks,
Amey
On Sun, 21 Mar 2021 10:40:55 +0200
Leon Romanovsky <[email protected]> wrote:
> On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > On Sat, 20 Mar 2021 11:10:08 +0200
> > Leon Romanovsky <[email protected]> wrote:
> > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> > > >
> > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > time a device specific method is disabled?
> > >
> > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
> >
> > "none" as implemented in this patch, clearing the enabled function
> > reset methods.
>
> It is far from intuitive, the empty string will be easier to understand,
> because "none" means no reset at all.
"No reset at all" is what "none" achieves: the
pci_dev.reset_methods_enabled bitmap is cleared.  We can use an empty
string, but I think we want a way to clear all enabled resets and a way
to return it to the default. I could see arguments for an empty string
serving either purpose, so this version proposed explicitly using
"none" and "default", as included in the ABI update.
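Here is a minimal sketch of the bitmap semantics described above. It assumes that writing a single method name replaces the enabled set, in line with the singular preference argued for in this thread. The enum, the names, struct reset_state and reset_method_store() are hypothetical. The warning on "none" reflects the suggestion in this thread to warn when all resets are disabled, and is not necessarily what the patch does.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Illustrative bit positions; names and order are assumptions. */
enum { RM_DEV_SPECIFIC, RM_FLR, RM_AF_FLR, RM_PM, RM_BUS, RM_NUM };

static const char *const rm_names[RM_NUM] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};

struct reset_state {
	unsigned int supported;	/* methods found at probe time */
	unsigned int enabled;	/* methods the reset path may use */
};

/* Hypothetical store handler: accepts "none", "default", or one method name. */
static int reset_method_store(struct reset_state *s, const char *buf)
{
	if (strcmp(buf, "none") == 0) {
		s->enabled = 0;
		fprintf(stderr, "warning: all reset methods disabled\n");
		return 0;
	}
	if (strcmp(buf, "default") == 0) {
		s->enabled = s->supported;	/* back to the probed set */
		return 0;
	}
	for (int i = 0; i < RM_NUM; i++) {
		if (strcmp(buf, rm_names[i]) != 0)
			continue;
		if (!(s->supported & (1u << i)))
			return -EINVAL;		/* cannot enable an unsupported method */
		s->enabled = 1u << i;		/* single preferred method */
		return 0;
	}
	return -EINVAL;				/* unknown name */
}
```

Keeping "none" and "default" as explicit keywords avoids overloading an empty write with either meaning. That ambiguity is the one raised above.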
> > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > altogether, but for example should a device specific reset that fixes
> > > > an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> > > > case if direct FLR were disabled via a device flag introduced with the
> > > > quirk and the remaining resets can still be selected by preference.
> > >
> > > I don't know enough to discuss the PCI details, but you raised good point.
> > > This sysfs is user visible API that is presented as is from device point
> > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > user's choice.
> > >
> > > >
> > > > Theoretically all the other reset methods work and are available, it's
> > > > only a policy decision which to use, right?
> > >
> > > But this patch was presented as a way to overcome situations where
> > > supported != working and user magically knows which reset type to set.
> >
> > It's not magic, the new sysfs attributes expose which resets are
> > enabled and the order that they're used, the user can simply select the
> > next one. Being able to bypass a broken reset method is a helpful side
> > effect of getting to select a preferred reset method.
>
> Magic in a sense that user has no idea what those resets mean, the
> expectation is that he will blindly iterate till something works.
Which ought to actually be a safe thing to do. We should have quirks to
exclude resets that are known broken but still probe as present and I'd
be perfectly fine if we issue a warning if the user disables all resets
for a given device.
> > > If you want to take this patch to be policy decision tool,
> > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > so fallback will work natively.
> >
> > I don't see that as a requirement. We have fall-through support in the
> > kernel, but for a given device we're really only ever going to make use
> > of one of those methods. If a user knows enough about a device to have
> > a preference, I think it can be singular. That also significantly
> > simplifies the interface and supporting code. Thanks,
>
> I'm struggling to get requirements from this thread. You talked about
> policy decision to overtake fallback mechanism, Amey wanted to avoid
> quirks.
>
> Do you have an example of such devices or we are talking about
> theoretical case?
Look at any device that already has a reset quirk and the process it
took to get there. Those are more than just theoretical cases.
For policy preference, I already described how I've configured QEMU to
prefer a bus reset rather than a PM reset due to lack of specification
regarding the scope of a PM "soft reset". This interface would allow a
system policy to do that same thing.
I don't think anyone is suggesting this as a means to avoid quirks that
would resolve reset issues and create the best default general behavior.
This provides a mechanism to test various reset methods, and thereby
identify broken methods, and set a policy. Sure, that policy might be
to avoid a broken reset in the interim before it gets quirked and
there's potential for abuse there, but I think the benefits outweigh
the risks.
> And I don't see why simple line parser with loop iterator over strchr()
> suddenly becomes complicated code.
Setting multiple bits in a bitmap is easy. How do you then go on to
allow the user to specify an ordering preference? If you have an
algorithm you'd like to propose that allows the user to manage the
ordering when enabling multiple methods without substantially
increasing the complexity, please share.  IMO, a given device will
generally use one reset method, and restricting user preference to a
single method seems sufficient to achieve all the use cases I've noted.  Thanks,
Alex
On Thursday 18 March 2021 20:01:55 Amey Narkhede wrote:
> On 21/03/17 09:13PM, Pali Rohár wrote:
> > On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> > > On Wed, 17 Mar 2021 20:40:24 +0100
> > > Pali Rohár <[email protected]> wrote:
> > >
> > > > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > > > Pali Rohár <[email protected]> wrote:
> > > > >
> > > > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > >
> > > > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > >
> > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > >
> > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > defined here.
> > > > > > > > > >
> > > > > > > > > > Ok!
> > > > > > > > > >
> > > > > > > > > > > Note that with this series the resets available through
> > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > > > > > remove-reset-rescan procedure. As discussed in other thread, this reset
> > > > > > > > > > let device in unconfigured / broken state.
> > > > > > > > >
> > > > > > > > > No, there's not:
> > > > > > > > >
> > > > > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > > > > {
> > > > > > > > > 	int rc;
> > > > > > > > >
> > > > > > > > > 	if (!dev->reset_fn)
> > > > > > > > > 		return -ENOTTY;
> > > > > > > > >
> > > > > > > > > 	pci_dev_lock(dev);
> > > > > > > > > >>>	pci_dev_save_and_disable(dev);
> > > > > > > > >
> > > > > > > > > 	rc = __pci_reset_function_locked(dev);
> > > > > > > > >
> > > > > > > > > >>>	pci_dev_restore(dev);
> > > > > > > > > 	pci_dev_unlock(dev);
> > > > > > > > >
> > > > > > > > > 	return rc;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > > > > limited to the single device triggering this reset. Thanks,
> > > > > > > > >
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > >
> > > > > > > > I was thinking a bit more about it and I'm really sure how it would
> > > > > > > > behave with hotplugging PCIe bridge.
> > > > > > > >
> > > > > > > > On aardvark PCIe controller I have already tested that secondary bus
> > > > > > > > reset bit is triggering Hot Reset event and then also Link Down event.
> > > > > > > > These events are not handled by aardvark driver yet (needs to
> > > > > > > > implemented into kernel's emulated root bridge code).
> > > > > > > >
> > > > > > > > But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
> > > > > > > > Kernel has already code which removes PCIe device if it changes presence
> > > > > > > > bit (and inform via interrupt). And Link Down event triggers this
> > > > > > > > change.
> > > > > > >
> > > > > > > This is the difference between slot and bus resets, the slot reset is
> > > > > > > implemented by the hotplug controller and disables presence detection
> > > > > > > around the bus reset. Thanks,
> > > > > >
> > > > > > Yes, but I'm talking about bus reset, not about slot reset.
> > > > > >
> > > > > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > > > > hotplugging.
> > > > > >
> > > > > > And if I'm reading code correctly, this combination is allowed, right?
> > > > > > Via these new patches it is possible to disable slot reset and enable
> > > > > > bus reset.
> > > > >
> > > > > That's true, a slot reset is simply a bus reset wrapped around code
> > > > > that prevents the device from getting ejected.
> > > >
> > > > Yes, this makes slot reset "safe". But bus reset is "unsafe".
> > > >
> > > > > Maybe it would make
> > > > > sense to combine the two as far as this interface is concerned, ie. a
> > > > > single "bus" reset method that will always use slot reset when
> > > > > available. Thanks,
> > > >
> > > > That should work when slot reset is available.
> > > >
> > > > Other option is that mentioned remove-reset-rescan procedure.
> > >
> > > That's not something we can introduce to the pci_reset_function() path
> > > without a fair bit of collateral in using it through vfio-pci.
> > >
> > > > But quick search in drivers/pci/hotplug/ results that not all hotplug
> > > > drivers implement reset_slot method.
> > > >
> > > > So there is a possible issue with hotplug driver which may eject device
> > > > during bus reset (because e.g. slot reset is not implemented)?
> > >
> > > People aren't reporting it, so maybe those controllers aren't being
> > > used for this use case. Or maybe introducing this patch will make
> > > these reset methods more readily accessible for testing. We can fix or
> > > blacklist those controllers for bus reset when reports come in. Thanks,
> >
> > Ok! I do not know neither if those controllers are used, but looks like
> > that there are still changes in hotplug code.
> >
> > So I guess with these patches people can test it and report issues when
> > such thing happen.
> So after a bit research as I understood we need to group slot
> and bus reset together in a single category of reset methods and
> then implicitly use slot reset if it is available when bus reset is
> enabled by the user.
> Is that right?
Yes, I understand it in same way. Just I do not know which name to
choose for this reset category. In PCI spec it is called Secondary Bus
Reset (as it resets whole bus with all devices; but we allow this reset
in this patch series only if on the bus is connected exactly one device).
In PCIe spec it is called Hot Reset. And if kernel detects Slot support
then kernel currently calls it Slot reset. But it is still same thing.
Any opinion? I think that we could call it Hot Reset as this patch
series exports it only for single device (so calling it _bus_ is not the
best match).
On Tue, 23 Mar 2021 15:34:19 +0100
Pali Rohár <[email protected]> wrote:
> On Thursday 18 March 2021 20:01:55 Amey Narkhede wrote:
> > On 21/03/17 09:13PM, Pali Rohár wrote:
> > > On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> > > > On Wed, 17 Mar 2021 20:40:24 +0100
> > > > Pali Rohár <[email protected]> wrote:
> > > >
> > > > > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > > > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > > > > Pali Rohár <[email protected]> wrote:
> > > > > >
> > > > > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > > > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > >
> > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just call PCI secondary
> > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > >
> > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > defined here.
> > > > > > > > > > >
> > > > > > > > > > > Ok!
> > > > > > > > > > >
> > > > > > > > > > > > Note that with this series the resets available through
> > > > > > > > > > > > pci_reset_function() and the per device reset attribute is sysfs remain
> > > > > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Alex
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > > > > > bus reset mechanism as exported sysfs attribute does not do that
> > > > > > > > > > > remove-reset-rescan procedure. As discussed in other thread, this reset
> > > > > > > > > > > let device in unconfigured / broken state.
> > > > > > > > > >
> > > > > > > > > > No, there's not:
> > > > > > > > > >
> > > > > > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > > > > > {
> > > > > > > > > > 	int rc;
> > > > > > > > > >
> > > > > > > > > > 	if (!dev->reset_fn)
> > > > > > > > > > 		return -ENOTTY;
> > > > > > > > > >
> > > > > > > > > > 	pci_dev_lock(dev);
> > > > > > > > > > >>>	pci_dev_save_and_disable(dev);
> > > > > > > > > >
> > > > > > > > > > 	rc = __pci_reset_function_locked(dev);
> > > > > > > > > >
> > > > > > > > > > >>>	pci_dev_restore(dev);
> > > > > > > > > > 	pci_dev_unlock(dev);
> > > > > > > > > >
> > > > > > > > > > 	return rc;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > > > > > limited to the single device triggering this reset. Thanks,
> > > > > > > > > >
> > > > > > > > > > Alex
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I was thinking a bit more about it and I'm really sure how it would
> > > > > > > > > behave with hotplugging PCIe bridge.
> > > > > > > > >
> > > > > > > > > On aardvark PCIe controller I have already tested that secondary bus
> > > > > > > > > reset bit is triggering Hot Reset event and then also Link Down event.
> > > > > > > > > These events are not handled by aardvark driver yet (needs to
> > > > > > > > > implemented into kernel's emulated root bridge code).
> > > > > > > > >
> > > > > > > > > But I'm not sure how it would behave on real HW PCIe hotplugging bridge.
> > > > > > > > > Kernel has already code which removes PCIe device if it changes presence
> > > > > > > > > bit (and inform via interrupt). And Link Down event triggers this
> > > > > > > > > change.
> > > > > > > >
> > > > > > > > This is the difference between slot and bus resets, the slot reset is
> > > > > > > > implemented by the hotplug controller and disables presence detection
> > > > > > > > around the bus reset. Thanks,
> > > > > > >
> > > > > > > Yes, but I'm talking about bus reset, not about slot reset.
> > > > > > >
> > > > > > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > > > > > hotplugging.
> > > > > > >
> > > > > > > And if I'm reading code correctly, this combination is allowed, right?
> > > > > > > Via these new patches it is possible to disable slot reset and enable
> > > > > > > bus reset.
> > > > > >
> > > > > > That's true, a slot reset is simply a bus reset wrapped around code
> > > > > > that prevents the device from getting ejected.
> > > > >
> > > > > Yes, this makes slot reset "safe". But bus reset is "unsafe".
> > > > >
> > > > > > Maybe it would make
> > > > > > sense to combine the two as far as this interface is concerned, ie. a
> > > > > > single "bus" reset method that will always use slot reset when
> > > > > > available. Thanks,
> > > > >
> > > > > That should work when slot reset is available.
> > > > >
> > > > > Other option is that mentioned remove-reset-rescan procedure.
> > > >
> > > > That's not something we can introduce to the pci_reset_function() path
> > > > without a fair bit of collateral in using it through vfio-pci.
> > > >
> > > > > But quick search in drivers/pci/hotplug/ results that not all hotplug
> > > > > drivers implement reset_slot method.
> > > > >
> > > > > So there is a possible issue with hotplug driver which may eject device
> > > > > during bus reset (because e.g. slot reset is not implemented)?
> > > >
> > > > People aren't reporting it, so maybe those controllers aren't being
> > > > used for this use case. Or maybe introducing this patch will make
> > > > these reset methods more readily accessible for testing. We can fix or
> > > > blacklist those controllers for bus reset when reports come in. Thanks,
> > >
> > > Ok! I do not know neither if those controllers are used, but looks like
> > > that there are still changes in hotplug code.
> > >
> > > So I guess with these patches people can test it and report issues when
> > > such thing happen.
> > So after a bit research as I understood we need to group slot
> > and bus reset together in a single category of reset methods and
> > then implicitly use slot reset if it is available when bus reset is
> > enabled by the user.
> > Is that right?
>
> Yes, I understand it in same way. Just I do not know which name to
> choose for this reset category. In PCI spec it is called Secondary Bus
> Reset (as it resets whole bus with all devices; but we allow this reset
> in this patch series only if on the bus is connected exactly one device).
> In PCIe spec it is called Hot Reset. And if kernel detects Slot support
> then kernel currently calls it Slot reset. But it is still same thing.
> Any opinion? I think that we could call it Hot Reset as this patch
> series exports it only for single device (so calling it _bus_ is not the
> best match).
A similar abstraction where our scope is not limited to a single
function calls this a bus reset:
int pci_reset_bus(struct pci_dev *pdev)
{
	return (!pci_probe_reset_slot(pdev->slot)) ?
		__pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
}
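To make the slot-versus-bus distinction in this thread concrete, here is a heavily simplified user-space model. In it, a slot reset masks hotplug presence detection around the secondary bus reset. A plain bus reset on a hotplug slot whose driver lacks reset_slot can instead look like a surprise removal. All types and functions here are hypothetical. Only the "slot reset if available, else bus reset" shape follows pci_reset_bus() above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for pci_slot / hotplug state. */
struct fake_slot {
	bool has_reset_slot;		/* hotplug driver implements reset_slot */
	bool presence_detect_on;	/* hotplug reacts to link/presence changes */
};

struct fake_bus {
	int resets;			/* secondary bus resets issued */
	bool device_ejected;		/* hotplug treated the Link Down as removal */
};

static void secondary_bus_reset(struct fake_bus *bus, struct fake_slot *slot)
{
	bus->resets++;
	/* Models Link Down being seen by an armed hotplug driver. */
	if (slot && slot->presence_detect_on)
		bus->device_ejected = true;
}

static int slot_reset(struct fake_slot *slot, struct fake_bus *bus)
{
	slot->presence_detect_on = false;	/* mask presence detection... */
	secondary_bus_reset(bus, slot);
	slot->presence_detect_on = true;	/* ...and re-arm it afterwards */
	return 0;
}

/* Combined "bus" method: prefer the slot reset whenever it is available. */
static int combined_bus_reset(struct fake_slot *slot, struct fake_bus *bus)
{
	if (slot && slot->has_reset_slot)
		return slot_reset(slot, bus);
	secondary_bus_reset(bus, slot);
	return 0;
}
```

In this model, exposing a single "bus" method that routes through the slot reset when possible avoids the unsafe path Pali describes. It does not help hotplug drivers that lack reset_slot.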
Thanks,
Alex
On 21/03/23 08:44AM, Alex Williamson wrote:
> On Tue, 23 Mar 2021 15:34:19 +0100
> Pali Rohár <[email protected]> wrote:
>
> > On Thursday 18 March 2021 20:01:55 Amey Narkhede wrote:
> > > On 21/03/17 09:13PM, Pali Rohár wrote:
> > > > On Wednesday 17 March 2021 14:00:20 Alex Williamson wrote:
> > > > > On Wed, 17 Mar 2021 20:40:24 +0100
> > > > > Pali Rohár <[email protected]> wrote:
> > > > >
> > > > > > On Wednesday 17 March 2021 13:32:45 Alex Williamson wrote:
> > > > > > > On Wed, 17 Mar 2021 20:24:24 +0100
> > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > >
> > > > > > > > On Wednesday 17 March 2021 13:15:36 Alex Williamson wrote:
> > > > > > > > > On Wed, 17 Mar 2021 20:02:06 +0100
> > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > On Monday 15 March 2021 09:03:39 Alex Williamson wrote:
> > > > > > > > > > > On Mon, 15 Mar 2021 15:52:38 +0100
> > > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Monday 15 March 2021 08:34:09 Alex Williamson wrote:
> > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > Pali Rohár <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > bridges and for the PowerPC PowerNV platform, and it just calls PCI secondary
> > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have an API in the
> > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > defined here.
> > > > > > > > > > > >
> > > > > > > > > > > > Ok!
> > > > > > > > > > > >
> > > > > > > > > > > > > Note that with this series the resets available through
> > > > > > > > > > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > > > > > > > > > exactly the same as they are currently. The bus and slot reset
> > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > which performed a reset irrespective of the downstream devices. This
> > > > > > > > > > > > > series only enables selection of the existing methods. Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > But with this patch series, there is still an issue with PCI secondary
> > > > > > > > > > > > bus reset, as the exported sysfs attribute does not do that
> > > > > > > > > > > > remove-reset-rescan procedure. As discussed in the other thread, this reset
> > > > > > > > > > > > leaves the device in an unconfigured / broken state.
> > > > > > > > > > >
> > > > > > > > > > > No, there's not:
> > > > > > > > > > >
> > > > > > > > > > > int pci_reset_function(struct pci_dev *dev)
> > > > > > > > > > > {
> > > > > > > > > > > 	int rc;
> > > > > > > > > > >
> > > > > > > > > > > 	if (!dev->reset_fn)
> > > > > > > > > > > 		return -ENOTTY;
> > > > > > > > > > >
> > > > > > > > > > > 	pci_dev_lock(dev);
> > > > > > > > > > > >>>	pci_dev_save_and_disable(dev);
> > > > > > > > > > >
> > > > > > > > > > > 	rc = __pci_reset_function_locked(dev);
> > > > > > > > > > >
> > > > > > > > > > > >>>	pci_dev_restore(dev);
> > > > > > > > > > > 	pci_dev_unlock(dev);
> > > > > > > > > > >
> > > > > > > > > > > 	return rc;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > The remove/re-scan was discussed primarily because your patch performed
> > > > > > > > > > > a bus reset regardless of what devices were affected by that reset and
> > > > > > > > > > > it's difficult to manage the scope where multiple devices are affected.
> > > > > > > > > > > Here, the bus and slot reset functions will fail unless the scope is
> > > > > > > > > > > limited to the single device triggering this reset. Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Alex
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I was thinking a bit more about it and I'm really sure how it would
> > > > > > > > > > behave with hotplugging PCIe bridge.
> > > > > > > > > >
> > > > > > > > > > On the aardvark PCIe controller I have already tested that the secondary bus
> > > > > > > > > > reset bit triggers a Hot Reset event and then also a Link Down event.
> > > > > > > > > > These events are not handled by the aardvark driver yet (they need to be
> > > > > > > > > > implemented in the kernel's emulated root bridge code).
> > > > > > > > > >
> > > > > > > > > > But I'm not sure how it would behave on a real HW PCIe hotplugging bridge.
> > > > > > > > > > The kernel already has code which removes a PCIe device if its presence
> > > > > > > > > > bit changes (signaled via an interrupt). And a Link Down event triggers this
> > > > > > > > > > change.
> > > > > > > > >
> > > > > > > > > This is the difference between slot and bus resets, the slot reset is
> > > > > > > > > implemented by the hotplug controller and disables presence detection
> > > > > > > > > around the bus reset. Thanks,
> > > > > > > >
> > > > > > > > Yes, but I'm talking about bus reset, not about slot reset.
> > > > > > > >
> > > > > > > > I mean: to use bus reset via sysfs on hardware which supports slots and
> > > > > > > > hotplugging.
> > > > > > > >
> > > > > > > > And if I'm reading code correctly, this combination is allowed, right?
> > > > > > > > Via these new patches it is possible to disable slot reset and enable
> > > > > > > > bus reset.
> > > > > > >
> > > > > > > That's true, a slot reset is simply a bus reset wrapped around code
> > > > > > > that prevents the device from getting ejected.
> > > > > >
> > > > > > Yes, this makes slot reset "safe". But bus reset is "unsafe".
> > > > > >
> > > > > > > Maybe it would make
> > > > > > > sense to combine the two as far as this interface is concerned, ie. a
> > > > > > > single "bus" reset method that will always use slot reset when
> > > > > > > available. Thanks,
> > > > > >
> > > > > > That should work when slot reset is available.
> > > > > >
> > > > > > Other option is that mentioned remove-reset-rescan procedure.
> > > > >
> > > > > That's not something we can introduce to the pci_reset_function() path
> > > > > without a fair bit of collateral in using it through vfio-pci.
> > > > >
> > > > > > But a quick search in drivers/pci/hotplug/ shows that not all hotplug
> > > > > > drivers implement the reset_slot method.
> > > > > >
> > > > > > So there is a possible issue with a hotplug driver which may eject the device
> > > > > > during bus reset (because e.g. slot reset is not implemented)?
> > > > >
> > > > > People aren't reporting it, so maybe those controllers aren't being
> > > > > used for this use case. Or maybe introducing this patch will make
> > > > > these reset methods more readily accessible for testing. We can fix or
> > > > > blacklist those controllers for bus reset when reports come in. Thanks,
> > > >
> > > > Ok! I do not know either whether those controllers are used, but it looks like
> > > > there are still changes in the hotplug code.
> > > >
> > > > So I guess with these patches people can test it and report issues if
> > > > such things happen.
> > > So after a bit of research, as I understand it, we need to group slot
> > > and bus reset together in a single category of reset methods and
> > > then implicitly use slot reset if it is available when bus reset is
> > > enabled by the user.
> > > Is that right?
> >
> > Yes, I understand it the same way. I just do not know which name to
> > choose for this reset category. In the PCI spec it is called Secondary Bus
> > Reset (as it resets the whole bus with all devices; but in this patch series
> > we allow this reset only if exactly one device is connected to the bus).
> > In the PCIe spec it is called Hot Reset. And if the kernel detects Slot support
> > then the kernel currently calls it Slot reset. But it is still the same thing.
> > Any opinion? I think that we could call it Hot Reset as this patch
> > series exports it only for a single device (so calling it _bus_ is not the
> > best match).
>
> A similar abstraction where our scope is not limited to a single
> function calls this a bus reset:
>
> int pci_reset_bus(struct pci_dev *pdev)
> {
> 	return (!pci_probe_reset_slot(pdev->slot)) ?
> 		__pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> }
>
> Thanks,
> Alex
>
I was going to use a similar function:
int pci_bus_reset(struct pci_dev *dev, int probe)
{
	return pci_dev_reset_slot_function(dev, probe) ?
		pci_parent_bus_reset(dev, probe) : 0;
}
Thanks,
Amey
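The point made earlier in this thread is that a slot reset is a bus reset wrapped in code that keeps the hotplug controller from treating the resulting Link Down as a removal. A toy model of that difference follows, assuming a hypothetical toy_hotplug_ctrl with a single presence-detect switch; none of these names are kernel APIs, and the real pciehp driver masks and re-arms its events differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: none of these names are kernel APIs. */
struct toy_hotplug_ctrl {
	bool presence_detect_enabled; /* when true, link changes are acted upon */
	bool device_present;          /* false once the handler "ejects" the device */
};

/* What a hotplug driver might do when it sees Link Down with detection armed. */
static void toy_link_down_event(struct toy_hotplug_ctrl *ctrl)
{
	if (ctrl->presence_detect_enabled)
		ctrl->device_present = false; /* device is treated as removed */
}

/* A secondary bus reset (PCIe Hot Reset) drops the link as a side effect. */
static void toy_bus_reset(struct toy_hotplug_ctrl *ctrl)
{
	toy_link_down_event(ctrl);
	/* the link retrains afterwards; the device is still physically there */
}

/* Slot reset: the same bus reset, wrapped so the link drop is ignored. */
static void toy_slot_reset(struct toy_hotplug_ctrl *ctrl)
{
	bool saved = ctrl->presence_detect_enabled;

	ctrl->presence_detect_enabled = false;
	toy_bus_reset(ctrl);
	ctrl->presence_detect_enabled = saved;
}
```

Running toy_bus_reset() on an armed controller leaves device_present false, which is the "unsafe" case raised for hotplug drivers that lack a reset_slot method; toy_slot_reset() leaves the device present and restores the previous presence-detect setting.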
On Tue, 23 Mar 2021 10:06:25 -0600
Alex Williamson <[email protected]> wrote:
> On Tue, 23 Mar 2021 21:02:21 +0530
> Amey Narkhede <[email protected]> wrote:
>
> > I was going to use a similar function:
> >
> > int pci_bus_reset(struct pci_dev *dev, int probe)
> > {
> > 	return pci_dev_reset_slot_function(dev, probe) ?
> > 		pci_parent_bus_reset(dev, probe) : 0;
> > }
>
> I think via the sysfs attribute we can simply call this "bus" reset,
> but internally having both pci_reset_bus() and pci_bus_reset() would be
> really confusing. We're doing the same thing as pci_bus_reset() but
> with a different scope, so I'd probably suggest
> pci_bus_reset_function().
I'm already confusing them, s/bus_reset/reset_bus/ in the last sentence
above. Thanks,
Alex
>
> Also, the above ternary form isn't true to the original, only -ENOTTY
> allows fall-through, so something more like:
>
> int pci_reset_bus_function(struct pci_dev *dev, int probe)
> {
> 	int rc = pci_dev_reset_slot_function(dev, probe);
>
> 	return (rc == -ENOTTY) ? pci_parent_bus_reset(dev, probe) : rc;
> }
>
> Thanks,
> Alex
>
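The difference between the ternary form and the -ENOTTY check matters when the slot reset fails for a reason other than "not available". Below is a userspace sketch with stub functions; the stubs, the global slot_result, and the choice of -EBUSY as the failing errno are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins: a configurable result for the slot reset attempt. */
static int slot_result;     /* 0, -ENOTTY, or another negative errno */
static int bus_reset_calls; /* how many times the parent bus reset ran */

static int stub_dev_reset_slot_function(int probe)
{
	(void)probe;
	return slot_result;
}

static int stub_parent_bus_reset(int probe)
{
	(void)probe;
	bus_reset_calls++;
	return 0;
}

/* Ternary form: ANY non-zero result from the slot reset falls through. */
static int bus_reset_any_error(int probe)
{
	return stub_dev_reset_slot_function(probe) ?
		stub_parent_bus_reset(probe) : 0;
}

/* Suggested form: only -ENOTTY ("method not available") falls through. */
static int reset_bus_function(int probe)
{
	int rc = stub_dev_reset_slot_function(probe);

	return (rc == -ENOTTY) ? stub_parent_bus_reset(probe) : rc;
}
```

With the ternary, -EBUSY from the slot reset is swallowed and a bus reset runs anyway, which in the real code would mean a bus reset without the slot wrapper. With the -ENOTTY check the error is returned to the caller and the bus reset is skipped.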
On Tue, 23 Mar 2021 21:02:21 +0530
Amey Narkhede <[email protected]> wrote:
> I was going to use a similar function:
>
> int pci_bus_reset(struct pci_dev *dev, int probe)
> {
> 	return pci_dev_reset_slot_function(dev, probe) ?
> 		pci_parent_bus_reset(dev, probe) : 0;
> }
I think via the sysfs attribute we can simply call this "bus" reset,
but internally having both pci_reset_bus() and pci_bus_reset() would be
really confusing. We're doing the same thing as pci_bus_reset() but
with a different scope, so I'd probably suggest
pci_bus_reset_function().
Also, the above ternary form isn't true to the original, only -ENOTTY
allows fall-through, so something more like:
int pci_reset_bus_function(struct pci_dev *dev, int probe)
{
	int rc = pci_dev_reset_slot_function(dev, probe);

	return (rc == -ENOTTY) ? pci_parent_bus_reset(dev, probe) : rc;
}
Thanks,
Alex
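The series pairs an ordered table of reset methods (pci_reset_fn_methods[]) with a per-device enabled bitmap, and -ENOTTY is the only "try the next one" signal. A toy userspace model of that walk follows. It assumes four hypothetical methods in an arbitrary priority order; the enum, struct and function names are illustrative and are not the patch's code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical method list; the order is illustrative, not the patch's table. */
enum toy_method { TOY_DEV_SPECIFIC, TOY_FLR, TOY_PM, TOY_BUS, TOY_NR_METHODS };

struct toy_dev {
	unsigned int enabled;       /* bit m set => method m may be tried */
	int result[TOY_NR_METHODS]; /* what each method would return */
	int used;                   /* method that ended the walk, or -1 */
};

/* Stand-in for calling a real reset method: report the canned result. */
static int toy_run_method(const struct toy_dev *dev, enum toy_method m)
{
	return dev->result[m];
}

/*
 * Walk the methods in priority order. Disabled methods are skipped,
 * -ENOTTY ("not available for this device") moves on to the next one,
 * and anything else (success or a real error) ends the walk.
 */
static int toy_reset_function_locked(struct toy_dev *dev)
{
	dev->used = -1;
	for (int m = 0; m < TOY_NR_METHODS; m++) {
		int rc;

		if (!(dev->enabled & (1u << m)))
			continue;
		rc = toy_run_method(dev, (enum toy_method)m);
		if (rc == -ENOTTY)
			continue;
		dev->used = m;
		return rc;
	}
	return -ENOTTY; /* no enabled method is available */
}
```

With only FLR and bus enabled and FLR returning -ENOTTY, the walk ends on the bus reset; with the mask cleared (the "none" setting discussed below) it returns -ENOTTY without touching the device.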
On Mon, Mar 22, 2021 at 11:10:03AM -0600, Alex Williamson wrote:
> On Sun, 21 Mar 2021 10:40:55 +0200
> Leon Romanovsky <[email protected]> wrote:
>
> > On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > > On Sat, 20 Mar 2021 11:10:08 +0200
> > > Leon Romanovsky <[email protected]> wrote:
> > > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> > > > >
> > > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > > time a device specific method is disabled?
> > > >
> > > > What does "none" mean? Does it mean nothing is supported? If yes, I think that
> > > > pci_warn() will be enough. At least for me, taint is useful during debug stages;
> > > > if the device doesn't crash, probably no one will look at /proc/sys/kernel/tainted.
> > >
> > > "none" as implemented in this patch, clearing the enabled function
> > > reset methods.
> >
> > It is far from intuitive, the empty string will be easier to understand,
> > because "none" means no reset at all.
>
> "No reset at all" is what "none" achieves, the
> pci_dev.reset_methods_enabled bitmap is cleared. We can use an empty
> string, but I think we want a way to clear all enabled resets and a way
> to return it to the default. I could see arguments for an empty string
> serving either purpose, so this version proposed explicitly using
> "none" and "default", as included in the ABI update.
I will stick with "default" only and leave "none" for something else.
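The write side described in this exchange (a "none" that clears pci_dev.reset_methods_enabled, a "default" that restores it, and a single preferred method rather than a comma-separated list) can be sketched as a small parser. This is userspace C under stated assumptions: the method names, their bit positions and the -EINVAL return are hypothetical, not the ABI the series documents, and the reply above would keep only "default".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical method names; bit i of a mask corresponds to toy_names[i]. */
static const char *const toy_names[] = { "device_specific", "flr", "pm", "bus" };
#define TOY_NR (sizeof(toy_names) / sizeof(toy_names[0]))

/*
 * Apply one sysfs-style write to *enabled.
 *   "none"    -> clear every method
 *   "default" -> restore the probed (supported) set
 *   "<name>"  -> enable only that method, if the device supports it
 * Returns 0 on success or -EINVAL, leaving *enabled untouched on error.
 */
static int toy_store_reset_method(const char *buf, unsigned int supported,
				  unsigned int *enabled)
{
	size_t len = strcspn(buf, "\n"); /* ignore the newline echo appends */

	if (len == strlen("none") && strncmp(buf, "none", len) == 0) {
		*enabled = 0;
		return 0;
	}
	if (len == strlen("default") && strncmp(buf, "default", len) == 0) {
		*enabled = supported;
		return 0;
	}
	for (size_t i = 0; i < TOY_NR; i++) {
		if (strlen(toy_names[i]) == len &&
		    strncmp(buf, toy_names[i], len) == 0) {
			if (!(supported & (1u << i)))
				return -EINVAL; /* known name, unsupported here */
			*enabled = 1u << i;
			return 0;
		}
	}
	return -EINVAL; /* unknown method name */
}
```

Keeping the preference singular, as argued in the thread, keeps the parser to one name per write; a list syntax would need ordering storage on top of the bitmap.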
>
> > > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > > altogether, but for example should a device specific reset that fixes
> > > > > an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> > > > > case if direct FLR were disabled via a device flag introduced with the
> > > > > quirk and the remaining resets can still be selected by preference.
> > > >
> > > > I don't know enough to discuss the PCI details, but you raised a good point.
> > > > This sysfs file is a user-visible API that is presented as-is from the device's point
> > > > of view. It can easily run into problems if the PCI core doesn't work with the
> > > > user's choice.
> > > >
> > > > >
> > > > > Theoretically all the other reset methods work and are available, it's
> > > > > only a policy decision which to use, right?
> > > >
> > > > But this patch was presented as a way to overcome situations where
> > > > supported != working and user magically knows which reset type to set.
> > >
> > > It's not magic, the new sysfs attributes expose which resets are
> > > enabled and the order that they're used, the user can simply select the
> > > next one. Being able to bypass a broken reset method is a helpful side
> > > effect of getting to select a preferred reset method.
> >
> > Magic in a sense that user has no idea what those resets mean, the
> > expectation is that he will blindly iterate till something works.
>
> Which ought to actually be a safe thing to do. We should have quirks to
> exclude resets that are known broken but still probe as present and I'd
> be perfectly fine if we issue a warning if the user disables all resets
> for a given device.
>
> > > > If you want to take this patch to be policy decision tool,
> > > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > > so fallback will work natively.
> > >
> > > I don't see that as a requirement. We have fall-through support in the
> > > kernel, but for a given device we're really only ever going to make use
> > > of one of those methods. If a user knows enough about a device to have
> > > a preference, I think it can be singular. That also significantly
> > > simplifies the interface and supporting code. Thanks,
> >
> > I'm struggling to get requirements from this thread. You talked about
> > policy decision to overtake fallback mechanism, Amey wanted to avoid
> > quirks.
> >
> > Do you have an example of such devices or we are talking about
> > theoretical case?
>
> Look at any device that already has a reset quirk and the process it
> took to get there. Those are more than just theoretical cases.
So let's fix the process. The long-standing kernel policy is that kernel
bugs (and a missing quirk can be seen as such a bug) should be fixed in the
kernel and not worked around by users.
>
> For policy preference, I already described how I've configured QEMU to
> prefer a bus reset rather than a PM reset due to lack of specification
> regarding the scope of a PM "soft reset". This interface would allow a
> system policy to do that same thing.
>
> I don't think anyone is suggesting this as a means to avoid quirks that
> would resolve reset issues and create the best default general behavior.
> This provides a mechanism to test various reset methods, and thereby
> identify broken methods, and set a policy. Sure, that policy might be
> to avoid a broken reset in the interim before it gets quirked and
> there's potential for abuse there, but I think the benefits outweigh
> the risks.
This interface is proposed as a first-class citizen in the general sysfs
layout. Of course, it will be seen as a way to bypass the kernel.
At least put it under a CONFIG_EXPERT option, so no distro will enable it
by default.
>
> > And I don't see why simple line parser with loop iterator over strchr()
> > suddenly becomes complicated code.
>
> Setting multiple bits in a bitmap is easy. How do you then go on to
> allow the user to specify an ordering preference? If you have an
> algorithm you'd like to propose that allows the user to manage the
> ordering when enabling multiple methods without substantially
> increasing the complexity, please share. IMO, a given device will
> generally use one reset method and it seems sufficient to restrict user
> preference to achieve all the use cases I've noted. Thanks,
Linked list + iterator will do the trick.
>
> Alex
>
On Wed, 24 Mar 2021 12:03:00 +0200
Leon Romanovsky <[email protected]> wrote:
> On Mon, Mar 22, 2021 at 11:10:03AM -0600, Alex Williamson wrote:
> > On Sun, 21 Mar 2021 10:40:55 +0200
> > Leon Romanovsky <[email protected]> wrote:
> >
> > > On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > > > On Sat, 20 Mar 2021 11:10:08 +0200
> > > > Leon Romanovsky <[email protected]> wrote:
> > > > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> > > > > >
> > > > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > > > time a device specific method is disabled?
> > > > >
> > > > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
> > > >
> > > > "none" as implemented in this patch, clearing the enabled function
> > > > reset methods.
> > >
> > > It is far from intuitive, the empty string will be easier to understand,
> > > because "none" means no reset at all.
> >
> > "No reset at all" is what "none" achieves, the
> > pci_dev.reset_methods_enabled bitmap is cleared. We can use an empty
> > string, but I think we want a way to clear all enabled resets and a way
> > to return it to the default. I could see arguments for an empty string
> > serving either purpose, so this version proposed explicitly using
> > "none" and "default", as included in the ABI update.
>
> I will stick with "default" only and leave "none" for something else.
Are you suggesting writing "default" restores the unmodified behavior
and writing an empty string clears all enabled reset methods?
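To make the question above concrete, here is a minimal userspace C sketch of the write semantics it describes: writing "default" restores the probed set, an empty write clears every enabled method (the role "none" plays in this version), and a single method name enables only that method. The method names, the `struct dev_reset_state` bitmap layout and `reset_method_store()` are illustrative assumptions, not the code in this series.

```c
#include <assert.h>
#include <string.h>

/* Assumed method names, listed in default priority order (bit i <-> name i). */
static const char *const reset_names[] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};
#define NUM_RESET_METHODS (sizeof(reset_names) / sizeof(reset_names[0]))

/* Hypothetical per-device state: what probing found vs. what may be used. */
struct dev_reset_state {
	unsigned int supported; /* bit i set: reset_names[i] probed as present */
	unsigned int enabled;   /* bit i set: reset_names[i] is enabled */
};

/*
 * Sketch of a store handler:
 *   "default" -> enabled = supported
 *   ""        -> enabled = 0 (no reset at all)
 *   "<name>"  -> enable exactly that method, if it was probed as present
 * Returns 0 on success, -1 for an unknown or unsupported name.
 */
static int reset_method_store(struct dev_reset_state *st, const char *buf)
{
	/* sysfs writes from "echo" usually carry a trailing newline */
	size_t len = strcspn(buf, "\n");

	assert(NUM_RESET_METHODS <= 8 * sizeof(st->enabled));

	if (len == 0) {
		st->enabled = 0;
		return 0;
	}
	if (len == strlen("default") && strncmp(buf, "default", len) == 0) {
		st->enabled = st->supported;
		return 0;
	}
	for (unsigned int i = 0; i < NUM_RESET_METHODS; i++) {
		if (strlen(reset_names[i]) == len &&
		    strncmp(buf, reset_names[i], len) == 0) {
			if (!(st->supported & (1u << i)))
				return -1; /* never select a method that did not probe */
			st->enabled = 1u << i;
			return 0;
		}
	}
	return -1;
}
```

With this split, "default" and the empty string each have exactly one meaning, which is the ambiguity the question is trying to resolve.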
> > > > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > > > altogether, but for example should a device specific reset that fixes
> > > > > > an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> > > > > > case if direct FLR were disabled via a device flag introduced with the
> > > > > > quirk and the remaining resets can still be selected by preference.
> > > > >
> > > > > I don't know enough to discuss the PCI details, but you raised good point.
> > > > > This sysfs is user visible API that is presented as is from device point
> > > > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > > > user's choice.
> > > > >
> > > > > >
> > > > > > Theoretically all the other reset methods work and are available, it's
> > > > > > only a policy decision which to use, right?
> > > > >
> > > > > But this patch was presented as a way to overcome situations where
> > > > > supported != working and user magically knows which reset type to set.
> > > >
> > > > It's not magic, the new sysfs attributes expose which resets are
> > > > enabled and the order that they're used, the user can simply select the
> > > > next one. Being able to bypass a broken reset method is a helpful side
> > > > effect of getting to select a preferred reset method.
> > >
> > > Magic in a sense that user has no idea what those resets mean, the
> > > expectation is that he will blindly iterate till something works.
> >
> > Which ought to actually be a safe thing to do. We should have quirks to
> > exclude resets that are known broken but still probe as present and I'd
> > be perfectly fine if we issue a warning if the user disables all resets
> > for a given device.
> >
> > > > > If you want to take this patch to be policy decision tool,
> > > > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > > > so fallback will work natively.
> > > >
> > > > I don't see that as a requirement. We have fall-through support in the
> > > > kernel, but for a given device we're really only ever going to make use
> > > > of one of those methods. If a user knows enough about a device to have
> > > > a preference, I think it can be singular. That also significantly
> > > > simplifies the interface and supporting code. Thanks,
> > >
> > > I'm struggling to get requirements from this thread. You talked about
> > > policy decision to overtake fallback mechanism, Amey wanted to avoid
> > > quirks.
> > >
> > > Do you have an example of such devices or we are talking about
> > > theoretical case?
> >
> > Look at any device that already has a reset quirk and the process it
> > took to get there. Those are more than just theoretical cases.
>
> So let's fix the process. The long standing kernel policy is that kernel
> bugs (and missing quirk can be seen as such bug) should be fixed in the
> kernel and not workaround by the users.
I don't see an actual proposal here to fix the process. Allowing
specific reset methods to be trivially tested is a step towards fixing
the process. Unfortunately we can't tell the difference between
someone setting a policy because they prefer a reset mechanism, are
testing a reset mechanism, or they're avoiding a broken reset mechanism.
We can't force participation if we've made it clear that the interface
should not be used long term for anything other than policy preference
and testing.
> > For policy preference, I already described how I've configured QEMU to
> > prefer a bus reset rather than a PM reset due to lack of specification
> > regarding the scope of a PM "soft reset". This interface would allow a
> > system policy to do that same thing.
> >
> > I don't think anyone is suggesting this as a means to avoid quirks that
> > would resolve reset issues and create the best default general behavior.
> > This provides a mechanism to test various reset methods, and thereby
> > identify broken methods, and set a policy. Sure, that policy might be
> > to avoid a broken reset in the interim before it gets quirked and
> > there's potential for abuse there, but I think the benefits outweigh
> > the risks.
>
> This interface is proposed as first class citizen in the general sysfs
> layout. Of course, it will be seen as a way to bypass the kernel.
>
> At least, put it under CONFIG_EXPERT option, so no distro will enable it
> by default.
Of course we're proposing that it be accessible; it should also require
admin privileges to modify, and sysfs has lots of such things. If it's
relegated to non-default accessibility, it won't be used for testing,
it won't be available for system policy, and it's pointless.
> > > And I don't see why simple line parser with loop iterator over strchr()
> > > suddenly becomes complicated code.
> >
> > Setting multiple bits in a bitmap is easy. How do you then go on to
> > allow the user to specify an ordering preference? If you have an
> > algorithm you'd like to propose that allows the user to manage the
> > ordering when enabling multiple methods without substantially
> > increasing the complexity, please share. IMO, a given device will
> > generally use one reset method and it seems sufficient to restrict user
> > preference to achieve all the use cases I've noted. Thanks,
>
> Linked list + iterator will do the trick.
So you're suggesting to add potentially multiple dynamic allocations per
device and list locking and management for an unspecified use case for
an interface you seem to be opposed to anyway. It should be pretty
clear why the keep-it-simple approach was taken in this series. Thanks,
Alex
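One way to see the trade-off argued here is that an ordered preference does not strictly need a linked list. Because the set of reset methods is small and fixed, a comma-separated list can be parsed with a strchr() loop into a fixed-size array of method indices, with no per-device allocation and no list locking. The sketch below is a hypothetical illustration of that alternative (the method table and `parse_reset_order()` are assumptions); it is not what this series implements.

```c
#include <assert.h>
#include <string.h>

/* Assumed method table; indices 0..4 are what order[] stores. */
static const char *const reset_names[] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};
#define NUM_RESET_METHODS 5

/*
 * Parse "name1,name2,..." (optionally newline-terminated) into order[].
 * Returns how many indices were stored, or -1 on an unknown or duplicate
 * name. Duplicates are rejected, so at most NUM_RESET_METHODS entries fit.
 */
static int parse_reset_order(const char *buf,
			     unsigned char order[NUM_RESET_METHODS])
{
	unsigned int seen = 0; /* bitmap of methods already listed */
	int n = 0;
	const char *p = buf;

	while (*p && *p != '\n') {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strcspn(p, "\n");
		int found = -1;

		for (int i = 0; i < NUM_RESET_METHODS; i++) {
			if (strlen(reset_names[i]) == len &&
			    strncmp(p, reset_names[i], len) == 0) {
				found = i;
				break;
			}
		}
		if (found < 0 || (seen & (1u << found)))
			return -1;
		seen |= 1u << found;
		assert(n < NUM_RESET_METHODS);
		order[n++] = (unsigned char)found;
		if (!comma)
			break;
		p = comma + 1;
	}
	return n;
}
```

With the table above, "bus,pm" yields order = {4, 3}, and reading the attribute back is just a walk over the first n entries.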
On Wed, Mar 24, 2021 at 08:37:43AM -0600, Alex Williamson wrote:
> On Wed, 24 Mar 2021 12:03:00 +0200
> Leon Romanovsky <[email protected]> wrote:
>
> > On Mon, Mar 22, 2021 at 11:10:03AM -0600, Alex Williamson wrote:
> > > On Sun, 21 Mar 2021 10:40:55 +0200
> > > Leon Romanovsky <[email protected]> wrote:
> > >
> > > > On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > > > > On Sat, 20 Mar 2021 11:10:08 +0200
> > > > > Leon Romanovsky <[email protected]> wrote:
> > > > > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> > > > > > >
> > > > > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > > > > time a device specific method is disabled?
> > > > > >
> > > > > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > > > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > > > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
> > > > >
> > > > > "none" as implemented in this patch, clearing the enabled function
> > > > > reset methods.
> > > >
> > > > It is far from intuitive, the empty string will be easier to understand,
> > > > because "none" means no reset at all.
> > >
> > > "No reset at all" is what "none" achieves, the
> > > pci_dev.reset_methods_enabled bitmap is cleared. We can use an empty
> > > string, but I think we want a way to clear all enabled resets and a way
> > > to return it to the default. I could see arguments for an empty string
> > > serving either purpose, so this version proposed explicitly using
> > > "none" and "default", as included in the ABI update.
> >
> > I will stick with "default" only and leave "none" for something else.
>
> Are you suggesting writing "default" restores the unmodified behavior
> and writing an empty string clears all enabled reset methods?
>
> > > > > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > > > > altogether, but for example should a device specific reset that fixes
> > > > > > > an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> > > > > > > case if direct FLR were disabled via a device flag introduced with the
> > > > > > > quirk and the remaining resets can still be selected by preference.
> > > > > >
> > > > > > I don't know enough to discuss the PCI details, but you raised good point.
> > > > > > This sysfs is user visible API that is presented as is from device point
> > > > > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > > > > user's choice.
> > > > > >
> > > > > > >
> > > > > > > Theoretically all the other reset methods work and are available, it's
> > > > > > > only a policy decision which to use, right?
> > > > > >
> > > > > > But this patch was presented as a way to overcome situations where
> > > > > > supported != working and user magically knows which reset type to set.
> > > > >
> > > > > It's not magic, the new sysfs attributes expose which resets are
> > > > > enabled and the order that they're used, the user can simply select the
> > > > > next one. Being able to bypass a broken reset method is a helpful side
> > > > > effect of getting to select a preferred reset method.
> > > >
> > > > Magic in a sense that user has no idea what those resets mean, the
> > > > expectation is that he will blindly iterate till something works.
> > >
> > > Which ought to actually be a safe thing to do. We should have quirks to
> > > exclude resets that are known broken but still probe as present and I'd
> > > be perfectly fine if we issue a warning if the user disables all resets
> > > for a given device.
> > >
> > > > > > If you want to take this patch to be policy decision tool,
> > > > > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > > > > so fallback will work natively.
> > > > >
> > > > > I don't see that as a requirement. We have fall-through support in the
> > > > > kernel, but for a given device we're really only ever going to make use
> > > > > of one of those methods. If a user knows enough about a device to have
> > > > > a preference, I think it can be singular. That also significantly
> > > > > simplifies the interface and supporting code. Thanks,
> > > >
> > > > I'm struggling to get requirements from this thread. You talked about
> > > > policy decision to overtake fallback mechanism, Amey wanted to avoid
> > > > quirks.
> > > >
> > > > Do you have an example of such devices or we are talking about
> > > > theoretical case?
> > >
> > > Look at any device that already has a reset quirk and the process it
> > > took to get there. Those are more than just theoretical cases.
> >
> > So let's fix the process. The long standing kernel policy is that kernel
> > bugs (and missing quirk can be seen as such bug) should be fixed in the
> > kernel and not workaround by the users.
>
> I don't see an actual proposal here to fix the process. Allowing
> specific reset methods to be trivially tested is a step towards fixing
> the process. Unfortunately we can't tell the difference between
> someone setting a policy because they prefer a reset mechanism, are
> testing a reset mechanism, or they're avoiding a broken reset mechanism.
> We can't force participation if we've made it clear that the interface
> should not be used long term for anything other than policy preference
> and testing.
Yes, and real testing/debugging almost always requires a kernel rebuild.
Everything else is a waste of time.
>
> > > For policy preference, I already described how I've configured QEMU to
> > > prefer a bus reset rather than a PM reset due to lack of specification
> > > regarding the scope of a PM "soft reset". This interface would allow a
> > > system policy to do that same thing.
> > >
> > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > would resolve reset issues and create the best default general behavior.
> > > This provides a mechanism to test various reset methods, and thereby
> > > identify broken methods, and set a policy. Sure, that policy might be
> > > to avoid a broken reset in the interim before it gets quirked and
> > > there's potential for abuse there, but I think the benefits outweigh
> > > the risks.
> >
> > This interface is proposed as first class citizen in the general sysfs
> > layout. Of course, it will be seen as a way to bypass the kernel.
> >
> > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > by default.
>
> Of course we're proposing it to be accessible, it should also require
> admin privileges to modify, sysfs has lots of such things. If it's
> relegated to non-default accessibility, it won't be used for testing
> and it won't be available for system policy and it's pointless.
We probably have different views of what testing is. I expect users
who experience issues with reset to take extra steps, and one of those
steps is compiling their own kernel.
Root permissions don't protect against anything; SO lovers will use
root without even thinking twice.
>
> > > > And I don't see why simple line parser with loop iterator over strchr()
> > > > suddenly becomes complicated code.
> > >
> > > Setting multiple bits in a bitmap is easy. How do you then go on to
> > > allow the user to specify an ordering preference? If you have an
> > > algorithm you'd like to propose that allows the user to manage the
> > > ordering when enabling multiple methods without substantially
> > > increasing the complexity, please share. IMO, a given device will
> > > generally use one reset method and it seems sufficient to restrict user
> > > preference to achieve all the use cases I've noted. Thanks,
> >
> > Linked list + iterator will do the trick.
>
> So you're suggesting to add potentially multiple dynamic allocations per
> device and list locking and management for an unspecified use case for
> an interface you seem to be opposed to anyway. It should be pretty
> clear why the keep-it-simple approach was taken in this series. Thanks,
I'm trying to help you with your use case of providing a reset policy
mechanism, which could then be available without CONFIG_EXPERT. However,
if you want to continue down the path of allowing only a specific reset
type, please ensure that this is not taken in the "bypass the kernel"
direction.
Thanks
>
> Alex
>
On Wed, 24 Mar 2021 17:13:56 +0200
Leon Romanovsky <[email protected]> wrote:
> On Wed, Mar 24, 2021 at 08:37:43AM -0600, Alex Williamson wrote:
> > On Wed, 24 Mar 2021 12:03:00 +0200
> > Leon Romanovsky <[email protected]> wrote:
> >
> > > On Mon, Mar 22, 2021 at 11:10:03AM -0600, Alex Williamson wrote:
> > > > On Sun, 21 Mar 2021 10:40:55 +0200
> > > > Leon Romanovsky <[email protected]> wrote:
> > > >
> > > > > On Sat, Mar 20, 2021 at 08:59:42AM -0600, Alex Williamson wrote:
> > > > > > On Sat, 20 Mar 2021 11:10:08 +0200
> > > > > > Leon Romanovsky <[email protected]> wrote:
> > > > > > > On Fri, Mar 19, 2021 at 10:23:13AM -0600, Alex Williamson wrote:
> > > > > > > >
> > > > > > > > What if we taint the kernel or pci_warn() for cases where either all
> > > > > > > > the reset methods are disabled, ie. 'echo none > reset_method', or any
> > > > > > > > time a device specific method is disabled?
> > > > > > >
> > > > > > > What does it mean "none"? Does it mean nothing supported? If yes, I think that
> > > > > > > pci_warn() will be enough. At least for me, taint is usable during debug stages,
> > > > > > > probably if device doesn't crash no one will look to see /proc/sys/kernel/tainted.
> > > > > >
> > > > > > "none" as implemented in this patch, clearing the enabled function
> > > > > > reset methods.
> > > > >
> > > > > It is far from intuitive, the empty string will be easier to understand,
> > > > > because "none" means no reset at all.
> > > >
> > > > "No reset at all" is what "none" achieves, the
> > > > pci_dev.reset_methods_enabled bitmap is cleared. We can use an empty
> > > > string, but I think we want a way to clear all enabled resets and a way
> > > > to return it to the default. I could see arguments for an empty string
> > > > serving either purpose, so this version proposed explicitly using
> > > > "none" and "default", as included in the ABI update.
> > >
> > > I will stick with "default" only and leave "none" for something else.
> >
> > Are you suggesting writing "default" restores the unmodified behavior
> > and writing an empty string clears all enabled reset methods?
> >
> > > > > > > > I'd almost go so far as to prevent disabling a device specific reset
> > > > > > > > altogether, but for example should a device specific reset that fixes
> > > > > > > > an aspect of FLR behavior prevent using a bus reset? I'd prefer in that
> > > > > > > > case if direct FLR were disabled via a device flag introduced with the
> > > > > > > > quirk and the remaining resets can still be selected by preference.
> > > > > > >
> > > > > > > I don't know enough to discuss the PCI details, but you raised good point.
> > > > > > > This sysfs is user visible API that is presented as is from device point
> > > > > > > of view. It can be easily run into problems if PCI/core doesn't work with
> > > > > > > user's choice.
> > > > > > >
> > > > > > > >
> > > > > > > > Theoretically all the other reset methods work and are available, it's
> > > > > > > > only a policy decision which to use, right?
> > > > > > >
> > > > > > > But this patch was presented as a way to overcome situations where
> > > > > > > supported != working and user magically knows which reset type to set.
> > > > > >
> > > > > > It's not magic, the new sysfs attributes expose which resets are
> > > > > > enabled and the order that they're used, the user can simply select the
> > > > > > next one. Being able to bypass a broken reset method is a helpful side
> > > > > > effect of getting to select a preferred reset method.
> > > > >
> > > > > Magic in a sense that user has no idea what those resets mean, the
> > > > > expectation is that he will blindly iterate till something works.
> > > >
> > > > Which ought to actually be a safe thing to do. We should have quirks to
> > > > exclude resets that are known broken but still probe as present and I'd
> > > > be perfectly fine if we issue a warning if the user disables all resets
> > > > for a given device.
> > > >
> > > > > > > If you want to take this patch to be policy decision tool,
> > > > > > > it will need to accept "reset_type1,reset_type2,..." sort of input,
> > > > > > > so fallback will work natively.
> > > > > >
> > > > > > I don't see that as a requirement. We have fall-through support in the
> > > > > > kernel, but for a given device we're really only ever going to make use
> > > > > > of one of those methods. If a user knows enough about a device to have
> > > > > > a preference, I think it can be singular. That also significantly
> > > > > > simplifies the interface and supporting code. Thanks,
> > > > >
> > > > > I'm struggling to get requirements from this thread. You talked about
> > > > > policy decision to overtake fallback mechanism, Amey wanted to avoid
> > > > > quirks.
> > > > >
> > > > > Do you have an example of such devices or we are talking about
> > > > > theoretical case?
> > > >
> > > > Look at any device that already has a reset quirk and the process it
> > > > took to get there. Those are more than just theoretical cases.
> > >
> > > So let's fix the process. The long standing kernel policy is that kernel
> > > bugs (and missing quirk can be seen as such bug) should be fixed in the
> > > kernel and not workaround by the users.
> >
> > I don't see an actual proposal here to fix the process. Allowing
> > specific reset methods to be trivially tested is a step towards fixing
> > the process. Unfortunately we can't tell the difference between
> > someone setting a policy because they prefer a reset mechanism, are
> > testing a reset mechanism, or they're avoiding a broken reset mechanism.
> > We can't force participation if we've made it clear that the interface
> > should not be used long term for anything other than policy preference
> > and testing.
>
> Yes, and real testing/debugging almost always requires kernel rebuild.
> Everything else is waste of time.
Sorry, this is nonsense. Allowing users to debug issues without a full
kernel rebuild is a good thing.
> > > > For policy preference, I already described how I've configured QEMU to
> > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > system policy to do that same thing.
> > > >
> > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > would resolve reset issues and create the best default general behavior.
> > > > This provides a mechanism to test various reset methods, and thereby
> > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > to avoid a broken reset in the interim before it gets quirked and
> > > > there's potential for abuse there, but I think the benefits outweigh
> > > > the risks.
> > >
> > > This interface is proposed as first class citizen in the general sysfs
> > > layout. Of course, it will be seen as a way to bypass the kernel.
> > >
> > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > by default.
> >
> > Of course we're proposing it to be accessible, it should also require
> > admin privileges to modify, sysfs has lots of such things. If it's
> > relegated to non-default accessibility, it won't be used for testing
> > and it won't be available for system policy and it's pointless.
>
> We probably have difference in view of what testing is. I expect from
> the users who experience issues with reset to do extra steps and one of
> them is to require from them to compile their kernel.
I would define the ability to generate a CI test that can pick a
device, unbind it from its driver, and iterate reset methods as a
worthwhile improvement in testing.
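A rough sketch of the CI loop described above, assuming the device has already been unbound from its driver and that its sysfs directory exposes the `reset_method` attribute proposed in this series alongside the existing `reset` attribute. The helpers `write_attr()` and `iterate_reset_methods()` are hypothetical; the directory is a parameter so the logic can be exercised outside a real /sys tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write val to <dir>/<attr>. Returns 0 on success, -1 on any failure. */
static int write_attr(const char *dir, const char *attr, const char *val)
{
	char path[512];
	FILE *f;
	int ok;

	if (snprintf(path, sizeof(path), "%s/%s", dir, attr) >= (int)sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	/* stdio buffers the write, so a sysfs store error may only show up here */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * For each method: select it via reset_method, then trigger a reset.
 * results[i] is 0 if both writes succeeded, -1 otherwise. The default
 * policy is restored at the end. Returns the number of failed methods.
 */
static int iterate_reset_methods(const char *devdir,
				 const char *const methods[], int n,
				 int results[])
{
	int failures = 0;

	assert(devdir && methods && results);
	for (int i = 0; i < n; i++) {
		results[i] = write_attr(devdir, "reset_method", methods[i]);
		if (results[i] == 0)
			results[i] = write_attr(devdir, "reset", "1");
		if (results[i] != 0)
			failures++;
	}
	write_attr(devdir, "reset_method", "default");
	return failures;
}
```

On real hardware, devdir would be the device's directory under /sys/bus/pci/devices/, and the job would need root.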
> The root permissions doesn't protect from anything, SO lovers will use
> root without even thinking twice.
Yes, with great power comes great responsibility. Many admins ignore
this. That's far beyond the scope of this series.
> > > > > And I don't see why simple line parser with loop iterator over strchr()
> > > > > suddenly becomes complicated code.
> > > >
> > > > Setting multiple bits in a bitmap is easy. How do you then go on to
> > > > allow the user to specify an ordering preference? If you have an
> > > > algorithm you'd like to propose that allows the user to manage the
> > > > ordering when enabling multiple methods without substantially
> > > > increasing the complexity, please share. IMO, a given device will
> > > > generally use one reset method and it seems sufficient to restrict user
> > > > preference to achieve all the use cases I've noted. Thanks,
> > >
> > > Linked list + iterator will do the trick.
> >
> > So you're suggesting to add potentially multiple dynamic allocations per
> > device and list locking and management for an unspecified use case for
> > an interface you seem to be opposed to anyway. It should be pretty
> > clear why the keep-it-simple approach was taken in this series. Thanks,
>
> I'm trying to help you with your use case of providing reset policy
> mechanism, which can be without CONFIG_EXPERT. However if you want
> to continue path of having specific reset type only, please ensure
> that this is not taken to the "bypass kernel" direction.
You've lost me, are you saying you'd be in favor of an interface that
allows an admin to specify an arbitrary list of reset methods because
that's somehow more in line with a policy choice than a userspace
workaround? This seems like unnecessary bloat because (a) it allows
the same bypass mechanism, and (b) a given device is only going to use
a single method anyway, so the functionality is unnecessary. Please
help me understand how this favors the policy use case. Thanks,
Alex
On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> On Wed, 24 Mar 2021 17:13:56 +0200
> Leon Romanovsky <[email protected]> wrote:
<...>
> > Yes, and real testing/debugging almost always requires kernel rebuild.
> > Everything else is waste of time.
>
> Sorry, this is nonsense. Allowing users to debug issues without a full
> kernel rebuild is a good thing.
It is far from debugging: this interface doesn't tell you why the
reset didn't work, it just helps you find one that works.
Unless you believe that this information will be enough to understand
the root cause, you will need to ask the user to perform extra
tests, maybe try some quirk. All of that requires the user to
rebuild their kernel.
So no, it is not debugging.
>
> > > > > For policy preference, I already described how I've configured QEMU to
> > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > system policy to do that same thing.
> > > > >
> > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > would resolve reset issues and create the best default general behavior.
> > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > the risks.
> > > >
> > > > This interface is proposed as first class citizen in the general sysfs
> > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > >
> > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > by default.
> > >
> > > Of course we're proposing it to be accessible, it should also require
> > > admin privileges to modify, sysfs has lots of such things. If it's
> > > relegated to non-default accessibility, it won't be used for testing
> > > and it won't be available for system policy and it's pointless.
> >
> > We probably have difference in view of what testing is. I expect from
> > the users who experience issues with reset to do extra steps and one of
> > them is to require from them to compile their kernel.
>
> I would define the ability to generate a CI test that can pick a
> device, unbind it from its driver, and iterate reset methods as a
> worthwhile improvement in testing.
Who is going to run this CI? All the kernel CIs I'm familiar with
(external, and internal to HW vendors) build the kernel themselves.
A distro kernel is too bloated to be really usable for CI.
>
> > The root permissions doesn't protect from anything, SO lovers will use
> > root without even thinking twice.
>
> Yes, with great power comes great responsibility. Many admins ignore
> this. That's far beyond the scope of this series.
<...>
> > I'm trying to help you with your use case of providing reset policy
> > mechanism, which can be without CONFIG_EXPERT. However if you want
> > to continue path of having specific reset type only, please ensure
> > that this is not taken to the "bypass kernel" direction.
>
> You've lost me, are you saying you'd be in favor of an interface that
> allows an admin to specify an arbitrary list of reset methods because
> that's somehow more in line with a policy choice than a userspace
> workaround? This seems like unnecessary bloat because (a) it allows
> the same bypass mechanism, and (b) a given device is only going to use
> a single method anyway, so the functionality is unnecessary. Please
> help me understand how this favors the policy use case. Thanks,
The policy decision is global logic that is easier to grasp. At some
point of our discussion, you presented the case where PM reset is not
defined well and you prefer to do bus reset (something like that).
I expect that QEMU sets the same reset policy for all devices at the same
time instead of trying per-device to guess which one works.
And yes, you will be able to bypass the kernel, but at least this interface
will be broader than the initial one that serves only SO and workarounds.
Thanks
>
> Alex
>
On Thu, 25 Mar 2021 10:37:54 +0200
Leon Romanovsky <[email protected]> wrote:
> On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > On Wed, 24 Mar 2021 17:13:56 +0200
> > Leon Romanovsky <[email protected]> wrote:
>
> <...>
>
> > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > Everything else is waste of time.
> >
> > Sorry, this is nonsense. Allowing users to debug issues without a full
> > kernel rebuild is a good thing.
>
> It is far from debug, this interface doesn't give you any answers why
> the reset didn't work, it just helps you to find the one that works.
>
> Unless you believe that this information will be enough to understand
> the root cause, you will need to ask from the user to perform extra
> tests, maybe try some quirk. All of that requires from the users to
> rebuild their kernel.
>
> So no, it is not debug.
It allows a user to experiment to determine (a) my device doesn't work
in a given scenario with the default configuration, but (b) if I change
the reset to this other thing it does work. That is a step in
debugging.
It's absurd to think that a sysfs attribute could provide root cause,
but it might be enough for someone to further help that user. It would
be a useful clue for a bug report. Yes, reaching root cause might
involve building a kernel, but that doesn't invalidate that having a
step towards debugging in the base kernel might be a useful tool.
> > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > > system policy to do that same thing.
> > > > > >
> > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > the risks.
> > > > >
> > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > >
> > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > by default.
> > > >
> > > > Of course we're proposing it to be accessible, it should also require
> > > > admin privileges to modify, sysfs has lots of such things. If it's
> > > > relegated to non-default accessibility, it won't be used for testing
> > > > and it won't be available for system policy and it's pointless.
> > >
> > > We probably have difference in view of what testing is. I expect from
> > > the users who experience issues with reset to do extra steps and one of
> > > them is to require from them to compile their kernel.
> >
> > I would define the ability to generate a CI test that can pick a
> > device, unbind it from its driver, and iterate reset methods as a
> > worthwhile improvement in testing.
>
> Who is going to run this CI? At least all kernel CIs (external and
> internal to HW vendors) that I'm familiar are building kernel themselves.
>
> Distro kernel is too bloat to be really usable for CI.
At this point I'm suspicious you're trolling. A distro kernel CI
certainly uses the kernel they intend to ship and support in their
environment. You're concerned about a bloated kernel, but the proposal
here adds 2 bytes per device to track reset methods and a trivial array
in text memory, meanwhile you're proposing multiple per-device memory
allocations to enhance the feature you think is too bloated for CI.
> > > The root permissions doesn't protect from anything, SO lovers will use
> > > root without even thinking twice.
> >
> > Yes, with great power comes great responsibility. Many admins ignore
> > this. That's far beyond the scope of this series.
>
> <...>
>
> > > I'm trying to help you with your use case of providing reset policy
> > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > to continue path of having specific reset type only, please ensure
> > > that this is not taken to the "bypass kernel" direction.
> >
> > You've lost me, are you saying you'd be in favor of an interface that
> > allows an admin to specify an arbitrary list of reset methods because
> > that's somehow more in line with a policy choice than a userspace
> > workaround? This seems like unnecessary bloat because (a) it allows
> > the same bypass mechanism, and (b) a given device is only going to use
> > a single method anyway, so the functionality is unnecessary. Please
> > help me understand how this favors the policy use case. Thanks,
>
> The policy decision is global logic that is easier to grasp. At some
> point of our discussion, you presented the case where PM reset is not
> defined well and you prefer to do bus reset (something like that).
>
> I expect that QEMU sets same reset policy for all devices at the same
> time instead of trying per-device to guess which one works.
>
> And yes, you will be able to bypass kernel, but at least this interface
> will be broader than initial one that serves only SO and workarounds.
I still think allocating objects for a list and managing that list is
too bloated and complicated, but I agree that being able to have more
fine-grained control could be useful. Is it necessary to be able to
re-order reset methods, or might it still be better aligned to a policy
use case if we allow plus and minus operators? For example, a device
might list:
[pm] [bus]
Indicating that PM and bus reset are both available and enabled. The
user could do:
echo -pm > reset_methods
This would result in:
pm [bus]
Indicating that both PM and bus resets are available, but only bus reset
is enabled (note this is the identical result to "echo bus >" in the
current proposal). "echo +pm" or "echo default" could re-enable the PM
reset. Would something like that be satisfactory?
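The plus/minus scheme can be modelled with two small bitmaps per device: one for the methods the device supports and one for those currently enabled. The following is a minimal userspace C sketch under stated assumptions, not the kernel patch. The method table, `struct reset_state`, `reset_methods_write()` and `reset_methods_show()` are hypothetical names, and the sketch assumes the trailing newline from `echo` has already been stripped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical method table, using the names from this thread. */
static const char *const method_names[] = { "dev_spec", "flr", "pm", "bus" };
#define NUM_METHODS (sizeof(method_names) / sizeof(method_names[0]))

/* Two bitmaps per device: what the hardware offers, what the admin allows. */
struct reset_state {
	unsigned short supported;	/* bit i set: method i is available */
	unsigned short enabled;		/* bit i set: method i may be used */
};

static int method_index(const char *name)
{
	for (size_t i = 0; i < NUM_METHODS; i++)
		if (strcmp(name, method_names[i]) == 0)
			return (int)i;
	return -1;
}

/*
 * Apply one write: "-pm" disables pm, "+pm" re-enables it, a bare name
 * such as "bus" enables only that method, and "default" re-enables every
 * supported method. Returns 0 on success, -1 for an unknown or
 * unsupported method.
 */
static int reset_methods_write(struct reset_state *st, const char *buf)
{
	char op;
	int idx;

	assert(st != NULL && buf != NULL);
	if (strcmp(buf, "default") == 0) {
		st->enabled = st->supported;
		return 0;
	}

	op = buf[0];
	if (op == '+' || op == '-')
		buf++;			/* skip the operator */
	idx = method_index(buf);
	if (idx < 0 || !(st->supported & (1u << idx)))
		return -1;

	if (op == '+')
		st->enabled |= 1u << idx;
	else if (op == '-')
		st->enabled &= ~(1u << idx);
	else
		st->enabled = 1u << idx;
	return 0;
}

/* Render supported methods, bracketing the enabled ones: "pm [bus]". */
static void reset_methods_show(const struct reset_state *st, char *out,
			       size_t len)
{
	size_t pos = 0;

	if (len == 0)
		return;
	out[0] = '\0';
	for (size_t i = 0; i < NUM_METHODS; i++) {
		int on = (st->enabled >> i) & 1;
		int n;

		if (!(st->supported & (1u << i)))
			continue;
		n = snprintf(out + pos, len - pos, "%s%s%s%s", pos ? " " : "",
			     on ? "[" : "", method_names[i], on ? "]" : "");
		if (n < 0 || (size_t)n >= len - pos)
			break;		/* output truncated; stop here */
		pos += (size_t)n;
	}
}
```

With both pm and bus supported and enabled, the show routine prints "[pm] [bus]". After "-pm" it prints "pm [bus]", which is the same state a bare "bus" write produces.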
If we need to allow re-ordering, we'd want to use a byte array where each
byte represents a type of reset, and perhaps a non-zero value in the
array indicates the method is enabled while the value indicates its priority.
For example, writing "dev_spec,flr,bus" would parse to write 1 to the
byte associated with the device specific reset, 2 to flr, and 3 to bus
reset; we'd then process them from low to high (or maybe starting at a high
value and counting down to zero would be simpler). We could do that by
adding only a fixed 8 bytes or less per device, with no dynamic
allocation. Thoughts? Thanks,
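Here is a rough sketch of the byte-array variant. It assumes the four method names used above and one priority byte per method, where 0 means disabled and lower values run first. `struct reset_prio`, `parse_prio()` and `reset_order()` are hypothetical userspace helpers, not part of the posted series. The parse is all-or-nothing, so a bad write leaves the previous policy intact.

```c
#include <assert.h>
#include <string.h>

#define NUM_METHODS 4
/* Hypothetical method names, as used in this thread. */
static const char *const method_names[NUM_METHODS] = {
	"dev_spec", "flr", "pm", "bus"
};

/* One byte per method: 0 = disabled, otherwise lower values run first. */
struct reset_prio {
	unsigned char prio[NUM_METHODS];
};

/*
 * Parse "dev_spec,flr,bus" into prio = {1, 2, 0, 3}. All-or-nothing:
 * on error (unknown name, duplicate, empty list) *rp is left untouched
 * and -1 is returned.
 */
static int parse_prio(struct reset_prio *rp, const char *input)
{
	struct reset_prio tmp = { { 0 } };
	unsigned char next = 1;
	char buf[64];

	assert(rp != NULL && input != NULL);
	if (strlen(input) >= sizeof(buf))
		return -1;
	strcpy(buf, input);

	for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		int idx = -1;

		for (int i = 0; i < NUM_METHODS; i++)
			if (strcmp(tok, method_names[i]) == 0)
				idx = i;
		if (idx < 0 || tmp.prio[idx] != 0)
			return -1;	/* unknown or repeated name */
		tmp.prio[idx] = next++;
	}
	if (next == 1)
		return -1;		/* nothing was named */
	*rp = tmp;
	return 0;
}

/* Write enabled method indices to order[], lowest priority value first. */
static int reset_order(const struct reset_prio *rp, int order[NUM_METHODS])
{
	int n = 0;

	for (int p = 1; p <= NUM_METHODS; p++)
		for (int i = 0; i < NUM_METHODS; i++)
			if (rp->prio[i] == p)
				order[n++] = i;
	return n;
}
```

With four methods this costs 4 bytes per device, inside the fixed 8-byte budget. After parsing "dev_spec,flr,bus", `reset_order()` yields dev_spec, flr, bus, and pm is skipped because its byte stays 0.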
Alex
On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> On Thu, 25 Mar 2021 10:37:54 +0200
> Leon Romanovsky <[email protected]> wrote:
>
> > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > Leon Romanovsky <[email protected]> wrote:
> >
> > <...>
> >
> > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > Everything else is waste of time.
> > >
> > > Sorry, this is nonsense. Allowing users to debug issues without a full
> > > kernel rebuild is a good thing.
> >
> > It is far from debug, this interface doesn't give you any answers why
> > the reset didn't work, it just helps you to find the one that works.
> >
> > Unless you believe that this information will be enough to understand
> > the root cause, you will need to ask from the user to perform extra
> > tests, maybe try some quirk. All of that requires from the users to
> > rebuild their kernel.
> >
> > So no, it is not debug.
>
> It allows a user to experiment to determine (a) my device doesn't work
> in a given scenario with the default configuration, but (b) if I change
> the reset to this other thing it does work. That is a step in
> debugging.
>
> It's absurd to think that a sysfs attribute could provide root cause,
> but it might be enough for someone to further help that user. It would
> be a useful clue for a bug report. Yes, reaching root cause might
> involve building a kernel, but that doesn't invalidate that having a
> step towards debugging in the base kernel might be a useful tool.
Let's agree to disagree.
>
> > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > > > system policy to do that same thing.
> > > > > > >
> > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > the risks.
> > > > > >
> > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > >
> > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > by default.
> > > > >
> > > > > Of course we're proposing it to be accessible, it should also require
> > > > > admin privileges to modify, sysfs has lots of such things. If it's
> > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > and it won't be available for system policy and it's pointless.
> > > >
> > > > We probably have difference in view of what testing is. I expect from
> > > > the users who experience issues with reset to do extra steps and one of
> > > > them is to require from them to compile their kernel.
> > >
> > > I would define the ability to generate a CI test that can pick a
> > > device, unbind it from its driver, and iterate reset methods as a
> > > worthwhile improvement in testing.
> >
> > Who is going to run this CI? At least all kernel CIs (external and
> > internal to HW vendors) that I'm familiar are building kernel themselves.
> >
> > Distro kernel is too bloat to be really usable for CI.
>
> At this point I'm suspicious you're trolling. A distro kernel CI
> certainly uses the kernel they intend to ship and support in their
> environment. You're concerned about a bloated kernel, but the proposal
> here adds 2-bytes per device to track reset methods and a trivial array
> in text memory, meanwhile you're proposing multiple per-device memory
> allocations to enhance the feature you think is too bloated for CI.
I don't know why you decided to focus on the memory footprint, which is not
important at all during CI runs. The bloat is in Kconfig options that
are not needed. Those extra options add significant overhead to the
builds and to the runs themselves.
And no, I'm not trolling; I'm representing a HW vendor that pushes its CI
and developer environment to the limit, with full kernel builds in
less than 30 seconds and boot-to-test in less than 6 seconds for a full
Fedora VM.
>
> > > > The root permissions doesn't protect from anything, SO lovers will use
> > > > root without even thinking twice.
> > >
> > > Yes, with great power comes great responsibility. Many admins ignore
> > > this. That's far beyond the scope of this series.
> >
> > <...>
> >
> > > > I'm trying to help you with your use case of providing reset policy
> > > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > > to continue path of having specific reset type only, please ensure
> > > > that this is not taken to the "bypass kernel" direction.
> > >
> > > You've lost me, are you saying you'd be in favor of an interface that
> > > allows an admin to specify an arbitrary list of reset methods because
> > > that's somehow more in line with a policy choice than a userspace
> > > workaround? This seems like unnecessary bloat because (a) it allows
> > > the same bypass mechanism, and (b) a given device is only going to use
> > > a single method anyway, so the functionality is unnecessary. Please
> > > help me understand how this favors the policy use case. Thanks,
> >
> > The policy decision is global logic that is easier to grasp. At some
> > point of our discussion, you presented the case where PM reset is not
> > defined well and you prefer to do bus reset (something like that).
> >
> > I expect that QEMU sets same reset policy for all devices at the same
> > time instead of trying per-device to guess which one works.
> >
> > And yes, you will be able to bypass kernel, but at least this interface
> > will be broader than initial one that serves only SO and workarounds.
>
> I still think allocating objects for a list and managing that list is
> too bloated and complicated, but I agree that being able to have more
> fine grained control could be useful. Is it necessary to be able to
> re-order reset methods or might it still be better aligned to a policy
> use case if we allow plus and minus operators? For example, a device
> might list:
>
> [pm] [bus]
>
> Indicating that PM and bus reset are both available and enabled. The
> user could do:
>
> echo -pm > reset_methods
>
> This would result in:
>
> pm [bus]
>
> Indicating that both PM and bus resets are available, but only bus reset
> is enabled (note this is the identical result to "echo bus >" in the
> current proposal). "echo +pm" or "echo default" could re-enable the PM
> reset. Would something like that be satisfactory?
Yes, I actually imagined a simpler interface:
To set specific type:
echo pm > reset_methods
To set policy:
echo "pm,bus" > reset_methods
But your proposal is nicer.
>
> If we need to allow re-ording, we'd want to use a byte-array where each
> byte indicates a type of reset and perhaps a non-zero value in the
> array indicates the method is enabled and the value indicates priority.
> For example writing "dev_spec,flr,bus" would parse to write 1 to the
> byte associated with the device specific reset, 2 to flr, 3 to bus
> reset, then we'd process low to high (or maybe starting at a high value
> to count down to zero might be more simple). We could do that with
> only adding less than a fixed 8-bytes per device and no dynamic
> allocation. Thoughts? Thanks,
As I suggested, a linked list will be easier, and the reset will look
something like:
	for_each_reset_type(device, type) {
		switch (type) {
		case PM:
			ret = do_some_reset(device);
			break;
		case BUS:
			.....
		}
		if (!ret || ret == -ENOMEM)
			return ret;
		/* otherwise, go to the next type in the linked list */
	}
>
> Alex
>
On 21/03/25 10:37AM, Leon Romanovsky wrote:
> On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > On Wed, 24 Mar 2021 17:13:56 +0200
> > Leon Romanovsky <[email protected]> wrote:
>
> <...>
>
> > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > Everything else is waste of time.
> >
> > Sorry, this is nonsense. Allowing users to debug issues without a full
> > kernel rebuild is a good thing.
>
> It is far from debug, this interface doesn't give you any answers why
> the reset didn't work, it just helps you to find the one that works.
>
> Unless you believe that this information will be enough to understand
> the root cause, you will need to ask from the user to perform extra
> tests, maybe try some quirk. All of that requires from the users to
> rebuild their kernel.
>
> So no, it is not debug.
>
> >
> > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > > system policy to do that same thing.
> > > > > >
> > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > the risks.
> > > > >
> > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > >
> > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > by default.
> > > >
> > > > Of course we're proposing it to be accessible, it should also require
> > > > admin privileges to modify, sysfs has lots of such things. If it's
> > > > relegated to non-default accessibility, it won't be used for testing
> > > > and it won't be available for system policy and it's pointless.
> > >
> > > We probably have difference in view of what testing is. I expect from
> > > the users who experience issues with reset to do extra steps and one of
> > > them is to require from them to compile their kernel.
> >
> > I would define the ability to generate a CI test that can pick a
> > device, unbind it from its driver, and iterate reset methods as a
> > worthwhile improvement in testing.
>
> Who is going to run this CI? At least all kernel CIs (external and
> internal to HW vendors) that I'm familiar are building kernel themselves.
>
> Distro kernel is too bloat to be really usable for CI.
>
> >
> > > The root permissions doesn't protect from anything, SO lovers will use
> > > root without even thinking twice.
> >
> > Yes, with great power comes great responsibility. Many admins ignore
> > this. That's far beyond the scope of this series.
>
> <...>
>
> > > I'm trying to help you with your use case of providing reset policy
> > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > to continue path of having specific reset type only, please ensure
> > > that this is not taken to the "bypass kernel" direction.
> >
> > You've lost me, are you saying you'd be in favor of an interface that
> > allows an admin to specify an arbitrary list of reset methods because
> > that's somehow more in line with a policy choice than a userspace
> > workaround? This seems like unnecessary bloat because (a) it allows
> > the same bypass mechanism, and (b) a given device is only going to use
> > a single method anyway, so the functionality is unnecessary. Please
> > help me understand how this favors the policy use case. Thanks,
>
> The policy decision is global logic that is easier to grasp. At some
> point of our discussion, you presented the case where PM reset is not
> defined well and you prefer to do bus reset (something like that).
>
> I expect that QEMU sets same reset policy for all devices at the same
> time instead of trying per-device to guess which one works.
>
The current reset attribute internally does the same thing you described
at the end.
> And yes, you will be able to bypass kernel, but at least this interface
> will be broader than initial one that serves only SO and workarounds.
>
What do you mean by "bypassing" the kernel?
I don't see any problem with SO and workarounds if that is the only
way a user can use their device. Why are you expecting every vendor to
develop a quirk? Also, I don't see any point in using a linked list to
unnecessarily complicate a simple thing.
Thanks,
Amey
On Thu, Mar 25, 2021 at 09:56:37PM +0530, Amey Narkhede wrote:
> On 21/03/25 10:37AM, Leon Romanovsky wrote:
> > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > Leon Romanovsky <[email protected]> wrote:
<...>
> > I expect that QEMU sets same reset policy for all devices at the same
> > time instead of trying per-device to guess which one works.
> >
> The current reset attribute does the same thing internally you described
> at the end.
> > And yes, you will be able to bypass kernel, but at least this interface
> > will be broader than initial one that serves only SO and workarounds.
> >
> What does it mean by "bypassing" kernel?
> I don't see any problem with SO and workaround if that is the only
> way an user can use their device. Why are you expecting every vendor to
> develop quirk? Also I don't see any point of using linked list to
> unnecessarily complicate a simple thing.
Please reread our conversation with Alex; it has answers to both of your
questions.
Thanks
>
> Thanks,
> Amey
On 21/03/25 06:09PM, Leon Romanovsky wrote:
> On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> > On Thu, 25 Mar 2021 10:37:54 +0200
> > Leon Romanovsky <[email protected]> wrote:
> >
> > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > Leon Romanovsky <[email protected]> wrote:
> > >
> > > <...>
> > >
> > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > Everything else is waste of time.
> > > >
> > > > Sorry, this is nonsense. Allowing users to debug issues without a full
> > > > kernel rebuild is a good thing.
> > >
> > > It is far from debug, this interface doesn't give you any answers why
> > > the reset didn't work, it just helps you to find the one that works.
> > >
> > > Unless you believe that this information will be enough to understand
> > > the root cause, you will need to ask from the user to perform extra
> > > tests, maybe try some quirk. All of that requires from the users to
> > > rebuild their kernel.
> > >
> > > So no, it is not debug.
> >
> > It allows a user to experiment to determine (a) my device doesn't work
> > in a given scenario with the default configuration, but (b) if I change
> > the reset to this other thing it does work. That is a step in
> > debugging.
> >
> > It's absurd to think that a sysfs attribute could provide root cause,
> > but it might be enough for someone to further help that user. It would
> > be a useful clue for a bug report. Yes, reaching root cause might
> > involve building a kernel, but that doesn't invalidate that having a
> > step towards debugging in the base kernel might be a useful tool.
>
> Let's agree to do not agree.
>
> >
> > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > > > > system policy to do that same thing.
> > > > > > > >
> > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > the risks.
> > > > > > >
> > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > >
> > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > by default.
> > > > > >
> > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > admin privileges to modify, sysfs has lots of such things. If it's
> > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > and it won't be available for system policy and it's pointless.
> > > > >
> > > > > We probably have difference in view of what testing is. I expect from
> > > > > the users who experience issues with reset to do extra steps and one of
> > > > > them is to require from them to compile their kernel.
> > > >
> > > > I would define the ability to generate a CI test that can pick a
> > > > device, unbind it from its driver, and iterate reset methods as a
> > > > worthwhile improvement in testing.
> > >
> > > Who is going to run this CI? At least all kernel CIs (external and
> > > internal to HW vendors) that I'm familiar are building kernel themselves.
> > >
> > > Distro kernel is too bloat to be really usable for CI.
> >
> > At this point I'm suspicious you're trolling. A distro kernel CI
> > certainly uses the kernel they intend to ship and support in their
> > environment. You're concerned about a bloated kernel, but the proposal
> > here adds 2-bytes per device to track reset methods and a trivial array
> > in text memory, meanwhile you're proposing multiple per-device memory
> > allocations to enhance the feature you think is too bloated for CI.
>
> I don't know why you decided to focus on memory footprint which is not
> important at all during CI runs. The bloat is in Kconfig options that
> are not needed. Those extra options add significant overhead during
> builds and runs itself.
>
> And not, I'm not trolling, but representing HW vendor that pushes its CI
> and developers environment to the limit, by running full kernel builds with
> less than 30 seconds and boot-to-test with less than 6 seconds for full
> Fedora VM.
>
> >
> > > > > The root permissions doesn't protect from anything, SO lovers will use
> > > > > root without even thinking twice.
> > > >
> > > > Yes, with great power comes great responsibility. Many admins ignore
> > > > this. That's far beyond the scope of this series.
> > >
> > > <...>
> > >
> > > > > I'm trying to help you with your use case of providing reset policy
> > > > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > > > to continue path of having specific reset type only, please ensure
> > > > > that this is not taken to the "bypass kernel" direction.
> > > >
> > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > allows an admin to specify an arbitrary list of reset methods because
> > > > that's somehow more in line with a policy choice than a userspace
> > > > workaround? This seems like unnecessary bloat because (a) it allows
> > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > a single method anyway, so the functionality is unnecessary. Please
> > > > help me understand how this favors the policy use case. Thanks,
> > >
> > > The policy decision is global logic that is easier to grasp. At some
> > > point of our discussion, you presented the case where PM reset is not
> > > defined well and you prefer to do bus reset (something like that).
> > >
> > > I expect that QEMU sets same reset policy for all devices at the same
> > > time instead of trying per-device to guess which one works.
> > >
> > > And yes, you will be able to bypass kernel, but at least this interface
> > > will be broader than initial one that serves only SO and workarounds.
> >
> > I still think allocating objects for a list and managing that list is
> > too bloated and complicated, but I agree that being able to have more
> > fine grained control could be useful. Is it necessary to be able to
> > re-order reset methods or might it still be better aligned to a policy
> > use case if we allow plus and minus operators? For example, a device
> > might list:
> >
> > [pm] [bus]
> >
> > Indicating that PM and bus reset are both available and enabled. The
> > user could do:
> >
> > echo -pm > reset_methods
> >
> > This would result in:
> >
> > pm [bus]
> >
> > Indicating that both PM and bus resets are available, but only bus reset
> > is enabled (note this is the identical result to "echo bus >" in the
> > current proposal). "echo +pm" or "echo default" could re-enable the PM
> > reset. Would something like that be satisfactory?
>
> Yes, I actually imagined simpler interface:
> To set specific type:
> echo pm > reset_methods
> To set policy:
> echo "pm,bus" > reset_methods
>
> But your proposal is nicer.
>
Okay, I'll include this in v2.
> >
> > If we need to allow re-ording, we'd want to use a byte-array where each
> > byte indicates a type of reset and perhaps a non-zero value in the
> > array indicates the method is enabled and the value indicates priority.
> > For example writing "dev_spec,flr,bus" would parse to write 1 to the
> > byte associated with the device specific reset, 2 to flr, 3 to bus
> > reset, then we'd process low to high (or maybe starting at a high value
> > to count down to zero might be more simple). We could do that with
> > only adding less than a fixed 8-bytes per device and no dynamic
> > allocation. Thoughts? Thanks,
>
> Like I suggested, linked list will be easier and the reset will be
> something like:
> for_each_reset_type(device, type) {
> switch (type) {
> case PM:
> ret = do_some_reset(device);
> break;
> case BUS:
> .....
> }
> if (!ret || ret == -ENOMEM) <-- go to next type in linked list
> return ret;
> }
>
Maybe we can use a byte array here. Let's consider the current pci_reset_fn_methods
array. If the input is "pm, flr", we can have a byte array holding the indices of
those methods in pci_reset_fn_methods, like [3, 1]. So when the user triggers a
reset, we use the reset method at index 3 (pm) and then the one at index 1 (flr).
Does that make sense?
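The index-array idea can be sketched as follows. This is a hypothetical userspace model: the name table only stands in for pci_reset_fn_methods, and its order is assumed so that pm is index 3 and flr is index 1, as in the example. `struct mock_dev`, `parse_order()` and `do_reset()` are illustrative names. Methods are tried in the written order. As in the existing kernel reset path, -ENOTTY is treated as "not supported, try the next one", while 0 or any other error ends the loop.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <string.h>

#define NUM_METHODS 5
/* Assumed order, chosen so that "pm" is index 3 and "flr" index 1. */
static const char *const method_names[NUM_METHODS] = {
	"dev_spec", "flr", "af_flr", "pm", "bus"
};

/* Mock device: rc[i] is what method i returns; log records call order. */
struct mock_dev {
	int rc[NUM_METHODS];
	int log[NUM_METHODS];
	int nlog;
};

static int mock_reset(struct mock_dev *dev, int method)
{
	if (dev->nlog < NUM_METHODS)
		dev->log[dev->nlog++] = method;
	return dev->rc[method];
}

/* Parse "pm, flr" into order[] = {3, 1}; returns the count or -EINVAL. */
static int parse_order(const char *input, unsigned char order[NUM_METHODS])
{
	unsigned int seen = 0;
	char buf[64];
	int n = 0;

	assert(input != NULL && order != NULL);
	if (strlen(input) >= sizeof(buf))
		return -EINVAL;
	strcpy(buf, input);

	for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		char *end;
		int idx = -1;

		while (isspace((unsigned char)*tok))	/* trim leading blanks */
			tok++;
		end = tok + strlen(tok);
		while (end > tok && isspace((unsigned char)end[-1]))
			*--end = '\0';			/* trim trailing blanks */

		for (int i = 0; i < NUM_METHODS; i++)
			if (strcmp(tok, method_names[i]) == 0)
				idx = i;
		if (idx < 0 || (seen & (1u << idx)))
			return -EINVAL;		/* unknown or repeated name */
		seen |= 1u << idx;
		order[n++] = (unsigned char)idx;
	}
	return n > 0 ? n : -EINVAL;
}

/* Try each method in order; -ENOTTY means "not supported, try the next". */
static int do_reset(struct mock_dev *dev, const unsigned char *order, int n)
{
	for (int i = 0; i < n; i++) {
		int rc = mock_reset(dev, order[i]);

		if (rc != -ENOTTY)
			return rc;	/* 0 on success, or a real error */
	}
	return -ENOTTY;
}
```

Because whitespace is trimmed around each token, both "pm, flr" and the newline that `echo` appends are accepted. The array needs at most one byte per method and no dynamic allocation.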
Thanks,
Amey
On Thu, Mar 25, 2021 at 10:52:57PM +0530, Amey Narkhede wrote:
> On 21/03/25 06:09PM, Leon Romanovsky wrote:
> > On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> > > On Thu, 25 Mar 2021 10:37:54 +0200
> > > Leon Romanovsky <[email protected]> wrote:
> > >
> > > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > > Leon Romanovsky <[email protected]> wrote:
> > > >
> > > > <...>
> > > >
> > > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > > Everything else is waste of time.
> > > > >
> > > > > Sorry, this is nonsense. Allowing users to debug issues without a full
> > > > > kernel rebuild is a good thing.
> > > >
> > > > It is far from debug, this interface doesn't give you any answers why
> > > > the reset didn't work, it just helps you to find the one that works.
> > > >
> > > > Unless you believe that this information will be enough to understand
> > > > the root cause, you will need to ask from the user to perform extra
> > > > tests, maybe try some quirk. All of that requires from the users to
> > > > rebuild their kernel.
> > > >
> > > > So no, it is not debug.
> > >
> > > It allows a user to experiment to determine (a) my device doesn't work
> > > in a given scenario with the default configuration, but (b) if I change
> > > the reset to this other thing it does work. That is a step in
> > > debugging.
> > >
> > > It's absurd to think that a sysfs attribute could provide root cause,
> > > but it might be enough for someone to further help that user. It would
> > > be a useful clue for a bug report. Yes, reaching root cause might
> > > involve building a kernel, but that doesn't invalidate that having a
> > > step towards debugging in the base kernel might be a useful tool.
> >
> > Let's agree to do not agree.
> >
> > >
> > > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > > > > > system policy to do that same thing.
> > > > > > > > >
> > > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > > the risks.
> > > > > > > >
> > > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > > >
> > > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > > by default.
> > > > > > >
> > > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > > admin privileges to modify, sysfs has lots of such things. If it's
> > > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > > and it won't be available for system policy and it's pointless.
> > > > > >
> > > > > > We probably have difference in view of what testing is. I expect from
> > > > > > the users who experience issues with reset to do extra steps and one of
> > > > > > them is to require from them to compile their kernel.
> > > > >
> > > > > I would define the ability to generate a CI test that can pick a
> > > > > device, unbind it from its driver, and iterate reset methods as a
> > > > > worthwhile improvement in testing.
> > > >
> > > > Who is going to run this CI? At least all kernel CIs (external and
> > > > internal to HW vendors) that I'm familiar are building kernel themselves.
> > > >
> > > > Distro kernel is too bloat to be really usable for CI.
> > >
> > > At this point I'm suspicious you're trolling. A distro kernel CI
> > > certainly uses the kernel they intend to ship and support in their
> > > environment. You're concerned about a bloated kernel, but the proposal
> > > here adds 2-bytes per device to track reset methods and a trivial array
> > > in text memory, meanwhile you're proposing multiple per-device memory
> > > allocations to enhance the feature you think is too bloated for CI.
> >
> > I don't know why you decided to focus on memory footprint which is not
> > important at all during CI runs. The bloat is in Kconfig options that
> > are not needed. Those extra options add significant overhead during
> > builds and runs itself.
> >
> > And not, I'm not trolling, but representing HW vendor that pushes its CI
> > and developers environment to the limit, by running full kernel builds with
> > less than 30 seconds and boot-to-test with less than 6 seconds for full
> > Fedora VM.
> >
> > >
> > > > > > The root permissions doesn't protect from anything, SO lovers will use
> > > > > > root without even thinking twice.
> > > > >
> > > > > Yes, with great power comes great responsibility. Many admins ignore
> > > > > this. That's far beyond the scope of this series.
> > > >
> > > > <...>
> > > >
> > > > > > I'm trying to help you with your use case of providing reset policy
> > > > > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > > > > to continue path of having specific reset type only, please ensure
> > > > > > that this is not taken to the "bypass kernel" direction.
> > > > >
> > > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > > allows an admin to specify an arbitrary list of reset methods because
> > > > > that's somehow more in line with a policy choice than a userspace
> > > > > workaround? This seems like unnecessary bloat because (a) it allows
> > > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > > a single method anyway, so the functionality is unnecessary. Please
> > > > > help me understand how this favors the policy use case. Thanks,
> > > >
> > > > The policy decision is global logic that is easier to grasp. At some
> > > > point of our discussion, you presented the case where PM reset is not
> > > > defined well and you prefer to do bus reset (something like that).
> > > >
> > > > I expect that QEMU sets same reset policy for all devices at the same
> > > > time instead of trying per-device to guess which one works.
> > > >
> > > > And yes, you will be able to bypass kernel, but at least this interface
> > > > will be broader than initial one that serves only SO and workarounds.
> > >
> > > I still think allocating objects for a list and managing that list is
> > > too bloated and complicated, but I agree that being able to have more
> > > fine grained control could be useful. Is it necessary to be able to
> > > re-order reset methods or might it still be better aligned to a policy
> > > use case if we allow plus and minus operators? For example, a device
> > > might list:
> > >
> > > [pm] [bus]
> > >
> > > Indicating that PM and bus reset are both available and enabled. The
> > > user could do:
> > >
> > > echo -pm > reset_methods
> > >
> > > This would result in:
> > >
> > > pm [bus]
> > >
> > > Indicating that both PM and bus resets are available, but only bus reset
> > > is enabled (note this is the identical result to "echo bus >" in the
> > > current proposal). "echo +pm" or "echo default" could re-enable the PM
> > > reset. Would something like that be satisfactory?
> >
> > Yes, I actually imagined simpler interface:
> > To set specific type:
> > echo pm > reset_methods
> > To set policy:
> > echo "pm,bus" > reset_methods
> >
> > But your proposal is nicer.
> >
> Okay I'll include this in v2
> > >
> > > If we need to allow re-ording, we'd want to use a byte-array where each
> > > byte indicates a type of reset and perhaps a non-zero value in the
> > > array indicates the method is enabled and the value indicates priority.
> > > For example writing "dev_spec,flr,bus" would parse to write 1 to the
> > > byte associated with the device specific reset, 2 to flr, 3 to bus
> > > reset, then we'd process low to high (or maybe starting at a high value
> > > to count down to zero might be more simple). We could do that with
> > > only adding less than a fixed 8-bytes per device and no dynamic
> > > allocation. Thoughts? Thanks,
> >
> > Like I suggested, linked list will be easier and the reset will be
> > something like:
> > for_each_reset_type(device, type) {
> > switch (type) {
> > case PM:
> > ret = do_some_reset(device);
> > break;
> > case BUS:
> > .....
> > }
> > if (!ret || ret == -ENOMEM) <-- go to next type in linked list
> > return ret;
> > }
> >
> Maybe we can use a byte array here. Lets consider current pci_reset_fn_methods
> array. If a input is "pm, flr" we can have byte array with index of
> those methods in pci_reset_fn_methods like [3, 1]. So when user triggers a
> reset we use reset method at index 3(pm) and then at index 1(flr).
> Does that make sense?
I'm not worried about the in-kernel implementation; we will rewrite it if
needed. The most important part is the user-visible ABI, which we won't be
able to fix later.
Thanks
>
> Thanks,
> Amey
On Thu, 25 Mar 2021 18:09:58 +0200
Leon Romanovsky <[email protected]> wrote:
> On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> > On Thu, 25 Mar 2021 10:37:54 +0200
> > Leon Romanovsky <[email protected]> wrote:
> >
> > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > Leon Romanovsky <[email protected]> wrote:
> > >
> > > <...>
> > >
> > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > Everything else is waste of time.
> > > >
> > > > Sorry, this is nonsense. Allowing users to debug issues without a full
> > > > kernel rebuild is a good thing.
> > >
> > > It is far from debug, this interface doesn't give you any answers why
> > > the reset didn't work, it just helps you to find the one that works.
> > >
> > > Unless you believe that this information will be enough to understand
> > > the root cause, you will need to ask from the user to perform extra
> > > tests, maybe try some quirk. All of that requires from the users to
> > > rebuild their kernel.
> > >
> > > So no, it is not debug.
> >
> > It allows a user to experiment to determine (a) my device doesn't work
> > in a given scenario with the default configuration, but (b) if I change
> > the reset to this other thing it does work. That is a step in
> > debugging.
> >
> > It's absurd to think that a sysfs attribute could provide root cause,
> > but it might be enough for someone to further help that user. It would
> > be a useful clue for a bug report. Yes, reaching root cause might
> > involve building a kernel, but that doesn't invalidate that having a
> > step towards debugging in the base kernel might be a useful tool.
>
> Let's agree to do not agree.
>
> >
> > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > > > > system policy to do that same thing.
> > > > > > > >
> > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > the risks.
> > > > > > >
> > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > >
> > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > by default.
> > > > > >
> > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > admin privileges to modify, sysfs has lots of such things. If it's
> > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > and it won't be available for system policy and it's pointless.
> > > > >
> > > > > We probably have difference in view of what testing is. I expect from
> > > > > the users who experience issues with reset to do extra steps and one of
> > > > > them is to require from them to compile their kernel.
> > > >
> > > > I would define the ability to generate a CI test that can pick a
> > > > device, unbind it from its driver, and iterate reset methods as a
> > > > worthwhile improvement in testing.
> > >
> > > Who is going to run this CI? At least all kernel CIs (external and
> > > internal to HW vendors) that I'm familiar are building kernel themselves.
> > >
> > > Distro kernel is too bloat to be really usable for CI.
> >
> > At this point I'm suspicious you're trolling. A distro kernel CI
> > certainly uses the kernel they intend to ship and support in their
> > environment. You're concerned about a bloated kernel, but the proposal
> > here adds 2-bytes per device to track reset methods and a trivial array
> > in text memory, meanwhile you're proposing multiple per-device memory
> > allocations to enhance the feature you think is too bloated for CI.
>
> I don't know why you decided to focus on memory footprint which is not
> important at all during CI runs. The bloat is in Kconfig options that
> are not needed. Those extra options add significant overhead during
> builds and runs itself.
>
> And not, I'm not trolling, but representing HW vendor that pushes its CI
> and developers environment to the limit, by running full kernel builds with
> less than 30 seconds and boot-to-test with less than 6 seconds for full
> Fedora VM.
CI is only one aspect where I think this interface could be useful; as
below, there's also a policy use case. Therefore my inclination is that
this would be included in default kernels, and avoiding bloat is a good
thing. A CI environment can be used in different ways: it's not
necessarily building a new kernel for every test, nor do typical users
have access to those types of environments to report information in a
bug.
> > > > > The root permissions doesn't protect from anything, SO lovers will use
> > > > > root without even thinking twice.
> > > >
> > > > Yes, with great power comes great responsibility. Many admins ignore
> > > > this. That's far beyond the scope of this series.
> > >
> > > <...>
> > >
> > > > > I'm trying to help you with your use case of providing reset policy
> > > > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > > > to continue path of having specific reset type only, please ensure
> > > > > that this is not taken to the "bypass kernel" direction.
> > > >
> > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > allows an admin to specify an arbitrary list of reset methods because
> > > > that's somehow more in line with a policy choice than a userspace
> > > > workaround? This seems like unnecessary bloat because (a) it allows
> > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > a single method anyway, so the functionality is unnecessary. Please
> > > > help me understand how this favors the policy use case. Thanks,
> > >
> > > The policy decision is global logic that is easier to grasp. At some
> > > point of our discussion, you presented the case where PM reset is not
> > > defined well and you prefer to do bus reset (something like that).
> > >
> > > I expect that QEMU sets same reset policy for all devices at the same
> > > time instead of trying per-device to guess which one works.
> > >
> > > And yes, you will be able to bypass kernel, but at least this interface
> > > will be broader than initial one that serves only SO and workarounds.
> >
> > I still think allocating objects for a list and managing that list is
> > too bloated and complicated, but I agree that being able to have more
> > fine grained control could be useful. Is it necessary to be able to
> > re-order reset methods or might it still be better aligned to a policy
> > use case if we allow plus and minus operators? For example, a device
> > might list:
> >
> > [pm] [bus]
> >
> > Indicating that PM and bus reset are both available and enabled. The
> > user could do:
> >
> > echo -pm > reset_methods
> >
> > This would result in:
> >
> > pm [bus]
> >
> > Indicating that both PM and bus resets are available, but only bus reset
> > is enabled (note this is the identical result to "echo bus >" in the
> > current proposal). "echo +pm" or "echo default" could re-enable the PM
> > reset. Would something like that be satisfactory?
>
> Yes, I actually imagined simpler interface:
> To set specific type:
> echo pm > reset_methods
> To set policy:
> echo "pm,bus" > reset_methods
>
> But your proposal is nicer.
The above doesn't support re-ordering, though; we'll need to parse a
comma-separated list for that.
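For comparison, the +/- operators proposed earlier in this thread can be modelled with two one-byte bitmaps per device (available and enabled). This is a hypothetical userspace sketch: struct reset_state, apply_op() and show_methods() are illustrative names, and refusing to toggle a method the device does not support is an assumption. It also shows the limitation noted here: a bitmap records which methods are on, not in what order they run.

```c
#include <stdio.h>
#include <string.h>

/* Method names in the order of pci_reset_fn_methods[] quoted in this thread. */
static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};
#define NUM_METHODS (sizeof(method_names) / sizeof(method_names[0]))

/* Hypothetical per-device state: one bit per method in each byte. */
struct reset_state {
	unsigned char avail;   /* bit i set: method i is supported */
	unsigned char enabled; /* bit i set: method i is allowed */
};

/*
 * Apply "+name", "-name" or "default" (a trailing newline from echo is
 * ignored). Returns 0, or -1 for malformed input, an unknown name, or a
 * method the device does not support.
 */
static int apply_op(struct reset_state *st, const char *cmd)
{
	size_t len = strcspn(cmd, "\n");

	if (len == 7 && strncmp(cmd, "default", 7) == 0) {
		st->enabled = st->avail;
		return 0;
	}
	if (len < 2 || (cmd[0] != '+' && cmd[0] != '-'))
		return -1;
	for (size_t i = 0; i < NUM_METHODS; i++) {
		if (strlen(method_names[i]) != len - 1 ||
		    strncmp(method_names[i], cmd + 1, len - 1) != 0)
			continue;
		if (!(st->avail & (1u << i)))
			return -1;
		if (cmd[0] == '+')
			st->enabled |= (unsigned char)(1u << i);
		else
			st->enabled &= (unsigned char)~(1u << i);
		return 0;
	}
	return -1;
}

/* Render available methods, enabled ones in brackets, e.g. "pm [bus]". */
static void show_methods(const struct reset_state *st, char *out, size_t size)
{
	size_t pos = 0;

	if (size == 0)
		return;
	out[0] = '\0';
	for (size_t i = 0; i < NUM_METHODS; i++) {
		int on, w;

		if (!(st->avail & (1u << i)))
			continue;
		on = (st->enabled >> i) & 1;
		w = snprintf(out + pos, size - pos, "%s%s%s%s", pos ? " " : "",
			     on ? "[" : "", method_names[i], on ? "]" : "");
		if (w < 0 || (size_t)w >= size - pos)
			return;  /* output truncated */
		pos += (size_t)w;
	}
}
```

Starting from "[pm] [bus]", apply_op(&st, "-pm\n") leaves "pm [bus]", and "default" restores "[pm] [bus]".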
> > If we need to allow re-ording, we'd want to use a byte-array where each
> > byte indicates a type of reset and perhaps a non-zero value in the
> > array indicates the method is enabled and the value indicates priority.
> > For example writing "dev_spec,flr,bus" would parse to write 1 to the
> > byte associated with the device specific reset, 2 to flr, 3 to bus
> > reset, then we'd process low to high (or maybe starting at a high value
> > to count down to zero might be more simple). We could do that with
> > only adding less than a fixed 8-bytes per device and no dynamic
> > allocation. Thoughts? Thanks,
>
> Like I suggested, linked list will be easier and the reset will be
> something like:
> for_each_reset_type(device, type) {
> switch (type) {
> case PM:
> ret = do_some_reset(device);
> break;
> case BUS:
> .....
> }
> if (!ret || ret == -ENOMEM) <-- go to next type in linked list
> return ret;
> }
Perhaps Bjorn has some thoughts, but I don't like the dynamic memory
allocation and list management required for a linked list. Once bus &
slot reset are combined, I think we're talking about potentially 5
reset methods, so if we had:
const struct pci_reset_fn_method pci_reset_fn_methods[] = {
{ .reset_fn = &pci_dev_specific_reset, .name = "device_specific" },
{ .reset_fn = &pcie_flr, .name = "flr" },
{ .reset_fn = &pci_af_flr, .name = "af_flr" },
{ .reset_fn = &pci_pm_reset, .name = "pm" },
{ .reset_fn = &pci_reset_bus_function, .name = "bus" },
};
The pci_dev could include
u8 reset_methods[ARRAY_SIZE(pci_reset_fn_methods)];
And we could loop as:
u8 prio;
int ret;
for (prio = ARRAY_SIZE(pci_reset_fn_methods); prio; prio--) {
int i;
for (i = 0; i < ARRAY_SIZE(pci_reset_fn_methods); i++) {
if (dev->reset_methods[i] == prio) {
ret = pci_reset_fn_methods[i].reset_fn(dev, probe);
if (ret != -ENOTTY)
return ret;
break;
}
}
if (i == ARRAY_SIZE(pci_reset_fn_methods))
break;
}
return -ENOTTY;
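The priority loop above can be exercised in userspace with a mock in place of pci_reset_fn_methods[i].reset_fn(). This is a sketch: mock_reset(), do_reset() and the `supported` bitmask are hypothetical. One property worth noting is that the outer loop stops at the first priority value no method holds, so it relies on priorities being handed out densely downward from NUM_METHODS, as the _store and probe code below both do.

```c
#include <errno.h>

#define NUM_METHODS 5  /* device_specific, flr, af_flr, pm, bus */

/* Hypothetical stand-in for pci_reset_fn_methods[i].reset_fn(dev, probe). */
static int mock_reset(unsigned supported, int i)
{
	return (supported & (1u << i)) ? 0 : -ENOTTY;
}

/*
 * Same walk as the loop above: prio[i] == 0 means method i is disabled,
 * and higher values are tried first. On a result other than -ENOTTY the
 * method index is stored in *used and the result returned.
 */
static int do_reset(const unsigned char prio[NUM_METHODS], unsigned supported,
		    int *used)
{
	for (unsigned char p = NUM_METHODS; p; p--) {
		int i;

		for (i = 0; i < NUM_METHODS; i++) {
			if (prio[i] == p) {
				int ret = mock_reset(supported, i);

				if (ret != -ENOTTY) {
					*used = i;
					return ret;
				}
				break;  /* unsupported: try the next priority */
			}
		}
		if (i == NUM_METHODS)
			break;  /* nobody holds priority p: stop searching */
	}
	return -ENOTTY;
}
```

With prio[] = {0, 5, 0, 0, 4} (flr first, then bus), a device supporting only bus reset falls through flr and uses bus. An array such as {0, 2, 0, 1, 0} is never searched at all, because priority 5 is unused.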
The sysfs _store function would probably do something like:
u8 reset_methods[ARRAY_SIZE(pci_reset_fn_methods)] = { 0 };
u8 prio = ARRAY_SIZE(pci_reset_fn_methods);
for each @string in comma separated list from user... {
int i;
for (i = 0; i < ARRAY_SIZE(pci_reset_fn_methods); i++) {
if (!strcmp(@string, pci_reset_fn_methods[i].name)) {
reset_methods[i] = prio--;
break;
}
}
if (i == ARRAY_SIZE(pci_reset_fn_methods))
return -EINVAL;
}
memcpy(dev->reset_methods, reset_methods, sizeof(reset_methods));
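A self-contained version of this _store sketch, with the "for each @string" step written out using strcspn(), might look like the following. It is an assumption-laden sketch rather than kernel code: store_reset_methods() is a hypothetical name, whitespace around names is not accepted, and repeated names are rejected. In the sketch above a repeated name would overwrite its earlier priority; for "pm,pm,bus" that leaves priority 5 unused, so the reset loop above would stop immediately and return -ENOTTY.

```c
#include <errno.h>
#include <string.h>

/* Method names in the order of the pci_reset_fn_methods[] table above. */
static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};
#define NUM_METHODS (sizeof(method_names) / sizeof(method_names[0]))

/*
 * Parse "flr,bus\n" into prio[]: the first name gets NUM_METHODS, the next
 * NUM_METHODS - 1, and so on; unlisted methods stay 0 (disabled). prio[]
 * is only written if the whole string parses.
 */
static int store_reset_methods(const char *buf, unsigned char prio[NUM_METHODS])
{
	unsigned char tmp[NUM_METHODS] = { 0 };
	unsigned char p = NUM_METHODS;

	for (;;) {
		size_t len = strcspn(buf, ",\n");
		size_t i;

		for (i = 0; i < NUM_METHODS; i++)
			if (strlen(method_names[i]) == len &&
			    strncmp(method_names[i], buf, len) == 0)
				break;
		if (i == NUM_METHODS || tmp[i])
			return -EINVAL;  /* unknown, empty or repeated name */
		tmp[i] = p--;
		buf += len;
		if (*buf != ',')
			break;
		buf++;  /* skip the comma and parse the next name */
	}
	if (*buf == '\n')
		buf++;  /* echo appends a newline */
	if (*buf != '\0')
		return -EINVAL;
	memcpy(prio, tmp, sizeof(tmp));
	return 0;
}
```

Writing "bus,pm" gives bus priority 5 and pm priority 4, which is the re-ordering case discussed in this thread; a rejected write leaves the previous priorities untouched.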
The probe would also need to fill the array in a compatible way:
u8 reset_methods[ARRAY_SIZE(pci_reset_fn_methods)] = { 0 };
u8 prio = ARRAY_SIZE(pci_reset_fn_methods);
int i;
for (i = 0; i < ARRAY_SIZE(pci_reset_fn_methods); i++) {
int ret = pci_reset_fn_methods[i].reset_fn(dev, 1);
if (!ret)
reset_methods[i] = prio--;
else if (ret != -ENOTTY)
break;
}
memcpy(dev->reset_methods, reset_methods, sizeof(reset_methods));
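The probe step can be modelled the same way. Here probe_ret[i] is a hypothetical stand-in for the value pci_reset_fn_methods[i].reset_fn(dev, 1) would return, and probe_reset_methods() is an illustrative name. As in the sketch above, supported methods receive priorities in table order, and any error other than -ENOTTY stops probing, leaving later methods disabled.

```c
#include <errno.h>
#include <string.h>

#define NUM_METHODS 5  /* device_specific, flr, af_flr, pm, bus */

/*
 * probe_ret[i] is a hypothetical stand-in for the value that
 * pci_reset_fn_methods[i].reset_fn(dev, 1) returns during enumeration.
 */
static void probe_reset_methods(const int probe_ret[NUM_METHODS],
				unsigned char prio[NUM_METHODS])
{
	unsigned char tmp[NUM_METHODS] = { 0 };
	unsigned char p = NUM_METHODS;

	for (int i = 0; i < NUM_METHODS; i++) {
		if (!probe_ret[i])
			tmp[i] = p--;  /* supported: next priority, table order */
		else if (probe_ret[i] != -ENOTTY)
			break;         /* other error: stop, later methods stay off */
	}
	memcpy(prio, tmp, sizeof(tmp));
}
```

For probe results {-ENOTTY, 0, -ENOTTY, 0, 0}, flr, pm and bus get priorities 5, 4 and 3, so the default order is simply the table order, and the priorities are dense as the reset loop expects.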
Thanks,
Alex
On Thu, Mar 25, 2021 at 11:53:24AM -0600, Alex Williamson wrote:
> On Thu, 25 Mar 2021 18:09:58 +0200
> Leon Romanovsky <[email protected]> wrote:
>
> > On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> > > On Thu, 25 Mar 2021 10:37:54 +0200
> > > Leon Romanovsky <[email protected]> wrote:
> > >
> > > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > > Leon Romanovsky <[email protected]> wrote:
> > > >
> > > > <...>
> > > >
> > > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > > Everything else is waste of time.
> > > > >
> > > > > Sorry, this is nonsense. Allowing users to debug issues without a full
> > > > > kernel rebuild is a good thing.
> > > >
> > > > It is far from debug, this interface doesn't give you any answers why
> > > > the reset didn't work, it just helps you to find the one that works.
> > > >
> > > > Unless you believe that this information will be enough to understand
> > > > the root cause, you will need to ask from the user to perform extra
> > > > tests, maybe try some quirk. All of that requires from the users to
> > > > rebuild their kernel.
> > > >
> > > > So no, it is not debug.
> > >
> > > It allows a user to experiment to determine (a) my device doesn't work
> > > in a given scenario with the default configuration, but (b) if I change
> > > the reset to this other thing it does work. That is a step in
> > > debugging.
> > >
> > > It's absurd to think that a sysfs attribute could provide root cause,
> > > but it might be enough for someone to further help that user. It would
> > > be a useful clue for a bug report. Yes, reaching root cause might
> > > involve building a kernel, but that doesn't invalidate that having a
> > > step towards debugging in the base kernel might be a useful tool.
> >
> > Let's agree to do not agree.
> >
> > >
> > > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > > > > > system policy to do that same thing.
> > > > > > > > >
> > > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > > the risks.
> > > > > > > >
> > > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > > >
> > > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > > by default.
> > > > > > >
> > > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > > admin privileges to modify, sysfs has lots of such things. If it's
> > > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > > and it won't be available for system policy and it's pointless.
> > > > > >
> > > > > > We probably have difference in view of what testing is. I expect from
> > > > > > the users who experience issues with reset to do extra steps and one of
> > > > > > them is to require from them to compile their kernel.
> > > > >
> > > > > I would define the ability to generate a CI test that can pick a
> > > > > device, unbind it from its driver, and iterate reset methods as a
> > > > > worthwhile improvement in testing.
> > > >
> > > > Who is going to run this CI? At least all kernel CIs (external and
> > > > internal to HW vendors) that I'm familiar are building kernel themselves.
> > > >
> > > > Distro kernel is too bloat to be really usable for CI.
> > >
> > > At this point I'm suspicious you're trolling. A distro kernel CI
> > > certainly uses the kernel they intend to ship and support in their
> > > environment. You're concerned about a bloated kernel, but the proposal
> > > here adds 2-bytes per device to track reset methods and a trivial array
> > > in text memory, meanwhile you're proposing multiple per-device memory
> > > allocations to enhance the feature you think is too bloated for CI.
> >
> > I don't know why you decided to focus on memory footprint which is not
> > important at all during CI runs. The bloat is in Kconfig options that
> > are not needed. Those extra options add significant overhead during
> > builds and runs itself.
> >
> > And not, I'm not trolling, but representing HW vendor that pushes its CI
> > and developers environment to the limit, by running full kernel builds with
> > less than 30 seconds and boot-to-test with less than 6 seconds for full
> > Fedora VM.
>
> CI is only one aspect where I think this interface could be useful, as
> below there's also a policy use case. Therefore my inclination is that
> this would be included in default kernels and avoiding bloat is a good
> thing. A CI environment can be used in different ways, it's not
> necessarily building a new kernel for every test, nor do typical users
> have access to those types of environments to report information in a
> bug.
>
> > > > > > The root permissions doesn't protect from anything, SO lovers will use
> > > > > > root without even thinking twice.
> > > > >
> > > > > Yes, with great power comes great responsibility. Many admins ignore
> > > > > this. That's far beyond the scope of this series.
> > > >
> > > > <...>
> > > >
> > > > > > I'm trying to help you with your use case of providing reset policy
> > > > > > mechanism, which can be without CONFIG_EXPERT. However if you want
> > > > > > to continue path of having specific reset type only, please ensure
> > > > > > that this is not taken to the "bypass kernel" direction.
> > > > >
> > > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > > allows an admin to specify an arbitrary list of reset methods because
> > > > > that's somehow more in line with a policy choice than a userspace
> > > > > workaround? This seems like unnecessary bloat because (a) it allows
> > > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > > a single method anyway, so the functionality is unnecessary. Please
> > > > > help me understand how this favors the policy use case. Thanks,
> > > >
> > > > The policy decision is global logic that is easier to grasp. At some
> > > > point of our discussion, you presented the case where PM reset is not
> > > > defined well and you prefer to do bus reset (something like that).
> > > >
> > > > I expect that QEMU sets same reset policy for all devices at the same
> > > > time instead of trying per-device to guess which one works.
> > > >
> > > > And yes, you will be able to bypass kernel, but at least this interface
> > > > will be broader than initial one that serves only SO and workarounds.
> > >
> > > I still think allocating objects for a list and managing that list is
> > > too bloated and complicated, but I agree that being able to have more
> > > fine grained control could be useful. Is it necessary to be able to
> > > re-order reset methods or might it still be better aligned to a policy
> > > use case if we allow plus and minus operators? For example, a device
> > > might list:
> > >
> > > [pm] [bus]
> > >
> > > Indicating that PM and bus reset are both available and enabled. The
> > > user could do:
> > >
> > > echo -pm > reset_methods
> > >
> > > This would result in:
> > >
> > > pm [bus]
> > >
> > > Indicating that both PM and bus resets are available, but only bus reset
> > > is enabled (note this is the identical result to "echo bus >" in the
> > > current proposal). "echo +pm" or "echo default" could re-enable the PM
> > > reset. Would something like that be satisfactory?
> >
> > Yes, I actually imagined simpler interface:
> > To set specific type:
> > echo pm > reset_methods
> > To set policy:
> > echo "pm,bus" > reset_methods
> >
> > But your proposal is nicer.
>
> The above doesn't support re-ordering though, we'll need to parse a
> comma separated list for that.
It does support it, by writing: echo "bus,pm" > reset_methods.
Regarding the comma, IMHO it is the easiest pattern to parse.
Anyway, the in-kernel implementation is not important to me.
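Since the concern here is the user-visible ABI, one property worth checking is that reading the attribute back returns the same syntax that was written. Below is a hypothetical show-side sketch, assuming the priority-array representation discussed in this thread (0 = disabled, higher runs first); show_reset_methods() is an illustrative name, not an existing kernel function.

```c
#include <stdio.h>
#include <string.h>

/* Method names in the order of pci_reset_fn_methods[] quoted in this thread. */
static const char *const method_names[] = {
	"device_specific", "flr", "af_flr", "pm", "bus",
};
#define NUM_METHODS (sizeof(method_names) / sizeof(method_names[0]))

/*
 * Hypothetical show() helper: prints enabled methods from highest to
 * lowest priority as "bus,pm\n". Returns the number of characters
 * written, or -1 if out[] is too small.
 */
static int show_reset_methods(const unsigned char prio[NUM_METHODS],
			      char *out, size_t size)
{
	size_t pos = 0;
	int w;

	for (unsigned p = NUM_METHODS; p > 0; p--) {
		for (size_t i = 0; i < NUM_METHODS; i++) {
			if (prio[i] != p)
				continue;
			w = snprintf(out + pos, size - pos, "%s%s",
				     pos ? "," : "", method_names[i]);
			if (w < 0 || (size_t)w >= size - pos)
				return -1;
			pos += (size_t)w;
		}
	}
	w = snprintf(out + pos, size - pos, "\n");
	if (w < 0 || (size_t)w >= size - pos)
		return -1;
	return (int)(pos + (size_t)w);
}
```

After writing "bus,pm", the array holds bus = 5 and pm = 4, and show_reset_methods() prints "bus,pm\n", so a read round-trips to the written value.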
Thanks
Hello,
[...]
Aside from the sysfs interface, would this new functionality also require
a way to override anything at boot time by passing some command-line
arguments? I'm not sure how relevant such a thing would be to device
reset, though.
I am curious whether there would be a need for anything like that.
Krzysztof
On Fri, Mar 26, 2021 at 10:18:25AM +0100, Krzysztof Wilczyński wrote:
> Hello,
>
> [...]
>
> Aside of the sysfs interface, would this new functionality also require
> anything to be overridden at boot time via passing some command-line
> arguments? Not sure how relevant such thing would be to device, but,
> whatnot reset, though.
This is a per-device property, so a global setting such as a kernel
command-line argument can't be universally correct. I don't think we need
to add such functionality.
>
> I am curious whether there would be a need for anything like that.
I prefer not.
>
> Krzysztof
On Fri, 26 Mar 2021 09:40:30 +0300
Leon Romanovsky <[email protected]> wrote:
> On Thu, Mar 25, 2021 at 11:53:24AM -0600, Alex Williamson wrote:
> > On Thu, 25 Mar 2021 18:09:58 +0200
> > Leon Romanovsky <[email protected]> wrote:
> >
> > > On Thu, Mar 25, 2021 at 08:55:04AM -0600, Alex Williamson wrote:
> > > > On Thu, 25 Mar 2021 10:37:54 +0200
> > > > Leon Romanovsky <[email protected]> wrote:
> > > >
> > > > > On Wed, Mar 24, 2021 at 11:17:29AM -0600, Alex Williamson wrote:
> > > > > > On Wed, 24 Mar 2021 17:13:56 +0200
> > > > > > Leon Romanovsky <[email protected]> wrote:
> > > > >
> > > > > <...>
> > > > >
> > > > > > > Yes, and real testing/debugging almost always requires kernel rebuild.
> > > > > > > Everything else is waste of time.
> > > > > >
> > > > > > Sorry, this is nonsense. Allowing users to debug issues without a full
> > > > > > kernel rebuild is a good thing.
> > > > >
> > > > > It is far from debug, this interface doesn't give you any answers why
> > > > > the reset didn't work, it just helps you to find the one that works.
> > > > >
> > > > > Unless you believe that this information will be enough to understand
> > > > > the root cause, you will need to ask from the user to perform extra
> > > > > tests, maybe try some quirk. All of that requires from the users to
> > > > > rebuild their kernel.
> > > > >
> > > > > So no, it is not debug.
> > > >
> > > > It allows a user to experiment to determine (a) my device doesn't work
> > > > in a given scenario with the default configuration, but (b) if I change
> > > > the reset to this other thing it does work. That is a step in
> > > > debugging.
> > > >
> > > > It's absurd to think that a sysfs attribute could provide root cause,
> > > > but it might be enough for someone to further help that user. It would
> > > > be a useful clue for a bug report. Yes, reaching root cause might
> > > > involve building a kernel, but that doesn't invalidate that having a
> > > > step towards debugging in the base kernel might be a useful tool.
> > >
> > > Let's agree to do not agree.
> > >
> > > >
> > > > > > > > > > For policy preference, I already described how I've configured QEMU to
> > > > > > > > > > prefer a bus reset rather than a PM reset due to lack of specification
> > > > > > > > > > regarding the scope of a PM "soft reset". This interface would allow a
> > > > > > > > > > system policy to do that same thing.
> > > > > > > > > >
> > > > > > > > > > I don't think anyone is suggesting this as a means to avoid quirks that
> > > > > > > > > > would resolve reset issues and create the best default general behavior.
> > > > > > > > > > This provides a mechanism to test various reset methods, and thereby
> > > > > > > > > > identify broken methods, and set a policy. Sure, that policy might be
> > > > > > > > > > to avoid a broken reset in the interim before it gets quirked and
> > > > > > > > > > there's potential for abuse there, but I think the benefits outweigh
> > > > > > > > > > the risks.
> > > > > > > > >
> > > > > > > > > This interface is proposed as first class citizen in the general sysfs
> > > > > > > > > layout. Of course, it will be seen as a way to bypass the kernel.
> > > > > > > > >
> > > > > > > > > At least, put it under CONFIG_EXPERT option, so no distro will enable it
> > > > > > > > > by default.
> > > > > > > >
> > > > > > > > Of course we're proposing it to be accessible, it should also require
> > > > > > > > admin privileges to modify, sysfs has lots of such things. If it's
> > > > > > > > relegated to non-default accessibility, it won't be used for testing
> > > > > > > > and it won't be available for system policy and it's pointless.
> > > > > > >
> > > > > > > We probably have a different view of what testing is. I expect
> > > > > > > users who experience issues with reset to take extra steps, and one of
> > > > > > > them is compiling their own kernel.
> > > > > >
> > > > > > I would define the ability to generate a CI test that can pick a
> > > > > > device, unbind it from its driver, and iterate reset methods as a
> > > > > > worthwhile improvement in testing.
> > > > >
> > > > > Who is going to run this CI? At least all kernel CIs (external and
> > > > > internal to HW vendors) that I'm familiar with build the kernel themselves.
> > > > >
> > > > > A distro kernel is too bloated to be really usable for CI.
> > > >
> > > > At this point I'm suspicious you're trolling. A distro kernel CI
> > > > certainly uses the kernel they intend to ship and support in their
> > > > environment. You're concerned about a bloated kernel, but the proposal
> > > > here adds 2 bytes per device to track reset methods and a trivial array
> > > > in text memory, meanwhile you're proposing multiple per-device memory
> > > > allocations to enhance the feature you think is too bloated for CI.
> > >
> > > I don't know why you decided to focus on memory footprint which is not
> > > important at all during CI runs. The bloat is in Kconfig options that
> > > are not needed. Those extra options add significant overhead during
> > > builds and runs itself.
> > >
> > > And no, I'm not trolling, but representing a HW vendor that pushes its CI
> > > and developer environments to the limit, running full kernel builds in
> > > less than 30 seconds and boot-to-test in less than 6 seconds for a full
> > > Fedora VM.
> >
> > CI is only one aspect where I think this interface could be useful, as
> > below there's also a policy use case. Therefore my inclination is that
> > this would be included in default kernels and avoiding bloat is a good
> > thing. A CI environment can be used in different ways; it's not
> > necessarily building a new kernel for every test, nor do typical users
> > have access to those types of environments to report information in a
> > bug.
> >
> > > > > > > Root permissions don't protect against anything; SO lovers will use
> > > > > > > root without even thinking twice.
> > > > > >
> > > > > > Yes, with great power comes great responsibility. Many admins ignore
> > > > > > this. That's far beyond the scope of this series.
> > > > >
> > > > > <...>
> > > > >
> > > > > > > I'm trying to help you with your use case of providing a reset policy
> > > > > > > mechanism, which can be without CONFIG_EXPERT. However, if you want
> > > > > > > to continue down the path of allowing only a specific reset type, please
> > > > > > > ensure that this is not taken in the "bypass kernel" direction.
> > > > > >
> > > > > > You've lost me, are you saying you'd be in favor of an interface that
> > > > > > allows an admin to specify an arbitrary list of reset methods because
> > > > > > that's somehow more in line with a policy choice than a userspace
> > > > > > workaround? This seems like unnecessary bloat because (a) it allows
> > > > > > the same bypass mechanism, and (b) a given device is only going to use
> > > > > > a single method anyway, so the functionality is unnecessary. Please
> > > > > > help me understand how this favors the policy use case. Thanks,
> > > > >
> > > > > The policy decision is global logic that is easier to grasp. At some
> > > > > point of our discussion, you presented the case where PM reset is not
> > > > > defined well and you prefer to do bus reset (something like that).
> > > > >
> > > > > I expect that QEMU sets the same reset policy for all devices at the same
> > > > > time instead of trying per-device to guess which one works.
> > > > >
> > > > > And yes, you will be able to bypass the kernel, but at least this interface
> > > > > will be broader than the initial one that serves only SO and workarounds.
> > > >
> > > > I still think allocating objects for a list and managing that list is
> > > > too bloated and complicated, but I agree that being able to have more
> > > > fine-grained control could be useful. Is it necessary to be able to
> > > > re-order reset methods or might it still be better aligned to a policy
> > > > use case if we allow plus and minus operators? For example, a device
> > > > might list:
> > > >
> > > > [pm] [bus]
> > > >
> > > > Indicating that PM and bus reset are both available and enabled. The
> > > > user could do:
> > > >
> > > > echo -pm > reset_methods
> > > >
> > > > This would result in:
> > > >
> > > > pm [bus]
> > > >
> > > > Indicating that both PM and bus resets are available, but only bus reset
> > > > is enabled (note this is the identical result to "echo bus >" in the
> > > > current proposal). "echo +pm" or "echo default" could re-enable the PM
> > > > reset. Would something like that be satisfactory?
(3) This +/- scheme, which doesn't support re-ordering.
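To make the +/- semantics concrete, here is a minimal C sketch, assuming one bitmap of supported methods (similar in spirit to the supported-reset bitmap this series adds) plus a second bitmap of enabled methods. All names (enum reset_method, struct reset_state, reset_apply_toggle(), reset_show()) are hypothetical; this is plain userspace C for illustration, not the series' kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical method indices; the real series defines its own table. */
enum reset_method { RESET_FLR, RESET_PM, RESET_BUS, RESET_NUM };

static const char *const reset_names[RESET_NUM] = { "flr", "pm", "bus" };

struct reset_state {
	uint8_t supported; /* bit i set: method i is available on the device */
	uint8_t enabled;   /* bit i set: method i may be used (subset of supported) */
};

/* Map a method name to its index, or -1 if unknown. */
static int reset_lookup(const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < RESET_NUM; i++)
		if (strcmp(name, reset_names[i]) == 0)
			return i;
	return -1;
}

/*
 * Apply one "+name", "-name" or "default" command.
 * Returns 0 on success, -1 for unknown or unsupported methods.
 */
static int reset_apply_toggle(struct reset_state *st, const char *cmd)
{
	int idx;

	if (strcmp(cmd, "default") == 0) {
		st->enabled = st->supported;
		return 0;
	}
	if (cmd[0] != '+' && cmd[0] != '-')
		return -1;
	idx = reset_lookup(cmd + 1);
	if (idx < 0 || !(st->supported & (1u << idx)))
		return -1;
	if (cmd[0] == '+')
		st->enabled |= (uint8_t)(1u << idx);
	else
		st->enabled &= (uint8_t)~(1u << idx);
	return 0;
}

/* Render like "pm [bus]": every supported method, enabled ones bracketed. */
static void reset_show(const struct reset_state *st, char *buf, size_t len)
{
	size_t off = 0;

	buf[0] = '\0';
	for (int i = 0; i < RESET_NUM; i++) {
		if (!(st->supported & (1u << i)))
			continue;
		bool on = st->enabled & (1u << i);
		int n = snprintf(buf + off, len - off, "%s%s%s%s",
				 off ? " " : "", on ? "[" : "",
				 reset_names[i], on ? "]" : "");
		if (n < 0 || (size_t)n >= len - off)
			break; /* output truncated; buf stays NUL-terminated */
		off += (size_t)n;
	}
}
```

Starting from "[pm] [bus]", applying "-pm" renders "pm [bus]" and "+pm" or "default" restores "[pm] [bus]". Because enabled is only a mask, the order in which methods are tried stays fixed by the kernel's table, which is exactly the limitation noted in (3).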
> > >
> > > Yes, I actually imagined a simpler interface:
> > > To set specific type:
> > > echo pm > reset_methods
> > > To set policy:
> > > echo "pm,bus" > reset_methods
> > >
> > > But your proposal is nicer.
(2) This, which I believe is in reference to... ^^
> >
> > The above doesn't support re-ordering though, we'll need to parse a
> > comma separated list for that.
(1) This refers to... ^^
>
> It supports it by writing: echo "bus,pm" > reset_methods.
> Regarding the comma, IMHO it is the easiest pattern to parse.
>
> Anyway, the in-kernel implementation is not important to me.
Too bad, it should have been apparent from the sample code that it was
using a comma separated list with re-ordering support. Thanks,
Alex
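For comparison, here is a minimal C sketch of the comma-separated, re-orderable form, assuming input such as "bus,pm" (highest priority first) and a bitmask of supported methods. The names (enum reset_method, reset_parse_order()) and the 64-byte limit are hypothetical; the "default" keyword follows the earlier "echo default" suggestion. It is plain userspace C; an in-kernel store handler would likely use kernel string helpers instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical method indices; the real series defines its own table. */
enum reset_method { RESET_FLR, RESET_PM, RESET_BUS, RESET_NUM };

static const char *const reset_names[RESET_NUM] = { "flr", "pm", "bus" };

/* Map a method name to its index, or -1 if unknown. */
static int reset_lookup(const char *name)
{
	for (int i = 0; i < RESET_NUM; i++)
		if (strcmp(name, reset_names[i]) == 0)
			return i;
	return -1;
}

/*
 * Parse "bus,pm" (or "default") into prio[], highest priority first.
 * Unlisted methods are left out, i.e. disabled. Unknown, unsupported,
 * duplicated or empty entries reject the whole write.
 * Returns the number of methods stored, or -1 on error.
 */
static int reset_parse_order(const char *input, unsigned int supported,
			     int prio[RESET_NUM])
{
	char buf[64];
	unsigned int seen = 0;
	size_t len;
	int n = 0;

	assert(input != NULL && prio != NULL);
	len = strlen(input);
	if (len >= sizeof(buf))
		return -1;
	memcpy(buf, input, len + 1);
	buf[strcspn(buf, "\n")] = '\0'; /* "echo" appends a trailing newline */

	if (strcmp(buf, "default") == 0) {
		/* Restore the built-in order, limited to supported methods. */
		for (int i = 0; i < RESET_NUM; i++)
			if (supported & (1u << i))
				prio[n++] = i;
		return n;
	}

	for (char *p = buf; p != NULL;) {
		char *comma = strchr(p, ',');

		if (comma)
			*comma = '\0';
		int idx = reset_lookup(p);
		if (idx < 0 || !(supported & (1u << idx)) || (seen & (1u << idx)))
			return -1;
		seen |= 1u << idx;
		prio[n++] = idx; /* at most RESET_NUM: duplicates are rejected */
		p = comma ? comma + 1 : NULL;
	}
	return n;
}
```

With one parser, "echo pm" selects a single method and "echo bus,pm" re-orders, so the single-method case falls out as a one-element list rather than needing a separate syntax.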
On Fri, Mar 26, 2021 at 08:20:07AM -0600, Alex Williamson wrote:
> On Fri, 26 Mar 2021 09:40:30 +0300
> Leon Romanovsky <[email protected]> wrote:
<...>
> >
> > It supports it by writing: echo "bus,pm" > reset_methods.
> > Regarding the comma, IMHO it is the easiest pattern to parse.
> >
> > Anyway, the in-kernel implementation is not important to me.
>
> Too bad, it should have been apparent from the sample code that it was
> using a comma separated list with re-ordering support. Thanks,
Excellent, both of us think that "bus,pm" is the easiest way to
implement policy decision.
Thanks
>
> Alex
>