From: Yicong Yang <[email protected]>
This series tends to improve the PTT's filter interface in 2 aspects (Patch 2&3):
- Support dynamically filter updating to response to hotplug
Previous the supported filter list is settled down once the driver probed and
it maybe out-of-date if hotplug events happen later. User need to reload the
driver to update list. Patch 1/2 enable the driver to update the list by
registering a PCI bus notifier and the filter list will always be the latest.
- Export the available filters through sysfs
Previous user needs to calculate the filters and filter value using device's
BDF number, which requires the user to know the hardware well. Patch 3/3 tends
to export the available filter information through sysfs attributes, the filter
value will be gotten by reading the file. This will be more user friendly.
Also includes a fix and an improve. Patch 1 tends to only export the online CPUs
supported by the PTT, this will make perf work properly when there's offline CPUs
within the node PTT locates. Patch 4 tends to set proper PMU capability to avoid
collecting unnecessary data to save the storage.
Yicong Yang (4):
hwtracing: hisi_ptt: Make cpumask only present online CPUs
hwtracing: hisi_ptt: Add support for dynamically updating the filter
list
hwtracing: hisi_ptt: Export available filters through sysfs
hwtracing: hisi_ptt: Advertise PERF_PMU_CAP_NO_EXCLUDE for PTT PMU
.../ABI/testing/sysfs-devices-hisi_ptt | 50 +++
Documentation/trace/hisi-ptt.rst | 12 +-
drivers/hwtracing/ptt/hisi_ptt.c | 396 +++++++++++++++++-
drivers/hwtracing/ptt/hisi_ptt.h | 51 +++
4 files changed, 498 insertions(+), 11 deletions(-)
--
2.24.0
From: Yicong Yang <[email protected]>
perf will try to start PTT trace on every CPU presented in cpumask sysfs
attribute and it will fail to start on offline CPUs(see the comments in
perf_event_open()). But the driver is using cpumask_of_node() to export
the available cpumask which may include offline CPUs and may fail the
perf unintendedly. Fix this by only export the online CPUs of the node.
Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
Signed-off-by: Yicong Yang <[email protected]>
---
drivers/hwtracing/ptt/hisi_ptt.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
index 30f1525639b5..0a10c7ec46ad 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.c
+++ b/drivers/hwtracing/ptt/hisi_ptt.c
@@ -487,9 +487,18 @@ static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
- const cpumask_t *cpumask = cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev));
+ cpumask_var_t mask;
+ ssize_t n;
- return cpumap_print_to_pagebuf(true, buf, cpumask);
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ return 0;
+
+ cpumask_and(mask, cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)),
+ cpu_online_mask);
+ n = cpumap_print_to_pagebuf(true, buf, mask);
+ free_cpumask_var(mask);
+
+ return n;
}
static DEVICE_ATTR_RO(cpumask);
--
2.24.0
From: Yicong Yang <[email protected]>
The PTT trace collects PCIe TLP headers from the PCIe link and don't
have the ability to exclude certain context. It doesn't support itrace
as well. So only advertise PERF_PMU_CAP_NO_EXCLUDE. This will greatly
save the storage of final data. Tested tracing idle link for ~15s,
without this patch we'll collect ~28.682MB data for context related
information and with this patch it reduced to ~0.226MB.
Signed-off-by: Yicong Yang <[email protected]>
---
drivers/hwtracing/ptt/hisi_ptt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
index a5cd87edb813..8c1cce32b83f 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.c
+++ b/drivers/hwtracing/ptt/hisi_ptt.c
@@ -1216,7 +1216,7 @@ static int hisi_ptt_register_pmu(struct hisi_ptt *hisi_ptt)
hisi_ptt->hisi_ptt_pmu = (struct pmu) {
.module = THIS_MODULE,
- .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.task_ctx_nr = perf_sw_context,
.attr_groups = hisi_ptt_pmu_groups,
.event_init = hisi_ptt_pmu_event_init,
--
2.24.0
From: Yicong Yang <[email protected]>
The PCIe devices supported by the PTT trace can be removed/rescanned by
hotplug or through sysfs. Add support for dynamically updating the
available filter list by registering a PCI bus notifier block. Then user
can always get latest information about available tracing filters and
driver can block the invalid filters of which related devices no longer
exist in the system.
Signed-off-by: Yicong Yang <[email protected]>
---
Documentation/trace/hisi-ptt.rst | 6 +-
drivers/hwtracing/ptt/hisi_ptt.c | 152 ++++++++++++++++++++++++++++++-
drivers/hwtracing/ptt/hisi_ptt.h | 35 +++++++
3 files changed, 189 insertions(+), 4 deletions(-)
diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst
index 4f87d8e21065..3641aca4287a 100644
--- a/Documentation/trace/hisi-ptt.rst
+++ b/Documentation/trace/hisi-ptt.rst
@@ -153,9 +153,9 @@ Endpoint function can be specified in one trace. Specifying both Root Port
and function at the same time is not supported. Driver maintains a list of
available filters and will check the invalid inputs.
-Currently the available filters are detected in driver's probe. If the supported
-devices are removed/added after probe, you may need to reload the driver to update
-the filters.
+The available filters will be dynamically updates, which means you will always
+get correct filter information when hotplug events happen, or when you manually
+remove/rescan the devices.
2. Type
-------
diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
index 0a10c7ec46ad..010cdbc3c172 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.c
+++ b/drivers/hwtracing/ptt/hisi_ptt.c
@@ -354,6 +354,117 @@ static int hisi_ptt_register_irq(struct hisi_ptt *hisi_ptt)
return 0;
}
+static void hisi_ptt_update_filters(struct work_struct *work)
+{
+ struct delayed_work *delayed_work = to_delayed_work(work);
+ struct hisi_ptt_filter_update_info info;
+ struct hisi_ptt_filter_desc *filter;
+ struct list_head *target_list;
+ struct hisi_ptt *hisi_ptt;
+
+ hisi_ptt = container_of(delayed_work, struct hisi_ptt, work);
+
+ if (!mutex_trylock(&hisi_ptt->filter_lock)) {
+ schedule_delayed_work(&hisi_ptt->work, HISI_PTT_WORK_DELAY_MS);
+ return;
+ }
+
+ while (kfifo_get(&hisi_ptt->filter_update_kfifo, &info)) {
+ bool is_port = pci_pcie_type(info.pdev) == PCI_EXP_TYPE_ROOT_PORT;
+ u16 devid = PCI_DEVID(info.pdev->bus->number, info.pdev->devfn);
+ u16 val = hisi_ptt_get_filter_val(devid, is_port);
+
+ target_list = is_port ? &hisi_ptt->port_filters : &hisi_ptt->req_filters;
+
+ if (info.is_add) {
+ filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+ if (!filter) {
+ pci_err(hisi_ptt->pdev, "failed to add filter %s\n",
+ pci_name(info.pdev));
+ continue;
+ }
+
+ filter->devid = devid;
+ filter->is_port = is_port;
+ list_add_tail(&filter->list, target_list);
+
+ if (is_port)
+ hisi_ptt->port_mask |= val;
+ } else {
+ list_for_each_entry(filter, target_list, list)
+ if (filter->devid == devid) {
+ list_del(&filter->list);
+ kfree(filter);
+ break;
+ }
+
+ if (is_port)
+ hisi_ptt->port_mask &= ~val;
+ }
+ }
+
+ mutex_unlock(&hisi_ptt->filter_lock);
+}
+
+static void hisi_ptt_update_fifo_in(struct hisi_ptt *hisi_ptt,
+ struct hisi_ptt_filter_update_info *info)
+{
+ struct pci_dev *root_port = pcie_find_root_port(info->pdev);
+ u32 port_devid;
+
+ if (!root_port)
+ return;
+
+ port_devid = PCI_DEVID(root_port->bus->number, root_port->devfn);
+ if (port_devid < hisi_ptt->lower_bdf ||
+ port_devid > hisi_ptt->upper_bdf)
+ return;
+
+ /*
+ * The FIFO size is 16 which is sufficient for almost all the cases,
+ * since each PCIe core will have most 8 Root Ports (typically only
+ * 1~4 Root Ports). On failure log the failed filter and let user
+ * handle it.
+ */
+ if (kfifo_in_spinlocked(&hisi_ptt->filter_update_kfifo, info, 1,
+ &hisi_ptt->filter_update_lock))
+ schedule_delayed_work(&hisi_ptt->work, 0);
+ else
+ pci_warn(hisi_ptt->pdev,
+ "filter update fifo overflow for target %s\n",
+ pci_name(info->pdev));
+}
+
+/*
+ * A PCI bus notifier is used here for dynamically updating the filter
+ * list.
+ */
+static int hisi_ptt_notifier_call(struct notifier_block *nb, unsigned long action,
+ void *data)
+{
+ struct hisi_ptt *hisi_ptt = container_of(nb, struct hisi_ptt, hisi_ptt_nb);
+ struct hisi_ptt_filter_update_info info;
+ struct device *dev = data;
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ info.pdev = pdev;
+
+ switch (action) {
+ case BUS_NOTIFY_ADD_DEVICE:
+ info.is_add = true;
+ break;
+ case BUS_NOTIFY_DEL_DEVICE:
+ info.is_add = false;
+ break;
+ default:
+ return 0;
+ }
+
+ hisi_ptt_update_fifo_in(hisi_ptt, &info);
+
+ return 0;
+}
+
static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data)
{
struct pci_dev *root_port = pcie_find_root_port(pdev);
@@ -451,8 +562,13 @@ static int hisi_ptt_init_ctrls(struct hisi_ptt *hisi_ptt)
int ret;
u32 reg;
+ INIT_DELAYED_WORK(&hisi_ptt->work, hisi_ptt_update_filters);
+ INIT_KFIFO(hisi_ptt->filter_update_kfifo);
+ spin_lock_init(&hisi_ptt->filter_update_lock);
+
INIT_LIST_HEAD(&hisi_ptt->port_filters);
INIT_LIST_HEAD(&hisi_ptt->req_filters);
+ mutex_init(&hisi_ptt->filter_lock);
ret = hisi_ptt_config_trace_buf(hisi_ptt);
if (ret)
@@ -627,14 +743,19 @@ static int hisi_ptt_trace_valid_filter(struct hisi_ptt *hisi_ptt, u64 config)
* For Requester ID filters, walk the available filter list to see
* whether we have one matched.
*/
+ mutex_lock(&hisi_ptt->filter_lock);
if (!hisi_ptt->trace_ctrl.is_port) {
list_for_each_entry(filter, &hisi_ptt->req_filters, list) {
- if (val == hisi_ptt_get_filter_val(filter->devid, filter->is_port))
+ if (val == hisi_ptt_get_filter_val(filter->devid, filter->is_port)) {
+ mutex_unlock(&hisi_ptt->filter_lock);
return 0;
+ }
}
} else if (bitmap_subset(&val, &port_mask, BITS_PER_LONG)) {
+ mutex_unlock(&hisi_ptt->filter_lock);
return 0;
}
+ mutex_unlock(&hisi_ptt->filter_lock);
return -EINVAL;
}
@@ -910,6 +1031,31 @@ static int hisi_ptt_register_pmu(struct hisi_ptt *hisi_ptt)
&hisi_ptt->hisi_ptt_pmu);
}
+static void hisi_ptt_unregister_filter_update_notifier(void *data)
+{
+ struct hisi_ptt *hisi_ptt = data;
+
+ bus_unregister_notifier(&pci_bus_type, &hisi_ptt->hisi_ptt_nb);
+
+ /* Cancel any work that has been queued */
+ cancel_delayed_work_sync(&hisi_ptt->work);
+}
+
+/* Register the bus notifier for dynamically updating the filter list */
+static int hisi_ptt_register_filter_update_notifier(struct hisi_ptt *hisi_ptt)
+{
+ int ret;
+
+ hisi_ptt->hisi_ptt_nb.notifier_call = hisi_ptt_notifier_call;
+ ret = bus_register_notifier(&pci_bus_type, &hisi_ptt->hisi_ptt_nb);
+ if (ret)
+ return ret;
+
+ return devm_add_action_or_reset(&hisi_ptt->pdev->dev,
+ hisi_ptt_unregister_filter_update_notifier,
+ hisi_ptt);
+}
+
/*
* The DMA of PTT trace can only use direct mappings due to some
* hardware restriction. Check whether there is no IOMMU or the
@@ -981,6 +1127,10 @@ static int hisi_ptt_probe(struct pci_dev *pdev,
return ret;
}
+ ret = hisi_ptt_register_filter_update_notifier(hisi_ptt);
+ if (ret)
+ pci_warn(pdev, "failed to register filter update notifier, ret = %d", ret);
+
ret = hisi_ptt_register_pmu(hisi_ptt);
if (ret) {
pci_err(pdev, "failed to register PMU device, ret = %d", ret);
diff --git a/drivers/hwtracing/ptt/hisi_ptt.h b/drivers/hwtracing/ptt/hisi_ptt.h
index 5beb1648c93a..b1ba638fe7ea 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.h
+++ b/drivers/hwtracing/ptt/hisi_ptt.h
@@ -11,12 +11,15 @@
#include <linux/bits.h>
#include <linux/cpumask.h>
+#include <linux/kfifo.h>
#include <linux/list.h>
#include <linux/mutex.h>
+#include <linux/notifier.h>
#include <linux/pci.h>
#include <linux/perf_event.h>
#include <linux/spinlock.h>
#include <linux/types.h>
+#include <linux/workqueue.h>
#define DRV_NAME "hisi_ptt"
@@ -71,6 +74,11 @@
#define HISI_PTT_WAIT_TRACE_TIMEOUT_US 100UL
#define HISI_PTT_WAIT_POLL_INTERVAL_US 10UL
+/* FIFO size for dynamically updating the PTT trace filter list. */
+#define HISI_PTT_FILTER_UPDATE_FIFO_SIZE 16
+/* Delay time for filter updating work */
+#define HISI_PTT_WORK_DELAY_MS 100UL
+
#define HISI_PCIE_CORE_PORT_ID(devfn) ((PCI_SLOT(devfn) & 0x7) << 1)
/* Definition of the PMU configs */
@@ -143,6 +151,16 @@ struct hisi_ptt_filter_desc {
u16 devid;
};
+/**
+ * struct hisi_ptt_filter_update_info - Information for PTT filter updating
+ * @pdev: the PCI device to update in the filter list
+ * @is_add: adding to the filter or not
+ */
+struct hisi_ptt_filter_update_info {
+ struct pci_dev *pdev;
+ bool is_add;
+};
+
/**
* struct hisi_ptt_pmu_buf - Descriptor of the AUX buffer of PTT trace
* @length: size of the AUX buffer
@@ -170,10 +188,15 @@ struct hisi_ptt_pmu_buf {
* @lower_bdf: the lower BDF range of the PCI devices managed by this PTT device
* @port_filters: the filter list of root ports
* @req_filters: the filter list of requester ID
+ * @filter_lock: lock to protect the filters
* @port_mask: port mask of the managed root ports
+ * @work: delayed work for filter updating
+ * @filter_update_lock: spinlock to protect the filter update fifo
+ * @filter_update_fifo: fifo of the filters waiting to update the filter list
*/
struct hisi_ptt {
struct hisi_ptt_trace_ctrl trace_ctrl;
+ struct notifier_block hisi_ptt_nb;
struct hlist_node hotplug_node;
struct pmu hisi_ptt_pmu;
void __iomem *iobase;
@@ -192,7 +215,19 @@ struct hisi_ptt {
*/
struct list_head port_filters;
struct list_head req_filters;
+ struct mutex filter_lock;
u16 port_mask;
+
+ /*
+ * We use a delayed work here to avoid indefinitely waiting for
+ * the hisi_ptt->mutex which protecting the filter list. The
+ * work will be delayed only if the mutex can not be held,
+ * otherwise no delay will be applied.
+ */
+ struct delayed_work work;
+ spinlock_t filter_update_lock;
+ DECLARE_KFIFO(filter_update_kfifo, struct hisi_ptt_filter_update_info,
+ HISI_PTT_FILTER_UPDATE_FIFO_SIZE);
};
#define to_hisi_ptt(pmu) container_of(pmu, struct hisi_ptt, hisi_ptt_pmu)
--
2.24.0
From: Yicong Yang <[email protected]>
The PTT can only filter the traced TLP headers by the Root Ports or the
Requester ID of the Endpoint, which are located on the same core of the
PTT device. The filter value used is derived from the BDF number of the
supported Root Port or the Endpoint. It's not friendly enough for the
users since it requires the user to be familiar enough with the platform
and calculate the filter value manually.
This patch export the available filters through sysfs. Each available
filters is presented as an individual file with the name of the BDF
number of the related PCIe device. The files are created under
$(PTT PMU dir)/available_root_port_filters and
$(PTT PMU dir)/available_requester_filters respectively. The filter
value can be known by reading the related file.
Then the users can easily know the available filters for trace and get
the filter values without calculating.
Signed-off-by: Yicong Yang <[email protected]>
---
.../ABI/testing/sysfs-devices-hisi_ptt | 50 ++++
Documentation/trace/hisi-ptt.rst | 6 +
drivers/hwtracing/ptt/hisi_ptt.c | 229 +++++++++++++++++-
drivers/hwtracing/ptt/hisi_ptt.h | 16 ++
4 files changed, 297 insertions(+), 4 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-devices-hisi_ptt b/Documentation/ABI/testing/sysfs-devices-hisi_ptt
index 82de6d710266..fe5b742b037b 100644
--- a/Documentation/ABI/testing/sysfs-devices-hisi_ptt
+++ b/Documentation/ABI/testing/sysfs-devices-hisi_ptt
@@ -59,3 +59,53 @@ Description: (RW) Control the allocated buffer watermark of outbound packets.
The available tune data is [0, 1, 2]. Writing a negative value
will return an error, and out of range values will be converted
to 2. The value indicates a probable level of the event.
+
+What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/root_port_filters
+Date: February 2023
+KernelVersion: 6.4
+Contact: Yicong Yang <[email protected]>
+Description: This directory contains the files providing the PCIe Root Port filters
+ information used for PTT trace. Each file is named after the supported
+ Root Port device name <domain>:<bus>:<device>.<function>.
+
+ See the description of the "filter" in Documentation/trace/hisi-ptt.rst
+ for more information.
+
+What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/root_port_filters/multiselect
+Date: February 2023
+KernelVersion: 6.4
+Contact: Yicong Yang <[email protected]>
+Description: (Read) Indicates whether this kind of filter can be multiselected
+ or not. 1 for multiselectable, 0 for not.
+
+What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/root_port_filters/<bdf>
+Date: February 2023
+KernelVersion: 6.4
+Contact: Yicong Yang <[email protected]>
+Description: (Read) Indicates the filter value of this Root Port filter, which
+ can be used to control the TLP headers to trace by the PTT trace.
+
+What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/requester_filters
+Date: February 2023
+KernelVersion: 6.4
+Contact: Yicong Yang <[email protected]>
+Description: This directory contains the files providing the PCIe Requester filters
+ information used for PTT trace. Each file is named after the supported
+ Endpoint device name <domain>:<bus>:<device>.<function>.
+
+ See the description of the "filter" in Documentation/trace/hisi-ptt.rst
+ for more information.
+
+What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/requester_filters/multiselect
+Date: February 2023
+KernelVersion: 6.4
+Contact: Yicong Yang <[email protected]>
+Description: (Read) Indicates whether this kind of filter can be multiselected
+ or not. 1 for multiselectable, 0 for not.
+
+What: /sys/devices/hisi_ptt<sicl_id>_<core_id>/requester_filters/<bdf>
+Date: February 2023
+KernelVersion: 6.4
+Contact: Yicong Yang <[email protected]>
+Description: (Read) Indicates the filter value of this Requester filter, which
+ can be used to control the TLP headers to trace by the PTT trace.
diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst
index 3641aca4287a..b8c7d71aee32 100644
--- a/Documentation/trace/hisi-ptt.rst
+++ b/Documentation/trace/hisi-ptt.rst
@@ -148,6 +148,12 @@ For example, if the desired filter is Endpoint function 0000:01:00.1 the filter
value will be 0x00101. If the desired filter is Root Port 0000:00:10.0 then
then filter value is calculated as 0x80001.
+The driver also presents every supported Root Port and Requester filter through
+sysfs. Each filter will be an individual file with name of its related PCIe
+device name (domain:bus:device.function). The files of Root Port filters are
+under $(PTT PMU dir)/root_port_filters and files of Requester filters
+are under $(PTT PMU dir)/requester_filters.
+
Note that multiple Root Ports can be specified at one time, but only one
Endpoint function can be specified in one trace. Specifying both Root Port
and function at the same time is not supported. Driver maintains a list of
diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
index 010cdbc3c172..a5cd87edb813 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.c
+++ b/drivers/hwtracing/ptt/hisi_ptt.c
@@ -354,6 +354,139 @@ static int hisi_ptt_register_irq(struct hisi_ptt *hisi_ptt)
return 0;
}
+static ssize_t hisi_ptt_filter_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct hisi_ptt_filter_desc *filter;
+ unsigned long filter_val;
+
+ filter = container_of(attr, struct hisi_ptt_filter_desc, attr);
+ filter_val = hisi_ptt_get_filter_val(filter->devid, filter->is_port) |
+ (filter->is_port ? HISI_PTT_PMU_FILTER_IS_PORT : 0);
+
+ return sysfs_emit(buf, "0x%05lx\n", filter_val);
+}
+
+static int hisi_ptt_create_rp_filter_attr(struct hisi_ptt *hisi_ptt,
+ struct hisi_ptt_filter_desc *filter)
+{
+ struct kobject *kobj = &hisi_ptt->hisi_ptt_pmu.dev->kobj;
+
+ filter->attr.attr.name = filter->name;
+ filter->attr.attr.mode = 0400; /* DEVICE_ATTR_ADMIN_RO */
+ filter->attr.show = hisi_ptt_filter_show;
+
+ return sysfs_add_file_to_group(kobj, &filter->attr.attr,
+ HISI_PTT_RP_FILTERS_GRP_NAME);
+}
+
+static void hisi_ptt_remove_rp_filter_attr(struct hisi_ptt *hisi_ptt,
+ struct hisi_ptt_filter_desc *filter)
+{
+ struct kobject *kobj = &hisi_ptt->hisi_ptt_pmu.dev->kobj;
+
+ sysfs_remove_file_from_group(kobj, &filter->attr.attr,
+ HISI_PTT_RP_FILTERS_GRP_NAME);
+}
+
+static int hisi_ptt_create_req_filter_attr(struct hisi_ptt *hisi_ptt,
+ struct hisi_ptt_filter_desc *filter)
+{
+ struct kobject *kobj = &hisi_ptt->hisi_ptt_pmu.dev->kobj;
+
+ filter->attr.attr.name = filter->name;
+ filter->attr.attr.mode = 0400; /* DEVICE_ATTR_ADMIN_RO */
+ filter->attr.show = hisi_ptt_filter_show;
+
+ return sysfs_add_file_to_group(kobj, &filter->attr.attr,
+ HISI_PTT_REQ_FILTERS_GRP_NAME);
+}
+
+static void hisi_ptt_remove_req_filter_attr(struct hisi_ptt *hisi_ptt,
+ struct hisi_ptt_filter_desc *filter)
+{
+ struct kobject *kobj = &hisi_ptt->hisi_ptt_pmu.dev->kobj;
+
+ sysfs_remove_file_from_group(kobj, &filter->attr.attr,
+ HISI_PTT_REQ_FILTERS_GRP_NAME);
+}
+
+static int hisi_ptt_create_filter_attr(struct hisi_ptt *hisi_ptt,
+ struct hisi_ptt_filter_desc *filter)
+{
+ int ret;
+
+ if (filter->is_port)
+ ret = hisi_ptt_create_rp_filter_attr(hisi_ptt, filter);
+ else
+ ret = hisi_ptt_create_req_filter_attr(hisi_ptt, filter);
+
+ if (ret)
+ pci_err(hisi_ptt->pdev, "failed to create sysfs attribute for filter %s\n",
+ filter->name);
+
+ return ret;
+}
+
+static void hisi_ptt_remove_filter_attr(struct hisi_ptt *hisi_ptt,
+ struct hisi_ptt_filter_desc *filter)
+{
+ if (filter->is_port)
+ hisi_ptt_remove_rp_filter_attr(hisi_ptt, filter);
+ else
+ hisi_ptt_remove_req_filter_attr(hisi_ptt, filter);
+}
+
+static void hisi_ptt_remove_all_filter_attributes(void *data)
+{
+ struct hisi_ptt_filter_desc *filter;
+ struct hisi_ptt *hisi_ptt = data;
+
+ mutex_lock(&hisi_ptt->filter_lock);
+
+ list_for_each_entry(filter, &hisi_ptt->req_filters, list)
+ hisi_ptt_remove_filter_attr(hisi_ptt, filter);
+
+ list_for_each_entry(filter, &hisi_ptt->port_filters, list)
+ hisi_ptt_remove_filter_attr(hisi_ptt, filter);
+
+ hisi_ptt->sysfs_inited = false;
+ mutex_unlock(&hisi_ptt->filter_lock);
+}
+
+static int hisi_ptt_init_filter_attributes(struct hisi_ptt *hisi_ptt)
+{
+ struct hisi_ptt_filter_desc *filter;
+ int ret;
+
+ mutex_lock(&hisi_ptt->filter_lock);
+
+ list_for_each_entry(filter, &hisi_ptt->port_filters, list) {
+ ret = hisi_ptt_create_filter_attr(hisi_ptt, filter);
+ if (ret)
+ goto err;
+ }
+
+ list_for_each_entry(filter, &hisi_ptt->req_filters, list) {
+ ret = hisi_ptt_create_filter_attr(hisi_ptt, filter);
+ if (ret)
+ goto err;
+ }
+
+ ret = devm_add_action_or_reset(&hisi_ptt->pdev->dev,
+ hisi_ptt_remove_all_filter_attributes,
+ hisi_ptt);
+ if (ret)
+ goto err;
+
+ hisi_ptt->sysfs_inited = true;
+ mutex_unlock(&hisi_ptt->filter_lock);
+ return 0;
+err:
+ mutex_unlock(&hisi_ptt->filter_lock);
+ return ret;
+}
+
static void hisi_ptt_update_filters(struct work_struct *work)
{
struct delayed_work *delayed_work = to_delayed_work(work);
@@ -384,8 +517,28 @@ static void hisi_ptt_update_filters(struct work_struct *work)
continue;
}
+ filter->name = kstrdup(pci_name(info.pdev), GFP_KERNEL);
+ if (!filter->name) {
+ pci_err(hisi_ptt->pdev, "failed to add filter %s\n",
+ pci_name(info.pdev));
+ kfree(filter);
+ continue;
+ }
+
filter->devid = devid;
filter->is_port = is_port;
+
+ /*
+ * If filters' sysfs entries hasn't been initialized, then
+ * we're still at probe stage and leave it to handled by
+ * others.
+ */
+ if (hisi_ptt->sysfs_inited &&
+ hisi_ptt_create_filter_attr(hisi_ptt, filter)) {
+ kfree(filter);
+ continue;
+ }
+
list_add_tail(&filter->list, target_list);
if (is_port)
@@ -394,6 +547,11 @@ static void hisi_ptt_update_filters(struct work_struct *work)
list_for_each_entry(filter, target_list, list)
if (filter->devid == devid) {
list_del(&filter->list);
+
+ if (hisi_ptt->sysfs_inited)
+ hisi_ptt_remove_filter_attr(hisi_ptt, filter);
+
+ kfree(filter->name);
kfree(filter);
break;
}
@@ -486,10 +644,12 @@ static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data)
* through the log. Other functions of PTT device are still available.
*/
filter = kzalloc(sizeof(*filter), GFP_KERNEL);
- if (!filter) {
- pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev));
- return -ENOMEM;
- }
+ if (!filter)
+ goto err_mem;
+
+ filter->name = kstrdup(pci_name(pdev), GFP_KERNEL);
+ if (!filter->name)
+ goto err_name;
filter->devid = PCI_DEVID(pdev->bus->number, pdev->devfn);
@@ -504,6 +664,11 @@ static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data)
}
return 0;
+err_name:
+ kfree(filter);
+err_mem:
+ pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev));
+ return -ENOMEM;
}
static void hisi_ptt_release_filters(void *data)
@@ -513,11 +678,13 @@ static void hisi_ptt_release_filters(void *data)
list_for_each_entry_safe(filter, tmp, &hisi_ptt->req_filters, list) {
list_del(&filter->list);
+ kfree(filter->name);
kfree(filter);
}
list_for_each_entry_safe(filter, tmp, &hisi_ptt->port_filters, list) {
list_del(&filter->list);
+ kfree(filter->name);
kfree(filter);
}
}
@@ -653,10 +820,58 @@ static struct attribute_group hisi_ptt_pmu_format_group = {
.attrs = hisi_ptt_pmu_format_attrs,
};
+static ssize_t hisi_ptt_filter_multiselect_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct dev_ext_attribute *ext_attr;
+
+ ext_attr = container_of(attr, struct dev_ext_attribute, attr);
+ return sysfs_emit(buf, "%s\n", (char *)ext_attr->var);
+}
+
+static struct dev_ext_attribute root_port_filters_multiselect = {
+ .attr = {
+ .attr = { .name = "multiselect", .mode = 0400 },
+ .show = hisi_ptt_filter_multiselect_show,
+ },
+ .var = "1",
+};
+
+static struct attribute *hisi_ptt_pmu_root_ports_attrs[] = {
+ &root_port_filters_multiselect.attr.attr,
+ NULL
+};
+
+static struct attribute_group hisi_ptt_pmu_root_ports_group = {
+ .name = HISI_PTT_RP_FILTERS_GRP_NAME,
+ .attrs = hisi_ptt_pmu_root_ports_attrs,
+};
+
+static struct dev_ext_attribute requester_filters_multiselect = {
+ .attr = {
+ .attr = { .name = "multiselect", .mode = 0400 },
+ .show = hisi_ptt_filter_multiselect_show,
+ },
+ .var = "0",
+};
+
+static struct attribute *hisi_ptt_pmu_requesters_attrs[] = {
+ &requester_filters_multiselect.attr.attr,
+ NULL
+};
+
+static struct attribute_group hisi_ptt_pmu_requesters_group = {
+ .name = HISI_PTT_REQ_FILTERS_GRP_NAME,
+ .attrs = hisi_ptt_pmu_requesters_attrs,
+};
+
static const struct attribute_group *hisi_ptt_pmu_groups[] = {
&hisi_ptt_cpumask_attr_group,
&hisi_ptt_pmu_format_group,
&hisi_ptt_tune_group,
+ &hisi_ptt_pmu_root_ports_group,
+ &hisi_ptt_pmu_requesters_group,
NULL
};
@@ -1137,6 +1352,12 @@ static int hisi_ptt_probe(struct pci_dev *pdev,
return ret;
}
+ ret = hisi_ptt_init_filter_attributes(hisi_ptt);
+ if (ret) {
+ pci_err(pdev, "failed to init sysfs filter attributes, ret = %d", ret);
+ return ret;
+ }
+
return 0;
}
diff --git a/drivers/hwtracing/ptt/hisi_ptt.h b/drivers/hwtracing/ptt/hisi_ptt.h
index b1ba638fe7ea..21c7d99f94e5 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.h
+++ b/drivers/hwtracing/ptt/hisi_ptt.h
@@ -11,6 +11,7 @@
#include <linux/bits.h>
#include <linux/cpumask.h>
+#include <linux/device.h>
#include <linux/kfifo.h>
#include <linux/list.h>
#include <linux/mutex.h>
@@ -139,15 +140,28 @@ struct hisi_ptt_trace_ctrl {
u32 type:4;
};
+/*
+ * sysfs attribute group name for root port filters and requester filters:
+ * /sys/devices/hisi_ptt<sicl_id>_<core_id>/root_port_filters
+ * and
+ * /sys/devices/hisi_ptt<sicl_id>_<core_id>/requester_filters
+ */
+#define HISI_PTT_RP_FILTERS_GRP_NAME "root_port_filters"
+#define HISI_PTT_REQ_FILTERS_GRP_NAME "requester_filters"
+
/**
* struct hisi_ptt_filter_desc - Descriptor of the PTT trace filter
+ * @attr: sysfs attribute of this filter
* @list: entry of this descriptor in the filter list
* @is_port: the PCI device of the filter is a Root Port or not
+ * @name: name of this filter, same as the name of the related PCI device
* @devid: the PCI device's devid of the filter
*/
struct hisi_ptt_filter_desc {
+ struct device_attribute attr;
struct list_head list;
bool is_port;
+ char *name;
u16 devid;
};
@@ -189,6 +203,7 @@ struct hisi_ptt_pmu_buf {
* @port_filters: the filter list of root ports
* @req_filters: the filter list of requester ID
* @filter_lock: lock to protect the filters
+ * @sysfs_inited: whether the filters' sysfs entries has been initialized
* @port_mask: port mask of the managed root ports
* @work: delayed work for filter updating
* @filter_update_lock: spinlock to protect the filter update fifo
@@ -216,6 +231,7 @@ struct hisi_ptt {
struct list_head port_filters;
struct list_head req_filters;
struct mutex filter_lock;
+ bool sysfs_inited;
u16 port_mask;
/*
--
2.24.0
A gentle ping...
Thanks.
On 2023/3/15 17:43, Yicong Yang wrote:
> From: Yicong Yang <[email protected]>
>
> This series tends to improve the PTT's filter interface in 2 aspects (Patch 2&3):
> - Support dynamically filter updating to response to hotplug
> Previous the supported filter list is settled down once the driver probed and
> it maybe out-of-date if hotplug events happen later. User need to reload the
> driver to update list. Patch 1/2 enable the driver to update the list by
> registering a PCI bus notifier and the filter list will always be the latest.
> - Export the available filters through sysfs
> Previous user needs to calculate the filters and filter value using device's
> BDF number, which requires the user to know the hardware well. Patch 3/3 tends
> to export the available filter information through sysfs attributes, the filter
> value will be gotten by reading the file. This will be more user friendly.
>
> Also includes a fix and an improve. Patch 1 tends to only export the online CPUs
> supported by the PTT, this will make perf work properly when there's offline CPUs
> within the node PTT locates. Patch 4 tends to set proper PMU capability to avoid
> collecting unnecessary data to save the storage.
>
> Yicong Yang (4):
> hwtracing: hisi_ptt: Make cpumask only present online CPUs
> hwtracing: hisi_ptt: Add support for dynamically updating the filter
> list
> hwtracing: hisi_ptt: Export available filters through sysfs
> hwtracing: hisi_ptt: Advertise PERF_PMU_CAP_NO_EXCLUDE for PTT PMU
>
> .../ABI/testing/sysfs-devices-hisi_ptt | 50 +++
> Documentation/trace/hisi-ptt.rst | 12 +-
> drivers/hwtracing/ptt/hisi_ptt.c | 396 +++++++++++++++++-
> drivers/hwtracing/ptt/hisi_ptt.h | 51 +++
> 4 files changed, 498 insertions(+), 11 deletions(-)
>
On Wed, 15 Mar 2023 17:43:13 +0800
Yicong Yang <[email protected]> wrote:
> From: Yicong Yang <[email protected]>
>
> perf will try to start PTT trace on every CPU presented in cpumask sysfs
> attribute and it will fail to start on offline CPUs(see the comments in
> perf_event_open()). But the driver is using cpumask_of_node() to export
> the available cpumask which may include offline CPUs and may fail the
> perf unintendedly. Fix this by only export the online CPUs of the node.
There isn't clear documentation that I can find for cpumask_of_node()
and chasing through on arm64 (which is what we care about for this driver)
it's maintained via numa_add_cpu() numa_remove_cpu()
Those are called in arch/arm64/kernel/smp.c in locations that are closely coupled
with set_cpu_online(cpu, XXX);
https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L246
https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L303
Now there are races when the two might not be in sync but in this case
we are just exposing the result to userspace, so chances of a race
after this sysfs attribute has been read seems much higher to me and
I don't think we can do anything about that.
Is there another path that I'm missing where online and node masks are out
of sync?
Jonathan
>
> Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
> Signed-off-by: Yicong Yang <[email protected]>
> ---
> drivers/hwtracing/ptt/hisi_ptt.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
> index 30f1525639b5..0a10c7ec46ad 100644
> --- a/drivers/hwtracing/ptt/hisi_ptt.c
> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
> @@ -487,9 +487,18 @@ static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
> - const cpumask_t *cpumask = cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev));
> + cpumask_var_t mask;
> + ssize_t n;
>
> - return cpumap_print_to_pagebuf(true, buf, cpumask);
> + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> + return 0;
> +
> + cpumask_and(mask, cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)),
> + cpu_online_mask);
> + n = cpumap_print_to_pagebuf(true, buf, mask);
> + free_cpumask_var(mask);
> +
> + return n;
> }
> static DEVICE_ATTR_RO(cpumask);
>
On Wed, 15 Mar 2023 17:43:14 +0800
Yicong Yang <[email protected]> wrote:
> From: Yicong Yang <[email protected]>
>
> The PCIe devices supported by the PTT trace can be removed/rescanned by
> hotplug or through sysfs. Add support for dynamically updating the
> available filter list by registering a PCI bus notifier block. Then user
> can always get latest information about available tracing filters and
> driver can block the invalid filters of which related devices no longer
> exist in the system.
>
> Signed-off-by: Yicong Yang <[email protected]>
> ---
Just a few trivial comments on this.
With those tidied up
Reviewed-by: Jonathan Cameron <[email protected]>
...
> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
> index 0a10c7ec46ad..010cdbc3c172 100644
> --- a/drivers/hwtracing/ptt/hisi_ptt.c
> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
> +/*
> + * A PCI bus notifier is used here for dynamically updating the filter
> + * list.
> + */
> +static int hisi_ptt_notifier_call(struct notifier_block *nb, unsigned long action,
> + void *data)
> +{
> + struct hisi_ptt *hisi_ptt = container_of(nb, struct hisi_ptt, hisi_ptt_nb);
> + struct hisi_ptt_filter_update_info info;
> + struct device *dev = data;
> + struct pci_dev *pdev = to_pci_dev(dev);
This local variable doesn't add anything over
info.pdev = to_pci_dev(dev);
> +
> + info.pdev = pdev;
> +
> + switch (action) {
> + case BUS_NOTIFY_ADD_DEVICE:
> + info.is_add = true;
> + break;
> + case BUS_NOTIFY_DEL_DEVICE:
> + info.is_add = false;
> + break;
> + default:
> + return 0;
> + }
> +
> + hisi_ptt_update_fifo_in(hisi_ptt, &info);
> +
> + return 0;
> +}
> +
> diff --git a/drivers/hwtracing/ptt/hisi_ptt.h b/drivers/hwtracing/ptt/hisi_ptt.h
> index 5beb1648c93a..b1ba638fe7ea 100644
> --- a/drivers/hwtracing/ptt/hisi_ptt.h
> +++ b/drivers/hwtracing/ptt/hisi_ptt.h
> @@ -11,12 +11,15 @@
> /**
> * struct hisi_ptt_pmu_buf - Descriptor of the AUX buffer of PTT trace
> * @length: size of the AUX buffer
> @@ -170,10 +188,15 @@ struct hisi_ptt_pmu_buf {
> * @lower_bdf: the lower BDF range of the PCI devices managed by this PTT device
> * @port_filters: the filter list of root ports
> * @req_filters: the filter list of requester ID
> + * @filter_lock: lock to protect the filters
> * @port_mask: port mask of the managed root ports
> + * @work: delayed work for filter updating
> + * @filter_update_lock: spinlock to protect the filter update fifo
> + * @filter_update_fifo: fifo of the filters waiting to update the filter list
> */
> struct hisi_ptt {
> struct hisi_ptt_trace_ctrl trace_ctrl;
> + struct notifier_block hisi_ptt_nb;
Docs update for this one?
> struct hlist_node hotplug_node;
> struct pmu hisi_ptt_pmu;
> void __iomem *iobase;
> @@ -192,7 +215,19 @@ struct hisi_ptt {
> */
> struct list_head port_filters;
> struct list_head req_filters;
> + struct mutex filter_lock;
> u16 port_mask;
> +
> + /*
> + * We use a delayed work here to avoid indefinitely waiting for
> + * the hisi_ptt->mutex which protecting the filter list. The
> + * work will be delayed only if the mutex can not be held,
> + * otherwise no delay will be applied.
> + */
> + struct delayed_work work;
> + spinlock_t filter_update_lock;
> + DECLARE_KFIFO(filter_update_kfifo, struct hisi_ptt_filter_update_info,
> + HISI_PTT_FILTER_UPDATE_FIFO_SIZE);
> };
>
> #define to_hisi_ptt(pmu) container_of(pmu, struct hisi_ptt, hisi_ptt_pmu)
On Wed, 15 Mar 2023 17:43:16 +0800
Yicong Yang <[email protected]> wrote:
> From: Yicong Yang <[email protected]>
>
> The PTT trace collects PCIe TLP headers from the PCIe link and don't
> have the ability to exclude certain context. It doesn't support itrace
> as well. So only advertise PERF_PMU_CAP_NO_EXCLUDE. This will greatly
> save the storage of final data. Tested tracing idle link for ~15s,
> without this patch we'll collect ~28.682MB data for context related
> information and with this patch it reduced to ~0.226MB.
>
> Signed-off-by: Yicong Yang <[email protected]>
Seems reasonable.
Reviewed-by: Jonathan Cameron <[email protected]>
> ---
> drivers/hwtracing/ptt/hisi_ptt.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
> index a5cd87edb813..8c1cce32b83f 100644
> --- a/drivers/hwtracing/ptt/hisi_ptt.c
> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
> @@ -1216,7 +1216,7 @@ static int hisi_ptt_register_pmu(struct hisi_ptt *hisi_ptt)
>
> hisi_ptt->hisi_ptt_pmu = (struct pmu) {
> .module = THIS_MODULE,
> - .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
> + .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
> .task_ctx_nr = perf_sw_context,
> .attr_groups = hisi_ptt_pmu_groups,
> .event_init = hisi_ptt_pmu_event_init,
On Wed, 15 Mar 2023 17:43:15 +0800
Yicong Yang <[email protected]> wrote:
> From: Yicong Yang <[email protected]>
>
> The PTT can only filter the traced TLP headers by the Root Ports or the
> Requester ID of the Endpoint, which are located on the same core of the
> PTT device. The filter value used is derived from the BDF number of the
> supported Root Port or the Endpoint. It's not friendly enough for the
> users since it requires the user to be familiar enough with the platform
> and calculate the filter value manually.
>
> This patch export the available filters through sysfs. Each available
> filters is presented as an individual file with the name of the BDF
> number of the related PCIe device. The files are created under
> $(PTT PMU dir)/available_root_port_filters and
> $(PTT PMU dir)/available_requester_filters respectively. The filter
> value can be known by reading the related file.
>
> Then the users can easily know the available filters for trace and get
> the filter values without calculating.
>
> Signed-off-by: Yicong Yang <[email protected]>
Trivial comments only inline.
With those answered / tidied up.
Reviewed-by: Jonathan Cameron <[email protected]>
> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
> index 010cdbc3c172..a5cd87edb813 100644
> --- a/drivers/hwtracing/ptt/hisi_ptt.c
> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
>
> +
> +static int hisi_ptt_init_filter_attributes(struct hisi_ptt *hisi_ptt)
> +{
> + struct hisi_ptt_filter_desc *filter;
> + int ret;
> +
> + mutex_lock(&hisi_ptt->filter_lock);
> +
> + list_for_each_entry(filter, &hisi_ptt->port_filters, list) {
> + ret = hisi_ptt_create_filter_attr(hisi_ptt, filter);
> + if (ret)
> + goto err;
> + }
> +
> + list_for_each_entry(filter, &hisi_ptt->req_filters, list) {
> + ret = hisi_ptt_create_filter_attr(hisi_ptt, filter);
> + if (ret)
> + goto err;
> + }
> +
> + ret = devm_add_action_or_reset(&hisi_ptt->pdev->dev,
> + hisi_ptt_remove_all_filter_attributes,
> + hisi_ptt);
> + if (ret)
> + goto err;
> +
> + hisi_ptt->sysfs_inited = true;
err:
> + mutex_unlock(&hisi_ptt->filter_lock);
return ret;
No need for separate exit block when nothing to do but unlock.
> + return 0;
> +err:
> + mutex_unlock(&hisi_ptt->filter_lock);
> + return ret;
> +}
> +
> static void hisi_ptt_update_filters(struct work_struct *work)
> {
> struct delayed_work *delayed_work = to_delayed_work(work);
> @@ -384,8 +517,28 @@ static void hisi_ptt_update_filters(struct work_struct *work)
> continue;
> }
>
> + filter->name = kstrdup(pci_name(info.pdev), GFP_KERNEL);
> + if (!filter->name) {
> + pci_err(hisi_ptt->pdev, "failed to add filter %s\n",
> + pci_name(info.pdev));
> + kfree(filter);
> + continue;
> + }
> +
> filter->devid = devid;
> filter->is_port = is_port;
> +
> + /*
> + * If filters' sysfs entries hasn't been initialized, then
> + * we're still at probe stage and leave it to handled by
> + * others.
> + */
> + if (hisi_ptt->sysfs_inited &&
Can we move this sysfs_inited check earlier? Seems silly to leave a simple check
like that so late.
> + hisi_ptt_create_filter_attr(hisi_ptt, filter)) {
> + kfree(filter);
> + continue;
> + }
> +
> list_add_tail(&filter->list, target_list);
>
> if (is_port)
> @@ -394,6 +547,11 @@ static void hisi_ptt_update_filters(struct work_struct *work)
> list_for_each_entry(filter, target_list, list)
> if (filter->devid == devid) {
> list_del(&filter->list);
> +
> + if (hisi_ptt->sysfs_inited)
> + hisi_ptt_remove_filter_attr(hisi_ptt, filter);
> +
> + kfree(filter->name);
> kfree(filter);
> break;
> }
> @@ -486,10 +644,12 @@ static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data)
> * through the log. Other functions of PTT device are still available.
> */
> filter = kzalloc(sizeof(*filter), GFP_KERNEL);
> - if (!filter) {
> - pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev));
> - return -ENOMEM;
> - }
> + if (!filter)
> + goto err_mem;
> +
> + filter->name = kstrdup(pci_name(pdev), GFP_KERNEL);
> + if (!filter->name)
> + goto err_name;
>
> filter->devid = PCI_DEVID(pdev->bus->number, pdev->devfn);
>
> @@ -504,6 +664,11 @@ static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data)
> }
>
> return 0;
> +err_name:
> + kfree(filter);
> +err_mem:
> + pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev));
I'd rather see a message for each of the error paths so we have some information on why.
Original message wasn't great for this obviously and perhaps given they are both allocation
errors it's not worth splitting them up.
> + return -ENOMEM;
> }
On 2023/3/29 0:51, Jonathan Cameron wrote:
> On Wed, 15 Mar 2023 17:43:14 +0800
> Yicong Yang <[email protected]> wrote:
>
>> From: Yicong Yang <[email protected]>
>>
>> The PCIe devices supported by the PTT trace can be removed/rescanned by
>> hotplug or through sysfs. Add support for dynamically updating the
>> available filter list by registering a PCI bus notifier block. Then user
>> can always get latest information about available tracing filters and
>> driver can block the invalid filters of which related devices no longer
>> exist in the system.
>>
>> Signed-off-by: Yicong Yang <[email protected]>
>> ---
>
> Just a few trivial comments on this.
>
> With those tidied up
> Reviewed-by: Jonathan Cameron <[email protected]>
Thanks for the comment, will fix in next version.
>
>
>
> ...
>
>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
>> index 0a10c7ec46ad..010cdbc3c172 100644
>> --- a/drivers/hwtracing/ptt/hisi_ptt.c
>> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
>
>> +/*
>> + * A PCI bus notifier is used here for dynamically updating the filter
>> + * list.
>> + */
>> +static int hisi_ptt_notifier_call(struct notifier_block *nb, unsigned long action,
>> + void *data)
>> +{
>> + struct hisi_ptt *hisi_ptt = container_of(nb, struct hisi_ptt, hisi_ptt_nb);
>> + struct hisi_ptt_filter_update_info info;
>> + struct device *dev = data;
>> + struct pci_dev *pdev = to_pci_dev(dev);
>
> This local variable doesn't add anything over
>
> info.pdev = to_pci_dev(dev);
>
ok, will drop it.
>
>
>> +
>> + info.pdev = pdev;
>> +
>> + switch (action) {
>> + case BUS_NOTIFY_ADD_DEVICE:
>> + info.is_add = true;
>> + break;
>> + case BUS_NOTIFY_DEL_DEVICE:
>> + info.is_add = false;
>> + break;
>> + default:
>> + return 0;
>> + }
>> +
>> + hisi_ptt_update_fifo_in(hisi_ptt, &info);
>> +
>> + return 0;
>> +}
>> +
>
>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.h b/drivers/hwtracing/ptt/hisi_ptt.h
>> index 5beb1648c93a..b1ba638fe7ea 100644
>> --- a/drivers/hwtracing/ptt/hisi_ptt.h
>> +++ b/drivers/hwtracing/ptt/hisi_ptt.h
>> @@ -11,12 +11,15 @@
>
>> /**
>> * struct hisi_ptt_pmu_buf - Descriptor of the AUX buffer of PTT trace
>> * @length: size of the AUX buffer
>> @@ -170,10 +188,15 @@ struct hisi_ptt_pmu_buf {
>> * @lower_bdf: the lower BDF range of the PCI devices managed by this PTT device
>> * @port_filters: the filter list of root ports
>> * @req_filters: the filter list of requester ID
>> + * @filter_lock: lock to protect the filters
>> * @port_mask: port mask of the managed root ports
>> + * @work: delayed work for filter updating
>> + * @filter_update_lock: spinlock to protect the filter update fifo
>> + * @filter_update_fifo: fifo of the filters waiting to update the filter list
>> */
>> struct hisi_ptt {
>> struct hisi_ptt_trace_ctrl trace_ctrl;
>> + struct notifier_block hisi_ptt_nb;
>
> Docs update for this one?
>
sorry for missing this. will fix.
Thanks,
Yicong
>> struct hlist_node hotplug_node;
>> struct pmu hisi_ptt_pmu;
>> void __iomem *iobase;
>> @@ -192,7 +215,19 @@ struct hisi_ptt {
>> */
>> struct list_head port_filters;
>> struct list_head req_filters;
>> + struct mutex filter_lock;
>> u16 port_mask;
>> +
>> + /*
>> + * We use a delayed work here to avoid indefinitely waiting for
>> + * the hisi_ptt->mutex which protecting the filter list. The
>> + * work will be delayed only if the mutex can not be held,
>> + * otherwise no delay will be applied.
>> + */
>> + struct delayed_work work;
>> + spinlock_t filter_update_lock;
>> + DECLARE_KFIFO(filter_update_kfifo, struct hisi_ptt_filter_update_info,
>> + HISI_PTT_FILTER_UPDATE_FIFO_SIZE);
>> };
>>
>> #define to_hisi_ptt(pmu) container_of(pmu, struct hisi_ptt, hisi_ptt_pmu)
>
> .
>
On 2023/3/29 1:02, Jonathan Cameron wrote:
> On Wed, 15 Mar 2023 17:43:15 +0800
> Yicong Yang <[email protected]> wrote:
>
>> From: Yicong Yang <[email protected]>
>>
>> The PTT can only filter the traced TLP headers by the Root Ports or the
>> Requester ID of the Endpoint, which are located on the same core of the
>> PTT device. The filter value used is derived from the BDF number of the
>> supported Root Port or the Endpoint. It's not friendly enough for the
>> users since it requires the user to be familiar enough with the platform
>> and calculate the filter value manually.
>>
>> This patch export the available filters through sysfs. Each available
>> filters is presented as an individual file with the name of the BDF
>> number of the related PCIe device. The files are created under
>> $(PTT PMU dir)/available_root_port_filters and
>> $(PTT PMU dir)/available_requester_filters respectively. The filter
>> value can be known by reading the related file.
>>
>> Then the users can easily know the available filters for trace and get
>> the filter values without calculating.
>>
>> Signed-off-by: Yicong Yang <[email protected]>
>
> Trivial comments only inline.
>
> With those answered / tidied up.
>
> Reviewed-by: Jonathan Cameron <[email protected]>
>
>
>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
>> index 010cdbc3c172..a5cd87edb813 100644
>> --- a/drivers/hwtracing/ptt/hisi_ptt.c
>> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
>
>
>>
>> +
>> +static int hisi_ptt_init_filter_attributes(struct hisi_ptt *hisi_ptt)
>> +{
>> + struct hisi_ptt_filter_desc *filter;
>> + int ret;
>> +
>> + mutex_lock(&hisi_ptt->filter_lock);
>> +
>> + list_for_each_entry(filter, &hisi_ptt->port_filters, list) {
>> + ret = hisi_ptt_create_filter_attr(hisi_ptt, filter);
>> + if (ret)
>> + goto err;
>> + }
>> +
>> + list_for_each_entry(filter, &hisi_ptt->req_filters, list) {
>> + ret = hisi_ptt_create_filter_attr(hisi_ptt, filter);
>> + if (ret)
>> + goto err;
>> + }
>> +
>> + ret = devm_add_action_or_reset(&hisi_ptt->pdev->dev,
>> + hisi_ptt_remove_all_filter_attributes,
>> + hisi_ptt);
>> + if (ret)
>> + goto err;
>> +
>> + hisi_ptt->sysfs_inited = true;
>
> err:
>
>> + mutex_unlock(&hisi_ptt->filter_lock);
>
> return ret;
>
> No need for separate exit block when nothing to do but unlock.
>
ok. will refine here.
>> + return 0;
>> +err:
>> + mutex_unlock(&hisi_ptt->filter_lock);
>> + return ret;
>> +}
>> +
>> static void hisi_ptt_update_filters(struct work_struct *work)
>> {
>> struct delayed_work *delayed_work = to_delayed_work(work);
>> @@ -384,8 +517,28 @@ static void hisi_ptt_update_filters(struct work_struct *work)
>> continue;
>> }
>>
>> + filter->name = kstrdup(pci_name(info.pdev), GFP_KERNEL);
>> + if (!filter->name) {
>> + pci_err(hisi_ptt->pdev, "failed to add filter %s\n",
>> + pci_name(info.pdev));
>> + kfree(filter);
>> + continue;
>> + }
>> +
>> filter->devid = devid;
>> filter->is_port = is_port;
>> +
>> + /*
>> + * If filters' sysfs entries hasn't been initialized, then
>> + * we're still at probe stage and leave it to handled by
>> + * others.
>> + */
>> + if (hisi_ptt->sysfs_inited &&
>
> Can we move this sysfs_inited check earlier? Seems silly to leave a simple check
> like that so late.
>
maybe move it into the hisi_ptt_create_filter_attr()? will have a check.
for here we still need to update filter list even if the hisi_ptt's sysfs is not
initialized yet.
>> + hisi_ptt_create_filter_attr(hisi_ptt, filter)) {
>> + kfree(filter);
>> + continue;
>> + }
>> +
>> list_add_tail(&filter->list, target_list);
>>
>> if (is_port)
>> @@ -394,6 +547,11 @@ static void hisi_ptt_update_filters(struct work_struct *work)
>> list_for_each_entry(filter, target_list, list)
>> if (filter->devid == devid) {
>> list_del(&filter->list);
>> +
>> + if (hisi_ptt->sysfs_inited)
>> + hisi_ptt_remove_filter_attr(hisi_ptt, filter);
>> +
>> + kfree(filter->name);
>> kfree(filter);
>> break;
>> }
>> @@ -486,10 +644,12 @@ static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data)
>> * through the log. Other functions of PTT device are still available.
>> */
>> filter = kzalloc(sizeof(*filter), GFP_KERNEL);
>> - if (!filter) {
>> - pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev));
>> - return -ENOMEM;
>> - }
>> + if (!filter)
>> + goto err_mem;
>> +
>> + filter->name = kstrdup(pci_name(pdev), GFP_KERNEL);
>> + if (!filter->name)
>> + goto err_name;
>>
>> filter->devid = PCI_DEVID(pdev->bus->number, pdev->devfn);
>>
>> @@ -504,6 +664,11 @@ static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data)
>> }
>>
>> return 0;
>> +err_name:
>> + kfree(filter);
>> +err_mem:
>> + pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev));
>
> I'd rather see a message for each of the error paths so we have some information on why.
> Original message wasn't great for this obviously and perhaps given they are both allocation
> errors it's not worth splitting them up.
>
ok, will try to split it and make it more verbosely.
Thanks,
Yicong
>> + return -ENOMEM;
>> }
>
>
> .
>
On 2023/3/29 0:24, Jonathan Cameron wrote:
> On Wed, 15 Mar 2023 17:43:13 +0800
> Yicong Yang <[email protected]> wrote:
>
>> From: Yicong Yang <[email protected]>
>>
>> perf will try to start PTT trace on every CPU presented in cpumask sysfs
>> attribute and it will fail to start on offline CPUs(see the comments in
>> perf_event_open()). But the driver is using cpumask_of_node() to export
>> the available cpumask which may include offline CPUs and may fail the
>> perf unintendedly. Fix this by only export the online CPUs of the node.
>
> There isn't clear documentation that I can find for cpumask_of_node()
> and chasing through on arm64 (which is what we care about for this driver)
> it's maintained via numa_add_cpu() numa_remove_cpu()
> Those are called in arch/arm64/kernel/smp.c in locations that are closely coupled
> with set_cpu_online(cpu, XXX);
> https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L246
> https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L303
>
> Now there are races when the two might not be in sync but in this case
> we are just exposing the result to userspace, so chances of a race
> after this sysfs attribute has been read seems much higher to me and
> I don't think we can do anything about that.
>
> Is there another path that I'm missing where online and node masks are out
> of sync?
>
maybe no. This patch maybe incorrect and I need more investigation, so let's me
drop it from the series. Tested and everything seems fine now.
I found this problem and referred to commit 064f0e9302af ("mm: only display online cpus of the numa node")
which might be the same problem. But seems unnecessary that cpumask_of_node()
already include online CPUs only.
Thanks.
> Jonathan
>
>
>>
>> Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
>> Signed-off-by: Yicong Yang <[email protected]>
>
>> ---
>> drivers/hwtracing/ptt/hisi_ptt.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
>> index 30f1525639b5..0a10c7ec46ad 100644
>> --- a/drivers/hwtracing/ptt/hisi_ptt.c
>> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
>> @@ -487,9 +487,18 @@ static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
>> char *buf)
>> {
>> struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
>> - const cpumask_t *cpumask = cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev));
>> + cpumask_var_t mask;
>> + ssize_t n;
>>
>> - return cpumap_print_to_pagebuf(true, buf, cpumask);
>> + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
>> + return 0;
>> +
>> + cpumask_and(mask, cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)),
>> + cpu_online_mask);
>> + n = cpumap_print_to_pagebuf(true, buf, mask);
>> + free_cpumask_var(mask);
>> +
>> + return n;
>> }
>> static DEVICE_ATTR_RO(cpumask);
>>
>
> .
>
On Thu, 30 Mar 2023 11:53:14 +0800
Yicong Yang <[email protected]> wrote:
> On 2023/3/29 0:24, Jonathan Cameron wrote:
> > On Wed, 15 Mar 2023 17:43:13 +0800
> > Yicong Yang <[email protected]> wrote:
> >
> >> From: Yicong Yang <[email protected]>
> >>
> >> perf will try to start PTT trace on every CPU presented in cpumask sysfs
> >> attribute and it will fail to start on offline CPUs(see the comments in
> >> perf_event_open()). But the driver is using cpumask_of_node() to export
> >> the available cpumask which may include offline CPUs and may fail the
> >> perf unintendedly. Fix this by only export the online CPUs of the node.
> >
> > There isn't clear documentation that I can find for cpumask_of_node()
> > and chasing through on arm64 (which is what we care about for this driver)
> > it's maintained via numa_add_cpu() numa_remove_cpu()
> > Those are called in arch/arm64/kernel/smp.c in locations that are closely coupled
> > with set_cpu_online(cpu, XXX);
> > https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L246
> > https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L303
> >
> > Now there are races when the two might not be in sync but in this case
> > we are just exposing the result to userspace, so chances of a race
> > after this sysfs attribute has been read seems much higher to me and
> > I don't think we can do anything about that.
> >
> > Is there another path that I'm missing where online and node masks are out
> > of sync?
> >
>
> maybe no. This patch maybe incorrect and I need more investigation, so let's me
> drop it from the series. Tested and everything seems fine now.
>
> I found this problem and referred to commit 064f0e9302af ("mm: only display online cpus of the numa node")
> which might be the same problem. But seems unnecessary that cpumask_of_node()
> already include online CPUs only.
Seems it was fixed up for arm64 in
7f954aa1a ("arm64: smp: remove cpu and numa topology information when hotplugging out CPMU")
If we could audit all the other architectures it would be great to document
the properties of this cpmuask and possibly simplify the code in the
path you highlight above (assuming no race conditions etc)
Jonathan
>
> Thanks.
>
> > Jonathan
> >
> >
> >>
> >> Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
> >> Signed-off-by: Yicong Yang <[email protected]>
> >
> >> ---
> >> drivers/hwtracing/ptt/hisi_ptt.c | 13 +++++++++++--
> >> 1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
> >> index 30f1525639b5..0a10c7ec46ad 100644
> >> --- a/drivers/hwtracing/ptt/hisi_ptt.c
> >> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
> >> @@ -487,9 +487,18 @@ static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
> >> char *buf)
> >> {
> >> struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
> >> - const cpumask_t *cpumask = cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev));
> >> + cpumask_var_t mask;
> >> + ssize_t n;
> >>
> >> - return cpumap_print_to_pagebuf(true, buf, cpumask);
> >> + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> >> + return 0;
> >> +
> >> + cpumask_and(mask, cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)),
> >> + cpu_online_mask);
> >> + n = cpumap_print_to_pagebuf(true, buf, mask);
> >> + free_cpumask_var(mask);
> >> +
> >> + return n;
> >> }
> >> static DEVICE_ATTR_RO(cpumask);
> >>
> >
> > .
> >