2020-11-21 02:05:59

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

The current implementation of fw_devlink is very inefficient because it
tries to get away without creating fwnode links in the name of saving
memory usage. Past attempts to optimize runtime at the cost of memory
usage were blocked with request for data showing that the optimization
made significant improvement for real world scenarios.

We have those scenarios now. There have been several reports of boot
time increase in the order of seconds in this thread [1]. Several OEMs
and SoC manufacturers have also privately reported significant
(350-400ms) increase in boot time due to all the parsing done by
fw_devlink.

So this patch uses all the setup done by the previous patches in this
series to refactor fw_devlink to be more efficient. Most of the code has
been moved out of firmware specific (DT mostly) code into driver core.

This brings the following benefits:
- Instead of parsing the device tree multiple times during bootup,
fw_devlink parses each fwnode node/property only once and creates
fwnode links. The rest of the fw_devlink code then just looks at these
fwnode links to do rest of the work.

- Makes it much easier to debug probe issue due to fw_devlink in the
future. fw_devlink=on blocks the probing of devices if they depend on
a device that hasn't been added yet. With this refactor, it'll be very
easy to tell what that device is because we now have a reference to
the fwnode of the device.

- Much easier to add fw_devlink support to ACPI and other firmware
types. A refactor to move the common bits from DT specific code to
driver core was in my TODO list as a prerequisite to adding ACPI
support to fw_devlink. This series gets that done.

[1] - https://lore.kernel.org/linux-omap/[email protected]/
Signed-off-by: Saravana Kannan <[email protected]>
Tested-by: Laurent Pinchart <[email protected]>
Tested-by: Grygorii Strashko <[email protected]>
---
drivers/base/core.c | 325 ++++++++++++++++++++++++++++++-----------
include/linux/device.h | 5 -
2 files changed, 238 insertions(+), 92 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 1873cecb0cc4..9edf9084fc98 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -46,8 +46,6 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
#endif

/* Device links support. */
-static LIST_HEAD(wait_for_suppliers);
-static DEFINE_MUTEX(wfs_lock);
static LIST_HEAD(deferred_sync);
static unsigned int defer_sync_state_count = 1;
static DEFINE_MUTEX(fwnode_link_lock);
@@ -803,74 +801,6 @@ struct device_link *device_link_add(struct device *consumer,
}
EXPORT_SYMBOL_GPL(device_link_add);

-/**
- * device_link_wait_for_supplier - Add device to wait_for_suppliers list
- * @consumer: Consumer device
- *
- * Marks the @consumer device as waiting for suppliers to become available by
- * adding it to the wait_for_suppliers list. The consumer device will never be
- * probed until it's removed from the wait_for_suppliers list.
- *
- * The caller is responsible for adding the links to the supplier devices once
- * they are available and removing the @consumer device from the
- * wait_for_suppliers list once links to all the suppliers have been created.
- *
- * This function is NOT meant to be called from the probe function of the
- * consumer but rather from code that creates/adds the consumer device.
- */
-static void device_link_wait_for_supplier(struct device *consumer,
- bool need_for_probe)
-{
- mutex_lock(&wfs_lock);
- list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
- consumer->links.need_for_probe = need_for_probe;
- mutex_unlock(&wfs_lock);
-}
-
-static void device_link_wait_for_mandatory_supplier(struct device *consumer)
-{
- device_link_wait_for_supplier(consumer, true);
-}
-
-static void device_link_wait_for_optional_supplier(struct device *consumer)
-{
- device_link_wait_for_supplier(consumer, false);
-}
-
-/**
- * device_link_add_missing_supplier_links - Add links from consumer devices to
- * supplier devices, leaving any
- * consumer with inactive suppliers on
- * the wait_for_suppliers list
- *
- * Loops through all consumers waiting on suppliers and tries to add all their
- * supplier links. If that succeeds, the consumer device is removed from
- * wait_for_suppliers list. Otherwise, they are left in the wait_for_suppliers
- * list. Devices left on the wait_for_suppliers list will not be probed.
- *
- * The fwnode add_links callback is expected to return 0 if it has found and
- * added all the supplier links for the consumer device. It should return an
- * error if it isn't able to do so.
- *
- * The caller of device_link_wait_for_supplier() is expected to call this once
- * it's aware of potential suppliers becoming available.
- */
-static void device_link_add_missing_supplier_links(void)
-{
- struct device *dev, *tmp;
-
- mutex_lock(&wfs_lock);
- list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
- links.needs_suppliers) {
- int ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
- if (!ret)
- list_del_init(&dev->links.needs_suppliers);
- else if (ret != -ENODEV)
- dev->links.need_for_probe = false;
- }
- mutex_unlock(&wfs_lock);
-}
-
#ifdef CONFIG_SRCU
static void __device_link_del(struct kref *kref)
{
@@ -1195,9 +1125,8 @@ void device_links_driver_bound(struct device *dev)
* the device links it needs to or make new device links as it needs
* them. So, it no longer needs to wait on any suppliers.
*/
- mutex_lock(&wfs_lock);
- list_del_init(&dev->links.needs_suppliers);
- mutex_unlock(&wfs_lock);
+ if (dev->fwnode && dev->fwnode->dev == dev)
+ fwnode_links_purge_suppliers(dev->fwnode);
device_remove_file(dev, &dev_attr_waiting_for_supplier);

device_links_write_lock();
@@ -1488,10 +1417,6 @@ static void device_links_purge(struct device *dev)
if (dev->class == &devlink_class)
return;

- mutex_lock(&wfs_lock);
- list_del(&dev->links.needs_suppliers);
- mutex_unlock(&wfs_lock);
-
/*
* Delete all of the remaining links from this device to any other
* devices (either consumers or suppliers).
@@ -1561,19 +1486,246 @@ static void fw_devlink_parse_fwtree(struct fwnode_handle *fwnode)
fw_devlink_parse_fwtree(child);
}

-static void fw_devlink_link_device(struct device *dev)
+/**
+ * fw_devlink_create_devlink - Create a device link from a consumer to fwnode
+ * @con - Consumer device for the device link
+ * @sup_handle - fwnode handle of supplier
+ *
+ * This function will try to create a device link between the consumer device
+ * @con and the supplier device represented by @sup_handle.
+ *
+ * The supplier has to be provided as a fwnode because incorrect cycles in
+ * fwnode links can sometimes cause the supplier device to never be created.
+ * This function detects such cases and returns an error if it cannot create a
+ * device link from the consumer to a missing supplier.
+ *
+ * Returns,
+ * 0 on successfully creating a device link
+ * -EINVAL if the device link cannot be created as expected
+ * -EAGAIN if the device link cannot be created right now, but it may be
+ * possible to do that in the future
+ */
+static int fw_devlink_create_devlink(struct device *con,
+ struct fwnode_handle *sup_handle, u32 flags)
+{
+ struct device *sup_dev;
+ int ret = 0;
+
+ sup_dev = get_dev_from_fwnode(sup_handle);
+ if (sup_dev) {
+ /*
+ * If this fails, it is due to cycles in device links. Just
+ * give up on this link and treat it as invalid.
+ */
+ if (!device_link_add(con, sup_dev, flags))
+ ret = -EINVAL;
+
+ goto out;
+ }
+
+ /*
+ * DL_FLAG_SYNC_STATE_ONLY doesn't block probing and supports
+ * cycles. So cycle detection isn't necessary and shouldn't be
+ * done.
+ */
+ if (flags & DL_FLAG_SYNC_STATE_ONLY)
+ return -EAGAIN;
+
+ /*
+ * If we can't find the supplier device from its fwnode, it might be
+ * due to a cyclic dependency between fwnodes. Some of these cycles can
+ * be broken by applying logic. Check for these types of cycles and
+ * break them so that devices in the cycle probe properly.
+ *
+ * If the supplier's parent is dependent on the consumer, then
+ * the consumer-supplier dependency is a false dependency. So,
+ * treat it as an invalid link.
+ */
+ sup_dev = fwnode_get_next_parent_dev(sup_handle);
+ if (sup_dev && device_is_dependent(con, sup_dev)) {
+ dev_dbg(con, "Not linking to %pfwP - False link\n",
+ sup_handle);
+ ret = -EINVAL;
+ } else {
+ /*
+ * Can't check for cycles or no cycles. So let's try
+ * again later.
+ */
+ ret = -EAGAIN;
+ }
+
+out:
+ put_device(sup_dev);
+ return ret;
+}
+
+/**
+ * __fw_devlink_link_to_consumers - Create device links to consumers of a device
+ * @dev - Device that needs to be linked to its consumers
+ *
+ * This function looks at all the consumer fwnodes of @dev and creates device
+ * links between the consumer device and @dev (supplier).
+ *
+ * If the consumer device has not been added yet, then this function creates a
+ * SYNC_STATE_ONLY link between @dev (supplier) and the closest ancestor device
+ * of the consumer fwnode. This is necessary to make sure @dev doesn't get a
+ * sync_state() callback before the real consumer device gets to be added and
+ * then probed.
+ *
+ * Once device links are created from the real consumer to @dev (supplier), the
+ * fwnode links are deleted.
+ */
+static void __fw_devlink_link_to_consumers(struct device *dev)
+{
+ struct fwnode_handle *fwnode = dev->fwnode;
+ struct fwnode_link *link, *tmp;
+
+ list_for_each_entry_safe(link, tmp, &fwnode->consumers, s_hook) {
+ u32 dl_flags = fw_devlink_get_flags();
+ struct device *con_dev;
+ bool own_link = true;
+ int ret;
+
+ con_dev = get_dev_from_fwnode(link->consumer);
+ /*
+ * If consumer device is not available yet, make a "proxy"
+ * SYNC_STATE_ONLY link from the consumer's parent device to
+ * the supplier device. This is necessary to make sure the
+ * supplier doesn't get a sync_state() callback before the real
+ * consumer can create a device link to the supplier.
+ *
+ * This proxy link step is needed to handle the case where the
+ * consumer's parent device is added before the supplier.
+ */
+ if (!con_dev) {
+ con_dev = fwnode_get_next_parent_dev(link->consumer);
+ /*
+ * However, if the consumer's parent device is also the
+ * parent of the supplier, don't create a
+ * consumer-supplier link from the parent to its child
+ * device. Such a dependency is impossible.
+ */
+ if (con_dev &&
+ fwnode_is_ancestor_of(con_dev->fwnode, fwnode)) {
+ put_device(con_dev);
+ con_dev = NULL;
+ } else {
+ own_link = false;
+ dl_flags = DL_FLAG_SYNC_STATE_ONLY;
+ }
+ }
+
+ if (!con_dev)
+ continue;
+
+ ret = fw_devlink_create_devlink(con_dev, fwnode, dl_flags);
+ put_device(con_dev);
+ if (!own_link || ret == -EAGAIN)
+ continue;
+
+ list_del(&link->s_hook);
+ list_del(&link->c_hook);
+ kfree(link);
+ }
+}
+
+/**
+ * __fw_devlink_link_to_suppliers - Create device links to suppliers of a device
+ * @dev - The consumer device that needs to be linked to its suppliers
+ * @fwnode - Root of the fwnode tree that is used to create device links
+ *
+ * This function looks at all the supplier fwnodes of fwnode tree rooted at
+ * @fwnode and creates device links between @dev (consumer) and all the
+ * supplier devices of the entire fwnode tree at @fwnode.
+ *
+ * The function creates normal (non-SYNC_STATE_ONLY) device links between @dev
+ * and the real suppliers of @dev. Once these device links are created, the
+ * fwnode links are deleted. When such device links are successfully created,
+ * this function is called recursively on those supplier devices. This is
+ * needed to detect and break some invalid cycles in fwnode links. See
+ * fw_devlink_create_devlink() for more details.
+ *
+ * In addition, it also looks at all the suppliers of the entire fwnode tree
+ * because some of the child devices of @dev that have not been added yet
+ * (because @dev hasn't probed) might already have their suppliers added to
+ * driver core. So, this function creates SYNC_STATE_ONLY device links between
+ * @dev (consumer) and these suppliers to make sure they don't execute their
+ * sync_state() callbacks before these child devices have a chance to create
+ * their device links. The fwnode links that correspond to the child devices
+ * aren't delete because they are needed later to create the device links
+ * between the real consumer and supplier devices.
+ */
+static void __fw_devlink_link_to_suppliers(struct device *dev,
+ struct fwnode_handle *fwnode)
{
- int fw_ret;
+ bool own_link = (dev->fwnode == fwnode);
+ struct fwnode_link *link, *tmp;
+ struct fwnode_handle *child = NULL;
+ u32 dl_flags;
+
+ if (own_link)
+ dl_flags = fw_devlink_get_flags();
+ else
+ dl_flags = DL_FLAG_SYNC_STATE_ONLY;

- device_link_add_missing_supplier_links();
+ list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook) {
+ int ret;
+ struct device *sup_dev;
+ struct fwnode_handle *sup = link->supplier;
+
+ ret = fw_devlink_create_devlink(dev, sup, dl_flags);
+ if (!own_link || ret == -EAGAIN)
+ continue;

- if (fw_devlink_flags && fwnode_has_op(dev->fwnode, add_links)) {
- fw_ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
- if (fw_ret == -ENODEV && !fw_devlink_is_permissive())
- device_link_wait_for_mandatory_supplier(dev);
- else if (fw_ret)
- device_link_wait_for_optional_supplier(dev);
+ list_del(&link->s_hook);
+ list_del(&link->c_hook);
+ kfree(link);
+
+ /* If no device link was created, nothing more to do. */
+ if (ret)
+ continue;
+
+ /*
+ * If a device link was successfully created to a supplier, we
+ * now need to try and link the supplier to all its suppliers.
+ *
+ * This is needed to detect and delete false dependencies in
+ * fwnode links that haven't been converted to a device link
+ * yet. See comments in fw_devlink_create_devlink() for more
+ * details on the false dependency.
+ *
+ * Without deleting these false dependencies, some devices will
+ * never probe because they'll keep waiting for their false
+ * dependency fwnode links to be converted to device links.
+ */
+ sup_dev = get_dev_from_fwnode(sup);
+ __fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
+ put_device(sup_dev);
}
+
+ /*
+ * Make "proxy" SYNC_STATE_ONLY device links to represent the needs of
+ * all the descendants. This proxy link step is needed to handle the
+ * case where the supplier is added before the consumer's parent device
+ * (@dev).
+ */
+ while ((child = fwnode_get_next_available_child_node(fwnode, child)))
+ __fw_devlink_link_to_suppliers(dev, child);
+}
+
+static void fw_devlink_link_device(struct device *dev)
+{
+ struct fwnode_handle *fwnode = dev->fwnode;
+
+ if (!fw_devlink_flags)
+ return;
+
+ fw_devlink_parse_fwtree(fwnode);
+
+ mutex_lock(&fwnode_link_lock);
+ __fw_devlink_link_to_consumers(dev);
+ __fw_devlink_link_to_suppliers(dev, fwnode);
+ mutex_unlock(&fwnode_link_lock);
}

/* Device links support end. */
@@ -2431,7 +2583,6 @@ void device_initialize(struct device *dev)
#endif
INIT_LIST_HEAD(&dev->links.consumers);
INIT_LIST_HEAD(&dev->links.suppliers);
- INIT_LIST_HEAD(&dev->links.needs_suppliers);
INIT_LIST_HEAD(&dev->links.defer_sync);
dev->links.status = DL_DEV_NO_DRIVER;
}
diff --git a/include/linux/device.h b/include/linux/device.h
index 1e771ea4dca6..89bb8b84173e 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -351,18 +351,13 @@ enum dl_dev_state {
* struct dev_links_info - Device data related to device links.
* @suppliers: List of links to supplier devices.
* @consumers: List of links to consumer devices.
- * @needs_suppliers: Hook to global list of devices waiting for suppliers.
* @defer_sync: Hook to global list of devices that have deferred sync_state.
- * @need_for_probe: If needs_suppliers is on a list, this indicates if the
- * suppliers are needed for probe or not.
* @status: Driver status information.
*/
struct dev_links_info {
struct list_head suppliers;
struct list_head consumers;
- struct list_head needs_suppliers;
struct list_head defer_sync;
- bool need_for_probe;
enum dl_dev_state status;
};

--
2.29.2.454.gaff20da3a2-goog


2020-11-23 15:49:17

by Oliver Sang

[permalink] [raw]
Subject: [driver core] 95f755a4ef: will-it-scale.per_process_ops 2.2% improvement


Greeting,

FYI, we noticed a 2.2% improvement of will-it-scale.per_process_ops due to commit:


commit: 95f755a4ef7b9ccbedf6012b4112a990120b6a6c ("[PATCH v2 16/17] driver core: Refactor fw_devlink feature")
url: https://github.com/0day-ci/linux/commits/Saravana-Kannan/Refactor-fw_devlink-to-significantly-improve-boot-time/20201121-100850
base: https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git linux-next

in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

nr_task: 50%
mode: process
test: futex3
cpufreq_governor: performance
ucode: 0x5003003

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

In addition to that, the commit also has significant impact on the following tests:

+------------------+---------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 5.2% improvement |
| test machine | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
| test parameters | cpufreq_governor=performance |
| | mode=process |
| | nr_task=50% |
| | test=dup1 |
| | ucode=0x5003003 |
+------------------+---------------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/futex3/will-it-scale/0x5003003

commit:
03b7843215 ("efi: Update implementation of add_links() to create fwnode links")
95f755a4ef ("driver core: Refactor fw_devlink feature")

03b7843215575338 95f755a4ef7b9ccbedf6012b411
---------------- ---------------------------
%stddev %change %stddev
\ | \
10181734 +2.2% 10402895 will-it-scale.per_process_ops
9.774e+08 +2.2% 9.987e+08 will-it-scale.workload
13832427 ? 6% -10.8% 12342804 ? 4% meminfo.DirectMap2M
2454 -5.0% 2332 ? 2% vmstat.system.cs
16233 ? 29% +172.3% 44205 ? 33% numa-vmstat.node0.nr_anon_pages
17076 ? 33% +162.3% 44793 ? 35% numa-vmstat.node0.nr_inactive_anon
17076 ? 33% +162.3% 44793 ? 35% numa-vmstat.node0.nr_zone_inactive_anon
23545 ? 69% +388.7% 115074 ? 36% numa-meminfo.node0.AnonHugePages
64841 ? 29% +172.6% 176742 ? 33% numa-meminfo.node0.AnonPages
68213 ? 33% +162.6% 179095 ? 35% numa-meminfo.node0.Inactive
68213 ? 33% +162.6% 179095 ? 35% numa-meminfo.node0.Inactive(anon)
4.38 ? 10% -2.5 1.87 ? 8% perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
4.61 ? 9% -2.7 1.88 ? 8% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.46 ? 11% -0.2 0.24 ? 9% perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
4.44 ? 10% -2.7 1.71 ? 8% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
2.00 ? 9% +0.6 2.63 ? 8% perf-profile.self.cycles-pp.do_futex
2.52 ? 10% +0.8 3.37 ? 9% perf-profile.self.cycles-pp.__x64_sys_futex
70868 ? 13% +21.5% 86124 sched_debug.cfs_rq:/.exec_clock.avg
125411 ? 10% +17.9% 147892 sched_debug.cfs_rq:/.exec_clock.max
7286186 ? 12% +21.3% 8840402 sched_debug.cfs_rq:/.min_vruntime.avg
12873032 ? 10% +17.7% 15149207 sched_debug.cfs_rq:/.min_vruntime.max
0.80 ? 13% +29.9% 1.04 ? 9% sched_debug.cfs_rq:/.nr_spread_over.avg
-828241 +84.1% -1524980 sched_debug.cfs_rq:/.spread0.min
7478 ? 5% +9.6% 8199 sched_debug.cpu.curr->pid.max
21364 ? 19% +41.9% 30320 ? 23% sched_debug.cpu.ttwu_count.max
1997 ? 14% +27.8% 2552 ? 15% sched_debug.cpu.ttwu_count.stddev
6.451e+10 +2.3% 6.597e+10 perf-stat.i.branch-instructions
4.942e+08 +2.3% 5.054e+08 perf-stat.i.branch-misses
13.97 ? 2% -1.0 13.00 ? 3% perf-stat.i.cache-miss-rate%
1429864 ? 3% -8.8% 1303388 ? 4% perf-stat.i.cache-misses
2410 -5.2% 2285 ? 3% perf-stat.i.context-switches
0.71 -2.6% 0.69 perf-stat.i.cpi
229727 ? 2% +9.8% 252139 ? 4% perf-stat.i.cycles-between-cache-misses
1.062e+11 +2.3% 1.087e+11 perf-stat.i.dTLB-loads
127892 +1.6% 129886 perf-stat.i.dTLB-store-misses
8.09e+10 +2.2% 8.272e+10 perf-stat.i.dTLB-stores
4259891 ? 2% -4.0% 4089092 perf-stat.i.iTLB-loads
4.208e+11 +2.3% 4.303e+11 perf-stat.i.instructions
1.42 +2.2% 1.45 perf-stat.i.ipc
0.65 +3.1% 0.67 perf-stat.i.major-faults
1.33 +3.6% 1.38 perf-stat.i.metric.K/sec
1310 +2.3% 1340 perf-stat.i.metric.M/sec
14.08 ? 2% -1.2 12.92 ? 3% perf-stat.overall.cache-miss-rate%
0.70 -2.1% 0.69 perf-stat.overall.cpi
205608 ? 3% +10.0% 226168 ? 4% perf-stat.overall.cycles-between-cache-misses
1.42 +2.2% 1.45 perf-stat.overall.ipc
6.429e+10 +2.3% 6.574e+10 perf-stat.ps.branch-instructions
4.925e+08 +2.3% 5.037e+08 perf-stat.ps.branch-misses
1439249 ? 3% -8.9% 1310558 ? 4% perf-stat.ps.cache-misses
2401 -5.3% 2274 ? 3% perf-stat.ps.context-switches
1.059e+11 +2.3% 1.083e+11 perf-stat.ps.dTLB-loads
127806 +1.7% 129943 perf-stat.ps.dTLB-store-misses
8.062e+10 +2.2% 8.243e+10 perf-stat.ps.dTLB-stores
4246454 ? 2% -4.1% 4074221 perf-stat.ps.iTLB-loads
4.193e+11 +2.3% 4.288e+11 perf-stat.ps.instructions
32095 ? 2% -6.6% 29971 ? 5% perf-stat.ps.node-loads
1.27e+14 +2.1% 1.297e+14 perf-stat.total.instructions
2251 ? 13% -25.6% 1674 ? 16% interrupts.CPU100.CAL:Function_call_interrupts
4823 ? 46% +34.3% 6475 ? 35% interrupts.CPU105.NMI:Non-maskable_interrupts
4823 ? 46% +34.3% 6475 ? 35% interrupts.CPU105.PMI:Performance_monitoring_interrupts
7063 ? 15% -47.6% 3704 ? 47% interrupts.CPU11.NMI:Non-maskable_interrupts
7063 ? 15% -47.6% 3704 ? 47% interrupts.CPU11.PMI:Performance_monitoring_interrupts
1743 ? 6% +29.0% 2249 ? 22% interrupts.CPU126.CAL:Function_call_interrupts
6249 ? 6% -41.3% 3670 ? 14% interrupts.CPU130.NMI:Non-maskable_interrupts
6249 ? 6% -41.3% 3670 ? 14% interrupts.CPU130.PMI:Performance_monitoring_interrupts
7215 ? 17% -42.5% 4148 ? 60% interrupts.CPU137.NMI:Non-maskable_interrupts
7215 ? 17% -42.5% 4148 ? 60% interrupts.CPU137.PMI:Performance_monitoring_interrupts
235.50 ? 19% -77.8% 52.25 ? 91% interrupts.CPU144.RES:Rescheduling_interrupts
5653 ? 37% -50.4% 2803 ? 33% interrupts.CPU146.NMI:Non-maskable_interrupts
5653 ? 37% -50.4% 2803 ? 33% interrupts.CPU146.PMI:Performance_monitoring_interrupts
6894 ? 16% -50.3% 3426 ? 22% interrupts.CPU147.NMI:Non-maskable_interrupts
6894 ? 16% -50.3% 3426 ? 22% interrupts.CPU147.PMI:Performance_monitoring_interrupts
6768 ? 20% -57.4% 2886 ? 36% interrupts.CPU149.NMI:Non-maskable_interrupts
6768 ? 20% -57.4% 2886 ? 36% interrupts.CPU149.PMI:Performance_monitoring_interrupts
6750 ? 19% -36.7% 4274 ? 24% interrupts.CPU15.NMI:Non-maskable_interrupts
6750 ? 19% -36.7% 4274 ? 24% interrupts.CPU15.PMI:Performance_monitoring_interrupts
74.75 ? 95% +181.6% 210.50 ? 32% interrupts.CPU157.RES:Rescheduling_interrupts
3955 ? 41% +114.3% 8474 ? 5% interrupts.CPU159.NMI:Non-maskable_interrupts
3955 ? 41% +114.3% 8474 ? 5% interrupts.CPU159.PMI:Performance_monitoring_interrupts
157.00 ? 50% -82.0% 28.25 ? 83% interrupts.CPU161.RES:Rescheduling_interrupts
6259 ? 11% -42.8% 3579 ? 34% interrupts.CPU183.NMI:Non-maskable_interrupts
6259 ? 11% -42.8% 3579 ? 34% interrupts.CPU183.PMI:Performance_monitoring_interrupts
354.50 ? 77% -87.2% 45.50 ?120% interrupts.CPU2.RES:Rescheduling_interrupts
5070 ? 35% +54.3% 7821 ? 14% interrupts.CPU26.NMI:Non-maskable_interrupts
5070 ? 35% +54.3% 7821 ? 14% interrupts.CPU26.PMI:Performance_monitoring_interrupts
158.75 ? 38% +74.3% 276.75 ? 15% interrupts.CPU35.RES:Rescheduling_interrupts
7163 ? 21% -49.6% 3608 ? 35% interrupts.CPU44.NMI:Non-maskable_interrupts
7163 ? 21% -49.6% 3608 ? 35% interrupts.CPU44.PMI:Performance_monitoring_interrupts
57.75 ? 74% +329.4% 248.00 ? 16% interrupts.CPU48.RES:Rescheduling_interrupts
5295 ? 16% +61.1% 8529 ? 3% interrupts.CPU5.NMI:Non-maskable_interrupts
5295 ? 16% +61.1% 8529 ? 3% interrupts.CPU5.PMI:Performance_monitoring_interrupts
7999 ? 16% -53.7% 3703 ? 47% interrupts.CPU60.NMI:Non-maskable_interrupts
7999 ? 16% -53.7% 3703 ? 47% interrupts.CPU60.PMI:Performance_monitoring_interrupts
6556 ? 23% -47.6% 3432 ? 10% interrupts.CPU61.NMI:Non-maskable_interrupts
6556 ? 23% -47.6% 3432 ? 10% interrupts.CPU61.PMI:Performance_monitoring_interrupts
238.00 ? 27% -63.0% 88.00 ? 80% interrupts.CPU61.RES:Rescheduling_interrupts
7585 ? 10% -49.1% 3863 ? 30% interrupts.CPU63.NMI:Non-maskable_interrupts
7585 ? 10% -49.1% 3863 ? 30% interrupts.CPU63.PMI:Performance_monitoring_interrupts
216.25 ? 21% -73.6% 57.00 ?133% interrupts.CPU63.RES:Rescheduling_interrupts
138.25 ? 56% +90.2% 263.00 ? 10% interrupts.CPU65.RES:Rescheduling_interrupts
241.25 ? 31% -56.7% 104.50 ? 96% interrupts.CPU66.RES:Rescheduling_interrupts
7553 ? 15% -47.0% 4004 ? 40% interrupts.CPU76.NMI:Non-maskable_interrupts
7553 ? 15% -47.0% 4004 ? 40% interrupts.CPU76.PMI:Performance_monitoring_interrupts
4934 ? 52% +71.1% 8442 ? 2% interrupts.CPU92.NMI:Non-maskable_interrupts
4934 ? 52% +71.1% 8442 ? 2% interrupts.CPU92.PMI:Performance_monitoring_interrupts
194.00 ? 30% +47.9% 287.00 interrupts.CPU92.RES:Rescheduling_interrupts
23725 ? 8% -27.8% 17131 ? 18% softirqs.CPU1.RCU
23137 ? 12% -31.2% 15917 ? 24% softirqs.CPU101.RCU
21338 ? 16% -28.2% 15329 ? 22% softirqs.CPU104.RCU
18202 ? 7% -17.2% 15064 ? 18% softirqs.CPU112.RCU
21059 ? 18% -26.8% 15417 ? 10% softirqs.CPU113.RCU
21006 ? 17% -30.8% 14529 ? 14% softirqs.CPU115.RCU
27815 ? 36% -33.3% 18546 ? 57% softirqs.CPU121.SCHED
20978 ? 19% -22.2% 16329 ? 27% softirqs.CPU124.RCU
19729 ? 12% -23.7% 15060 ? 14% softirqs.CPU125.RCU
22706 ? 15% -27.8% 16382 ? 12% softirqs.CPU126.RCU
22705 ? 12% -25.4% 16931 ? 27% softirqs.CPU127.RCU
20037 ? 19% -28.2% 14387 ? 12% softirqs.CPU128.RCU
19573 ? 12% -22.8% 15111 ? 4% softirqs.CPU129.RCU
20596 ? 10% -30.9% 14242 ? 14% softirqs.CPU131.RCU
23440 ? 33% +46.3% 34297 ? 9% softirqs.CPU131.SCHED
19106 ? 16% -23.1% 14700 ? 15% softirqs.CPU132.RCU
22821 ? 11% -26.4% 16795 ? 8% softirqs.CPU138.RCU
23505 ? 11% -32.3% 15917 ? 20% softirqs.CPU14.RCU
22836 ? 9% -16.7% 19023 ? 14% softirqs.CPU141.RCU
21767 ? 14% -30.6% 15105 ? 19% softirqs.CPU143.RCU
19267 ? 46% +67.2% 32223 ? 14% softirqs.CPU143.SCHED
24861 ? 4% -41.0% 14660 ? 11% softirqs.CPU144.RCU
11831 ? 47% +177.8% 32871 ? 15% softirqs.CPU144.SCHED
23297 ? 9% -27.0% 17004 ? 6% softirqs.CPU147.RCU
22685 ? 13% -31.4% 15567 ? 18% softirqs.CPU149.RCU
23209 ? 10% -25.8% 17228 ? 8% softirqs.CPU15.RCU
33035 ? 24% -54.6% 15002 ? 56% softirqs.CPU157.SCHED
28385 ? 25% -63.2% 10448 ? 79% softirqs.CPU159.SCHED
20067 ? 7% -12.1% 17638 ? 9% softirqs.CPU160.RCU
22261 ? 14% -35.7% 14306 ? 6% softirqs.CPU161.RCU
20998 ? 43% +67.5% 35179 ? 11% softirqs.CPU161.SCHED
32659 ? 28% -51.0% 16004 ? 66% softirqs.CPU162.SCHED
23692 ? 12% -19.1% 19177 ? 7% softirqs.CPU166.RCU
23350 ? 7% -26.6% 17139 ? 22% softirqs.CPU170.RCU
21357 ? 8% -35.5% 13779 ? 10% softirqs.CPU174.RCU
21886 ? 5% -33.4% 14582 ? 11% softirqs.CPU175.RCU
26609 ? 3% -27.8% 19208 ? 24% softirqs.CPU18.RCU
19898 ? 12% -27.4% 14443 ? 9% softirqs.CPU180.RCU
23109 ? 9% -32.2% 15658 ? 23% softirqs.CPU186.RCU
19719 ? 18% -36.6% 12500 ? 11% softirqs.CPU188.RCU
21919 ? 16% -33.2% 14647 ? 17% softirqs.CPU2.RCU
25187 ? 11% -23.3% 19313 ? 26% softirqs.CPU20.RCU
23983 ? 15% -20.6% 19041 ? 17% softirqs.CPU21.RCU
24148 ? 14% -17.8% 19849 ? 21% softirqs.CPU22.RCU
24732 ? 9% -25.8% 18348 ? 24% softirqs.CPU24.RCU
23282 ? 14% -27.6% 16851 ? 15% softirqs.CPU25.RCU
20694 ? 34% -54.3% 9460 ? 29% softirqs.CPU35.SCHED
25058 ? 13% -27.9% 18059 ? 18% softirqs.CPU4.RCU
20413 ? 18% -28.2% 14659 ? 8% softirqs.CPU43.RCU
23803 ? 17% -32.8% 15994 ? 19% softirqs.CPU44.RCU
24249 ? 11% -22.1% 18886 ? 22% softirqs.CPU46.RCU
32235 ? 14% -65.5% 11106 ? 51% softirqs.CPU48.SCHED
28803 ? 16% -39.1% 17536 ? 30% softirqs.CPU60.RCU
26565 ? 10% -34.4% 17436 ? 22% softirqs.CPU61.RCU
11142 ? 68% +156.3% 28562 ? 24% softirqs.CPU61.SCHED
25175 ? 12% -37.0% 15853 ? 9% softirqs.CPU63.RCU
14961 ? 56% +120.3% 32955 ? 26% softirqs.CPU63.SCHED
23043 ? 39% -62.7% 8602 ? 36% softirqs.CPU65.SCHED
26792 ? 12% -35.8% 17210 ? 14% softirqs.CPU66.RCU
19766 ? 18% -29.6% 13906 ? 9% softirqs.CPU7.RCU
22073 ? 13% -24.4% 16678 ? 23% softirqs.CPU70.RCU
24267 ? 4% -25.3% 18121 ? 19% softirqs.CPU75.RCU
27057 ? 23% -56.9% 11669 ? 75% softirqs.CPU78.SCHED
23392 ? 9% -32.1% 15885 ? 3% softirqs.CPU81.RCU
21645 ? 11% -26.5% 15910 ? 11% softirqs.CPU83.RCU
23556 ? 8% -30.1% 16475 ? 8% softirqs.CPU85.RCU
10105 ? 53% +102.5% 20466 ? 28% softirqs.CPU85.SCHED
24799 ? 15% -27.3% 18022 ? 9% softirqs.CPU88.RCU
19697 ? 58% -71.0% 5715 ? 7% softirqs.CPU92.SCHED
22778 ? 6% -24.3% 17235 ? 15% softirqs.CPU93.RCU
18189 ? 13% -24.4% 13760 ? 10% softirqs.CPU95.RCU
26156 ? 2% -21.7% 20485 ? 13% softirqs.CPU96.RCU
21904 ? 45% -56.8% 9452 ? 57% softirqs.CPU98.SCHED
22162 ? 11% -27.8% 15992 ? 9% softirqs.CPU99.RCU
38457 ? 10% -32.2% 26084 ? 35% softirqs.NET_RX
4109730 -18.0% 3370902 ? 12% softirqs.RCU
47247 ? 4% -20.1% 37765 ? 15% softirqs.TIMER



will-it-scale.per_process_ops

1.045e+07 +---------------------------------------------------------------+
| O O O |
1.04e+07 |-+ O O O O O O O O O O O O O O O |
| O O O O O O O O O O O O |
1.035e+07 |-+ |
| |
1.03e+07 |-+ |
| |
1.025e+07 |-+ |
| |
1.02e+07 |-+ +.+..+.+ .+. |
| + + .+.+..+.+.+.+ +.+.|
1.015e+07 |.+.+.+.+.+ +. .+.+.+.+.+. .+.+.+.+ |
| + + + |
1.01e+07 +---------------------------------------------------------------+


will-it-scale.workload

1.005e+09 +---------------------------------------------------------------+
| |
1e+09 |-O O O O O O O O O O |
| O O O O O O O O O O O O O O O O |
9.95e+08 |-+ O O O O |
| |
9.9e+08 |-+ |
| |
9.85e+08 |-+ |
| |
9.8e+08 |-+ +. .+.+ |
| : +. + +.+..+.+.+.+.+.+.+.|
9.75e+08 |.+.+.+.+. : +. .+.+.+. .+. .+.+.+. + |
| + + + + + + |
9.7e+08 +---------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample

***************************************************************************************************
lkp-csl-2ap2: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/dup1/will-it-scale/0x5003003

commit:
03b7843215 ("efi: Update implementation of add_links() to create fwnode links")
95f755a4ef ("driver core: Refactor fw_devlink feature")

03b7843215575338 95f755a4ef7b9ccbedf6012b411
---------------- ---------------------------
%stddev %change %stddev
\ | \
4681240 +5.2% 4925857 will-it-scale.per_process_ops
4.494e+08 +5.2% 4.729e+08 will-it-scale.workload
84205 ? 3% -6.6% 78620 ? 2% cpuidle.POLL.time
12.74 +1.4 14.10 mpstat.cpu.all.usr%
38.69 ? 9% +593.5% 268.33 ?101% sched_debug.cfs_rq:/.exec_clock.min
12.00 +8.3% 13.00 vmstat.cpu.us
2342 -1.8% 2299 vmstat.system.cs
1416 ? 4% -9.2% 1287 ? 4% slabinfo.dmaengine-unmap-16.active_objs
1416 ? 4% -9.2% 1287 ? 4% slabinfo.dmaengine-unmap-16.num_objs
6896 ? 4% -10.3% 6185 ? 4% slabinfo.kmalloc-rcl-64.active_objs
6896 ? 4% -10.3% 6185 ? 4% slabinfo.kmalloc-rcl-64.num_objs
2118 ? 2% -13.9% 1824 ? 6% slabinfo.kmalloc-rcl-96.active_objs
2118 ? 2% -13.9% 1824 ? 6% slabinfo.kmalloc-rcl-96.num_objs
13476 ? 4% -12.9% 11741 ? 3% slabinfo.pde_opener.active_objs
13476 ? 4% -12.9% 11741 ? 3% slabinfo.pde_opener.num_objs
2637 ? 63% -81.3% 492.50 ?142% numa-meminfo.node0.Active
2637 ? 63% -81.3% 492.50 ?142% numa-meminfo.node0.Active(anon)
2915 ? 16% -47.8% 1520 ? 19% numa-meminfo.node0.PageTables
87880 ? 10% -20.2% 70124 ? 9% numa-meminfo.node0.SUnreclaim
2153 ? 42% +215.9% 6802 ? 64% numa-meminfo.node2.Shmem
39060 ? 53% +306.7% 158860 ? 45% numa-meminfo.node3.AnonPages
46215 ? 44% +256.0% 164519 ? 42% numa-meminfo.node3.Inactive
46215 ? 44% +256.0% 164519 ? 42% numa-meminfo.node3.Inactive(anon)
750282 ? 10% +20.7% 905405 ? 13% numa-meminfo.node3.MemUsed
658.50 ? 63% -81.4% 122.75 ?142% numa-vmstat.node0.nr_active_anon
728.00 ? 15% -47.9% 379.50 ? 19% numa-vmstat.node0.nr_page_table_pages
21969 ? 10% -20.2% 17530 ? 9% numa-vmstat.node0.nr_slab_unreclaimable
658.50 ? 63% -81.4% 122.75 ?142% numa-vmstat.node0.nr_zone_active_anon
621492 ? 13% -26.7% 455253 ? 16% numa-vmstat.node0.numa_hit
572512 ? 17% -31.5% 391888 ? 15% numa-vmstat.node0.numa_local
538.00 ? 42% +216.0% 1700 ? 64% numa-vmstat.node2.nr_shmem
9760 ? 53% +307.2% 39747 ? 45% numa-vmstat.node3.nr_anon_pages
11513 ? 43% +257.5% 41166 ? 42% numa-vmstat.node3.nr_inactive_anon
11513 ? 43% +257.5% 41166 ? 42% numa-vmstat.node3.nr_zone_inactive_anon
548959 ? 19% +33.0% 730071 ? 31% numa-vmstat.node3.numa_hit
427995 ? 25% +47.7% 632173 ? 36% numa-vmstat.node3.numa_local
2.37 ? 8% -1.5 0.88 ? 8% perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
2.33 ? 8% -1.5 0.88 ? 8% perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.dup
0.47 ? 57% +0.3 0.74 ? 7% perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.27 ? 7% +0.7 1.96 ? 8% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__close
4.94 ? 8% -3.2 1.77 ? 8% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.44 ? 8% -0.2 0.23 ? 8% perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
0.12 ? 18% -0.0 0.08 ? 13% perf-profile.children.cycles-pp.clockevents_program_event
0.11 ? 18% -0.0 0.07 ? 15% perf-profile.children.cycles-pp.ktime_get
0.27 ? 4% +0.1 0.33 ? 18% perf-profile.children.cycles-pp.start_kernel
4.76 ? 8% -3.1 1.62 ? 8% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
1.15 ? 7% -0.5 0.67 ? 8% perf-profile.self.cycles-pp.__x64_sys_close
1.44 ? 9% -0.4 1.05 ? 9% perf-profile.self.cycles-pp.__x64_sys_dup
0.11 ? 21% -0.0 0.07 ? 13% perf-profile.self.cycles-pp.ktime_get
0.09 ? 8% +0.0 0.11 ? 4% perf-profile.self.cycles-pp.rcu_read_unlock_strict
9492 ? 72% -65.5% 3272 ?142% softirqs.CPU10.NET_RX
11789 ? 6% +25.7% 14823 ? 11% softirqs.CPU100.RCU
13884 ? 12% +26.5% 17565 ? 9% softirqs.CPU124.RCU
31918 ? 27% -65.9% 10876 ? 41% softirqs.CPU124.SCHED
15884 ? 11% +22.0% 19382 ? 11% softirqs.CPU13.RCU
30930 ? 28% -48.0% 16070 ? 72% softirqs.CPU132.SCHED
17573 ? 6% +77.1% 31116 ? 23% softirqs.CPU170.SCHED
16544 ? 4% -11.4% 14650 ? 10% softirqs.CPU185.RCU
17192 ? 8% -18.7% 13986 ? 16% softirqs.CPU186.RCU
10342 ? 45% +148.7% 25727 ? 27% softirqs.CPU186.SCHED
18559 ? 11% -22.9% 14300 ? 8% softirqs.CPU2.RCU
11714 ? 45% +150.5% 29341 ? 30% softirqs.CPU2.SCHED
21025 ? 46% -47.9% 10949 ? 77% softirqs.CPU21.SCHED
17639 ? 6% -20.2% 14081 ? 9% softirqs.CPU28.RCU
12427 ? 76% +146.2% 30597 ? 22% softirqs.CPU28.SCHED
5236 ? 6% +242.8% 17948 ? 41% softirqs.CPU4.SCHED
12934 ? 56% +139.0% 30910 ? 17% softirqs.CPU57.SCHED
14047 ? 6% +24.2% 17447 ? 8% softirqs.CPU74.RCU
27112 ? 3% -57.2% 11605 ? 70% softirqs.CPU74.SCHED
14231 ? 13% +15.8% 16478 ? 7% softirqs.CPU79.RCU
12191 ? 4% +28.2% 15634 ? 6% softirqs.CPU90.RCU
33498 ? 11% -47.3% 17657 ? 47% softirqs.CPU90.SCHED
13451 ? 5% +27.0% 17079 ? 13% softirqs.CPU98.RCU
33555 ? 15% -59.5% 13587 ? 56% softirqs.CPU98.SCHED
6.427e+10 +5.1% 6.752e+10 perf-stat.i.branch-instructions
0.90 -0.0 0.88 perf-stat.i.branch-miss-rate%
5.739e+08 +3.4% 5.933e+08 perf-stat.i.branch-misses
2298 -1.8% 2257 perf-stat.i.context-switches
0.88 -5.2% 0.84 perf-stat.i.cpi
1.067e+11 +5.1% 1.121e+11 perf-stat.i.dTLB-loads
110648 +8.9% 120538 perf-stat.i.dTLB-store-misses
7.223e+10 +5.1% 7.591e+10 perf-stat.i.dTLB-stores
6.072e+08 +5.4% 6.401e+08 ? 4% perf-stat.i.iTLB-load-misses
3.356e+11 +5.1% 3.526e+11 perf-stat.i.instructions
1.13 +5.5% 1.19 perf-stat.i.ipc
1.24 ? 2% +7.5% 1.34 ? 3% perf-stat.i.metric.K/sec
1266 +5.1% 1331 perf-stat.i.metric.M/sec
0.03 -3.5% 0.03 perf-stat.overall.MPKI
0.89 -0.0 0.88 perf-stat.overall.branch-miss-rate%
0.88 -5.2% 0.84 perf-stat.overall.cpi
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
1.13 +5.5% 1.19 perf-stat.overall.ipc
6.405e+10 +5.1% 6.729e+10 perf-stat.ps.branch-instructions
5.719e+08 +3.4% 5.913e+08 perf-stat.ps.branch-misses
2287 -1.8% 2247 perf-stat.ps.context-switches
1.063e+11 +5.1% 1.117e+11 perf-stat.ps.dTLB-loads
110749 +8.7% 120431 perf-stat.ps.dTLB-store-misses
7.199e+10 +5.1% 7.565e+10 perf-stat.ps.dTLB-stores
6.049e+08 +5.4% 6.376e+08 ? 4% perf-stat.ps.iTLB-load-misses
3.345e+11 +5.1% 3.514e+11 perf-stat.ps.instructions
1.013e+14 +5.2% 1.065e+14 perf-stat.total.instructions
18582 ? 74% -66.5% 6229 ?146% interrupts.31:PCI-MSI.524289-edge.eth0-TxRx-0
18582 ? 74% -66.5% 6229 ?146% interrupts.CPU10.31:PCI-MSI.524289-edge.eth0-TxRx-0
2285 ? 40% -27.5% 1655 ? 5% interrupts.CPU103.CAL:Function_call_interrupts
150.00 ? 56% -67.2% 49.25 ?138% interrupts.CPU117.RES:Rescheduling_interrupts
2.00 ? 86% +8587.5% 173.75 ?168% interrupts.CPU123.TLB:TLB_shootdowns
73.00 ?101% +229.5% 240.50 ? 19% interrupts.CPU124.RES:Rescheduling_interrupts
3375 ? 23% +112.2% 7163 ? 20% interrupts.CPU129.NMI:Non-maskable_interrupts
3375 ? 23% +112.2% 7163 ? 20% interrupts.CPU129.PMI:Performance_monitoring_interrupts
2570 ? 19% +131.1% 5940 ? 34% interrupts.CPU132.NMI:Non-maskable_interrupts
2570 ? 19% +131.1% 5940 ? 34% interrupts.CPU132.PMI:Performance_monitoring_interrupts
382.50 ? 68% -78.6% 82.00 ?136% interrupts.CPU15.RES:Rescheduling_interrupts
8363 ? 5% -33.8% 5539 ? 25% interrupts.CPU161.NMI:Non-maskable_interrupts
8363 ? 5% -33.8% 5539 ? 25% interrupts.CPU161.PMI:Performance_monitoring_interrupts
253.00 ? 15% -54.0% 116.50 ? 57% interrupts.CPU186.RES:Rescheduling_interrupts
249.50 ? 16% -64.7% 88.00 ? 64% interrupts.CPU2.RES:Rescheduling_interrupts
4404 ? 25% +89.1% 8327 ? 6% interrupts.CPU21.NMI:Non-maskable_interrupts
4404 ? 25% +89.1% 8327 ? 6% interrupts.CPU21.PMI:Performance_monitoring_interrupts
3516 ? 52% -52.8% 1659 ? 5% interrupts.CPU30.CAL:Function_call_interrupts
8180 ? 9% -42.4% 4710 ? 59% interrupts.CPU33.NMI:Non-maskable_interrupts
8180 ? 9% -42.4% 4710 ? 59% interrupts.CPU33.PMI:Performance_monitoring_interrupts
8601 ? 2% -39.4% 5208 ? 39% interrupts.CPU36.NMI:Non-maskable_interrupts
8601 ? 2% -39.4% 5208 ? 39% interrupts.CPU36.PMI:Performance_monitoring_interrupts
222.75 ? 31% -55.4% 99.25 ?109% interrupts.CPU36.RES:Rescheduling_interrupts
291.75 -40.1% 174.75 ? 38% interrupts.CPU4.RES:Rescheduling_interrupts
7518 ? 17% -30.1% 5251 ? 42% interrupts.CPU66.NMI:Non-maskable_interrupts
7518 ? 17% -30.1% 5251 ? 42% interrupts.CPU66.PMI:Performance_monitoring_interrupts
8034 ? 12% -41.4% 4711 ? 26% interrupts.CPU75.NMI:Non-maskable_interrupts
8034 ? 12% -41.4% 4711 ? 26% interrupts.CPU75.PMI:Performance_monitoring_interrupts
52.50 ? 70% +263.8% 191.00 ? 37% interrupts.CPU90.RES:Rescheduling_interrupts
63.25 ? 65% +243.1% 217.00 ? 34% interrupts.CPU98.RES:Rescheduling_interrupts





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


Attachments:
(No filename) (31.78 kB)
config-5.10.0-rc4-00075-g95f755a4ef7b (172.78 kB)
job-script (7.95 kB)
job.yaml (5.33 kB)
reproduce (348.00 B)
Download all attachments

2020-12-11 14:42:30

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

On Fri, 2020-11-20 at 18:02 -0800, Saravana Kannan wrote:
> The current implementation of fw_devlink is very inefficient because it
> tries to get away without creating fwnode links in the name of saving
> memory usage. Past attempts to optimize runtime at the cost of memory
> usage were blocked with request for data showing that the optimization
> made significant improvement for real world scenarios.
>
> We have those scenarios now. There have been several reports of boot
> time increase in the order of seconds in this thread [1]. Several OEMs
> and SoC manufacturers have also privately reported significant
> (350-400ms) increase in boot time due to all the parsing done by
> fw_devlink.
>
> So this patch uses all the setup done by the previous patches in this
> series to refactor fw_devlink to be more efficient. Most of the code has
> been moved out of firmware specific (DT mostly) code into driver core.
>
> This brings the following benefits:
> - Instead of parsing the device tree multiple times during bootup,
> fw_devlink parses each fwnode node/property only once and creates
> fwnode links. The rest of the fw_devlink code then just looks at these
> fwnode links to do rest of the work.
>
> - Makes it much easier to debug probe issue due to fw_devlink in the
> future. fw_devlink=on blocks the probing of devices if they depend on
> a device that hasn't been added yet. With this refactor, it'll be very
> easy to tell what that device is because we now have a reference to
> the fwnode of the device.
>
> - Much easier to add fw_devlink support to ACPI and other firmware
> types. A refactor to move the common bits from DT specific code to
> driver core was in my TODO list as a prerequisite to adding ACPI
> support to fw_devlink. This series gets that done.
>
> [1] - https://lore.kernel.org/linux-omap/[email protected]/
> Signed-off-by: Saravana Kannan <[email protected]>
> Tested-by: Laurent Pinchart <[email protected]>
> Tested-by: Grygorii Strashko <[email protected]>

Reverting this commit and its dependency:

2d09e6eb4a6f driver core: Delete pointless parameter in fwnode_operations.add_links

from today's linux-next fixed a boot crash on an arm64 Thunder X2 server.

.config: https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config

[ 57.413929][ T1] ACPI: 5 ACPI AML tables successfully acquired and loaded
[ 60.571643][ T1] ACPI: Interpreter enabled
[ 60.576104][ T1] ACPI: Using GIC for interrupt routing
[ 60.582474][ T1] ACPI: MCFG table detected, 1 entries
[ 60.588051][ T1] ACPI: IORT: SMMU-v3[402300000] Mapped to Proximity domain 0
[ 60.601374][ T1] Unable to handle kernel paging request at virtual address dfff800000000000
[ 60.610146][ T1] Mem abort info:
[ 60.613694][ T1] ESR = 0x96000004
[ 60.617496][ T1] EC = 0x25: DABT (current EL), IL = 32 bits
[ 60.623616][ T1] SET = 0, FnV = 0
[ 60.627420][ T1] EA = 0, S1PTW = 0
[ 60.631304][ T1] Data abort info:
[ 60.634957][ T1] ISV = 0, ISS = 0x00000004
[ 60.639546][ T1] CM = 0, WnR = 0
[ 60.643255][ T1] [dfff800000000000] address between user and kernel address ranges
[ 60.651226][ T1] Internal error: Oops: 96000004 [#1] SMP
[ 60.656864][ T1] Modules linked in:
[ 60.660658][ T1] CPU: 38 PID: 1 Comm: swapper/0 Tainted: G W 5.10.0-rc7-next-20201211 #2
[ 60.670424][ T1] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.16 07/29/2020
[ 60.680979][ T1] pstate: 10400009 (nzcV daif +PAN -UAO -TCO BTYPE=--)
[ 60.687757][ T1] pc : device_add+0xf60/0x16b0
[ 60.692430][ T1] lr : device_add+0xf08/0x16b0
[ 60.697098][ T1] sp : ffff0000063bf7d0
[ 60.701147][ T1] x29: ffff0000063bf7d0 x28: ffff00001f760810
[ 60.707226][ T1] x27: fffffffffffffff8 x26: ffff00001f760858
[ 60.713304][ T1] x25: ffff00001f760c58 x24: fffffffffffffff8
[ 60.719381][ T1] x23: 1fffe00003eec10b x22: ffff8000190d0260
[ 60.725458][ T1] x21: ffff800011dba708 x20: 0000000000000000
[ 60.731535][ T1] x19: 1fffe00000c77f10 x18: 1fffe001cf0d53ed
[ 60.737616][ T1] x17: 0000000000000000 x16: 1ffff000033676f1
[ 60.743709][ T1] x15: 0000000000000000 x14: ffff800011731e34
[ 60.749786][ T1] x13: ffff700003217fb9 x12: 1ffff00003217fb8
[ 60.755864][ T1] x11: 1ffff00003217fb8 x10: ffff700003217fb8
[ 60.761940][ T1] x9 : dfff800000000000 x8 : ffff8000190bfdc7
[ 60.768017][ T1] x7 : 0000000000000001 x6 : ffff700003217fb9
[ 60.774094][ T1] x5 : ffff700003217fb9 x4 : ffff000006324a80
[ 60.780170][ T1] x3 : 1fffe00000c64951 x2 : 0000000000000000
[ 60.786247][ T1] x1 : 0000000000000000 x0 : dfff800000000000
[ 60.792324][ T1] Call trace:
[ 60.795495][ T1] device_add+0xf60/0x16b0
__fw_devlink_link_to_consumers at drivers/base/core.c:1583
(inlined by) fw_devlink_link_device at drivers/base/core.c:1726
(inlined by) device_add at drivers/base/core.c:3088
[ 60.799813][ T1] platform_device_add+0x274/0x628
[ 60.804833][ T1] acpi_iort_init+0x9d8/0xc50
[ 60.809415][ T1] acpi_init+0x45c/0x4e8
[ 60.813556][ T1] do_one_initcall+0x170/0xb70
[ 60.818224][ T1] kernel_init_freeable+0x6a8/0x734
[ 60.823332][ T1] kernel_init+0x18/0x12c
[ 60.827566][ T1] ret_from_fork+0x10/0x1c
[ 60.831890][ T1] Code: f2fbffe0 d100205b d343fc41 aa1b03f8 (38e06820)
[ 60.838756][ T1] ---[ end trace fa5c8ce17a226d83 ]---
[ 60.844127][ T1] Kernel panic - not syncing: Oops: Fatal exception
[ 60.850881][ T1] SMP: stopping secondary CPUs
[ 60.855733][ T1] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
> ---
> drivers/base/core.c | 325 ++++++++++++++++++++++++++++++-----------
> include/linux/device.h | 5 -
> 2 files changed, 238 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 1873cecb0cc4..9edf9084fc98 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -46,8 +46,6 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> #endif
>
> /* Device links support. */
> -static LIST_HEAD(wait_for_suppliers);
> -static DEFINE_MUTEX(wfs_lock);
> static LIST_HEAD(deferred_sync);
> static unsigned int defer_sync_state_count = 1;
> static DEFINE_MUTEX(fwnode_link_lock);
> @@ -803,74 +801,6 @@ struct device_link *device_link_add(struct device *consumer,
> }
> EXPORT_SYMBOL_GPL(device_link_add);
>
> -/**
> - * device_link_wait_for_supplier - Add device to wait_for_suppliers list
> - * @consumer: Consumer device
> - *
> - * Marks the @consumer device as waiting for suppliers to become available by
> - * adding it to the wait_for_suppliers list. The consumer device will never be
> - * probed until it's removed from the wait_for_suppliers list.
> - *
> - * The caller is responsible for adding the links to the supplier devices once
> - * they are available and removing the @consumer device from the
> - * wait_for_suppliers list once links to all the suppliers have been created.
> - *
> - * This function is NOT meant to be called from the probe function of the
> - * consumer but rather from code that creates/adds the consumer device.
> - */
> -static void device_link_wait_for_supplier(struct device *consumer,
> - bool need_for_probe)
> -{
> - mutex_lock(&wfs_lock);
> - list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> - consumer->links.need_for_probe = need_for_probe;
> - mutex_unlock(&wfs_lock);
> -}
> -
> -static void device_link_wait_for_mandatory_supplier(struct device *consumer)
> -{
> - device_link_wait_for_supplier(consumer, true);
> -}
> -
> -static void device_link_wait_for_optional_supplier(struct device *consumer)
> -{
> - device_link_wait_for_supplier(consumer, false);
> -}
> -
> -/**
> - * device_link_add_missing_supplier_links - Add links from consumer devices to
> - * supplier devices, leaving any
> - * consumer with inactive suppliers on
> - * the wait_for_suppliers list
> - *
> - * Loops through all consumers waiting on suppliers and tries to add all their
> - * supplier links. If that succeeds, the consumer device is removed from
> - * wait_for_suppliers list. Otherwise, they are left in the wait_for_suppliers
> - * list. Devices left on the wait_for_suppliers list will not be probed.
> - *
> - * The fwnode add_links callback is expected to return 0 if it has found and
> - * added all the supplier links for the consumer device. It should return an
> - * error if it isn't able to do so.
> - *
> - * The caller of device_link_wait_for_supplier() is expected to call this once
> - * it's aware of potential suppliers becoming available.
> - */
> -static void device_link_add_missing_supplier_links(void)
> -{
> - struct device *dev, *tmp;
> -
> - mutex_lock(&wfs_lock);
> - list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> - links.needs_suppliers) {
> - int ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
> - if (!ret)
> - list_del_init(&dev->links.needs_suppliers);
> - else if (ret != -ENODEV)
> - dev->links.need_for_probe = false;
> - }
> - mutex_unlock(&wfs_lock);
> -}
> -
> #ifdef CONFIG_SRCU
> static void __device_link_del(struct kref *kref)
> {
> @@ -1195,9 +1125,8 @@ void device_links_driver_bound(struct device *dev)
> * the device links it needs to or make new device links as it needs
> * them. So, it no longer needs to wait on any suppliers.
> */
> - mutex_lock(&wfs_lock);
> - list_del_init(&dev->links.needs_suppliers);
> - mutex_unlock(&wfs_lock);
> + if (dev->fwnode && dev->fwnode->dev == dev)
> + fwnode_links_purge_suppliers(dev->fwnode);
> device_remove_file(dev, &dev_attr_waiting_for_supplier);
>
> device_links_write_lock();
> @@ -1488,10 +1417,6 @@ static void device_links_purge(struct device *dev)
> if (dev->class == &devlink_class)
> return;
>
> - mutex_lock(&wfs_lock);
> - list_del(&dev->links.needs_suppliers);
> - mutex_unlock(&wfs_lock);
> -
> /*
> * Delete all of the remaining links from this device to any other
> * devices (either consumers or suppliers).
> @@ -1561,19 +1486,246 @@ static void fw_devlink_parse_fwtree(struct fwnode_handle *fwnode)
> fw_devlink_parse_fwtree(child);
> }
>
> -static void fw_devlink_link_device(struct device *dev)
> +/**
> + * fw_devlink_create_devlink - Create a device link from a consumer to fwnode
> + * @con - Consumer device for the device link
> + * @sup_handle - fwnode handle of supplier
> + *
> + * This function will try to create a device link between the consumer device
> + * @con and the supplier device represented by @sup_handle.
> + *
> + * The supplier has to be provided as a fwnode because incorrect cycles in
> + * fwnode links can sometimes cause the supplier device to never be created.
> + * This function detects such cases and returns an error if it cannot create a
> + * device link from the consumer to a missing supplier.
> + *
> + * Returns,
> + * 0 on successfully creating a device link
> + * -EINVAL if the device link cannot be created as expected
> + * -EAGAIN if the device link cannot be created right now, but it may be
> + * possible to do that in the future
> + */
> +static int fw_devlink_create_devlink(struct device *con,
> + struct fwnode_handle *sup_handle, u32 flags)
> +{
> + struct device *sup_dev;
> + int ret = 0;
> +
> + sup_dev = get_dev_from_fwnode(sup_handle);
> + if (sup_dev) {
> + /*
> + * If this fails, it is due to cycles in device links. Just
> + * give up on this link and treat it as invalid.
> + */
> + if (!device_link_add(con, sup_dev, flags))
> + ret = -EINVAL;
> +
> + goto out;
> + }
> +
> + /*
> + * DL_FLAG_SYNC_STATE_ONLY doesn't block probing and supports
> + * cycles. So cycle detection isn't necessary and shouldn't be
> + * done.
> + */
> + if (flags & DL_FLAG_SYNC_STATE_ONLY)
> + return -EAGAIN;
> +
> + /*
> + * If we can't find the supplier device from its fwnode, it might be
> + * due to a cyclic dependency between fwnodes. Some of these cycles can
> + * be broken by applying logic. Check for these types of cycles and
> + * break them so that devices in the cycle probe properly.
> + *
> + * If the supplier's parent is dependent on the consumer, then
> + * the consumer-supplier dependency is a false dependency. So,
> + * treat it as an invalid link.
> + */
> + sup_dev = fwnode_get_next_parent_dev(sup_handle);
> + if (sup_dev && device_is_dependent(con, sup_dev)) {
> + dev_dbg(con, "Not linking to %pfwP - False link\n",
> + sup_handle);
> + ret = -EINVAL;
> + } else {
> + /*
> + * Can't check for cycles or no cycles. So let's try
> + * again later.
> + */
> + ret = -EAGAIN;
> + }
> +
> +out:
> + put_device(sup_dev);
> + return ret;
> +}
> +
> +/**
> + * __fw_devlink_link_to_consumers - Create device links to consumers of a device
> + * @dev - Device that needs to be linked to its consumers
> + *
> + * This function looks at all the consumer fwnodes of @dev and creates device
> + * links between the consumer device and @dev (supplier).
> + *
> + * If the consumer device has not been added yet, then this function creates a
> + * SYNC_STATE_ONLY link between @dev (supplier) and the closest ancestor device
> + * of the consumer fwnode. This is necessary to make sure @dev doesn't get a
> + * sync_state() callback before the real consumer device gets to be added and
> + * then probed.
> + *
> + * Once device links are created from the real consumer to @dev (supplier), the
> + * fwnode links are deleted.
> + */
> +static void __fw_devlink_link_to_consumers(struct device *dev)
> +{
> + struct fwnode_handle *fwnode = dev->fwnode;
> + struct fwnode_link *link, *tmp;
> +
> + list_for_each_entry_safe(link, tmp, &fwnode->consumers, s_hook) {
> + u32 dl_flags = fw_devlink_get_flags();
> + struct device *con_dev;
> + bool own_link = true;
> + int ret;
> +
> + con_dev = get_dev_from_fwnode(link->consumer);
> + /*
> + * If consumer device is not available yet, make a "proxy"
> + * SYNC_STATE_ONLY link from the consumer's parent device to
> + * the supplier device. This is necessary to make sure the
> + * supplier doesn't get a sync_state() callback before the real
> + * consumer can create a device link to the supplier.
> + *
> + * This proxy link step is needed to handle the case where the
> + * consumer's parent device is added before the supplier.
> + */
> + if (!con_dev) {
> + con_dev = fwnode_get_next_parent_dev(link->consumer);
> + /*
> + * However, if the consumer's parent device is also the
> + * parent of the supplier, don't create a
> + * consumer-supplier link from the parent to its child
> + * device. Such a dependency is impossible.
> + */
> + if (con_dev &&
> + fwnode_is_ancestor_of(con_dev->fwnode, fwnode)) {
> + put_device(con_dev);
> + con_dev = NULL;
> + } else {
> + own_link = false;
> + dl_flags = DL_FLAG_SYNC_STATE_ONLY;
> + }
> + }
> +
> + if (!con_dev)
> + continue;
> +
> + ret = fw_devlink_create_devlink(con_dev, fwnode, dl_flags);
> + put_device(con_dev);
> + if (!own_link || ret == -EAGAIN)
> + continue;
> +
> + list_del(&link->s_hook);
> + list_del(&link->c_hook);
> + kfree(link);
> + }
> +}
> +
> +/**
> + * __fw_devlink_link_to_suppliers - Create device links to suppliers of a device
> + * @dev - The consumer device that needs to be linked to its suppliers
> + * @fwnode - Root of the fwnode tree that is used to create device links
> + *
> + * This function looks at all the supplier fwnodes of fwnode tree rooted at
> + * @fwnode and creates device links between @dev (consumer) and all the
> + * supplier devices of the entire fwnode tree at @fwnode.
> + *
> + * The function creates normal (non-SYNC_STATE_ONLY) device links between @dev
> + * and the real suppliers of @dev. Once these device links are created, the
> + * fwnode links are deleted. When such device links are successfully created,
> + * this function is called recursively on those supplier devices. This is
> + * needed to detect and break some invalid cycles in fwnode links. See
> + * fw_devlink_create_devlink() for more details.
> + *
> + * In addition, it also looks at all the suppliers of the entire fwnode tree
> + * because some of the child devices of @dev that have not been added yet
> + * (because @dev hasn't probed) might already have their suppliers added to
> + * driver core. So, this function creates SYNC_STATE_ONLY device links between
> + * @dev (consumer) and these suppliers to make sure they don't execute their
> + * sync_state() callbacks before these child devices have a chance to create
> + * their device links. The fwnode links that correspond to the child devices
> + * aren't delete because they are needed later to create the device links
> + * between the real consumer and supplier devices.
> + */
> +static void __fw_devlink_link_to_suppliers(struct device *dev,
> + struct fwnode_handle *fwnode)
> {
> - int fw_ret;
> + bool own_link = (dev->fwnode == fwnode);
> + struct fwnode_link *link, *tmp;
> + struct fwnode_handle *child = NULL;
> + u32 dl_flags;
> +
> + if (own_link)
> + dl_flags = fw_devlink_get_flags();
> + else
> + dl_flags = DL_FLAG_SYNC_STATE_ONLY;
>
> - device_link_add_missing_supplier_links();
> + list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook) {
> + int ret;
> + struct device *sup_dev;
> + struct fwnode_handle *sup = link->supplier;
> +
> + ret = fw_devlink_create_devlink(dev, sup, dl_flags);
> + if (!own_link || ret == -EAGAIN)
> + continue;
>
> - if (fw_devlink_flags && fwnode_has_op(dev->fwnode, add_links)) {
> - fw_ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
> - if (fw_ret == -ENODEV && !fw_devlink_is_permissive())
> - device_link_wait_for_mandatory_supplier(dev);
> - else if (fw_ret)
> - device_link_wait_for_optional_supplier(dev);
> + list_del(&link->s_hook);
> + list_del(&link->c_hook);
> + kfree(link);
> +
> + /* If no device link was created, nothing more to do. */
> + if (ret)
> + continue;
> +
> + /*
> + * If a device link was successfully created to a supplier, we
> + * now need to try and link the supplier to all its suppliers.
> + *
> + * This is needed to detect and delete false dependencies in
> + * fwnode links that haven't been converted to a device link
> + * yet. See comments in fw_devlink_create_devlink() for more
> + * details on the false dependency.
> + *
> + * Without deleting these false dependencies, some devices will
> + * never probe because they'll keep waiting for their false
> + * dependency fwnode links to be converted to device links.
> + */
> + sup_dev = get_dev_from_fwnode(sup);
> + __fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
> + put_device(sup_dev);
> }
> +
> + /*
> + * Make "proxy" SYNC_STATE_ONLY device links to represent the needs of
> + * all the descendants. This proxy link step is needed to handle the
> + * case where the supplier is added before the consumer's parent device
> + * (@dev).
> + */
> + while ((child = fwnode_get_next_available_child_node(fwnode, child)))
> + __fw_devlink_link_to_suppliers(dev, child);
> +}
> +
> +static void fw_devlink_link_device(struct device *dev)
> +{
> + struct fwnode_handle *fwnode = dev->fwnode;
> +
> + if (!fw_devlink_flags)
> + return;
> +
> + fw_devlink_parse_fwtree(fwnode);
> +
> + mutex_lock(&fwnode_link_lock);
> + __fw_devlink_link_to_consumers(dev);
> + __fw_devlink_link_to_suppliers(dev, fwnode);
> + mutex_unlock(&fwnode_link_lock);
> }
>
> /* Device links support end. */
> @@ -2431,7 +2583,6 @@ void device_initialize(struct device *dev)
> #endif
> INIT_LIST_HEAD(&dev->links.consumers);
> INIT_LIST_HEAD(&dev->links.suppliers);
> - INIT_LIST_HEAD(&dev->links.needs_suppliers);
> INIT_LIST_HEAD(&dev->links.defer_sync);
> dev->links.status = DL_DEV_NO_DRIVER;
> }
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 1e771ea4dca6..89bb8b84173e 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -351,18 +351,13 @@ enum dl_dev_state {
> * struct dev_links_info - Device data related to device links.
> * @suppliers: List of links to supplier devices.
> * @consumers: List of links to consumer devices.
> - * @needs_suppliers: Hook to global list of devices waiting for suppliers.
> * @defer_sync: Hook to global list of devices that have deferred sync_state.
> - * @need_for_probe: If needs_suppliers is on a list, this indicates if the
> - * suppliers are needed for probe or not.
> * @status: Driver status information.
> */
> struct dev_links_info {
> struct list_head suppliers;
> struct list_head consumers;
> - struct list_head needs_suppliers;
> struct list_head defer_sync;
> - bool need_for_probe;
> enum dl_dev_state status;
> };
>

2020-12-11 19:06:40

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

On 2020-12-11 17:51, Saravana Kannan wrote:
> On Fri, Dec 11, 2020 at 8:34 AM Robin Murphy <[email protected]>
> wrote:
>>
>> On 2020-12-11 14:11, Qian Cai wrote:
>> > On Fri, 2020-11-20 at 18:02 -0800, Saravana Kannan wrote:
>> >> The current implementation of fw_devlink is very inefficient because it
>> >> tries to get away without creating fwnode links in the name of saving
>> >> memory usage. Past attempts to optimize runtime at the cost of memory
>> >> usage were blocked with request for data showing that the optimization
>> >> made significant improvement for real world scenarios.
>> >>
>> >> We have those scenarios now. There have been several reports of boot
>> >> time increase in the order of seconds in this thread [1]. Several OEMs
>> >> and SoC manufacturers have also privately reported significant
>> >> (350-400ms) increase in boot time due to all the parsing done by
>> >> fw_devlink.
>> >>
>> >> So this patch uses all the setup done by the previous patches in this
>> >> series to refactor fw_devlink to be more efficient. Most of the code has
>> >> been moved out of firmware specific (DT mostly) code into driver core.
>> >>
>> >> This brings the following benefits:
>> >> - Instead of parsing the device tree multiple times during bootup,
>> >> fw_devlink parses each fwnode node/property only once and creates
>> >> fwnode links. The rest of the fw_devlink code then just looks at these
>> >> fwnode links to do rest of the work.
>> >>
>> >> - Makes it much easier to debug probe issue due to fw_devlink in the
>> >> future. fw_devlink=on blocks the probing of devices if they depend on
>> >> a device that hasn't been added yet. With this refactor, it'll be very
>> >> easy to tell what that device is because we now have a reference to
>> >> the fwnode of the device.
>> >>
>> >> - Much easier to add fw_devlink support to ACPI and other firmware
>> >> types. A refactor to move the common bits from DT specific code to
>> >> driver core was in my TODO list as a prerequisite to adding ACPI
>> >> support to fw_devlink. This series gets that done.
>> >>
>> >> [1] - https://lore.kernel.org/linux-omap/[email protected]/
>> >> Signed-off-by: Saravana Kannan <[email protected]>
>> >> Tested-by: Laurent Pinchart <[email protected]>
>> >> Tested-by: Grygorii Strashko <[email protected]>
>> >
>> > Reverting this commit and its dependency:
>> >
>> > 2d09e6eb4a6f driver core: Delete pointless parameter in fwnode_operations.add_links
>> >
>> > from today's linux-next fixed a boot crash on an arm64 Thunder X2 server.
>>
>> Since the call stack implicates the platform-device-wrangling we do in
>> IORT code I took a quick look; AFAICS my guess would be it's blowing
>> up
>> trying to walk a zeroed list head since "driver core: Add
>> fwnode_init()"
>> missed acpi_alloc_fwnode_static().
>
> Thanks Robin. I'm pretty sure this is the reason. I thought I fixed
> all ACPI cases, but clearly I missed this one. I'll send out a patch
> for this today. If you think there are any other places I missed
> please let me know. I'll try some git grep foo to see if I missed any
> other instances of fwnode ops being set.

Yup, that fixed it here (QDF2400).

Thanks,

M.

diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 39263c6b52e1..2630c2e953f7 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -55,7 +55,7 @@ static inline struct fwnode_handle
*acpi_alloc_fwnode_static(void)
if (!fwnode)
return NULL;

- fwnode->ops = &acpi_static_fwnode_ops;
+ fwnode_init(fwnode, &acpi_static_fwnode_ops);

return fwnode;
}



--
Jazz is not dead. It just smells funny...

2020-12-11 19:08:58

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

On Fri, Dec 11, 2020 at 10:03 AM Marc Zyngier <[email protected]> wrote:
>
> On 2020-12-11 17:51, Saravana Kannan wrote:
> > On Fri, Dec 11, 2020 at 8:34 AM Robin Murphy <[email protected]>
> > wrote:
> >>
> >> On 2020-12-11 14:11, Qian Cai wrote:
> >> > On Fri, 2020-11-20 at 18:02 -0800, Saravana Kannan wrote:
> >> >> The current implementation of fw_devlink is very inefficient because it
> >> >> tries to get away without creating fwnode links in the name of saving
> >> >> memory usage. Past attempts to optimize runtime at the cost of memory
> >> >> usage were blocked with request for data showing that the optimization
> >> >> made significant improvement for real world scenarios.
> >> >>
> >> >> We have those scenarios now. There have been several reports of boot
> >> >> time increase in the order of seconds in this thread [1]. Several OEMs
> >> >> and SoC manufacturers have also privately reported significant
> >> >> (350-400ms) increase in boot time due to all the parsing done by
> >> >> fw_devlink.
> >> >>
> >> >> So this patch uses all the setup done by the previous patches in this
> >> >> series to refactor fw_devlink to be more efficient. Most of the code has
> >> >> been moved out of firmware specific (DT mostly) code into driver core.
> >> >>
> >> >> This brings the following benefits:
> >> >> - Instead of parsing the device tree multiple times during bootup,
> >> >> fw_devlink parses each fwnode node/property only once and creates
> >> >> fwnode links. The rest of the fw_devlink code then just looks at these
> >> >> fwnode links to do rest of the work.
> >> >>
> >> >> - Makes it much easier to debug probe issue due to fw_devlink in the
> >> >> future. fw_devlink=on blocks the probing of devices if they depend on
> >> >> a device that hasn't been added yet. With this refactor, it'll be very
> >> >> easy to tell what that device is because we now have a reference to
> >> >> the fwnode of the device.
> >> >>
> >> >> - Much easier to add fw_devlink support to ACPI and other firmware
> >> >> types. A refactor to move the common bits from DT specific code to
> >> >> driver core was in my TODO list as a prerequisite to adding ACPI
> >> >> support to fw_devlink. This series gets that done.
> >> >>
> >> >> [1] - https://lore.kernel.org/linux-omap/[email protected]/
> >> >> Signed-off-by: Saravana Kannan <[email protected]>
> >> >> Tested-by: Laurent Pinchart <[email protected]>
> >> >> Tested-by: Grygorii Strashko <[email protected]>
> >> >
> >> > Reverting this commit and its dependency:
> >> >
> >> > 2d09e6eb4a6f driver core: Delete pointless parameter in fwnode_operations.add_links
> >> >
> >> > from today's linux-next fixed a boot crash on an arm64 Thunder X2 server.
> >>
> >> Since the call stack implicates the platform-device-wrangling we do in
> >> IORT code I took a quick look; AFAICS my guess would be it's blowing
> >> up
> >> trying to walk a zeroed list head since "driver core: Add
> >> fwnode_init()"
> >> missed acpi_alloc_fwnode_static().
> >
> > Thanks Robin. I'm pretty sure this is the reason. I thought I fixed
> > all ACPI cases, but clearly I missed this one. I'll send out a patch
> > for this today. If you think there are any other places I missed
> > please let me know. I'll try some git grep foo to see if I missed any
> > other instances of fwnode ops being set.
>
> Yup, that fixed it here (QDF2400).
>
> Thanks,
>
> M.
>
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 39263c6b52e1..2630c2e953f7 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -55,7 +55,7 @@ static inline struct fwnode_handle
> *acpi_alloc_fwnode_static(void)
> if (!fwnode)
> return NULL;
>
> - fwnode->ops = &acpi_static_fwnode_ops;
> + fwnode_init(fwnode, &acpi_static_fwnode_ops);
>
> return fwnode;
> }
>

Lol, my only contribution to the patch will be the commit text. I'll
send them with reported-by, suggested-by and tested-by if no one less
beats me to it.

-Saravana

2020-12-11 19:28:20

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

On 2020-12-11 18:20, Saravana Kannan wrote:

> Lol, my only contribution to the patch will be the commit text. I'll
> send them with reported-by, suggested-by and tested-by if no one less
> beats me to it.

Teamwork!

M.
--
Jazz is not dead. It just smells funny...

2020-12-12 16:40:51

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

On Fri, Dec 11, 2020 at 8:34 AM Robin Murphy <[email protected]> wrote:
>
> On 2020-12-11 14:11, Qian Cai wrote:
> > On Fri, 2020-11-20 at 18:02 -0800, Saravana Kannan wrote:
> >> The current implementation of fw_devlink is very inefficient because it
> >> tries to get away without creating fwnode links in the name of saving
> >> memory usage. Past attempts to optimize runtime at the cost of memory
> >> usage were blocked with request for data showing that the optimization
> >> made significant improvement for real world scenarios.
> >>
> >> We have those scenarios now. There have been several reports of boot
> >> time increase in the order of seconds in this thread [1]. Several OEMs
> >> and SoC manufacturers have also privately reported significant
> >> (350-400ms) increase in boot time due to all the parsing done by
> >> fw_devlink.
> >>
> >> So this patch uses all the setup done by the previous patches in this
> >> series to refactor fw_devlink to be more efficient. Most of the code has
> >> been moved out of firmware specific (DT mostly) code into driver core.
> >>
> >> This brings the following benefits:
> >> - Instead of parsing the device tree multiple times during bootup,
> >> fw_devlink parses each fwnode node/property only once and creates
> >> fwnode links. The rest of the fw_devlink code then just looks at these
> >> fwnode links to do rest of the work.
> >>
> >> - Makes it much easier to debug probe issue due to fw_devlink in the
> >> future. fw_devlink=on blocks the probing of devices if they depend on
> >> a device that hasn't been added yet. With this refactor, it'll be very
> >> easy to tell what that device is because we now have a reference to
> >> the fwnode of the device.
> >>
> >> - Much easier to add fw_devlink support to ACPI and other firmware
> >> types. A refactor to move the common bits from DT specific code to
> >> driver core was in my TODO list as a prerequisite to adding ACPI
> >> support to fw_devlink. This series gets that done.
> >>
> >> [1] - https://lore.kernel.org/linux-omap/[email protected]/
> >> Signed-off-by: Saravana Kannan <[email protected]>
> >> Tested-by: Laurent Pinchart <[email protected]>
> >> Tested-by: Grygorii Strashko <[email protected]>
> >
> > Reverting this commit and its dependency:
> >
> > 2d09e6eb4a6f driver core: Delete pointless parameter in fwnode_operations.add_links
> >
> > from today's linux-next fixed a boot crash on an arm64 Thunder X2 server.
>
> Since the call stack implicates the platform-device-wrangling we do in
> IORT code I took a quick look; AFAICS my guess would be it's blowing up
> trying to walk a zeroed list head since "driver core: Add fwnode_init()"
> missed acpi_alloc_fwnode_static().

Thanks Robin. I'm pretty sure this is the reason. I thought I fixed
all ACPI cases, but clearly I missed this one. I'll send out a patch
for this today. If you think there are any other places I missed
please let me know. I'll try some git grep foo to see if I missed any
other instances of fwnode ops being set.

-Saravana

2020-12-12 19:35:01

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

On 2020-12-11 14:11, Qian Cai wrote:
> On Fri, 2020-11-20 at 18:02 -0800, Saravana Kannan wrote:
>> The current implementation of fw_devlink is very inefficient because it
>> tries to get away without creating fwnode links in the name of saving
>> memory usage. Past attempts to optimize runtime at the cost of memory
>> usage were blocked with request for data showing that the optimization
>> made significant improvement for real world scenarios.
>>
>> We have those scenarios now. There have been several reports of boot
>> time increase in the order of seconds in this thread [1]. Several OEMs
>> and SoC manufacturers have also privately reported significant
>> (350-400ms) increase in boot time due to all the parsing done by
>> fw_devlink.
>>
>> So this patch uses all the setup done by the previous patches in this
>> series to refactor fw_devlink to be more efficient. Most of the code has
>> been moved out of firmware specific (DT mostly) code into driver core.
>>
>> This brings the following benefits:
>> - Instead of parsing the device tree multiple times during bootup,
>> fw_devlink parses each fwnode node/property only once and creates
>> fwnode links. The rest of the fw_devlink code then just looks at these
>> fwnode links to do rest of the work.
>>
>> - Makes it much easier to debug probe issue due to fw_devlink in the
>> future. fw_devlink=on blocks the probing of devices if they depend on
>> a device that hasn't been added yet. With this refactor, it'll be very
>> easy to tell what that device is because we now have a reference to
>> the fwnode of the device.
>>
>> - Much easier to add fw_devlink support to ACPI and other firmware
>> types. A refactor to move the common bits from DT specific code to
>> driver core was in my TODO list as a prerequisite to adding ACPI
>> support to fw_devlink. This series gets that done.
>>
>> [1] - https://lore.kernel.org/linux-omap/[email protected]/
>> Signed-off-by: Saravana Kannan <[email protected]>
>> Tested-by: Laurent Pinchart <[email protected]>
>> Tested-by: Grygorii Strashko <[email protected]>
>
> Reverting this commit and its dependency:
>
> 2d09e6eb4a6f driver core: Delete pointless parameter in fwnode_operations.add_links
>
> from today's linux-next fixed a boot crash on an arm64 Thunder X2 server.

Since the call stack implicates the platform-device-wrangling we do in
IORT code I took a quick look; AFAICS my guess would be it's blowing up
trying to walk a zeroed list head since "driver core: Add fwnode_init()"
missed acpi_alloc_fwnode_static().

Robin.

> .config: https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config
>
> [ 57.413929][ T1] ACPI: 5 ACPI AML tables successfully acquired and loaded
> [ 60.571643][ T1] ACPI: Interpreter enabled
> [ 60.576104][ T1] ACPI: Using GIC for interrupt routing
> [ 60.582474][ T1] ACPI: MCFG table detected, 1 entries
> [ 60.588051][ T1] ACPI: IORT: SMMU-v3[402300000] Mapped to Proximity domain 0
> [ 60.601374][ T1] Unable to handle kernel paging request at virtual address dfff800000000000
> [ 60.610146][ T1] Mem abort info:
> [ 60.613694][ T1] ESR = 0x96000004
> [ 60.617496][ T1] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 60.623616][ T1] SET = 0, FnV = 0
> [ 60.627420][ T1] EA = 0, S1PTW = 0
> [ 60.631304][ T1] Data abort info:
> [ 60.634957][ T1] ISV = 0, ISS = 0x00000004
> [ 60.639546][ T1] CM = 0, WnR = 0
> [ 60.643255][ T1] [dfff800000000000] address between user and kernel address ranges
> [ 60.651226][ T1] Internal error: Oops: 96000004 [#1] SMP
> [ 60.656864][ T1] Modules linked in:
> [ 60.660658][ T1] CPU: 38 PID: 1 Comm: swapper/0 Tainted: G W 5.10.0-rc7-next-20201211 #2
> [ 60.670424][ T1] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.16 07/29/2020
> [ 60.680979][ T1] pstate: 10400009 (nzcV daif +PAN -UAO -TCO BTYPE=--)
> [ 60.687757][ T1] pc : device_add+0xf60/0x16b0
> [ 60.692430][ T1] lr : device_add+0xf08/0x16b0
> [ 60.697098][ T1] sp : ffff0000063bf7d0
> [ 60.701147][ T1] x29: ffff0000063bf7d0 x28: ffff00001f760810
> [ 60.707226][ T1] x27: fffffffffffffff8 x26: ffff00001f760858
> [ 60.713304][ T1] x25: ffff00001f760c58 x24: fffffffffffffff8
> [ 60.719381][ T1] x23: 1fffe00003eec10b x22: ffff8000190d0260
> [ 60.725458][ T1] x21: ffff800011dba708 x20: 0000000000000000
> [ 60.731535][ T1] x19: 1fffe00000c77f10 x18: 1fffe001cf0d53ed
> [ 60.737616][ T1] x17: 0000000000000000 x16: 1ffff000033676f1
> [ 60.743709][ T1] x15: 0000000000000000 x14: ffff800011731e34
> [ 60.749786][ T1] x13: ffff700003217fb9 x12: 1ffff00003217fb8
> [ 60.755864][ T1] x11: 1ffff00003217fb8 x10: ffff700003217fb8
> [ 60.761940][ T1] x9 : dfff800000000000 x8 : ffff8000190bfdc7
> [ 60.768017][ T1] x7 : 0000000000000001 x6 : ffff700003217fb9
> [ 60.774094][ T1] x5 : ffff700003217fb9 x4 : ffff000006324a80
> [ 60.780170][ T1] x3 : 1fffe00000c64951 x2 : 0000000000000000
> [ 60.786247][ T1] x1 : 0000000000000000 x0 : dfff800000000000
> [ 60.792324][ T1] Call trace:
> [ 60.795495][ T1] device_add+0xf60/0x16b0
> __fw_devlink_link_to_consumers at drivers/base/core.c:1583
> (inlined by) fw_devlink_link_device at drivers/base/core.c:1726
> (inlined by) device_add at drivers/base/core.c:3088
> [ 60.799813][ T1] platform_device_add+0x274/0x628
> [ 60.804833][ T1] acpi_iort_init+0x9d8/0xc50
> [ 60.809415][ T1] acpi_init+0x45c/0x4e8
> [ 60.813556][ T1] do_one_initcall+0x170/0xb70
> [ 60.818224][ T1] kernel_init_freeable+0x6a8/0x734
> [ 60.823332][ T1] kernel_init+0x18/0x12c
> [ 60.827566][ T1] ret_from_fork+0x10/0x1c
> [ 60.831890][ T1] Code: f2fbffe0 d100205b d343fc41 aa1b03f8 (38e06820)
> [ 60.838756][ T1] ---[ end trace fa5c8ce17a226d83 ]---
> [ 60.844127][ T1] Kernel panic - not syncing: Oops: Fatal exception
> [ 60.850881][ T1] SMP: stopping secondary CPUs
> [ 60.855733][ T1] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
>> ---
>> drivers/base/core.c | 325 ++++++++++++++++++++++++++++++-----------
>> include/linux/device.h | 5 -
>> 2 files changed, 238 insertions(+), 92 deletions(-)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 1873cecb0cc4..9edf9084fc98 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -46,8 +46,6 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
>> #endif
>>
>> /* Device links support. */
>> -static LIST_HEAD(wait_for_suppliers);
>> -static DEFINE_MUTEX(wfs_lock);
>> static LIST_HEAD(deferred_sync);
>> static unsigned int defer_sync_state_count = 1;
>> static DEFINE_MUTEX(fwnode_link_lock);
>> @@ -803,74 +801,6 @@ struct device_link *device_link_add(struct device *consumer,
>> }
>> EXPORT_SYMBOL_GPL(device_link_add);
>>
>> -/**
>> - * device_link_wait_for_supplier - Add device to wait_for_suppliers list
>> - * @consumer: Consumer device
>> - *
>> - * Marks the @consumer device as waiting for suppliers to become available by
>> - * adding it to the wait_for_suppliers list. The consumer device will never be
>> - * probed until it's removed from the wait_for_suppliers list.
>> - *
>> - * The caller is responsible for adding the links to the supplier devices once
>> - * they are available and removing the @consumer device from the
>> - * wait_for_suppliers list once links to all the suppliers have been created.
>> - *
>> - * This function is NOT meant to be called from the probe function of the
>> - * consumer but rather from code that creates/adds the consumer device.
>> - */
>> -static void device_link_wait_for_supplier(struct device *consumer,
>> - bool need_for_probe)
>> -{
>> - mutex_lock(&wfs_lock);
>> - list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
>> - consumer->links.need_for_probe = need_for_probe;
>> - mutex_unlock(&wfs_lock);
>> -}
>> -
>> -static void device_link_wait_for_mandatory_supplier(struct device *consumer)
>> -{
>> - device_link_wait_for_supplier(consumer, true);
>> -}
>> -
>> -static void device_link_wait_for_optional_supplier(struct device *consumer)
>> -{
>> - device_link_wait_for_supplier(consumer, false);
>> -}
>> -
>> -/**
>> - * device_link_add_missing_supplier_links - Add links from consumer devices to
>> - * supplier devices, leaving any
>> - * consumer with inactive suppliers on
>> - * the wait_for_suppliers list
>> - *
>> - * Loops through all consumers waiting on suppliers and tries to add all their
>> - * supplier links. If that succeeds, the consumer device is removed from
>> - * wait_for_suppliers list. Otherwise, they are left in the wait_for_suppliers
>> - * list. Devices left on the wait_for_suppliers list will not be probed.
>> - *
>> - * The fwnode add_links callback is expected to return 0 if it has found and
>> - * added all the supplier links for the consumer device. It should return an
>> - * error if it isn't able to do so.
>> - *
>> - * The caller of device_link_wait_for_supplier() is expected to call this once
>> - * it's aware of potential suppliers becoming available.
>> - */
>> -static void device_link_add_missing_supplier_links(void)
>> -{
>> - struct device *dev, *tmp;
>> -
>> - mutex_lock(&wfs_lock);
>> - list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
>> - links.needs_suppliers) {
>> - int ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
>> - if (!ret)
>> - list_del_init(&dev->links.needs_suppliers);
>> - else if (ret != -ENODEV)
>> - dev->links.need_for_probe = false;
>> - }
>> - mutex_unlock(&wfs_lock);
>> -}
>> -
>> #ifdef CONFIG_SRCU
>> static void __device_link_del(struct kref *kref)
>> {
>> @@ -1195,9 +1125,8 @@ void device_links_driver_bound(struct device *dev)
>> * the device links it needs to or make new device links as it needs
>> * them. So, it no longer needs to wait on any suppliers.
>> */
>> - mutex_lock(&wfs_lock);
>> - list_del_init(&dev->links.needs_suppliers);
>> - mutex_unlock(&wfs_lock);
>> + if (dev->fwnode && dev->fwnode->dev == dev)
>> + fwnode_links_purge_suppliers(dev->fwnode);
>> device_remove_file(dev, &dev_attr_waiting_for_supplier);
>>
>> device_links_write_lock();
>> @@ -1488,10 +1417,6 @@ static void device_links_purge(struct device *dev)
>> if (dev->class == &devlink_class)
>> return;
>>
>> - mutex_lock(&wfs_lock);
>> - list_del(&dev->links.needs_suppliers);
>> - mutex_unlock(&wfs_lock);
>> -
>> /*
>> * Delete all of the remaining links from this device to any other
>> * devices (either consumers or suppliers).
>> @@ -1561,19 +1486,246 @@ static void fw_devlink_parse_fwtree(struct fwnode_handle *fwnode)
>> fw_devlink_parse_fwtree(child);
>> }
>>
>> -static void fw_devlink_link_device(struct device *dev)
>> +/**
>> + * fw_devlink_create_devlink - Create a device link from a consumer to fwnode
>> + * @con - Consumer device for the device link
>> + * @sup_handle - fwnode handle of supplier
>> + *
>> + * This function will try to create a device link between the consumer device
>> + * @con and the supplier device represented by @sup_handle.
>> + *
>> + * The supplier has to be provided as a fwnode because incorrect cycles in
>> + * fwnode links can sometimes cause the supplier device to never be created.
>> + * This function detects such cases and returns an error if it cannot create a
>> + * device link from the consumer to a missing supplier.
>> + *
>> + * Returns,
>> + * 0 on successfully creating a device link
>> + * -EINVAL if the device link cannot be created as expected
>> + * -EAGAIN if the device link cannot be created right now, but it may be
>> + * possible to do that in the future
>> + */
>> +static int fw_devlink_create_devlink(struct device *con,
>> + struct fwnode_handle *sup_handle, u32 flags)
>> +{
>> + struct device *sup_dev;
>> + int ret = 0;
>> +
>> + sup_dev = get_dev_from_fwnode(sup_handle);
>> + if (sup_dev) {
>> + /*
>> + * If this fails, it is due to cycles in device links. Just
>> + * give up on this link and treat it as invalid.
>> + */
>> + if (!device_link_add(con, sup_dev, flags))
>> + ret = -EINVAL;
>> +
>> + goto out;
>> + }
>> +
>> + /*
>> + * DL_FLAG_SYNC_STATE_ONLY doesn't block probing and supports
>> + * cycles. So cycle detection isn't necessary and shouldn't be
>> + * done.
>> + */
>> + if (flags & DL_FLAG_SYNC_STATE_ONLY)
>> + return -EAGAIN;
>> +
>> + /*
>> + * If we can't find the supplier device from its fwnode, it might be
>> + * due to a cyclic dependency between fwnodes. Some of these cycles can
>> + * be broken by applying logic. Check for these types of cycles and
>> + * break them so that devices in the cycle probe properly.
>> + *
>> + * If the supplier's parent is dependent on the consumer, then
>> + * the consumer-supplier dependency is a false dependency. So,
>> + * treat it as an invalid link.
>> + */
>> + sup_dev = fwnode_get_next_parent_dev(sup_handle);
>> + if (sup_dev && device_is_dependent(con, sup_dev)) {
>> + dev_dbg(con, "Not linking to %pfwP - False link\n",
>> + sup_handle);
>> + ret = -EINVAL;
>> + } else {
>> + /*
>> + * Can't check for cycles or no cycles. So let's try
>> + * again later.
>> + */
>> + ret = -EAGAIN;
>> + }
>> +
>> +out:
>> + put_device(sup_dev);
>> + return ret;
>> +}
>> +
>> +/**
>> + * __fw_devlink_link_to_consumers - Create device links to consumers of a device
>> + * @dev - Device that needs to be linked to its consumers
>> + *
>> + * This function looks at all the consumer fwnodes of @dev and creates device
>> + * links between the consumer device and @dev (supplier).
>> + *
>> + * If the consumer device has not been added yet, then this function creates a
>> + * SYNC_STATE_ONLY link between @dev (supplier) and the closest ancestor device
>> + * of the consumer fwnode. This is necessary to make sure @dev doesn't get a
>> + * sync_state() callback before the real consumer device gets to be added and
>> + * then probed.
>> + *
>> + * Once device links are created from the real consumer to @dev (supplier), the
>> + * fwnode links are deleted.
>> + */
>> +static void __fw_devlink_link_to_consumers(struct device *dev)
>> +{
>> + struct fwnode_handle *fwnode = dev->fwnode;
>> + struct fwnode_link *link, *tmp;
>> +
>> + list_for_each_entry_safe(link, tmp, &fwnode->consumers, s_hook) {
>> + u32 dl_flags = fw_devlink_get_flags();
>> + struct device *con_dev;
>> + bool own_link = true;
>> + int ret;
>> +
>> + con_dev = get_dev_from_fwnode(link->consumer);
>> + /*
>> + * If consumer device is not available yet, make a "proxy"
>> + * SYNC_STATE_ONLY link from the consumer's parent device to
>> + * the supplier device. This is necessary to make sure the
>> + * supplier doesn't get a sync_state() callback before the real
>> + * consumer can create a device link to the supplier.
>> + *
>> + * This proxy link step is needed to handle the case where the
>> + * consumer's parent device is added before the supplier.
>> + */
>> + if (!con_dev) {
>> + con_dev = fwnode_get_next_parent_dev(link->consumer);
>> + /*
>> + * However, if the consumer's parent device is also the
>> + * parent of the supplier, don't create a
>> + * consumer-supplier link from the parent to its child
>> + * device. Such a dependency is impossible.
>> + */
>> + if (con_dev &&
>> + fwnode_is_ancestor_of(con_dev->fwnode, fwnode)) {
>> + put_device(con_dev);
>> + con_dev = NULL;
>> + } else {
>> + own_link = false;
>> + dl_flags = DL_FLAG_SYNC_STATE_ONLY;
>> + }
>> + }
>> +
>> + if (!con_dev)
>> + continue;
>> +
>> + ret = fw_devlink_create_devlink(con_dev, fwnode, dl_flags);
>> + put_device(con_dev);
>> + if (!own_link || ret == -EAGAIN)
>> + continue;
>> +
>> + list_del(&link->s_hook);
>> + list_del(&link->c_hook);
>> + kfree(link);
>> + }
>> +}
>> +
>> +/**
>> + * __fw_devlink_link_to_suppliers - Create device links to suppliers of a device
>> + * @dev - The consumer device that needs to be linked to its suppliers
>> + * @fwnode - Root of the fwnode tree that is used to create device links
>> + *
>> + * This function looks at all the supplier fwnodes of fwnode tree rooted at
>> + * @fwnode and creates device links between @dev (consumer) and all the
>> + * supplier devices of the entire fwnode tree at @fwnode.
>> + *
>> + * The function creates normal (non-SYNC_STATE_ONLY) device links between @dev
>> + * and the real suppliers of @dev. Once these device links are created, the
>> + * fwnode links are deleted. When such device links are successfully created,
>> + * this function is called recursively on those supplier devices. This is
>> + * needed to detect and break some invalid cycles in fwnode links. See
>> + * fw_devlink_create_devlink() for more details.
>> + *
>> + * In addition, it also looks at all the suppliers of the entire fwnode tree
>> + * because some of the child devices of @dev that have not been added yet
>> + * (because @dev hasn't probed) might already have their suppliers added to
>> + * driver core. So, this function creates SYNC_STATE_ONLY device links between
>> + * @dev (consumer) and these suppliers to make sure they don't execute their
>> + * sync_state() callbacks before these child devices have a chance to create
>> + * their device links. The fwnode links that correspond to the child devices
>> + * aren't delete because they are needed later to create the device links
>> + * between the real consumer and supplier devices.
>> + */
>> +static void __fw_devlink_link_to_suppliers(struct device *dev,
>> + struct fwnode_handle *fwnode)
>> {
>> - int fw_ret;
>> + bool own_link = (dev->fwnode == fwnode);
>> + struct fwnode_link *link, *tmp;
>> + struct fwnode_handle *child = NULL;
>> + u32 dl_flags;
>> +
>> + if (own_link)
>> + dl_flags = fw_devlink_get_flags();
>> + else
>> + dl_flags = DL_FLAG_SYNC_STATE_ONLY;
>>
>> - device_link_add_missing_supplier_links();
>> + list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook) {
>> + int ret;
>> + struct device *sup_dev;
>> + struct fwnode_handle *sup = link->supplier;
>> +
>> + ret = fw_devlink_create_devlink(dev, sup, dl_flags);
>> + if (!own_link || ret == -EAGAIN)
>> + continue;
>>
>> - if (fw_devlink_flags && fwnode_has_op(dev->fwnode, add_links)) {
>> - fw_ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
>> - if (fw_ret == -ENODEV && !fw_devlink_is_permissive())
>> - device_link_wait_for_mandatory_supplier(dev);
>> - else if (fw_ret)
>> - device_link_wait_for_optional_supplier(dev);
>> + list_del(&link->s_hook);
>> + list_del(&link->c_hook);
>> + kfree(link);
>> +
>> + /* If no device link was created, nothing more to do. */
>> + if (ret)
>> + continue;
>> +
>> + /*
>> + * If a device link was successfully created to a supplier, we
>> + * now need to try and link the supplier to all its suppliers.
>> + *
>> + * This is needed to detect and delete false dependencies in
>> + * fwnode links that haven't been converted to a device link
>> + * yet. See comments in fw_devlink_create_devlink() for more
>> + * details on the false dependency.
>> + *
>> + * Without deleting these false dependencies, some devices will
>> + * never probe because they'll keep waiting for their false
>> + * dependency fwnode links to be converted to device links.
>> + */
>> + sup_dev = get_dev_from_fwnode(sup);
>> + __fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
>> + put_device(sup_dev);
>> }
>> +
>> + /*
>> + * Make "proxy" SYNC_STATE_ONLY device links to represent the needs of
>> + * all the descendants. This proxy link step is needed to handle the
>> + * case where the supplier is added before the consumer's parent device
>> + * (@dev).
>> + */
>> + while ((child = fwnode_get_next_available_child_node(fwnode, child)))
>> + __fw_devlink_link_to_suppliers(dev, child);
>> +}
>> +
>> +static void fw_devlink_link_device(struct device *dev)
>> +{
>> + struct fwnode_handle *fwnode = dev->fwnode;
>> +
>> + if (!fw_devlink_flags)
>> + return;
>> +
>> + fw_devlink_parse_fwtree(fwnode);
>> +
>> + mutex_lock(&fwnode_link_lock);
>> + __fw_devlink_link_to_consumers(dev);
>> + __fw_devlink_link_to_suppliers(dev, fwnode);
>> + mutex_unlock(&fwnode_link_lock);
>> }
>>
>> /* Device links support end. */
>> @@ -2431,7 +2583,6 @@ void device_initialize(struct device *dev)
>> #endif
>> INIT_LIST_HEAD(&dev->links.consumers);
>> INIT_LIST_HEAD(&dev->links.suppliers);
>> - INIT_LIST_HEAD(&dev->links.needs_suppliers);
>> INIT_LIST_HEAD(&dev->links.defer_sync);
>> dev->links.status = DL_DEV_NO_DRIVER;
>> }
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index 1e771ea4dca6..89bb8b84173e 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -351,18 +351,13 @@ enum dl_dev_state {
>> * struct dev_links_info - Device data related to device links.
>> * @suppliers: List of links to supplier devices.
>> * @consumers: List of links to consumer devices.
>> - * @needs_suppliers: Hook to global list of devices waiting for suppliers.
>> * @defer_sync: Hook to global list of devices that have deferred sync_state.
>> - * @need_for_probe: If needs_suppliers is on a list, this indicates if the
>> - * suppliers are needed for probe or not.
>> * @status: Driver status information.
>> */
>> struct dev_links_info {
>> struct list_head suppliers;
>> struct list_head consumers;
>> - struct list_head needs_suppliers;
>> struct list_head defer_sync;
>> - bool need_for_probe;
>> enum dl_dev_state status;
>> };
>>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

2020-12-13 05:18:27

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

On Fri, Dec 11, 2020 at 11:07 AM Marc Zyngier <[email protected]> wrote:
>
> On 2020-12-11 18:20, Saravana Kannan wrote:
>
> > Lol, my only contribution to the patch will be the commit text. I'll
> > send them with reported-by, suggested-by and tested-by if no one less
> > beats me to it.
>
> Teamwork!
>
> M.

Forgot to CC most of the folks/lists here, but patch has been sent.

https://lore.kernel.org/lkml/[email protected]/

-Saravana

2020-12-29 03:36:55

by Michael Walle

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

> The current implementation of fw_devlink is very inefficient because it
> tries to get away without creating fwnode links in the name of saving
> memory usage. Past attempts to optimize runtime at the cost of memory
> usage were blocked with request for data showing that the optimization
> made significant improvement for real world scenarios.
>
> We have those scenarios now. There have been several reports of boot
> time increase in the order of seconds in this thread [1]. Several OEMs
> and SoC manufacturers have also privately reported significant
> (350-400ms) increase in boot time due to all the parsing done by
> fw_devlink.
>
> So this patch uses all the setup done by the previous patches in this
> series to refactor fw_devlink to be more efficient. Most of the code has
> been moved out of firmware specific (DT mostly) code into driver core.
>
> This brings the following benefits:
> - Instead of parsing the device tree multiple times during bootup,
> fw_devlink parses each fwnode node/property only once and creates
> fwnode links. The rest of the fw_devlink code then just looks at these
> fwnode links to do rest of the work.
>
> - Makes it much easier to debug probe issue due to fw_devlink in the
> future. fw_devlink=on blocks the probing of devices if they depend on
> a device that hasn't been added yet. With this refactor, it'll be very
> easy to tell what that device is because we now have a reference to
> the fwnode of the device.
>
> - Much easier to add fw_devlink support to ACPI and other firmware
> types. A refactor to move the common bits from DT specific code to
> driver core was in my TODO list as a prerequisite to adding ACPI
> support to fw_devlink. This series gets that done.
>
> [1] - https://lore.kernel.org/linux-omap/[email protected]/
> Signed-off-by: Saravana Kannan <[email protected]>
> Tested-by: Laurent Pinchart <[email protected]>
> Tested-by: Grygorii Strashko <[email protected]>

git bisect show that this commit broke my board in 5.11-rc1:

[ 2.294375] sysfs: cannot create duplicate filename '/devices/virtual/devlink/0000:00:00.1--0000:00:00.1'
[ 2.303999] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc1-00016-ga0fb284b267 #267
[ 2.312125] Hardware name: Kontron SMARC-sAL28 (4 Lane) (DT)
[ 2.317804] Call trace:
[ 2.320253] dump_backtrace+0x0/0x1b8
[ 2.323936] show_stack+0x20/0x70
[ 2.327263] dump_stack+0xd8/0x134
[ 2.330677] sysfs_warn_dup+0x6c/0x88
[ 2.334351] sysfs_create_dir_ns+0xe8/0x100
[ 2.338547] kobject_add_internal+0x9c/0x290
[ 2.342833] kobject_add+0xa0/0x108
[ 2.346331] device_add+0xfc/0x798
[ 2.349746] device_link_add+0x454/0x5e0
[ 2.353682] fw_devlink_create_devlink+0xb8/0xc8
[ 2.358316] __fw_devlink_link_to_suppliers+0x84/0x180
[ 2.363474] __fw_devlink_link_to_suppliers+0x134/0x180
[ 2.368718] device_add+0x778/0x798
[ 2.372217] device_register+0x28/0x38
[ 2.375979] __mdiobus_register+0x94/0x340
[ 2.380089] of_mdiobus_register+0xb4/0x380
[ 2.384285] enetc_pf_probe+0x73c/0xb10
[ 2.388132] local_pci_probe+0x48/0xb8
[ 2.391896] pci_device_probe+0x120/0x1c0
[ 2.395920] really_probe+0xec/0x3c0
[ 2.399505] driver_probe_device+0x60/0xc0
[ 2.403614] device_driver_attach+0x7c/0x88
[ 2.407810] __driver_attach+0x60/0xe8
[ 2.411570] bus_for_each_dev+0x7c/0xd0
[ 2.415419] driver_attach+0x2c/0x38
[ 2.419004] bus_add_driver+0x194/0x1f8
[ 2.422851] driver_register+0x6c/0x128
[ 2.426698] __pci_register_driver+0x4c/0x58
[ 2.430983] enetc_pf_driver_init+0x2c/0x38
[ 2.435181] do_one_initcall+0x54/0x2d8
[ 2.439029] kernel_init_freeable+0x1fc/0x268
[ 2.443403] kernel_init+0x1c/0x120
[ 2.446904] ret_from_fork+0x10/0x30
[ 2.450502] kobject_add_internal failed for 0000:00:00.1--0000:00:00.1 with -EEXIST, don't try to register things with the same name in the same directory.

Looks like it will generate that link twice? Let me know if I can help
testing.

See also: https://lavalab.kontron.com/scheduler/job/3894#L831

-michael

2021-01-05 19:04:02

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

On Mon, Dec 28, 2020 at 7:35 PM Michael Walle <[email protected]> wrote:
>
> > The current implementation of fw_devlink is very inefficient because it
> > tries to get away without creating fwnode links in the name of saving
> > memory usage. Past attempts to optimize runtime at the cost of memory
> > usage were blocked with request for data showing that the optimization
> > made significant improvement for real world scenarios.
> >
> > We have those scenarios now. There have been several reports of boot
> > time increase in the order of seconds in this thread [1]. Several OEMs
> > and SoC manufacturers have also privately reported significant
> > (350-400ms) increase in boot time due to all the parsing done by
> > fw_devlink.
> >
> > So this patch uses all the setup done by the previous patches in this
> > series to refactor fw_devlink to be more efficient. Most of the code has
> > been moved out of firmware specific (DT mostly) code into driver core.
> >
> > This brings the following benefits:
> > - Instead of parsing the device tree multiple times during bootup,
> > fw_devlink parses each fwnode node/property only once and creates
> > fwnode links. The rest of the fw_devlink code then just looks at these
> > fwnode links to do rest of the work.
> >
> > - Makes it much easier to debug probe issue due to fw_devlink in the
> > future. fw_devlink=on blocks the probing of devices if they depend on
> > a device that hasn't been added yet. With this refactor, it'll be very
> > easy to tell what that device is because we now have a reference to
> > the fwnode of the device.
> >
> > - Much easier to add fw_devlink support to ACPI and other firmware
> > types. A refactor to move the common bits from DT specific code to
> > driver core was in my TODO list as a prerequisite to adding ACPI
> > support to fw_devlink. This series gets that done.
> >
> > [1] - https://lore.kernel.org/linux-omap/[email protected]/
> > Signed-off-by: Saravana Kannan <[email protected]>
> > Tested-by: Laurent Pinchart <[email protected]>
> > Tested-by: Grygorii Strashko <[email protected]>
>
> git bisect show that this commit broke my board in 5.11-rc1:
>
> [ 2.294375] sysfs: cannot create duplicate filename '/devices/virtual/devlink/0000:00:00.1--0000:00:00.1'
> [ 2.303999] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc1-00016-ga0fb284b267 #267
> [ 2.312125] Hardware name: Kontron SMARC-sAL28 (4 Lane) (DT)
> [ 2.317804] Call trace:
> [ 2.320253] dump_backtrace+0x0/0x1b8
> [ 2.323936] show_stack+0x20/0x70
> [ 2.327263] dump_stack+0xd8/0x134
> [ 2.330677] sysfs_warn_dup+0x6c/0x88
> [ 2.334351] sysfs_create_dir_ns+0xe8/0x100
> [ 2.338547] kobject_add_internal+0x9c/0x290
> [ 2.342833] kobject_add+0xa0/0x108
> [ 2.346331] device_add+0xfc/0x798
> [ 2.349746] device_link_add+0x454/0x5e0
> [ 2.353682] fw_devlink_create_devlink+0xb8/0xc8
> [ 2.358316] __fw_devlink_link_to_suppliers+0x84/0x180
> [ 2.363474] __fw_devlink_link_to_suppliers+0x134/0x180
> [ 2.368718] device_add+0x778/0x798
> [ 2.372217] device_register+0x28/0x38
> [ 2.375979] __mdiobus_register+0x94/0x340
> [ 2.380089] of_mdiobus_register+0xb4/0x380
> [ 2.384285] enetc_pf_probe+0x73c/0xb10
> [ 2.388132] local_pci_probe+0x48/0xb8
> [ 2.391896] pci_device_probe+0x120/0x1c0
> [ 2.395920] really_probe+0xec/0x3c0
> [ 2.399505] driver_probe_device+0x60/0xc0
> [ 2.403614] device_driver_attach+0x7c/0x88
> [ 2.407810] __driver_attach+0x60/0xe8
> [ 2.411570] bus_for_each_dev+0x7c/0xd0
> [ 2.415419] driver_attach+0x2c/0x38
> [ 2.419004] bus_add_driver+0x194/0x1f8
> [ 2.422851] driver_register+0x6c/0x128
> [ 2.426698] __pci_register_driver+0x4c/0x58
> [ 2.430983] enetc_pf_driver_init+0x2c/0x38
> [ 2.435181] do_one_initcall+0x54/0x2d8
> [ 2.439029] kernel_init_freeable+0x1fc/0x268
> [ 2.443403] kernel_init+0x1c/0x120
> [ 2.446904] ret_from_fork+0x10/0x30
> [ 2.450502] kobject_add_internal failed for 0000:00:00.1--0000:00:00.1 with -EEXIST, don't try to register things with the same name in the same directory.
>
> Looks like it will generate that link twice? Let me know if I can help
> testing.
>
> See also: https://lavalab.kontron.com/scheduler/job/3894#L831

I'll look into this this week. Is the DT for this board in upstream?
If so, can you point me to the DT file(s)?

Also, can you give me the output of this?
find /sys/devices -type d | grep "0000:00:00.1"

Thanks,
Saravana

2021-01-05 23:03:35

by Michael Walle

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

Am 2021-01-05 20:00, schrieb Saravana Kannan:
> On Mon, Dec 28, 2020 at 7:35 PM Michael Walle <[email protected]> wrote:
>>
>> > The current implementation of fw_devlink is very inefficient because it
>> > tries to get away without creating fwnode links in the name of saving
>> > memory usage. Past attempts to optimize runtime at the cost of memory
>> > usage were blocked with request for data showing that the optimization
>> > made significant improvement for real world scenarios.
>> >
>> > We have those scenarios now. There have been several reports of boot
>> > time increase in the order of seconds in this thread [1]. Several OEMs
>> > and SoC manufacturers have also privately reported significant
>> > (350-400ms) increase in boot time due to all the parsing done by
>> > fw_devlink.
>> >
>> > So this patch uses all the setup done by the previous patches in this
>> > series to refactor fw_devlink to be more efficient. Most of the code has
>> > been moved out of firmware specific (DT mostly) code into driver core.
>> >
>> > This brings the following benefits:
>> > - Instead of parsing the device tree multiple times during bootup,
>> > fw_devlink parses each fwnode node/property only once and creates
>> > fwnode links. The rest of the fw_devlink code then just looks at these
>> > fwnode links to do rest of the work.
>> >
>> > - Makes it much easier to debug probe issue due to fw_devlink in the
>> > future. fw_devlink=on blocks the probing of devices if they depend on
>> > a device that hasn't been added yet. With this refactor, it'll be very
>> > easy to tell what that device is because we now have a reference to
>> > the fwnode of the device.
>> >
>> > - Much easier to add fw_devlink support to ACPI and other firmware
>> > types. A refactor to move the common bits from DT specific code to
>> > driver core was in my TODO list as a prerequisite to adding ACPI
>> > support to fw_devlink. This series gets that done.
>> >
>> > [1] - https://lore.kernel.org/linux-omap/[email protected]/
>> > Signed-off-by: Saravana Kannan <[email protected]>
>> > Tested-by: Laurent Pinchart <[email protected]>
>> > Tested-by: Grygorii Strashko <[email protected]>
>>
>> git bisect show that this commit broke my board in 5.11-rc1:
>>
>> [ 2.294375] sysfs: cannot create duplicate filename
>> '/devices/virtual/devlink/0000:00:00.1--0000:00:00.1'
>> [ 2.303999] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
>> 5.11.0-rc1-00016-ga0fb284b267 #267
>> [ 2.312125] Hardware name: Kontron SMARC-sAL28 (4 Lane) (DT)
>> [ 2.317804] Call trace:
>> [ 2.320253] dump_backtrace+0x0/0x1b8
>> [ 2.323936] show_stack+0x20/0x70
>> [ 2.327263] dump_stack+0xd8/0x134
>> [ 2.330677] sysfs_warn_dup+0x6c/0x88
>> [ 2.334351] sysfs_create_dir_ns+0xe8/0x100
>> [ 2.338547] kobject_add_internal+0x9c/0x290
>> [ 2.342833] kobject_add+0xa0/0x108
>> [ 2.346331] device_add+0xfc/0x798
>> [ 2.349746] device_link_add+0x454/0x5e0
>> [ 2.353682] fw_devlink_create_devlink+0xb8/0xc8
>> [ 2.358316] __fw_devlink_link_to_suppliers+0x84/0x180
>> [ 2.363474] __fw_devlink_link_to_suppliers+0x134/0x180
>> [ 2.368718] device_add+0x778/0x798
>> [ 2.372217] device_register+0x28/0x38
>> [ 2.375979] __mdiobus_register+0x94/0x340
>> [ 2.380089] of_mdiobus_register+0xb4/0x380
>> [ 2.384285] enetc_pf_probe+0x73c/0xb10
>> [ 2.388132] local_pci_probe+0x48/0xb8
>> [ 2.391896] pci_device_probe+0x120/0x1c0
>> [ 2.395920] really_probe+0xec/0x3c0
>> [ 2.399505] driver_probe_device+0x60/0xc0
>> [ 2.403614] device_driver_attach+0x7c/0x88
>> [ 2.407810] __driver_attach+0x60/0xe8
>> [ 2.411570] bus_for_each_dev+0x7c/0xd0
>> [ 2.415419] driver_attach+0x2c/0x38
>> [ 2.419004] bus_add_driver+0x194/0x1f8
>> [ 2.422851] driver_register+0x6c/0x128
>> [ 2.426698] __pci_register_driver+0x4c/0x58
>> [ 2.430983] enetc_pf_driver_init+0x2c/0x38
>> [ 2.435181] do_one_initcall+0x54/0x2d8
>> [ 2.439029] kernel_init_freeable+0x1fc/0x268
>> [ 2.443403] kernel_init+0x1c/0x120
>> [ 2.446904] ret_from_fork+0x10/0x30
>> [ 2.450502] kobject_add_internal failed for
>> 0000:00:00.1--0000:00:00.1 with -EEXIST, don't try to register things
>> with the same name in the same directory.
>>
>> Looks like it will generate that link twice? Let me know if I can help
>> testing.
>>
>> See also: https://lavalab.kontron.com/scheduler/job/3894#L831
>
> I'll look into this this week. Is the DT for this board in upstream?
> If so, can you point me to the DT file(s)?

arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-kbox-a-230-ls.dts

> Also, can you give me the output of this?
> find /sys/devices -type d | grep "0000:00:00.1"

# uname -a
Linux buildroot 5.11.0-rc1-next-20210104 #298 SMP PREEMPT Tue Jan 5
21:55:23 CET 2021 aarch64 GNU/Linux
# find /sys/devices -type d | grep "0000:00:00.1"
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/power
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/statistics
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/power
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-6
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-6/byte_queue_limits
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-4
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-4/byte_queue_limits
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-7
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-2
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-2/byte_queue_limits
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-5
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-0
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-0/byte_queue_limits
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-3
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-1
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-7
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-7/byte_queue_limits
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-5
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-5/byte_queue_limits
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-3
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-3/byte_queue_limits
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-6
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-1
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-1/byte_queue_limits
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-4
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-2
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-0
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/statistics
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/power
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/statistics
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator/regulator.3
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator/regulator.3/power
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator/regulator.4
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator/regulator.4/power
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/power
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/msi_irqs
/sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/link
/sys/devices/virtual/devlink/5000000.iommu--0000:00:00.1

HTH,
-michael

2021-01-06 23:33:00

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] driver core: Refactor fw_devlink feature

On Tue, Jan 5, 2021 at 1:04 PM Michael Walle <[email protected]> wrote:
>
> Am 2021-01-05 20:00, schrieb Saravana Kannan:
> > On Mon, Dec 28, 2020 at 7:35 PM Michael Walle <[email protected]> wrote:
> >>
> >> > The current implementation of fw_devlink is very inefficient because it
> >> > tries to get away without creating fwnode links in the name of saving
> >> > memory usage. Past attempts to optimize runtime at the cost of memory
> >> > usage were blocked with request for data showing that the optimization
> >> > made significant improvement for real world scenarios.
> >> >
> >> > We have those scenarios now. There have been several reports of boot
> >> > time increase in the order of seconds in this thread [1]. Several OEMs
> >> > and SoC manufacturers have also privately reported significant
> >> > (350-400ms) increase in boot time due to all the parsing done by
> >> > fw_devlink.
> >> >
> >> > So this patch uses all the setup done by the previous patches in this
> >> > series to refactor fw_devlink to be more efficient. Most of the code has
> >> > been moved out of firmware specific (DT mostly) code into driver core.
> >> >
> >> > This brings the following benefits:
> >> > - Instead of parsing the device tree multiple times during bootup,
> >> > fw_devlink parses each fwnode node/property only once and creates
> >> > fwnode links. The rest of the fw_devlink code then just looks at these
> >> > fwnode links to do rest of the work.
> >> >
> >> > - Makes it much easier to debug probe issue due to fw_devlink in the
> >> > future. fw_devlink=on blocks the probing of devices if they depend on
> >> > a device that hasn't been added yet. With this refactor, it'll be very
> >> > easy to tell what that device is because we now have a reference to
> >> > the fwnode of the device.
> >> >
> >> > - Much easier to add fw_devlink support to ACPI and other firmware
> >> > types. A refactor to move the common bits from DT specific code to
> >> > driver core was in my TODO list as a prerequisite to adding ACPI
> >> > support to fw_devlink. This series gets that done.
> >> >
> >> > [1] - https://lore.kernel.org/linux-omap/[email protected]/
> >> > Signed-off-by: Saravana Kannan <[email protected]>
> >> > Tested-by: Laurent Pinchart <[email protected]>
> >> > Tested-by: Grygorii Strashko <[email protected]>
> >>
> >> git bisect show that this commit broke my board in 5.11-rc1:
> >>
> >> [ 2.294375] sysfs: cannot create duplicate filename
> >> '/devices/virtual/devlink/0000:00:00.1--0000:00:00.1'
> >> [ 2.303999] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> >> 5.11.0-rc1-00016-ga0fb284b267 #267
> >> [ 2.312125] Hardware name: Kontron SMARC-sAL28 (4 Lane) (DT)
> >> [ 2.317804] Call trace:
> >> [ 2.320253] dump_backtrace+0x0/0x1b8
> >> [ 2.323936] show_stack+0x20/0x70
> >> [ 2.327263] dump_stack+0xd8/0x134
> >> [ 2.330677] sysfs_warn_dup+0x6c/0x88
> >> [ 2.334351] sysfs_create_dir_ns+0xe8/0x100
> >> [ 2.338547] kobject_add_internal+0x9c/0x290
> >> [ 2.342833] kobject_add+0xa0/0x108
> >> [ 2.346331] device_add+0xfc/0x798
> >> [ 2.349746] device_link_add+0x454/0x5e0
> >> [ 2.353682] fw_devlink_create_devlink+0xb8/0xc8
> >> [ 2.358316] __fw_devlink_link_to_suppliers+0x84/0x180
> >> [ 2.363474] __fw_devlink_link_to_suppliers+0x134/0x180
> >> [ 2.368718] device_add+0x778/0x798
> >> [ 2.372217] device_register+0x28/0x38
> >> [ 2.375979] __mdiobus_register+0x94/0x340
> >> [ 2.380089] of_mdiobus_register+0xb4/0x380
> >> [ 2.384285] enetc_pf_probe+0x73c/0xb10
> >> [ 2.388132] local_pci_probe+0x48/0xb8
> >> [ 2.391896] pci_device_probe+0x120/0x1c0
> >> [ 2.395920] really_probe+0xec/0x3c0
> >> [ 2.399505] driver_probe_device+0x60/0xc0
> >> [ 2.403614] device_driver_attach+0x7c/0x88
> >> [ 2.407810] __driver_attach+0x60/0xe8
> >> [ 2.411570] bus_for_each_dev+0x7c/0xd0
> >> [ 2.415419] driver_attach+0x2c/0x38
> >> [ 2.419004] bus_add_driver+0x194/0x1f8
> >> [ 2.422851] driver_register+0x6c/0x128
> >> [ 2.426698] __pci_register_driver+0x4c/0x58
> >> [ 2.430983] enetc_pf_driver_init+0x2c/0x38
> >> [ 2.435181] do_one_initcall+0x54/0x2d8
> >> [ 2.439029] kernel_init_freeable+0x1fc/0x268
> >> [ 2.443403] kernel_init+0x1c/0x120
> >> [ 2.446904] ret_from_fork+0x10/0x30
> >> [ 2.450502] kobject_add_internal failed for
> >> 0000:00:00.1--0000:00:00.1 with -EEXIST, don't try to register things
> >> with the same name in the same directory.
> >>
> >> Looks like it will generate that link twice? Let me know if I can help
> >> testing.
> >>
> >> See also: https://lavalab.kontron.com/scheduler/job/3894#L831
> >
> > I'll look into this this week. Is the DT for this board in upstream?
> > If so, can you point me to the DT file(s)?
>
> arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-kbox-a-230-ls.dts
>
> > Also, can you give me the output of this?
> > find /sys/devices -type d | grep "0000:00:00.1"
>
> # uname -a
> Linux buildroot 5.11.0-rc1-next-20210104 #298 SMP PREEMPT Tue Jan 5
> 21:55:23 CET 2021 aarch64 GNU/Linux
> # find /sys/devices -type d | grep "0000:00:00.1"
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/power
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/statistics
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/power
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-6
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-6/byte_queue_limits
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-4
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-4/byte_queue_limits
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-7
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-2
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-2/byte_queue_limits
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-5
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-0
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-0/byte_queue_limits
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-3
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-1
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-7
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-7/byte_queue_limits
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-5
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-5/byte_queue_limits
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-3
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-3/byte_queue_limits
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-6
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-1
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/tx-1/byte_queue_limits
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-4
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-2
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/net/eno1/queues/rx-0
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/statistics
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/power
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/statistics
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator/regulator.3
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator/regulator.3/power
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator/regulator.4
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/regulator/regulator.4/power
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/mdio_bus/0000:00:00.1/0000:00:00.1:04/power
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/msi_irqs
> /sys/devices/platform/soc/1f0000000.pcie/pci0000:00/0000:00:00.1/link
> /sys/devices/virtual/devlink/5000000.iommu--0000:00:00.1


Sent a fix. Any further discussion will most likely continue in that thread.
https://lore.kernel.org/lkml/[email protected]/T/#u

-Saravana