2023-08-03 22:53:16

by Rob Clark

[permalink] [raw]
Subject: [PATCH v3 0/9] drm/msm+PM+icc: Make job_run() reclaim-safe

From: Rob Clark <[email protected]>

Inspired by https://lore.kernel.org/dri-devel/[email protected]/
it seemed like a good idea to get rid of memory allocation in job_run()
fence signaling path, and use lockdep annotations to yell at us about
anything that could deadlock against shrinker/reclaim. Anything that
can trigger reclaim, or block on any other thread that has triggered
reclaim, can block the GPU shrinker from releasing memory if it is
waiting the job to complete, causing deadlock.

The first two patches decouple allocation from devfreq->lock, and teach
lockdep that devfreq->lock can be acquired in paths that the shrinker
indirectly depends on.

The next three patches do the same for PM QoS. And the next two do a
similar thing for interconnect.

And then finally the last two patches enable the lockdep fence-
signalling annotations.


v2: Switch from embedding hw_fence in submit/job object to preallocating
the hw_fence. Rework "fenced unpin" locking to drop obj lock from
fence signaling path (ie. the part that was still WIP in the first
iteration of the patchset). Adds the final patch to enable fence
signaling annotations now that job_run() and job_free() are safe.
The PM devfreq/QoS and interconnect patches are unchanged.

v3: Mostly unchanged, but series is much smaller now that drm changes
have landed, mostly consisting of the remaining devfreq/qos/
interconnect fixes.

Rob Clark (9):
PM / devfreq: Drop unneed locking to appease lockdep
PM / devfreq: Teach lockdep about locking order
PM / QoS: Fix constraints alloc vs reclaim locking
PM / QoS: Decouple request alloc from dev_pm_qos_mtx
PM / QoS: Teach lockdep about dev_pm_qos_mtx locking order
interconnect: Fix locking for runpm vs reclaim
interconnect: Teach lockdep about icc_bw_lock order
drm/sched: Add (optional) fence signaling annotation
drm/msm: Enable fence signalling annotations

drivers/base/power/qos.c | 85 +++++++++++++++++++-------
drivers/devfreq/devfreq.c | 52 ++++++++--------
drivers/gpu/drm/msm/msm_ringbuffer.c | 1 +
drivers/gpu/drm/scheduler/sched_main.c | 9 +++
drivers/interconnect/core.c | 18 +++++-
include/drm/gpu_scheduler.h | 2 +
6 files changed, 117 insertions(+), 50 deletions(-)

--
2.41.0



2023-08-03 23:14:24

by Rob Clark

[permalink] [raw]
Subject: [PATCH v3 6/9] interconnect: Fix locking for runpm vs reclaim

From: Rob Clark <[email protected]>

For cases where icc_bw_set() can be called in callbaths that could
deadlock against shrinker/reclaim, such as runpm resume, we need to
decouple the icc locking. Introduce a new icc_bw_lock for cases where
we need to serialize bw aggregation and update to decouple that from
paths that require memory allocation such as node/link creation/
destruction.

Fixes this lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
6.2.0-rc8-debug+ #554 Not tainted
------------------------------------------------------
ring0/132 is trying to acquire lock:
ffffff80871916d0 (&gmu->lock){+.+.}-{3:3}, at: a6xx_pm_resume+0xf0/0x234

but task is already holding lock:
ffffffdb5aee57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (dma_fence_map){++++}-{0:0}:
__dma_fence_might_wait+0x74/0xc0
dma_resv_lockdep+0x1f4/0x2f4
do_one_initcall+0x104/0x2bc
kernel_init_freeable+0x344/0x34c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20

-> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
fs_reclaim_acquire+0x80/0xa8
slab_pre_alloc_hook.constprop.0+0x40/0x25c
__kmem_cache_alloc_node+0x60/0x1cc
__kmalloc+0xd8/0x100
topology_parse_cpu_capacity+0x8c/0x178
get_cpu_for_node+0x88/0xc4
parse_cluster+0x1b0/0x28c
parse_cluster+0x8c/0x28c
init_cpu_topology+0x168/0x188
smp_prepare_cpus+0x24/0xf8
kernel_init_freeable+0x18c/0x34c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20

-> #2 (fs_reclaim){+.+.}-{0:0}:
__fs_reclaim_acquire+0x3c/0x48
fs_reclaim_acquire+0x54/0xa8
slab_pre_alloc_hook.constprop.0+0x40/0x25c
__kmem_cache_alloc_node+0x60/0x1cc
__kmalloc+0xd8/0x100
kzalloc.constprop.0+0x14/0x20
icc_node_create_nolock+0x4c/0xc4
icc_node_create+0x38/0x58
qcom_icc_rpmh_probe+0x1b8/0x248
platform_probe+0x70/0xc4
really_probe+0x158/0x290
__driver_probe_device+0xc8/0xe0
driver_probe_device+0x44/0x100
__driver_attach+0xf8/0x108
bus_for_each_dev+0x78/0xc4
driver_attach+0x2c/0x38
bus_add_driver+0xd0/0x1d8
driver_register+0xbc/0xf8
__platform_driver_register+0x30/0x3c
qnoc_driver_init+0x24/0x30
do_one_initcall+0x104/0x2bc
kernel_init_freeable+0x344/0x34c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20

-> #1 (icc_lock){+.+.}-{3:3}:
__mutex_lock+0xcc/0x3c8
mutex_lock_nested+0x30/0x44
icc_set_bw+0x88/0x2b4
_set_opp_bw+0x8c/0xd8
_set_opp+0x19c/0x300
dev_pm_opp_set_opp+0x84/0x94
a6xx_gmu_resume+0x18c/0x804
a6xx_pm_resume+0xf8/0x234
adreno_runtime_resume+0x2c/0x38
pm_generic_runtime_resume+0x30/0x44
__rpm_callback+0x15c/0x174
rpm_callback+0x78/0x7c
rpm_resume+0x318/0x524
__pm_runtime_resume+0x78/0xbc
adreno_load_gpu+0xc4/0x17c
msm_open+0x50/0x120
drm_file_alloc+0x17c/0x228
drm_open_helper+0x74/0x118
drm_open+0xa0/0x144
drm_stub_open+0xd4/0xe4
chrdev_open+0x1b8/0x1e4
do_dentry_open+0x2f8/0x38c
vfs_open+0x34/0x40
path_openat+0x64c/0x7b4
do_filp_open+0x54/0xc4
do_sys_openat2+0x9c/0x100
do_sys_open+0x50/0x7c
__arm64_sys_openat+0x28/0x34
invoke_syscall+0x8c/0x128
el0_svc_common.constprop.0+0xa0/0x11c
do_el0_svc+0xac/0xbc
el0_svc+0x48/0xa0
el0t_64_sync_handler+0xac/0x13c
el0t_64_sync+0x190/0x194

-> #0 (&gmu->lock){+.+.}-{3:3}:
__lock_acquire+0xe00/0x1060
lock_acquire+0x1e0/0x2f8
__mutex_lock+0xcc/0x3c8
mutex_lock_nested+0x30/0x44
a6xx_pm_resume+0xf0/0x234
adreno_runtime_resume+0x2c/0x38
pm_generic_runtime_resume+0x30/0x44
__rpm_callback+0x15c/0x174
rpm_callback+0x78/0x7c
rpm_resume+0x318/0x524
__pm_runtime_resume+0x78/0xbc
pm_runtime_get_sync.isra.0+0x14/0x20
msm_gpu_submit+0x58/0x178
msm_job_run+0x78/0x150
drm_sched_main+0x290/0x370
kthread+0xf0/0x100
ret_from_fork+0x10/0x20

other info that might help us debug this:

Chain exists of:
&gmu->lock --> mmu_notifier_invalidate_range_start --> dma_fence_map

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(dma_fence_map);
lock(mmu_notifier_invalidate_range_start);
lock(dma_fence_map);
lock(&gmu->lock);

*** DEADLOCK ***

2 locks held by ring0/132:
#0: ffffff8087191170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
#1: ffffffdb5aee57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150

stack backtrace:
CPU: 7 PID: 132 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #554
Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
Call trace:
dump_backtrace.part.0+0xb4/0xf8
show_stack+0x20/0x38
dump_stack_lvl+0x9c/0xd0
dump_stack+0x18/0x34
print_circular_bug+0x1b4/0x1f0
check_noncircular+0x78/0xac
__lock_acquire+0xe00/0x1060
lock_acquire+0x1e0/0x2f8
__mutex_lock+0xcc/0x3c8
mutex_lock_nested+0x30/0x44
a6xx_pm_resume+0xf0/0x234
adreno_runtime_resume+0x2c/0x38
pm_generic_runtime_resume+0x30/0x44
__rpm_callback+0x15c/0x174
rpm_callback+0x78/0x7c
rpm_resume+0x318/0x524
__pm_runtime_resume+0x78/0xbc
pm_runtime_get_sync.isra.0+0x14/0x20
msm_gpu_submit+0x58/0x178
msm_job_run+0x78/0x150
drm_sched_main+0x290/0x370
kthread+0xf0/0x100
ret_from_fork+0x10/0x20

Signed-off-by: Rob Clark <[email protected]>
---
drivers/interconnect/core.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 5fac448c28fd..e15a92a79df1 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -28,6 +28,7 @@ static LIST_HEAD(icc_providers);
static int providers_count;
static bool synced_state;
static DEFINE_MUTEX(icc_lock);
+static DEFINE_MUTEX(icc_bw_lock);
static struct dentry *icc_debugfs_dir;

static void icc_summary_show_one(struct seq_file *s, struct icc_node *n)
@@ -631,7 +632,7 @@ int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
if (WARN_ON(IS_ERR(path) || !path->num_nodes))
return -EINVAL;

- mutex_lock(&icc_lock);
+ mutex_lock(&icc_bw_lock);

old_avg = path->reqs[0].avg_bw;
old_peak = path->reqs[0].peak_bw;
@@ -663,7 +664,7 @@ int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
apply_constraints(path);
}

- mutex_unlock(&icc_lock);
+ mutex_unlock(&icc_bw_lock);

trace_icc_set_bw_end(path, ret);

@@ -872,6 +873,7 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
return;

mutex_lock(&icc_lock);
+ mutex_lock(&icc_bw_lock);

node->provider = provider;
list_add_tail(&node->node_list, &provider->nodes);
@@ -900,6 +902,7 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
node->avg_bw = 0;
node->peak_bw = 0;

+ mutex_unlock(&icc_bw_lock);
mutex_unlock(&icc_lock);
}
EXPORT_SYMBOL_GPL(icc_node_add);
@@ -1025,6 +1028,7 @@ void icc_sync_state(struct device *dev)
return;

mutex_lock(&icc_lock);
+ mutex_lock(&icc_bw_lock);
synced_state = true;
list_for_each_entry(p, &icc_providers, provider_list) {
dev_dbg(p->dev, "interconnect provider is in synced state\n");
--
2.41.0


2023-08-03 23:22:40

by Rob Clark

[permalink] [raw]
Subject: [PATCH v3 2/9] PM / devfreq: Teach lockdep about locking order

From: Rob Clark <[email protected]>

This will make it easier to catch places doing allocations that can
trigger reclaim under devfreq->lock.

Because devfreq->lock is held over various devfreq_dev_profile
callbacks, there might be some fallout if those callbacks do allocations
that can trigger reclaim, but I've looked through the various callback
implementations and don't see anything obvious. If it does trigger any
lockdep splats, those should be fixed.

Signed-off-by: Rob Clark <[email protected]>
---
drivers/devfreq/devfreq.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index e5558ec68ce8..81add6064406 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -817,6 +817,12 @@ struct devfreq *devfreq_add_device(struct device *dev,
}

mutex_init(&devfreq->lock);
+
+ /* Teach lockdep about lock ordering wrt. shrinker: */
+ fs_reclaim_acquire(GFP_KERNEL);
+ might_lock(&devfreq->lock);
+ fs_reclaim_release(GFP_KERNEL);
+
devfreq->dev.parent = dev;
devfreq->dev.class = devfreq_class;
devfreq->dev.release = devfreq_dev_release;
--
2.41.0


2023-08-04 00:09:47

by Rob Clark

[permalink] [raw]
Subject: [PATCH v3 4/9] PM / QoS: Decouple request alloc from dev_pm_qos_mtx

From: Rob Clark <[email protected]>

Similar to the previous patch, move the allocation out from under
dev_pm_qos_mtx, by speculatively doing the allocation and handle
any race after acquiring dev_pm_qos_mtx by freeing the redundant
allocation.

Signed-off-by: Rob Clark <[email protected]>
---
drivers/base/power/qos.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index f3e0c6b65635..4537d93ddb45 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -922,17 +922,19 @@ s32 dev_pm_qos_get_user_latency_tolerance(struct device *dev)
*/
int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
{
+ struct dev_pm_qos_request *req = NULL;
int ret;

ret = dev_pm_qos_constraints_ensure_allocated(dev);
if (ret)
return ret;

+ if (!dev->power.qos->latency_tolerance_req)
+ req = kzalloc(sizeof(*req), GFP_KERNEL);
+
mutex_lock(&dev_pm_qos_mtx);

if (!dev->power.qos->latency_tolerance_req) {
- struct dev_pm_qos_request *req;
-
if (val < 0) {
if (val == PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT)
ret = 0;
@@ -940,7 +942,6 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
ret = -EINVAL;
goto out;
}
- req = kzalloc(sizeof(*req), GFP_KERNEL);
if (!req) {
ret = -ENOMEM;
goto out;
@@ -952,6 +953,13 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
}
dev->power.qos->latency_tolerance_req = req;
} else {
+ /*
+ * If we raced with another thread to allocate the request,
+ * simply free the redundant allocation and move on.
+ */
+ if (req)
+ kfree(req);
+
if (val < 0) {
__dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_LATENCY_TOLERANCE);
ret = 0;
--
2.41.0


2023-08-04 00:13:49

by Rob Clark

[permalink] [raw]
Subject: [PATCH v3 3/9] PM / QoS: Fix constraints alloc vs reclaim locking

From: Rob Clark <[email protected]>

In the process of adding lockdep annotation for drm GPU scheduler's
job_run() to detect potential deadlock against shrinker/reclaim, I hit
this lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
6.2.0-rc8-debug+ #558 Tainted: G W
------------------------------------------------------
ring0/125 is trying to acquire lock:
ffffffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x38/0x68

but task is already holding lock:
ffffff8087239208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&gpu->active_lock){+.+.}-{3:3}:
__mutex_lock+0xcc/0x3c8
mutex_lock_nested+0x30/0x44
msm_gpu_submit+0xec/0x178
msm_job_run+0x78/0x150
drm_sched_main+0x290/0x370
kthread+0xf0/0x100
ret_from_fork+0x10/0x20

-> #3 (dma_fence_map){++++}-{0:0}:
__dma_fence_might_wait+0x74/0xc0
dma_resv_lockdep+0x1f4/0x2f4
do_one_initcall+0x104/0x2bc
kernel_init_freeable+0x344/0x34c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20

-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
fs_reclaim_acquire+0x80/0xa8
slab_pre_alloc_hook.constprop.0+0x40/0x25c
__kmem_cache_alloc_node+0x60/0x1cc
__kmalloc+0xd8/0x100
topology_parse_cpu_capacity+0x8c/0x178
get_cpu_for_node+0x88/0xc4
parse_cluster+0x1b0/0x28c
parse_cluster+0x8c/0x28c
init_cpu_topology+0x168/0x188
smp_prepare_cpus+0x24/0xf8
kernel_init_freeable+0x18c/0x34c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20

-> #1 (fs_reclaim){+.+.}-{0:0}:
__fs_reclaim_acquire+0x3c/0x48
fs_reclaim_acquire+0x54/0xa8
slab_pre_alloc_hook.constprop.0+0x40/0x25c
__kmem_cache_alloc_node+0x60/0x1cc
kmalloc_trace+0x50/0xa8
dev_pm_qos_constraints_allocate+0x38/0x100
__dev_pm_qos_add_request+0xb0/0x1e8
dev_pm_qos_add_request+0x58/0x80
dev_pm_qos_expose_latency_limit+0x60/0x13c
register_cpu+0x12c/0x130
topology_init+0xac/0xbc
do_one_initcall+0x104/0x2bc
kernel_init_freeable+0x344/0x34c
kernel_init+0x30/0x134
ret_from_fork+0x10/0x20

-> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
__lock_acquire+0xe00/0x1060
lock_acquire+0x1e0/0x2f8
__mutex_lock+0xcc/0x3c8
mutex_lock_nested+0x30/0x44
dev_pm_qos_update_request+0x38/0x68
msm_devfreq_boost+0x40/0x70
msm_devfreq_active+0xc0/0xf0
msm_gpu_submit+0x10c/0x178
msm_job_run+0x78/0x150
drm_sched_main+0x290/0x370
kthread+0xf0/0x100
ret_from_fork+0x10/0x20

other info that might help us debug this:

Chain exists of:
dev_pm_qos_mtx --> dma_fence_map --> &gpu->active_lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&gpu->active_lock);
lock(dma_fence_map);
lock(&gpu->active_lock);
lock(dev_pm_qos_mtx);

*** DEADLOCK ***

3 locks held by ring0/123:
#0: ffffff8087251170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
#1: ffffffd00b0e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150
#2: ffffff8087251208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178

stack backtrace:
CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
Call trace:
dump_backtrace.part.0+0xb4/0xf8
show_stack+0x20/0x38
dump_stack_lvl+0x9c/0xd0
dump_stack+0x18/0x34
print_circular_bug+0x1b4/0x1f0
check_noncircular+0x78/0xac
__lock_acquire+0xe00/0x1060
lock_acquire+0x1e0/0x2f8
__mutex_lock+0xcc/0x3c8
mutex_lock_nested+0x30/0x44
dev_pm_qos_update_request+0x38/0x68
msm_devfreq_boost+0x40/0x70
msm_devfreq_active+0xc0/0xf0
msm_gpu_submit+0x10c/0x178
msm_job_run+0x78/0x150
drm_sched_main+0x290/0x370
kthread+0xf0/0x100
ret_from_fork+0x10/0x20

The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
freq change) path, but it is also held across allocations that could
recurse into shrinker.

Solve this by changing dev_pm_qos_constraints_allocate() into a function
that can be called unconditionally before the device qos object is
needed and before aquiring dev_pm_qos_mtx. This way the allocations can
be done without holding the mutex. In the case that we raced with
another thread to allocate the qos object, detect this *after* acquiring
the dev_pm_qos_mtx and simply free the redundant allocations.

Signed-off-by: Rob Clark <[email protected]>
---
drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 8e93167f1783..f3e0c6b65635 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
}

/*
- * dev_pm_qos_constraints_allocate
+ * dev_pm_qos_constraints_ensure_allocated
* @dev: device to allocate data for
*
- * Called at the first call to add_request, for constraint data allocation
- * Must be called with the dev_pm_qos_mtx mutex held
+ * Called to ensure that devices qos is allocated, before acquiring
+ * dev_pm_qos_mtx.
*/
-static int dev_pm_qos_constraints_allocate(struct device *dev)
+static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
{
struct dev_pm_qos *qos;
struct pm_qos_constraints *c;
struct blocking_notifier_head *n;

+ if (!dev)
+ return -ENODEV;
+
+ if (!IS_ERR_OR_NULL(dev->power.qos))
+ return 0;
+
qos = kzalloc(sizeof(*qos), GFP_KERNEL);
if (!qos)
return -ENOMEM;
@@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)

INIT_LIST_HEAD(&qos->flags.list);

+ mutex_lock(&dev_pm_qos_mtx);
+
+ if (!IS_ERR_OR_NULL(dev->power.qos)) {
+ /*
+ * We have raced with another task to create the qos.
+ * No biggie, just free the resources we've allocated
+ * outside of dev_pm_qos_mtx and move on with life.
+ */
+ kfree(n);
+ kfree(qos);
+ goto unlock;
+ }
+
spin_lock_irq(&dev->power.lock);
dev->power.qos = qos;
spin_unlock_irq(&dev->power.lock);

+unlock:
+ mutex_unlock(&dev_pm_qos_mtx);
+
return 0;
}

@@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
{
int ret = 0;

- if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
+ if (!req || dev_pm_qos_invalid_req_type(dev, type))
return -EINVAL;

if (WARN(dev_pm_qos_request_active(req),
"%s() called for already added request\n", __func__))
return -EINVAL;

- if (IS_ERR(dev->power.qos))
+ if (IS_ERR_OR_NULL(dev->power.qos))
ret = -ENODEV;
- else if (!dev->power.qos)
- ret = dev_pm_qos_constraints_allocate(dev);

trace_dev_pm_qos_add_request(dev_name(dev), type, value);
if (ret)
@@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
{
int ret;

+ ret = dev_pm_qos_constraints_ensure_allocated(dev);
+ if (ret)
+ return ret;
+
mutex_lock(&dev_pm_qos_mtx);
ret = __dev_pm_qos_add_request(dev, req, type, value);
mutex_unlock(&dev_pm_qos_mtx);
@@ -537,15 +561,11 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
{
int ret = 0;

- mutex_lock(&dev_pm_qos_mtx);
-
- if (IS_ERR(dev->power.qos))
- ret = -ENODEV;
- else if (!dev->power.qos)
- ret = dev_pm_qos_constraints_allocate(dev);
-
+ ret = dev_pm_qos_constraints_ensure_allocated(dev);
if (ret)
- goto unlock;
+ return ret;
+
+ mutex_lock(&dev_pm_qos_mtx);

switch (type) {
case DEV_PM_QOS_RESUME_LATENCY:
@@ -565,7 +585,6 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
ret = -EINVAL;
}

-unlock:
mutex_unlock(&dev_pm_qos_mtx);
return ret;
}
@@ -905,10 +924,13 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
{
int ret;

+ ret = dev_pm_qos_constraints_ensure_allocated(dev);
+ if (ret)
+ return ret;
+
mutex_lock(&dev_pm_qos_mtx);

- if (IS_ERR_OR_NULL(dev->power.qos)
- || !dev->power.qos->latency_tolerance_req) {
+ if (!dev->power.qos->latency_tolerance_req) {
struct dev_pm_qos_request *req;

if (val < 0) {
--
2.41.0


2023-08-04 10:13:06

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH v3 4/9] PM / QoS: Decouple request alloc from dev_pm_qos_mtx

Hi Rob,

kernel test robot noticed the following build warnings:

https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Rob-Clark/PM-devfreq-Drop-unneed-locking-to-appease-lockdep/20230804-060505
base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link: https://lore.kernel.org/r/20230803220202.78036-5-robdclark%40gmail.com
patch subject: [PATCH v3 4/9] PM / QoS: Decouple request alloc from dev_pm_qos_mtx
config: i386-randconfig-m021-20230730 (https://download.01.org/0day-ci/archive/20230804/[email protected]/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230804/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Reported-by: Dan Carpenter <[email protected]>
| Closes: https://lore.kernel.org/r/[email protected]/

smatch warnings:
drivers/base/power/qos.c:973 dev_pm_qos_update_user_latency_tolerance() warn: possible memory leak of 'req'

vim +/req +973 drivers/base/power/qos.c

2d984ad132a87c Rafael J. Wysocki 2014-02-11 923 int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
2d984ad132a87c Rafael J. Wysocki 2014-02-11 924 {
b5ac35ff4296f7 Rob Clark 2023-08-03 925 struct dev_pm_qos_request *req = NULL;
2d984ad132a87c Rafael J. Wysocki 2014-02-11 926 int ret;
2d984ad132a87c Rafael J. Wysocki 2014-02-11 927
211fb32e3a0547 Rob Clark 2023-08-03 928 ret = dev_pm_qos_constraints_ensure_allocated(dev);
211fb32e3a0547 Rob Clark 2023-08-03 929 if (ret)
211fb32e3a0547 Rob Clark 2023-08-03 930 return ret;
211fb32e3a0547 Rob Clark 2023-08-03 931
b5ac35ff4296f7 Rob Clark 2023-08-03 932 if (!dev->power.qos->latency_tolerance_req)
b5ac35ff4296f7 Rob Clark 2023-08-03 933 req = kzalloc(sizeof(*req), GFP_KERNEL);
b5ac35ff4296f7 Rob Clark 2023-08-03 934
2d984ad132a87c Rafael J. Wysocki 2014-02-11 935 mutex_lock(&dev_pm_qos_mtx);
2d984ad132a87c Rafael J. Wysocki 2014-02-11 936
211fb32e3a0547 Rob Clark 2023-08-03 937 if (!dev->power.qos->latency_tolerance_req) {
2d984ad132a87c Rafael J. Wysocki 2014-02-11 938 if (val < 0) {
80a6f7c79b7822 Andrew Lutomirski 2016-11-29 939 if (val == PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT)
80a6f7c79b7822 Andrew Lutomirski 2016-11-29 940 ret = 0;
80a6f7c79b7822 Andrew Lutomirski 2016-11-29 941 else
2d984ad132a87c Rafael J. Wysocki 2014-02-11 942 ret = -EINVAL;
2d984ad132a87c Rafael J. Wysocki 2014-02-11 943 goto out;

kfree(req);?

2d984ad132a87c Rafael J. Wysocki 2014-02-11 944 }
2d984ad132a87c Rafael J. Wysocki 2014-02-11 945 if (!req) {
2d984ad132a87c Rafael J. Wysocki 2014-02-11 946 ret = -ENOMEM;
2d984ad132a87c Rafael J. Wysocki 2014-02-11 947 goto out;
2d984ad132a87c Rafael J. Wysocki 2014-02-11 948 }
2d984ad132a87c Rafael J. Wysocki 2014-02-11 949 ret = __dev_pm_qos_add_request(dev, req, DEV_PM_QOS_LATENCY_TOLERANCE, val);
2d984ad132a87c Rafael J. Wysocki 2014-02-11 950 if (ret < 0) {
2d984ad132a87c Rafael J. Wysocki 2014-02-11 951 kfree(req);
2d984ad132a87c Rafael J. Wysocki 2014-02-11 952 goto out;
2d984ad132a87c Rafael J. Wysocki 2014-02-11 953 }
2d984ad132a87c Rafael J. Wysocki 2014-02-11 954 dev->power.qos->latency_tolerance_req = req;
2d984ad132a87c Rafael J. Wysocki 2014-02-11 955 } else {
b5ac35ff4296f7 Rob Clark 2023-08-03 956 /*
b5ac35ff4296f7 Rob Clark 2023-08-03 957 * If we raced with another thread to allocate the request,
b5ac35ff4296f7 Rob Clark 2023-08-03 958 * simply free the redundant allocation and move on.
b5ac35ff4296f7 Rob Clark 2023-08-03 959 */
b5ac35ff4296f7 Rob Clark 2023-08-03 960 if (req)
b5ac35ff4296f7 Rob Clark 2023-08-03 961 kfree(req);
b5ac35ff4296f7 Rob Clark 2023-08-03 962
2d984ad132a87c Rafael J. Wysocki 2014-02-11 963 if (val < 0) {
2d984ad132a87c Rafael J. Wysocki 2014-02-11 964 __dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_LATENCY_TOLERANCE);
2d984ad132a87c Rafael J. Wysocki 2014-02-11 965 ret = 0;
2d984ad132a87c Rafael J. Wysocki 2014-02-11 966 } else {
2d984ad132a87c Rafael J. Wysocki 2014-02-11 967 ret = __dev_pm_qos_update_request(dev->power.qos->latency_tolerance_req, val);
2d984ad132a87c Rafael J. Wysocki 2014-02-11 968 }
2d984ad132a87c Rafael J. Wysocki 2014-02-11 969 }
2d984ad132a87c Rafael J. Wysocki 2014-02-11 970
2d984ad132a87c Rafael J. Wysocki 2014-02-11 971 out:
2d984ad132a87c Rafael J. Wysocki 2014-02-11 972 mutex_unlock(&dev_pm_qos_mtx);
2d984ad132a87c Rafael J. Wysocki 2014-02-11 @973 return ret;
2d984ad132a87c Rafael J. Wysocki 2014-02-11 974 }

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2023-08-04 18:09:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] PM / QoS: Fix constraints alloc vs reclaim locking

On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <[email protected]> wrote:
>
> From: Rob Clark <[email protected]>
>
> In the process of adding lockdep annotation for drm GPU scheduler's
> job_run() to detect potential deadlock against shrinker/reclaim, I hit
> this lockdep splat:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.2.0-rc8-debug+ #558 Tainted: G W
> ------------------------------------------------------
> ring0/125 is trying to acquire lock:
> ffffffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x38/0x68
>
> but task is already holding lock:
> ffffff8087239208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #4 (&gpu->active_lock){+.+.}-{3:3}:
> __mutex_lock+0xcc/0x3c8
> mutex_lock_nested+0x30/0x44
> msm_gpu_submit+0xec/0x178
> msm_job_run+0x78/0x150
> drm_sched_main+0x290/0x370
> kthread+0xf0/0x100
> ret_from_fork+0x10/0x20
>
> -> #3 (dma_fence_map){++++}-{0:0}:
> __dma_fence_might_wait+0x74/0xc0
> dma_resv_lockdep+0x1f4/0x2f4
> do_one_initcall+0x104/0x2bc
> kernel_init_freeable+0x344/0x34c
> kernel_init+0x30/0x134
> ret_from_fork+0x10/0x20
>
> -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> fs_reclaim_acquire+0x80/0xa8
> slab_pre_alloc_hook.constprop.0+0x40/0x25c
> __kmem_cache_alloc_node+0x60/0x1cc
> __kmalloc+0xd8/0x100
> topology_parse_cpu_capacity+0x8c/0x178
> get_cpu_for_node+0x88/0xc4
> parse_cluster+0x1b0/0x28c
> parse_cluster+0x8c/0x28c
> init_cpu_topology+0x168/0x188
> smp_prepare_cpus+0x24/0xf8
> kernel_init_freeable+0x18c/0x34c
> kernel_init+0x30/0x134
> ret_from_fork+0x10/0x20
>
> -> #1 (fs_reclaim){+.+.}-{0:0}:
> __fs_reclaim_acquire+0x3c/0x48
> fs_reclaim_acquire+0x54/0xa8
> slab_pre_alloc_hook.constprop.0+0x40/0x25c
> __kmem_cache_alloc_node+0x60/0x1cc
> kmalloc_trace+0x50/0xa8
> dev_pm_qos_constraints_allocate+0x38/0x100
> __dev_pm_qos_add_request+0xb0/0x1e8
> dev_pm_qos_add_request+0x58/0x80
> dev_pm_qos_expose_latency_limit+0x60/0x13c
> register_cpu+0x12c/0x130
> topology_init+0xac/0xbc
> do_one_initcall+0x104/0x2bc
> kernel_init_freeable+0x344/0x34c
> kernel_init+0x30/0x134
> ret_from_fork+0x10/0x20
>
> -> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
> __lock_acquire+0xe00/0x1060
> lock_acquire+0x1e0/0x2f8
> __mutex_lock+0xcc/0x3c8
> mutex_lock_nested+0x30/0x44
> dev_pm_qos_update_request+0x38/0x68
> msm_devfreq_boost+0x40/0x70
> msm_devfreq_active+0xc0/0xf0
> msm_gpu_submit+0x10c/0x178
> msm_job_run+0x78/0x150
> drm_sched_main+0x290/0x370
> kthread+0xf0/0x100
> ret_from_fork+0x10/0x20
>
> other info that might help us debug this:
>
> Chain exists of:
> dev_pm_qos_mtx --> dma_fence_map --> &gpu->active_lock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&gpu->active_lock);
> lock(dma_fence_map);
> lock(&gpu->active_lock);
> lock(dev_pm_qos_mtx);
>
> *** DEADLOCK ***
>
> 3 locks held by ring0/123:
> #0: ffffff8087251170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
> #1: ffffffd00b0e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150
> #2: ffffff8087251208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
>
> stack backtrace:
> CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
> Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> Call trace:
> dump_backtrace.part.0+0xb4/0xf8
> show_stack+0x20/0x38
> dump_stack_lvl+0x9c/0xd0
> dump_stack+0x18/0x34
> print_circular_bug+0x1b4/0x1f0
> check_noncircular+0x78/0xac
> __lock_acquire+0xe00/0x1060
> lock_acquire+0x1e0/0x2f8
> __mutex_lock+0xcc/0x3c8
> mutex_lock_nested+0x30/0x44
> dev_pm_qos_update_request+0x38/0x68
> msm_devfreq_boost+0x40/0x70
> msm_devfreq_active+0xc0/0xf0
> msm_gpu_submit+0x10c/0x178
> msm_job_run+0x78/0x150
> drm_sched_main+0x290/0x370
> kthread+0xf0/0x100
> ret_from_fork+0x10/0x20
>
> The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> freq change) path, but it is also held across allocations that could
> recurse into shrinker.
>
> Solve this by changing dev_pm_qos_constraints_allocate() into a function
> that can be called unconditionally before the device qos object is
> needed and before aquiring dev_pm_qos_mtx. This way the allocations can
> be done without holding the mutex. In the case that we raced with
> another thread to allocate the qos object, detect this *after* acquiring
> the dev_pm_qos_mtx and simply free the redundant allocations.
>
> Signed-off-by: Rob Clark <[email protected]>
> ---
> drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
> 1 file changed, 41 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> index 8e93167f1783..f3e0c6b65635 100644
> --- a/drivers/base/power/qos.c
> +++ b/drivers/base/power/qos.c
> @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
> }
>
> /*
> - * dev_pm_qos_constraints_allocate
> + * dev_pm_qos_constraints_ensure_allocated
> * @dev: device to allocate data for
> *
> - * Called at the first call to add_request, for constraint data allocation
> - * Must be called with the dev_pm_qos_mtx mutex held
> + * Called to ensure that devices qos is allocated, before acquiring
> + * dev_pm_qos_mtx.
> */
> -static int dev_pm_qos_constraints_allocate(struct device *dev)
> +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
> {
> struct dev_pm_qos *qos;
> struct pm_qos_constraints *c;
> struct blocking_notifier_head *n;
>
> + if (!dev)
> + return -ENODEV;
> +
> + if (!IS_ERR_OR_NULL(dev->power.qos))
> + return 0;
> +
> qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> if (!qos)
> return -ENOMEM;
> @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
>
> INIT_LIST_HEAD(&qos->flags.list);
>
> + mutex_lock(&dev_pm_qos_mtx);
> +
> + if (!IS_ERR_OR_NULL(dev->power.qos)) {
> + /*
> + * We have raced with another task to create the qos.
> + * No biggie, just free the resources we've allocated
> + * outside of dev_pm_qos_mtx and move on with life.
> + */
> + kfree(n);
> + kfree(qos);
> + goto unlock;
> + }
> +
> spin_lock_irq(&dev->power.lock);
> dev->power.qos = qos;
> spin_unlock_irq(&dev->power.lock);
>
> +unlock:
> + mutex_unlock(&dev_pm_qos_mtx);
> +
> return 0;
> }
>
> @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
> {
> int ret = 0;
>
> - if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> + if (!req || dev_pm_qos_invalid_req_type(dev, type))
> return -EINVAL;
>
> if (WARN(dev_pm_qos_request_active(req),
> "%s() called for already added request\n", __func__))
> return -EINVAL;
>
> - if (IS_ERR(dev->power.qos))
> + if (IS_ERR_OR_NULL(dev->power.qos))
> ret = -ENODEV;
> - else if (!dev->power.qos)
> - ret = dev_pm_qos_constraints_allocate(dev);
>
> trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> if (ret)
> @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> {
> int ret;
>
> + ret = dev_pm_qos_constraints_ensure_allocated(dev);
> + if (ret)
> + return ret;
> +

It is a bit unfortunate that the mutex is dropped and then immediately
re-acquired again. I don't think that this is strictly necessary.

> mutex_lock(&dev_pm_qos_mtx);
> ret = __dev_pm_qos_add_request(dev, req, type, value);
> mutex_unlock(&dev_pm_qos_mtx);
> @@ -537,15 +561,11 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
> {
> int ret = 0;
>
> - mutex_lock(&dev_pm_qos_mtx);
> -
> - if (IS_ERR(dev->power.qos))
> - ret = -ENODEV;
> - else if (!dev->power.qos)
> - ret = dev_pm_qos_constraints_allocate(dev);
> -
> + ret = dev_pm_qos_constraints_ensure_allocated(dev);
> if (ret)
> - goto unlock;
> + return ret;
> +
> + mutex_lock(&dev_pm_qos_mtx);
>
> switch (type) {
> case DEV_PM_QOS_RESUME_LATENCY:
> @@ -565,7 +585,6 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
> ret = -EINVAL;
> }
>
> -unlock:
> mutex_unlock(&dev_pm_qos_mtx);
> return ret;
> }
> @@ -905,10 +924,13 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
> {
> int ret;
>
> + ret = dev_pm_qos_constraints_ensure_allocated(dev);
> + if (ret)
> + return ret;
> +
> mutex_lock(&dev_pm_qos_mtx);
>
> - if (IS_ERR_OR_NULL(dev->power.qos)
> - || !dev->power.qos->latency_tolerance_req) {
> + if (!dev->power.qos->latency_tolerance_req) {
> struct dev_pm_qos_request *req;
>
> if (val < 0) {
> --
> 2.41.0
>

2023-08-04 20:21:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] PM / QoS: Fix constraints alloc vs reclaim locking

On Fri, Aug 4, 2023 at 8:38 PM Rob Clark <[email protected]> wrote:
>
> On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <[email protected]> wrote:
> > >
> > > From: Rob Clark <[email protected]>
> > >
> > > In the process of adding lockdep annotation for drm GPU scheduler's
> > > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > > this lockdep splat:
> > >
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 6.2.0-rc8-debug+ #558 Tainted: G W
> > > ------------------------------------------------------
> > > ring0/125 is trying to acquire lock:
> > > ffffffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x38/0x68
> > >
> > > but task is already holding lock:
> > > ffffff8087239208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
> > >
> > > which lock already depends on the new lock.
> > >
> > > the existing dependency chain (in reverse order) is:
> > >
> > > -> #4 (&gpu->active_lock){+.+.}-{3:3}:
> > > __mutex_lock+0xcc/0x3c8
> > > mutex_lock_nested+0x30/0x44
> > > msm_gpu_submit+0xec/0x178
> > > msm_job_run+0x78/0x150
> > > drm_sched_main+0x290/0x370
> > > kthread+0xf0/0x100
> > > ret_from_fork+0x10/0x20
> > >
> > > -> #3 (dma_fence_map){++++}-{0:0}:
> > > __dma_fence_might_wait+0x74/0xc0
> > > dma_resv_lockdep+0x1f4/0x2f4
> > > do_one_initcall+0x104/0x2bc
> > > kernel_init_freeable+0x344/0x34c
> > > kernel_init+0x30/0x134
> > > ret_from_fork+0x10/0x20
> > >
> > > -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> > > fs_reclaim_acquire+0x80/0xa8
> > > slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > > __kmem_cache_alloc_node+0x60/0x1cc
> > > __kmalloc+0xd8/0x100
> > > topology_parse_cpu_capacity+0x8c/0x178
> > > get_cpu_for_node+0x88/0xc4
> > > parse_cluster+0x1b0/0x28c
> > > parse_cluster+0x8c/0x28c
> > > init_cpu_topology+0x168/0x188
> > > smp_prepare_cpus+0x24/0xf8
> > > kernel_init_freeable+0x18c/0x34c
> > > kernel_init+0x30/0x134
> > > ret_from_fork+0x10/0x20
> > >
> > > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > > __fs_reclaim_acquire+0x3c/0x48
> > > fs_reclaim_acquire+0x54/0xa8
> > > slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > > __kmem_cache_alloc_node+0x60/0x1cc
> > > kmalloc_trace+0x50/0xa8
> > > dev_pm_qos_constraints_allocate+0x38/0x100
> > > __dev_pm_qos_add_request+0xb0/0x1e8
> > > dev_pm_qos_add_request+0x58/0x80
> > > dev_pm_qos_expose_latency_limit+0x60/0x13c
> > > register_cpu+0x12c/0x130
> > > topology_init+0xac/0xbc
> > > do_one_initcall+0x104/0x2bc
> > > kernel_init_freeable+0x344/0x34c
> > > kernel_init+0x30/0x134
> > > ret_from_fork+0x10/0x20
> > >
> > > -> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
> > > __lock_acquire+0xe00/0x1060
> > > lock_acquire+0x1e0/0x2f8
> > > __mutex_lock+0xcc/0x3c8
> > > mutex_lock_nested+0x30/0x44
> > > dev_pm_qos_update_request+0x38/0x68
> > > msm_devfreq_boost+0x40/0x70
> > > msm_devfreq_active+0xc0/0xf0
> > > msm_gpu_submit+0x10c/0x178
> > > msm_job_run+0x78/0x150
> > > drm_sched_main+0x290/0x370
> > > kthread+0xf0/0x100
> > > ret_from_fork+0x10/0x20
> > >
> > > other info that might help us debug this:
> > >
> > > Chain exists of:
> > > dev_pm_qos_mtx --> dma_fence_map --> &gpu->active_lock
> > >
> > > Possible unsafe locking scenario:
> > >
> > > CPU0 CPU1
> > > ---- ----
> > > lock(&gpu->active_lock);
> > > lock(dma_fence_map);
> > > lock(&gpu->active_lock);
> > > lock(dev_pm_qos_mtx);
> > >
> > > *** DEADLOCK ***
> > >
> > > 3 locks held by ring0/123:
> > > #0: ffffff8087251170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
> > > #1: ffffffd00b0e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150
> > > #2: ffffff8087251208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
> > >
> > > stack backtrace:
> > > CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
> > > Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> > > Call trace:
> > > dump_backtrace.part.0+0xb4/0xf8
> > > show_stack+0x20/0x38
> > > dump_stack_lvl+0x9c/0xd0
> > > dump_stack+0x18/0x34
> > > print_circular_bug+0x1b4/0x1f0
> > > check_noncircular+0x78/0xac
> > > __lock_acquire+0xe00/0x1060
> > > lock_acquire+0x1e0/0x2f8
> > > __mutex_lock+0xcc/0x3c8
> > > mutex_lock_nested+0x30/0x44
> > > dev_pm_qos_update_request+0x38/0x68
> > > msm_devfreq_boost+0x40/0x70
> > > msm_devfreq_active+0xc0/0xf0
> > > msm_gpu_submit+0x10c/0x178
> > > msm_job_run+0x78/0x150
> > > drm_sched_main+0x290/0x370
> > > kthread+0xf0/0x100
> > > ret_from_fork+0x10/0x20
> > >
> > > The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> > > freq change) path, but it is also held across allocations that could
> > > recurse into shrinker.
> > >
> > > Solve this by changing dev_pm_qos_constraints_allocate() into a function
> > > that can be called unconditionally before the device qos object is
> > > needed and before aquiring dev_pm_qos_mtx. This way the allocations can
> > > be done without holding the mutex. In the case that we raced with
> > > another thread to allocate the qos object, detect this *after* acquiring
> > > the dev_pm_qos_mtx and simply free the redundant allocations.
> > >
> > > Signed-off-by: Rob Clark <[email protected]>
> > > ---
> > > drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
> > > 1 file changed, 41 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> > > index 8e93167f1783..f3e0c6b65635 100644
> > > --- a/drivers/base/power/qos.c
> > > +++ b/drivers/base/power/qos.c
> > > @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
> > > }
> > >
> > > /*
> > > - * dev_pm_qos_constraints_allocate
> > > + * dev_pm_qos_constraints_ensure_allocated
> > > * @dev: device to allocate data for
> > > *
> > > - * Called at the first call to add_request, for constraint data allocation
> > > - * Must be called with the dev_pm_qos_mtx mutex held
> > > + * Called to ensure that devices qos is allocated, before acquiring
> > > + * dev_pm_qos_mtx.
> > > */
> > > -static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
> > > {
> > > struct dev_pm_qos *qos;
> > > struct pm_qos_constraints *c;
> > > struct blocking_notifier_head *n;
> > >
> > > + if (!dev)
> > > + return -ENODEV;
> > > +
> > > + if (!IS_ERR_OR_NULL(dev->power.qos))
> > > + return 0;
> > > +
> > > qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> > > if (!qos)
> > > return -ENOMEM;
> > > @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
> > >
> > > INIT_LIST_HEAD(&qos->flags.list);
> > >
> > > + mutex_lock(&dev_pm_qos_mtx);
> > > +
> > > + if (!IS_ERR_OR_NULL(dev->power.qos)) {
> > > + /*
> > > + * We have raced with another task to create the qos.
> > > + * No biggie, just free the resources we've allocated
> > > + * outside of dev_pm_qos_mtx and move on with life.
> > > + */
> > > + kfree(n);
> > > + kfree(qos);
> > > + goto unlock;
> > > + }
> > > +
> > > spin_lock_irq(&dev->power.lock);
> > > dev->power.qos = qos;
> > > spin_unlock_irq(&dev->power.lock);
> > >
> > > +unlock:
> > > + mutex_unlock(&dev_pm_qos_mtx);
> > > +
> > > return 0;
> > > }
> > >
> > > @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
> > > {
> > > int ret = 0;
> > >
> > > - if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> > > + if (!req || dev_pm_qos_invalid_req_type(dev, type))
> > > return -EINVAL;
> > >
> > > if (WARN(dev_pm_qos_request_active(req),
> > > "%s() called for already added request\n", __func__))
> > > return -EINVAL;
> > >
> > > - if (IS_ERR(dev->power.qos))
> > > + if (IS_ERR_OR_NULL(dev->power.qos))
> > > ret = -ENODEV;
> > > - else if (!dev->power.qos)
> > > - ret = dev_pm_qos_constraints_allocate(dev);
> > >
> > > trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> > > if (ret)
> > > @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> > > {
> > > int ret;
> > >
> > > + ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > > + if (ret)
> > > + return ret;
> > > +
> >
> > It is a bit unfortunate that the mutex is dropped and then immediately
> > re-acquired again. I don't think that this is strictly necessary.
>
> We could have dev_pm_qos_constraints_ensure_allocated() return with
> the lock held in the success case if we had to.. but that seems a bit
> funny looking. And the dev_pm_qos_update_user_latency_tolerance()
> path would need to shuffle slightly to move the kzalloc out of the
> lock.

Well, what about something like this (modulo whitespace damage by
GMail), attached for completeness:

---
drivers/base/power/qos.c | 37 +++++++++++++++++++++++++------------
1 file changed, 25 insertions(+), 12 deletions(-)

Index: linux-pm/drivers/base/power/qos.c
===================================================================
--- linux-pm.orig/drivers/base/power/qos.c
+++ linux-pm/drivers/base/power/qos.c
@@ -186,26 +186,21 @@ static int apply_constraint(struct dev_p

/*
* dev_pm_qos_constraints_allocate
- * @dev: device to allocate data for
*
* Called at the first call to add_request, for constraint data allocation
* Must be called with the dev_pm_qos_mtx mutex held
*/
-static int dev_pm_qos_constraints_allocate(struct device *dev)
+static struct dev_pm_qos *dev_pm_qos_constraints_allocate(void)
{
struct dev_pm_qos *qos;
struct pm_qos_constraints *c;
struct blocking_notifier_head *n;

- qos = kzalloc(sizeof(*qos), GFP_KERNEL);
+ qos = kzalloc(sizeof(*qos) + kzalloc(3 * sizeof(*n), GFP_KERNEL);
if (!qos)
- return -ENOMEM;
+ return NULL;

- n = kzalloc(3 * sizeof(*n), GFP_KERNEL);
- if (!n) {
- kfree(qos);
- return -ENOMEM;
- }
+ n = (struct blocking_notifier_head *)(qos + 1);

c = &qos->resume_latency;
plist_head_init(&c->list);
@@ -227,6 +222,20 @@ static int dev_pm_qos_constraints_alloca

INIT_LIST_HEAD(&qos->flags.list);

+ return qos;
+}
+
+static int dev_pm_qos_constraints_add(struct device *dev,
+ struct dev_pm_qos *qos)
+{
+ if (!qos)
+ return -ENOMEM;
+
+ if (!IS_ERR_OR_NULL(dev->power.qos)) {
+ kfree(qos);
+ return -ENODEV;
+ }
+
spin_lock_irq(&dev->power.lock);
dev->power.qos = qos;
spin_unlock_irq(&dev->power.lock);
@@ -326,6 +335,7 @@ static bool dev_pm_qos_invalid_req_type(
}

static int __dev_pm_qos_add_request(struct device *dev,
+ struct dev_pm_qos *qos,
struct dev_pm_qos_request *req,
enum dev_pm_qos_req_type type, s32 value)
{
@@ -340,8 +350,10 @@ static int __dev_pm_qos_add_request(stru

if (IS_ERR(dev->power.qos))
ret = -ENODEV;
- else if (!dev->power.qos)
- ret = dev_pm_qos_constraints_allocate(dev);
+ else if (dev->power.qos)
+ kfree(qos);
+ else
+ ret = dev_pm_qos_constraints_add(dev);

trace_dev_pm_qos_add_request(dev_name(dev), type, value);
if (ret)
@@ -388,10 +400,11 @@ static int __dev_pm_qos_add_request(stru
int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
enum dev_pm_qos_req_type type, s32 value)
{
+ struct dev_pm_qos *qos = dev_pm_qos_constraints_allocate();
int ret;

mutex_lock(&dev_pm_qos_mtx);
- ret = __dev_pm_qos_add_request(dev, req, type, value);
+ ret = __dev_pm_qos_add_request(dev, qos, req, type, value);
mutex_unlock(&dev_pm_qos_mtx);
return ret;
}


Attachments:
pm-qos-rfc-change.patch (2.63 kB)

2023-08-04 21:03:31

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] PM / QoS: Fix constraints alloc vs reclaim locking

On Fri, Aug 4, 2023 at 10:38 PM Rob Clark <[email protected]> wrote:
>
> On Fri, Aug 4, 2023 at 12:11 PM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Fri, Aug 4, 2023 at 8:38 PM Rob Clark <[email protected]> wrote:
> > >
> > > On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <[email protected]> wrote:
> > > > >
> > > > > From: Rob Clark <[email protected]>
> > > > >
> > > > > In the process of adding lockdep annotation for drm GPU scheduler's
> > > > > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > > > > this lockdep splat:
> > > > >
> > > > > ======================================================
> > > > > WARNING: possible circular locking dependency detected
> > > > > 6.2.0-rc8-debug+ #558 Tainted: G W
> > > > > ------------------------------------------------------
> > > > > ring0/125 is trying to acquire lock:
> > > > > ffffffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x38/0x68
> > > > >
> > > > > but task is already holding lock:
> > > > > ffffff8087239208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
> > > > >
> > > > > which lock already depends on the new lock.
> > > > >
> > > > > the existing dependency chain (in reverse order) is:
> > > > >
> > > > > -> #4 (&gpu->active_lock){+.+.}-{3:3}:
> > > > > __mutex_lock+0xcc/0x3c8
> > > > > mutex_lock_nested+0x30/0x44
> > > > > msm_gpu_submit+0xec/0x178
> > > > > msm_job_run+0x78/0x150
> > > > > drm_sched_main+0x290/0x370
> > > > > kthread+0xf0/0x100
> > > > > ret_from_fork+0x10/0x20
> > > > >
> > > > > -> #3 (dma_fence_map){++++}-{0:0}:
> > > > > __dma_fence_might_wait+0x74/0xc0
> > > > > dma_resv_lockdep+0x1f4/0x2f4
> > > > > do_one_initcall+0x104/0x2bc
> > > > > kernel_init_freeable+0x344/0x34c
> > > > > kernel_init+0x30/0x134
> > > > > ret_from_fork+0x10/0x20
> > > > >
> > > > > -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> > > > > fs_reclaim_acquire+0x80/0xa8
> > > > > slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > > > > __kmem_cache_alloc_node+0x60/0x1cc
> > > > > __kmalloc+0xd8/0x100
> > > > > topology_parse_cpu_capacity+0x8c/0x178
> > > > > get_cpu_for_node+0x88/0xc4
> > > > > parse_cluster+0x1b0/0x28c
> > > > > parse_cluster+0x8c/0x28c
> > > > > init_cpu_topology+0x168/0x188
> > > > > smp_prepare_cpus+0x24/0xf8
> > > > > kernel_init_freeable+0x18c/0x34c
> > > > > kernel_init+0x30/0x134
> > > > > ret_from_fork+0x10/0x20
> > > > >
> > > > > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > > > > __fs_reclaim_acquire+0x3c/0x48
> > > > > fs_reclaim_acquire+0x54/0xa8
> > > > > slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > > > > __kmem_cache_alloc_node+0x60/0x1cc
> > > > > kmalloc_trace+0x50/0xa8
> > > > > dev_pm_qos_constraints_allocate+0x38/0x100
> > > > > __dev_pm_qos_add_request+0xb0/0x1e8
> > > > > dev_pm_qos_add_request+0x58/0x80
> > > > > dev_pm_qos_expose_latency_limit+0x60/0x13c
> > > > > register_cpu+0x12c/0x130
> > > > > topology_init+0xac/0xbc
> > > > > do_one_initcall+0x104/0x2bc
> > > > > kernel_init_freeable+0x344/0x34c
> > > > > kernel_init+0x30/0x134
> > > > > ret_from_fork+0x10/0x20
> > > > >
> > > > > -> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
> > > > > __lock_acquire+0xe00/0x1060
> > > > > lock_acquire+0x1e0/0x2f8
> > > > > __mutex_lock+0xcc/0x3c8
> > > > > mutex_lock_nested+0x30/0x44
> > > > > dev_pm_qos_update_request+0x38/0x68
> > > > > msm_devfreq_boost+0x40/0x70
> > > > > msm_devfreq_active+0xc0/0xf0
> > > > > msm_gpu_submit+0x10c/0x178
> > > > > msm_job_run+0x78/0x150
> > > > > drm_sched_main+0x290/0x370
> > > > > kthread+0xf0/0x100
> > > > > ret_from_fork+0x10/0x20
> > > > >
> > > > > other info that might help us debug this:
> > > > >
> > > > > Chain exists of:
> > > > > dev_pm_qos_mtx --> dma_fence_map --> &gpu->active_lock
> > > > >
> > > > > Possible unsafe locking scenario:
> > > > >
> > > > > CPU0 CPU1
> > > > > ---- ----
> > > > > lock(&gpu->active_lock);
> > > > > lock(dma_fence_map);
> > > > > lock(&gpu->active_lock);
> > > > > lock(dev_pm_qos_mtx);
> > > > >
> > > > > *** DEADLOCK ***
> > > > >
> > > > > 3 locks held by ring0/123:
> > > > > #0: ffffff8087251170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
> > > > > #1: ffffffd00b0e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150
> > > > > #2: ffffff8087251208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
> > > > >
> > > > > stack backtrace:
> > > > > CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
> > > > > Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> > > > > Call trace:
> > > > > dump_backtrace.part.0+0xb4/0xf8
> > > > > show_stack+0x20/0x38
> > > > > dump_stack_lvl+0x9c/0xd0
> > > > > dump_stack+0x18/0x34
> > > > > print_circular_bug+0x1b4/0x1f0
> > > > > check_noncircular+0x78/0xac
> > > > > __lock_acquire+0xe00/0x1060
> > > > > lock_acquire+0x1e0/0x2f8
> > > > > __mutex_lock+0xcc/0x3c8
> > > > > mutex_lock_nested+0x30/0x44
> > > > > dev_pm_qos_update_request+0x38/0x68
> > > > > msm_devfreq_boost+0x40/0x70
> > > > > msm_devfreq_active+0xc0/0xf0
> > > > > msm_gpu_submit+0x10c/0x178
> > > > > msm_job_run+0x78/0x150
> > > > > drm_sched_main+0x290/0x370
> > > > > kthread+0xf0/0x100
> > > > > ret_from_fork+0x10/0x20
> > > > >
> > > > > The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> > > > > freq change) path, but it is also held across allocations that could
> > > > > recurse into shrinker.
> > > > >
> > > > > Solve this by changing dev_pm_qos_constraints_allocate() into a function
> > > > > that can be called unconditionally before the device qos object is
> > > > > needed and before aquiring dev_pm_qos_mtx. This way the allocations can
> > > > > be done without holding the mutex. In the case that we raced with
> > > > > another thread to allocate the qos object, detect this *after* acquiring
> > > > > the dev_pm_qos_mtx and simply free the redundant allocations.
> > > > >
> > > > > Signed-off-by: Rob Clark <[email protected]>
> > > > > ---
> > > > > drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
> > > > > 1 file changed, 41 insertions(+), 19 deletions(-)
> > > > >
> > > > > diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> > > > > index 8e93167f1783..f3e0c6b65635 100644
> > > > > --- a/drivers/base/power/qos.c
> > > > > +++ b/drivers/base/power/qos.c
> > > > > @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
> > > > > }
> > > > >
> > > > > /*
> > > > > - * dev_pm_qos_constraints_allocate
> > > > > + * dev_pm_qos_constraints_ensure_allocated
> > > > > * @dev: device to allocate data for
> > > > > *
> > > > > - * Called at the first call to add_request, for constraint data allocation
> > > > > - * Must be called with the dev_pm_qos_mtx mutex held
> > > > > + * Called to ensure that devices qos is allocated, before acquiring
> > > > > + * dev_pm_qos_mtx.
> > > > > */
> > > > > -static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > > > +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
> > > > > {
> > > > > struct dev_pm_qos *qos;
> > > > > struct pm_qos_constraints *c;
> > > > > struct blocking_notifier_head *n;
> > > > >
> > > > > + if (!dev)
> > > > > + return -ENODEV;
> > > > > +
> > > > > + if (!IS_ERR_OR_NULL(dev->power.qos))
> > > > > + return 0;
> > > > > +
> > > > > qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> > > > > if (!qos)
> > > > > return -ENOMEM;
> > > > > @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > > >
> > > > > INIT_LIST_HEAD(&qos->flags.list);
> > > > >
> > > > > + mutex_lock(&dev_pm_qos_mtx);
> > > > > +
> > > > > + if (!IS_ERR_OR_NULL(dev->power.qos)) {
> > > > > + /*
> > > > > + * We have raced with another task to create the qos.
> > > > > + * No biggie, just free the resources we've allocated
> > > > > + * outside of dev_pm_qos_mtx and move on with life.
> > > > > + */
> > > > > + kfree(n);
> > > > > + kfree(qos);
> > > > > + goto unlock;
> > > > > + }
> > > > > +
> > > > > spin_lock_irq(&dev->power.lock);
> > > > > dev->power.qos = qos;
> > > > > spin_unlock_irq(&dev->power.lock);
> > > > >
> > > > > +unlock:
> > > > > + mutex_unlock(&dev_pm_qos_mtx);
> > > > > +
> > > > > return 0;
> > > > > }
> > > > >
> > > > > @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
> > > > > {
> > > > > int ret = 0;
> > > > >
> > > > > - if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> > > > > + if (!req || dev_pm_qos_invalid_req_type(dev, type))
> > > > > return -EINVAL;
> > > > >
> > > > > if (WARN(dev_pm_qos_request_active(req),
> > > > > "%s() called for already added request\n", __func__))
> > > > > return -EINVAL;
> > > > >
> > > > > - if (IS_ERR(dev->power.qos))
> > > > > + if (IS_ERR_OR_NULL(dev->power.qos))
> > > > > ret = -ENODEV;
> > > > > - else if (!dev->power.qos)
> > > > > - ret = dev_pm_qos_constraints_allocate(dev);
> > > > >
> > > > > trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> > > > > if (ret)
> > > > > @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> > > > > {
> > > > > int ret;
> > > > >
> > > > > + ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > >
> > > > It is a bit unfortunate that the mutex is dropped and then immediately
> > > > re-acquired again. I don't think that this is strictly necessary.
> > >
> > > We could have dev_pm_qos_constraints_ensure_allocated() return with
> > > the lock held in the success case if we had to.. but that seems a bit
> > > funny looking. And the dev_pm_qos_update_user_latency_tolerance()
> > > path would need to shuffle slightly to move the kzalloc out of the
> > > lock.
> >
> > Well, what about something like this (modulo whitespace damage by
> > GMail), attached for completeness:
> >
>
> There is one other path to handle, and some small details,

Yes, this was just an illustration of the approach.

> but I think the approach could work.. let's see..

OK

2023-08-04 21:26:45

by Rob Clark

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] PM / QoS: Fix constraints alloc vs reclaim locking

On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <[email protected]> wrote:
> >
> > From: Rob Clark <[email protected]>
> >
> > In the process of adding lockdep annotation for drm GPU scheduler's
> > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > this lockdep splat:
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 6.2.0-rc8-debug+ #558 Tainted: G W
> > ------------------------------------------------------
> > ring0/125 is trying to acquire lock:
> > ffffffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x38/0x68
> >
> > but task is already holding lock:
> > ffffff8087239208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #4 (&gpu->active_lock){+.+.}-{3:3}:
> > __mutex_lock+0xcc/0x3c8
> > mutex_lock_nested+0x30/0x44
> > msm_gpu_submit+0xec/0x178
> > msm_job_run+0x78/0x150
> > drm_sched_main+0x290/0x370
> > kthread+0xf0/0x100
> > ret_from_fork+0x10/0x20
> >
> > -> #3 (dma_fence_map){++++}-{0:0}:
> > __dma_fence_might_wait+0x74/0xc0
> > dma_resv_lockdep+0x1f4/0x2f4
> > do_one_initcall+0x104/0x2bc
> > kernel_init_freeable+0x344/0x34c
> > kernel_init+0x30/0x134
> > ret_from_fork+0x10/0x20
> >
> > -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> > fs_reclaim_acquire+0x80/0xa8
> > slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > __kmem_cache_alloc_node+0x60/0x1cc
> > __kmalloc+0xd8/0x100
> > topology_parse_cpu_capacity+0x8c/0x178
> > get_cpu_for_node+0x88/0xc4
> > parse_cluster+0x1b0/0x28c
> > parse_cluster+0x8c/0x28c
> > init_cpu_topology+0x168/0x188
> > smp_prepare_cpus+0x24/0xf8
> > kernel_init_freeable+0x18c/0x34c
> > kernel_init+0x30/0x134
> > ret_from_fork+0x10/0x20
> >
> > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > __fs_reclaim_acquire+0x3c/0x48
> > fs_reclaim_acquire+0x54/0xa8
> > slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > __kmem_cache_alloc_node+0x60/0x1cc
> > kmalloc_trace+0x50/0xa8
> > dev_pm_qos_constraints_allocate+0x38/0x100
> > __dev_pm_qos_add_request+0xb0/0x1e8
> > dev_pm_qos_add_request+0x58/0x80
> > dev_pm_qos_expose_latency_limit+0x60/0x13c
> > register_cpu+0x12c/0x130
> > topology_init+0xac/0xbc
> > do_one_initcall+0x104/0x2bc
> > kernel_init_freeable+0x344/0x34c
> > kernel_init+0x30/0x134
> > ret_from_fork+0x10/0x20
> >
> > -> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
> > __lock_acquire+0xe00/0x1060
> > lock_acquire+0x1e0/0x2f8
> > __mutex_lock+0xcc/0x3c8
> > mutex_lock_nested+0x30/0x44
> > dev_pm_qos_update_request+0x38/0x68
> > msm_devfreq_boost+0x40/0x70
> > msm_devfreq_active+0xc0/0xf0
> > msm_gpu_submit+0x10c/0x178
> > msm_job_run+0x78/0x150
> > drm_sched_main+0x290/0x370
> > kthread+0xf0/0x100
> > ret_from_fork+0x10/0x20
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> > dev_pm_qos_mtx --> dma_fence_map --> &gpu->active_lock
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(&gpu->active_lock);
> > lock(dma_fence_map);
> > lock(&gpu->active_lock);
> > lock(dev_pm_qos_mtx);
> >
> > *** DEADLOCK ***
> >
> > 3 locks held by ring0/123:
> > #0: ffffff8087251170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
> > #1: ffffffd00b0e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150
> > #2: ffffff8087251208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
> >
> > stack backtrace:
> > CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
> > Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> > Call trace:
> > dump_backtrace.part.0+0xb4/0xf8
> > show_stack+0x20/0x38
> > dump_stack_lvl+0x9c/0xd0
> > dump_stack+0x18/0x34
> > print_circular_bug+0x1b4/0x1f0
> > check_noncircular+0x78/0xac
> > __lock_acquire+0xe00/0x1060
> > lock_acquire+0x1e0/0x2f8
> > __mutex_lock+0xcc/0x3c8
> > mutex_lock_nested+0x30/0x44
> > dev_pm_qos_update_request+0x38/0x68
> > msm_devfreq_boost+0x40/0x70
> > msm_devfreq_active+0xc0/0xf0
> > msm_gpu_submit+0x10c/0x178
> > msm_job_run+0x78/0x150
> > drm_sched_main+0x290/0x370
> > kthread+0xf0/0x100
> > ret_from_fork+0x10/0x20
> >
> > The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> > freq change) path, but it is also held across allocations that could
> > recurse into shrinker.
> >
> > Solve this by changing dev_pm_qos_constraints_allocate() into a function
> > that can be called unconditionally before the device qos object is
> > needed and before aquiring dev_pm_qos_mtx. This way the allocations can
> > be done without holding the mutex. In the case that we raced with
> > another thread to allocate the qos object, detect this *after* acquiring
> > the dev_pm_qos_mtx and simply free the redundant allocations.
> >
> > Signed-off-by: Rob Clark <[email protected]>
> > ---
> > drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
> > 1 file changed, 41 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> > index 8e93167f1783..f3e0c6b65635 100644
> > --- a/drivers/base/power/qos.c
> > +++ b/drivers/base/power/qos.c
> > @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
> > }
> >
> > /*
> > - * dev_pm_qos_constraints_allocate
> > + * dev_pm_qos_constraints_ensure_allocated
> > * @dev: device to allocate data for
> > *
> > - * Called at the first call to add_request, for constraint data allocation
> > - * Must be called with the dev_pm_qos_mtx mutex held
> > + * Called to ensure that devices qos is allocated, before acquiring
> > + * dev_pm_qos_mtx.
> > */
> > -static int dev_pm_qos_constraints_allocate(struct device *dev)
> > +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
> > {
> > struct dev_pm_qos *qos;
> > struct pm_qos_constraints *c;
> > struct blocking_notifier_head *n;
> >
> > + if (!dev)
> > + return -ENODEV;
> > +
> > + if (!IS_ERR_OR_NULL(dev->power.qos))
> > + return 0;
> > +
> > qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> > if (!qos)
> > return -ENOMEM;
> > @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
> >
> > INIT_LIST_HEAD(&qos->flags.list);
> >
> > + mutex_lock(&dev_pm_qos_mtx);
> > +
> > + if (!IS_ERR_OR_NULL(dev->power.qos)) {
> > + /*
> > + * We have raced with another task to create the qos.
> > + * No biggie, just free the resources we've allocated
> > + * outside of dev_pm_qos_mtx and move on with life.
> > + */
> > + kfree(n);
> > + kfree(qos);
> > + goto unlock;
> > + }
> > +
> > spin_lock_irq(&dev->power.lock);
> > dev->power.qos = qos;
> > spin_unlock_irq(&dev->power.lock);
> >
> > +unlock:
> > + mutex_unlock(&dev_pm_qos_mtx);
> > +
> > return 0;
> > }
> >
> > @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
> > {
> > int ret = 0;
> >
> > - if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> > + if (!req || dev_pm_qos_invalid_req_type(dev, type))
> > return -EINVAL;
> >
> > if (WARN(dev_pm_qos_request_active(req),
> > "%s() called for already added request\n", __func__))
> > return -EINVAL;
> >
> > - if (IS_ERR(dev->power.qos))
> > + if (IS_ERR_OR_NULL(dev->power.qos))
> > ret = -ENODEV;
> > - else if (!dev->power.qos)
> > - ret = dev_pm_qos_constraints_allocate(dev);
> >
> > trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> > if (ret)
> > @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> > {
> > int ret;
> >
> > + ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > + if (ret)
> > + return ret;
> > +
>
> It is a bit unfortunate that the mutex is dropped and then immediately
> re-acquired again. I don't think that this is strictly necessary.

We could have dev_pm_qos_constraints_ensure_allocated() return with
the lock held in the success case if we had to.. but that seems a bit
funny looking. And the dev_pm_qos_update_user_latency_tolerance()
path would need to shuffle slightly to move the kzalloc out of the
lock.

BR,
-R


> > mutex_lock(&dev_pm_qos_mtx);
> > ret = __dev_pm_qos_add_request(dev, req, type, value);
> > mutex_unlock(&dev_pm_qos_mtx);
> > @@ -537,15 +561,11 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
> > {
> > int ret = 0;
> >
> > - mutex_lock(&dev_pm_qos_mtx);
> > -
> > - if (IS_ERR(dev->power.qos))
> > - ret = -ENODEV;
> > - else if (!dev->power.qos)
> > - ret = dev_pm_qos_constraints_allocate(dev);
> > -
> > + ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > if (ret)
> > - goto unlock;
> > + return ret;
> > +
> > + mutex_lock(&dev_pm_qos_mtx);
> >
> > switch (type) {
> > case DEV_PM_QOS_RESUME_LATENCY:
> > @@ -565,7 +585,6 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
> > ret = -EINVAL;
> > }
> >
> > -unlock:
> > mutex_unlock(&dev_pm_qos_mtx);
> > return ret;
> > }
> > @@ -905,10 +924,13 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
> > {
> > int ret;
> >
> > + ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > + if (ret)
> > + return ret;
> > +
> > mutex_lock(&dev_pm_qos_mtx);
> >
> > - if (IS_ERR_OR_NULL(dev->power.qos)
> > - || !dev->power.qos->latency_tolerance_req) {
> > + if (!dev->power.qos->latency_tolerance_req) {
> > struct dev_pm_qos_request *req;
> >
> > if (val < 0) {
> > --
> > 2.41.0
> >

2023-08-04 22:42:51

by Rob Clark

[permalink] [raw]
Subject: Re: [PATCH v3 3/9] PM / QoS: Fix constraints alloc vs reclaim locking

On Fri, Aug 4, 2023 at 12:11 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Fri, Aug 4, 2023 at 8:38 PM Rob Clark <[email protected]> wrote:
> >
> > On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <[email protected]> wrote:
> > > >
> > > > From: Rob Clark <[email protected]>
> > > >
> > > > In the process of adding lockdep annotation for drm GPU scheduler's
> > > > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > > > this lockdep splat:
> > > >
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 6.2.0-rc8-debug+ #558 Tainted: G W
> > > > ------------------------------------------------------
> > > > ring0/125 is trying to acquire lock:
> > > > ffffffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x38/0x68
> > > >
> > > > but task is already holding lock:
> > > > ffffff8087239208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
> > > >
> > > > which lock already depends on the new lock.
> > > >
> > > > the existing dependency chain (in reverse order) is:
> > > >
> > > > -> #4 (&gpu->active_lock){+.+.}-{3:3}:
> > > > __mutex_lock+0xcc/0x3c8
> > > > mutex_lock_nested+0x30/0x44
> > > > msm_gpu_submit+0xec/0x178
> > > > msm_job_run+0x78/0x150
> > > > drm_sched_main+0x290/0x370
> > > > kthread+0xf0/0x100
> > > > ret_from_fork+0x10/0x20
> > > >
> > > > -> #3 (dma_fence_map){++++}-{0:0}:
> > > > __dma_fence_might_wait+0x74/0xc0
> > > > dma_resv_lockdep+0x1f4/0x2f4
> > > > do_one_initcall+0x104/0x2bc
> > > > kernel_init_freeable+0x344/0x34c
> > > > kernel_init+0x30/0x134
> > > > ret_from_fork+0x10/0x20
> > > >
> > > > -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> > > > fs_reclaim_acquire+0x80/0xa8
> > > > slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > > > __kmem_cache_alloc_node+0x60/0x1cc
> > > > __kmalloc+0xd8/0x100
> > > > topology_parse_cpu_capacity+0x8c/0x178
> > > > get_cpu_for_node+0x88/0xc4
> > > > parse_cluster+0x1b0/0x28c
> > > > parse_cluster+0x8c/0x28c
> > > > init_cpu_topology+0x168/0x188
> > > > smp_prepare_cpus+0x24/0xf8
> > > > kernel_init_freeable+0x18c/0x34c
> > > > kernel_init+0x30/0x134
> > > > ret_from_fork+0x10/0x20
> > > >
> > > > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > > > __fs_reclaim_acquire+0x3c/0x48
> > > > fs_reclaim_acquire+0x54/0xa8
> > > > slab_pre_alloc_hook.constprop.0+0x40/0x25c
> > > > __kmem_cache_alloc_node+0x60/0x1cc
> > > > kmalloc_trace+0x50/0xa8
> > > > dev_pm_qos_constraints_allocate+0x38/0x100
> > > > __dev_pm_qos_add_request+0xb0/0x1e8
> > > > dev_pm_qos_add_request+0x58/0x80
> > > > dev_pm_qos_expose_latency_limit+0x60/0x13c
> > > > register_cpu+0x12c/0x130
> > > > topology_init+0xac/0xbc
> > > > do_one_initcall+0x104/0x2bc
> > > > kernel_init_freeable+0x344/0x34c
> > > > kernel_init+0x30/0x134
> > > > ret_from_fork+0x10/0x20
> > > >
> > > > -> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
> > > > __lock_acquire+0xe00/0x1060
> > > > lock_acquire+0x1e0/0x2f8
> > > > __mutex_lock+0xcc/0x3c8
> > > > mutex_lock_nested+0x30/0x44
> > > > dev_pm_qos_update_request+0x38/0x68
> > > > msm_devfreq_boost+0x40/0x70
> > > > msm_devfreq_active+0xc0/0xf0
> > > > msm_gpu_submit+0x10c/0x178
> > > > msm_job_run+0x78/0x150
> > > > drm_sched_main+0x290/0x370
> > > > kthread+0xf0/0x100
> > > > ret_from_fork+0x10/0x20
> > > >
> > > > other info that might help us debug this:
> > > >
> > > > Chain exists of:
> > > > dev_pm_qos_mtx --> dma_fence_map --> &gpu->active_lock
> > > >
> > > > Possible unsafe locking scenario:
> > > >
> > > > CPU0 CPU1
> > > > ---- ----
> > > > lock(&gpu->active_lock);
> > > > lock(dma_fence_map);
> > > > lock(&gpu->active_lock);
> > > > lock(dev_pm_qos_mtx);
> > > >
> > > > *** DEADLOCK ***
> > > >
> > > > 3 locks held by ring0/123:
> > > > #0: ffffff8087251170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
> > > > #1: ffffffd00b0e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150
> > > > #2: ffffff8087251208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
> > > >
> > > > stack backtrace:
> > > > CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
> > > > Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> > > > Call trace:
> > > > dump_backtrace.part.0+0xb4/0xf8
> > > > show_stack+0x20/0x38
> > > > dump_stack_lvl+0x9c/0xd0
> > > > dump_stack+0x18/0x34
> > > > print_circular_bug+0x1b4/0x1f0
> > > > check_noncircular+0x78/0xac
> > > > __lock_acquire+0xe00/0x1060
> > > > lock_acquire+0x1e0/0x2f8
> > > > __mutex_lock+0xcc/0x3c8
> > > > mutex_lock_nested+0x30/0x44
> > > > dev_pm_qos_update_request+0x38/0x68
> > > > msm_devfreq_boost+0x40/0x70
> > > > msm_devfreq_active+0xc0/0xf0
> > > > msm_gpu_submit+0x10c/0x178
> > > > msm_job_run+0x78/0x150
> > > > drm_sched_main+0x290/0x370
> > > > kthread+0xf0/0x100
> > > > ret_from_fork+0x10/0x20
> > > >
> > > > The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> > > > freq change) path, but it is also held across allocations that could
> > > > recurse into shrinker.
> > > >
> > > > Solve this by changing dev_pm_qos_constraints_allocate() into a function
> > > > that can be called unconditionally before the device qos object is
> > > > needed and before aquiring dev_pm_qos_mtx. This way the allocations can
> > > > be done without holding the mutex. In the case that we raced with
> > > > another thread to allocate the qos object, detect this *after* acquiring
> > > > the dev_pm_qos_mtx and simply free the redundant allocations.
> > > >
> > > > Signed-off-by: Rob Clark <[email protected]>
> > > > ---
> > > > drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
> > > > 1 file changed, 41 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> > > > index 8e93167f1783..f3e0c6b65635 100644
> > > > --- a/drivers/base/power/qos.c
> > > > +++ b/drivers/base/power/qos.c
> > > > @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
> > > > }
> > > >
> > > > /*
> > > > - * dev_pm_qos_constraints_allocate
> > > > + * dev_pm_qos_constraints_ensure_allocated
> > > > * @dev: device to allocate data for
> > > > *
> > > > - * Called at the first call to add_request, for constraint data allocation
> > > > - * Must be called with the dev_pm_qos_mtx mutex held
> > > > + * Called to ensure that devices qos is allocated, before acquiring
> > > > + * dev_pm_qos_mtx.
> > > > */
> > > > -static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > > +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
> > > > {
> > > > struct dev_pm_qos *qos;
> > > > struct pm_qos_constraints *c;
> > > > struct blocking_notifier_head *n;
> > > >
> > > > + if (!dev)
> > > > + return -ENODEV;
> > > > +
> > > > + if (!IS_ERR_OR_NULL(dev->power.qos))
> > > > + return 0;
> > > > +
> > > > qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> > > > if (!qos)
> > > > return -ENOMEM;
> > > > @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > >
> > > > INIT_LIST_HEAD(&qos->flags.list);
> > > >
> > > > + mutex_lock(&dev_pm_qos_mtx);
> > > > +
> > > > + if (!IS_ERR_OR_NULL(dev->power.qos)) {
> > > > + /*
> > > > + * We have raced with another task to create the qos.
> > > > + * No biggie, just free the resources we've allocated
> > > > + * outside of dev_pm_qos_mtx and move on with life.
> > > > + */
> > > > + kfree(n);
> > > > + kfree(qos);
> > > > + goto unlock;
> > > > + }
> > > > +
> > > > spin_lock_irq(&dev->power.lock);
> > > > dev->power.qos = qos;
> > > > spin_unlock_irq(&dev->power.lock);
> > > >
> > > > +unlock:
> > > > + mutex_unlock(&dev_pm_qos_mtx);
> > > > +
> > > > return 0;
> > > > }
> > > >
> > > > @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
> > > > {
> > > > int ret = 0;
> > > >
> > > > - if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> > > > + if (!req || dev_pm_qos_invalid_req_type(dev, type))
> > > > return -EINVAL;
> > > >
> > > > if (WARN(dev_pm_qos_request_active(req),
> > > > "%s() called for already added request\n", __func__))
> > > > return -EINVAL;
> > > >
> > > > - if (IS_ERR(dev->power.qos))
> > > > + if (IS_ERR_OR_NULL(dev->power.qos))
> > > > ret = -ENODEV;
> > > > - else if (!dev->power.qos)
> > > > - ret = dev_pm_qos_constraints_allocate(dev);
> > > >
> > > > trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> > > > if (ret)
> > > > @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> > > > {
> > > > int ret;
> > > >
> > > > + ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > >
> > > It is a bit unfortunate that the mutex is dropped and then immediately
> > > re-acquired again. I don't think that this is strictly necessary.
> >
> > We could have dev_pm_qos_constraints_ensure_allocated() return with
> > the lock held in the success case if we had to.. but that seems a bit
> > funny looking. And the dev_pm_qos_update_user_latency_tolerance()
> > path would need to shuffle slightly to move the kzalloc out of the
> > lock.
>
> Well, what about something like this (modulo whitespace damage by
> GMail), attached for completeness:
>

There is one other path to handle, and some small details, but I think
the approach could work.. let's see..

BR,
-R

> ---
> drivers/base/power/qos.c | 37 +++++++++++++++++++++++++------------
> 1 file changed, 25 insertions(+), 12 deletions(-)
>
> Index: linux-pm/drivers/base/power/qos.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/qos.c
> +++ linux-pm/drivers/base/power/qos.c
> @@ -186,26 +186,21 @@ static int apply_constraint(struct dev_p
>
> /*
> * dev_pm_qos_constraints_allocate
> - * @dev: device to allocate data for
> *
> * Called at the first call to add_request, for constraint data allocation
> * Must be called with the dev_pm_qos_mtx mutex held
> */
> -static int dev_pm_qos_constraints_allocate(struct device *dev)
> +static struct dev_pm_qos *dev_pm_qos_constraints_allocate(void)
> {
> struct dev_pm_qos *qos;
> struct pm_qos_constraints *c;
> struct blocking_notifier_head *n;
>
> - qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> + qos = kzalloc(sizeof(*qos) + kzalloc(3 * sizeof(*n), GFP_KERNEL);
> if (!qos)
> - return -ENOMEM;
> + return NULL;
>
> - n = kzalloc(3 * sizeof(*n), GFP_KERNEL);
> - if (!n) {
> - kfree(qos);
> - return -ENOMEM;
> - }
> + n = (struct blocking_notifier_head *)(qos + 1);
>
> c = &qos->resume_latency;
> plist_head_init(&c->list);
> @@ -227,6 +222,20 @@ static int dev_pm_qos_constraints_alloca
>
> INIT_LIST_HEAD(&qos->flags.list);
>
> + return qos;
> +}
> +
> +static int dev_pm_qos_constraints_add(struct device *dev,
> + struct dev_pm_qos *qos)
> +{
> + if (!qos)
> + return -ENOMEM;
> +
> + if (!IS_ERR_OR_NULL(dev->power.qos)) {
> + kfree(qos);
> + return -ENODEV;
> + }
> +
> spin_lock_irq(&dev->power.lock);
> dev->power.qos = qos;
> spin_unlock_irq(&dev->power.lock);
> @@ -326,6 +335,7 @@ static bool dev_pm_qos_invalid_req_type(
> }
>
> static int __dev_pm_qos_add_request(struct device *dev,
> + struct dev_pm_qos *qos,
> struct dev_pm_qos_request *req,
> enum dev_pm_qos_req_type type, s32 value)
> {
> @@ -340,8 +350,10 @@ static int __dev_pm_qos_add_request(stru
>
> if (IS_ERR(dev->power.qos))
> ret = -ENODEV;
> - else if (!dev->power.qos)
> - ret = dev_pm_qos_constraints_allocate(dev);
> + else if (dev->power.qos)
> + kfree(qos);
> + else
> + ret = dev_pm_qos_constraints_add(dev);
>
> trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> if (ret)
> @@ -388,10 +400,11 @@ static int __dev_pm_qos_add_request(stru
> int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> enum dev_pm_qos_req_type type, s32 value)
> {
> + struct dev_pm_qos *qos = dev_pm_qos_constraints_allocate();
> int ret;
>
> mutex_lock(&dev_pm_qos_mtx);
> - ret = __dev_pm_qos_add_request(dev, req, type, value);
> + ret = __dev_pm_qos_add_request(dev, qos, req, type, value);
> mutex_unlock(&dev_pm_qos_mtx);
> return ret;
> }