A problem was find in stable 5.10 and the root cause of it like below.
In the use of q_usage_counter of request_queue, blk_cleanup_queue using
"wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
to wait q_usage_counter becoming zero. however, if the q_usage_counter
becoming zero quickly, and percpu_ref_exit will execute and ref->data
will be freed, maybe another process will cause a null-defef problem
like below:
CPU0 CPU1
blk_mq_destroy_queue
blk_freeze_queue
blk_mq_freeze_queue_wait
scsi_end_request
percpu_ref_get
...
percpu_ref_put
atomic_long_sub_and_test
blk_put_queue
kobject_put
kref_put
blk_release_queue
percpu_ref_exit
ref->data -> NULL
ref->data->release(ref) -> null-deref
As suggested by Ming Lei, fix it by getting the release method before
the referebce count is minus 0.
Suggested-by: Ming Lei <[email protected]>
Signed-off-by: Zhong Jinghua <[email protected]>
---
include/linux/percpu-refcount.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d73a1c08c3e3..11e717c95acb 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -331,8 +331,11 @@ static inline void percpu_ref_put_many(struct percpu_ref *ref, unsigned long nr)
if (__ref_is_percpu(ref, &percpu_count))
this_cpu_sub(*percpu_count, nr);
- else if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
- ref->data->release(ref);
+ else {
+ percpu_ref_func_t *release = ref->data->release;
+ if (unlikely(atomic_long_sub_and_test(nr, &ref->data->count)))
+ release(ref);
+ }
rcu_read_unlock();
}
--
2.31.1