The release of a zswap pool is now controlled by a percpu_ref, whose
release callback (__zswap_pool_empty()) is called when the percpu_ref
hits 0. However, this release callback may be called from RCU callback
context by percpu_ref_kill(), which may be in interrupt context.

So we need to use spin_lock_irqsave() and spin_unlock_irqrestore()
in the release callback, __zswap_pool_empty(). At the other call sites,
which run in task context, spin_lock_irq() and spin_unlock_irq() are
enough to avoid a potential deadlock.

This problem was introduced by commit f3da427e82c4 ("mm/zswap: change
zswap_pool kref to percpu_ref"), which is currently in the mm-unstable
branch.

It can be reproduced by running a kernel build in tmpfs with zswap and
CONFIG_LOCKDEP enabled while changing the zswap compressor setting
dynamically.
Signed-off-by: Chengming Zhou <[email protected]>
---
mm/zswap.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 011e068eb355..894bd184f78e 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -456,10 +456,11 @@ static struct zswap_pool *zswap_pool_current(void);
static void __zswap_pool_empty(struct percpu_ref *ref)
{
struct zswap_pool *pool;
+ unsigned long flags;
pool = container_of(ref, typeof(*pool), ref);
- spin_lock(&zswap_pools_lock);
+ spin_lock_irqsave(&zswap_pools_lock, flags);
WARN_ON(pool == zswap_pool_current());
@@ -468,7 +469,7 @@ static void __zswap_pool_empty(struct percpu_ref *ref)
INIT_WORK(&pool->release_work, __zswap_pool_release);
schedule_work(&pool->release_work);
- spin_unlock(&zswap_pools_lock);
+ spin_unlock_irqrestore(&zswap_pools_lock, flags);
}
static int __must_check zswap_pool_get(struct zswap_pool *pool)
@@ -598,7 +599,7 @@ static int __zswap_param_set(const char *val, const struct kernel_param *kp,
return -EINVAL;
}
- spin_lock(&zswap_pools_lock);
+ spin_lock_irq(&zswap_pools_lock);
pool = zswap_pool_find_get(type, compressor);
if (pool) {
@@ -607,7 +608,7 @@ static int __zswap_param_set(const char *val, const struct kernel_param *kp,
list_del_rcu(&pool->list);
}
- spin_unlock(&zswap_pools_lock);
+ spin_unlock_irq(&zswap_pools_lock);
if (!pool)
pool = zswap_pool_create(type, compressor);
@@ -628,7 +629,7 @@ static int __zswap_param_set(const char *val, const struct kernel_param *kp,
else
ret = -EINVAL;
- spin_lock(&zswap_pools_lock);
+ spin_lock_irq(&zswap_pools_lock);
if (!ret) {
put_pool = zswap_pool_current();
@@ -643,7 +644,7 @@ static int __zswap_param_set(const char *val, const struct kernel_param *kp,
put_pool = pool;
}
- spin_unlock(&zswap_pools_lock);
+ spin_unlock_irq(&zswap_pools_lock);
if (!zswap_has_pool && !pool) {
/* if initial pool creation failed, and this pool creation also
--
2.40.1
On Wed, Feb 28, 2024 at 03:18:32PM +0000, Chengming Zhou wrote:
> Now the release of zswap pool is controlled by percpu_ref, its release
> callback (__zswap_pool_empty()) will be called when percpu_ref hit 0.
> But this release callback may potentially be called from RCU callback
> context by percpu_ref_kill(), which maybe in the interrupt context.
>
> So we need to use spin_lock_irqsave() and spin_unlock_irqrestore()
> in the release callback: __zswap_pool_empty(). In other task context
> places, spin_lock_irq() and spin_unlock_irq() are enough to avoid
> potential deadlock.
RCU callback context is BH, not IRQ, so it's enough to use
spin_lock_bh(), no?
On 2024/2/28 23:24, Matthew Wilcox wrote:
> On Wed, Feb 28, 2024 at 03:18:32PM +0000, Chengming Zhou wrote:
>> Now the release of zswap pool is controlled by percpu_ref, its release
>> callback (__zswap_pool_empty()) will be called when percpu_ref hit 0.
>> But this release callback may potentially be called from RCU callback
>> context by percpu_ref_kill(), which maybe in the interrupt context.
>>
>> So we need to use spin_lock_irqsave() and spin_unlock_irqrestore()
>> in the release callback: __zswap_pool_empty(). In other task context
>> places, spin_lock_irq() and spin_unlock_irq() are enough to avoid
>> potential deadlock.
>
> RCU callback context is BH, not IRQ, so it's enough to use
> spin_lock_bh(), no?
You're right, it's the softirq context, so spin_lock_bh() is enough.
Thanks!
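
For readers following the thread, here is a rough userspace toy model of
the deadlock discussed above and of why disabling BH in task context is
enough to avoid it. This is not kernel code and not part of the patch:
struct toy_cpu and every toy_* function are hypothetical names made up
for illustration. The model assumes a single CPU, one non-recursive
lock standing in for zswap_pools_lock, and a softirq (such as an RCU
callback) that takes the same lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU. All names here are hypothetical, not kernel APIs. */
struct toy_cpu {
	bool lock_held;       /* stands in for zswap_pools_lock held on this CPU */
	int  bh_disabled;     /* models the BH-disable nesting count */
	bool softirq_pending; /* a softirq deferred because BH was disabled */
	bool deadlocked;      /* softirq tried to take a lock its own CPU holds */
	int  callbacks_run;   /* how many times the modelled callback ran */
};

/* Softirq body: takes the lock, as the release callback would. */
static void toy_run_softirq(struct toy_cpu *cpu)
{
	if (cpu->lock_held) {
		/*
		 * A real spinlock would spin forever here: the holder is
		 * the interrupted task, which cannot run until we return.
		 */
		cpu->deadlocked = true;
		return;
	}
	cpu->lock_held = true;
	cpu->callbacks_run++;
	cpu->lock_held = false;
}

/* A softirq is raised while the task is running on this CPU. */
static void toy_raise_softirq(struct toy_cpu *cpu)
{
	if (cpu->bh_disabled > 0) {
		cpu->softirq_pending = true; /* deferred until BH is enabled */
		return;
	}
	toy_run_softirq(cpu);
}

/* Models spin_lock(): no protection against softirqs. */
static void toy_lock_plain(struct toy_cpu *cpu)
{
	assert(!cpu->lock_held);
	cpu->lock_held = true;
}

static void toy_unlock_plain(struct toy_cpu *cpu)
{
	assert(cpu->lock_held);
	cpu->lock_held = false;
}

/* Models spin_lock_bh(): disable BH first, then take the lock. */
static void toy_lock_bh(struct toy_cpu *cpu)
{
	cpu->bh_disabled++;
	toy_lock_plain(cpu);
}

/* Models spin_unlock_bh(): drop the lock, then run any deferred softirq. */
static void toy_unlock_bh(struct toy_cpu *cpu)
{
	toy_unlock_plain(cpu);
	if (--cpu->bh_disabled == 0 && cpu->softirq_pending) {
		cpu->softirq_pending = false;
		toy_run_softirq(cpu);
	}
}

/*
 * Task-context critical section during which a softirq is raised.
 * Returns true if the model detected a self-deadlock.
 */
static bool toy_scenario(bool use_bh)
{
	struct toy_cpu cpu = {0};

	if (use_bh)
		toy_lock_bh(&cpu);
	else
		toy_lock_plain(&cpu);

	toy_raise_softirq(&cpu);

	if (use_bh)
		toy_unlock_bh(&cpu);
	else
		toy_unlock_plain(&cpu);

	return cpu.deadlocked;
}
```

In this model, toy_scenario(false) flags the deadlock because the
softirq runs while the lock is held, whereas toy_scenario(true) defers
the softirq until the lock has been dropped. It deliberately ignores
hard interrupts, other CPUs and preemption, so it only illustrates the
BH case raised in the review.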