2023-12-06 07:48:12

by Lai Jiangshan

Subject: Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless

On Tue, Sep 12, 2023 at 9:57 PM Qi Zheng <[email protected]> wrote:

> - if (!down_read_trylock(&shrinker_rwsem))
> - goto out;
> -
> - list_for_each_entry(shrinker, &shrinker_list, list) {
> + /*
> + * lockless algorithm of global shrink.
> + *
> + * In the unregistration setp, the shrinker will be freed asynchronously
> + * via RCU after its refcount reaches 0. So both rcu_read_lock() and
> + * shrinker_try_get() can be used to ensure the existence of the shrinker.
> + *
> + * So in the global shrink:
> + * step 1: use rcu_read_lock() to guarantee existence of the shrinker
> + * and the validity of the shrinker_list walk.
> + * step 2: use shrinker_try_get() to try get the refcount, if successful,
> + * then the existence of the shrinker can also be guaranteed,
> + * so we can release the RCU lock to do do_shrink_slab() that
> + * may sleep.
> + * step 3: *MUST* to reacquire the RCU lock before calling shrinker_put(),
> + * which ensures that neither this shrinker nor the next shrinker
> + * will be freed in the next traversal operation.

Hello, Qi, Andrew, Paul,

I wonder how RCU can ensure the lifespan of the next shrinker here.
It seems to diverge from the common usage pattern of RCU+refcount.

cpu1:
rcu_read_lock();
shrinker_try_get(this_shrinker);
rcu_read_unlock();
cpu2: shrinker_free(this_shrinker);
cpu2: shrinker_free(next_shrinker); and frees the memory of next_shrinker
cpu2: when shrinker_free(next_shrinker) runs, no one updates this_shrinker's
cpu2: next pointer, since this_shrinker has been removed first.
cpu1:
rcu_read_lock();
shrinker_put(this_shrinker);
traversal advances to the freed next_shrinker.

A quick, simple fix:

/* called while holding a reference other than RCU (i.e. the refcount) */
static inline bool rcu_list_deleted(struct list_head *entry)
{
	/* something like this: */
	return entry->prev == LIST_POISON2;
}

/* in the loop */
if (rcu_list_deleted(&shrinker->list)) {
	shrinker_put(shrinker);
	goto restart;
}
rcu_read_lock();
shrinker_put(shrinker);

Thanks
Lai

> + * step 4: do shrinker_put() paired with step 2 to put the refcount,
> + * if the refcount reaches 0, then wake up the waiter in
> + * shrinker_free() by calling complete().
> + */
> + rcu_read_lock();
> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> struct shrink_control sc = {
> .gfp_mask = gfp_mask,
> .nid = nid,
> .memcg = memcg,
> };
>
> + if (!shrinker_try_get(shrinker))
> + continue;
> +
> + rcu_read_unlock();
> +
> ret = do_shrink_slab(&sc, shrinker, priority);
> if (ret == SHRINK_EMPTY)
> ret = 0;
> freed += ret;
> - /*
> - * Bail out if someone want to register a new shrinker to
> - * prevent the registration from being stalled for long periods
> - * by parallel ongoing shrinking.
> - */
> - if (rwsem_is_contended(&shrinker_rwsem)) {
> - freed = freed ? : 1;
> - break;
> - }
> +
> + rcu_read_lock();
> + shrinker_put(shrinker);
> }
>


2023-12-06 07:56:25

by Qi Zheng

Subject: Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless

Hi,

On 2023/12/6 15:47, Lai Jiangshan wrote:
> On Tue, Sep 12, 2023 at 9:57 PM Qi Zheng <[email protected]> wrote:
>
>> - if (!down_read_trylock(&shrinker_rwsem))
>> - goto out;
>> -
>> - list_for_each_entry(shrinker, &shrinker_list, list) {
>> + /*
>> + * lockless algorithm of global shrink.
>> + *
>> + * In the unregistration setp, the shrinker will be freed asynchronously
>> + * via RCU after its refcount reaches 0. So both rcu_read_lock() and
>> + * shrinker_try_get() can be used to ensure the existence of the shrinker.
>> + *
>> + * So in the global shrink:
>> + * step 1: use rcu_read_lock() to guarantee existence of the shrinker
>> + * and the validity of the shrinker_list walk.
>> + * step 2: use shrinker_try_get() to try get the refcount, if successful,
>> + * then the existence of the shrinker can also be guaranteed,
>> + * so we can release the RCU lock to do do_shrink_slab() that
>> + * may sleep.
>> + * step 3: *MUST* to reacquire the RCU lock before calling shrinker_put(),
>> + * which ensures that neither this shrinker nor the next shrinker
>> + * will be freed in the next traversal operation.
>
> Hello, Qi, Andrew, Paul,
>
> I wonder know how RCU can ensure the lifespan of the next shrinker.
> it seems it is diverged from the common pattern usage of RCU+reference.
>
> cpu1:
> rcu_read_lock();
> shrinker_try_get(this_shrinker);
> rcu_read_unlock();
> cpu2: shrinker_free(this_shrinker);
> cpu2: shrinker_free(next_shrinker); and free the memory of next_shrinker
> cpu2: when shrinker_free(next_shrinker), no one updates this_shrinker's next
> cpu2: since this_shrinker has been removed first.

No, this_shrinker will not be removed from the shrinker_list until the
last refcount is released. See below:

> rcu_read_lock();
> shrinker_put(this_shrinker);

CPU 1                                           CPU 2

--> if (refcount_dec_and_test(&shrinker->refcount))
        complete(&shrinker->done);
                                                wait_for_completion(&shrinker->done);
                                                list_del_rcu(&shrinker->list);

> travel to the freed next_shrinker.
>
> a quick simple fix:
>
> // called with other references other than RCU (i.e. refcount)
> static inline rcu_list_deleted(struct list_head *entry)
> {
> // something like this:
> return entry->prev == LIST_POISON2;
> }
>
> // in the loop
> if (rcu_list_deleted(&shrinker->list)) {
> shrinker_put(shrinker);
> goto restart;
> }
> rcu_read_lock();
> shrinker_put(shrinker);
>
> Thanks
> Lai
>
>> + * step 4: do shrinker_put() paired with step 2 to put the refcount,
>> + * if the refcount reaches 0, then wake up the waiter in
>> + * shrinker_free() by calling complete().
>> + */
>> + rcu_read_lock();
>> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
>> struct shrink_control sc = {
>> .gfp_mask = gfp_mask,
>> .nid = nid,
>> .memcg = memcg,
>> };
>>
>> + if (!shrinker_try_get(shrinker))
>> + continue;
>> +
>> + rcu_read_unlock();
>> +
>> ret = do_shrink_slab(&sc, shrinker, priority);
>> if (ret == SHRINK_EMPTY)
>> ret = 0;
>> freed += ret;
>> - /*
>> - * Bail out if someone want to register a new shrinker to
>> - * prevent the registration from being stalled for long periods
>> - * by parallel ongoing shrinking.
>> - */
>> - if (rwsem_is_contended(&shrinker_rwsem)) {
>> - freed = freed ? : 1;
>> - break;
>> - }
>> +
>> + rcu_read_lock();
>> + shrinker_put(shrinker);
>> }
>>

2023-12-06 08:23:49

by Lai Jiangshan

Subject: Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless

On Wed, Dec 6, 2023 at 3:55 PM Qi Zheng <[email protected]> wrote:
>
> Hi,
>
> On 2023/12/6 15:47, Lai Jiangshan wrote:
> > On Tue, Sep 12, 2023 at 9:57 PM Qi Zheng <[email protected]> wrote:
> >
> >> - if (!down_read_trylock(&shrinker_rwsem))
> >> - goto out;
> >> -
> >> - list_for_each_entry(shrinker, &shrinker_list, list) {
> >> + /*
> >> + * lockless algorithm of global shrink.
> >> + *
> >> + * In the unregistration setp, the shrinker will be freed asynchronously
> >> + * via RCU after its refcount reaches 0. So both rcu_read_lock() and
> >> + * shrinker_try_get() can be used to ensure the existence of the shrinker.
> >> + *
> >> + * So in the global shrink:
> >> + * step 1: use rcu_read_lock() to guarantee existence of the shrinker
> >> + * and the validity of the shrinker_list walk.
> >> + * step 2: use shrinker_try_get() to try get the refcount, if successful,
> >> + * then the existence of the shrinker can also be guaranteed,
> >> + * so we can release the RCU lock to do do_shrink_slab() that
> >> + * may sleep.
> >> + * step 3: *MUST* to reacquire the RCU lock before calling shrinker_put(),
> >> + * which ensures that neither this shrinker nor the next shrinker
> >> + * will be freed in the next traversal operation.
> >
> > Hello, Qi, Andrew, Paul,
> >
> > I wonder know how RCU can ensure the lifespan of the next shrinker.
> > it seems it is diverged from the common pattern usage of RCU+reference.
> >
> > cpu1:
> > rcu_read_lock();
> > shrinker_try_get(this_shrinker);
> > rcu_read_unlock();
> > cpu2: shrinker_free(this_shrinker);
> > cpu2: shrinker_free(next_shrinker); and free the memory of next_shrinker
> > cpu2: when shrinker_free(next_shrinker), no one updates this_shrinker's next
> > cpu2: since this_shrinker has been removed first.
>
> No, this_shrinker will not be removed from the shrinker_list until the
> last refcount is released. See below:
>
> > rcu_read_lock();
> > shrinker_put(this_shrinker);
>
> CPU 1 CPU 2
>
> --> if (refcount_dec_and_test(&shrinker->refcount))
> complete(&shrinker->done);
>
> wait_for_completion(&shrinker->done);
> list_del_rcu(&shrinker->list);

Since the shrinker will not be removed from the shrinker_list until the
last refcount is released:

Is it possible that shrinker_free() can be starved by continuous
scanners getting and putting the refcount?

Thanks
Lai


>
> > travel to the freed next_shrinker.
> >
> > a quick simple fix:
> >
> > // called with other references other than RCU (i.e. refcount)
> > static inline rcu_list_deleted(struct list_head *entry)
> > {
> > // something like this:
> > return entry->prev == LIST_POISON2;
> > }
> >
> > // in the loop
> > if (rcu_list_deleted(&shrinker->list)) {
> > shrinker_put(shrinker);
> > goto restart;
> > }
> > rcu_read_lock();
> > shrinker_put(shrinker);
> >
> > Thanks
> > Lai
> >
> >> + * step 4: do shrinker_put() paired with step 2 to put the refcount,
> >> + * if the refcount reaches 0, then wake up the waiter in
> >> + * shrinker_free() by calling complete().
> >> + */
> >> + rcu_read_lock();
> >> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> >> struct shrink_control sc = {
> >> .gfp_mask = gfp_mask,
> >> .nid = nid,
> >> .memcg = memcg,
> >> };
> >>
> >> + if (!shrinker_try_get(shrinker))
> >> + continue;
> >> +
> >> + rcu_read_unlock();
> >> +
> >> ret = do_shrink_slab(&sc, shrinker, priority);
> >> if (ret == SHRINK_EMPTY)
> >> ret = 0;
> >> freed += ret;
> >> - /*
> >> - * Bail out if someone want to register a new shrinker to
> >> - * prevent the registration from being stalled for long periods
> >> - * by parallel ongoing shrinking.
> >> - */
> >> - if (rwsem_is_contended(&shrinker_rwsem)) {
> >> - freed = freed ? : 1;
> >> - break;
> >> - }
> >> +
> >> + rcu_read_lock();
> >> + shrinker_put(shrinker);
> >> }
> >>

2023-12-06 08:37:59

by Qi Zheng

Subject: Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless

Hi,

On 2023/12/6 16:23, Lai Jiangshan wrote:
> On Wed, Dec 6, 2023 at 3:55 PM Qi Zheng <[email protected]> wrote:
>>
>> Hi,
>>
>> On 2023/12/6 15:47, Lai Jiangshan wrote:
>>> On Tue, Sep 12, 2023 at 9:57 PM Qi Zheng <[email protected]> wrote:
>>>
>>>> - if (!down_read_trylock(&shrinker_rwsem))
>>>> - goto out;
>>>> -
>>>> - list_for_each_entry(shrinker, &shrinker_list, list) {
>>>> + /*
>>>> + * lockless algorithm of global shrink.
>>>> + *
>>>> + * In the unregistration setp, the shrinker will be freed asynchronously
>>>> + * via RCU after its refcount reaches 0. So both rcu_read_lock() and
>>>> + * shrinker_try_get() can be used to ensure the existence of the shrinker.
>>>> + *
>>>> + * So in the global shrink:
>>>> + * step 1: use rcu_read_lock() to guarantee existence of the shrinker
>>>> + * and the validity of the shrinker_list walk.
>>>> + * step 2: use shrinker_try_get() to try get the refcount, if successful,
>>>> + * then the existence of the shrinker can also be guaranteed,
>>>> + * so we can release the RCU lock to do do_shrink_slab() that
>>>> + * may sleep.
>>>> + * step 3: *MUST* to reacquire the RCU lock before calling shrinker_put(),
>>>> + * which ensures that neither this shrinker nor the next shrinker
>>>> + * will be freed in the next traversal operation.
>>>
>>> Hello, Qi, Andrew, Paul,
>>>
>>> I wonder know how RCU can ensure the lifespan of the next shrinker.
>>> it seems it is diverged from the common pattern usage of RCU+reference.
>>>
>>> cpu1:
>>> rcu_read_lock();
>>> shrinker_try_get(this_shrinker);
>>> rcu_read_unlock();
>>> cpu2: shrinker_free(this_shrinker);
>>> cpu2: shrinker_free(next_shrinker); and free the memory of next_shrinker
>>> cpu2: when shrinker_free(next_shrinker), no one updates this_shrinker's next
>>> cpu2: since this_shrinker has been removed first.
>>
>> No, this_shrinker will not be removed from the shrinker_list until the
>> last refcount is released. See below:
>>
>>> rcu_read_lock();
>>> shrinker_put(this_shrinker);
>>
>> CPU 1 CPU 2
>>
>> --> if (refcount_dec_and_test(&shrinker->refcount))
>> complete(&shrinker->done);
>>
>> wait_for_completion(&shrinker->done);
>> list_del_rcu(&shrinker->list);
>
> since shrinker will not be removed from the shrinker_list until the
> last refcount is released.
>
> Is it possible that shrinker_free() can be starved by continuous
> scanners getting and putting the refcount?

I actually considered this case, but the probability of it happening
seemed low, so I dropped the relevant code (v2 --> v3).
If this problem really occurs in a production environment, we
can fix it like below:

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 1a00be90d93a..e5ebbbf1414f 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -88,6 +88,7 @@ struct shrinker {
 	long batch;	/* reclaim batch size, 0 = default */
 	int seeks;	/* seeks to recreate an obj */
 	unsigned flags;
+	bool registered;
 
 	/*
 	 * The reference count of this shrinker. Registered shrinker have an
@@ -138,7 +139,8 @@ void shrinker_free(struct shrinker *shrinker);
 
 static inline bool shrinker_try_get(struct shrinker *shrinker)
 {
-	return refcount_inc_not_zero(&shrinker->refcount);
+	return READ_ONCE(shrinker->registered) &&
+	       refcount_inc_not_zero(&shrinker->refcount);
 }
 
 static inline void shrinker_put(struct shrinker *shrinker)
diff --git a/mm/shrinker.c b/mm/shrinker.c
index dd91eab43ed3..9b8881d178c6 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -753,6 +753,7 @@ void shrinker_register(struct shrinker *shrinker)
 	 * shrinker_try_get().
 	 */
 	refcount_set(&shrinker->refcount, 1);
+	WRITE_ONCE(shrinker->registered, true);
 }
 EXPORT_SYMBOL_GPL(shrinker_register);
 
@@ -773,6 +774,7 @@ void shrinker_free(struct shrinker *shrinker)
 		return;
 
 	if (shrinker->flags & SHRINKER_REGISTERED) {
+		WRITE_ONCE(shrinker->registered, false);
 		/* drop the initial refcount */
 		shrinker_put(shrinker);
 		/*

Thanks,
Qi

>
> Thanks
> Lai
>
>
>>
>>> travel to the freed next_shrinker.
>>>
>>> a quick simple fix:
>>>
>>> // called with other references other than RCU (i.e. refcount)
>>> static inline rcu_list_deleted(struct list_head *entry)
>>> {
>>> // something like this:
>>> return entry->prev == LIST_POISON2;
>>> }
>>>
>>> // in the loop
>>> if (rcu_list_deleted(&shrinker->list)) {
>>> shrinker_put(shrinker);
>>> goto restart;
>>> }
>>> rcu_read_lock();
>>> shrinker_put(shrinker);
>>>
>>> Thanks
>>> Lai
>>>
>>>> + * step 4: do shrinker_put() paired with step 2 to put the refcount,
>>>> + * if the refcount reaches 0, then wake up the waiter in
>>>> + * shrinker_free() by calling complete().
>>>> + */
>>>> + rcu_read_lock();
>>>> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
>>>> struct shrink_control sc = {
>>>> .gfp_mask = gfp_mask,
>>>> .nid = nid,
>>>> .memcg = memcg,
>>>> };
>>>>
>>>> + if (!shrinker_try_get(shrinker))
>>>> + continue;
>>>> +
>>>> + rcu_read_unlock();
>>>> +
>>>> ret = do_shrink_slab(&sc, shrinker, priority);
>>>> if (ret == SHRINK_EMPTY)
>>>> ret = 0;
>>>> freed += ret;
>>>> - /*
>>>> - * Bail out if someone want to register a new shrinker to
>>>> - * prevent the registration from being stalled for long periods
>>>> - * by parallel ongoing shrinking.
>>>> - */
>>>> - if (rwsem_is_contended(&shrinker_rwsem)) {
>>>> - freed = freed ? : 1;
>>>> - break;
>>>> - }
>>>> +
>>>> + rcu_read_lock();
>>>> + shrinker_put(shrinker);
>>>> }
>>>>

2023-12-06 08:43:09

by Lai Jiangshan

Subject: Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless

On Wed, Dec 6, 2023 at 3:55 PM Qi Zheng <[email protected]> wrote:
>
> Hi,
>
> On 2023/12/6 15:47, Lai Jiangshan wrote:
> > On Tue, Sep 12, 2023 at 9:57 PM Qi Zheng <[email protected]> wrote:
> >
> >> - if (!down_read_trylock(&shrinker_rwsem))
> >> - goto out;
> >> -
> >> - list_for_each_entry(shrinker, &shrinker_list, list) {
> >> + /*
> >> + * lockless algorithm of global shrink.
> >> + *
> >> + * In the unregistration setp, the shrinker will be freed asynchronously
> >> + * via RCU after its refcount reaches 0. So both rcu_read_lock() and
> >> + * shrinker_try_get() can be used to ensure the existence of the shrinker.
> >> + *
> >> + * So in the global shrink:
> >> + * step 1: use rcu_read_lock() to guarantee existence of the shrinker
> >> + * and the validity of the shrinker_list walk.
> >> + * step 2: use shrinker_try_get() to try get the refcount, if successful,
> >> + * then the existence of the shrinker can also be guaranteed,
> >> + * so we can release the RCU lock to do do_shrink_slab() that
> >> + * may sleep.
> >> + * step 3: *MUST* to reacquire the RCU lock before calling shrinker_put(),
> >> + * which ensures that neither this shrinker nor the next shrinker
> >> + * will be freed in the next traversal operation.
> >
> > Hello, Qi, Andrew, Paul,
> >
> > I wonder know how RCU can ensure the lifespan of the next shrinker.
> > it seems it is diverged from the common pattern usage of RCU+reference.
> >
> > cpu1:
> > rcu_read_lock();
> > shrinker_try_get(this_shrinker);
> > rcu_read_unlock();
> > cpu2: shrinker_free(this_shrinker);
> > cpu2: shrinker_free(next_shrinker); and free the memory of next_shrinker
> > cpu2: when shrinker_free(next_shrinker), no one updates this_shrinker's next
> > cpu2: since this_shrinker has been removed first.
>
> No, this_shrinker will not be removed from the shrinker_list until the
> last refcount is released. See below:

Oh, I missed the code here.

Thanks
Lai


>
> > rcu_read_lock();
> > shrinker_put(this_shrinker);
>
> CPU 1 CPU 2
>
> --> if (refcount_dec_and_test(&shrinker->refcount))
> complete(&shrinker->done);
>
> wait_for_completion(&shrinker->done);
> list_del_rcu(&shrinker->list);
>
> > travel to the freed next_shrinker.
> >
> > a quick simple fix:
> >
> > // called with other references other than RCU (i.e. refcount)
> > static inline rcu_list_deleted(struct list_head *entry)
> > {
> > // something like this:
> > return entry->prev == LIST_POISON2;
> > }
> >
> > // in the loop
> > if (rcu_list_deleted(&shrinker->list)) {
> > shrinker_put(shrinker);
> > goto restart;
> > }
> > rcu_read_lock();
> > shrinker_put(shrinker);
> >
> > Thanks
> > Lai
> >
> >> + * step 4: do shrinker_put() paired with step 2 to put the refcount,
> >> + * if the refcount reaches 0, then wake up the waiter in
> >> + * shrinker_free() by calling complete().
> >> + */
> >> + rcu_read_lock();
> >> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> >> struct shrink_control sc = {
> >> .gfp_mask = gfp_mask,
> >> .nid = nid,
> >> .memcg = memcg,
> >> };
> >>
> >> + if (!shrinker_try_get(shrinker))
> >> + continue;
> >> +
> >> + rcu_read_unlock();
> >> +
> >> ret = do_shrink_slab(&sc, shrinker, priority);
> >> if (ret == SHRINK_EMPTY)
> >> ret = 0;
> >> freed += ret;
> >> - /*
> >> - * Bail out if someone want to register a new shrinker to
> >> - * prevent the registration from being stalled for long periods
> >> - * by parallel ongoing shrinking.
> >> - */
> >> - if (rwsem_is_contended(&shrinker_rwsem)) {
> >> - freed = freed ? : 1;
> >> - break;
> >> - }
> >> +
> >> + rcu_read_lock();
> >> + shrinker_put(shrinker);
> >> }
> >>