2022-03-31 03:40:53

by Andrew Morton

Subject: Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()

On Wed, 30 Mar 2022 13:26:46 -0400 Waiman Long <[email protected]> wrote:

> Muchun Song found out there could be a race between list_lru_add()
> and memcg_reparent_list_lru_node() causing the later function to miss
> reparenting of a lru entry as shown below:
>
> CPU0:                            CPU1:
> list_lru_add()
>     spin_lock(&nlru->lock)
>     l = list_lru_from_kmem(memcg)
>                                  memcg_reparent_objcgs(memcg)
>                                  memcg_reparent_list_lrus(memcg)
>                                    memcg_reparent_list_lru()
>                                      memcg_reparent_list_lru_node()
>                                        if (!READ_ONCE(nlru->nr_items))
>                                          // Miss reparenting
>                                          return
>     // Assume 0->1
>     l->nr_items++
>     // Assume 0->1
>     nlru->nr_items++
>
> Though it is unlikely that a list_lru_node with 0 items suddenly
> gains a newly added lru entry at the end of its life, the race is
> still theoretically possible.
>
> With the lock/unlock pair used within the percpu_ref_kill() which is
> the last function call of memcg_reparent_objcgs(), any read issued
> in memcg_reparent_list_lru_node() will not be reordered before the
> reparenting of objcgs.
>
> Add a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check
> to ensure that either the reading of nr_items is valid or the racing
> list_lru_add() will see the reparented objcg.
>
> ...
>
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
> struct list_lru_one *src, *dst;
>
> /*
> - * If there is no lru entry in this nlru, we can skip it immediately.
> + * With the lock/unlock pair used within the percpu_ref_kill()
> + * which is the last function call of memcg_reparent_objcgs(), any
> + * read issued here will not be reordered before the reparenting
> + * of objcgs.
> + *
> + * Assuming a racing list_lru_add():
> + * list_lru_add()
> + *                                <- memcg_reparent_list_lru_node()
> + *     spin_lock(&nlru->lock)
> + *     l = list_lru_from_kmem(memcg)
> + *     nlru->nr_items++
> + *     spin_unlock(&nlru->lock)
> + *                                <- memcg_reparent_list_lru_node()
> + *
> + * If the !spin_is_locked(&nlru->lock) check is true, we are either
> + * before the spin_lock() or after the spin_unlock(). In the former
> + * case, list_lru_add() will see the reparented objcg and so won't
> + * touch the lru to be reparented. In the latter case, it will see
> + * the updated nr_items. So we can use the optimization that if
> + * there is no lru entry in this nlru, skip it immediately.
> */
> - if (!READ_ONCE(nlru->nr_items))
> - return;
> + if (!spin_is_locked(&nlru->lock)) {

ick.

> + /* nr_items read must be ordered after nlru->lock */
> + smp_rmb();
> + if (!READ_ONCE(nlru->nr_items))
> + return;
> + }

include/linux/spinlock_up.h has

#define arch_spin_is_locked(lock) ((void)(lock), 0)

so this `if' will always be true on CONFIG_SMP=n. Will the kernel
still work?

At the very least let's have changelogging and commenting explaining
that we've actually thought about this.

Preferably, can we fix this hole properly and avoid this hack? There is
a reason for this:

hp2:/usr/src/25> grep spin_is_locked mm/*.c
hp2:/usr/src/25>



2022-03-31 05:07:48

by Roman Gushchin

Subject: Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()



> On Mar 30, 2022, at 7:14 PM, Andrew Morton <[email protected]> wrote:
>
> On Wed, 30 Mar 2022 13:26:46 -0400 Waiman Long <[email protected]> wrote:
>
>> Muchun Song found out there could be a race between list_lru_add()
>> and memcg_reparent_list_lru_node() causing the later function to miss
>> reparenting of a lru entry as shown below:
>>
>> CPU0:                            CPU1:
>> list_lru_add()
>>     spin_lock(&nlru->lock)
>>     l = list_lru_from_kmem(memcg)
>>                                  memcg_reparent_objcgs(memcg)
>>                                  memcg_reparent_list_lrus(memcg)
>>                                    memcg_reparent_list_lru()
>>                                      memcg_reparent_list_lru_node()
>>                                        if (!READ_ONCE(nlru->nr_items))
>>                                          // Miss reparenting
>>                                          return
>>     // Assume 0->1
>>     l->nr_items++
>>     // Assume 0->1
>>     nlru->nr_items++
>>
>> Though it is unlikely that a list_lru_node with 0 items suddenly
>> gains a newly added lru entry at the end of its life, the race is
>> still theoretically possible.
>>
>> With the lock/unlock pair used within the percpu_ref_kill() which is
>> the last function call of memcg_reparent_objcgs(), any read issued
>> in memcg_reparent_list_lru_node() will not be reordered before the
>> reparenting of objcgs.
>>
>> Add a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check
>> to ensure that either the reading of nr_items is valid or the racing
>> list_lru_add() will see the reparented objcg.
>>
>> ...
>>
>> --- a/mm/list_lru.c
>> +++ b/mm/list_lru.c
>> @@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
>> struct list_lru_one *src, *dst;
>>
>> /*
>> - * If there is no lru entry in this nlru, we can skip it immediately.
>> + * With the lock/unlock pair used within the percpu_ref_kill()
>> + * which is the last function call of memcg_reparent_objcgs(), any
>> + * read issued here will not be reordered before the reparenting
>> + * of objcgs.
>> + *
>> + * Assuming a racing list_lru_add():
>> + * list_lru_add()
>> + *                                <- memcg_reparent_list_lru_node()
>> + *     spin_lock(&nlru->lock)
>> + *     l = list_lru_from_kmem(memcg)
>> + *     nlru->nr_items++
>> + *     spin_unlock(&nlru->lock)
>> + *                                <- memcg_reparent_list_lru_node()
>> + *
>> + * If the !spin_is_locked(&nlru->lock) check is true, we are either
>> + * before the spin_lock() or after the spin_unlock(). In the former
>> + * case, list_lru_add() will see the reparented objcg and so won't
>> + * touch the lru to be reparented. In the latter case, it will see
>> + * the updated nr_items. So we can use the optimization that if
>> + * there is no lru entry in this nlru, skip it immediately.
>> */
>> - if (!READ_ONCE(nlru->nr_items))
>> - return;
>> + if (!spin_is_locked(&nlru->lock)) {
>
> ick.
>
>> + /* nr_items read must be ordered after nlru->lock */
>> + smp_rmb();
>> + if (!READ_ONCE(nlru->nr_items))
>> + return;
>> + }
>
> include/linux/spinlock_up.h has
>
> #define arch_spin_is_locked(lock) ((void)(lock), 0)
>
> so this `if' will always be true on CONFIG_SMP=n. Will the kernel
> still work?

I guess yes, because this race is not possible on a !smp machine.

>
> At the very least let's have changelogging and commenting explaining
> that we've actually thought about this.
>
> Preferably, can we fix this hole properly and avoid this hack? There is
> a reason for this:
>
> hp2:/usr/src/25> grep spin_is_locked mm/*.c
> hp2:/usr/src/25>


But honestly, I’d drop the original optimization together with the fix, given that there is no _real world_ data on the problem and the improvement. It seems like it started as a nice simple improvement, but the race makes it complex and probably not worth the added complexity and fragility.

Thanks!

2022-03-31 08:40:43

by Shakeel Butt

Subject: Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()

On Wed, Mar 30, 2022 at 07:48:45PM -0700, Roman Gushchin wrote:
>
>
[...]
>
>
> But honestly, I’d drop the original optimization together with the fix, given that there is no _real world_ data on the problem and the improvement. It seems like it started as a nice simple improvement, but the race makes it complex and probably not worth the added complexity and fragility.

I agree with dropping the original optimization as it is not really
fixing an observed issue which may justify adding some complexity.

thanks,
Shakeel