2024-01-25 14:33:22

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

On Mon, Jan 15, 2024 at 03:37:41PM +0100, Anna-Maria Behnsen wrote:
> +/**
> + * tmigr_quick_check() - Quick forecast of next tmigr event when CPU wants to
> + * go idle
> + *
> + * Returns KTIME_MAX, when it is probable that nothing has to be done (not the
> + * only one in the level 0 group; and if it is the only one in level 0 group,
> + * but there are more than a single group active in top level)
> + *
> + * Returns first expiry of the top level group, when it is the only one in level
> + * 0 and top level also only has a single active child.
> + */
> +u64 tmigr_quick_check(void)
> +{
> + struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
> + struct tmigr_group *topgroup;
> + struct list_head lvllist;
> +
> + if (tmigr_is_not_available(tmc))
> + return KTIME_MAX;

Offline CPUs are supposed to handle their own global timers.

So instead of returning KTIME_MAX here, shouldn't we pass instead
tevt->global as a parameter and return that value?

Otherwise the quick check will simply ignore the next global event of this CPU
if it's before the next local event.

> +
> + if (WARN_ON_ONCE(tmc->idle))
> + return KTIME_MAX;

Same here I guess...

> +
> + if (!tmigr_check_migrator_and_lonely(tmc->tmgroup, tmc->childmask))
> + return KTIME_MAX;

This one makes sense.

> +
> + for (int i = tmigr_hierarchy_levels; i > 0 ; i--) {
> + lvllist = tmigr_level_list[i - 1];
> + if (list_is_singular(&lvllist)) {
> + topgroup = list_first_entry(&lvllist, struct
> tmigr_group, list);

Is it safe against concurrent allocation failure in hotplug?

If the list is seen singular, then concurrently a CPU comes up and creates/add
a new group. The current CPU actually fetches it instead of the current group
because it's not singular anymore. But then some higher level group
allocation fails and the newly added first entry is removed.

list_is_singular() looks safe. But list_first_entry isn't. You can create
list_first_entry_rcu:

#define list_first_entry_rcu(ptr, type, member) \
list_entry_rcu((ptr)->next, type, member)

Protected inside rcu_read_lock() until the below READ_ONCE().

And then use list_del_rcu/list_add_rcu/kfree_rcu on the update side.

Isn't it possible to walk through group->parent instead?

> +
> + if (tmigr_check_lonely(topgroup))
> + return READ_ONCE(topgroup->next_expiry);
> + } else {
> + continue;
> + }
> + }
> +
> + return KTIME_MAX;

I'm less sure about that return value.

Thanks.


2024-01-28 15:59:17

by Anna-Maria Behnsen

[permalink] [raw]
Subject: Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

Frederic Weisbecker <[email protected]> writes:

> On Mon, Jan 15, 2024 at 03:37:41PM +0100, Anna-Maria Behnsen wrote:
>> +/**
>> + * tmigr_quick_check() - Quick forecast of next tmigr event when CPU wants to
>> + * go idle
>> + *
>> + * Returns KTIME_MAX, when it is probable that nothing has to be done (not the
>> + * only one in the level 0 group; and if it is the only one in level 0 group,
>> + * but there are more than a single group active in top level)
>> + *
>> + * Returns first expiry of the top level group, when it is the only one in level
>> + * 0 and top level also only has a single active child.
>> + */
>> +u64 tmigr_quick_check(void)
>> +{
>> + struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
>> + struct tmigr_group *topgroup;
>> + struct list_head lvllist;
>> +
>> + if (tmigr_is_not_available(tmc))
>> + return KTIME_MAX;
>
> Offline CPUs are supposed to handle their own global timers.
>
> So instead of returning KTIME_MAX here, shouldn't we pass instead
> tevt->global as a parameter and return that value?
>
> Otherwise the quick check will simply ignore the next global event of this CPU
> if it's before the next local event.

I thought about this as well. I skipped it to keep it as simple as
possible - as this is only a forecast and might change (CPUs will come
online/go offline/deletion of timers...). But I can integrate it to keep
it more precise.

>> +
>> + if (WARN_ON_ONCE(tmc->idle))
>> + return KTIME_MAX;
>
> Same here I guess...

Yes.

>> +
>> + if (!tmigr_check_migrator_and_lonely(tmc->tmgroup, tmc->childmask))
>> + return KTIME_MAX;
>
> This one makes sense.
>
>> +
>> + for (int i = tmigr_hierarchy_levels; i > 0 ; i--) {
>> + lvllist = tmigr_level_list[i - 1];
>> + if (list_is_singular(&lvllist)) {
>> + topgroup = list_first_entry(&lvllist, struct
>> tmigr_group, list);
>
> Is it safe against concurrent allocation failure in hotplug?

As you pointed out, it isn't!

> If the list is seen singular, then concurrently a CPU comes up and creates/add
> a new group. The current CPU actually fetches it instead of the current group
> because it's not singular anymore. But then some higher level group
> allocation fails and the newly added first entry is removed.
>
> list_is_singular() looks safe. But list_first_entry isn't. You can create
> list_first_entry_rcu:
>
> #define list_first_entry_rcu(ptr, type, member) \
> list_entry_rcu((ptr)->next, type, member)
>
> Protected inside rcu_read_lock() until the below READ_ONCE().
>
> And then use list_del_rcu/list_add_rcu/kfree_rcu on the update side.
>
> Isn't it possible to walk through group->parent instead?

Yes. Sure! This is possible and maybe also easier. I cannot remember why
I implemented it this way...

>> +
>> + if (tmigr_check_lonely(topgroup))
>> + return READ_ONCE(topgroup->next_expiry);

When I hand in tevt->global as a parameter, I'll need to compare the
first expiry of the toplevel group and the tevt->global value and return
the earlier expiry. Only a single child is active in top level, so it
might be that this CPU is the last active CPU in hierarchy.

I didn't check all the way to the top whether all groups are
'lonely'. So when the top level group has only a single active child, it
is also possible that the child of the top level group has two active
children... Then a return of KTIME_MAX would be also a more precise
forecast.

This quick check is there to keep the overhead minimal when checking
whether it might be possible to go idle. So I don't know, if we should
add this additional check per level (which is pretty simple when using
group->parent for walking the hierarchy). What do you think?

>> + } else {
>> + continue;
>> + }
>> + }
>> +
>> + return KTIME_MAX;
>
> I'm less sure about that return value.


This is ok, because there is not only a single child active in top
level. This CPU is definitely not the last active CPU in hierarchy. So
it is likely that some other CPU will handle tevt->global of this CPU.

I'll add a comment :)

Thanks,

Anna-Maria


2024-01-30 15:30:10

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

Le Sun, Jan 28, 2024 at 04:58:55PM +0100, Anna-Maria Behnsen a ?crit :
> Frederic Weisbecker <[email protected]> writes:
> >> +
> >> + if (tmigr_check_lonely(topgroup))
> >> + return READ_ONCE(topgroup->next_expiry);
>
> When I hand in tevt->global as a parameter, I'll need to compare the
> first expiry of the toplevel group and the tevt->global value and return
> the earlier expiry. Only a single child is active in top level, so it
> might be that this CPU is the last active CPU in hierarchy.
>
> I didn't check all the way to the top whether all groups are
> 'lonely'. So when the top level group has only a single active child, it
> is also possible that the child of the top level group has two active
> children... Then a return of KTIME_MAX would be also a more precise
> forecast.
>
> This quick check is there to keep the overhead minimal when checking
> whether it might be possible to go idle. So I don't know, if we should
> add this additional check per level (which is pretty simple when using
> group->parent for walking the hierarchy). What do you think?

Not sure. Maybe if the tree never exceeds 3 levels (does it?) it's ok to
do the walk?

Thanks.

2024-01-30 16:45:41

by Anna-Maria Behnsen

[permalink] [raw]
Subject: Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

Frederic Weisbecker <[email protected]> writes:

> Le Sun, Jan 28, 2024 at 04:58:55PM +0100, Anna-Maria Behnsen a écrit :
>> Frederic Weisbecker <[email protected]> writes:
>> >> +
>> >> + if (tmigr_check_lonely(topgroup))
>> >> + return READ_ONCE(topgroup->next_expiry);
>>
>> When I hand in tevt->global as a parameter, I'll need to compare the
>> first expiry of the toplevel group and the tevt->global value and return
>> the earlier expiry. Only a single child is active in top level, so it
>> might be that this CPU is the last active CPU in hierarchy.
>>
>> I didn't check all the way to the top whether all groups are
>> 'lonely'. So when the top level group has only a single active child, it
>> is also possible that the child of the top level group has two active
>> children... Then a return of KTIME_MAX would be also a more precise
>> forecast.
>>
>> This quick check is there to keep the overhead minimal when checking
>> whether it might be possible to go idle. So I don't know, if we should
>> add this additional check per level (which is pretty simple when using
>> group->parent for walking the hierarchy). What do you think?
>
> Not sure. Maybe if the tree never exceeds 3 levels (does it?) it's ok to
> do the walk?

Levels depend on the number of CPUs and NUMA nodes

Take 4096 CPUs, all evenly distributed across 4 NUMA nodes, then you
will end up with 6 levels.

Thanks,

Anna-Maria