2022-05-17 01:58:37

by Chen Wandun

Subject: [PATCH 1/2] psi: add support for multi level pressure stall trigger

Currently, a psi event is triggered whenever stall time exceeds the
stall threshold, but nothing distinguishes one such event from another.

Events can be divided into multiple levels, each representing a
different degree of stall pressure, which helps to identify
pressure information more accurately.

echo "some 150000 350000 1000000" > /proc/pressure/memory would
add a [150ms, 350ms) threshold for partial memory stall measured
within a 1sec time window.

Signed-off-by: Chen Wandun <[email protected]>
---
include/linux/psi_types.h | 3 ++-
kernel/sched/psi.c | 19 +++++++++++++------
2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index c7fe7c089718..2b1393c8bf90 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -119,7 +119,8 @@ struct psi_trigger {
 	enum psi_states state;

 	/* User-spacified threshold in ns */
-	u64 threshold;
+	u64 min_threshold;
+	u64 max_threshold;

 	/* List node inside triggers list */
 	struct list_head node;
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 6f9533c95b0a..17dd233b533a 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -541,7 +541,7 @@ static u64 update_triggers(struct psi_group *group, u64 now)

 			/* Calculate growth since last update */
 			growth = window_update(&t->win, now, total[t->state]);
-			if (growth < t->threshold)
+			if (growth < t->min_threshold || growth >= t->max_threshold)
 				continue;

 			t->pending_event = true;
@@ -1087,15 +1087,18 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 {
 	struct psi_trigger *t;
 	enum psi_states state;
-	u32 threshold_us;
+	u32 min_threshold_us;
+	u32 max_threshold_us;
 	u32 window_us;

 	if (static_branch_likely(&psi_disabled))
 		return ERR_PTR(-EOPNOTSUPP);

-	if (sscanf(buf, "some %u %u", &threshold_us, &window_us) == 2)
+	if (sscanf(buf, "some %u %u %u", &min_threshold_us,
+	           &max_threshold_us, &window_us) == 3)
 		state = PSI_IO_SOME + res * 2;
-	else if (sscanf(buf, "full %u %u", &threshold_us, &window_us) == 2)
+	else if (sscanf(buf, "full %u %u %u", &min_threshold_us,
+	                &max_threshold_us, &window_us) == 3)
 		state = PSI_IO_FULL + res * 2;
 	else
 		return ERR_PTR(-EINVAL);
@@ -1107,8 +1110,11 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 		window_us > WINDOW_MAX_US)
 		return ERR_PTR(-EINVAL);

+	if (min_threshold_us >= max_threshold_us)
+		return ERR_PTR(-EINVAL);
+
 	/* Check threshold */
-	if (threshold_us == 0 || threshold_us > window_us)
+	if (max_threshold_us > window_us)
 		return ERR_PTR(-EINVAL);

 	t = kmalloc(sizeof(*t), GFP_KERNEL);
@@ -1117,7 +1123,8 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,

 	t->group = group;
 	t->state = state;
-	t->threshold = threshold_us * NSEC_PER_USEC;
+	t->min_threshold = min_threshold_us * NSEC_PER_USEC;
+	t->max_threshold = max_threshold_us * NSEC_PER_USEC;
 	t->win.size = window_us * NSEC_PER_USEC;
 	window_reset(&t->win, 0, 0, 0);

--
2.25.1
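
For illustration, the half-open range check that this patch adds to
update_triggers() can be modeled in user space. This is only a minimal
sketch, not kernel code: struct range_trigger and both helper names are
hypothetical, and NSEC_PER_USEC is redefined locally so the file builds
on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/* Hypothetical stand-in for the two fields the patch adds to struct psi_trigger. */
struct range_trigger {
	uint64_t min_threshold; /* ns, inclusive lower bound */
	uint64_t max_threshold; /* ns, exclusive upper bound */
};

/* Build a trigger from the microsecond values written to /proc/pressure/<res>. */
static struct range_trigger range_trigger_from_us(uint32_t min_us, uint32_t max_us)
{
	struct range_trigger t;

	/* The patch rejects min >= max with -EINVAL; the sketch just asserts. */
	assert(min_us < max_us);
	t.min_threshold = min_us * NSEC_PER_USEC;
	t.max_threshold = max_us * NSEC_PER_USEC;
	return t;
}

/*
 * Mirrors the patched condition: the event is skipped when
 * growth < min_threshold or growth >= max_threshold.
 */
static bool range_trigger_hit(const struct range_trigger *t, uint64_t growth_ns)
{
	return growth_ns >= t->min_threshold && growth_ns < t->max_threshold;
}
```

With the [150ms, 350ms) example from the commit message, a 200ms stall
hits the trigger, while stalls of 100ms and of exactly 350ms do not.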



2022-05-18 03:59:49

by Suren Baghdasaryan

Subject: Re: [PATCH 1/2] psi: add support for multi level pressure stall trigger

On Tue, May 17, 2022 at 5:46 AM Chen Wandun <[email protected]> wrote:
>
>
>
> On 2022/5/16 16:43, Suren Baghdasaryan wrote:
> > On Mon, May 16, 2022 at 1:21 AM Suren Baghdasaryan <[email protected]> wrote:
> >> On Sun, May 15, 2022 at 11:20 PM Alex Shi <[email protected]> wrote:
> >>>
> >>>
> >>> On 5/16/22 11:35, Chen Wandun wrote:
> >>>> Currently, a psi event is triggered whenever stall time exceeds the
> >>>> stall threshold, but nothing distinguishes one such event from another.
> >>>>
> >>>> Events can be divided into multiple levels, each representing a
> >>>> different degree of stall pressure, which helps to identify
> >>>> pressure information more accurately.
> >> IIUC by defining min and max, you want the trigger to activate when
> >> the stall is between min and max thresholds. But I don't see why you
> >> would need that. If you want to have several levels, you can create
> >> multiple triggers and monitor them separately. For your example, that
> >> would be:
> >>
> >> echo "some 150000 1000000" > /proc/pressure/memory
> >> echo "some 350000 1000000" > /proc/pressure/memory
> >>
> >> Your first trigger will fire whenever the stall exceeds 150ms within
> >> each 1sec and the second one will trigger when it exceeds 350ms. It is
> >> true that if the stall jumps sharply above 350ms, you would get both
> >> triggers firing. I'm guessing that's why you want this functionality
> >> so that 150ms trigger does not fire when 350ms one is firing but why
> >> is that a problem? Can't userspace pick the highest level one and
> >> ignore all the lower ones when this happens? Or are you addressing
> >> some other requirement?
> >>
> >>>> echo "some 150000 350000 1000000" > /proc/pressure/memory would
> >>> This breaks the old ABI. And why do you need this new function?
> >> Both great points.
> > BTW, I think the additional max_threshold parameter could be
> > implemented in a backward compatible way so that the old API is not
> > broken:
> >
> > arg_count = sscanf(buf, "some %u %u %u", &min_threshold_us, &arg2, &arg3);
> > if (arg_count < 2) return ERR_PTR(-EINVAL);
> > if (arg_count < 3) {
> > max_threshold_us = INT_MAX;
> > window_us = arg2;
> > } else {
> > max_threshold_us = arg2;
> > window_us = arg3;
> > }
> OK
>
> Thanks.
> > But again, the motivation still needs to be explained.
> We want to do different operations for different stall levels.
> As the previous email explained, multiple triggers also work the old
> way, but that is a little complex.

Ok, so the issue can be dealt with in userspace, but it would be
simpler if max_threshold were supported by the kernel. I can buy this
argument if the kernel implementation is not complex and max_threshold
is added in a way that does not break current users. I believe both
conditions can be met.
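
To make the backward-compatible idea concrete, here is a minimal
user-space sketch of the sscanf() approach quoted above. It is not the
kernel implementation: struct trigger_args, parse_trigger() and
NO_MAX_THRESHOLD are hypothetical names, the sentinel is UINT32_MAX
rather than the INT_MAX used in the quoted snippet, and the window
min/max range checks are left out. One assumption worth calling out:
for the legacy two-argument form the "max_threshold_us > window_us"
check has to be skipped, otherwise the sentinel would reject every
existing trigger.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NO_MAX_THRESHOLD UINT32_MAX /* hypothetical "no upper bound" marker */

/* Hypothetical result type; not a kernel structure. */
struct trigger_args {
	int full;                  /* 0 = "some", 1 = "full" */
	uint32_t min_threshold_us;
	uint32_t max_threshold_us; /* NO_MAX_THRESHOLD for the legacy form */
	uint32_t window_us;
};

/* Returns 0 on success and -1 on malformed or inconsistent input. */
static int parse_trigger(const char *buf, struct trigger_args *out)
{
	unsigned int a1 = 0, a2 = 0, a3 = 0;
	int n;

	assert(buf != NULL && out != NULL);

	out->full = 0;
	n = sscanf(buf, "some %u %u %u", &a1, &a2, &a3);
	if (n < 2) {
		out->full = 1;
		n = sscanf(buf, "full %u %u %u", &a1, &a2, &a3);
	}
	if (n < 2)
		return -1;

	out->min_threshold_us = a1;
	if (n == 2) {
		/* Legacy form: "<threshold> <window>", no upper bound. */
		out->max_threshold_us = NO_MAX_THRESHOLD;
		out->window_us = a2;
	} else {
		/* New form: "<min> <max> <window>". */
		out->max_threshold_us = a2;
		out->window_us = a3;
		if (a1 >= a2 || a2 > a3)
			return -1;
	}

	/* Keeps the pre-patch rule: lower threshold non-zero and <= window. */
	if (a1 == 0 || a1 > out->window_us)
		return -1;
	return 0;
}
```

Under these assumptions "some 150000 1000000" still parses as an
unbounded trigger, and "full 150000 350000 1000000" parses as a
bounded one.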

>
> >
> >>> Thanks
> >>>
> >>>> add [150ms, 350ms) threshold for partial memory stall measured
> >>>> within 1sec time window.
> >>>>
> >>>> Signed-off-by: Chen Wandun <[email protected]>
> >>>> ---
> >>>> include/linux/psi_types.h | 3 ++-
> >>>> kernel/sched/psi.c | 19 +++++++++++++------
> >>>> 2 files changed, 15 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
> >>>> index c7fe7c089718..2b1393c8bf90 100644
> >>>> --- a/include/linux/psi_types.h
> >>>> +++ b/include/linux/psi_types.h
> >>>> @@ -119,7 +119,8 @@ struct psi_trigger {
> >>>> enum psi_states state;
> >>>>
> >>>> /* User-spacified threshold in ns */
> >>>> - u64 threshold;
> >>>> + u64 min_threshold;
> >>>> + u64 max_threshold;
> >>>>
> >>>> /* List node inside triggers list */
> >>>> struct list_head node;
> >>>> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> >>>> index 6f9533c95b0a..17dd233b533a 100644
> >>>> --- a/kernel/sched/psi.c
> >>>> +++ b/kernel/sched/psi.c
> >>>> @@ -541,7 +541,7 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> >>>>
> >>>> /* Calculate growth since last update */
> >>>> growth = window_update(&t->win, now, total[t->state]);
> >>>> - if (growth < t->threshold)
> >>>> + if (growth < t->min_threshold || growth >= t->max_threshold)
> >>>> continue;
> >>>>
> >>>> t->pending_event = true;
> >>>> @@ -1087,15 +1087,18 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
> >>>> {
> >>>> struct psi_trigger *t;
> >>>> enum psi_states state;
> >>>> - u32 threshold_us;
> >>>> + u32 min_threshold_us;
> >>>> + u32 max_threshold_us;
> >>>> u32 window_us;
> >>>>
> >>>> if (static_branch_likely(&psi_disabled))
> >>>> return ERR_PTR(-EOPNOTSUPP);
> >>>>
> >>>> - if (sscanf(buf, "some %u %u", &threshold_us, &window_us) == 2)
> >>>> + if (sscanf(buf, "some %u %u %u", &min_threshold_us,
> >>>> + &max_threshold_us, &window_us) == 3)
> >>>> state = PSI_IO_SOME + res * 2;
> >>>> - else if (sscanf(buf, "full %u %u", &threshold_us, &window_us) == 2)
> >>>> + else if (sscanf(buf, "full %u %u %u", &min_threshold_us,
> >>>> + &max_threshold_us, &window_us) == 3)
> >>>> state = PSI_IO_FULL + res * 2;
> >>>> else
> >>>> return ERR_PTR(-EINVAL);
> >>>> @@ -1107,8 +1110,11 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
> >>>> window_us > WINDOW_MAX_US)
> >>>> return ERR_PTR(-EINVAL);
> >>>>
> >>>> + if (min_threshold_us >= max_threshold_us)
> >>>> + return ERR_PTR(-EINVAL);
> >>>> +
> >>>> /* Check threshold */
> >>>> - if (threshold_us == 0 || threshold_us > window_us)
> >>>> + if (max_threshold_us > window_us)
> >>>> return ERR_PTR(-EINVAL);
> >>>>
> >>>> t = kmalloc(sizeof(*t), GFP_KERNEL);
> >>>> @@ -1117,7 +1123,8 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
> >>>>
> >>>> t->group = group;
> >>>> t->state = state;
> >>>> - t->threshold = threshold_us * NSEC_PER_USEC;
> >>>> + t->min_threshold = min_threshold_us * NSEC_PER_USEC;
> >>>> + t->max_threshold = max_threshold_us * NSEC_PER_USEC;
> >>>> t->win.size = window_us * NSEC_PER_USEC;
> >>>> window_reset(&t->win, 0, 0, 0);
> >>>>
> > .
>

2022-05-18 10:42:51

by Alex Shi

Subject: Re: [PATCH 1/2] psi: add support for multi level pressure stall trigger



On 5/17/22 20:46, Chen Wandun wrote:
>>>> This breaks the old ABI. And why do you need this new function?
>>> Both great points.
>> BTW, I think the additional max_threshold parameter could be
>> implemented in a backward compatible way so that the old API is not
>> broken:
>>
>> arg_count = sscanf(buf, "some %u %u %u", &min_threshold_us,  &arg2, &arg3);
>> if (arg_count < 2) return ERR_PTR(-EINVAL);
>> if (arg_count < 3) {
>>      max_threshold_us = INT_MAX;
>>      window_us = arg2;
>> } else {
>>      max_threshold_us = arg2;
>>      window_us = arg3;
>> }
> OK
>
> Thanks.
>> But again, the motivation still needs to be explained.
> We want to do different operations for different stall levels.
> As the previous email explained, multiple triggers also work the old
> way, but that is a little complex.

In fact, I am not keen on this solution. Users will easily confuse the
older and newer interfaces, all for the sake of an unclear issue that
is resolvable anyway. It's not a good idea.

Thanks
Alex

2022-05-18 21:58:19

by Suren Baghdasaryan

Subject: Re: [PATCH 1/2] psi: add support for multi level pressure stall trigger

On Wed, May 18, 2022 at 3:29 AM Alex Shi <[email protected]> wrote:
>
>
>
> On 5/17/22 20:46, Chen Wandun wrote:
> > >>>> This breaks the old ABI. And why do you need this new function?
> >>> Both great points.
> >> BTW, I think the additional max_threshold parameter could be
> >> implemented in a backward compatible way so that the old API is not
> >> broken:
> >>
> >> arg_count = sscanf(buf, "some %u %u %u", &min_threshold_us, &arg2, &arg3);
> >> if (arg_count < 2) return ERR_PTR(-EINVAL);
> >> if (arg_count < 3) {
> >> max_threshold_us = INT_MAX;
> >> window_us = arg2;
> >> } else {
> >> max_threshold_us = arg2;
> >> window_us = arg3;
> >> }
> > OK
> >
> > Thanks.
> >> But again, the motivation still needs to be explained.
> > > We want to do different operations for different stall levels.
> > > As the previous email explained, multiple triggers also work the old
> > > way, but that is a little complex.
>
> In fact, I am not keen on this solution. Users will easily confuse the
> older and newer interfaces, all for the sake of an unclear issue that
> is resolvable anyway. It's not a good idea.

Maybe adding the max_threshold as an optional last argument will be
less confusing? Smth like this:

some/full min_threshold window_size [max_threshold]

Also, if we do decide to add it, there should be a warning in the
documentation that max_threshold usage might lead to a stall being
missed completely. In your example:

echo "some 150000 350000 1000000" > /proc/pressure/memory

If there is a stall of more than 350ms within a given window, that
trigger will not fire at all.
Thanks,
Suren.
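
A small user-space model of the warning above, showing how a bounded
trigger can miss a large stall entirely. This is a sketch under stated
assumptions: struct band, band_for_growth() and NO_UPPER_BOUND are
hypothetical names, and the model only evaluates the range condition,
not the kernel's window accounting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_UPPER_BOUND UINT32_MAX /* hypothetical marker for an open-ended band */

/* One registered trigger: fires for min_us <= growth < max_us. */
struct band {
	uint32_t min_us;
	uint32_t max_us;
};

/* Return the index of the first band containing growth_us, or -1 if none fires. */
static int band_for_growth(const struct band *bands, size_t n, uint32_t growth_us)
{
	assert(bands != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (growth_us < bands[i].min_us)
			continue;
		if (bands[i].max_us == NO_UPPER_BOUND || growth_us < bands[i].max_us)
			return (int)i;
	}
	return -1;
}
```

With only the [150000, 350000) band registered, a 400ms stall yields -1,
i.e. nothing fires. Adding an open-ended band starting at 350000 is one
way a user could avoid the gap.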

>
> Thanks
> Alex

2022-05-19 09:03:29

by Alex Shi

Subject: Re: [PATCH 1/2] psi: add support for multi level pressure stall trigger



On 5/19/22 05:38, Suren Baghdasaryan wrote:
> On Wed, May 18, 2022 at 3:29 AM Alex Shi <[email protected]> wrote:
>>
>>
>>
>> On 5/17/22 20:46, Chen Wandun wrote:
>>>>>> This breaks the old ABI. And why do you need this new function?
>>>>> Both great points.
>>>> BTW, I think the additional max_threshold parameter could be
>>>> implemented in a backward compatible way so that the old API is not
>>>> broken:
>>>>
>>>> arg_count = sscanf(buf, "some %u %u %u", &min_threshold_us, &arg2, &arg3);
>>>> if (arg_count < 2) return ERR_PTR(-EINVAL);
>>>> if (arg_count < 3) {
>>>> max_threshold_us = INT_MAX;
>>>> window_us = arg2;
>>>> } else {
>>>> max_threshold_us = arg2;
>>>> window_us = arg3;
>>>> }
>>> OK
>>>
>>> Thanks.
>>>> But again, the motivation still needs to be explained.
>>> We want to do different operations for different stall levels.
>>> As the previous email explained, multiple triggers also work the old
>>> way, but that is a little complex.
>>
>> In fact, I am not keen on this solution. Users will easily confuse the
>> older and newer interfaces, all for the sake of an unclear issue that
>> is resolvable anyway. It's not a good idea.
>
> Maybe adding the max_threshold as an optional last argument will be
> less confusing? Smth like this:
>
> some/full min_threshold window_size [max_threshold]

It's already confusing enough. :)
BTW, I still don't see a strong reason for the pressure range.

> Also, if we do decide to add it, there should be a warning in the
> documentation that max_threshold usage might lead to a stall being
> missed completely. In your example:
>
> echo "some 150000 350000 1000000" > /proc/pressure/memory
>
> If there is a stall of more than 350ms within a given window, that
> trigger will not fire at all.

Right.
And what if others propose more pressure combinations?
Maybe leaving them to user space would be more workable?

Thanks
Alex

2022-05-21 11:39:21

by Alex Shi

Subject: Re: [PATCH 1/2] psi: add support for multi level pressure stall trigger



On 5/21/22 15:23, Chen Wandun wrote:
>
>
> On 2022/5/19 14:15, Alex Shi wrote:
>>
>> On 5/19/22 05:38, Suren Baghdasaryan wrote:
>>> On Wed, May 18, 2022 at 3:29 AM Alex Shi <[email protected]> wrote:
>>>>
>>>>
>>>> On 5/17/22 20:46, Chen Wandun wrote:
>>>>>>>> This breaks the old ABI. And why do you need this new function?
>>>>>>> Both great points.
>>>>>> BTW, I think the additional max_threshold parameter could be
>>>>>> implemented in a backward compatible way so that the old API is not
>>>>>> broken:
>>>>>>
>>>>>> arg_count = sscanf(buf, "some %u %u %u", &min_threshold_us,  &arg2, &arg3);
>>>>>> if (arg_count < 2) return ERR_PTR(-EINVAL);
>>>>>> if (arg_count < 3) {
>>>>>>       max_threshold_us = INT_MAX;
>>>>>>       window_us = arg2;
>>>>>> } else {
>>>>>>       max_threshold_us = arg2;
>>>>>>       window_us = arg3;
>>>>>> }
>>>>> OK
>>>>>
>>>>> Thanks.
>>>>>> But again, the motivation still needs to be explained.
>>>>> We want to do different operations for different stall levels.
>>>>> As the previous email explained, multiple triggers also work the old
>>>>> way, but that is a little complex.
>>>> In fact, I am not keen on this solution. Users will easily confuse the
>>>> older and newer interfaces, all for the sake of an unclear issue that
>>>> is resolvable anyway. It's not a good idea.
>>> Maybe adding the max_threshold as an optional last argument will be
>>> less confusing? Smth like this:
>>>
>>> some/full min_threshold window_size [max_threshold]
>> It's already confusing enough. :)
> Which point confuses you?
> The interface suggested by Suren is compatible with the current version.
> I think it is more reasonable and there is no difficulty in understanding it.

Your 3rd parameter has a different meaning depending on whether the
4th one exists. It's not a good design.

>> BTW, I still don't see a strong reason for the pressure range.
> Consider this case:
> I divide pressure into multiple levels, and each level corresponds to a
> handler. I have to register multiple triggers and wait for their events.
> Today, these triggers look something like:
> echo "some 150000 1000000" > /proc/pressure/memory
> echo "some 350000 1000000" > /proc/pressure/memory
> echo "some 550000 1000000" > /proc/pressure/memory
> echo "some 750000 1000000" > /proc/pressure/memory
>
> In the best case, stall pressure is between 150000 and 350000,
> only one trigger fires, and there is only one wakeup.
>
> In any other case, multiple triggers fire and cause multiple wakeups,
> which is unnecessary.
>

Could you give more details on the specific problem that your
proposal addresses but the current code cannot?


Thanks
Alex

> The new implementation makes firing and wakeups more precise:
> userspace code becomes simpler, with no confusing events and
> no need to filter them anymore, and performance may improve
> slightly.
>
> Thanks.
>>
>>> Also, if we do decide to add it, there should be a warning in the
>>> documentation that max_threshold usage might lead to a stall being
>>> missed completely. In your example:
>>>
>>> echo "some 150000 350000 1000000" > /proc/pressure/memory
>>>
>>> If there is a stall of more than 350ms within a given window, that
>>> trigger will not fire at all.
>> Right.
>> And what if others propose more pressure combinations?
>> Maybe leaving them to user space would be more workable?
>>
>> Thanks
>> Alex
>> .
>

2022-05-23 06:59:43

by Suren Baghdasaryan

Subject: Re: [PATCH 1/2] psi: add support for multi level pressure stall trigger

On Wed, May 18, 2022 at 11:15 PM Alex Shi <[email protected]> wrote:
>
>
>
> On 5/19/22 05:38, Suren Baghdasaryan wrote:
> > On Wed, May 18, 2022 at 3:29 AM Alex Shi <[email protected]> wrote:
> >>
> >>
> >>
> >> On 5/17/22 20:46, Chen Wandun wrote:
> > >>>>>> This breaks the old ABI. And why do you need this new function?
> >>>>> Both great points.
> >>>> BTW, I think the additional max_threshold parameter could be
> >>>> implemented in a backward compatible way so that the old API is not
> >>>> broken:
> >>>>
> >>>> arg_count = sscanf(buf, "some %u %u %u", &min_threshold_us, &arg2, &arg3);
> >>>> if (arg_count < 2) return ERR_PTR(-EINVAL);
> >>>> if (arg_count < 3) {
> >>>> max_threshold_us = INT_MAX;
> >>>> window_us = arg2;
> >>>> } else {
> >>>> max_threshold_us = arg2;
> >>>> window_us = arg3;
> >>>> }
> >>> OK
> >>>
> >>> Thanks.
> >>>> But again, the motivation still needs to be explained.
> > >>> We want to do different operations for different stall levels.
> > >>> As the previous email explained, multiple triggers also work the old
> > >>> way, but that is a little complex.
> >>
> > >> In fact, I am not keen on this solution. Users will easily confuse the
> > >> older and newer interfaces, all for the sake of an unclear issue that
> > >> is resolvable anyway. It's not a good idea.
> >
> > Maybe adding the max_threshold as an optional last argument will be
> > less confusing? Smth like this:
> >
> > some/full min_threshold window_size [max_threshold]
>
> It's already confusing enough. :)
> BTW, I still don't see a strong reason for the pressure range.
>
> > Also, if we do decide to add it, there should be a warning in the
> > documentation that max_threshold usage might lead to a stall being
> > missed completely. In your example:
> >
> > echo "some 150000 350000 1000000" > /proc/pressure/memory
> >
> > If there is a stall of more than 350ms within a given window, that
> > trigger will not fire at all.
>
> Right.
> And what if others propose more pressure combinations?
> Maybe leaving them to user space would be more workable?

Ok, sounds like userspace can handle the situation of multiple
triggers firing. Let's keep it simple as it is now, until we see a
strong need or convincing performance numbers for adding this new
trigger attribute.
Chen, if you can provide reasons why a userspace solution would be
prohibitive I would be happy to reconsider.
Thanks,
Suren.
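
For reference, a rough sketch of what the userspace side could look
like with the existing interface: register one trigger per level, poll
them together, and act only on the highest level that fired. The
open/write/poll(POLLPRI) sequence follows the example in the kernel's
PSI documentation; the helper names psi_open_trigger(), highest_fired()
and wait_for_level() are hypothetical, and the sketch assumes one file
descriptor per trigger with the pollfd array sorted by ascending
threshold. Note that a lower trigger may be reported by an earlier
poll() return than a higher one, so a real monitor might need extra
logic that is not shown here.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Open a pressure file and register one trigger, e.g. "some 150000 1000000". */
int psi_open_trigger(const char *path, const char *spec)
{
	int fd;

	assert(path != NULL && spec != NULL);

	fd = open(path, O_RDWR | O_NONBLOCK);
	if (fd < 0)
		return -1;
	/* The trigger string is written including its terminating NUL. */
	if (write(fd, spec, strlen(spec) + 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Return the index of the highest-threshold trigger that reported an
 * event, or -1 if none did. fds[] must be sorted by ascending threshold.
 */
int highest_fired(const struct pollfd *fds, size_t n)
{
	for (size_t i = n; i > 0; i--) {
		if (fds[i - 1].revents & POLLPRI)
			return (int)(i - 1);
	}
	return -1;
}

/* Block until at least one trigger fires, then report the highest level. */
int wait_for_level(struct pollfd *fds, size_t n)
{
	if (poll(fds, (nfds_t)n, -1) < 0)
		return -1;
	return highest_fired(fds, n);
}
```

A caller would fill fds[i].fd from psi_open_trigger() with
fds[i].events = POLLPRI for each level, then loop on wait_for_level()
and dispatch to the handler for the returned index.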

>
> Thanks
> Alex