2023-02-27 02:38:57

by Yang Jihong

Subject: [PATCH RESEND v3] perf/core: Fix hardlockup failure caused by perf throttle

commit e050e3f0a71bf ("perf: Fix broken interrupt rate throttling")
changed the throttling threshold check. Before that commit,
hwc->interrupts was compared against max_samples_per_tick and then
incremented by 1; that commit reversed the order of the two operations,
which changed the semantics of max_samples_per_tick. Taking
"max_samples_per_tick" literally, the event should not be throttled when
hwc->interrupts == max_samples_per_tick, so the condition should be
"hwc->interrupts > max_samples_per_tick".

In fact, this can cause the hardlockup detector to fail: the minimum
value of max_samples_per_tick may be 1, in which case
__perf_event_account_interrupt() returns 1 on the very first interrupt
of a tick. As a result, the nmi_watchdog event gets throttled, which
stops the PMU (taking x86 as an example, see x86_pmu_handle_irq()).

Fixes: e050e3f0a71b ("perf: Fix broken interrupt rate throttling")
Signed-off-by: Yang Jihong <[email protected]>
---

Changes since v2:
- Add fixed commit.

Changes since v1:
- Modify commit title.
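
For reviewers, a minimal user-space sketch of why ">=" and ">" differ when
max_samples_per_tick is 1. Plain integers stand in for hwc->interrupts and
max_samples_per_tick, and account_interrupt() here is a hypothetical toy
stand-in for the accounting step, not the kernel function:

	#include <stdio.h>

	#define MAX_INTERRUPTS (~0U)

	/*
	 * Toy model of the per-tick accounting: each NMI bumps the counter,
	 * then the throttle check decides whether this sample is still
	 * allowed within the current tick.
	 */
	static int account_interrupt(unsigned int *interrupts,
				     unsigned int max_samples_per_tick,
				     int strict_greater)
	{
		(*interrupts)++;

		if (strict_greater ? *interrupts > max_samples_per_tick
				   : *interrupts >= max_samples_per_tick) {
			*interrupts = MAX_INTERRUPTS;	/* throttled */
			return 1;
		}
		return 0;
	}

	int main(void)
	{
		unsigned int old_check = 0, new_check = 0;

		/* First NMI of a tick, with max_samples_per_tick == 1 */
		printf(">= check: throttled=%d\n", account_interrupt(&old_check, 1, 0));
		printf(">  check: throttled=%d\n", account_interrupt(&new_check, 1, 1));
		return 0;
	}

With ">=", the very first sample of the tick already trips the check, which
is how a watchdog event ends up throttled immediately; with ">", exactly
max_samples_per_tick samples are allowed per tick before throttling.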

kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f79fd8b87f75..0540a8653906 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9434,7 +9434,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
 	} else {
 		hwc->interrupts++;
 		if (unlikely(throttle
-			     && hwc->interrupts >= max_samples_per_tick)) {
+			     && hwc->interrupts > max_samples_per_tick)) {
 			__this_cpu_inc(perf_throttled_count);
 			tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
 			hwc->interrupts = MAX_INTERRUPTS;
--
2.30.GIT



2023-03-06 01:14:23

by Yang Jihong

Subject: Re: [PATCH RESEND v3] perf/core: Fix hardlockup failure caused by perf throttle

Hello,

PING.

Thanks,
Yang.

On 2023/2/27 10:35, Yang Jihong wrote:
> commit e050e3f0a71bf ("perf: Fix broken interrupt rate throttling")
> changed the throttling threshold check. Before that commit,
> hwc->interrupts was compared against max_samples_per_tick and then
> incremented by 1; that commit reversed the order of the two operations,
> which changed the semantics of max_samples_per_tick. Taking
> "max_samples_per_tick" literally, the event should not be throttled when
> hwc->interrupts == max_samples_per_tick, so the condition should be
> "hwc->interrupts > max_samples_per_tick".
>
> In fact, this can cause the hardlockup detector to fail: the minimum
> value of max_samples_per_tick may be 1, in which case
> __perf_event_account_interrupt() returns 1 on the very first interrupt
> of a tick. As a result, the nmi_watchdog event gets throttled, which
> stops the PMU (taking x86 as an example, see x86_pmu_handle_irq()).
>
> Fixes: e050e3f0a71b ("perf: Fix broken interrupt rate throttling")
> Signed-off-by: Yang Jihong <[email protected]>
> ---
>
> Changes since v2:
> - Add fixed commit.
>
> Changes since v1:
> - Modify commit title.
>
> kernel/events/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f79fd8b87f75..0540a8653906 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9434,7 +9434,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
>  	} else {
>  		hwc->interrupts++;
>  		if (unlikely(throttle
> -			     && hwc->interrupts >= max_samples_per_tick)) {
> +			     && hwc->interrupts > max_samples_per_tick)) {
>  			__this_cpu_inc(perf_throttled_count);
>  			tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
>  			hwc->interrupts = MAX_INTERRUPTS;
>

2023-03-22 07:46:43

by Yang Jihong

Subject: Re: [PATCH RESEND v3] perf/core: Fix hardlockup failure caused by perf throttle

Hello,

PING.

This patch has not received a response yet.
Please take some time to check whether the fix is OK.
Looking forward to your review. Thanks :)

Thanks,
Yang.

On 2023/3/6 9:14, Yang Jihong wrote:
> Hello,
>
> PING.
>
> Thanks,
> Yang.
>
> On 2023/2/27 10:35, Yang Jihong wrote:
>> commit e050e3f0a71bf ("perf: Fix broken interrupt rate throttling")
>> changed the throttling threshold check. Before that commit,
>> hwc->interrupts was compared against max_samples_per_tick and then
>> incremented by 1; that commit reversed the order of the two operations,
>> which changed the semantics of max_samples_per_tick. Taking
>> "max_samples_per_tick" literally, the event should not be throttled when
>> hwc->interrupts == max_samples_per_tick, so the condition should be
>> "hwc->interrupts > max_samples_per_tick".
>>
>> In fact, this can cause the hardlockup detector to fail: the minimum
>> value of max_samples_per_tick may be 1, in which case
>> __perf_event_account_interrupt() returns 1 on the very first interrupt
>> of a tick. As a result, the nmi_watchdog event gets throttled, which
>> stops the PMU (taking x86 as an example, see x86_pmu_handle_irq()).
>>
>> Fixes: e050e3f0a71b ("perf: Fix broken interrupt rate throttling")
>> Signed-off-by: Yang Jihong <[email protected]>
>> ---
>>
>> Changes since v2:
>>    - Add fixed commit.
>>
>> Changes since v1:
>>    - Modify commit title.
>>
>>   kernel/events/core.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index f79fd8b87f75..0540a8653906 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -9434,7 +9434,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
>>  	} else {
>>  		hwc->interrupts++;
>>  		if (unlikely(throttle
>> -			     && hwc->interrupts >= max_samples_per_tick)) {
>> +			     && hwc->interrupts > max_samples_per_tick)) {
>>  			__this_cpu_inc(perf_throttled_count);
>>  			tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
>>  			hwc->interrupts = MAX_INTERRUPTS;
>>
>
> .

2023-04-14 07:06:20

by Yang Jihong

Subject: Re: [PATCH RESEND v3] perf/core: Fix hardlockup failure caused by perf throttle

Hello,

PING again.
Looking forward to the review.

Thanks,
Yang.

On 2023/3/22 15:36, Yang Jihong wrote:
> Hello,
>
> PING.
>
> This patch has not received a response yet.
> Please take some time to check whether the fix is OK.
> Looking forward to your review. Thanks :)
>
> Thanks,
> Yang.
>
> On 2023/3/6 9:14, Yang Jihong wrote:
>> Hello,
>>
>> PING.
>>
>> Thanks,
>> Yang.
>>
>> On 2023/2/27 10:35, Yang Jihong wrote:
>>> commit e050e3f0a71bf ("perf: Fix broken interrupt rate throttling")
>>> changed the throttling threshold check. Before that commit,
>>> hwc->interrupts was compared against max_samples_per_tick and then
>>> incremented by 1; that commit reversed the order of the two operations,
>>> which changed the semantics of max_samples_per_tick. Taking
>>> "max_samples_per_tick" literally, the event should not be throttled when
>>> hwc->interrupts == max_samples_per_tick, so the condition should be
>>> "hwc->interrupts > max_samples_per_tick".
>>>
>>> In fact, this can cause the hardlockup detector to fail: the minimum
>>> value of max_samples_per_tick may be 1, in which case
>>> __perf_event_account_interrupt() returns 1 on the very first interrupt
>>> of a tick. As a result, the nmi_watchdog event gets throttled, which
>>> stops the PMU (taking x86 as an example, see x86_pmu_handle_irq()).
>>>
>>> Fixes: e050e3f0a71b ("perf: Fix broken interrupt rate throttling")
>>> Signed-off-by: Yang Jihong <[email protected]>
>>> ---
>>>
>>> Changes since v2:
>>>    - Add fixed commit.
>>>
>>> Changes since v1:
>>>    - Modify commit title.
>>>
>>>   kernel/events/core.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index f79fd8b87f75..0540a8653906 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -9434,7 +9434,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
>>>  	} else {
>>>  		hwc->interrupts++;
>>>  		if (unlikely(throttle
>>> -			     && hwc->interrupts >= max_samples_per_tick)) {
>>> +			     && hwc->interrupts > max_samples_per_tick)) {
>>>  			__this_cpu_inc(perf_throttled_count);
>>>  			tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
>>>  			hwc->interrupts = MAX_INTERRUPTS;
>>>
>>
>> .
>
> .

2023-04-14 08:25:50

by Peter Zijlstra

Subject: Re: [PATCH RESEND v3] perf/core: Fix hardlockup failure caused by perf throttle

On Mon, Feb 27, 2023 at 10:35:08AM +0800, Yang Jihong wrote:
> commit e050e3f0a71bf ("perf: Fix broken interrupt rate throttling")
> changed the throttling threshold check. Before that commit,
> hwc->interrupts was compared against max_samples_per_tick and then
> incremented by 1; that commit reversed the order of the two operations,
> which changed the semantics of max_samples_per_tick. Taking
> "max_samples_per_tick" literally, the event should not be throttled when
> hwc->interrupts == max_samples_per_tick, so the condition should be
> "hwc->interrupts > max_samples_per_tick".
>
> In fact, this can cause the hardlockup detector to fail: the minimum
> value of max_samples_per_tick may be 1, in which case
> __perf_event_account_interrupt() returns 1 on the very first interrupt
> of a tick. As a result, the nmi_watchdog event gets throttled, which
> stops the PMU (taking x86 as an example, see x86_pmu_handle_irq()).
>
> Fixes: e050e3f0a71b ("perf: Fix broken interrupt rate throttling")
> Signed-off-by: Yang Jihong <[email protected]>
> ---
> kernel/events/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f79fd8b87f75..0540a8653906 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9434,7 +9434,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
>  	} else {
>  		hwc->interrupts++;
>  		if (unlikely(throttle
> -			     && hwc->interrupts >= max_samples_per_tick)) {
> +			     && hwc->interrupts > max_samples_per_tick)) {
>  			__this_cpu_inc(perf_throttled_count);
>  			tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
>  			hwc->interrupts = MAX_INTERRUPTS;

Thanks, I've made a slight edit to fix the && placement.

Subject: [tip: perf/core] perf/core: Fix hardlockup failure caused by perf throttle

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 15def34e2635ab7e0e96f1bc32e1b69609f14942
Gitweb: https://git.kernel.org/tip/15def34e2635ab7e0e96f1bc32e1b69609f14942
Author: Yang Jihong <[email protected]>
AuthorDate: Mon, 27 Feb 2023 10:35:08 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 14 Apr 2023 16:08:22 +02:00

perf/core: Fix hardlockup failure caused by perf throttle

commit e050e3f0a71bf ("perf: Fix broken interrupt rate throttling")
changed the throttling threshold check. Before that commit,
hwc->interrupts was compared against max_samples_per_tick and then
incremented by 1; that commit reversed the order of the two operations,
which changed the semantics of max_samples_per_tick. Taking
"max_samples_per_tick" literally, the event should not be throttled when
hwc->interrupts == max_samples_per_tick, so the condition should be
"hwc->interrupts > max_samples_per_tick".

In fact, this can cause the hardlockup detector to fail: the minimum
value of max_samples_per_tick may be 1, in which case
__perf_event_account_interrupt() returns 1 on the very first interrupt
of a tick. As a result, the nmi_watchdog event gets throttled, which
stops the PMU (taking x86 as an example, see x86_pmu_handle_irq()).

Fixes: e050e3f0a71b ("perf: Fix broken interrupt rate throttling")
Signed-off-by: Yang Jihong <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/events/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index fb3e436..82b95b8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9433,8 +9433,8 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
 		hwc->interrupts = 1;
 	} else {
 		hwc->interrupts++;
-		if (unlikely(throttle
-			     && hwc->interrupts >= max_samples_per_tick)) {
+		if (unlikely(throttle &&
+			     hwc->interrupts > max_samples_per_tick)) {
 			__this_cpu_inc(perf_throttled_count);
 			tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
 			hwc->interrupts = MAX_INTERRUPTS;
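
For context on how max_samples_per_tick can drop to 1: assuming it is derived
from the kernel.perf_event_max_sample_rate sysctl roughly as
DIV_ROUND_UP(sample_rate, HZ), as in kernel/events/core.c, any sample rate at
or below HZ leaves a budget of a single sample per tick. A small stand-alone
calculation (the HZ value is chosen arbitrarily for illustration):

	#include <stdio.h>

	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

	int main(void)
	{
		const unsigned int hz = 250;	/* example CONFIG_HZ */
		const unsigned int rates[] = { 100000, 1000, 250, 100 };

		/* Print the per-tick sample budget for a few sample rates */
		for (unsigned int i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
			printf("sample_rate=%6u -> max_samples_per_tick=%u\n",
			       rates[i], DIV_ROUND_UP(rates[i], hz));
		return 0;
	}

At budgets that small, the off-by-one in the old ">=" check is exactly the
difference between the watchdog event firing once per tick and being
throttled on its first interrupt.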