The series makes the arm_pmu driver use NMIs for the perf interrupt when
NMIs are available on the platform (currently, only arm64 + GICv3). To make
it easier to play with the patches, I've pushed a branch at [1]:
$ git clone -b pmu-nmi-v5 git://linux-arm.org/linux-ae
I've tested the series on an espressobin v7*. These are the results of
running perf record -a -- sleep 60:
1. Without the patches:
12.68% [k] _raw_spin_unlock_irq
11.08% [k] arch_cpu_idle
7.86% [k] _raw_spin_unlock_irqrestore
6.05% [k] arch_counter_get_cntpct
2.86% [k] __softirqentry_text_start
2.33% [k] tick_nohz_idle_exit
[..]
2. Using NMIs:
19.45% [k] arch_counter_get_cntpct
5.14% [k] __delay
3.32% [k] wait_for_xmitr
1.99% [k] ktime_get
1.00% [k] _raw_write_lock_irqsave
1.00% [.] avahi_escape_label
[..]
When running perf record -a -- iperf3 -c 127.0.0.1 -t 60:
1. Without the patches:
24.70% [k] __arch_copy_from_user
21.77% [k] __arch_copy_to_user
5.21% [k] _raw_spin_unlock_irq
2.86% [k] _raw_spin_unlock_irqrestore
1.99% [k] __free_pages_ok
1.61% [k] arch_cpu_idle
[..]
2. Using NMIs:
23.84% [k] __arch_copy_from_user
23.44% [k] __arch_copy_to_user
1.23% [k] get_page_from_freelist
1.16% [k] tcp_ack
0.80% [k] __free_pages_ok
0.78% [k] tcp_sendmsg_locked
[..]
I've run the same tests in a VM when both host and guest use NMIs, and when
neither uses them. All of these tests were also run on the model, with similar
results in all cases.
* All the firmware versions for espressobin v7 that I've tried clear
SCR_EL3.FIQ, which means that NMIs don't work. To make them work on the
board, I modified the GICv3 driver. That's why I would really appreciate
someone testing this series on a board where NMIs work without any GIC
changes.
Summary of the patches:
* Patch 1 is a fix for a bug that Julien found during the review for v4.
* Patches 2 and 3 remove locking from arm64 perf event code.
* Patches 4 and 5 make the arm64 PMU interrupt handler NMI safe.
* Patches 6 and 7 enable the use of NMIs on arm64 with a GICv3 irqchip.
Changes since v4 [2]:
- Rebased on top of v5.8-rc1 and dropped the Tested-by tags because it's
been almost a year since the series was last tested.
- Dropped patch 3 because I couldn't find any instance where
armv7pmu_read_counter() was called with interrupts enabled. I've also
tested this by running several instances of perf for a few hours, and the
function was called every time with interrupts disabled.
- Dropped patches 4 and 5 because the tradeoff wasn't worth it in my
opinion: the irq handler was slower all the time (because it
saved/restored the counter select register), in exchange for being
slightly faster on the rare occasions when it triggered at the beginning
of the critical sections.
- Minor changes here and there to address review comments.
Changes since v3 [3]:
- Added tags
- Fixed build issue for perf_event_v6
- Don't disable preemption in pmu->enable()
- Always rely on IPI_IRQ_WORK to run the queued work
- Fixed typos + cleanups
Changes since v2 [4]:
- Rebased on recent linux-next (next-20190708)
- Fixed a number of bugs with indices (reported by Wei)
- Minor style fixes
Changes since v1 [5]:
- Rebased on v5.1-rc1
- Pseudo-NMI has changed a lot since then, use the (now merged) NMI API
- Remove locking from armv7 perf_event
- Use locking only in armv6 perf_event
- Use direct counter/type registers instead of the selector register for armv8
[1] http://www.linux-arm.org/git?p=linux-ae.git;a=shortlog;h=refs/heads/pmu-nmi-v5
[2] https://lists.infradead.org/pipermail/linux-arm-kernel/2019-July/666824.html
[3] https://lists.infradead.org/pipermail/linux-arm-kernel/2019-July/665339.html
[4] https://lists.infradead.org/pipermail/linux-arm-kernel/2019-March/640536.html
[5] https://lists.infradead.org/pipermail/linux-arm-kernel/2018-January/554611.html
Alexandru Elisei (1):
arm64: perf: Add missing ISB in armv8pmu_enable_event()
Julien Thierry (5):
arm64: perf: Remove PMU locking
arm64: perf: Defer irq_work to IPI_IRQ_WORK
arm64: kvm: pmu: Make overflow handler NMI safe
arm_pmu: Introduce pmu_irq_ops
arm_pmu: arm64: Use NMIs for PMU
Mark Rutland (1):
arm64: perf: Avoid PMXEV* indirection
arch/arm64/kernel/perf_event.c | 138 ++++++++++++++++++++------------
arch/arm64/kvm/pmu-emul.c | 25 +++++-
drivers/perf/arm_pmu.c | 142 ++++++++++++++++++++++++++++-----
include/kvm/arm_pmu.h | 1 +
4 files changed, 235 insertions(+), 71 deletions(-)
--
2.27.0
From: Julien Thierry <[email protected]>
The PMU is disabled and enabled, and the counters are programmed from
contexts where interrupts or preemption is disabled.
The functions to toggle the PMU and to program the PMU counters access the
registers directly and don't access data modified by the interrupt handler.
That, and the fact that they're always called from non-preemptible
contexts, means that we don't need to disable interrupts or use a spinlock.
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Catalin Marinas <[email protected]>
Signed-off-by: Julien Thierry <[email protected]>
[Explained why locking is not needed, removed WARN_ONs]
Signed-off-by: Alexandru Elisei <[email protected]>
---
arch/arm64/kernel/perf_event.c | 28 ----------------------------
1 file changed, 28 deletions(-)
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index e95b5ca70a53..a6195022be7d 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -647,15 +647,10 @@ static inline u32 armv8pmu_getreset_flags(void)
static void armv8pmu_enable_event(struct perf_event *event)
{
- unsigned long flags;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
/*
* Enable counter and interrupt, and set the counter to count
* the event that we're interested in.
*/
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/*
* Disable counter
@@ -678,21 +673,10 @@ static void armv8pmu_enable_event(struct perf_event *event)
* Enable counter
*/
armv8pmu_enable_event_counter(event);
-
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void armv8pmu_disable_event(struct perf_event *event)
{
- unsigned long flags;
- struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
- /*
- * Disable counter and interrupt
- */
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
-
/*
* Disable counter
*/
@@ -702,30 +686,18 @@ static void armv8pmu_disable_event(struct perf_event *event)
* Disable interrupt for this counter
*/
armv8pmu_disable_event_irq(event);
-
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void armv8pmu_start(struct arm_pmu *cpu_pmu)
{
- unsigned long flags;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/* Enable all counters */
armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
{
- unsigned long flags;
- struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
-
- raw_spin_lock_irqsave(&events->pmu_lock, flags);
/* Disable all counters */
armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
- raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
}
static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
--
2.27.0
Writes to the PMXEVTYPER_EL0 register are not self-synchronising. In
armv8pmu_enable_event(), the PE can reorder configuring the event type
after we have enabled the counter and the interrupt. This can lead to an
interrupt being asserted because the of the previous event type that we
were counting, not the one that we've just enabled.
The same rationale applies to writes to the PMINTENSET_EL1 register. The PE
can reorder enabling the interrupt at any point in the future after we have
enabled the event.
Prevent both situations from happening by adding an ISB just before we
enable the event counter.
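In code terms, the hazard looks roughly like this (a sketch using raw
sysreg accessors; the variable names are illustrative and the driver
itself goes through its armv8pmu_* helpers):

	write_sysreg(evtype, pmxevtyper_el0);	/* program the event type */
	write_sysreg(mask, pmintenset_el1);	/* enable the overflow interrupt */
	isb();					/* make both writes visible ... */
	write_sysreg(mask, pmcntenset_el0);	/* ... before the counter starts */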
Cc: Julien Thierry <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Catalin Marinas <[email protected]>
Fixes: 030896885ade ("arm64: Performance counters support")
Reported-by: Julien Thierry <[email protected]>
Signed-off-by: Alexandru Elisei <[email protected]>
---
arch/arm64/kernel/perf_event.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 4d7879484cec..ee180b2a5b39 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -605,6 +605,7 @@ static void armv8pmu_enable_event(struct perf_event *event)
* Enable interrupt for this counter
*/
armv8pmu_enable_event_irq(event);
+ isb();
/*
* Enable counter
--
2.27.0
Quoting Alexandru Elisei (2020-06-17 04:38:45)
> Writes to the PMXEVTYPER_EL0 register are not self-synchronising. In
> armv8pmu_enable_event(), the PE can reorder configuring the event type
> after we have enabled the counter and the interrupt. This can lead to an
> interrupt being asserted because the of the previous event type that we
'because the of the' doesn't read properly.
> were counting, not the one that we've just enabled.
>
> The same rationale applies to writes to the PMINTENSET_EL1 register. The PE
> can reorder enabling the interrupt at any point in the future after we have
> enabled the event.
>
> Prevent both situations from happening by adding an ISB just before we
> enable the event counter.
>
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 4d7879484cec..ee180b2a5b39 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -605,6 +605,7 @@ static void armv8pmu_enable_event(struct perf_event *event)
> * Enable interrupt for this counter
> */
> armv8pmu_enable_event_irq(event);
> + isb();
Please add a comment before the isb() explaining the situation. Nobody
knows what this is for when reading the code and they don't want to do
git archaeology to figure it out.
Quoting Alexandru Elisei (2020-06-17 04:38:47)
> From: Julien Thierry <[email protected]>
>
> The PMU is disabled and enabled, and the counters are programmed from
> contexts where interrupts or preemption is disabled.
>
> The functions to toggle the PMU and to program the PMU counters access the
> registers directly and don't access data modified by the interrupt handler.
> That, and the fact that they're always called from non-preemptible
> contexts, means that we don't need to disable interrupts or use a spinlock.
Maybe we should add a lockdep assertion that the code isn't preemptible?
I.e. add a cant_sleep() call? Or is it more that we don't need locking
because we're just doing register accesses and don't need to protect
those accesses from each other?
Hi Stephen,
Thank you very much for taking the time to review the patches!
Comments below.
On 6/17/20 9:01 PM, Stephen Boyd wrote:
> Quoting Alexandru Elisei (2020-06-17 04:38:45)
>> Writes to the PMXEVTYPER_EL0 register are not self-synchronising. In
>> armv8pmu_enable_event(), the PE can reorder configuring the event type
>> after we have enabled the counter and the interrupt. This can lead to an
>> interrupt being asserted because the of the previous event type that we
> 'because the of the' doesn't read properly.
Typo on my part, will fix it.
>
>> were counting, not the one that we've just enabled.
>>
>> The same rationale applies to writes to the PMINTENSET_EL1 register. The PE
>> can reorder enabling the interrupt at any point in the future after we have
>> enabled the event.
>>
>> Prevent both situations from happening by adding an ISB just before we
>> enable the event counter.
>>
>> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
>> index 4d7879484cec..ee180b2a5b39 100644
>> --- a/arch/arm64/kernel/perf_event.c
>> +++ b/arch/arm64/kernel/perf_event.c
>> @@ -605,6 +605,7 @@ static void armv8pmu_enable_event(struct perf_event *event)
>> * Enable interrupt for this counter
>> */
>> armv8pmu_enable_event_irq(event);
>> + isb();
> Please add a comment before the isb() explaining the situation. Nobody
> knows what this is for when reading the code and they don't want to do
> git archaeology to figure it out.
That's a good idea, I'll do that.
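For illustration, the commented barrier could look something like this
(a sketch, exact wording to be decided):

	/*
	 * Enable interrupt for this counter
	 */
	armv8pmu_enable_event_irq(event);

	/*
	 * The PMXEVTYPER_EL0 and PMINTENSET_EL1 writes above are not
	 * self-synchronising. Synchronise the PE so that the counter
	 * cannot start counting (and raising interrupts for) the event
	 * type that was previously programmed.
	 */
	isb();

	/*
	 * Enable counter
	 */
	armv8pmu_enable_event_counter(event);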
Thanks,
Alex
Hi,
On 6/17/20 9:17 PM, Stephen Boyd wrote:
> Quoting Alexandru Elisei (2020-06-17 04:38:47)
>> From: Julien Thierry <[email protected]>
>>
>> The PMU is disabled and enabled, and the counters are programmed from
>> contexts where interrupts or preemption is disabled.
>>
>> The functions to toggle the PMU and to program the PMU counters access the
>> registers directly and don't access data modified by the interrupt handler.
>> That, and the fact that they're always called from non-preemptible
>> contexts, means that we don't need to disable interrupts or use a spinlock.
> Maybe we should add a lockdep assertion that the code isn't preemptible?
> I.e. add a cant_sleep() call? Or is it more that we don't need locking
> because we're just doing register accesses and don't need to protect
> those accesses from each other?
It's both. The spinlocks were there to protect the functions from being preempted
and possibly migrated to another CPU, and from being interrupted by the PMU irq
handler.
There was no data race with the interrupt handler, but before the previous patch
("arm64: perf: Avoid PMXEV* indirection"), in order to read/write/program a
counter, one had to write the counter number to a counter selection register, and
then write/read the desired value from another register. This was done from both
the armv8pmu_{enable,disable}_event() functions and the irq handler, and the
spinlock was necessary. Now that we can access a counter using a single register
access, there's no need to protect the functions from being interrupted by the IRQ
handler. As for armv8pmu_{start,stop}(), they consist of one register write, so
it's also safe for the irq handler to interrupt them.
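To make the old pattern concrete, reading an event counter used to take two
accesses through the selector register, roughly like this (a sketch of the
pre-series code):

	static inline u64 armv8pmu_read_evcntr(int idx)
	{
		/* Select the counter, then read it via PMXEVCNTR_EL0. */
		write_sysreg(ARMV8_IDX_TO_COUNTER(idx), pmselr_el0);
		isb();
		return read_sysreg(pmxevcntr_el0);
	}

If the irq handler ran between the select and the read and changed
PMSELR_EL0, the function would return the wrong counter; that's what the
lock was protecting against. With the direct PMEVCNTR<n>_EL0 accessors
there is a single register access and nothing left to race with.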
For the preemption part of the locking: the armv8pmu_{enable,disable}_event()
functions, when called by the perf core code via the pmu->{start,stop,add,del}
callbacks, are
guaranteed to be called with IRQs and preemption disabled, as per the comment in
include/linux/perf_event.h. They are also called from the arm_pmu driver by the
CPU PM notifiers, which should also be executed with interrupts disabled. Should
we check here that the top level code respects these guarantees?
The armv8pmu_{start,stop}() functions are called from the irq handler, so we're
safe from preemption in this case. They are also called via
pmu->pmu_{enable,disable} callbacks, and I didn't find an explicit contract
regarding preemption in include/linux/perf_event.h. I've checked the other call
sites, and I didn't find any instances where they are called with preemption
enabled, which makes sense as we don't want to disable the PMU on another CPU by
accident.
I would be inclined to add cant_sleep() calls to armv8pmu_{start,stop}(). In the
previous iteration, there were WARN_ONs in these functions, and Will said [1] they
can be removed because they are per-CPU operations. Will, what do you think about
adding the lockdep assertions?
[1] https://www.spinics.net/lists/arm-kernel/msg745161.html
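For reference, the assertion would be a one-liner in each function, along
these lines (sketch only):

	static void armv8pmu_start(struct arm_pmu *cpu_pmu)
	{
		/* Catch callers that run us from a preemptible context. */
		cant_sleep();

		/* Enable all counters */
		armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
	}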
Thanks,
Alex
Quoting Alexandru Elisei (2020-06-18 03:51:31)
> Hi,
>
> On 6/17/20 9:17 PM, Stephen Boyd wrote:
> > Quoting Alexandru Elisei (2020-06-17 04:38:47)
> >> From: Julien Thierry <[email protected]>
> >>
> >> The PMU is disabled and enabled, and the counters are programmed from
> >> contexts where interrupts or preemption is disabled.
> >>
> >> The functions to toggle the PMU and to program the PMU counters access the
> >> registers directly and don't access data modified by the interrupt handler.
> >> That, and the fact that they're always called from non-preemptible
> >> contexts, means that we don't need to disable interrupts or use a spinlock.
> > Maybe we should add a lockdep assertion that the code isn't preemptible?
> > I.e. add a cant_sleep() call? Or is it more that we don't need locking
> > because we're just doing register accesses and don't need to protect
> > those accesses from each other?
>
> It's both. The spinlocks were there to protect the functions from being preempted
> and possibly migrated to another CPU, and from being interrupted by the PMU irq
> handler.
>
> There was no data race with the interrupt handler, but before the previous patch
> ("arm64: perf: Avoid PMXEV* indirection"), in order to read/write/program a
> counter, one had to write the counter number to a counter selection register, and
> then write/read the desired value from another register. This was done from both
> the armv8pmu_{enable,disable}_event() functions and the irq handler, and the
> spinlock was necessary. Now that we can access a counter using a single register
> access, there's no need to protect the functions from being interrupted by the IRQ
> handler. As for armv8pmu_{start,stop}(), they consist of one register write, so
> it's also safe for the irq handler to interrupt them.
>
> For the preemption part of the locking: the armv8pmu_{enable,disable}_event()
> functions, when called by the perf core code via the pmu->{start,stop,add,del}
> callbacks, are
> guaranteed to be called with IRQs and preemption disabled, as per the comment in
> include/linux/perf_event.h. They are also called from the arm_pmu driver by the
> CPU PM notifiers, which should also be executed with interrupts disabled. Should
> we check here that the top level code respects these guarantees?
>
> The armv8pmu_{start,stop}() functions are called from the irq handler, so we're
> safe from preemption in this case. They are also called via
> pmu->pmu_{enable,disable} callbacks, and I didn't find an explicit contract
> regarding preemption in include/linux/perf_event.h. I've checked the other call
> sites, and I didn't find any instances where they are called with preemption
> enabled, which makes sense as we don't want to disable the PMU on another CPU by
> accident.
If they're all callbacks then it's overkill to add this. Presumably it
is better to enforce this wherever the callbacks are called from so as
to not litter the callee with random cant_sleep() calls. Probably best
to ignore my suggestion.
>
> I would be inclined to add cant_sleep() calls to armv8pmu_{start,stop}(). In the
> previous iteration, there were WARN_ONs in these functions, and Will said [1] they
> can be removed because they are per-CPU operations. Will, what do you think about
> adding the lockdep assertions?
>
> [1] https://www.spinics.net/lists/arm-kernel/msg745161.html
>
If I read it correctly Will is saying the same thing in that thread.
Hi,
On 6/17/20 12:38 PM, Alexandru Elisei wrote:
> The series makes the arm_pmu driver use NMIs for the perf interrupt when
> NMIs are available on the platform (currently, only arm64 + GICv3). To make
> it easier to play with the patches, I've pushed a branch at [1]:
>
> $ git clone -b pmu-nmi-v5 git://linux-arm.org/linux-ae
For people who wanted to test the series, but couldn't because their firmware was
clearing SCR_EL3.FIQ, I pushed some patches [1] to fix that. Now pseudo-NMIs work
on all GICv3 security + SCR_EL3.FIQ combinations.
[1] https://lists.infradead.org/pipermail/linux-arm-kernel/2020-June/580143.html
Thanks,
Alex
On Fri, Jun 19, 2020 at 01:29:59AM -0700, Stephen Boyd wrote:
> Quoting Alexandru Elisei (2020-06-18 03:51:31)
> > The armv8pmu_{start,stop}() functions are called from the irq handler, so we're
> > safe from preemption in this case. They are also called via
> > pmu->pmu_{enable,disable} callbacks, and I didn't find an explicit contract
> > regarding preemption in include/linux/perf_event.h. I've checked the other call
> > sites, and I didn't find any instances where they are called with preemption
> > enabled, which makes sense as we don't want to disable the PMU on another CPU by
> > accident.
>
> If they're all callbacks then it's overkill to add this. Presumably it
> is better to enforce this wherever the callbacks are called from so as
> to not litter the callee with random cant_sleep() calls. Probably best
> to ignore my suggestion.
>
> >
> > I would be inclined to add cant_sleep() calls to armv8pmu_{start,stop}(). In the
> > previous iteration, there were WARN_ONs in these functions, and Will said [1] they
> > can be removed because they are per-CPU operations. Will, what do you think about
> > adding the lockdep assertions?
> >
> > [1] https://www.spinics.net/lists/arm-kernel/msg745161.html
> >
>
> If I read it correctly Will is saying the same thing in that thread.
Right, in the cases where perf core already relies on things not being
preemptible, we don't need to add extra checks.
Will