The series makes the arm_pmu driver use NMIs for the perf interrupt when
NMIs are available on the platform (currently, only arm64 + GICv3). To make
it easier to play with the patches, I've pushed a branch at [1]:
$ git clone -b pmu-nmi-v6 git://linux-arm.org/linux-ae
I've tested the series on an espressobin v7*. These are the results of
running perf record -a -- sleep 60:
1. Without the patches:
16.73% [k] arch_local_irq_enable
12.20% [k] arch_cpu_idle
8.61% [k] _raw_spin_unlock_irqrestore
4.09% [k] arch_local_irq_enable
2.25% [k] arch_local_irq_enable
1.82% [k] arch_counter_get_cntpct
[..]
2. Using NMIs:
3.37% [k] arch_counter_get_cntpct
2.62% [.] _IO_fwrite
1.62% [.] __gconv_transform_ascii_internal
1.49% [.] __mbrtowc
1.44% [k] el0_svc_common
1.31% [.] strchr
[..]
When running perf record -a -- iperf3 -c 127.0.0.1 -t 60:
1. Without the patches:
24.25% [k] __arch_copy_from_user
20.94% [k] __arch_copy_to_user
5.71% [k] arch_local_irq_enable
3.12% [k] _raw_spin_unlock_irqrestore
2.01% [k] __free_pages_ok
1.48% [k] arch_cpu_idle
[..]
2. Using NMIs:
23.15% [k] __arch_copy_from_user
21.68% [k] __arch_copy_to_user
1.23% [k] tcp_ack
1.08% [k] tcp_sendmsg_locked
0.97% [k] rmqueue
0.91% [k] __free_pages_ok
[..]
I've ran the same tests in a VM when both host+guest use NMIs, and when
neither use them. All of these tests were also ran on the model. Similar
results in all cases.
* All the firmware versions for espressobin v7 that I've tried clear
SCR_EL3.FIQ, which means that NMIs don't work. To make them work on the
board, I modified the GICv3 driver. That's why I would really appreciate
someone testing this series on a board where NMIs work without any GIC
changes. For people who want to test the series, but don't have a board
with firmware that sets SCR_EL3.FIQ, I've pushed a branch [2] with the
GICv3 drivers changes necessary to make NMIs work:
$ git clone -b pmu-nmi-v6-nmi-fiq-clear-v2 git://linux-arm.org/linux-ae
Summary of the patches:
* Patch 1 is a fix for a bug that Julien found during the review for v4.
* Patches 2 and 3 remove locking from arm64 perf event code.
* Patches 4 and 5 makes the arm64 PMU interrupt handler NMI safe.
* Patches 6 and 7 enable the use of NMIs on arm64 with a GICv3 irqchip.
Changes since v5 [3]:
- Rebased on top of v5.9-rc1.
- Typo fixes.
- Added comments to the ISB added by patches #1 and #2.
- Reworded message for patch #4, as per Mark's excellent suggestion.
Changes since v4 [4]:
- Rebased on top of v5.8-rc1 and dropped the Tested-by tags because it's
been almost a year since the series has been tested.
- Dropped patch 3 because I couldn't find any instance where
armv7pmu_read_counter() was called with interrupts enabled. I've also
tested this by running several instances of perf for a few hours, and the
function was called every time with interrupts disabled.
- Dropped patches 4 and 5 because the tradeoff wasn't worth it in my
opinion: the irq handler was slower all the time (because it
saved/restored the counter select register), in exchange for being
slightly faster on the rare ocassions when it triggered at the beginning
of the critical sections.
- Minor changes here and there to address review comments.
Changes since v3 [5]:
- Added tags
- Fix build issue for perf_event_v6
- Don't disable preemption in pmu->enable()
- Always rely on IPI_IRQ_WORK to run the queued work
- Fixed typos + cleanups
Changes since v2 [6]:
- Rebased on recent linux-next (next-20190708)
- Fixed a number of bugs with indices (reported by Wei)
- Minor style fixes
Changes since v1 [7]:
- Rebased on v5.1-rc1
- Pseudo-NMI has changed a lot since then, use the (now merged) NMI API
- Remove locking from armv7 perf_event
- Use locking only in armv6 perf_event
- Use direct counter/type registers insted of selector register for armv8
[1] http://www.linux-arm.org/git?p=linux-ae.git;a=shortlog;h=refs/heads/pmu-nmi-v6
[2] http://www.linux-arm.org/git?p=linux-ae.git;a=shortlog;h=refs/heads/pmu-nmi-v6-nmi-fiq-clear-v2
[3] https://www.spinics.net/lists/kernel/msg3554236.html
[4] https://lists.infradead.org/pipermail/linux-arm-kernel/2019-July/666824.html
[5] https://lists.infradead.org/pipermail/linux-arm-kernel/2019-July/665339.html
[6] https://lists.infradead.org/pipermail/linux-arm-kernel/2019-March/640536.html
[7] https://lists.infradead.org/pipermail/linux-arm-kernel/2018-January/554611.html
Alexandru Elisei (1):
arm64: perf: Add missing ISB in armv8pmu_enable_event()
Julien Thierry (5):
arm64: perf: Remove PMU locking
arm64: perf: Defer irq_work to IPI_IRQ_WORK
KVM: arm64: pmu: Make overflow handler NMI safe
arm_pmu: Introduce pmu_irq_ops
arm_pmu: arm64: Use NMIs for PMU
Mark Rutland (1):
arm64: perf: Avoid PMXEV* indirection
arch/arm64/kernel/perf_event.c | 145 +++++++++++++++++++++------------
arch/arm64/kvm/pmu-emul.c | 25 +++++-
drivers/perf/arm_pmu.c | 142 +++++++++++++++++++++++++++-----
include/kvm/arm_pmu.h | 1 +
4 files changed, 241 insertions(+), 72 deletions(-)
--
2.28.0
Hi Alex,
On Wed, 19 Aug 2020 at 19:03, Alexandru Elisei <[email protected]> wrote:
>
> The series makes the arm_pmu driver use NMIs for the perf interrupt when
> NMIs are available on the platform (currently, only arm64 + GICv3). To make
> it easier to play with the patches, I've pushed a branch at [1]:
>
> $ git clone -b pmu-nmi-v6 git://linux-arm.org/linux-ae
>
> I've tested the series on an espressobin v7*. These are the results of
> running perf record -a -- sleep 60:
>
> 1. Without the patches:
>
> 16.73% [k] arch_local_irq_enable
> 12.20% [k] arch_cpu_idle
> 8.61% [k] _raw_spin_unlock_irqrestore
> 4.09% [k] arch_local_irq_enable
> 2.25% [k] arch_local_irq_enable
> 1.82% [k] arch_counter_get_cntpct
> [..]
>
> 2. Using NMIs:
>
> 3.37% [k] arch_counter_get_cntpct
> 2.62% [.] _IO_fwrite
> 1.62% [.] __gconv_transform_ascii_internal
> 1.49% [.] __mbrtowc
> 1.44% [k] el0_svc_common
> 1.31% [.] strchr
> [..]
>
> When running perf record -a -- iperf3 -c 127.0.0.1 -t 60:
>
> 1. Without the patches:
> 24.25% [k] __arch_copy_from_user
> 20.94% [k] __arch_copy_to_user
> 5.71% [k] arch_local_irq_enable
> 3.12% [k] _raw_spin_unlock_irqrestore
> 2.01% [k] __free_pages_ok
> 1.48% [k] arch_cpu_idle
> [..]
>
> 2. Using NMIs:
>
> 23.15% [k] __arch_copy_from_user
> 21.68% [k] __arch_copy_to_user
> 1.23% [k] tcp_ack
> 1.08% [k] tcp_sendmsg_locked
> 0.97% [k] rmqueue
> 0.91% [k] __free_pages_ok
> [..]
>
> I've ran the same tests in a VM when both host+guest use NMIs, and when
> neither use them. All of these tests were also ran on the model. Similar
> results in all cases.
>
> * All the firmware versions for espressobin v7 that I've tried clear
> SCR_EL3.FIQ, which means that NMIs don't work. To make them work on the
> board, I modified the GICv3 driver. That's why I would really appreciate
> someone testing this series on a board where NMIs work without any GIC
> changes.
This series works perfectly fine for me without any GIC changes on
Developerbox. FWIW:
Tested-by: Sumit Garg <[email protected]> (Developerbox)
-Sumit
> For people who want to test the series, but don't have a board
> with firmware that sets SCR_EL3.FIQ, I've pushed a branch [2] with the
> GICv3 drivers changes necessary to make NMIs work:
>
> $ git clone -b pmu-nmi-v6-nmi-fiq-clear-v2 git://linux-arm.org/linux-ae
>
> Summary of the patches:
> * Patch 1 is a fix for a bug that Julien found during the review for v4.
> * Patches 2 and 3 remove locking from arm64 perf event code.
> * Patches 4 and 5 makes the arm64 PMU interrupt handler NMI safe.
> * Patches 6 and 7 enable the use of NMIs on arm64 with a GICv3 irqchip.
>
> Changes since v5 [3]:
> - Rebased on top of v5.9-rc1.
> - Typo fixes.
> - Added comments to the ISB added by patches #1 and #2.
> - Reworded message for patch #4, as per Mark's excellent suggestion.
>
> Changes since v4 [4]:
> - Rebased on top of v5.8-rc1 and dropped the Tested-by tags because it's
> been almost a year since the series has been tested.
> - Dropped patch 3 because I couldn't find any instance where
> armv7pmu_read_counter() was called with interrupts enabled. I've also
> tested this by running several instances of perf for a few hours, and the
> function was called every time with interrupts disabled.
> - Dropped patches 4 and 5 because the tradeoff wasn't worth it in my
> opinion: the irq handler was slower all the time (because it
> saved/restored the counter select register), in exchange for being
> slightly faster on the rare ocassions when it triggered at the beginning
> of the critical sections.
> - Minor changes here and there to address review comments.
>
> Changes since v3 [5]:
> - Added tags
> - Fix build issue for perf_event_v6
> - Don't disable preemption in pmu->enable()
> - Always rely on IPI_IRQ_WORK to run the queued work
> - Fixed typos + cleanups
>
> Changes since v2 [6]:
> - Rebased on recent linux-next (next-20190708)
> - Fixed a number of bugs with indices (reported by Wei)
> - Minor style fixes
>
> Changes since v1 [7]:
> - Rebased on v5.1-rc1
> - Pseudo-NMI has changed a lot since then, use the (now merged) NMI API
> - Remove locking from armv7 perf_event
> - Use locking only in armv6 perf_event
> - Use direct counter/type registers insted of selector register for armv8
>
> [1] http://www.linux-arm.org/git?p=linux-ae.git;a=shortlog;h=refs/heads/pmu-nmi-v6
> [2] http://www.linux-arm.org/git?p=linux-ae.git;a=shortlog;h=refs/heads/pmu-nmi-v6-nmi-fiq-clear-v2
> [3] https://www.spinics.net/lists/kernel/msg3554236.html
> [4] https://lists.infradead.org/pipermail/linux-arm-kernel/2019-July/666824.html
> [5] https://lists.infradead.org/pipermail/linux-arm-kernel/2019-July/665339.html
> [6] https://lists.infradead.org/pipermail/linux-arm-kernel/2019-March/640536.html
> [7] https://lists.infradead.org/pipermail/linux-arm-kernel/2018-January/554611.html
>
> Alexandru Elisei (1):
> arm64: perf: Add missing ISB in armv8pmu_enable_event()
>
> Julien Thierry (5):
> arm64: perf: Remove PMU locking
> arm64: perf: Defer irq_work to IPI_IRQ_WORK
> KVM: arm64: pmu: Make overflow handler NMI safe
> arm_pmu: Introduce pmu_irq_ops
> arm_pmu: arm64: Use NMIs for PMU
>
> Mark Rutland (1):
> arm64: perf: Avoid PMXEV* indirection
>
> arch/arm64/kernel/perf_event.c | 145 +++++++++++++++++++++------------
> arch/arm64/kvm/pmu-emul.c | 25 +++++-
> drivers/perf/arm_pmu.c | 142 +++++++++++++++++++++++++++-----
> include/kvm/arm_pmu.h | 1 +
> 4 files changed, 241 insertions(+), 72 deletions(-)
>
> --
> 2.28.0
>
On Wed, Aug 19, 2020 at 02:34:12PM +0100, Alexandru Elisei wrote:
> The series makes the arm_pmu driver use NMIs for the perf interrupt when
> NMIs are available on the platform (currently, only arm64 + GICv3). To make
> it easier to play with the patches, I've pushed a branch at [1]:
This mostly looks good to me, but see some of the comments I left on the
code. One other thing I'm not sure about is whether or not we should tell
userspace that we're using an NMI for the sampling. Do any other
architectures have a conditional NMI?
Will
Hi Will,
Thank you so much for reviewing the series!
On 9/21/20 2:59 PM, Will Deacon wrote:
> On Wed, Aug 19, 2020 at 02:34:12PM +0100, Alexandru Elisei wrote:
>> The series makes the arm_pmu driver use NMIs for the perf interrupt when
>> NMIs are available on the platform (currently, only arm64 + GICv3). To make
>> it easier to play with the patches, I've pushed a branch at [1]:
> This mostly looks good to me, but see some of the comments I left on the
> code. One other thing I'm not sure about is whether or not we should tell
> userspace that we're using an NMI for the sampling. Do any other
> architectures have a conditional NMI?
I'm not sure about other architectures being able to configure the perf interrupt
as NMI or a regular interrupt, I'll try to find out. Regardless of what the other
architecture do, I am of the opinion that we should spell out explicitly when the
PMU is using pseudo-NMIs, because it makes a huge difference in the accuracy of
the instrumentation and the overall usefulness of perf.
If I spin a v7 quickly, is it still time to merge the series for 5.10?
Thanks,
Alex
On Mon, Sep 21, 2020 at 04:41:00PM +0100, Alexandru Elisei wrote:
> On 9/21/20 2:59 PM, Will Deacon wrote:
> > On Wed, Aug 19, 2020 at 02:34:12PM +0100, Alexandru Elisei wrote:
> >> The series makes the arm_pmu driver use NMIs for the perf interrupt when
> >> NMIs are available on the platform (currently, only arm64 + GICv3). To make
> >> it easier to play with the patches, I've pushed a branch at [1]:
> > This mostly looks good to me, but see some of the comments I left on the
> > code. One other thing I'm not sure about is whether or not we should tell
> > userspace that we're using an NMI for the sampling. Do any other
> > architectures have a conditional NMI?
>
> I'm not sure about other architectures being able to configure the perf interrupt
> as NMI or a regular interrupt, I'll try to find out. Regardless of what the other
> architecture do, I am of the opinion that we should spell out explicitly when the
> PMU is using pseudo-NMIs, because it makes a huge difference in the accuracy of
> the instrumentation and the overall usefulness of perf.
>
> If I spin a v7 quickly, is it still time to merge the series for 5.10?
I'm on holiday for the rest of the week, but please post something when you
have it and I'll queue it if I manage to get to it.
Will
Hi,
On 9/21/20 4:41 PM, Alexandru Elisei wrote:
> Hi Will,
>
> Thank you so much for reviewing the series!
>
> On 9/21/20 2:59 PM, Will Deacon wrote:
>> On Wed, Aug 19, 2020 at 02:34:12PM +0100, Alexandru Elisei wrote:
>>> The series makes the arm_pmu driver use NMIs for the perf interrupt when
>>> NMIs are available on the platform (currently, only arm64 + GICv3). To make
>>> it easier to play with the patches, I've pushed a branch at [1]:
>> This mostly looks good to me, but see some of the comments I left on the
>> code. One other thing I'm not sure about is whether or not we should tell
>> userspace that we're using an NMI for the sampling. Do any other
>> architectures have a conditional NMI?
> I'm not sure about other architectures being able to configure the perf interrupt
> as NMI or a regular interrupt, I'll try to find out. Regardless of what the other
> architecture do, I am of the opinion that we should spell out explicitly when the
> PMU is using pseudo-NMIs, because it makes a huge difference in the accuracy of
> the instrumentation and the overall usefulness of perf.
Coming back to this, looked at what other architectures are doing by grepping for
perf_pmu_register() and going from there, results below. I've found xtensa to
allow both regular IRQs and NMIs for PMU, based on a kernel config option (just
like arm64). However, the description for the config option states clearly the the
PMU IRQ will be an IRQ, while we don't have that for arm64 - the IRQ will be an
NMI automatically if the GIC is configured to use pseudo-NMIs. I think displaying
a message is the right thing to do, I'll do that for v7.
PMU IRQs for other architectures:
* alpha - PMU interrupt is always IRQ.
* arc - optional PMU interrupt; when present it's requested with
request_percpu_irq(); it prints to dmesg when overflow IRQ support has been detected.
* arm - no NMIs.
* c6x - seems like it doesn't have a PMU at all.
* csky - PMU interrupts is always IRQ.
* h8300 - seems like it doesn't have a PMU at all.
* hexagon - seems like it doesn't have a PMU at all.
* ia64 - perfmon interrupt is registered with register_percpu_irq(); it prints the
IRQ number.
* m64k - couldn't find anything resembling a PMU.
* microblaze - seems like it doesn't have a PMU.
* mips - regular IRQ; irq number and if it's shared with the timer interrupt is
printed.
* nds32 - regular IRQ; doesn't print anything regarding IRQ number.
* nios2 - seems like it doesn't have a PMU.
* openrisc - no PMU.
* parisc - no PMU IRQ, free-running counters?
* powerpc - no IRQ for IMC, hv_24x7 and hv_gpci PMUs; looks like for powerpc64,
the PMU interrupt is treated like an NMI if it is taken when interrupts are
"soft-masked", for powerpc32 it's always a regular interrupt; no information
displayed about the interrupt.
* riscv - they use regular IRQs only when multiplexing events; I haven't found any
information displayed about the PMU.
* s390 - no IRQ for cpum_cf_diag and cpm_cf; regular IRQ for cpum_sf; no dmesg output.
* sh - no IRQ.
* sparc - looks like it's always NMI; no information about IRQ is displayed.
* um - no PMU.
* x86 - the PMU interrupt is always a NMI, the lapic is configured to deliver the
PMI as an NMI (in arch/x86/events/core.c::perf_events_lapic_init()). Nothing about
interrupts printed in init_hw_perf_events();
* xtensa - the interrupt can be configurated as an NMI (EXTENSA_FAKE_NMI), but no
information about it is displayed.
Thanks,
Alex