2022-04-12 21:56:15

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated

get_stack_info() detects stack type only by begin address, so we must
check that address range in question is fully covered by detected stack

Otherwise following crash is possible:
-> unwind_next_frame
case ORC_TYPE_REGS:
if (!deref_stack_regs(state, sp, &state->ip, &state->sp))
-> deref_stack_regs
-> stack_access_ok <- here addr is inside stack range, but addr+len-1 is not, but we still exit with success
*ip = READ_ONCE_NOCHECK(regs->ip); <- Here we hit stack guard fault
OOPS LOG:
<0>[ 1941.845743] BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)
<4>[ 1941.845744] kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
<4>[ 1941.845744] CPU: 93 PID: 23787 Comm: context_switch1 Not tainted 5.4.145 #1
<4>[ 1941.845745] Hardware name: XXXXXXXXXXXXXX
<4>[ 1941.845746] RIP: 0010:unwind_next_frame+0x311/0x5b0
<4>[ 1941.845746] Code: 01 0f 84 f6 01 00 00 0f 83 cc fe ff ff e9 f9 fe ff ff ba a8 00 00 00 4c 89 fe 48 89 df e8 57 fa ff ff 84 c0 0f 84 d9 00 00 00 <49> 8b 87 80 00 00 00 48 89 43 48 49 8b 87 98 00 00 00 4c 89 7b 50
<4>[ 1941.845747] RSP: 0018:fffffe00012908f0 EFLAGS: 00010002
<4>[ 1941.845748] RAX: 0000000000000001 RBX: fffffe0001290930 RCX: 0000000000000001
<4>[ 1941.845748] RDX: 0000000000000002 RSI: ffff893d94719e80 RDI: ffffc9001b9d7fc8
<4>[ 1941.845748] RBP: 0000000000000004 R08: ffffffff81a009bf R09: ffffffff824c7c20
<4>[ 1941.845749] R10: ffffffff824c7c1c R11: 0000000000000023 R12: ffffffff826c9cde
<4>[ 1941.845749] R13: ffffffff81a009d2 R14: fffffe0001289ff0 R15: ffffc9001b9d7fc8
<4>[ 1941.845749] FS: 00007f57ff9aa700(0000) GS:ffff893f6f140000(0000) knlGS:0000000000000000
<4>[ 1941.845750] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 1941.845750] CR2: ffffc9001b9d8048 CR3: 000000bcd3936000 CR4: 0000000000340ee0
<4>[ 1941.845750] Call Trace:
<4>[ 1941.845750] <NMI>
<4>[ 1941.845751] perf_callchain_kernel+0x12b/0x140
<4>[ 1941.845751] ? interrupt_entry+0xb2/0xc0
<4>[ 1941.845751] get_perf_callchain+0x10d/0x280
<4>[ 1941.845751] perf_callchain+0x6e/0x80
<4>[ 1941.845752] perf_prepare_sample+0x87/0x540
<4>[ 1941.845752] perf_event_output_forward+0x31/0x90
<4>[ 1941.845752] ? sched_clock+0x5/0x10
<4>[ 1941.845752] ? sched_clock_cpu+0xc/0xb0
<4>[ 1941.845753] ? arch_perf_update_userpage+0xd0/0xe0
<4>[ 1941.845753] ? sched_clock+0x5/0x10
<4>[ 1941.845753] ? sched_clock_cpu+0xc/0xb0
<4>[ 1941.845753] __perf_event_overflow+0x5a/0xf0
<4>[ 1941.845754] perf_ibs_handle_irq+0x340/0x5b0
<4>[ 1941.845754] ? interrupt_entry+0xb2/0xc0
<4>[ 1941.845754] ? interrupt_entry+0xb2/0xc0
<4>[ 1941.845754] ? __set_pte_vaddr+0x28/0x40
<4>[ 1941.845755] ? __set_pte_vaddr+0x28/0x40
<4>[ 1941.845755] ? set_pte_vaddr+0x3c/0x60
<4>[ 1941.845755] ? __native_set_fixmap+0x24/0x30
<4>[ 1941.845755] ? native_set_fixmap+0x3c/0xb0
<4>[ 1941.845756] ? ghes_copy_tofrom_phys+0x98/0x130
<4>[ 1941.845756] ? interrupt_entry+0xb2/0xc0
<4>[ 1941.845756] ? __ghes_peek_estatus.isra.16+0x49/0xb0
<4>[ 1941.845756] ? perf_ibs_nmi_handler+0x34/0x60
<4>[ 1941.845757] ? sched_clock+0x5/0x10
<4>[ 1941.845757] perf_ibs_nmi_handler+0x34/0x60
<4>[ 1941.845757] nmi_handle+0x79/0x190
<4>[ 1941.845757] default_do_nmi+0x3e/0x110
<4>[ 1941.845758] do_nmi+0x18d/0x1e0
<4>[ 1941.845758] end_repeat_nmi+0x16/0x50
<4>[ 1941.845758] RIP: 0010:syscall_return_via_sysret+0x26/0x7f
<4>[ 1941.845759] Code: 3e 09 00 00 41 5f 41 5e 41 5d 41 5c 5d 5b 5e 41 5a 41 59 41 58 58 5e 5a 5e 48 89 e7 65 48 8b 24 25 04 60 00 00 ff 77 28 ff 37 <50> eb 43 0f 20 df eb 34 48 89 f8 48 81 e7 ff 07 00 00 65 48 0f a3
<4>[ 1941.845759] RSP: 0018:fffffe0001289ff0 EFLAGS: 00000046
<4>[ 1941.845760] RAX: 0000000000000001 RBX: 0000559a21ff71d0 RCX: 00007f57ff286260
<4>[ 1941.845760] RDX: 0000000000000001 RSI: 00007ffe43dc3d17 RDI: ffffc9001b9d7fc8
<4>[ 1941.845760] RBP: 00007ffe43dc3d17 R08: 0000000000005ceb R09: 00007f57ff9aa700
<4>[ 1941.845761] R10: 00007f57ff9aa9d0 R11: 0000000000000246 R12: 00007f57ff9aa698
<4>[ 1941.845761] R13: 00007f57ff9b5d00 R14: 0000559a1d5bb070 R15: 0000000000000005
<4>[ 1941.845761] ? syscall_return_via_sysret+0x26/0x7f
<4>[ 1941.845761] ? syscall_return_via_sysret+0x26/0x7f
<4>[ 1941.845762] </NMI>
<4>[ 1941.845762] <ENTRY_TRAMPOLINE>
<4>[ 1941.845762] </ENTRY_TRAMPOLINE>
<4>[ 1941.845762] Modules linked in: unix_diag ip6table_filter sch_fq_codel tcp_diag inet_diag act_police act_gact cls_u32 sch_ingress netconsole configfs 8021q garp stp mrp llc amd64_edac_mod edac_mce_amd ghash_clmulni_intel ipmi_ssif ipmi_si ipmi_devintf ipmi_msghandler sp5100_tco i2c_piix4 k10temp tcp_htcp tcp_bbr kvm_amd ccp kvm irqbypass mpls_gso mpls_iptunnel mpls_router fou6 fou ip_tunnel ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core dummy msr ip6_tables ip_tables x_tables ip6_tunnel tunnel6 nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast drm_vram_helper i2c_algo_bit mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw ttm pci_hyperv_intf tls drm

Signed-off-by: Dmitry Monakhov <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 794fdef2501a..80b878772b86 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -343,7 +343,9 @@ static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
(get_stack_info(addr, state->task, info, &state->stack_mask)))
return false;

- return true;
+ /* Recheck range after stack info was updated */
+ return on_stack(info, addr, len);
+
}

static bool deref_stack_reg(struct unwind_state *state, unsigned long addr,
--
2.7.4


2022-04-13 00:10:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated

On Tue, Apr 12, 2022 at 10:40:03AM +0300, Dmitry Monakhov wrote:
> get_stack_info() detects stack type only by begin address, so we must
> check that address range in question is fully covered by detected stack
>
> Otherwise following crash is possible:
> -> unwind_next_frame
> case ORC_TYPE_REGS:
> if (!deref_stack_regs(state, sp, &state->ip, &state->sp))
> -> deref_stack_regs
> -> stack_access_ok <- here addr is inside stack range, but addr+len-1 is not, but we still exit with success
> *ip = READ_ONCE_NOCHECK(regs->ip); <- Here we hit stack guard fault
> OOPS LOG:
> <0>[ 1941.845743] BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)


> <4>[ 1941.845751] get_perf_callchain+0x10d/0x280
> <4>[ 1941.845751] perf_callchain+0x6e/0x80
> <4>[ 1941.845752] perf_prepare_sample+0x87/0x540
> <4>[ 1941.845752] perf_event_output_forward+0x31/0x90
> <4>[ 1941.845753] __perf_event_overflow+0x5a/0xf0
> <4>[ 1941.845754] perf_ibs_handle_irq+0x340/0x5b0
> <4>[ 1941.845757] perf_ibs_nmi_handler+0x34/0x60
> <4>[ 1941.845757] nmi_handle+0x79/0x190

Urgh, this is another instance of trying to unwind an IP that no longer
matches the stack.

Fixing the unwinder bug is good, but arguable we should also fix this
IBS stuff, see 6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")

2022-04-16 02:50:44

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated

On Tue, Apr 12, 2022 at 10:40:03AM +0300, Dmitry Monakhov wrote:
> get_stack_info() detects stack type only by begin address, so we must
> check that address range in question is fully covered by detected stack
>
> Otherwise following crash is possible:
> -> unwind_next_frame
> case ORC_TYPE_REGS:
> if (!deref_stack_regs(state, sp, &state->ip, &state->sp))
> -> deref_stack_regs
> -> stack_access_ok <- here addr is inside stack range, but addr+len-1 is not, but we still exit with success
> *ip = READ_ONCE_NOCHECK(regs->ip); <- Here we hit stack guard fault

Hi Dmitry,

Thanks for the patch.

As Peter mentioned, the root cause of the crash was that the stack was
changing while the unwinder was reading it. But the patch is still
valid since the unwinder needs to be paranoid.

The commit log should have that background.

> OOPS LOG:
> <0>[ 1941.845743] BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)
> <4>[ 1941.845744] kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
> <4>[ 1941.845744] CPU: 93 PID: 23787 Comm: context_switch1 Not tainted 5.4.145 #1
> <4>[ 1941.845745] Hardware name: XXXXXXXXXXXXXX
> <4>[ 1941.845746] RIP: 0010:unwind_next_frame+0x311/0x5b0
> <4>[ 1941.845746] Code: 01 0f 84 f6 01 00 00 0f 83 cc fe ff ff e9 f9 fe ff ff ba a8 00 00 00 4c 89 fe 48 89 df e8 57 fa ff ff 84 c0 0f 84 d9 00 00 00 <49> 8b 87 80 00 00 00 48 89 43 48 49 8b 87 98 00 00 00 4c 89 7b 50
> <4>[ 1941.845747] RSP: 0018:fffffe00012908f0 EFLAGS: 00010002
> <4>[ 1941.845748] RAX: 0000000000000001 RBX: fffffe0001290930 RCX: 0000000000000001
> <4>[ 1941.845748] RDX: 0000000000000002 RSI: ffff893d94719e80 RDI: ffffc9001b9d7fc8
> <4>[ 1941.845748] RBP: 0000000000000004 R08: ffffffff81a009bf R09: ffffffff824c7c20
> <4>[ 1941.845749] R10: ffffffff824c7c1c R11: 0000000000000023 R12: ffffffff826c9cde
> <4>[ 1941.845749] R13: ffffffff81a009d2 R14: fffffe0001289ff0 R15: ffffc9001b9d7fc8
> <4>[ 1941.845749] FS: 00007f57ff9aa700(0000) GS:ffff893f6f140000(0000) knlGS:0000000000000000
> <4>[ 1941.845750] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[ 1941.845750] CR2: ffffc9001b9d8048 CR3: 000000bcd3936000 CR4: 0000000000340ee0
> <4>[ 1941.845750] Call Trace:
> <4>[ 1941.845750] <NMI>
> <4>[ 1941.845751] perf_callchain_kernel+0x12b/0x140
> <4>[ 1941.845751] ? interrupt_entry+0xb2/0xc0
> <4>[ 1941.845751] get_perf_callchain+0x10d/0x280
> <4>[ 1941.845751] perf_callchain+0x6e/0x80
> <4>[ 1941.845752] perf_prepare_sample+0x87/0x540
> <4>[ 1941.845752] perf_event_output_forward+0x31/0x90
> <4>[ 1941.845752] ? sched_clock+0x5/0x10
> <4>[ 1941.845752] ? sched_clock_cpu+0xc/0xb0
> <4>[ 1941.845753] ? arch_perf_update_userpage+0xd0/0xe0
> <4>[ 1941.845753] ? sched_clock+0x5/0x10
> <4>[ 1941.845753] ? sched_clock_cpu+0xc/0xb0
> <4>[ 1941.845753] __perf_event_overflow+0x5a/0xf0
> <4>[ 1941.845754] perf_ibs_handle_irq+0x340/0x5b0
> <4>[ 1941.845754] ? interrupt_entry+0xb2/0xc0
> <4>[ 1941.845754] ? interrupt_entry+0xb2/0xc0
> <4>[ 1941.845754] ? __set_pte_vaddr+0x28/0x40
> <4>[ 1941.845755] ? __set_pte_vaddr+0x28/0x40
> <4>[ 1941.845755] ? set_pte_vaddr+0x3c/0x60
> <4>[ 1941.845755] ? __native_set_fixmap+0x24/0x30
> <4>[ 1941.845755] ? native_set_fixmap+0x3c/0xb0
> <4>[ 1941.845756] ? ghes_copy_tofrom_phys+0x98/0x130
> <4>[ 1941.845756] ? interrupt_entry+0xb2/0xc0
> <4>[ 1941.845756] ? __ghes_peek_estatus.isra.16+0x49/0xb0
> <4>[ 1941.845756] ? perf_ibs_nmi_handler+0x34/0x60
> <4>[ 1941.845757] ? sched_clock+0x5/0x10
> <4>[ 1941.845757] perf_ibs_nmi_handler+0x34/0x60
> <4>[ 1941.845757] nmi_handle+0x79/0x190
> <4>[ 1941.845757] default_do_nmi+0x3e/0x110
> <4>[ 1941.845758] do_nmi+0x18d/0x1e0
> <4>[ 1941.845758] end_repeat_nmi+0x16/0x50
> <4>[ 1941.845758] RIP: 0010:syscall_return_via_sysret+0x26/0x7f
> <4>[ 1941.845759] Code: 3e 09 00 00 41 5f 41 5e 41 5d 41 5c 5d 5b 5e 41 5a 41 59 41 58 58 5e 5a 5e 48 89 e7 65 48 8b 24 25 04 60 00 00 ff 77 28 ff 37 <50> eb 43 0f 20 df eb 34 48 89 f8 48 81 e7 ff 07 00 00 65 48 0f a3
> <4>[ 1941.845759] RSP: 0018:fffffe0001289ff0 EFLAGS: 00000046
> <4>[ 1941.845760] RAX: 0000000000000001 RBX: 0000559a21ff71d0 RCX: 00007f57ff286260
> <4>[ 1941.845760] RDX: 0000000000000001 RSI: 00007ffe43dc3d17 RDI: ffffc9001b9d7fc8
> <4>[ 1941.845760] RBP: 00007ffe43dc3d17 R08: 0000000000005ceb R09: 00007f57ff9aa700
> <4>[ 1941.845761] R10: 00007f57ff9aa9d0 R11: 0000000000000246 R12: 00007f57ff9aa698
> <4>[ 1941.845761] R13: 00007f57ff9b5d00 R14: 0000559a1d5bb070 R15: 0000000000000005
> <4>[ 1941.845761] ? syscall_return_via_sysret+0x26/0x7f
> <4>[ 1941.845761] ? syscall_return_via_sysret+0x26/0x7f
> <4>[ 1941.845762] </NMI>
> <4>[ 1941.845762] <ENTRY_TRAMPOLINE>
> <4>[ 1941.845762] </ENTRY_TRAMPOLINE>
> <4>[ 1941.845762] Modules linked in: unix_diag ip6table_filter sch_fq_codel tcp_diag inet_diag act_police act_gact cls_u32 sch_ingress netconsole configfs 8021q garp stp mrp llc amd64_edac_mod edac_mce_amd ghash_clmulni_intel ipmi_ssif ipmi_si ipmi_devintf ipmi_msghandler sp5100_tco i2c_piix4 k10temp tcp_htcp tcp_bbr kvm_amd ccp kvm irqbypass mpls_gso mpls_iptunnel mpls_router fou6 fou ip_tunnel ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core dummy msr ip6_tables ip_tables x_tables ip6_tunnel tunnel6 nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast drm_vram_helper i2c_algo_bit mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw ttm pci_hyperv_intf tls drm

This oops has way too much detail for the commit log and can be trimmed
down a lot. See for example the backtraces section in
Documentation/process/submitting-patches.rst.

> Signed-off-by: Dmitry Monakhov <[email protected]>
> ---
> arch/x86/kernel/unwind_orc.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 794fdef2501a..80b878772b86 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -343,7 +343,9 @@ static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
> (get_stack_info(addr, state->task, info, &state->stack_mask)))
> return false;
>
> - return true;
> + /* Recheck range after stack info was updated */
> + return on_stack(info, addr, len);
> +
> }

This punishes the typical case where the first on_stack() call
succeeded. It only needs to be done if the first on_stack() failed.

--
Josh

2022-04-16 02:50:46

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/unwind/orc: recheck address range after stack info was updated

On Tue, Apr 12, 2022 at 12:08:37PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 12, 2022 at 10:40:03AM +0300, Dmitry Monakhov wrote:
> > get_stack_info() detects stack type only by begin address, so we must
> > check that address range in question is fully covered by detected stack
> >
> > Otherwise following crash is possible:
> > -> unwind_next_frame
> > case ORC_TYPE_REGS:
> > if (!deref_stack_regs(state, sp, &state->ip, &state->sp))
> > -> deref_stack_regs
> > -> stack_access_ok <- here addr is inside stack range, but addr+len-1 is not, but we still exit with success
> > *ip = READ_ONCE_NOCHECK(regs->ip); <- Here we hit stack guard fault
> > OOPS LOG:
> > <0>[ 1941.845743] BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0)
>
>
> > <4>[ 1941.845751] get_perf_callchain+0x10d/0x280
> > <4>[ 1941.845751] perf_callchain+0x6e/0x80
> > <4>[ 1941.845752] perf_prepare_sample+0x87/0x540
> > <4>[ 1941.845752] perf_event_output_forward+0x31/0x90
> > <4>[ 1941.845753] __perf_event_overflow+0x5a/0xf0
> > <4>[ 1941.845754] perf_ibs_handle_irq+0x340/0x5b0
> > <4>[ 1941.845757] perf_ibs_nmi_handler+0x34/0x60
> > <4>[ 1941.845757] nmi_handle+0x79/0x190
>
> Urgh, this is another instance of trying to unwind an IP that no longer
> matches the stack.
>
> Fixing the unwinder bug is good, but arguable we should also fix this
> IBS stuff, see 6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)")

I remember that nastiness well. So it's still broken? Or is this a
regression? Maybe we wouldn't notice it except for this triggered
unwinder bug?

--
Josh

2022-04-29 22:33:12

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding

IbsOpRip is recorded when IBS interrupt is triggered. But there is
a skid from the time IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile processor would have
moved ahead and thus IbsOpRip will be inconsistent with rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding stack using the ORC unwinder as it needs consistent rip,
rsp and rbp. Fix this by using rip from interrupt regs instead of
IbsOpRip for stack unwinding.

Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
---
arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9739019d4b67..171941043f53 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
hwc->config_base = perf_ibs->msr;
hwc->config = config;

+ /*
+ * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+ * recorded as part of interrupt regs. Thus we need to use rip from
+ * interrupt regs while unwinding call stack. Setting _EARLY flag
+ * makes sure we unwind call-stack before perf sample rip is set to
+ * IbsOpRip.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
+
return 0;
}

@@ -687,6 +697,14 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
data.raw = &raw;
}

+ /*
+ * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+ * recorded as part of interrupt regs. Thus we need to use rip from
+ * interrupt regs while unwinding call stack.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ data.callchain = perf_callchain(event, iregs);
+
throttle = perf_event_overflow(event, &data, &regs);
out:
if (throttle) {
--
2.27.0

2022-05-02 13:22:53

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH] perf/amd/ibs: Use interrupt regs ip for stack unwinding

Hello,

On Thu, Apr 28, 2022 at 10:15 PM Ravi Bangoria <[email protected]> wrote:
>
> IbsOpRip is recorded when IBS interrupt is triggered. But there is
> a skid from the time IBS interrupt gets triggered to the time the
> interrupt is presented to the core. Meanwhile processor would have
> moved ahead and thus IbsOpRip will be inconsistent with rsp and rbp
> recorded as part of the interrupt regs. This causes issues while
> unwinding stack using the ORC unwinder as it needs consistent rip,
> rsp and rbp. Fix this by using rip from interrupt regs instead of
> IbsOpRip for stack unwinding.
>
> Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
> Reported-by: Dmitry Monakhov <[email protected]>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Ravi Bangoria <[email protected]>

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung


> ---
> arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 9739019d4b67..171941043f53 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
> hwc->config_base = perf_ibs->msr;
> hwc->config = config;
>
> + /*
> + * rip recorded by IbsOpRip will not be consistent with rsp and rbp
> + * recorded as part of interrupt regs. Thus we need to use rip from
> + * interrupt regs while unwinding call stack. Setting _EARLY flag
> + * makes sure we unwind call-stack before perf sample rip is set to
> + * IbsOpRip.
> + */
> + if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> + event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
> +
> return 0;
> }
>
> @@ -687,6 +697,14 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> data.raw = &raw;
> }
>
> + /*
> + * rip recorded by IbsOpRip will not be consistent with rsp and rbp
> + * recorded as part of interrupt regs. Thus we need to use rip from
> + * interrupt regs while unwinding call stack.
> + */
> + if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> + data.callchain = perf_callchain(event, iregs);
> +
> throttle = perf_event_overflow(event, &data, &regs);
> out:
> if (throttle) {
> --
> 2.27.0
>

2022-05-03 01:05:28

by tip-bot2 for Tony Luck

[permalink] [raw]
Subject: [tip: perf/core] perf/amd/ibs: Use interrupt regs ip for stack unwinding

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 1989455e0196f59741d81601284a6058acfaf225
Gitweb: https://git.kernel.org/tip/1989455e0196f59741d81601284a6058acfaf225
Author: Ravi Bangoria <[email protected]>
AuthorDate: Fri, 29 Apr 2022 10:44:41 +05:30
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 29 Apr 2022 11:06:28 +02:00

perf/amd/ibs: Use interrupt regs ip for stack unwinding

IbsOpRip is recorded when IBS interrupt is triggered. But there is
a skid from the time IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile processor would have
moved ahead and thus IbsOpRip will be inconsistent with rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding stack using the ORC unwinder as it needs consistent rip,
rsp and rbp. Fix this by using rip from interrupt regs instead of
IbsOpRip for stack unwinding.

Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9739019..11e8b49 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
hwc->config_base = perf_ibs->msr;
hwc->config = config;

+ /*
+ * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+ * recorded as part of interrupt regs. Thus we need to use rip from
+ * interrupt regs while unwinding call stack. Setting _EARLY flag
+ * makes sure we unwind call-stack before perf sample rip is set to
+ * IbsOpRip.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
+
return 0;
}

@@ -687,6 +697,14 @@ fail:
data.raw = &raw;
}

+ /*
+ * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+ * recorded as part of interrupt regs. Thus we need to use rip from
+ * interrupt regs while unwinding call stack.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ data.callchain = perf_callchain(event, iregs);
+
throttle = perf_event_overflow(event, &data, &regs);
out:
if (throttle) {

2022-05-05 12:51:12

by tip-bot2 for Tony Luck

[permalink] [raw]
Subject: [tip: perf/core] perf/amd/ibs: Use interrupt regs ip for stack unwinding

The following commit has been merged into the perf/core branch of tip:

Commit-ID: bd24325684029a48f20a188b899eb84900d0bc9c
Gitweb: https://git.kernel.org/tip/bd24325684029a48f20a188b899eb84900d0bc9c
Author: Ravi Bangoria <[email protected]>
AuthorDate: Fri, 29 Apr 2022 10:44:41 +05:30
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 04 May 2022 11:18:27 +02:00

perf/amd/ibs: Use interrupt regs ip for stack unwinding

IbsOpRip is recorded when IBS interrupt is triggered. But there is
a skid from the time IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile processor would have
moved ahead and thus IbsOpRip will be inconsistent with rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding stack using the ORC unwinder as it needs consistent rip,
rsp and rbp. Fix this by using rip from interrupt regs instead of
IbsOpRip for stack unwinding.

Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9739019..11e8b49 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
hwc->config_base = perf_ibs->msr;
hwc->config = config;

+ /*
+ * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+ * recorded as part of interrupt regs. Thus we need to use rip from
+ * interrupt regs while unwinding call stack. Setting _EARLY flag
+ * makes sure we unwind call-stack before perf sample rip is set to
+ * IbsOpRip.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
+
return 0;
}

@@ -687,6 +697,14 @@ fail:
data.raw = &raw;
}

+ /*
+ * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+ * recorded as part of interrupt regs. Thus we need to use rip from
+ * interrupt regs while unwinding call stack.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ data.callchain = perf_callchain(event, iregs);
+
throttle = perf_event_overflow(event, &data, &regs);
out:
if (throttle) {

2022-05-10 20:08:14

by tip-bot2 for Tony Luck

[permalink] [raw]
Subject: [tip: perf/core] perf/amd/ibs: Use interrupt regs ip for stack unwinding

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 3d47083b9ff46863e8374ad3bb5edb5e464c75f8
Gitweb: https://git.kernel.org/tip/3d47083b9ff46863e8374ad3bb5edb5e464c75f8
Author: Ravi Bangoria <[email protected]>
AuthorDate: Fri, 29 Apr 2022 10:44:41 +05:30
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 10 May 2022 11:00:45 +02:00

perf/amd/ibs: Use interrupt regs ip for stack unwinding

IbsOpRip is recorded when IBS interrupt is triggered. But there is
a skid from the time IBS interrupt gets triggered to the time the
interrupt is presented to the core. Meanwhile processor would have
moved ahead and thus IbsOpRip will be inconsistent with rsp and rbp
recorded as part of the interrupt regs. This causes issues while
unwinding stack using the ORC unwinder as it needs consistent rip,
rsp and rbp. Fix this by using rip from interrupt regs instead of
IbsOpRip for stack unwinding.

Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/events/amd/ibs.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 9739019..11e8b49 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -304,6 +304,16 @@ static int perf_ibs_init(struct perf_event *event)
hwc->config_base = perf_ibs->msr;
hwc->config = config;

+ /*
+ * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+ * recorded as part of interrupt regs. Thus we need to use rip from
+ * interrupt regs while unwinding call stack. Setting _EARLY flag
+ * makes sure we unwind call-stack before perf sample rip is set to
+ * IbsOpRip.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
+
return 0;
}

@@ -687,6 +697,14 @@ fail:
data.raw = &raw;
}

+ /*
+ * rip recorded by IbsOpRip will not be consistent with rsp and rbp
+ * recorded as part of interrupt regs. Thus we need to use rip from
+ * interrupt regs while unwinding call stack.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ data.callchain = perf_callchain(event, iregs);
+
throttle = perf_event_overflow(event, &data, &regs);
out:
if (throttle) {