2014-06-05 02:36:51

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

From: Zhang Yanmin <[email protected]>

We hit a kernel panic when running perf to collect some performance data.
kenel is x86_64 and user space apps are 32bit.

[ 71.965351, 1] [ Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 71.965360, 1] [ Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
[ 71.965367, 1] [ Binder_2] PGD 6c65f067 PUD 0
[ 71.965375, 1] [ Binder_2] Oops: 0000 [#1] PREEMPT SMP
[ 71.965413, 1] [ Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
[ 71.965420, 1] [ Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G W O 3.10.20-263902-g184bfbc-dirty #14
[ 71.965426, 1] [ Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
[ 71.965439, 1] [ Binder_2] RIP: 0010:[<ffffffff82012091>] [<ffffffæf82012091>] get_segment_base+0x71/0xc0
[ 71.965<44, 1] [ Binder_2] RSP: 0018:ffff^X8007ea87b98 EFLAGS: 00010092
[ 71.965447, 1] [ !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
[ 71.965450, 1] [ Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[ 71.965454, 1] [ Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
[ 71.965458, 1] [ Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
[ 71.965462, 1] [ Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
[ 71.965468, 1_ [ Binder_2] FS: 0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
[ 71.965472, 1] [ Binder_2] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 71.965476, 1] [ Binder_2] CR2: 0000000000000004 CR3: 0000000076588000 CR4: 00000^P00001007e0
[ 71.965480, 1] [ Binder_2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 71.965485, 1] [ Binder_2] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 71.966141, 1] [ Binder_2] Stack:
[ 71.966152, 1] [ Binder_2] ffff88005f266c00 0000000000000000 ffff88007ea87c18 ffffffff82013cac
[ 71.966161, 1] [ Binder_2] ffff88007ea87d58 00000016fe4704a0 00000000000001a7 ffff88007ea87ef8
[ 71.966171, 1] [ Binder_6] ffff88005f266c00 ffff88007ea87ef8 ffff8800!e07b400 ffff88005f266c00
[ 71.966173, 1] [ Binder_2] Call Trace:
[ 71.966179, 1] [ Binder_2] <NMI>
[ 71.966190, 1] [ Binder_2] [<ffffffff82013cac>] perf_callchain_user+0x15c/0x240
[ 71.966202, 1] [ Binder_2] [<ffffffff82160754>] perf_callchain+0x134/0x180
[ 71.966210, 1] [ Binder_2] [<fffff&ff820e0787>] ? local_clock+0x47/0x60
[ 71.966221, 1] [ Binder_2] [<ffffffff8215d49b>] perf_prepare_sample+0x1bb/0x240
[ 71.966231, 1] [ Binder_2] [<ffffffff8215d667>] __perf_event_overflow+0x147/0x230
[ 71.966241, 1] [ Binder_2] [<ffffffff82012f68>] ? x86_perf_event_set_period+0xd8/0x150
[ 71.966252, 1] [ Binder_2] [<ffffffff8215df24>] perf_event_overflow+0x14/0x20
[ 71.966260, 1] [ Binder_2] [<ffffffff820194d2>] intel_pmu_handle_irq+0x1c2/0x270
[ 71.966270, 1] [ Binder_2] [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[ 71.966284, 1] [ Binder_2]$ [<ffffffff828aff01>] perf_event_nmi_handler+0x21/0x30
[ 71.966293, 1] [ Binder_2] [<ffffffff828af5b9>] nmi_handle.isr!.1+0x59/0x=0
[ 71.966303, 1] [` Binder_2] [<ffffffff828af6d8>] default_do_nmi+0x58/0x240
[ 71.966312, 1] [ " Binder_2] [<ffffffff828af978>] do_nmi+0xb8/0xf0
[ 71.966321, 1] [ Binder_2] [|ffffffgf828aebe7>] end_repeat_nmi+0x1e/0x2e
[ 71.966332, 1] [ Binder_2] [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[ 71.966341, 1] [ Binder_2] [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[ 71.966350, 1] [ Binder_2] [<fFffffff828b5d60>] ? call_softirq+0x30/0x30

Basically, ia32 uses sysenter to start system calls.

sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.

The patch fixes it by moving the calling to trace_hardirqs_on_thunk ahead of
the stack popup.

Change-Id: I6c4fc46b009ea056f2321ce5b8f54cf8769a7bdd
Signed-off-by: Zhang Yanmin <[email protected]>
---
arch/x86/ia32/ia32entry.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb0..df61fdb 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -167,6 +167,7 @@ sysenter_dispatch:
testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
jnz sysexit_audit
sysexit_from_sys_call:
+ TRACE_IRQS_ON
andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
/* clear IF, that popfq doesn't enable interrupts early */
andl $~0x200,EFLAGS-R11(%rsp)
@@ -181,7 +182,6 @@ sysexit_from_sys_call:
/*CFI_RESTORE rflags*/
popq_cfi %rcx /* User %esp */
CFI_REGISTER rsp,rcx
- TRACE_IRQS_ON
ENABLE_INTERRUPTS_SYSEXIT32

#ifdef CONFIG_AUDITSYSCALL
--
1.8.3.2


2014-06-05 07:19:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:
> From: Zhang Yanmin <[email protected]>
>
> We hit a kernel panic when running perf to collect some performance data.
> kenel is x86_64 and user space apps are 32bit.
>
> [ 71.965351, 1] [ Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> [ 71.965360, 1] [ Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
> [ 71.965367, 1] [ Binder_2] PGD 6c65f067 PUD 0
> [ 71.965375, 1] [ Binder_2] Oops: 0000 [#1] PREEMPT SMP
> [ 71.965413, 1] [ Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
> [ 71.965420, 1] [ Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G W O 3.10.20-263902-g184bfbc-dirty #14
> [ 71.965426, 1] [ Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
> [ 71.965439, 1] [ Binder_2] RIP: 0010:[<ffffffff82012091>] [<ffffff?f82012091>] get_segment_base+0x71/0xc0
^
> [ 71.965<44, 1] [ Binder_2] RSP: 0018:ffff^X8007ea87b98 EFLAGS: 00010092
^ ^
> [ 71.965447, 1] [ !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
^
> [ 71.965450, 1] [ Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
> [ 71.965454, 1] [ Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
> [ 71.965458, 1] [ Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
> [ 71.965462, 1] [ Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
> [ 71.965468, 1_ [ Binder_2] FS: 0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
^

Are you suffering some serious corruption?

> Basically, ia32 uses sysenter to start system calls.
>
> sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
> sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
> would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.
>
> The patch fixes it by moving the calling to trace_hardirqs_on_thunk ahead of
> the stack popup.
>
> Change-Id: I6c4fc46b009ea056f2321ce5b8f54cf8769a7bdd

No idea what that is, but it needs to go.

I'll leave the actual patch to hpa, this isn't something I'm too
familiar with.

> Signed-off-by: Zhang Yanmin <[email protected]>
> ---
> arch/x86/ia32/ia32entry.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 4299eb0..df61fdb 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -167,6 +167,7 @@ sysenter_dispatch:
> testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> jnz sysexit_audit
> sysexit_from_sys_call:
> + TRACE_IRQS_ON
> andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> /* clear IF, that popfq doesn't enable interrupts early */
> andl $~0x200,EFLAGS-R11(%rsp) @@ -181,7 +182,6 @@ sysexit_from_sys_call:
> /*CFI_RESTORE rflags*/
> popq_cfi %rcx /* User %esp */
> CFI_REGISTER rsp,rcx
> - TRACE_IRQS_ON
> ENABLE_INTERRUPTS_SYSEXIT32
> #ifdef CONFIG_AUDITSYSCALL
> --
> 1.8.3.2
>


Attachments:
(No filename) (3.55 kB)
(No filename) (836.00 B)
Download all attachments

2014-06-05 07:34:02

by Liu ShuoX

[permalink] [raw]
Subject: Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

On Thu 5.Jun'14 at 9:19:19 +0200, Peter Zijlstra wrote:
>On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:
>> From: Zhang Yanmin <[email protected]>
>>
>> We hit a kernel panic when running perf to collect some performance data.
>> kenel is x86_64 and user space apps are 32bit.
>>
>> [ 71.965351, 1] [ Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> [ 71.965360, 1] [ Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
>> [ 71.965367, 1] [ Binder_2] PGD 6c65f067 PUD 0
>> [ 71.965375, 1] [ Binder_2] Oops: 0000 [#1] PREEMPT SMP
>> [ 71.965413, 1] [ Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
>> [ 71.965420, 1] [ Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G W O 3.10.20-263902-g184bfbc-dirty #14
>> [ 71.965426, 1] [ Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
>> [ 71.965439, 1] [ Binder_2] RIP: 0010:[<ffffffff82012091>] [<ffffff?f82012091>] get_segment_base+0x71/0xc0
> ^
>> [ 71.965<44, 1] [ Binder_2] RSP: 0018:ffff^X8007ea87b98 EFLAGS: 00010092
> ^ ^
>> [ 71.965447, 1] [ !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
> ^
>> [ 71.965450, 1] [ Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
>> [ 71.965454, 1] [ Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
>> [ 71.965458, 1] [ Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
>> [ 71.965462, 1] [ Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
>> [ 71.965468, 1_ [ Binder_2] FS: 0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
> ^
>
>Are you suffering some serious corruption?
The log captured by pstore after rebooting, so there are some
corruption. Please ignore those.
>
>> Basically, ia32 uses sysenter to start system calls.
>>
>> sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
>> sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
>> would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.
>>
>> The patch fixes it by moving the calling to trace_hardirqs_on_thunk ahead of
>> the stack popup.
>>
>> Change-Id: I6c4fc46b009ea056f2321ce5b8f54cf8769a7bdd
>
>No idea what that is, but it needs to go.
>
>I'll leave the actual patch to hpa, this isn't something I'm too
>familiar with.
Thanks. I will send out v2. We changed the code in v2.

2014-06-05 07:40:49

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v2] perf: fix kernel panic when parsing user space CS saved in pt_regs

From: Zhang Yanmin <[email protected]>

ChangeLog V2: Before sysexit, perf NMI might arrive. There is
still a race. Here we change rsp to keep it pointing
to pt_regs->orig_ax.
In addition, after sti, before sysexit, an irq might
arrives. That causes more chances for perf NMI to jump
in.

We hit a kernel panic when running perf to collect some performance data.
kenel is x86_64 and user space apps are 32bit.

[71.965351, 1] [Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[71.965360, 1] [Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
[71.965367, 1] [Binder_2] PGD 6c65f067 PUD 0
[71.965375, 1] [Binder_2] Oops: 0000 [#1] PREEMPT SMP
[71.965413, 1] [Binder_2] Modules linked in: <...>
[71.965420, 1] [Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G W O 3.10.20-263902-g184bfbc-dirty #14
[71.965426, 1] [Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
[71.965439, 1] [Binder_2] RIP: 0010:[<ffffffff82012091>] [<ffffffæf82012091>] get_segment_base+0x71/0xc0
[71.965<44, 1] [Binder_2] RSP: 0018:ffff^X8007ea87b98 EFLAGS: 00010092
[71.965447, 1] [Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
[71.965450, 1] [Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[71.965454, 1] [Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
[71.965458, 1] [Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
[71.965462, 1] [Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
[71.965468, 1_ [Binder_2] FS: 0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
[71.965472, 1] [Binder_2] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[71.965476, 1] [Binder_2] CR2: 0000000000000004 CR3: 0000000076588000 CR4: 00000^P00001007e0
[71.965480, 1] [Binder_2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[71.965485, 1] [Binder_2] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[71.966141, 1] [Binder_2] Stack:
[71.966152, 1] [Binder_2] ffff88005f266c00 0000000000000000 ffff88007ea87c18 ffffffff82013cac
[71.966161, 1] [Binder_2] ffff88007ea87d58 00000016fe4704a0 00000000000001a7 ffff88007ea87ef8
[71.966171, 1] [Binder_6] ffff88005f266c00 ffff88007ea87ef8 ffff8800!e07b400 ffff88005f266c00
[71.966173, 1] [Binder_2] Call Trace:
[71.966179, 1] [Binder_2] <NMI>
[71.966190, 1] [Binder_2] [<ffffffff82013cac>] perf_callchain_user+0x15c/0x240
[71.966202, 1] [Binder_2] [<ffffffff82160754>] perf_callchain+0x134/0x180
[71.966210, 1] [Binder_2] [<fffff&ff820e0787>] ? local_clock+0x47/0x60
[71.966221, 1] [Binder_2] [<ffffffff8215d49b>] perf_prepare_sample+0x1bb/0x240
[71.966231, 1] [Binder_2] [<ffffffff8215d667>] __perf_event_overflow+0x147/0x230
[71.966241, 1] [Binder_2] [<ffffffff82012f68>] ? x86_perf_event_set_period+0xd8/0x150
[71.966252, 1] [Binder_2] [<ffffffff8215df24>] perf_event_overflow+0x14/0x20
[71.966260, 1] [Binder_2] [<ffffffff820194d2>] intel_pmu_handle_irq+0x1c2/0x270
[71.966270, 1] [Binder_2] [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[71.966284, 1] [Binder_2]$ [<ffffffff828aff01>] perf_event_nmi_handler+0x21/0x30
[71.966293, 1] [Binder_2] [<ffffffff828af5b9>] nmi_handle.isr!.1+0x59/0x=0
[71.966303, 1] [Binder_2] [<ffffffff828af6d8>] default_do_nmi+0x58/0x240
[71.966312, 1] [Binder_2] [<ffffffff828af978>] do_nmi+0xb8/0xf0
[71.966321, 1] [Binder_2] [|ffffffgf828aebe7>] end_repeat_nmi+0x1e/0x2e
[71.966332, 1] [Binder_2] [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[71.966341, 1] [Binder_2] [<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[71.966350, 1] [Binder_2] [<fFffffff828b5d60>] ? call_softirq+0x30/0x30

Basically, ia32 uses sysenter to start system calls.

sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.

The patch fixes it by keeping rsp pointing under pt_regs->ip.

Signed-off-by: Zhang Yanmin <[email protected]>
Signed-off-by: Liu ShuoX <[email protected]>
---
arch/x86/ia32/ia32entry.S | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb0..a695d8e 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -172,15 +172,16 @@ sysexit_from_sys_call:
andl $~0x200,EFLAGS-R11(%rsp)
movl RIP-R11(%rsp),%edx /* User %eip */
CFI_REGISTER rip,rdx
- RESTORE_ARGS 0,24,0,0,0,0
- xorq %r8,%r8
+ RESTORE_ARGS 0,0,0,0,0,0
+ movq 24(%rsp),%r8 /* rflags */
+ movq 32(%rsp),%rcx /* User %esp */
xorq %r9,%r9
xorq %r10,%r10
xorq %r11,%r11
- popfq_cfi
+ pushq_cfi %r8
/*CFI_RESTORE rflags*/
- popq_cfi %rcx /* User %esp */
- CFI_REGISTER rsp,rcx
+ popfq_cfi
+ xorq %r8,%r8
TRACE_IRQS_ON
ENABLE_INTERRUPTS_SYSEXIT32

--
1.8.3.2

2014-06-05 07:55:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

On Thu, Jun 05, 2014 at 03:33:21PM +0800, Liu ShuoX wrote:
> On Thu 5.Jun'14 at 9:19:19 +0200, Peter Zijlstra wrote:
> >On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:
> >>From: Zhang Yanmin <[email protected]>
> >>
> >>We hit a kernel panic when running perf to collect some performance data.
> >>kenel is x86_64 and user space apps are 32bit.
> >>
> >>[ 71.965351, 1] [ Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> >>[ 71.965360, 1] [ Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
> >>[ 71.965367, 1] [ Binder_2] PGD 6c65f067 PUD 0
> >>[ 71.965375, 1] [ Binder_2] Oops: 0000 [#1] PREEMPT SMP
> >>[ 71.965413, 1] [ Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
> >>[ 71.965420, 1] [ Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G W O 3.10.20-263902-g184bfbc-dirty #14
> >>[ 71.965426, 1] [ Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
> >>[ 71.965439, 1] [ Binder_2] RIP: 0010:[<ffffffff82012091>] [<ffffff?f82012091>] get_segment_base+0x71/0xc0
> > ^
> >>[ 71.965<44, 1] [ Binder_2] RSP: 0018:ffff^X8007ea87b98 EFLAGS: 00010092
> > ^ ^
> >>[ 71.965447, 1] [ !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
> > ^
> >>[ 71.965450, 1] [ Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
> >>[ 71.965454, 1] [ Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
> >>[ 71.965458, 1] [ Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
> >>[ 71.965462, 1] [ Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
> >>[ 71.965468, 1_ [ Binder_2] FS: 0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
> > ^
> >
> >Are you suffering some serious corruption?
> The log captured by pstore after rebooting, so there are some
> corruption. Please ignore those.

Why does pstore cause corruption? I thought that stuff was supposed to
be 'good' ?


Attachments:
(No filename) (2.47 kB)
(No filename) (836.00 B)
Download all attachments

2014-06-05 08:00:29

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

On 2014/6/5 15:55, Peter Zijlstra wrote:
> On Thu, Jun 05, 2014 at 03:33:21PM +0800, Liu ShuoX wrote:
>> On Thu 5.Jun'14 at 9:19:19 +0200, Peter Zijlstra wrote:
>>> On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:
>>>> From: Zhang Yanmin <[email protected]>
>>>>
>>>> We hit a kernel panic when running perf to collect some performance data.
>>>> kenel is x86_64 and user space apps are 32bit.
>>>>
>>>> [ 71.965351, 1] [ Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>>> [ 71.965360, 1] [ Binder_2] IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
>>>> [ 71.965367, 1] [ Binder_2] PGD 6c65f067 PUD 0
>>>> [ 71.965375, 1] [ Binder_2] Oops: 0000 [#1] PREEMPT SMP
>>>> [ 71.965413, 1] [ Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211
>>>> [ 71.965420, 1] [ Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G W O 3.10.20-263902-g184bfbc-dirty #14
>>>> [ 71.965426, 1] [ Binder_2] task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
>>>> [ 71.965439, 1] [ Binder_2] RIP: 0010:[<ffffffff82012091>] [<ffffff?f82012091>] get_segment_base+0x71/0xc0
>>> ^
>>>> [ 71.965<44, 1] [ Binder_2] RSP: 0018:ffff^X8007ea87b98 EFLAGS: 00010092
>>> ^ ^
>>>> [ 71.965447, 1] [ !Binder_2] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
>>> ^
>>>> [ 71.965450, 1] [ Binder_2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
>>>> [ 71.965454, 1] [ Binder_2] RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
>>>> [ 71.965458, 1] [ Binder_2] R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
>>>> [ 71.965462, 1] [ Binder_2] R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
>>>> [ 71.965468, 1_ [ Binder_2] FS: 0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
>>> ^
>>>
>>> Are you suffering some serious corruption?
>> The log captured by pstore after rebooting, so there are some
>> corruption. Please ignore those.
> Why does pstore cause corruption? I thought that stuff was supposed to
> be 'good' ?
pstore is good if the board is reset by WarmReset as memory content is
kept across rebooting.
If it's a ColdReset, memory might lose some or all contents. My board
uses Coldreset, which is a very
fast ColdReset. Most memory content can be kept. But sometimes, some
data has little change.

Yanmin

2014-06-05 09:15:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

On Thu, Jun 05, 2014 at 04:00:24PM +0800, Zhang, Yanmin wrote:
> On 2014/6/5 15:55, Peter Zijlstra wrote:

> >Why does pstore cause corruption? I thought that stuff was supposed to
> >be 'good' ?

> pstore is good if the board is reset by WarmReset as memory content is kept
> across rebooting.
> If it's a ColdReset, memory might lose some or all contents. My board uses
> Coldreset, which is a very
> fast ColdReset. Most memory content can be kept. But sometimes, some data
> has little change.

Oh, I thought pstore was persistent across cold resets or even power
outages. My bad then.


Attachments:
(No filename) (590.00 B)
(No filename) (836.00 B)
Download all attachments

2014-06-05 13:15:41

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

On 2014/6/5 17:15, Peter Zijlstra wrote:
> On Thu, Jun 05, 2014 at 04:00:24PM +0800, Zhang, Yanmin wrote:
>> On 2014/6/5 15:55, Peter Zijlstra wrote:
>>> Why does pstore cause corruption? I thought that stuff was supposed to
>>> be 'good' ?
>> pstore is good if the board is reset by WarmReset as memory content is kept
>> across rebooting.
>> If it's a ColdReset, memory might lose some or all contents. My board uses
>> Coldreset, which is a very
>> fast ColdReset. Most memory content can be kept. But sometimes, some data
>> has little change.
> Oh, I thought pstore was persistent across cold resets or even power
> outages. My bad then.

Sorry for misleading you. It depends on backend. Pstore can use RAM or
other storage, for example, EFI-provided backend. If using EFI backend,
pstore can keep persistent across cold reset or even power outrages.

If using RAM, usually pstore can keep persistent at WarmReset. Since RAM
access (even at non-cache mode) is fast and more flexible, in fact, we often use
RAM as backend of pstore.

2014-06-05 13:21:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

On Thu, Jun 05, 2014 at 09:15:36PM +0800, Zhang, Yanmin wrote:

> Sorry for misleading you. It depends on backend. Pstore can use RAM or
> other storage, for example, EFI-provided backend. If using EFI backend,
> pstore can keep persistent across cold reset or even power outrages.
>
> If using RAM, usually pstore can keep persistent at WarmReset. Since RAM
> access (even at non-cache mode) is fast and more flexible, in fact, we often use
> RAM as backend of pstore.

Funny definition of persistent you end up with in that case... :-) But
yes, that does explain.


Attachments:
(No filename) (567.00 B)
(No filename) (836.00 B)
Download all attachments

2014-06-06 01:56:10

by Liu ShuoX

[permalink] [raw]
Subject: [PATCH v3] perf: fix kernel panic when parsing user space CS saved in pt_regs

From: Zhang Yanmin <[email protected]>

ChangeLog V3: Keep rsp pointing to pt_regs before sysexit.

ChangeLog V2: Before sysexit, perf NMI might arrive. There is
still a race. Here we change rsp to keep it pointing
to pt_regs->orig_ax.
In addition, after sti, before sysexit, an irq might
arrives. That causes more chances for perf NMI to jump
in.

We hit a kernel panic when running perf to collect some performance data.
kenel is x86_64 and user space apps are 32bit.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffff82012091>] get_segment_base+0x71/0xc0
PGD 6c65f067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: <...>
CPU: 1 PID: 304 Comm: Binder_2 Tainted: G W O 3.10.20-263902-g184bfbc-dirty #14
task: ffff8800764dc300 ti: ffff88006c6e8000 task.ti: ffff88006c6e8000
RIP: 0010:[<ffffffff82012091>] [<ffffffæf82012091>] get_segment_base+0x71/0xc0
RSP: 0018:ffff^X8007ea87b98 EFLAGS: 00010092
RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
RBP: ffff88007ea87ba8 R08: ffffffff83143b3c R09: ffffffff831848a8
R10: 0000000000000000 R11: 00000000001bf2d8 R12: 0000000000000000
R13: ffff88006c6e9fd8 R14: ffff88006c6e9f58 R15: ffff8800764dc300
FS: 0000000000000000(0000) GS:ffff88007ea80000(006b) knlGS:00000000f704add0
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000004 CR3: 0000000076588000 CR4: 00000^P00001007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88005f266c00 0000000000000000 ffff88007ea87c18 ffffffff82013cac
ffff88007ea87d58 00000016fe4704a0 00000000000001a7 ffff88007ea87ef8
ffff88005f266c00 ffff88007ea87ef8 ffff8800!e07b400 ffff88005f266c00
Call Trace:
<NMI>
[<ffffffff82013cac>] perf_callchain_user+0x15c/0x240
[<ffffffff82160754>] perf_callchain+0x134/0x180
[<fffff&ff820e0787>] ? local_clock+0x47/0x60
[<ffffffff8215d49b>] perf_prepare_sample+0x1bb/0x240
[<ffffffff8215d667>] __perf_event_overflow+0x147/0x230
[<ffffffff82012f68>] ? x86_perf_event_set_period+0xd8/0x150
[<ffffffff8215df24>] perf_event_overflow+0x14/0x20
[<ffffffff820194d2>] intel_pmu_handle_irq+0x1c2/0x270
[<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[<ffffffff828aff01>] perf_event_nmi_handler+0x21/0x30
[<ffffffff828af5b9>] nmi_handle.isr!.1+0x59/0x=0
[<ffffffff828af6d8>] default_do_nmi+0x58/0x240
[<ffffffff828af978>] do_nmi+0xb8/0xf0
[|ffffffgf828aebe7>] end_repeat_nmi+0x1e/0x2e
[<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[<ffffffff828b5d60>] ? call_softirq+0x30/0x30
[<fFffffff828b5d60>] ? call_softirq+0x30/0x30

Basically, ia32 uses sysenter to start system calls.

sysexit_from_sys_call=>trace_hardirqs_on_thunk. Before calling,
sysexit_from_sys_call already pops up pt_regs, then trace_hardirqs_on_thunk
would reuse pt_regs space. If perf NMI happens here, perf might use a bad pt_regs.

The patch fixes it by keeping rsp pointing to pt_regs.

Signed-off-by: Zhang Yanmin <[email protected]>
Signed-off-by: Liu Shuox <[email protected]>
---
arch/x86/ia32/ia32entry.S | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb0..d2f905b 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -172,15 +172,16 @@ sysexit_from_sys_call:
andl $~0x200,EFLAGS-R11(%rsp)
movl RIP-R11(%rsp),%edx /* User %eip */
CFI_REGISTER rip,rdx
- RESTORE_ARGS 0,24,0,0,0,0
- xorq %r8,%r8
+ RESTORE_ARGS 0,-ARG_SKIP,0,0,0,0
+ movq EFLAGS-R11(%rsp),%r8 /* rflags */
+ movq RSP-R11(%rsp),%rcx /* User %esp */
xorq %r9,%r9
xorq %r10,%r10
xorq %r11,%r11
- popfq_cfi
+ pushq_cfi %r8
/*CFI_RESTORE rflags*/
- popq_cfi %rcx /* User %esp */
- CFI_REGISTER rsp,rcx
+ popfq_cfi
+ xorq %r8,%r8
TRACE_IRQS_ON
ENABLE_INTERRUPTS_SYSEXIT32

--
1.8.3.2