Hello
I don't have time to come up with a test case right now, but I've
applied the patch to fix the oops from two days ago, and re-ran
my perf_fuzzer tool and it immediately came up with another issue on ARM.
This is an ARM Pandaboard running 3.11-rc4 with the one-line
oops fix from the other thread.
Vince
[ 388.509063] Unable to handle kernel paging request at virtual address 73fd14cc
[ 388.509063] pgd = eca6c000
[ 388.519805] [73fd14cc] *pgd=00000000
[ 388.523651] Internal error: Oops: 5 [#1] SMP ARM
[ 388.528594] Modules linked in: snd_soc_omap_hdmi omapdss snd_soc_omap_abe_twl6040 snd_soc_twl6040 snd_soc_omap snd_soc_omap_hdmi_card snd_soc_omap_mcpdm snd_soc_omap_mcbsp snd_soc_core snd_compress regmap_spi snd_pcm snd_page_alloc snd_timer snd soundcore
[ 388.551757] CPU: 1 PID: 2790 Comm: perf_fuzzer Not tainted 3.11.0-rc4 #6
[ 388.559906] task: eddcab80 ti: ed892000 task.ti: ed892000
[ 388.565643] PC is at armpmu_map_event+0x20/0x88
[ 388.570495] LR is at armpmu_event_init+0x38/0x280
[ 388.574981] pc : [<c001c3e4>] lr : [<c001c17c>] psr: 60000013
[ 388.574981] sp : ed893e40 ip : ecececec fp : edfaec00
[ 388.581878] r10: 00000000 r9 : 00000000 r8 : ed8c3ac0
[ 388.593292] r7 : ed8c3b5c r6 : edfaec00 r5 : 00000000 r4 : 00000000
[ 388.593292] r3 : 000000ff r2 : c0496144 r1 : c049611c r0 : edfaec00
[ 388.607177] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 388.614776] Control: 10c5387d Table: aca6c04a DAC: 00000015
[ 388.614776] Process perf_fuzzer (pid: 2790, stack limit = 0xed892240)
[ 388.621826] Stack: (0xed893e40 to 0xed894000)
[ 388.632385] 3e40: 00000800 c001c17c 00000002 c008a748 00000001 00000000 00000000 c00bf078
[ 388.634796] 3e60: 00000000 edfaee50 00000000 00000000 00000000 edfaec00 ed8c3ac0 edfaec00
[ 388.634796] 3e80: 00000000 c073ffac ed893f20 c00bf180 00000001 00000000 c00bf078 ed893f20
[ 388.652435] 3ea0: 00000000 ed8c3ac0 00000000 00000000 00000000 c0cb0818 eddcab80 c00bf440
[ 388.667205] 3ec0: ed893f20 00000000 eddcab80 eca76800 00000000 eca76800 00000000 00000000
[ 388.668487] 3ee0: 00000000 ec984c80 eddcab80 c00bfe68 00000000 00000000 00000000 00000080
[ 388.684631] 3f00: 00000000 ed892000 00000000 ed892030 00000004 ecc7e3c8 ecc7e3c8 00000000
[ 388.693328] 3f20: 00000000 00000048 ecececec 00000000 00000000 00000000 00000000 00000000
[ 388.701843] 3f40: 00000000 00000000 00297810 00000000 00000000 00000000 00000000 00000000
[ 388.701843] 3f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 388.719451] 3f80: 00000002 00000002 000103a4 00000002 0000016c c00128e8 ed892000 00000000
[ 388.724212] 3fa0: 00090998 c0012700 00000002 000103a4 00090ab8 00000000 00000000 0000000f
[ 388.731842] 3fc0: 00000002 000103a4 00000002 0000016c 00090ab0 00090ab8 000107a0 00090998
[ 388.745574] 3fe0: bed92be0 bed92bd0 0000b785 b6e8f6d0 40000010 00090ab8 00000000 00000000
[ 388.751831] [<c001c3e4>] (armpmu_map_event+0x20/0x88) from [<c001c17c>] (armpmu_event_init+0x38/0x280)
[ 388.764221] [<c001c17c>] (armpmu_event_init+0x38/0x280) from [<c00bf180>] (perf_init_event+0x108/0x180)
[ 388.764221] [<c00bf180>] (perf_init_event+0x108/0x180) from [<c00bf440>] (perf_event_alloc+0x248/0x40c)
[ 388.764221] [<c00bf440>] (perf_event_alloc+0x248/0x40c) from [<c00bfe68>] (SyS_perf_event_open+0x4f4/0x8fc)
[ 388.791839] [<c00bfe68>] (SyS_perf_event_open+0x4f4/0x8fc) from [<c0012700>] (ret_fast_syscall+0x0/0x48)
[ 388.804718] Code: 0a000005 e3540004 0a000016 e3540000 (0791010c)
[ 388.811248] ---[ end trace 89ea407225495d97 ]---
On Wed, Aug 07, 2013 at 09:14:55PM +0100, Vince Weaver wrote:
> Hello
Hi Vince,
> I don't have time to come up with a test case right now, but I've
> applied the patch to fix the oops from two days ago, and re-ran
> my perf_fuzzer tool and it immediately came up with another issue on ARM.
> This is an ARM Pandaboard running 3.11-rc4 with the one-line
> oops fix from the other thread.
Thanks for continuing with the fuzzer -- looks like it's finding some real
issues here. Unfortunately, I've been unable to spot the issue from the
panic, so could you please share your .config, vmlinux and/or some
information about the system call parameters?
That should help in deciphering what exactly went wrong.
Cheers,
Will
On 08/07/13 15:31, Will Deacon wrote:
> On Wed, Aug 07, 2013 at 09:14:55PM +0100, Vince Weaver wrote:
>
>> [ 388.509063] Unable to handle kernel paging request at virtual address 73fd14cc
>> [ 388.509063] pgd = eca6c000
>> [ 388.519805] [73fd14cc] *pgd=00000000
>> [ 388.523651] Internal error: Oops: 5 [#1] SMP ARM
>> [ 388.528594] Modules linked in: snd_soc_omap_hdmi omapdss snd_soc_omap_abe_twl6040 snd_soc_twl6040 snd_soc_omap snd_soc_omap_hdmi_card snd_soc_omap_mcpdm snd_soc_omap_mcbsp snd_soc_core snd_compress regmap_spi snd_pcm snd_page_alloc snd_timer snd soundcore
>> [ 388.551757] CPU: 1 PID: 2790 Comm: perf_fuzzer Not tainted 3.11.0-rc4 #6
>> [ 388.559906] task: eddcab80 ti: ed892000 task.ti: ed892000
>> [ 388.565643] PC is at armpmu_map_event+0x20/0x88
>> [ 388.570495] LR is at armpmu_event_init+0x38/0x280
>> [ 388.574981] pc : [<c001c3e4>] lr : [<c001c17c>] psr: 60000013
>> [ 388.574981] sp : ed893e40 ip : ecececec fp : edfaec00
>> [ 388.581878] r10: 00000000 r9 : 00000000 r8 : ed8c3ac0
>> [ 388.593292] r7 : ed8c3b5c r6 : edfaec00 r5 : 00000000 r4 : 00000000
>> [ 388.593292] r3 : 000000ff r2 : c0496144 r1 : c049611c r0 : edfaec00
>> [ 388.607177] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
>> [ 388.614776] Control: 10c5387d Table: aca6c04a DAC: 00000015
>> [ 388.614776] Process perf_fuzzer (pid: 2790, stack limit = 0xed892240)
>> [ 388.621826] Stack: (0xed893e40 to 0xed894000)
>> [ 388.632385] 3e40: 00000800 c001c17c 00000002 c008a748 00000001 00000000 00000000 c00bf078
>> [ 388.634796] 3e60: 00000000 edfaee50 00000000 00000000 00000000 edfaec00 ed8c3ac0 edfaec00
>> [ 388.634796] 3e80: 00000000 c073ffac ed893f20 c00bf180 00000001 00000000 c00bf078 ed893f20
>> [ 388.652435] 3ea0: 00000000 ed8c3ac0 00000000 00000000 00000000 c0cb0818 eddcab80 c00bf440
>> [ 388.667205] 3ec0: ed893f20 00000000 eddcab80 eca76800 00000000 eca76800 00000000 00000000
>> [ 388.668487] 3ee0: 00000000 ec984c80 eddcab80 c00bfe68 00000000 00000000 00000000 00000080
>> [ 388.684631] 3f00: 00000000 ed892000 00000000 ed892030 00000004 ecc7e3c8 ecc7e3c8 00000000
>> [ 388.693328] 3f20: 00000000 00000048 ecececec 00000000 00000000 00000000 00000000 00000000
>> [ 388.701843] 3f40: 00000000 00000000 00297810 00000000 00000000 00000000 00000000 00000000
>> [ 388.701843] 3f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> [ 388.719451] 3f80: 00000002 00000002 000103a4 00000002 0000016c c00128e8 ed892000 00000000
>> [ 388.724212] 3fa0: 00090998 c0012700 00000002 000103a4 00090ab8 00000000 00000000 0000000f
>> [ 388.731842] 3fc0: 00000002 000103a4 00000002 0000016c 00090ab0 00090ab8 000107a0 00090998
>> [ 388.745574] 3fe0: bed92be0 bed92bd0 0000b785 b6e8f6d0 40000010 00090ab8 00000000 00000000
>> [ 388.751831] [<c001c3e4>] (armpmu_map_event+0x20/0x88) from [<c001c17c>] (armpmu_event_init+0x38/0x280)
>> [ 388.764221] [<c001c17c>] (armpmu_event_init+0x38/0x280) from [<c00bf180>] (perf_init_event+0x108/0x180)
>> [ 388.764221] [<c00bf180>] (perf_init_event+0x108/0x180) from [<c00bf440>] (perf_event_alloc+0x248/0x40c)
>> [ 388.764221] [<c00bf440>] (perf_event_alloc+0x248/0x40c) from [<c00bfe68>] (SyS_perf_event_open+0x4f4/0x8fc)
>> [ 388.791839] [<c00bfe68>] (SyS_perf_event_open+0x4f4/0x8fc) from [<c0012700>] (ret_fast_syscall+0x0/0x48)
>> [ 388.804718] Code: 0a000005 e3540004 0a000016 e3540000 (0791010c)
>> [ 388.811248] ---[ end trace 89ea407225495d97 ]---
>>
$ ./scripts/decodecode < oops
[ 388.804718] Code: 0a000005 e3540004 0a000016 e3540000 (0791010c)
All code
========
0: 0a000005 beq 0x1c
4: e3540004 cmp r4, #4
8: 0a000016 beq 0x68
c: e3540000 cmp r4, #0
10:* 0791010c ldreq r0, [r1, ip, lsl #2] <-- trapping instruction
Code starting with the faulting instruction
===========================================
0: 0791010c ldreq r0, [r1, ip, lsl #2]
Is config some really big value? It looks like config (or more
specifically event->attr.config) is ecececec which is larger than 9
(PERF_COUNT_HW_MAX). I'm fairly certain r4 is event->attr.type
(PERF_TYPE_HARDWARE) and so we're out of bounds on that array access in
armpmu_map_hw_event(). Does the below patch fix that?
---8<----
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index d9f5cd4..21f7790 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -53,7 +53,12 @@ armpmu_map_cache_event(const unsigned (*cache_map)
 static int
 armpmu_map_hw_event(const unsigned (*event_map)[PERF_COUNT_HW_MAX], u64 config)
 {
-	int mapping = (*event_map)[config];
+	int mapping;
+
+	if (config >= PERF_COUNT_HW_MAX)
+		return -ENOENT;
+
+	mapping = (*event_map)[config];
 	return mapping == HW_OP_UNSUPPORTED ? -ENOENT : mapping;
 }
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
On Wed, 7 Aug 2013, Stephen Boyd wrote:
> Is config some really big value? It looks like config (or more
> specifically event->attr.config) is ecececec which is larger than 9
> (PERF_COUNT_HW_MAX). I'm fairly certain r4 is event->attr.type
> (PERF_TYPE_HARDWARE) and so we're out of bounds on that array access in
> armpmu_map_hw_event(). Does the below patch fix that?
Yes, it was big values in attr.config. I managed to bisect down to a
simple test case, which is attached. Oddly the test case has two events
before the oops happens; I should double check to make sure both are
really necessary.
I'll try this patch and see if it fixes things, thanks.
Vince
On Wed, 7 Aug 2013, Stephen Boyd wrote:
> ---8<----
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index d9f5cd4..21f7790 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -53,7 +53,12 @@ armpmu_map_cache_event(const unsigned (*cache_map)
> static int
> armpmu_map_hw_event(const unsigned (*event_map)[PERF_COUNT_HW_MAX], u64 config)
> {
> - int mapping = (*event_map)[config];
> + int mapping;
> +
> + if (config >= PERF_COUNT_HW_MAX)
> + return -ENOENT;
> +
> + mapping = (*event_map)[config];
> return mapping == HW_OP_UNSUPPORTED ? -ENOENT : mapping;
> }
I've tested this patch and my testcase no longer causes the kernel to
oops, so
Tested-by: Vince Weaver <[email protected]>
Thanks,
Vince
On Wed, 7 Aug 2013, Vince Weaver wrote:
> On Wed, 7 Aug 2013, Stephen Boyd wrote:
>
> > ---8<----
> >
> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index d9f5cd4..21f7790 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -53,7 +53,12 @@ armpmu_map_cache_event(const unsigned (*cache_map)
> > static int
> > armpmu_map_hw_event(const unsigned (*event_map)[PERF_COUNT_HW_MAX], u64 config)
> > {
> > - int mapping = (*event_map)[config];
> > + int mapping;
> > +
> > + if (config >= PERF_COUNT_HW_MAX)
> > + return -ENOENT;
> > +
> > + mapping = (*event_map)[config];
> > return mapping == HW_OP_UNSUPPORTED ? -ENOENT : mapping;
> > }
>
> I've tested this patch and my testcase no longer causes the kernel to
> oops, so
>
> Tested-by: Vince Weaver <[email protected]>
P.S. I re-ran the fuzzer again after applying the patch and the good news
is there were no further oopsen. The bad news is the machine locked
up solid. I'll investigate further when I'm not remote.
Vince
On Thu, Aug 08, 2013 at 12:18:08AM +0100, Stephen Boyd wrote:
> Is config some really big value? It looks like config (or more
> specifically event->attr.config) is ecececec which is larger than 9
> (PERF_COUNT_HW_MAX). I'm fairly certain r4 is event->attr.type
> (PERF_TYPE_HARDWARE) and so we're out of bounds on that array access in
> armpmu_map_hw_event(). Does the below patch fix that?
>
> ---8<----
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index d9f5cd4..21f7790 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -53,7 +53,12 @@ armpmu_map_cache_event(const unsigned (*cache_map)
> static int
> armpmu_map_hw_event(const unsigned (*event_map)[PERF_COUNT_HW_MAX], u64 config)
> {
> - int mapping = (*event_map)[config];
> + int mapping;
> +
> + if (config >= PERF_COUNT_HW_MAX)
> + return -ENOENT;
> +
> + mapping = (*event_map)[config];
Well spotted, thanks. If you make that return -EINVAL instead of -ENOENT (to
match what we do for cache events) then:
Acked-by: Will Deacon <[email protected]>
Could you stick it in the patch system please?
Thanks Stephen,
Will
On Thu, Aug 08, 2013 at 03:53:31AM +0100, Vince Weaver wrote:
> On Wed, 7 Aug 2013, Vince Weaver wrote:
> > On Wed, 7 Aug 2013, Stephen Boyd wrote:
> > > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > > index d9f5cd4..21f7790 100644
> > > --- a/arch/arm/kernel/perf_event.c
> > > +++ b/arch/arm/kernel/perf_event.c
> > > @@ -53,7 +53,12 @@ armpmu_map_cache_event(const unsigned (*cache_map)
> > > static int
> > > armpmu_map_hw_event(const unsigned (*event_map)[PERF_COUNT_HW_MAX], u64 config)
> > > {
> > > - int mapping = (*event_map)[config];
> > > + int mapping;
> > > +
> > > + if (config >= PERF_COUNT_HW_MAX)
> > > + return -ENOENT;
> > > +
> > > + mapping = (*event_map)[config];
> > > return mapping == HW_OP_UNSUPPORTED ? -ENOENT : mapping;
> > > }
> >
> > I've tested this patch and my testcase no longer causes the kernel to
> > oops, so
> >
> > Tested-by: Vince Weaver <[email protected]>
>
> P.S. I re-ran the fuzzer again after applying the patch and the good news
> is there were no further oopsen. The bad news is the machine locked
> up solid. I'll investigate further when I'm not remote.
On the flip side, the good news is that we know the problem is there. We're
probably generating interrupts at some horrendous rate for the lock-up....
are you running your fuzzer as root?
Also, is your fuzzer available somewhere? I could take it for a spin on some
different architectures if you like.
Thanks,
Will
On Thu, 8 Aug 2013, Will Deacon wrote:
> On the flip side, the good news is that we know the problem is there. We're
> probably generating interrupts at some horrendous rate for the lock-up....
> are you running your fuzzer as root?
No, I'm running the fuzzer as a regular user.
> Also, is your fuzzer available somewhere? I could take it for a spin on some
> different architectures if you like.
Yes:
git clone https://github.com/deater/perf_event_tests.git
and it's in the "fuzzer" subdirectory. I think I've committed all of the
ARM related patches.
To run the tool it's just "./perf_fuzzer" and away you go. There's a lot
of other tools for generating and analyzing fuzzer syscall traces but
unfortunately they're not very user-friendly yet.
As for other architectures (at least ARM) in addition to the pandaboard I
also have a beagleboard and a cortex-A15 chromebook. The challenge is
always getting recent Linus-git kernels running on the things.
I also have a raspberry-pi. I've successfully accessed the perf counters
on that by reading the low-level registers directly with a kernel
module. There's no perf driver because the PMU interrupt isn't hooked
up. I've been meaning to get perf support going by making things
periodically polled rather than interrupt driven: has anybody looked into
doing that yet?
Vince
On 08/08/13 05:09, Will Deacon wrote:
> Well spotted, thanks. If you make that return -EINVAL instead of -ENOENT (to
> match what we do for cache events) then:
>
> Acked-by: Will Deacon <[email protected]>
>
> Could you stick it in the patch system please?
Submitted as 7810/1
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation