2016-03-18 06:44:24

by Guenter Roeck

[permalink] [raw]
Subject: linux-next: Crash in arm_pmu_device_probe() due to 'drivers/perf: arm_pmu: make info messages more verbose'

Hi,

I am seeing the attached crash when running a realview-pb-a8 image with realview_defconfig in qemu.
bisect wasn't successful, but a commit analysis identified commit 'drivers/perf: arm_pmu: make info
messages more verbose' as the culprit. Reverting this commit fixes the problem.

The code roughly looks as follows.

int arm_pmu_device_probe()
{
...
if (node && ..) {
} else {
}

if (ret) {
pr_info("%s: failed to probe PMU! Error %i\n",
node->full_name, ret);
goto out_free;
}
....
out_free:
pr_info("%s: failed to register PMU devices! Error %i\n",
node->full_name, ret);
....
}

Note that 'node' is dereferenced even though it was previously checked if it is NULL.
The configuration I am testing does not use devicetree.

Can you use dev_info() instead ?

Thanks,
Guenter

---
Crash log:

Unable to handle kernel NULL pointer dereference at virtual address 0000000c
pgd = c0004000
[0000000c] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.0-next-20160317 #1
Hardware name: ARM-RealView PB-A8
task: df427600 ti: df428000 task.ti: df428000
PC is at arm_pmu_device_probe+0x11c/0x6ec
LR is at smp_call_function_single+0xe8/0x164
pc : [<c03bc2d0>] lr : [<c0180260>] psr: a0000053
sp : df429e40 ip : df428000 fp : 00000000
r10: df4aa200 r9 : 00000090 r8 : c0500d5c
r7 : c05015c8 r6 : fffffffa r5 : c08457f8 r4 : c080a5d8
r3 : 00000000 r2 : fffffffa r1 : df429df8 r0 : c05e6214
Flags: NzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 70004059 DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0xdf428210)
Stack: (0xdf429e40 to 0xdf42a000)
9e40: 00000000 c05ce464 00000000 00000001 00000090 ffffffed c080a5e8 fffffdfb
9e60: c0806a1c c0806a1c 00000090 00000000 00000000 c02f95b4 c02f9564 c080a5e8
9e80: c0844630 c0844638 00000000 c02f7f04 00000000 c080a5e8 c0806a1c c080a61c
9ea0: 00000000 c0704714 00000000 c02f8040 00000000 c0806a1c c02f7f94 c02f6504
9ec0: df41785c df47bcb4 c0806a1c df4ec300 c081afc8 c02f74d0 c05b859c a0000053
9ee0: c0806a1c c0806a1c c080514c df58e780 c082b400 c02f885c c02f915c c080514c
9f00: c080514c c0101744 0000005f 00000000 00000000 00000000 00000000 c022645c
9f20: 00000000 c0810320 c061bb48 c0512b18 00000090 c0136c3c 00000000 c05db800
9f40: c061b008 00000000 00000006 00000006 c08102e8 dfffc1c0 c0733bb8 00000006
9f60: c0728830 c082b400 c07005a4 00000090 c072883c c0700d70 00000006 00000006
9f80: 00000000 c07005a4 00000000 c04ad914 00000000 00000000 00000000 00000000
9fa0: 00000000 c04ad91c 00000000 c0107830 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c03bc2d0>] (arm_pmu_device_probe) from [<c02f95b4>] (platform_drv_probe+0x50/0xb0)
[<c02f95b4>] (platform_drv_probe) from [<c02f7f04>] (driver_probe_device+0x218/0x2a8)
[<c02f7f04>] (driver_probe_device) from [<c02f8040>] (__driver_attach+0xac/0xb0)
[<c02f8040>] (__driver_attach) from [<c02f6504>] (bus_for_each_dev+0x54/0x88)
[<c02f6504>] (bus_for_each_dev) from [<c02f74d0>] (bus_add_driver+0xe4/0x1f4)
[<c02f74d0>] (bus_add_driver) from [<c02f885c>] (driver_register+0x78/0xf4)
[<c02f885c>] (driver_register) from [<c0101744>] (do_one_initcall+0x80/0x1d8)
[<c0101744>] (do_one_initcall) from [<c0700d70>] (kernel_init_freeable+0x118/0x1ec)
[<c0700d70>] (kernel_init_freeable) from [<c04ad91c>] (kernel_init+0x8/0x110)
[<c04ad91c>] (kernel_init) from [<c0107830>] (ret_from_fork+0x14/0x24)
Code: e3e0600b e59d3008 e1a02006 e59f03cc (e593100c)
---[ end trace bfac761a54ea927f ]---


2016-03-18 09:18:47

by Dirk Behme

[permalink] [raw]
Subject: Re: linux-next: Crash in arm_pmu_device_probe() due to 'drivers/perf: arm_pmu: make info messages more verbose'

Hi Guenter,

On 18.03.2016 07:44, Guenter Roeck wrote:
> Hi,
>
> I am seeing the attached crash when running a realview-pb-a8 image with
> realview_defconfig in qemu.
> bisect wasn't successful, but a commit analysis identified commit
> 'drivers/perf: arm_pmu: make info
> messages more verbose' as the culprit. Reverting this commit fixes the
> problem.
>
> The code roughly looks as follows.
>
> int arm_pmu_device_probe()
> {
> ...
> if (node && ..) {
> } else {
> }
>
> if (ret) {
> pr_info("%s: failed to probe PMU! Error %i\n",
> node->full_name, ret);
> goto out_free;
> }
> ....
> out_free:
> pr_info("%s: failed to register PMU devices! Error %i\n",
> node->full_name, ret);
> ....
> }
>
> Note that 'node' is dereferenced even though it was previously checked
> if it is NULL.
> The configuration I am testing does not use devicetree.
>
> Can you use dev_info() instead ?


Does anything like below [1] does work for you?

If so, could you please share the output? I.e. what it prints in your
non-devicetree non-pmu case?

Best regards

Dirk

[1]

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 11bacc7..bda3502 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -1002,8 +1002,7 @@ int arm_pmu_device_probe(struct platform_device *pdev,
}

if (ret) {
- pr_info("%s: failed to probe PMU! Error %i\n",
- node->full_name, ret);
+ dev_info(&pdev->dev, "failed to probe PMU! Error %i\n",
ret);
goto out_free;
}

@@ -1023,8 +1022,7 @@ int arm_pmu_device_probe(struct platform_device *pdev,
out_destroy:
cpu_pmu_destroy(pmu);
out_free:
- pr_info("%s: failed to register PMU devices! Error %i\n",
- node->full_name, ret);
+ dev_info(&pdev->dev, "failed to register PMU devices! Error
%i\n", ret);
kfree(pmu);
return ret;
}

2016-03-18 13:30:30

by Guenter Roeck

[permalink] [raw]
Subject: Re: linux-next: Crash in arm_pmu_device_probe() due to 'drivers/perf: arm_pmu: make info messages more verbose'

Hi Dirk,

On 03/18/2016 02:18 AM, Dirk Behme wrote:
> Hi Guenter,
>
> On 18.03.2016 07:44, Guenter Roeck wrote:
>> Hi,
>>
>> I am seeing the attached crash when running a realview-pb-a8 image with
>> realview_defconfig in qemu.
>> bisect wasn't successful, but a commit analysis identified commit
>> 'drivers/perf: arm_pmu: make info
>> messages more verbose' as the culprit. Reverting this commit fixes the
>> problem.
>>
>> The code roughly looks as follows.
>>
>> int arm_pmu_device_probe()
>> {
>> ...
>> if (node && ..) {
>> } else {
>> }
>>
>> if (ret) {
>> pr_info("%s: failed to probe PMU! Error %i\n",
>> node->full_name, ret);
>> goto out_free;
>> }
>> ....
>> out_free:
>> pr_info("%s: failed to register PMU devices! Error %i\n",
>> node->full_name, ret);
>> ....
>> }
>>
>> Note that 'node' is dereferenced even though it was previously checked
>> if it is NULL.
>> The configuration I am testing does not use devicetree.
>>
>> Can you use dev_info() instead ?
>
>
> Does anything like below [1] does work for you?
>
> If so, could you please share the output? I.e. what it prints in your non-devicetree non-pmu case?
>
This is what I get:

hw perfevents: probing PMU on CPU 0
armv7-pmu armv7-pmu: failed to probe PMU! Error -6
armv7-pmu armv7-pmu: failed to register PMU devices! Error -6

Another option might be to use of_node_full_name() (or even better
both dev_info() and of_node_full_name()).

Not sure though what you are looking for. '/soc/pmu_a53' doesn't
tell (me) much either, and I am not sure I understand the context
of the bug description. Why would the kernel try to initialize PMU
for a CPU which isn't online ? And if it really does, wouldn't it make
more sense to print CPU number and CPU ID instead of a devicetree
node name ?

Thanks,
Guenter

> Best regards
>
> Dirk
>
> [1]
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 11bacc7..bda3502 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -1002,8 +1002,7 @@ int arm_pmu_device_probe(struct platform_device *pdev,
> }
>
> if (ret) {
> - pr_info("%s: failed to probe PMU! Error %i\n",
> - node->full_name, ret);
> + dev_info(&pdev->dev, "failed to probe PMU! Error %i\n", ret);
> goto out_free;
> }
>
> @@ -1023,8 +1022,7 @@ int arm_pmu_device_probe(struct platform_device *pdev,
> out_destroy:
> cpu_pmu_destroy(pmu);
> out_free:
> - pr_info("%s: failed to register PMU devices! Error %i\n",
> - node->full_name, ret);
> + dev_info(&pdev->dev, "failed to register PMU devices! Error %i\n", ret);
> kfree(pmu);
> return ret;
> }
>
>

2016-03-21 07:02:58

by Dirk Behme

[permalink] [raw]
Subject: Re: linux-next: Crash in arm_pmu_device_probe() due to 'drivers/perf: arm_pmu: make info messages more verbose'

On 18.03.2016 14:30, Guenter Roeck wrote:
> Hi Dirk,
>
> On 03/18/2016 02:18 AM, Dirk Behme wrote:
>> Hi Guenter,
>>
>> On 18.03.2016 07:44, Guenter Roeck wrote:
>>> Hi,
>>>
>>> I am seeing the attached crash when running a realview-pb-a8 image with
>>> realview_defconfig in qemu.
>>> bisect wasn't successful, but a commit analysis identified commit
>>> 'drivers/perf: arm_pmu: make info
>>> messages more verbose' as the culprit. Reverting this commit fixes the
>>> problem.
>>>
>>> The code roughly looks as follows.
>>>
>>> int arm_pmu_device_probe()
>>> {
>>> ...
>>> if (node && ..) {
>>> } else {
>>> }
>>>
>>> if (ret) {
>>> pr_info("%s: failed to probe PMU! Error %i\n",
>>> node->full_name, ret);
>>> goto out_free;
>>> }
>>> ....
>>> out_free:
>>> pr_info("%s: failed to register PMU devices! Error %i\n",
>>> node->full_name, ret);
>>> ....
>>> }
>>>
>>> Note that 'node' is dereferenced even though it was previously checked
>>> if it is NULL.
>>> The configuration I am testing does not use devicetree.
>>>
>>> Can you use dev_info() instead ?
>>
>>
>> Does anything like below [1] does work for you?
>>
>> If so, could you please share the output? I.e. what it prints in your
>> non-devicetree non-pmu case?
>>
> This is what I get:
>
> hw perfevents: probing PMU on CPU 0
> armv7-pmu armv7-pmu: failed to probe PMU! Error -6
> armv7-pmu armv7-pmu: failed to register PMU devices! Error -6
>
> Another option might be to use of_node_full_name() (or even better
> both dev_info() and of_node_full_name()).
>
> Not sure though what you are looking for. '/soc/pmu_a53' doesn't
> tell (me) much either, and I am not sure I understand the context
> of the bug description. Why would the kernel try to initialize PMU
> for a CPU which isn't online ?


I have a device tree based ARMv8 Cortex A57 / Cortex A53 based system.
The Cortex A57 are always starting fine, but based on some firmware
issues the Cortex A53 are sometimes online, sometimes not. But enabling
the PMUs is always in the device tree, which fails some times then, too.

With the initial non-verbose PMU error messages I needed some time to
find out that the error messages were due to the Cortex A53 not being
online. And I thought it would help to debug similar cases to make this
more verbose.

Best regards

Dirk