2018-10-16 11:39:12

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Crash in msm serial on dragonboard with ftrace bootargs

Hi,

On dragonboard 410c, with "ftrace=function" boot args, the console
output slows down and board resets without any backtrace as below. This
is tested on latest kernel and seems to exist even in older kernels as well.

[ 2.949164] EINJ: ACPI disabled.
[ 3.133001] Serial: 8250/16550 dri
Format: Log Type - Time(microsec) - Message - Optional Info
Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic

But with pstore enabled, able to get the below backtrace:

<4>[ 2.949164] EINJ: ACPI disabled.
<6>[ 3.133001] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
<6>[ 3.164097] SuperH (H)SCI(F) driver initialized
<0>[ 3.164471] Internal error: synchronous external abort: 96000010
[#1] PREEMPT SMP
<4>[ 3.164479] Modules linked in:
<4>[ 3.164495] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
4.19.0-rc8-00008-ge033b9909fff-dirty #175
<4>[ 3.164501] Hardware name: Qualcomm Technologies, Inc. APQ 8016
SBC (DT)
<4>[ 3.164508] pstate: 40000085 (nZcv daIf -PAN -UAO)
<4>[ 3.164514] pc : msm_read.isra.2+0x20/0x50
<4>[ 3.164520] lr : msm_read.isra.2+0x1c/0x50
<4>[ 3.164526] sp : ffff000008033a50
<4>[ 3.164531] x29: ffff000008033a50 x28: ffff000009486018
<4>[ 3.164548] x27: 0000000000000001 x26: ffff7dfffe7ff070
<4>[ 3.164565] x25: 0000000000000034 x24: ffff000009486000
<4>[ 3.164582] x23: 0000000000000000 x22: ffff00000978e190
<4>[ 3.164599] x21: ffff0000095e8228 x20: 0000000000000034
<4>[ 3.164616] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
<4>[ 3.164632] x17: 0000000000000000 x16: 0000000000000000
<4>[ 3.164649] x15: ffff0000094a96c8 x14: ffff00008978e6bf
<4>[ 3.164666] x13: ffff00000978e6cd x12: 0000000000000038
<4>[ 3.164683] x11: ffff0000094c6000 x10: 0000000000000c24
<4>[ 3.164699] x9 : ffff80003c89b400 x8 : ffff000008033970
<4>[ 3.164716] x7 : ffff80000eb04100 x6 : 00000000000af304
<4>[ 3.164732] x5 : 0000000000000c40 x4 : ffff80003c06f000
<4>[ 3.164750] x3 : ffff80003c89b498 x2 : 0000000000000000
<4>[ 3.164766] x1 : ffff80003ca68000 x0 : 0000000000000800
<0>[ 3.164785] Process swapper/0 (pid: 1, stack limit =
0x(____ptrval____))
<4>[ 3.164791] Call trace:
<4>[ 3.164797] msm_read.isra.2+0x20/0x50
<4>[ 3.164804] msm_reset_dm_count+0x44/0x80
<4>[ 3.164810] __msm_console_write+0x1c8/0x1d0
<4>[ 3.164816] msm_serial_early_write_dm+0x3c/0x50
<4>[ 3.164823] console_unlock.part.6+0x468/0x528
<4>[ 3.164829] vprintk_emit+0x210/0x218
<4>[ 3.164835] vprintk_default+0x48/0x58
<4>[ 3.164841] vprintk_func+0xf0/0x1c0
<4>[ 3.164847] printk+0x74/0x94
<4>[ 3.164853] sci_init+0x24/0x3c
<4>[ 3.164859] do_one_initcall+0x54/0x248
<4>[ 3.164866] kernel_init_freeable+0x210/0x378
<4>[ 3.164872] kernel_init+0x18/0x118
<4>[ 3.164878] ret_from_fork+0x10/0x1c
<0>[ 3.164884] Code: aa1e03e0 8b214273 97e616f7 d503201f (b9400260)

Seems to be issue with the msm serial driver and not ftrace.
Could someone look into it.

One more thing is for pstore dmesg-ramoops, I had to change
late_initcall to postcore_initcall which brings the question as to why
we changed to late_initcall?
Simple git blame shows to support crypto compress api, but is it really
helpful? A lot of boottime issues can be caught with pstore enabled at
postcore_initcall rather than late_initcall, this backtrace
is just one example. Is there any way we could change this?

Thanks,
Sai
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


2018-10-16 11:45:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Tue, Oct 16, 2018 at 05:08:25PM +0530, Sai Prakash Ranjan wrote:
> Hi,
>
> On dragonboard 410c, with "ftrace=function" boot args, the console output
> slows down and board resets without any backtrace as below. This is tested
> on latest kernel and seems to exist even in older kernels as well.
>
> [ 2.949164] EINJ: ACPI disabled.
> [ 3.133001] Serial: 8250/16550 dri
> Format: Log Type - Time(microsec) - Message - Optional Info
> Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
>
> But with pstore enabled, able to get the below backtrace:
>
> <4>[ 2.949164] EINJ: ACPI disabled.
> <6>[ 3.133001] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> <6>[ 3.164097] SuperH (H)SCI(F) driver initialized
> <0>[ 3.164471] Internal error: synchronous external abort: 96000010 [#1]
> PREEMPT SMP
> <4>[ 3.164479] Modules linked in:
> <4>[ 3.164495] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
> 4.19.0-rc8-00008-ge033b9909fff-dirty #175
> <4>[ 3.164501] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC
> (DT)
> <4>[ 3.164508] pstate: 40000085 (nZcv daIf -PAN -UAO)
> <4>[ 3.164514] pc : msm_read.isra.2+0x20/0x50
> <4>[ 3.164520] lr : msm_read.isra.2+0x1c/0x50
> <4>[ 3.164526] sp : ffff000008033a50
> <4>[ 3.164531] x29: ffff000008033a50 x28: ffff000009486018
> <4>[ 3.164548] x27: 0000000000000001 x26: ffff7dfffe7ff070
> <4>[ 3.164565] x25: 0000000000000034 x24: ffff000009486000
> <4>[ 3.164582] x23: 0000000000000000 x22: ffff00000978e190
> <4>[ 3.164599] x21: ffff0000095e8228 x20: 0000000000000034
> <4>[ 3.164616] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
> <4>[ 3.164632] x17: 0000000000000000 x16: 0000000000000000
> <4>[ 3.164649] x15: ffff0000094a96c8 x14: ffff00008978e6bf
> <4>[ 3.164666] x13: ffff00000978e6cd x12: 0000000000000038
> <4>[ 3.164683] x11: ffff0000094c6000 x10: 0000000000000c24
> <4>[ 3.164699] x9 : ffff80003c89b400 x8 : ffff000008033970
> <4>[ 3.164716] x7 : ffff80000eb04100 x6 : 00000000000af304
> <4>[ 3.164732] x5 : 0000000000000c40 x4 : ffff80003c06f000
> <4>[ 3.164750] x3 : ffff80003c89b498 x2 : 0000000000000000
> <4>[ 3.164766] x1 : ffff80003ca68000 x0 : 0000000000000800
> <0>[ 3.164785] Process swapper/0 (pid: 1, stack limit =
> 0x(____ptrval____))
> <4>[ 3.164791] Call trace:
> <4>[ 3.164797] msm_read.isra.2+0x20/0x50
> <4>[ 3.164804] msm_reset_dm_count+0x44/0x80
> <4>[ 3.164810] __msm_console_write+0x1c8/0x1d0
> <4>[ 3.164816] msm_serial_early_write_dm+0x3c/0x50
> <4>[ 3.164823] console_unlock.part.6+0x468/0x528
> <4>[ 3.164829] vprintk_emit+0x210/0x218
> <4>[ 3.164835] vprintk_default+0x48/0x58
> <4>[ 3.164841] vprintk_func+0xf0/0x1c0
> <4>[ 3.164847] printk+0x74/0x94
> <4>[ 3.164853] sci_init+0x24/0x3c
> <4>[ 3.164859] do_one_initcall+0x54/0x248
> <4>[ 3.164866] kernel_init_freeable+0x210/0x378
> <4>[ 3.164872] kernel_init+0x18/0x118
> <4>[ 3.164878] ret_from_fork+0x10/0x1c
> <0>[ 3.164884] Code: aa1e03e0 8b214273 97e616f7 d503201f (b9400260)
>
> Seems to be issue with the msm serial driver and not ftrace.
> Could someone look into it.

As you have the hardware to reproduce this, and it has always been an
issue, you are the best placed to help resolve this.

Good luck!

greg k-h

2018-10-16 11:59:35

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/16/2018 5:14 PM, Greg Kroah-Hartman wrote:
> On Tue, Oct 16, 2018 at 05:08:25PM +0530, Sai Prakash Ranjan wrote:
>> Hi,
>>
>> On dragonboard 410c, with "ftrace=function" boot args, the console output
>> slows down and board resets without any backtrace as below. This is tested
>> on latest kernel and seems to exist even in older kernels as well.
>>
>> [ 2.949164] EINJ: ACPI disabled.
>> [ 3.133001] Serial: 8250/16550 dri
>> Format: Log Type - Time(microsec) - Message - Optional Info
>> Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
>>
>> But with pstore enabled, able to get the below backtrace:
>>
>> <4>[ 2.949164] EINJ: ACPI disabled.
>> <6>[ 3.133001] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> <6>[ 3.164097] SuperH (H)SCI(F) driver initialized
>> <0>[ 3.164471] Internal error: synchronous external abort: 96000010 [#1]
>> PREEMPT SMP
>> <4>[ 3.164479] Modules linked in:
>> <4>[ 3.164495] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
>> 4.19.0-rc8-00008-ge033b9909fff-dirty #175
>> <4>[ 3.164501] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC
>> (DT)
>> <4>[ 3.164508] pstate: 40000085 (nZcv daIf -PAN -UAO)
>> <4>[ 3.164514] pc : msm_read.isra.2+0x20/0x50
>> <4>[ 3.164520] lr : msm_read.isra.2+0x1c/0x50
>> <4>[ 3.164526] sp : ffff000008033a50
>> <4>[ 3.164531] x29: ffff000008033a50 x28: ffff000009486018
>> <4>[ 3.164548] x27: 0000000000000001 x26: ffff7dfffe7ff070
>> <4>[ 3.164565] x25: 0000000000000034 x24: ffff000009486000
>> <4>[ 3.164582] x23: 0000000000000000 x22: ffff00000978e190
>> <4>[ 3.164599] x21: ffff0000095e8228 x20: 0000000000000034
>> <4>[ 3.164616] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
>> <4>[ 3.164632] x17: 0000000000000000 x16: 0000000000000000
>> <4>[ 3.164649] x15: ffff0000094a96c8 x14: ffff00008978e6bf
>> <4>[ 3.164666] x13: ffff00000978e6cd x12: 0000000000000038
>> <4>[ 3.164683] x11: ffff0000094c6000 x10: 0000000000000c24
>> <4>[ 3.164699] x9 : ffff80003c89b400 x8 : ffff000008033970
>> <4>[ 3.164716] x7 : ffff80000eb04100 x6 : 00000000000af304
>> <4>[ 3.164732] x5 : 0000000000000c40 x4 : ffff80003c06f000
>> <4>[ 3.164750] x3 : ffff80003c89b498 x2 : 0000000000000000
>> <4>[ 3.164766] x1 : ffff80003ca68000 x0 : 0000000000000800
>> <0>[ 3.164785] Process swapper/0 (pid: 1, stack limit =
>> 0x(____ptrval____))
>> <4>[ 3.164791] Call trace:
>> <4>[ 3.164797] msm_read.isra.2+0x20/0x50
>> <4>[ 3.164804] msm_reset_dm_count+0x44/0x80
>> <4>[ 3.164810] __msm_console_write+0x1c8/0x1d0
>> <4>[ 3.164816] msm_serial_early_write_dm+0x3c/0x50
>> <4>[ 3.164823] console_unlock.part.6+0x468/0x528
>> <4>[ 3.164829] vprintk_emit+0x210/0x218
>> <4>[ 3.164835] vprintk_default+0x48/0x58
>> <4>[ 3.164841] vprintk_func+0xf0/0x1c0
>> <4>[ 3.164847] printk+0x74/0x94
>> <4>[ 3.164853] sci_init+0x24/0x3c
>> <4>[ 3.164859] do_one_initcall+0x54/0x248
>> <4>[ 3.164866] kernel_init_freeable+0x210/0x378
>> <4>[ 3.164872] kernel_init+0x18/0x118
>> <4>[ 3.164878] ret_from_fork+0x10/0x1c
>> <0>[ 3.164884] Code: aa1e03e0 8b214273 97e616f7 d503201f (b9400260)
>>
>> Seems to be issue with the msm serial driver and not ftrace.
>> Could someone look into it.
>
> As you have the hardware to reproduce this, and it has always been an
> issue, you are the best placed to help resolve this.
>

I would definitely test if serial guys could point somewhere, not that
the backtrace is not pointing :) but I am not aware of this serial driver.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-16 15:30:32

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Tue, 16 Oct 2018 17:08:25 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> Hi,
>
> On dragonboard 410c, with "ftrace=function" boot args, the console
> output slows down and board resets without any backtrace as below. This
> is tested on latest kernel and seems to exist even in older kernels as well.

So this only happens when ftrace=function is on the boot console.


>
> [ 2.949164] EINJ: ACPI disabled.
> [ 3.133001] Serial: 8250/16550 dri
> Format: Log Type - Time(microsec) - Message - Optional Info
> Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
>
> But with pstore enabled, able to get the below backtrace:
>
> <4>[ 2.949164] EINJ: ACPI disabled.
> <6>[ 3.133001] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> <6>[ 3.164097] SuperH (H)SCI(F) driver initialized
> <0>[ 3.164471] Internal error: synchronous external abort: 96000010
> [#1] PREEMPT SMP
> <4>[ 3.164479] Modules linked in:
> <4>[ 3.164495] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
> 4.19.0-rc8-00008-ge033b9909fff-dirty #175
> <4>[ 3.164501] Hardware name: Qualcomm Technologies, Inc. APQ 8016
> SBC (DT)
> <4>[ 3.164508] pstate: 40000085 (nZcv daIf -PAN -UAO)
> <4>[ 3.164514] pc : msm_read.isra.2+0x20/0x50
> <4>[ 3.164520] lr : msm_read.isra.2+0x1c/0x50
> <4>[ 3.164526] sp : ffff000008033a50
> <4>[ 3.164531] x29: ffff000008033a50 x28: ffff000009486018
> <4>[ 3.164548] x27: 0000000000000001 x26: ffff7dfffe7ff070
> <4>[ 3.164565] x25: 0000000000000034 x24: ffff000009486000
> <4>[ 3.164582] x23: 0000000000000000 x22: ffff00000978e190
> <4>[ 3.164599] x21: ffff0000095e8228 x20: 0000000000000034
> <4>[ 3.164616] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
> <4>[ 3.164632] x17: 0000000000000000 x16: 0000000000000000
> <4>[ 3.164649] x15: ffff0000094a96c8 x14: ffff00008978e6bf
> <4>[ 3.164666] x13: ffff00000978e6cd x12: 0000000000000038
> <4>[ 3.164683] x11: ffff0000094c6000 x10: 0000000000000c24
> <4>[ 3.164699] x9 : ffff80003c89b400 x8 : ffff000008033970
> <4>[ 3.164716] x7 : ffff80000eb04100 x6 : 00000000000af304
> <4>[ 3.164732] x5 : 0000000000000c40 x4 : ffff80003c06f000
> <4>[ 3.164750] x3 : ffff80003c89b498 x2 : 0000000000000000
> <4>[ 3.164766] x1 : ffff80003ca68000 x0 : 0000000000000800
> <0>[ 3.164785] Process swapper/0 (pid: 1, stack limit =
> 0x(____ptrval____))
> <4>[ 3.164791] Call trace:
> <4>[ 3.164797] msm_read.isra.2+0x20/0x50
> <4>[ 3.164804] msm_reset_dm_count+0x44/0x80
> <4>[ 3.164810] __msm_console_write+0x1c8/0x1d0
> <4>[ 3.164816] msm_serial_early_write_dm+0x3c/0x50
> <4>[ 3.164823] console_unlock.part.6+0x468/0x528
> <4>[ 3.164829] vprintk_emit+0x210/0x218
> <4>[ 3.164835] vprintk_default+0x48/0x58
> <4>[ 3.164841] vprintk_func+0xf0/0x1c0
> <4>[ 3.164847] printk+0x74/0x94
> <4>[ 3.164853] sci_init+0x24/0x3c
> <4>[ 3.164859] do_one_initcall+0x54/0x248
> <4>[ 3.164866] kernel_init_freeable+0x210/0x378
> <4>[ 3.164872] kernel_init+0x18/0x118
> <4>[ 3.164878] ret_from_fork+0x10/0x1c
> <0>[ 3.164884] Code: aa1e03e0 8b214273 97e616f7 d503201f (b9400260)
>
> Seems to be issue with the msm serial driver and not ftrace.
> Could someone look into it.

I'm guessing that there may have been an issue with ftrace, it tried to
print, but the printing caused a bug that rebooted the box.

Does function tracing work after boot up? That is, without the
ftrace=function, can you do:

echo function > /sys/kernel/debug/tracing/current_tracer

without any issue?

>
> One more thing is for pstore dmesg-ramoops, I had to change
> late_initcall to postcore_initcall which brings the question as to why
> we changed to late_initcall?
> Simple git blame shows to support crypto compress api, but is it really
> helpful? A lot of boottime issues can be caught with pstore enabled at
> postcore_initcall rather than late_initcall, this backtrace
> is just one example. Is there any way we could change this?

Does it break if the crypto is not initialized? Perhaps add a command
line flag to have it happen earlier:

ramoops=earlyinit

and add a postcore_initcall that checks if that flag is set, and if so,
it does the work then, and the late_initcall() will do nothing.

That way, you can still have unmodified kernels use pstore when it
crashes at boot up.

-- Steve

2018-10-16 16:37:41

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/16/2018 8:59 PM, Steven Rostedt wrote:
> On Tue, 16 Oct 2018 17:08:25 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
>> Hi,
>>
>> On dragonboard 410c, with "ftrace=function" boot args, the console
>> output slows down and board resets without any backtrace as below. This
>> is tested on latest kernel and seems to exist even in older kernels as well.
>
> So this only happens when ftrace=function is on the boot console.
>

Yes. If I do not use boot console, target does not crash.

>
>>
>> [ 2.949164] EINJ: ACPI disabled.
>> [ 3.133001] Serial: 8250/16550 dri
>> Format: Log Type - Time(microsec) - Message - Optional Info
>> Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
>>
>> But with pstore enabled, able to get the below backtrace:
>>
>> <4>[ 2.949164] EINJ: ACPI disabled.
>> <6>[ 3.133001] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> <6>[ 3.164097] SuperH (H)SCI(F) driver initialized
>> <0>[ 3.164471] Internal error: synchronous external abort: 96000010
>> [#1] PREEMPT SMP
>> <4>[ 3.164479] Modules linked in:
>> <4>[ 3.164495] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
>> 4.19.0-rc8-00008-ge033b9909fff-dirty #175
>> <4>[ 3.164501] Hardware name: Qualcomm Technologies, Inc. APQ 8016
>> SBC (DT)
>> <4>[ 3.164508] pstate: 40000085 (nZcv daIf -PAN -UAO)
>> <4>[ 3.164514] pc : msm_read.isra.2+0x20/0x50
>> <4>[ 3.164520] lr : msm_read.isra.2+0x1c/0x50
>> <4>[ 3.164526] sp : ffff000008033a50
>> <4>[ 3.164531] x29: ffff000008033a50 x28: ffff000009486018
>> <4>[ 3.164548] x27: 0000000000000001 x26: ffff7dfffe7ff070
>> <4>[ 3.164565] x25: 0000000000000034 x24: ffff000009486000
>> <4>[ 3.164582] x23: 0000000000000000 x22: ffff00000978e190
>> <4>[ 3.164599] x21: ffff0000095e8228 x20: 0000000000000034
>> <4>[ 3.164616] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
>> <4>[ 3.164632] x17: 0000000000000000 x16: 0000000000000000
>> <4>[ 3.164649] x15: ffff0000094a96c8 x14: ffff00008978e6bf
>> <4>[ 3.164666] x13: ffff00000978e6cd x12: 0000000000000038
>> <4>[ 3.164683] x11: ffff0000094c6000 x10: 0000000000000c24
>> <4>[ 3.164699] x9 : ffff80003c89b400 x8 : ffff000008033970
>> <4>[ 3.164716] x7 : ffff80000eb04100 x6 : 00000000000af304
>> <4>[ 3.164732] x5 : 0000000000000c40 x4 : ffff80003c06f000
>> <4>[ 3.164750] x3 : ffff80003c89b498 x2 : 0000000000000000
>> <4>[ 3.164766] x1 : ffff80003ca68000 x0 : 0000000000000800
>> <0>[ 3.164785] Process swapper/0 (pid: 1, stack limit =
>> 0x(____ptrval____))
>> <4>[ 3.164791] Call trace:
>> <4>[ 3.164797] msm_read.isra.2+0x20/0x50
>> <4>[ 3.164804] msm_reset_dm_count+0x44/0x80
>> <4>[ 3.164810] __msm_console_write+0x1c8/0x1d0
>> <4>[ 3.164816] msm_serial_early_write_dm+0x3c/0x50
>> <4>[ 3.164823] console_unlock.part.6+0x468/0x528
>> <4>[ 3.164829] vprintk_emit+0x210/0x218
>> <4>[ 3.164835] vprintk_default+0x48/0x58
>> <4>[ 3.164841] vprintk_func+0xf0/0x1c0
>> <4>[ 3.164847] printk+0x74/0x94
>> <4>[ 3.164853] sci_init+0x24/0x3c
>> <4>[ 3.164859] do_one_initcall+0x54/0x248
>> <4>[ 3.164866] kernel_init_freeable+0x210/0x378
>> <4>[ 3.164872] kernel_init+0x18/0x118
>> <4>[ 3.164878] ret_from_fork+0x10/0x1c
>> <0>[ 3.164884] Code: aa1e03e0 8b214273 97e616f7 d503201f (b9400260)
>>
>> Seems to be issue with the msm serial driver and not ftrace.
>> Could someone look into it.
>
> I'm guessing that there may have been an issue with ftrace, it tried to
> print, but the printing caused a bug that rebooted the box.
>
> Does function tracing work after boot up? That is, without the
> ftrace=function, can you do:
>
> echo function > /sys/kernel/debug/tracing/current_tracer
>
> without any issue?
>

Yes ftrace in general works without any issue. I have also tested on
db820c and sdm845 where "ftrace=function" works fine. I am seeing this
issue only on db410c board.

>>
>> One more thing is for pstore dmesg-ramoops, I had to change
>> late_initcall to postcore_initcall which brings the question as to why
>> we changed to late_initcall?
>> Simple git blame shows to support crypto compress api, but is it really
>> helpful? A lot of boottime issues can be caught with pstore enabled at
>> postcore_initcall rather than late_initcall, this backtrace
>> is just one example. Is there any way we could change this?
>
> Does it break if the crypto is not initialized? Perhaps add a command
> line flag to have it happen earlier:
>

I didnt see any breakage, have been using ramoops with postcore_initcall
for sometime now.

> ramoops=earlyinit
>
> and add a postcore_initcall that checks if that flag is set, and if so,
> it does the work then, and the late_initcall() will do nothing.
>
> That way, you can still have unmodified kernels use pstore when it
> crashes at boot up.
>

Sounds good.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-16 16:58:08

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Tue, 16 Oct 2018 22:05:23 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> On 10/16/2018 8:59 PM, Steven Rostedt wrote:
> > On Tue, 16 Oct 2018 17:08:25 +0530
> > Sai Prakash Ranjan <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> On dragonboard 410c, with "ftrace=function" boot args, the console
> >> output slows down and board resets without any backtrace as below. This
> >> is tested on latest kernel and seems to exist even in older kernels as well.
> >
> > So this only happens when ftrace=function is on the boot console.
> >
>
> Yes. If I do not use boot console, target does not crash.
>


> >
> > Does function tracing work after boot up? That is, without the
> > ftrace=function, can you do:
> >
> > echo function > /sys/kernel/debug/tracing/current_tracer
> >
> > without any issue?
> >
>
> Yes ftrace in general works without any issue. I have also tested on
> db820c and sdm845 where "ftrace=function" works fine. I am seeing this
> issue only on db410c board.

OK, can you add to the command line:

ftrace=function ftrace_filter=*schedule*

to see if it's a specific function that may be causing the issue (but
hopefully it's not one of the scheduling functions that caused it).

>
> >>
> >> One more thing is for pstore dmesg-ramoops, I had to change
> >> late_initcall to postcore_initcall which brings the question as to why
> >> we changed to late_initcall?
> >> Simple git blame shows to support crypto compress api, but is it really
> >> helpful? A lot of boottime issues can be caught with pstore enabled at
> >> postcore_initcall rather than late_initcall, this backtrace
> >> is just one example. Is there any way we could change this?
> >
> > Does it break if the crypto is not initialized? Perhaps add a command
> > line flag to have it happen earlier:
> >
>
> I didnt see any breakage, have been using ramoops with postcore_initcall
> for sometime now.
>
> > ramoops=earlyinit
> >
> > and add a postcore_initcall that checks if that flag is set, and if so,
> > it does the work then, and the late_initcall() will do nothing.
> >
> > That way, you can still have unmodified kernels use pstore when it
> > crashes at boot up.
> >
>
> Sounds good.

Great, I guess you can write a patch to do that ;-)

-- Steve


2018-10-16 17:04:00

by Kees Cook

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Tue, Oct 16, 2018 at 8:29 AM, Steven Rostedt <[email protected]> wrote:
> On Tue, 16 Oct 2018 17:08:25 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>> One more thing is for pstore dmesg-ramoops, I had to change
>> late_initcall to postcore_initcall which brings the question as to why
>> we changed to late_initcall?
>> Simple git blame shows to support crypto compress api, but is it really
>> helpful? A lot of boottime issues can be caught with pstore enabled at
>> postcore_initcall rather than late_initcall, this backtrace
>> is just one example. Is there any way we could change this?
>
> Does it break if the crypto is not initialized? Perhaps add a command
> line flag to have it happen earlier:
>
> ramoops=earlyinit
>
> and add a postcore_initcall that checks if that flag is set, and if so,
> it does the work then, and the late_initcall() will do nothing.
>
> That way, you can still have unmodified kernels use pstore when it
> crashes at boot up.

Even better, if we could find a way to make it work with a late
initialization of compression (i.e. postcore_initcall() by default
means anything caught would be uncompressed, but anything after
late_initcall() would be compressed). I'd be very happy to review
patches for that!

-Kees

--
Kees Cook
Pixel Security

2018-10-16 17:37:15

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/16/2018 10:27 PM, Steven Rostedt wrote:
>
> OK, can you add to the command line:
>
> ftrace=function ftrace_filter=*schedule*
>
> to see if it's a specific function that may be causing the issue (but
> hopefully it's not one of the scheduling functions that caused it).
>

Target boots fine with this. So its not scheduling functions that is
causing it. Also I tried with ftrace_filter=*msm* just to be sure if
tracing driver functions is causing any issue but its NOT.


>>> Does it break if the crypto is not initialized? Perhaps add a command
>>> line flag to have it happen earlier:
>>>
>>
>> I didnt see any breakage, have been using ramoops with postcore_initcall
>> for sometime now.
>>
>>> ramoops=earlyinit
>>>
>>> and add a postcore_initcall that checks if that flag is set, and if so,
>>> it does the work then, and the late_initcall() will do nothing.
>>>
>>> That way, you can still have unmodified kernels use pstore when it
>>> crashes at boot up.
>>>
>>
>> Sounds good.
>
> Great, I guess you can write a patch to do that ;-)
>

Sure I can :) but as Kees said it would be better if we could
find a way to make it work with a late initialization of compression.
I will try on that.

Thanks,

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-16 17:50:03

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Tue, 16 Oct 2018 23:06:24 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> On 10/16/2018 10:27 PM, Steven Rostedt wrote:
> >
> > OK, can you add to the command line:
> >
> > ftrace=function ftrace_filter=*schedule*
> >
> > to see if it's a specific function that may be causing the issue (but
> > hopefully it's not one of the scheduling functions that caused it).
> >
>
> Target boots fine with this. So its not scheduling functions that is
> causing it. Also I tried with ftrace_filter=*msm* just to be sure if
> tracing driver functions is causing any issue but its NOT.

OK, seems that something is being traced that shouldn't be.

When this happens after boot up, it's easy to bisect down to the
problem function. But since it's at boot up, it will take a lot longer.

I would suggest to start by going down the alphabet.

ftrace_filter=a*
ftrace_filter=b*
ftrace_filter=c*
[...]

And at least find the letter the bad function starts with.

Note, it could be more than one function (I've had that a couple of
times), and to find that out, you can test with "ftrace_notrace". Say
you find that the problem function starts with 'x'. You can do:

ftrace_notrace=x*

Which will trace all functions except those that start with an 'x', to
make sure it still boots.

Remember, you still need to have ftrace=function for all of this.

Once you find the letter of the function, you can try the next letter,
or perhaps come up with another method. I would say look at the
functions in /sys/kernel/debug/tracing/available_filter_functions, but
they don't list the init function (that can be traced). But you can
use /proc/kallsyms instead.

-- Steve

2018-10-16 18:06:28

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/16/2018 11:18 PM, Steven Rostedt wrote:
> On Tue, 16 Oct 2018 23:06:24 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
>> On 10/16/2018 10:27 PM, Steven Rostedt wrote:
>>>
>>> OK, can you add to the command line:
>>>
>>> ftrace=function ftrace_filter=*schedule*
>>>
>>> to see if it's a specific function that may be causing the issue (but
>>> hopefully it's not one of the scheduling functions that caused it).
>>>
>>
>> Target boots fine with this. So its not scheduling functions that is
>> causing it. Also I tried with ftrace_filter=*msm* just to be sure if
>> tracing driver functions is causing any issue but its NOT.
>
> OK, seems that something is being traced that shouldn't be.
>
> When this happens after boot up, it's easy to bisect down to the
> problem function. But since it's at boot up, it will take a lot longer.
>
> I would suggest to start by going down the alphabet.
>
> ftrace_filter=a*
> ftrace_filter=b*
> ftrace_filter=c*
> [...]
>
> And at least find the letter the bad function starts with.
>
> Note, it could be more than one function (I've had that a couple of
> times), and to find that out, you can test with "ftrace_notrace". Say
> you find that the problem function starts with 'x'. You can do:
>
> ftrace_notrace=x*
>
> Which will trace all functions except those that start with an 'x', to
> make sure it still boots.
>
> Remember, you still need to have ftrace=function for all of this.
>
> Once you find the letter of the function, you can try the next letter,
> or perhaps come up with another method. I would say look at the
> functions in /sys/kernel/debug/tracing/available_filter_functions, but
> they don't list the init function (that can be traced). But you can
> use /proc/kallsyms instead.
>

Ok got it, this sounds fun. I'll give it a try.

Thanks

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-16 18:17:07

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs


[ Removed [email protected] due to getting mail delivery errors ]

On Tue, 16 Oct 2018 23:35:00 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> Ok got it, this sounds fun. I'll give it a try.

Awesome, I'm looking forward to seeing what you come up with.

-- Steve

2018-10-16 18:26:54

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/16/2018 11:46 PM, Steven Rostedt wrote:
>
> [ Removed [email protected] due to getting mail delivery errors ]
>
> On Tue, 16 Oct 2018 23:35:00 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
>> Ok got it, this sounds fun. I'll give it a try.
>
> Awesome, I'm looking forward to seeing what you come up with.
>

Now, I can trigger this crash reboot with "ftrace=function
ftrace_filter=f*,g*,h*,i*,j*,k*,l*,m*,n*,o*,p*,q*"

And pstore dmesg log gives slightly different backtrace than
before(i.e., uart_add_one_port+0x374/0x4c8):

<4>[ 1.976945] EINJ: ACPI disabled.
<6>[ 2.070011] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
<6>[ 2.087353] SuperH (H)SCI(F) driver initialized
<6>[ 2.093708] msm_serial 78af000.serial: msm_serial: detected port #1
<6>[ 2.093901] msm_serial 78af000.serial: uartclk = 19200000
<6>[ 2.099078] 78af000.serial: ttyMSM1 at MMIO 0x78af000 (irq = 9,
base_baud = 1200000) is a MSM
<6>[ 2.108386] msm_serial 78b0000.serial: msm_serial: detected port #0
<6>[ 2.113076] msm_serial 78b0000.serial: uartclk = 7372800
<6>[ 2.119300] 78b0000.serial: ttyMSM0 at MMIO 0x78b0000 (irq = 10,
base_baud = 460800) is a MSM
<0>[ 2.119769] Internal error: synchronous external abort: 96000010
[#1] PREEMPT SMP
<4>[ 2.119772] Modules linked in:
<4>[ 2.119780] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.19.0-rc8-00008-ge033b9909fff-dirty #2
<4>[ 2.119785] Hardware name: Qualcomm Technologies, Inc. APQ 8016
SBC (DT)
<4>[ 2.119789] pstate: 40000085 (nZcv daIf -PAN -UAO)
<4>[ 2.119792] pc : msm_read.isra.2+0x20/0x50
<4>[ 2.119796] lr : msm_read.isra.2+0x1c/0x50
<4>[ 2.119799] sp : ffff000008033760
<4>[ 2.119802] x29: ffff000008033760 x28: ffff000009486018
<4>[ 2.119809] x27: 0000000000000001 x26: ffff7dfffe7ff070
<4>[ 2.119816] x25: 0000000000000062 x24: ffff000009486000
<4>[ 2.119824] x23: 0000000000000000 x22: ffff00000978e190
<4>[ 2.119831] x21: ffff0000095e8228 x20: 0000000000000062
<4>[ 2.119838] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
<4>[ 2.119845] x17: 0000000000000000 x16: 0000000000000000
<4>[ 2.119852] x15: ffff0000094a96c8 x14: 3d20647561625f65
<4>[ 2.119859] x13: 736162202c303120 x12: 3d20717269282030
<4>[ 2.119866] x11: 3030306238377830 x10: 0000000000000134
<4>[ 2.119874] x9 : ffff80003c356400 x8 : ffff000008033680
<4>[ 2.119881] x7 : ffff80003c354100 x6 : 00000000000d9b72
<4>[ 2.119888] x5 : 0000000000000150 x4 : ffff80003bc8f000
<4>[ 2.119895] x3 : ffff80003c356498 x2 : 0000000000000000
<4>[ 2.119902] x1 : ffff80003bf10000 x0 : 0000000000000800
<0>[ 2.119911] Process swapper/0 (pid: 1, stack limit =
0x(____ptrval____))
<4>[ 2.119914] Call trace:
<4>[ 2.119917] msm_read.isra.2+0x20/0x50
<4>[ 2.119920] msm_reset_dm_count+0x44/0x80
<4>[ 2.119924] __msm_console_write+0x1c8/0x1d0
<4>[ 2.119928] msm_serial_early_write_dm+0x3c/0x50
<4>[ 2.119931] console_unlock.part.6+0x468/0x528
<4>[ 2.119935] vprintk_emit+0x210/0x218
<4>[ 2.119938] vprintk_default+0x48/0x58
<4>[ 2.119942] vprintk_func+0xf0/0x1c0
<4>[ 2.119945] printk+0x74/0x94
<4>[ 2.119949] uart_add_one_port+0x374/0x4c8
<4>[ 2.119952] msm_serial_probe+0x140/0x1c8
<4>[ 2.119955] platform_drv_probe+0x58/0xb0
<4>[ 2.119959] really_probe+0x1f4/0x3c8
<4>[ 2.119963] driver_probe_device+0x12c/0x150
<4>[ 2.119966] __driver_attach+0x144/0x148
<4>[ 2.119969] bus_for_each_dev+0x78/0xe0
<4>[ 2.119973] driver_attach+0x30/0x40
<4>[ 2.119976] bus_add_driver+0x1c8/0x2a8
<4>[ 2.119979] driver_register+0x68/0x118
<4>[ 2.119983] __platform_driver_register+0x54/0x60
<4>[ 2.119987] msm_serial_init+0x40/0x70
<4>[ 2.119990] do_one_initcall+0x54/0x248
<4>[ 2.119993] kernel_init_freeable+0x210/0x378
<4>[ 2.119997] kernel_init+0x18/0x118
<4>[ 2.120000] ret_from_fork+0x10/0x1c
<0>[ 2.120004] Code: aa1e03e0 8b214273 97e61727 d503201f (b9400260)

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-16 18:42:10

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Tue, 16 Oct 2018 23:55:14 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> On 10/16/2018 11:46 PM, Steven Rostedt wrote:
> >
> > [ Removed [email protected] due to getting mail delivery errors ]
> >
> > On Tue, 16 Oct 2018 23:35:00 +0530
> > Sai Prakash Ranjan <[email protected]> wrote:
> >
> >> Ok got it, this sounds fun. I'll give it a try.
> >
> > Awesome, I'm looking forward to seeing what you come up with.
> >
>
> Now, I can trigger this crash reboot with "ftrace=function
> ftrace_filter=f*,g*,h*,i*,j*,k*,l*,m*,n*,o*,p*,q*"

Does it crash with just "f*" or "g*" or any singularity of the above?

Hmm, I wonder if it is from tracing msm_read?

-- Steve


>
> And pstore dmesg log gives slightly different backtrace than
> before(i.e., uart_add_one_port+0x374/0x4c8):
>
> <4>[ 1.976945] EINJ: ACPI disabled.
> <6>[ 2.070011] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> <6>[ 2.087353] SuperH (H)SCI(F) driver initialized
> <6>[ 2.093708] msm_serial 78af000.serial: msm_serial: detected port #1
> <6>[ 2.093901] msm_serial 78af000.serial: uartclk = 19200000
> <6>[ 2.099078] 78af000.serial: ttyMSM1 at MMIO 0x78af000 (irq = 9,
> base_baud = 1200000) is a MSM
> <6>[ 2.108386] msm_serial 78b0000.serial: msm_serial: detected port #0
> <6>[ 2.113076] msm_serial 78b0000.serial: uartclk = 7372800
> <6>[ 2.119300] 78b0000.serial: ttyMSM0 at MMIO 0x78b0000 (irq = 10,
> base_baud = 460800) is a MSM
> <0>[ 2.119769] Internal error: synchronous external abort: 96000010
> [#1] PREEMPT SMP
> <4>[ 2.119772] Modules linked in:
> <4>[ 2.119780] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.19.0-rc8-00008-ge033b9909fff-dirty #2
> <4>[ 2.119785] Hardware name: Qualcomm Technologies, Inc. APQ 8016
> SBC (DT)
> <4>[ 2.119789] pstate: 40000085 (nZcv daIf -PAN -UAO)
> <4>[ 2.119792] pc : msm_read.isra.2+0x20/0x50
> <4>[ 2.119796] lr : msm_read.isra.2+0x1c/0x50
> <4>[ 2.119799] sp : ffff000008033760
> <4>[ 2.119802] x29: ffff000008033760 x28: ffff000009486018
> <4>[ 2.119809] x27: 0000000000000001 x26: ffff7dfffe7ff070
> <4>[ 2.119816] x25: 0000000000000062 x24: ffff000009486000
> <4>[ 2.119824] x23: 0000000000000000 x22: ffff00000978e190
> <4>[ 2.119831] x21: ffff0000095e8228 x20: 0000000000000062
> <4>[ 2.119838] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
> <4>[ 2.119845] x17: 0000000000000000 x16: 0000000000000000
> <4>[ 2.119852] x15: ffff0000094a96c8 x14: 3d20647561625f65
> <4>[ 2.119859] x13: 736162202c303120 x12: 3d20717269282030
> <4>[ 2.119866] x11: 3030306238377830 x10: 0000000000000134
> <4>[ 2.119874] x9 : ffff80003c356400 x8 : ffff000008033680
> <4>[ 2.119881] x7 : ffff80003c354100 x6 : 00000000000d9b72
> <4>[ 2.119888] x5 : 0000000000000150 x4 : ffff80003bc8f000
> <4>[ 2.119895] x3 : ffff80003c356498 x2 : 0000000000000000
> <4>[ 2.119902] x1 : ffff80003bf10000 x0 : 0000000000000800
> <0>[ 2.119911] Process swapper/0 (pid: 1, stack limit =
> 0x(____ptrval____))
> <4>[ 2.119914] Call trace:
> <4>[ 2.119917] msm_read.isra.2+0x20/0x50
> <4>[ 2.119920] msm_reset_dm_count+0x44/0x80
> <4>[ 2.119924] __msm_console_write+0x1c8/0x1d0
> <4>[ 2.119928] msm_serial_early_write_dm+0x3c/0x50
> <4>[ 2.119931] console_unlock.part.6+0x468/0x528
> <4>[ 2.119935] vprintk_emit+0x210/0x218
> <4>[ 2.119938] vprintk_default+0x48/0x58
> <4>[ 2.119942] vprintk_func+0xf0/0x1c0
> <4>[ 2.119945] printk+0x74/0x94
> <4>[ 2.119949] uart_add_one_port+0x374/0x4c8
> <4>[ 2.119952] msm_serial_probe+0x140/0x1c8
> <4>[ 2.119955] platform_drv_probe+0x58/0xb0
> <4>[ 2.119959] really_probe+0x1f4/0x3c8
> <4>[ 2.119963] driver_probe_device+0x12c/0x150
> <4>[ 2.119966] __driver_attach+0x144/0x148
> <4>[ 2.119969] bus_for_each_dev+0x78/0xe0
> <4>[ 2.119973] driver_attach+0x30/0x40
> <4>[ 2.119976] bus_add_driver+0x1c8/0x2a8
> <4>[ 2.119979] driver_register+0x68/0x118
> <4>[ 2.119983] __platform_driver_register+0x54/0x60
> <4>[ 2.119987] msm_serial_init+0x40/0x70
> <4>[ 2.119990] do_one_initcall+0x54/0x248
> <4>[ 2.119993] kernel_init_freeable+0x210/0x378
> <4>[ 2.119997] kernel_init+0x18/0x118
> <4>[ 2.120000] ret_from_fork+0x10/0x1c
> <0>[ 2.120004] Code: aa1e03e0 8b214273 97e61727 d503201f (b9400260)
>


2018-10-16 19:03:41

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/17/2018 12:11 AM, Steven Rostedt wrote:
> On Tue, 16 Oct 2018 23:55:14 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
>> On 10/16/2018 11:46 PM, Steven Rostedt wrote:
>>>
>>> [ Removed [email protected] due to getting mail delivery errors ]
>>>
>>> On Tue, 16 Oct 2018 23:35:00 +0530
>>> Sai Prakash Ranjan <[email protected]> wrote:
>>>
>>>> Ok got it, this sounds fun. I'll give it a try.
>>>
>>> Awesome, I'm looking forward to seeing what you come up with.
>>>
>>
>> Now, I can trigger this crash reboot with "ftrace=function
>> ftrace_filter=f*,g*,h*,i*,j*,k*,l*,m*,n*,o*,p*,q*"
>
> Does it crash with just "f*" or "g*" or any singularity of the above?
>
> Hmm, I wonder if it is from tracing msm_read?
>

Haa seems like you are right! With "ftrace=function
ftrace_filter=msm_read" , I can trigger the crash, but
sadly "ftrace_notrace=msm_read" also crashes.

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-16 19:04:37

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Wed, 17 Oct 2018 00:31:03 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> Haa seems like you are right! With "ftrace=function
> ftrace_filter=msm_read" , I can trigger the crash, but
> sadly "ftrace_notrace=msm_read" also crashes.

So there's more than one problem area.

What about ftrace_notrace=m*

?

-- Steve

2018-10-16 19:06:59

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/17/2018 12:33 AM, Steven Rostedt wrote:
> On Wed, 17 Oct 2018 00:31:03 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
>> Haa seems like you are right! With "ftrace=function
>> ftrace_filter=msm_read" , I can trigger the crash, but
>> sadly "ftrace_notrace=msm_read" also crashes.
>
> So there's more than one problem area.
>
> What about ftrace_notrace=m*
>
> ?
>

That too crashes.

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-16 19:16:09

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Wed, 17 Oct 2018 00:36:05 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> On 10/17/2018 12:33 AM, Steven Rostedt wrote:
> > On Wed, 17 Oct 2018 00:31:03 +0530
> > Sai Prakash Ranjan <[email protected]> wrote:
> >
> >> Haa seems like you are right! With "ftrace=function
> >> ftrace_filter=msm_read" , I can trigger the crash, but
> >> sadly "ftrace_notrace=msm_read" also crashes.
> >
> > So there's more than one problem area.
> >
> > What about ftrace_notrace=m*
> >
> > ?
> >
>
> That too crashes.
>

Which compiler are you using and can you send me your config.

I wonder if there's something screwing up with the way ftrace nops are
working on this board.

A couple things of note, 1) it works fine after boot up. 2) it crashes
in the initcall code, so it's not due to ftrace being enabled too early.

I'd like to see the full command line as well. I bet if you remove the
qcom,msm-uartdm from the command line, and had just ftrace=function, it
may also boot fine too. Can you try that?

-- Steve

2018-10-16 19:17:17

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Tue, 16 Oct 2018 15:15:16 -0400
Steven Rostedt <[email protected]> wrote:

> I'd like to see the full command line as well. I bet if you remove the
> qcom,msm-uartdm from the command line, and had just ftrace=function, it
> may also boot fine too. Can you try that?

Note, I probably wont respond for the rest of the day.

-- Steve


2018-10-16 19:38:30

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/17/2018 12:46 AM, Steven Rostedt wrote:
> On Tue, 16 Oct 2018 15:15:16 -0400
> Steven Rostedt <[email protected]> wrote:
>
>> I'd like to see the full command line as well. I bet if you remove the
>> qcom,msm-uartdm from the command line, and had just ftrace=function, it
>> may also boot fine too. Can you try that?
>
> Note, I probably wont respond for the rest of the day.
>

Cool, no problem. Will update if I find anything.

Thanks for your time!

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-16 19:38:41

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/17/2018 12:45 AM, Steven Rostedt wrote:
> On Wed, 17 Oct 2018 00:36:05 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
>> On 10/17/2018 12:33 AM, Steven Rostedt wrote:
>>> On Wed, 17 Oct 2018 00:31:03 +0530
>>> Sai Prakash Ranjan <[email protected]> wrote:
>>>
>>>> Haa seems like you are right! With "ftrace=function
>>>> ftrace_filter=msm_read" , I can trigger the crash, but
>>>> sadly "ftrace_notrace=msm_read" also crashes.
>>>
>>> So there's more than one problem area.
>>>
>>> What about ftrace_notrace=m*
>>>
>>> ?
>>>
>>
>> That too crashes.
>>
>
> Which compiler are you using and can you send me your config.
>
Config attached.

Compiler: aarch64-linux-gnu-gcc (Linaro GCC 6.3-2017.02) 6.3.1 20170109

> I wonder if there's something screwing up with the way ftrace nops are
> working on this board.
>
> A couple things of note, 1) it works fine after boot up. 2) it crashes
> in the initcall code, so it's not due to ftrace being enabled too early.
>
> I'd like to see the full command line as well. I bet if you remove the
> qcom,msm-uartdm from the command line, and had just ftrace=function, it
> may also boot fine too. Can you try that?
>

Kernel command line: root=/dev/disk/by-partlabel/rootfs rw rootwait
ftrace=function ftrace_notrace=m* earlycon console=ttyMSM0,115200n8

qcom,msm-uartdm is not in command line, it is the earlycon. So without
earlycon(bootconsole), board boots fine as we discussed earlier.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Attachments:
dotconfig (172.31 kB)

2018-10-16 20:53:05

by Stephen Boyd

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

Quoting Sai Prakash Ranjan (2018-10-16 12:35:57)
> On 10/17/2018 12:45 AM, Steven Rostedt wrote:
> > On Wed, 17 Oct 2018 00:36:05 +0530
> > Sai Prakash Ranjan <[email protected]> wrote:
> >
> >> On 10/17/2018 12:33 AM, Steven Rostedt wrote:
> >>> On Wed, 17 Oct 2018 00:31:03 +0530
> >>> Sai Prakash Ranjan <[email protected]> wrote:
> >>>
> >>>> Haa seems like you are right! With "ftrace=function
> >>>> ftrace_filter=msm_read" , I can trigger the crash, but
> >>>> sadly "ftrace_notrace=msm_read" also crashes.
> >>>
> >>> So there's more than one problem area.
> >>>
> >>> What about ftrace_notrace=m*
> >>>
> >>> ?
> >>>
> >>
> >> That too crashes.
> >>
> >
> > Which compiler are you using and can you send me your config.
> >
> Config attached.
>
> Compiler: aarch64-linux-gnu-gcc (Linaro GCC 6.3-2017.02) 6.3.1 20170109
>
> > I wonder if there's something screwing up with the way ftrace nops are
> > working on this board.
> >
> > A couple things of note, 1) it works fine after boot up. 2) it crashes
> > in the initcall code, so it's not due to ftrace being enabled too early.
> >
> > I'd like to see the full command line as well. I bet if you remove the
> > qcom,msm-uartdm from the command line, and had just ftrace=function, it
> > may also boot fine too. Can you try that?
> >
>
> Kernel command line: root=/dev/disk/by-partlabel/rootfs rw rootwait
> ftrace=function ftrace_notrace=m* earlycon console=ttyMSM0,115200n8
>
> qcom,msm-uartdm is not in command line, it is the earlycon. So without
> earlycon(bootconsole), board boots fine as we discussed earlier.
>

Have you tried with earlycon and no ftrace things on the commandline?

root=/dev/disk/by-partlabel/rootfs rw rootwait earlycon console=ttyMSM0,115200n8

If earlycon is causing problems, it sounds like this may be another case
of earlycon uart handing off to the uart driver and that failing because
something gets printed while the uart is transitioning from the earlycon
console to the kernel boot console. I recall the uart would trample on
itself in interesting ways.


2018-10-16 23:10:32

by Joel Fernandes

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Tue, Oct 16, 2018 at 05:08:25PM +0530, Sai Prakash Ranjan wrote:
> Hi,
>
> On dragonboard 410c, with "ftrace=function" boot args, the console output
> slows down and board resets without any backtrace as below. This is tested
> on latest kernel and seems to exist even in older kernels as well.
>
> [ 2.949164] EINJ: ACPI disabled.
> [ 3.133001] Serial: 8250/16550 dri
> Format: Log Type - Time(microsec) - Message - Optional Info
> Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
>
> But with pstore enabled, able to get the below backtrace:
> <4>[ 2.949164] EINJ: ACPI disabled.
> <6>[ 3.133001] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> <6>[ 3.164097] SuperH (H)SCI(F) driver initialized
> <0>[ 3.164471] Internal error: synchronous external abort: 96000010 [#1]
> PREEMPT SMP
> <4>[ 3.164479] Modules linked in:
> <4>[ 3.164495] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
> 4.19.0-rc8-00008-ge033b9909fff-dirty #175
> <4>[ 3.164501] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC
> (DT)
> <4>[ 3.164508] pstate: 40000085 (nZcv daIf -PAN -UAO)
> <4>[ 3.164514] pc : msm_read.isra.2+0x20/0x50
> <4>[ 3.164520] lr : msm_read.isra.2+0x1c/0x50
> <4>[ 3.164526] sp : ffff000008033a50
> <4>[ 3.164531] x29: ffff000008033a50 x28: ffff000009486018
> <4>[ 3.164548] x27: 0000000000000001 x26: ffff7dfffe7ff070
> <4>[ 3.164565] x25: 0000000000000034 x24: ffff000009486000
> <4>[ 3.164582] x23: 0000000000000000 x22: ffff00000978e190
> <4>[ 3.164599] x21: ffff0000095e8228 x20: 0000000000000034
> <4>[ 3.164616] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
> <4>[ 3.164632] x17: 0000000000000000 x16: 0000000000000000
> <4>[ 3.164649] x15: ffff0000094a96c8 x14: ffff00008978e6bf
> <4>[ 3.164666] x13: ffff00000978e6cd x12: 0000000000000038
> <4>[ 3.164683] x11: ffff0000094c6000 x10: 0000000000000c24
> <4>[ 3.164699] x9 : ffff80003c89b400 x8 : ffff000008033970
> <4>[ 3.164716] x7 : ffff80000eb04100 x6 : 00000000000af304
> <4>[ 3.164732] x5 : 0000000000000c40 x4 : ffff80003c06f000
> <4>[ 3.164750] x3 : ffff80003c89b498 x2 : 0000000000000000
> <4>[ 3.164766] x1 : ffff80003ca68000 x0 : 0000000000000800
> <0>[ 3.164785] Process swapper/0 (pid: 1, stack limit =
> 0x(____ptrval____))
> <4>[ 3.164791] Call trace:
> <4>[ 3.164797] msm_read.isra.2+0x20/0x50
> <4>[ 3.164804] msm_reset_dm_count+0x44/0x80
> <4>[ 3.164810] __msm_console_write+0x1c8/0x1d0
> <4>[ 3.164816] msm_serial_early_write_dm+0x3c/0x50
> <4>[ 3.164823] console_unlock.part.6+0x468/0x528
> <4>[ 3.164829] vprintk_emit+0x210/0x218
> <4>[ 3.164835] vprintk_default+0x48/0x58
> <4>[ 3.164841] vprintk_func+0xf0/0x1c0
> <4>[ 3.164847] printk+0x74/0x94
> <4>[ 3.164853] sci_init+0x24/0x3c
> <4>[ 3.164859] do_one_initcall+0x54/0x248
> <4>[ 3.164866] kernel_init_freeable+0x210/0x378
> <4>[ 3.164872] kernel_init+0x18/0x118
> <4>[ 3.164878] ret_from_fork+0x10/0x1c
> <0>[ 3.164884] Code: aa1e03e0 8b214273 97e616f7 d503201f (b9400260)
>
> Seems to be issue with the msm serial driver and not ftrace.
> Could someone look into it.
>
> One more thing is for pstore dmesg-ramoops, I had to change late_initcall to
> postcore_initcall which brings the question as to why we changed to
> late_initcall?
> Simple git blame shows to support crypto compress api, but is it really
> helpful? A lot of boottime issues can be caught with pstore enabled at
> postcore_initcall rather than late_initcall, this backtrace
> is just one example. Is there any way we could change this?

Any chance you are able to dig deeper into the stack trace? I would
disassemble vmlinux and see which part of msm_read is generating the
synchronous external abort and look into that.

Also similar to what Steve suggested, I wonder if it boots for you if you
annotate all the functions in the serial driver with 'notrace'.

thanks,

- Joel


2018-10-17 10:15:03

by Joel Fernandes

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

Hi Kees,

On Tue, Oct 16, 2018 at 10:02:53AM -0700, Kees Cook wrote:
> On Tue, Oct 16, 2018 at 8:29 AM, Steven Rostedt <[email protected]> wrote:
> > On Tue, 16 Oct 2018 17:08:25 +0530
> > Sai Prakash Ranjan <[email protected]> wrote:
> >> One more thing is for pstore dmesg-ramoops, I had to change
> >> late_initcall to postcore_initcall which brings the question as to why
> >> we changed to late_initcall?
> >> Simple git blame shows to support crypto compress api, but is it really
> >> helpful? A lot of boottime issues can be caught with pstore enabled at
> >> postcore_initcall rather than late_initcall, this backtrace
> >> is just one example. Is there any way we could change this?
> >
> > Does it break if the crypto is not initialized? Perhaps add a command
> > line flag to have it happen earlier:
> >
> > ramoops=earlyinit
> >
> > and add a postcore_initcall that checks if that flag is set, and if so,
> > it does the work then, and the late_initcall() will do nothing.
> >
> > That way, you can still have unmodified kernels use pstore when it
> > crashes at boot up.
>
> Even better, if we could find a way to make it work with a late
> initialization of compression (i.e. postcore_initcall() by default
> means anything caught would be uncompressed, but anything after
> late_initcall() would be compressed). I'd be very happy to review
> patches for that!

What do you think about the (untested) patch below? It seems to me that it
should solve the issue of missing early crash dumps, but I have not tested it
yet. Sai, would you mind trying it out and let me know if you can see the
early crash dumps properly now?

----8<---
From: "Joel Fernandes (Google)" <[email protected]>
Subject: [RFC] pstore: allocate compression during late_initcall

ramoop's pstore registration (using pstore_register) has to run during
late_initcall because crypto backend may not be ready during
postcore_initcall. This causes missing of dmesg crash dumps which could
have been caught by pstore.

Instead, lets allow ramoops pstore registration earlier, and once crypto
is ready we can initialize the compression.

Reported-by: Sai Prakash Ranjan <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
fs/pstore/platform.c | 13 +++++++++++++
fs/pstore/ram.c | 2 +-
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 15e99d5a681d..f09066db2d4d 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -780,6 +780,19 @@ void __init pstore_choose_compression(void)
}
}

+static int __init pstore_compression_late_init(void)
+{
+ /*
+ * Check if any pstore backends registered earlier but did not allocate
+ * for compression because crypto was not ready, if so then initialize
+ * compression.
+ */
+ if (psinfo && !tfm)
+ allocate_buf_for_compression();
+ return 0;
+}
+late_initcall(pstore_compression_late_init);
+
module_param(compress, charp, 0444);
MODULE_PARM_DESC(compress, "Pstore compression to use");

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index bbd1e357c23d..98e48d1a9776 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -940,7 +940,7 @@ static int __init ramoops_init(void)
ramoops_register_dummy();
return platform_driver_register(&ramoops_driver);
}
-late_initcall(ramoops_init);
+postcore_initcall(ramoops_init);

static void __exit ramoops_exit(void)
{
--
2.19.1.331.ge82ca0e54c-goog


2018-10-17 11:28:30

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/17/2018 2:21 AM, Stephen Boyd wrote:
> Quoting Sai Prakash Ranjan (2018-10-16 12:35:57)
>> On 10/17/2018 12:45 AM, Steven Rostedt wrote:
>>> On Wed, 17 Oct 2018 00:36:05 +0530
>>> Sai Prakash Ranjan <[email protected]> wrote:
>>>
>>>> On 10/17/2018 12:33 AM, Steven Rostedt wrote:
>>>>> On Wed, 17 Oct 2018 00:31:03 +0530
>>>>> Sai Prakash Ranjan <[email protected]> wrote:
>>>>>
>>>>>> Haa seems like you are right! With "ftrace=function
>>>>>> ftrace_filter=msm_read" , I can trigger the crash, but
>>>>>> sadly "ftrace_notrace=msm_read" also crashes.
>>>>>
>>>>> So there's more than one problem area.
>>>>>
>>>>> What about ftrace_notrace=m*
>>>>>
>>>>> ?
>>>>>
>>>>
>>>> That too crashes.
>>>>
>>>
>>> Which compiler are you using and can you send me your config.
>>>
>> Config attached.
>>
>> Compiler: aarch64-linux-gnu-gcc (Linaro GCC 6.3-2017.02) 6.3.1 20170109
>>
>>> I wonder if there's something screwing up with the way ftrace nops are
>>> working on this board.
>>>
>>> A couple things of note, 1) it works fine after boot up. 2) it crashes
>>> in the initcall code, so it's not due to ftrace being enabled too early.
>>>
>>> I'd like to see the full command line as well. I bet if you remove the
>>> qcom,msm-uartdm from the command line, and had just ftrace=function, it
>>> may also boot fine too. Can you try that?
>>>
>>
>> Kernel command line: root=/dev/disk/by-partlabel/rootfs rw rootwait
>> ftrace=function ftrace_notrace=m* earlycon console=ttyMSM0,115200n8
>>
>> qcom,msm-uartdm is not in command line, it is the earlycon. So without
>> earlycon(bootconsole), board boots fine as we discussed earlier.
>>
>
> Have you tried with earlycon and no ftrace things on the commandline?
>
> root=/dev/disk/by-partlabel/rootfs rw rootwait earlycon console=ttyMSM0,115200n8
>
> If earlycon is causing problems, it sounds like this may be another case
> of earlycon uart handing off to the uart driver and that failing because
> something gets printed while the uart is transitioning from the earlycon
> console to the kernel boot console. I recall the uart would trample on
> itself in interesting ways.
>

Yes I have tried with only earlycon enabled and everything is fine.
Issue is reproduced only with ftrace=function cmdline.

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-17 11:39:04

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/17/2018 3:43 PM, Joel Fernandes wrote:
> Hi Kees,
>
> On Tue, Oct 16, 2018 at 10:02:53AM -0700, Kees Cook wrote:
>> On Tue, Oct 16, 2018 at 8:29 AM, Steven Rostedt <[email protected]> wrote:
>>> On Tue, 16 Oct 2018 17:08:25 +0530
>>> Sai Prakash Ranjan <[email protected]> wrote:
>>>> One more thing is for pstore dmesg-ramoops, I had to change
>>>> late_initcall to postcore_initcall which brings the question as to why
>>>> we changed to late_initcall?
>>>> Simple git blame shows to support crypto compress api, but is it really
>>>> helpful? A lot of boottime issues can be caught with pstore enabled at
>>>> postcore_initcall rather than late_initcall, this backtrace
>>>> is just one example. Is there any way we could change this?
>>>
>>> Does it break if the crypto is not initialized? Perhaps add a command
>>> line flag to have it happen earlier:
>>>
>>> ramoops=earlyinit
>>>
>>> and add a postcore_initcall that checks if that flag is set, and if so,
>>> it does the work then, and the late_initcall() will do nothing.
>>>
>>> That way, you can still have unmodified kernels use pstore when it
>>> crashes at boot up.
>>
>> Even better, if we could find a way to make it work with a late
>> initialization of compression (i.e. postcore_initcall() by default
>> means anything caught would be uncompressed, but anything after
>> late_initcall() would be compressed). I'd be very happy to review
>> patches for that!
>
> What do you think about the (untested) patch below? It seems to me that it
> should solve the issue of missing early crash dumps, but I have not tested it
> yet. Sai, would you mind trying it out and let me know if you can see the
> early crash dumps properly now?
>
> ----8<---
> From: "Joel Fernandes (Google)" <[email protected]>
> Subject: [RFC] pstore: allocate compression during late_initcall
>
> ramoop's pstore registration (using pstore_register) has to run during
> late_initcall because crypto backend may not be ready during
> postcore_initcall. This causes missing of dmesg crash dumps which could
> have been caught by pstore.
>
> Instead, lets allow ramoops pstore registration earlier, and once crypto
> is ready we can initialize the compression.
>
> Reported-by: Sai Prakash Ranjan <[email protected]>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> fs/pstore/platform.c | 13 +++++++++++++
> fs/pstore/ram.c | 2 +-
> 2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> index 15e99d5a681d..f09066db2d4d 100644
> --- a/fs/pstore/platform.c
> +++ b/fs/pstore/platform.c
> @@ -780,6 +780,19 @@ void __init pstore_choose_compression(void)
> }
> }
>
> +static int __init pstore_compression_late_init(void)
> +{
> + /*
> + * Check if any pstore backends registered earlier but did not allocate
> + * for compression because crypto was not ready, if so then initialize
> + * compression.
> + */
> + if (psinfo && !tfm)
> + allocate_buf_for_compression();
> + return 0;
> +}
> +late_initcall(pstore_compression_late_init);
> +
> module_param(compress, charp, 0444);
> MODULE_PARM_DESC(compress, "Pstore compression to use");
>
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index bbd1e357c23d..98e48d1a9776 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -940,7 +940,7 @@ static int __init ramoops_init(void)
> ramoops_register_dummy();
> return platform_driver_register(&ramoops_driver);
> }
> -late_initcall(ramoops_init);
> +postcore_initcall(ramoops_init);
>
> static void __exit ramoops_exit(void)
> {
>

Yes I could see the early crash dump. Also I tested with different
compression (LZO) instead of deflate just to be sure and it works fine,
thanks :)

Tested-by: Sai Prakash Ranjan <[email protected]>

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-17 11:54:59

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/17/2018 4:39 AM, Joel Fernandes wrote:
> On Tue, Oct 16, 2018 at 05:08:25PM +0530, Sai Prakash Ranjan wrote:
>> Hi,
>>
>> On dragonboard 410c, with "ftrace=function" boot args, the console output
>> slows down and board resets without any backtrace as below. This is tested
>> on latest kernel and seems to exist even in older kernels as well.
>>
>> [ 2.949164] EINJ: ACPI disabled.
>> [ 3.133001] Serial: 8250/16550 dri
>> Format: Log Type - Time(microsec) - Message - Optional Info
>> Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
>>
>> But with pstore enabled, able to get the below backtrace:
>> <4>[ 2.949164] EINJ: ACPI disabled.
>> <6>[ 3.133001] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> <6>[ 3.164097] SuperH (H)SCI(F) driver initialized
>> <0>[ 3.164471] Internal error: synchronous external abort: 96000010 [#1]
>> PREEMPT SMP
>> <4>[ 3.164479] Modules linked in:
>> <4>[ 3.164495] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
>> 4.19.0-rc8-00008-ge033b9909fff-dirty #175
>> <4>[ 3.164501] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC
>> (DT)
>> <4>[ 3.164508] pstate: 40000085 (nZcv daIf -PAN -UAO)
>> <4>[ 3.164514] pc : msm_read.isra.2+0x20/0x50
>> <4>[ 3.164520] lr : msm_read.isra.2+0x1c/0x50
>> <4>[ 3.164526] sp : ffff000008033a50
>> <4>[ 3.164531] x29: ffff000008033a50 x28: ffff000009486018
>> <4>[ 3.164548] x27: 0000000000000001 x26: ffff7dfffe7ff070
>> <4>[ 3.164565] x25: 0000000000000034 x24: ffff000009486000
>> <4>[ 3.164582] x23: 0000000000000000 x22: ffff00000978e190
>> <4>[ 3.164599] x21: ffff0000095e8228 x20: 0000000000000034
>> <4>[ 3.164616] x19: ffff7dfffe7ff008 x18: ffffffffffffffff
>> <4>[ 3.164632] x17: 0000000000000000 x16: 0000000000000000
>> <4>[ 3.164649] x15: ffff0000094a96c8 x14: ffff00008978e6bf
>> <4>[ 3.164666] x13: ffff00000978e6cd x12: 0000000000000038
>> <4>[ 3.164683] x11: ffff0000094c6000 x10: 0000000000000c24
>> <4>[ 3.164699] x9 : ffff80003c89b400 x8 : ffff000008033970
>> <4>[ 3.164716] x7 : ffff80000eb04100 x6 : 00000000000af304
>> <4>[ 3.164732] x5 : 0000000000000c40 x4 : ffff80003c06f000
>> <4>[ 3.164750] x3 : ffff80003c89b498 x2 : 0000000000000000
>> <4>[ 3.164766] x1 : ffff80003ca68000 x0 : 0000000000000800
>> <0>[ 3.164785] Process swapper/0 (pid: 1, stack limit =
>> 0x(____ptrval____))
>> <4>[ 3.164791] Call trace:
>> <4>[ 3.164797] msm_read.isra.2+0x20/0x50
>> <4>[ 3.164804] msm_reset_dm_count+0x44/0x80
>> <4>[ 3.164810] __msm_console_write+0x1c8/0x1d0
>> <4>[ 3.164816] msm_serial_early_write_dm+0x3c/0x50
>> <4>[ 3.164823] console_unlock.part.6+0x468/0x528
>> <4>[ 3.164829] vprintk_emit+0x210/0x218
>> <4>[ 3.164835] vprintk_default+0x48/0x58
>> <4>[ 3.164841] vprintk_func+0xf0/0x1c0
>> <4>[ 3.164847] printk+0x74/0x94
>> <4>[ 3.164853] sci_init+0x24/0x3c
>> <4>[ 3.164859] do_one_initcall+0x54/0x248
>> <4>[ 3.164866] kernel_init_freeable+0x210/0x378
>> <4>[ 3.164872] kernel_init+0x18/0x118
>> <4>[ 3.164878] ret_from_fork+0x10/0x1c
>> <0>[ 3.164884] Code: aa1e03e0 8b214273 97e616f7 d503201f (b9400260)
>>
>> Seems to be issue with the msm serial driver and not ftrace.
>> Could someone look into it.
>>
>> One more thing is for pstore dmesg-ramoops, I had to change late_initcall to
>> postcore_initcall which brings the question as to why we changed to
>> late_initcall?
>> Simple git blame shows to support crypto compress api, but is it really
>> helpful? A lot of boottime issues can be caught with pstore enabled at
>> postcore_initcall rather than late_initcall, this backtrace
>> is just one example. Is there any way we could change this?
>
> Any chance you are able to dig deeper into the stack trace? I would
> disassemble vmlinux and see which part of msm_read is generating the
> synchronous external abort and look into that.
>

Yes I had checked the part of msm_read which was generating the abort
and it always seems to be in "msm_wait_for_xmitr" at below pointed location.

static inline void msm_wait_for_xmitr(struct uart_port *port)
{
while (!(msm_read(port, UART_SR) & UART_SR_TX_EMPTY)) { <---
if (msm_read(port, UART_ISR) & UART_ISR_TX_READY)
break;
udelay(1);
}
msm_write(port, UART_CR_CMD_RESET_TX_READY, UART_CR);
}

Also I could confirm that this path is entered repeatedly(with tracing
register reads/writes from my patch series in
https://lore.kernel.org/patchwork/cover/983795/ and tp_printk) and crash
is seen at some random time, so could not get much from this.

> Also similar to what Steve suggested, I wonder if it boots for you if you
> annotate all the functions in the serial driver with 'notrace'.
>

I have tried this too, but still the target crashes. So I am doubtful if
this is ftrace issue? Maybe earlycon uart issue as Stephen is suspecting.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-17 14:51:05

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/17/2018 5:08 PM, Sai Prakash Ranjan wrote:
>>
>> What do you think about the (untested) patch below? It seems to me
>> that it
>> should solve the issue of missing early crash dumps, but I have not
>> tested it
>> yet. Sai, would you mind trying it out and let me know if you can see the
>> early crash dumps properly now?
>>
>> ----8<---
>> From: "Joel Fernandes (Google)" <[email protected]>
>> Subject: [RFC] pstore: allocate compression during late_initcall
>>
>> ramoop's pstore registration (using pstore_register) has to run during
>> late_initcall because crypto backend may not be ready during
>> postcore_initcall. This causes missing of dmesg crash dumps which could
>> have been caught by pstore.
>>
>> Instead, lets allow ramoops pstore registration earlier, and once crypto
>> is ready we can initialize the compression.
>>
>> Reported-by: Sai Prakash Ranjan <[email protected]>
>> Signed-off-by: Joel Fernandes (Google) <[email protected]>
>> ---
>>   fs/pstore/platform.c | 13 +++++++++++++
>>   fs/pstore/ram.c      |  2 +-
>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
>> index 15e99d5a681d..f09066db2d4d 100644
>> --- a/fs/pstore/platform.c
>> +++ b/fs/pstore/platform.c
>> @@ -780,6 +780,19 @@ void __init pstore_choose_compression(void)
>>       }
>>   }
>> +static int __init pstore_compression_late_init(void)
>> +{
>> +    /*
>> +     * Check if any pstore backends registered earlier but did not
>> allocate
>> +     * for compression because crypto was not ready, if so then
>> initialize
>> +     * compression.
>> +     */
>> +    if (psinfo && !tfm)
>> +        allocate_buf_for_compression();
>> +    return 0;
>> +}
>> +late_initcall(pstore_compression_late_init);
>> +
>>   module_param(compress, charp, 0444);
>>   MODULE_PARM_DESC(compress, "Pstore compression to use");
>> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>> index bbd1e357c23d..98e48d1a9776 100644
>> --- a/fs/pstore/ram.c
>> +++ b/fs/pstore/ram.c
>> @@ -940,7 +940,7 @@ static int __init ramoops_init(void)
>>       ramoops_register_dummy();
>>       return platform_driver_register(&ramoops_driver);
>>   }
>> -late_initcall(ramoops_init);
>> +postcore_initcall(ramoops_init);
>>   static void __exit ramoops_exit(void)
>>   {
>>
>
> Yes I could see the early crash dump. Also I tested with different
> compression (LZO) instead of deflate just to be sure and it works fine,
> thanks :)
>
> Tested-by: Sai Prakash Ranjan <[email protected]>
>

I just noticed that allocate_buf_for_compression() is also called from
pstore_register(). Shouldn't that call be removed now that ramoops_init
is moved to postcore_initcall and allocate_buf_for_compression() will
just return doing nothing when called from pstore_register()?

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-17 17:57:12

by Joel Fernandes

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Wed, Oct 17, 2018 at 08:19:41PM +0530, Sai Prakash Ranjan wrote:
> On 10/17/2018 5:08 PM, Sai Prakash Ranjan wrote:
> > >
> > > What do you think about the (untested) patch below? It seems to me
> > > that it
> > > should solve the issue of missing early crash dumps, but I have not
> > > tested it
> > > yet. Sai, would you mind trying it out and let me know if you can see the
> > > early crash dumps properly now?
> > >
> > > ----8<---
> > > From: "Joel Fernandes (Google)" <[email protected]>
> > > Subject: [RFC] pstore: allocate compression during late_initcall
> > >
> > > ramoop's pstore registration (using pstore_register) has to run during
> > > late_initcall because crypto backend may not be ready during
> > > postcore_initcall. This causes missing of dmesg crash dumps which could
> > > have been caught by pstore.
> > >
> > > Instead, lets allow ramoops pstore registration earlier, and once crypto
> > > is ready we can initialize the compression.
> > >
> > > Reported-by: Sai Prakash Ranjan <[email protected]>
> > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > > ---
> > > ? fs/pstore/platform.c | 13 +++++++++++++
> > > ? fs/pstore/ram.c????? |? 2 +-
> > > ? 2 files changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> > > index 15e99d5a681d..f09066db2d4d 100644
> > > --- a/fs/pstore/platform.c
> > > +++ b/fs/pstore/platform.c
> > > @@ -780,6 +780,19 @@ void __init pstore_choose_compression(void)
> > > ????? }
> > > ? }
> > > +static int __init pstore_compression_late_init(void)
> > > +{
> > > +??? /*
> > > +???? * Check if any pstore backends registered earlier but did not
> > > allocate
> > > +???? * for compression because crypto was not ready, if so then
> > > initialize
> > > +???? * compression.
> > > +???? */
> > > +??? if (psinfo && !tfm)
> > > +??????? allocate_buf_for_compression();
> > > +??? return 0;
> > > +}
> > > +late_initcall(pstore_compression_late_init);
> > > +
> > > ? module_param(compress, charp, 0444);
> > > ? MODULE_PARM_DESC(compress, "Pstore compression to use");
> > > diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> > > index bbd1e357c23d..98e48d1a9776 100644
> > > --- a/fs/pstore/ram.c
> > > +++ b/fs/pstore/ram.c
> > > @@ -940,7 +940,7 @@ static int __init ramoops_init(void)
> > > ????? ramoops_register_dummy();
> > > ????? return platform_driver_register(&ramoops_driver);
> > > ? }
> > > -late_initcall(ramoops_init);
> > > +postcore_initcall(ramoops_init);
> > > ? static void __exit ramoops_exit(void)
> > > ? {
> > >
> >
> > Yes I could see the early crash dump. Also I tested with different
> > compression (LZO) instead of deflate just to be sure and it works fine,
> > thanks :)
> >
> > Tested-by: Sai Prakash Ranjan <[email protected]>
> >

Thanks.

> I just noticed that allocate_buf_for_compression() is also called from
> pstore_register(). Shouldn't that call be removed now that ramoops_init is
> moved to postcore_initcall and allocate_buf_for_compression() will just
> return doing nothing when called from pstore_register()?

Yes, that is the point. If crypto is not ready then my thought is
allocate_buf_for_compression() called from pstore_register() should do
nothing and pstore will work uncompressed and the kdump infrastructure will
only cause uncompressed writes. But say if the kdump triggered *after* crypto
was ready, then the dump write out would be compressed since pstore is ready
for compression.

The idea is if a future pstore backend calls pstore_register late, then it
may as well do the allocate_buf_for_compression as well at that time when it
runs. In that cause pstore_compression_late_init would do nothing.

So this approach is both dynamic and future proof.

- Joel


2018-10-17 18:34:29

by Kees Cook

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Wed, Oct 17, 2018 at 10:56 AM, Joel Fernandes <[email protected]> wrote:
> On Wed, Oct 17, 2018 at 08:19:41PM +0530, Sai Prakash Ranjan wrote:
>> On 10/17/2018 5:08 PM, Sai Prakash Ranjan wrote:
>> > >
>> > > What do you think about the (untested) patch below? It seems to me
>> > > that it
>> > > should solve the issue of missing early crash dumps, but I have not
>> > > tested it
>> > > yet. Sai, would you mind trying it out and let me know if you can see the
>> > > early crash dumps properly now?
>> > >
>> > > ----8<---
>> > > From: "Joel Fernandes (Google)" <[email protected]>
>> > > Subject: [RFC] pstore: allocate compression during late_initcall
>> > >
>> > > ramoop's pstore registration (using pstore_register) has to run during
>> > > late_initcall because crypto backend may not be ready during
>> > > postcore_initcall. This causes missing of dmesg crash dumps which could
>> > > have been caught by pstore.
>> > >
>> > > Instead, lets allow ramoops pstore registration earlier, and once crypto
>> > > is ready we can initialize the compression.
>> > >
>> > > Reported-by: Sai Prakash Ranjan <[email protected]>
>> > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
>> > > ---
>> > > fs/pstore/platform.c | 13 +++++++++++++
>> > > fs/pstore/ram.c | 2 +-
>> > > 2 files changed, 14 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
>> > > index 15e99d5a681d..f09066db2d4d 100644
>> > > --- a/fs/pstore/platform.c
>> > > +++ b/fs/pstore/platform.c
>> > > @@ -780,6 +780,19 @@ void __init pstore_choose_compression(void)
>> > > }
>> > > }
>> > > +static int __init pstore_compression_late_init(void)
>> > > +{
>> > > + /*
>> > > + * Check if any pstore backends registered earlier but did not
>> > > allocate
>> > > + * for compression because crypto was not ready, if so then
>> > > initialize
>> > > + * compression.
>> > > + */
>> > > + if (psinfo && !tfm)
>> > > + allocate_buf_for_compression();
>> > > + return 0;
>> > > +}
>> > > +late_initcall(pstore_compression_late_init);
>> > > +
>> > > module_param(compress, charp, 0444);
>> > > MODULE_PARM_DESC(compress, "Pstore compression to use");
>> > > diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>> > > index bbd1e357c23d..98e48d1a9776 100644
>> > > --- a/fs/pstore/ram.c
>> > > +++ b/fs/pstore/ram.c
>> > > @@ -940,7 +940,7 @@ static int __init ramoops_init(void)
>> > > ramoops_register_dummy();
>> > > return platform_driver_register(&ramoops_driver);
>> > > }
>> > > -late_initcall(ramoops_init);
>> > > +postcore_initcall(ramoops_init);
>> > > static void __exit ramoops_exit(void)
>> > > {
>> > >
>> >
>> > Yes I could see the early crash dump. Also I tested with different
>> > compression (LZO) instead of deflate just to be sure and it works fine,
>> > thanks :)
>> >
>> > Tested-by: Sai Prakash Ranjan <[email protected]>
>> >
>
> Thanks.
>
>> I just noticed that allocate_buf_for_compression() is also called from
>> pstore_register(). Shouldn't that call be removed now that ramoops_init is
>> moved to postcore_initcall and allocate_buf_for_compression() will just
>> return doing nothing when called from pstore_register()?
>
> Yes, that is the point. If crypto is not ready then my thought is
> allocate_buf_for_compression() called from pstore_register() should do
> nothing and pstore will work uncompressed and the kdump infrastructure will
> only cause uncompressed writes. But say if the kdump triggered *after* crypto
> was ready, then the dump write out would be compressed since pstore is ready
> for compression.
>
> The idea is if a future pstore backend calls pstore_register late, then it
> may as well do the allocate_buf_for_compression as well at that time when it
> runs. In that cause pstore_compression_late_init would do nothing.
>
> So this approach is both dynamic and future proof.

Yeah, thanks! I think this looks correct, but I'll spend some more
time testing it too.

-Kees

--
Kees Cook
Pixel Security

2018-10-18 02:45:12

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Wed, 17 Oct 2018 00:36:05 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> On 10/17/2018 12:33 AM, Steven Rostedt wrote:
> > On Wed, 17 Oct 2018 00:31:03 +0530
> > Sai Prakash Ranjan <[email protected]> wrote:
> >
> >> Haa seems like you are right! With "ftrace=function
> >> ftrace_filter=msm_read" , I can trigger the crash, but
> >> sadly "ftrace_notrace=msm_read" also crashes.
> >
> > So there's more than one problem area.
> >
> > What about ftrace_notrace=m*
> >
> > ?
> >
>
> That too crashes.
>

So something else is causing an issue besides just msm_read.

Can you do an objdump -dr of the entire vmlinux binary and gzip it and
post it somewhere. Not sure if it would be too big to email. You could
try sending it to me privately. I'd like to see the binary that you are
using.

-- Steve

2018-10-18 05:22:21

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/18/2018 8:03 AM, Steven Rostedt wrote:
> On Wed, 17 Oct 2018 00:36:05 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
>> On 10/17/2018 12:33 AM, Steven Rostedt wrote:
>>> On Wed, 17 Oct 2018 00:31:03 +0530
>>> Sai Prakash Ranjan <[email protected]> wrote:
>>>
>>>> Haa seems like you are right! With "ftrace=function
>>>> ftrace_filter=msm_read" , I can trigger the crash, but
>>>> sadly "ftrace_notrace=msm_read" also crashes.
>>>
>>> So there's more than one problem area.
>>>
>>> What about ftrace_notrace=m*
>>>
>>> ?
>>>
>>
>> That too crashes.
>>
>
> So something else is causing an issue besides just msm_read.
>
> Can you do an objdump -dr of the entire vmlinux binary and gzip it and
> post it somewhere. Not sure if it would be too big to email. You could
> try sending it to me privately. I'd like to see the binary that you are
> using.
>

I have sent the objdump and dot config to you privately.

Thanks,
Sai
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-18 13:17:56

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Thu, 18 Oct 2018 10:51:18 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> > So something else is causing an issue besides just msm_read.
> >
> > Can you do an objdump -dr of the entire vmlinux binary and gzip it and
> > post it somewhere. Not sure if it would be too big to email. You could
> > try sending it to me privately. I'd like to see the binary that you are
> > using.
> >
>
> I have sent the objdump and dot config to you privately.

Thanks. I don't see anything that pops out, but then again, my arm asm
foo is very rusty (it has been literally decades since I did any arm
asm). I wonder if it could simply be a timing issue?

ffff0000086eb538 <msm_read.isra.1>:
ffff0000086eb538: a9be7bfd stp x29, x30, [sp,#-32]!
ffff0000086eb53c: 910003fd mov x29, sp
ffff0000086eb540: a90153f3 stp x19, x20, [sp,#16]
ffff0000086eb544: aa0003f4 mov x20, x0
ffff0000086eb548: 2a0103f3 mov w19, w1
ffff0000086eb54c: aa1e03e0 mov x0, x30
ffff0000086eb550: 97e6bae4 bl ffff00000809a0e0 <_mcount>

The above is changed to nop on boot, but then to:

bl ftrace_caller

When ftrace is enabled.

ffff0000086eb554: 8b334280 add x0, x20, w19, uxtw
ffff0000086eb558: b9400000 ldr w0, [x0]
ffff0000086eb55c: a94153f3 ldp x19, x20, [sp,#16]
ffff0000086eb560: a8c27bfd ldp x29, x30, [sp],#32
ffff0000086eb564: d65f03c0 ret



ffff00000809a0e4 <ftrace_caller>:
ffff00000809a0e4: a9bf7bfd stp x29, x30, [sp,#-16]!
ffff00000809a0e8: 910003fd mov x29, sp
ffff00000809a0ec: d10013c0 sub x0, x30, #0x4
ffff00000809a0f0: f94003a1 ldr x1, [x29]
ffff00000809a0f4: f9400421 ldr x1, [x1,#8]
ffff00000809a0f8: d1001021 sub x1, x1, #0x4

ffff00000809a0fc <ftrace_call>:
ffff00000809a0fc: d503201f nop

The above nop gets patched to:

bl ftrace_ops_no_ops

Which will iterate through all the registered functions.


ffff00000809a100 <ftrace_graph_call>:
ffff00000809a100: d503201f nop

The above only gets set when function graph tracer is enabled, which it
is not in this case.

ffff00000809a104: a8c17bfd ldp x29, x30, [sp],#16
ffff00000809a108: d65f03c0 ret


Anyone see any problems here?

-- Steve

2018-10-19 04:18:48

by Joel Fernandes

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Thu, Oct 18, 2018 at 09:17:06AM -0400, Steven Rostedt wrote:
> On Thu, 18 Oct 2018 10:51:18 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
> > > So something else is causing an issue besides just msm_read.
> > >
> > > Can you do an objdump -dr of the entire vmlinux binary and gzip it and
> > > post it somewhere. Not sure if it would be too big to email. You could
> > > try sending it to me privately. I'd like to see the binary that you are
> > > using.
> > >
> >
> > I have sent the objdump and dot config to you privately.
>
> Thanks. I don't see anything that pops out, but then again, my arm asm
> foo is very rusty (it has been literally decades since I did any arm
> asm). I wonder if it could simply be a timing issue?
>
> ffff0000086eb538 <msm_read.isra.1>:
> ffff0000086eb538: a9be7bfd stp x29, x30, [sp,#-32]!
> ffff0000086eb53c: 910003fd mov x29, sp
> ffff0000086eb540: a90153f3 stp x19, x20, [sp,#16]
> ffff0000086eb544: aa0003f4 mov x20, x0
> ffff0000086eb548: 2a0103f3 mov w19, w1
> ffff0000086eb54c: aa1e03e0 mov x0, x30
> ffff0000086eb550: 97e6bae4 bl ffff00000809a0e0 <_mcount>
>
> The above is changed to nop on boot, but then to:
>
> bl ftrace_caller
>
> When ftrace is enabled.
>
> ffff0000086eb554: 8b334280 add x0, x20, w19, uxtw
> ffff0000086eb558: b9400000 ldr w0, [x0]
> ffff0000086eb55c: a94153f3 ldp x19, x20, [sp,#16]
> ffff0000086eb560: a8c27bfd ldp x29, x30, [sp],#32
> ffff0000086eb564: d65f03c0 ret
>
>
>
> ffff00000809a0e4 <ftrace_caller>:
> ffff00000809a0e4: a9bf7bfd stp x29, x30, [sp,#-16]!
> ffff00000809a0e8: 910003fd mov x29, sp
> ffff00000809a0ec: d10013c0 sub x0, x30, #0x4
> ffff00000809a0f0: f94003a1 ldr x1, [x29]
> ffff00000809a0f4: f9400421 ldr x1, [x1,#8]
> ffff00000809a0f8: d1001021 sub x1, x1, #0x4
>
> ffff00000809a0fc <ftrace_call>:
> ffff00000809a0fc: d503201f nop
>
> The above nop gets patched to:
>
> bl ftrace_ops_no_ops
>
> Which will iterate through all the registered functions.
>
>
> ffff00000809a100 <ftrace_graph_call>:
> ffff00000809a100: d503201f nop
>
> The above only gets set when function graph tracer is enabled, which it
> is not in this case.
>
> ffff00000809a104: a8c17bfd ldp x29, x30, [sp],#16
> ffff00000809a108: d65f03c0 ret
>
>
> Anyone see any problems here?

This seems sane to me, he says in the other thread that he put 'notrace' to
the msm serial functions (which AIUI should prevent ftrace instrumentation)
and he still sees the issue.

thanks,

- Joel


2018-10-19 06:56:24

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/19/2018 9:47 AM, Joel Fernandes wrote:
> On Thu, Oct 18, 2018 at 09:17:06AM -0400, Steven Rostedt wrote:
>> On Thu, 18 Oct 2018 10:51:18 +0530
>> Sai Prakash Ranjan <[email protected]> wrote:
>>
>>>> So something else is causing an issue besides just msm_read.
>>>>
>>>> Can you do an objdump -dr of the entire vmlinux binary and gzip it and
>>>> post it somewhere. Not sure if it would be too big to email. You could
>>>> try sending it to me privately. I'd like to see the binary that you are
>>>> using.
>>>>
>>>
>>> I have sent the objdump and dot config to you privately.
>>
>> Thanks. I don't see anything that pops out, but then again, my arm asm
>> foo is very rusty (it has been literally decades since I did any arm
>> asm). I wonder if it could simply be a timing issue?
>>
>> ffff0000086eb538 <msm_read.isra.1>:
>> ffff0000086eb538: a9be7bfd stp x29, x30, [sp,#-32]!
>> ffff0000086eb53c: 910003fd mov x29, sp
>> ffff0000086eb540: a90153f3 stp x19, x20, [sp,#16]
>> ffff0000086eb544: aa0003f4 mov x20, x0
>> ffff0000086eb548: 2a0103f3 mov w19, w1
>> ffff0000086eb54c: aa1e03e0 mov x0, x30
>> ffff0000086eb550: 97e6bae4 bl ffff00000809a0e0 <_mcount>
>>
>> The above is changed to nop on boot, but then to:
>>
>> bl ftrace_caller
>>
>> When ftrace is enabled.
>>
>> ffff0000086eb554: 8b334280 add x0, x20, w19, uxtw
>> ffff0000086eb558: b9400000 ldr w0, [x0]
>> ffff0000086eb55c: a94153f3 ldp x19, x20, [sp,#16]
>> ffff0000086eb560: a8c27bfd ldp x29, x30, [sp],#32
>> ffff0000086eb564: d65f03c0 ret
>>
>>
>>
>> ffff00000809a0e4 <ftrace_caller>:
>> ffff00000809a0e4: a9bf7bfd stp x29, x30, [sp,#-16]!
>> ffff00000809a0e8: 910003fd mov x29, sp
>> ffff00000809a0ec: d10013c0 sub x0, x30, #0x4
>> ffff00000809a0f0: f94003a1 ldr x1, [x29]
>> ffff00000809a0f4: f9400421 ldr x1, [x1,#8]
>> ffff00000809a0f8: d1001021 sub x1, x1, #0x4
>>
>> ffff00000809a0fc <ftrace_call>:
>> ffff00000809a0fc: d503201f nop
>>
>> The above nop gets patched to:
>>
>> bl ftrace_ops_no_ops
>>
>> Which will iterate through all the registered functions.
>>
>>
>> ffff00000809a100 <ftrace_graph_call>:
>> ffff00000809a100: d503201f nop
>>
>> The above only gets set when function graph tracer is enabled, which it
>> is not in this case.
>>
>> ffff00000809a104: a8c17bfd ldp x29, x30, [sp],#16
>> ffff00000809a108: d65f03c0 ret
>>
>>
>> Anyone see any problems here?
>
> This seems sane to me, he says in the other thread that he put 'notrace' to
> the msm serial functions (which AIUI should prevent ftrace instrumentation)
> and he still sees the issue.
>

Yes I did add notrace to all functions in msm serial and checked the
objdump to make sure that those were not instrumented, and yet the
target crashed. This doesnt seem like an issue with ftrace but rather
with msm early con.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-19 13:52:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Fri, 19 Oct 2018 12:24:05 +0530
Sai Prakash Ranjan <[email protected]> wrote:

> >> Anyone see any problems here?
> >
> > This seems sane to me, he says in the other thread that he put 'notrace' to
> > the msm serial functions (which AIUI should prevent ftrace instrumentation)
> > and he still sees the issue.
> >
>
> Yes I did add notrace to all functions in msm serial and checked the
> objdump to make sure that those were not instrumented, and yet the
> target crashed. This doesnt seem like an issue with ftrace but rather
> with msm early con.

The thing that still bothers me is that it boots fine without ftrace
running, and that it requires tracing something to cause the problem.
This tells me that ftrace has something to do with it. Perhaps it's
just changing the timing.

You said that if you add 'ftrace_filter=msm_read' to the command line,
it still crashes?

So only tracing that function we have an issue, right?

-- Steve

2018-10-19 14:49:12

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/19/2018 7:21 PM, Steven Rostedt wrote:
> On Fri, 19 Oct 2018 12:24:05 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
>>>> Anyone see any problems here?
>>>
>>> This seems sane to me, he says in the other thread that he put 'notrace' to
>>> the msm serial functions (which AIUI should prevent ftrace instrumentation)
>>> and he still sees the issue.
>>>
>>
>> Yes I did add notrace to all functions in msm serial and checked the
>> objdump to make sure that those were not instrumented, and yet the
>> target crashed. This doesnt seem like an issue with ftrace but rather
>> with msm early con.
>
> The thing that still bothers me is that it boots fine without ftrace
> running, and that it requires tracing something to cause the problem.
> This tells me that ftrace has something to do with it. Perhaps it's
> just changing the timing.
>

Yes tracing does cause issue but only if earlycon is present. If I
disable earlycon, then even tracing wont cause any issue and target
boots fine.

> You said that if you add 'ftrace_filter=msm_read' to the command line,
> it still crashes?
>
> So only tracing that function we have an issue, right?
>

Tracing msm_read does cause the crash, but that is not the only one
causing the crash because even after "ftrace_notrace=msm_read" the board
crashes which is why I suspect msm earlycon and not ftrace.

Thanks,
Sai
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-10-19 15:13:00

by Steven Rostedt

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Fri, 19 Oct 2018 20:18:19 +0530
Sai Prakash Ranjan <[email protected]> wrote:


> > You said that if you add 'ftrace_filter=msm_read' to the command line,
> > it still crashes?
> >
> > So only tracing that function we have an issue, right?
> >
>
> Tracing msm_read does cause the crash, but that is not the only one
> causing the crash because even after "ftrace_notrace=msm_read" the board
> crashes which is why I suspect msm earlycon and not ftrace.

I'm saying there's a combination of the two. Because when ftrace is not
involved, early con doesn't crash.

I would focus on why earlycon crashes when only msm_read is traced.
That should help narrow down the cause.

What we know is:

earlycon -> boots

ftrace=function -> boots

earlycon ftrace=function ftrace_filter=msm_read -> crashes

If we remove the "ftrace=function ftrace_filter=msm_read" it boots
fine, and I'm assuming that if we just remove earlycon (keeping the
ftrace_filter), it also boots too.

Thus, ftrace tracing msm_read is causing earlycon to do something that
triggers the crash.

-- Steve


2018-10-25 14:39:07

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 10/19/2018 8:42 PM, Steven Rostedt wrote:
> On Fri, 19 Oct 2018 20:18:19 +0530
> Sai Prakash Ranjan <[email protected]> wrote:
>
>
>>> You said that if you add 'ftrace_filter=msm_read' to the command
>>> line,
>>> it still crashes?
>>>
>>> So only tracing that function we have an issue, right?
>>>
>>
>> Tracing msm_read does cause the crash, but that is not the only one
>> causing the crash because even after "ftrace_notrace=msm_read" the
>> board
>> crashes which is why I suspect msm earlycon and not ftrace.
>
> I'm saying there's a combination of the two. Because when ftrace is not
> involved, early con doesn't crash.
>
> I would focus on why earlycon crashes when only msm_read is traced.
> That should help narrow down the cause.
>
> What we know is:
>
> earlycon -> boots
>
> ftrace=function -> boots
>
> earlycon ftrace=function ftrace_filter=msm_read -> crashes
>
> If we remove the "ftrace=function ftrace_filter=msm_read" it boots
> fine, and I'm assuming that if we just remove earlycon (keeping the
> ftrace_filter), it also boots too.
>
> Thus, ftrace tracing msm_read is causing earlycon to do something that
> triggers the crash.
>

Hi,

Sorry for late response on this, was running out of debug options
;-)

Finally after bisecting (which I should have done earlier) this issue
till 4.4 kernel where I don't see the crash, I could see that
dma node in "apq8016-sbc.dtsi" is somehow causing the crash.

dma@7884000 {
status = "okay"; <--- This one
};

serial@78af000 {
label = "LS-UART0";
status = "okay";
pinctrl-names = "default", "sleep";
pinctrl-0 = <&blsp1_uart1_default>;
pinctrl-1 = <&blsp1_uart1_sleep>;
};

serial@78b0000 {
label = "LS-UART1";
status = "okay";
pinctrl-names = "default", "sleep";
pinctrl-0 = <&blsp1_uart2_default>;
pinctrl-1 = <&blsp1_uart2_sleep>;
};

"If I disable dma node and LS-UART0, then I don't see any crash and
ftrace also works fine"

And one more observation is that even without ftrace cmdline, if I use
earlycon and disable dma, I face the same crash.

So basically this seems to be some kind of earlycon and dma issue and
not ftrace(I can be wrong).

So adding Srinivas for more info on this dma node.

Also just for a note: apq8096-db820c.dtsi shows UART0 is disabled
because
bootloader does not allow access to it. Could this also be the case for
db410c?

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member
of Code Aurora Forum, hosted by The Linux Foundation

2018-11-13 09:45:10

by Srinivas Kandagatla

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

Hi Sai,



On 25/10/18 15:36, [email protected] wrote:
> "If I disable dma node and LS-UART0, then I don't see any crash and
> ftrace also works fine"
>
> And one more observation is that even without ftrace cmdline, if I use
> earlycon and disable dma, I face the same crash.
>
> So basically this seems to be some kind of earlycon and dma issue and
> not ftrace(I can be wrong).
>
> So adding Srinivas for more info on this dma node.

Its Interesting that my old email conversations with SBoyd show that I
have investigated this issue in early 2016!

My analysis so far:

This reason for such behavior is due the common iface clock
(GCC_BLSP1_AHB_CLK) across multiple drivers(serial ports, bam dma
and other low speed devices).
The code flow in DB410C is bit different, as the uart0 is first
attempted to set as console and then uart1, this ordering triggers
pm state change uart_change_pm(state, UART_PM_STATE_OFF) from serial
core while setting up uart0, this would go and disable all the
clocks for uart0.
As uart1 is not setup Yet, and earlycon is still active, any
attempts by earlycon to write to registers would trigger a system
reboot as the clock was just disabled by uart0 change_pm code.

This can even be triggered with any drivers like spi which uses same
clock I guess.

Hope it helps,

Either earlycon needs to reference the clocks or those clocks needs to
be marked always-on (but only with earlycon).

>
> Also just for a note: apq8096-db820c.dtsi shows UART0 is disabled because
> bootloader does not allow access to it. Could this also be the case for
> db410c?
No, this is not the case with DB410c. DB820c has added restrictions in
TZ, I think new booloaders should have solved this issue.


Thanks,
srini

2018-11-15 10:34:33

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 11/13/2018 3:14 PM, Srinivas Kandagatla wrote:
> Hi Sai,
>
>
>
> On 25/10/18 15:36, [email protected] wrote:
>> "If I disable dma node and LS-UART0, then I don't see any crash and
>> ftrace also works fine"
>>
>> And one more observation is that even without ftrace cmdline, if I use
>> earlycon and disable dma, I face the same crash.
>>
>> So basically this seems to be some kind of earlycon and dma issue and
>> not ftrace(I can be wrong).
>>
>> So adding Srinivas for more info on this dma node.
>
> Its Interesting that my old email conversations with SBoyd show that I
> have investigated this issue in early 2016!
>
> My analysis so far:
>
> This reason for such behavior is due the common iface clock
> (GCC_BLSP1_AHB_CLK) across multiple drivers(serial ports, bam dma
> and other low speed devices).
> The code flow in DB410C is bit different, as the uart0 is first
> attempted to set as console and then uart1, this ordering triggers
> pm state change uart_change_pm(state, UART_PM_STATE_OFF) from serial
> core while setting up uart0, this would go and disable all the
> clocks for uart0.
> As uart1 is not setup Yet, and earlycon is still active, any
> attempts by earlycon to write to registers would trigger a system
> reboot as the clock was just disabled by uart0 change_pm code.
>
> This can even be triggered with any drivers like spi which uses same
> clock I guess.
>
> Hope it helps,
>
> Either earlycon needs to reference the clocks or those clocks needs to
> be marked always-on (but only with earlycon).
>
>>
>> Also just for a note: apq8096-db820c.dtsi shows UART0 is disabled because
>> bootloader does not allow access to it. Could this also be the case
>> for db410c?
> No, this is not the case with DB410c. DB820c has added restrictions in
> TZ, I think new booloaders should have solved this issue.
>
>

Hi Srinivas,

Thanks a lot for pointing out the cause of crash.
I just tried setting GCC_BLSP1_AHB_CLK with flag CLK_IS_CRITICAL and the
crash disappears.

But I suppose setting CLK_IS_CRITICAL is not the solution?

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-11-15 10:55:37

by Srinivas Kandagatla

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs



On 15/11/18 10:33, Sai Prakash Ranjan wrote:
> On 11/13/2018 3:14 PM, Srinivas Kandagatla wrote:
>> Hi Sai,
>>
>>
>>
>> On 25/10/18 15:36, [email protected] wrote:
>>> "If I disable dma node and LS-UART0, then I don't see any crash and
>>> ftrace also works fine"
>>>
>>> And one more observation is that even without ftrace cmdline, if I use
>>> earlycon and disable dma, I face the same crash.
>>>
>>> So basically this seems to be some kind of earlycon and dma issue and
>>> not ftrace(I can be wrong).
>>>
>>> So adding Srinivas for more info on this dma node.
>>
>> Its Interesting that my old email conversations with SBoyd show that I
>> have investigated this issue in early 2016!
>>
>> My analysis so far:
>>
>> This reason for such behavior is due the common iface clock
>> (GCC_BLSP1_AHB_CLK) across multiple drivers(serial ports, bam dma
>> and other low speed devices).
>> The code flow in DB410C is bit different, as the uart0 is first
>> attempted to set as console and then uart1, this ordering triggers
>> pm state change uart_change_pm(state, UART_PM_STATE_OFF) from serial
>> core while setting up uart0, this would go and disable all the
>> clocks for uart0.
>> As uart1 is not setup Yet, and earlycon is still active, any
>> attempts by earlycon to write to registers would trigger a system
>> reboot as the clock was just disabled by uart0 change_pm code.
>>
>> This can even be triggered with any drivers like spi which uses same
>> clock I guess.
>>
>> Hope it helps,
>>
>> Either earlycon needs to reference the clocks or those clocks needs to
>> be marked always-on (but only with earlycon).
>>
>>>
>>> Also just for a note: apq8096-db820c.dtsi shows UART0 is disabled
>>> because
>>> bootloader does not allow access to it. Could this also be the case
>>> for db410c?
>> No, this is not the case with DB410c. DB820c has added restrictions in
>> TZ, I think new booloaders should have solved this issue.
>>
>>
>
> Hi Srinivas,
>
> Thanks a lot for pointing out the cause of crash.
> I just tried setting GCC_BLSP1_AHB_CLK with flag CLK_IS_CRITICAL and the
> crash disappears.
>
> But I suppose setting CLK_IS_CRITICAL is not the solution?
>

Yes, this is not the solution, but it proves that the hand-off between
booloaders and kernel is the issue.

In general there is wider issue with resources hand-off between
bootloader and kernel.

There has been some proposal in the past by Viresh for a new framework
called boot-constriants (https://lkml.org/lkml/2017/12/14/440) which am
not sure if its still actively looked at. But something similar should
be the way to address such issues.

--srini




> Thanks,
> Sai

2018-11-16 03:42:30

by Viresh Kumar

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On Thu, Nov 15, 2018 at 4:23 PM Srinivas Kandagatla
<[email protected]> wrote:

> Yes, this is not the solution, but it proves that the hand-off between
> booloaders and kernel is the issue.
>
> In general there is wider issue with resources hand-off between
> bootloader and kernel.
>
> There has been some proposal in the past by Viresh for a new framework
> called boot-constriants (https://lkml.org/lkml/2017/12/14/440) which am
> not sure if its still actively looked at. But something similar should
> be the way to address such issues.

It isn't dead code yet and I am waiting to gain few more use-cases
before I attempt
to convince Greg again :)

Here is the code..

git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git boot-constraint

--
viresh

2018-11-16 10:51:17

by Sai Prakash Ranjan

[permalink] [raw]
Subject: Re: Crash in msm serial on dragonboard with ftrace bootargs

On 11/16/2018 9:09 AM, Viresh Kumar wrote:
> On Thu, Nov 15, 2018 at 4:23 PM Srinivas Kandagatla
> <[email protected]> wrote:
>
>> Yes, this is not the solution, but it proves that the hand-off between
>> booloaders and kernel is the issue.
>>
>> In general there is wider issue with resources hand-off between
>> bootloader and kernel.
>>
>> There has been some proposal in the past by Viresh for a new framework
>> called boot-constriants (https://lkml.org/lkml/2017/12/14/440) which am
>> not sure if its still actively looked at. But something similar should
>> be the way to address such issues.
>
> It isn't dead code yet and I am waiting to gain few more use-cases
> before I attempt
> to convince Greg again :)
>
> Here is the code..
>
> git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux.git boot-constraint
>
> --
> viresh
>

Maybe you can take this earlycon issue as a usecase.
I have added a boot constraint for earlycon on db410c and have sent a
patch. Whenever you repitch boot-constraint, you can add that as well :)

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation