While booting linux next-20220308 on BeagleBoard-X15 and qemu arm, the following
kernel crash was reported on a CONFIG_KASAN-enabled build [1] & [2].
A few relevant configs [3]:
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_KASAN_STACK=y
<>
# CONFIG_UNWINDER_FRAME_POINTER is not set
CONFIG_UNWINDER_ARM=y
CONFIG_ARM_UNWIND=y
metadata:
git_ref: master
git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git_sha: cb153b68ff91cbc434f3de70ac549e110543e1bb
git_describe: next-20220308
kernel-config: https://builds.tuxbuild.com/2661dIAPUjE2DMJvye91He2gus0/config
[ 0.000000] Linux version 5.17.0-rc7-next-20220308 (tuxmake@tuxmake) (arm-linux-gnueabihf-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP @1646729452
[ 0.000000] CPU: ARMv7 Processor [410fd034] revision 4 (ARMv7), cr=10c5383d
<trim>
[  OK  ] Started Rebuild Dynamic Linker Cache.
[ 10.756111] 8<--- cut here ---
[ 10.757434] Unable to handle kernel paging request at virtual address 00003fe4
[ 10.760221] [00003fe4] *pgd=00000000
[ 10.761624] Internal error: Oops: 5 [#1] SMP ARM
[ 10.763414] Modules linked in:
[ 10.764628] CPU: 0 PID: 152 Comm: udevadm Not tainted 5.17.0-rc7-next-20220308 #1
[ 10.767979] Hardware name: Generic DT based system
[ 10.770206] PC is at __read_once_word_nocheck+0x0/0x8
[ 10.772606] LR is at unwind_frame+0x680/0xab4
[ 10.774676] pc : [<c0313ffc>] lr : [<c031482c>] psr: 600f0013
[ 10.777544] sp : c66b3910 ip : c34d5940 fp : 00000000
[ 10.779972] r10: 00000000 r9 : c66b3a20 r8 : 809b47af
[ 10.782413] r7 : c2957230 r6 : 00003fe4 r5 : 00004004 r4 : c66b3990
[ 10.785427] r3 : 00004004 r2 : 00000007 r1 : 00000000 r0 : 00003fe4
[ 10.788443] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 10.791726] Control: 10c5383d Table: 4746006a DAC: 00000051
[ 10.794420] Register r0 information: non-paged memory
[ 10.796788] Register r1 information: NULL pointer
[ 10.799004] Register r2 information: non-paged memory
[ 10.801346] Register r3 information: non-paged memory
[ 10.803719] Register r4 information: non-slab/vmalloc memory
[ 10.806367] Register r5 information: non-paged memory
[ 10.808720] Register r6 information: non-paged memory
[ 10.811091] Register r7 information: non-slab/vmalloc memory
[ 10.813728] Register r8 information: non-paged memory
[ 10.816094] Register r9 information: non-slab/vmalloc memory
[ 10.818743] Register r10 information: NULL pointer
[ 10.820974] Register r11 information: NULL pointer
[ 10.823227] Register r12 information: non-slab/vmalloc memory
[ 10.825895] Process udevadm (pid: 152, stack limit = 0xed5429af)
[ 10.828682] Stack: (0xc66b3910 to 0xc66b4000)
[ 10.830737] 3900: 00000000
c8b591c0 c8b59210 c3842400
[ 10.834526] 3920: c34d5940 00008000 00003fe4 c66b3a6c c66b3a78
c66b3a70 b7cd672c c61d7380
[ 10.838303] 3940: c66b3a74 c66b3dac c2957230 00000003 c66b3a20
1965ee6e 00000000 c5cf6400
[ 10.842074] 3960: 41b58ab3 c277a460 c03141ac 00000000 c3573fc0
c4550588 c2d0a900 c0519f68
[ 10.845841] 3980: 41b58ab3 c2789cf8 c382243c 00000000 c053bf94
00000003 c61d7380 25706000
[ 10.849623] 39a0: c4c04000 c7f1d810 c7f1d828 00004004 c4052058
00003fe4 c182cf44 00000000
[ 10.853410] 39c0: c452e758 00000064 00090009 00000003 00090009
00000002 00000002 c08cb260
[ 10.857200] 39e0: c2d0a900 b7cd6744 c66b3a80 c61d7380 c66b3a40
c3573fc0 00000003 1965ee6e
[ 10.860998] 3a00: 1965ee6e 00000000 600f0013 c66b3b60 1965ee6e
c28974e0 c66b3bc4 c043af20
[ 10.864821] 3a20: 1965ee6e c2c13820 00000000 c038559c c2c13820
1965ee6e c2c13820 c61d7380
[ 10.868604] 3a40: b7cd6754 c61d7380 00000000 00000004 00000cc0
c2c07150 c66b3a94 c030da40
[ 10.872378] 3a60: c66b3ac0 00000000 00000000 00004004 c66b3db0
c182cf44 c182cf44 c66b3dac
[ 10.876170] 3a80: c66b3cf8 1965ee6e 00260026 c66b3b00 b7cd6770
c040c060 00000026 c08cb260
[ 10.880372] 3aa0: 41b58ab3 c278c190 c040bfdc 000800e0 c19943e0
c7f1d800 c66b3cb4 c0562a1c
[ 10.884274] 3ac0: 00000005 00000040 c66b3ba0 00000001 00000001
c61d7380 c0990664 00000cc0
[ 10.888160] 3ae0: c7f1d800 c66b3cb4 02e39ae9 0000b8e6 00000000
c03cc370 c745b400 b7cd676c
[ 10.892023] 3b00: c2c06f50 0000b8e6 c66b3b80 c03ccaec c66b3d00
1965ee6e 00000000 c66b3ba0
[ 10.895925] 3b20: c28974e0 00000000 c2cf6560 c051d4c4 00000012
c3801200 00000cc0 c61d7380
[ 10.899823] 3b40: 00000004 c3801204 c2c07278 c66b3ba0 00000001
c61d7380 00000000 00000000
[ 10.903713] 3b60: 41b58ab3 c2788e18 c03cce00 c61d7380 b7cd677c
c61d7380 00000000 c053746c
[ 10.907826] 3b80: 41b58ab3 c2796554 c051d3a8 c030da40 c66b3c00
00000000 00000000 c66b3bc4
[ 10.911698] 3ba0: c03ccaec c03a75dc c03ad5e0 c03b482c c182cf44
1965ee6e 1965ee6e c66b3c40
[ 10.915579] 3bc0: c4550588 c040c060 b7cd6788 1965ee6e 00000000
1965ee6e c745c400 e82aabc0
[ 10.919456] 3be0: c745b400 8079a2ac 00000002 e82aab40 00000000
00000000 e82aacc4 c03a75dc
[ 10.923335] 3c00: c9055504 00000000 00000000 00000001 00000000
c61d7380 b7cd6788 c745b480
[ 10.927195] 3c20: c66b3ca0 1965ee6e c040bfdc c61d7380 00000000
1965ee6e 00000000 c9055504
[ 10.931078] 3c40: 41b58ab3 c2788e18 c03a7440 c053746c c9055500
c61d7380 ff7eae78 c6237000
[ 10.934959] 3c60: 25706000 00000000 00013542 600f0093 c61d7654
c044e294 c61d7600 c745c000
[ 10.938847] 3c80: 00013542 00000000 00013542 00000000 c6237000
c03ac094 00000800 c09a7ee8
[ 10.942733] 3ca0: 41b58ab3 1965ee6e e82aabc0 c745b400 c745b400
e82aabc0 e82ab180 c3511180
[ 10.946603] 3cc0: 8079a2ac 00000002 e82aad1c c03ad5e0 00000000
c745b400 c3511960 e82aab40
[ 10.950482] 3ce0: c2c06f50 e82ab1ec e82aabc0 c61d7614 c76e3480
c03b482c c289578c 1965ee6e
[ 10.954348] 3d00: c61d7380 c3801200 00000000 c356e980 00000cc0
00000004 00000cc0 c2c07150
[ 10.958214] 3d20: c0615340 c0516d4c c4c04c80 c2c18b20 c2c070dc
c8b5be00 00000000 c66b3e50
[ 10.962131] 3d40: c8b5be44 c66b3e10 c66b3e10 00000000 00000003
c0615340 c66b3e10 00000001
[ 10.966006] 3d60: 00000001 c66b3df0 c66b3ec0 b7cd67b8 c7f1d800
c61d7380 c66b3f40 c66b3ec0
[ 10.969884] 3d80: c66b3e10 00000000 c19abc80 c053bf94 00000003
c61d7380 25706000 c4c04000
[ 10.973748] 3da0: c7f1d810 c7f1d828 00004004 c182cf44 c7f1d814
00020001 c7f1d800 00000020
[ 10.977609] 3dc0: 41b58ab3 c2797334 c053ba98 c2ba4b40 00000002
00000002 c61d7598 c61d7388
[ 10.981622] 3de0: b7cd67c0 c4c04ca8 c66b3e60 e82ab16c b6acaf65
00000003 00080040 000800e0
[ 10.985553] 3e00: 41b58ab3 c27887f4 c182c9ac c04cb4d8 00010000
00000000 00000003 c66b3df0
[ 10.989488] 3e20: 00000001 00000000 c08c2d08 00000100 00000001
c04061c0 e82ab400 c7f1d900
[ 10.993781] 3e40: c053dec0 c04021e0 00000000 00000000 c7f1d800
00000000 00000000 00000000
[ 10.997862] 3e60: 00000000 00000000 00000000 40040000 00000000
00000000 c61d75a4 c03002c4
[ 11.001983] 3e80: 00000002 5ac3c35a b6dffc88 c66b3fb0 c66b3ea4
c182d8e4 c387f780 c61d7ae4
[ 11.006052] 3ea0: 00000003 c61d7380 c66b3f80 c61d7380 b6acaf65
c056c8bc 5ac3c35a 00004000
[ 11.010057] 3ec0: 41b58ab3 c27970a8 c0537a1c 0051d5b8 c7f1d800
1965ee6e c7f1d820 c7f1d800
[ 11.014050] 3ee0: b7cd67e4 c7f1d800 c61d7380 c66b3f80 c7f1d840
00000000 00000000 c053c37c
[ 11.018369] 3f00: 41b58ab3 c2798010 c055c494 00000000 00000000
00000000 00000012 00000000
[ 11.022313] 3f20: 41b58ab3 c2797380 c053c2c0 00000000 00000000
00000000 c350fb20 c0391fe0
[ 11.026184] 3f40: 00000000 00000000 c7f1d900 c61d7380 c350fb20
00000001 aedbb900 c0385240
[ 11.030043] 3f60: c7f1d900 c61d7380 00000001 00000006 c03002c4
c61d7380 00000006 1965ee6e
[ 11.033946] 3f80: c7f1d900 0051d5b8 b6aca434 0048c1c0 00000004
c03002c4 c61d7380 00000004
[ 11.037790] 3fa0: aedbb900 c03000c0 0051d5b8 b6aca434 00000003
b6acaf65 00000003 00000000
[ 11.041654] 3fc0: 0051d5b8 b6aca434 0048c1c0 00000004 00000003
b6acaf65 0045b9f4 aedbb900
[ 11.045561] 3fe0: 0048be3c b6aca400 0040b29c aecbe4c0 600f0010
00000003 00000000 00000000
[ 11.049425] __read_once_word_nocheck from unwind_frame+0x680/0xab4
[ 11.052517] unwind_frame from __save_stack_trace+0x70/0x94
[ 11.052560] __save_stack_trace from stack_trace_save+0x84/0xac
[ 11.052595] stack_trace_save from __kfence_alloc+0x11c/0x9d0
[  OK  ] Started Opkg first boot configure.
[ 11.052630] __kfence_alloc from __kmalloc+0x460/0x58c
[ 11.065215] __kmalloc from kernfs_fop_write_iter+0xdc/0x240
[ 11.068069] kernfs_fop_write_iter from vfs_write+0x4fc/0x6d0
[ 11.070909] vfs_write from ksys_write+0xbc/0x154
[ 11.073228] ksys_write from ret_fast_syscall+0x0/0x54
[ 11.075812] Code: e8bd8070 eec11e10 e3a00000 e12fff1e (e5900000)
[ 11.079039] ---[ end trace 0000000000000000 ]---
Reported-by: Linux Kernel Functional Testing <[email protected]>
--
Linaro LKFT
https://lkft.linaro.org
[1] https://lkft.validation.linaro.org/scheduler/job/4677964#L600
[2] https://lkft.validation.linaro.org/scheduler/job/4677965#L2390
[3] https://builds.tuxbuild.com/2661dIAPUjE2DMJvye91He2gus0/
On Wed, Mar 09, 2022 at 03:18:12PM +0530, Naresh Kamboju wrote:
> While booting linux next-20220308 on BeagleBoard-X15 and qemu arm, the following
> kernel crash was reported on a CONFIG_KASAN-enabled build [1] & [2].
The unwinder is currently broken in linux-next. Please try reverting
532319b9c418 ("ARM: unwind: disregard unwind info before stack frame is
set up")
Thanks.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Wed, 9 Mar 2022 at 16:07, Russell King (Oracle)
<[email protected]> wrote:
>
> On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> > >
> > > On Wed, 9 Mar 2022 at 19:37, Naresh Kamboju <[email protected]> wrote:
> > > >
> > > > On Wed, 9 Mar 2022 at 16:16, Ard Biesheuvel <[email protected]> wrote:
> > > > >
> > > > > On Wed, 9 Mar 2022 at 11:37, Russell King (Oracle)
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Mar 09, 2022 at 03:18:12PM +0530, Naresh Kamboju wrote:
> > > > > > > While booting linux next-20220308 on BeagleBoard-X15 and qemu arm, the following
> > > > > > > kernel crash was reported on a CONFIG_KASAN-enabled build [1] & [2].
> > > > > >
> > > > > > The unwinder is currently broken in linux-next. Please try reverting
> > > > > > 532319b9c418 ("ARM: unwind: disregard unwind info before stack frame is
> > > > > > set up")
> > >
> > > I reverted the suggested commit and rebuilt, but the boot still failed with
> > > the reported kernel crash [1].
> > >
> > > - Naresh
> > >
> >
> > Thanks Naresh,
> >
> > This looks like it might be related to the issue Russell just sent a fix for:
> > https://lore.kernel.org/linux-arm-kernel/CAMj1kXEqp2UmsyUe1eWErtpMk3dGEFZyyno3nqydC_ML0bwTLw@mail.gmail.com/T/#t
> >
> > Could you please try that?
>
> Well, we unwound until:
>
> __irq_svc from migrate_disable+0x0/0x70
>
> and then crashed - and the key thing there is that we're at the start
> of migrate_disable() when we took an interrupt.
>
> For some reason, this triggers an access to address 0x10, which faults.
> We then try unwinding again, and successfully unwind all the way back
> to the same point (the line above) which then causes the unwinder to
> again access address 0x10, and the cycle repeats with the stack
> growing bigger and bigger.
>
> I'd suggest also testing without the revert but with my patch.
>
Indeed.
And as I suggested the other day, maybe it wouldn't be so bad to
harden the vsp dereference, like below:
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -27,6 +27,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/uaccess.h>
 #include <linux/list.h>
 
 #include <asm/sections.h>
@@ -236,10 +237,11 @@ static int unwind_pop_register(struct unwind_ctrl_block *ctrl,
 	if (*vsp >= (unsigned long *)ctrl->sp_high)
 		return -URC_FAILURE;
 
-	/* Use READ_ONCE_NOCHECK here to avoid this memory access
-	 * from being tracked by KASAN.
+	/* Use get_kernel_nofault() here to avoid this memory access
+	 * from causing a fatal fault, and from being tracked by KASAN.
 	 */
-	ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
+	if (get_kernel_nofault(ctrl->vrs[reg], *vsp))
+		return -URC_FAILURE;
 	if (reg == 14)
 		ctrl->lr_addr = *vsp;
 	(*vsp)++;
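
For readers who want to experiment outside the kernel, the idea behind the
hunk above can be modelled in user space. This is only a rough sketch, not
the kernel code: get_kernel_nofault() is a kernel facility, so a hypothetical
read_nofault() with explicit bounds stands in for it here, and struct
unwind_model, pop_register() and the URC_* values are illustrative names
chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the unwinder's result codes. */
enum { URC_OK = 0, URC_FAILURE = 9 };

/* Hypothetical model of the unwinder state: 16 virtual registers plus
 * the bounds of the memory that is known to be readable. */
struct unwind_model {
	unsigned long vrs[16];
	const unsigned long *mem_low;	/* first readable word */
	const unsigned long *mem_high;	/* one past the last readable word */
};

/* Stand-in for a non-faulting read: returns non-zero instead of
 * dereferencing a pointer that lies outside the readable window. */
static int read_nofault(unsigned long *dst, const unsigned long *src,
			const struct unwind_model *m)
{
	uintptr_t addr = (uintptr_t)src;

	if (addr < (uintptr_t)m->mem_low ||
	    addr + sizeof(*src) > (uintptr_t)m->mem_high)
		return -1;
	*dst = *src;
	return 0;
}

/* Pop one register through the virtual stack pointer. A bogus vsp makes
 * the unwind step fail cleanly instead of faulting, and vsp is left
 * untouched in that case. */
static int pop_register(struct unwind_model *m, const unsigned long **vsp,
			unsigned int reg)
{
	if (read_nofault(&m->vrs[reg], *vsp, m))
		return -URC_FAILURE;
	(*vsp)++;
	return URC_OK;
}
```

With a two-word array as the readable window, popping from its start
succeeds, while popping through a vsp such as 0x10 returns -URC_FAILURE
rather than dereferencing the bad pointer.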
On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> >
> > On Wed, 9 Mar 2022 at 19:37, Naresh Kamboju <[email protected]> wrote:
> > >
> > > On Wed, 9 Mar 2022 at 16:16, Ard Biesheuvel <[email protected]> wrote:
> > > >
> > > > On Wed, 9 Mar 2022 at 11:37, Russell King (Oracle)
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Wed, Mar 09, 2022 at 03:18:12PM +0530, Naresh Kamboju wrote:
> > > > > > While booting linux next-20220308 on BeagleBoard-X15 and qemu arm, the following
> > > > > > kernel crash was reported on a CONFIG_KASAN-enabled build [1] & [2].
> > > > >
> > > > > The unwinder is currently broken in linux-next. Please try reverting
> > > > > 532319b9c418 ("ARM: unwind: disregard unwind info before stack frame is
> > > > > set up")
> >
> > I reverted the suggested commit and rebuilt, but the boot still failed with
> > the reported kernel crash [1].
> >
> > - Naresh
> >
>
> Thanks Naresh,
>
> This looks like it might be related to the issue Russell just sent a fix for:
> https://lore.kernel.org/linux-arm-kernel/CAMj1kXEqp2UmsyUe1eWErtpMk3dGEFZyyno3nqydC_ML0bwTLw@mail.gmail.com/T/#t
>
> Could you please try that?
Well, we unwound until:
__irq_svc from migrate_disable+0x0/0x70
and then crashed - and the key thing there is that we're at the start
of migrate_disable() when we took an interrupt.
For some reason, this triggers an access to address 0x10, which faults.
We then try unwinding again, and successfully unwind all the way back
to the same point (the line above) which then causes the unwinder to
again access address 0x10, and the cycle repeats with the stack
growing bigger and bigger.
I'd suggest also testing without the revert but with my patch.
Thanks.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Hi Russell,
On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle)
<[email protected]> wrote:
>
> On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> > >
<trim>
> Well, we unwound until:
>
> __irq_svc from migrate_disable+0x0/0x70
>
> and then crashed - and the key thing there is that we're at the start
> of migrate_disable() when we took an interrupt.
>
> For some reason, this triggers an access to address 0x10, which faults.
> We then try unwinding again, and successfully unwind all the way back
> to the same point (the line above) which then causes the unwinder to
> again access address 0x10, and the cycle repeats with the stack
> growing bigger and bigger.
>
> I'd suggest also testing without the revert but with my patch.
I have tested your patch on top of linux next-20220309 and still see the kernel
crash below [1]. Build link: [2].
[ 26.812060] 8<--- cut here ---
[ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
[ 26.816139] [b6a3ab70] *pgd=fb28a835
[ 26.817770] Internal error: : 1b [#1] SMP ARM
[ 26.819636] Modules linked in:
[ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted 5.17.0-rc7-next-20220309 #1
[ 26.824519] Hardware name: Generic DT based system
[ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
[ 26.829856] LR is at unwind_frame+0x7dc/0xab4
- Naresh
[1] https://lkft.validation.linaro.org/scheduler/job/4688599#L596
[2] https://builds.tuxbuild.com/269gYLGuAdmltuLhIUDAjS2fg1Q/
On Wed, 9 Mar 2022 at 20:39, Russell King (Oracle)
<[email protected]> wrote:
>
> On Wed, Mar 09, 2022 at 08:14:30PM +0100, Ard Biesheuvel wrote:
> > On Wed, 9 Mar 2022 at 19:48, Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > On Wed, Mar 09, 2022 at 06:43:42PM +0100, Ard Biesheuvel wrote:
> > > > On Wed, 9 Mar 2022 at 18:11, Russell King (Oracle)
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Wed, Mar 09, 2022 at 10:08:25PM +0530, Naresh Kamboju wrote:
> > > > > > Hi Russell,
> > > > > >
> > > > > > On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle)
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > > > > > > > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> > > > > > > > >
> > > > > > <trim>
> > > > > > > Well, we unwound until:
> > > > > > >
> > > > > > > __irq_svc from migrate_disable+0x0/0x70
> > > > > > >
> > > > > > > and then crashed - and the key thing there is that we're at the start
> > > > > > > of migrate_disable() when we took an interrupt.
> > > > > > >
> > > > > > > For some reason, this triggers an access to address 0x10, which faults.
> > > > > > > We then try unwinding again, and successfully unwind all the way back
> > > > > > > to the same point (the line above) which then causes the unwinder to
> > > > > > > again access address 0x10, and the cycle repeats with the stack
> > > > > > > growing bigger and bigger.
> > > > > > >
> > > > > > > I'd suggest also testing without the revert but with my patch.
> > > > > >
> > > > > > I have tested your patch on top of linux next-20220309 and still see kernel
> > > > > > crash as below [1]. build link [2].
> > > > > >
> > > > > > [ 26.812060] 8<--- cut here ---
> > > > > > [ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
> > > > > > [ 26.816139] [b6a3ab70] *pgd=fb28a835
> > > > > > [ 26.817770] Internal error: : 1b [#1] SMP ARM
> > > > > > [ 26.819636] Modules linked in:
> > > > > > [ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted
> > > > > > 5.17.0-rc7-next-20220309 #1
> > > > > > [ 26.824519] Hardware name: Generic DT based system
> > > > > > [ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
> > > > > > [ 26.829856] LR is at unwind_frame+0x7dc/0xab4
> > > > > >
> > > > > > - Naresh
> > > > > >
> > > > > > [1] https://lkft.validation.linaro.org/scheduler/job/4688599#L596
> > > > > > [2] https://builds.tuxbuild.com/269gYLGuAdmltuLhIUDAjS2fg1Q/
> > > > >
> > > > > I think the problem has just moved:
> > > > >
> > > > > [ 27.113085] __irq_svc from __copy_to_user_std+0x24/0x378
> > > > >
> > > > > The code at the start of __copy_to_user_std is:
> > > > >
> > > > > 0: e3a034bf mov r3, #-1090519040 ; 0xbf000000
> > > > > 4: e243c001 sub ip, r3, #1
> > > > > 8: e05cc000 subs ip, ip, r0
> > > > > c: 228cc001 addcs ip, ip, #1
> > > > > 10: 205cc002 subscs ip, ip, r2
> > > > > 14: 33a00000 movcc r0, #0
> > > > > 18: e320f014 csdb
> > > > > 1c: e3a03000 mov r3, #0
> > > > > 20: e92d481d push {r0, r2, r3, r4, fp, lr}
> > > > > 24: e1a0b00d mov fp, sp
> > > > >
> > > > > and the unwind information will be:
> > > > >
> > > > > 0xc056f14c <arm_copy_to_user+0x1c>: @0xc0b89b84
> > > > > Compact model index: 1
> > > > > 0x9b vsp = r11
> > > > > 0xb1 0x0d pop {r0, r2, r3}
> > > > > 0x84 0x81 pop {r4, r11, r14}
> > > > > 0xb0 finish
> > > > >
> > > > > The problem is that the unwind information says "starting at offset
> > > > > 0x1c, to unwind do the following operations". The first of which is
> > > > > to move r11 (fp) to the stack pointer. However, r11 isn't setup
> > > > > until function offset 0x24. You've hit that instruction, which hasn't
> > > > > executed yet, but the stack has been modified by pushing r0, r2-r4,
> > > > > fp and lr onto it.
> > > > >
> > > > > Given this, there is no way that the unwinder (as it currently stands)
> > > > > can do its job properly between 0x1c and 0x24.
> > > > >
> > > > > I don't think this is specifically caused by Ard's patches, but by
> > > > > the addition of KASAN, which has the effect of calling the unwinder
> > > > > at random points in the kernel (when an interrupt happens) and it's
> > > > > clear from the above that there are windows in the code where, if
> > > > > > we attempt to unwind using the unwind information, we will fail
> > > > > because the program state is not consistent with the unwind
> > > > > information.
> > > > >
> > > > > Ard's patch that changes:
> > > > >
> > > > > ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
> > > > >
> > > > > to use get_kernel_nofault() should have the effect of protecting
> > > > > against the oops, but the side effect is that it is fundamentally not
> > > > > possible with the way these things are to unwind at these points -
> > > > > > which means it's not possible to get a stacktrace there.
> > > > >
> > > > > So, I don't think this is a "new" problem, but a weakness of using
> > > > > the unwinder to get a backtrace for KASAN.
> > > > >
> > > >
> > > > It essentially means that we cannot unwind through asynchronous
> > > > exceptions, and so we should probably make the svc_entry macro
> > > > .nounwind, instead of pretending that we can reliably unwind through
> > > > it.
> > >
> > > Doesn't that impact the ability to debug the kernel over things like
> > > oopses and the like?
> > >
> >
> > The backtrace dumped by __die() uses the pt_regs from the exception
> > context as the starting point, so the exception entry code that deals
> > with the condition that triggered the oops is omitted, and does not
> > have to be unwound.
>
> That is true, but that's not really the case I was thinking about.
> I was thinking more about cases such as RCU stalls, soft lockups,
> etc.
>
> For example:
>
> https://www.linuxquestions.org/questions/linux-kernel-70/kenel-v4-4-60-panic-in-igmp6_send-and-and-__neigh_create-4175704721/
>
> In that stack trace, the interesting bits are not the beginning of
> the stack trace down to __irq_svc, but everything beyond __irq_svc,
> since the lockup is probably caused by being stuck in
> _raw_write_lock_bh().
>
> It's these situations that we will totally destroy debuggability for,
> and the only way around that would be to force frame pointers and
> ARM builds (not Thumb-2, as that requires the unwinder), which means
> a Thumb-2 kernel soft lockup would be undebuggable.
>
Indeed.
But that means that the only other choice we have is to retain the
imprecise nature of the current solution (which usually works fine
btw), and simply deal with the faulting double dereference of vsp in
the unwinder code. We simply don't know whether the exception was
taken at a point where the stack frame is consistent with the unwind
data.
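
For reference, the unwind opcodes quoted earlier in this thread (0x9b,
0xb1 0x0d, 0x84 0x81, 0xb0) follow the ARM EHABI compact-model encoding.
The following is a minimal user-space sketch of a decoder for just those
four opcode forms; it is not the kernel's unwinder, and the names emit(),
emit_pop() and decode_unwind() are made up for this example. Any other
opcode is rejected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to the NUL-terminated buffer out[len]. */
static void emit(char *out, size_t len, const char *fmt, ...)
{
	size_t used = strlen(out);
	va_list ap;

	if (used >= len)
		return;
	va_start(ap, fmt);
	vsnprintf(out + used, len - used, fmt, ap);
	va_end(ap);
}

/* Describe a register pop; bit i of mask selects register r(base + i). */
static void emit_pop(char *out, size_t len, unsigned int mask,
		     unsigned int base)
{
	const char *sep = "";
	unsigned int bit;

	emit(out, len, "pop {");
	for (bit = 0; bit < 12; bit++) {
		if (mask & (1u << bit)) {
			emit(out, len, "%sr%u", sep, base + bit);
			sep = ", ";
		}
	}
	emit(out, len, "}; ");
}

/* Decode a small subset of EHABI unwind opcodes into text.
 * Returns 0 on success, -1 on truncated or unsupported input. */
static int decode_unwind(const unsigned char *insn, size_t n,
			 char *out, size_t len)
{
	size_t i = 0;

	out[0] = '\0';
	while (i < n) {
		unsigned char op = insn[i++];
		unsigned int mask;

		if ((op & 0xf0) == 0x90) {
			/* 1001nnnn: vsp = r[nnnn] */
			emit(out, len, "vsp = r%u; ", op & 0x0fu);
		} else if ((op & 0xf0) == 0x80) {
			/* 1000iiii iiiiiiii: pop r4..r15 under a 12-bit mask */
			if (i >= n)
				return -1;
			mask = ((op & 0x0fu) << 8) | insn[i++];
			if (!mask)
				return -1;	/* 0x80 0x00: refuse to unwind */
			emit_pop(out, len, mask, 4);
		} else if (op == 0xb1) {
			/* 10110001 0000iiii: pop r0..r3 under a 4-bit mask */
			if (i >= n)
				return -1;
			mask = insn[i++];
			if (!mask || (mask & 0xf0))
				return -1;	/* spare encodings */
			emit_pop(out, len, mask, 0);
		} else if (op == 0xb0) {
			/* 10110000: finish */
			emit(out, len, "finish");
			return 0;
		} else {
			return -1;	/* not handled by this sketch */
		}
	}
	return 0;
}
```

Feeding it the six bytes above yields "vsp = r11; pop {r0, r2, r3};
pop {r4, r11, r14}; finish", matching the listing in the quoted text. The
first step copies r11 into vsp, which is why the sequence can only work once
"mov fp, sp" has actually executed.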
On Wed, Mar 09, 2022 at 06:43:42PM +0100, Ard Biesheuvel wrote:
> On Wed, 9 Mar 2022 at 18:11, Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Wed, Mar 09, 2022 at 10:08:25PM +0530, Naresh Kamboju wrote:
> > > Hi Russell,
> > >
> > > On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle)
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > > > > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> > > > > >
> > > <trim>
> > > > Well, we unwound until:
> > > >
> > > > __irq_svc from migrate_disable+0x0/0x70
> > > >
> > > > and then crashed - and the key thing there is that we're at the start
> > > > of migrate_disable() when we took an interrupt.
> > > >
> > > > For some reason, this triggers an access to address 0x10, which faults.
> > > > We then try unwinding again, and successfully unwind all the way back
> > > > to the same point (the line above) which then causes the unwinder to
> > > > again access address 0x10, and the cycle repeats with the stack
> > > > growing bigger and bigger.
> > > >
> > > > I'd suggest also testing without the revert but with my patch.
> > >
> > > I have tested your patch on top of linux next-20220309 and still see kernel
> > > crash as below [1]. build link [2].
> > >
> > > [ 26.812060] 8<--- cut here ---
> > > [ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
> > > [ 26.816139] [b6a3ab70] *pgd=fb28a835
> > > [ 26.817770] Internal error: : 1b [#1] SMP ARM
> > > [ 26.819636] Modules linked in:
> > > [ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted
> > > 5.17.0-rc7-next-20220309 #1
> > > [ 26.824519] Hardware name: Generic DT based system
> > > [ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
> > > [ 26.829856] LR is at unwind_frame+0x7dc/0xab4
> > >
> > > - Naresh
> > >
> > > [1] https://lkft.validation.linaro.org/scheduler/job/4688599#L596
> > > [2] https://builds.tuxbuild.com/269gYLGuAdmltuLhIUDAjS2fg1Q/
> >
> > I think the problem has just moved:
> >
> > [ 27.113085] __irq_svc from __copy_to_user_std+0x24/0x378
> >
> > The code at the start of __copy_to_user_std is:
> >
> > 0: e3a034bf mov r3, #-1090519040 ; 0xbf000000
> > 4: e243c001 sub ip, r3, #1
> > 8: e05cc000 subs ip, ip, r0
> > c: 228cc001 addcs ip, ip, #1
> > 10: 205cc002 subscs ip, ip, r2
> > 14: 33a00000 movcc r0, #0
> > 18: e320f014 csdb
> > 1c: e3a03000 mov r3, #0
> > 20: e92d481d push {r0, r2, r3, r4, fp, lr}
> > 24: e1a0b00d mov fp, sp
> >
> > and the unwind information will be:
> >
> > 0xc056f14c <arm_copy_to_user+0x1c>: @0xc0b89b84
> > Compact model index: 1
> > 0x9b vsp = r11
> > 0xb1 0x0d pop {r0, r2, r3}
> > 0x84 0x81 pop {r4, r11, r14}
> > 0xb0 finish
> >
> > The problem is that the unwind information says "starting at offset
> > 0x1c, to unwind do the following operations". The first of which is
> > to move r11 (fp) to the stack pointer. However, r11 isn't setup
> > until function offset 0x24. You've hit that instruction, which hasn't
> > executed yet, but the stack has been modified by pushing r0, r2-r4,
> > fp and lr onto it.
> >
> > Given this, there is no way that the unwinder (as it currently stands)
> > can do its job properly between 0x1c and 0x24.
> >
> > I don't think this is specifically caused by Ard's patches, but by
> > the addition of KASAN, which has the effect of calling the unwinder
> > at random points in the kernel (when an interrupt happens) and it's
> > clear from the above that there are windows in the code where, if
> > we attempt to unwind using the unwind information, we will fail
> > because the program state is not consistent with the unwind
> > information.
> >
> > Ard's patch that changes:
> >
> > ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
> >
> > to use get_kernel_nofault() should have the effect of protecting
> > against the oops, but the side effect is that it is fundamentally not
> > possible with the way these things are to unwind at these points -
> > which means it's not possible to get a stacktrace there.
> >
> > So, I don't think this is a "new" problem, but a weakness of using
> > the unwinder to get a backtrace for KASAN.
> >
>
> It essentially means that we cannot unwind through asynchronous
> exceptions, and so we should probably make the svc_entry macro
> .nounwind, instead of pretending that we can reliably unwind through
> it.
Doesn't that impact the ability to debug the kernel over things like
oopses and the like?
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Wed, 9 Mar 2022 at 17:38, Naresh Kamboju <[email protected]> wrote:
>
> Hi Russell,
>
> On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> > > >
> <trim>
> > Well, we unwound until:
> >
> > __irq_svc from migrate_disable+0x0/0x70
> >
> > and then crashed - and the key thing there is that we're at the start
> > of migrate_disable() when we took an interrupt.
> >
> > For some reason, this triggers an access to address 0x10, which faults.
> > We then try unwinding again, and successfully unwind all the way back
> > to the same point (the line above) which then causes the unwinder to
> > again access address 0x10, and the cycle repeats with the stack
> > growing bigger and bigger.
> >
> > I'd suggest also testing without the revert but with my patch.
>
> I have tested your patch on top of linux next-20220309 and still see kernel
> crash as below [1]. build link [2].
>
> [ 26.812060] 8<--- cut here ---
> [ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
> [ 26.816139] [b6a3ab70] *pgd=fb28a835
> [ 26.817770] Internal error: : 1b [#1] SMP ARM
> [ 26.819636] Modules linked in:
> [ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted
> 5.17.0-rc7-next-20220309 #1
> [ 26.824519] Hardware name: Generic DT based system
> [ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
> [ 26.829856] LR is at unwind_frame+0x7dc/0xab4
>
Thanks Naresh,
Is this something that could be bisected? With the unwind change
reverted, it is not obvious where the problem originated. Also, could
you check whether it is reproducible on the ARM tree? (for-next on
git://git.armlinux.org.uk/~rmk/linux-arm.git)
Unfortunately, after having been able to reproduce this locally, the
issue went away when I did the revert, so I am no longer seeing the
problem.
--
Ard.
On Wed, Mar 09, 2022 at 08:14:30PM +0100, Ard Biesheuvel wrote:
> On Wed, 9 Mar 2022 at 19:48, Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Wed, Mar 09, 2022 at 06:43:42PM +0100, Ard Biesheuvel wrote:
> > > On Wed, 9 Mar 2022 at 18:11, Russell King (Oracle)
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, Mar 09, 2022 at 10:08:25PM +0530, Naresh Kamboju wrote:
> > > > > Hi Russell,
> > > > >
> > > > > On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle)
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > > > > > > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> > > > > > > >
> > > > > <trim>
> > > > > > Well, we unwound until:
> > > > > >
> > > > > > __irq_svc from migrate_disable+0x0/0x70
> > > > > >
> > > > > > and then crashed - and the key thing there is that we're at the start
> > > > > > of migrate_disable() when we took an interrupt.
> > > > > >
> > > > > > For some reason, this triggers an access to address 0x10, which faults.
> > > > > > We then try unwinding again, and successfully unwind all the way back
> > > > > > to the same point (the line above) which then causes the unwinder to
> > > > > > again access address 0x10, and the cycle repeats with the stack
> > > > > > growing bigger and bigger.
> > > > > >
> > > > > > I'd suggest also testing without the revert but with my patch.
> > > > >
> > > > > I have tested your patch on top of linux next-20220309 and still see kernel
> > > > > crash as below [1]. build link [2].
> > > > >
> > > > > [ 26.812060] 8<--- cut here ---
> > > > > [ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
> > > > > [ 26.816139] [b6a3ab70] *pgd=fb28a835
> > > > > [ 26.817770] Internal error: : 1b [#1] SMP ARM
> > > > > [ 26.819636] Modules linked in:
> > > > > [ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted
> > > > > 5.17.0-rc7-next-20220309 #1
> > > > > [ 26.824519] Hardware name: Generic DT based system
> > > > > [ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
> > > > > [ 26.829856] LR is at unwind_frame+0x7dc/0xab4
> > > > >
> > > > > - Naresh
> > > > >
> > > > > [1] https://lkft.validation.linaro.org/scheduler/job/4688599#L596
> > > > > [2] https://builds.tuxbuild.com/269gYLGuAdmltuLhIUDAjS2fg1Q/
> > > >
> > > > I think the problem has just moved:
> > > >
> > > > [ 27.113085] __irq_svc from __copy_to_user_std+0x24/0x378
> > > >
> > > > The code at the start of __copy_to_user_std is:
> > > >
> > > > 0: e3a034bf mov r3, #-1090519040 ; 0xbf000000
> > > > 4: e243c001 sub ip, r3, #1
> > > > 8: e05cc000 subs ip, ip, r0
> > > > c: 228cc001 addcs ip, ip, #1
> > > > 10: 205cc002 subscs ip, ip, r2
> > > > 14: 33a00000 movcc r0, #0
> > > > 18: e320f014 csdb
> > > > 1c: e3a03000 mov r3, #0
> > > > 20: e92d481d push {r0, r2, r3, r4, fp, lr}
> > > > 24: e1a0b00d mov fp, sp
> > > >
> > > > and the unwind information will be:
> > > >
> > > > 0xc056f14c <arm_copy_to_user+0x1c>: @0xc0b89b84
> > > > Compact model index: 1
> > > > 0x9b vsp = r11
> > > > 0xb1 0x0d pop {r0, r2, r3}
> > > > 0x84 0x81 pop {r4, r11, r14}
> > > > 0xb0 finish
> > > >
> > > > The problem is that the unwind information says "starting at offset
> > > > 0x1c, to unwind do the following operations". The first of which is
> > > > to move r11 (fp) to the stack pointer. However, r11 isn't setup
> > > > until function offset 0x24. You've hit that instruction, which hasn't
> > > > executed yet, but the stack has been modified by pushing r0, r2-r4,
> > > > fp and lr onto it.
> > > >
> > > > Given this, there is no way that the unwinder (as it currently stands)
> > > > can do its job properly between 0x1c and 0x24.
> > > >
> > > > I don't think this is specifically caused by Ard's patches, but by
> > > > the addition of KASAN, which has the effect of calling the unwinder
> > > > at random points in the kernel (when an interrupt happens) and it's
> > > > clear from the above that there are windows in the code where, if
> > > > we attempt to unwind using the unwind information, we will fail
> > > > because the program state is not consistent with the unwind
> > > > information.
> > > >
> > > > Ard's patch that changes:
> > > >
> > > > ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
> > > >
> > > > to use get_kernel_nofault() should have the effect of protecting
> > > > against the oops, but the side effect is that it is fundamentally not
> > > > possible with the way these things are to unwind at these points -
> > > > which means it's not possible to get a stacktrace there.
> > > >
> > > > So, I don't think this is a "new" problem, but a weakness of using
> > > > the unwinder to get a backtrace for KASAN.
> > > >
> > >
> > > It essentially means that we cannot unwind through asynchronous
> > > exceptions, and so we should probably make the svc_entry macro
> > > .nounwind, instead of pretending that we can reliably unwind through
> > > it.
> >
> > Doesn't that impact the ability to debug the kernel over things like
> > oopses and the like?
> >
>
> The backtrace dumped by __die() uses the pt_regs from the exception
> context as the starting point, so the exception entry code that deals
> with the condition that triggered the oops is omitted, and does not
> have to be unwound.

That is true, but that's not really the case I was thinking about.
I was thinking more about cases such as RCU stalls, soft lockups,
etc.

For example:

https://www.linuxquestions.org/questions/linux-kernel-70/kenel-v4-4-60-panic-in-igmp6_send-and-and-__neigh_create-4175704721/

In that stack trace, the interesting bits are not the beginning of
the stack trace down to __irq_svc, but everything beyond __irq_svc,
since the lockup is probably caused by being stuck in
_raw_write_lock_bh().

It's these situations that we will totally destroy debuggability for,
and the only way around that would be to force frame pointers and
ARM builds (not Thumb-2 as that requires the unwinder)... which means
a Thumb-2 kernel soft lockup would be undebuggable.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Wed, 9 Mar 2022 at 19:48, Russell King (Oracle)
<[email protected]> wrote:
>
> On Wed, Mar 09, 2022 at 06:43:42PM +0100, Ard Biesheuvel wrote:
> > On Wed, 9 Mar 2022 at 18:11, Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > On Wed, Mar 09, 2022 at 10:08:25PM +0530, Naresh Kamboju wrote:
> > > > Hi Russell,
> > > >
> > > > On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle)
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > > > > > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> > > > > > >
> > > > <trim>
> > > > > Well, we unwound until:
> > > > >
> > > > > __irq_svc from migrate_disable+0x0/0x70
> > > > >
> > > > > and then crashed - and the key thing there is that we're at the start
> > > > > of migrate_disable() when we took an interrupt.
> > > > >
> > > > > For some reason, this triggers an access to address 0x10, which faults.
> > > > > We then try unwinding again, and successfully unwind all the way back
> > > > > to the same point (the line above) which then causes the unwinder to
> > > > > again access address 0x10, and the cycle repeats with the stack
> > > > > growing bigger and bigger.
> > > > >
> > > > > I'd suggest also testing without the revert but with my patch.
> > > >
> > > > I have tested your patch on top of linux next-20220309 and still see kernel
> > > > crash as below [1]. build link [2].
> > > >
> > > > [ 26.812060] 8<--- cut here ---
> > > > [ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
> > > > [ 26.816139] [b6a3ab70] *pgd=fb28a835
> > > > [ 26.817770] Internal error: : 1b [#1] SMP ARM
> > > > [ 26.819636] Modules linked in:
> > > > [ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted
> > > > 5.17.0-rc7-next-20220309 #1
> > > > [ 26.824519] Hardware name: Generic DT based system
> > > > [ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
> > > > [ 26.829856] LR is at unwind_frame+0x7dc/0xab4
> > > >
> > > > - Naresh
> > > >
> > > > [1] https://lkft.validation.linaro.org/scheduler/job/4688599#L596
> > > > [2] https://builds.tuxbuild.com/269gYLGuAdmltuLhIUDAjS2fg1Q/
> > >
> > > I think the problem has just moved:
> > >
> > > [ 27.113085] __irq_svc from __copy_to_user_std+0x24/0x378
> > >
> > > The code at the start of __copy_to_user_std is:
> > >
> > > 0: e3a034bf mov r3, #-1090519040 ; 0xbf000000
> > > 4: e243c001 sub ip, r3, #1
> > > 8: e05cc000 subs ip, ip, r0
> > > c: 228cc001 addcs ip, ip, #1
> > > 10: 205cc002 subscs ip, ip, r2
> > > 14: 33a00000 movcc r0, #0
> > > 18: e320f014 csdb
> > > 1c: e3a03000 mov r3, #0
> > > 20: e92d481d push {r0, r2, r3, r4, fp, lr}
> > > 24: e1a0b00d mov fp, sp
> > >
> > > and the unwind information will be:
> > >
> > > 0xc056f14c <arm_copy_to_user+0x1c>: @0xc0b89b84
> > > Compact model index: 1
> > > 0x9b vsp = r11
> > > 0xb1 0x0d pop {r0, r2, r3}
> > > 0x84 0x81 pop {r4, r11, r14}
> > > 0xb0 finish
> > >
> > > The problem is that the unwind information says "starting at offset
> > > 0x1c, to unwind do the following operations". The first of which is
> > > to move r11 (fp) to the stack pointer. However, r11 isn't setup
> > > until function offset 0x24. You've hit that instruction, which hasn't
> > > executed yet, but the stack has been modified by pushing r0, r2-r4,
> > > fp and lr onto it.
> > >
> > > Given this, there is no way that the unwinder (as it currently stands)
> > > can do its job properly between 0x1c and 0x24.
> > >
> > > I don't think this is specifically caused by Ard's patches, but by
> > > the addition of KASAN, which has the effect of calling the unwinder
> > > at random points in the kernel (when an interrupt happens) and it's
> > > clear from the above that there are windows in the code where, if
> > > we attempt to unwind using the unwind information, we will fail
> > > because the program state is not consistent with the unwind
> > > information.
> > >
> > > Ard's patch that changes:
> > >
> > > ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
> > >
> > > to use get_kernel_nofault() should have the effect of protecting
> > > against the oops, but the side effect is that it is fundamentally not
> > > possible with the way these things are to unwind at these points -
> > > which means it's not possible to get a stacktrace there.
> > >
> > > So, I don't think this is a "new" problem, but a weakness of using
> > > the unwinder to get a backtrace for KASAN.
> > >
> >
> > It essentially means that we cannot unwind through asynchronous
> > exceptions, and so we should probably make the svc_entry macro
> > .nounwind, instead of pretending that we can reliably unwind through
> > it.
>
> Doesn't that impact the ability to debug the kernel over things like
> oopses and the like?
>

The backtrace dumped by __die() uses the pt_regs from the exception
context as the starting point, so the exception entry code that deals
with the condition that triggered the oops is omitted, and does not
have to be unwound.

There may be other cases where less context is being provided with
this change, but it is basically the conclusion drawn in this thread
that the ARM EHABI unwinder cannot reliably unwind through exception
entry anyway.

On Wed, 9 Mar 2022 at 18:11, Russell King (Oracle)
<[email protected]> wrote:
>
> On Wed, Mar 09, 2022 at 10:08:25PM +0530, Naresh Kamboju wrote:
> > Hi Russell,
> >
> > On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > > > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> > > > >
> > <trim>
> > > Well, we unwound until:
> > >
> > > __irq_svc from migrate_disable+0x0/0x70
> > >
> > > and then crashed - and the key thing there is that we're at the start
> > > of migrate_disable() when we took an interrupt.
> > >
> > > For some reason, this triggers an access to address 0x10, which faults.
> > > We then try unwinding again, and successfully unwind all the way back
> > > to the same point (the line above) which then causes the unwinder to
> > > again access address 0x10, and the cycle repeats with the stack
> > > growing bigger and bigger.
> > >
> > > I'd suggest also testing without the revert but with my patch.
> >
> > I have tested your patch on top of linux next-20220309 and still see kernel
> > crash as below [1]. build link [2].
> >
> > [ 26.812060] 8<--- cut here ---
> > [ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
> > [ 26.816139] [b6a3ab70] *pgd=fb28a835
> > [ 26.817770] Internal error: : 1b [#1] SMP ARM
> > [ 26.819636] Modules linked in:
> > [ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted
> > 5.17.0-rc7-next-20220309 #1
> > [ 26.824519] Hardware name: Generic DT based system
> > [ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
> > [ 26.829856] LR is at unwind_frame+0x7dc/0xab4
> >
> > - Naresh
> >
> > [1] https://lkft.validation.linaro.org/scheduler/job/4688599#L596
> > [2] https://builds.tuxbuild.com/269gYLGuAdmltuLhIUDAjS2fg1Q/
>
> I think the problem has just moved:
>
> [ 27.113085] __irq_svc from __copy_to_user_std+0x24/0x378
>
> The code at the start of __copy_to_user_std is:
>
> 0: e3a034bf mov r3, #-1090519040 ; 0xbf000000
> 4: e243c001 sub ip, r3, #1
> 8: e05cc000 subs ip, ip, r0
> c: 228cc001 addcs ip, ip, #1
> 10: 205cc002 subscs ip, ip, r2
> 14: 33a00000 movcc r0, #0
> 18: e320f014 csdb
> 1c: e3a03000 mov r3, #0
> 20: e92d481d push {r0, r2, r3, r4, fp, lr}
> 24: e1a0b00d mov fp, sp
>
> and the unwind information will be:
>
> 0xc056f14c <arm_copy_to_user+0x1c>: @0xc0b89b84
> Compact model index: 1
> 0x9b vsp = r11
> 0xb1 0x0d pop {r0, r2, r3}
> 0x84 0x81 pop {r4, r11, r14}
> 0xb0 finish
>
> The problem is that the unwind information says "starting at offset
> 0x1c, to unwind do the following operations". The first of which is
> to move r11 (fp) to the stack pointer. However, r11 isn't setup
> until function offset 0x24. You've hit that instruction, which hasn't
> executed yet, but the stack has been modified by pushing r0, r2-r4,
> fp and lr onto it.
>
> Given this, there is no way that the unwinder (as it currently stands)
> can do its job properly between 0x1c and 0x24.
>
> I don't think this is specifically caused by Ard's patches, but by
> the addition of KASAN, which has the effect of calling the unwinder
> at random points in the kernel (when an interrupt happens) and it's
> clear from the above that there are windows in the code where, if
> we attempt to unwind using the unwind information, we will fail
> because the program state is not consistent with the unwind
> information.
>
> Ard's patch that changes:
>
> ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
>
> to use get_kernel_nofault() should have the effect of protecting
> against the oops, but the side effect is that it is fundamentally not
> possible with the way these things are to unwind at these points -
> which means it's not possible to get a stacktrace there.
>
> So, I don't think this is a "new" problem, but a weakness of using
> the unwinder to get a backtrace for KASAN.
>

It essentially means that we cannot unwind through asynchronous
exceptions, and so we should probably mark the svc_entry macro
.cantunwind, instead of pretending that we can reliably unwind through
it.

For these annotation purposes, where the interrupt was taken is not
terribly interesting anyway, and terminating the stacktrace earlier
might even recover some performance lost to KASAN overhead.

Naresh, please try the patch below.

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 5609ca8ae46a..0d8ae1a14643 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -184,12 +184,12 @@ ENDPROC(__und_invalid)
.macro svc_entry, stack_hole=0, trace=1, uaccess=1, overflow_check=1
UNWIND(.fnstart )
+ UNWIND(.cantunwind )
sub sp, sp, #(SVC_REGS_SIZE + \stack_hole)
THUMB( add sp, r1 ) @ get SP in a GPR without
THUMB( sub r1, sp, r1 ) @ using a temp register
.if \overflow_check
- UNWIND(.save {r0 - pc} )
do_overflow_check (SVC_REGS_SIZE + \stack_hole)
.endif
On Thu, 10 Mar 2022 at 14:01, Russell King (Oracle)
<[email protected]> wrote:
>
> On Thu, Mar 10, 2022 at 12:35:55PM +0000, Russell King (Oracle) wrote:
> > On Wed, Mar 09, 2022 at 09:42:29PM +0100, Ard Biesheuvel wrote:
> > > On Wed, 9 Mar 2022 at 20:39, Russell King (Oracle)
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, Mar 09, 2022 at 08:14:30PM +0100, Ard Biesheuvel wrote:
> > > > > The backtrace dumped by __die() uses the pt_regs from the exception
> > > > > context as the starting point, so the exception entry code that deals
> > > > > with the condition that triggered the oops is omitted, and does not
> > > > > have to be unwound.
> > > >
> > > > That is true, but that's not really the case I was thinking about.
> > > > I was thinking more about cases such as RCU stalls, soft lockups,
> > > > etc.
> > > >
> > > > For example:
> > > >
> > > > https://www.linuxquestions.org/questions/linux-kernel-70/kenel-v4-4-60-panic-in-igmp6_send-and-and-__neigh_create-4175704721/
> > > >
> > > > In that stack trace, the interesting bits are not the beginning of
> > > > the stack trace down to __irq_svc, but everything beyond __irq_svc,
> > > > since the lockup is probably caused by being stuck in
> > > > _raw_write_lock_bh().
> > > >
> > > > It's these situations that we will totally destroy debuggability for,
> > > > and the only way around that would be to force frame pointers and
> > > > ARM builds (not Thumb-2 as that requires the unwinder)... which means
> > > > a Thumb-2 kernel soft lockup would be undebuggable.
> > > >
> > >
> > > Indeed.
> > >
> > > But that means that the only other choice we have is to retain the
> > > imprecise nature of the current solution (which usually works fine
> > > btw), and simply deal with the faulting double dereference of vsp in
> > > the unwinder code. We simply don't know whether the exception was
> > > taken at a point where the stack frame is consistent with the unwind
> > > data.
> >
> > Okay, further analysis (for the record, since I've said much of this on
> > IRC):
> >
> > What we have currently is a robust unwinder that will cope when things
> > go wrong, such as an interrupt taken during the prologue of a function.
> > The way it copes is by two mechanisms:
> >
> > /* store the highest address on the stack to avoid crossing it*/
> > low = frame->sp;
> > ctrl.sp_high = ALIGN(low, THREAD_SIZE);
> >
> > These two represent the allowable bounds of the kernel stack. When we
> > run the unwinder, before each unwind instruction we check whether the
> > current SP value is getting close to the top of the kernel stack, and
> > if so, turn on additional checking:
> >
> > if ((ctrl.sp_high - ctrl.vrs[SP]) < sizeof(ctrl.vrs))
> > ctrl.check_each_pop = 1;
> >
> > that will ensure if we go off the top of the kernel stack, the
> > unwinder will report failure, and not access those addresses.
> >
> > After each instruction, we check whether the SP value is within the
> > above bounds:
> >
> > if (ctrl.vrs[SP] < low || ctrl.vrs[SP] >= ctrl.sp_high)
> > return -URC_FAILURE;
> >
> > This means that the unwinder can never modify SP to point outside of
> > the kernel stack region identified by low..ctrl.sp_high, thereby
> > protecting the load inside unwind_pop_register() from ever
> > dereferencing something outside of the kernel stack. Moreover, it also
> > prevents the unwinder modifying SP to point below the current stack
> > frame.
> >
> > The problem has been introduced by trying to make the unwinder cope
> > with IRQ stacks in b6506981f880 ("ARM: unwind: support unwinding across
> > multiple stacks"):
> >
> > - if (!load_sp)
> > + if (!load_sp) {
> > ctrl->vrs[SP] = (unsigned long)vsp;
> > + } else {
> > + ctrl->sp_low = ctrl->vrs[SP];
> > + ctrl->sp_high = ALIGN(ctrl->sp_low, THREAD_SIZE);
> > + }
> >
> > Now, whenever SP is loaded, we reset the allowable range for the SP
> > value, and this completely defeats the protections we previously had
> > which were ensuring that:
> >
> > 1) the SP value doesn't jump back _down_ the kernel stack resulting
> > in an infinite unwind loop.
> > 2) the SP value doesn't end up outside the kernel stack.
> >
> > We need those protections to prevent these problems that are being
> > reported - and the most efficient way I can think of doing that is to
> > somehow validate the new SP value _before_ we modify sp_low and
> > sp_high, so these two limits are always valid.
> >
> > Merely changing the READ_ONCE_NOCHECK() to be get_kernel_nofault()
> > will only partly fix this problem - it will stop the unwinder oopsing
> > the kernel, but definitely doesn't protect against (1) and doesn't
> > protect against SP pointing at something that is accessible (e.g.
> > a device or other kernel memory).
> >
> > We're yet again at Thursday, with the last linux-next prior to the
> > merge window being created this evening, which really doesn't leave
> > much time to get this sorted... and we can't say "this code should
> > have been in earlier in the cycle" this time around, because these
> > changes to the unwinder have been present in linux-next prior to
> > 5.17-rc2. Annoyingly, it seems merging stuff earlier in the cycle
> > doesn't actually solve the problem of these last minute debugging
> > panics.
> >
> > Any suggestions for what we should do? Can we come up with some way
> > to validate the new SP value before 6pm UTC this evening?
>
> Also, looking deeper at the last linaro job report:
>
> https://lkft.validation.linaro.org/scheduler/job/4688599#L684
>
> the dumped memory doesn't look like an exception stack. If it was,
> e82aab40 would be the saved CPSR value and c388eb80 would be the PC
> value, both of which are nonsense.
>
> The stack that we were in (and we dumped out in full) was:
>
> Stack: (0xc381bb30 to 0xc381c000)
>

This is the IRQ stack that we switch to in

[ 29.198048] irq_exit from __irq_svc+0x70/0x8c
[ 29.200896] Exception stack(0xc5fcfc98 to 0xc5fcfce0)

> and the exception stack (the saved pt_regs) is:
>
> r0 r1 r2 r3 r4
> bfa0: c2ba47c0 0000000a c2ba1358 ffff9537 c2c05d00 00400140 c62d5624 c1948b20
> r5 r6 r7 r8 r9 r10 fp ip
> bfc0: e82ab498 c62d5400 c35377a0 c62d5404 25706000 c381bfe8 c62d5400 00000001
> sp lr pc cpsr orig_r0
> bfe0: c5fcfc48 c036251c c0995a14 20040013 ffffffff c5fcfc7c c62d5400 c0300bf0
>

I don't follow this - which pt_regs is this?

> but, we end up dumping out:
>
> Exception stack(0xc5fcfc98 to 0xc5fcfce0)
> fc80: b6a3ab70 00000004
> fca0: 00000000 00000004 b6a3ab70 c055f928 c388eb80 c5fcfd40 00000000 c5fcfd50
> fcc0: 00000005 00000051 e82aad1c c03ae570 00000000 c388eb80 c3512a20 e82aab40
>
> Firstly, that's in the wrong stack to be dumping for the exception
> stack, and secondly, why is it 0x50 bytes above the saved SP value from
> the real exception stack - that makes no sense in itself.
>

This is the exception stack that we were on when taking an IRQ and
subsequently switching to the IRQ stack.

On Wed, Mar 09, 2022 at 09:42:29PM +0100, Ard Biesheuvel wrote:
> On Wed, 9 Mar 2022 at 20:39, Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Wed, Mar 09, 2022 at 08:14:30PM +0100, Ard Biesheuvel wrote:
> > > The backtrace dumped by __die() uses the pt_regs from the exception
> > > context as the starting point, so the exception entry code that deals
> > > with the condition that triggered the oops is omitted, and does not
> > > have to be unwound.
> >
> > That is true, but that's not really the case I was thinking about.
> > I was thinking more about cases such as RCU stalls, soft lockups,
> > etc.
> >
> > For example:
> >
> > https://www.linuxquestions.org/questions/linux-kernel-70/kenel-v4-4-60-panic-in-igmp6_send-and-and-__neigh_create-4175704721/
> >
> > In that stack trace, the interesting bits are not the beginning of
> > the stack trace down to __irq_svc, but everything beyond __irq_svc,
> > since the lockup is probably caused by being stuck in
> > _raw_write_lock_bh().
> >
> > It's these situations that we will totally destroy debuggability for,
> > and the only way around that would be to force frame pointers and
> > ARM builds (not Thumb-2 as that requires the unwinder)... which means
> > a Thumb-2 kernel soft lockup would be undebuggable.
> >
>
> Indeed.
>
> But that means that the only other choice we have is to retain the
> imprecise nature of the current solution (which usually works fine
> btw), and simply deal with the faulting double dereference of vsp in
> the unwinder code. We simply don't know whether the exception was
> taken at a point where the stack frame is consistent with the unwind
> data.

Okay, further analysis (for the record, since I've said much of this on
IRC):

What we have currently is a robust unwinder that will cope when things
go wrong, such as an interrupt taken during the prologue of a function.
The way it copes is by two mechanisms:

    /* store the highest address on the stack to avoid crossing it */
    low = frame->sp;
    ctrl.sp_high = ALIGN(low, THREAD_SIZE);

These two represent the allowable bounds of the kernel stack. When we
run the unwinder, before each unwind instruction we check whether the
current SP value is getting close to the top of the kernel stack, and
if so, turn on additional checking:

    if ((ctrl.sp_high - ctrl.vrs[SP]) < sizeof(ctrl.vrs))
            ctrl.check_each_pop = 1;

that will ensure that if we go off the top of the kernel stack, the
unwinder will report failure, and not access those addresses.

After each instruction, we check whether the SP value is within the
above bounds:

    if (ctrl.vrs[SP] < low || ctrl.vrs[SP] >= ctrl.sp_high)
            return -URC_FAILURE;

This means that the unwinder can never modify SP to point outside of
the kernel stack region identified by low..ctrl.sp_high, thereby
protecting the load inside unwind_pop_register() from ever
dereferencing something outside of the kernel stack. Moreover, it also
prevents the unwinder modifying SP to point below the current stack
frame.
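
To make the bounds arithmetic above concrete, here is a minimal
user-space sketch in C. It is not the kernel code: THREAD_SIZE is an
assumed value, and ALIGN_UP(), stack_top() and sp_within_bounds() are
hypothetical names chosen for this illustration only.

```c
#include <stdbool.h>

/* Assumed stack size for this sketch; the real value depends on the config. */
#define THREAD_SIZE 8192UL
/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Upper bound of the stack that contains the starting SP value. */
static unsigned long stack_top(unsigned long low)
{
	return ALIGN_UP(low, THREAD_SIZE);
}

/* Mirrors the post-instruction check: SP must stay inside [low, top). */
static bool sp_within_bounds(unsigned long sp, unsigned long low)
{
	return sp >= low && sp < stack_top(low);
}
```

With low = 0xc381bb30 this gives an upper bound of 0xc381c000, and an
SP value such as 0x10 is rejected before it can be dereferenced.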

The problem has been introduced by trying to make the unwinder cope
with IRQ stacks in b6506981f880 ("ARM: unwind: support unwinding across
multiple stacks"):

    -       if (!load_sp)
    +       if (!load_sp) {
                    ctrl->vrs[SP] = (unsigned long)vsp;
    +       } else {
    +               ctrl->sp_low = ctrl->vrs[SP];
    +               ctrl->sp_high = ALIGN(ctrl->sp_low, THREAD_SIZE);
    +       }

Now, whenever SP is loaded, we reset the allowable range for the SP
value, and this completely defeats the protections we previously had
which were ensuring that:

1) the SP value doesn't jump back _down_ the kernel stack resulting
   in an infinite unwind loop.
2) the SP value doesn't end up outside the kernel stack.

We need those protections to prevent these problems that are being
reported - and the most efficient way I can think of doing that is to
somehow validate the new SP value _before_ we modify sp_low and
sp_high, so these two limits are always valid.
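
One way to picture "validate before updating the limits" is the
user-space sketch below. It is purely hypothetical: struct
stack_region, struct sp_bounds and accept_loaded_sp() are names
invented here, and passing in a list of known stacks is an assumption
made for the illustration, not something this thread or the kernel
specifies.

```c
#include <stdbool.h>
#include <stddef.h>

struct stack_region { unsigned long start, end; };	/* [start, end) */
struct sp_bounds { unsigned long sp_low, sp_high; };

static bool in_region(unsigned long addr, const struct stack_region *r)
{
	return addr >= r->start && addr < r->end;
}

/*
 * Accept a freshly loaded SP only if it is plausible, and update the
 * bounds only on success so that they are always valid.
 */
static bool accept_loaded_sp(struct sp_bounds *b, unsigned long new_sp,
			     const struct stack_region *known, size_t n)
{
	if (new_sp & 3)
		return false;	/* not word aligned */

	/* Same stack: only allow SP to move up, never back down. */
	if (new_sp >= b->sp_low && new_sp < b->sp_high) {
		b->sp_low = new_sp;
		return true;
	}

	/* Different stack: it must be one of the known regions. */
	for (size_t i = 0; i < n; i++) {
		if (in_region(b->sp_low, &known[i]))
			continue;	/* current stack, handled above */
		if (in_region(new_sp, &known[i])) {
			b->sp_low = new_sp;
			b->sp_high = known[i].end;
			return true;
		}
	}
	return false;
}
```

In this sketch a jump back down the current stack or to an arbitrary
address is refused and leaves sp_low and sp_high untouched, which
corresponds to protections (1) and (2) above. It does not, by itself,
stop an unwind from bouncing between two known stacks.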

Merely changing the READ_ONCE_NOCHECK() to be get_kernel_nofault()
will only partly fix this problem - it will stop the unwinder oopsing
the kernel, but definitely doesn't protect against (1) and doesn't
protect against SP pointing at something that is accessible (e.g.
a device or other kernel memory).

We're yet again at Thursday, with the last linux-next prior to the
merge window being created this evening, which really doesn't leave
much time to get this sorted... and we can't say "this code should
have been in earlier in the cycle" this time around, because these
changes to the unwinder have been present in linux-next prior to
5.17-rc2. Annoyingly, it seems merging stuff earlier in the cycle
doesn't actually solve the problem of these last minute debugging
panics.

Any suggestions for what we should do? Can we come up with some way
to validate the new SP value before 6pm UTC this evening?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Hi Ard and Russell,

The boot test passes on the linux next-20220310 tag with KASAN=y on the
BeagleBoard x15 device, but the LTP cve tests reproduced the reported
kernel crash [1]. From the available historical data I can confirm that
this is an intermittent issue on BeagleBoard x15 devices.

OTOH, the kernel crash is always reproducible on qemu-arm with KASAN=y
while booting, which has been known to fail for a long time.

I have also boot tested qemu-arm with KASAN=y from the Ardb tree [2],
and the reported kernel crash is always reproducible.

The build steps and extra Kconfigs are in [3].

- Naresh

[1] https://lkft.validation.linaro.org/scheduler/job/4701310
[2] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/
[3] https://builds.tuxbuild.com/2661dIAPUjE2DMJvye91He2gus0/tuxmake_reproducer.sh

On Thu, 10 Mar 2022 at 22:50, Naresh Kamboju <[email protected]> wrote:
>
> On Fri, 11 Mar 2022 at 02:55, Ard Biesheuvel <[email protected]> wrote:
> >
> > On Thu, 10 Mar 2022 at 22:18, Naresh Kamboju <[email protected]> wrote:
> > >
> > > Hi Ard and Russell,
> > >
> > > The boot test pass on linux next-20220310 tag with KASAN=y on BeagleBoard x15
> > > device. but LTP cve tests reproduced the reported kernel crash [1].
> > > From the available historical data I can confirm that this is an
> > > intermittent issue on
> > > BeagleBoard x15 devices.
> > >
> > > OTOH, the kernel crash is always reproducible on qemu-arm with KASAN=y
> > > while booting which has been known to fail for a long time.
> > >
> > > From the Ardb tree I have boot tested qemu-arm with KASAN=y the reported
> > > kernel crash is always reproducible.
> > >
> > > The build steps [3] and extra Kconfigs.
> > >
> > > - Naresh
> > > [1] https://lkft.validation.linaro.org/scheduler/job/4701310
> > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/
> > > [3] https://builds.tuxbuild.com/2661dIAPUjE2DMJvye91He2gus0/tuxmake_reproducer.sh
> >
> > Thanks Naresh. I'm having trouble to make sense of this, though. The
> > linked output log appears to be from a build that lacks my 'ARM:
> > entry: fix unwinder problems caused by IRQ stacks' patch, as it
> > doesn't show any occurrences of call_with_stack() on any of the call
> > stacks.
> >
> > Do you have a link to the vmlinux and zImage files for this build?
>
> Yes.
>
> vmlinux.xz: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/vmlinux.xz
> zImage: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/zImage
> System.map: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/System.map
> Build log: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/
>

This kernel does not appear to have

  ARM: unwind: set frame.pc correctly for current-thread unwinding
  ARM: entry: fix unwinder problems caused by IRQ stacks
  ARM: Revert "unwind: dump exception stack from calling frame"

so it is expected that the same issue is still being observed.

Could you please try -next with those patches applied?

On Wed, Mar 09, 2022 at 10:08:25PM +0530, Naresh Kamboju wrote:
> Hi Russell,
>
> On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle)
> <[email protected]> wrote:
> >
> > On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju <[email protected]> wrote:
> > > >
> <trim>
> > Well, we unwound until:
> >
> > __irq_svc from migrate_disable+0x0/0x70
> >
> > and then crashed - and the key thing there is that we're at the start
> > of migrate_disable() when we took an interrupt.
> >
> > For some reason, this triggers an access to address 0x10, which faults.
> > We then try unwinding again, and successfully unwind all the way back
> > to the same point (the line above) which then causes the unwinder to
> > again access address 0x10, and the cycle repeats with the stack
> > growing bigger and bigger.
> >
> > I'd suggest also testing without the revert but with my patch.
>
> I have tested your patch on top of linux next-20220309 and still see kernel
> crash as below [1]. build link [2].
>
> [ 26.812060] 8<--- cut here ---
> [ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
> [ 26.816139] [b6a3ab70] *pgd=fb28a835
> [ 26.817770] Internal error: : 1b [#1] SMP ARM
> [ 26.819636] Modules linked in:
> [ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted
> 5.17.0-rc7-next-20220309 #1
> [ 26.824519] Hardware name: Generic DT based system
> [ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
> [ 26.829856] LR is at unwind_frame+0x7dc/0xab4
>
> - Naresh
>
> [1] https://lkft.validation.linaro.org/scheduler/job/4688599#L596
> [2] https://builds.tuxbuild.com/269gYLGuAdmltuLhIUDAjS2fg1Q/

I think the problem has just moved:

[ 27.113085] __irq_svc from __copy_to_user_std+0x24/0x378

The code at the start of __copy_to_user_std is:

   0: e3a034bf mov r3, #-1090519040 ; 0xbf000000
   4: e243c001 sub ip, r3, #1
   8: e05cc000 subs ip, ip, r0
   c: 228cc001 addcs ip, ip, #1
  10: 205cc002 subscs ip, ip, r2
  14: 33a00000 movcc r0, #0
  18: e320f014 csdb
  1c: e3a03000 mov r3, #0
  20: e92d481d push {r0, r2, r3, r4, fp, lr}
  24: e1a0b00d mov fp, sp

and the unwind information will be:

0xc056f14c <arm_copy_to_user+0x1c>: @0xc0b89b84
  Compact model index: 1
  0x9b      vsp = r11
  0xb1 0x0d pop {r0, r2, r3}
  0x84 0x81 pop {r4, r11, r14}
  0xb0      finish

The problem is that the unwind information says "starting at offset
0x1c, to unwind do the following operations", the first of which is
to move r11 (fp) to the stack pointer. However, r11 isn't set up
until function offset 0x24. You've hit that instruction, which hasn't
executed yet, but the stack has been modified by pushing r0, r2-r4,
fp and lr onto it.

Given this, there is no way that the unwinder (as it currently stands)
can do its job properly between 0x1c and 0x24.
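
The effect can be reproduced outside the kernel with a small
simulation. The sketch below interprets just the four opcode forms
that appear in the unwind entry above against a simulated stack; it is
an illustration only, not the kernel's unwinder. struct sim, pop(),
pop_mask() and unwind_exec() are hypothetical names, and the stack
contents used with it are invented.

```c
#include <stddef.h>
#include <stdint.h>

enum { SP = 13, LR = 14 };

struct sim {
	uint32_t vrs[16];	/* virtual register set r0-r15 */
	const uint32_t *stack;	/* simulated stack contents */
	uint32_t base;		/* address of stack[0] */
	size_t words;		/* number of words in stack[] */
};

/* Pop one word at vsp into reg; fail if vsp is outside the stack. */
static int pop(struct sim *s, int reg)
{
	uint32_t vsp = s->vrs[SP];

	if ((vsp & 3) || vsp < s->base || vsp >= s->base + 4 * s->words)
		return -1;
	s->vrs[reg] = s->stack[(vsp - s->base) / 4];
	s->vrs[SP] = vsp + 4;
	return 0;
}

/* Pop every register selected by mask; bit 0 maps to first_reg. */
static int pop_mask(struct sim *s, uint32_t mask, int first_reg)
{
	for (int reg = first_reg; mask; mask >>= 1, reg++)
		if ((mask & 1) && pop(s, reg))
			return -1;
	return 0;
}

/* Execute a small subset of EHABI unwind opcodes; 0 means success. */
static int unwind_exec(struct sim *s, const uint8_t *insn, size_t len)
{
	size_t i = 0;

	while (i < len) {
		uint8_t op = insn[i++];

		if ((op & 0xf0) == 0x90) {		/* vsp = r[nnnn] */
			s->vrs[SP] = s->vrs[op & 0x0f];
		} else if ((op & 0xf0) == 0x80) {	/* pop r4-r15 by mask */
			if (i >= len)
				return -1;
			if (pop_mask(s, ((op & 0x0fu) << 8) | insn[i++], 4))
				return -1;
		} else if (op == 0xb1) {		/* pop r0-r3 by mask */
			if (i >= len)
				return -1;
			if (pop_mask(s, insn[i++] & 0x0f, 0))
				return -1;
		} else if (op == 0xb0) {		/* finish */
			return 0;
		} else {
			return -1;			/* not modelled here */
		}
	}
	return 0;
}
```

Feeding it the bytes 0x9b 0xb1 0x0d 0x84 0x81 0xb0 with r11 pointing at
the six pushed words recovers lr as expected. With r11 still holding a
stale value, as it does between offsets 0x1c and 0x24, the first pop
already targets an address outside the stack; the simulation reports
failure at the point where an unchecked load would fault.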

I don't think this is specifically caused by Ard's patches, but by
the addition of KASAN, which has the effect of calling the unwinder
at random points in the kernel (when an interrupt happens) and it's
clear from the above that there are windows in the code where, if
we attempt to unwind using the unwind information, we will fail
because the program state is not consistent with the unwind
information.

Ard's patch that changes:

    ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));

to use get_kernel_nofault() should have the effect of protecting
against the oops, but the side effect is that it is fundamentally not
possible with the way these things are to unwind at these points -
which means it's not possible to get a stacktrace there.

So, I don't think this is a "new" problem, but a weakness of using
the unwinder to get a backtrace for KASAN.

Do you have any way to work out exactly when this problem first
appeared?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Thu, 10 Mar 2022 at 22:18, Naresh Kamboju <[email protected]> wrote:
>
> Hi Ard and Russell,
>
> The boot test passes on the linux next-20220310 tag with KASAN=y on a
> BeagleBoard x15 device, but the LTP cve tests reproduced the reported
> kernel crash [1]. From the available historical data I can confirm that
> this is an intermittent issue on BeagleBoard x15 devices.
>
> OTOH, the kernel crash is always reproducible on qemu-arm with KASAN=y
> while booting which has been known to fail for a long time.
>
> I have also boot tested qemu-arm with KASAN=y from the Ardb tree, and the
> reported kernel crash is always reproducible there.
>
> The build steps [3] and extra Kconfigs.
>
> - Naresh
> [1] https://lkft.validation.linaro.org/scheduler/job/4701310
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/
> [3] https://builds.tuxbuild.com/2661dIAPUjE2DMJvye91He2gus0/tuxmake_reproducer.sh
Thanks Naresh. I'm having trouble making sense of this, though. The
linked output log appears to be from a build that lacks my 'ARM:
entry: fix unwinder problems caused by IRQ stacks' patch, as it
doesn't show any occurrences of call_with_stack() on any of the call
stacks.
Do you have a link to the vmlinux and zImage files for this build?
On Thu, Mar 10, 2022 at 12:35:55PM +0000, Russell King (Oracle) wrote:
> On Wed, Mar 09, 2022 at 09:42:29PM +0100, Ard Biesheuvel wrote:
> > On Wed, 9 Mar 2022 at 20:39, Russell King (Oracle)
> > <[email protected]> wrote:
> > >
> > > On Wed, Mar 09, 2022 at 08:14:30PM +0100, Ard Biesheuvel wrote:
> > > > The backtrace dumped by __die() uses the pt_regs from the exception
> > > > context as the starting point, so the exception entry code that deals
> > > > with the condition that triggered the oops is omitted, and does not
> > > > have to be unwound.
> > >
> > > That is true, but that's not really the case I was thinking about.
> > > I was thinking more about cases such as RCU stalls, soft lockups,
> > > etc.
> > >
> > > For example:
> > >
> > > https://www.linuxquestions.org/questions/linux-kernel-70/kenel-v4-4-60-panic-in-igmp6_send-and-and-__neigh_create-4175704721/
> > >
> > > In that stack trace, the interesting bits are not the beginning of
> > > the stack trace down to __irq_svc, but everything beyond __irq_svc,
> > > since the lockup is probably caused by being stuck in
> > > _raw_write_lock_bh().
> > >
> > > It's these situations that we will totally destroy debuggability for,
> > > and the only way around that would be to force frame pointers and
> > > ARM builds (not Thumb-2, as that requires the unwinder), which means
> > > a Thumb-2 kernel soft lockup would be undebuggable.
> > >
> >
> > Indeed.
> >
> > But that means that the only other choice we have is to retain the
> > imprecise nature of the current solution (which usually works fine
> > btw), and simply deal with the faulting double dereference of vsp in
> > the unwinder code. We simply don't know whether the exception was
> > taken at a point where the stack frame is consistent with the unwind
> > data.
>
> Okay, further analysis (for the record, since I've said much of this on
> IRC):
>
> What we have currently is a robust unwinder that will cope when things
> go wrong, such as an interrupt taken during the prologue of a function.
> The way it copes is by two mechanisms:
>
> /* store the highest address on the stack to avoid crossing it*/
> low = frame->sp;
> ctrl.sp_high = ALIGN(low, THREAD_SIZE);
>
> These two represent the allowable bounds of the kernel stack. When we
> run the unwinder, before each unwind instruction we check whether the
> current SP value is getting close to the top of the kernel stack, and
> if so, turn on additional checking:
>
> if ((ctrl.sp_high - ctrl.vrs[SP]) < sizeof(ctrl.vrs))
> ctrl.check_each_pop = 1;
>
> that will ensure if we go off the top of the kernel stack, the
> unwinder will report failure, and not access those addresses.
>
> After each instruction, we check whether the SP value is within the
> above bounds:
>
> if (ctrl.vrs[SP] < low || ctrl.vrs[SP] >= ctrl.sp_high)
> return -URC_FAILURE;
>
> This means that the unwinder can never modify SP to point outside of
> the kernel stack region identified by low..ctrl.sp_high, thereby
> protecting the load inside unwind_pop_register() from ever
> dereferencing something outside of the kernel stack. Moreover, it also
> prevents the unwinder modifying SP to point below the current stack
> frame.
>
> The problem has been introduced by trying to make the unwinder cope
> with IRQ stacks in b6506981f880 ("ARM: unwind: support unwinding across
> multiple stacks"):
>
> - if (!load_sp)
> + if (!load_sp) {
> ctrl->vrs[SP] = (unsigned long)vsp;
> + } else {
> + ctrl->sp_low = ctrl->vrs[SP];
> + ctrl->sp_high = ALIGN(ctrl->sp_low, THREAD_SIZE);
> + }
>
> Now, whenever SP is loaded, we reset the allowable range for the SP
> value, and this completely defeats the protections we previously had
> which were ensuring that:
>
> 1) the SP value doesn't jump back _down_ the kernel stack resulting
> in an infinite unwind loop.
> 2) the SP value doesn't end up outside the kernel stack.
>
> We need those protections to prevent these problems that are being
> reported - and the most efficient way I can think of doing that is to
> somehow validate the new SP value _before_ we modify sp_low and
> sp_high, so these two limits are always valid.
>
> Merely changing the READ_ONCE_NOCHECK() to be get_kernel_nofault()
> will only partly fix this problem - it will stop the unwinder oopsing
> the kernel, but definitely doesn't protect against (1) and doesn't
> protect against SP pointing at something that is accessible (e.g.
> a device or other kernel memory.)
>
> We're yet again at Thursday, with the last linux-next prior to the
> merge window being created this evening, which really doesn't leave
> much time to get this sorted... and we can't say "this code should
> have been in earlier in the cycle" this time around, because these
> changes to the unwinder have been present in linux-next prior to
> 5.17-rc2. Annoyingly, it seems merging stuff earlier in the cycle
> doesn't actually solve the problem of these last minute debugging
> panics.
>
> Any suggestions for what we should do? Can we come up with some way
> to validate the new SP value before 6pm UTC this evening?
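On validating the new SP value before sp_low and sp_high are touched, here is a rough user-space sketch of one possible check. It is not a proposed patch and not what the kernel does: struct stack_region, struct sp_bounds, update_bounds_checked() and the fixed THREAD_SIZE are assumptions made up for illustration, and it presumes the unwinder can be handed a list of the stacks it is allowed to walk.

```c
#include <stddef.h>

#define THREAD_SIZE 8192UL	/* assumed stack size for this sketch */

/* A stack the unwinder may walk: [base, base + THREAD_SIZE). */
struct stack_region {
	unsigned long base;
};

/* The currently allowed SP window, as in sp_low..sp_high above. */
struct sp_bounds {
	unsigned long sp_low;
	unsigned long sp_high;
};

static const struct stack_region *
find_region(const struct stack_region *known, size_t n, unsigned long sp)
{
	for (size_t i = 0; i < n; i++)
		if (sp >= known[i].base && sp < known[i].base + THREAD_SIZE)
			return &known[i];
	return NULL;
}

/*
 * Validate new_sp first, and only then move the window. On failure the
 * bounds are left untouched, so the existing per-instruction range check
 * keeps working against the old limits.
 */
int update_bounds_checked(struct sp_bounds *b, unsigned long new_sp,
			  const struct stack_region *known, size_t n)
{
	const struct stack_region *to, *from;

	if (new_sp & 3)
		return -1;		/* not word aligned */

	to = find_region(known, n, new_sp);
	if (!to)
		return -1;		/* (2) not on any known stack */

	from = find_region(known, n, b->sp_low);
	if (to == from && new_sp < b->sp_low)
		return -1;		/* (1) jumped back down the same stack */

	b->sp_low = new_sp;
	b->sp_high = to->base + THREAD_SIZE;
	return 0;
}
```

For example, with two known stacks based at 0xc381a000 and 0xc5fce000 (addresses borrowed from the dumps in this thread purely as sample values) and a window of 0xc381bb30..0xc381c000, a new SP of 0x00004004 or 0xc381b000 is rejected and the window is left alone, while 0xc5fcfc48 is accepted and moves the window to 0xc5fcfc48..0xc5fd0000. Note that this sketch does not stop an unwind from bouncing between two different stacks forever; that would need extra state.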
Also, looking deeper at the last linaro job report:
https://lkft.validation.linaro.org/scheduler/job/4688599#L684
the dumped memory doesn't look like an exception stack. If it was,
e82aab40 would be the saved CPSR value and c388eb80 would be the PC
value, both of which are nonsense.
The stack that we were in (and we dumped out in full) was:
Stack: (0xc381bb30 to 0xc381c000)
and the exception stack (the saved pt_regs) is:
                                 r0       r1       r2       r3       r4
bfa0: c2ba47c0 0000000a c2ba1358 ffff9537 c2c05d00 00400140 c62d5624 c1948b20
      r5       r6       r7       r8       r9       r10      fp       ip
bfc0: e82ab498 c62d5400 c35377a0 c62d5404 25706000 c381bfe8 c62d5400 00000001
      sp       lr       pc       cpsr     orig_r0
bfe0: c5fcfc48 c036251c c0995a14 20040013 ffffffff c5fcfc7c c62d5400 c0300bf0
but, we end up dumping out:
Exception stack(0xc5fcfc98 to 0xc5fcfce0)
fc80:                                                       b6a3ab70 00000004
fca0: 00000000 00000004 b6a3ab70 c055f928 c388eb80 c5fcfd40 00000000 c5fcfd50
fcc0: 00000005 00000051 e82aad1c c03ae570 00000000 c388eb80 c3512a20 e82aab40
Firstly, that's in the wrong stack to be dumping for the exception
stack, and secondly, why is it 0x50 bytes above the saved SP value from
the real exception stack? That makes no sense in itself.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Thu, Mar 10, 2022 at 11:06:17PM +0100, Ard Biesheuvel wrote:
> On Thu, 10 Mar 2022 at 22:50, Naresh Kamboju <[email protected]> wrote:
> >
> > On Fri, 11 Mar 2022 at 02:55, Ard Biesheuvel <[email protected]> wrote:
> > >
> > > On Thu, 10 Mar 2022 at 22:18, Naresh Kamboju <[email protected]> wrote:
> > > >
> > > > Hi Ard and Russell,
> > > >
> > > > The boot test pass on linux next-20220310 tag with KASAN=y on BeagleBoard x15
> > > > device. but LTP cve tests reproduced the reported kernel crash [1].
> > > > From the available historical data I can confirm that this is an
> > > > intermittent issue on
> > > > BeagleBoard x15 devices.
> > > >
> > > > OTOH, the kernel crash is always reproducible on qemu-arm with KASAN=y
> > > > while booting which has been known to fail for a long time.
> > > >
> > > > From the Ardb tree I have boot tested qemu-arm with KASAN=y the reported
> > > > kernel crash is always reproducible.
> > > >
> > > > The build steps [3] and extra Kconfigs.
> > > >
> > > > - Naresh
> > > > [1] https://lkft.validation.linaro.org/scheduler/job/4701310
> > > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/
> > > > [3] https://builds.tuxbuild.com/2661dIAPUjE2DMJvye91He2gus0/tuxmake_reproducer.sh
> > >
> > > Thanks Naresh. I'm having trouble to make sense of this, though. The
> > > linked output log appears to be from a build that lacks my 'ARM:
> > > entry: fix unwinder problems caused by IRQ stacks' patch, as it
> > > doesn't show any occurrences of call_with_stack() on any of the call
> > > stacks.
> > >
> > > Do you have a link to the vmlinux and zImage files for this build?
> >
> > Yes.
> >
> > vmlinux.xz: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/vmlinux.xz
> > zImage: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/zImage
> > System.map: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/System.map
> > Build log: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/
> >
>
> This kernel does not appear to have
>
> ARM: unwind: set frame.pc correctly for current-thread unwinding
> ARM: entry: fix unwinder problems caused by IRQ stacks
> ARM: Revert "unwind: dump exception stack from calling frame"
>
> so it is expected that the same issue is still being observed.
>
> Could you please try -next with those patches applied?
I concur, from my inspection of the above referenced vmlinux file.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Fri, 11 Mar 2022 at 02:55, Ard Biesheuvel <[email protected]> wrote:
>
> On Thu, 10 Mar 2022 at 22:18, Naresh Kamboju <[email protected]> wrote:
> >
> > Hi Ard and Russell,
> >
> > The boot test pass on linux next-20220310 tag with KASAN=y on BeagleBoard x15
> > device. but LTP cve tests reproduced the reported kernel crash [1].
> > From the available historical data I can confirm that this is an
> > intermittent issue on
> > BeagleBoard x15 devices.
> >
> > OTOH, the kernel crash is always reproducible on qemu-arm with KASAN=y
> > while booting which has been known to fail for a long time.
> >
> > From the Ardb tree I have boot tested qemu-arm with KASAN=y the reported
> > kernel crash is always reproducible.
> >
> > The build steps [3] and extra Kconfigs.
> >
> > - Naresh
> > [1] https://lkft.validation.linaro.org/scheduler/job/4701310
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/
> > [3] https://builds.tuxbuild.com/2661dIAPUjE2DMJvye91He2gus0/tuxmake_reproducer.sh
>
> Thanks Naresh. I'm having trouble to make sense of this, though. The
> linked output log appears to be from a build that lacks my 'ARM:
> entry: fix unwinder problems caused by IRQ stacks' patch, as it
> doesn't show any occurrences of call_with_stack() on any of the call
> stacks.
>
> Do you have a link to the vmlinux and zImage files for this build?
Yes.
vmlinux.xz: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/vmlinux.xz
zImage: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/zImage
System.map: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/System.map
Build log: https://builds.tuxbuild.com/26BmIasJnAyCii0SkgbKarkF369/
- Naresh
Hi Ard and Russell,
Your three patches were applied and tested on x15 by Daniel, and the
reported kernel crash was not seen after multiple iterations.
ARM: unwind: set frame.pc correctly for current-thread unwinding
ARM: entry: fix unwinder problems caused by IRQ stacks
ARM: Revert "unwind: dump exception stack from calling frame"
Tested-by: Linux Kernel Functional Testing <[email protected]>
Build url:
https://builds.tuxbuild.com/26DZbOeAxshMvtU2FhS3Fytr7NS/
vmlinux: https://builds.tuxbuild.com/26DZbOeAxshMvtU2FhS3Fytr7NS/vmlinux.xz
zImage: https://builds.tuxbuild.com/26DZbOeAxshMvtU2FhS3Fytr7NS/zImage
System.map: https://builds.tuxbuild.com/26DZbOeAxshMvtU2FhS3Fytr7NS/System.map
LAVA test jobs ran on x15 device:
https://lkft.validation.linaro.org/scheduler/job/4702341
https://lkft.validation.linaro.org/scheduler/job/4702344
https://lkft.validation.linaro.org/scheduler/job/4702348
https://lkft.validation.linaro.org/scheduler/job/4702350
https://lkft.validation.linaro.org/scheduler/job/4702352
https://lkft.validation.linaro.org/scheduler/job/4702354
- Naresh
On Mon, 14 Mar 2022 at 10:02, Naresh Kamboju <[email protected]> wrote:
>
> Hi Ard and Russell,
>
> Your three patches applied and tested on x15 tested by Daniel and reported
> kernel crash did not find it after multiple iterations.
>
> ARM: unwind: set frame.pc correctly for current-thread unwinding
> ARM: entry: fix unwinder problems caused by IRQ stacks
> ARM: Revert "unwind: dump exception stack from calling frame"
>
> Tested-by: Linux Kernel Functional Testing <[email protected]>
>
Thank you Naresh.