2019-08-12 21:54:21

by Qian Cai

Subject: "arm64/for-next/core" causes boot panic

Booting today's linux-next on an arm64 server triggers a panic with
CONFIG_KASAN_SW_TAGS=y pointing to this line,

kfree()->virt_to_head_page()->compound_head()

unsigned long head = READ_ONCE(page->compound_head);

The bisect so far indicates one of those could be bad,

9c1cac424c93 arm64: mm: Really fix sparse warning in untagged_addr()
d2c68de192cf docs: arm64: Add layout and 52-bit info to memory document
2c624fe68715 arm64: mm: Remove vabits_user
b6d00d47e81a arm64: mm: Introduce 52-bit Kernel VAs
ce3aaed87344 arm64: mm: Modify calculation of VMEMMAP_SIZE
c8b6d2ccf9b1 arm64: mm: Separate out vmemmap
c812026c54cf arm64: mm: Logic to make offset_ttbr1 conditional
5383cc6efed1 arm64: mm: Introduce vabits_actual
90ec95cda91a arm64: mm: Introduce VA_BITS_MIN
99426e5e8c9f arm64: dump: De-constify VA_START and KASAN_SHADOW_START
6bd1d0be0e97 arm64: kasan: Switch to using KASAN_SHADOW_OFFSET
14c127c957c1 arm64: mm: Flip kernel VA space
08f103b9a950 arm64/ptrace: Fix typoes in sve_set() comment
2951d5efaf8b arm64: mm: print hexadecimal EC value in mem_abort_decode()
b99286b088ea arm64/prefetch: fix a -Wtype-limits warning
71c67a31f09f init/Kconfig: Fix infinite Kconfig recursion on PPC
42d038c4fb00 arm64: Add support for function error injection
45880f7b7b19 error-injection: Consolidate override function definition
9ce1263033cd selftests, arm64: add a selftest for passing tagged pointers to kernel
63f0c6037965 arm64: Introduce prctl() options to control the tagged user addresses ABI
2b835e24b5c6 arm64: untag user pointers in access_ok and __uaccess_mask_ptr
5cf896fb6be3 arm64: Add support for relocating the kernel with RELR relocations
66cbdf5d0c96 arm64: Move TIF_* documentation to individual definitions
13776f9d40a0 arm64: mm: free the initrd reserved memblock in a aligned manner
22ec71615d82 arm64: io: Relax implicit barriers in default I/O accessors
2f8f180b3cee arm64: Remove unused cpucap_multi_entry_cap_cpu_enable()
73961dc1182e arm64: sysreg: Remove unused and rotting SCTLR_ELx field definitions
332e5281a4e8 arm64: esr: Add ESR exception class encoding for trapped ERET
b3e089cd446b arm64: Replace strncmp with str_has_prefix
3e77eeb7a27f ACPI/IORT: Rename arm_smmu_v3_set_proximity() 'node' local variable
b717480f5415 arm64: remove unneeded uapi/asm/stat.h
c19d050f8088 arm64/kexec: Use consistent convention of initializing 'kxec_buf.mem' with KEXEC_BUF_MEM_UNKNOWN
b907b80d7ae7 arm64: remove pointless __KERNEL__ guards
c87857945b0e arm64: Remove unused assembly macro


[    0.000000][    T0] Unable to handle kernel paging request at virtual address 0030ffe001e01588
[    0.000000][    T0] Mem abort info:
[    0.000000][    T0]   ESR = 0x96000004
[    0.000000][    T0]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000][    T0]   SET = 0, FnV = 0
[    0.000000][    T0]   EA = 0, S1PTW = 0
[    0.000000][    T0] Data abort info:
[    0.000000][    T0]   ISV = 0, ISS = 0x00000004
[    0.000000][    T0]   CM = 0, WnR = 0
[    0.000000][    T0] [0030ffe001e01588] address between user and kernel address ranges
[    0.000000][    T0] Internal error: Oops: 96000004 [#1] SMP
[    0.000000][    T0] Modules linked in:
[    0.000000][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc4-next-20190812+ #1
[    0.000000][    T0] pstate: 40000089 (nZcv daIf -PAN -UAO)
[    0.000000][    T0] pc : kfree+0x160/0xc98
[    0.000000][    T0] lr : kfree+0x154/0xc98
[    0.000000][    T0] sp : ffff900012e07cc0
[    0.000000][    T0] x29: ffff900012e07d50 x28: 0000000000000100 
[    0.000000][    T0] x27: 8cff000800563c88 x26: 3dff000800566220 
[    0.000000][    T0] x25: 7bff0008005c0800 x24: c3ff00080056a580 
[    0.000000][    T0] x23: 33ff000800563500 x22: 8cff000800563c80 
[    0.000000][    T0] x21: ffff9000109b57a4 x20: 33ff000800563500 
[    0.000000][    T0] x19: ffffffdfffc00000 x18: 0000000000000040 
[    0.000000][    T0] x17: 0000000000400000 x16: ffff900010c00000 
[    0.000000][    T0] x15: 0000000000000000 x14: ffff90001121a590 
[    0.000000][    T0] x13: ffff90001013c6fc x12: ffff900010141c78 
[    0.000000][    T0] x11: 0000000000000001 x10: ffff8fff8fc00000 
[    0.000000][    T0] x9 : 0001000080000000 x8 : 0030ffe001e01580 
[    0.000000][    T0] x7 : ffffffffffffffff x6 : 33ff000800563520 
[    0.000000][    T0] x5 : 0000000000000000 x4 : 0000000000000000 
[    0.000000][    T0] x3 : 0000000000000100 x2 : ffff900012e324f8 
[    0.000000][    T0] x1 : 33ff000800563500 x0 : c40000088056a580 
[    0.000000][    T0] Call trace:
[    0.000000][    T0]  kfree+0x160/0xc98
[    0.000000][    T0]  free_cpumask_var+0xc/0x14
[    0.000000][    T0]  apply_wqattrs_prepare+0x2e4/0x3b0
[    0.000000][    T0]  apply_workqueue_attrs_locked+0x7c/0xdc
[    0.000000][    T0]  alloc_workqueue+0x340/0x69c
[    0.000000][    T0]  workqueue_init_early+0x4b4/0x654
[    0.000000][    T0]  start_kernel+0x210/0x558
[    0.000000][    T0] Code: 97f323d3 d34afc08 927abd08 8b080268 (f9400509) 
[    0.000000][    T0] ---[ end trace 8710f821a534a562 ]---
[    0.000000][    T0] Kernel panic - not syncing: Fatal exception
[    0.000000][    T0] ---[ end Kernel panic - not syncing: Fatal exception ]---
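
The "Mem abort info" lines in the log above can be re-derived from the raw ESR value. The following is a minimal sketch of that decoding, using the ESR_ELx field positions from the Arm architecture (EC in bits [31:26], IL in bit [25], ISS in bits [24:0]). The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Illustrative container for the ESR_ELx fields of a data abort. */
struct esr_fields {
	unsigned int ec;   /* exception class, bits [31:26] */
	unsigned int il;   /* instruction length bit, bit [25] */
	uint32_t iss;      /* instruction specific syndrome, bits [24:0] */
	unsigned int dfsc; /* data fault status code, ISS bits [5:0] */
	unsigned int wnr;  /* write-not-read, ISS bit [6] */
};

/* Hypothetical helper: split a raw ESR value into its fields. */
struct esr_fields decode_esr(uint32_t esr)
{
	struct esr_fields f;

	f.ec = (esr >> 26) & 0x3f;
	f.il = (esr >> 25) & 0x1;
	f.iss = esr & 0x1ffffff;
	f.dfsc = f.iss & 0x3f;
	f.wnr = (f.iss >> 6) & 0x1;
	return f;
}
```

For ESR = 0x96000004 this gives EC = 0x25, IL = 1, ISS = 0x4 and WnR = 0, matching the log. DFSC = 0x04 is a level 0 translation fault, which fits an address that falls between the user and kernel ranges.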


2019-08-13 09:03:17

by Will Deacon

Subject: Re: "arm64/for-next/core" causes boot panic

Hi Qian,

Thanks for the report.

On Mon, Aug 12, 2019 at 05:51:35PM -0400, Qian Cai wrote:
> Booting today's linux-next on an arm64 server triggers a panic with
> CONFIG_KASAN_SW_TAGS=y pointing to this line,

Is this the only change on top of defconfig? If not, please can you share
your full .config?

> kfree()->virt_to_head_page()->compound_head()
>
> unsigned long head = READ_ONCE(page->compound_head);
>
> The bisect so far indicates one of those could be bad,

I guess that means the issue is reproducible on the arm64 for-next/core
branch. Once I have your .config, I'll give it a go.

> [    0.000000][    T0] Unable to handle kernel paging request at virtual address 0030ffe001e01588
> [    0.000000][    T0] Mem abort info:
> [    0.000000][    T0]   ESR = 0x96000004
> [    0.000000][    T0]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    0.000000][    T0]   SET = 0, FnV = 0
> [    0.000000][    T0]   EA = 0, S1PTW = 0
> [    0.000000][    T0] Data abort info:
> [    0.000000][    T0]   ISV = 0, ISS = 0x00000004
> [    0.000000][    T0]   CM = 0, WnR = 0
> [    0.000000][    T0] [0030ffe001e01588] address between user and kernel address ranges

Hmm, nice address...

I suppose we're looking at the interaction of 52-bit VA, untagged pointers
and KASAN using sw tags. Lovely.

Thanks, and please keep us updated on the bisection.

Will

2019-08-13 11:06:32

by Will Deacon

Subject: Re: "arm64/for-next/core" causes boot panic

On Tue, Aug 13, 2019 at 10:02:01AM +0100, Will Deacon wrote:
> On Mon, Aug 12, 2019 at 05:51:35PM -0400, Qian Cai wrote:
> > Booting today's linux-next on an arm64 server triggers a panic with
> > CONFIG_KASAN_SW_TAGS=y pointing to this line,
>
> Is this the only change on top of defconfig? If not, please can you share
> your full .config?
>
> > kfree()->virt_to_head_page()->compound_head()
> >
> > unsigned long head = READ_ONCE(page->compound_head);
> >
> > The bisect so far indicates one of those could be bad,
>
> I guess that means the issue is reproducible on the arm64 for-next/core
> branch. Once I have your .config, I'll give it a go.

FWIW, I've managed to reproduce this using defconfig + SW_TAGS on
for-next/core, so I'll keep investigating.

Will

--->8

[ 0.000000] Unable to handle kernel paging request at virtual address 0037fe0007580d08
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x96000004
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000004
[ 0.000000] CM = 0, WnR = 0
[ 0.000000] [0037fe0007580d08] address between user and kernel address ranges
[ 0.000000] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc3-00049-gf964cbd07098 #1
[ 0.000000] Hardware name: linux,dummy-virt (DT)
[ 0.000000] pstate: 20000085 (nzCv daIf -PAN -UAO)
[ 0.000000] pc : kfree+0x44/0x6ac
[ 0.000000] lr : apply_wqattrs_prepare+0x390/0x3fc
[ 0.000000] sp : ffff9000541d7d00
[ 0.000000] x29: ffff9000541d7d80 x28: 4dff0001de034e08
[ 0.000000] x27: b2ff0001de040000 x26: 0000000000000004
[ 0.000000] x25: c1ff0001de034c28 x24: 4dff0001de034e00
[ 0.000000] x23: a8ff0001de034d00 x22: c1ff0001de020a00
[ 0.000000] x21: a8ff0001de034d08 x20: 0000000000000000
[ 0.000000] x19: c1ff0001de034c00 x18: 0000000000000000
[ 0.000000] x17: 0000000000000000 x16: 0000000000000000
[ 0.000000] x15: 1ffff6b000000000 x14: ffff900053ca87e4
[ 0.000000] x13: ffff900052539444 x12: ffff90005253ce48
[ 0.000000] x11: 00000000000000c1 x10: ffff80001de034c1
[ 0.000000] x9 : fffffdffffe00008 x8 : 0138000007780d00
[ 0.000000] x7 : ffffffffffffffff x6 : a8ff0001de034d28
[ 0.000000] x5 : 0000000000000040 x4 : 0000000000000008
[ 0.000000] x3 : 0000000000000100 x2 : ffff9000541ddf68
[ 0.000000] x1 : a8ff0001de034d08 x0 : 4dff0001de034e00
[ 0.000000] Call trace:
[ 0.000000] kfree+0x44/0x6ac
[ 0.000000] apply_wqattrs_prepare+0x390/0x3fc
[ 0.000000] apply_workqueue_attrs+0x70/0xe4
[ 0.000000] alloc_workqueue+0x514/0x728
[ 0.000000] workqueue_init_early+0x36c/0x4a0
[ 0.000000] start_kernel+0x1d0/0x46c
[ 0.000000] Code: f2bffc09 d346fd08 f2dfbfe9 927acd08 (f8696909)
[ 0.000000] random: get_random_bytes called from oops_exit+0x4c/0x78 with crng_init=0
[ 0.000000] ---[ end trace 0000000000000000 ]---
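
The register dump above is enough to retrace the bad address by hand. The following is a minimal userspace sketch, not kernel code. It assumes a 48-bit VA, 4K-page configuration with PAGE_OFFSET at 0xffff000000000000 and a 64-byte struct page (constants inferred from the dump, not stated in the thread), and it copies the `__virt_to_pgoff()` arithmetic from arch/arm64/include/asm/memory.h. The helper names and the reading of x0, x8 and x9 are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 48-bit VA, 4K pages, 64-byte struct page. */
#define PAGE_OFFSET_ASSUMED 0xffff000000000000ULL
#define PAGE_SIZE_ASSUMED   4096ULL
#define STRUCT_PAGE_SIZE    64ULL

/* Illustrative stand-in for __tag_reset(): copy bit 55 into the top byte. */
static uint64_t tag_reset(uint64_t addr)
{
	if (addr & (1ULL << 55))
		return addr | (0xffULL << 56);
	return addr & ~(0xffULL << 56);
}

/* Same arithmetic as __virt_to_pgoff(): byte offset into the vmemmap array. */
static uint64_t virt_to_pgoff(uint64_t addr)
{
	return (addr - PAGE_OFFSET_ASSUMED) / PAGE_SIZE_ASSUMED * STRUCT_PAGE_SIZE;
}

/* Compare the arithmetic with the values in the oops. */
void check_against_oops(void)
{
	const uint64_t x0 = 0x4dff0001de034e00ULL; /* pointer with tag 0x4d */
	const uint64_t x9 = 0xfffffdffffe00008ULL; /* taken to be vmemmap base + 8 */
	const uint64_t low56 = (1ULL << 56) - 1;

	/* The tag leaks into the offset; the result equals x8 in the dump. */
	assert(virt_to_pgoff(x0) == 0x0138000007780d00ULL);

	/* Adding x9 reproduces the low 56 bits of the faulting address. */
	assert(((virt_to_pgoff(x0) + x9) & low56) == 0x37fe0007580d08ULL);

	/* With the tag reset first, the offset is a sane vmemmap offset. */
	assert(tag_reset(x0) == 0xffff0001de034e00ULL);
	assert(virt_to_pgoff(tag_reset(x0)) == 0x7780d00ULL);
}
```

Calling `check_against_oops()` from a small `main()` passes silently under these assumptions. The tagged pointer produces exactly the x8 value in the dump, which suggests the struct page address was computed without stripping the KASAN tag.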

2019-08-13 12:08:20

by Will Deacon

Subject: Re: "arm64/for-next/core" causes boot panic

[+Steve]

On Tue, Aug 13, 2019 at 11:58:52AM +0100, Will Deacon wrote:
> On Tue, Aug 13, 2019 at 10:02:01AM +0100, Will Deacon wrote:
> > On Mon, Aug 12, 2019 at 05:51:35PM -0400, Qian Cai wrote:
> > > Booting today's linux-next on an arm64 server triggers a panic with
> > > CONFIG_KASAN_SW_TAGS=y pointing to this line,
> >
> > Is this the only change on top of defconfig? If not, please can you share
> > your full .config?
> >
> > > kfree()->virt_to_head_page()->compound_head()
> > >
> > > unsigned long head = READ_ONCE(page->compound_head);
> > >
> > > The bisect so far indicates one of those could be bad,
> >
> > I guess that means the issue is reproducible on the arm64 for-next/core
> > branch. Once I have your .config, I'll give it a go.
>
> FWIW, I've managed to reproduce this using defconfig + SW_TAGS on
> for-next/core, so I'll keep investigating.

Right, hacky diff below seems to resolve this, so I'll split this up into
some proper patches as there is more than one bug here.

Thanks,

Will

--->8

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index afaf512c0e1b..541e8dcb5ab3 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -244,7 +244,7 @@ static inline const void *__tag_set(const void *addr, u8 tag)
/*
* The linear kernel range starts in the middle of the virtual adddress
* space. Testing the top bit for the start of the region is a
- * sufficient check.
+ * sufficient check and avoids having to worry about the tag.
*/
#define __is_lm_address(addr) (!((addr) & BIT(vabits_actual - 1)))

@@ -252,7 +252,7 @@ static inline const void *__tag_set(const void *addr, u8 tag)
#define __kimg_to_phys(addr) ((addr) - kimage_voffset)

#define __virt_to_phys_nodebug(x) ({ \
- phys_addr_t __x = (phys_addr_t)(x); \
+ phys_addr_t __x = (phys_addr_t)(__tag_reset(x)); \
__is_lm_address(__x) ? __lm_to_phys(__x) : \
__kimg_to_phys(__x); \
})
@@ -301,8 +301,8 @@ static inline void *phys_to_virt(phys_addr_t x)
#define __pa_nodebug(x) __virt_to_phys_nodebug((unsigned long)(x))
#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x)))
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
-#define virt_to_pfn(x) __phys_to_pfn(__virt_to_phys((unsigned long)(x)))
-#define sym_to_pfn(x) __phys_to_pfn(__pa_symbol(x))
+#define virt_to_pfn(x) __phys_to_pfn(__virt_to_phys((unsigned long)(x)))
+#define sym_to_pfn(x) __phys_to_pfn(__pa_symbol(x))

/*
* virt_to_page(k) convert a _valid_ virtual address to struct page *
@@ -311,7 +311,7 @@ static inline void *phys_to_virt(phys_addr_t x)
#define ARCH_PFN_OFFSET ((unsigned long)PHYS_PFN_OFFSET)

#if !defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_DEBUG_VIRTUAL)
-#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
+#define virt_to_page(kaddr) pfn_to_page(virt_to_pfn(kaddr))
#else
#define __virt_to_pgoff(kaddr) (((u64)(kaddr) - PAGE_OFFSET) / PAGE_SIZE * sizeof(struct page))
#define __page_to_voff(kaddr) (((u64)(kaddr) - VMEMMAP_START) * PAGE_SIZE / sizeof(struct page))
@@ -324,15 +324,17 @@ static inline void *phys_to_virt(phys_addr_t x)
((void *)__addr_tag); \
})

-#define virt_to_page(vaddr) ((struct page *)((__virt_to_pgoff(vaddr)) + VMEMMAP_START))
+#define virt_to_page(kaddr) ({ \
+ unsigned long __addr = __tag_reset((unsigned long)kaddr); \
+ (struct page *)((__virt_to_pgoff(__addr) + VMEMMAP_START)); \
+})
#endif
#endif

-#define _virt_addr_is_linear(kaddr) \
- (__tag_reset((u64)(kaddr)) >= PAGE_OFFSET)
-
-#define virt_addr_valid(kaddr) \
- (_virt_addr_is_linear(kaddr) && pfn_valid(virt_to_pfn(kaddr)))
+#define virt_addr_valid(kaddr) ({ \
+ unsigned long __addr = (unsigned long)kaddr; \
+ __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr)); \
+})

/*
* Given that the GIC architecture permits ITS implementations that can only be
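
The comment change in the first hunk says that testing a single bit "avoids having to worry about the tag". A minimal userspace sketch of that point follows. It assumes 48-bit kernel VAs and PAGE_OFFSET at 0xffff000000000000. The function names are made up, the old check is simplified, and the reading of why the diff drops `_virt_addr_is_linear()` is an interpretation rather than something the thread states.

```c
#include <stdbool.h>
#include <stdint.h>

#define VABITS_ASSUMED      48 /* assumption: 48-bit kernel VAs */
#define PAGE_OFFSET_ASSUMED 0xffff000000000000ULL

/* Mirrors __is_lm_address(): linear map iff bit (VA bits - 1) is clear. */
bool is_lm_address(uint64_t addr)
{
	return !(addr & (1ULL << (VABITS_ASSUMED - 1)));
}

/* Shape of the removed check: reset the tag, then compare with PAGE_OFFSET. */
bool old_is_linear(uint64_t addr)
{
	/* Simplified tag reset, valid for kernel pointers only. */
	uint64_t untagged = addr | (0xffULL << 56);

	return untagged >= PAGE_OFFSET_ASSUMED;
}
```

Under these assumptions, `is_lm_address()` gives the same answer for 0x4dff0001de034e00 and its untagged form 0xffff0001de034e00, because the tag lives in bits [63:56] and the test only looks at bit 47. For a kernel image address such as 0xffff900052539444 (x13 in the dump above), the bit test returns false while the PAGE_OFFSET comparison returns true, so with the flipped VA layout the comparison alone no longer separates linear-map addresses from the rest of the kernel VA space.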

2019-08-13 14:06:34

by Steve Capper

Subject: Re: "arm64/for-next/core" causes boot panic

Hi Will,

On Tue, Aug 13, 2019 at 01:06:44PM +0100, Will Deacon wrote:
> [+Steve]
>
> On Tue, Aug 13, 2019 at 11:58:52AM +0100, Will Deacon wrote:
> > On Tue, Aug 13, 2019 at 10:02:01AM +0100, Will Deacon wrote:
> > > On Mon, Aug 12, 2019 at 05:51:35PM -0400, Qian Cai wrote:
> > > > Booting today's linux-next on an arm64 server triggers a panic with
> > > > CONFIG_KASAN_SW_TAGS=y pointing to this line,
> > >
> > > Is this the only change on top of defconfig? If not, please can you share
> > > your full .config?
> > >
> > > > kfree()->virt_to_head_page()->compound_head()
> > > >
> > > > unsigned long head = READ_ONCE(page->compound_head);
> > > >
> > > > The bisect so far indicates one of those could be bad,
> > >
> > > I guess that means the issue is reproducible on the arm64 for-next/core
> > > branch. Once I have your .config, I'll give it a go.
> >
> > FWIW, I've managed to reproduce this using defconfig + SW_TAGS on
> > for-next/core, so I'll keep investigating.

I've installed clang-8 and enabled CONFIG_KASAN_SW_TAGS and was able to
reproduce the problem quite rapidly. Many apologies for missing this
before in my testing.

>
> Right, hacky diff below seems to resolve this, so I'll split this up into
> some proper patches as there is more than one bug here.
>
> Thanks,
>
> Will
>
> --->8
>
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
FWIW, this fixed the crashes I experienced, I'll run some additional
tests.

Cheers,
--
Steve

2019-08-13 14:45:06

by Steve Capper

Subject: Re: "arm64/for-next/core" causes boot panic

On Tue, Aug 13, 2019 at 03:04:52PM +0100, Steve Capper wrote:
> Hi Will,
>
> On Tue, Aug 13, 2019 at 01:06:44PM +0100, Will Deacon wrote:
> > [+Steve]
> >
> > On Tue, Aug 13, 2019 at 11:58:52AM +0100, Will Deacon wrote:
> > > On Tue, Aug 13, 2019 at 10:02:01AM +0100, Will Deacon wrote:
> > > > On Mon, Aug 12, 2019 at 05:51:35PM -0400, Qian Cai wrote:
> > > > > Booting today's linux-next on an arm64 server triggers a panic with
> > > > > CONFIG_KASAN_SW_TAGS=y pointing to this line,
> > > >
> > > > Is this the only change on top of defconfig? If not, please can you share
> > > > your full .config?
> > > >
> > > > > kfree()->virt_to_head_page()->compound_head()
> > > > >
> > > > > unsigned long head = READ_ONCE(page->compound_head);
> > > > >
> > > > > The bisect so far indicates one of those could be bad,
> > > >
> > > > I guess that means the issue is reproducible on the arm64 for-next/core
> > > > branch. Once I have your .config, I'll give it a go.
> > >
> > > FWIW, I've managed to reproduce this using defconfig + SW_TAGS on
> > > for-next/core, so I'll keep investigating.
>
> I've installed clang-8 and enabled CONFIG_KASAN_SW_TAGS and was able to
> reproduce the problem quite rapidly. Many apologies for missing this
> before in my testing.
>
> >
> > Right, hacky diff below seems to resolve this, so I'll split this up into
> > some proper patches as there is more than one bug here.
> >
> > Thanks,
> >
> > Will
> >
> > --->8
> >
> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> FWIW, this fixed the crashes I experienced, I'll run some additional
> tests.
>

This works for me with 52-bit VAs + CONFIG_KASAN_SW_TAGS +
CONFIG_DEBUG_VIRTUAL + CONFIG_DEBUG_VM

FWIW:
Tested-by: Steve Capper <[email protected]>

Cheers,
--
Steve