Hello
When testing 20201130, I hit this problem:
+[ 213.050152] ------------[ cut here ]------------
+[ 213.050207] WARNING: CPU: 0 PID: 18430 at mm/highmem.c:581 kunmap_local_indexed+0x194/0x1d4
+[ 213.050214] Modules linked in: sm4_generic authenc vmac xcbc hmac streebog_generic sm3_generic sha3_generic crct10dif_generic crct10dif_common seed rmd320 rmd256 rmd160 rmd128 cts lzo lzo_compress salsa20_generic camellia_generic fcrypt pcbc tgr192 anubis wp512 khazad tea michael_mic arc4 cast6_generic cast5_generic cast_common deflate zlib_deflate sha512_generic cfb ofb serpent_generic lrw twofish_generic twofish_common blowfish_generic blowfish_common md4
+[ 213.050410] CPU: 0 PID: 18430 Comm: cryptsetup Not tainted 5.10.0-rc5-next-20201130-00059-gf7ecf0611042-dirty #242
+[ 213.050416] Hardware name: Allwinner sun7i (A20) Family
+[ 213.050448] [<c010d730>] (unwind_backtrace) from [<c010a218>] (show_stack+0x10/0x14)
+[ 213.050465] [<c010a218>] (show_stack) from [<c08bbdcc>] (dump_stack+0x98/0xac)
+[ 213.050479] [<c08bbdcc>] (dump_stack) from [<c08b93ac>] (__warn+0xc0/0xd8)
+[ 213.050491] [<c08b93ac>] (__warn) from [<c08b9428>] (warn_slowpath_fmt+0x64/0xc0)
+[ 213.050505] [<c08b9428>] (warn_slowpath_fmt) from [<c02018b4>] (kunmap_local_indexed+0x194/0x1d4)
+[ 213.050525] [<c02018b4>] (kunmap_local_indexed) from [<c03e0390>] (sg_miter_stop+0xb4/0x164)
+[ 213.050541] [<c03e0390>] (sg_miter_stop) from [<c03e082c>] (sg_miter_next+0xc/0xe4)
+[ 213.050560] [<c03e082c>] (sg_miter_next) from [<c06b2d04>] (sun4i_ss_opti_poll+0x278/0x40c)
+[ 213.050575] [<c06b2d04>] (sun4i_ss_opti_poll) from [<c06b338c>] (sun4i_ss_cipher_poll+0x4f4/0x5e4)
+[ 213.050590] [<c06b338c>] (sun4i_ss_cipher_poll) from [<c03991d4>] (crypto_skcipher_encrypt+0x38/0x5c)
+[ 213.050604] [<c03991d4>] (crypto_skcipher_encrypt) from [<c03aa980>] (xts_encrypt+0x8c/0xd4)
+[ 213.050617] [<c03aa980>] (xts_encrypt) from [<c03991d4>] (crypto_skcipher_encrypt+0x38/0x5c)
+[ 213.050631] [<c03991d4>] (crypto_skcipher_encrypt) from [<c03b3d94>] (skcipher_recvmsg+0x364/0x43c)
+[ 213.050646] [<c03b3d94>] (skcipher_recvmsg) from [<c0719650>] (sock_read_iter+0xa8/0xf8)
+[ 213.050663] [<c0719650>] (sock_read_iter) from [<c0239e98>] (vfs_read+0x2b8/0x2d8)
+[ 213.050676] [<c0239e98>] (vfs_read) from [<c023a398>] (ksys_read+0xb0/0xe4)
+[ 213.050688] [<c023a398>] (ksys_read) from [<c0100060>] (ret_fast_syscall+0x0/0x58)
+[ 213.050695] Exception stack(0xc4d13fa8 to 0xc4d13ff0)
+[ 213.050707] 3fa0: 00000006 b6f084d0 00000006 b47ff000 00010000 00000000
+[ 213.050718] 3fc0: 00000006 b6f084d0 00010000 00000003 00000030 beb6e9bc 00000010 beb6e9fc
+[ 213.050727] 3fe0: b6e3609c beb6e958 b6cc8504 b6cc851c
+[ 213.050735] ---[ end trace 915906e6b0e8a55d ]---
+[ 213.052281] 8<--- cut here ---
+[ 213.052288] Unable to handle kernel paging request at virtual address ffefdf70
+[ 213.052299] pgd = f9321c11
+[ 213.052306] [ffefdf70] *pgd=6ffce861, *pte=00000000, *ppte=00000000
+[ 213.052330] Internal error: Oops: 837 [#1] SMP ARM
+[ 213.057127] Modules linked in: sm4_generic authenc vmac xcbc hmac streebog_generic sm3_generic sha3_generic crct10dif_generic crct10dif_common seed rmd320 rmd256 rmd160 rmd128 cts lzo lzo_compress salsa20_generic camellia_generic fcrypt pcbc tgr192 anubis wp512 khazad tea michael_mic arc4 cast6_generic cast5_generic cast_common deflate zlib_deflate sha512_generic cfb ofb serpent_generic lrw twofish_generic twofish_common blowfish_generic blowfish_common md4
+[ 213.097705] CPU: 0 PID: 18430 Comm: cryptsetup Tainted: G W 5.10.0-rc5-next-20201130-00059-gf7ecf0611042-dirty #242
+[ 213.109424] Hardware name: Allwinner sun7i (A20) Family
+[ 213.114651] PC is at __raw_readsl+0x30/0x100
+[ 213.118917] LR is at 0xe63d133d
+[ 213.122056] pc : [<c041d528>] lr : [<e63d133d>] psr: 20000093
+[ 213.128314] sp : c4d13ca0 ip : 52cee82d fp : 00000000
+[ 213.133533] r10: c4d12000 r9 : c4fb0240 r8 : 00000000
+[ 213.138752] r7 : 00000fa0 r6 : c506fa40 r5 : 00000018 r4 : 13eb444f
+[ 213.145271] r3 : 2394c9a0 r2 : 00000004 r1 : ffefdf70 r0 : f08ea204
+[ 213.151791] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
+[ 213.159004] Control: 10c5387d Table: 44f8806a DAC: 00000051
+[ 213.164744] Process cryptsetup (pid: 18430, stack limit = 0x41d2aece)
+[ 213.171176] Stack: (0xc4d13ca0 to 0xc4d14000)
+[ 213.175535] 3ca0: 0000000d c06b2d88 eff19a00 80000013 00000038 00000001 00000000 c506fb64
+[ 213.183705] 3cc0: 60000013 00000000 c4d0b900 00000000 00000000 00000000 00000000 00000000
+[ 213.191875] 3ce0: 00000000 00000000 00000001 00001000 00000000 00000005 efef0a60 ffefd000
+[ 213.200046] 3d00: 00001000 00001000 c4fb00f8 00000000 00000001 00000001 00000000 00001000
+[ 213.208216] 3d20: 00000003 d1b3a700 efef06a0 c0dd81c0 c51d6d00 c506fa40 00000000 c4fb0240
+[ 213.216387] 3d40: c4fb0008 00000000 00000000 c06b338c 793d365f c0201500 c4d13da8 ff9735a0
+[ 213.224557] 3d60: c4d12000 00000000 00000001 c0399a28 c4d13da8 00000000 00000000 c0399fe8
+[ 213.232728] 3d80: 44272f52 004bf960 b4c30b1c 310de6b4 c4fb01c0 c03aa018 00000000 c4d12000
+[ 213.240899] 3da0: 00000001 00000000 ef864000 c4d80000 efef0a60 ffefe000 c50463f0 00001000
+[ 213.249069] 3dc0: 00000000 c4fb00f8 00001000 00000000 c4ff8ec0 c4fb01c0 00000000 00000000
+[ 213.257240] 3de0: c4d48cc0 d1b3a700 00000010 c4fb0240 c51d6d00 c0dd81c0 00010000 00000000
+[ 213.265410] 3e00: c4d13ea0 c4fb01c0 00000000 c03991d4 c4fb01c0 c4fb0200 00000000 00010000
+[ 213.273581] 3e20: 00000000 c03aa980 c4fb01c0 c51d6b80 c4d6f480 c03991d4 c4fb0000 c4d48fd0
+[ 213.281752] 3e40: c4d48fc0 c51d6b80 00000000 c03b3d94 00010000 c4d13e70 c4964b40 c0de0e40
+[ 213.289923] 3e60: c0394578 c03b20e8 c4d7aa00 c4d12000 00010000 d1b3a700 c4d1b840 c4d13f08
+[ 213.298093] 3e80: c03b3a30 c4d12000 c4d1b840 c4964b40 00000000 00000001 00000000 c0719650
+[ 213.306263] 3ea0: 00000000 00000000 00000004 00000000 00000000 c4d13f08 00000000 c071d5d4
+[ 213.314433] 3ec0: 00000000 00000000 00000000 00000000 c4d13f20 d1b3a700 00010000 c4d1b840
+[ 213.322604] 3ee0: 00000000 00000000 c4d12000 c0239e98 00010000 00000020 c4d6f480 c4d48fc0
+[ 213.330775] 3f00: b47ff000 00010000 00000004 00000000 00010000 c4d13f00 00000001 c071d5d4
+[ 213.338945] 3f20: c4d1b840 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+[ 213.347115] 3f40: 00000000 00000000 00000000 d1b3a700 00000000 c4d1b840 c4d1b840 00010000
+[ 213.355286] 3f60: 00000003 c4d12000 00010000 b47ff000 beb6e9fc c023a398 c4d48e40 0000011c
+[ 213.363456] 3f80: 00000030 d1b3a700 00000006 b6f084d0 00010000 00000003 c0100264 c4d12000
+[ 213.371627] 3fa0: 00000003 c0100060 00000006 b6f084d0 00000006 b47ff000 00010000 00000000
+[ 213.379797] 3fc0: 00000006 b6f084d0 00010000 00000003 00000030 beb6e9bc 00000010 beb6e9fc
+[ 213.387968] 3fe0: b6e3609c beb6e958 b6cc8504 b6cc851c 80000010 00000006 00000000 00000000
+[ 213.396142] Code: e5904000 e590c000 e590e000 e2522004 (e8a15018)
+[ 213.402232] ---[ end trace 915906e6b0e8a55e ]---
+[ 213.406848] note: cryptsetup[18430] exited with preempt_count 2
The same driver work fine on 5.9.11.
Does someone have any idea on what go wrong ?
Does it is related with recent kmap work ?
Regards
On Tue, Dec 01 2020 at 14:01, Corentin Labbe wrote:
> +[ 213.050152] ------------[ cut here ]------------
> +[ 213.050207] WARNING: CPU: 0 PID: 18430 at mm/highmem.c:581 kunmap_local_indexed+0x194/0x1d4
> +[ 213.050214] Modules linked in: sm4_generic authenc vmac xcbc hmac streebog_generic sm3_generic sha3_generic crct10dif_generic crct10dif_common seed rmd320 rmd256 rmd160 rmd128 cts lzo lzo_compress salsa20_generic camellia_generic fcrypt pcbc tgr192 anubis wp512 khazad tea michael_mic arc4 cast6_generic cast5_generic cast_common deflate zlib_deflate sha512_generic cfb ofb serpent_generic lrw twofish_generic twofish_common blowfish_generic blowfish_common md4
> +[ 213.050410] CPU: 0 PID: 18430 Comm: cryptsetup Not tainted 5.10.0-rc5-next-20201130-00059-gf7ecf0611042-dirty #242
> +[ 213.050416] Hardware name: Allwinner sun7i (A20) Family
> +[ 213.050448] [<c010d730>] (unwind_backtrace) from [<c010a218>] (show_stack+0x10/0x14)
> +[ 213.050465] [<c010a218>] (show_stack) from [<c08bbdcc>] (dump_stack+0x98/0xac)
> +[ 213.050479] [<c08bbdcc>] (dump_stack) from [<c08b93ac>] (__warn+0xc0/0xd8)
> +[ 213.050491] [<c08b93ac>] (__warn) from [<c08b9428>] (warn_slowpath_fmt+0x64/0xc0)
> +[ 213.050505] [<c08b9428>] (warn_slowpath_fmt) from [<c02018b4>] (kunmap_local_indexed+0x194/0x1d4)
> +[ 213.050525] [<c02018b4>] (kunmap_local_indexed) from [<c03e0390>] (sg_miter_stop+0xb4/0x164)
> +[ 213.050541] [<c03e0390>] (sg_miter_stop) from [<c03e082c>] (sg_miter_next+0xc/0xe4)
> +[ 213.050560] [<c03e082c>] (sg_miter_next) from [<c06b2d04>] (sun4i_ss_opti_poll+0x278/0x40c)
> +[ 213.050575] [<c06b2d04>] (sun4i_ss_opti_poll) from [<c06b338c>] (sun4i_ss_cipher_poll+0x4f4/0x5e4)
> +[ 213.050590] [<c06b338c>] (sun4i_ss_cipher_poll) from [<c03991d4>] (crypto_skcipher_encrypt+0x38/0x5c)
> +[ 213.050604] [<c03991d4>] (crypto_skcipher_encrypt) from [<c03aa980>] (xts_encrypt+0x8c/0xd4)
> +[ 213.050617] [<c03aa980>] (xts_encrypt) from [<c03991d4>] (crypto_skcipher_encrypt+0x38/0x5c)
> +[ 213.050631] [<c03991d4>] (crypto_skcipher_encrypt) from [<c03b3d94>] (skcipher_recvmsg+0x364/0x43c)
> +[ 213.050646] [<c03b3d94>] (skcipher_recvmsg) from [<c0719650>] (sock_read_iter+0xa8/0xf8)
> +[ 213.050663] [<c0719650>] (sock_read_iter) from [<c0239e98>] (vfs_read+0x2b8/0x2d8)
> +[ 213.050676] [<c0239e98>] (vfs_read) from [<c023a398>] (ksys_read+0xb0/0xe4)
> +[ 213.050688] [<c023a398>] (ksys_read) from [<c0100060>] (ret_fast_syscall+0x0/0x58)
> +[ 213.050695] Exception stack(0xc4d13fa8 to 0xc4d13ff0)
> +[ 213.050707] 3fa0: 00000006 b6f084d0 00000006 b47ff000 00010000 00000000
> +[ 213.050718] 3fc0: 00000006 b6f084d0 00010000 00000003 00000030 beb6e9bc 00000010 beb6e9fc
> +[ 213.050727] 3fe0: b6e3609c beb6e958 b6cc8504 b6cc851c
> +[ 213.050735] ---[ end trace 915906e6b0e8a55d ]---
Hmm. No registers there. Can you apply the patch below so we can see the
address?
Thanks,
tglx
---
diff --git a/mm/highmem.c b/mm/highmem.c
index b49364a306b8..240fc6e5bfb4 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -571,8 +571,10 @@ void kunmap_local_indexed(void *vaddr)
* PAGE_OFFSET. Warn for all other addresses which are in
* the user space part of the virtual address space.
*/
- if (!kmap_high_unmap_local(addr))
+ if (!kmap_high_unmap_local(addr)) {
+ pr_err("kunmap_local: vaddr %lx\n", (unsigned long) vaddr);
WARN_ON_ONCE(addr < PAGE_OFFSET);
+ }
return;
}
On Tue, Dec 01, 2020 at 02:28:54PM +0100, Thomas Gleixner wrote:
> On Tue, Dec 01 2020 at 14:01, Corentin Labbe wrote:
> > +[ 213.050152] ------------[ cut here ]------------
> > +[ 213.050207] WARNING: CPU: 0 PID: 18430 at mm/highmem.c:581 kunmap_local_indexed+0x194/0x1d4
> > +[ 213.050214] Modules linked in: sm4_generic authenc vmac xcbc hmac streebog_generic sm3_generic sha3_generic crct10dif_generic crct10dif_common seed rmd320 rmd256 rmd160 rmd128 cts lzo lzo_compress salsa20_generic camellia_generic fcrypt pcbc tgr192 anubis wp512 khazad tea michael_mic arc4 cast6_generic cast5_generic cast_common deflate zlib_deflate sha512_generic cfb ofb serpent_generic lrw twofish_generic twofish_common blowfish_generic blowfish_common md4
> > +[ 213.050410] CPU: 0 PID: 18430 Comm: cryptsetup Not tainted 5.10.0-rc5-next-20201130-00059-gf7ecf0611042-dirty #242
> > +[ 213.050416] Hardware name: Allwinner sun7i (A20) Family
> > +[ 213.050448] [<c010d730>] (unwind_backtrace) from [<c010a218>] (show_stack+0x10/0x14)
> > +[ 213.050465] [<c010a218>] (show_stack) from [<c08bbdcc>] (dump_stack+0x98/0xac)
> > +[ 213.050479] [<c08bbdcc>] (dump_stack) from [<c08b93ac>] (__warn+0xc0/0xd8)
> > +[ 213.050491] [<c08b93ac>] (__warn) from [<c08b9428>] (warn_slowpath_fmt+0x64/0xc0)
> > +[ 213.050505] [<c08b9428>] (warn_slowpath_fmt) from [<c02018b4>] (kunmap_local_indexed+0x194/0x1d4)
> > +[ 213.050525] [<c02018b4>] (kunmap_local_indexed) from [<c03e0390>] (sg_miter_stop+0xb4/0x164)
> > +[ 213.050541] [<c03e0390>] (sg_miter_stop) from [<c03e082c>] (sg_miter_next+0xc/0xe4)
> > +[ 213.050560] [<c03e082c>] (sg_miter_next) from [<c06b2d04>] (sun4i_ss_opti_poll+0x278/0x40c)
> > +[ 213.050575] [<c06b2d04>] (sun4i_ss_opti_poll) from [<c06b338c>] (sun4i_ss_cipher_poll+0x4f4/0x5e4)
> > +[ 213.050590] [<c06b338c>] (sun4i_ss_cipher_poll) from [<c03991d4>] (crypto_skcipher_encrypt+0x38/0x5c)
> > +[ 213.050604] [<c03991d4>] (crypto_skcipher_encrypt) from [<c03aa980>] (xts_encrypt+0x8c/0xd4)
> > +[ 213.050617] [<c03aa980>] (xts_encrypt) from [<c03991d4>] (crypto_skcipher_encrypt+0x38/0x5c)
> > +[ 213.050631] [<c03991d4>] (crypto_skcipher_encrypt) from [<c03b3d94>] (skcipher_recvmsg+0x364/0x43c)
> > +[ 213.050646] [<c03b3d94>] (skcipher_recvmsg) from [<c0719650>] (sock_read_iter+0xa8/0xf8)
> > +[ 213.050663] [<c0719650>] (sock_read_iter) from [<c0239e98>] (vfs_read+0x2b8/0x2d8)
> > +[ 213.050676] [<c0239e98>] (vfs_read) from [<c023a398>] (ksys_read+0xb0/0xe4)
> > +[ 213.050688] [<c023a398>] (ksys_read) from [<c0100060>] (ret_fast_syscall+0x0/0x58)
> > +[ 213.050695] Exception stack(0xc4d13fa8 to 0xc4d13ff0)
> > +[ 213.050707] 3fa0: 00000006 b6f084d0 00000006 b47ff000 00010000 00000000
> > +[ 213.050718] 3fc0: 00000006 b6f084d0 00010000 00000003 00000030 beb6e9bc 00000010 beb6e9fc
> > +[ 213.050727] 3fe0: b6e3609c beb6e958 b6cc8504 b6cc851c
> > +[ 213.050735] ---[ end trace 915906e6b0e8a55d ]---
>
> Hmm. No registers there. Can you apply the patch below so we can see the
> address?
>
> Thanks,
>
> tglx
> ---
> diff --git a/mm/highmem.c b/mm/highmem.c
> index b49364a306b8..240fc6e5bfb4 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -571,8 +571,10 @@ void kunmap_local_indexed(void *vaddr)
> * PAGE_OFFSET. Warn for all other addresses which are in
> * the user space part of the virtual address space.
> */
> - if (!kmap_high_unmap_local(addr))
> + if (!kmap_high_unmap_local(addr)) {
> + pr_err("kunmap_local: vaddr %lx\n", (unsigned long) vaddr);
> WARN_ON_ONCE(addr < PAGE_OFFSET);
> + }
> return;
> }
>
The patch made the board too busy logging and fail to boot until the test.
Regards
On Tue, Dec 01 2020 at 14:52, Corentin Labbe wrote:
> On Tue, Dec 01, 2020 at 02:28:54PM +0100, Thomas Gleixner wrote:
> The patch made the board too busy logging and fail to boot until the test.
Stupid me. Of course this wants to be conditional.
Thanks,
tglx
---
diff --git a/mm/highmem.c b/mm/highmem.c
index b49364a306b8..178b126ab4f6 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -571,8 +571,10 @@ void kunmap_local_indexed(void *vaddr)
* PAGE_OFFSET. Warn for all other addresses which are in
* the user space part of the virtual address space.
*/
- if (!kmap_high_unmap_local(addr))
- WARN_ON_ONCE(addr < PAGE_OFFSET);
+ if (!kmap_high_unmap_local(addr)) {
+ if (WARN_ON_ONCE(addr < PAGE_OFFSET))
+ pr_err("kunmap_local: vaddr %lx\n", (unsigned long) vaddr);
+ }
return;
}
On Tue, Dec 01, 2020 at 03:16:36PM +0100, Thomas Gleixner wrote:
> On Tue, Dec 01 2020 at 14:52, Corentin Labbe wrote:
> > On Tue, Dec 01, 2020 at 02:28:54PM +0100, Thomas Gleixner wrote:
> > The patch made the board too busy logging and fail to boot until the test.
>
> Stupid me. Of course this wants to be conditional.
>
> Thanks,
>
> tglx
> ---
> diff --git a/mm/highmem.c b/mm/highmem.c
> index b49364a306b8..178b126ab4f6 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -571,8 +571,10 @@ void kunmap_local_indexed(void *vaddr)
> * PAGE_OFFSET. Warn for all other addresses which are in
> * the user space part of the virtual address space.
> */
> - if (!kmap_high_unmap_local(addr))
> - WARN_ON_ONCE(addr < PAGE_OFFSET);
> + if (!kmap_high_unmap_local(addr)) {
> + if (WARN_ON_ONCE(addr < PAGE_OFFSET))
> + pr_err("kunmap_local: vaddr %lx\n", (unsigned long) vaddr);
> + }
> return;
> }
>
In fact the warn was a bit later so I added:
preempt_disable();
idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
- WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
+ if (WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx)))
+ pr_err("kunmap_local: vaddr %lx\n", (unsigned long) vaddr);
arch_kmap_local_pre_unmap(addr);
pte_clear(&init_mm, addr, kmap_pte - idx);
and this give kunmap_local: vaddr ffefe000
full dump:
+[ 215.602507] ------------[ cut here ]------------
+[ 215.602563] WARNING: CPU: 0 PID: 18429 at mm/highmem.c:584 kunmap_local_indexed+0x204/0x228
+[ 215.602570] Modules linked in: sm4_generic authenc vmac xcbc hmac streebog_generic sm3_generic sha3_generic crct10dif_generic crct10dif_common seed rmd320 rmd256 rmd160 rmd128 cts lzo lzo_compress salsa20_generic camellia_generic fcrypt pcbc tgr192 anubis wp512 khazad tea michael_mic arc4 cast6_generic cast5_generic cast_common deflate zlib_deflate sha512_generic cfb ofb serpent_generic lrw twofish_generic twofish_common blowfish_generic blowfish_common md4
+[ 215.602767] CPU: 0 PID: 18429 Comm: cryptsetup Tainted: G W 5.10.0-rc5-next-20201130-00059-gf7ecf0611042-dirty #247
+[ 215.602772] Hardware name: Allwinner sun7i (A20) Family
+[ 215.602805] [<c010d730>] (unwind_backtrace) from [<c010a218>] (show_stack+0x10/0x14)
+[ 215.602821] [<c010a218>] (show_stack) from [<c08bbe2c>] (dump_stack+0x98/0xac)
+[ 215.602834] [<c08bbe2c>] (dump_stack) from [<c08b940c>] (__warn+0xc0/0xd8)
+[ 215.602846] [<c08b940c>] (__warn) from [<c08b9488>] (warn_slowpath_fmt+0x64/0xc0)
+[ 215.602860] [<c08b9488>] (warn_slowpath_fmt) from [<c0201924>] (kunmap_local_indexed+0x204/0x228)
+[ 215.602879] [<c0201924>] (kunmap_local_indexed) from [<c03e03e8>] (sg_miter_stop+0xb4/0x164)
+[ 215.602895] [<c03e03e8>] (sg_miter_stop) from [<c03e0884>] (sg_miter_next+0xc/0xe4)
+[ 215.602913] [<c03e0884>] (sg_miter_next) from [<c06b2d64>] (sun4i_ss_opti_poll+0x278/0x40c)
+[ 215.602929] [<c06b2d64>] (sun4i_ss_opti_poll) from [<c06b33ec>] (sun4i_ss_cipher_poll+0x4f4/0x5e4)
+[ 215.602944] [<c06b33ec>] (sun4i_ss_cipher_poll) from [<c039922c>] (crypto_skcipher_encrypt+0x38/0x5c)
+[ 215.602958] [<c039922c>] (crypto_skcipher_encrypt) from [<c03aa9d8>] (xts_encrypt+0x8c/0xd4)
+[ 215.602971] [<c03aa9d8>] (xts_encrypt) from [<c039922c>] (crypto_skcipher_encrypt+0x38/0x5c)
+[ 215.602984] [<c039922c>] (crypto_skcipher_encrypt) from [<c03b3dec>] (skcipher_recvmsg+0x364/0x43c)
+[ 215.602999] [<c03b3dec>] (skcipher_recvmsg) from [<c07196b0>] (sock_read_iter+0xa8/0xf8)
+[ 215.603017] [<c07196b0>] (sock_read_iter) from [<c0239eec>] (vfs_read+0x2b8/0x2d8)
+[ 215.603030] [<c0239eec>] (vfs_read) from [<c023a3ec>] (ksys_read+0xb0/0xe4)
+[ 215.603042] [<c023a3ec>] (ksys_read) from [<c0100060>] (ret_fast_syscall+0x0/0x58)
+[ 215.603050] Exception stack(0xc4de9fa8 to 0xc4de9ff0)
+[ 215.603061] 9fa0: 00000006 b6fea4d0 00000006 b48ce000 00010000 00000000
+[ 215.603073] 9fc0: 00000006 b6fea4d0 00010000 00000003 00000030 bebad9bc 00000010 bebad9fc
+[ 215.603082] 9fe0: b6f1809c bebad958 b6daa504 b6daa51c
+[ 215.603091] ---[ end trace d5a2f8d9794ce46d ]---
+[ 215.603098] kunmap_local: vaddr ffefe000
+[ 215.604643] 8<--- cut here ---
+[ 215.604650] Unable to handle kernel paging request at virtual address ffefdf70
+[ 215.604661] pgd = b4af32d8
+[ 215.604667] [ffefdf70] *pgd=6ffce861, *pte=00000000, *ppte=00000000
+[ 215.604692] Internal error: Oops: 837 [#1] SMP ARM
+[ 215.609490] Modules linked in: sm4_generic authenc vmac xcbc hmac streebog_generic sm3_generic sha3_generic crct10dif_generic crct10dif_common seed rmd320 rmd256 rmd160 rmd128 cts lzo lzo_compress salsa20_generic camellia_generic fcrypt pcbc tgr192 anubis wp512 khazad tea michael_mic arc4 cast6_generic cast5_generic cast_common deflate zlib_deflate sha512_generic cfb ofb serpent_generic lrw twofish_generic twofish_common blowfish_generic blowfish_common md4
+[ 215.650069] CPU: 0 PID: 18429 Comm: cryptsetup Tainted: G W 5.10.0-rc5-next-20201130-00059-gf7ecf0611042-dirty #247
+[ 215.661789] Hardware name: Allwinner sun7i (A20) Family
+[ 215.667017] PC is at __raw_readsl+0x30/0x100
+[ 215.671282] LR is at 0x3a692a37
+[ 215.674421] pc : [<c041d588>] lr : [<3a692a37>] psr: 20000093
+[ 215.680679] sp : c4de9ca0 ip : c01ea23f fp : 00000000
+[ 215.685897] r10: c4de8000 r9 : c51ca640 r8 : 00000000
+[ 215.691116] r7 : 00000fa0 r6 : c5068a40 r5 : 00000018 r4 : 4f0e2404
+[ 215.697634] r3 : 3e034c16 r2 : 00000004 r1 : ffefdf70 r0 : f08ea204
+[ 215.704155] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
+[ 215.711368] Control: 10c5387d Table: 44f3806a DAC: 00000051
+[ 215.717108] Process cryptsetup (pid: 18429, stack limit = 0xca52e1ce)
+[ 215.723541] Stack: (0xc4de9ca0 to 0xc4dea000)
+[ 215.727899] 9ca0: 0000000d c06b2de8 eff19c80 80000013 00000374 00000001 00000000 c5068b64
+[ 215.736069] 9cc0: 60000013 00000000 ffefe374 00000000 00000000 00000000 00000000 00000000
+[ 215.744239] 9ce0: 00000000 00000000 00000001 00001000 00000000 00000005 efed8240 ffefd000
+[ 215.752409] 9d00: 00001000 00001000 c51ca4f8 00000000 00000001 00000001 00000000 00001000
+[ 215.760580] 9d20: 00000003 d09453dc efed7e80 c0dd81c0 c4e394c0 c5068a40 00000000 c51ca640
+[ 215.768750] 9d40: c51ca408 00000000 00000000 c06b33ec 7879265f c0201500 c4de9da8 ff98fca0
+[ 215.776920] 9d60: c4de8000 00000000 00000001 00000000 c4de9da8 00000000 00000000 c039a040
+[ 215.785090] 9d80: 68874dfe e29ea43a 0001fe9b 2581e6c9 c51ca5c0 c03aa070 00000000 c4de8000
+[ 215.793261] 9da0: 00000001 00000000 ef867ee0 c4f77000 efed8240 ffefe000 c50b54f0 00001000
+[ 215.801431] 9dc0: 00000000 c51ca4f8 00001000 00000000 c5ffbec0 c51ca5c0 00000000 00000000
+[ 215.809602] 9de0: c4e7bcc0 d09453dc 00000010 c51ca640 c4e394c0 c0dd81c0 00010000 00000000
+[ 215.817772] 9e00: c4de9ea0 c51ca5c0 00000000 c039922c c51ca5c0 c51ca600 00000000 00010000
+[ 215.825943] 9e20: 00000000 c03aa9d8 c51ca5c0 c4e39700 c4d69080 c039922c c51ca400 c4e7bc90
+[ 215.834113] 9e40: c4e7bc80 c4e39700 00000000 c03b3dec 00010000 c4de9e70 c15cfb00 c0de0e40
+[ 215.842283] 9e60: c03945d0 c03b2140 c4e80400 c4de8000 00010000 d09453dc c12f1b40 c4de9f08
+[ 215.850454] 9e80: c03b3a88 c4de8000 c12f1b40 c15cfb00 00000000 00000001 00000000 c07196b0
+[ 215.858624] 9ea0: 00000000 00000000 00000004 00000000 00000000 c4de9f08 00000000 c071d634
+[ 215.866794] 9ec0: 00000000 00000000 00000000 00000000 c4de9f20 d09453dc 00010000 c12f1b40
+[ 215.874965] 9ee0: 00000000 00000000 c4de8000 c0239eec 00010000 00000020 c4d69080 c4e7bc80
+[ 215.883135] 9f00: b48ce000 00010000 00000004 00000000 00010000 c4de9f00 00000001 c071d634
+[ 215.891305] 9f20: c12f1b40 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+[ 215.899475] 9f40: 00000000 00000000 00000000 d09453dc 00000000 c12f1b40 c12f1b40 00010000
+[ 215.907645] 9f60: 00000003 c4de8000 00010000 b48ce000 bebad9fc c023a3ec c4e7bd00 0000011c
+[ 215.915816] 9f80: 00000030 d09453dc 00000006 b6fea4d0 00010000 00000003 c0100264 c4de8000
+[ 215.923987] 9fa0: 00000003 c0100060 00000006 b6fea4d0 00000006 b48ce000 00010000 00000000
+[ 215.932157] 9fc0: 00000006 b6fea4d0 00010000 00000003 00000030 bebad9bc 00000010 bebad9fc
+[ 215.940327] 9fe0: b6f1809c bebad958 b6daa504 b6daa51c 80000010 00000006 00000000 00000000
+[ 215.948501] Code: e5904000 e590c000 e590e000 e2522004 (e8a15018)
+[ 215.954592] ---[ end trace d5a2f8d9794ce46e ]---
+[ 215.959208] note: cryptsetup[18429] exited with preempt_count 2
On Tue, Dec 01 2020 at 15:45, Corentin Labbe wrote:
> On Tue, Dec 01, 2020 at 03:16:36PM +0100, Thomas Gleixner wrote:
> In fact the warn was a bit later so I added:
> preempt_disable();
> idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
> - WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
> + if (WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx)))
> + pr_err("kunmap_local: vaddr %lx\n", (unsigned long) vaddr);
>
> arch_kmap_local_pre_unmap(addr);
> pte_clear(&init_mm, addr, kmap_pte - idx);
>
> and this give kunmap_local: vaddr ffefe000
which looks like a valid one.
Can you apply the patch below and add 'ftrace_dump_on_oops' on the
command line or enable it in /proc/sys/kernel/ftrace_dump_on_oops before
starting the test.
That should spill out the trace after crashing.
Thanks,
tglx
---
diff --git a/mm/highmem.c b/mm/highmem.c
index b49364a306b8..461fe2c26107 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -485,6 +485,7 @@ static inline bool kmap_high_unmap_local(unsigned long vaddr)
{
#ifdef ARCH_NEEDS_KMAP_HIGH_GET
if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) {
+ trace_printk("kunmap_high: %lx\n", vaddr);
kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
return true;
}
@@ -520,6 +521,7 @@ void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot)
preempt_disable();
idx = arch_kmap_local_map_idx(kmap_local_idx_push(), pfn);
vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ trace_printk("kmap_local_pfn: %d %lx\n", idx, (unsigned long) vaddr);
BUG_ON(!pte_none(*(kmap_pte - idx)));
pteval = pfn_pte(pfn, prot);
set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval);
@@ -545,8 +547,10 @@ void *__kmap_local_page_prot(struct page *page, pgprot_t prot)
/* Try kmap_high_get() if architecture has it enabled */
kmap = arch_kmap_local_high_get(page);
- if (kmap)
+ if (kmap) {
+ trace_printk("kmap_local_high_get: %lx\n", (unsigned long) kmap);
return kmap;
+ }
return __kmap_local_pfn_prot(page_to_pfn(page), prot);
}
@@ -578,6 +582,7 @@ void kunmap_local_indexed(void *vaddr)
preempt_disable();
idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
+ trace_printk("kunmap_local: %i %lx\n", idx, (unsigned long) vaddr);
WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
arch_kmap_local_pre_unmap(addr);
On Tue, Dec 01, 2020 at 04:15:08PM +0100, Thomas Gleixner wrote:
> On Tue, Dec 01 2020 at 15:45, Corentin Labbe wrote:
> > On Tue, Dec 01, 2020 at 03:16:36PM +0100, Thomas Gleixner wrote:
> > In fact the warn was a bit later so I added:
> > preempt_disable();
> > idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
> > - WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
> > + if (WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx)))
> > + pr_err("kunmap_local: vaddr %lx\n", (unsigned long) vaddr);
> >
> > arch_kmap_local_pre_unmap(addr);
> > pte_clear(&init_mm, addr, kmap_pte - idx);
> >
> > and this give kunmap_local: vaddr ffefe000
>
> which looks like a valid one.
>
> Can you apply the patch below and add 'ftrace_dump_on_oops' on the
> command line or enable it in /proc/sys/kernel/ftrace_dump_on_oops before
> starting the test.
>
> That should spill out the trace after crashing.
>
> Thanks,
>
> tglx
> ---
> diff --git a/mm/highmem.c b/mm/highmem.c
> index b49364a306b8..461fe2c26107 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -485,6 +485,7 @@ static inline bool kmap_high_unmap_local(unsigned long vaddr)
> {
> #ifdef ARCH_NEEDS_KMAP_HIGH_GET
> if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) {
> + trace_printk("kunmap_high: %lx\n", vaddr);
> kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
> return true;
> }
> @@ -520,6 +521,7 @@ void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot)
> preempt_disable();
> idx = arch_kmap_local_map_idx(kmap_local_idx_push(), pfn);
> vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
> + trace_printk("kmap_local_pfn: %d %lx\n", idx, (unsigned long) vaddr);
> BUG_ON(!pte_none(*(kmap_pte - idx)));
> pteval = pfn_pte(pfn, prot);
> set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval);
> @@ -545,8 +547,10 @@ void *__kmap_local_page_prot(struct page *page, pgprot_t prot)
>
> /* Try kmap_high_get() if architecture has it enabled */
> kmap = arch_kmap_local_high_get(page);
> - if (kmap)
> + if (kmap) {
> + trace_printk("kmap_local_high_get: %lx\n", (unsigned long) kmap);
> return kmap;
> + }
>
> return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> }
> @@ -578,6 +582,7 @@ void kunmap_local_indexed(void *vaddr)
>
> preempt_disable();
> idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
> + trace_printk("kunmap_local: %i %lx\n", idx, (unsigned long) vaddr);
> WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
>
> arch_kmap_local_pre_unmap(addr);
>
The result could be seen at http://kernel.montjoie.ovh/129768.log
The log is 9Mb, but the ftrace dump seems not terminated, tell me if you need more.
Regards
On Wed, Dec 02 2020 at 20:55, Corentin Labbe wrote:
> On Tue, Dec 01, 2020 at 04:15:08PM +0100, Thomas Gleixner wrote:
>
> The result could be seen at http://kernel.montjoie.ovh/129768.log
> The log is 9Mb, but the ftrace dump seems not terminated, tell me if you need more.
Correct, the interesting entries right before the crash are missing. Can
you try to make the trace buffers smaller or wait longer before
terminating the thing?
16k buffer size per CPU should be completely sufficient. That should
give us roughly the last 100 entries per CPU.
'trace_buf_size=16k'
is the command line option for that.
Thanks,
tglx
On Wed, Dec 02, 2020 at 09:59:36PM +0100, Thomas Gleixner wrote:
> On Wed, Dec 02 2020 at 20:55, Corentin Labbe wrote:
> > On Tue, Dec 01, 2020 at 04:15:08PM +0100, Thomas Gleixner wrote:
> >
> > The result could be seen at http://kernel.montjoie.ovh/129768.log
> > The log is 9Mb, but the ftrace dump seems not terminated, tell me if you need more.
>
> Correct, the interesting entries right before the crash are missing. Can
> you try to make the trace buffers smaller or wait longer before
> terminating the thing?
>
> 16k buffer size per CPU should be completely sufficient. That should
> give us roughly the last 100 entries per CPU.
>
> 'trace_buf_size=16k'
>
> is the command line option for that.
>
I have set a longer timeout and now the job end with the crash:
http://kernel.montjoie.ovh/130094.log
On Thu, Dec 03 2020 at 18:38, Corentin Labbe wrote:
> On Wed, Dec 02, 2020 at 09:59:36PM +0100, Thomas Gleixner wrote:
>> On Wed, Dec 02 2020 at 20:55, Corentin Labbe wrote:
>> > On Tue, Dec 01, 2020 at 04:15:08PM +0100, Thomas Gleixner wrote:
>> >
>> > The result could be seen at http://kernel.montjoie.ovh/129768.log
>> > The log is 9Mb, but the ftrace dump seems not terminated, tell me if you need more.
>>
>> Correct, the interesting entries right before the crash are missing. Can
>> you try to make the trace buffers smaller or wait longer before
>> terminating the thing?
>>
>> 16k buffer size per CPU should be completely sufficient. That should
>> give us roughly the last 100 entries per CPU.
>>
>> 'trace_buf_size=16k'
>>
>> is the command line option for that.
>
> I have set a longer timeout and now the job end with the crash:
> http://kernel.montjoie.ovh/130094.log
Ok. So here is the problem:
[ 996.933980] cryptset-316 0d..3 73943313us : __kmap_local_pfn_prot: kmap_local_pfn: 0 ffefe000
[ 996.943030] cryptset-316 0d..4 73943317us : __kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
[ 996.952080] cryptset-316 0d..4 73943419us : kunmap_local_indexed: kunmap_local: 1 ffefe000
There are two maps:
1) index 0 vaddr 0xffefe000
2) index 1 vaddr 0xffefd000
Now comes the unmap and unmaps 0xffefe000 which is the first map and not
the second one. -> Fail
[ 74.017103] [<c0251824>] (kunmap_local_indexed) from [<c046ca58>] (sg_miter_stop+0xb4/0x164)
[ 74.025535] [<c046ca58>] (sg_miter_stop) from [<c046cef4>] (sg_miter_next+0xc/0xe4)
[ 74.033191] [<c046cef4>] (sg_miter_next) from [<c075f5dc>] (sun4i_ss_opti_poll+0x278/0x40c)
[ 74.041539] [<c075f5dc>] (sun4i_ss_opti_poll) from [<c075fc64>] (sun4i_ss_cipher_poll+0x4f4/0x5e4)
[ 74.050497] [<c075fc64>] (sun4i_ss_cipher_poll) from [<c0421394>] (crypto_skcipher_encrypt+0x38/0x5c)
[ 74.059713] [<c0421394>] (crypto_skcipher_encrypt) from [<c0432ba0>] (xts_encrypt+0x8c/0xd4)
[ 74.068146] [<c0432ba0>] (xts_encrypt) from [<c0421394>] (crypto_skcipher_encrypt+0x38/0x5c)
[ 74.076581] [<c0421394>] (crypto_skcipher_encrypt) from [<c043bfd4>] (skcipher_recvmsg+0x364/0x43c)
[ 74.085625] [<c043bfd4>] (skcipher_recvmsg) from [<c07c8da0>] (sock_read_iter+0xa8/0xf8)
[ 74.093713] [<c07c8da0>] (sock_read_iter) from [<c028ce94>] (vfs_read+0x2b8/0x2d8)
[ 74.101279] [<c028ce94>] (vfs_read) from [<c028d394>] (ksys_read+0xb0/0xe4)
[ 74.108237] [<c028d394>] (ksys_read) from [<c0100060>] (ret_fast_syscall+0x0/0x58)
The unmap comes from sg_miter_stop() and looking at the previous
map/unmap cycles there are never nested maps.
[ 996.943030] cryptset-316 0d..4 73943317us : __kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
is the first event which allocates a nested map.
So something goes south either in sg_miter or in the crypto maze.
Enabling CONFIG_DEBUG_KMAP_LOCAL and function tracing might give us more clue.
Thanks,
tglx
On Fri, Dec 04, 2020 at 12:34:05AM +0100, Thomas Gleixner wrote:
> On Thu, Dec 03 2020 at 18:38, Corentin Labbe wrote:
> > On Wed, Dec 02, 2020 at 09:59:36PM +0100, Thomas Gleixner wrote:
> >> On Wed, Dec 02 2020 at 20:55, Corentin Labbe wrote:
> >> > On Tue, Dec 01, 2020 at 04:15:08PM +0100, Thomas Gleixner wrote:
> >> >
> >> > The result could be seen at http://kernel.montjoie.ovh/129768.log
> >> > The log is 9Mb, but the ftrace dump seems not terminated, tell me if you need more.
> >>
> >> Correct, the interesting entries right before the crash are missing. Can
> >> you try to make the trace buffers smaller or wait longer before
> >> terminating the thing?
> >>
> >> 16k buffer size per CPU should be completely sufficient. That should
> >> give us roughly the last 100 entries per CPU.
> >>
> >> 'trace_buf_size=16k'
> >>
> >> is the command line option for that.
> >
> > I have set a longer timeout and now the job end with the crash:
> > http://kernel.montjoie.ovh/130094.log
>
> Ok. So here is the problem:
>
> [ 996.933980] cryptset-316 0d..3 73943313us : __kmap_local_pfn_prot: kmap_local_pfn: 0 ffefe000
> [ 996.943030] cryptset-316 0d..4 73943317us : __kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
> [ 996.952080] cryptset-316 0d..4 73943419us : kunmap_local_indexed: kunmap_local: 1 ffefe000
>
> There are two maps:
>
> 1) index 0 vaddr 0xffefe000
> 2) index 1 vaddr 0xffefd000
>
> Now comes the unmap and unmaps 0xffefe000 which is the first map and not
> the second one. -> Fail
>
> [ 74.017103] [<c0251824>] (kunmap_local_indexed) from [<c046ca58>] (sg_miter_stop+0xb4/0x164)
> [ 74.025535] [<c046ca58>] (sg_miter_stop) from [<c046cef4>] (sg_miter_next+0xc/0xe4)
> [ 74.033191] [<c046cef4>] (sg_miter_next) from [<c075f5dc>] (sun4i_ss_opti_poll+0x278/0x40c)
> [ 74.041539] [<c075f5dc>] (sun4i_ss_opti_poll) from [<c075fc64>] (sun4i_ss_cipher_poll+0x4f4/0x5e4)
> [ 74.050497] [<c075fc64>] (sun4i_ss_cipher_poll) from [<c0421394>] (crypto_skcipher_encrypt+0x38/0x5c)
> [ 74.059713] [<c0421394>] (crypto_skcipher_encrypt) from [<c0432ba0>] (xts_encrypt+0x8c/0xd4)
> [ 74.068146] [<c0432ba0>] (xts_encrypt) from [<c0421394>] (crypto_skcipher_encrypt+0x38/0x5c)
> [ 74.076581] [<c0421394>] (crypto_skcipher_encrypt) from [<c043bfd4>] (skcipher_recvmsg+0x364/0x43c)
> [ 74.085625] [<c043bfd4>] (skcipher_recvmsg) from [<c07c8da0>] (sock_read_iter+0xa8/0xf8)
> [ 74.093713] [<c07c8da0>] (sock_read_iter) from [<c028ce94>] (vfs_read+0x2b8/0x2d8)
> [ 74.101279] [<c028ce94>] (vfs_read) from [<c028d394>] (ksys_read+0xb0/0xe4)
> [ 74.108237] [<c028d394>] (ksys_read) from [<c0100060>] (ret_fast_syscall+0x0/0x58)
>
> The unmap comes from sg_miter_stop() and looking at the previous
> map/unmap cycles there are never nested maps.
>
> [ 996.943030] cryptset-316 0d..4 73943317us : __kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
>
> is the first event which allocates a nested map.
>
> So something goes south either in sg_miter or in the crypto maze.
>
> Enabling CONFIG_DEBUG_KMAP_LOCAL and function tracing might give us more clue.
Done, http://kernel.montjoie.ovh/130466.log
On Fri, Dec 04 2020 at 14:26, Corentin Labbe wrote:
> On Fri, Dec 04, 2020 at 12:34:05AM +0100, Thomas Gleixner wrote:
>> The unmap comes from sg_miter_stop() and looking at the previous
>> map/unmap cycles there are never nested maps.
>>
>> [ 996.943030] cryptset-316 0d..4 73943317us : __kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
>>
>> is the first event which allocates a nested map.
>>
>> So something goes south either in sg_miter or in the crypto maze.
>>
>> Enabling CONFIG_DEBUG_KMAP_LOCAL and function tracing might give us more clue.
>
> Done, http://kernel.montjoie.ovh/130466.log
Does not provide more information with the debug enabled. So can you
please enable CONFIG_FUNCTION_TRACER and add 'ftrace=function' to the
command line?
Thanks,
tglx
On Fri, Dec 04, 2020 at 04:08:27PM +0100, Thomas Gleixner wrote:
> On Fri, Dec 04 2020 at 14:26, Corentin Labbe wrote:
> > On Fri, Dec 04, 2020 at 12:34:05AM +0100, Thomas Gleixner wrote:
> >> The unmap comes from sg_miter_stop() and looking at the previous
> >> map/unmap cycles there are never nested maps.
> >>
> >> [ 996.943030] cryptset-316 0d..4 73943317us : __kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
> >>
> >> is the first event which allocates a nested map.
> >>
> >> So something goes south either in sg_miter or in the crypto maze.
> >>
> >> Enabling CONFIG_DEBUG_KMAP_LOCAL and function tracing might give us more clue.
> >
> > Done, http://kernel.montjoie.ovh/130466.log
>
> Does not provide more information with the debug enabled. So can you
> please enable CONFIG_FUNCTION_TRACER and add 'ftrace=function' to the
> command line?
>
Done, http://kernel.montjoie.ovh/130490.log
On Fri, Dec 04 2020 at 20:27, Corentin Labbe wrote:
> On Fri, Dec 04, 2020 at 04:08:27PM +0100, Thomas Gleixner wrote:
>> On Fri, Dec 04 2020 at 14:26, Corentin Labbe wrote:
>> > On Fri, Dec 04, 2020 at 12:34:05AM +0100, Thomas Gleixner wrote:
>> >> The unmap comes from sg_miter_stop() and looking at the previous
>> >> map/unmap cycles there are never nested maps.
>> >>
>> >> [ 996.943030] cryptset-316 0d..4 73943317us : __kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
>> >>
>> >> is the first event which allocates a nested map.
>> >>
>> >> So something goes south either in sg_miter or in the crypto maze.
>> >>
>> >> Enabling CONFIG_DEBUG_KMAP_LOCAL and function tracing might give us more clue.
>> >
>> > Done, http://kernel.montjoie.ovh/130466.log
>>
>> Does not provide more information with the debug enabled. So can you
>> please enable CONFIG_FUNCTION_TRACER and add 'ftrace=function' to the
>> command line?
>
> Done, http://kernel.montjoie.ovh/130490.log
Aaargh. That overwrites everything while printing out that
warning.
Can you please replace the debug patch with the one below and try again?
That stops the trace right on the condition.
Thanks,
tglx
---
diff --git a/mm/highmem.c b/mm/highmem.c
index b49364a306b8..8f8862f79d23 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -485,6 +485,7 @@ static inline bool kmap_high_unmap_local(unsigned long vaddr)
{
#ifdef ARCH_NEEDS_KMAP_HIGH_GET
if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) {
+ trace_printk("kunmap_high: %lx\n", vaddr);
kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
return true;
}
@@ -520,6 +521,7 @@ void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot)
preempt_disable();
idx = arch_kmap_local_map_idx(kmap_local_idx_push(), pfn);
vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ trace_printk("kmap_local_pfn: %d %lx\n", idx, (unsigned long) vaddr);
BUG_ON(!pte_none(*(kmap_pte - idx)));
pteval = pfn_pte(pfn, prot);
set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval);
@@ -545,8 +547,10 @@ void *__kmap_local_page_prot(struct page *page, pgprot_t prot)
/* Try kmap_high_get() if architecture has it enabled */
kmap = arch_kmap_local_high_get(page);
- if (kmap)
+ if (kmap) {
+ trace_printk("kmap_local_high_get: %lx\n", (unsigned long) kmap);
return kmap;
+ }
return __kmap_local_pfn_prot(page_to_pfn(page), prot);
}
@@ -578,7 +582,11 @@ void kunmap_local_indexed(void *vaddr)
preempt_disable();
idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
- WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
+ trace_printk("kunmap_local: %i %lx\n", idx, (unsigned long) vaddr);
+ if (addr != __fix_to_virt(FIX_KMAP_BEGIN + idx)) {
+ tracing_off();
+ BUG();
+ }
arch_kmap_local_pre_unmap(addr);
pte_clear(&init_mm, addr, kmap_pte - idx);
On Fri, Dec 04, 2020 at 09:58:21PM +0100, Thomas Gleixner wrote:
> On Fri, Dec 04 2020 at 20:27, Corentin Labbe wrote:
> > On Fri, Dec 04, 2020 at 04:08:27PM +0100, Thomas Gleixner wrote:
> >> On Fri, Dec 04 2020 at 14:26, Corentin Labbe wrote:
> >> > On Fri, Dec 04, 2020 at 12:34:05AM +0100, Thomas Gleixner wrote:
> >> >> The unmap comes from sg_miter_stop() and looking at the previous
> >> >> map/unmap cycles there are never nested maps.
> >> >>
> >> >> [ 996.943030] cryptset-316 0d..4 73943317us : __kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
> >> >>
> >> >> is the first event which allocates a nested map.
> >> >>
> >> >> So something goes south either in sg_miter or in the crypto maze.
> >> >>
> >> >> Enabling CONFIG_DEBUG_KMAP_LOCAL and function tracing might give us more clue.
> >> >
> >> > Done, http://kernel.montjoie.ovh/130466.log
> >>
> >> Does not provide more information with the debug enabled. So can you
> >> please enable CONFIG_FUNCTION_TRACER and add 'ftrace=function' to the
> >> command line?
> >
> > Done, http://kernel.montjoie.ovh/130490.log
>
> Aaargh. That overwrites everything while printing out that
> warning.
>
> Can you please replace the debug patch with the one below and try again?
> That stops the trace right on the condition.
>
> Thanks,
>
> tglx
> ---
> diff --git a/mm/highmem.c b/mm/highmem.c
> index b49364a306b8..8f8862f79d23 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -485,6 +485,7 @@ static inline bool kmap_high_unmap_local(unsigned long vaddr)
> {
> #ifdef ARCH_NEEDS_KMAP_HIGH_GET
> if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) {
> + trace_printk("kunmap_high: %lx\n", vaddr);
> kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
> return true;
> }
> @@ -520,6 +521,7 @@ void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot)
> preempt_disable();
> idx = arch_kmap_local_map_idx(kmap_local_idx_push(), pfn);
> vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
> + trace_printk("kmap_local_pfn: %d %lx\n", idx, (unsigned long) vaddr);
> BUG_ON(!pte_none(*(kmap_pte - idx)));
> pteval = pfn_pte(pfn, prot);
> set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval);
> @@ -545,8 +547,10 @@ void *__kmap_local_page_prot(struct page *page, pgprot_t prot)
>
> /* Try kmap_high_get() if architecture has it enabled */
> kmap = arch_kmap_local_high_get(page);
> - if (kmap)
> + if (kmap) {
> + trace_printk("kmap_local_high_get: %lx\n", (unsigned long) kmap);
> return kmap;
> + }
>
> return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> }
> @@ -578,7 +582,11 @@ void kunmap_local_indexed(void *vaddr)
>
> preempt_disable();
> idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
> - WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
> + trace_printk("kunmap_local: %i %lx\n", idx, (unsigned long) vaddr);
> + if (addr != __fix_to_virt(FIX_KMAP_BEGIN + idx)) {
> + tracing_off();
> + BUG();
> + }
>
> arch_kmap_local_pre_unmap(addr);
> pte_clear(&init_mm, addr, kmap_pte - idx);
>
Hello, the result could be found at http://kernel.montjoie.ovh/130739.log
Thanks
Corentin,
On Sat, Dec 05 2020 at 19:43, Corentin Labbe wrote:
> On Fri, Dec 04, 2020 at 09:58:21PM +0100, Thomas Gleixner wrote:
>> Can you please replace the debug patch with the one below and try again?
>> That stops the trace right on the condition.
>
> Hello, the result could be found at http://kernel.montjoie.ovh/130739.log
Thanks for providing this. This is clearly showing where stuff goes
wrong. It starts here at 729.550001. I removed the uninteresting parts:
0d..2 147103293us : __kmap_local_page_prot <-sg_miter_next
0d..3 147103308us :__kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
0d..3 147103311us : __kmap_local_page_prot <-sg_miter_next
0d..4 147103325us : __kmap_local_pfn_prot: kmap_local_pfn: 3 ffefb000
0d..3 147103429us : kunmap_local_indexed <-sg_miter_stop
0d..4 147103433us : kunmap_local_indexed: kunmap_local: 3 ffefd000
So this maps two pages and unmaps the first one. That's all called from
sun4i_ss_opti_poll() and the bug is clearly visible there:
sg_miter_next(&mi);
sg_miter_next(&mo);
release_ss:
sg_miter_stop(&mi);
sg_miter_stop(&mo);
Written by yourself :) Same issue in sun4i_ss_cipher_poll()
Fix below.
Julia, it might be worth to have a coccinelle check for that. It's the
only place which got it wrong, but this goes unnoticed when code is
e.g. only fully tested on 64bit or as in this case never tested with
full debugging enabled. The whole kmap_atomic and kmap_local (new in
next) family and all users like the sg_miter stuff are affected by this.
Thanks,
tglx
---
Subject: crypto: sun4i-ss - Fix sg_miter_stop() ordering
From: Thomas Gleixner <[email protected]>
Date: Sat, 05 Dec 2020 20:17:28 +0100
sun4i_ss_opti_poll() and sun4i_ss_cipher_poll() do:
sg_miter_next(&mi);
sg_miter_next(&mo);
...
sg_miter_stop(&mi);
sg_miter_stop(&mo);
which is the wrong order because sg_miter_next() maps a page with
kmap_atomic() and sg_miter_stop() unmaps it. kmap_atomic() uses a stack
internaly which requires that the nested map is unmapped first. As this
uses the wrong order it triggers the warning in kunmap_local_indexed()
which checks the to be unmapped address and subsequently crashes.
This went unnoticed for 5 years because the ARM kmap_atomic()
implementation had the warning conditional on CONFIG_DEBUG_HIGHMEM which
was obviously never enabled when testing that driver.
Flip the order to cure it.
Reported-by: Corentin Labbe <[email protected]>
Fixes: 6298e948215f ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
---
drivers/crypto/allwinner/sun4i-ss/sun4i-ss-cipher.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/crypto/allwinner/sun4i-ss/sun4i-ss-cipher.c
+++ b/drivers/crypto/allwinner/sun4i-ss/sun4i-ss-cipher.c
@@ -109,8 +109,8 @@ static int noinline_for_stack sun4i_ss_o
}
release_ss:
- sg_miter_stop(&mi);
sg_miter_stop(&mo);
+ sg_miter_stop(&mi);
writel(0, ss->base + SS_CTL);
spin_unlock_irqrestore(&ss->slock, flags);
return err;
@@ -333,8 +333,8 @@ static int sun4i_ss_cipher_poll(struct s
}
release_ss:
- sg_miter_stop(&mi);
sg_miter_stop(&mo);
+ sg_miter_stop(&mi);
writel(0, ss->base + SS_CTL);
spin_unlock_irqrestore(&ss->slock, flags);
On Sat, 5 Dec 2020, Thomas Gleixner wrote:
> Corentin,
>
> On Sat, Dec 05 2020 at 19:43, Corentin Labbe wrote:
> > On Fri, Dec 04, 2020 at 09:58:21PM +0100, Thomas Gleixner wrote:
> >> Can you please replace the debug patch with the one below and try again?
> >> That stops the trace right on the condition.
> >
> > Hello, the result could be found at http://kernel.montjoie.ovh/130739.log
>
> Thanks for providing this. This is clearly showing where stuff goes
> wrong. It starts here at 729.550001. I removed the uninteresting parts:
>
> 0d..2 147103293us : __kmap_local_page_prot <-sg_miter_next
> 0d..3 147103308us :__kmap_local_pfn_prot: kmap_local_pfn: 1 ffefd000
>
> 0d..3 147103311us : __kmap_local_page_prot <-sg_miter_next
> 0d..4 147103325us : __kmap_local_pfn_prot: kmap_local_pfn: 3 ffefb000
>
> 0d..3 147103429us : kunmap_local_indexed <-sg_miter_stop
> 0d..4 147103433us : kunmap_local_indexed: kunmap_local: 3 ffefd000
>
> So this maps two pages and unmaps the first one. That's all called from
> sun4i_ss_opti_poll() and the bug is clearly visible there:
>
> sg_miter_next(&mi);
> sg_miter_next(&mo);
>
> release_ss:
> sg_miter_stop(&mi);
> sg_miter_stop(&mo);
>
> Written by yourself :) Same issue in sun4i_ss_cipher_poll()
>
> Fix below.
>
> Julia, it might be worth to have a coccinelle check for that. It's the
> only place which got it wrong, but this goes unnoticed when code is
> e.g. only fully tested on 64bit or as in this case never tested with
> full debugging enabled. The whole kmap_atomic and kmap_local (new in
> next) family and all users like the sg_miter stuff are affected by this.
OK, thanks for the suggestion. I will look into it.
julia
>
> Thanks,
>
> tglx
> ---
> Subject: crypto: sun4i-ss - Fix sg_miter_stop() ordering
> From: Thomas Gleixner <[email protected]>
> Date: Sat, 05 Dec 2020 20:17:28 +0100
>
> sun4i_ss_opti_poll() and sun4i_ss_cipher_poll() do:
>
> sg_miter_next(&mi);
> sg_miter_next(&mo);
> ...
> sg_miter_stop(&mi);
> sg_miter_stop(&mo);
>
> which is the wrong order because sg_miter_next() maps a page with
> kmap_atomic() and sg_miter_stop() unmaps it. kmap_atomic() uses a stack
> internaly which requires that the nested map is unmapped first. As this
> uses the wrong order it triggers the warning in kunmap_local_indexed()
> which checks the to be unmapped address and subsequently crashes.
>
> This went unnoticed for 5 years because the ARM kmap_atomic()
> implementation had the warning conditional on CONFIG_DEBUG_HIGHMEM which
> was obviously never enabled when testing that driver.
>
> Flip the order to cure it.
>
> Reported-by: Corentin Labbe <[email protected]>
> Fixes: 6298e948215f ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator")
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> ---
> drivers/crypto/allwinner/sun4i-ss/sun4i-ss-cipher.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/drivers/crypto/allwinner/sun4i-ss/sun4i-ss-cipher.c
> +++ b/drivers/crypto/allwinner/sun4i-ss/sun4i-ss-cipher.c
> @@ -109,8 +109,8 @@ static int noinline_for_stack sun4i_ss_o
> }
>
> release_ss:
> - sg_miter_stop(&mi);
> sg_miter_stop(&mo);
> + sg_miter_stop(&mi);
> writel(0, ss->base + SS_CTL);
> spin_unlock_irqrestore(&ss->slock, flags);
> return err;
> @@ -333,8 +333,8 @@ static int sun4i_ss_cipher_poll(struct s
> }
>
> release_ss:
> - sg_miter_stop(&mi);
> sg_miter_stop(&mo);
> + sg_miter_stop(&mi);
> writel(0, ss->base + SS_CTL);
> spin_unlock_irqrestore(&ss->slock, flags);
>
>
>
>
On Sun, Dec 06 2020 at 22:40, Corentin Labbe wrote:
> On Sat, Dec 05, 2020 at 08:48:15PM +0100, Thomas Gleixner wrote:
>> So this maps two pages and unmaps the first one. That's all called from
>> sun4i_ss_opti_poll() and the bug is clearly visible there:
>>
>> sg_miter_next(&mi);
>> sg_miter_next(&mo);
>>
>> release_ss:
>> sg_miter_stop(&mi);
>> sg_miter_stop(&mo);
>>
>> Written by yourself :) Same issue in sun4i_ss_cipher_poll()
>>
>> Fix below.
>>
>
> Unfortunatly, the crash still happen with the fix.
> See http://kernel.montjoie.ovh/131321.log
And why are you not looking for the reason of this problem in your own
code yourself? It's not a regression caused by my work.
Turn on CONFIG_DEBUG_HIGHMEM on 5.10-rcX or older kernels and you will
get the very same crashes. My work just made these checks unconditional.
This was broken forever and it's not my problem that you did not enable
mandatory debug options when developing this thing.
I gave you tons of hints by now how to debug this and what to look
for. Obviously I overlooked something and here is the final hint:
sg_miter_next(&mi);
sg_miter_next(&mo);
do {
....
if (cond1)
sg_miter_next(&mi); <--- HINT
....
if (cond2)
sg_miter_next(&mo);
release_ss:
sg_miter_stop(&mi);
sg_miter_stop(&mo);
So yes, I overlooked the obvious, but as I said above it's not something
which my is failing due to my changes. It was broken forever, it just
was not tested properly. Don't blame the messenger.
My knowledge about how to use nested sg_miter correctly is close to
zero. I can and did explain you the rules of kmap_atomic/local() but
that's it.
Thanks,
tglx
On Mon, Dec 07 2020 at 13:18, Corentin Labbe wrote:
> On Mon, Dec 07, 2020 at 01:15:49AM +0100, Thomas Gleixner wrote:
> So if I understand correctly, basicly I cannot have two atomic kmap at
> the same time since it made unmapping them in the right order complex.
You can, but the ordering has to be correct and with sg_miter that's
probably hard to get right.
> I am not sure to have well understood your hint, but could you give me
So the point is:
sg_miter_next(&mi); map 1 -> vaddr1
sg_miter_next(&mo); map 2 -> vaddr2
do {
...
if (cond) {
sg_miter_next(&mi)
sg_miter_stop()
unmap(vaddr1); unmaps map2 -> FAIL
if (next_page)
map(); maps map2 -> vaddr2 -> FAIL
}
The only way how that could have ever worked is when the conditional
sg_miter_next(&mi) did not try to map a new page, i.e. end of data.
The ARM kunmap_atomic() had:
#ifdef CONFIG_DEBUG_HIGHMEM
BUG_ON(vaddr != __fix_to_virt(idx));
set_fixmap_pte(idx, __pte(0));
#else
which means the warning and clearing the PTE only happens when debugging
is enabled. That made your code "work" by chance because the unmap
leaves map2 intact which means the vaddr2 mapping stays valid, so the
access to it further down still worked.
sg_miter_next(&mi); map 1 -> vaddr1
sg_miter_next(&mo); map 2 -> vaddr2
do {
...
if (cond) {
sg_miter_next(&mi)
sg_miter_stop()
unmap(vaddr1); idx 2 ---> 1
but mapping still valid for vaddr2
}
*vaddr2 = x; works by chance
But that also would cause trouble in the following case:
sg_miter_next(&mi); map 1 -> vaddr1
sg_miter_next(&mo); map 2 -> vaddr2
do {
...
if (cond) {
sg_miter_next(&mi)
sg_miter_stop()
unmap(vaddr1); idx 2 ---> 1
but mapping still valid for vaddr2
}
interrupt
kmap_atomic(some_other_page)
idx 1 -> 2 map some_otherpage to vaddr2
kunmap_atomic(vaddr2) idx 2 ---> 1
mapping still valid for vaddr2,
but now points to some_other_page
end of interrupt
*vaddr2 = x; <-- accesses some_other_page -> FAIL
This is the worst one because it's random data corruption and extremly
hard to debug.
I made the warning and the pte clearing in the new code unconditional
just to catch any issues upfront which it did.
sg_miter_next(&mi); map 1 -> vaddr1
sg_miter_next(&mo); map 2 -> vaddr2
do {
...
if (cond) {
sg_miter_next(&mi)
sg_miter_stop()
unmap(vaddr1); unmaps map2 -> FAIL
clear map2 invalidates vaddr2
}
*vaddr2 = x; <-- accesses the unmapped vaddr2 -> CRASH
> what you think about the following patch which fix (at least) the
> crash. Instead of holding SGmiter (and so two kmap), I use only one
> at a time.
That looks fine at least vs. the sg_miter/kmap_atomic usage.
Thanks,
tglx