2022-09-21 07:10:35

by Sachin Sant

Subject: [powerpc] Kernel crash with THP tests (next-20220920)

While running transparent huge page tests [1] against 6.0.0-rc6-next-20220920,
the following crash is seen on an IBM Power server.

Kernel attempted to read user page (34) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000034
Faulting instruction address: 0xc0000000004d2744
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: dm_mod(E) bonding(E) rfkill(E) tls(E) sunrpc(E) nd_pmem(E) nd_btt(E) dax_pmem(E) papr_scm(E) libnvdimm(E) pseries_rng(E) vmx_crypto(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) fuse(E)
CPU: 37 PID: 2219255 Comm: sysctl Tainted: G E 6.0.0-rc6-next-20220920 #1
NIP: c0000000004d2744 LR: c0000000004d2734 CTR: 0000000000000000
REGS: c0000012801bf660 TRAP: 0300 Tainted: G E (6.0.0-rc6-next-20220920)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24048222 XER: 20040000
CFAR: c0000000004b0eac DAR: 0000000000000034 DSISR: 40000000 IRQMASK: 0
GPR00: c0000000004d2734 c0000012801bf900 c000000002a92300 0000000000000000
GPR04: c000000002ac8ac0 c000000001209340 0000000000000005 c000001286714b80
GPR08: 0000000000000034 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000028048242 c00000167fff6b00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: c0000012801bfae8 0000000000000001 0000000000000100 0000000000000001
GPR24: c0000012801bfae8 c000000002ac8ac0 0000000000000002 0000000000000005
GPR28: 0000000000000000 0000000000000001 0000000000000000 0000000000346cca
NIP [c0000000004d2744] alloc_buddy_huge_page+0xd4/0x240
LR [c0000000004d2734] alloc_buddy_huge_page+0xc4/0x240
Call Trace:
[c0000012801bf900] [c0000000004d2734] alloc_buddy_huge_page+0xc4/0x240 (unreliable)
[c0000012801bf9b0] [c0000000004d46a4] alloc_fresh_huge_page.part.72+0x214/0x2a0
[c0000012801bfa40] [c0000000004d7f88] alloc_pool_huge_page+0x118/0x190
[c0000012801bfa90] [c0000000004d84dc] __nr_hugepages_store_common+0x4dc/0x610
[c0000012801bfb70] [c0000000004d88bc] hugetlb_sysctl_handler_common+0x13c/0x180
[c0000012801bfc10] [c0000000006380e0] proc_sys_call_handler+0x210/0x350
[c0000012801bfc90] [c000000000551c00] vfs_write+0x2e0/0x460
[c0000012801bfd50] [c000000000551f5c] ksys_write+0x7c/0x140
[c0000012801bfda0] [c000000000033f58] system_call_exception+0x188/0x3f0
[c0000012801bfe10] [c00000000000c53c] system_call_common+0xec/0x270
--- interrupt: c00 at 0x7fffa9520c34
NIP: 00007fffa9520c34 LR: 00000001024754bc CTR: 0000000000000000
REGS: c0000012801bfe80 TRAP: 0c00 Tainted: G E (6.0.0-rc6-next-20220920)
MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28002202 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000004 00007fffccd76cd0 00007fffa9607300 0000000000000003
GPR04: 0000000138da6970 0000000000000006 fffffffffffffff6 0000000000000000
GPR08: 0000000138da6970 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffa9a40940 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000001 0000000000000010 0000000000000006 0000000138da8aa0
GPR28: 00007fffa95fc2c8 0000000138da8aa0 0000000000000006 0000000138da6930
NIP [00007fffa9520c34] 0x7fffa9520c34
LR [00000001024754bc] 0x1024754bc
--- interrupt: c00
Instruction dump:
3b400002 3ba00001 3b800000 7f26cb78 7fc5f378 7f64db78 7fe3fb78 4bfde5b9
60000000 7c691b78 39030034 7c0004ac <7d404028> 7c0ae800 40c20010 7f80412d
---[ end trace 0000000000000000 ]---

Kernel panic - not syncing: Fatal exception

Bisect points to the following patch:
commit f2f3c25dea3acfb17aecb7273541e7266dfc8842
hugetlb: freeze allocated pages before creating hugetlb pages

Reverting the patch allows the test to run successfully.

Thanks
- Sachin

[1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/transparent_hugepages_defrag.py


2022-09-21 23:59:35

by Mike Kravetz

Subject: Re: [powerpc] Kernel crash with THP tests (next-20220920)

On 09/21/22 12:00, Sachin Sant wrote:
> While running transparent huge page tests [1] against 6.0.0-rc6-next-20220920
> following crash is seen on IBM Power server.

Thanks Sachin,

Naoya reported this, with my analysis here:
https://lore.kernel.org/linux-mm/YyqCS6+OXAgoqI8T@monkey/

An updated version of the patch was posted here:
https://lore.kernel.org/linux-mm/[email protected]/

Sorry about that,
--
Mike Kravetz

2022-09-22 13:35:11

by Sachin Sant

Subject: Re: [powerpc] Kernel crash with THP tests (next-20220920)



> On 22-Sep-2022, at 5:11 AM, Mike Kravetz <[email protected]> wrote:
>
> On 09/21/22 12:00, Sachin Sant wrote:
>> While running transparent huge page tests [1] against 6.0.0-rc6-next-20220920
>> following crash is seen on IBM Power server.
>
> Thanks Sachin,
>
> Naoya reported this, with my analysis here:
> https://lore.kernel.org/linux-mm/YyqCS6+OXAgoqI8T@monkey/
>

Thanks Mike for the pointer.

> An updated version of the patch was posted here,
> https://lore.kernel.org/linux-mm/[email protected]/
>

This updated patch works for me. The test runs to completion without any
issues.

- Sachin