Hi David,
Recently, Red Hat CKI reported a kdump kernel boot failure caused by
OOM. Bisecting shows that it only happens after commit b5bad8c16b9b
("nouveau/gsp: move to 535.113.01"). With that commit reverted, the
OOM goes away and the kdump kernel boots up successfully.
From debugging, we can see that about 100M of extra memory is consumed
when commit b5bad8c16b9b is applied on the HP machine with 2G of memory.
Do you know if there is room to reduce this extra memory cost?
I have opened a Fedora bug to track this OOM, and I am copying the bug
description here for reference in case someone cannot access the bug
easily.
Bug 2253165 - kdump kernel failed to boot up because a big memory chunk is reserved
https://bugzilla.redhat.com/show_bug.cgi?id=2253165
------------------------------------------------------------
CKI reported a failure on Beaker machine hp-z210-01.ml3.eng.bos.redhat.com; please see the CKI report below:
https://datawarehouse.cki-project.org/kcidb/tests/10508330
In that failure, crashkernel=256M was set and the reservation succeeded in the 1st kernel. However, the kdump kernel failed to boot when it started to run the init process. With crashkernel=320M, the kdump kernel boots up successfully and vmcore dumping succeeds too.
After adding "rd.memdebug=4 memblock=debug" to the kdump kernel cmdline, memblock shows a big reserved chunk of about 122M (reserved[0x2] below). I don't know where it comes from. I suspect the firmware stole that chunk from system memory, causing the kdump kernel to run out of memory.
[Tue Dec 5 22:32:38 2023] DMI: Hewlett-Packard HP Z210 Workstation/1587h, BIOS J51 v01.20 09/16/2011
[Tue Dec 5 22:32:38 2023] tsc: Fast TSC calibration using PIT
[Tue Dec 5 22:32:38 2023] tsc: Detected 3092.940 MHz processor
[Tue Dec 5 22:32:38 2023] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[Tue Dec 5 22:32:38 2023] e820: remove [mem 0x000a0000-0x000fffff] usable
[Tue Dec 5 22:32:38 2023] last_pfn = 0x61000 max_arch_pfn = 0x400000000
[Tue Dec 5 22:32:38 2023] MTRR map: 4 entries (3 fixed + 1 variable; max 23), built from 10 variable MTRRs
[Tue Dec 5 22:32:38 2023] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[Tue Dec 5 22:32:38 2023] x2apic: enabled by BIOS, switching to x2apic ops
[Tue Dec 5 22:32:38 2023] found SMP MP-table at [mem 0x000f4b80-0x000f4b8f]
[Tue Dec 5 22:32:38 2023] memblock_reserve: [0x00000000000f4b80-0x00000000000f4b8f] smp_scan_config+0xca/0x150
[Tue Dec 5 22:32:38 2023] memblock_reserve: [0x00000000000f4b90-0x00000000000f4e4b] smp_scan_config+0x13a/0x150
[Tue Dec 5 22:32:38 2023] memblock_reserve: [0x000000005f600000-0x000000005f610fff] setup_arch+0xd84/0xf10
[Tue Dec 5 22:32:38 2023] memblock_add: [0x0000000000001000-0x000000000008f7ff] e820__memblock_setup+0x73/0xb0
[Tue Dec 5 22:32:38 2023] memblock_add: [0x000000004d0e00b0-0x0000000060ff81cf] e820__memblock_setup+0x73/0xb0
[Tue Dec 5 22:32:38 2023] memblock_add: [0x0000000060ff81d0-0x0000000060ff81ff] e820__memblock_setup+0x73/0xb0
[Tue Dec 5 22:32:38 2023] memblock_add: [0x0000000060ff8200-0x0000000060ffffff] e820__memblock_setup+0x73/0xb0
[Tue Dec 5 22:32:38 2023] MEMBLOCK configuration:
[Tue Dec 5 22:32:38 2023] memory size = 0x0000000013fae750 reserved size = 0x0000000007b7cc50
[Tue Dec 5 22:32:38 2023] memory.cnt = 0x2
[Tue Dec 5 22:32:38 2023] memory[0x0] [0x0000000000001000-0x000000000008efff], 0x000000000008e000 bytes flags: 0x0
[Tue Dec 5 22:32:38 2023] memory[0x1] [0x000000004d0e1000-0x0000000060ffffff], 0x0000000013f1f000 bytes flags: 0x0
[Tue Dec 5 22:32:38 2023] reserved.cnt = 0x5
[Tue Dec 5 22:32:38 2023] reserved[0x0] [0x0000000000000000-0x000000000000ffff], 0x0000000000010000 bytes flags: 0x0
[Tue Dec 5 22:32:38 2023] reserved[0x1] [0x000000000008f400-0x00000000000fffff], 0x0000000000070c00 bytes flags: 0x0
[Tue Dec 5 22:32:38 2023] reserved[0x2] [0x0000000057b16000-0x000000005f610fff], 0x0000000007afb000 bytes flags: 0x0
[Tue Dec 5 22:32:38 2023] reserved[0x3] [0x0000000060ff81d0-0x0000000060ff821f], 0x0000000000000050 bytes flags: 0x0
[Tue Dec 5 22:32:38 2023] reserved[0x4] [0x0000000060ffe000-0x0000000060ffefff], 0x0000000000001000 bytes flags: 0x0
----------------------------------------------------
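For anyone who wants to double-check the numbers, here is a small Python sketch that parses the memblock "reserved[...]" lines quoted above and converts their sizes to MiB. It is not part of CKI or kdump tooling; the helper summarize_reserved() and the LOG string are hypothetical, made up for illustration, and the regex assumes the exact line format printed with memblock=debug in this log.

```python
import re

# Matches memblock=debug lines such as:
#   reserved[0x2] [0x0000000057b16000-0x000000005f610fff], 0x0000000007afb000 bytes flags: 0x0
# finditer() is used, so leading dmesg timestamps are simply skipped over.
RESERVED_RE = re.compile(
    r"reserved\[(0x[0-9a-f]+)\]\s+\[(0x[0-9a-f]+)-(0x[0-9a-f]+)\],\s+(0x[0-9a-f]+) bytes"
)

MIB = 1024 * 1024


def summarize_reserved(log_text):
    """Return (regions, total_bytes) for every reserved[] line in log_text.

    Each region is an (index, start, end, size) tuple of ints.
    """
    regions = []
    for m in RESERVED_RE.finditer(log_text):
        idx, start, end, size = (int(g, 16) for g in m.groups())
        # memblock prints an inclusive end address, so size == end - start + 1.
        if size != end - start + 1:
            raise ValueError(f"inconsistent region: {m.group(0)}")
        regions.append((idx, start, end, size))
    total = sum(r[3] for r in regions)
    return regions, total


# The five reserved[] lines from the kdump kernel log above (timestamps dropped).
LOG = """\
reserved[0x0] [0x0000000000000000-0x000000000000ffff], 0x0000000000010000 bytes flags: 0x0
reserved[0x1] [0x000000000008f400-0x00000000000fffff], 0x0000000000070c00 bytes flags: 0x0
reserved[0x2] [0x0000000057b16000-0x000000005f610fff], 0x0000000007afb000 bytes flags: 0x0
reserved[0x3] [0x0000000060ff81d0-0x0000000060ff821f], 0x0000000000000050 bytes flags: 0x0
reserved[0x4] [0x0000000060ffe000-0x0000000060ffefff], 0x0000000000001000 bytes flags: 0x0
"""

if __name__ == "__main__":
    regions, total = summarize_reserved(LOG)
    for idx, start, end, size in regions:
        print(f"reserved[{idx:#x}] {start:#018x}-{end:#018x} {size / MIB:8.2f} MiB")
    print(f"total reserved: {total / MIB:.2f} MiB")
```

On these lines it should report reserved[0x2] as about 122.98 MiB and a total of about 123.49 MiB, which matches the "reserved size = 0x0000000007b7cc50" line in the MEMBLOCK configuration dump.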
Thanks
Baoquan