It was broken somewhere between b00d209241ff and 3541833fd1f2.
[ 0.000000] cannot allocate crashkernel (size:0x20000000)
Where a good one looks like this,
[ 0.000000] crashkernel reserved: 0x0000000008600000 - 0x0000000028600000 (512 MB)
Some commits look more suspicious than others.
mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
mm: introduce mm_[p4d|pud|pmd]_folded
mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
# diff -u ../iomem.good.txt ../iomem.bad.txt
--- ../iomem.good.txt 2018-11-10 22:28:20.092614398 -0500
+++ ../iomem.bad.txt 2018-11-10 20:39:54.930294479 -0500
@@ -1,9 +1,8 @@
00000000-3965ffff : System RAM
00080000-018cffff : Kernel code
- 018d0000-020affff : reserved
- 020b0000-045affff : Kernel data
- 08600000-285fffff : Crash kernel
- 28730000-2d5affff : reserved
+ 018d0000-0762ffff : reserved
+ 07630000-09b2ffff : Kernel data
+ 231b0000-2802ffff : reserved
30ec0000-30ecffff : reserved
35660000-3965ffff : reserved
39660000-396fffff : reserved
@@ -127,7 +126,7 @@
7c5200000-7c520ffff : 0004:48:00.0
1040000000-17fbffffff : System RAM
13fbfd0000-13fdfdffff : reserved
- 16fba80000-17fbfdffff : reserved
+ 16fafd0000-17fbfdffff : reserved
17fbfe0000-17fbffffff : reserved
1800000000-1ffbffffff : System RAM
1bfbff0000-1bfdfeffff : reserved
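A quick way to read the diff above is to look for the largest unused hole in the first System RAM block on each side. The sketch below is only an illustration. It assumes the crashkernel has to come out of that low block, that the child ranges visible in the hunk are the only occupants, and it ignores alignment. The helper name `largest_gap` is made up for this example. With these inputs the good map has a hole of 577.5 MiB, while the largest hole in the bad map is 406.5 MiB, which is less than the 0x20000000 (512 MiB) being requested.

```python
# Child ranges of the first "System RAM" block (0x00000000-0x3965ffff),
# copied from the good and bad sides of the iomem diff. The "Crash kernel"
# entry is left out of GOOD because the question is where it could fit.
SYSTEM_RAM = (0x00000000, 0x3965FFFF)
CRASH_SIZE = 0x20000000  # size from the "cannot allocate crashkernel" message

GOOD = [
    (0x00080000, 0x018CFFFF),  # Kernel code
    (0x018D0000, 0x020AFFFF),  # reserved
    (0x020B0000, 0x045AFFFF),  # Kernel data
    (0x28730000, 0x2D5AFFFF),  # reserved
    (0x30EC0000, 0x30ECFFFF),  # reserved
    (0x35660000, 0x3965FFFF),  # reserved
]

BAD = [
    (0x00080000, 0x018CFFFF),  # Kernel code
    (0x018D0000, 0x0762FFFF),  # reserved
    (0x07630000, 0x09B2FFFF),  # Kernel data
    (0x231B0000, 0x2802FFFF),  # reserved
    (0x30EC0000, 0x30ECFFFF),  # reserved
    (0x35660000, 0x3965FFFF),  # reserved
]


def largest_gap(ram, used):
    """Return (size, start) of the largest unused hole inside ram.

    Hypothetical helper: ranges are inclusive (start, end) pairs, and
    alignment requirements of the real allocator are not modelled.
    """
    cursor, ram_end = ram
    best = (0, cursor)
    for start, end in sorted(used):
        if start > cursor:
            best = max(best, (start - cursor, cursor))
        cursor = max(cursor, end + 1)
    if ram_end + 1 > cursor:
        best = max(best, (ram_end + 1 - cursor, cursor))
    return best


for name, used in (("good", GOOD), ("bad", BAD)):
    size, start = largest_gap(SYSTEM_RAM, used)
    verdict = "fits" if size >= CRASH_SIZE else "does not fit"
    print(f"{name}: largest hole {size / 2**20:.1f} MiB at {start:#010x}, "
          f"crashkernel {verdict}")
```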
The memory map looks like this,
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x00000000398D0014 000024 (v02 HISI )
[ 0.000000] ACPI: XSDT 0x00000000398C00E8 000064 (v01 HISI HIP07 00000000 01000013)
[ 0.000000] ACPI: FACP 0x0000000039770000 000114 (v06 HISI HIP07 00000000 INTL 20151124)
[ 0.000000] ACPI: DSDT 0x0000000039730000 00691A (v02 HISI HIP07 00000000 INTL 20170728)
[ 0.000000] ACPI: MCFG 0x00000000397C0000 0000AC (v01 HISI HIP07 00000000 INTL 20151124)
[ 0.000000] ACPI: SLIT 0x00000000397B0000 00003C (v01 HISI HIP07 00000000 INTL 20151124)
[ 0.000000] ACPI: SRAT 0x00000000397A0000 000578 (v03 HISI HIP07 00000000 INTL 20151124)
[ 0.000000] ACPI: DBG2 0x0000000039790000 00005A (v00 HISI HIP07 00000000 INTL 20151124)
[ 0.000000] ACPI: GTDT 0x0000000039760000 00007C (v02 HISI HIP07 00000000 INTL 20151124)
[ 0.000000] ACPI: APIC 0x0000000039750000 0014E4 (v04 HISI HIP07 00000000 INTL 20151124)
[ 0.000000] ACPI: IORT 0x0000000039740000 000554 (v00 HISI HIP07 00000000 INTL 20170728)
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
[ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x1800000000-0x1fffffffff]
[ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x17ffffffff]
[ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x9000000000-0x97ffffffff]
[ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x8800000000-0x8fffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x17fbffe5c0-0x17fbffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe5c0-0x1ffbffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x8ffbffe5c0-0x8ffbffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x97fadce5c0-0x97fadcffff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000000000000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x00000097fbffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x000000003965ffff]
[ 0.000000] node 0: [mem 0x0000000039660000-0x00000000396fffff]
[ 0.000000] node 0: [mem 0x0000000039700000-0x000000003977ffff]
[ 0.000000] node 0: [mem 0x0000000039780000-0x000000003978ffff]
[ 0.000000] node 0: [mem 0x0000000039790000-0x00000000397cffff]
[ 0.000000] node 0: [mem 0x00000000397d0000-0x00000000398bffff]
[ 0.000000] node 0: [mem 0x00000000398c0000-0x00000000398dffff]
[ 0.000000] node 0: [mem 0x00000000398e0000-0x0000000039d5ffff]
[ 0.000000] node 0: [mem 0x0000000039d60000-0x000000003ed4ffff]
[ 0.000000] node 0: [mem 0x000000003ed50000-0x000000003ed7ffff]
[ 0.000000] node 0: [mem 0x000000003ed80000-0x000000003fbfffff]
[ 0.000000] node 0: [mem 0x0000001040000000-0x00000017fbffffff]
[ 0.000000] node 1: [mem 0x0000001800000000-0x0000001ffbffffff]
[ 0.000000] node 2: [mem 0x0000008800000000-0x0000008ffbffffff]
[ 0.000000] node 3: [mem 0x0000009000000000-0x00000097fbffffff]
On Sat, 10 Nov 2018 23:41:34 -0500
Qian Cai wrote:
> It was broken somewhere between b00d209241ff and 3541833fd1f2.
>
> [ 0.000000] cannot allocate crashkernel (size:0x20000000)
>
> Where a good one looks like this,
>
> [ 0.000000] crashkernel reserved: 0x0000000008600000 - 0x0000000028600000 (512 MB)
>
> Some commits look more suspicious than others.
>
> mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
> mm: introduce mm_[p4d|pud|pmd]_folded
> mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
The intent of these three patches is to add extra checks to the
pgtable_bytes accounting function. If applied incorrectly the expected
result would be warnings like this:
BUG: non-zero pgtables_bytes on freeing mm: 16384
The change Linus worried about affects the __PAGETABLE_PxD_FOLDED defines.
These defines are used with #ifdef, #ifndef, and __is_defined() for the
new mm_p?d_folded() macros. I cannot see how this would make a difference
for your iomem setup.
> # diff -u ../iomem.good.txt ../iomem.bad.txt
> --- ../iomem.good.txt 2018-11-10 22:28:20.092614398 -0500
> +++ ../iomem.bad.txt 2018-11-10 20:39:54.930294479 -0500
> @@ -1,9 +1,8 @@
> 00000000-3965ffff : System RAM
> 00080000-018cffff : Kernel code
> - 018d0000-020affff : reserved
> - 020b0000-045affff : Kernel data
> - 08600000-285fffff : Crash kernel
> - 28730000-2d5affff : reserved
> + 018d0000-0762ffff : reserved
> + 07630000-09b2ffff : Kernel data
> + 231b0000-2802ffff : reserved
> 30ec0000-30ecffff : reserved
> 35660000-3965ffff : reserved
> 39660000-396fffff : reserved
> @@ -127,7 +126,7 @@
> 7c5200000-7c520ffff : 0004:48:00.0
> 1040000000-17fbffffff : System RAM
> 13fbfd0000-13fdfdffff : reserved
> - 16fba80000-17fbfdffff : reserved
> + 16fafd0000-17fbfdffff : reserved
> 17fbfe0000-17fbffffff : reserved
> 1800000000-1ffbffffff : System RAM
> 1bfbff0000-1bfdfeffff : reserved
The easiest way to verify whether the three commits have something to do with your
problem is to revert them and run your test. Can you do that, please?
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
> On Nov 11, 2018, at 6:35 AM, Martin Schwidefsky wrote:
>
> The easiest way to verify whether the three commits have something to do with your
> problem is to revert them and run your test. Can you do that, please?
Yes, you are right. Those commits have nothing to do with the problem. I should
have realized it earlier, as those commits concern virtual memory rather than
physical memory. Sorry for the noise.
It turned out I wrongly assumed that if kmemleak is disabled by default, no
memory would be reserved for kmemleak at all, which is not the case.
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=600000
CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
Even without kmemleak=on in the kernel cmdline, it still reserves early log
memory, which leaves not enough memory for crashkernel.
Since there seems to be no way to turn kmemleak on later after boot, is there any
reason for the current behavior?
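As a rough estimate of how much memory the early log takes with this config, the number of entries can be multiplied by the size of one entry. The sketch below is an assumption-laden illustration, not the kernel's definition: the `EarlyLog` field list and the `MAX_TRACE` value of 16 are stand-ins, since the real `struct early_log` is not shown in this thread. On an LP64 host the stand-in comes to 160 bytes per entry, so 600000 entries would be 96,000,000 bytes, roughly 90 MiB. That is the same order of magnitude as the growth of the first reserved range in the iomem diff.

```python
import ctypes

MAX_TRACE = 16  # assumed stack-trace depth; not stated in the thread


class EarlyLog(ctypes.Structure):
    # Hypothetical stand-in for the kernel's struct early_log. The field
    # names and order are assumptions made for this estimate only.
    _fields_ = [
        ("op_type", ctypes.c_int),
        ("min_count", ctypes.c_int),
        ("ptr", ctypes.c_void_p),
        ("size", ctypes.c_size_t),
        ("trace", ctypes.c_ulong * MAX_TRACE),
        ("trace_len", ctypes.c_uint),
    ]


# CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE from the config quoted above.
EARLY_LOG_SIZE = 600000


def early_log_bytes(entries, entry_size=ctypes.sizeof(EarlyLog)):
    """Static footprint of an array of `entries` log records, in bytes."""
    return entries * entry_size


total = early_log_bytes(EARLY_LOG_SIZE)
print(f"{ctypes.sizeof(EarlyLog)} bytes/entry, "
      f"{total} bytes total ({total / 2**20:.1f} MiB)")
```

The per-entry size depends on the host ABI, so the printed figure is only indicative. Passing an explicit `entry_size` lets the estimate be redone once the real structure size is known.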
On Sun, 11 Nov 2018 08:36:09 -0500
Qian Cai wrote:
> Yes, you are right. Those commits have nothing to do with the problem. I should
> have realized it earlier, as those commits concern virtual memory rather than
> physical memory. Sorry for the noise.
>
> It turned out I wrongly assumed that if kmemleak is disabled by default, no
> memory would be reserved for kmemleak at all, which is not the case.
>
> CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=600000
> CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
>
> Even without kmemleak=on in the kernel cmdline, it still reserves early log
> memory, which leaves not enough memory for crashkernel.
>
> Since there seems to be no way to turn kmemleak on later after boot, is there any
> reason for the current behavior?
Well, it seems like you do have CONFIG_DEBUG_KMEMLEAK=y in your config. The code
contains data structures for the case that you want to use the kmemleak checker.
The presence of these structures will change the sizes. The last commit touching
the 'early_log' buffer dates from 2009, with this change:
@@ -232,8 +232,9 @@ struct early_log {
};
/* early logging buffer and current position */
-static struct early_log early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE];
-static int crt_early_log;
+static struct early_log
+ early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE] __initdata;
+static int crt_early_log __initdata;
static void kmemleak_disable(void);
The current behavior is imho nothing new.
Would it be possible to disable CONFIG_DEBUG_KMEMLEAK for your kdump kernel?
That seems like the simplest solution.
On 11/12/18 at 1:01 AM, Martin Schwidefsky wrote:
> Well, it seems like you do have CONFIG_DEBUG_KMEMLEAK=y in your config. The code
> contains data structures for the case that you want to use the kmemleak checker.
> The presence of these structures will change the sizes. The last commit touching
> the 'early_log' buffer dates from 2009, with this change:
>
> @@ -232,8 +232,9 @@ struct early_log {
> };
>
> /* early logging buffer and current position */
> -static struct early_log early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE];
> -static int crt_early_log;
> +static struct early_log
> + early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE] __initdata;
> +static int crt_early_log __initdata;
>
> static void kmemleak_disable(void);
>
> The current behavior is imho nothing new.
>
> Would it be possible to disable CONFIG_DEBUG_KMEMLEAK for your kdump kernel?
> That seems like the simplest solution.
Ah, okay. Those are static memory allocations,
made regardless of the kmemleak runtime setting.
The problem is that kmemleak has to be disabled entirely
and the first kernel re-compiled as well,
as crashkernel reservation happens in the first kernel.
Hence, the flexibility to enable kmemleak at boot time
is lost as well. I can live with it, although it does
not seem ideal.