LinuxLists.cc - compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page

2020-04-23 21:30:45

Subject: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))

Compaction started to crash as shown below on linux-next today. The faulty page belongs to the Node 0 DMA32 zone.
I’ll continue to narrow it down, but wanted to give a heads-up in case someone can beat me to it.

Debug output from free_area_init_core()
[ 0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
[ 0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
[ 0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
[ 0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL

I don’t understand how it could end up in such a situation. Several recent patches look
more related than others.

- mm: rework free_area_init*() funcitons
https://lore.kernel.org/linux-mm/[email protected]/
Could this somehow allow an invalid pfn to escape into the page allocator?
In particular, is it related to skipping the checks in memmap_init_zone()?
https://lore.kernel.org/linux-mm/[email protected]

# numactl -H
available: 8 nodes (0-7)
node 0 cpus: 0 1 16 17
node 0 size: 7951 MB
node 0 free: 4445 MB
node 1 cpus: 2 3 18 19
node 1 size: 0 MB
node 1 free: 0 MB
node 2 cpus: 4 5 20 21
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus: 6 7 22 23
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus: 8 9 24 25
node 4 size: 15354 MB
node 4 free: 78 MB
node 5 cpus: 10 11 26 27
node 5 size: 0 MB
node 5 free: 0 MB
node 6 cpus: 12 13 28 29
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus: 14 15 30 31
node 7 size: 0 MB
node 7 free: 0 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 16 16 16 32 32 32 32
1: 16 10 16 16 32 32 32 32
2: 16 16 10 16 32 32 32 32
3: 16 16 16 10 32 32 32 32
4: 32 32 32 32 10 16 16 16
5: 32 32 32 32 16 10 16 16
6: 32 32 32 32 16 16 10 16
7: 32 32 32 32 16 16 16 10

[ 6803.941550] LTP: starting swapping01 (swapping01 -i 5)
[ 6821.098489] page:ffffea0000aa0000 refcount:1 mapcount:0 mapping:000000002243743b index:0x0
[ 6821.107077] flags: 0x1fffe000001000(reserved)
[ 6821.111534] raw: 001fffe000001000 ffffea0000aa0008 ffffea0000aa0008 0000000000000000
[ 6821.119365] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 6821.127167] page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
[ 6821.135399] page_owner info is not present (never set?)
[ 6821.140717] ------------[ cut here ]------------
[ 6821.145372] kernel BUG at mm/page_alloc.c:533!
[ 6821.150075] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 6821.150083] irq event stamp: 10075005
[ 6821.150102] hardirqs last enabled at (10075005): [<ffffffff99ea403f>] do_page_fault+0x45f/0x9d7
[ 6821.156829] CPU: 17 PID: 218 Comm: kcompactd0 Not tainted 5.7.0-rc2-next-20200423+ #7
[ 6821.160522] hardirqs last disabled at (10075004): [<ffffffff99e03ed1>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 6821.160535] softirqs last enabled at (10067158): [<ffffffff9ac00478>] __do_softirq+0x478/0x77f
[ 6821.169366] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[ 6821.169378] RIP: 0010:set_pfnblock_flags_mask+0x150/0x210
[ 6821.177257] softirqs last disabled at (10067149): [<ffffffff99ed22a6>] irq_exit+0xd6/0xf0
[ 6821.218362] Code: 1a 49 8d 7f 38 e8 80 ee 05 00 48 8b 45 a0 48 8b 55 a8 48 03 50 78 49 39 d4 72 1d 48 c7 c6 80 ee ef 9a 4c 89 ef e8 70 73 fb ff <0f> 0b 48 c7 c7 40 50 75 9b e8 c4 09 2b 00 4c 8b 65 d0 b9 3f 00 00
[ 6821.237457] RSP: 0018:ffffc900042ff858 EFLAGS: 00010282
[ 6821.242719] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff9a002382
[ 6821.249900] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8884535b8e6c
[ 6821.257384] RBP: ffffc900042ff8b8 R08: ffffed108a6b8459 R09: ffffed108a6b8459
[ 6821.264566] R10: ffff8884535c22c7 R11: ffffed108a6b8458 R12: 000000000002a800
[ 6821.271748] R13: ffffea0000aa0000 R14: ffff88847fff3000 R15: ffff88847fff3040
[ 6821.278930] FS: 0000000000000000(0000) GS:ffff888453580000(0000) knlGS:0000000000000000
[ 6821.287318] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6821.293102] CR2: 00007fd1eb4a1000 CR3: 000000083154c000 CR4: 00000000003406e0
[ 6821.300283] Call Trace:
[ 6821.302752] isolate_freepages+0xb20/0x1140
[ 6821.307167] ? isolate_freepages_block+0x730/0x730
[ 6821.311993] ? mark_held_locks+0x34/0xb0
[ 6821.315942] ? free_unref_page+0x7d/0x90
[ 6821.319891] ? free_unref_page+0x7d/0x90
[ 6821.323842] ? check_flags.part.28+0x86/0x220
[ 6821.328234] compaction_alloc+0xdd/0x100
[ 6821.332401] migrate_pages+0x304/0x17e0
[ 6821.336277] ? __ClearPageMovable+0x100/0x100
[ 6821.340674] ? isolate_freepages+0x1140/0x1140
[ 6821.345153] compact_zone+0x1249/0x1e90
[ 6821.349020] ? compaction_suitable+0x260/0x260
[ 6821.353494] kcompactd_do_work+0x231/0x650
[ 6821.357873] ? sysfs_compact_node+0x80/0x80
[ 6821.362088] ? finish_wait+0xe6/0x110
[ 6821.365775] kcompactd+0x162/0x490
[ 6821.369202] ? kcompactd_do_work+0x650/0x650
[ 6821.373501] ? finish_wait+0x110/0x110
[ 6821.377280] ? __kasan_check_read+0x11/0x20
[ 6821.381693] ? __kthread_parkme+0xd4/0xf0
[ 6821.385729] ? kcompactd_do_work+0x650/0x650
[ 6821.390027] kthread+0x1f7/0x220
[ 6821.393280] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 6821.398369] ret_from_fork+0x27/0x50
[ 6821.401968] Modules linked in: brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_amd ses kvm enclosure dax_pmem dax_pmem_core irqbypass acpi_cpufreq ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas tg3 firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
[ 6821.426127] ---[ end trace 9783087562801ccf ]---
[ 6821.430800] RIP: 0010:set_pfnblock_flags_mask+0x150/0x210
[ 6821.436410] Code: 1a 49 8d 7f 38 e8 80 ee 05 00 48 8b 45 a0 48 8b 55 a8 48 03 50 78 49 39 d4 72 1d 48 c7 c6 80 ee ef 9a 4c 89 ef e8 70 73 fb ff <0f> 0b 48 c7 c7 40 50 75 9b e8 c4 09 2b 00 4c 8b 65 d0 b9 3f 00 00
[ 6821.455319] RSP: 0018:ffffc900042ff858 EFLAGS: 00010282
[ 6821.460863] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff9a002382
[ 6821.468063] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8884535b8e6c
[ 6821.475245] RBP: ffffc900042ff8b8 R08: ffffed108a6b8459 R09: ffffed108a6b8459
[ 6821.482675] R10: ffff8884535c22c7 R11: ffffed108a6b8458 R12: 000000000002a800
[ 6821.489877] R13: ffffea0000aa0000 R14: ffff88847fff3000 R15: ffff88847fff3040
[ 6821.497062] FS: 0000000000000000(0000) GS:ffff888453580000(0000) knlGS:0000000000000000
[ 6821.505218] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6821.511284] CR2: 00007fd1eb4a1000 CR3: 000000083154c000 CR4: 00000000003406e0
[ 6821.518487] Kernel panic - not syncing: Fatal exception
[ 6821.523876] Kernel Offset: 0x18e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 6821.534915] ---[ end Kernel panic - not syncing: Fatal exception ]---
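
The addresses in the debug output and in the page dump can be cross-checked by hand. The following is a minimal sketch in C, assuming the default x86-64 vmemmap base of 0xffffea0000000000 and a 64-byte struct page. Neither value is stated in the report, and the helper names are hypothetical, although the result agrees with R12 = 0x2a800 in the register dump.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumptions (not taken from the report): default x86-64 vmemmap base
 * and sizeof(struct page) == 64.
 */
#define VMEMMAP_BASE     0xffffea0000000000ULL
#define STRUCT_PAGE_SIZE 64ULL

/* Convert a vmemmap struct page address to its page frame number. */
static uint64_t page_addr_to_pfn(uint64_t page_addr)
{
	assert(page_addr >= VMEMMAP_BASE);
	assert((page_addr - VMEMMAP_BASE) % STRUCT_PAGE_SIZE == 0);
	return (page_addr - VMEMMAP_BASE) / STRUCT_PAGE_SIZE;
}

/* Half-open range check, the same shape as the kernel's zone_spans_pfn(). */
static int range_spans_pfn(uint64_t start_pfn, uint64_t end_pfn, uint64_t pfn)
{
	return start_pfn <= pfn && pfn < end_pfn;
}
```

Under these assumptions, page ffffea0000aa0000 corresponds to pfn 0x2a800, and the "nid = 0 DMA32" line covers pfns 0x1000 up to (but not including) 0x100000, so the pfn lies inside the DMA32 range printed at boot.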

2020-04-24 03:46:55

by Baoquan He

Subject: Re: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))

On 04/23/20 at 05:25pm, Qian Cai wrote:
> Compaction starts to crash below on linux-next today. The faulty page belongs to Node 0 DMA32 zone.
> I’ll continue to narrow it down, but just want to give a headup in case someone could beat me to it.
>
> Debug output from free_area_init_core()
> [ 0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
> [ 0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
> [ 0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
> [ 0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL

Where are these printed? Are they the direct mapping addresses of the pages?

>
> I don’t understand how it could end up in such a situation. There are several recent patches look
> more related than some others.
>
> - mm: rework free_area_init*() funcitons
> https://lore.kernel.org/linux-mm/[email protected]/
> Could this somehow allow an invalid pfn to escape into the page allocator?
> Especially, is it related to skip the checks in memmap_init_zone()?
> https://lore.kernel.org/linux-mm/[email protected]

Possibly. On which arch is this happening? Do you have a boot log?


2020-04-24 13:47:54

by Qian Cai

Subject: Re: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))

> On Apr 23, 2020, at 11:43 PM, Baoquan He <[email protected]> wrote:
>
> On 04/23/20 at 05:25pm, Qian Cai wrote:
>> Compaction starts to crash below on linux-next today. The faulty page belongs to Node 0 DMA32 zone.
>> I’ll continue to narrow it down, but just want to give a headup in case someone could beat me to it.
>>
>> Debug output from free_area_init_core()
>> [ 0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
>> [ 0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
>> [ 0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
>> [ 0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL
>
> Where are these printed? They are the direct mapping address of page?

From this debug patch. Yes, direct mapping.

@@ -6072,11 +6092,17 @@ void __meminit __weak memmap_init(unsigned long size, int nid,
unsigned long start_pfn, end_pfn;
unsigned long range_end_pfn = range_start_pfn + size;
int i;
+ struct page *page = (struct page *)0xffffea0000aa0000;
+
+ printk("KK start page = %px, end page = %px, nid = %d\n", pfn_to_page(range_start_pfn), pfn_to_page(range_end_pfn), nid);

This is the config if that helps.

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

>
>>
>> I don’t understand how it could end up in such a situation. There are several recent patches look
>> more related than some others.
>>
>> - mm: rework free_area_init*() funcitons
>> https://lore.kernel.org/linux-mm/[email protected]/
>> Could this somehow allow an invalid pfn to escape into the page allocator?
>> Especially, is it related to skip the checks in memmap_init_zone()?
>> https://lore.kernel.org/linux-mm/[email protected]
>
> Possibly. In which arch is this happening? Do you have boot log?

x86,

https://cailca.github.io/files/dmesg.txt


2020-04-26 14:45:58

by Mike Rapoport

Subject: Re: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))

Hi,

On Thu, Apr 23, 2020 at 05:25:56PM -0400, Qian Cai wrote:
> Compaction starts to crash below on linux-next today. The faulty page belongs to Node 0 DMA32 zone.
> I’ll continue to narrow it down, but just want to give a headup in case someone could beat me to it.
>
> Debug output from free_area_init_core()
> [ 0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
> [ 0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
> [ 0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
> [ 0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL
>
> I don’t understand how it could end up in such a situation. There are several recent patches look
> more related than some others.

Can you please add "mminit_loglevel=4 memblock=debug" to the kernel
command line?

> - mm: rework free_area_init*() funcitons
> https://lore.kernel.org/linux-mm/[email protected]/
> Could this somehow allow an invalid pfn to escape into the page allocator?
> Especially, is it related to skip the checks in memmap_init_zone()?
> https://lore.kernel.org/linux-mm/[email protected]

--
Sincerely yours,
Mike.

2020-04-27 13:49:20

by Qian Cai

Subject: Re: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))

> On Apr 26, 2020, at 10:41 AM, Mike Rapoport <[email protected]> wrote:
>
> Hi,
>
> On Thu, Apr 23, 2020 at 05:25:56PM -0400, Qian Cai wrote:
>> Compaction starts to crash below on linux-next today. The faulty page belongs to Node 0 DMA32 zone.
>> I’ll continue to narrow it down, but just want to give a headup in case someone could beat me to it.
>>
>> Debug output from free_area_init_core()
>> [ 0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
>> [ 0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
>> [ 0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
>> [ 0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL
>>
>> I don’t understand how it could end up in such a situation. There are several recent patches look
>> more related than some others.
>
> Can you please add "mminit_loglevel=4 memblock=debug" to the kernel
> command line?

https://cailca.github.io/files/dmesg.txt

2020-05-05 12:45:13

by Baoquan He

Subject: Re: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))

Hi,

On 04/24/20 at 09:45am, Qian Cai wrote:
>
>
> > On Apr 23, 2020, at 11:43 PM, Baoquan He <[email protected]> wrote:
> >
> > On 04/23/20 at 05:25pm, Qian Cai wrote:
> >> Compaction starts to crash below on linux-next today. The faulty page belongs to Node 0 DMA32 zone.
> >> I’ll continue to narrow it down, but just want to give a headup in case someone could beat me to it.
> >>
> >> Debug output from free_area_init_core()
> >> [ 0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
> >> [ 0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
> >> [ 0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
> >> [ 0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL
> >
> > Where are these printed? They are the direct mapping address of page?
>
> From this debug patch. Yes, direct mapping.

Can you try the patch below? I think I may understand what causes this,
though I am not sure this is the right place to fix it.

diff --git a/mm/compaction.c b/mm/compaction.c
index 177c11a8f3b9..e26972f26414 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1409,7 +1409,9 @@ fast_isolate_freepages(struct compact_control *cc)
 				cc->free_pfn = highest;
 			} else {
 				if (cc->direct_compaction && pfn_valid(min_pfn)) {
-					page = pfn_to_page(min_pfn);
+					page = pageblock_pfn_to_page(min_pfn,
+						pageblock_end_pfn(min_pfn),
+						cc->zone);
 					cc->free_pfn = min_pfn;
 				}
 			}
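
The idea behind the change above can be modelled in userspace. The sketch below is not kernel code: `model_pageblock_pfn_to_page()`, `MODEL_NO_ZONE` and the `zone_of_pfn` array are hypothetical stand-ins, and the checks are only loosely modelled on verifying that the first and last page of a block belong to the zone being compacted; the exact kernel checks may differ. It illustrates that such a lookup yields nothing for a block that cannot be attributed to the zone, whereas a bare pfn_to_page() returns a page regardless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker for a pfn with no usable memmap entry (models !pfn_valid()). */
#define MODEL_NO_ZONE (-1)

/*
 * Hypothetical userspace model, not kernel code. zone_of_pfn[pfn] holds
 * the zone id recorded for each pfn. Returns start_pfn when the block
 * [start_pfn, end_pfn) can be attributed to zone_id, or -1 otherwise.
 */
static int64_t model_pageblock_pfn_to_page(const int *zone_of_pfn,
					   size_t nr_pfns,
					   uint64_t start_pfn,
					   uint64_t end_pfn,
					   int zone_id)
{
	assert(zone_of_pfn != NULL && start_pfn < end_pfn);

	end_pfn--;	/* last pfn inside the block */

	if (end_pfn >= nr_pfns)
		return -1;
	if (zone_of_pfn[start_pfn] == MODEL_NO_ZONE ||
	    zone_of_pfn[end_pfn] == MODEL_NO_ZONE)
		return -1;
	/* The first page must be in the zone being compacted. */
	if (zone_of_pfn[start_pfn] != zone_id)
		return -1;
	/* Both ends of the block must agree on the zone. */
	if (zone_of_pfn[end_pfn] != zone_of_pfn[start_pfn])
		return -1;

	return (int64_t)start_pfn;
}
```

In this model, a block whose first page is recorded against a different zone yields -1, which corresponds to the patched branch leaving `page` as NULL instead of handing a foreign page to the rest of the free scanner.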

2020-05-05 13:23:25

by Qian Cai

Subject: Re: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))

> On May 5, 2020, at 8:43 AM, Baoquan He <[email protected]> wrote:
>
> Hi,
>
> On 04/24/20 at 09:45am, Qian Cai wrote:
>>
>>
>>> On Apr 23, 2020, at 11:43 PM, Baoquan He <[email protected]> wrote:
>>>
>>> On 04/23/20 at 05:25pm, Qian Cai wrote:
>>>> Compaction starts to crash below on linux-next today. The faulty page belongs to Node 0 DMA32 zone.
>>>> I’ll continue to narrow it down, but just want to give a headup in case someone could beat me to it.
>>>>
>>>> Debug output from free_area_init_core()
>>>> [ 0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
>>>> [ 0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
>>>> [ 0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
>>>> [ 0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL
>>>
>>> Where are these printed? They are the direct mapping address of page?
>>
>> From this debug patch. Yes, direct mapping.
>
> Can you try below patch? I may get why this is caused, not sure if the
> place is right.
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 177c11a8f3b9..e26972f26414 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1409,7 +1409,9 @@ fast_isolate_freepages(struct compact_control *cc)
> cc->free_pfn = highest;
> } else {
> if (cc->direct_compaction && pfn_valid(min_pfn)) {
> - page = pfn_to_page(min_pfn);
> + page = pageblock_pfn_to_page(min_pfn,
> + pageblock_end_pfn(min_pfn),
> + cc->zone);
> cc->free_pfn = min_pfn;
> }
> }

I have not managed to reproduce this again yet, but feel free to move forward with the patch anyway if you are comfortable doing so, so that people can at least review it properly.

2020-05-11 01:23:28

by Baoquan He

Subject: Re: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))

On 05/05/20 at 09:20am, Qian Cai wrote:
>
>
> > On May 5, 2020, at 8:43 AM, Baoquan He <[email protected]> wrote:
> >
> > Hi,
> >
> > On 04/24/20 at 09:45am, Qian Cai wrote:
> >>
> >>
> >>> On Apr 23, 2020, at 11:43 PM, Baoquan He <[email protected]> wrote:
> >>>
> >>> On 04/23/20 at 05:25pm, Qian Cai wrote:
> >>>> Compaction starts to crash below on linux-next today. The faulty page belongs to Node 0 DMA32 zone.
> >>>> I’ll continue to narrow it down, but just want to give a headup in case someone could beat me to it.
> >>>>
> >>>> Debug output from free_area_init_core()
> >>>> [ 0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
> >>>> [ 0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
> >>>> [ 0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
> >>>> [ 0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL
> >>>
> >>> Where are these printed? They are the direct mapping address of page?
> >>
> >> From this debug patch. Yes, direct mapping.
> >
> > Can you try below patch? I may get why this is caused, not sure if the
> > place is right.
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 177c11a8f3b9..e26972f26414 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1409,7 +1409,9 @@ fast_isolate_freepages(struct compact_control *cc)
> > cc->free_pfn = highest;
> > } else {
> > if (cc->direct_compaction && pfn_valid(min_pfn)) {
> > - page = pfn_to_page(min_pfn);
> > + page = pageblock_pfn_to_page(min_pfn,
> > + pageblock_end_pfn(min_pfn),
> > + cc->zone);
> > cc->free_pfn = min_pfn;
> > }
> > }
>
> I have not had luck to reproduce this again yet, but feel free to move forward with the patch anyway if you are comfortable to do so, so at least people could review it properly.

OK, I will prepare a patch with the details in the commit log and post it. Thanks.

2020-11-21 19:48:02

by Andrea Arcangeli

Subject: [PATCH 0/1] VM_BUG_ON_PAGE(!zone_spans_pfn) in set_pfnblock_flags_mask

Hello,

After hitting this twice on two different systems, I'm now running
with the tentative fix applied, but that is not a meaningful test since
the bug is not reproducible on demand.

However it is possible to inject this bug if you do "grep E820
/proc/iomem" and then find a phys addr there with a struct page
(i.e. pfn_valid) in a zone, with this change:

min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));
+ if (cc->zone is the zone where the e820 physaddr has a pfn_valid)
+ min_pfn = physaddr_of_E820_non_RAM_page_with_valid_pfn >> PAGE_SHIFT;

I didn't try to inject the bug to validate the fix and it'd be great
if someone can try that to validate this or any other fix.

Andrea Arcangeli (1):
mm: compaction: avoid fast_isolate_around() to set pageblock_skip on
reserved pages

mm/compaction.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
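
The pfn arithmetic behind the injection recipe above can be sketched as follows, assuming 4 KiB pages (a page shift of 12) and 512-page pageblocks, neither of which the message states. The helper names are hypothetical, and any physical address passed in is a placeholder for one taken from the "grep E820 /proc/iomem" output.

```c
#include <stdint.h>

/* Assumptions: 4 KiB pages and 512-page pageblocks. */
#define SKETCH_PAGE_SHIFT         12
#define SKETCH_PAGEBLOCK_NR_PAGES 512ULL

/* Physical address to page frame number. */
static uint64_t sketch_phys_to_pfn(uint64_t phys_addr)
{
	return phys_addr >> SKETCH_PAGE_SHIFT;
}

/* Round a pfn down to the first pfn of its pageblock. */
static uint64_t sketch_pageblock_start_pfn(uint64_t pfn)
{
	return pfn & ~(SKETCH_PAGEBLOCK_NR_PAGES - 1);
}

/* First pfn after the pageblock that contains pfn. */
static uint64_t sketch_pageblock_end_pfn(uint64_t pfn)
{
	return sketch_pageblock_start_pfn(pfn) + SKETCH_PAGEBLOCK_NR_PAGES;
}
```

For instance, under these assumptions a physical address of 0x2a800000 gives pfn 0x2a800, which is already pageblock-aligned, and its pageblock ends at pfn 0x2aa00.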