2022-03-31 04:12:33

by Steven Rostedt

Subject: [BUG] Crash on x86_32 for: mm: page_alloc: avoid merging non-fallbackable pageblocks with others

I started testing new patches and it crashed when doing the x86-32 test on
boot up.

Initializing HighMem for node 0 (000375fe:0021ee00)
BUG: kernel NULL pointer dereference, address: 00000878
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
*pdpt = 0000000000000000 *pde = f0000000f000eef3
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-test+ #469
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: get_pfnblock_flags_mask+0x2c/0x36
Code: 6d ea ff 55 89 e5 56 89 ce 53 8b 18 89 d8 c1 eb 1e e8 f7 fb ff ff 69 db c0 02 00 00 89 c1 89 c2 c1 ea 05 8b 83 7c d7 79 c1 5b <8b> 04 90 d3 e8 21 f0 5e 5d c3 55 89 e5 57 56 89 d6 53 89 c3 64 a1
EAX: 00000000 EBX: f75f6000 ECX: 000043dc EDX: 0000021e
ESI: 00000007 EDI: 00000000 EBP: c15d9e34 ESP: c15d9e30
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210007
CR0: 80050033 CR2: 00000878 CR3: 01a48000 CR4: 000406b0
Call Trace:
__free_one_page+0x168/0x22a
free_pcppages_bulk+0xf0/0x1b9
free_unref_page_commit+0xe4/0xed
free_unref_page+0x77/0x9b
free_the_page+0x16/0x18
__free_pages+0x22/0x51
add_highpages_with_active_regions+0xbb/0xea
set_highmem_pages_init+0x69/0x7a
mem_init+0x2d/0x141
start_kernel+0x353/0x5f4
? set_intr_gate+0x47/0x5a
? early_idt_handler_common+0x44/0x44
i386_start_kernel+0x48/0x4a
startup_32_smp+0x161/0x164
Modules linked in:
CR2: 0000000000000878
---[ end trace 0000000000000000 ]---


I bisected it down to:

1dd214b8f21ca46d5431be9b2db8513c59e07a26
mm: page_alloc: avoid merging non-fallbackable pageblocks with others

To confirm, I went back to Linus's master branch (from last night) and
verified that it crashes. I then reverted this commit and recompiled, and it
booted up fine without it.

Attached is the config that crashed.

-- Steve


Attachments:
(No filename) (1.83 kB)
config-bad (159.26 kB)

2022-03-31 04:21:59

by Vlastimil Babka

Subject: Re: [BUG] Crash on x86_32 for: mm: page_alloc: avoid merging non-fallbackable pageblocks with others

On 3/30/22 22:05, Linus Torvalds wrote:
> On Wed, Mar 30, 2022 at 12:42 PM Steven Rostedt <[email protected]> wrote:
>>
>> I started testing new patches and it crashed when doing the x86-32 test on
>> boot up.
>>
>> Initializing HighMem for node 0 (000375fe:0021ee00)
>> BUG: kernel NULL pointer dereference, address: 00000878
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> *pdpt = 0000000000000000 *pde = f0000000f000eef3
>> Oops: 0000 [#1] PREEMPT SMP PTI
>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-test+ #469
>> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
>> EIP: get_pfnblock_flags_mask+0x2c/0x36
>> Code: 6d ea ff 55 89 e5 56 89 ce 53 8b 18 89 d8 c1 eb 1e e8 f7 fb ff ff 69 db c0 02 00 00 89 c1 89 c2 c1 ea 05 8b 83 7c d7 79 c1 5b <8b> 04 90 d3 e8 21 f0 5e 5d c3 55 89 e5 57 56 89 d6 53 89 c3 64 a1
>
> The whole function is in that Code: thing, and it decodes to:
>
>    0:   55                      push   %ebp
>    1:   89 e5                   mov    %esp,%ebp
>    3:   56                      push   %esi
>    4:   89 ce                   mov    %ecx,%esi
>    6:   53                      push   %ebx
>    7:   8b 18                   mov    (%eax),%ebx
>    9:   89 d8                   mov    %ebx,%eax
>    b:   c1 eb 1e                shr    $0x1e,%ebx
>    e:   e8 f7 fb ff ff          call   0xfffffc0a
>   13:   69 db c0 02 00 00       imul   $0x2c0,%ebx,%ebx
>   19:   89 c1                   mov    %eax,%ecx
>   1b:   89 c2                   mov    %eax,%edx
>   1d:   c1 ea 05                shr    $0x5,%edx
>   20:   8b 83 7c d7 79 c1       mov    -0x3e862884(%ebx),%eax
>   26:   5b                      pop    %ebx
>   27:*  8b 04 90                mov    (%eax,%edx,4),%eax     <-- trapping instruction
>   2a:   d3 e8                   shr    %cl,%eax
>   2c:   21 f0                   and    %esi,%eax
>   2e:   5e                      pop    %esi
>   2f:   5d                      pop    %ebp
>   30:   c3                      ret
>
> with '%eax' being NULL, and %edx being 0x21e.
>
> (The call seems to be to 'pfn_to_bitidx().isra.0' if my compiler does
> similar code generation, so it's out-of-lined part of pfn_to_bitidx()
> despite being marked inline)
>
> So that oops is that
>
> word = bitmap[word_bitidx];
>
> line, with 'bitmap' being NULL (and %edx contains 'word_bitidx').
>
> Looking around, your 'config-bad' doesn't even have
> CONFIG_MEMORY_ISOLATION enabled, and so I suspect the culprit is this
> part of the change:
>
> - if (unlikely(has_isolate_pageblock(zone))) {
>
> which used to always be false for that config, and now the code is
> suddenly enabled.

If CONFIG_MEMORY_ISOLATION were enabled, the zone layout would be the
same, so I think it's not simply that. I think it's the timing:
has_isolate_pageblock(zone) could only become true later at runtime,
when some isolation is ongoing, but here we seem to still be in early
boot. Probably we are at a boundary of highmem with another zone that
doesn't have its pageblock bitmap initialized yet? Later it would be
initialized, and all would be fine.

As Zi Yan said, the usual merging code will, through page_is_buddy(),
safely find that the buddy is not applicable, so I agree with the
direction of his patch. This also seems to show that the code tried to be
too smart, and for the next merge window we should try simply moving the
migratetype checks into the main while loop (under something like "if
(order >= max_order)") and simplifying the function a lot, hopefully with
negligible perf impact.

> Alternatively, that code just can't deal with highmem properly.
>
> But I didn't really analyze things, I'm mainly doing pattern matching here.
>
> Zi Yan - and all the people who ack'ed and reviewed this - please take
> a deeper look..
>
> Linus

2022-03-31 04:27:46

by Zi Yan

Subject: Re: [BUG] Crash on x86_32 for: mm: page_alloc: avoid merging non-fallbackable pageblocks with others

On 30 Mar 2022, at 16:05, Linus Torvalds wrote:

> On Wed, Mar 30, 2022 at 12:42 PM Steven Rostedt <[email protected]> wrote:
>>
>> I started testing new patches and it crashed when doing the x86-32 test on
>> boot up.
>>
>> Initializing HighMem for node 0 (000375fe:0021ee00)
>> BUG: kernel NULL pointer dereference, address: 00000878
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> *pdpt = 0000000000000000 *pde = f0000000f000eef3
>> Oops: 0000 [#1] PREEMPT SMP PTI
>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-test+ #469
>> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
>> EIP: get_pfnblock_flags_mask+0x2c/0x36
>> Code: 6d ea ff 55 89 e5 56 89 ce 53 8b 18 89 d8 c1 eb 1e e8 f7 fb ff ff 69 db c0 02 00 00 89 c1 89 c2 c1 ea 05 8b 83 7c d7 79 c1 5b <8b> 04 90 d3 e8 21 f0 5e 5d c3 55 89 e5 57 56 89 d6 53 89 c3 64 a1
>
> The whole function is in that Code: thing, and it decodes to:
>
>    0:   55                      push   %ebp
>    1:   89 e5                   mov    %esp,%ebp
>    3:   56                      push   %esi
>    4:   89 ce                   mov    %ecx,%esi
>    6:   53                      push   %ebx
>    7:   8b 18                   mov    (%eax),%ebx
>    9:   89 d8                   mov    %ebx,%eax
>    b:   c1 eb 1e                shr    $0x1e,%ebx
>    e:   e8 f7 fb ff ff          call   0xfffffc0a
>   13:   69 db c0 02 00 00       imul   $0x2c0,%ebx,%ebx
>   19:   89 c1                   mov    %eax,%ecx
>   1b:   89 c2                   mov    %eax,%edx
>   1d:   c1 ea 05                shr    $0x5,%edx
>   20:   8b 83 7c d7 79 c1       mov    -0x3e862884(%ebx),%eax
>   26:   5b                      pop    %ebx
>   27:*  8b 04 90                mov    (%eax,%edx,4),%eax     <-- trapping instruction
>   2a:   d3 e8                   shr    %cl,%eax
>   2c:   21 f0                   and    %esi,%eax
>   2e:   5e                      pop    %esi
>   2f:   5d                      pop    %ebp
>   30:   c3                      ret
>
> with '%eax' being NULL, and %edx being 0x21e.
>
> (The call seems to be to 'pfn_to_bitidx().isra.0' if my compiler does
> similar code generation, so it's out-of-lined part of pfn_to_bitidx()
> despite being marked inline)
>
> So that oops is that
>
> word = bitmap[word_bitidx];
>
> line, with 'bitmap' being NULL (and %edx contains 'word_bitidx').
>
> Looking around, your 'config-bad' doesn't even have
> CONFIG_MEMORY_ISOLATION enabled, and so I suspect the culprit is this
> part of the change:
>
> - if (unlikely(has_isolate_pageblock(zone))) {
>
> which used to always be false for that config, and now the code is
> suddenly enabled.
>
> Alternatively, that code just can't deal with highmem properly.
>
> But I didn't really analyze things, I'm mainly doing pattern matching here.
>
> Zi Yan - and all the people who ack'ed and reviewed this - please take
> a deeper look..
>

In the original code, it will jump back to continue_merging and still try
to find the buddy. The crash means the found buddy is not valid, since the
pageblock bitmap used to look up its migratetype is NULL. That seems to
suggest the physical memory range is not aligned to MAX_ORDER_NR_PAGES,
which should not be the case. But in the original code the
if (!page_is_buddy(page, buddy, order)) check prevents further buddy
merging. I must be missing something.


Hi Steven,

Can you try the patch below to see if it fixes the crash? Thanks.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdc8f60ae462..83a90e2973b7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1108,6 +1108,8 @@ static inline void __free_one_page(struct page *page,
 
 		buddy_pfn = __find_buddy_pfn(pfn, order);
 		buddy = page + (buddy_pfn - pfn);
+		if (!page_is_buddy(page, buddy, order))
+			goto done_merging;
 		buddy_mt = get_pageblock_migratetype(buddy);
 
 		if (migratetype != buddy_mt


--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2022-03-31 04:58:58

by Linus Torvalds

Subject: Re: [BUG] Crash on x86_32 for: mm: page_alloc: avoid merging non-fallbackable pageblocks with others

On Wed, Mar 30, 2022 at 12:42 PM Steven Rostedt <[email protected]> wrote:
>
> I started testing new patches and it crashed when doing the x86-32 test on
> boot up.
>
> Initializing HighMem for node 0 (000375fe:0021ee00)
> BUG: kernel NULL pointer dereference, address: 00000878
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> *pdpt = 0000000000000000 *pde = f0000000f000eef3
> Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-test+ #469
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> EIP: get_pfnblock_flags_mask+0x2c/0x36
> Code: 6d ea ff 55 89 e5 56 89 ce 53 8b 18 89 d8 c1 eb 1e e8 f7 fb ff ff 69 db c0 02 00 00 89 c1 89 c2 c1 ea 05 8b 83 7c d7 79 c1 5b <8b> 04 90 d3 e8 21 f0 5e 5d c3 55 89 e5 57 56 89 d6 53 89 c3 64 a1

The whole function is in that Code: thing, and it decodes to:

   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   56                      push   %esi
   4:   89 ce                   mov    %ecx,%esi
   6:   53                      push   %ebx
   7:   8b 18                   mov    (%eax),%ebx
   9:   89 d8                   mov    %ebx,%eax
   b:   c1 eb 1e                shr    $0x1e,%ebx
   e:   e8 f7 fb ff ff          call   0xfffffc0a
  13:   69 db c0 02 00 00       imul   $0x2c0,%ebx,%ebx
  19:   89 c1                   mov    %eax,%ecx
  1b:   89 c2                   mov    %eax,%edx
  1d:   c1 ea 05                shr    $0x5,%edx
  20:   8b 83 7c d7 79 c1       mov    -0x3e862884(%ebx),%eax
  26:   5b                      pop    %ebx
  27:*  8b 04 90                mov    (%eax,%edx,4),%eax     <-- trapping instruction
  2a:   d3 e8                   shr    %cl,%eax
  2c:   21 f0                   and    %esi,%eax
  2e:   5e                      pop    %esi
  2f:   5d                      pop    %ebp
  30:   c3                      ret

with '%eax' being NULL, and %edx being 0x21e.

(The call seems to be to 'pfn_to_bitidx().isra.0' if my compiler does
similar code generation, so it's out-of-lined part of pfn_to_bitidx()
despite being marked inline)

So that oops is that

word = bitmap[word_bitidx];

line, with 'bitmap' being NULL (and %edx contains 'word_bitidx').

Looking around, your 'config-bad' doesn't even have
CONFIG_MEMORY_ISOLATION enabled, and so I suspect the culprit is this
part of the change:

- if (unlikely(has_isolate_pageblock(zone))) {

which used to always be false for that config, and now the code is
suddenly enabled.

Alternatively, that code just can't deal with highmem properly.

But I didn't really analyze things, I'm mainly doing pattern matching here.

Zi Yan - and all the people who ack'ed and reviewed this - please take
a deeper look..

Linus