2018-07-12 17:26:46

by Abdul Haleem

Subject: [next-20180711][Oops] linux-next kernel boot is broken on powerpc

Greetings,

Today's linux-next kernel fails to boot on powerpc.

kernel : 4.18.0-rc4-next-20180711

Boot is completely broken. I see an Oops message, and the faulting
instruction maps to the code path below:

# gdb -batch vmlinux -ex 'list *(0xc000000000d175e8)'
0xc000000000d175e8 is in __free_pages_bootmem (mm/page_alloc.c:1270).
1265 set_page_count(p, 0);
1266 }
1267 __ClearPageReserved(p);
1268 set_page_count(p, 0);
1269
1270 page_zone(page)->managed_pages += nr_pages;
1271 set_page_refcounted(page);
1272 __free_pages(page, order);
1273 }
1274

On a few machines I see the trace logs below:
------------------------------------
vmemmap_populate: Unable to cre[ 0.000000] vmemmap_populate: Unable to create vmemmap mapping: -1
sparse_mem_maps_populate_node: sparsemem memory map backing failed some memory will not be available
vmemmap_populate: Unable to create vmemmap mapping: -1
sparse_mem_maps_populate_node: sparsemem memory map backing failed some memory will not be available
vmemmap_populate: Unable to create vmemmap mapping: -1
sparse_mem_maps_populate_node: sparsemem memory map backing failed some memory will not be available
vmemmap_populate: Unable to create vmemmap mapping: -1
sparse_mem_maps_populate_node: sparsemem memory map b[ 0.000000] percpu: Embedded 4 pages/cpu @(____ptrval____) s168728 r0 d93416 u262144
Built 2 zonelists, mobility grouping on. Total pages: 167772
Policy zone: DMA
Kernel command line: rw root=/dev/mapper/rhel_ci--s822l1--lp10-root
BUG: Bad page state in process swapper pfn:01e01
page:f000000000078040 count:0 mapcount:1 mapping:0000000000000000 index:0x0
flags: 0x0()
raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
page dumped because: nonzero mapcount
Modules linked in:
CPU: 3 PID: 0 Comm: swapper Not tainted 4.18.0-rc4-next-20180711-autotest-autotest #1
Call Trace:
[c00000000112fd20] [c0000000009a920c] dump_stack+0xb0/0xf4 (unreliable)
[c00000000112fd60] [c0000000002859ac] bad_page+0x11c/0x190
[c00000000112fdf0] [c0000000002873d0] __free_pages_ok+0x3f0/0x400
[c00000000112fe60] [c000000000d1e45c] free_all_bootmem+0x184/0x218
[c00000000112fee0] [c000000000cf33fc] mem_init+0x40/0x60
[c00000000112ff00] [c000000000ce402c] start_kernel+0x2a4/0x5dc
[c00000000112ff90] [c00000000000ac7c] start_here_common+0x1c/0x520
Disabling lock debugging due to kernel taint
BUG: Bad page state in process swapper pfn:01e02
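
For triage, the call-trace frames and bad-page PFNs in a log like the one above can be pulled out with a short script. This is only a sketch: the regular expressions assume the powerpc trace layout shown here ([stack] [address] symbol+0xoff/0xsize), and the names parse_frames, parse_bad_pfns and SAMPLE are hypothetical helpers for illustration, not part of any kernel tooling.

```python
import re

# Assumed frame layout, taken from the trace above:
#   [c00000000112fd60] [c0000000002859ac] bad_page+0x11c/0x190
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9a-f]{16})\]\s+\[(?P<ip>[0-9a-f]{16})\]\s+"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Assumed report layout: "BUG: Bad page state in process swapper pfn:01e01"
PFN_RE = re.compile(r"BUG: Bad page state in process \S+\s+pfn:(?P<pfn>[0-9a-f]+)")

# A few lines copied from the log above, used as sample input.
SAMPLE = """\
BUG: Bad page state in process swapper pfn:01e01
[c00000000112fd60] [c0000000002859ac] bad_page+0x11c/0x190
[c00000000112fdf0] [c0000000002873d0] __free_pages_ok+0x3f0/0x400
BUG: Bad page state in process swapper pfn:01e02
"""


def parse_frames(log):
    """Return (symbol, offset, size) for every call-trace frame in the log."""
    return [
        (m["sym"], int(m["off"], 16), int(m["size"], 16))
        for m in FRAME_RE.finditer(log)
    ]


def parse_bad_pfns(log):
    """Return the page frame numbers reported as 'Bad page state'."""
    return [int(m["pfn"], 16) for m in PFN_RE.finditer(log)]


def as_symbol_refs(frames):
    """Format frames back into 'symbol+0xoff/0xsize' strings."""
    return ["%s+0x%x/0x%x" % frame for frame in frames]
```

The strings returned by as_symbol_refs() have the symbol+offset/size form that scripts/faddr2line in the kernel tree takes, which may be a convenient alternative to the gdb lookup when only the trace is at hand.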


Could the related commit be one of those below? I see lots of mm-related patches and could not bisect.

5479976fda7d3ab23ba0a4eb4d60b296eb88b866 mm: page_alloc: restore memblock_next_valid_pfn() on arm/arm64
41619b27b5696e7e5ef76d9c692dd7342c1ad7eb mm-drop-vm_bug_on-from-__get_free_pages-fix
531bbe6bd2721f4b66cdb0f5cf5ac14612fa1419 mm: drop VM_BUG_ON from __get_free_pages
479350dd1a35f8bfb2534697e5ca68ee8a6e8dea mm, page_alloc: actually ignore mempolicies for high priority allocations
088018f6fe571444caaeb16e84c9f24f22dfc8b0 mm: skip invalid pages block at a time in zero_resv_unresv()


--
Regards,

Abdul Haleem
IBM Linux Technology Centre



Attachments:
bootlogs.txt (47.02 kB)
ZZ-VM-config (149.70 kB)

2018-07-12 17:46:30

by Pavel Tatashin

Subject: Re: [next-20180711][Oops] linux-next kernel boot is broken on powerpc

> Could the related commit be one of those below? I see lots of mm-related patches and could not bisect.
>
> 5479976fda7d3ab23ba0a4eb4d60b296eb88b866 mm: page_alloc: restore memblock_next_valid_pfn() on arm/arm64
> 41619b27b5696e7e5ef76d9c692dd7342c1ad7eb mm-drop-vm_bug_on-from-__get_free_pages-fix
> 531bbe6bd2721f4b66cdb0f5cf5ac14612fa1419 mm: drop VM_BUG_ON from __get_free_pages
> 479350dd1a35f8bfb2534697e5ca68ee8a6e8dea mm, page_alloc: actually ignore mempolicies for high priority allocations
> 088018f6fe571444caaeb16e84c9f24f22dfc8b0 mm: skip invalid pages block at a time in zero_resv_unresv()

Looks like:
0ba29a108979 mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER

This patch is going to be reverted from linux-next. Abdul, please
verify that the issue is gone once you revert this patch.

Thank you,
Pavel

2018-07-13 09:14:57

by Abdul Haleem

Subject: Re: [next-20180711][Oops] linux-next kernel boot is broken on powerpc

On Thu, 2018-07-12 at 13:44 -0400, Pavel Tatashin wrote:
> > Could the related commit be one of those below? I see lots of mm-related patches and could not bisect.
> >
> > 5479976fda7d3ab23ba0a4eb4d60b296eb88b866 mm: page_alloc: restore memblock_next_valid_pfn() on arm/arm64
> > 41619b27b5696e7e5ef76d9c692dd7342c1ad7eb mm-drop-vm_bug_on-from-__get_free_pages-fix
> > 531bbe6bd2721f4b66cdb0f5cf5ac14612fa1419 mm: drop VM_BUG_ON from __get_free_pages
> > 479350dd1a35f8bfb2534697e5ca68ee8a6e8dea mm, page_alloc: actually ignore mempolicies for high priority allocations
> > 088018f6fe571444caaeb16e84c9f24f22dfc8b0 mm: skip invalid pages block at a time in zero_resv_unresv()
>
> Looks like:
> 0ba29a108979 mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
>
> This patch is going to be reverted from linux-next. Abdul, please
> verify that the issue is gone once you revert this patch.

The kernel booted fine with the above patch reverted.

--
Regards,

Abdul Haleem
IBM Linux Technology Centre




2018-07-14 00:56:21

by Stephen Rothwell

Subject: Re: [next-20180711][Oops] linux-next kernel boot is broken on powerpc

Hi Abdul,

On Fri, 13 Jul 2018 14:43:11 +0530 Abdul Haleem wrote:
>
> On Thu, 2018-07-12 at 13:44 -0400, Pavel Tatashin wrote:
> > > Could the related commit be one of those below? I see lots of mm-related patches and could not bisect.
> > >
> > > 5479976fda7d3ab23ba0a4eb4d60b296eb88b866 mm: page_alloc: restore memblock_next_valid_pfn() on arm/arm64
> > > 41619b27b5696e7e5ef76d9c692dd7342c1ad7eb mm-drop-vm_bug_on-from-__get_free_pages-fix
> > > 531bbe6bd2721f4b66cdb0f5cf5ac14612fa1419 mm: drop VM_BUG_ON from __get_free_pages
> > > 479350dd1a35f8bfb2534697e5ca68ee8a6e8dea mm, page_alloc: actually ignore mempolicies for high priority allocations
> > > 088018f6fe571444caaeb16e84c9f24f22dfc8b0 mm: skip invalid pages block at a time in zero_resv_unresv()
> >
> > Looks like:
> > 0ba29a108979 mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> >
> > This patch is going to be reverted from linux-next. Abdul, please
> > verify that the issue is gone once you revert this patch.
>
> The kernel booted fine with the above patch reverted.

And it has been removed from linux-next as of next-20180713. (Friday
the 13th is not all bad :-))
--
Cheers,
Stephen Rothwell


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2018-07-17 10:50:26

by Abdul Haleem

Subject: Re: [next-20180711][Oops] linux-next kernel boot is broken on powerpc

On Sat, 2018-07-14 at 10:55 +1000, Stephen Rothwell wrote:
> Hi Abdul,
>
> On Fri, 13 Jul 2018 14:43:11 +0530 Abdul Haleem wrote:
> >
> > On Thu, 2018-07-12 at 13:44 -0400, Pavel Tatashin wrote:
> > > > Could the related commit be one of those below? I see lots of mm-related patches and could not bisect.
> > > >
> > > > 5479976fda7d3ab23ba0a4eb4d60b296eb88b866 mm: page_alloc: restore memblock_next_valid_pfn() on arm/arm64
> > > > 41619b27b5696e7e5ef76d9c692dd7342c1ad7eb mm-drop-vm_bug_on-from-__get_free_pages-fix
> > > > 531bbe6bd2721f4b66cdb0f5cf5ac14612fa1419 mm: drop VM_BUG_ON from __get_free_pages
> > > > 479350dd1a35f8bfb2534697e5ca68ee8a6e8dea mm, page_alloc: actually ignore mempolicies for high priority allocations
> > > > 088018f6fe571444caaeb16e84c9f24f22dfc8b0 mm: skip invalid pages block at a time in zero_resv_unresv()
> > >
> > > Looks like:
> > > 0ba29a108979 mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> > >
> > > This patch is going to be reverted from linux-next. Abdul, please
> > > verify that the issue is gone once you revert this patch.
> >
> > The kernel booted fine with the above patch reverted.
>
> And it has been removed from linux-next as of next-20180713. (Friday
> the 13th is not all bad :-))

Hi Stephen,

After reverting 0ba29a108979, boot on our bare-metal machines fails with
a kernel panic. Is this related?

I have attached the boot logs.

--
Regards,

Abdul Haleem
IBM Linux Technology Centre



Attachments:
bootlogs.txxt (101.12 kB)

2018-07-18 02:02:03

by Pavel Tatashin

Subject: Re: [next-20180711][Oops] linux-next kernel boot is broken on powerpc

On Tue, Jul 17, 2018 at 6:49 AM Abdul Haleem wrote:
>
> On Sat, 2018-07-14 at 10:55 +1000, Stephen Rothwell wrote:
> > Hi Abdul,
> >
> > On Fri, 13 Jul 2018 14:43:11 +0530 Abdul Haleem wrote:
> > >
> > > On Thu, 2018-07-12 at 13:44 -0400, Pavel Tatashin wrote:
> > > > > Could the related commit be one of those below? I see lots of mm-related patches and could not bisect.
> > > > >
> > > > > 5479976fda7d3ab23ba0a4eb4d60b296eb88b866 mm: page_alloc: restore memblock_next_valid_pfn() on arm/arm64
> > > > > 41619b27b5696e7e5ef76d9c692dd7342c1ad7eb mm-drop-vm_bug_on-from-__get_free_pages-fix
> > > > > 531bbe6bd2721f4b66cdb0f5cf5ac14612fa1419 mm: drop VM_BUG_ON from __get_free_pages
> > > > > 479350dd1a35f8bfb2534697e5ca68ee8a6e8dea mm, page_alloc: actually ignore mempolicies for high priority allocations
> > > > > 088018f6fe571444caaeb16e84c9f24f22dfc8b0 mm: skip invalid pages block at a time in zero_resv_unresv()
> > > >
> > > > Looks like:
> > > > 0ba29a108979 mm/sparse: Remove CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
> > > >
> > > > This patch is going to be reverted from linux-next. Abdul, please
> > > > verify that the issue is gone once you revert this patch.
> > >
> > > The kernel booted fine with the above patch reverted.
> >
> > And it has been removed from linux-next as of next-20180713. (Friday
> > the 13th is not all bad :-))
>
> Hi Stephen,
>
> After reverting 0ba29a108979, boot on our bare-metal machines fails with
> a kernel panic. Is this related?
>
> I have attached the boot logs.

The panic happens much later in boot and looks unrelated to the
sparse_init changes.

Thank you,
Pavel