LinuxLists.cc - [PATCH v2] mm: access to uninitialized struct page

2018-04-26 20:28:27

Subject: [PATCH v2] mm: access to uninitialized struct page

The following two bugs were reported by Fengguang Wu:

kernel reboot-without-warning in early-boot stage, last printk:
early console in setup code

http://lkml.kernel.org/r/[email protected]

And, also:
[per_cpu_ptr_to_phys] PANIC: early exception 0x0d
IP 10:ffffffffa892f15f error 0 cr2 0xffff88001fbff000

http://lkml.kernel.org/r/[email protected]

Both problems are caused by accessing an uninitialized struct page from
trap_init(). We must first call mm_init() in order to initialize allocated
struct pages, and then we can access fields of any struct page that belongs
to memory that has been allocated.

Below is an explanation of the root cause.

The issue arises in this stack:

start_kernel()
 trap_init()
  setup_cpu_entry_areas()
   setup_cpu_entry_area(cpu)
    get_cpu_gdt_paddr(cpu)
     per_cpu_ptr_to_phys(addr)
      pcpu_addr_to_page(addr)
       virt_to_page(addr)
        pfn_to_page(__pa(addr) >> PAGE_SHIFT)

The returned "struct page" is sometimes uninitialized, and thus fails
later when used. It is uninitialized only sometimes because the outcome
depends on the KASLR offset.

When boot fails, pfn_to_page() is called with these values:
kaslr: 0x000000000d600000
addr: ffffffff83e0d000
pa: 1040d000
pfn: 1040d
page: ffff88001f113340
page->flags ffffffffffffffff <- Uninitialized!

When boot is successful:
kaslr: 0x000000000a800000
addr: ffffffff83e0d000
pa: d60d000
pfn: d60d
page: ffff88001f05b340
page->flags 280000000000 <- Initialized!

Here are physical addresses that BIOS provided to us:
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000001ffdffff] usable
BIOS-e820: [mem 0x000000001ffe0000-0x000000001fffffff] reserved
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved

In both cases, working and non-working, the physical address minus the
KASLR offset is the same:

pa - kaslr = 0x2E0D000

The only thing that differs is the PFN.

We initialize struct pages in four places:

1. Early in boot, a small set of struct pages is initialized to fill
   the first section and the lower zones.
2. During mm_init(), we initialize struct pages for all memory that is
   allocated, i.e. reserved in memblock.
3. On demand, when pages are allocated after the mm_init() call.
4. After smp_init(), when the remaining free deferred pages are
   initialized.

The above path runs before deferred memory is initialized, and thus
it must be covered by 1, 2 or 3.

So, let's check which PFNs are initialized after (1).

memmap_init_zone() is called for pfn ranges:
1 - 1000, and 1000 - 1ffe0, but it quits after reaching pfn 0x10000,
as it leaves the rest to be initialized as deferred pages.

In the working scenario the pfn (0xd60d) ended up below 0x10000, but in
the failing scenario (0x1040d) it is above. Hence, this page must be
initialized in (2). But trap_init() is called before mm_init().

The bug was introduced by "mm: initialize pages on demand during boot"
because that commit lowered the number of pages initialized in step
(1). However, the bug could have occurred even before it, because the
number of pages initialized early was always a guess.

The current fix moves trap_init() so that it is called after mm_init().
As an alternative, we could increase pgdat->static_init_pgcnt in
free_area_init_node():

	pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
					 pgdat->node_spanned_pages);

Instead of one PAGES_PER_SECTION we could set several, so that the text
is covered for all KASLR offsets. But this would still be guessing.
Therefore, I prefer the current fix.

Fixes: c9e97a1997fb ("mm: initialize pages on demand during boot")

Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
---
init/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index b795aa341a3a..870f75581cea 100644
--- a/init/main.c
+++ b/init/main.c
@@ -585,8 +585,8 @@ asmlinkage __visible void __init start_kernel(void)
 	setup_log_buf(0);
 	vfs_caches_init_early();
 	sort_main_extable();
-	trap_init();
 	mm_init();
+	trap_init();

 	ftrace_init();

--
2.17.0

2018-04-30 23:28:35

by Andrew Morton

Subject: Re: [PATCH v2] mm: access to uninitialized struct page

On Thu, 26 Apr 2018 16:26:19 -0400 Pavel Tatashin <[email protected]> wrote:

> The following two bugs were reported by Fengguang Wu:
>
> kernel reboot-without-warning in early-boot stage, last printk:
> early console in setup code
>
> http://lkml.kernel.org/r/[email protected]
>
> ...
>
> --- a/init/main.c
> +++ b/init/main.c
> @@ -585,8 +585,8 @@ asmlinkage __visible void __init start_kernel(void)
> setup_log_buf(0);
> vfs_caches_init_early();
> sort_main_extable();
> - trap_init();
> mm_init();
> + trap_init();
>
> ftrace_init();

Gulp. Let's hope that nothing in mm_init() requires that trap_init()
has been run. What happens if something goes wrong during mm_init()
and the architecture attempts to raise a software exception, hits a bus
error, div-by-zero, etc, etc? Might there be hard-to-discover
dependencies in such a case?

2018-04-30 23:59:28

by Steven Rostedt

Subject: Re: [PATCH v2] mm: access to uninitialized struct page

On Mon, 30 Apr 2018 16:26:58 -0700
Andrew Morton <[email protected]> wrote:

> On Thu, 26 Apr 2018 16:26:19 -0400 Pavel Tatashin <[email protected]> wrote:
>
> > The following two bugs were reported by Fengguang Wu:
> >
> > kernel reboot-without-warning in early-boot stage, last printk:
> > early console in setup code
> >
> > http://lkml.kernel.org/r/[email protected]
> >
> > ...
> >
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -585,8 +585,8 @@ asmlinkage __visible void __init start_kernel(void)
> > setup_log_buf(0);
> > vfs_caches_init_early();
> > sort_main_extable();
> > - trap_init();
> > mm_init();
> > + trap_init();
> >
> > ftrace_init();
>
> Gulp. Let's hope that nothing in mm_init() requires that trap_init()
> has been run. What happens if something goes wrong during mm_init()
> and the architecture attempts to raise a software exception, hits a bus
> error, div-by-zero, etc, etc? Might there be hard-to-discover
> dependencies in such a case?

I mentioned the same thing.

-- Steve

2018-05-01 00:02:23

by Andrew Morton

Subject: Re: [PATCH v2] mm: access to uninitialized struct page

On Mon, 30 Apr 2018 19:58:58 -0400 Steven Rostedt <[email protected]> wrote:

> On Mon, 30 Apr 2018 16:26:58 -0700
> Andrew Morton <[email protected]> wrote:
>
> > On Thu, 26 Apr 2018 16:26:19 -0400 Pavel Tatashin <[email protected]> wrote:
> >
> > > The following two bugs were reported by Fengguang Wu:
> > >
> > > kernel reboot-without-warning in early-boot stage, last printk:
> > > early console in setup code
> > >
> > > http://lkml.kernel.org/r/[email protected]
> > >
> > > ...
> > >
> > > --- a/init/main.c
> > > +++ b/init/main.c
> > > @@ -585,8 +585,8 @@ asmlinkage __visible void __init start_kernel(void)
> > > setup_log_buf(0);
> > > vfs_caches_init_early();
> > > sort_main_extable();
> > > - trap_init();
> > > mm_init();
> > > + trap_init();
> > >
> > > ftrace_init();
> >
> > Gulp. Let's hope that nothing in mm_init() requires that trap_init()
> > has been run. What happens if something goes wrong during mm_init()
> > and the architecture attempts to raise a software exception, hits a bus
> > error, div-by-zero, etc, etc? Might there be hard-to-discover
> > dependencies in such a case?
>
> I mentioned the same thing.
>

I guess the same concern applies to all the code which we've always run
before trap_init(), and that's quite a lot of stuff. So we should be
OK. But don't quote me ;)

2018-05-04 08:28:50

by Andrei Vagin

Subject: Re: [v2] mm: access to uninitialized struct page

Hello,

We have a robot which runs criu tests on linux-next kernels.

All tests passed on 4.17.0-rc3-next-20180502.

But the 4.17.0-rc3-next-20180504 kernel didn't boot.

git bisect points to this patch.

On Thu, Apr 26, 2018 at 04:26:19PM -0400, Pavel Tatashin wrote:
> The following two bugs were reported by Fengguang Wu:
>
> kernel reboot-without-warning in early-boot stage, last printk:
> early console in setup code
>
> http://lkml.kernel.org/r/[email protected]

The problem looks similar to this one.

[ 5.596975] devtmpfs: mounted
[ 5.855754] Freeing unused kernel memory: 1704K
[ 5.858162] Write protecting the kernel read-only data: 18432k
[ 5.860772] Freeing unused kernel memory: 2012K
[ 5.861838] Freeing unused kernel memory: 160K
[ 5.862572] rodata_test: all tests were successful
[ 5.866857] random: fast init done
early console in setup code
[ 0.000000] Linux version 4.17.0-rc3-00023-g7c4cc2d022a1
(avagin@laptop) (gcc version 8.0.1 20180324 (Red Hat 8.0.1-0.20) (GCC))
#13 SMP Fri May 4 01:10:51 PDT 2018
[ 0.000000] Command line: root=/dev/vda2 ro debug
console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect
selinux=0 earlyprintk=serial,ttyS0,115200
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating
point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds
registers'

$ git describe HEAD
v4.17-rc3-23-g7c4cc2d022a1

[avagin@laptop linux-next]$ git log --pretty=oneline | head -n 1
7c4cc2d022a1fd56eb2ee555533b8666bc780f1e mm: access to uninitialized struct page


2018-05-04 12:51:38

by Pavel Tatashin

Subject: Re: [v2] mm: access to uninitialized struct page

Hi Andrei,

Could you please provide me with scripts to reproduce this issue?

Thank you,
Pavel

2018-05-04 14:51:31

by Steven Rostedt

Subject: Re: [v2] mm: access to uninitialized struct page

On Fri, 04 May 2018 12:47:53 +0000
Pavel Tatashin <[email protected]> wrote:

> Hi Andrei,
>
> Could you please provide me with scripts to reproduce this issue?
>
>

And the config that was used. Just saying that the commit doesn't boot
isn't very useful.

-- Steve

2018-05-04 16:04:18

by Andrei Vagin

Subject: Re: [v2] mm: access to uninitialized struct page

On Fri, May 04, 2018 at 12:47:53PM +0000, Pavel Tatashin wrote:
> Hi Andrei,
>
> Could you please provide me with scripts to reproduce this issue?

I boot this kernel in a kvm virtual machine. The kernel is built without
modules. A config file is attached.

Here is the qemu command line that I use to reproduce the problem:

qemu-kvm -kernel /home/avagin/git/linux-next/arch/x86/boot/bzImage \
-append 'root=/dev/vda2 ro debug console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect selinux=0 earlyprintk=serial,ttyS0,115200' \
-boot c \
-smp 2,sockets=2,cores=1,threads=1 \
-drive file=/home/vms/fc22.img,format=raw,if=none,id=drive-virtio-disk0 \
--display none \
-serial telnet:127.0.0.1:4444,server,nowait -cpu Skylake-Client-IBRS,ss=on,hypervisor=on,tsc_adjust=on,clflushopt=on,xsaves=on,pdpe1gb=on,ibpb=on \
-m 4096 \
-realtime mlock=off \
-machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

[avagin@laptop linux-next]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
stepping : 3
microcode : 0xc2
cpu MHz : 1213.986
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves ibpb ibrs stibp dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs : cpu_meltdown spectre_v1 spectre_v2
bogomips : 4992.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:


Attachments:

(No filename) (9.76 kB)
.config (93.09 kB)

2018-05-04 16:05:52

by Pavel Tatashin

Subject: Re: [v2] mm: access to uninitialized struct page

Thank you, I will try to figure out what is happening.

Pavel

On 05/04/2018 12:01 PM, Andrei Vagin wrote:
> On Fri, May 04, 2018 at 12:47:53PM +0000, Pavel Tatashin wrote:
>> Hi Andrei,
>>
>> Could you please provide me with scripts to reproduce this issue?
>
> I boot this kernel in a kvm virtual machine. The kernel is built without
> modules. A config file is attahced.
>
> Here is a qemu command line what I use to reproduce the problem:
>
> qemu-kvm -kernel /home/avagin/git/linux-next/arch/x86/boot/bzImage \
> -append 'root=/dev/vda2 ro debug console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect selinux=0 earlyprintk=serial,ttyS0,115200' \
> -boot c \
> -smp 2,sockets=2,cores=1,threads=1 \
> -drive file=/home/vms/fc22.img,format=raw,if=none,id=drive-virtio-disk0 \
> --display none \
> -serial telnet:127.0.0.1:4444,server,nowait -cpu Skylake-Client-IBRS,ss=on,hypervisor=on,tsc_adjust=on,clflushopt=on,xsaves=on,pdpe1gb=on,ibpb=on \
> -m 4096 \
> -realtime mlock=off \
> -machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off \
> -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 \
> -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 \
> -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 \
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
>
>
> [avagin@laptop linux-next]$ cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 78
> model name : Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
> stepping : 3
> microcode : 0xc2
> cpu MHz : 1213.986
> cache size : 3072 KB
> physical id : 0
> siblings : 4
> core id : 0
> cpu cores : 2
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 22
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves ibpb ibrs stibp dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
> bugs : cpu_meltdown spectre_v1 spectre_v2
> bogomips : 4992.00
> clflush size : 64
> cache_alignment : 64
> address sizes : 39 bits physical, 48 bits virtual
> power management:
>
>>
>> Thank you,
>> Pavel
>> On Fri, May 4, 2018 at 4:27 AM Andrei Vagin <[email protected]> wrote:
>>
>>> Hello,
>>
>>> We have a robot which runs criu tests on linux-next kernels.
>>
>>> All tests passed on 4.17.0-rc3-next-20180502.
>>
>>> But the 4.17.0-rc3-next-20180504 kernel didn't boot.
>>
>>> git bisect points to this patch.
>>
>>> On Thu, Apr 26, 2018 at 04:26:19PM -0400, Pavel Tatashin wrote:
>>>> The following two bugs were reported by Fengguang Wu:
>>>>
>>>> kernel reboot-without-warning in early-boot stage, last printk:
>>>> early console in setup code
>>>>
>>>>
>> http://lkml.kernel.org/r/[email protected]
>>
>>> The problem looks similar to this one.
>>
>>> [ 5.596975] devtmpfs: mounted
>>> [ 5.855754] Freeing unused kernel memory: 1704K
>>> [ 5.858162] Write protecting the kernel read-only data: 18432k
>>> [ 5.860772] Freeing unused kernel memory: 2012K
>>> [ 5.861838] Freeing unused kernel memory: 160K
>>> [ 5.862572] rodata_test: all tests were successful
>>> [ 5.866857] random: fast init done
>>> early console in setup code
>>> [ 0.000000] Linux version 4.17.0-rc3-00023-g7c4cc2d022a1
>>> (avagin@laptop) (gcc version 8.0.1 20180324 (Red Hat 8.0.1-0.20) (GCC))
>>> #13 SMP Fri May 4 01:10:51 PDT 2018
>>> [ 0.000000] Command line: root=/dev/vda2 ro debug
>>> console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect
>>> selinux=0 earlyprintk=serial,ttyS0,115200
>>> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating
>>> point registers'
>>> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
>>> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
>>> [ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds
>>> registers'
>>
>>> $ git describe HEAD
>>> v4.17-rc3-23-g7c4cc2d022a1
>>
>>> [avagin@laptop linux-next]$ git log --pretty=oneline | head -n 1
>>> 7c4cc2d022a1fd56eb2ee555533b8666bc780f1e mm: access to uninitialized struct page
>>
>>
>>>>
>>>> And, also:
>>>> [per_cpu_ptr_to_phys] PANIC: early exception 0x0d
>>>> IP 10:ffffffffa892f15f error 0 cr2 0xffff88001fbff000
>>>>
>>>>
>> http://lkml.kernel.org/r/[email protected]
>>>>
>>>> Both of the problems are due to accessing uninitialized struct page from
>>>> trap_init(). We must first do mm_init() in order to initialize allocated
>>>> struct pages, and then we can access fields of any struct page that
>>>> belongs to memory that's been allocated.
>>>>
>>>> Below is explanation of the root cause.
>>>>
>>>> The issue arises in this stack:
>>>>
>>>> start_kernel()
>>>> trap_init()
>>>> setup_cpu_entry_areas()
>>>> setup_cpu_entry_area(cpu)
>>>> get_cpu_gdt_paddr(cpu)
>>>> per_cpu_ptr_to_phys(addr)
>>>> pcpu_addr_to_page(addr)
>>>> virt_to_page(addr)
>>>> pfn_to_page(__pa(addr) >> PAGE_SHIFT)
>>>> The returned "struct page" is sometimes uninitialized, and thus
>>>> failing later when used. It turns out sometimes is because it depends
>>>> on KASLR.
>>>>
>>>> When boot is failing we have this when pfn_to_page() is called:
>>>> kaslr: 0x000000000d600000
>>>> addr: ffffffff83e0d000
>>>> pa: 1040d000
>>>> pfn: 1040d
>>>> page: ffff88001f113340
>>>> page->flags ffffffffffffffff <- Uninitialized!
>>>>
>>>> When boot is successful:
>>>> kaslr: 0x000000000a800000
>>>> addr: ffffffff83e0d000
>>>> pa: d60d000
>>>> pfn: d60d
>>>> page: ffff88001f05b340
>>>> page->flags 280000000000 <- Initialized!
>>>>
>>>> Here are physical addresses that BIOS provided to us:
>>>> e820: BIOS-provided physical RAM map:
>>>> BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
>>>> BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
>>>> BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
>>>> BIOS-e820: [mem 0x0000000000100000-0x000000001ffdffff] usable
>>>> BIOS-e820: [mem 0x000000001ffe0000-0x000000001fffffff] reserved
>>>> BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
>>>> BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
>>>>
>>>> In both cases, working and non-working the real physical address is
>>>> the same:
>>>>
>>>> pa - kaslr = 0x2E0D000
>>>>
>>>> The only thing that is different is PFN.
>>>>
>>>> We initialize struct pages in four places:
>>>>
>>>> 1. Early in boot a small set of struct pages is initialized to fill
>>>> the first section, and lower zones.
>>>> 2. During mm_init() we initialize "struct pages" for all the memory
>>>> that is allocated, i.e. reserved in memblock.
>>>> 3. Using on-demand logic when pages are allocated after the mm_init() call.
>>>> 4. After smp_init(), when the remaining free deferred pages are initialized.
>>>>
>>>> The above path happens before deferred memory is initialized, and thus
>>>> it must be covered either by 1, 2 or 3.
>>>>
>>>> So, let's check what PFNs are initialized after (1).
>>>>
>>>> memmap_init_zone() is called for pfn ranges:
>>>> 1 - 1000, and 1000 - 1ffe0, but it quits after reaching pfn 0x10000,
>>>> as it leaves the rest to be initialized as deferred pages.
>>>>
>>>> In the working scenario the pfn (0xd60d) ended up being below 0x10000,
>>>> but in the failing scenario (0x1040d) it is above. Hence, we must
>>>> initialize this page in (2). But trap_init() is called before mm_init().
>>>>
>>>> The bug was introduced by "mm: initialize pages on demand during boot"
>>>> because we lowered the number of pages that are initialized in step
>>>> (1). But it could still have happened before, because the number of
>>>> initialized pages was a guess.
>>>>
>>>> The current fix moves trap_init() to be called after mm_init, but as
>>>> alternative, we could increase pgdat->static_init_pgcnt:
>>>> In free_area_init_node we can increase:
>>>> pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
>>>> pgdat->node_spanned_pages);
>>>> Instead of one PAGES_PER_SECTION, set several, so the text is
>>>> covered for all KASLR offsets. But, this would still be guessing.
>>>> Therefore, I prefer the current fix.
>>>>
>>>> Fixes: c9e97a1997fb ("mm: initialize pages on demand during boot")
>>>>
>>>> Signed-off-by: Pavel Tatashin <[email protected]>
>>>> Reviewed-by: Steven Rostedt (VMware) <[email protected]>
>>>> ---
>>>> init/main.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/init/main.c b/init/main.c
>>>> index b795aa341a3a..870f75581cea 100644
>>>> --- a/init/main.c
>>>> +++ b/init/main.c
>>>> @@ -585,8 +585,8 @@ asmlinkage __visible void __init start_kernel(void)
>>>> setup_log_buf(0);
>>>> vfs_caches_init_early();
>>>> sort_main_extable();
>>>> - trap_init();
>>>> mm_init();
>>>> + trap_init();
>>>>
>>>> ftrace_init();
>>>>

2018-05-04 17:50:26

by Andy Shevchenko

Subject: Re: [v2] mm: access to uninitialized struct page

On Fri, May 4, 2018 at 7:03 PM, Pavel Tatashin
<[email protected]> wrote:
> Thank you, I will try to figure out what is happening.

+1 here.

The last messages I saw on the console are:

[ 4.690972] Non-volatile memory driver v1.3
[ 4.703360] Linux agpgart interface v0.103
[ 4.710282] loop: module loaded

Bisection points to this very patch.

I would suggest reverting ASAP; you can still continue
investigating on your side.

--
With Best Regards,
Andy Shevchenko

2018-05-05 01:05:17

by Fengguang Wu

Subject: Re: [v2] mm: access to uninitialized struct page

Hi Pavel,

FYI, here is 0day's bisect result. The attached dmesg has a reproduce
script at the bottom.

[27e2ce5dba4c30db031744c8140675d03d2ae7aa] mm: access to uninitialized struct page
git://git.cmpxchg.org/linux-mmotm.git devel-catchup-201805041701

git bisect start 53eff77ad4b0adaf1ca6e1ecc6acf3804c344531 6da6c0db5316275015e8cc2959f12a17584aeb64 --
git bisect bad 3fc24705ffb48c18b23ce2c229f5018d39b18ab0 # 20:22 B 2Merge 'djwong-xfs/djwong-devel' into devel-catchup-201805041701
git bisect bad 78bd9ee71ffbbe5ab169cbe469503af2dfb913f9 # 20:35 B 2Merge 'linux-review/Geert-Uytterhoeven/dt-bindings-can-rcar_can-Fix-R8A7796-SoC-name/20180504-154952' into devel-catchup-201805041701
git bisect bad 1deba87932c5d0adcffe63d8ce4847f39e864775 # 20:56 B 2Merge 'yhuang/fix_thp_swap' into devel-catchup-201805041701
git bisect good e0997365e1e89e8bc9f5ed4a58a6cd2500b58668 # 21:09 G 20day base guard for 'devel-catchup-201805041701'
git bisect bad 98815bfa9156a8d1da1f1ca5f3748e250fa19a88 # 21:22 B 2Merge 'yhuang/thp_delay_split3_r1a' into devel-catchup-201805041701
git bisect good 6da6c0db5316275015e8cc2959f12a17584aeb64 # 21:22 G 3Linux v4.17-rc3
git bisect bad 97c561bb48a33e135a90573c596bc755ed4eab32 # 21:36 B 2mm, pagemap: Hide swap entry for unprivileged users
git bisect bad 466b08a3a87e8e43af677375a4cf8eb105f50007 # 21:47 B 2mm, swap: fix race between swapoff and some swap operations
git bisect bad 44ea77b7788384a7c27a960ec5752de08e47882a # 21:58 B 2zram-introduce-zram-memory-tracking-fix
git bisect bad 7a53abd52e920e8a2c16ea2cb81439e6e87a7ea4 # 22:06 B 2prctl: add PR_[GS]ET_PDEATHSIG_PROC
git bisect good 1fda92fccc022924575edb98191f1ad2c0477c31 # 22:15 G 2z3fold-fix-reclaim-lock-ups-checkpatch-fixes
git bisect bad 685cc80b632235416b72869247df7d6ae2816d61 # 22:30 B 2mm: migrate: fix double call of radix_tree_replace_slot()
git bisect bad 27e2ce5dba4c30db031744c8140675d03d2ae7aa # 22:55 B 2mm: access to uninitialized struct page
git bisect good 7a0e68e17b8aa41aa33e8c80015e36d47dde390a # 23:11 G 2mm: sections are not offlined during memory hotremove
# extra tests on first bad commit
# bad: [27e2ce5dba4c30db031744c8140675d03d2ae7aa] mm: access to uninitialized struct page
# extra tests on parent commit
# good: [7a0e68e17b8aa41aa33e8c80015e36d47dde390a] mm: sections are not offlined during memory hotremove

tests: 2
testcase/path_params/tbox_group/run: boot/1/vm-vp-quantal-x86_64

7a0e68e17b8aa41a 27e2ce5dba4c30db031744c814
---------------- --------------------------
fail:runs %reproduction fail:runs
| | |
:2 100% 2:2 dmesg.BUG:kernel_reboot-without-warning_in_boot_stage

testcase/path_params/tbox_group/run: boot/1/vm-lkp-nex04-yocto-x86_64

7a0e68e17b8aa41a 27e2ce5dba4c30db031744c814
---------------- --------------------------
:4 100% 4:4 dmesg.BUG:kernel_reboot-without-warning_in_boot_stage

Thanks,
Fengguang

Attachments:

(No filename) (3.06 kB)
dmesg.xz (8.56 kB)
config-4.17.0-rc3-00012-g27e2ce5 (118.12 kB)

2018-05-08 14:46:36

by Pavel Tatashin

Subject: Re: [PATCH v2] mm: access to uninitialized struct page

> Gulp. Let's hope that nothing in mm_init() requires that trap_init()
> has been run. What happens if something goes wrong during mm_init()
> and the architecture attempts to raise a software exception, hits a bus
> error, div-by-zero, etc, etc? Might there be hard-to-discover
> dependencies in such a case?

Hi Andrew,

Unfortunately, mm_init() requires trap_init(). And, because trap_init() is
arch specific, I do not see a way to simply fix trap_init(). So, we need
to find a different fix for the above problem. And, the current fix needs
to be removed from mm.

BTW, the bug was not introduced by:
c9e97a1997fb ("mm: initialize pages on demand during boot")

Fengguang Wu reproduced this bug with builds from before this patch was
added. So, I think that while my patch may make this problem happen more
frequently, the problem itself is older. Basically, it depends on the value
of the KASLR offset.
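
The dependence on the KASLR offset can be checked with the numbers from the
patch description. The following is only a sketch: the helper names are
hypothetical, the 0x10000 limit is the pfn at which the description says the
early pass stops, and 0x2E0D000 is the "pa - kaslr" value reported there.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                    /* 4 KiB pages, as on x86_64 */
#define EARLY_INIT_PFN_LIMIT 0x10000ULL  /* early pass stops here, per the patch description */
#define UNRANDOMIZED_PA 0x2E0D000ULL     /* "pa - kaslr" from the patch description */

/* Same shift as pfn_to_page(__pa(addr) >> PAGE_SHIFT), applied to a physical address. */
static uint64_t pa_to_pfn(uint64_t pa)
{
    return pa >> PAGE_SHIFT;
}

/* PFN of the page in question for a given KASLR offset (hypothetical helper). */
static uint64_t pfn_for_offset(uint64_t kaslr_offset)
{
    return pa_to_pfn(UNRANDOMIZED_PA + kaslr_offset);
}

/* True if the early pass alone would already cover this struct page. */
static bool initialized_before_mm_init(uint64_t kaslr_offset)
{
    return pfn_for_offset(kaslr_offset) < EARLY_INIT_PFN_LIMIT;
}
```

With the two offsets from the report, 0x0a800000 gives pfn 0xd60d (below the
limit) and 0x0d600000 gives pfn 0x1040d (above it), which matches the working
and failing boots.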

One way to quickly fix this issue is to disable deferred struct pages when
the following combination is true:
CONFIG_RANDOMIZE_BASE && CONFIG_SPARSEMEM && !CONFIG_SPARSEMEM_VMEMMAP

RANDOMIZE_BASE means we do not know which PFNs will need initialized struct
pages before mm_init().
CONFIG_SPARSEMEM && !CONFIG_SPARSEMEM_VMEMMAP means that page_to_pfn() will
use information from page->flags to get the section number, and thus
requires that the "struct page" it reads is already initialized.
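
To illustrate why the section lookup needs initialized flags, here is a
deliberately simplified user-space model, not the kernel's code. The field
width, table sizes, and function names are all assumptions made for the
example; the only point is that the section number is decoded from the upper
bits of page->flags, so an all-ones value decodes to a bogus section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes chosen for the model, not the kernel's values. */
#define SECTIONS_WIDTH     4                      /* bits of flags holding the section */
#define SECTIONS_PGSHIFT   (64 - SECTIONS_WIDTH)  /* section lives in the top bits */
#define PAGES_PER_SECTION  8
#define NR_SECTIONS        4                      /* sections that actually exist */

struct page {
    uint64_t flags;
};

/* One small memmap per section, standing in for the per-section mem_map. */
static struct page mem_map[NR_SECTIONS][PAGES_PER_SECTION];

/* What the early/mm_init() pass does for each page in this model. */
static void init_page(struct page *p, unsigned int section)
{
    p->flags = (uint64_t)section << SECTIONS_PGSHIFT;
}

/* Decode the section number from the flags word. */
static unsigned int page_to_section(const struct page *p)
{
    return (unsigned int)(p->flags >> SECTIONS_PGSHIFT);
}

/* Returns the pfn, or -1 when the decoded section does not exist. */
static long model_page_to_pfn(const struct page *p)
{
    unsigned int sec = page_to_section(p);

    if (sec >= NR_SECTIONS)
        return -1; /* garbage flags: the kernel has no such check and would fault */
    return (long)sec * PAGES_PER_SECTION + (long)(p - mem_map[sec]);
}
```

In this model an initialized page in section 2, slot 3 maps to pfn 19, while a
page whose flags are still 0xffffffffffffffff decodes to section 15, which
does not exist.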

Pavel