2014-01-14 13:31:39

by Fengguang Wu

Subject: [x86, kaslr] BUG: kernel boot hang

Greetings,

I got the below dmesg and the first bad commit is

commit 82fa9637a2ba285bcc7c5050c73010b2c1b3d803
Author: Kees Cook <[email protected]>
AuthorDate: Thu Oct 10 17:18:16 2013 -0700
Commit: H. Peter Anvin <[email protected]>
CommitDate: Sun Oct 13 03:12:19 2013 -0700

x86, kaslr: Select random position from e820 maps

Counts available alignment positions across all e820 maps, and chooses
one randomly for the new kernel base address, making sure not to collide
with unsafe memory areas.

Signed-off-by: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
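
As a rough illustration of the idea in that commit message (this is not the kernel's code), the sketch below enumerates aligned candidate bases inside usable memory ranges and skips any that would overlap an area to avoid. Every name and number in it (ALIGN, IMAGE, REGIONS, AVOID, the MiB values) is a made-up assumption for the example.

```shell
# Hypothetical layout, in MiB to keep the numbers small.
ALIGN=2                      # alignment step between candidate bases
IMAGE=8                      # space the kernel image needs
REGIONS="16:64 128:160"      # usable RAM ranges, start:end (end exclusive)
AVOID="40:52"                # ranges that must not be overwritten (e.g. an initrd)

# overlaps S1 E1 S2 E2: succeed if [S1,E1) and [S2,E2) intersect.
overlaps() {
    [ "$1" -lt "$4" ] && [ "$3" -lt "$2" ]
}

# list_slots: print every aligned base where the image fits inside a
# usable region without touching an avoided range.
list_slots() {
    for r in $REGIONS; do
        end=${r#*:}
        # Round the region start up to the next aligned address.
        pos=$(( (${r%:*} + ALIGN - 1) / ALIGN * ALIGN ))
        while [ $((pos + IMAGE)) -le "$end" ]; do
            safe=1
            for a in $AVOID; do
                if overlaps "$pos" $((pos + IMAGE)) "${a%:*}" "${a#*:}"; then
                    safe=0
                fi
            done
            if [ "$safe" -eq 1 ]; then echo "$pos"; fi
            pos=$((pos + ALIGN))
        done
    done
}

# count_slots: print how many candidate bases exist.
count_slots() {
    set -- $(list_slots)
    echo "$#"
}

# pick_slot INDEX: print the INDEX-th candidate (0-based, modulo the count).
# A real implementation would derive INDEX from a random source.
pick_slot() {
    set -- "$1" $(list_slots)
    idx=$1
    shift
    [ "$#" -gt 0 ] || return 1
    shift $((idx % $#))
    echo "$1"
}
```

With these made-up values, count_slots reports the number of safe positions and pick_slot maps an index onto one of them; an off-by-one in the overlap test is the kind of bug that would let the image land on top of an avoided area.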

Note that there are many other warnings/errors and it's not very
reproducible, so this report might be wrong.

===================================================
PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
===================================================

+-----------------------------------------------------------+--------------+--------------+
| | 5bfce5ef55cb | 1955a14a5ba6 |
+-----------------------------------------------------------+--------------+--------------+
| boot_successes | 3948 | 0 |
| boot_failures | 52 | 89 |
| page_allocation_failure:order:,mode | 48 | 2 |
| Out_of_memory:Kill_process | 7 | |
| BUG:kernel_early_hang_without_any_printk_output | 1 | |
| BUG:soft_lockup-CPU_stuck_for_s | 1 | |
| WARNING:CPU:PID:at_kernel/locking/lockdep.c:check_flags() | 0 | 85 |
| general_protection_fault:SMP_SMP | 0 | 1 |
| RIP:__lock_acquire | 0 | 1 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 1 |
| BUG:kernel_boot_hang | 0 | 2 |
| BUG:kernel_boot_crashed | 0 | 1 |
+-----------------------------------------------------------+--------------+--------------+
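
To put the two columns on a common scale, the boot failure rate can be computed from the boot_successes and boot_failures rows. This is only a small sketch; the helper name failure_rate is made up for the example.

```shell
# failure_rate SUCCESSES FAILURES: print the failure percentage with one decimal.
failure_rate() {
    awk -v ok="$1" -v bad="$2" 'BEGIN { printf "%.1f\n", bad * 100 / (ok + bad) }'
}

failure_rate 3948 52   # parent commit 5bfce5ef55cb
failure_rate 0 89      # linux-next head 1955a14a5ba6
```

The parent is not perfectly clean, which is what the "PARENT COMMIT NOT CLEAN" banner warns about, but its failure rate is a small fraction of the rate seen on the bad head.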

The last dmesg is

[ 0.803796] Initramfs unpacking failed: junk in compressed archive
[ 0.803796] Initramfs unpacking failed: junk in compressed archive

or in some cases

[ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] [mem 0x00000000-0x000fffff] page 4k
[ 0.000000] BRK [0x07886000, 0x07886fff] PGTABLE
[ 0.000000] BRK [0x07887000, 0x07887fff] PGTABLE
[ 0.000000] BRK [0x07888000, 0x07888fff] PGTABLE
PANIC: early exception 0e rip 10:ffffffff86204c6e error 0 cr2 ffffffff81972b28
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0-rc4-00008-g6e6a493 #614
PANIC: early exception 0e rip 10:ffffffff86204f22 error 0 cr2 ffffffff81972b28

git bisect start 7ddcdb2ccdcae0838a39b1bf7b0773c5540da847 v3.12 --
git bisect good c537aba00e3f1df8ce6c7c9fcb98b82c0c2d1d2c # 06:55 790+ 10 Merge git://git.kvack.org/~bcrl/aio-next
git bisect good a707271a8180eb60edc3bd9dc3cb425c7547fd76 # 11:26 790+ 9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good e0e8fc3148e0e142079611f82089258fb3b46f00 # 16:33 790+ 16 Merge remote-tracking branch 'thermal/next'
git bisect good f7f483b7c64695af05e1ce58cc23001631281fe4 # 21:08 790+ 14 Merge remote-tracking branch 'vfio/next'
git bisect bad 19eee7a401b4a2316f1ebcdc7e66b44d6d7ea963 # 21:39 36- 1 Merge remote-tracking branch 'leds/for-next'
git bisect bad aa3a911ccf01099f146cda01793bad03d970bc0f # 21:56 54- 1 Merge remote-tracking branch 'edac-amd/for-next'
git bisect good a11acbbed1bf2fd3204027d2b2ac6246daf14445 # 00:27 790+ 8 Merge remote-tracking branch 'dt-rh/for-next'
git bisect bad b8057368d6c29db538ff259266a4375200c3d029 # 01:09 66- 3 Merge remote-tracking branch 'tip/auto-latest'
git bisect good 592410b1ba9daf61c3bde92762a43eac58000850 # 06:02 790+ 15 Merge remote-tracking branch 'spi/for-next'
git bisect good fa6e8e5f7cbf85f364ebd5a90525dbbe9de2083b # 11:29 790+ 9 Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
git bisect good 230d47dedb5763acfaf6842c85d9d24f97fc3ee2 # 16:02 790+ 16 Merge branch 'sched/core'
git bisect good c0ffb3570ae6e152490abcf1a0241e0924db3176 # 20:38 790+ 14 manual merge of x86/efi
git bisect bad 88ec17ea98ebac2a6306bc14810046493c96e27f # 21:09 102- 3 Merge branch 'x86/kaslr'
git bisect good 939379960e3fcd81cb8b709b67afe4c7f097dcc8 # 02:19 1000+ 14 Merge branch 'x86/efi-kexec'
git bisect bad 6e6a4932b0f569b1a5bb4fcbf5dde1b1a42f01bb # 02:38 0- 3 x86, boot: Rename get_flags() and check_flags() to *_cpuflags()
git bisect good 5bfce5ef55cbe78ee2ee6e97f2e26a8a582008f3 # 07:35 1000+ 11 x86, kaslr: Provide randomness functions
git bisect bad f32360ef6608434a032dc7ad262d45e9693c27f3 # 07:48 0- 4 x86, kaslr: Report kernel offset on panic
git bisect bad 82fa9637a2ba285bcc7c5050c73010b2c1b3d803 # 08:00 1- 2 x86, kaslr: Select random position from e820 maps
# first bad commit: [82fa9637a2ba285bcc7c5050c73010b2c1b3d803] x86, kaslr: Select random position from e820 maps
git bisect good 5bfce5ef55cbe78ee2ee6e97f2e26a8a582008f3 # 20:21 3000+ 52 x86, kaslr: Provide randomness functions
git bisect bad 1955a14a5ba6e3c3b11117812d11dc550ccc37ae # 20:21 0- 89 Add linux-next specific files for 20140110
git bisect good a815a5e98e7bf4d75b2515fd08f464ca24f52c85 # 08:40 3000+ 3000 Revert "x86, kaslr: Select random position from e820 maps"
git bisect good a6da83f98267bc8ee4e34aa899169991eb0ceb93 # 19:21 3000+ 41 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
git bisect bad cf1c1d193e37b9f79eedddc6bbd71b9f5f9751e5 # 19:51 99- 100 Add linux-next specific files for 20140114
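
A log in the format above can be summarised mechanically. The sketch below assumes the log is fed on standard input; the helper names first_bad_commit and bisect_counts are made up for the example.

```shell
# first_bad_commit: print the hash from the "# first bad commit: [...]" line.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]*\)\].*/\1/p'
}

# bisect_counts: print "<good> <bad>", the number of steps with each verdict.
bisect_counts() {
    awk '$1 == "git" && $2 == "bisect" && $3 == "good" { g++ }
         $1 == "git" && $2 == "bisect" && $3 == "bad"  { b++ }
         END { printf "%d %d\n", g, b }'
}
```

For example, with the log saved to a file (bisect.log is a hypothetical name), "first_bad_commit < bisect.log" prints the commit hash that the bisection settled on.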

Thanks,
Fengguang


Attachments:
(No filename) (6.28 kB)
dmesg-quantal-xgwo-5:20140110182546:x86_64-randconfig-s1-01101738:3.13.0-rc7-next-20140110-08732-g1955a14:1 (59.70 kB)
x86_64-randconfig-s1-01101738-1955a14a5ba6e3c3b11117812d11dc550ccc37ae-BUG:-kernel-boot-hang-21372.log (60.48 kB)
config-3.13.0-rc7-next-20140110-08732-g1955a14 (74.74 kB)
dmesg-yocto-xian-38:20140113065540:x86_64-randconfig-s1-01101738:3.12.0-rc4-00008-g6e6a493:614 (4.41 kB)

2014-01-14 16:03:30

by H. Peter Anvin

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On 01/14/2014 05:31 AM, Fengguang Wu wrote:
> Greetings,
>
> I got the below dmesg and the first bad commit is
>
> commit 82fa9637a2ba285bcc7c5050c73010b2c1b3d803
> Author: Kees Cook <[email protected]>
> AuthorDate: Thu Oct 10 17:18:16 2013 -0700
> Commit: H. Peter Anvin <[email protected]>
> CommitDate: Sun Oct 13 03:12:19 2013 -0700
>
> x86, kaslr: Select random position from e820 maps
>
> Counts available alignment positions across all e820 maps, and chooses
> one randomly for the new kernel base address, making sure not to collide
> with unsafe memory areas.
>
> Signed-off-by: Kees Cook <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: H. Peter Anvin <[email protected]>
>
> Note that there are many other warning/errors and it's not very
> reproducible, so this report might be wrong.
>
> ===================================================
> PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
> ===================================================
>

I wonder if this is in any way related to the fact that the ELF parser
we have in the decompressor is quite frankly complete crap... it assumes
that all sections can only be moved downward.

-hpa

2014-01-14 18:26:45

by Kees Cook

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On Tue, Jan 14, 2014 at 5:31 AM, Fengguang Wu <[email protected]> wrote:
> Greetings,
>
> I got the below dmesg and the first bad commit is
>
> commit 82fa9637a2ba285bcc7c5050c73010b2c1b3d803
> Author: Kees Cook <[email protected]>
> AuthorDate: Thu Oct 10 17:18:16 2013 -0700
> Commit: H. Peter Anvin <[email protected]>
> CommitDate: Sun Oct 13 03:12:19 2013 -0700
>
> x86, kaslr: Select random position from e820 maps
>
> Counts available alignment positions across all e820 maps, and chooses
> one randomly for the new kernel base address, making sure not to collide
> with unsafe memory areas.
>
> Signed-off-by: Kees Cook <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: H. Peter Anvin <[email protected]>
>
> Note that there are many other warning/errors and it's not very
> reproducible, so this report might be wrong.
>
> ===================================================
> PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
> ===================================================
>
> +-----------------------------------------------------------+--------------+--------------+
> | | 5bfce5ef55cb | 1955a14a5ba6 |
> +-----------------------------------------------------------+--------------+--------------+
> | boot_successes | 3948 | 0 |
> | boot_failures | 52 | 89 |
> | page_allocation_failure:order:,mode | 48 | 2 |
> | Out_of_memory:Kill_process | 7 | |
> | BUG:kernel_early_hang_without_any_printk_output | 1 | |
> | BUG:soft_lockup-CPU_stuck_for_s | 1 | |
> | WARNING:CPU:PID:at_kernel/locking/lockdep.c:check_flags() | 0 | 85 |

Does this mean that
"WARNING:CPU:PID:at_kernel/locking/lockdep.c:check_flags()" is the
most common failure condition?

> | general_protection_fault:SMP_SMP | 0 | 1 |
> | RIP:__lock_acquire | 0 | 1 |
> | Kernel_panic-not_syncing:Fatal_exception | 0 | 1 |
> | BUG:kernel_boot_hang | 0 | 2 |
> | BUG:kernel_boot_crashed | 0 | 1 |
> +-----------------------------------------------------------+--------------+--------------+
>
> The last dmesg is
>
> [ 0.803796] Initramfs unpacking failed: junk in compressed archive
> [ 0.803796] Initramfs unpacking failed: junk in compressed archive
>
> or in some cases
>
> [ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
> [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [ 0.000000] [mem 0x00000000-0x000fffff] page 4k
> [ 0.000000] BRK [0x07886000, 0x07886fff] PGTABLE
> [ 0.000000] BRK [0x07887000, 0x07887fff] PGTABLE
> [ 0.000000] BRK [0x07888000, 0x07888fff] PGTABLE
> PANIC: early exception 0e rip 10:ffffffff86204c6e error 0 cr2 ffffffff81972b28
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0-rc4-00008-g6e6a493 #614
> PANIC: early exception 0e rip 10:ffffffff86204f22 error 0 cr2 ffffffff81972b28

I will try to reproduce this, but it's not clear to me what is causing
the failure. The generated config doesn't look insane to me, so I'm
not sure what's happening here. Is QEMU doing something unexpected
with the ordering of where things go for its boot loader?

-Kees

--
Kees Cook
Chrome OS Security

2014-01-14 18:32:09

by Kees Cook

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On Tue, Jan 14, 2014 at 8:02 AM, H. Peter Anvin <[email protected]> wrote:
> On 01/14/2014 05:31 AM, Fengguang Wu wrote:
>> Greetings,
>>
>> I got the below dmesg and the first bad commit is
>>
>> commit 82fa9637a2ba285bcc7c5050c73010b2c1b3d803
>> Author: Kees Cook <[email protected]>
>> AuthorDate: Thu Oct 10 17:18:16 2013 -0700
>> Commit: H. Peter Anvin <[email protected]>
>> CommitDate: Sun Oct 13 03:12:19 2013 -0700
>>
>> x86, kaslr: Select random position from e820 maps
>>
>> Counts available alignment positions across all e820 maps, and chooses
>> one randomly for the new kernel base address, making sure not to collide
>> with unsafe memory areas.
>>
>> Signed-off-by: Kees Cook <[email protected]>
>> Link: http://lkml.kernel.org/r/[email protected]
>> Signed-off-by: H. Peter Anvin <[email protected]>
>>
>> Note that there are many other warning/errors and it's not very
>> reproducible, so this report might be wrong.
>>
>> ===================================================
>> PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
>> ===================================================
>>
>
> I wonder if this is in any way related to the fact that the ELF parser
> we have in the decompressor is quite frankly complete crap... it assumes
> that all sections can only be moved downward.

Not that this would change the code here, but I notice tip:x86/kaslr
isn't fully up to date. It's still missing the two most recent
commits:

https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=kaslr-c-v8

"x86, kaslr: clarify RANDOMIZE_BASE_MAX_OFFSET"
"x86, kaslr: remove unused including <linux/version.h>"

-Kees

--
Kees Cook
Chrome OS Security

2014-01-14 18:47:52

by H. Peter Anvin

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On 01/14/2014 10:26 AM, Kees Cook wrote:
>>
>> [ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
>> [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
>> [ 0.000000] [mem 0x00000000-0x000fffff] page 4k
>> [ 0.000000] BRK [0x07886000, 0x07886fff] PGTABLE
>> [ 0.000000] BRK [0x07887000, 0x07887fff] PGTABLE
>> [ 0.000000] BRK [0x07888000, 0x07888fff] PGTABLE
>> PANIC: early exception 0e rip 10:ffffffff86204c6e error 0 cr2 ffffffff81972b28
>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0-rc4-00008-g6e6a493 #614
>> PANIC: early exception 0e rip 10:ffffffff86204f22 error 0 cr2 ffffffff81972b28
>
> I will try to reproduce this, but it's not clear to me what is causing
> the failure. The generated config doesn't look insane to me, so I'm
> not sure what's happening here. Is QEMU doing something unexpected
> with the ordering of where things go for its boot loader?
>

It used to, but I fixed it up a long time ago and now it is really bog
standard.

More likely it just happens to trigger a corner condition, which is
actually a good thing... much easier to debug in a simulator.

-hpa

2014-01-14 19:19:49

by H. Peter Anvin

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On 01/14/2014 10:32 AM, Kees Cook wrote:
>
> Not that this would change the code here, but I notice tip:x86/kaslr
> isn't fully up to date. It's still missing the two most recent
> commits:
>
> https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=kaslr-c-v8
>
> "x86, kaslr: clarify RANDOMIZE_BASE_MAX_OFFSET"
> "x86, kaslr: remove unused including <linux/version.h>"
>

Fixed. The patches were tagged [PATCH -next] and so I overlooked them.

-hpa

2014-01-14 22:33:20

by Kees Cook

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On Tue, Jan 14, 2014 at 5:31 AM, Fengguang Wu <[email protected]> wrote:
> Greetings,
>
> I got the below dmesg and the first bad commit is
>
> commit 82fa9637a2ba285bcc7c5050c73010b2c1b3d803
> Author: Kees Cook <[email protected]>
> AuthorDate: Thu Oct 10 17:18:16 2013 -0700
> Commit: H. Peter Anvin <[email protected]>
> CommitDate: Sun Oct 13 03:12:19 2013 -0700
>
> x86, kaslr: Select random position from e820 maps
>
> Counts available alignment positions across all e820 maps, and chooses
> one randomly for the new kernel base address, making sure not to collide
> with unsafe memory areas.
>
> Signed-off-by: Kees Cook <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: H. Peter Anvin <[email protected]>
>
> Note that there are many other warning/errors and it's not very
> reproducible, so this report might be wrong.
>
> ===================================================
> PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
> ===================================================
>
> +-----------------------------------------------------------+--------------+--------------+
> | | 5bfce5ef55cb | 1955a14a5ba6 |
> +-----------------------------------------------------------+--------------+--------------+
> | boot_successes | 3948 | 0 |
> | boot_failures | 52 | 89 |
> | page_allocation_failure:order:,mode | 48 | 2 |
> | Out_of_memory:Kill_process | 7 | |
> | BUG:kernel_early_hang_without_any_printk_output | 1 | |
> | BUG:soft_lockup-CPU_stuck_for_s | 1 | |
> | WARNING:CPU:PID:at_kernel/locking/lockdep.c:check_flags() | 0 | 85 |
> | general_protection_fault:SMP_SMP | 0 | 1 |
> | RIP:__lock_acquire | 0 | 1 |
> | Kernel_panic-not_syncing:Fatal_exception | 0 | 1 |
> | BUG:kernel_boot_hang | 0 | 2 |
> | BUG:kernel_boot_crashed | 0 | 1 |
> +-----------------------------------------------------------+--------------+--------------+
>
> The last dmesg is
>
> [ 0.803796] Initramfs unpacking failed: junk in compressed archive

Can you tell me how the initrd for quantal-core-x86_64.cgz was built
in the qemu instances you're using? It seems like all the failures
point to a problem with how kASLR is interacting with the initrd.

Thanks,

-Kees

--
Kees Cook
Chrome OS Security

2014-01-14 23:24:24

by H. Peter Anvin

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On 01/14/2014 02:33 PM, Kees Cook wrote:
>
> Can you tell me how the initrd for quantal-core-x86_64.cgz was built
> in the qemu instances you're using? It seems like all the failures
> point to a problem with how kASLR is interacting with the initrd.
>

If kASLR somehow causes the kernel to collide with the initrd that would
be a problem...

-hpa

2014-01-14 23:31:50

by Kees Cook

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On Tue, Jan 14, 2014 at 3:23 PM, H. Peter Anvin <[email protected]> wrote:
> On 01/14/2014 02:33 PM, Kees Cook wrote:
>>
>> Can you tell me how the initrd for quantal-core-x86_64.cgz was built
>> in the qemu instances you're using? It seems like all the failures
>> point to a problem with how kASLR is interacting with the initrd.
>>
>
> If kASLR somehow causes the kernel to collide with the initrd that would
> be a problem...

Agreed, but I can't reproduce this yet. The initrd is on the list of
areas that get avoided, so the fundamental design isn't broken, but
clearly something is busted.

How long has tip:x86/kaslr been sitting in linux-next? If this is some
interaction between kaslr and something else, perhaps merge order just
happens to be pointing at kaslr?

Regardless, all my tests so far against next-20140114 and an initramfs
haven't seen corruption (using the given .config). I don't have the
same initrd, though, so I'm hoping getting that will trigger the
glitch.

-Kees

--
Kees Cook
Chrome OS Security

2014-01-14 23:45:49

by H. Peter Anvin

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On 01/14/2014 03:31 PM, Kees Cook wrote:
> On Tue, Jan 14, 2014 at 3:23 PM, H. Peter Anvin <[email protected]> wrote:
>> On 01/14/2014 02:33 PM, Kees Cook wrote:
>>>
>>> Can you tell me how the initrd for quantal-core-x86_64.cgz was built
>>> in the qemu instances you're using? It seems like all the failures
>>> point to a problem with how kASLR is interacting with the initrd.
>>>
>>
>> If kASLR somehow causes the kernel to collide with the initrd that would
>> be a problem...
>
> Agreed, but I can't reproduce this yet. The initrd is on the list of
> areas that get avoided, so the fundamental design isn't broken, but
> clearly something is busted.
>
> How long has tip:x86/kaslr been sitting in linux-next? If this is some
> interaction between kaslr and something else, perhaps merge order just
> happens to be pointing at kaslr?
>
> Regardless, all my tests so far against next-20140114 and an initramfs
> haven't seen corruption (using the given .config). I don't have the
> same initrd, though, so I'm hoping getting that will trigger the
> glitch.
>

Given our craptastic ELF parser I'm wondering if we at some point
overrun the kernel "safe memory zone" during decompression/parsing...

-hpa

2014-01-15 12:11:04

by Fengguang Wu

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On Tue, Jan 14, 2014 at 10:26:42AM -0800, Kees Cook wrote:
> On Tue, Jan 14, 2014 at 5:31 AM, Fengguang Wu <[email protected]> wrote:
> > Greetings,
> >
> > I got the below dmesg and the first bad commit is
> >
> > commit 82fa9637a2ba285bcc7c5050c73010b2c1b3d803
> > Author: Kees Cook <[email protected]>
> > AuthorDate: Thu Oct 10 17:18:16 2013 -0700
> > Commit: H. Peter Anvin <[email protected]>
> > CommitDate: Sun Oct 13 03:12:19 2013 -0700
> >
> > x86, kaslr: Select random position from e820 maps
> >
> > Counts available alignment positions across all e820 maps, and chooses
> > one randomly for the new kernel base address, making sure not to collide
> > with unsafe memory areas.
> >
> > Signed-off-by: Kees Cook <[email protected]>
> > Link: http://lkml.kernel.org/r/[email protected]
> > Signed-off-by: H. Peter Anvin <[email protected]>
> >
> > Note that there are many other warning/errors and it's not very
> > reproducible, so this report might be wrong.
> >
> > ===================================================
> > PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
> > ===================================================
> >
> > +-----------------------------------------------------------+--------------+--------------+
> > | | 5bfce5ef55cb | 1955a14a5ba6 |
> > +-----------------------------------------------------------+--------------+--------------+
> > | boot_successes | 3948 | 0 |
> > | boot_failures | 52 | 89 |
> > | page_allocation_failure:order:,mode | 48 | 2 |
> > | Out_of_memory:Kill_process | 7 | |
> > | BUG:kernel_early_hang_without_any_printk_output | 1 | |
> > | BUG:soft_lockup-CPU_stuck_for_s | 1 | |
> > | WARNING:CPU:PID:at_kernel/locking/lockdep.c:check_flags() | 0 | 85 |
>
> Does this mean that
> "WARNING:CPU:PID:at_kernel/locking/lockdep.c:check_flags()" is the
> most common failure condition?

Yes. However, I noticed that this warning already existed before your
commit, so it should be unrelated noise.

> > | general_protection_fault:SMP_SMP | 0 | 1 |
> > | RIP:__lock_acquire | 0 | 1 |
> > | Kernel_panic-not_syncing:Fatal_exception | 0 | 1 |
> > | BUG:kernel_boot_hang | 0 | 2 |
> > | BUG:kernel_boot_crashed | 0 | 1 |
> > +-----------------------------------------------------------+--------------+--------------+
> >
> > The last dmesg is
> >
> > [ 0.803796] Initramfs unpacking failed: junk in compressed archive
> > [ 0.803796] Initramfs unpacking failed: junk in compressed archive
> >
> > or in some cases
> >
> > [ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
> > [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
> > [ 0.000000] [mem 0x00000000-0x000fffff] page 4k
> > [ 0.000000] BRK [0x07886000, 0x07886fff] PGTABLE
> > [ 0.000000] BRK [0x07887000, 0x07887fff] PGTABLE
> > [ 0.000000] BRK [0x07888000, 0x07888fff] PGTABLE
> > PANIC: early exception 0e rip 10:ffffffff86204c6e error 0 cr2 ffffffff81972b28
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0-rc4-00008-g6e6a493 #614
> > PANIC: early exception 0e rip 10:ffffffff86204f22 error 0 cr2 ffffffff81972b28
>
> I will try to reproduce this, but it's not clear to me what is causing
> the failure. The generated config doesn't look insane to me, so I'm
> not sure what's happening here. Is QEMU doing something unexpected
> with the ordering of where things go for its boot loader?

The PANIC message is very reproducible in some configs (I'll report
one bisect result for it in another email), and it will happen in both
the initrd files I used: the 3.1MB yocto-minimal-x86_64.cgz and the
23MB quantal-core-x86_64.cgz. I'll send you the smaller yocto image.

Thanks,
Fengguang

2014-01-15 12:37:27

by Fengguang Wu

Subject: Re: [x86, kaslr] BUG: kernel boot hang

On Tue, Jan 14, 2014 at 02:33:15PM -0800, Kees Cook wrote:
> On Tue, Jan 14, 2014 at 5:31 AM, Fengguang Wu <[email protected]> wrote:
> > Greetings,
> >
> > I got the below dmesg and the first bad commit is
> >
> > commit 82fa9637a2ba285bcc7c5050c73010b2c1b3d803
> > Author: Kees Cook <[email protected]>
> > AuthorDate: Thu Oct 10 17:18:16 2013 -0700
> > Commit: H. Peter Anvin <[email protected]>
> > CommitDate: Sun Oct 13 03:12:19 2013 -0700
> >
> > x86, kaslr: Select random position from e820 maps
> >
> > Counts available alignment positions across all e820 maps, and chooses
> > one randomly for the new kernel base address, making sure not to collide
> > with unsafe memory areas.
> >
> > Signed-off-by: Kees Cook <[email protected]>
> > Link: http://lkml.kernel.org/r/[email protected]
> > Signed-off-by: H. Peter Anvin <[email protected]>
> >
> > Note that there are many other warning/errors and it's not very
> > reproducible, so this report might be wrong.
> >
> > ===================================================
> > PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
> > ===================================================
> >
> > +-----------------------------------------------------------+--------------+--------------+
> > | | 5bfce5ef55cb | 1955a14a5ba6 |
> > +-----------------------------------------------------------+--------------+--------------+
> > | boot_successes | 3948 | 0 |
> > | boot_failures | 52 | 89 |
> > | page_allocation_failure:order:,mode | 48 | 2 |
> > | Out_of_memory:Kill_process | 7 | |
> > | BUG:kernel_early_hang_without_any_printk_output | 1 | |
> > | BUG:soft_lockup-CPU_stuck_for_s | 1 | |
> > | WARNING:CPU:PID:at_kernel/locking/lockdep.c:check_flags() | 0 | 85 |
> > | general_protection_fault:SMP_SMP | 0 | 1 |
> > | RIP:__lock_acquire | 0 | 1 |
> > | Kernel_panic-not_syncing:Fatal_exception | 0 | 1 |
> > | BUG:kernel_boot_hang | 0 | 2 |
> > | BUG:kernel_boot_crashed | 0 | 1 |
> > +-----------------------------------------------------------+--------------+--------------+
> >
> > The last dmesg is
> >
> > [ 0.803796] Initramfs unpacking failed: junk in compressed archive
>
> Can you tell me how the initrd for quantal-core-x86_64.cgz was built
> in the qemu instances you're using? It seems like all the failures
> point to a problem with how kASLR is interacting with the initrd.

That Initramfs unpacking error is much harder to reproduce, but
anyway here's how I create the image file:

quantal-core-x86_64.cgz is based on the ubuntu-core image files:

http://cdimage.ubuntu.com/ubuntu-core/releases/quantal/release/

Download one image, extract its files to quantal-core-x86_64/ and run
the script below from the parent directory:

./create-cpio.sh quantal-core-x86_64

#!/bin/bash
# Build a gzip-compressed newc cpio archive (<dir>.cgz) from directory <dir>.

core=$1

[[ $core ]] || exit
cd "$core" || exit

# Overlay optional local additions on top of the extracted image.
[[ -d ../addon ]] && cp -a ../addon/* .

mkdir -p initrd
ln -sf sbin/init init

# Archive everything except docs, locale data and package caches.
find . |
	grep -v -e usr/doc \
		-e usr/man \
		-e usr/info \
		-e sbin/modprobe \
		-e usr/share/doc \
		-e usr/share/man \
		-e usr/share/info \
		-e usr/share/i18n \
		-e usr/share/locales \
		-e usr/lib/x86_64-linux-gnu/gconv \
		-e var/lib/apt/lists \
		-e var/lib/dpkg/info \
		-e var/cache/apt/archives |
	cpio -o -H newc | gzip -n -9 > "../$core.cgz"
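
Before suspecting the kernel, it can be worth confirming on the host that the generated file is intact. The sketch below is only a host-side sanity check and says nothing about corruption that happens in memory at boot; the helper names check_cgz and list_cgz are made up for the example.

```shell
# check_cgz FILE: succeed only if FILE is a complete, valid gzip stream.
check_cgz() {
    gzip -t "$1" 2>/dev/null
}

# list_cgz FILE: list the member names of the cpio archive inside FILE.
# Requires cpio to be installed.
list_cgz() {
    gzip -dc "$1" | cpio -it
}
```

For instance, "check_cgz quantal-core-x86_64.cgz && echo ok" prints ok only when the gzip stream passes its integrity test.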