2017-03-17 12:00:06

by kernel test robot

Subject: [x86] 45fc8757d1: BUG:unable_to_handle_kernel


FYI, we noticed the following commit:

commit: 45fc8757d1d2128e342b4e7ef39adedf7752faac ("x86: Make the GDT remapping read-only on 64-bit")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -m 420M

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------+------------+------------+
|                                          | 69218e4799 | 45fc8757d1 |
+------------------------------------------+------------+------------+
| boot_successes                           | 8          | 2          |
| boot_failures                            | 0          | 11         |
| BUG:unable_to_handle_kernel              | 0          | 11         |
| Oops:#[##]                               | 0          | 11         |
| Kernel_panic-not_syncing:Fatal_exception | 0          | 11         |
+------------------------------------------+------------+------------+



[ 4.347219] BUG: unable to handle kernel paging request at ffffffffff577060
[ 4.349770] IP: 0xf77e91ed
[ 4.351365] PGD 1e0c067
[ 4.351366] P4D 1e0c067
[ 4.352885] PUD 1e0e067
[ 4.354421] PMD 1e0f067
[ 4.355947] PTE 800000000be09161
[ 4.357457]
[ 4.360480] Oops: 0003 [#1] SMP
[ 4.362150] Modules linked in:
[ 4.363816] CPU: 0 PID: 1 Comm: init Not tainted 4.11.0-rc2-00014-g45fc875 #15
[ 4.367207] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 4.371277] task: ffff88000b9a8000 task.stack: ffffc900000d0000
[ 4.373550] RIP: 0023:0xf77e91ed
[ 4.375284] RSP: 002b:00000000ffed034c EFLAGS: 00010246
[ 4.377409] RAX: 0000000000000063 RBX: 00000000f77edff0 RCX: 00000000ffed034c
[ 4.379996] RDX: 00000000f77e1690 RSI: 00000000f77ee094 RDI: 000000000000000c
[ 4.382588] RBP: 00000000ffed0368 R08: 0000000000000000 R09: 0000000000000000
[ 4.385136] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 4.387709] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 4.390289] FS: 0000000000000000(0000) GS:ffff88000be00000(0000) knlGS:0000000000000000
[ 4.393870] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 4.396131] CR2: ffffffffff577060 CR3: 0000000019d08000 CR4: 00000000000006f0
[ 4.398696] RIP: 0xf77e91ed RSP: 00000000ffed034c
[ 4.400716] CR2: ffffffffff577060
[ 4.402425] ---[ end trace 35060e6ad8052d5b ]---


To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Kernel Test Robot


Attachments:
(No filename) (2.64 kB)
config-4.11.0-rc2-00014-g45fc875 (154.31 kB)
job-script (3.74 kB)
dmesg.xz (10.31 kB)

2017-03-17 17:41:46

by Thomas Garnier

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

I tried multiple things to repro this crash without success:
- Used the config on my existing qemu setup (boots fine)
- Added most of the command line (boots fine)
- Tried to run the script on a dedicated machine, but it seems to be
really tailored for your setup. I had errors with usernames and cpio
crashing.

Any additional information you could share? (RIP -> function name,
callstack etc..?)

Thanks,

On Fri, Mar 17, 2017 at 4:59 AM, kernel test robot
<[email protected]> wrote:
>
> FYI, we noticed the following commit:
>
> commit: 45fc8757d1d2128e342b4e7ef39adedf7752faac ("x86: Make the GDT remapping read-only on 64-bit")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -m 420M
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +------------------------------------------+------------+------------+
> |                                          | 69218e4799 | 45fc8757d1 |
> +------------------------------------------+------------+------------+
> | boot_successes                           | 8          | 2          |
> | boot_failures                            | 0          | 11         |
> | BUG:unable_to_handle_kernel              | 0          | 11         |
> | Oops:#[##]                               | 0          | 11         |
> | Kernel_panic-not_syncing:Fatal_exception | 0          | 11         |
> +------------------------------------------+------------+------------+
>
>
>
> [ 4.347219] BUG: unable to handle kernel paging request at ffffffffff577060
> [ 4.349770] IP: 0xf77e91ed
> [ 4.351365] PGD 1e0c067
> [ 4.351366] P4D 1e0c067
> [ 4.352885] PUD 1e0e067
> [ 4.354421] PMD 1e0f067
> [ 4.355947] PTE 800000000be09161
> [ 4.357457]
> [ 4.360480] Oops: 0003 [#1] SMP
> [ 4.362150] Modules linked in:
> [ 4.363816] CPU: 0 PID: 1 Comm: init Not tainted 4.11.0-rc2-00014-g45fc875 #15
> [ 4.367207] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
> [ 4.371277] task: ffff88000b9a8000 task.stack: ffffc900000d0000
> [ 4.373550] RIP: 0023:0xf77e91ed
> [ 4.375284] RSP: 002b:00000000ffed034c EFLAGS: 00010246
> [ 4.377409] RAX: 0000000000000063 RBX: 00000000f77edff0 RCX: 00000000ffed034c
> [ 4.379996] RDX: 00000000f77e1690 RSI: 00000000f77ee094 RDI: 000000000000000c
> [ 4.382588] RBP: 00000000ffed0368 R08: 0000000000000000 R09: 0000000000000000
> [ 4.385136] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 4.387709] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 4.390289] FS: 0000000000000000(0000) GS:ffff88000be00000(0000) knlGS:0000000000000000
> [ 4.393870] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
> [ 4.396131] CR2: ffffffffff577060 CR3: 0000000019d08000 CR4: 00000000000006f0
> [ 4.398696] RIP: 0xf77e91ed RSP: 00000000ffed034c
> [ 4.400716] CR2: ffffffffff577060
> [ 4.402425] ---[ end trace 35060e6ad8052d5b ]---
>
>
> To reproduce:
>
> git clone https://github.com/01org/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
>
>
>
> Thanks,
> Kernel Test Robot



--
Thomas

2017-03-17 17:57:31

by Linus Torvalds

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

On Fri, Mar 17, 2017 at 4:59 AM, kernel test robot
<[email protected]> wrote:
>
> FYI, we noticed the following commit:
>
> commit: 45fc8757d1d2128e342b4e7ef39adedf7752faac ("x86: Make the GDT remapping read-only on 64-bit")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
>
> in testcase: boot
>
> [ 4.347219] BUG: unable to handle kernel paging request at ffffffffff577060
> [ 4.360480] Oops: 0003 [#1] SMP
> [ 4.373550] RIP: 0023:0xf77e91ed
> [ 4.375284] RSP: 002b:00000000ffed034c EFLAGS: 00010246

Heh. That's actually in user space, but the error code (0003) means
"protection fault on a write, not a user access".

So it's almost certainly something that tries to access a segment
descriptor in the GDT, but that segment was marked as "not accessed",
and the CPU was trying to set the accessed bit.

I *thought* we always marked everything accessed when we initialize it,
but something clearly is not.

That's why there's no kernel call trace or anything like that: it is a
system page fault, but it's triggered directly from user mode.
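
For readers following along, the error code and the PTE from the oops can
be decoded by hand. The sketch below is only an illustration in user-space
C: the macro and function names are local to this example (they are not the
kernel's own), and the bit positions are the architectural x86 ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (names are local to this sketch). */
#define SK_PF_PROT  (1u << 0) /* 1 = protection violation, 0 = not present */
#define SK_PF_WRITE (1u << 1) /* 1 = write access, 0 = read access */
#define SK_PF_USER  (1u << 2) /* 1 = user-mode access, 0 = supervisor */

/* x86-64 page-table entry bits used below. */
#define SK_PTE_PRESENT (1ull << 0)
#define SK_PTE_RW      (1ull << 1)

struct pf_info {
    bool protection; /* fault on a present page */
    bool write;      /* faulting access was a write */
    bool user;       /* access counted as a user-mode access */
};

/* Split a page-fault error code into its three low bits. */
static struct pf_info decode_pf(uint32_t err)
{
    struct pf_info info = {
        .protection = (err & SK_PF_PROT) != 0,
        .write      = (err & SK_PF_WRITE) != 0,
        .user       = (err & SK_PF_USER) != 0,
    };
    return info;
}

/* A present PTE with the R/W bit clear maps the page read-only. */
static bool pte_is_read_only(uint64_t pte)
{
    return (pte & SK_PTE_PRESENT) != 0 && (pte & SK_PTE_RW) == 0;
}
```

With the 0003 from "Oops: 0003", decode_pf() reports a protection
violation on a write that was not a user access. The PTE value
800000000be09161 from the log has bit 1 (R/W) clear, so the page is mapped
read-only, which fits a write by the CPU into the read-only GDT remapping.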

The linear address can be used to look up which entry it is. I assume
the GDT starts at ffffffffff577000, and that this is at offset 0x60
from that. Whatever descriptor that would be..

Linus

2017-03-17 18:00:47

by Linus Torvalds

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

On Fri, Mar 17, 2017 at 10:49 AM, Linus Torvalds
<[email protected]> wrote:
>
> The linear address can be used to look up which entry it is. I assume
> the GDT starts at ffffffffff577000, and that this is at offset 0x60
> from that. Whatever descriptor that would be..

Hmm. That should be gdt index 12, aka GDT_ENTRY_TLS_MIN.
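
The index follows from the descriptor size: each GDT entry here is 8 bytes,
so offset 0x60 is entry 12. A minimal sketch of that arithmetic, assuming
(as above) that the GDT remapping starts at ffffffffff577000; the helper
names are hypothetical and exist only for this example:

```c
#include <stdint.h>

#define SK_DESC_SIZE 8u /* bytes per code/data segment descriptor */

/* Index of the descriptor that contains fault_addr. */
static unsigned gdt_index(uint64_t gdt_base, uint64_t fault_addr)
{
    return (unsigned)((fault_addr - gdt_base) / SK_DESC_SIZE);
}

/* Segment selector for a GDT entry: index << 3, TI = 0, RPL in bits 0-1. */
static unsigned gdt_selector(unsigned index, unsigned rpl)
{
    return (index << 3) | (rpl & 3u);
}
```

gdt_index(0xffffffffff577000, 0xffffffffff577060) gives 12, and the user
mode selector for that entry, gdt_selector(12, 3), is 0x63. RAX in the oops
also holds 0x63, which is at least consistent with user space loading that
selector, though the register dump alone does not prove it.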

I guess user space can set almost anything there. Including setting a
segment type that isn't accessed, and that the CPU will change on the
first actual access.

We do have code to verify the limits and types etc iirc, I guess we
can make sure to set the accessed bit too.

Linus

2017-03-17 18:09:22

by Linus Torvalds

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

On Fri, Mar 17, 2017 at 11:00 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Mar 17, 2017 at 10:49 AM, Linus Torvalds
> <[email protected]> wrote:
>>
>> The linear address can be used to look up which entry it is. I assume
>> the GDT starts at ffffffffff577000, and that this is at offset 0x60
>> from that. Whatever descriptor that would be..
>
> Hmm. That should be gdt index 12, aka GDT_ENTRY_TLS_MIN.
>
> I guess user space can set almost anything there. Including setting a
> segment type that isn't accessed, and that the CPU will change on the
> first actual access.
>
> We do have code to verify the limits and types etc iirc, I guess we
> can make sure to set the accessed bit too.

Hmm. "fill_ldt()" does this:

desc->type = (info->read_exec_only ^ 1) << 1;
desc->type |= info->contents << 2;

which always leaves bit #0 of ->type clear. That's the A bit.

Does the problem go away if we just add a

desc->type |= 1;

to the end there?
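
To make the bit layout concrete, here is a small stand-alone sketch of the
two quoted lines plus the proposed change. It is not the kernel's
fill_ldt(): the function names are made up for this example, and the two
parameters stand in for the read_exec_only and contents fields of struct
user_desc.

```c
#include <stdint.h>

#define SK_DESC_TYPE_ACCESSED 0x1u /* bit 0 of the 4-bit type field */

/* Type field as computed by the two lines quoted above. */
static uint8_t desc_type(unsigned read_exec_only, unsigned contents)
{
    uint8_t type = (uint8_t)(((read_exec_only & 1u) ^ 1u) << 1);

    type |= (uint8_t)((contents & 3u) << 2);
    return type; /* bit 0 (accessed) is never set here */
}

/* Same computation with the proposed "|= 1" applied at the end. */
static uint8_t desc_type_accessed(unsigned read_exec_only, unsigned contents)
{
    return desc_type(read_exec_only, contents) | SK_DESC_TYPE_ACCESSED;
}
```

For a plain writable data segment (read_exec_only = 0, contents = 0) the
first function returns 2 and the second returns 3. With type 2 the CPU has
to write the accessed bit into the descriptor on first load, which is the
write that faults once the GDT mapping is read-only.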

But it is entirely possible that I'm missing something here. It's been
_years_ since I looked at descriptor table entries.

Linus

2017-03-17 18:21:31

by Andy Lutomirski

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

On Fri, Mar 17, 2017 at 11:07 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Mar 17, 2017 at 11:00 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Fri, Mar 17, 2017 at 10:49 AM, Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> The linear address can be used to look up which entry it is. I assume
>>> the GDT starts at ffffffffff577000, and that this is at offset 0x60
>>> from that. Whatever descriptor that would be..
>>
>> Hmm. That should be gdt index 12, aka GDT_ENTRY_TLS_MIN.
>>
>> I guess user space can set almost anything there. Including setting a
>> segment type that isn't accessed, and that the CPU will change on the
>> first actual access.
>>
>> We do have code to verify the limits and types etc iirc, I guess we
>> can make sure to set the accessed bit too.
>
> Hmm. "fill_ldt()" does this:
>
> desc->type = (info->read_exec_only ^ 1) << 1;
> desc->type |= info->contents << 2;
>
> which always leaves bit #0 of ->type clear. That's the A bit.
>
> Does the problem go away if we just add a
>
> desc->type |= 1;
>
> to the end there?

I can easily imagine that breaking WINE or DOSEMU because it'll affect
the LDT, too.

How about this:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=df8110544c6e899897e1b2ec3ab53d9e4ee40f65

I'll see why selftests didn't catch this, too.

2017-03-17 20:18:55

by Andy Lutomirski

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

On Fri, Mar 17, 2017 at 12:36 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Mar 17, 2017 at 11:20 AM, Andy Lutomirski <[email protected]> wrote:
>>
>> I can easily imagine that breaking WINE or DOSEMU because it'll affect
>> the LDT, too.
>
> Can they even *read* the LDT contents, though? The whole accessed bit
> doesn't show up in 'struct user_desc', so you can neither set it nor
> read it.

LAR. I've learned to never underestimate the absurdity of the games
played by 16-bit apps. (See, for example, the fact that some of them
apparently use SGDT just to find a page that's guaranteed not to be
accessible.)

>
>> How about this:
>
> I don't think that's _wrong_, but..
>
> I'd really rather just do it in fill_ldt() itself, unless you can
> explain how it would be visible to anybody..

See above :(

(Also, your approach would probably break some selftests.)

--Andy

2017-03-17 21:12:01

by Linus Torvalds

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

On Fri, Mar 17, 2017 at 1:18 PM, Andy Lutomirski <[email protected]> wrote:
> On Fri, Mar 17, 2017 at 12:36 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Can they even *read* the LDT contents, though? The whole accessed bit
>> doesn't show up in 'struct user_desc', so you can neither set it nor
>> read it.
>
> LAR. I've learned to never underestimate the absurdity of the games
> played by 16-bit apps. (See, for example, the fact that some of them
> apparently use SGDT just to find a page that's guaranteed not to be
> accessible.)

Ugh. Right you are, LAR will return those type bits.

Of course, maybe somebody cares about them in the GDT already? So
it's visible even with your patch, isn't it. We give users four
entries to play with...

Linus

2017-03-17 21:43:38

by Linus Torvalds

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

On Fri, Mar 17, 2017 at 11:20 AM, Andy Lutomirski <[email protected]> wrote:
>
> I can easily imagine that breaking WINE or DOSEMU because it'll affect
> the LDT, too.

Can they even *read* the LDT contents, though? The whole accessed bit
doesn't show up in 'struct user_desc', so you can neither set it nor
read it.

> How about this:

I don't think that's _wrong_, but..

I'd really rather just do it in fill_ldt() itself, unless you can
explain how it would be visible to anybody..

Linus

2017-03-20 01:41:56

by kernel test robot

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

On 03/17, Thomas Garnier wrote:
>I tried multiple things to repro this crash without success:
> - Used the config on my existing qemu setup (boots fine)
> - Added most of the command line (boots fine)
> - Tried to run the script on a dedicated machine, but it seems to be
>really tailored for your setup. I had errors with usernames and cpio
>crashing.

Could you paste the error log?
I suspect it was caused by the job-script being saved in DOS format; you
could try `dos2unix job-script` before "lkp qemu" to see whether it works.

Thanks,
Xiaolong
>
>Any additional information you could share? (RIP -> function name,
>callstack etc..?)
>
>Thanks,
>
>On Fri, Mar 17, 2017 at 4:59 AM, kernel test robot
><[email protected]> wrote:
>>
>> FYI, we noticed the following commit:
>>
>> commit: 45fc8757d1d2128e342b4e7ef39adedf7752faac ("x86: Make the GDT remapping read-only on 64-bit")
>> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
>>
>> in testcase: boot
>>
>> on test machine: qemu-system-x86_64 -enable-kvm -m 420M
>>
>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>
>>
>> +------------------------------------------+------------+------------+
>> |                                          | 69218e4799 | 45fc8757d1 |
>> +------------------------------------------+------------+------------+
>> | boot_successes                           | 8          | 2          |
>> | boot_failures                            | 0          | 11         |
>> | BUG:unable_to_handle_kernel              | 0          | 11         |
>> | Oops:#[##]                               | 0          | 11         |
>> | Kernel_panic-not_syncing:Fatal_exception | 0          | 11         |
>> +------------------------------------------+------------+------------+
>>
>>
>>
>> [ 4.347219] BUG: unable to handle kernel paging request at ffffffffff577060
>> [ 4.349770] IP: 0xf77e91ed
>> [ 4.351365] PGD 1e0c067
>> [ 4.351366] P4D 1e0c067
>> [ 4.352885] PUD 1e0e067
>> [ 4.354421] PMD 1e0f067
>> [ 4.355947] PTE 800000000be09161
>> [ 4.357457]
>> [ 4.360480] Oops: 0003 [#1] SMP
>> [ 4.362150] Modules linked in:
>> [ 4.363816] CPU: 0 PID: 1 Comm: init Not tainted 4.11.0-rc2-00014-g45fc875 #15
>> [ 4.367207] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
>> [ 4.371277] task: ffff88000b9a8000 task.stack: ffffc900000d0000
>> [ 4.373550] RIP: 0023:0xf77e91ed
>> [ 4.375284] RSP: 002b:00000000ffed034c EFLAGS: 00010246
>> [ 4.377409] RAX: 0000000000000063 RBX: 00000000f77edff0 RCX: 00000000ffed034c
>> [ 4.379996] RDX: 00000000f77e1690 RSI: 00000000f77ee094 RDI: 000000000000000c
>> [ 4.382588] RBP: 00000000ffed0368 R08: 0000000000000000 R09: 0000000000000000
>> [ 4.385136] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> [ 4.387709] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>> [ 4.390289] FS: 0000000000000000(0000) GS:ffff88000be00000(0000) knlGS:0000000000000000
>> [ 4.393870] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
>> [ 4.396131] CR2: ffffffffff577060 CR3: 0000000019d08000 CR4: 00000000000006f0
>> [ 4.398696] RIP: 0xf77e91ed RSP: 00000000ffed034c
>> [ 4.400716] CR2: ffffffffff577060
>> [ 4.402425] ---[ end trace 35060e6ad8052d5b ]---
>>
>>
>> To reproduce:
>>
>> git clone https://github.com/01org/lkp-tests.git
>> cd lkp-tests
>> bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
>>
>>
>>
>> Thanks,
>> Kernel Test Robot
>
>
>
>--
>Thomas

2017-03-21 16:11:19

by Thomas Garnier

Subject: Re: [x86] 45fc8757d1: BUG:unable_to_handle_kernel

On Sun, Mar 19, 2017 at 6:40 PM, Ye Xiaolong <[email protected]> wrote:
> Could you paste the error log?
> I suspect it was caused by the job-script being saved in DOS format; you
> could try `dos2unix job-script` before "lkp qemu" to see whether it works.
>

You were right. I had a strange '\n' error in the middle of a lot of
others; running dos2unix on the job script fixed the issue.

--
Thomas