2018-01-05 05:16:42

by Xishi Qiu

Subject: [RFC] boot failed when enable KAISER/KPTI

I ran the latest RHEL 7.2 kernel with the KAISER/KPTI patch applied, and it failed to boot.

...
[ 0.000000] PM: Registered nosave memory: [mem 0x81000000000-0x8ffffffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x91000000000-0xfffffffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x101000000000-0x10ffffffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x111000000000-0x17ffffffffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x181000000000-0x18ffffffffff]
[ 0.000000] e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:1536 nr_cpu_ids:1536 nr_node_ids:8
[ 0.000000] PERCPU: max_distance=0x180ffe240000 too large for vmalloc space 0x1fffffffffff
[ 0.000000] setup_percpu: auto allocator failed (-22), falling back to page size
[ 0.000000] PERCPU: 32 4K pages/cpu @ffffc90000000000 s107200 r8192 d15680
[ 0.000000] Built 8 zonelists in Zone order, mobility grouping on. Total pages: 132001804
[ 0.000000] Policy zone: Normal
iosdevname=0 8250.nr_uarts=8 efi=old_map rdloaddriver=usb_storage rdloaddriver=sd_mod udev.event-timeout=600 softlockup_panic=0 rcupdate.rcu_cpu_stall_timeout=300
[ 0.000000] Intel-IOMMU: enabled
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[ 0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
[ 0.000000] AGP: Checking aperture...
[ 0.000000] AGP: No AGP bridge found
[ 0.000000] Memory: 526901612k/26910638080k available (6528k kernel code, 26374249692k absent, 9486776k reserved, 4302k data, 1676k init)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1536, Nodes=8
[ 0.000000] x86/pti: Unmapping kernel while in userspace
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=1536.
[ 0.000000] Offload RCU callbacks from all CPUs
[ 0.000000] Offload RCU callbacks from CPUs: 0-1535.
[ 0.000000] NR_IRQS:327936 nr_irqs:15976 0
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] console [ttyS0] enabled
[ 0.000000] allocated 2145910784 bytes of page_cgroup
[ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[ 0.000000] Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2799.999 MHz processor
[ 0.001803] Calibrating delay loop (skipped), value calculated using timer frequency.. 5599.99 BogoMIPS (lpj=2799999)
[ 0.012408] pid_max: default: 1572864 minimum: 12288
[ 0.017987] init_memory_mapping: [mem 0x5947f000-0x5b47efff]
[ 0.023701] init_memory_mapping: [mem 0x5b47f000-0x5b87efff]
[ 0.029369] init_memory_mapping: [mem 0x6d368000-0x6d3edfff]
[ 0.039130] BUG: unable to handle kernel paging request at 000000005b835f90
[ 0.046101] IP: [<000000005b835f90>] 0x5b835f8f
[ 0.050637] PGD 8000000001f61067 PUD 190ffefff067 PMD 190ffeffd067 PTE 5b835063
[ 0.057989] Oops: 0011 [#1] SMP
[ 0.061241] Modules linked in:
[ 0.064304] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-327.59.59.46.h42.x86_64 #1
[ 0.072280] Hardware name: Huawei FusionServer9032/IT91SMUB, BIOS BLXSV316 11/14/2017
[ 0.080082] task: ffffffff8196e440 ti: ffffffff81958000 task.ti: ffffffff81958000
[ 0.087539] RIP: 0010:[<000000005b835f90>] [<000000005b835f90>] 0x5b835f8f
[ 0.094494] RSP: 0000:ffffffff8195be28 EFLAGS: 00010046
[ 0.099788] RAX: 0000000080050033 RBX: ffff910fbc802000 RCX: 00000000000002d0
[ 0.106897] RDX: 0000000000000030 RSI: 00000000000002d0 RDI: 000000005b835f90
[ 0.114006] RBP: ffffffff8195bf38 R08: 0000000000000001 R09: 0000090fbc802000
[ 0.121116] R10: ffff88ffbcc07340 R11: 0000000000000001 R12: 0000000000000001
[ 0.128225] R13: 0000090fbc802000 R14: 00000000000002d0 R15: 0000000000000001
[ 0.135336] FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000
[ 0.143398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.149124] CR2: 000000005b835f90 CR3: 0000000001966000 CR4: 00000000000606b0
[ 0.156234] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.163344] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.170454] Call Trace:
[ 0.172899] [<ffffffff8107512c>] ? efi_call4+0x6c/0xf0
[ 0.178108] [<ffffffff8105b3fe>] ? native_flush_tlb_global+0x8e/0xc0
[ 0.184527] [<ffffffff810652b3>] ? set_memory_x+0x43/0x50
[ 0.189997] [<ffffffff81acf91f>] ? efi_enter_virtual_mode+0x3bc/0x538
[ 0.196505] [<ffffffff81ab104b>] start_kernel+0x39f/0x44f
[ 0.201972] [<ffffffff81ab0ab5>] ? repair_env_string+0x5c/0x5c
[ 0.207872] [<ffffffff81ab0120>] ? early_idt_handlers+0x120/0x120
[ 0.214030] [<ffffffff81ab066c>] x86_64_start_reservations+0x2a/0x2c
[ 0.220449] [<ffffffff81ab07c0>] x86_64_start_kernel+0x152/0x175
[ 0.226521] Code: Bad RIP value.
[ 0.229860] RIP [<000000005b835f90>] 0x5b835f8f
[ 0.234478] RSP <ffffffff8195be28>
[ 0.237955] CR2: 000000005b835f90
[ 0.241266] ---[ end trace 8178226af3e802ca ]---
[ 0.245869] Kernel panic - not syncing: Fatal exception


2018-01-05 18:33:15

by Jiri Kosina

Subject: Re: [RFC] boot failed when enable KAISER/KPTI

On Fri, 5 Jan 2018, Xishi Qiu wrote:

> I ran the latest RHEL 7.2 kernel with the KAISER/KPTI patch applied, and it failed to boot.
>
> ...
> [ 0.000000] PM: Registered nosave memory: [mem 0x81000000000-0x8ffffffffff]
> [ 0.000000] PM: Registered nosave memory: [mem 0x91000000000-0xfffffffffff]
> [ 0.000000] PM: Registered nosave memory: [mem 0x101000000000-0x10ffffffffff]
> [ 0.000000] PM: Registered nosave memory: [mem 0x111000000000-0x17ffffffffff]
> [ 0.000000] PM: Registered nosave memory: [mem 0x181000000000-0x18ffffffffff]
> [ 0.000000] e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
> [ 0.000000] Booting paravirtualized kernel on bare hardware
> [ 0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:1536 nr_cpu_ids:1536 nr_node_ids:8
> [ 0.000000] PERCPU: max_distance=0x180ffe240000 too large for vmalloc space 0x1fffffffffff
> [ 0.000000] setup_percpu: auto allocator failed (-22), falling back to page size
> [ 0.000000] PERCPU: 32 4K pages/cpu @ffffc90000000000 s107200 r8192 d15680
> [ 0.000000] Built 8 zonelists in Zone order, mobility grouping on. Total pages: 132001804
> [ 0.000000] Policy zone: Normal
> iosdevname=0 8250.nr_uarts=8 efi=old_map rdloaddriver=usb_storage rdloaddriver=sd_mod udev.event-timeout=600 softlockup_panic=0 rcupdate.rcu_cpu_stall_timeout=300
> [ 0.000000] Intel-IOMMU: enabled
> [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
> [ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
> [ 0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
> [ 0.000000] AGP: Checking aperture...
> [ 0.000000] AGP: No AGP bridge found
> [ 0.000000] Memory: 526901612k/26910638080k available (6528k kernel code, 26374249692k absent, 9486776k reserved, 4302k data, 1676k init)
> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1536, Nodes=8
> [ 0.000000] x86/pti: Unmapping kernel while in userspace
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=1536.
> [ 0.000000] Offload RCU callbacks from all CPUs
> [ 0.000000] Offload RCU callbacks from CPUs: 0-1535.
> [ 0.000000] NR_IRQS:327936 nr_irqs:15976 0
> [ 0.000000] Console: colour dummy device 80x25
> [ 0.000000] console [tty0] enabled
> [ 0.000000] console [ttyS0] enabled
> [ 0.000000] allocated 2145910784 bytes of page_cgroup
> [ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
> [ 0.000000] Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
> [ 0.000000] tsc: Fast TSC calibration using PIT
> [ 0.000000] tsc: Detected 2799.999 MHz processor
> [ 0.001803] Calibrating delay loop (skipped), value calculated using timer frequency.. 5599.99 BogoMIPS (lpj=2799999)
> [ 0.012408] pid_max: default: 1572864 minimum: 12288
> [ 0.017987] init_memory_mapping: [mem 0x5947f000-0x5b47efff]
> [ 0.023701] init_memory_mapping: [mem 0x5b47f000-0x5b87efff]
> [ 0.029369] init_memory_mapping: [mem 0x6d368000-0x6d3edfff]
> [ 0.039130] BUG: unable to handle kernel paging request at 000000005b835f90
> [ 0.046101] IP: [<000000005b835f90>] 0x5b835f8f
> [ 0.050637] PGD 8000000001f61067 PUD 190ffefff067 PMD 190ffeffd067 PTE 5b835063
> [ 0.057989] Oops: 0011 [#1] SMP
> [ 0.061241] Modules linked in:
> [ 0.064304] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-327.59.59.46.h42.x86_64 #1
> [ 0.072280] Hardware name: Huawei FusionServer9032/IT91SMUB, BIOS BLXSV316 11/14/2017
> [ 0.080082] task: ffffffff8196e440 ti: ffffffff81958000 task.ti: ffffffff81958000
> [ 0.087539] RIP: 0010:[<000000005b835f90>] [<000000005b835f90>] 0x5b835f8f
> [ 0.094494] RSP: 0000:ffffffff8195be28 EFLAGS: 00010046
> [ 0.099788] RAX: 0000000080050033 RBX: ffff910fbc802000 RCX: 00000000000002d0
> [ 0.106897] RDX: 0000000000000030 RSI: 00000000000002d0 RDI: 000000005b835f90
> [ 0.114006] RBP: ffffffff8195bf38 R08: 0000000000000001 R09: 0000090fbc802000
> [ 0.121116] R10: ffff88ffbcc07340 R11: 0000000000000001 R12: 0000000000000001
> [ 0.128225] R13: 0000090fbc802000 R14: 00000000000002d0 R15: 0000000000000001
> [ 0.135336] FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000
> [ 0.143398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.149124] CR2: 000000005b835f90 CR3: 0000000001966000 CR4: 00000000000606b0
> [ 0.156234] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.163344] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 0.170454] Call Trace:
> [ 0.172899] [<ffffffff8107512c>] ? efi_call4+0x6c/0xf0

The EFI old memmap has the NX bit set. The immediate workaround is to
remove that cmdline parameter (efi=old_map). I have also submitted a
proposed fix here:

http://lkml.kernel.org/r/[email protected]

--
Jiri Kosina
SUSE Labs

2018-01-06 06:49:07

by Xishi Qiu

Subject: Re: [RFC] boot failed when enable KAISER/KPTI

On 2018/1/6 2:33, Jiri Kosina wrote:

> On Fri, 5 Jan 2018, Xishi Qiu wrote:
>
>> I ran the latest RHEL 7.2 kernel with the KAISER/KPTI patch applied, and it failed to boot.
>>
>> ...
>> [ 0.000000] PM: Registered nosave memory: [mem 0x81000000000-0x8ffffffffff]
>> [ 0.000000] PM: Registered nosave memory: [mem 0x91000000000-0xfffffffffff]
>> [ 0.000000] PM: Registered nosave memory: [mem 0x101000000000-0x10ffffffffff]
>> [ 0.000000] PM: Registered nosave memory: [mem 0x111000000000-0x17ffffffffff]
>> [ 0.000000] PM: Registered nosave memory: [mem 0x181000000000-0x18ffffffffff]
>> [ 0.000000] e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
>> [ 0.000000] Booting paravirtualized kernel on bare hardware
>> [ 0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:1536 nr_cpu_ids:1536 nr_node_ids:8
>> [ 0.000000] PERCPU: max_distance=0x180ffe240000 too large for vmalloc space 0x1fffffffffff
>> [ 0.000000] setup_percpu: auto allocator failed (-22), falling back to page size
>> [ 0.000000] PERCPU: 32 4K pages/cpu @ffffc90000000000 s107200 r8192 d15680
>> [ 0.000000] Built 8 zonelists in Zone order, mobility grouping on. Total pages: 132001804
>> [ 0.000000] Policy zone: Normal
>> iosdevname=0 8250.nr_uarts=8 efi=old_map rdloaddriver=usb_storage rdloaddriver=sd_mod udev.event-timeout=600 softlockup_panic=0 rcupdate.rcu_cpu_stall_timeout=300
>> [ 0.000000] Intel-IOMMU: enabled
>> [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
>> [ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
>> [ 0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
>> [ 0.000000] AGP: Checking aperture...
>> [ 0.000000] AGP: No AGP bridge found
>> [ 0.000000] Memory: 526901612k/26910638080k available (6528k kernel code, 26374249692k absent, 9486776k reserved, 4302k data, 1676k init)
>> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1536, Nodes=8
>> [ 0.000000] x86/pti: Unmapping kernel while in userspace
>> [ 0.000000] Hierarchical RCU implementation.
>> [ 0.000000] RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=1536.
>> [ 0.000000] Offload RCU callbacks from all CPUs
>> [ 0.000000] Offload RCU callbacks from CPUs: 0-1535.
>> [ 0.000000] NR_IRQS:327936 nr_irqs:15976 0
>> [ 0.000000] Console: colour dummy device 80x25
>> [ 0.000000] console [tty0] enabled
>> [ 0.000000] console [ttyS0] enabled
>> [ 0.000000] allocated 2145910784 bytes of page_cgroup
>> [ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
>> [ 0.000000] Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
>> [ 0.000000] tsc: Fast TSC calibration using PIT
>> [ 0.000000] tsc: Detected 2799.999 MHz processor
>> [ 0.001803] Calibrating delay loop (skipped), value calculated using timer frequency.. 5599.99 BogoMIPS (lpj=2799999)
>> [ 0.012408] pid_max: default: 1572864 minimum: 12288
>> [ 0.017987] init_memory_mapping: [mem 0x5947f000-0x5b47efff]
>> [ 0.023701] init_memory_mapping: [mem 0x5b47f000-0x5b87efff]
>> [ 0.029369] init_memory_mapping: [mem 0x6d368000-0x6d3edfff]
>> [ 0.039130] BUG: unable to handle kernel paging request at 000000005b835f90
>> [ 0.046101] IP: [<000000005b835f90>] 0x5b835f8f
>> [ 0.050637] PGD 8000000001f61067 PUD 190ffefff067 PMD 190ffeffd067 PTE 5b835063
>> [ 0.057989] Oops: 0011 [#1] SMP
>> [ 0.061241] Modules linked in:
>> [ 0.064304] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-327.59.59.46.h42.x86_64 #1
>> [ 0.072280] Hardware name: Huawei FusionServer9032/IT91SMUB, BIOS BLXSV316 11/14/2017
>> [ 0.080082] task: ffffffff8196e440 ti: ffffffff81958000 task.ti: ffffffff81958000
>> [ 0.087539] RIP: 0010:[<000000005b835f90>] [<000000005b835f90>] 0x5b835f8f
>> [ 0.094494] RSP: 0000:ffffffff8195be28 EFLAGS: 00010046
>> [ 0.099788] RAX: 0000000080050033 RBX: ffff910fbc802000 RCX: 00000000000002d0
>> [ 0.106897] RDX: 0000000000000030 RSI: 00000000000002d0 RDI: 000000005b835f90
>> [ 0.114006] RBP: ffffffff8195bf38 R08: 0000000000000001 R09: 0000090fbc802000
>> [ 0.121116] R10: ffff88ffbcc07340 R11: 0000000000000001 R12: 0000000000000001
>> [ 0.128225] R13: 0000090fbc802000 R14: 00000000000002d0 R15: 0000000000000001
>> [ 0.135336] FS: 0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000
>> [ 0.143398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 0.149124] CR2: 000000005b835f90 CR3: 0000000001966000 CR4: 00000000000606b0
>> [ 0.156234] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 0.163344] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 0.170454] Call Trace:
>> [ 0.172899] [<ffffffff8107512c>] ? efi_call4+0x6c/0xf0
>
> The EFI old memmap has the NX bit set. The immediate workaround is to
> remove that cmdline parameter (efi=old_map). I have also submitted a
> proposed fix here:
>
> http://lkml.kernel.org/r/[email protected]
>

Hi Jiri,

Your patch fixes the boot problem from EFI, thanks.

I have another problem during reboot; it seems to have the same cause (the NX flag).

How about this fix? I tested it and it works.

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 088681d..f6c32f5 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -131,6 +131,8 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
 	pud = pud_alloc(&tboot_mm, pgd, vaddr);
 	if (!pud)
 		return -1;
+	if (__supported_pte_mask & _PAGE_NX)
+		pgd->pgd &= ~_PAGE_NX;
 	pmd = pmd_alloc(&tboot_mm, pud, vaddr);
 	if (!pmd)
 		return -1;

Here is the log of the failure.
...
[ 1911.622675] BUG: unable to handle kernel paging request at 00000000008041c0
[ 1911.629880] IP: [<00000000008041c0>] 0x8041bf
[ 1911.634389] PGD 80000010272cb067 PUD 2025178067 PMD 10272d8067 PTE 804063
[ 1911.641472] Oops: 0011 [#1] SMP
[ 1911.644847] oom or die happens after reboot! last event=0x24, last cpu=0.
[ 1911.651833] event maps(bit5-bit0): die-oom-intermit-reboot-emerge-panic
[ 1911.660868] collected_len = 100273, LOG_BUF_LEN_LOCAL = 1048576
[ 1911.698656] kbox: notify die begin
[ 1911.702156] kbox: no notify die func register. no need to notify
[ 1911.708336] do nothing after die!
[ 1911.711748] Modules linked in: bum(O) ip_set nfnetlink prio(O) nat(O) vport_vxlan(O) openvswitch(O) nf_defrag_ipv6 gre kboxdriver(O) kbox(O) signo_catch(O) vfat fat tg3 intel_powerclamp coretemp intel_rapl crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel i2c_i801 kvm_intel(O) ptp lrw gf128mul i2c_core glue_helper ablk_helper pps_core kvm(O) cryptd iTCO_wdt iTCO_vendor_support sg pcspkr lpc_ich mfd_core sb_edac mei_me edac_core mei shpchp acpi_power_meter acpi_pad remote_trigger(O) nf_conntrack_ipv4 nf_defrag_ipv4 vhost_net(O) tun(O) vhost(O) macvtap macvlan vfio_pci irqbypass vfio_iommu_type1 vfio xt_sctp nf_conntrack_proto_sctp nf_nat_proto_sctp nf_nat nf_conntrack sctp libcrc32c ip_tables ext3 mbcache jbd sr_mod sd_mod cdrom lpfc crc_t10dif ahci crct10dif_generic crct10dif_pclmul libahci scsi_transport_fc scsi_tgt crct10dif_common libata usb_storage megaraid_sas dm_mod [last unloaded: dev_connlimit]
[ 1911.796711] CPU: 0 PID: 12033 Comm: reboot Tainted: G OE ---- ------- 3.10.0-327.61.59.66_22.x86_64 #1
[ 1911.807449] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.79 11/07/2017
[ 1911.814702] task: ffff881025a91700 ti: ffff8810267fc000 task.ti: ffff8810267fc000
[ 1911.822401] RIP: 0010:[<00000000008041c0>] [<00000000008041c0>] 0x8041bf
[ 1911.829407] RSP: 0018:ffff8810267ffd50 EFLAGS: 00010086
[ 1911.834877] RAX: 00000000008041c0 RBX: 0000000000000000 RCX: ffffffffff425000
[ 1911.842220] RDX: ffff8820a4e40000 RSI: 000000000000c000 RDI: 0000002024e40000
[ 1911.849563] RBP: ffff8810267ffd60 R08: ffff882024e40000 R09: 0000000000000000
[ 1911.856908] R10: ffffffff81a8f300 R11: ffff8810267ffaae R12: 0000000028121969
[ 1911.864250] R13: ffffffff819aa8a0 R14: 0000000000000cf9 R15: 0000000000000000
[ 1911.871596] FS: 00007f89d6143880(0000) GS:ffff881040400000(0000) knlGS:0000000000000000
[ 1911.879921] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1911.885836] CR2: 00000000008041c0 CR3: 0000002024e40000 CR4: 00000000001607f0
[ 1911.893180] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1911.900522] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1911.907863] Call Trace:
[ 1911.910384] [<ffffffff810241ab>] ? tboot_shutdown+0x5b/0x140
[ 1911.916298] [<ffffffff8104723c>] native_machine_emergency_restart+0x4c/0x250
[ 1911.923641] [<ffffffff8104c102>] ? disconnect_bsp_APIC+0x82/0xc0
[ 1911.929913] [<ffffffff81046e17>] native_machine_restart+0x37/0x40
[ 1911.936273] [<ffffffff810470ef>] machine_restart+0xf/0x20
[ 1911.941923] [<ffffffff8109af95>] kernel_restart+0x45/0x60
[ 1911.947570] [<ffffffff8109b1d9>] SYSC_reboot+0x229/0x260
[ 1911.953132] [<ffffffff811ef665>] ? vfs_writev+0x35/0x60
[ 1911.958603] [<ffffffff8109b27e>] SyS_reboot+0xe/0x10
[ 1911.963806] [<ffffffff8165e43d>] system_call_fastpath+0x16/0x1b
[ 1911.969987] Code: Bad RIP value.
[ 1911.973448] RIP [<00000000008041c0>] 0x8041bf
[ 1911.978044] RSP <ffff8810267ffd50>
[ 1911.990106] CR2: 00000000008041c0
[ 1912.001889] ---[ end trace e8475aee26ff7d9f ]---
[ 1912.408111] Kernel panic - not syncing: Fatal exception


2018-01-06 17:37:33

by Andrea Arcangeli

Subject: Re: [RFC] boot failed when enable KAISER/KPTI

Hello Xishi,

On Sat, Jan 06, 2018 at 02:45:30PM +0800, Xishi Qiu wrote:
> How about this fix? I tested it and it works.
>
> diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
> index 088681d..f6c32f5 100644
> --- a/arch/x86/kernel/tboot.c
> +++ b/arch/x86/kernel/tboot.c
> @@ -131,6 +131,8 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
>  	pud = pud_alloc(&tboot_mm, pgd, vaddr);
>  	if (!pud)
>  		return -1;
> +	if (__supported_pte_mask & _PAGE_NX)
> +		pgd->pgd &= ~_PAGE_NX;
>  	pmd = pmd_alloc(&tboot_mm, pud, vaddr);
>  	if (!pmd)
>  		return -1;

Oh, great that you have already verified this.

The only difference between the above and what I applied is that I
didn't check "__supported_pte_mask & _PAGE_NX", but that check is
superfluous here. It won't hurt to add it; your patch is fine as well.

The location of the NX clearing is correct, and it is the same optimal
place as in efi_64.c (right after pud_alloc succeeds).

Only setting NX requires verifying first that it is in
__supported_pte_mask; clearing it is always fine (in the worst case it
does nothing).

On a side note, I already verified that if NX is disabled (-cpu nx=off)
the pgd isn't NX poisoned in the first place, but clearing NX won't
hurt even in that case.

Thanks,
Andrea