Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
exceptions/interrupts, to reduce speculation attack surface") unintendedly
broke Xen PV virtual machines by clearing the %rbx register at the end of
xen_failsafe_callback before the latter jumps to error_exit.
error_exit expects the %rbx register to be a flag indicating whether
there should be a return to kernel mode.
This commit makes sure that the %rbx register is not cleared by
the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated
by xen_failsafe_callback, to avoid the issue.
The impact of this bug includes (and most likely is not limited to) kernel
BUGs when wine is run in Xen PV virtual machines. One example is included
below. This example depicts the bug when wine is used to run TeamViewer
version 13's portable version under a Xen PV virtual machine using an
up-to-date version of Qubes OS R3.2.
[...........] kernel tried to execute NX-protected page - exploit attempt? (uid: 1000)
[ +0.000021] BUG: unable to handle kernel paging request at ffffffff82012480
[ +0.000020] IP: init_task+0x0/0x2ac0
[ +0.000009] PGD 200e067 P4D 200e067 PUD 200f067 PMD 2d5e067 PTE 8010000002012067
[ +0.000019] Oops: 0011 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ +0.000015] Modules linked in: fuse bluetooth ecdh_generic rfkill ip6table_filter ip6_tables xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c intel_rapl x86_pkg_temp_thermal crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront intel_rapl_perf pcspkr binfmt_misc dummy_hcd udc_core xen_gntdev xen_gntalloc u2mfn(O) xen_blkback xenfs xen_evtchn xen_privcmd xen_blkfront
[ +0.000091] CPU: 0 PID: 1350 Comm: TeamViewer.exe Tainted: G O 4.14.20-11.d10d0bb86d #15
[ +0.000018] task: ffff8800042fda00 task.stack: ffffc900006f8000
[ +0.000015] RIP: e030:init_task+0x0/0x2ac0
[ +0.000009] RSP: e02b:ffffffff82003df8 EFLAGS: 00010002
[ +0.000012] RAX: 0000000000000040 RBX: 0000000000000004 RCX: 0000000000000000
[ +0.000015] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffffffff810011aa
[ +0.000015] RBP: 000000000033eb88 R08: 0000000000000000 R09: 0000000000000000
[ +0.000015] R10: 0000000000000000 R11: ffff88000d772600 R12: 0000000000000000
[ +0.000015] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ +0.000027] FS: 000000007ffd8000(0063) GS:ffff88000f400000(006b) knlGS:0000000000000000
[ +0.000016] CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
[ +0.000014] CR2: ffffffff82012480 CR3: 0000000004880000 CR4: 0000000000042660
[ +0.000025] Call Trace:
[ +0.000007] Code: ff ff ff d0 5a 1e 81 ff ff ff ff 00 00 00 00 00 00 00 00 e0 d7 04 82 ff ff ff ff e8 98 c0 0d 00 88 ff ff 01 80 00 00 00 00 00 00 <00> 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ +0.000068] RIP: init_task+0x0/0x2ac0 RSP: ffffffff82003df8
[ +0.000011] CR2: ffffffff82012480
[ +0.000010] ---[ end trace 7fb0075d4247b2a2 ]---
The commit that introduces this bug was found with git-bisect in conjunction with
some scripts and Qubes OS's cross-VM RPCs for automation, and the correction was
prepared after a manual inspection of the changes introduced by the commit in
question.
To the best of my recollection, this issue is also reproducible in an Ubuntu
18.04 LTS dom0 instance with the version of the Xen hypervisor in Ubuntu's
package repositories.
Signed-off-by: M. Vefa Bicakci <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # for v4.14 and up
Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
---
arch/x86/entry/calling.h | 4 +++-
arch/x86/entry/entry_64.S | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 20d0885b00fb..71795767da94 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -97,7 +97,7 @@ For 32-bit we have the following conventions - kernel is built with
#define SIZEOF_PTREGS 21*8
-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
+.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0 clear_rbx=1
/*
* Push registers and sanitize registers of values that a
* speculation attack might otherwise want to exploit. The
@@ -127,7 +127,9 @@ For 32-bit we have the following conventions - kernel is built with
pushq %r11 /* pt_regs->r11 */
xorl %r11d, %r11d /* nospec r11*/
pushq %rbx /* pt_regs->rbx */
+ .if \clear_rbx
xorl %ebx, %ebx /* nospec rbx*/
+ .endif
pushq %rbp /* pt_regs->rbp */
xorl %ebp, %ebp /* nospec rbp*/
pushq %r12 /* pt_regs->r12 */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c7449f377a77..96e8ff34129e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
addq $0x30, %rsp
UNWIND_HINT_IRET_REGS
pushq $-1 /* orig_ax = -1 => not a system call */
- PUSH_AND_CLEAR_REGS
+ PUSH_AND_CLEAR_REGS clear_rbx=0
ENCODE_FRAME_POINTER
jmp error_exit
END(xen_failsafe_callback)
--
2.17.1
Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
adjustment corruption") has moved the query and calculation of the
"x86_virt_bits" and "x86_phys_bits" fields of the "cpuinfo_x86" struct
from the "get_cpu_cap" function to a new function named
"get_cpu_address_sizes".
One of the call sites related to Xen PV VMs was unfortunately missed
in the aforementioned commit, and this prevents successful boot-up of
kernel versions 4.17 and up in Xen PV VMs.
Signed-off-by: M. Vefa Bicakci <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # for v4.17 and up
Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
---
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/cpu.h | 1 +
arch/x86/xen/enlighten_pv.c | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index eb4cb3efd20e..2322d0c4bfd2 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -911,7 +911,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
apply_forced_caps(c);
}
-static void get_cpu_address_sizes(struct cpuinfo_x86 *c)
+void get_cpu_address_sizes(struct cpuinfo_x86 *c)
{
u32 eax, ebx, ecx, edx;
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 38216f678fc3..12a5f0cec0b2 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[],
*const __x86_cpu_dev_end[];
extern void get_cpu_cap(struct cpuinfo_x86 *c);
+extern void get_cpu_address_sizes(struct cpuinfo_x86 *c);
extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
extern u32 get_scattered_cpuid_leaf(unsigned int level,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 439a94bf89ad..87afb000142a 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
/* Work out if we support NX */
get_cpu_cap(&boot_cpu_data);
+ get_cpu_address_sizes(&boot_cpu_data);
x86_configure_nx();
/* Let's presume PV guests always boot on vCPU with id 0. */
--
2.17.1
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index c7449f377a77..96e8ff34129e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
> addq $0x30, %rsp
> UNWIND_HINT_IRET_REGS
> pushq $-1 /* orig_ax = -1 => not a system call */
> - PUSH_AND_CLEAR_REGS
> + PUSH_AND_CLEAR_REGS clear_rbx=0
Do we need this at all? We are returning from the hypervisor here.
-boris
> ENCODE_FRAME_POINTER
> jmp error_exit
> END(xen_failsafe_callback)
On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 439a94bf89ad..87afb000142a 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
>
> /* Work out if we support NX */
> get_cpu_cap(&boot_cpu_data);
> + get_cpu_address_sizes(&boot_cpu_data);
> x86_configure_nx();
Have you observed any problems without this call? get_cpu_cap() is only
called here to set X86_FEATURE_NX, and is then called again, together
with get_cpu_address_sizes(), from early_identify_cpu().
-boris
On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <[email protected]> wrote:
> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
> exceptions/interrupts, to reduce speculation attack surface") unintendedly
> broke Xen PV virtual machines by clearing the %rbx register at the end of
> xen_failsafe_callback before the latter jumps to error_exit.
> error_exit expects the %rbx register to be a flag indicating whether
> there should be a return to kernel mode.
>
> This commit makes sure that the %rbx register is not cleared by
> the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated
> by xen_failsafe_callback, to avoid the issue.
Seems like a genuine problem, but:
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index c7449f377a77..96e8ff34129e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
> addq $0x30, %rsp
> UNWIND_HINT_IRET_REGS
> pushq $-1 /* orig_ax = -1 => not a system call */
> - PUSH_AND_CLEAR_REGS
> + PUSH_AND_CLEAR_REGS clear_rbx=0
> ENCODE_FRAME_POINTER
> jmp error_exit
The old code first set RBX to zero then, if frame pointers are on,
sets it to some special non-zero value, then crosses its fingers and
hopes for the best. Your patched code just skips the zeroing part, so
RBX either contains the ENCODE_FRAME_POINTER result or is
uninitialized.
How about actually initializing rbx to something sensible like, say, 1?
On 07/21/2018 05:25 PM, Boris Ostrovsky wrote:
> On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>> index 439a94bf89ad..87afb000142a 100644
>> --- a/arch/x86/xen/enlighten_pv.c
>> +++ b/arch/x86/xen/enlighten_pv.c
>> @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>
>> /* Work out if we support NX */
>> get_cpu_cap(&boot_cpu_data);
>> + get_cpu_address_sizes(&boot_cpu_data);
>> x86_configure_nx();
>
>
> Have you observed any problems without this call? get_cpu_cap() is only
> called here to set X86_FEATURE_NX, and is then called again, together
> with get_cpu_address_sizes(), from early_identify_cpu().
Hello Boris,
Thank you for the reviews! Without the call to get_cpu_address_sizes,
paravirtualized virtual machines do not boot up kernels with versions
4.17 and up at all; this includes dom0 and domU. No domU logs are
generated in dom0's /var/log/xen/console/ directory either, despite
having earlyprintk=xen on the kernel command line for my test domU.
(For the record, I am using the patched version of Xen 4.6.6 provided
by Qubes OS R3.2.)
Thank you,
Vefa
On 07/21/2018 05:19 PM, Boris Ostrovsky wrote:
> On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index c7449f377a77..96e8ff34129e 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>> addq $0x30, %rsp
>> UNWIND_HINT_IRET_REGS
>> pushq $-1 /* orig_ax = -1 => not a system call */
>> - PUSH_AND_CLEAR_REGS
>> + PUSH_AND_CLEAR_REGS clear_rbx=0
>
>
> Do we need this at all? We are returning from the hypervisor here.
>
> -boris
>
>> ENCODE_FRAME_POINTER
>> jmp error_exit
>> END(xen_failsafe_callback)
Hello Boris,
If you are referring to the PUSH_AND_CLEAR_REGS macro itself, I am not sure;
however, not clearing the RBX register seemed to resolve the issues mentioned
in the commit message for me. Given Andy's comment though, I believe that the
approach in this patch may not be correct.
Thank you,
Vefa
On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
> On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <[email protected]> wrote:
>> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
>> exceptions/interrupts, to reduce speculation attack surface") unintendedly
>> broke Xen PV virtual machines by clearing the %rbx register at the end of
>> xen_failsafe_callback before the latter jumps to error_exit.
>> error_exit expects the %rbx register to be a flag indicating whether
>> there should be a return to kernel mode.
>>
>> This commit makes sure that the %rbx register is not cleared by
>> the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated
>> by xen_failsafe_callback, to avoid the issue.
>
> Seems like a genuine problem, but:
>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index c7449f377a77..96e8ff34129e 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>> addq $0x30, %rsp
>> UNWIND_HINT_IRET_REGS
>> pushq $-1 /* orig_ax = -1 => not a system call */
>> - PUSH_AND_CLEAR_REGS
>> + PUSH_AND_CLEAR_REGS clear_rbx=0
>> ENCODE_FRAME_POINTER
>> jmp error_exit
>
> The old code first set RBX to zero then, if frame pointers are on,
> sets it to some special non-zero value, then crosses its fingers and
> hopes for the best. Your patched code just skips the zeroing part, so
> RBX either contains the ENCODE_FRAME_POINTER result or is
> uninitialized.
>
> How about actually initializing rbx to something sensible like, say, 1?
Hello Andy,
Thank you for the review! Apparently, I have not done my homework fully.
I will test your suggestion and report back, most likely in a few hours.
I have been testing with the next/linux-next tree's master branch
(dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the
frame pointer (i.e., RBP) register, as opposed to the RBX register,
which the patch aims to avoid changing before jumping to error_exit.
It is possible that I am missing something though -- I am not sure about
the connection between the RBP and RBX registers.
The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would it
be valid to state that the original code had the same issue that you referred
to (i.e., leaving the RBX register uninitialized)?
[I also realized that I forgot to include Andi Kleen and Dan Williams, authors
of 3ac6d8c787b8, in the discussion; I am copying this e-mail to them as well.]
Thank you,
Vefa
$ git show -W 3ac6d8c787b8 -- arch/x86/entry/entry_64.S
...
ENTRY(xen_failsafe_callback)
UNWIND_HINT_EMPTY
movl %ds, %ecx
cmpw %cx, 0x10(%rsp)
jne 1f
movl %es, %ecx
cmpw %cx, 0x18(%rsp)
jne 1f
movl %fs, %ecx
cmpw %cx, 0x20(%rsp)
jne 1f
movl %gs, %ecx
cmpw %cx, 0x28(%rsp)
jne 1f
/* All segments match their saved values => Category 2 (Bad IRET). */
movq (%rsp), %rcx
movq 8(%rsp), %r11
addq $0x30, %rsp
pushq $0 /* RIP */
UNWIND_HINT_IRET_REGS offset=8
jmp general_protection
1: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
movq (%rsp), %rcx
movq 8(%rsp), %r11
addq $0x30, %rsp
UNWIND_HINT_IRET_REGS
pushq $-1 /* orig_ax = -1 => not a system call */
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
+ CLEAR_REGS_NOSPEC
ENCODE_FRAME_POINTER
jmp error_exit
END(xen_failsafe_callback)
...
On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci <[email protected]> wrote:
> On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
>>
>> On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <[email protected]>
>> wrote:
>>>
>>> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
>>> exceptions/interrupts, to reduce speculation attack surface")
>>> unintendedly
>>> broke Xen PV virtual machines by clearing the %rbx register at the end of
>>> xen_failsafe_callback before the latter jumps to error_exit.
>>> error_exit expects the %rbx register to be a flag indicating whether
>>> there should be a return to kernel mode.
>>>
>>> This commit makes sure that the %rbx register is not cleared by
>>> the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated
>>> by xen_failsafe_callback, to avoid the issue.
>>
>>
>> Seems like a genuine problem, but:
>>
>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>> index c7449f377a77..96e8ff34129e 100644
>>> --- a/arch/x86/entry/entry_64.S
>>> +++ b/arch/x86/entry/entry_64.S
>>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>>> addq $0x30, %rsp
>>> UNWIND_HINT_IRET_REGS
>>> pushq $-1 /* orig_ax = -1 => not a system call */
>>> - PUSH_AND_CLEAR_REGS
>>> + PUSH_AND_CLEAR_REGS clear_rbx=0
>>> ENCODE_FRAME_POINTER
>>> jmp error_exit
>>
>>
>> The old code first set RBX to zero then, if frame pointers are on,
>> sets it to some special non-zero value, then crosses its fingers and
>> hopes for the best. Your patched code just skips the zeroing part, so
>> RBX either contains the ENCODE_FRAME_POINTER result or is
>> uninitialized.
>>
>> How about actually initializing rbx to something sensible like, say, 1?
>
>
> Hello Andy,
>
> Thank you for the review! Apparently, I have not done my homework fully.
> I will test your suggestion and report back, most likely in a few hours.
>
> I have been testing with the next/linux-next tree's master branch
> (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the
> frame pointer (i.e., RBP) register, as opposed to the RBX register,
> which the patch aims to avoid changing before jumping to error_exit.
> It is possible that I am missing something though -- I am not sure about
> the connection between the RBP and RBX registers.
Sorry, brain fart on my part.
>
> The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would
> it
> be valid to state that the original code had the same issue that you
> referred
> to (i.e., leaving the RBX register uninitialized)?
Presumably.
I would propose a rather different fix:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
Any chance you could test that and see if it fixes your problem?
On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
> On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci <[email protected]> wrote:
>> On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
>>>
>>> On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <[email protected]>
>>> wrote:
>>>>
>>>> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
>>>> exceptions/interrupts, to reduce speculation attack surface")
>>>> unintendedly
>>>> broke Xen PV virtual machines by clearing the %rbx register at the end of
>>>> xen_failsafe_callback before the latter jumps to error_exit.
>>>> error_exit expects the %rbx register to be a flag indicating whether
>>>> there should be a return to kernel mode.
>>>>
>>>> This commit makes sure that the %rbx register is not cleared by
>>>> the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated
>>>> by xen_failsafe_callback, to avoid the issue.
>>>
>>>
>>> Seems like a genuine problem, but:
>>>
>>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>>> index c7449f377a77..96e8ff34129e 100644
>>>> --- a/arch/x86/entry/entry_64.S
>>>> +++ b/arch/x86/entry/entry_64.S
>>>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>>>> addq $0x30, %rsp
>>>> UNWIND_HINT_IRET_REGS
>>>> pushq $-1 /* orig_ax = -1 => not a system call */
>>>> - PUSH_AND_CLEAR_REGS
>>>> + PUSH_AND_CLEAR_REGS clear_rbx=0
>>>> ENCODE_FRAME_POINTER
>>>> jmp error_exit
>>>
>>>
>>> The old code first set RBX to zero then, if frame pointers are on,
>>> sets it to some special non-zero value, then crosses its fingers and
>>> hopes for the best. Your patched code just skips the zeroing part, so
>>> RBX either contains the ENCODE_FRAME_POINTER result or is
>>> uninitialized.
>>>
>>> How about actually initializing rbx to something sensible like, say, 1?
>>
>> Hello Andy,
>>
>> Thank you for the review! Apparently, I have not done my homework fully.
>> I will test your suggestion and report back, most likely in a few hours.
>>
>> I have been testing with the next/linux-next tree's master branch
>> (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the
>> frame pointer (i.e., RBP) register, as opposed to the RBX register,
>> which the patch aims to avoid changing before jumping to error_exit.
>> It is possible that I am missing something though -- I am not sure about
>> the connection between the RBP and RBX registers.
>
> Sorry, brain fart on my part.
No problem! :-)
>> The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would
>> it
>> be valid to state that the original code had the same issue that you
>> referred
>> to (i.e., leaving the RBX register uninitialized)?
>
> Presumably.
>
> I would propose a rather different fix:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
>
> Any chance you could test that and see if it fixes your problem?
Of course; I will report back with the result in a few hours.
On 07/21/2018 07:37 PM, M. Vefa Bicakci wrote:
> On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
>> On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci <[email protected]> wrote:
>>> On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
>>>>
>>>> On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <[email protected]>
>>>> wrote:
>>>>>
>>>>> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
>>>>> exceptions/interrupts, to reduce speculation attack surface")
>>>>> unintendedly
>>>>> broke Xen PV virtual machines by clearing the %rbx register at the end of
>>>>> xen_failsafe_callback before the latter jumps to error_exit.
>>>>> error_exit expects the %rbx register to be a flag indicating whether
>>>>> there should be a return to kernel mode.
>>>>>
>>>>> This commit makes sure that the %rbx register is not cleared by
>>>>> the PUSH_AND_CLEAR_REGS macro, when the macro in question is instantiated
>>>>> by xen_failsafe_callback, to avoid the issue.
>>>>
>>>>
>>>> Seems like a genuine problem, but:
>>>>
>>>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>>>> index c7449f377a77..96e8ff34129e 100644
>>>>> --- a/arch/x86/entry/entry_64.S
>>>>> +++ b/arch/x86/entry/entry_64.S
>>>>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>>>>> ????????? addq??? $0x30, %rsp
>>>>> ????????? UNWIND_HINT_IRET_REGS
>>>>> ????????? pushq?? $-1 /* orig_ax = -1 => not a system call */
>>>>> -?????? PUSH_AND_CLEAR_REGS
>>>>> +?????? PUSH_AND_CLEAR_REGS clear_rbx=0
>>>>> ????????? ENCODE_FRAME_POINTER
>>>>> ????????? jmp???? error_exit
>>>>
>>>>
>>>> The old code first set RBX to zero then, if frame pointers are on,
>>>> sets it to some special non-zero value, then crosses its fingers and
>>>> hopes for the best.? Your patched code just skips the zeroing part, so
>>>> RBX either contains the ENCODE_FRAME_POINTER result or is
>>>> uninitialized.
>>>>
>>>> How about actually initializing rbx to something sensible like, say, 1?
>>>
>>> Hello Andy,
>>>
>>> Thank you for the review! Apparently, I have not done my homework fully.
>>> I will test your suggestion and report back, most likely in a few hours.
>>>
>>> I have been testing with the next/linux-next tree's master branch
>>> (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the
>>> frame pointer (i.e., RBP) register, as opposed to the RBX register,
>>> which the patch aims to avoid changing before jumping to error_exit.
>>> It is possible that I am missing something though -- I am not sure about
>>> the connection between the RBP and RBX registers.
>>
>> Sorry, brain fart on my part.
>
> No problem! :-)
>
>>> The change introduced by commit 3ac6d8c787b8 is in the excerpt below. Would
>>> it
>>> be valid to state that the original code had the same issue that you
>>> referred
>>> to (i.e., leaving the RBX register uninitialized)?
>>
>> Presumably.
>>
>> I would propose a rather different fix:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
>>
>> Any chance you could test that and see if it fixes your problem?
>
> Of course; I will report back with the result in a few hours.
Hello Andy,
I confirm that the commit at [1] resolves the issue in question as well.
To test, I first reverted my commit, applied your commit and verified that
the bug cannot be reproduced. Afterwards, I reverted your commit and
verified that the bug is reproducible.
I am not sure about the best way to document the bug I encountered in your
commit message, but in case you plan to have your commit merged, please
feel free to add a "Reported-and-tested-by: M. Vefa Bicakci <[email protected]>"
tag to the commit message, possibly with a link to this e-mail thread.
Finally, as I had mentioned in my commit message, this bug exists in all
kernel versions 4.14 and greater, so it would be nice if you could carbon-copy
"[email protected]" in the commit message as well.
Thank you,
Vefa
[1] https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
On 07/21/2018 07:17 PM, M. Vefa Bicakci wrote:
> On 07/21/2018 05:25 PM, Boris Ostrovsky wrote:
>> On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
>>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>>> index 439a94bf89ad..87afb000142a 100644
>>> --- a/arch/x86/xen/enlighten_pv.c
>>> +++ b/arch/x86/xen/enlighten_pv.c
>>> @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>> ????? /* Work out if we support NX */
>>> ????? get_cpu_cap(&boot_cpu_data);
>>> +??? get_cpu_address_sizes(&boot_cpu_data);
>>> ????? x86_configure_nx();
>>
>>
>> Have you observed any problems without this call? get_cpu_cap() is only
>> called here to set X86_FEATURE_NX, and is then called again, together
>> with get_cpu_address_sizes(), from early_identify_cpu().
>
> Thank you for the reviews! Without the call to get_cpu_address_sizes,
> paravirtualized virtual machines do not boot up kernels with versions
> 4.17 and up at all; this includes dom0 and domU. No domU logs are
> generated in dom0's /var/log/xen/console/ directory either, despite
> having earlyprintk=xen on the kernel command line for my test domU.
Hello Boris,
I debugged this further with a debugging version of Xen (so that I can
get early kernel print-outs via the "xen_raw_console_write" function),
and I found the root cause of the boot up failure.
In summary, the issue is due to the following call path in version
4.17 (and higher, I assume), which the kernel goes through /only/ when
CONFIG_DEBUG_VIRTUAL is enabled:
enlighten_pv.c::xen_start_kernel
mmu_pv.c::xen_reserve_special_pages
page.h::__pa
physaddr.c::__phys_addr
physaddr.h::phys_addr_valid // uses boot_cpu_data.x86_phys_bits
The return value of phys_addr_valid is used with the VIRTUAL_BUG_ON macro,
which evaluates to BUG_ON in case CONFIG_DEBUG_VIRTUAL is enabled.
It looks like the call to get_cpu_address_size is required in the
xen_start_kernel function. Perhaps there is a more elegant way to
resolve this issue as well.
Another approach could be to check in the phys_addr_valid function whether
boot_cpu_data.x86_phys_bits has been initialized or not, I think, but I am
not sure about the correctness of this approach.
Thank you,
Vefa
On Sat, Jul 21, 2018 at 6:20 PM, M. Vefa Bicakci <[email protected]> wrote:
> On 07/21/2018 07:37 PM, M. Vefa Bicakci wrote:
>>
>> On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
>>>
>>> On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci <[email protected]>
>>> wrote:
>>>>
>>>> On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
>>>>>
>>>>>
>>>>> On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
>>>>>> exceptions/interrupts, to reduce speculation attack surface")
>>>>>> unintendedly
>>>>>> broke Xen PV virtual machines by clearing the %rbx register at the end
>>>>>> of
>>>>>> xen_failsafe_callback before the latter jumps to error_exit.
>>>>>> error_exit expects the %rbx register to be a flag indicating whether
>>>>>> there should be a return to kernel mode.
>>>>>>
>>>>>> This commit makes sure that the %rbx register is not cleared by
>>>>>> the PUSH_AND_CLEAR_REGS macro, when the macro in question is
>>>>>> instantiated
>>>>>> by xen_failsafe_callback, to avoid the issue.
>>>>>
>>>>>
>>>>>
>>>>> Seems like a genuine problem, but:
>>>>>
>>>>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>>>>> index c7449f377a77..96e8ff34129e 100644
>>>>>> --- a/arch/x86/entry/entry_64.S
>>>>>> +++ b/arch/x86/entry/entry_64.S
>>>>>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>>>>>> addq $0x30, %rsp
>>>>>> UNWIND_HINT_IRET_REGS
>>>>>> pushq $-1 /* orig_ax = -1 => not a system call */
>>>>>> - PUSH_AND_CLEAR_REGS
>>>>>> + PUSH_AND_CLEAR_REGS clear_rbx=0
>>>>>> ENCODE_FRAME_POINTER
>>>>>> jmp error_exit
>>>>>
>>>>>
>>>>>
>>>>> The old code first set RBX to zero then, if frame pointers are on,
>>>>> sets it to some special non-zero value, then crosses its fingers and
>>>>> hopes for the best. Your patched code just skips the zeroing part, so
>>>>> RBX either contains the ENCODE_FRAME_POINTER result or is
>>>>> uninitialized.
>>>>>
>>>>> How about actually initializing rbx to something sensible like, say, 1?
>>>>
>>>>
>>>> Hello Andy,
>>>>
>>>> Thank you for the review! Apparently, I have not done my homework fully.
>>>> I will test your suggestion and report back, most likely in a few hours.
>>>>
>>>> I have been testing with the next/linux-next tree's master branch
>>>> (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the
>>>> frame pointer (i.e., RBP) register, as opposed to the RBX register,
>>>> which the patch aims to avoid changing before jumping to error_exit.
>>>> It is possible that I am missing something though -- I am not sure about
>>>> the connection between the RBP and RBX registers.
>>>
>>>
>>> Sorry, brain fart on my part.
>>
>>
>> No problem! :-)
>>
>>>> The change introduced by commit 3ac6d8c787b8 is in the excerpt below.
>>>> Would
>>>> it
>>>> be valid to state that the original code had the same issue that you
>>>> referred
>>>> to (i.e., leaving the RBX register uninitialized)?
>>>
>>>
>>> Presumably.
>>>
>>> I would propose a rather different fix:
>>>
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
>>>
>>> Any chance you could test that and see if it fixes your problem?
>>
>>
>> Of course; I will report back with the result in a few hours.
>
>
> Hello Andy,
>
> I confirm that the commit at [1] resolves the issue in question as well.
>
> To test, I first reverted my commit, applied your commit and verified that
> the bug cannot be reproduced. Afterwards, I reverted your commit and
> verified that the bug is reproducible.
>
> I am not sure about the best way to document the bug I encountered in your
> commit message, but in case you plan to have your commit merged, please
> feel free to add a "Reported-and-tested-by: M. Vefa Bicakci
> <[email protected]>"
> tag to the commit message, possibly with a link to this e-mail thread.
>
> Finally, as I had mentioned in my commit message, this bug exists in all
> kernel versions 4.14 and greater, so it would be nice if you could
> carbon-copy
> "[email protected]" in the commit message as well.
>
> Thank you,
I'm curious why the sigreturn_64 selftest didn't catch this bug. Do
you happen to know why? The xen_failsafe_callback mechanism is a bit
mysterious to me.
>
> Vefa
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
>
On 22/07/18 18:56, Andy Lutomirski wrote:
> On Sat, Jul 21, 2018 at 6:20 PM, M. Vefa Bicakci <[email protected]> wrote:
>> On 07/21/2018 07:37 PM, M. Vefa Bicakci wrote:
>>> On 07/21/2018 07:30 PM, Andy Lutomirski wrote:
>>>> On Sat, Jul 21, 2018 at 4:19 PM, M. Vefa Bicakci <[email protected]>
>>>> wrote:
>>>>> On 07/21/2018 05:45 PM, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Sat, Jul 21, 2018 at 12:49 PM, M. Vefa Bicakci <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
>>>>>>> exceptions/interrupts, to reduce speculation attack surface")
>>>>>>> unintendedly
>>>>>>> broke Xen PV virtual machines by clearing the %rbx register at the end
>>>>>>> of
>>>>>>> xen_failsafe_callback before the latter jumps to error_exit.
>>>>>>> error_exit expects the %rbx register to be a flag indicating whether
>>>>>>> there should be a return to kernel mode.
>>>>>>>
>>>>>>> This commit makes sure that the %rbx register is not cleared by
>>>>>>> the PUSH_AND_CLEAR_REGS macro, when the macro in question is
>>>>>>> instantiated
>>>>>>> by xen_failsafe_callback, to avoid the issue.
>>>>>>
>>>>>>
>>>>>> Seems like a genuine problem, but:
>>>>>>
>>>>>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>>>>>> index c7449f377a77..96e8ff34129e 100644
>>>>>>> --- a/arch/x86/entry/entry_64.S
>>>>>>> +++ b/arch/x86/entry/entry_64.S
>>>>>>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>>>>>>> addq $0x30, %rsp
>>>>>>> UNWIND_HINT_IRET_REGS
>>>>>>> pushq $-1 /* orig_ax = -1 => not a system call */
>>>>>>> - PUSH_AND_CLEAR_REGS
>>>>>>> + PUSH_AND_CLEAR_REGS clear_rbx=0
>>>>>>> ENCODE_FRAME_POINTER
>>>>>>> jmp error_exit
>>>>>>
>>>>>>
>>>>>> The old code first set RBX to zero then, if frame pointers are on,
>>>>>> sets it to some special non-zero value, then crosses its fingers and
>>>>>> hopes for the best. Your patched code just skips the zeroing part, so
>>>>>> RBX either contains the ENCODE_FRAME_POINTER result or is
>>>>>> uninitialized.
>>>>>>
>>>>>> How about actually initializing rbx to something sensible like, say, 1?
>>>>>
>>>>> Hello Andy,
>>>>>
>>>>> Thank you for the review! Apparently, I have not done my homework fully.
>>>>> I will test your suggestion and report back, most likely in a few hours.
>>>>>
>>>>> I have been testing with the next/linux-next tree's master branch
>>>>> (dated 20180720), and I noticed that ENCODE_FRAME_POINTER changes the
>>>>> frame pointer (i.e., RBP) register, as opposed to the RBX register,
>>>>> which the patch aims to avoid changing before jumping to error_exit.
>>>>> It is possible that I am missing something though -- I am not sure about
>>>>> the connection between the RBP and RBX registers.
>>>>
>>>> Sorry, brain fart on my part.
>>>
>>> No problem! :-)
>>>
>>>>> The change introduced by commit 3ac6d8c787b8 is in the excerpt below.
>>>>> Would
>>>>> it
>>>>> be valid to state that the original code had the same issue that you
>>>>> referred
>>>>> to (i.e., leaving the RBX register uninitialized)?
>>>>
>>>> Presumably.
>>>>
>>>> I would propose a rather different fix:
>>>>
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pti&id=bb3d76b50c3bc78b67d79cf90d328f38a435c793
>>>>
>>>> Any chance you could test that and see if it fixes your problem?
>>>
>>> Of course; I will report back with the result in a few hours.
>>
>> Hello Andy,
>>
>> I confirm that the commit at [1] resolves the issue in question as well.
>>
>> To test, I first reverted my commit, applied your commit and verified that
>> the bug cannot be reproduced. Afterwards, I reverted your commit and
>> verified that the bug is reproducible.
>>
>> I am not sure about the best way to document the bug I encountered in your
>> commit message, but in case you plan to have your commit merged, please
>> feel free to add a "Reported-and-tested-by: M. Vefa Bicakci
>> <[email protected]>"
>> tag to the commit message, possibly with a link to this e-mail thread.
>>
>> Finally, as I had mentioned in my commit message, this bug exists in all
>> kernel versions 4.14 and greater, so it would be nice if you could
>> carbon-copy
>> "[email protected]" in the commit message as well.
>>
>> Thank you,
> I'm curious why the sigreturn_64 selftest didn't catch this bug. Do
> you happen to know why? The xen_failsafe_callback mechanism is a bit
> mysterious to me.
It is invoked whenever Xen suffers a fault when trying to load a guest
segment register.
In practice, it is used far more rarely with 64bit builds of Xen than it
used to be with 32bit builds. This is because the only time we reload
guest data segments is in the vcpu context switch path.
More recent versions of Xen also don't have a fallback in the iret path,
due to what think was actually some overzealous cleanup. As a result,
#GP gets delivered pointing at the target of the iret instruction.
Looking at the sigreturn tests, I'd expect the test to complain a lot,
not least because Xen doesn't have espfix64.
~Andrew
On 07/22/2018 11:57 AM, M. Vefa Bicakci wrote:
> On 07/21/2018 07:17 PM, M. Vefa Bicakci wrote:
>> On 07/21/2018 05:25 PM, Boris Ostrovsky wrote:
>>> On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
>>>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>>>> index 439a94bf89ad..87afb000142a 100644
>>>> --- a/arch/x86/xen/enlighten_pv.c
>>>> +++ b/arch/x86/xen/enlighten_pv.c
>>>> @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init
>>>> xen_start_kernel(void)
>>>> ????? /* Work out if we support NX */
>>>> ????? get_cpu_cap(&boot_cpu_data);
>>>> +??? get_cpu_address_sizes(&boot_cpu_data);
>>>> ????? x86_configure_nx();
>>>
>>>
>>> Have you observed any problems without this call? get_cpu_cap() is only
>>> called here to set X86_FEATURE_NX, and is then called again, together
>>> with get_cpu_address_sizes(), from early_identify_cpu().
>>
>> Thank you for the reviews! Without the call to get_cpu_address_sizes,
>> paravirtualized virtual machines do not boot up kernels with versions
>> 4.17 and up at all; this includes dom0 and domU. No domU logs are
>> generated in dom0's /var/log/xen/console/ directory either, despite
>> having earlyprintk=xen on the kernel command line for my test domU.
>
> Hello Boris,
>
> I debugged this further with a debugging version of Xen (so that I can
> get early kernel print-outs via the "xen_raw_console_write" function),
> and I found the root cause of the boot up failure.
>
> In summary, the issue is due to the following call path in version
> 4.17 (and higher, I assume), which the kernel goes through /only/ when
> CONFIG_DEBUG_VIRTUAL is enabled:
>
> enlighten_pv.c::xen_start_kernel
> ? mmu_pv.c::xen_reserve_special_pages
> ??? page.h::__pa
> ????? physaddr.c::__phys_addr
> ??????? physaddr.h::phys_addr_valid // uses boot_cpu_data.x86_phys_bits
>
> The return value of phys_addr_valid is used with the VIRTUAL_BUG_ON
> macro,
> which evaluates to BUG_ON in case CONFIG_DEBUG_VIRTUAL is enabled.
Ah, that's why it hasn't been detected.
>
> It looks like the call to get_cpu_address_size is required in the
> xen_start_kernel function. Perhaps there is a more elegant way to
> resolve this issue as well.
>
> Another approach could be to check in the phys_addr_valid function
> whether
> boot_cpu_data.x86_phys_bits has been initialized or not, I think, but
> I am
> not sure about the correctness of this approach.
No, I think your patch is good. The only thing I'd suggest is to move
the call a few lines down. The way it is placed now may create
impression that we are calling get_cpu_address_sizes() to figure out NX
support.
-boris
On 07/21/2018 07:18 PM, M. Vefa Bicakci wrote:
> On 07/21/2018 05:19 PM, Boris Ostrovsky wrote:
>> On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>> index c7449f377a77..96e8ff34129e 100644
>>> --- a/arch/x86/entry/entry_64.S
>>> +++ b/arch/x86/entry/entry_64.S
>>> @@ -1129,7 +1129,7 @@ ENTRY(xen_failsafe_callback)
>>> ????? addq??? $0x30, %rsp
>>> ????? UNWIND_HINT_IRET_REGS
>>> ????? pushq??? $-1 /* orig_ax = -1 => not a system call */
>>> -??? PUSH_AND_CLEAR_REGS
>>> +??? PUSH_AND_CLEAR_REGS clear_rbx=0
>>
>>
>> Do we need this at all? We are returning from the hypervisor here.
>>
>> -boris
>>
>>> ????? ENCODE_FRAME_POINTER
>>> ????? jmp??? error_exit
>>> ? END(xen_failsafe_callback)
>
> Hello Boris,
>
> If you are referring to the PUSH_AND_CLEAR_REGS macro itself, I am not
> sure;
> however, not clearing the RBX register seemed to resolve the issues
> mentioned
> in the commit message for me. Given Andy's comment though, I believe
> that the
> approach in this patch may not be correct.
I was only referring to register clearing part of PUSH_AND_CLEAR_REGS.
-boris
On 07/23/2018 11:04 AM, Boris Ostrovsky wrote:
> On 07/22/2018 11:57 AM, M. Vefa Bicakci wrote:
>> On 07/21/2018 07:17 PM, M. Vefa Bicakci wrote:
>>> On 07/21/2018 05:25 PM, Boris Ostrovsky wrote:
>>>> On 07/21/2018 03:49 PM, M. Vefa Bicakci wrote:
>>>>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>>>>> index 439a94bf89ad..87afb000142a 100644
>>>>> --- a/arch/x86/xen/enlighten_pv.c
>>>>> +++ b/arch/x86/xen/enlighten_pv.c
>>>>> @@ -1257,6 +1257,7 @@ asmlinkage __visible void __init
>>>>> xen_start_kernel(void)
>>>>> ????? /* Work out if we support NX */
>>>>> ????? get_cpu_cap(&boot_cpu_data);
>>>>> +??? get_cpu_address_sizes(&boot_cpu_data);
>>>>> ????? x86_configure_nx();
>>>>
>>>>
>>>> Have you observed any problems without this call? get_cpu_cap() is only
>>>> called here to set X86_FEATURE_NX, and is then called again, together
>>>> with get_cpu_address_sizes(), from early_identify_cpu().
>>>
>>> Thank you for the reviews! Without the call to get_cpu_address_sizes,
>>> paravirtualized virtual machines do not boot up kernels with versions
>>> 4.17 and up at all; this includes dom0 and domU. No domU logs are
>>> generated in dom0's /var/log/xen/console/ directory either, despite
>>> having earlyprintk=xen on the kernel command line for my test domU.
>>
>> Hello Boris,
>>
>> I debugged this further with a debugging version of Xen (so that I can
>> get early kernel print-outs via the "xen_raw_console_write" function),
>> and I found the root cause of the boot up failure.
>>
>> In summary, the issue is due to the following call path in version
>> 4.17 (and higher, I assume), which the kernel goes through /only/ when
>> CONFIG_DEBUG_VIRTUAL is enabled:
>>
>> enlighten_pv.c::xen_start_kernel
>> ? mmu_pv.c::xen_reserve_special_pages
>> ??? page.h::__pa
>> ????? physaddr.c::__phys_addr
>> ??????? physaddr.h::phys_addr_valid // uses boot_cpu_data.x86_phys_bits
>>
>> The return value of phys_addr_valid is used with the VIRTUAL_BUG_ON
>> macro,
>> which evaluates to BUG_ON in case CONFIG_DEBUG_VIRTUAL is enabled.
>
>
> Ah, that's why it hasn't been detected.
>
>
>>
>> It looks like the call to get_cpu_address_size is required in the
>> xen_start_kernel function. Perhaps there is a more elegant way to
>> resolve this issue as well.
>>
>> Another approach could be to check in the phys_addr_valid function
>> whether
>> boot_cpu_data.x86_phys_bits has been initialized or not, I think, but
>> I am
>> not sure about the correctness of this approach.
>
>
> No, I think your patch is good. The only thing I'd suggest is to move
> the call a few lines down. The way it is placed now may create
> impression that we are calling get_cpu_address_sizes() to figure out NX
> support.
Sorry for the delay, and thank you for your comments! I will
send an updated version of this patch in a few moments.
Vefa
Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
adjustment corruption") has moved the query and calculation of the
x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct
from the get_cpu_cap function to a new function named
get_cpu_address_sizes.
One of the call sites related to Xen PV VMs was unfortunately missed
in the aforementioned commit. This prevents successful boot-up of
kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL
is enabled, due to the following code path:
enlighten_pv.c::xen_start_kernel
mmu_pv.c::xen_reserve_special_pages
page.h::__pa
physaddr.c::__phys_addr
physaddr.h::phys_addr_valid
phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical
addresses. boot_cpu_data.x86_phys_bits is no longer populated before
the call to xen_reserve_special_pages due to the aforementioned commit
though, so the validation performed by phys_addr_valid fails, which
causes __phys_addr to trigger a BUG, preventing boot-up.
Signed-off-by: M. Vefa Bicakci <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # for v4.17 and up
Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
---
Changes since v1:
- Move the call to get_cpu_address_sizes below the call to
x86_configure_nx.
- Amend the commit message to describe why PV VMs do not boot up
successfully when CONFIG_DEBUG_VIRTUAL is enabled.
---
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/cpu.h | 1 +
arch/x86/xen/enlighten_pv.c | 3 +++
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f73fa6f6d85e..dd282482de09 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -911,7 +911,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
apply_forced_caps(c);
}
-static void get_cpu_address_sizes(struct cpuinfo_x86 *c)
+void get_cpu_address_sizes(struct cpuinfo_x86 *c)
{
u32 eax, ebx, ecx, edx;
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 38216f678fc3..12a5f0cec0b2 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[],
*const __x86_cpu_dev_end[];
extern void get_cpu_cap(struct cpuinfo_x86 *c);
+extern void get_cpu_address_sizes(struct cpuinfo_x86 *c);
extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
extern u32 get_scattered_cpuid_leaf(unsigned int level,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 105a57d73701..ee3b00c7acda 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1256,6 +1256,9 @@ asmlinkage __visible void __init xen_start_kernel(void)
get_cpu_cap(&boot_cpu_data);
x86_configure_nx();
+ /* Determine virtual and physical address sizes */
+ get_cpu_address_sizes(&boot_cpu_data);
+
/* Let's presume PV guests always boot on vCPU with id 0. */
per_cpu(xen_vcpu_id, 0) = 0;
--
2.17.1
On 07/24/2018 08:45 AM, M. Vefa Bicakci wrote:
> Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
> adjustment corruption") has moved the query and calculation of the
> x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct
> from the get_cpu_cap function to a new function named
> get_cpu_address_sizes.
>
> One of the call sites related to Xen PV VMs was unfortunately missed
> in the aforementioned commit. This prevents successful boot-up of
> kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL
> is enabled, due to the following code path:
>
> enlighten_pv.c::xen_start_kernel
> mmu_pv.c::xen_reserve_special_pages
> page.h::__pa
> physaddr.c::__phys_addr
> physaddr.h::phys_addr_valid
>
> phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical
> addresses. boot_cpu_data.x86_phys_bits is no longer populated before
> the call to xen_reserve_special_pages due to the aforementioned commit
> though, so the validation performed by phys_addr_valid fails, which
> causes __phys_addr to trigger a BUG, preventing boot-up.
>
> Signed-off-by: M. Vefa Bicakci <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: Juergen Gross <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected] # for v4.17 and up
> Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Reviewed-by: Boris Ostrovsky <[email protected]>
Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
adjustment corruption") has moved the query and calculation of the
x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct
from the get_cpu_cap function to a new function named
get_cpu_address_sizes.
One of the call sites related to Xen PV VMs was unfortunately missed
in the aforementioned commit. This prevents successful boot-up of
kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL
is enabled, due to the following code path:
enlighten_pv.c::xen_start_kernel
mmu_pv.c::xen_reserve_special_pages
page.h::__pa
physaddr.c::__phys_addr
physaddr.h::phys_addr_valid
phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical
addresses. boot_cpu_data.x86_phys_bits is no longer populated before
the call to xen_reserve_special_pages due to the aforementioned commit
though, so the validation performed by phys_addr_valid fails, which
causes __phys_addr to trigger a BUG, preventing boot-up.
Signed-off-by: M. Vefa Bicakci <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # for v4.17 and up
Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
---
Changes since v1:
- Move the call to get_cpu_address_sizes below the call to
x86_configure_nx.
- Amend the commit message to describe why PV VMs do not boot up
successfully when CONFIG_DEBUG_VIRTUAL is enabled.
Changes since v2:
- Add a Reviewed-by tag to note Boris Ostrovsky's code review.
- Rebase onto next-20180725.
---
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/cpu.h | 1 +
arch/x86/xen/enlighten_pv.c | 3 +++
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f73fa6f6d85e..dd282482de09 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -911,7 +911,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
apply_forced_caps(c);
}
-static void get_cpu_address_sizes(struct cpuinfo_x86 *c)
+void get_cpu_address_sizes(struct cpuinfo_x86 *c)
{
u32 eax, ebx, ecx, edx;
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 38216f678fc3..12a5f0cec0b2 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[],
*const __x86_cpu_dev_end[];
extern void get_cpu_cap(struct cpuinfo_x86 *c);
+extern void get_cpu_address_sizes(struct cpuinfo_x86 *c);
extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
extern u32 get_scattered_cpuid_leaf(unsigned int level,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 105a57d73701..ee3b00c7acda 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1256,6 +1256,9 @@ asmlinkage __visible void __init xen_start_kernel(void)
get_cpu_cap(&boot_cpu_data);
x86_configure_nx();
+ /* Determine virtual and physical address sizes */
+ get_cpu_address_sizes(&boot_cpu_data);
+
/* Let's presume PV guests always boot on vCPU with id 0. */
per_cpu(xen_vcpu_id, 0) = 0;
--
2.17.1
x86 maintainers, this needs your ack please.
-boris
On 07/24/2018 08:45 AM, M. Vefa Bicakci wrote:
> Commit d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
> adjustment corruption") has moved the query and calculation of the
> x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct
> from the get_cpu_cap function to a new function named
> get_cpu_address_sizes.
>
> One of the call sites related to Xen PV VMs was unfortunately missed
> in the aforementioned commit. This prevents successful boot-up of
> kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL
> is enabled, due to the following code path:
>
> enlighten_pv.c::xen_start_kernel
> mmu_pv.c::xen_reserve_special_pages
> page.h::__pa
> physaddr.c::__phys_addr
> physaddr.h::phys_addr_valid
>
> phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical
> addresses. boot_cpu_data.x86_phys_bits is no longer populated before
> the call to xen_reserve_special_pages due to the aforementioned commit
> though, so the validation performed by phys_addr_valid fails, which
> causes __phys_addr to trigger a BUG, preventing boot-up.
>
> Signed-off-by: M. Vefa Bicakci <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: Juergen Gross <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected] # for v4.17 and up
> Fixes: d94a155c59c9 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
>
> ---
>
> Changes since v1:
> - Move the call to get_cpu_address_sizes below the call to
> x86_configure_nx.
> - Amend the commit message to describe why PV VMs do not boot up
> successfully when CONFIG_DEBUG_VIRTUAL is enabled.
> ---
> arch/x86/kernel/cpu/common.c | 2 +-
> arch/x86/kernel/cpu/cpu.h | 1 +
> arch/x86/xen/enlighten_pv.c | 3 +++
> 3 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index f73fa6f6d85e..dd282482de09 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -911,7 +911,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
> apply_forced_caps(c);
> }
>
> -static void get_cpu_address_sizes(struct cpuinfo_x86 *c)
> +void get_cpu_address_sizes(struct cpuinfo_x86 *c)
> {
> u32 eax, ebx, ecx, edx;
>
> diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
> index 38216f678fc3..12a5f0cec0b2 100644
> --- a/arch/x86/kernel/cpu/cpu.h
> +++ b/arch/x86/kernel/cpu/cpu.h
> @@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[],
> *const __x86_cpu_dev_end[];
>
> extern void get_cpu_cap(struct cpuinfo_x86 *c);
> +extern void get_cpu_address_sizes(struct cpuinfo_x86 *c);
> extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
> extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
> extern u32 get_scattered_cpuid_leaf(unsigned int level,
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 105a57d73701..ee3b00c7acda 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1256,6 +1256,9 @@ asmlinkage __visible void __init xen_start_kernel(void)
> get_cpu_cap(&boot_cpu_data);
> x86_configure_nx();
>
> + /* Determine virtual and physical address sizes */
> + get_cpu_address_sizes(&boot_cpu_data);
> +
> /* Let's presume PV guests always boot on vCPU with id 0. */
> per_cpu(xen_vcpu_id, 0) = 0;
>
On Mon, 6 Aug 2018, Boris Ostrovsky wrote:
> x86 maintainers, this needs your ack please.
Reviewed-by: Thomas Gleixner <[email protected]>
On 08/06/2018 04:16 PM, Thomas Gleixner wrote:
> On Mon, 6 Aug 2018, Boris Ostrovsky wrote:
>
>> x86 maintainers, this needs your ack please.
> Reviewed-by: Thomas Gleixner <[email protected]>
>
Thanks.
Applied to for-linus-4.19
* Boris Ostrovsky <[email protected]> wrote:
> x86 maintainers, this needs your ack please.
LGTM:
Acked-by: Ingo Molnar <[email protected]>
Thanks,
Ingo