Align stack to match calling sequence requirements in section "The
Stack Frame" of the System V ABI AMD64 Architecture Processor
Supplement, which requires the value (%rsp + 8) to be a multiple of 16
when control is transferred to the function entry point.
This is required because GCC is already aligned with the SysV ABI
spec, and compiles code resulting in (%rsp + 8) being a multiple of 16
when control is transferred to the function entry point.
This fixes guest crashes when compiled guest code contains certain SSE
instructions, because these SSE instructions expect memory
references (including those on the stack) to be 16-byte-aligned.
Signed-off-by: Ackerley Tng <[email protected]>
---
This patch is a follow-up from discussions at
https://lore.kernel.org/lkml/[email protected]/
---
.../selftests/kvm/include/linux/align.h | 15 +++++++++++++++
.../selftests/kvm/lib/x86_64/processor.c | 18 +++++++++++++++++-
2 files changed, 32 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/include/linux/align.h
diff --git a/tools/testing/selftests/kvm/include/linux/align.h b/tools/testing/selftests/kvm/include/linux/align.h
new file mode 100644
index 000000000000..2b4acec7b95a
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/linux/align.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_ALIGN_H
+#define _LINUX_ALIGN_H
+
+#include <linux/const.h>
+
+/* @a is a power of 2 value */
+#define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
+#define ALIGN_DOWN(x, a) __ALIGN_KERNEL((x) - ((a) - 1), (a))
+#define __ALIGN_MASK(x, mask) __ALIGN_KERNEL_MASK((x), (mask))
+#define PTR_ALIGN(p, a) ((typeof(p))ALIGN((unsigned long)(p), (a)))
+#define PTR_ALIGN_DOWN(p, a) ((typeof(p))ALIGN_DOWN((unsigned long)(p), (a)))
+#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
+
+#endif /* _LINUX_ALIGN_H */
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index acfa1d01e7df..09b48ae96fdd 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -5,6 +5,7 @@
* Copyright (C) 2018, Google LLC.
*/
+#include "linux/align.h"
#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"
@@ -569,6 +570,21 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
DEFAULT_GUEST_STACK_VADDR_MIN,
MEM_REGION_DATA);
+ stack_vaddr += DEFAULT_STACK_PGS * getpagesize();
+
+ /*
+ * Align stack to match calling sequence requirements in section "The
+ * Stack Frame" of the System V ABI AMD64 Architecture Processor
+ * Supplement, which requires the value (%rsp + 8) to be a multiple of
+ * 16 when control is transferred to the function entry point.
+ *
+ * If this code is ever used to launch a vCPU with 32-bit entry point it
+ * may need to subtract 4 bytes instead of 8 bytes.
+ */
+ TEST_ASSERT(IS_ALIGNED(stack_vaddr, PAGE_SIZE),
+ "stack_vaddr must be page aligned for stack adjustment of -8 to work");
+ stack_vaddr -= 8;
+
vcpu = __vm_vcpu_add(vm, vcpu_id);
vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
vcpu_setup(vm, vcpu);
@@ -576,7 +592,7 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
/* Setup guest general purpose registers */
vcpu_regs_get(vcpu, &regs);
regs.rflags = regs.rflags | 0x2;
- regs.rsp = stack_vaddr + (DEFAULT_STACK_PGS * getpagesize());
+ regs.rsp = stack_vaddr;
regs.rip = (unsigned long) guest_code;
vcpu_regs_set(vcpu, &regs);
--
2.39.2.637.g21b0678d19-goog
On Fri, Feb 17, 2023, Ackerley Tng wrote:
> Align stack to match calling sequence requirements in section "The
> Stack Frame" of the System V ABI AMD64 Architecture Processor
> Supplement, which requires the value (%rsp + 8) to be a multiple of 16
> when control is transferred to the function entry point.
To make it slightly more clear what is wrong:
Align the guest stack to match calling sequence requirements in section
"The Stack Frame" of the System V ABI AMD64 Architecture Processor
Supplement, which requires the value (%rsp + 8), NOT %rsp, to be a
multiple of 16 when control is transferred to the function entry point.
I.e. in a normal function call, %rsp needs to be 16-byte aligned
_before_ CALL, not after.
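To make the arithmetic concrete, here is a minimal standalone sketch. The
helper names rsp_ok_at_entry() and initial_rsp() are hypothetical and are not
part of the selftests library; the sketch only assumes that the stack
allocation is page aligned, as the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does rsp satisfy the SysV AMD64 rule at function entry? */
static bool rsp_ok_at_entry(uint64_t rsp)
{
	return ((rsp + 8) % 16) == 0;
}

/*
 * Hypothetical helper: initial %rsp for a guest whose stack occupies
 * nr_pages pages starting at the page-aligned address stack_base.
 */
static uint64_t initial_rsp(uint64_t stack_base, uint64_t nr_pages,
			    uint64_t page_size)
{
	uint64_t top;

	/* Mirrors the patch's assumption that the allocation is page aligned. */
	assert(stack_base % page_size == 0);

	top = stack_base + nr_pages * page_size;

	/* Mimic the 8-byte return address that CALL would have pushed. */
	return top - 8;
}
```

With a base of 0x200000 and five 4096-byte pages, the top of the stack is
0x205000. rsp_ok_at_entry() rejects that value and accepts 0x204ff8, which is
what initial_rsp() returns.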
> This is required because GCC is already aligned with the SysV ABI
> spec, and compiles code resulting in (%rsp + 8) being a multiple of 16
> when control is transferred to the function entry point.
I'd leave out this paragraph; any sane compiler, not just gcc, will adhere to the
SysV ABI.
> This fixes guest crashes when compiled guest code contains certain SSE
Nit, explicitly call out the #GP behavior, e.g. if/when KVM installs exception
handlers by default, there will be no crash.
E.g.
This fixes unexpected #GPs in guest code when the compiler uses SSE
instructions, e.g. to initialize memory, as many SSE instructions require
memory operands (including those on the stack) to be 16-byte aligned.
> instructions, because these SSE instructions expect memory
> references (including those on the stack) to be 16-byte-aligned.
>
> Signed-off-by: Ackerley Tng <[email protected]>
> ---
>
> This patch is a follow-up from discussions at
> https://lore.kernel.org/lkml/[email protected]/
>
> ---
> .../selftests/kvm/include/linux/align.h | 15 +++++++++++++++
> .../selftests/kvm/lib/x86_64/processor.c | 18 +++++++++++++++++-
> 2 files changed, 32 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/kvm/include/linux/align.h
>
> diff --git a/tools/testing/selftests/kvm/include/linux/align.h b/tools/testing/selftests/kvm/include/linux/align.h
> new file mode 100644
> index 000000000000..2b4acec7b95a
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/include/linux/align.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_ALIGN_H
> +#define _LINUX_ALIGN_H
> +
> +#include <linux/const.h>
> +
> +/* @a is a power of 2 value */
> +#define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
> +#define ALIGN_DOWN(x, a) __ALIGN_KERNEL((x) - ((a) - 1), (a))
> +#define __ALIGN_MASK(x, mask) __ALIGN_KERNEL_MASK((x), (mask))
> +#define PTR_ALIGN(p, a) ((typeof(p))ALIGN((unsigned long)(p), (a)))
> +#define PTR_ALIGN_DOWN(p, a) ((typeof(p))ALIGN_DOWN((unsigned long)(p), (a)))
> +#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
I agree it's high time align.h is pulled into tools/ but it belongs in
tools/include/linux/, not in KVM selftests.
For this fix specifically, tools/include/linux/bitmap.h already #defines IS_ALIGNED(),
so just use that, and pull in align.h (and remove the definition in bitmap.h) in
a separate patch (and let us maintainers deal with the conflicts).
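For reference, a minimal sketch of how the mask-based check behaves, modeled
on the IS_ALIGNED() in the quoted hunk. SKETCH_IS_ALIGNED() and
is_page_aligned() are illustrative names only, and the sketch assumes a
compiler that supports __typeof__ (GCC or Clang) and a power-of-2 alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copy of the mask check; the alignment must be a power of 2. */
#define SKETCH_IS_ALIGNED(x, a) (((x) & ((__typeof__(x))(a) - 1)) == 0)

/* Hypothetical wrapper: true if addr is a multiple of page_size. */
static bool is_page_aligned(uint64_t addr, uint64_t page_size)
{
	return SKETCH_IS_ALIGNED(addr, page_size);
}
```

The check reduces to testing that the low bits of the address are zero, so
0x205000 passes for a 4096-byte page while 0x204ff8 does not.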
> +
> +#endif /* _LINUX_ALIGN_H */
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index acfa1d01e7df..09b48ae96fdd 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -5,6 +5,7 @@
> * Copyright (C) 2018, Google LLC.
> */
>
> +#include "linux/align.h"
> #include "test_util.h"
> #include "kvm_util.h"
> #include "processor.h"
> @@ -569,6 +570,21 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
> DEFAULT_GUEST_STACK_VADDR_MIN,
> MEM_REGION_DATA);
>
> + stack_vaddr += DEFAULT_STACK_PGS * getpagesize();
> +
> + /*
> + * Align stack to match calling sequence requirements in section "The
> + * Stack Frame" of the System V ABI AMD64 Architecture Processor
> + * Supplement, which requires the value (%rsp + 8) to be a multiple of
> + * 16 when control is transferred to the function entry point.
> + *
> + * If this code is ever used to launch a vCPU with 32-bit entry point it
> + * may need to subtract 4 bytes instead of 8 bytes.
> + */
> + TEST_ASSERT(IS_ALIGNED(stack_vaddr, PAGE_SIZE),
> + "stack_vaddr must be page aligned for stack adjustment of -8 to work");
Nit, for the message, tie it to the allocation, not to the usage, e.g.
TEST_ASSERT(IS_ALIGNED(stack_vaddr, PAGE_SIZE),
"__vm_vaddr_alloc() did not provide a page-aligned address");
The assert exists to verify an assumption (that the allocator always provides
page-aligned addresses), and the error message should capture that. Explaining
what will break isn't as helpful because it doesn't help understand what went
wrong.
> + stack_vaddr -= 8;
> +
> vcpu = __vm_vcpu_add(vm, vcpu_id);
> vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
> vcpu_setup(vm, vcpu);
> @@ -576,7 +592,7 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
> /* Setup guest general purpose registers */
> vcpu_regs_get(vcpu, &regs);
> regs.rflags = regs.rflags | 0x2;
> - regs.rsp = stack_vaddr + (DEFAULT_STACK_PGS * getpagesize());
> + regs.rsp = stack_vaddr;
> regs.rip = (unsigned long) guest_code;
> vcpu_regs_set(vcpu, &regs);
>
> --
> 2.39.2.637.g21b0678d19-goog
>