2020-07-22 22:17:03

by Thomas Gleixner

[permalink] [raw]
Subject: [patch V5 05/15] entry: Provide infrastructure for work before transitioning to guest mode

From: Thomas Gleixner <[email protected]>

Entering a guest is similar to exiting to user space. Pending work like
handling signals, rescheduling, task work etc. needs to be handled before
that.

Provide generic infrastructure to avoid duplication of the same handling
code all over the place.

The transfer to guest mode handling is different from the exit to usermode
handling, e.g. vs. rseq and live patching, so a separate function is used.

The initial list of work items handled is:

TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME

Architecture specific TIF flags can be added via defines in the
architecture specific include files.

The calling convention is also different from the syscall/interrupt entry
functions as KVM invokes this from the outer vcpu_run() loop with
interrupts and preemption enabled. To prevent missing a pending work item
it invokes a check for pending TIF work from interrupt disabled code right
before transitioning to guest mode. The lockdep, RCU and tracing state
handling is also done directly around the switch to and from guest mode.

Signed-off-by: Thomas Gleixner <[email protected]>
---
V5: Rename exit -> xfer (Sean)

V3: Reworked and simplified version adopted to recent X86 and KVM changes

V2: Moved KVM specific functions to kvm (Paolo)
Added lockdep assert (Andy)
Dropped live patching from enter guest mode work (Miroslav)
---
include/linux/entry-kvm.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/kvm_host.h | 8 ++++
kernel/entry/Makefile | 3 +
kernel/entry/kvm.c | 51 +++++++++++++++++++++++++++++
virt/kvm/Kconfig | 3 +
5 files changed, 144 insertions(+), 1 deletion(-)

--- /dev/null
+++ b/include/linux/entry-kvm.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_ENTRYKVM_H
+#define __LINUX_ENTRYKVM_H
+
+#include <linux/entry-common.h>
+
+/* Transfer to guest mode work */
+#ifdef CONFIG_KVM_XFER_TO_GUEST_WORK
+
+#ifndef ARCH_XFER_TO_GUEST_MODE_WORK
+# define ARCH_XFER_TO_GUEST_MODE_WORK (0)
+#endif
+
+#define XFER_TO_GUEST_MODE_WORK \
+ (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
+ _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
+
+struct kvm_vcpu;
+
+/**
+ * arch_xfer_to_guest_mode_work - Architecture specific xfer to guest mode
+ * work function.
+ * @vcpu: Pointer to current's VCPU data
+ * @ti_work: Cached TIF flags gathered in xfer_to_guest_mode()
+ *
+ * Invoked from xfer_to_guest_mode_work(). Defaults to NOOP. Can be
+ * replaced by architecture specific code.
+ */
+static inline int arch_xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
+ unsigned long ti_work);
+
+#ifndef arch_xfer_to_guest_mode_work
+static inline int arch_xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
+ unsigned long ti_work)
+{
+ return 0;
+}
+#endif
+
+/**
+ * xfer_to_guest_mode - Check and handle pending work which needs to be
+ * handled before returning to guest mode
+ * @vcpu: Pointer to current's VCPU data
+ *
+ * Returns: 0 or an error code
+ */
+int xfer_to_guest_mode(struct kvm_vcpu *vcpu);
+
+/**
+ * __xfer_to_guest_mode_work_pending - Check if work is pending
+ *
+ * Returns: True if work pending, False otherwise.
+ *
+ * Bare variant of xfer_to_guest_mode_work_pending(). Can be called from
+ * interrupt enabled code for racy quick checks with care.
+ */
+static inline bool __xfer_to_guest_mode_work_pending(void)
+{
+ unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+
+ return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
+}
+
+/**
+ * xfer_to_guest_mode_work_pending - Check if work is pending which needs to be
+ * handled before returning to guest mode
+ *
+ * Returns: True if work pending, False otherwise.
+ *
+ * Has to be invoked with interrupts disabled before the transition to
+ * guest mode.
+ */
+static inline bool xfer_to_guest_mode_work_pending(void)
+{
+ lockdep_assert_irqs_disabled();
+ return __xfer_to_guest_mode_work_pending();
+}
+#endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
+
+#endif
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1439,4 +1439,12 @@ int kvm_vm_create_worker_thread(struct k
uintptr_t data, const char *name,
struct task_struct **thread_ptr);

+#ifdef CONFIG_KVM_XFER_TO_GUEST_WORK
+static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
+{
+ vcpu->run->exit_reason = KVM_EXIT_INTR;
+ vcpu->stat.signal_exits++;
+}
+#endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
+
#endif
--- a/kernel/entry/Makefile
+++ b/kernel/entry/Makefile
@@ -9,4 +9,5 @@ KCOV_INSTRUMENT := n
CFLAGS_REMOVE_common.o = -fstack-protector -fstack-protector-strong
CFLAGS_common.o += -fno-stack-protector

-obj-$(CONFIG_GENERIC_ENTRY) += common.o
+obj-$(CONFIG_GENERIC_ENTRY) += common.o
+obj-$(CONFIG_KVM_XFER_TO_GUEST_WORK) += kvm.o
--- /dev/null
+++ b/kernel/entry/kvm.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/entry-kvm.h>
+#include <linux/kvm_host.h>
+
+static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
+{
+ do {
+ int ret;
+
+ if (ti_work & _TIF_SIGPENDING) {
+ kvm_handle_signal_exit(vcpu);
+ return -EINTR;
+ }
+
+ if (ti_work & _TIF_NEED_RESCHED)
+ schedule();
+
+ if (ti_work & _TIF_NOTIFY_RESUME) {
+ clear_thread_flag(TIF_NOTIFY_RESUME);
+ tracehook_notify_resume(NULL);
+ }
+
+ ret = arch_xfer_to_guest_mode_work(vcpu, ti_work);
+ if (ret)
+ return ret;
+
+ ti_work = READ_ONCE(current_thread_info()->flags);
+ } while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
+ return 0;
+}
+
+int xfer_to_guest_mode(struct kvm_vcpu *vcpu)
+{
+ unsigned long ti_work;
+
+ /*
+ * This is invoked from the outer guest loop with interrupts and
+ * preemption enabled.
+ *
+ * KVM invokes xfer_to_guest_mode_work_pending() with interrupts
+ * disabled in the inner loop before going into guest mode. No need
+ * to disable interrupts here.
+ */
+ ti_work = READ_ONCE(current_thread_info()->flags);
+ if (!(ti_work & XFER_TO_GUEST_MODE_WORK))
+ return 0;
+
+ return xfer_to_guest_mode_work(vcpu, ti_work);
+}
+EXPORT_SYMBOL_GPL(xfer_to_guest_mode);
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -60,3 +60,6 @@ config HAVE_KVM_VCPU_RUN_PID_CHANGE

config HAVE_KVM_NO_POLL
bool
+
+config KVM_XFER_TO_GUEST_WORK
+ bool


Subject: [tip: core/entry] entry: Provide infrastructure for work before transitioning to guest mode

The following commit has been merged into the core/entry branch of tip:

Commit-ID: 935ace2fb5cc49ae88bd1f1735ddc51cdc2ebfb3
Gitweb: https://git.kernel.org/tip/935ace2fb5cc49ae88bd1f1735ddc51cdc2ebfb3
Author: Thomas Gleixner <[email protected]>
AuthorDate: Wed, 22 Jul 2020 23:59:59 +02:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Fri, 24 Jul 2020 15:03:42 +02:00

entry: Provide infrastructure for work before transitioning to guest mode

Entering a guest is similar to exiting to user space. Pending work like
handling signals, rescheduling, task work etc. needs to be handled before
that.

Provide generic infrastructure to avoid duplication of the same handling
code all over the place.

The transfer to guest mode handling is different from the exit to usermode
handling, e.g. vs. rseq and live patching, so a separate function is used.

The initial list of work items handled is:

TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME

Architecture specific TIF flags can be added via defines in the
architecture specific include files.

The calling convention is also different from the syscall/interrupt entry
functions as KVM invokes this from the outer vcpu_run() loop with
interrupts and preemption enabled. To prevent missing a pending work item
it invokes a check for pending TIF work from interrupt disabled code right
before transitioning to guest mode. The lockdep, RCU and tracing state
handling is also done directly around the switch to and from guest mode.

Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/entry-kvm.h | 80 ++++++++++++++++++++++++++++++++++++++-
include/linux/kvm_host.h | 8 ++++-
kernel/entry/Makefile | 3 +-
kernel/entry/kvm.c | 51 ++++++++++++++++++++++++-
virt/kvm/Kconfig | 3 +-
5 files changed, 144 insertions(+), 1 deletion(-)
create mode 100644 include/linux/entry-kvm.h
create mode 100644 kernel/entry/kvm.c

diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
new file mode 100644
index 0000000..0cef17a
--- /dev/null
+++ b/include/linux/entry-kvm.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_ENTRYKVM_H
+#define __LINUX_ENTRYKVM_H
+
+#include <linux/entry-common.h>
+
+/* Transfer to guest mode work */
+#ifdef CONFIG_KVM_XFER_TO_GUEST_WORK
+
+#ifndef ARCH_XFER_TO_GUEST_MODE_WORK
+# define ARCH_XFER_TO_GUEST_MODE_WORK (0)
+#endif
+
+#define XFER_TO_GUEST_MODE_WORK \
+ (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
+ _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
+
+struct kvm_vcpu;
+
+/**
+ * arch_xfer_to_guest_mode_handle_work - Architecture specific xfer to guest
+ * mode work handling function.
+ * @vcpu: Pointer to current's VCPU data
+ * @ti_work: Cached TIF flags gathered in xfer_to_guest_mode_handle_work()
+ *
+ * Invoked from xfer_to_guest_mode_handle_work(). Defaults to NOOP. Can be
+ * replaced by architecture specific code.
+ */
+static inline int arch_xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu,
+ unsigned long ti_work);
+
+#ifndef arch_xfer_to_guest_mode_work
+static inline int arch_xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu,
+ unsigned long ti_work)
+{
+ return 0;
+}
+#endif
+
+/**
+ * xfer_to_guest_mode_handle_work - Check and handle pending work which needs
+ * to be handled before going to guest mode
+ * @vcpu: Pointer to current's VCPU data
+ *
+ * Returns: 0 or an error code
+ */
+int xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu);
+
+/**
+ * __xfer_to_guest_mode_work_pending - Check if work is pending
+ *
+ * Returns: True if work pending, False otherwise.
+ *
+ * Bare variant of xfer_to_guest_mode_work_pending(). Can be called from
+ * interrupt enabled code for racy quick checks with care.
+ */
+static inline bool __xfer_to_guest_mode_work_pending(void)
+{
+ unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+
+ return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
+}
+
+/**
+ * xfer_to_guest_mode_work_pending - Check if work is pending which needs to be
+ * handled before returning to guest mode
+ *
+ * Returns: True if work pending, False otherwise.
+ *
+ * Has to be invoked with interrupts disabled before the transition to
+ * guest mode.
+ */
+static inline bool xfer_to_guest_mode_work_pending(void)
+{
+ lockdep_assert_irqs_disabled();
+ return __xfer_to_guest_mode_work_pending();
+}
+#endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
+
+#endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d564855..ac83e9c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1439,4 +1439,12 @@ int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
uintptr_t data, const char *name,
struct task_struct **thread_ptr);

+#ifdef CONFIG_KVM_XFER_TO_GUEST_WORK
+static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
+{
+ vcpu->run->exit_reason = KVM_EXIT_INTR;
+ vcpu->stat.signal_exits++;
+}
+#endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
+
#endif
diff --git a/kernel/entry/Makefile b/kernel/entry/Makefile
index c207d20..34c8a3f 100644
--- a/kernel/entry/Makefile
+++ b/kernel/entry/Makefile
@@ -9,4 +9,5 @@ KCOV_INSTRUMENT := n
CFLAGS_REMOVE_common.o = -fstack-protector -fstack-protector-strong
CFLAGS_common.o += -fno-stack-protector

-obj-$(CONFIG_GENERIC_ENTRY) += common.o
+obj-$(CONFIG_GENERIC_ENTRY) += common.o
+obj-$(CONFIG_KVM_XFER_TO_GUEST_WORK) += kvm.o
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
new file mode 100644
index 0000000..eb1a8a4
--- /dev/null
+++ b/kernel/entry/kvm.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/entry-kvm.h>
+#include <linux/kvm_host.h>
+
+static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
+{
+ do {
+ int ret;
+
+ if (ti_work & _TIF_SIGPENDING) {
+ kvm_handle_signal_exit(vcpu);
+ return -EINTR;
+ }
+
+ if (ti_work & _TIF_NEED_RESCHED)
+ schedule();
+
+ if (ti_work & _TIF_NOTIFY_RESUME) {
+ clear_thread_flag(TIF_NOTIFY_RESUME);
+ tracehook_notify_resume(NULL);
+ }
+
+ ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
+ if (ret)
+ return ret;
+
+ ti_work = READ_ONCE(current_thread_info()->flags);
+ } while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
+ return 0;
+}
+
+int xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu)
+{
+ unsigned long ti_work;
+
+ /*
+ * This is invoked from the outer guest loop with interrupts and
+ * preemption enabled.
+ *
+ * KVM invokes xfer_to_guest_mode_work_pending() with interrupts
+ * disabled in the inner loop before going into guest mode. No need
+ * to disable interrupts here.
+ */
+ ti_work = READ_ONCE(current_thread_info()->flags);
+ if (!(ti_work & XFER_TO_GUEST_MODE_WORK))
+ return 0;
+
+ return xfer_to_guest_mode_work(vcpu, ti_work);
+}
+EXPORT_SYMBOL_GPL(xfer_to_guest_mode_handle_work);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index aad9284..1c37ccd 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -60,3 +60,6 @@ config HAVE_KVM_VCPU_RUN_PID_CHANGE

config HAVE_KVM_NO_POLL
bool
+
+config KVM_XFER_TO_GUEST_WORK
+ bool

2020-07-29 16:58:04

by Qian Cai

[permalink] [raw]
Subject: Re: [patch V5 05/15] entry: Provide infrastructure for work before transitioning to guest mode

On Wed, Jul 22, 2020 at 11:59:59PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <[email protected]>
>
> Entering a guest is similar to exiting to user space. Pending work like
> handling signals, rescheduling, task work etc. needs to be handled before
> that.
>
> Provide generic infrastructure to avoid duplication of the same handling
> code all over the place.
>
> The transfer to guest mode handling is different from the exit to usermode
> handling, e.g. vs. rseq and live patching, so a separate function is used.
>
> The initial list of work items handled is:
>
> TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME
>
> Architecture specific TIF flags can be added via defines in the
> architecture specific include files.
>
> The calling convention is also different from the syscall/interrupt entry
> functions as KVM invokes this from the outer vcpu_run() loop with
> interrupts and preemption enabled. To prevent missing a pending work item
> it invokes a check for pending TIF work from interrupt disabled code right
> before transitioning to guest mode. The lockdep, RCU and tracing state
> handling is also done directly around the switch to and from guest mode.
>
> Signed-off-by: Thomas Gleixner <[email protected]>

SR-IOV will start trigger a warning below in this commit,

A reproducer:
# echo 1 > /sys/class/net/eno58/device/sriov_numvfs (bnx2x)
# git clone https://gitlab.com/cailca/linux-mm
# cd linux-mm; make
# ./random -x 0-100 -k0000:02:09.0

https://gitlab.com/cailca/linux-mm/-/blob/master/random.c#L1247
(x86.config is also included in the repo.)

[ 765.409027] ------------[ cut here ]------------
[ 765.434611] WARNING: CPU: 13 PID: 3377 at include/linux/entry-kvm.h:75 kvm_arch_vcpu_ioctl_run+0xb52/0x1320 [kvm]
[ 765.487229] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio loop nls_ascii nls_cp437 vfat fat
[ 766.118568] CPU: 13 PID: 3377 Comm: qemu-kvm Not tainted 5.8.0-rc7-next-20200729 #2
[ 766.147011] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 766.147016] ret_from_fork+0x22/0x30
[ 766.782999] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
[ 766.817799] RIP: 0010:kvm_arch_vcpu_ioctl_run+0xb52/0x1320 [kvm]
[ 766.850054] Code: e8 03 0f b6 04 18 84 c0 74 06 0f 8e 4a 03 00 00 41 c6 85 50 2d 00 00 00 e9 24 f8 ff ff 4c 89 ef e8 93 a8 02 00 e9 3d f8 ff ff <0f> 0b e9 f2 f8 ff ff 48 81 c6 c0 01 00 00 4c 89 ef e8 18 4c fe
ff
[ 766.942485] RSP: 0018:ffffc90007017b58 EFLAGS: 00010202
[ 766.970899] RAX: 0000000000000001 RBX: dffffc0000000000 RCX: ffffffffc0f0ef8a
[ 767.008488] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88956a3821a0
[ 767.045726] RBP: ffffc90007017bb8 R08: ffffed1269290010 R09: ffffed1269290010
[ 767.083242] R10: ffff88934948007f R11: ffffed126929000f R12: ffff889349480424
[ 767.120809] R13: ffff889349480040 R14: ffff88934948006c R15: ffff88980e2da000
[ 767.160531] FS: 00007f12b0e98700(0000) GS:ffff88985fa40000(0000) knlGS:0000000000000000
[ 767.203199] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 767.235083] CR2: 0000000000000000 CR3: 000000120d186002 CR4: 00000000001726e0
[ 767.272303] Call Trace:
[ 767.272303] Call Trace:
[ 767.286947] kvm_vcpu_ioctl+0x3e9/0xb30 [kvm]
[ 767.311162] ? kvm_vcpu_block+0xd30/0xd30 [kvm]
[ 767.335281] ? find_held_lock+0x33/0x1c0
[ 767.357492] ? __fget_files+0x1cb/0x300
[ 767.378687] ? lock_downgrade+0x730/0x730
[ 767.401410] ? rcu_read_lock_held+0xaa/0xc0
[ 767.424228] ? rcu_read_lock_sched_held+0xd0/0xd0
[ 767.450078] ? find_held_lock+0x33/0x1c0
[ 767.471876] ? __fget_files+0x1e5/0x300
[ 767.493841] __x64_sys_ioctl+0x315/0x102f
[ 767.516579] ? generic_block_fiemap+0x60/0x60
[ 767.540678] ? syscall_enter_from_user_mode+0x1b/0x210
[ 767.568612] ? rcu_read_lock_sched_held+0xaa/0xd0
[ 767.593950] ? lockdep_hardirqs_on_prepare+0x33e/0x4e0
[ 767.621355] ? syscall_enter_from_user_mode+0x20/0x210
[ 767.649283] ? trace_hardirqs_on+0x20/0x1b5
[ 767.673770] do_syscall_64+0x33/0x40
[ 767.694699] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 767.723392] RIP: 0033:0x7f12b999a87b
[ 767.744252] Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 8
[ 767.837418] RSP: 002b:00007f12b0e97678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 767.877074] RAX: ffffffffffffffda RBX: 000055712c857c60 RCX: 00007f12b999a87b
[ 767.914242] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000014
[ 767.951945] RBP: 000055712c857cfb R08: 000055712ac4fad0 R09: 000055712b3f2080
[ 767.989036] R10: 0000000000000000 R11: 0000000000000246 R12: 000055712ac38100
[ 768.025851] R13: 00007ffd996e9edf R14: 00007f12becc5000 R15: 000055712c857c60
[ 768.063363] irq event stamp: 5257
[ 768.082559] hardirqs last enabled at (5273): [<ffffffffa1800ce2>] asm_sysvec_call_function_single+0x12/0x20
[ 768.132704] hardirqs last disabled at (5290): [<ffffffffa179ae3d>] irqentry_enter+0x1d/0x50
[ 768.176563] softirqs last enabled at (5108): [<ffffffffa1a0070f>] __do_softirq+0x70f/0xa9f
[ 768.221270] softirqs last disabled at (5093): [<ffffffffa1800ec2>] asm_call_on_stack+0x12/0x20
[ 768.267273] ---[ end trace 8730450ad8cfee9f ]---

> ---
> V5: Rename exit -> xfer (Sean)
>
> V3: Reworked and simplified version adopted to recent X86 and KVM changes
>
> V2: Moved KVM specific functions to kvm (Paolo)
> Added lockdep assert (Andy)
> Dropped live patching from enter guest mode work (Miroslav)
> ---
> include/linux/entry-kvm.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/kvm_host.h | 8 ++++
> kernel/entry/Makefile | 3 +
> kernel/entry/kvm.c | 51 +++++++++++++++++++++++++++++
> virt/kvm/Kconfig | 3 +
> 5 files changed, 144 insertions(+), 1 deletion(-)
>
> --- /dev/null
> +++ b/include/linux/entry-kvm.h
> @@ -0,0 +1,80 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_ENTRYKVM_H
> +#define __LINUX_ENTRYKVM_H
> +
> +#include <linux/entry-common.h>
> +
> +/* Transfer to guest mode work */
> +#ifdef CONFIG_KVM_XFER_TO_GUEST_WORK
> +
> +#ifndef ARCH_XFER_TO_GUEST_MODE_WORK
> +# define ARCH_XFER_TO_GUEST_MODE_WORK (0)
> +#endif
> +
> +#define XFER_TO_GUEST_MODE_WORK \
> + (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
> + _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
> +
> +struct kvm_vcpu;
> +
> +/**
> + * arch_xfer_to_guest_mode_work - Architecture specific xfer to guest mode
> + * work function.
> + * @vcpu: Pointer to current's VCPU data
> + * @ti_work: Cached TIF flags gathered in xfer_to_guest_mode()
> + *
> + * Invoked from xfer_to_guest_mode_work(). Defaults to NOOP. Can be
> + * replaced by architecture specific code.
> + */
> +static inline int arch_xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
> + unsigned long ti_work);
> +
> +#ifndef arch_xfer_to_guest_mode_work
> +static inline int arch_xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
> + unsigned long ti_work)
> +{
> + return 0;
> +}
> +#endif
> +
> +/**
> + * xfer_to_guest_mode - Check and handle pending work which needs to be
> + * handled before returning to guest mode
> + * @vcpu: Pointer to current's VCPU data
> + *
> + * Returns: 0 or an error code
> + */
> +int xfer_to_guest_mode(struct kvm_vcpu *vcpu);
> +
> +/**
> + * __xfer_to_guest_mode_work_pending - Check if work is pending
> + *
> + * Returns: True if work pending, False otherwise.
> + *
> + * Bare variant of xfer_to_guest_mode_work_pending(). Can be called from
> + * interrupt enabled code for racy quick checks with care.
> + */
> +static inline bool __xfer_to_guest_mode_work_pending(void)
> +{
> + unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +
> + return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
> +}
> +
> +/**
> + * xfer_to_guest_mode_work_pending - Check if work is pending which needs to be
> + * handled before returning to guest mode
> + *
> + * Returns: True if work pending, False otherwise.
> + *
> + * Has to be invoked with interrupts disabled before the transition to
> + * guest mode.
> + */
> +static inline bool xfer_to_guest_mode_work_pending(void)
> +{
> + lockdep_assert_irqs_disabled();
> + return __xfer_to_guest_mode_work_pending();
> +}
> +#endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
> +
> +#endif
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1439,4 +1439,12 @@ int kvm_vm_create_worker_thread(struct k
> uintptr_t data, const char *name,
> struct task_struct **thread_ptr);
>
> +#ifdef CONFIG_KVM_XFER_TO_GUEST_WORK
> +static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
> +{
> + vcpu->run->exit_reason = KVM_EXIT_INTR;
> + vcpu->stat.signal_exits++;
> +}
> +#endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
> +
> #endif
> --- a/kernel/entry/Makefile
> +++ b/kernel/entry/Makefile
> @@ -9,4 +9,5 @@ KCOV_INSTRUMENT := n
> CFLAGS_REMOVE_common.o = -fstack-protector -fstack-protector-strong
> CFLAGS_common.o += -fno-stack-protector
>
> -obj-$(CONFIG_GENERIC_ENTRY) += common.o
> +obj-$(CONFIG_GENERIC_ENTRY) += common.o
> +obj-$(CONFIG_KVM_XFER_TO_GUEST_WORK) += kvm.o
> --- /dev/null
> +++ b/kernel/entry/kvm.c
> @@ -0,0 +1,51 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/entry-kvm.h>
> +#include <linux/kvm_host.h>
> +
> +static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
> +{
> + do {
> + int ret;
> +
> + if (ti_work & _TIF_SIGPENDING) {
> + kvm_handle_signal_exit(vcpu);
> + return -EINTR;
> + }
> +
> + if (ti_work & _TIF_NEED_RESCHED)
> + schedule();
> +
> + if (ti_work & _TIF_NOTIFY_RESUME) {
> + clear_thread_flag(TIF_NOTIFY_RESUME);
> + tracehook_notify_resume(NULL);
> + }
> +
> + ret = arch_xfer_to_guest_mode_work(vcpu, ti_work);
> + if (ret)
> + return ret;
> +
> + ti_work = READ_ONCE(current_thread_info()->flags);
> + } while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
> + return 0;
> +}
> +
> +int xfer_to_guest_mode(struct kvm_vcpu *vcpu)
> +{
> + unsigned long ti_work;
> +
> + /*
> + * This is invoked from the outer guest loop with interrupts and
> + * preemption enabled.
> + *
> + * KVM invokes xfer_to_guest_mode_work_pending() with interrupts
> + * disabled in the inner loop before going into guest mode. No need
> + * to disable interrupts here.
> + */
> + ti_work = READ_ONCE(current_thread_info()->flags);
> + if (!(ti_work & XFER_TO_GUEST_MODE_WORK))
> + return 0;
> +
> + return xfer_to_guest_mode_work(vcpu, ti_work);
> +}
> +EXPORT_SYMBOL_GPL(xfer_to_guest_mode);
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -60,3 +60,6 @@ config HAVE_KVM_VCPU_RUN_PID_CHANGE
>
> config HAVE_KVM_NO_POLL
> bool
> +
> +config KVM_XFER_TO_GUEST_WORK
> + bool
>

2020-07-30 07:19:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V5 05/15] entry: Provide infrastructure for work before transitioning to guest mode

Qian Cai <[email protected]> writes:
> On Wed, Jul 22, 2020 at 11:59:59PM +0200, Thomas Gleixner wrote:
> SR-IOV will start trigger a warning below in this commit,
>
> [ 765.434611] WARNING: CPU: 13 PID: 3377 at include/linux/entry-kvm.h:75 kvm_arch_vcpu_ioctl_run+0xb52/0x1320 [kvm]

Yes, I'm a moron. Fixed it locally and failed to transfer the fixup when
merging it. Fix below.

> [ 768.221270] softirqs last disabled at (5093): [<ffffffffa1800ec2>] asm_call_on_stack+0x12/0x20
> [ 768.267273] ---[ end trace 8730450ad8cfee9f ]---

Can you pretty please trim your replies?

>> ---
>> V5: Rename exit -> xfer (Sean)

<removed 200 lines of useless information>

Thanks,

tglx
---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 82d4a9e88908..532597265c50 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8682,7 +8682,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
break;
}

- if (xfer_to_guest_mode_work_pending()) {
+ if (__xfer_to_guest_mode_work_pending()) {
srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
r = xfer_to_guest_mode_handle_work(vcpu);
if (r)

Subject: [tip: x86/entry] x86/kvm: Use __xfer_to_guest_mode_work_pending() in kvm_run_vcpu()

The following commit has been merged into the x86/entry branch of tip:

Commit-ID: f3020b8891b890b48d9e1a83241e3cce518427c1
Gitweb: https://git.kernel.org/tip/f3020b8891b890b48d9e1a83241e3cce518427c1
Author: Thomas Gleixner <[email protected]>
AuthorDate: Thu, 30 Jul 2020 09:19:01 +02:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Thu, 30 Jul 2020 12:31:47 +02:00

x86/kvm: Use __xfer_to_guest_mode_work_pending() in kvm_run_vcpu()

The comments explicitely explain that the work flags check and handling in
kvm_run_vcpu() is done with preemption and interrupts enabled as KVM
invokes the check again right before entering guest mode with interrupts
disabled which guarantees that the work flags are observed and handled
before VMENTER.

Nevertheless the flag pending check in kvm_run_vcpu() uses the helper
variant which requires interrupts to be disabled triggering an instant
lockdep splat. This was caught in testing before and then not fixed up in
the patch before applying. :(

Use the relaxed and intentionally racy __xfer_to_guest_mode_work_pending()
instead.

Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
Reported-by: Qian Cai <[email protected]> writes:
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]


---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 82d4a9e..5325972 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8682,7 +8682,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
break;
}

- if (xfer_to_guest_mode_work_pending()) {
+ if (__xfer_to_guest_mode_work_pending()) {
srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
r = xfer_to_guest_mode_handle_work(vcpu);
if (r)