Hi, there is a null-ptr-deref bug in kvm_dirty_ring_get() in
virt/kvm/dirty_ring.c, and I reproduced it on 5.15.0-rc5+.
### Analysis
We can call the KVM_XEN_HVM_SET_ATTR ioctl, which invokes
kvm_xen_hvm_set_attr(); that in turn ends up calling
mark_page_dirty_in_slot():
```
void mark_page_dirty_in_slot(struct kvm *kvm,
			     struct kvm_memory_slot *memslot,
			     gfn_t gfn)
{
	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
		unsigned long rel_gfn = gfn - memslot->base_gfn;
		u32 slot = (memslot->as_id << 16) | memslot->id;

		if (kvm->dirty_ring_size)
			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
					    slot, rel_gfn);
		else
			set_bit_le(rel_gfn, memslot->dirty_bitmap);
	}
}
```
mark_page_dirty_in_slot() then calls kvm_dirty_ring_get() to obtain
vcpu->dirty_ring:
```
struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
{
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); /* get the running vcpu */

	WARN_ON_ONCE(vcpu->kvm != kvm);  /* [1] */

	return &vcpu->dirty_ring;
}
```
But we have not called the KVM_CREATE_VCPU ioctl to create a kvm_vcpu,
so kvm_get_running_vcpu() returns NULL, and the vcpu->kvm access at [1]
is a NULL pointer dereference.
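A minimal userspace sketch of the trigger sequence, reconstructed from
the analysis above (illustrative only, not the actual reproducer
binary; the capability and attribute setup follows the 5.15-era KVM
uAPI, and error handling is omitted):
```
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);

	/* Enable the dirty ring so that kvm->dirty_ring_size != 0. */
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DIRTY_LOG_RING,
		.args[0] = 4096,	/* ring size in bytes */
	};
	ioctl(vm, KVM_ENABLE_CAP, &cap);

	/* Allow the KVM_XEN_ATTR_TYPE_SHARED_INFO attribute to be set. */
	struct kvm_xen_hvm_config cfg = {
		.flags = KVM_XEN_HVM_CONFIG_SHARED_INFO,
	};
	ioctl(vm, KVM_XEN_HVM_CONFIG, &cfg);

	/* Back guest memory with dirty tracking enabled on the slot. */
	void *mem = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
			 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	struct kvm_userspace_memory_region region = {
		.slot = 0,
		.flags = KVM_MEM_LOG_DIRTY_PAGES,
		.guest_phys_addr = 0,
		.memory_size = 0x10000,
		.userspace_addr = (unsigned long)mem,
	};
	ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

	/*
	 * No KVM_CREATE_VCPU has been issued, so in the kernel
	 * kvm_get_running_vcpu() returns NULL; the wall clock write
	 * below reaches mark_page_dirty_in_slot() and oopses.
	 */
	struct kvm_xen_hvm_attr attr = {
		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
		.u.shared_info.gfn = 0,
	};
	ioctl(vm, KVM_XEN_HVM_SET_ATTR, &attr);
	return 0;
}
```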
### Crash log
root@syzkaller:/home/user# ./kvm_dirty_ring_get
[ 2608.490187][ T6513] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2608.491652][ T6513] #PF: supervisor read access in kernel mode
[ 2608.492713][ T6513] #PF: error_code(0x0000) - not-present page
[ 2608.493770][ T6513] PGD 15944067 P4D 15944067 PUD 1589d067 PMD 0
[ 2608.495568][ T6513] Oops: 0000 [#1] PREEMPT SMP
[ 2608.496355][ T6513] CPU: 1 PID: 6513 Comm: kvm_dirty_ring_ Not tainted 5.15.0-rc5+ #14
[ 2608.497755][ T6513] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 2608.499451][ T6513] RIP: 0010:kvm_dirty_ring_get+0x9/0x20
[ 2608.500480][ T6513] Code: 90 e8 5b bb 04 00 83 c0 40 c3 0f 1f 80 00 00 00 00 8b 07 8b 57 04 29 d0 39 47 0c 0f 96 c0 c3 66 90 cc 48 89 fb e8 17 06 ff ff <48> b
[ 2608.503997][ T6513] RSP: 0018:ffffc90000ab3c08 EFLAGS: 00010286
[ 2608.505054][ T6513] RAX: 0000000000000000 RBX: ffffc90000abd000 RCX: 0000000000000000
[ 2608.506346][ T6513] RDX: 0000000000000001 RSI: ffffffff84fc5baf RDI: 00000000ffffffff
[ 2608.507705][ T6513] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000050198
[ 2608.509119][ T6513] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 2608.510527][ T6513] R13: 0000000020fff000 R14: 0000000000000000 R15: 0000000000000004
[ 2608.512259][ T6513] FS: 0000000001cb0880(0000) GS:ffff88807ec00000(0000) knlGS:0000000000000000
[ 2608.513848][ T6513] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2608.515061][ T6513] CR2: 0000000000000000 CR3: 000000001583c000 CR4: 00000000000006e0
[ 2608.516506][ T6513] Call Trace:
[ 2608.517110][ T6513] mark_page_dirty_in_slot.part.0+0x21/0x50
[ 2608.518163][ T6513] __kvm_write_guest_page+0xa1/0xc0
[ 2608.519078][ T6513] kvm_write_guest+0x42/0x80
[ 2608.519901][ T6513] kvm_write_wall_clock+0x7f/0x140
[ 2608.520835][ T6513] kvm_xen_hvm_set_attr+0x13d/0x190
[ 2608.521775][ T6513] kvm_arch_vm_ioctl+0xa8b/0xc50
[ 2608.522762][ T6513] ? tomoyo_path_number_perm+0xee/0x290
[ 2608.523771][ T6513] kvm_vm_ioctl+0x716/0xe10
[ 2608.524545][ T6513] __x64_sys_ioctl+0x7b/0xb0
[ 2608.525362][ T6513] do_syscall_64+0x35/0xb0
[ 2608.530275][ T6513] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2608.531327][ T6513] RIP: 0033:0x44953d
[ 2608.532096][ T6513] Code: 28 c3 e8 36 29 00 00 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 8
[ 2608.535565][ T6513] RSP: 002b:00007ffeb22c2238 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 2608.537028][ T6513] RAX: ffffffffffffffda RBX: 0000000000400518 RCX: 000000000044953d
[ 2608.538436][ T6513] RDX: 0000000020001080 RSI: 000000004048aec9 RDI: 0000000000000004
[ 2608.539851][ T6513] RBP: 00007ffeb22c2250 R08: 0000000000000000 R09: 0000000000000000
[ 2608.541273][ T6513] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000402fb0
[ 2608.542845][ T6513] R13: 0000000000000000 R14: 00000000004c0018 R15: 0000000000000000
[ 2608.544260][ T6513] Modules linked in:
[ 2608.544965][ T6513] CR2: 0000000000000000
[ 2608.547791][ T6513] ---[ end trace 69dbdf44c6028ede ]---
[ 2608.548674][ T6513] RIP: 0010:kvm_dirty_ring_get+0x9/0x20
[ 2608.549513][ T6513] Code: 90 e8 5b bb 04 00 83 c0 40 c3 0f 1f 80 00 00 00 00 8b 07 8b 57 04 29 d0 39 47 0c 0f 96 c0 c3 66 90 cc 48 89 fb e8 17 06 ff ff <48> b
[ 2608.552808][ T6513] RSP: 0018:ffffc90000ab3c08 EFLAGS: 00010286
[ 2608.553702][ T6513] RAX: 0000000000000000 RBX: ffffc90000abd000 RCX: 0000000000000000
[ 2608.556308][ T6513] RDX: 0000000000000001 RSI: ffffffff84fc5baf RDI: 00000000ffffffff
[ 2608.557778][ T6513] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000050198
[ 2608.559314][ T6513] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 2608.560877][ T6513] R13: 0000000020fff000 R14: 0000000000000000 R15: 0000000000000004
[ 2608.562799][ T6513] FS: 0000000001cb0880(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 2608.564529][ T6513] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2608.565864][ T6513] CR2: 0000000020000001 CR3: 000000001583c000 CR4: 00000000000006f0
[ 2608.567378][ T6513] Kernel panic - not syncing: Fatal exception
[ 2608.568551][ T6513] Kernel Offset: disabled
[ 2608.574584][ T6513] Rebooting in 86400 seconds..
Regards,
butt3rflyh4ck.
--
Active Defense Lab of Venustech
On 18/10/21 19:14, butt3rflyh4ck wrote:
> {
> struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke
> kvm_get_running_vcpu() to get a vcpu.
>
> WARN_ON_ONCE(vcpu->kvm != kvm); [1]
>
> return &vcpu->dirty_ring;
> }
> ```
> but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so
> vcpu is NULL.
It's not just because there was no call to KVM_CREATE_VCPU; in general
kvm->dirty_ring_size only works if all writes are associated to a
specific vCPU, which is not the case for the one of
kvm_xen_shared_info_init.
David, what do you think? Making dirty-page ring buffer incompatible
with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is
not an option because, as the reporter said, you might not have even
created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining
option would be just "do not mark the page as dirty if the ring buffer
is active". This is feasible because userspace itself has passed the
shared info gfn; but again, it's ugly...
Paolo
I agree with you, but I don't have a good idea of how to fix it.
Regards,
butt3rflyh4ck.
On Fri, Oct 22, 2021 at 4:08 AM Paolo Bonzini <[email protected]> wrote:
>
> On 18/10/21 19:14, butt3rflyh4ck wrote:
> > {
> > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke
> > kvm_get_running_vcpu() to get a vcpu.
> >
> > WARN_ON_ONCE(vcpu->kvm != kvm); [1]
> >
> > return &vcpu->dirty_ring;
> > }
> > ```
> > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so
> > vcpu is NULL.
>
> It's not just because there was no call to KVM_CREATE_VCPU; in general
> kvm->dirty_ring_size only works if all writes are associated to a
> specific vCPU, which is not the case for the one of
> kvm_xen_shared_info_init.
>
> David, what do you think? Making dirty-page ring buffer incompatible
> with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is
> not an option because, as the reporter said, you might not have even
> created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining
> option would be just "do not mark the page as dirty if the ring buffer
> is active". This is feasible because userspace itself has passed the
> shared info gfn; but again, it's ugly...
>
> Paolo
>
--
Active Defense Lab of Venustech
Hi, is no one paying attention to this issue?
Regards,
butt3rflyh4ck.
On Fri, Oct 22, 2021 at 4:08 AM Paolo Bonzini <[email protected]> wrote:
>
> On 18/10/21 19:14, butt3rflyh4ck wrote:
> > {
> > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke
> > kvm_get_running_vcpu() to get a vcpu.
> >
> > WARN_ON_ONCE(vcpu->kvm != kvm); [1]
> >
> > return &vcpu->dirty_ring;
> > }
> > ```
> > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so
> > vcpu is NULL.
>
> It's not just because there was no call to KVM_CREATE_VCPU; in general
> kvm->dirty_ring_size only works if all writes are associated to a
> specific vCPU, which is not the case for the one of
> kvm_xen_shared_info_init.
>
> David, what do you think? Making dirty-page ring buffer incompatible
> with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is
> not an option because, as the reporter said, you might not have even
> created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining
> option would be just "do not mark the page as dirty if the ring buffer
> is active". This is feasible because userspace itself has passed the
> shared info gfn; but again, it's ugly...
>
> Paolo
>
--
Active Defense Lab of Venustech
For this issue, I have reviewed the implementation of VM creation, vCPU
creation, and dirty-ring creation, and I have some ideas. Checking only
kvm->dirty_ring_size to decide whether to call kvm_dirty_ring_push() is
not a sufficient condition.
Could we add a check on kvm->created_vcpus as well, requiring it to be
non-zero? After all, there is a situation where no vCPU has been
created but kvm->dirty_ring_size already has a value. A rough sketch is
shown below.
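A sketch of what that condition in mark_page_dirty_in_slot() could look
like (illustrative only, not a tested patch):
```
/*
 * Illustrative sketch of the suggestion; kvm->created_vcpus is an int,
 * so the check is against zero. Note that when the dirty ring is
 * enabled the memslot's dirty_bitmap may not be allocated, so the
 * fallback branch would need more care in a real patch.
 */
if (kvm->dirty_ring_size && READ_ONCE(kvm->created_vcpus))
	kvm_dirty_ring_push(kvm_dirty_ring_get(kvm), slot, rel_gfn);
else
	set_bit_le(rel_gfn, memslot->dirty_bitmap);
```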
Regards,
butt3rflyh4ck.
On Fri, Oct 22, 2021 at 4:08 AM Paolo Bonzini <[email protected]> wrote:
>
> On 18/10/21 19:14, butt3rflyh4ck wrote:
> > {
> > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke
> > kvm_get_running_vcpu() to get a vcpu.
> >
> > WARN_ON_ONCE(vcpu->kvm != kvm); [1]
> >
> > return &vcpu->dirty_ring;
> > }
> > ```
> > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so
> > vcpu is NULL.
>
> It's not just because there was no call to KVM_CREATE_VCPU; in general
> kvm->dirty_ring_size only works if all writes are associated to a
> specific vCPU, which is not the case for the one of
> kvm_xen_shared_info_init.
>
> David, what do you think? Making dirty-page ring buffer incompatible
> with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is
> not an option because, as the reporter said, you might not have even
> created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining
> option would be just "do not mark the page as dirty if the ring buffer
> is active". This is feasible because userspace itself has passed the
> shared info gfn; but again, it's ugly...
>
> Paolo
>
--
Active Defense Lab of Venustech
On Thu, 2021-10-21 at 22:08 +0200, Paolo Bonzini wrote:
> On 18/10/21 19:14, butt3rflyh4ck wrote:
> > {
> > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke
> > kvm_get_running_vcpu() to get a vcpu.
> >
> > WARN_ON_ONCE(vcpu->kvm != kvm); [1]
> >
> > return &vcpu->dirty_ring;
> > }
> > ```
> > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so
> > vcpu is NULL.
>
> It's not just because there was no call to KVM_CREATE_VCPU; in general
> kvm->dirty_ring_size only works if all writes are associated to a
> specific vCPU, which is not the case for the one of
> kvm_xen_shared_info_init.
>
> David, what do you think? Making dirty-page ring buffer incompatible
> with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is
> not an option because, as the reporter said, you might not have even
> created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining
> option would be just "do not mark the page as dirty if the ring buffer
> is active". This is feasible because userspace itself has passed the
> shared info gfn; but again, it's ugly...
Sorry for the delayed response.
So, from the point of view of Xen guests setting a shinfo page at
runtime, there *is* a vCPU running, and it's made the
XENMAPSPACE_shared_info hypercall to request that the shinfo page be
mapped.
So from KVM's point of view, that vCPU will have exited with
KVM_EXIT_XEN / KVM_EXIT_XEN_HCALL. The VMM is then invoking the
KVM_XEN_HVM_SET_ATTR ioctl from that vCPU's userspace thread, while the
KVM vCPU is idle.
I suppose we could make it a KVM_XEN_VCPU_SET_ATTR instead, and thus
associate it with a particular CPU at least for the initial wallclock
write?
Interrupts get delivered to the shinfo page too, but those will also
have a vCPU associated with them.
On live migration / live update the solution isn't quite so natural;
right now it *is* restored before any vCPUs are created. But if the
kernel made it a KVM_XEN_VCPU_SET_ATTR then the VMM will just have to
restore it as part of restoring the BSP or something. That can work?
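From the VMM side that would look roughly like this (a sketch only;
KVM_XEN_VCPU_SET_ATTR and struct kvm_xen_vcpu_attr already exist, but
the SHARED_INFO attribute type here is hypothetical):
```
struct kvm_xen_vcpu_attr va = {
	.type = KVM_XEN_VCPU_ATTR_TYPE_SHARED_INFO,	/* hypothetical type */
	.u.gpa = shinfo_gpa,
};
ioctl(vcpu_fd, KVM_XEN_VCPU_SET_ATTR, &va);
```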
On Tue, 2021-11-16 at 16:22 +0000, David Woodhouse wrote:
>
> I suppose we could make it a KVM_XEN_VCPU_SET_ATTR instead, and thus
> associate it with a particular CPU at least for the initial wallclock
> write?
>
We'd end up needing to do all this too, to plumb that 'vcpu' through to
the actual mark_page_dirty_in_slot(). And I might end up wanting to
kill kvm_write_guest() et al completely. If we're never supposed to be
writing without a vCPU associated with the write, then we should always
use kvm_vcpu_write_guest(), shouldn't we?
Paolo, what do you think? Want me to finish and test this and submit
it, along with changing the shinfo address to a per-vCPU thing?
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 326cdfec74a1..d8411ce4db4b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1143,7 +1143,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret) {
kvm_set_pfn_dirty(pfn);
- mark_page_dirty_in_slot(kvm, memslot, gfn);
+ mark_page_dirty_in_slot(kvm, vcpu, memslot, gfn);
}
out_unlock:
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 33794379949e..4fd2ad5327b6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3090,7 +3090,7 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
return false;
if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
- mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
+ mark_page_dirty_in_slot(vcpu->kvm, vcpu, fault->slot, fault->gfn);
return true;
}
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 0c76c45fdb68..0598515f3ae2 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -184,7 +184,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
/* Enforced by kvm_mmu_hugepage_adjust. */
WARN_ON(level > PG_LEVEL_4K);
- mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
+ mark_page_dirty_in_slot(vcpu->kvm, vcpu, slot, gfn);
}
*new_spte = spte;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a54c3491af42..c5669c9918a4 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -247,7 +247,7 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
if ((!is_writable_pte(old_spte) || pfn_changed) &&
is_writable_pte(new_spte)) {
slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
- mark_page_dirty_in_slot(kvm, slot, gfn);
+ mark_page_dirty_in_slot(kvm, NULL, slot, gfn);
}
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a879e4d08758..c14ce545fae9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2118,7 +2118,7 @@ static s64 get_kvmclock_base_ns(void)
}
#endif
-void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
+void kvm_write_wall_clock(struct kvm_vcpu *vcpu, gpa_t wall_clock, int sec_hi_ofs)
{
int version;
int r;
@@ -2129,7 +2129,7 @@ void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
if (!wall_clock)
return;
- r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
+ r = kvm_vcpu_read_guest(vcpu, wall_clock, &version, sizeof(version));
if (r)
return;
@@ -2138,7 +2138,7 @@ void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
++version;
- if (kvm_write_guest(kvm, wall_clock, &version, sizeof(version)))
+ if (kvm_vcpu_write_guest(vcpu, wall_clock, &version, sizeof(version)))
return;
/*
@@ -2146,22 +2146,22 @@ void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
* system time (updated by kvm_guest_time_update below) to the
* wall clock specified here. We do the reverse here.
*/
- wall_nsec = ktime_get_real_ns() - get_kvmclock_ns(kvm);
+ wall_nsec = ktime_get_real_ns() - get_kvmclock_ns(vcpu->kvm);
wc.nsec = do_div(wall_nsec, 1000000000);
wc.sec = (u32)wall_nsec; /* overflow in 2106 guest time */
wc.version = version;
- kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
+ kvm_vcpu_write_guest(vcpu, wall_clock, &wc, sizeof(wc));
if (sec_hi_ofs) {
wc_sec_hi = wall_nsec >> 32;
- kvm_write_guest(kvm, wall_clock + sec_hi_ofs,
- &wc_sec_hi, sizeof(wc_sec_hi));
+ kvm_vcpu_write_guest(vcpu, wall_clock + sec_hi_ofs,
+ &wc_sec_hi, sizeof(wc_sec_hi));
}
version++;
- kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
+ kvm_vcpu_write_guest(vcpu, wall_clock, &version, sizeof(version));
}
static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,
@@ -3353,7 +3353,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
out:
user_access_end();
dirty:
- mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
+ mark_page_dirty_in_slot(vcpu->kvm, vcpu, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -3494,14 +3494,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
vcpu->kvm->arch.wall_clock = data;
- kvm_write_wall_clock(vcpu->kvm, data, 0);
+ kvm_write_wall_clock(vcpu, data, 0);
break;
case MSR_KVM_WALL_CLOCK:
if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
return 1;
vcpu->kvm->arch.wall_clock = data;
- kvm_write_wall_clock(vcpu->kvm, data, 0);
+ kvm_write_wall_clock(vcpu, data, 0);
break;
case MSR_KVM_SYSTEM_TIME_NEW:
if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
@@ -4421,7 +4421,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
if (!copy_to_user_nofault(&st->preempted, &preempted, sizeof(preempted)))
vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
- mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
+ mark_page_dirty_in_slot(vcpu->kvm, vcpu, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index ea264c4502e4..f1dab3413fc8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -294,7 +294,7 @@ static inline bool kvm_vcpu_latch_init(struct kvm_vcpu *vcpu)
return is_smm(vcpu) || static_call(kvm_x86_apic_init_signal_blocked)(vcpu);
}
-void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs);
+void kvm_write_wall_clock(struct kvm_vcpu *vcpu, gpa_t wall_clock, int sec_hi_ofs);
void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
u64 get_kvmclock_ns(struct kvm *kvm);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 7a58df25c9b2..e7b0c0af807d 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -37,7 +37,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
ret = kvm_gfn_to_pfn_cache_init(kvm, gpc, NULL, true, gpa,
PAGE_SIZE, true);
- if (ret)
+ if (ret && !kvm->vcpus[0])
goto out;
/* Paranoia checks on the 32-bit struct layout */
@@ -60,7 +60,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
}
#endif
- kvm_write_wall_clock(kvm, gpa + wc_ofs, sec_hi_ofs - wc_ofs);
+ kvm_write_wall_clock(kvm->vcpus[0], gpa + wc_ofs, sec_hi_ofs - wc_ofs);
kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE);
out:
@@ -316,7 +316,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
err:
user_access_end();
- mark_page_dirty_in_slot(v->kvm, ghc->memslot, ghc->gpa >> PAGE_SHIFT);
+ mark_page_dirty_in_slot(v->kvm, v, ghc->memslot, ghc->gpa >> PAGE_SHIFT);
} else {
__get_user(rc, (u8 __user *)ghc->hva + offset);
}
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 4da8d4a4140b..f3be974f9c5a 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -43,7 +43,8 @@ static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
return 0;
}
-static inline struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
+static inline struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm,
+ struct kvm_vcpu *vcpu)
{
return NULL;
}
@@ -78,7 +79,8 @@ static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
u32 kvm_dirty_ring_get_rsvd_entries(void);
int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm);
+struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm,
+ struct kvm_vcpu *vcpu);
/*
* called with kvm->slots_lock held, returns the number of
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f145a58e37b0..59a92a82e3a3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -954,7 +954,8 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
-void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
+ struct kvm_memory_slot *memslot, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 2b4474387895..879d454eef71 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -36,12 +36,16 @@ static bool kvm_dirty_ring_full(struct kvm_dirty_ring *ring)
return kvm_dirty_ring_used(ring) >= ring->size;
}
-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
+struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm, struct kvm_vcpu *vcpu)
{
- struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+ struct kvm_vcpu *running_vcpu = kvm_get_running_vcpu();
+ WARN_ON_ONCE(vcpu && vcpu != running_vcpu);
WARN_ON_ONCE(vcpu->kvm != kvm);
+ if (!vcpu)
+ vcpu = running_vcpu;
+
return &vcpu->dirty_ring;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4065fd32271a..24f300e5fa96 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2787,7 +2787,7 @@ int kvm_vcpu_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa,
}
EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
-static int __kvm_write_guest_page(struct kvm *kvm,
+static int __kvm_write_guest_page(struct kvm *kvm, struct kvm_vcpu *vcpu,
struct kvm_memory_slot *memslot, gfn_t gfn,
const void *data, int offset, int len)
{
@@ -2800,7 +2800,7 @@ static int __kvm_write_guest_page(struct kvm *kvm,
r = __copy_to_user((void __user *)addr + offset, data, len);
if (r)
return -EFAULT;
- mark_page_dirty_in_slot(kvm, memslot, gfn);
+ mark_page_dirty_in_slot(kvm, vcpu, memslot, gfn);
return 0;
}
@@ -2809,7 +2809,7 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn,
{
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
- return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len);
+ return __kvm_write_guest_page(kvm, NULL, slot, gfn, data, offset, len);
}
EXPORT_SYMBOL_GPL(kvm_write_guest_page);
@@ -2818,7 +2818,7 @@ int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
{
struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
- return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
+ return __kvm_write_guest_page(vcpu->kvm, vcpu, slot, gfn, data, offset, len);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page);
@@ -2937,7 +2937,7 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
r = __copy_to_user((void __user *)ghc->hva + offset, data, len);
if (r)
return -EFAULT;
- mark_page_dirty_in_slot(kvm, ghc->memslot, gpa >> PAGE_SHIFT);
+ mark_page_dirty_in_slot(kvm, NULL, ghc->memslot, gpa >> PAGE_SHIFT);
return 0;
}
@@ -3006,7 +3006,7 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
}
EXPORT_SYMBOL_GPL(kvm_clear_guest);
-void mark_page_dirty_in_slot(struct kvm *kvm,
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
struct kvm_memory_slot *memslot,
gfn_t gfn)
{
@@ -3015,7 +3015,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
u32 slot = (memslot->as_id << 16) | memslot->id;
if (kvm->dirty_ring_size)
- kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
+ kvm_dirty_ring_push(kvm_dirty_ring_get(kvm, vcpu),
slot, rel_gfn);
else
set_bit_le(rel_gfn, memslot->dirty_bitmap);
@@ -3028,7 +3028,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
struct kvm_memory_slot *memslot;
memslot = gfn_to_memslot(kvm, gfn);
- mark_page_dirty_in_slot(kvm, memslot, gfn);
+ mark_page_dirty_in_slot(kvm, NULL, memslot, gfn);
}
EXPORT_SYMBOL_GPL(mark_page_dirty);
@@ -3037,7 +3037,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
struct kvm_memory_slot *memslot;
memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
- mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);
+ mark_page_dirty_in_slot(vcpu->kvm, vcpu, memslot, gfn);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
On Thu, 2021-10-21 at 22:08 +0200, Paolo Bonzini wrote:
> On 18/10/21 19:14, butt3rflyh4ck wrote:
> > {
> > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke
> > kvm_get_running_vcpu() to get a vcpu.
> >
> > WARN_ON_ONCE(vcpu->kvm != kvm); [1]
> >
> > return &vcpu->dirty_ring;
> > }
> > ```
> > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so
> > vcpu is NULL.
>
> It's not just because there was no call to KVM_CREATE_VCPU; in general
> kvm->dirty_ring_size only works if all writes are associated to a
> specific vCPU, which is not the case for the one of
> kvm_xen_shared_info_init.
>
> David, what do you think? Making dirty-page ring buffer incompatible
> with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is
> not an option because, as the reporter said, you might not have even
> created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining
> option would be just "do not mark the page as dirty if the ring buffer
> is active". This is feasible because userspace itself has passed the
> shared info gfn; but again, it's ugly...
I think I am coming to quite like that 'remaining option' as long as we
rephrase it as follows:
KVM does not mark the shared_info page as dirty, and userspace is
expected to *assume* that it is dirty at all times. It's used for
delivering event channel interrupts and the overhead of marking it
dirty each time is just pointless.
I've merged the patch I sent yesterday into my pfncache series because
I realised we needed the same there, but I'll look at making the Xen
code *not* mark the shinfo page dirty when it writes the clock data
there.
On 11/17/21 10:46, Woodhouse, David wrote:
>> The remaining
>> option would be just "do not mark the page as dirty if the ring buffer
>> is active". This is feasible because userspace itself has passed the
>> shared info gfn; but again, it's ugly...
> I think I am coming to quite like that 'remaining option' as long as we
> rephrase it as follows:
>
> KVM does not mark the shared_info page as dirty, and userspace is
> expected to *assume* that it is dirty at all times. It's used for
> delivering event channel interrupts and the overhead of marking it
> dirty each time is just pointless.
For the case of dirty-bitmap, one solution could be to only set a bool
and actually mark the page dirty lazily, at the time of
KVM_GET_DIRTY_LOG. For dirty-ring, I agree that it's easiest if
userspace just "knows" the page is dirty.
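Roughly like this, as a sketch (the flag and gfn fields are made up for
illustration, not existing kvm->arch members):
```
/* In the write path, instead of marking the page dirty immediately: */
WRITE_ONCE(kvm->arch.xen.shinfo_is_dirty, true);	/* hypothetical flag */

/* Later, when handling KVM_GET_DIRTY_LOG, fold it into the bitmap: */
if (xchg(&kvm->arch.xen.shinfo_is_dirty, false))
	mark_page_dirty(kvm, kvm->arch.xen.shinfo_gfn);
```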
Paolo
On Wed, 2021-11-17 at 17:49 +0100, Paolo Bonzini wrote:
> On 11/17/21 10:46, Woodhouse, David wrote:
> > > The remaining
> > > option would be just "do not mark the page as dirty if the ring buffer
> > > is active". This is feasible because userspace itself has passed the
> > > shared info gfn; but again, it's ugly...
> > I think I am coming to quite like that 'remaining option' as long as we
> > rephrase it as follows:
> >
> > KVM does not mark the shared_info page as dirty, and userspace is
> > expected to *assume* that it is dirty at all times. It's used for
> > delivering event channel interrupts and the overhead of marking it
> > dirty each time is just pointless.
>
> For the case of dirty-bitmap, one solution could be to only set a bool
> and actually mark the page dirty lazily, at the time of
> KVM_GET_DIRTY_LOG. For dirty-ring, I agree that it's easiest if
> userspace just "knows" the page is dirty.
TBH we get that former behaviour for free if we just do the access via
the shiny new gfn_to_pfn_cache. The page is marked dirty once, when the
cache is invalidated. I was actually tempted to avoid even setting the
dirty bit even when we write to the shinfo page.
None of which *immediately* solves it for the wall clock part because
we just call the same kvm_write_wall_clock() that the KVM MSR version
invokes, and that uses kvm_write_guest().
I think I'll just reimplement the interesting part of
kvm_write_wall_clock() directly in kvm_xen_shared_info_init(). I don't
much like the duplication but it isn't much and it's the simplest
option I see. And it actually simplifies kvm_write_wall_clock() which
no longer needs the 'sec_hi_ofs' argument. And the Xen version is also
simpler because it can just access the kernel mapping directly.
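Something along these lines, as a sketch (assuming 'shinfo' is the
kernel mapping from the gfn_to_pfn_cache and 'wc_ofs' is the wallclock
offset computed earlier; not the final patch):
```
struct pvclock_wall_clock *wc = (void *)shinfo + wc_ofs;
u64 wall_nsec = ktime_get_real_ns() - get_kvmclock_ns(kvm);

wc->version |= 1;	/* odd: update in progress */
smp_wmb();
wc->nsec = do_div(wall_nsec, 1000000000);
wc->sec = (u32)wall_nsec;	/* overflow in 2106 guest time */
smp_wmb();
wc->version++;		/* even again: update complete */
```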
From: David Woodhouse <[email protected]>
The various kvm_write_guest() and mark_page_dirty() functions must only
ever be called in the context of an active vCPU, because if dirty ring
tracking is enabled it may simply oops when kvm_get_running_vcpu()
returns NULL for the vcpu and then kvm_dirty_ring_get() dereferences it.
This oops was reported by "butt3rflyh4ck" <[email protected]> in
https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/
That actual bug will be fixed under separate cover but this warning
should help to prevent new ones from being added.
Signed-off-by: David Woodhouse <[email protected]>
---
include/linux/kvm_dirty_ring.h | 6 ------
virt/kvm/dirty_ring.c | 9 ---------
virt/kvm/kvm_main.c | 7 ++++++-
3 files changed, 6 insertions(+), 16 deletions(-)
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 120e5e90fa1d..fb0fa18878e2 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -43,11 +43,6 @@ static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
return 0;
}
-static inline struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
-{
- return NULL;
-}
-
static inline int kvm_dirty_ring_reset(struct kvm *kvm,
struct kvm_dirty_ring *ring)
{
@@ -78,7 +73,6 @@ static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
u32 kvm_dirty_ring_get_rsvd_entries(void);
int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm);
/*
* called with kvm->slots_lock held, returns the number of
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 88f4683198ea..8e9874760fb3 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -36,15 +36,6 @@ static bool kvm_dirty_ring_full(struct kvm_dirty_ring *ring)
return kvm_dirty_ring_used(ring) >= ring->size;
}
-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
-{
- struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
-
- WARN_ON_ONCE(vcpu->kvm != kvm);
-
- return &vcpu->dirty_ring;
-}
-
static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
{
struct kvm_memory_slot *memslot;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6c5083f2eb50..72c6453bcef4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3020,12 +3020,17 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
struct kvm_memory_slot *memslot,
gfn_t gfn)
{
+ struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+ if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+ return;
+
if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
unsigned long rel_gfn = gfn - memslot->base_gfn;
u32 slot = (memslot->as_id << 16) | memslot->id;
if (kvm->dirty_ring_size)
- kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
+ kvm_dirty_ring_push(&vcpu->dirty_ring,
slot, rel_gfn);
else
set_bit_le(rel_gfn, memslot->dirty_bitmap);
--
2.31.1
On Sat, Nov 20, 2021, David Woodhouse wrote:
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6c5083f2eb50..72c6453bcef4 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3020,12 +3020,17 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> struct kvm_memory_slot *memslot,
> gfn_t gfn)
> {
> + struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> +
> + if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
Maybe use KVM_BUG_ON? And two separate WARNs are probably overkill.
if (KVM_BUG_ON(!vcpu || vcpu->kvm != kvm, kvm))
I'd also prefer to not retrieve the vCPU in the dirty_bitmap path, at least not
until it's necessary (for the proposed dirty quota throttling), though that's not
a strong preference.
> + return;
> +
> if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> unsigned long rel_gfn = gfn - memslot->base_gfn;
> u32 slot = (memslot->as_id << 16) | memslot->id;
>
> if (kvm->dirty_ring_size)
> - kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
> + kvm_dirty_ring_push(&vcpu->dirty_ring,
> slot, rel_gfn);
This can now squeeze on a single line.
kvm_dirty_ring_push(&vcpu->dirty_ring, slot, rel_gfn);
> else
> set_bit_le(rel_gfn, memslot->dirty_bitmap);
> --
> 2.31.1
>
On Mon, 2021-11-22 at 17:01 +0000, Sean Christopherson wrote:
> On Sat, Nov 20, 2021, David Woodhouse wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 6c5083f2eb50..72c6453bcef4 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3020,12 +3020,17 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> > struct kvm_memory_slot *memslot,
> > gfn_t gfn)
> > {
> > + struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> > +
> > + if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>
> Maybe use KVM_BUG_ON? And two separate WARNs are probably overkill.
>
> if (KVM_BUG_ON(!vcpu || vcpu->kvm != kvm, kvm))
>
>
> I'd also prefer to not retrieve the vCPU in the dirty_bitmap path, at least not
> until it's necessary (for the proposed dirty quota throttling), though that's not
> a strong preference.
I don't think that would achieve my objective. This was my reaction to
learning that I was never supposed to have called kvm_write_guest()
when I didn't have an active vCPU context¹. I wanted there to have been
a *warning* about that, right there and then when I first did it
instead of waiting for syzkaller to find it.
I didn't want to wait for the actual circumstances to arise that made
it *crash*; I wanted an early warning. And that's also why it was a
warning not a BUG(), but I suppose KVM_BUG_ON() would be OK.
--
dwmw2
¹ My other reaction was wanting to remove kvm_write_guest() entirely
and let people use kvm_vcpu_write_guest() instead. That's the path I
was going down with the original patch to propagate the vcpu down.
On Mon, Nov 22, 2021, David Woodhouse wrote:
> On Mon, 2021-11-22 at 17:01 +0000, Sean Christopherson wrote:
> > On Sat, Nov 20, 2021, David Woodhouse wrote:
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 6c5083f2eb50..72c6453bcef4 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -3020,12 +3020,17 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> > > struct kvm_memory_slot *memslot,
> > > gfn_t gfn)
> > > {
> > > + struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> > > +
> > > + if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> >
> > Maybe use KVM_BUG_ON? And two separate WARNs are probably overkill.
> >
> > if (KVM_BUG_ON(!vcpu || vcpu->kvm != kvm, kvm))
> >
> >
> > I'd also prefer to not retrieve the vCPU in the dirty_bitmap path, at least not
> > until it's necessary (for the proposed dirty quota throttling), though that's not
> > a strong preference.
>
> I don't think that would achieve my objective. This was my reaction to
> learning that I was never supposed to have called kvm_write_guest()
> when I didn't have an active vCPU context¹. I wanted there to have been
> a *warning* about that, right there and then when I first did it
> instead of waiting for syzkaller to find it.
Fair enough. And probably a moot point since Paolo hasn't vehemently objected
to the dirty quota idea.
From: Christian Borntraeger <[email protected]>
Quick heads-up.
The new warnon triggers on s390. Here we write to the guest from an
irqfd worker. Since we do not use dirty_ring yet this might be an over-indication.
Still have to look into that.
[ 1801.980777] WARNING: CPU: 12 PID: 117600 at arch/s390/kvm/../../../virt/kvm/kvm_main.c:3166 mark_page_dirty_in_slot+0xa0/0xb0 [kvm]
[ 1801.980839] Modules linked in: xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) xt_tcpudp(E) nft_compat(E) nf_nat_tftp(E) nft_objref(E) vhost_vsock(E) vmw_vsock_virtio_transport_common(E) vsock(E) vhost(E) vhost_iotlb(E) nf_conntrack_tftp(E) crc32_generic(E) algif_hash(E) af_alg(E) paes_s390(E) dm_crypt(E) encrypted_keys(E) loop(E) lcs(E) ctcm(E) fsm(E) kvm(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) dm_service_time(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) zfcp(E) scsi_transport_fc(E) ism(E) smc(E) ib_core(E) eadm_sch(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s39 [...truncated...]
[ 1801.980915] sha1_s390(E) sha_common(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E) [last unloaded: vfio_ap]
[ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1
[ 1801.980935] Hardware name: IBM 2964 NC9 712 (LPAR)
[ 1801.980938] Workqueue: events irqfd_inject [kvm]
[ 1801.980959] Krnl PSW : 0704e00180000000 000003ff805f0f5c (mark_page_dirty_in_slot+0xa4/0xb0 [kvm])
[ 1801.980981] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 1801.980985] Krnl GPRS: 000003ff298e9040 000000017754a660 0000000000000000 0000000000000000
[ 1801.980988] 000000003fefcc36 ffffffffffffff68 0000000000000000 0000000177871500
[ 1801.980990] 00000001d1918000 000000003fefcc36 00000001d1918000 0000000000000000
[ 1801.980993] 00000001375b0000 00000001d191a838 000003ff805f0ee6 0000038000babb48
[ 1801.981003] Krnl Code: 000003ff805f0f4c: eb9ff0a00004 lmg %r9,%r15,160(%r15)
000003ff805f0f52: c0f400018c61 brcl 15,000003ff80622814
#000003ff805f0f58: af000000 mc 0,0
>000003ff805f0f5c: eb9ff0a00004 lmg %r9,%r15,160(%r15)
000003ff805f0f62: c0f400018c59 brcl 15,000003ff80622814
000003ff805f0f68: c004ffe37b10 brcl 0,000003ff80260588
000003ff805f0f6e: ec360033007c cgij %r3,0,6,000003ff805f0fd4
000003ff805f0f74: e31020100012 lt %r1,16(%r2)
[ 1801.981057] Call Trace:
[ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm]
[ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm]
[ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm]
[ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm]
[ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm]
[ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470
[ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498
[ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110
[ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58
[ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40
On 1/13/22 13:06, Christian Borntraeger wrote:
> From: Christian Borntraeger<[email protected]>
>
> Quick heads-up.
> The new warnon triggers on s390. Here we write to the guest from an
> irqfd worker. Since we do not use dirty_ring yet this might be an over-indication.
> Still have to look into that.
Yes, it's okay to add an #ifdef around the warning.
Paolo
Avoid warnings on s390 like
[ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1
[ 1801.980938] Workqueue: events irqfd_inject [kvm]
[...]
[ 1801.981057] Call Trace:
[ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm]
[ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm]
[ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm]
[ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm]
[ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm]
[ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470
[ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498
[ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110
[ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58
[ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40
when writing to a guest from an irqfd worker as long as we do not have
the dirty ring.
Signed-off-by: Christian Borntraeger <[email protected]>
---
virt/kvm/kvm_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 504158f0e131..1a682d3e106d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3163,8 +3163,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
{
struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING
if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
return;
+#endif
if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
unsigned long rel_gfn = gfn - memslot->base_gfn;
--
2.33.1
On Thu, 2022-01-13 at 13:14 +0100, Paolo Bonzini wrote:
> On 1/13/22 13:06, Christian Borntraeger wrote:
> > From: Christian Borntraeger<
> > [email protected]
> > >
> >
> > Quick heads-up.
> > The new warnon triggers on s390. Here we write to the guest from an
> > irqfd worker. Since we do not use dirty_ring yet this might be an
> > over-indication.
> > Still have to look into that.
>
> Yes, it's okay to add an #ifdef around the warning.
That would be #ifndef CONFIG_HAVE_KVM_DIRTY_RING, yes?
I already found it hard to write down the rules around how
kvm_vcpu_write_guest() doesn't use the vCPU it's passed, and how both
it and kvm_write_guest() need to be invoked on a pCPU which currently
owns *a* vCPU belonging to the same KVM... if we add "unless you're on
an architecture that doesn't support dirty ring logging", you may have
to pass me a bucket.
Are you proposing that as an officially documented part of the already
horrid API, or a temporary measure :)
Btw, that get_map_page() in arch/s390/kvm/interrupt.c looks like it has
the same use-after-free problem that kvm_map_gfn() used to have. It
probably wants converting to the new gfn_to_pfn_cache.
Take a look at how I resolve the same issue for delivering Xen event
channel interrupts.
Although I gave myself a free pass on the dirty marking in that case,
by declaring that the shinfo page doesn't get marked dirty; it should
be considered *always* dirty. You might have less fun declaring that
retrospectively in your case.
On Thu, 2022-01-13 at 13:29 +0100, Christian Borntraeger wrote:
> Avoid warnings on s390 like
> [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1
> [ 1801.980938] Workqueue: events irqfd_inject [kvm]
> [...]
> [ 1801.981057] Call Trace:
> [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm]
> [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm]
> [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm]
> [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm]
> [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm]
> [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470
> [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498
> [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110
> [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58
> [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40
>
> when writing to a guest from an irqfd worker as long as we do not have
> the dirty ring.
>
> Signed-off-by: Christian Borntraeger <[email protected]>
Reluctantly-acked-by: David Woodhouse <[email protected]>
... with a bucket nearby just in case.
Am 13.01.22 um 13:30 schrieb David Woodhouse:
> On Thu, 2022-01-13 at 13:14 +0100, Paolo Bonzini wrote:
>> On 1/13/22 13:06, Christian Borntraeger wrote:
>>> From: Christian Borntraeger<
>>> [email protected]
>>>>
>>>
>>> Quick heads-up.
>>> The new warnon triggers on s390. Here we write to the guest from an
>>> irqfd worker. Since we do not use dirty_ring yet this might be an
>>> over-indication.
>>> Still have to look into that.
>>
>> Yes, it's okay to add an #ifdef around the warning.
>
> That would be #ifndef CONFIG_HAVE_KVM_DIRTY_RING, yes?
>
> I already found it hard to write down the rules around how
> kvm_vcpu_write_guest() doesn't use the vCPU it's passed, and how both
> it and kvm_write_guest() need to be invoked on a pCPU which currently
> owns *a* vCPU belonging to the same KVM... if we add "unless you're on
> an architecture that doesn't support dirty ring logging", you may have
> to pass me a bucket.
>
> Are you proposing that as an officially documented part of the already
> horrid API, or a temporary measure :)
>
> Btw, that get_map_page() in arch/s390/kvm/interrupt.c looks like it has
> the same use-after-free problem that kvm_map_gfn() used to have. It
> probably wants converting to the new gfn_to_pfn_cache.
>
> Take a look at how I resolve the same issue for delivering Xen event
> channel interrupts.
Do you have a commit ID for your Xen event channel fix?
>
> Although I gave myself a free pass on the dirty marking in that case,
> by declaring that the shinfo page doesn't get marked dirty; it should
> be considered *always* dirty. You might have less fun declaring that
> retrospectively in your case.
On Thu, 2022-01-13 at 13:51 +0100, Christian Borntraeger wrote:
> > Btw, that get_map_page() in arch/s390/kvm/interrupt.c looks like it has
> > the same use-after-free problem that kvm_map_gfn() used to have. It
> > probably wants converting to the new gfn_to_pfn_cache.
> >
> > Take a look at how I resolve the same issue for delivering Xen event
> > channel interrupts.
>
> Do you have a commit ID for your Xen event channel fix?
14243b387137 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event
channel delivery") and the commits reworking the gfn_to_pfn_cache which
lead up to it.
Questions: In your kvm_set_routing_entry() where it calls
gmap_translate() to turn the summary_addr and ind_addr from guest
addresses to userspace virtual addresses, what protects those
translations? If the gmap changes before kvm_set_routing_entry() even
returns, what ensures that the IRQ gets retranslated?
And later in adapter_indicators_set() where you take that userspace
virtual address and pass it to your get_map_page() function, the same
question: what if userspace does a concurrent mmap() and changes the
physical page that the userspace address points to?
In the latter case, at least it does look like you don't support
external memory accessed only by a PFN without having a corresponding
struct page. So at least you don't end up accessing a page that can now
belong to *another* guest, because the original underlying page is
locked. You're probably OK in that case, so it's just the gmap changing
that we need to focus on?
On 1/13/22 13:30, David Woodhouse wrote:
> Are you proposing that as an officially documented part of the already
> horrid API, or a temporary measure :)
Hopefully temporary, but honestly you never know how these things go.
Paolo
Am 13.01.22 um 14:22 schrieb David Woodhouse:
> On Thu, 2022-01-13 at 13:51 +0100, Christian Borntraeger wrote:
>>> Btw, that get_map_page() in arch/s390/kvm/interrupt.c looks like it has
>>> the same use-after-free problem that kvm_map_gfn() used to have. It
>>> probably wants converting to the new gfn_to_pfn_cache.
>>>
>>> Take a look at how I resolve the same issue for delivering Xen event
>>> channel interrupts.
>>
>> Do you have a commit ID for your Xen event channel fix?
>
> 14243b387137 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event
> channel delivery") and the commits reworking the gfn_to_pfn_cache which
> lead up to it.
>
> Questions: In your kvm_set_routing_entry() where it calls
> gmap_translate() to turn the summary_addr and ind_addr from guest
> addresses to userspace virtual addresses, what protects those
> translations? If the gmap changes before kvm_set_routing_entry() even
> returns, what ensures that the IRQ gets retranslated?
In the end the gmap translates between guest physical and host virtual
addresses, just like the kvm memslots. This is done in
kvm_arch_commit_memory_region. The gmap is a mechanism where we share
the last-level page table between the guest mapping and the user
mapping. That is why our memslots have to be on a 1 MB boundary (our
page tables have 256 4k entries, and 256 * 4 KB = 1 MB). So instead of
the gmap we could have used the memslots for translation as well.
I have trouble seeing a kernel integrity issue: the worst case is that
we point to a different address in the userspace mapping if QEMU
changes the memslots, maybe even to an invalid one. But then the access
should fail (for an invalid address), or you get self-inflicted bit
flips in your own address space if you play such games.
Well, one thing: if QEMU changes memslots, it might need to redo the
irqfd registration as well, since we do not follow these changes. Maybe
this is something we could improve, as future QEMUs may make more
changes to memslots.
>
> And later in adapter_indicators_set() where you take that userspace
> virtual address and pass it to your get_map_page() function, the same
> question: what if userspace does a concurrent mmap() and changes the
> physical page that the userspace address points to?
>
> In the latter case, at least it does look like you don't support
> external memory accessed only by a PFN without having a corresponding
> struct page. So at least you don't end up accessing a page that can now
> belong to *another* guest, because the original underlying page is
> locked. You're probably OK in that case, so it's just the gmap changing
> that we need to focus on?
Am 13.01.22 um 13:29 schrieb Christian Borntraeger:
> Avoid warnings on s390 like
> [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1
> [ 1801.980938] Workqueue: events irqfd_inject [kvm]
> [...]
> [ 1801.981057] Call Trace:
> [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm]
> [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm]
> [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm]
> [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm]
> [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm]
> [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470
> [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498
> [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110
> [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58
> [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40
>
> when writing to a guest from an irqfd worker as long as we do not have
> the dirty ring.
>
> Signed-off-by: Christian Borntraeger <[email protected]>
> ---
> virt/kvm/kvm_main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 504158f0e131..1a682d3e106d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3163,8 +3163,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> {
> struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>
> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING
> if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> return;
> +#endif
>
> if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> unsigned long rel_gfn = gfn - memslot->base_gfn;
Paolo, are you going to pick this for next for the time being?
On 1/18/22 09:37, Christian Borntraeger wrote:
> Am 13.01.22 um 13:29 schrieb Christian Borntraeger:
>> Avoid warnings on s390 like
>> [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted:
>> G E
>> 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1
>> [ 1801.980938] Workqueue: events irqfd_inject [kvm]
>> [...]
>> [ 1801.981057] Call Trace:
>> [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0
>> [kvm]
>> [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268
>> [kvm]
>> [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm]
>> [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm]
>> [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm]
>> [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470
>> [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498
>> [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110
>> [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58
>> [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40
>>
>> when writing to a guest from an irqfd worker as long as we do not have
>> the dirty ring.
>>
>> Signed-off-by: Christian Borntraeger <[email protected]>
>> ---
>> virt/kvm/kvm_main.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 504158f0e131..1a682d3e106d 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -3163,8 +3163,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>> {
>> struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING
>> if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>> return;
>> +#endif
>> if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
>> unsigned long rel_gfn = gfn - memslot->base_gfn;
>
> Paolo, are you going to pick this for next for the time being?
>
Yep, done now.
Paolo
Am 18.01.22 um 09:44 schrieb Paolo Bonzini:
> On 1/18/22 09:37, Christian Borntraeger wrote:
>> Am 13.01.22 um 13:29 schrieb Christian Borntraeger:
>>> Avoid warnings on s390 like
>>> [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1
>>> [ 1801.980938] Workqueue: events irqfd_inject [kvm]
>>> [...]
>>> [ 1801.981057] Call Trace:
>>> [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm]
>>> [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm]
>>> [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm]
>>> [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm]
>>> [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm]
>>> [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470
>>> [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498
>>> [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110
>>> [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58
>>> [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40
>>>
>>> when writing to a guest from an irqfd worker as long as we do not have
>>> the dirty ring.
>>>
>>> Signed-off-by: Christian Borntraeger <[email protected]>
>>> ---
>>> virt/kvm/kvm_main.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 504158f0e131..1a682d3e106d 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -3163,8 +3163,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>> {
>>> struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING
>>> if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>>> return;
>>> +#endif
>>> if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
>>> unsigned long rel_gfn = gfn - memslot->base_gfn;
>>
>> Paolo, are you going to pick this for next for the time being?
>>
>
> Yep, done now.
>
> Paolo
Thanks. I just realized that David's patch has meanwhile landed in
Linus' tree, so better to take this via master and not next.
Maybe also add
Fixes: 2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without an active vCPU")
in case the patch is picked for stable
On 1/18/22 09:53, Christian Borntraeger wrote:
>>>
>>> Paolo, are you going to pick this for next for the time being?
>>>
>>
>> Yep, done now.
>>
>> Paolo
>
> Thanks. I just realized that Davids patch meanwhile landed in Linus
> tree. So better
> take this via master and not next.
Yeah, I understood what you meant. :) In fact, "master" right now is
still on 5.16 (it is for patches that are destined for stable but
conflict with merge window changes; those are pushed to master and
merged from there into next). There will be another pull request this
week and it will have this patch.
> Maybe also add
> Fixes: 2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without
> an active vCPU")
> in case the patch is picked for stable
Ok, will do.
Paolo