2015-08-25 07:47:23

by Jason Wang

[permalink] [raw]
Subject: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

All fields of kvm_io_range were initialized or copied explicitly
afterwards. So switch to use kmalloc().

Cc: Gleb Natapov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
virt/kvm/kvm_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..0d79fe8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
return -ENOSPC;

- new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
+ new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
sizeof(struct kvm_io_range)), GFP_KERNEL);
if (!new_bus)
return -ENOMEM;
@@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
if (r)
return r;

- new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
+ new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
sizeof(struct kvm_io_range)), GFP_KERNEL);
if (!new_bus)
return -ENOMEM;
--
2.1.4


2015-08-25 07:47:32

by Jason Wang

[permalink] [raw]
Subject: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
and another is KVM_FAST_MMIO_BUS. This leads to issue:

- kvm_io_bus_destroy() knows nothing about the devices on two buses
points to a single dev. Which will lead double free [1] during exit.
- wildcard eventfd ignores data len, so it was registered as a
kvm_io_range with zero length. This will fail the binary search in
kvm_io_bus_get_first_dev() when we try to emulate through
KVM_MMIO_BUS. This will cause userspace io emulation request instead
of a eventfd notification (virtqueue kick will be trapped by qemu
instead of vhost in this case).

Fixing this by don't register wildcard mmio eventfd on two
buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
double free issue of kvm_io_bus_destroy(). For the arch/setups that
does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
KVM_FAST_MMIO_BUS first to see it it has a match.

[1] Panic caused by double free:

CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000
RIP: 0010:[<ffffffffc07e25d8>] [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
RSP: 0018:ffff88020e7f3bc8 EFLAGS: 00010292
RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d
RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900
RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d
R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000
R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880
FS: 00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0
Stack:
ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622
ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80
0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8
Call Trace:
[<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm]
[<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm]
[<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm]
[<ffffffff811f69f7>] __fput+0xe7/0x250
[<ffffffff811f6bae>] ____fput+0xe/0x10
[<ffffffff81093f04>] task_work_run+0xd4/0xf0
[<ffffffff81079358>] do_exit+0x368/0xa50
[<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60
[<ffffffff81079ad5>] do_group_exit+0x45/0xb0
[<ffffffff81085c71>] get_signal+0x291/0x750
[<ffffffff810144d8>] do_signal+0x28/0xab0
[<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0
[<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20
[<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170
[<ffffffff81014fc9>] do_notify_resume+0x69/0xb0
[<ffffffff817cb9af>] int_signal+0x12/0x17
Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00
RIP [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
RSP <ffff88020e7f3bc8>

Cc: Gleb Natapov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
Changes from v1:
- change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
needed to save lots of unnecessary changes.
---
virt/kvm/eventfd.c | 30 ++++++++----------------------
virt/kvm/kvm_main.c | 16 ++++++++++++++--
2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..95f2901 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
return false;
}

-static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
+static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)
{
- if (flags & KVM_IOEVENTFD_FLAG_PIO)
+ if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
return KVM_PIO_BUS;
- if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
+ if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
return KVM_VIRTIO_CCW_NOTIFY_BUS;
- return KVM_MMIO_BUS;
+ if (args->len)
+ return KVM_MMIO_BUS;
+ return KVM_FAST_MMIO_BUS;
}

static int
@@ -779,7 +781,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
struct eventfd_ctx *eventfd;
int ret;

- bus_idx = ioeventfd_bus_from_flags(args->flags);
+ bus_idx = ioeventfd_bus_from_flags(args);
/* must be natural-word sized, or 0 to ignore length */
switch (args->len) {
case 0:
@@ -843,16 +845,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
if (ret < 0)
goto unlock_fail;

- /* When length is ignored, MMIO is also put on a separate bus, for
- * faster lookups.
- */
- if (!args->len && !(args->flags & KVM_IOEVENTFD_FLAG_PIO)) {
- ret = kvm_io_bus_register_dev(kvm, KVM_FAST_MMIO_BUS,
- p->addr, 0, &p->dev);
- if (ret < 0)
- goto register_fail;
- }
-
kvm->buses[bus_idx]->ioeventfd_count++;
list_add_tail(&p->list, &kvm->ioeventfds);

@@ -860,8 +852,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)

return 0;

-register_fail:
- kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
unlock_fail:
mutex_unlock(&kvm->slots_lock);

@@ -880,7 +870,7 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
struct eventfd_ctx *eventfd;
int ret = -ENOENT;

- bus_idx = ioeventfd_bus_from_flags(args->flags);
+ bus_idx = ioeventfd_bus_from_flags(args);
eventfd = eventfd_ctx_fdget(args->fd);
if (IS_ERR(eventfd))
return PTR_ERR(eventfd);
@@ -901,10 +891,6 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
continue;

kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
- if (!p->length) {
- kvm_io_bus_unregister_dev(kvm, KVM_FAST_MMIO_BUS,
- &p->dev);
- }
kvm->buses[bus_idx]->ioeventfd_count--;
ioeventfd_release(p);
ret = 0;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0d79fe8..3e3b3bc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3152,8 +3152,8 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
}

/* kvm_io_bus_write - called under kvm->slots_lock */
-int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
- int len, const void *val)
+static int kvm_io_bus_write_idx(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
+ gpa_t addr, int len, const void *val)
{
struct kvm_io_bus *bus;
struct kvm_io_range range;
@@ -3169,6 +3169,18 @@ int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
return r < 0 ? r : 0;
}

+int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
+ int len, const void *val)
+{
+ int r = 0;
+
+ if (bus_idx != KVM_MMIO_BUS ||
+ kvm_io_bus_write_idx(vcpu, KVM_FAST_MMIO_BUS, addr, 0, NULL) < 0)
+ r = kvm_io_bus_write_idx(vcpu, bus_idx, addr, len, val);
+
+ return r < 0 ? r : 0;
+}
+
/* kvm_io_bus_write_cookie - called under kvm->slots_lock */
int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
gpa_t addr, int len, const void *val, long cookie)
--
2.1.4

2015-08-25 07:47:35

by Jason Wang

[permalink] [raw]
Subject: [PATCH V2 3/3] kvm: add tracepoint for fast mmio

Cc: Gleb Natapov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
arch/x86/kvm/trace.h | 17 +++++++++++++++++
arch/x86/kvm/vmx.c | 1 +
arch/x86/kvm/x86.c | 1 +
3 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4eae7c3..2d4e81a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
__entry->count > 1 ? "(...)" : "")
);

+TRACE_EVENT(kvm_fast_mmio,
+ TP_PROTO(u64 gpa),
+ TP_ARGS(gpa),
+
+ TP_STRUCT__entry(
+ __field(u64, gpa)
+ ),
+
+ TP_fast_assign(
+ __entry->gpa = gpa;
+ ),
+
+ TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
+);
+
+
+
/*
* Tracepoint for cpuid.
*/
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..a55d279 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
skip_emulated_instruction(vcpu);
+ trace_kvm_fast_mmio(gpa);
return 1;
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f0f6ec..36cf78e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);

EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
--
2.1.4

2015-08-25 08:24:58

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

On Tue, 25 Aug 2015 15:47:14 +0800
Jason Wang <[email protected]> wrote:

> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 9ff4193..95f2901 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
> return false;
> }
>
> -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
> +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)

ioeventfd_bus_from_args()? But _from_flags() is not wrong either :)

> {
> - if (flags & KVM_IOEVENTFD_FLAG_PIO)
> + if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
> return KVM_PIO_BUS;
> - if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
> + if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
> return KVM_VIRTIO_CCW_NOTIFY_BUS;
> - return KVM_MMIO_BUS;
> + if (args->len)
> + return KVM_MMIO_BUS;
> + return KVM_FAST_MMIO_BUS;

Hm...

/* When length is ignored, MMIO is put on a separate bus, for
* faster lookups.
*/
return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;

> }
>
> static int

This version of the patch looks nice and compact. Regardless whether
you want to follow my (minor) style suggestions, consider this patch

Acked-by: Cornelia Huck <[email protected]>

2015-08-25 09:06:40

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses



On 08/25/2015 04:20 PM, Cornelia Huck wrote:
> On Tue, 25 Aug 2015 15:47:14 +0800
> Jason Wang <[email protected]> wrote:
>
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index 9ff4193..95f2901 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
>> return false;
>> }
>>
>> -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
>> +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)
> ioeventfd_bus_from_args()? But _from_flags() is not wrong either :)
>
>> {
>> - if (flags & KVM_IOEVENTFD_FLAG_PIO)
>> + if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
>> return KVM_PIO_BUS;
>> - if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
>> + if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
>> return KVM_VIRTIO_CCW_NOTIFY_BUS;
>> - return KVM_MMIO_BUS;
>> + if (args->len)
>> + return KVM_MMIO_BUS;
>> + return KVM_FAST_MMIO_BUS;
> Hm...
>
> /* When length is ignored, MMIO is put on a separate bus, for
> * faster lookups.
> */
> return args->len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
>
>> }
>>
>> static int
> This version of the patch looks nice and compact. Regardless whether
> you want to follow my (minor) style suggestions, consider this patch
>
> Acked-by: Cornelia Huck <[email protected]>
>

Thanks for the review. V3 posted :)

2015-08-25 11:33:54

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

On Tue, Aug 25, 2015 at 03:47:14PM +0800, Jason Wang wrote:
> We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
> and another is KVM_FAST_MMIO_BUS. This leads to issue:
>
> - kvm_io_bus_destroy() knows nothing about the devices on two buses
> points to a single dev. Which will lead double free [1] during exit.
> - wildcard eventfd ignores data len, so it was registered as a
> kvm_io_range with zero length. This will fail the binary search in
> kvm_io_bus_get_first_dev() when we try to emulate through
> KVM_MMIO_BUS. This will cause userspace io emulation request instead
> of a eventfd notification (virtqueue kick will be trapped by qemu
> instead of vhost in this case).
>
> Fixing this by don't register wildcard mmio eventfd on two
> buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
> double free issue of kvm_io_bus_destroy(). For the arch/setups that
> does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
> KVM_FAST_MMIO_BUS first to see it it has a match.
>
> [1] Panic caused by double free:
>
> CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
> Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
> task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000
> RIP: 0010:[<ffffffffc07e25d8>] [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
> RSP: 0018:ffff88020e7f3bc8 EFLAGS: 00010292
> RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d
> RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900
> RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d
> R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000
> R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880
> FS: 00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0
> Stack:
> ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622
> ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80
> 0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8
> Call Trace:
> [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm]
> [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm]
> [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm]
> [<ffffffff811f69f7>] __fput+0xe7/0x250
> [<ffffffff811f6bae>] ____fput+0xe/0x10
> [<ffffffff81093f04>] task_work_run+0xd4/0xf0
> [<ffffffff81079358>] do_exit+0x368/0xa50
> [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60
> [<ffffffff81079ad5>] do_group_exit+0x45/0xb0
> [<ffffffff81085c71>] get_signal+0x291/0x750
> [<ffffffff810144d8>] do_signal+0x28/0xab0
> [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0
> [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20
> [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170
> [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0
> [<ffffffff817cb9af>] int_signal+0x12/0x17
> Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00
> RIP [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
> RSP <ffff88020e7f3bc8>
>
> Cc: Gleb Natapov <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: Jason Wang <[email protected]>

I'm worried that this slows down the regular MMIO.
Could you share performance #s please?
You need a mix of len=0 and len=2 matches.

One solution for the first issue is to create two ioeventfd objects instead.
For the second issue, we could change bsearch compare function instead.
Again, affects all devices to performance #s would be needed.

> ---
> Changes from v1:
> - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
> needed to save lots of unnecessary changes.
> ---
> virt/kvm/eventfd.c | 30 ++++++++----------------------
> virt/kvm/kvm_main.c | 16 ++++++++++++++--
> 2 files changed, 22 insertions(+), 24 deletions(-)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 9ff4193..95f2901 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -762,13 +762,15 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
> return false;
> }
>
> -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
> +static enum kvm_bus ioeventfd_bus_from_flags(struct kvm_ioeventfd *args)
> {
> - if (flags & KVM_IOEVENTFD_FLAG_PIO)
> + if (args->flags & KVM_IOEVENTFD_FLAG_PIO)
> return KVM_PIO_BUS;
> - if (flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
> + if (args->flags & KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
> return KVM_VIRTIO_CCW_NOTIFY_BUS;
> - return KVM_MMIO_BUS;
> + if (args->len)
> + return KVM_MMIO_BUS;
> + return KVM_FAST_MMIO_BUS;
> }
>
> static int
> @@ -779,7 +781,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> struct eventfd_ctx *eventfd;
> int ret;
>
> - bus_idx = ioeventfd_bus_from_flags(args->flags);
> + bus_idx = ioeventfd_bus_from_flags(args);
> /* must be natural-word sized, or 0 to ignore length */
> switch (args->len) {
> case 0:
> @@ -843,16 +845,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> if (ret < 0)
> goto unlock_fail;
>
> - /* When length is ignored, MMIO is also put on a separate bus, for
> - * faster lookups.
> - */
> - if (!args->len && !(args->flags & KVM_IOEVENTFD_FLAG_PIO)) {
> - ret = kvm_io_bus_register_dev(kvm, KVM_FAST_MMIO_BUS,
> - p->addr, 0, &p->dev);
> - if (ret < 0)
> - goto register_fail;
> - }
> -
> kvm->buses[bus_idx]->ioeventfd_count++;
> list_add_tail(&p->list, &kvm->ioeventfds);
>
> @@ -860,8 +852,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>
> return 0;
>
> -register_fail:
> - kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
> unlock_fail:
> mutex_unlock(&kvm->slots_lock);
>
> @@ -880,7 +870,7 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> struct eventfd_ctx *eventfd;
> int ret = -ENOENT;
>
> - bus_idx = ioeventfd_bus_from_flags(args->flags);
> + bus_idx = ioeventfd_bus_from_flags(args);
> eventfd = eventfd_ctx_fdget(args->fd);
> if (IS_ERR(eventfd))
> return PTR_ERR(eventfd);
> @@ -901,10 +891,6 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> continue;
>
> kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
> - if (!p->length) {
> - kvm_io_bus_unregister_dev(kvm, KVM_FAST_MMIO_BUS,
> - &p->dev);
> - }
> kvm->buses[bus_idx]->ioeventfd_count--;
> ioeventfd_release(p);
> ret = 0;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 0d79fe8..3e3b3bc 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3152,8 +3152,8 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
> }
>
> /* kvm_io_bus_write - called under kvm->slots_lock */
> -int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> - int len, const void *val)
> +static int kvm_io_bus_write_idx(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
> + gpa_t addr, int len, const void *val)
> {
> struct kvm_io_bus *bus;
> struct kvm_io_range range;
> @@ -3169,6 +3169,18 @@ int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> return r < 0 ? r : 0;
> }
>
> +int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> + int len, const void *val)
> +{
> + int r = 0;
> +
> + if (bus_idx != KVM_MMIO_BUS ||
> + kvm_io_bus_write_idx(vcpu, KVM_FAST_MMIO_BUS, addr, 0, NULL) < 0)
> + r = kvm_io_bus_write_idx(vcpu, bus_idx, addr, len, val);
> +
> + return r < 0 ? r : 0;
> +}
> +
> /* kvm_io_bus_write_cookie - called under kvm->slots_lock */
> int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
> gpa_t addr, int len, const void *val, long cookie)
> --
> 2.1.4

2015-08-25 11:34:57

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] kvm: add tracepoint for fast mmio

On Tue, Aug 25, 2015 at 03:47:15PM +0800, Jason Wang wrote:
> Cc: Gleb Natapov <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: Jason Wang <[email protected]>
> ---
> arch/x86/kvm/trace.h | 17 +++++++++++++++++
> arch/x86/kvm/vmx.c | 1 +
> arch/x86/kvm/x86.c | 1 +
> 3 files changed, 19 insertions(+)
>
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 4eae7c3..2d4e81a 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
> __entry->count > 1 ? "(...)" : "")
> );
>
> +TRACE_EVENT(kvm_fast_mmio,
> + TP_PROTO(u64 gpa),
> + TP_ARGS(gpa),
> +
> + TP_STRUCT__entry(
> + __field(u64, gpa)
> + ),
> +
> + TP_fast_assign(
> + __entry->gpa = gpa;
> + ),
> +
> + TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
> +);
> +
> +
> +

don't add multiple empty lines please.

> /*
> * Tracepoint for cpuid.
> */
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 83b7b5c..a55d279 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5831,6 +5831,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> skip_emulated_instruction(vcpu);
> + trace_kvm_fast_mmio(gpa);
> return 1;
> }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8f0f6ec..36cf78e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8254,6 +8254,7 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
> EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
>
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
> --
> 2.1.4

2015-08-25 15:29:12

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
> All fields of kvm_io_range were initialized or copied explicitly
> afterwards. So switch to use kmalloc().

Is there any compiler added alignment padding
in either structure? If so, those padding
areas would now be uninitialized and may leak
kernel data if copied to user-space.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
[]
> @@ -3248,7 +3248,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
> return -ENOSPC;
>
> - new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) *
> + new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count + 1) *
> sizeof(struct kvm_io_range)), GFP_KERNEL);
> if (!new_bus)
> return -ENOMEM;
> @@ -3280,7 +3280,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> if (r)
> return r;
>
> - new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) *
> + new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
> sizeof(struct kvm_io_range)), GFP_KERNEL);
> if (!new_bus)
> return -ENOMEM;


2015-08-26 05:07:24

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH V2 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses



On 08/25/2015 07:33 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 25, 2015 at 03:47:14PM +0800, Jason Wang wrote:
>> > We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
>> > and another is KVM_FAST_MMIO_BUS. This leads to issue:
>> >
>> > - kvm_io_bus_destroy() knows nothing about the devices on two buses
>> > points to a single dev. Which will lead double free [1] during exit.
>> > - wildcard eventfd ignores data len, so it was registered as a
>> > kvm_io_range with zero length. This will fail the binary search in
>> > kvm_io_bus_get_first_dev() when we try to emulate through
>> > KVM_MMIO_BUS. This will cause userspace io emulation request instead
>> > of a eventfd notification (virtqueue kick will be trapped by qemu
>> > instead of vhost in this case).
>> >
>> > Fixing this by don't register wildcard mmio eventfd on two
>> > buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
>> > double free issue of kvm_io_bus_destroy(). For the arch/setups that
>> > does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
>> > KVM_FAST_MMIO_BUS first to see it it has a match.
>> >
>> > [1] Panic caused by double free:
>> >
>> > CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
>> > Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
>> > task: ffff88009ae0c4b0 ti: ffff88020e7f0000 task.ti: ffff88020e7f0000
>> > RIP: 0010:[<ffffffffc07e25d8>] [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
>> > RSP: 0018:ffff88020e7f3bc8 EFLAGS: 00010292
>> > RAX: dead000000200200 RBX: ffff8801ec19c900 RCX: 000000018200016d
>> > RDX: ffff8801ec19cf80 RSI: ffffea0008bf1d40 RDI: ffff8801ec19c900
>> > RBP: ffff88020e7f3bd8 R08: 000000002fc75a01 R09: 000000018200016d
>> > R10: ffffffffc07df6ae R11: ffff88022fc75a98 R12: ffff88021e7cc000
>> > R13: ffff88021e7cca48 R14: ffff88021e7cca50 R15: ffff8801ec19c880
>> > FS: 00007fc1ee3e6700(0000) GS:ffff88023e240000(0000) knlGS:0000000000000000
>> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > CR2: 00007f8f389d8000 CR3: 000000023dc13000 CR4: 00000000001427e0
>> > Stack:
>> > ffff88021e7cc000 0000000000000000 ffff88020e7f3be8 ffffffffc07e2622
>> > ffff88020e7f3c38 ffffffffc07df69a ffff880232524160 ffff88020e792d80
>> > 0000000000000000 ffff880219b78c00 0000000000000008 ffff8802321686a8
>> > Call Trace:
>> > [<ffffffffc07e2622>] ioeventfd_destructor+0x12/0x20 [kvm]
>> > [<ffffffffc07df69a>] kvm_put_kvm+0xca/0x210 [kvm]
>> > [<ffffffffc07df818>] kvm_vcpu_release+0x18/0x20 [kvm]
>> > [<ffffffff811f69f7>] __fput+0xe7/0x250
>> > [<ffffffff811f6bae>] ____fput+0xe/0x10
>> > [<ffffffff81093f04>] task_work_run+0xd4/0xf0
>> > [<ffffffff81079358>] do_exit+0x368/0xa50
>> > [<ffffffff81082c8f>] ? recalc_sigpending+0x1f/0x60
>> > [<ffffffff81079ad5>] do_group_exit+0x45/0xb0
>> > [<ffffffff81085c71>] get_signal+0x291/0x750
>> > [<ffffffff810144d8>] do_signal+0x28/0xab0
>> > [<ffffffff810f3a3b>] ? do_futex+0xdb/0x5d0
>> > [<ffffffff810b7028>] ? __wake_up_locked_key+0x18/0x20
>> > [<ffffffff810f3fa6>] ? SyS_futex+0x76/0x170
>> > [<ffffffff81014fc9>] do_notify_resume+0x69/0xb0
>> > [<ffffffff817cb9af>] int_signal+0x12/0x17
>> > Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 <48> 89 10 48 b8 00 01 10 00 00
>> > RIP [<ffffffffc07e25d8>] ioeventfd_release+0x28/0x60 [kvm]
>> > RSP <ffff88020e7f3bc8>
>> >
>> > Cc: Gleb Natapov <[email protected]>
>> > Cc: Paolo Bonzini <[email protected]>
>> > Cc: Michael S. Tsirkin <[email protected]>
>> > Signed-off-by: Jason Wang <[email protected]>
> I'm worried that this slows down the regular MMIO.

I doubt whether or not it was measurable.

> Could you share performance #s please?
> You need a mix of len=0 and len=2 matches.

Ok.

> One solution for the first issue is to create two ioeventfd objects instead.

Sounds good.

> For the second issue, we could change bsearch compare function instead.

What do you mean by "second issue" ?

> Again, affects all devices to performance #s would be needed.
>

2015-08-26 05:08:57

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] kvm: add tracepoint for fast mmio



On 08/25/2015 07:34 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 25, 2015 at 03:47:15PM +0800, Jason Wang wrote:
>> > Cc: Gleb Natapov <[email protected]>
>> > Cc: Paolo Bonzini <[email protected]>
>> > Cc: Michael S. Tsirkin <[email protected]>
>> > Signed-off-by: Jason Wang <[email protected]>
>> > ---
>> > arch/x86/kvm/trace.h | 17 +++++++++++++++++
>> > arch/x86/kvm/vmx.c | 1 +
>> > arch/x86/kvm/x86.c | 1 +
>> > 3 files changed, 19 insertions(+)
>> >
>> > diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
>> > index 4eae7c3..2d4e81a 100644
>> > --- a/arch/x86/kvm/trace.h
>> > +++ b/arch/x86/kvm/trace.h
>> > @@ -128,6 +128,23 @@ TRACE_EVENT(kvm_pio,
>> > __entry->count > 1 ? "(...)" : "")
>> > );
>> >
>> > +TRACE_EVENT(kvm_fast_mmio,
>> > + TP_PROTO(u64 gpa),
>> > + TP_ARGS(gpa),
>> > +
>> > + TP_STRUCT__entry(
>> > + __field(u64, gpa)
>> > + ),
>> > +
>> > + TP_fast_assign(
>> > + __entry->gpa = gpa;
>> > + ),
>> > +
>> > + TP_printk("fast mmio at gpa 0x%llx", __entry->gpa)
>> > +);
>> > +
>> > +
>> > +
> don't add multiple empty lines please.
>

Ok

2015-08-26 05:39:14

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister



On 08/25/2015 11:29 PM, Joe Perches wrote:
> On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
>> > All fields of kvm_io_range were initialized or copied explicitly
>> > afterwards. So switch to use kmalloc().
> Is there any compiler added alignment padding
> in either structure? If so, those padding
> areas would now be uninitialized and may leak
> kernel data if copied to user-space.
>

I get your concern, but I don't a way to copy them to userspace, did you?

2015-08-26 05:45:53

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister

On Wed, 2015-08-26 at 13:39 +0800, Jason Wang wrote:
>
> On 08/25/2015 11:29 PM, Joe Perches wrote:
> > On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
> >> > All fields of kvm_io_range were initialized or copied explicitly
> >> > afterwards. So switch to use kmalloc().
> > Is there any compiler added alignment padding
> > in either structure? If so, those padding
> > areas would now be uninitialized and may leak
> > kernel data if copied to user-space.
> >
> I get your concern, but I don't a way to copy them to userspace, did you?

I didn't look.

I just wanted you to be aware there's a difference
and a reason why kzalloc might be used even though
all structure members are initialized.

2015-08-26 05:49:05

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH V2 1/3] kvm: use kmalloc() instead of kzalloc() during iodev register/unregister



On 08/26/2015 01:45 PM, Joe Perches wrote:
> On Wed, 2015-08-26 at 13:39 +0800, Jason Wang wrote:
>> >
>> > On 08/25/2015 11:29 PM, Joe Perches wrote:
>>> > > On Tue, 2015-08-25 at 15:47 +0800, Jason Wang wrote:
>>>>> > >> > All fields of kvm_io_range were initialized or copied explicitly
>>>>> > >> > afterwards. So switch to use kmalloc().
>>> > > Is there any compiler added alignment padding
>>> > > in either structure? If so, those padding
>>> > > areas would now be uninitialized and may leak
>>> > > kernel data if copied to user-space.
>>> > >
>> > I get your concern, but I don't a way to copy them to userspace, did you?
> I didn't look.
>
> I just wanted you to be aware there's a difference
> and a reason why kzalloc might be used even though
> all structure members are initialized.
>

I see, thanks for the reminding. Looks like we are safe and I will add
something like "kvm_io_range was never accessed by userspace" in the
commit log if there's a new version.