Split irqchip allows pic and ioapic routes to be used without the pic and
ioapic being created, which results in a NULL pointer dereference. Check for
NULL and avoid it. (The setup is too racy for a nicer solution.)
Found by syzkaller:
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 3 PID: 11923 Comm: kworker/3:2 Not tainted 4.9.0-rc5+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events irqfd_inject
task: ffff88006a06c7c0 task.stack: ffff880068638000
RIP: 0010:[...] [...] __lock_acquire+0xb35/0x3380 kernel/locking/lockdep.c:3221
RSP: 0000:ffff88006863ea20 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000039 RSI: 0000000000000000 RDI: 1ffff1000d0c7d9e
RBP: ffff88006863ef58 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000000001c8 R11: 0000000000000000 R12: ffff88006a06c7c0
R13: 0000000000000001 R14: ffffffff8baab1a0 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004abdd0 CR3: 000000003e2f2000 CR4: 00000000000026e0
Stack:
ffffffff894d0098 1ffff1000d0c7d56 ffff88006863ecd0 dffffc0000000000
ffff88006a06c7c0 0000000000000000 ffff88006863ecf8 0000000000000082
0000000000000000 ffffffff815dd7c1 ffffffff00000000 ffffffff00000000
Call Trace:
[...] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
[...] __raw_spin_lock include/linux/spinlock_api_smp.h:144
[...] _raw_spin_lock+0x38/0x50 kernel/locking/spinlock.c:151
[...] spin_lock include/linux/spinlock.h:302
[...] kvm_ioapic_set_irq+0x4c/0x100 arch/x86/kvm/ioapic.c:379
[...] kvm_set_ioapic_irq+0x8f/0xc0 arch/x86/kvm/irq_comm.c:52
[...] kvm_set_irq+0x239/0x640 arch/x86/kvm/../../../virt/kvm/irqchip.c:101
[...] irqfd_inject+0xb4/0x150 arch/x86/kvm/../../../virt/kvm/eventfd.c:60
[...] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096
[...] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230
[...] kthread+0x328/0x3e0 kernel/kthread.c:209
[...] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Reported-by: Dmitry Vyukov <[email protected]>
Cc: [email protected]
Fixes: 49df6397edfc ("KVM: x86: Split the APIC from the rest of IRQCHIP.")
Signed-off-by: Radim Krčmář <[email protected]>
---
Cc: Dmitry Vyukov <[email protected]>
Cc: Steve Rutherford <[email protected]>
---
arch/x86/kvm/irq_comm.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 25810b144b58..ddd63b8b176e 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -41,6 +41,15 @@ static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
bool line_status)
{
struct kvm_pic *pic = pic_irqchip(kvm);
+
+ /*
+ * XXX: rejecting pic routes when pic isn't in use would be better,
+ * but the default routing table is installed while kvm->arch.vpic is
+ * NULL and KVM_CREATE_IRQCHIP can race with KVM_SET_GSI_ROUTING.
+ */
+ if (!pic)
+ return -1;
+
return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level);
}
@@ -49,6 +58,10 @@ static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e,
bool line_status)
{
struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+
+ if (!ioapic)
+ return -1;
+
return kvm_ioapic_set_irq(ioapic, e->irqchip.pin, irq_source_id, level,
line_status);
}
--
2.10.2
On 23/11/2016 21:25, Radim Krčmář wrote:
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 25810b144b58..ddd63b8b176e 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -41,6 +41,15 @@ static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
> bool line_status)
> {
> struct kvm_pic *pic = pic_irqchip(kvm);
> +
> + /*
> + * XXX: rejecting pic routes when pic isn't in use would be better,
> + * but the default routing table is installed while kvm->arch.vpic is
> + * NULL and KVM_CREATE_IRQCHIP can race with KVM_SET_GSI_ROUTING.
> + */
> + if (!pic)
> + return -1;
> +
> return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level);
> }
>
Can you explain the race with the default routing table better? It
seems to me that it can only make the routing table go from invalid to
valid (there is no KVM_DESTROY_IRQCHIP) so it's benign.
Paolo
2016-11-23 22:58+0100, Paolo Bonzini:
> On 23/11/2016 21:25, Radim Krčmář wrote:
>> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
>> index 25810b144b58..ddd63b8b176e 100644
>> --- a/arch/x86/kvm/irq_comm.c
>> +++ b/arch/x86/kvm/irq_comm.c
>> @@ -41,6 +41,15 @@ static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
>> bool line_status)
>> {
>> struct kvm_pic *pic = pic_irqchip(kvm);
>> +
>> + /*
>> + * XXX: rejecting pic routes when pic isn't in use would be better,
>> + * but the default routing table is installed while kvm->arch.vpic is
>> + * NULL and KVM_CREATE_IRQCHIP can race with KVM_SET_GSI_ROUTING.
>> + */
>> + if (!pic)
>> + return -1;
>> +
>> return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level);
>> }
>>
>
> Can you explain the race with the default routing table better? It
> seems to me that it can only make the routing table go from invalid to
> valid (there is no KVM_DESTROY_IRQCHIP) so it's benign.
Oops, I wrote the race with wrong IOCTL -- it should be KVM_IRQ_LINE.
1) set KVM_CAP_SPLIT_IRQCHIP (unlocks KVM_IRQ_LINE)
a) call KVM_CREATE_IRQCHIP (creates routes while !kvm->arch.vpic)
b) concurrently call KVM_IRQ_LINE for PIC routes (dereferences NULL)
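As a rough illustration of that window, a userspace model (not kernel code) could look like the following. The types and helpers (struct vm, irq_line(), set_pic_irq(), the -2 return value) are made up for this sketch; only the ordering (routes become visible before vpic is assigned) and the NULL guard mirror the patch.

```c
#include <stddef.h>

/* Stand-in for struct kvm_pic; only records the last level written. */
struct pic {
	int last_level;
};

/* Stand-in for the relevant bits of struct kvm (hypothetical layout). */
struct vm {
	int routes_installed;	/* default routing table is visible */
	struct pic *vpic;	/* still NULL while KVM_CREATE_IRQCHIP runs */
};

/*
 * Models kvm_set_pic_irq() with the guard from the patch: without the
 * NULL check, the store below would go through a NULL vpic.
 */
static int set_pic_irq(struct vm *vm, int level)
{
	struct pic *pic = vm->vpic;

	if (!pic)
		return -1;	/* route exists, chip does not (yet) */

	pic->last_level = level;
	return 0;
}

/* Models KVM_IRQ_LINE: only reaches the chip once routes exist. */
static int irq_line(struct vm *vm, int level)
{
	if (!vm->routes_installed)
		return -2;	/* hypothetical "no route" result */
	return set_pic_irq(vm, level);
}
```

Calling irq_line() in the state "routes installed, vpic still NULL" is the case syzkaller hit; in this model it returns -1 instead of crashing.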
The problem is that we use pic_in_kernel() as irqchip_in_kernel(), so it
cannot be set before we set up routes, but we then cannot reject routes
when pic is not in use. The best effort is to do this for pic routes in
kvm_set_routing_entry():
// initialization is the only place where pic_in_kernel() != ioapic_in_kernel()
if (!pic_in_kernel(kvm) && !ioapic_in_kernel(kvm))
goto out;
and similar for ioapic routes:
if (!ioapic_in_kernel(kvm))
goto out;
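Put together, the two checks might be sketched in userspace like this. It is only a model under assumptions: struct vm_state, enum chip and check_route() are invented names, and returning -EINVAL stands in for the "goto out" path, whose actual error value is not shown above.

```c
#include <stdbool.h>
#include <errno.h>

/* Which chip a route targets (hypothetical stand-in for the route type). */
enum chip { CHIP_PIC, CHIP_IOAPIC };

/* Stand-in for what pic_in_kernel()/ioapic_in_kernel() would report. */
struct vm_state {
	bool pic_in_kernel;
	bool ioapic_in_kernel;
};

/* Returns 0 if the route may be installed, -EINVAL otherwise. */
static int check_route(const struct vm_state *vm, enum chip chip)
{
	switch (chip) {
	case CHIP_PIC:
		/*
		 * Initialization is the only time the two predicates
		 * differ, so a pic route is accepted if either is set.
		 */
		if (!vm->pic_in_kernel && !vm->ioapic_in_kernel)
			return -EINVAL;
		return 0;
	case CHIP_IOAPIC:
		if (!vm->ioapic_in_kernel)
			return -EINVAL;
		return 0;
	}
	return -EINVAL;
}
```

The pic case is deliberately weaker than the ioapic case, which is why it is only a best effort.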
I think it would work if we forbade KVM_CREATE_IRQCHIP after
KVM_CAP_SPLIT_IRQCHIP (which we want to do anyway). And adding a new
variable for irqchip_in_kernel() would allow us to make the pic
condition reasonable.
I'll do something like that for 4.10, but the current patch is better
suited for stable.
Would fixing the comment be enough?
Do you want the following hunk already in 4.9?
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f9c9ad13f88..dbed51045c37 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3901,7 +3901,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
mutex_lock(&kvm->lock);
r = -EEXIST;
- if (kvm->arch.vpic)
+ if (irqchip_in_kernel(kvm))
goto create_irqchip_unlock;
r = -EINVAL;
if (kvm->created_vcpus)
> Oops, I wrote the race with wrong IOCTL -- it should be KVM_IRQ_LINE.
>
> 1) set KVM_CAP_SPLIT_IRQCHIP (unlocks KVM_IRQ_LINE)
> a) call KVM_CREATE_IRQCHIP (creates routes while !kvm->arch.vpic)
> b) concurrently call KVM_IRQ_LINE for PIC routes (dereferences NULL)
>
> The problem is that we use pic_in_kernel() as irqchip_in_kernel(), so it
> cannot be set before we set up routes, but we then cannot reject routes
> when pic is not in use. The best effort is to do this for pic routes in
> kvm_set_routing_entry():
>
> // initialization is the only place where pic_in_kernel() !=
> ioapic_in_kernel()
> if (!pic_in_kernel(kvm) && !ioapic_in_kernel(kvm))
> goto out;
>
> and similar for ioapic routes:
>
> if (!ioapic_in_kernel(kvm))
> goto out;
>
> I think it would work if we forbade KVM_CREATE_IRQCHIP after
> KVM_CAP_SPLIT_IRQCHIP (which we want to do anyway).
Yeah, definitely.
> And adding a new
> variable for irqchip_in_kernel() would allow us to make the pic
> condition reasonable.
Or change kvm->arch.irqchip_split to an enum.
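A possible shape for that, as a userspace sketch only: the enum values, struct vm_arch and the helper bodies below are assumptions for illustration, not existing kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical replacement for a boolean irqchip_split flag. */
enum irqchip_mode {
	IRQCHIP_NONE,	/* no in-kernel irqchip */
	IRQCHIP_KERNEL,	/* pic, ioapic and lapic in the kernel */
	IRQCHIP_SPLIT,	/* only the lapic in the kernel */
};

/* Stand-in for the relevant part of kvm->arch. */
struct vm_arch {
	enum irqchip_mode irqchip_mode;
};

/* Any in-kernel irqchip at all, full or split. */
static bool irqchip_in_kernel(const struct vm_arch *a)
{
	return a->irqchip_mode != IRQCHIP_NONE;
}

/* Split irqchip: pic and ioapic live in userspace. */
static bool irqchip_split(const struct vm_arch *a)
{
	return a->irqchip_mode == IRQCHIP_SPLIT;
}

/* pic (and ioapic) exist in the kernel only in the full mode. */
static bool pic_in_kernel(const struct vm_arch *a)
{
	return a->irqchip_mode == IRQCHIP_KERNEL;
}
```

With a single mode field, the predicates no longer depend on whether the vpic pointer has been assigned yet.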
> I'll do something like that for 4.10, but the current patch is better
> suited for stable.
>
> Would fixing the comment be enough?
Yes, fine!
> Do you want the following hunk already in 4.9?
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6f9c9ad13f88..dbed51045c37 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3901,7 +3901,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
>
> mutex_lock(&kvm->lock);
> r = -EEXIST;
> - if (kvm->arch.vpic)
> + if (irqchip_in_kernel(kvm))
> goto create_irqchip_unlock;
> r = -EINVAL;
> if (kvm->created_vcpus)
No, it's unnecessary.
Paolo
2016-11-24 20:42 GMT+08:00 Radim Krčmář <[email protected]>:
> 2016-11-23 22:58+0100, Paolo Bonzini:
>> On 23/11/2016 21:25, Radim Krčmář wrote:
>>> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
>>> index 25810b144b58..ddd63b8b176e 100644
>>> --- a/arch/x86/kvm/irq_comm.c
>>> +++ b/arch/x86/kvm/irq_comm.c
>>> @@ -41,6 +41,15 @@ static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
>>> bool line_status)
>>> {
>>> struct kvm_pic *pic = pic_irqchip(kvm);
>>> +
>>> + /*
>>> + * XXX: rejecting pic routes when pic isn't in use would be better,
>>> + * but the default routing table is installed while kvm->arch.vpic is
>>> + * NULL and KVM_CREATE_IRQCHIP can race with KVM_SET_GSI_ROUTING.
>>> + */
>>> + if (!pic)
>>> + return -1;
>>> +
>>> return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level);
>>> }
>>>
>>
>> Can you explain the race with the default routing table better? It
>> seems to me that it can only make the routing table go from invalid to
>> valid (there is no KVM_DESTROY_IRQCHIP) so it's benign.
>
> Oops, I wrote the race with wrong IOCTL -- it should be KVM_IRQ_LINE.
>
> 1) set KVM_CAP_SPLIT_IRQCHIP (unlocks KVM_IRQ_LINE)
> a) call KVM_CREATE_IRQCHIP (creates routes while !kvm->arch.vpic)
> b) concurrently call KVM_IRQ_LINE for PIC routes (dereferences NULL)
Should we not go through irqfd if the irqchip is split?
Regards,
Wanpeng Li
2016-12-20 19:59+0800, Wanpeng Li:
> 2016-11-24 20:42 GMT+08:00 Radim Krčmář <[email protected]>:
>> 2016-11-23 22:58+0100, Paolo Bonzini:
>>> On 23/11/2016 21:25, Radim Krčmář wrote:
>>>> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
>>>> index 25810b144b58..ddd63b8b176e 100644
>>>> --- a/arch/x86/kvm/irq_comm.c
>>>> +++ b/arch/x86/kvm/irq_comm.c
>>>> @@ -41,6 +41,15 @@ static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
>>>> bool line_status)
>>>> {
>>>> struct kvm_pic *pic = pic_irqchip(kvm);
>>>> +
>>>> + /*
>>>> + * XXX: rejecting pic routes when pic isn't in use would be better,
>>>> + * but the default routing table is installed while kvm->arch.vpic is
>>>> + * NULL and KVM_CREATE_IRQCHIP can race with KVM_SET_GSI_ROUTING.
>>>> + */
>>>> + if (!pic)
>>>> + return -1;
>>>> +
>>>> return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level);
>>>> }
>>>>
>>>
>>> Can you explain the race with the default routing table better? It
>>> seems to me that it can only make the routing table go from invalid to
>>> valid (there is no KVM_DESTROY_IRQCHIP) so it's benign.
>>
>> Oops, I wrote the race with wrong IOCTL -- it should be KVM_IRQ_LINE.
>>
>> 1) set KVM_CAP_SPLIT_IRQCHIP (unlocks KVM_IRQ_LINE)
>> a) call KVM_CREATE_IRQCHIP (creates routes while !kvm->arch.vpic)
>> b) concurrently call KVM_IRQ_LINE for PIC routes (dereferences NULL)
>
> Should we not go through irqfd if the irqchip is split?
I also remember hearing about that -- do you remember where it was?
The documentation does not say that and irqfd is mostly an optimization for
KVM_IRQ_LINE ... QEMU uses KVM_IRQ_LINE_STATUS with split irqchip, so we
can't easily say the opposite now.
2016-12-21 20:44 GMT+08:00 Radim Krčmář <[email protected]>:
> 2016-12-20 19:59+0800, Wanpeng Li:
>> 2016-11-24 20:42 GMT+08:00 Radim Krčmář <[email protected]>:
>>> 2016-11-23 22:58+0100, Paolo Bonzini:
>>>> On 23/11/2016 21:25, Radim Krčmář wrote:
>>>>> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
>>>>> index 25810b144b58..ddd63b8b176e 100644
>>>>> --- a/arch/x86/kvm/irq_comm.c
>>>>> +++ b/arch/x86/kvm/irq_comm.c
>>>>> @@ -41,6 +41,15 @@ static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
>>>>> bool line_status)
>>>>> {
>>>>> struct kvm_pic *pic = pic_irqchip(kvm);
>>>>> +
>>>>> + /*
>>>>> + * XXX: rejecting pic routes when pic isn't in use would be better,
>>>>> + * but the default routing table is installed while kvm->arch.vpic is
>>>>> + * NULL and KVM_CREATE_IRQCHIP can race with KVM_SET_GSI_ROUTING.
>>>>> + */
>>>>> + if (!pic)
>>>>> + return -1;
>>>>> +
>>>>> return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level);
>>>>> }
>>>>>
>>>>
>>>> Can you explain the race with the default routing table better? It
>>>> seems to me that it can only make the routing table go from invalid to
>>>> valid (there is no KVM_DESTROY_IRQCHIP) so it's benign.
>>>
>>> Oops, I wrote the race with wrong IOCTL -- it should be KVM_IRQ_LINE.
>>>
>>> 1) set KVM_CAP_SPLIT_IRQCHIP (unlocks KVM_IRQ_LINE)
>>> a) call KVM_CREATE_IRQCHIP (creates routes while !kvm->arch.vpic)
>>> b) concurrently call KVM_IRQ_LINE for PIC routes (dereferences NULL)
>>
>> Should we not go through irqfd if the irqchip is split?
>
> I also remember hearing about that -- do you remember where it was?
Not sure. :)
>
> The documentation does not say that and irqfd is mostly an optimization for
> KVM_IRQ_LINE ... QEMU uses KVM_IRQ_LINE_STATUS with split irqchip, so we
> can't easily say the opposite now.
How does irqfd optimize KVM_IRQ_LINE_STATUS? I didn't see any
relationship between them.
Regards,
Wanpeng Li