2011-02-04 09:49:26

by Jan Kiszka

Subject: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

Code under this lock requires non-preemptibility. Ensure this holds under
-rt as well by converting it to a raw spinlock.

Signed-off-by: Jan Kiszka <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/x86.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ffd7f8d..25e7734 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -448,7 +448,7 @@ struct kvm_arch {
 
 	unsigned long irq_sources_bitmap;
 	s64 kvmclock_offset;
-	spinlock_t tsc_write_lock;
+	raw_spinlock_t tsc_write_lock;
 	u64 last_tsc_nsec;
 	u64 last_tsc_offset;
 	u64 last_tsc_write;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a7f65aa..d813919 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1017,7 +1017,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
 	unsigned long flags;
 	s64 sdiff;
 
-	spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
+	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
 	offset = data - native_read_tsc();
 	ns = get_kernel_ns();
 	elapsed = ns - kvm->arch.last_tsc_nsec;
@@ -1050,7 +1050,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
 	kvm->arch.last_tsc_write = data;
 	kvm->arch.last_tsc_offset = offset;
 	kvm_x86_ops->write_tsc_offset(vcpu, offset);
-	spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
+	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
 	/* Reset of TSC must disable overshoot protection below */
 	vcpu->arch.hv_clock.tsc_timestamp = 0;
@@ -6009,7 +6009,7 @@ int kvm_arch_init_vm(struct kvm *kvm)
 	/* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
 	set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
 
-	spin_lock_init(&kvm->arch.tsc_write_lock);
+	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 
 	return 0;
 }
--
1.7.1


2011-02-04 21:03:11

by Zachary Amsden

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 02/04/2011 04:49 AM, Jan Kiszka wrote:
> Code under this lock requires non-preemptibility. Ensure this also over
> -rt by converting it to raw spinlock.
>

Oh dear, I had forgotten about that. I believe kvm_lock might have the
same assumption in a few places regarding clock.

2011-02-07 11:35:25

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 2011-02-04 22:03, Zachary Amsden wrote:
> On 02/04/2011 04:49 AM, Jan Kiszka wrote:
>> Code under this lock requires non-preemptibility. Ensure this also over
>> -rt by converting it to raw spinlock.
>>
>
> Oh dear, I had forgotten about that. I believe kvm_lock might have the
> same assumption in a few places regarding clock.

I only found a problematic section in kvmclock_cpufreq_notifier. Didn't
see this during my tests as I have CPUFREQ disabled in my .config.

We may need something like this as converting kvm_lock would likely be
overkill:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 36f54fb..971ee0d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4530,7 +4530,7 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
 	struct cpufreq_freqs *freq = data;
 	struct kvm *kvm;
 	struct kvm_vcpu *vcpu;
-	int i, send_ipi = 0;
+	int i, me, send_ipi = 0;
 
 	/*
 	 * We allow guests to temporarily run on slowing clocks,
@@ -4583,9 +4583,11 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (vcpu->cpu != freq->cpu)
 				continue;
+			me = get_cpu();
 			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
-			if (vcpu->cpu != smp_processor_id())
+			if (vcpu->cpu != me)
 				send_ipi = 1;
+			put_cpu();
 		}
 	}
 	spin_unlock(&kvm_lock);

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2011-02-07 14:11:44

by Zachary Amsden

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 02/07/2011 06:35 AM, Jan Kiszka wrote:
> On 2011-02-04 22:03, Zachary Amsden wrote:
>
>> On 02/04/2011 04:49 AM, Jan Kiszka wrote:
>>
>>> Code under this lock requires non-preemptibility. Ensure this also over
>>> -rt by converting it to raw spinlock.
>>>
>>>
>> Oh dear, I had forgotten about that. I believe kvm_lock might have the
>> same assumption in a few places regarding clock.
>>
> I only found a problematic section in kvmclock_cpufreq_notifier. Didn't
> see this during my tests as I have CPUFREQ disabled in my .config.
>
> We may need something like this as converting kvm_lock would likely be
> overkill:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 36f54fb..971ee0d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4530,7 +4530,7 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
> struct cpufreq_freqs *freq = data;
> struct kvm *kvm;
> struct kvm_vcpu *vcpu;
> - int i, send_ipi = 0;
> + int i, me, send_ipi = 0;
>
> /*
> * We allow guests to temporarily run on slowing clocks,
> @@ -4583,9 +4583,11 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
> kvm_for_each_vcpu(i, vcpu, kvm) {
> if (vcpu->cpu != freq->cpu)
> continue;
> + me = get_cpu();
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> - if (vcpu->cpu != smp_processor_id())
> + if (vcpu->cpu != me)
> send_ipi = 1;
> + put_cpu();
> }
> }
> spin_unlock(&kvm_lock);
>
> Jan
>
>

That looks like a good solution, and I do believe that is the only place
the lock is used in that fashion. Please add a comment, though, in the
giant comment block above, noting that preemption protection is needed
for RT. Also, gcc should catch this anyway, but moving the me variable
into the kvm_for_each_vcpu loop should allow for better register
allocation.
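
Something like this is what I have in mind (rough sketch on top of your
patch, untested):

	kvm_for_each_vcpu(i, vcpu, kvm) {
		int me;

		if (vcpu->cpu != freq->cpu)
			continue;
		/* get_cpu() disables preemption, so the comparison
		 * below cannot race with a migration */
		me = get_cpu();
		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
		if (vcpu->cpu != me)
			send_ipi = 1;
		put_cpu();
	}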

The only other thing I can think of is that RT lock preemption may break
some of the CPU initialization semantics enforced by kvm_lock if you
happen to get a hotplug event just as the module is loading. That should
be rare, but if it is indeed a bug, it would be nice to fix; failing to
initialize VMX would certainly be a panic.

Cheers,

Zach

2011-02-07 15:00:15

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 2011-02-07 15:11, Zachary Amsden wrote:
> On 02/07/2011 06:35 AM, Jan Kiszka wrote:
>> On 2011-02-04 22:03, Zachary Amsden wrote:
>>
>>> On 02/04/2011 04:49 AM, Jan Kiszka wrote:
>>>
>>>> Code under this lock requires non-preemptibility. Ensure this also over
>>>> -rt by converting it to raw spinlock.
>>>>
>>>>
>>> Oh dear, I had forgotten about that. I believe kvm_lock might have the
>>> same assumption in a few places regarding clock.
>>>
>> I only found a problematic section in kvmclock_cpufreq_notifier. Didn't
>> see this during my tests as I have CPUFREQ disabled in my .config.
>>
>> We may need something like this as converting kvm_lock would likely be
>> overkill:
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 36f54fb..971ee0d 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4530,7 +4530,7 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
>> struct cpufreq_freqs *freq = data;
>> struct kvm *kvm;
>> struct kvm_vcpu *vcpu;
>> - int i, send_ipi = 0;
>> + int i, me, send_ipi = 0;
>>
>> /*
>> * We allow guests to temporarily run on slowing clocks,
>> @@ -4583,9 +4583,11 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
>> kvm_for_each_vcpu(i, vcpu, kvm) {
>> if (vcpu->cpu != freq->cpu)
>> continue;
>> + me = get_cpu();
>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>> - if (vcpu->cpu != smp_processor_id())
>> + if (vcpu->cpu != me)
>> send_ipi = 1;
>> + put_cpu();
>> }
>> }
>> spin_unlock(&kvm_lock);
>>
>> Jan
>>
>>
>
> That looks like a good solution, and I do believe that is the only place
> the lock is used in that fashion - please add a comment though in the
> giant comment block above that preemption protection is needed for RT.
> Also, gcc should catch this, but moving the me variable into the
> kvm_for_each_vcpu loop should allow for better register allocation.
>
> The only other thing I can think of is that RT lock preemption may break
> some of the CPU initialization semantics enforced by kvm_lock if you
> happen to get a hotplug event just as the module is loading. That
> should be rare, but if it is indeed a bug, it would be nice to fix, it
> would be a panic for sure not to initialize VMX.

Hmm, is a cpu hotplug notifier allowed to run sleepy code? Can't
imagine. So we already have a strong reason to convert kvm_lock to a
raw_spinlock which obsoletes the above workaround.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2011-02-07 15:15:51

by Zachary Amsden

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 02/07/2011 10:00 AM, Jan Kiszka wrote:
> On 2011-02-07 15:11, Zachary Amsden wrote:
>
>> On 02/07/2011 06:35 AM, Jan Kiszka wrote:
>>
>>> On 2011-02-04 22:03, Zachary Amsden wrote:
>>>
>>>
>>>> On 02/04/2011 04:49 AM, Jan Kiszka wrote:
>>>>
>>>>
>>>>> Code under this lock requires non-preemptibility. Ensure this also over
>>>>> -rt by converting it to raw spinlock.
>>>>>
>>>>>
>>>>>
>>>> Oh dear, I had forgotten about that. I believe kvm_lock might have the
>>>> same assumption in a few places regarding clock.
>>>>
>>>>
>>> I only found a problematic section in kvmclock_cpufreq_notifier. Didn't
>>> see this during my tests as I have CPUFREQ disabled in my .config.
>>>
>>> We may need something like this as converting kvm_lock would likely be
>>> overkill:
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 36f54fb..971ee0d 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -4530,7 +4530,7 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
>>> struct cpufreq_freqs *freq = data;
>>> struct kvm *kvm;
>>> struct kvm_vcpu *vcpu;
>>> - int i, send_ipi = 0;
>>> + int i, me, send_ipi = 0;
>>>
>>> /*
>>> * We allow guests to temporarily run on slowing clocks,
>>> @@ -4583,9 +4583,11 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
>>> kvm_for_each_vcpu(i, vcpu, kvm) {
>>> if (vcpu->cpu != freq->cpu)
>>> continue;
>>> + me = get_cpu();
>>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>>> - if (vcpu->cpu != smp_processor_id())
>>> + if (vcpu->cpu != me)
>>> send_ipi = 1;
>>> + put_cpu();
>>> }
>>> }
>>> spin_unlock(&kvm_lock);
>>>
>>> Jan
>>>
>>>
>>>
>> That looks like a good solution, and I do believe that is the only place
>> the lock is used in that fashion - please add a comment though in the
>> giant comment block above that preemption protection is needed for RT.
>> Also, gcc should catch this, but moving the me variable into the
>> kvm_for_each_vcpu loop should allow for better register allocation.
>>
>> The only other thing I can think of is that RT lock preemption may break
>> some of the CPU initialization semantics enforced by kvm_lock if you
>> happen to get a hotplug event just as the module is loading. That
>> should be rare, but if it is indeed a bug, it would be nice to fix, it
>> would be a panic for sure not to initialize VMX.
>>
> Hmm, is a cpu hotplug notifier allowed to run sleepy code? Can't
> imagine. So we already have a strong reason to convert kvm_lock to a
> raw_spinlock which obsoletes the above workaround.
>

I don't know whether it is allowed to sleep; it doesn't call any sleeping
functions to my knowledge. What worries me in the RT case is that the
holder of the spinlock taken for hardware_enable might be preempted and
resume on another CPU, which obviously isn't what you want.

Zach

2011-02-07 15:39:11

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 2011-02-07 16:15, Zachary Amsden wrote:
> On 02/07/2011 10:00 AM, Jan Kiszka wrote:
>> On 2011-02-07 15:11, Zachary Amsden wrote:
>>
>>> On 02/07/2011 06:35 AM, Jan Kiszka wrote:
>>>
>>>> On 2011-02-04 22:03, Zachary Amsden wrote:
>>>>
>>>>
>>>>> On 02/04/2011 04:49 AM, Jan Kiszka wrote:
>>>>>
>>>>>
>>>>>> Code under this lock requires non-preemptibility. Ensure this also over
>>>>>> -rt by converting it to raw spinlock.
>>>>>>
>>>>>>
>>>>>>
>>>>> Oh dear, I had forgotten about that. I believe kvm_lock might have the
>>>>> same assumption in a few places regarding clock.
>>>>>
>>>>>
>>>> I only found a problematic section in kvmclock_cpufreq_notifier. Didn't
>>>> see this during my tests as I have CPUFREQ disabled in my .config.
>>>>
>>>> We may need something like this as converting kvm_lock would likely be
>>>> overkill:
>>>>
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 36f54fb..971ee0d 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -4530,7 +4530,7 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
>>>> struct cpufreq_freqs *freq = data;
>>>> struct kvm *kvm;
>>>> struct kvm_vcpu *vcpu;
>>>> - int i, send_ipi = 0;
>>>> + int i, me, send_ipi = 0;
>>>>
>>>> /*
>>>> * We allow guests to temporarily run on slowing clocks,
>>>> @@ -4583,9 +4583,11 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
>>>> kvm_for_each_vcpu(i, vcpu, kvm) {
>>>> if (vcpu->cpu != freq->cpu)
>>>> continue;
>>>> + me = get_cpu();
>>>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>>>> - if (vcpu->cpu != smp_processor_id())
>>>> + if (vcpu->cpu != me)
>>>> send_ipi = 1;
>>>> + put_cpu();
>>>> }
>>>> }
>>>> spin_unlock(&kvm_lock);
>>>>
>>>> Jan
>>>>
>>>>
>>>>
>>> That looks like a good solution, and I do believe that is the only place
>>> the lock is used in that fashion - please add a comment though in the
>>> giant comment block above that preemption protection is needed for RT.
>>> Also, gcc should catch this, but moving the me variable into the
>>> kvm_for_each_vcpu loop should allow for better register allocation.
>>>
>>> The only other thing I can think of is that RT lock preemption may break
>>> some of the CPU initialization semantics enforced by kvm_lock if you
>>> happen to get a hotplug event just as the module is loading. That
>>> should be rare, but if it is indeed a bug, it would be nice to fix, it
>>> would be a panic for sure not to initialize VMX.
>>>
>> Hmm, is a cpu hotplug notifier allowed to run sleepy code? Can't
>> imagine. So we already have a strong reason to convert kvm_lock to a
>> raw_spinlock which obsoletes the above workaround.
>>
>
> I don't know as it is allowed to sleep, it doesn't call any sleeping
> functions to my knowledge. What worries me in the RT case is that the
> spinlock acquired for hardware_enable might be preempted and run on
> another CPU, which obviously isn't what you want.

I see now that there are calls to raw_smp_processor_id.

I think it's best to make this a raw lock. While at it, some read-only
users of vm_list should be rcu'ified. Will have a look.
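
For the read-only loops that would look roughly like this (just a sketch;
it assumes the writers are switched to list_add_rcu()/list_del_rcu() first):

	rcu_read_lock();
	list_for_each_entry_rcu(kvm, &vm_list, vm_list) {
		/* read-only per-VM work, e.g. collecting stats */
	}
	rcu_read_unlock();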

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2011-02-07 15:52:46

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 02/07/2011 05:38 PM, Jan Kiszka wrote:
> >
> > I don't know as it is allowed to sleep, it doesn't call any sleeping
> > functions to my knowledge. What worries me in the RT case is that the
> > spinlock acquired for hardware_enable might be preempted and run on
> > another CPU, which obviously isn't what you want.
>
> I see now, there are calls to raw_smp_processor_id.
>
> I think it's best to make this a raw lock. At this chance, some
> read-only users of vm_list should be rcu'ified. Will have a look.

vm_list is rarely used, for either read or write. I don't see the need
to rcu it.

--
error compiling committee.c: too many arguments to function

2011-02-07 15:58:31

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 2011-02-07 16:52, Avi Kivity wrote:
> On 02/07/2011 05:38 PM, Jan Kiszka wrote:
>>>
>>> I don't know as it is allowed to sleep, it doesn't call any sleeping
>>> functions to my knowledge. What worries me in the RT case is that the
>>> spinlock acquired for hardware_enable might be preempted and run on
>>> another CPU, which obviously isn't what you want.
>>
>> I see now, there are calls to raw_smp_processor_id.
>>
>> I think it's best to make this a raw lock. At this chance, some
>> read-only users of vm_list should be rcu'ified. Will have a look.
>
> vm_list is rarely used, for either read or write. I don't see the need
> to rcu it.

To keep code under this lock from expanding the preempt-disabled period,
specifically under -rt, and specifically since the number of objects we
loop over is user-defined.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2011-02-07 16:26:50

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 02/07/2011 05:58 PM, Jan Kiszka wrote:
> On 2011-02-07 16:52, Avi Kivity wrote:
> > On 02/07/2011 05:38 PM, Jan Kiszka wrote:
> >>>
> >>> I don't know as it is allowed to sleep, it doesn't call any sleeping
> >>> functions to my knowledge. What worries me in the RT case is that the
> >>> spinlock acquired for hardware_enable might be preempted and run on
> >>> another CPU, which obviously isn't what you want.
> >>
> >> I see now, there are calls to raw_smp_processor_id.
> >>
> >> I think it's best to make this a raw lock. At this chance, some
> >> read-only users of vm_list should be rcu'ified. Will have a look.
> >
> > vm_list is rarely used, for either read or write. I don't see the need
> > to rcu it.
>
> Avoid that code under this lock expands the preempt-disabled period,
> specifically under -rt, and specifically as the number of objects over
> which we loop is user-defined.

Good point; even under non-rt.

(well, actually, cpufreq_notifier and kvm_arch_hardware_enable are
already non-preemptible, and the stats code should just go away?)

--
error compiling committee.c: too many arguments to function

2011-02-07 16:59:46

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 2011-02-07 17:26, Avi Kivity wrote:
> On 02/07/2011 05:58 PM, Jan Kiszka wrote:
>> On 2011-02-07 16:52, Avi Kivity wrote:
>>> On 02/07/2011 05:38 PM, Jan Kiszka wrote:
>>>>>
>>>>> I don't know as it is allowed to sleep, it doesn't call any sleeping
>>>>> functions to my knowledge. What worries me in the RT case is that the
>>>>> spinlock acquired for hardware_enable might be preempted and run on
>>>>> another CPU, which obviously isn't what you want.
>>>>
>>>> I see now, there are calls to raw_smp_processor_id.
>>>>
>>>> I think it's best to make this a raw lock. At this chance, some
>>>> read-only users of vm_list should be rcu'ified. Will have a look.
>>>
>>> vm_list is rarely used, for either read or write. I don't see the need
>>> to rcu it.
>>
>> Avoid that code under this lock expands the preempt-disabled period,
>> specifically under -rt, and specifically as the number of objects over
>> which we loop is user-defined.
>
> Good point; even under non-rt.
>
> (well, actually, cpufreq_notifier and kvm_arch_hardware_enable are
> already non preemptible, and the stats code should just go away?)

The stats code is trivial to convert, so it doesn't matter.

But what about mmu_shrink and its list_move_tail? How is this
synchronized against kvm_destroy_vm, even today?

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2011-02-07 17:11:06

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 02/07/2011 06:59 PM, Jan Kiszka wrote:
> >
> > (well, actually, cpufreq_notifier and kvm_arch_hardware_enable are
> > already non preemptible, and the stats code should just go away?)
>
> The stats code is trivial to convert, so it doesn't matter.

Removal is easier.

> But what about mmu_shrink and its list_move_tail? How is this
> synchronized against kvm_destroy_vm - already today?

kvm_destroy_vm() takes kvm_lock. If a vm is destroyed before
mmu_shrink(), mmu_shrink() will never see it. If we reach mmu_shrink()
before kvm_destroy_vm(), the latter will wait until mmu_shrink() is done.
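
Simplified, both paths serialize on kvm_lock around vm_list (sketch, not
the literal code):

	/* mmu_shrink(), simplified */
	spin_lock(&kvm_lock);
	list_for_each_entry(kvm, &vm_list, vm_list) {
		/* try to free shadow pages of this VM */
	}
	/* possibly list_move_tail() the victim within vm_list */
	spin_unlock(&kvm_lock);

	/* kvm_destroy_vm(), simplified */
	spin_lock(&kvm_lock);
	list_del(&kvm->vm_list);
	spin_unlock(&kvm_lock);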

--
error compiling committee.c: too many arguments to function

2011-02-07 17:24:06

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 2011-02-07 18:10, Avi Kivity wrote:
> On 02/07/2011 06:59 PM, Jan Kiszka wrote:
>>>
>>> (well, actually, cpufreq_notifier and kvm_arch_hardware_enable are
>>> already non preemptible, and the stats code should just go away?)
>>
>> The stats code is trivial to convert, so it doesn't matter.
>
> Removal is easier.

Is that stat interface no longer used?

>
>> But what about mmu_shrink and its list_move_tail? How is this
>> synchronized against kvm_destroy_vm - already today?
>
> kvm_destroy_vm() takes kvm_lock. If a vm is destroyed before
> mmu_shrink(), mmu_shrink() will never see it. If we reach mmu_shrink()
> before kvm_destroy_vm(), the latter will wait until mmu_shrink() is done.
>

Ah, I was confused. Would require some more logic if we wanted to make
the loop lock-less, though.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2011-02-08 09:16:04

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 02/07/2011 07:23 PM, Jan Kiszka wrote:
> On 2011-02-07 18:10, Avi Kivity wrote:
> > On 02/07/2011 06:59 PM, Jan Kiszka wrote:
> >>>
> >>> (well, actually, cpufreq_notifier and kvm_arch_hardware_enable are
> >>> already non preemptible, and the stats code should just go away?)
> >>
> >> The stats code is trivial to convert, so it doesn't matter.
> >
> > Removal is easier.
>
> Is that stat interface no longer used?

It's there for compatibility. I'm itching to remove it. See
qemu-kvm.git/kvm/kvm_stat for the only known user, and for the
replacement via tracepoints.

Tracepoints have marginally lower overhead when disabled, and somewhat
higher overhead when enabled. A disadvantage of tracepoints is that it
is harder to associate an event with a vm when that event is triggered
by a workqueue, but I don't think it matters in practice (kvm_stat
doesn't even provide a per-vm breakdown).

> >
> >> But what about mmu_shrink and its list_move_tail? How is this
> >> synchronized against kvm_destroy_vm - already today?
> >
> > kvm_destroy_vm() takes kvm_lock. If a vm is destroyed before
> > mmu_shrink(), mmu_shrink() will never see it. If we reach mmu_shrink()
> > before kvm_destroy_vm(), the latter will wait until mmu_shrink() is done.
> >
>
> Ah, I was confused. Would require some more logic if we wanted to make
> the loop lock-less, though.

Yes, the usual rcu_read_lock() / grab reference / rcu_read_unlock().
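
I.e. roughly (sketch; it assumes the existing kvm_get_kvm()/kvm_put_kvm()
reference counting, and match() is just a hypothetical filter):

	struct kvm *found = NULL;

	rcu_read_lock();
	list_for_each_entry_rcu(kvm, &vm_list, vm_list) {
		if (!match(kvm))
			continue;
		kvm_get_kvm(kvm);	/* pin it beyond the RCU read side */
		found = kvm;
		break;
	}
	rcu_read_unlock();

	if (found) {
		/* ... possibly sleeping work on the VM ... */
		kvm_put_kvm(found);
	}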

--
error compiling committee.c: too many arguments to function

2011-02-08 09:56:08

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 2011-02-08 10:15, Avi Kivity wrote:
> On 02/07/2011 07:23 PM, Jan Kiszka wrote:
>> On 2011-02-07 18:10, Avi Kivity wrote:
>>> On 02/07/2011 06:59 PM, Jan Kiszka wrote:
>>>>>
>>>>> (well, actually, cpufreq_notifier and kvm_arch_hardware_enable are
>>>>> already non preemptible, and the stats code should just go away?)
>>>>
>>>> The stats code is trivial to convert, so it doesn't matter.
>>>
>>> Removal is easier.
>>
>> Is that stat interface no longer used?
>
> It's there for compatibility. I'm itching to remove it. See
> qemu-kvm.git/kvm/kvm_stat for the only known user, and for the
> replacement via tracepoints.

OK, but that will first take a grace period.

>
> Tracepoints have marginally lower overhead when disabled, and somewhat
> higher overhead when enabled. A disadvantage of tracepoints is that it
> is harder to associate an event with a vm when that event is triggered
> by a workqueue, but I don't think it matters in practice (kvm_stat
> doesn't even provide a per-vm breakdown).

What about using the perf infrastructure for this? Besides the fact that
perf can reuse tracepoints, maybe there is even a more efficient way of
adding new stat sources.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2011-02-08 09:58:40

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 02/08/2011 11:55 AM, Jan Kiszka wrote:
> >
> > Tracepoints have marginally lower overhead when disabled, and somewhat
> > higher overhead when enabled. A disadvantage of tracepoints is that it
> > is harder to associate an event with a vm when that event is triggered
> > by a workqueue, but I don't think it matters in practice (kvm_stat
> > doesn't even provide a per-vm breakdown).
>
> What about using the perf infrastructure for this? Besides that perf can
> reuse tracepoints, maybe there is even a more efficient way of added new
> stat sources.

We are using the perf infrastructure for this (using tracepoints).

--
error compiling committee.c: too many arguments to function

2011-02-10 10:40:49

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Convert tsc_write_lock to raw_spinlock

On 02/04/2011 11:49 AM, Jan Kiszka wrote:
> Code under this lock requires non-preemptibility. Ensure this also over
> -rt by converting it to raw spinlock.

Applied, thanks.

--
error compiling committee.c: too many arguments to function