The reasoning behind this is that you may want to run a guest at a
lower CPU frequency for the purposes of trying to match performance
parity between a host with an older CPU type and a newer, faster one.
> -----Original Message-----
> From: Paolo Bonzini <[email protected]>
> Sent: 01 June 2022 09:57
> To: Durrant, Paul <[email protected]>; Vitaly Kuznetsov <[email protected]>; Peter Zijlstra
> <[email protected]>
> Cc: Allister, Jack <[email protected]>; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: RE: [EXTERNAL]...
>
>
> On 6/1/22 10:54, Durrant, Paul wrote:
> > That is exactly the case. This is not 'some hare-brained money
> > scheme'; there is genuine concern that moving a VM from old h/w to
> > new h/w may cause it to run 'too fast', breaking any such calibration
> > done by the guest OS/application. I also don't have any real-world
> > examples, but bugs may well be reported and having a lever to address
> > them is IMO a good idea. However, I also agree with Paolo that KVM
> > doesn't really need to be doing this when the VMM could do the job
> > using cpufreq, so we'll pursue that option instead. (FWIW the reason
> > for involving KVM was to do the freq adjustment right before entering
> > the guest and then remove the cap right after VMEXIT).
>
> But if so, you still would submit the full feature, wouldn't you?
>
Yes; the commit message should have at least said that we'd follow up... but a full series would have been a better idea.
> Paul, thanks for chiming in, and sorry for leaving you out of the list
> of people that can help Jack with his upstreaming efforts. :)
>
NP.
Paul
> Paolo
On 6/1/22 09:57, Vitaly Kuznetsov wrote:
>>> I'll bite... What's ludicrous about wanting to run a guest at a lower CPU freq to minimize observable change in whatever workload it is running?
>> Well, the right API is cpufreq, there's no need to make it a KVM
>> functionality.
> KVM may probably use the cpufreq API to run each vCPU at the desired
> frequency: I don't quite see how this can be done with a VMM today when
> it's not a 1-vCPU-per-1-pCPU setup.
True, but then there's also a policy issue, in that KVM shouldn't be
allowed to *bump* the frequency if userspace would ordinarily not have
access to the cpufreq files in sysfs.
All in all, I think it's simpler to let privileged userspace (which
knows when it has a 1:1 mapping of vCPU to pCPU) handle it with cpufreq.
Paolo
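As a rough illustration of that userspace approach, here is a minimal
sketch (not part of the posted patch): a privileged VMM that knows its
1:1 vCPU:pCPU pinning caps each pinned pCPU through the cpufreq sysfs
interface. The cap_pcpu_khz() helper name, the CPU list and the 2.3 GHz
value are made up for the example, and it assumes a scaling driver that
exposes scaling_max_freq.

#include <stdio.h>

/* Cap one pCPU by writing its cpufreq scaling_max_freq (needs root). */
static int cap_pcpu_khz(int cpu, unsigned long max_khz)
{
	char path[128];
	FILE *f;
	int ret = 0;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_max_freq", cpu);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%lu\n", max_khz) < 0)
		ret = -1;
	if (fclose(f))
		ret = -1;
	return ret;
}

int main(void)
{
	int pinned_pcpus[] = { 4, 5, 6, 7 };	/* illustrative 1:1 pinning */
	unsigned long cap_khz = 2300000;	/* 2.3 GHz, illustrative */
	unsigned int i;

	for (i = 0; i < sizeof(pinned_pcpus) / sizeof(pinned_pcpus[0]); i++)
		if (cap_pcpu_khz(pinned_pcpus[i], cap_khz))
			perror("scaling_max_freq");
	return 0;
}

Since writing scaling_max_freq already requires privilege, this also
sidesteps the policy concern above: the cap can only be moved by someone
who could touch those files anyway.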
Paolo Bonzini <[email protected]> writes:
> On 5/31/22 16:52, Durrant, Paul wrote:
>>> -----Original Message-----
>>> From: Peter Zijlstra <[email protected]>
>>> Sent: 31 May 2022 15:44
>>> To: Allister, Jack <[email protected]>
>>> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
>>> [email protected]; [email protected]; [email protected]; [email protected];
>>> [email protected]; [email protected]; [email protected]; [email protected];
>>> [email protected]; [email protected]; [email protected]
>>> Subject: RE: [EXTERNAL]...
>>>
>>>
>>> On Tue, May 31, 2022 at 02:02:36PM +0000, Jack Allister wrote:
>>>> The reasoning behind this is that you may want to run a guest at a
>>>> lower CPU frequency for the purposes of trying to match performance
>>>> parity between a host with an older CPU type and a newer, faster one.
>>>
>>> That's quite ludicrous. Also, then it should be the host enforcing the
>>> cpufreq, not the guest.
>>
>> I'll bite... What's ludicrous about wanting to run a guest at a lower CPU freq to minimize observable change in whatever workload it is running?
>
> Well, the right API is cpufreq, there's no need to make it a KVM
> functionality.
KVM may probably use the cpufreq API to run each vCPU at the desired
frequency: I don't quite see how this can be done with a VMM today when
it's not a 1-vCPU-per-1-pCPU setup.
--
Vitaly
On Tue, May 31, 2022 at 02:02:36PM +0000, Jack Allister wrote:
> The reasoning behind this is that you may want to run a guest at a
> lower CPU frequency for the purposes of trying to match performance
> parity between a host with an older CPU type and a newer, faster one.
That's quite ludicrous. Also, then it should be the host enforcing the
cpufreq, not the guest.
On Wed, 2022-06-01 at 08:52 +0200, Peter Zijlstra wrote:
> On Tue, May 31, 2022 at 02:52:04PM +0000, Durrant, Paul wrote:
> > >
> > >
> > > On Tue, May 31, 2022 at 02:02:36PM +0000, Jack Allister wrote:
> > > > The reasoning behind this is that you may want to run a guest at a
> > > > lower CPU frequency for the purposes of trying to match performance
> > > > parity between a host with an older CPU type and a newer, faster one.
> > >
> > > That's quite ludicrous. Also, then it should be the host enforcing the
> > > cpufreq, not the guest.
> >
> > I'll bite... What's ludicrous about wanting to run a guest at a lower
> > CPU freq to minimize observable change in whatever workload it is
> > running?
>
> *why* would you want to do that? Everybody wants their stuff done
> faster.
We're running out of older hardware on which VMs have been started, and
these have to be moved to newer hardware.
We want the customer experience to stay as close to the current
situation as possible (i.e. no surprises), as this is just a live-
migration event for these instances.
Live migration events happen today as well, within the same hardware
and hypervisor cluster. But this hw deprecation thing is going to be
new -- meaning customers and workloads aren't used to having hw
characteristics change as part of LM events.
> If this is some hare-brained money scheme; must not give them if they
> didn't pay up then I really don't care.
Many workloads that are still tied to the older generation instances we
offer are there for a reason. EC2's newer instance generations have a
better price and performance than the older ones; yet folks use the
older ones. We don't want to guess as to why that is. We just want
these workloads to continue running w/o changes or w/o customers having
to even think about these things, while running on supported hardware.
So as infrastructure providers, we're doing everything possible behind-
the-scenes to ensure there's as little disruption to existing workloads
as possible.
> On top of that, you can't hide uarch differences with cpufreq capping.
Yes, this move (old hw -> new hw) isn't supposed to be "hide from the
instances that we're doing this"; it's rather "try to match the
capabilities to the older hw as much as possible".
Some software will adapt to these changes; some software won't. We're
aiming to be ready for both scenarios as far as software allows us.
> Also, it is probably more power efficient to let it run faster and idle
> more, so you're not being environmental either.
Agreed; I can chat about that quite a bit, but that doesn't apply to
this context.
Amit
> On 1 Jun 2022, at 10:03, Vitaly Kuznetsov <[email protected]> wrote:
>
> Peter Zijlstra <[email protected]> writes:
>
>> On Tue, May 31, 2022 at 02:52:04PM +0000, Durrant, Paul wrote:
>
> ...
>
>>>
>>> I'll bite... What's ludicrous about wanting to run a guest at a lower
>>> CPU freq to minimize observable change in whatever workload it is
>>> running?
>>
>> *why* would you want to do that? Everybody wants their stuff done
>> faster.
>>
>
> FWIW, I can see a valid use-case: imagine you're running some software
> which calibrates itself in the beginning to run at some desired real
> time speed but then the VM running it has to be migrated to a host with
> faster (newer) CPUs. I don't have a real-world example off the top of my
> head, but I remember some old DOS-era games were impossible to play on
> newer CPUs because everything was happening too fast. Maybe that's the
> case :-)
The PC version of Alpha Waves was such an example, but Frederick Raynal,
who did the port, said it was the last time he made the mistake. That was 1990 :-)
More seriously, what about mitigating timing-based remote attacks by
arbitrarily changing the CPU frequency and injecting noise in the timing?
That could be a valid use case, no? Although I can think of about a
million other ways of doing this more efficiently…
>
> --
> Vitaly
>
On 6/1/22 10:54, Durrant, Paul wrote:
> That is exactly the case. This is not 'some hare-brained money
> scheme'; there is genuine concern that moving a VM from old h/w to
> new h/w may cause it to run 'too fast', breaking any such calibration
> done by the guest OS/application. I also don't have any real-world
> examples, but bugs may well be reported and having a lever to address
> them is IMO a good idea. However, I also agree with Paolo that KVM
> doesn't really need to be doing this when the VMM could do the job
> using cpufreq, so we'll pursue that option instead. (FWIW the reason
> for involving KVM was to do the freq adjustment right before entering
> the guest and then remove the cap right after VMEXIT).
But if so, you still would submit the full feature, wouldn't you?
Paul, thanks for chiming in, and sorry for leaving you out of the list
of people that can help Jack with his upstreaming efforts. :)
Paolo
On 5/31/22 16:52, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Peter Zijlstra <[email protected]>
>> Sent: 31 May 2022 15:44
>> To: Allister, Jack <[email protected]>
>> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]
>> Subject: RE: [EXTERNAL]...
>>
>>
>> On Tue, May 31, 2022 at 02:02:36PM +0000, Jack Allister wrote:
>>> The reasoning behind this is that you may want to run a guest at a
>>> lower CPU frequency for the purposes of trying to match performance
>>> parity between a host with an older CPU type and a newer, faster one.
>>
>> That's quite ludicrous. Also, then it should be the host enforcing the
>> cpufreq, not the guest.
>
> I'll bite... What's ludicrous about wanting to run a guest at a lower CPU freq to minimize observable change in whatever workload it is running?
Well, the right API is cpufreq, there's no need to make it a KVM
functionality.
Paolo
On Wed, 2022-06-01 at 08:52 +0200, Peter Zijlstra wrote:
> On Tue, May 31, 2022 at 02:52:04PM +0000, Durrant, Paul wrote:
> > > On Tue, May 31, 2022 at 02:02:36PM +0000, Jack Allister wrote:
> > > > The reasoning behind this is that you may want to run a guest at a
> > > > lower CPU frequency for the purposes of trying to match performance
> > > > parity between a host with an older CPU type and a newer, faster one.
> > >
> > > That's quite ludicrous. Also, then it should be the host enforcing the
> > > cpufreq, not the guest.
> >
> > I'll bite... What's ludicrous about wanting to run a guest at a lower
> > CPU freq to minimize observable change in whatever workload it is
> > running?
>
> *why* would you want to do that? Everybody wants their stuff done
> faster.
>
Nah, lots of customers have existing workloads and they want them to
run the *same*. They don't want them to run *faster* because that could
expose existing bugs and race conditions in guest code that has worked
perfectly fine for years. They don't want us to stress-test it when it
was working fine before.
Hell, we are implementing guest transparent live migration to KVM from
*actual* Xen in order to let stuff "just continue to run as it did
before", when for many it would "just" be a case of rebuilding their
guest with new NVMe and network drivers.
> If this is some hare-brained money scheme; must not give them if they
> didn't pay up then I really don't care.
It's actually the other way round. The older instance types were more
expensive; prices generally went down over time, especially $/perf.
None of that eliminates customer inertia.
> On top of that, you can't hide uarch differences with cpufreq capping.
No, but you can bring the performance envelope within spitting
distance. This isn't about hiding the fact that they are now running on
Linux and on newer CPUs; it's about not *breaking* things too much.
> Also, it is probably more power efficient to let it run faster and idle
> more, so you're not being environmental either.
Not sure about that. I thought I saw analysis that something like the
last 5% of turbo performance cost 30% of the power budget in practice.
And running these Xen guests even scaled down on modern hardware is
still much more power efficient than running them on the original
hardware that we're migrating them from.
> -----Original Message-----
[snip]
> >>
> >> I'll bite... What's ludicrous about wanting to run a guest at a lower
> >> CPU freq to minimize observable change in whatever workload it is
> >> running?
> >
> > *why* would you want to do that? Everybody wants their stuff done
> > faster.
> >
>
> FWIW, I can see a valid use-case: imagine you're running some software
> which calibrates itself in the beginning to run at some desired real
> time speed but then the VM running it has to be migrated to a host with
> faster (newer) CPUs. I don't have a real-world example off the top of my
> head, but I remember some old DOS-era games were impossible to play on
> newer CPUs because everything was happening too fast. Maybe that's the
> case :-)
>
That is exactly the case. This is not 'some hare-brained money scheme'; there is genuine concern that moving a VM from old h/w to new h/w may cause it to run 'too fast', breaking any such calibration done by the guest OS/application. I also don't have any real-world examples, but bugs may well be reported and having a lever to address them is IMO a good idea.
However, I also agree with Paolo that KVM doesn't really need to be doing this when the VMM could do the job using cpufreq, so we'll pursue that option instead. (FWIW the reason for involving KVM was to do the freq adjustment right before entering the guest and then remove the cap right after VMEXIT).
Paul
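For what the VMM-side variant could look like, here is a hedged sketch
(again, not the posted patch) that toggles the cap from the vCPU thread
around the KVM_RUN ioctl, reusing the hypothetical cap_pcpu_khz() helper
from the earlier sketch; cap_khz and host_max_khz are illustrative. Note
that this only brackets exits that reach userspace, not every VMEXIT,
which is presumably the gap the in-KVM hook was meant to close.

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper from the earlier cpufreq sysfs sketch. */
extern int cap_pcpu_khz(int cpu, unsigned long max_khz);

static void vcpu_run_loop(int vcpu_fd, int pcpu,
			  unsigned long cap_khz, unsigned long host_max_khz)
{
	for (;;) {
		/* Lower the pinned pCPU's cap before (re)entering the guest. */
		cap_pcpu_khz(pcpu, cap_khz);
		if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
			break;
		/* Restore the cap once the exit has bounced out to userspace. */
		cap_pcpu_khz(pcpu, host_max_khz);
		/* ... dispatch on kvm_run->exit_reason here ... */
	}
}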
> -----Original Message-----
> From: Peter Zijlstra <[email protected]>
> Sent: 31 May 2022 15:44
> To: Allister, Jack <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: RE: [EXTERNAL]...
>
>
> On Tue, May 31, 2022 at 02:02:36PM +0000, Jack Allister wrote:
> > The reasoning behind this is that you may want to run a guest at a
> > lower CPU frequency for the purposes of trying to match performance
> > parity between a host with an older CPU type and a newer, faster one.
>
> That's quite ludicrous. Also, then it should be the host enforcing the
> cpufreq, not the guest.
I'll bite... What's ludicrous about wanting to run a guest at a lower CPU freq to minimize observable change in whatever workload it is running?
Paul
Peter Zijlstra <[email protected]> writes:
> On Tue, May 31, 2022 at 02:52:04PM +0000, Durrant, Paul wrote:
...
>>
>> I'll bite... What's ludicrous about wanting to run a guest at a lower
>> CPU freq to minimize observable change in whatever workload it is
>> running?
>
> *why* would you want to do that? Everybody wants their stuff done
> faster.
>
FWIW, I can see a valid use-case: imagine you're running some software
which calibrates itself in the beginning to run at some desired real
time speed but then the VM running it has to be migrated to a host with
faster (newer) CPUs. I don't have a real-world example off the top of my
head, but I remember some old DOS-era games were impossible to play on
newer CPUs because everything was happening too fast. Maybe that's the
case :-)
--
Vitaly
On Wed, Jun 01, 2022 at 10:59:17AM +0200, Paolo Bonzini wrote:
> On 6/1/22 09:57, Vitaly Kuznetsov wrote:
> > > > I'll bite... What's ludicrous about wanting to run a guest at a lower CPU freq to minimize observable change in whatever workload it is running?
> > > Well, the right API is cpufreq, there's no need to make it a KVM
> > > functionality.
> > KVM may probably use the cpufreq API to run each vCPU at the desired
> > frequency: I don't quite see how this can be done with a VMM today when
> > it's not a 1-vCPU-per-1-pCPU setup.
>
> True, but then there's also a policy issue, in that KVM shouldn't be allowed
> to *bump* the frequency if userspace would ordinarily not have access to the
> cpufreq files in sysfs.
So, when using schedutil (which requires intel_pstate in passive mode),
there's the option to use per-task uclamps, which are somewhat
complicated but also affect cpufreq.
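As a sketch of that per-task variant (assuming Linux >= 5.3, schedutil,
and one thread per vCPU), the VMM could clamp the utilization schedutil
sees for each vCPU thread via sched_setattr(2). The
vcpu_thread_uclamp_max() name and the 512-of-1024 clamp are illustrative,
and struct sched_attr is declared locally in case the installed UAPI
headers predate the uclamp fields.

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/sched.h>	/* SCHED_FLAG_* on recent UAPI headers */

#ifndef SCHED_FLAG_KEEP_ALL
#define SCHED_FLAG_KEEP_ALL		0x18
#endif
#ifndef SCHED_FLAG_UTIL_CLAMP_MAX
#define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
#endif

/* Local copy of the uclamp-capable sched_attr layout (Linux >= 5.3). */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/* Clamp the utilization schedutil sees for @tid to @util_max (0..1024). */
static int vcpu_thread_uclamp_max(pid_t tid, unsigned int util_max)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_flags	= SCHED_FLAG_KEEP_ALL |
				  SCHED_FLAG_UTIL_CLAMP_MAX,
		.sched_util_max	= util_max,
	};

	return syscall(SYS_sched_setattr, tid, &attr, 0);
}

int main(void)
{
	/* Illustrative: clamp the calling thread to half of full capacity. */
	if (vcpu_thread_uclamp_max(0, 512))
		perror("sched_setattr");
	return 0;
}

Unlike the sysfs cap, this follows the task rather than the pCPU, so it
would also apply when there is no 1:1 vCPU:pCPU pinning.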
On Tue, May 31, 2022 at 02:52:04PM +0000, Durrant, Paul wrote:
> > -----Original Message-----
> > From: Peter Zijlstra <[email protected]>
> > Sent: 31 May 2022 15:44
> > To: Allister, Jack <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: RE: [EXTERNAL]...
> >
> >
> > On Tue, May 31, 2022 at 02:02:36PM +0000, Jack Allister wrote:
> > > The reasoning behind this is that you may want to run a guest at a
> > > lower CPU frequency for the purposes of trying to match performance
> > > parity between a host with an older CPU type and a newer, faster one.
> >
> > That's quite ludicrous. Also, then it should be the host enforcing the
> > cpufreq, not the guest.
>
> I'll bite... What's ludicrous about wanting to run a guest at a lower
> CPU freq to minimize observable change in whatever workload it is
> running?
*why* would you want to do that? Everybody wants their stuff done
faster.
If this is some hare-brained money scheme; must not give them if they
didn't pay up then I really don't care.
On top of that, you can't hide uarch differences with cpufreq capping.
Also, it is probably more power efficient to let it run faster and idle
more, so you're not being environmental either.