2019-11-05 16:21:19

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

Virtualized guests may pick a different strategy to mitigate hardware
vulnerabilities when it comes to hyper-threading: disable SMT completely,
use core scheduling, or, for example, opt in for STIBP. Making the
decision, however, requires an extra bit of information which is currently
missing: does the topology the guest see match hardware or if it is 'fake'
and two vCPUs which look like different cores from guest's perspective can
actually be scheduled on the same physical core. Disabling SMT or doing
core scheduling only makes sense when the topology is trustworthy.

Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
topology is actually trustworthy. It would, of course, be possible to get
away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
compatibility but the current approach looks more straightforward.

There were some offline discussions on whether this new feature bit should
be complemented with a 're-enlightenment' mechanism for live migration (so
it can change in guest's lifetime) but it doesn't seem to be very
practical: what a sane guest is supposed to do if it's told that SMT
topology is about to become fake other than kill itself? Also, it seems to
make little sense to do e.g. CPU pinning on the source but not on the
destination.

There is also one additional piece of the information missing. A VM can be
sharing physical cores with other VMs (or other userspace tasks on the
host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
not? It is unclear if this changes anything and can probably be left out
of scope (just don't do that).

Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
!cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
extra work (like CPU pinning and exposing the correct topology) before
passing '1' to the guest.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
arch/x86/include/uapi/asm/kvm_para.h | 2 ++
arch/x86/kvm/cpuid.c | 7 ++++++-
3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..64b94103fc90 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
before using paravirtualized
sched yield.

+KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
+ topology is trustworthy' hint
+ (KVM_HINTS_TRUSTWORTHY_SMT).
+
KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
per-cpu warps are expeced in
kvmclock
@@ -97,11 +101,18 @@ KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side

Where ``flag`` here is defined as below:

-================== ============ =================================
-flag value meaning
-================== ============ =================================
-KVM_HINTS_REALTIME 0 guest checks this feature bit to
- determine that vCPUs are never
- preempted for an unlimited time
- allowing optimizations
-================== ============ =================================
+================================= =========== =================================
+flag value meaning
+================================= =========== =================================
+KVM_HINTS_REALTIME 0 guest checks this feature bit to
+ determine that vCPUs are never
+ preempted for an unlimited time
+ allowing optimizations
+
+KVM_HINTS_TRUSTWORTHY_SMT 1 the bit is set when the exposed
+ SMT topology is trustworthy, this
+ means that two guest vCPUs will
+ never share a physical core
+ unless they are exposed as SMT
+ threads.
+================================= =========== =================================
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..183239d5dfad 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -31,8 +31,10 @@
#define KVM_FEATURE_PV_SEND_IPI 11
#define KVM_FEATURE_POLL_CONTROL 12
#define KVM_FEATURE_PV_SCHED_YIELD 13
+#define KVM_FEATURE_TRUSTWORTHY_SMT 14

#define KVM_HINTS_REALTIME 0
+#define KVM_HINTS_TRUSTWORTHY_SMT 1

/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f68c0c753c38..dab527a7081f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
(1 << KVM_FEATURE_PV_SEND_IPI) |
(1 << KVM_FEATURE_POLL_CONTROL) |
- (1 << KVM_FEATURE_PV_SCHED_YIELD);
+ (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+ (1 << KVM_FEATURE_TRUSTWORTHY_SMT);

if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
@@ -720,6 +721,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
entry->ebx = 0;
entry->ecx = 0;
entry->edx = 0;
+
+ if (!cpu_smt_possible())
+ entry->edx |= (1 << KVM_HINTS_TRUSTWORTHY_SMT);
+
break;
case 0x80000000:
entry->eax = min(entry->eax, 0x8000001f);
--
2.20.1


2019-11-05 17:20:12

by Liran Alon

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy



> On 5 Nov 2019, at 18:17, Vitaly Kuznetsov <[email protected]> wrote:
>
> Virtualized guests may pick a different strategy to mitigate hardware
> vulnerabilities when it comes to hyper-threading: disable SMT completely,
> use core scheduling, or, for example, opt in for STIBP. Making the
> decision, however, requires an extra bit of information which is currently
> missing: does the topology the guest see match hardware or if it is 'fake'
> and two vCPUs which look like different cores from guest's perspective can
> actually be scheduled on the same physical core. Disabling SMT or doing
> core scheduling only makes sense when the topology is trustworthy.

This is not only related to vulnerability mitigations.
It’s also important for guest to know if it’s SMT topology is trustworthy for various optimisation algorithms.
E.g. Should it attempt to run tasks that share memory on same NUMA node?

>
> Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> topology is actually trustworthy. It would, of course, be possible to get
> away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> compatibility but the current approach looks more straightforward.

Agree.

>
> There were some offline discussions on whether this new feature bit should
> be complemented with a 're-enlightenment' mechanism for live migration (so
> it can change in guest's lifetime) but it doesn't seem to be very
> practical: what a sane guest is supposed to do if it's told that SMT
> topology is about to become fake other than kill itself? Also, it seems to
> make little sense to do e.g. CPU pinning on the source but not on the
> destination.

Agree.

>
> There is also one additional piece of the information missing. A VM can be
> sharing physical cores with other VMs (or other userspace tasks on the
> host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
> not? It is unclear if this changes anything and can probably be left out
> of scope (just don't do that).

I don’t think KVM_FEATURE_TRUSTWORTHY_SMT should indicate to guest whether it’s vCPU shares a CPU core with another guest.
It should only expose to guest the fact that he can rely on it’s virtual SMT topology. i.e. That there is a relation between virtual SMT topology
to which physical logical processors run which vCPUs.

Guest have nothing to do with the fact that he is now aware host doesn’t guarantee to him that one of it’s vCPU shares a CPU core with another guest vCPU.
I don’t think we should have a CPUID bit that expose this information to guest.

>
> Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
> enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
> !cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
> CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
> extra work (like CPU pinning and exposing the correct topology) before
> passing '1' to the guest.

Hmm… I’m not sure this is correct.
For example, it is possible to expose in virtual SMT topology that guest have 2 vCPUs running on single NUMA node,
while in reality each vCPU task can be scheduled to run on different NUMA nodes. Therefore, making virtual SMT topology not trustworthy.
i.e. Disabling SMT on host doesn’t mean that virtual SMT topology is reliable.

I think this CPUID bit should just be set from userspace when admin have guaranteed to guest that it have set vCPU task affinity properly.
Without KVM attempting to set this bit by itself.

Note that we defined above KVM_HINTS_TRUSTWORTHY_SMT bit differently than “NoNonArchitecturalCoreSharing”.
“NoNonArchitecturalCoreSharing” guarantees to guest that vCPUs of guest won’t share a physical CPU core unless they are defined as virtual SMT siblings.
In contrast, KVM_HINTS_TRUSTWORTHY_SMT bit attempts to state that virtual SMT topology is a subset of how vCPUs are scheduled on physical SMT topology.
i.e. It seems that Hyper-V bit is indeed only attempting to provide guest information related to security mitigations. While newly proposed KVM bit attempts to also
assist guest to determine how to perform it’s internal scheduling decisions.

-Liran

>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
> arch/x86/include/uapi/asm/kvm_para.h | 2 ++
> arch/x86/kvm/cpuid.c | 7 ++++++-
> 3 files changed, 27 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index 01b081f6e7ea..64b94103fc90 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
> before using paravirtualized
> sched yield.
>
> +KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
> + topology is trustworthy' hint
> + (KVM_HINTS_TRUSTWORTHY_SMT).
> +
> KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
> per-cpu warps are expeced in
> kvmclock
> @@ -97,11 +101,18 @@ KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
>
> Where ``flag`` here is defined as below:
>
> -================== ============ =================================
> -flag value meaning
> -================== ============ =================================
> -KVM_HINTS_REALTIME 0 guest checks this feature bit to
> - determine that vCPUs are never
> - preempted for an unlimited time
> - allowing optimizations
> -================== ============ =================================
> +================================= =========== =================================
> +flag value meaning
> +================================= =========== =================================
> +KVM_HINTS_REALTIME 0 guest checks this feature bit to
> + determine that vCPUs are never
> + preempted for an unlimited time
> + allowing optimizations
> +
> +KVM_HINTS_TRUSTWORTHY_SMT 1 the bit is set when the exposed
> + SMT topology is trustworthy, this
> + means that two guest vCPUs will
> + never share a physical core
> + unless they are exposed as SMT
> + threads.
> +================================= =========== =================================
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 2a8e0b6b9805..183239d5dfad 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -31,8 +31,10 @@
> #define KVM_FEATURE_PV_SEND_IPI 11
> #define KVM_FEATURE_POLL_CONTROL 12
> #define KVM_FEATURE_PV_SCHED_YIELD 13
> +#define KVM_FEATURE_TRUSTWORTHY_SMT 14
>
> #define KVM_HINTS_REALTIME 0
> +#define KVM_HINTS_TRUSTWORTHY_SMT 1
>
> /* The last 8 bits are used to indicate how to interpret the flags field
> * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index f68c0c753c38..dab527a7081f 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> (1 << KVM_FEATURE_PV_SEND_IPI) |
> (1 << KVM_FEATURE_POLL_CONTROL) |
> - (1 << KVM_FEATURE_PV_SCHED_YIELD);
> + (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> + (1 << KVM_FEATURE_TRUSTWORTHY_SMT);
>
> if (sched_info_on())
> entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> @@ -720,6 +721,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> entry->ebx = 0;
> entry->ecx = 0;
> entry->edx = 0;
> +
> + if (!cpu_smt_possible())
> + entry->edx |= (1 << KVM_HINTS_TRUSTWORTHY_SMT);
> +
> break;
> case 0x80000000:
> entry->eax = min(entry->eax, 0x8000001f);
> --
> 2.20.1
>

2019-11-05 17:34:39

by Liran Alon

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy



> On 5 Nov 2019, at 19:17, Liran Alon <[email protected]> wrote:
>
>
>
>> On 5 Nov 2019, at 18:17, Vitaly Kuznetsov <[email protected]> wrote:
>>
>> Virtualized guests may pick a different strategy to mitigate hardware
>> vulnerabilities when it comes to hyper-threading: disable SMT completely,
>> use core scheduling, or, for example, opt in for STIBP. Making the
>> decision, however, requires an extra bit of information which is currently
>> missing: does the topology the guest see match hardware or if it is 'fake'
>> and two vCPUs which look like different cores from guest's perspective can
>> actually be scheduled on the same physical core. Disabling SMT or doing
>> core scheduling only makes sense when the topology is trustworthy.
>
> This is not only related to vulnerability mitigations.
> It’s also important for guest to know if it’s SMT topology is trustworthy for various optimisation algorithms.
> E.g. Should it attempt to run tasks that share memory on same NUMA node?
>
>>
>> Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
>> that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
>> topology is actually trustworthy. It would, of course, be possible to get
>> away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
>> compatibility but the current approach looks more straightforward.
>
> Agree.
>
>>
>> There were some offline discussions on whether this new feature bit should
>> be complemented with a 're-enlightenment' mechanism for live migration (so
>> it can change in guest's lifetime) but it doesn't seem to be very
>> practical: what a sane guest is supposed to do if it's told that SMT
>> topology is about to become fake other than kill itself? Also, it seems to
>> make little sense to do e.g. CPU pinning on the source but not on the
>> destination.
>
> Agree.
>
>>
>> There is also one additional piece of the information missing. A VM can be
>> sharing physical cores with other VMs (or other userspace tasks on the
>> host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
>> not? It is unclear if this changes anything and can probably be left out
>> of scope (just don't do that).
>
> I don’t think KVM_FEATURE_TRUSTWORTHY_SMT should indicate to guest whether it’s vCPU shares a CPU core with another guest.
> It should only expose to guest the fact that he can rely on it’s virtual SMT topology. i.e. That there is a relation between virtual SMT topology
> to which physical logical processors run which vCPUs.
>
> Guest have nothing to do with the fact that he is now aware host doesn’t guarantee to him that one of it’s vCPU shares a CPU core with another guest vCPU.
> I don’t think we should have a CPUID bit that expose this information to guest.
>
>>
>> Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
>> enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
>> !cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
>> CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
>> extra work (like CPU pinning and exposing the correct topology) before
>> passing '1' to the guest.
>
> Hmm… I’m not sure this is correct.
> For example, it is possible to expose in virtual SMT topology that guest have 2 vCPUs running on single NUMA node,
> while in reality each vCPU task can be scheduled to run on different NUMA nodes. Therefore, making virtual SMT topology not trustworthy.
> i.e. Disabling SMT on host doesn’t mean that virtual SMT topology is reliable.
>
> I think this CPUID bit should just be set from userspace when admin have guaranteed to guest that it have set vCPU task affinity properly.
> Without KVM attempting to set this bit by itself.
>
> Note that we defined above KVM_HINTS_TRUSTWORTHY_SMT bit differently than “NoNonArchitecturalCoreSharing”.
> “NoNonArchitecturalCoreSharing” guarantees to guest that vCPUs of guest won’t share a physical CPU core unless they are defined as virtual SMT siblings.
> In contrast, KVM_HINTS_TRUSTWORTHY_SMT bit attempts to state that virtual SMT topology is a subset of how vCPUs are scheduled on physical SMT topology.
> i.e. It seems that Hyper-V bit is indeed only attempting to provide guest information related to security mitigations. While newly proposed KVM bit attempts to also
> assist guest to determine how to perform it’s internal scheduling decisions.
>
> -Liran

Oh I later saw below that you defined KVM_HINTS_TRUSTWORTHY_SMT indeed as Microsoft defined “NoNonArchitecturalCoreSharing”.
If you plan to go with this direction, than I suggest renaming to similar name as Hyper-V.
But I think having a general vSMT topology is trustworthy is also useful.
Maybe we should have separate bits for each.

-Liran

>
>>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
>> arch/x86/include/uapi/asm/kvm_para.h | 2 ++
>> arch/x86/kvm/cpuid.c | 7 ++++++-
>> 3 files changed, 27 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
>> index 01b081f6e7ea..64b94103fc90 100644
>> --- a/Documentation/virt/kvm/cpuid.rst
>> +++ b/Documentation/virt/kvm/cpuid.rst
>> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
>> before using paravirtualized
>> sched yield.
>>
>> +KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
>> + topology is trustworthy' hint
>> + (KVM_HINTS_TRUSTWORTHY_SMT).
>> +
>> KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
>> per-cpu warps are expeced in
>> kvmclock
>> @@ -97,11 +101,18 @@ KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
>>
>> Where ``flag`` here is defined as below:
>>
>> -================== ============ =================================
>> -flag value meaning
>> -================== ============ =================================
>> -KVM_HINTS_REALTIME 0 guest checks this feature bit to
>> - determine that vCPUs are never
>> - preempted for an unlimited time
>> - allowing optimizations
>> -================== ============ =================================
>> +================================= =========== =================================
>> +flag value meaning
>> +================================= =========== =================================
>> +KVM_HINTS_REALTIME 0 guest checks this feature bit to
>> + determine that vCPUs are never
>> + preempted for an unlimited time
>> + allowing optimizations
>> +
>> +KVM_HINTS_TRUSTWORTHY_SMT 1 the bit is set when the exposed
>> + SMT topology is trustworthy, this
>> + means that two guest vCPUs will
>> + never share a physical core
>> + unless they are exposed as SMT
>> + threads.
>> +================================= =========== =================================
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> index 2a8e0b6b9805..183239d5dfad 100644
>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -31,8 +31,10 @@
>> #define KVM_FEATURE_PV_SEND_IPI 11
>> #define KVM_FEATURE_POLL_CONTROL 12
>> #define KVM_FEATURE_PV_SCHED_YIELD 13
>> +#define KVM_FEATURE_TRUSTWORTHY_SMT 14
>>
>> #define KVM_HINTS_REALTIME 0
>> +#define KVM_HINTS_TRUSTWORTHY_SMT 1
>>
>> /* The last 8 bits are used to indicate how to interpret the flags field
>> * in pvclock structure. If no bits are set, all flags are ignored.
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index f68c0c753c38..dab527a7081f 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
>> (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
>> (1 << KVM_FEATURE_PV_SEND_IPI) |
>> (1 << KVM_FEATURE_POLL_CONTROL) |
>> - (1 << KVM_FEATURE_PV_SCHED_YIELD);
>> + (1 << KVM_FEATURE_PV_SCHED_YIELD) |
>> + (1 << KVM_FEATURE_TRUSTWORTHY_SMT);
>>
>> if (sched_info_on())
>> entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
>> @@ -720,6 +721,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
>> entry->ebx = 0;
>> entry->ecx = 0;
>> entry->edx = 0;
>> +
>> + if (!cpu_smt_possible())
>> + entry->edx |= (1 << KVM_HINTS_TRUSTWORTHY_SMT);
>> +
>> break;
>> case 0x80000000:
>> entry->eax = min(entry->eax, 0x8000001f);
>> --
>> 2.20.1
>>
>

2019-11-05 17:38:50

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On Tue, Nov 5, 2019 at 9:32 AM Liran Alon <[email protected]> wrote:
>
>
>
> > On 5 Nov 2019, at 19:17, Liran Alon <[email protected]> wrote:
> >
> >
> >
> >> On 5 Nov 2019, at 18:17, Vitaly Kuznetsov <[email protected]> wrote:
> >>
> >> Virtualized guests may pick a different strategy to mitigate hardware
> >> vulnerabilities when it comes to hyper-threading: disable SMT completely,
> >> use core scheduling, or, for example, opt in for STIBP. Making the
> >> decision, however, requires an extra bit of information which is currently
> >> missing: does the topology the guest see match hardware or if it is 'fake'
> >> and two vCPUs which look like different cores from guest's perspective can
> >> actually be scheduled on the same physical core. Disabling SMT or doing
> >> core scheduling only makes sense when the topology is trustworthy.
> >
> > This is not only related to vulnerability mitigations.
> > It’s also important for guest to know if it’s SMT topology is trustworthy for various optimisation algorithms.
> > E.g. Should it attempt to run tasks that share memory on same NUMA node?
> >
> >>
> >> Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> >> that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> >> topology is actually trustworthy. It would, of course, be possible to get
> >> away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> >> compatibility but the current approach looks more straightforward.
> >
> > Agree.
> >
> >>
> >> There were some offline discussions on whether this new feature bit should
> >> be complemented with a 're-enlightenment' mechanism for live migration (so
> >> it can change in guest's lifetime) but it doesn't seem to be very
> >> practical: what a sane guest is supposed to do if it's told that SMT
> >> topology is about to become fake other than kill itself? Also, it seems to
> >> make little sense to do e.g. CPU pinning on the source but not on the
> >> destination.
> >
> > Agree.
> >
> >>
> >> There is also one additional piece of the information missing. A VM can be
> >> sharing physical cores with other VMs (or other userspace tasks on the
> >> host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
> >> not? It is unclear if this changes anything and can probably be left out
> >> of scope (just don't do that).
> >
> > I don’t think KVM_FEATURE_TRUSTWORTHY_SMT should indicate to guest whether it’s vCPU shares a CPU core with another guest.
> > It should only expose to guest the fact that he can rely on it’s virtual SMT topology. i.e. That there is a relation between virtual SMT topology
> > to which physical logical processors run which vCPUs.
> >
> > Guest have nothing to do with the fact that he is now aware host doesn’t guarantee to him that one of it’s vCPU shares a CPU core with another guest vCPU.
> > I don’t think we should have a CPUID bit that expose this information to guest.
> >
> >>
> >> Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
> >> enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
> >> !cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
> >> CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
> >> extra work (like CPU pinning and exposing the correct topology) before
> >> passing '1' to the guest.
> >
> > Hmm… I’m not sure this is correct.
> > For example, it is possible to expose in virtual SMT topology that guest have 2 vCPUs running on single NUMA node,
> > while in reality each vCPU task can be scheduled to run on different NUMA nodes. Therefore, making virtual SMT topology not trustworthy.
> > i.e. Disabling SMT on host doesn’t mean that virtual SMT topology is reliable.
> >
> > I think this CPUID bit should just be set from userspace when admin have guaranteed to guest that it have set vCPU task affinity properly.
> > Without KVM attempting to set this bit by itself.
> >
> > Note that we defined above KVM_HINTS_TRUSTWORTHY_SMT bit differently than “NoNonArchitecturalCoreSharing”.
> > “NoNonArchitecturalCoreSharing” guarantees to guest that vCPUs of guest won’t share a physical CPU core unless they are defined as virtual SMT siblings.
> > In contrast, KVM_HINTS_TRUSTWORTHY_SMT bit attempts to state that virtual SMT topology is a subset of how vCPUs are scheduled on physical SMT topology.
> > i.e. It seems that Hyper-V bit is indeed only attempting to provide guest information related to security mitigations. While newly proposed KVM bit attempts to also
> > assist guest to determine how to perform it’s internal scheduling decisions.
> >
> > -Liran
>
> Oh I later saw below that you defined KVM_HINTS_TRUSTWORTHY_SMT indeed as Microsoft defined “NoNonArchitecturalCoreSharing”.
> If you plan to go with this direction, than I suggest renaming to similar name as Hyper-V.
> But I think having a general vSMT topology is trustworthy is also useful.
> Maybe we should have separate bits for each.

And perhaps a bit each for "vCCX topology is trustworthy" and "vNUMA
topology is trustworthy"?

> -Liran
>
> >
> >>
> >> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> >> ---
> >> Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
> >> arch/x86/include/uapi/asm/kvm_para.h | 2 ++
> >> arch/x86/kvm/cpuid.c | 7 ++++++-
> >> 3 files changed, 27 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> >> index 01b081f6e7ea..64b94103fc90 100644
> >> --- a/Documentation/virt/kvm/cpuid.rst
> >> +++ b/Documentation/virt/kvm/cpuid.rst
> >> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
> >> before using paravirtualized
> >> sched yield.
> >>
> >> +KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
> >> + topology is trustworthy' hint
> >> + (KVM_HINTS_TRUSTWORTHY_SMT).
> >> +
> >> KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
> >> per-cpu warps are expeced in
> >> kvmclock
> >> @@ -97,11 +101,18 @@ KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
> >>
> >> Where ``flag`` here is defined as below:
> >>
> >> -================== ============ =================================
> >> -flag value meaning
> >> -================== ============ =================================
> >> -KVM_HINTS_REALTIME 0 guest checks this feature bit to
> >> - determine that vCPUs are never
> >> - preempted for an unlimited time
> >> - allowing optimizations
> >> -================== ============ =================================
> >> +================================= =========== =================================
> >> +flag value meaning
> >> +================================= =========== =================================
> >> +KVM_HINTS_REALTIME 0 guest checks this feature bit to
> >> + determine that vCPUs are never
> >> + preempted for an unlimited time
> >> + allowing optimizations
> >> +
> >> +KVM_HINTS_TRUSTWORTHY_SMT 1 the bit is set when the exposed
> >> + SMT topology is trustworthy, this
> >> + means that two guest vCPUs will
> >> + never share a physical core
> >> + unless they are exposed as SMT
> >> + threads.
> >> +================================= =========== =================================
> >> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> >> index 2a8e0b6b9805..183239d5dfad 100644
> >> --- a/arch/x86/include/uapi/asm/kvm_para.h
> >> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> >> @@ -31,8 +31,10 @@
> >> #define KVM_FEATURE_PV_SEND_IPI 11
> >> #define KVM_FEATURE_POLL_CONTROL 12
> >> #define KVM_FEATURE_PV_SCHED_YIELD 13
> >> +#define KVM_FEATURE_TRUSTWORTHY_SMT 14
> >>
> >> #define KVM_HINTS_REALTIME 0
> >> +#define KVM_HINTS_TRUSTWORTHY_SMT 1
> >>
> >> /* The last 8 bits are used to indicate how to interpret the flags field
> >> * in pvclock structure. If no bits are set, all flags are ignored.
> >> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> >> index f68c0c753c38..dab527a7081f 100644
> >> --- a/arch/x86/kvm/cpuid.c
> >> +++ b/arch/x86/kvm/cpuid.c
> >> @@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> >> (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> >> (1 << KVM_FEATURE_PV_SEND_IPI) |
> >> (1 << KVM_FEATURE_POLL_CONTROL) |
> >> - (1 << KVM_FEATURE_PV_SCHED_YIELD);
> >> + (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> >> + (1 << KVM_FEATURE_TRUSTWORTHY_SMT);
> >>
> >> if (sched_info_on())
> >> entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> >> @@ -720,6 +721,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> >> entry->ebx = 0;
> >> entry->ecx = 0;
> >> entry->edx = 0;
> >> +
> >> + if (!cpu_smt_possible())
> >> + entry->edx |= (1 << KVM_HINTS_TRUSTWORTHY_SMT);
> >> +
> >> break;
> >> case 0x80000000:
> >> entry->eax = min(entry->eax, 0x8000001f);
> >> --
> >> 2.20.1
> >>
> >
>

2019-11-05 19:38:57

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
> Virtualized guests may pick a different strategy to mitigate hardware
> vulnerabilities when it comes to hyper-threading: disable SMT completely,
> use core scheduling, or, for example, opt in for STIBP. Making the
> decision, however, requires an extra bit of information which is currently
> missing: does the topology the guest see match hardware or if it is 'fake'
> and two vCPUs which look like different cores from guest's perspective can
> actually be scheduled on the same physical core. Disabling SMT or doing
> core scheduling only makes sense when the topology is trustworthy.
>
> Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> topology is actually trustworthy. It would, of course, be possible to get
> away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> compatibility but the current approach looks more straightforward.

I'd stay away from "trustworthy", especially if this is controlled by
userspace. Whether or not the hint is trustworthy is purely up to the
guest. Right now it doesn't really matter, but that will change as we
start moving pieces of the host out of the guest's TCB.

It may make sense to split the two (or even three?) cases, e.g.
KVM_FEATURE_NO_SMT and KVM_FEATURE_ACCURATE_TOPOLOGY. KVM can easily
enforce NO_SMT _today_, i.e. allow it to be set if and only if SMT is
truly disabled. Verifying that the topology exposed to the guest is legit
is a completely different beast.

If/when the kernel gains core scheduling, a new flag to communicate core
sharing could be added, e.g. KVM_FEATURE_NO_NON_ARCH_CORE_SHARING.

Differentiating between host kernel and userspace doesn't add much unless
the host kernel can attest itself to the guest, but splitting things up
at least keeps that option open.

> There were some offline discussions on whether this new feature bit should
> be complemented with a 're-enlightenment' mechanism for live migration (so
> it can change in guest's lifetime) but it doesn't seem to be very
> practical: what a sane guest is supposed to do if it's told that SMT
> topology is about to become fake other than kill itself? Also, it seems to
> make little sense to do e.g. CPU pinning on the source but not on the
> destination.
>
> There is also one additional piece of the information missing. A VM can be
> sharing physical cores with other VMs (or other userspace tasks on the
> host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
> not? It is unclear if this changes anything and can probably be left out
> of scope (just don't do that).

This is why I'd go with something like "accurate" instead of "trustworthy".

> Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
> enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
> !cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
> CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
> extra work (like CPU pinning and exposing the correct topology) before
> passing '1' to the guest.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
> arch/x86/include/uapi/asm/kvm_para.h | 2 ++
> arch/x86/kvm/cpuid.c | 7 ++++++-
> 3 files changed, 27 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index 01b081f6e7ea..64b94103fc90 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
> before using paravirtualized
> sched yield.
>
> +KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
> + topology is trustworthy' hint
> + (KVM_HINTS_TRUSTWORTHY_SMT).
> +
> KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
> per-cpu warps are expeced in
> kvmclock
> @@ -97,11 +101,18 @@ KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24 host will warn if no guest-side
>
> Where ``flag`` here is defined as below:
>
> -================== ============ =================================
> -flag value meaning
> -================== ============ =================================
> -KVM_HINTS_REALTIME 0 guest checks this feature bit to
> - determine that vCPUs are never
> - preempted for an unlimited time
> - allowing optimizations
> -================== ============ =================================
> +================================= =========== =================================
> +flag value meaning
> +================================= =========== =================================
> +KVM_HINTS_REALTIME 0 guest checks this feature bit to
> + determine that vCPUs are never
> + preempted for an unlimited time
> + allowing optimizations
> +
> +KVM_HINTS_TRUSTWORTHY_SMT 1 the bit is set when the exposed
> + SMT topology is trustworthy, this
> + means that two guest vCPUs will
> + never share a physical core
> + unless they are exposed as SMT
> + threads.
> +================================= =========== =================================
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 2a8e0b6b9805..183239d5dfad 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -31,8 +31,10 @@
> #define KVM_FEATURE_PV_SEND_IPI 11
> #define KVM_FEATURE_POLL_CONTROL 12
> #define KVM_FEATURE_PV_SCHED_YIELD 13
> +#define KVM_FEATURE_TRUSTWORTHY_SMT 14
>
> #define KVM_HINTS_REALTIME 0
> +#define KVM_HINTS_TRUSTWORTHY_SMT 1
>
> /* The last 8 bits are used to indicate how to interpret the flags field
> * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index f68c0c753c38..dab527a7081f 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> (1 << KVM_FEATURE_PV_SEND_IPI) |
> (1 << KVM_FEATURE_POLL_CONTROL) |
> - (1 << KVM_FEATURE_PV_SCHED_YIELD);
> + (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> + (1 << KVM_FEATURE_TRUSTWORTHY_SMT);
>
> if (sched_info_on())
> entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> @@ -720,6 +721,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> entry->ebx = 0;
> entry->ecx = 0;
> entry->edx = 0;
> +
> + if (!cpu_smt_possible())
> + entry->edx |= (1 << KVM_HINTS_TRUSTWORTHY_SMT);
> +
> break;
> case 0x80000000:
> entry->eax = min(entry->eax, 0x8000001f);
> --
> 2.20.1
>

2019-11-05 20:04:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
> Virtualized guests may pick a different strategy to mitigate hardware
> vulnerabilities when it comes to hyper-threading: disable SMT completely,
> use core scheduling, or, for example, opt in for STIBP. Making the
> decision, however, requires an extra bit of information which is currently
> missing: does the topology the guest see match hardware or if it is 'fake'
> and two vCPUs which look like different cores from guest's perspective can
> actually be scheduled on the same physical core. Disabling SMT or doing
> core scheduling only makes sense when the topology is trustworthy.
>
> Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> topology is actually trustworthy. It would, of course, be possible to get
> away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> compatibility but the current approach looks more straightforward.

The only way virt topology can make any sense what so ever is if the
vcpus are pinned to physical CPUs.

And I was under the impression we already had a bit for that (isn't it
used to disable paravirt spinlocks and the like?). But I cannot seem to
find it in a hurry.

So I would much rather you have a bit that indicates the 1:1 vcpu/cpu
mapping and if that is set accept the topology information and otherwise
completely ignore it.

2019-11-05 23:26:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On Tue, Nov 05, 2019 at 11:37:50AM -0800, Sean Christopherson wrote:
> On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
> > Virtualized guests may pick a different strategy to mitigate hardware
> > vulnerabilities when it comes to hyper-threading: disable SMT completely,
> > use core scheduling, or, for example, opt in for STIBP. Making the
> > decision, however, requires an extra bit of information which is currently
> > missing: does the topology the guest see match hardware or if it is 'fake'
> > and two vCPUs which look like different cores from guest's perspective can
> > actually be scheduled on the same physical core. Disabling SMT or doing
> > core scheduling only makes sense when the topology is trustworthy.
> >
> > Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> > that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> > topology is actually trustworthy. It would, of course, be possible to get
> > away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> > compatibility but the current approach looks more straightforward.
>
> I'd stay away from "trustworthy", especially if this is controlled by
> userspace. Whether or not the hint is trustworthy is purely up to the
> guest. Right now it doesn't really matter, but that will change as we
> start moving pieces of the host out of the guest's TCB.
>
> It may make sense to split the two (or even three?) cases, e.g.
> KVM_FEATURE_NO_SMT and KVM_FEATURE_ACCURATE_TOPOLOGY. KVM can easily
> enforce NO_SMT _today_, i.e. allow it to be set if and only if SMT is
> truly disabled. Verifying that the topology exposed to the guest is legit
> is a completely different beast.

Scratch the ACCURATE_TOPOLOGY idea, I doubt there's a real use case for
setting ACCURATE_TOPOLOGY and not KVM_HINTS_REALTIME. A feature flag to
state that SMT is disabled seems simple and useful.

2019-11-05 23:28:45

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On Tue, Nov 05, 2019 at 09:02:18PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
> > Virtualized guests may pick a different strategy to mitigate hardware
> > vulnerabilities when it comes to hyper-threading: disable SMT completely,
> > use core scheduling, or, for example, opt in for STIBP. Making the
> > decision, however, requires an extra bit of information which is currently
> > missing: does the topology the guest see match hardware or if it is 'fake'
> > and two vCPUs which look like different cores from guest's perspective can
> > actually be scheduled on the same physical core. Disabling SMT or doing
> > core scheduling only makes sense when the topology is trustworthy.
> >
> > Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> > that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> > topology is actually trustworthy. It would, of course, be possible to get
> > away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> > compatibility but the current approach looks more straightforward.
>
> The only way virt topology can make any sense what so ever is if the
> vcpus are pinned to physical CPUs.
>
> And I was under the impression we already had a bit for that (isn't it
> used to disable paravirt spinlocks and the like?). But I cannot seem to
> find it in a hurry.

Yep, KVM_HINTS_REALTIME does what you describe.

> So I would much rather you have a bit that indicates the 1:1 vcpu/cpu
> mapping and if that is set accept the topology information and otherwise
> completely ignore it.

2019-11-05 23:53:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On 05/11/19 21:02, Peter Zijlstra wrote:
> On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
>> Virtualized guests may pick a different strategy to mitigate hardware
>> vulnerabilities when it comes to hyper-threading: disable SMT completely,
>> use core scheduling, or, for example, opt in for STIBP. Making the
>> decision, however, requires an extra bit of information which is currently
>> missing: does the topology the guest see match hardware or if it is 'fake'
>> and two vCPUs which look like different cores from guest's perspective can
>> actually be scheduled on the same physical core. Disabling SMT or doing
>> core scheduling only makes sense when the topology is trustworthy.
>>
>> Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
>> that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
>> topology is actually trustworthy. It would, of course, be possible to get
>> away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
>> compatibility but the current approach looks more straightforward.
>
> The only way virt topology can make any sense what so ever is if the
> vcpus are pinned to physical CPUs.

This is a subset of the requirements for "trustworthy" SMT. You can have:

- vCPUs pinned to two threads in the same core and exposed as multiple
cores in the guest

- vCPUs from different guests pinned to two threads in the same core

and that would be okay as far as KVM_HINTS_REALTIME is concerned, but
would still allow exploitation of side-channels, respectively within the
VM and between VMs.

Paolo

> And I was under the impression we already had a bit for that (isn't it
> used to disable paravirt spinlocks and the like?). But I cannot seem to
> find it in a hurry.
>
> So I would much rather you have a bit that indicates the 1:1 vcpu/cpu
> mapping and if that is set accept the topology information and otherwise
> completely ignore it.
>

2019-11-05 23:57:35

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On 05/11/19 17:17, Vitaly Kuznetsov wrote:
> There is also one additional piece of the information missing. A VM can be
> sharing physical cores with other VMs (or other userspace tasks on the
> host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
> not? It is unclear if this changes anything and can probably be left out
> of scope (just don't do that).
>
> Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
> enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
> !cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
> CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
> extra work (like CPU pinning and exposing the correct topology) before
> passing '1' to the guest.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
> arch/x86/include/uapi/asm/kvm_para.h | 2 ++
> arch/x86/kvm/cpuid.c | 7 ++++++-
> 3 files changed, 27 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index 01b081f6e7ea..64b94103fc90 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
> before using paravirtualized
> sched yield.
>
> +KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
> + topology is trustworthy' hint
> + (KVM_HINTS_TRUSTWORTHY_SMT).
> +

Instead of defining a one-off bit, can we make:

ecx = the set of known "hints" (defaults to edx if zero)

edx = the set of hints that apply to the virtual machine

Paolo

2019-11-06 08:33:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On Wed, Nov 06, 2019 at 12:51:30AM +0100, Paolo Bonzini wrote:
> On 05/11/19 21:02, Peter Zijlstra wrote:
> > On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
> >> Virtualized guests may pick a different strategy to mitigate hardware
> >> vulnerabilities when it comes to hyper-threading: disable SMT completely,
> >> use core scheduling, or, for example, opt in for STIBP. Making the
> >> decision, however, requires an extra bit of information which is currently
> >> missing: does the topology the guest see match hardware or if it is 'fake'
> >> and two vCPUs which look like different cores from guest's perspective can
> >> actually be scheduled on the same physical core. Disabling SMT or doing
> >> core scheduling only makes sense when the topology is trustworthy.
> >>
> >> Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> >> that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> >> topology is actually trustworthy. It would, of course, be possible to get
> >> away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> >> compatibility but the current approach looks more straightforward.
> >
> > The only way virt topology can make any sense what so ever is if the
> > vcpus are pinned to physical CPUs.
>
> This is a subset of the requirements for "trustworthy" SMT. You can have:
>
> - vCPUs pinned to two threads in the same core and exposed as multiple
> cores in the guest

Why the .... would one do anything like that?

> - vCPUs from different guests pinned to two threads in the same core
>
> and that would be okay as far as KVM_HINTS_REALTIME is concerned, but
> would still allow exploitation of side-channels, respectively within the
> VM and between VMs.

Hardly, RT really rather would not have SMT. SMT is pretty crap for
determinism.

2019-11-06 08:35:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On Tue, Nov 05, 2019 at 03:25:28PM -0800, Sean Christopherson wrote:
> On Tue, Nov 05, 2019 at 09:02:18PM +0100, Peter Zijlstra wrote:
> > On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
> > > Virtualized guests may pick a different strategy to mitigate hardware
> > > vulnerabilities when it comes to hyper-threading: disable SMT completely,
> > > use core scheduling, or, for example, opt in for STIBP. Making the
> > > decision, however, requires an extra bit of information which is currently
> > > missing: does the topology the guest see match hardware or if it is 'fake'
> > > and two vCPUs which look like different cores from guest's perspective can
> > > actually be scheduled on the same physical core. Disabling SMT or doing
> > > core scheduling only makes sense when the topology is trustworthy.
> > >
> > > Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> > > that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> > > topology is actually trustworthy. It would, of course, be possible to get
> > > away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> > > compatibility but the current approach looks more straightforward.
> >
> > The only way virt topology can make any sense what so ever is if the
> > vcpus are pinned to physical CPUs.
> >
> > And I was under the impression we already had a bit for that (isn't it
> > used to disable paravirt spinlocks and the like?). But I cannot seem to
> > find it in a hurry.
>
> Yep, KVM_HINTS_REALTIME does what you describe.

*sigh*, that's a pretty shit name for it :/

2019-11-06 09:42:46

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On 06/11/19 09:32, Peter Zijlstra wrote:
>>> The only way virt topology can make any sense what so ever is if the
>>> vcpus are pinned to physical CPUs.
>>
>> This is a subset of the requirements for "trustworthy" SMT. You can have:
>>
>> - vCPUs pinned to two threads in the same core and exposed as multiple
>> cores in the guest
>
> Why the .... would one do anything like that?

If a vCPUs from a different guest could be pinned to a threads in the
same core as this guest (e.g. guests with an odd number of vCPUs), then
why not. Side-channel wise, you're screwed anyway.

>> - vCPUs from different guests pinned to two threads in the same core
>>
>> and that would be okay as far as KVM_HINTS_REALTIME is concerned, but
>> would still allow exploitation of side-channels, respectively within the
>> VM and between VMs.
>
> Hardly, RT really rather would not have SMT. SMT is pretty crap for
> determinism.

True, but not a problem as long as the guest knows that - it can ignore
one sibling for each core for RT tasks, and use hyperthreading for
non-RT and housekeeping tasks.

Paolo

2019-11-07 10:41:52

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

Sean Christopherson <[email protected]> writes:

> On Tue, Nov 05, 2019 at 11:37:50AM -0800, Sean Christopherson wrote:
>> On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
>> > Virtualized guests may pick a different strategy to mitigate hardware
>> > vulnerabilities when it comes to hyper-threading: disable SMT completely,
>> > use core scheduling, or, for example, opt in for STIBP. Making the
>> > decision, however, requires an extra bit of information which is currently
>> > missing: does the topology the guest see match hardware or if it is 'fake'
>> > and two vCPUs which look like different cores from guest's perspective can
>> > actually be scheduled on the same physical core. Disabling SMT or doing
>> > core scheduling only makes sense when the topology is trustworthy.
>> >
>> > Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
>> > that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
>> > topology is actually trustworthy. It would, of course, be possible to get
>> > away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
>> > compatibility but the current approach looks more straightforward.
>>
>> I'd stay away from "trustworthy", especially if this is controlled by
>> userspace. Whether or not the hint is trustworthy is purely up to the
>> guest. Right now it doesn't really matter, but that will change as we
>> start moving pieces of the host out of the guest's TCB.
>>
>> It may make sense to split the two (or even three?) cases, e.g.
>> KVM_FEATURE_NO_SMT and KVM_FEATURE_ACCURATE_TOPOLOGY. KVM can easily
>> enforce NO_SMT _today_, i.e. allow it to be set if and only if SMT is
>> truly disabled. Verifying that the topology exposed to the guest is legit
>> is a completely different beast.
>
> Scratch the ACCURATE_TOPOLOGY idea, I doubt there's a real use case for
> setting ACCURATE_TOPOLOGY and not KVM_HINTS_REALTIME. A feature flag to
> state that SMT is disabled seems simple and useful.

You seem to have the most conservative ideas around)

I wasn't actually thinking about KVM enforcing anything here, my goal
was to provide guest with enough information so it can adjust its SMT
related settings accordingly. These could be:
- 'nosmt'
- STIBP
- Core scheduling (not yet upstream afaik)
- Manual vCPU pinning for certain tasks (including running nested
guests). [E.g. nested Hyper-V hides 'md_clear' from its guests when it
doesn't see 'NoNonarchitecturalCoreSharing' from the underlying
hypervisor as it's pointless]
- (Liran's idea): Using SMT information for better scheduling

KVM_FEATURE_NO_SMT would cover one case, however, it's basically the
same as not exposing hyperthreads by KVM userspace, just enforced. Do
you think that such enforcement makes sense? What the guest is supposed
to do when it sees the flag (compared to what's doing now)?

('KVM_HINTS_REALTIME' may actually imply what we want, however, judging
by its name it may gain additional meaning in the future like 'always do
busy polling everywhere')

--
Vitaly

2019-11-07 15:08:52

by Liran Alon

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy



> On 7 Nov 2019, at 16:00, Christophe de Dinechin <[email protected]> wrote:
>
>
>
>> On 6 Nov 2019, at 00:25, Sean Christopherson <[email protected]> wrote:
>>
>> On Tue, Nov 05, 2019 at 11:37:50AM -0800, Sean Christopherson wrote:
>>> On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
>>>> Virtualized guests may pick a different strategy to mitigate hardware
>>>> vulnerabilities when it comes to hyper-threading: disable SMT completely,
>>>> use core scheduling, or, for example, opt in for STIBP. Making the
>>>> decision, however, requires an extra bit of information which is currently
>>>> missing: does the topology the guest see match hardware or if it is 'fake'
>>>> and two vCPUs which look like different cores from guest's perspective can
>>>> actually be scheduled on the same physical core. Disabling SMT or doing
>>>> core scheduling only makes sense when the topology is trustworthy.
>>>>
>>>> Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
>>>> that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
>>>> topology is actually trustworthy. It would, of course, be possible to get
>>>> away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
>>>> compatibility but the current approach looks more straightforward.
>>>
>>> I'd stay away from "trustworthy", especially if this is controlled by
>>> userspace. Whether or not the hint is trustworthy is purely up to the
>>> guest. Right now it doesn't really matter, but that will change as we
>>> start moving pieces of the host out of the guest's TCB.
>>>
>>> It may make sense to split the two (or even three?) cases, e.g.
>>> KVM_FEATURE_NO_SMT and KVM_FEATURE_ACCURATE_TOPOLOGY. KVM can easily
>>> enforce NO_SMT _today_, i.e. allow it to be set if and only if SMT is
>>> truly disabled. Verifying that the topology exposed to the guest is legit
>>> is a completely different beast.
>>
>> Scratch the ACCURATE_TOPOLOGY idea, I doubt there's a real use case for
>> setting ACCURATE_TOPOLOGY and not KVM_HINTS_REALTIME. A feature flag to
>> state that SMT is disabled seems simple and useful.

A bit such as NoNonArchitecturalCoreSharing can be set even when host SMT is enabled.
For example, when host use core-scheduling to group together vCPUs that run as sibling hyperthreads.
Therefore, I wouldn’t want to tie the feature-flag semantics to host SMT being enabled/disabled.
It’s just true that this bit can be set when host SMT is disabled.

>
> I share that concern about the naming, although I do see some
> value in exposing the cpu_smt_possible() result. I think it’s easier
> to state that something does not work than to state something does
> work.
>
> Also, with respect to mitigation, we may want to split the two cases
> that Paolo outlined, i.e. have KVM_HINTS_REALTIME,
> KVM_HINTS_CORES_CROSSTALK and
> KVM_HINTS_CORES_LEAKING,
> where CORES_CROSSTALKS indicates there may be some
> cross-talk between what the guest thinks are isolated cores,
> and CORES_LEAKING indicates that cores may leak data
> to some other guest.
>
> The problem with my approach is that it is shouting “don’t trust me”
> a bit too loudly.

I don’t see a value in exposing CORES_LEAKING to guest. As guest have nothing to do with it.

-Liran









2019-11-08 15:36:37

by Christophe de Dinechin

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy



> On 7 Nov 2019, at 16:02, Liran Alon <[email protected]> wrote:
>
>
>
>> On 7 Nov 2019, at 16:00, Christophe de Dinechin <[email protected]> wrote:
>>
>
>>
>> I share that concern about the naming, although I do see some
>> value in exposing the cpu_smt_possible() result. I think it’s easier
>> to state that something does not work than to state something does
>> work.
>>
>> Also, with respect to mitigation, we may want to split the two cases
>> that Paolo outlined, i.e. have KVM_HINTS_REALTIME,
>> KVM_HINTS_CORES_CROSSTALK and
>> KVM_HINTS_CORES_LEAKING,
>> where CORES_CROSSTALKS indicates there may be some
>> cross-talk between what the guest thinks are isolated cores,
>> and CORES_LEAKING indicates that cores may leak data
>> to some other guest.
>>
>> The problem with my approach is that it is shouting “don’t trust me”
>> a bit too loudly.
>
> I don’t see a value in exposing CORES_LEAKING to guest. As guest have nothing to do with it.

The guest could display / expose the information to guest sysadmins
and admin tools (e.g. through /proc).

While the kernel cannot mitigate, a higher-level product could for example
have a policy about which workloads can be deployed on a system which
may leak data to other VMs.

Christophe

2019-11-08 15:56:54

by Liran Alon

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy



> On 8 Nov 2019, at 17:35, Christophe de Dinechin <[email protected]> wrote:
>
>
>
>> On 7 Nov 2019, at 16:02, Liran Alon <[email protected]> wrote:
>>
>>
>>
>>> On 7 Nov 2019, at 16:00, Christophe de Dinechin <[email protected]> wrote:
>>>
>>
>>>
>>> I share that concern about the naming, although I do see some
>>> value in exposing the cpu_smt_possible() result. I think it’s easier
>>> to state that something does not work than to state something does
>>> work.
>>>
>>> Also, with respect to mitigation, we may want to split the two cases
>>> that Paolo outlined, i.e. have KVM_HINTS_REALTIME,
>>> KVM_HINTS_CORES_CROSSTALK and
>>> KVM_HINTS_CORES_LEAKING,
>>> where CORES_CROSSTALKS indicates there may be some
>>> cross-talk between what the guest thinks are isolated cores,
>>> and CORES_LEAKING indicates that cores may leak data
>>> to some other guest.
>>>
>>> The problem with my approach is that it is shouting “don’t trust me”
>>> a bit too loudly.
>>
>> I don’t see a value in exposing CORES_LEAKING to guest. As guest have nothing to do with it.
>
> The guest could display / expose the information to guest sysadmins
> and admin tools (e.g. through /proc).
>
> While the kernel cannot mitigate, a higher-level product could for example
> have a policy about which workloads can be deployed on a system which
> may leak data to other VMs.
>
> Christophe

Honestly, I don’t think any sane cloud provider will schedule vCPUs of different guests on same physical CPU core and report this to guest.
Therefore, I think this is only relevant for use-cases where the guest owner is also the host/hypervisor owner. And therefore, doesn’t need this
information exposed in a CPUID bit.
I see your point regarding how in theory it could be used, but I think we should wait and see if such use-case exists before defining this interface.

-Liran


2019-11-20 12:57:49

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

Hi Paolo,
On Wed, 6 Nov 2019 at 16:34, Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Nov 05, 2019 at 03:25:28PM -0800, Sean Christopherson wrote:
> > On Tue, Nov 05, 2019 at 09:02:18PM +0100, Peter Zijlstra wrote:
> > > On Tue, Nov 05, 2019 at 05:17:37PM +0100, Vitaly Kuznetsov wrote:
> > > > Virtualized guests may pick a different strategy to mitigate hardware
> > > > vulnerabilities when it comes to hyper-threading: disable SMT completely,
> > > > use core scheduling, or, for example, opt in for STIBP. Making the
> > > > decision, however, requires an extra bit of information which is currently
> > > > missing: does the topology the guest see match hardware or if it is 'fake'
> > > > and two vCPUs which look like different cores from guest's perspective can
> > > > actually be scheduled on the same physical core. Disabling SMT or doing
> > > > core scheduling only makes sense when the topology is trustworthy.
> > > >
> > > > Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> > > > that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> > > > topology is actually trustworthy. It would, of course, be possible to get
> > > > away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> > > > compatibility but the current approach looks more straightforward.
> > >
> > > The only way virt topology can make any sense what so ever is if the
> > > vcpus are pinned to physical CPUs.
> > >
> > > And I was under the impression we already had a bit for that (isn't it
> > > used to disable paravirt spinlocks and the like?). But I cannot seem to
> > > find it in a hurry.
> >
> > Yep, KVM_HINTS_REALTIME does what you describe.
>
> *sigh*, that's a pretty shit name for it :/

My original commit name this to KVM_HINTS_DEDICATED, commit a4429e53c
(KVM: Introduce paravirtualization hints and KVM_HINTS_DEDICATED),
could we revert the KVM_HINTS_REALTIME renaming? A lot of guys
confused by this renaming now, Peterz, Marcelo ("The previous
definition was much better IMO: HINTS_DEDICATED".
https://lkml.org/lkml/2019/8/26/855).

Wanpeng

2019-12-06 04:02:43

by Ankur Arora

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On 2019-11-05 3:56 p.m., Paolo Bonzini wrote:
> On 05/11/19 17:17, Vitaly Kuznetsov wrote:
>> There is also one additional piece of the information missing. A VM can be
>> sharing physical cores with other VMs (or other userspace tasks on the
>> host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
>> not? It is unclear if this changes anything and can probably be left out
>> of scope (just don't do that).
>>
>> Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
>> enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
>> !cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
>> CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
>> extra work (like CPU pinning and exposing the correct topology) before
>> passing '1' to the guest.
>>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
>> arch/x86/include/uapi/asm/kvm_para.h | 2 ++
>> arch/x86/kvm/cpuid.c | 7 ++++++-
>> 3 files changed, 27 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
>> index 01b081f6e7ea..64b94103fc90 100644
>> --- a/Documentation/virt/kvm/cpuid.rst
>> +++ b/Documentation/virt/kvm/cpuid.rst
>> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
>> before using paravirtualized
>> sched yield.
>>
>> +KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
>> + topology is trustworthy' hint
>> + (KVM_HINTS_TRUSTWORTHY_SMT).
>> +
>
> Instead of defining a one-off bit, can we make:
>
> ecx = the set of known "hints" (defaults to edx if zero)
>
> edx = the set of hints that apply to the virtual machine
Just to resurrect this thread, the guest could explicitly ACK
a KVM_FEATURE_DYNAMIC_HINT at init. This would allow the host
to change the hints whenever with the guest not needing to separately
ACK the changed hints.


Ankur

>
> Paolo
>
6

2019-12-06 13:47:49

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

Ankur Arora <[email protected]> writes:

> On 2019-11-05 3:56 p.m., Paolo Bonzini wrote:
>> On 05/11/19 17:17, Vitaly Kuznetsov wrote:
>>> There is also one additional piece of the information missing. A VM can be
>>> sharing physical cores with other VMs (or other userspace tasks on the
>>> host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
>>> not? It is unclear if this changes anything and can probably be left out
>>> of scope (just don't do that).
>>>
>>> Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
>>> enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
>>> !cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
>>> CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
>>> extra work (like CPU pinning and exposing the correct topology) before
>>> passing '1' to the guest.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>>> ---
>>> Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
>>> arch/x86/include/uapi/asm/kvm_para.h | 2 ++
>>> arch/x86/kvm/cpuid.c | 7 ++++++-
>>> 3 files changed, 27 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
>>> index 01b081f6e7ea..64b94103fc90 100644
>>> --- a/Documentation/virt/kvm/cpuid.rst
>>> +++ b/Documentation/virt/kvm/cpuid.rst
>>> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
>>> before using paravirtualized
>>> sched yield.
>>>
>>> +KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
>>> + topology is trustworthy' hint
>>> + (KVM_HINTS_TRUSTWORTHY_SMT).
>>> +
>>
>> Instead of defining a one-off bit, can we make:
>>
>> ecx = the set of known "hints" (defaults to edx if zero)
>>
>> edx = the set of hints that apply to the virtual machine
>>
> Just to resurrect this thread, the guest could explicitly ACK
> a KVM_FEATURE_DYNAMIC_HINT at init. This would allow the host
> to change the hints whenever with the guest not needing to separately
> ACK the changed hints.

(I apologize for dropping the ball on this, I'm intended to do RFCv2 in
a nearby future)

Regarding this particular hint (let's call it 'no nonarchitectural
coresharing' for now) I don't see much value in communicating change to
guest when it happens. Imagine our host for some reason is not able to
guarantee that anymore e.g. we've migrated to a host with less pCPUs
and/or special restrictions and have to start sharing. What we, as a
guest, are supposed to do when we receive a notification? "You're now
insecure, deal with it!" :-) Equally, I don't see much value in
pre-acking such change. "I'm fine with becoming insecure at some point".

If we, however, discuss other hints such 'pre-ACK' mechanism may make
sense, however, I'd make it an option to a 'challenge/response'
protocol: if host wants to change a hint it notifies the guest and waits
for an ACK from it (e.g. a pair of MSRs + an interrupt). I, however,
have no good candidate from the existing hints which would require guest
to ACK (e.g revoking PV EOI would probably do but why would we do that?)
As I said before, challenge/response protocol is needed if we'd like to
make TSC frequency change the way Hyper-V does it (required for updating
guest TSC pages in nested case) but this is less and less important with
the appearance of TSC scaling. I'm still not sure if this is an
over-engineering or not. We can wait for the first good candidate to
decide.

--
Vitaly

2019-12-06 20:32:42

by Ankur Arora

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy



On 12/6/19 5:46 AM, Vitaly Kuznetsov wrote:
> Ankur Arora <[email protected]> writes:
>
>> On 2019-11-05 3:56 p.m., Paolo Bonzini wrote:
>>> On 05/11/19 17:17, Vitaly Kuznetsov wrote:
>>>> There is also one additional piece of the information missing. A VM can be
>>>> sharing physical cores with other VMs (or other userspace tasks on the
>>>> host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
>>>> not? It is unclear if this changes anything and can probably be left out
>>>> of scope (just don't do that).
>>>>
>>>> Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
>>>> enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
>>>> !cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
>>>> CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
>>>> extra work (like CPU pinning and exposing the correct topology) before
>>>> passing '1' to the guest.
>>>>
>>>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>>>> ---
>>>> Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
>>>> arch/x86/include/uapi/asm/kvm_para.h | 2 ++
>>>> arch/x86/kvm/cpuid.c | 7 ++++++-
>>>> 3 files changed, 27 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
>>>> index 01b081f6e7ea..64b94103fc90 100644
>>>> --- a/Documentation/virt/kvm/cpuid.rst
>>>> +++ b/Documentation/virt/kvm/cpuid.rst
>>>> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
>>>> before using paravirtualized
>>>> sched yield.
>>>>
>>>> +KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
>>>> + topology is trustworthy' hint
>>>> + (KVM_HINTS_TRUSTWORTHY_SMT).
>>>> +
>>>
>>> Instead of defining a one-off bit, can we make:
>>>
>>> ecx = the set of known "hints" (defaults to edx if zero)
>>>
>>> edx = the set of hints that apply to the virtual machine
>>>
>> Just to resurrect this thread, the guest could explicitly ACK
>> a KVM_FEATURE_DYNAMIC_HINT at init. This would allow the host
>> to change the hints whenever with the guest not needing to separately
>> ACK the changed hints.
>
> (I apologize for dropping the ball on this, I'm intended to do RFCv2 in
> a nearby future)
>
> Regarding this particular hint (let's call it 'no nonarchitectural
> coresharing' for now) I don't see much value in communicating change to
> guest when it happens. Imagine our host for some reason is not able to
> guarantee that anymore e.g. we've migrated to a host with less pCPUs
> and/or special restrictions and have to start sharing. What we, as a
> guest, are supposed to do when we receive a notification? "You're now
> insecure, deal with it!" :-) Equally, I don't see much value in
> pre-acking such change. "I'm fine with becoming insecure at some point".
True, for that use-case pre-ACK seems like exactly the thing you would
not want.
I do see some value in the guest receiving the notification though.
Maybe it could print a big fat printk or something :). Or, it could
change to a different security-policy-that-I-just-made-up.


> If we, however, discuss other hints such 'pre-ACK' mechanism may make
> sense, however, I'd make it an option to a 'challenge/response'
> protocol: if host wants to change a hint it notifies the guest and waits
> for an ACK from it (e.g. a pair of MSRs + an interrupt). I, however,
My main reason for this 'pre-ACK' approach is some discomfort with
changing the CPUID edx from under the guest.

The MSR+interrupt approach would work as well but then we have the
same set of hints spread across CPUID and the MSR. What do you think
is the right handling for a guest that refuses to ACK the MSR?

> have no good candidate from the existing hints which would require guest
> to ACK (e.g revoking PV EOI would probably do but why would we do that?)
> As I said before, challenge/response protocol is needed if we'd like to
> make TSC frequency change the way Hyper-V does it (required for updating
> guest TSC pages in nested case) but this is less and less important with
> the appearance of TSC scaling. I'm still not sure if this is an
> over-engineering or not. We can wait for the first good candidate to
> decide.
As we've discussed offlist, the particular hint I'm interested in is
KVM_HINT_REALTIME. That's not a particularly good candidate though
because there's no correctness problem if the host does switch it
off suddenly.


Ankur

2019-12-09 09:17:04

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

On 06/12/19 21:31, Ankur Arora wrote:
>> If we, however, discuss other hints such 'pre-ACK' mechanism may make
>> sense, however, I'd make it an option to a 'challenge/response'
>> protocol: if host wants to change a hint it notifies the guest and waits
>> for an ACK from it (e.g. a pair of MSRs + an interrupt). I, however,
>
> My main reason for this 'pre-ACK' approach is some discomfort with
> changing the CPUID edx from under the guest.

Changing the CPUID is fine, if we document which CPUID can change.
There are CPUID leaves that change at runtime, for example in leaf 0Dh
(though in that case it's based on XCR0 and not on external circumstances).

> As we've discussed offlist, the particular hint I'm interested in is
> KVM_HINT_REALTIME. That's not a particularly good candidate though
> because there's no correctness problem if the host does switch it
> off suddenly.

Or perhaps it's a good candidate, exactly because there's no correctness
problem. For SMT topology, there are security issues if the host
doesn't respect it anymore, so making it changeable is of limited utility.

Paolo