2020-02-14 14:33:59

by Erwan Velu

[permalink] [raw]
Subject: [PATCH] kvm: x86: Print "disabled by bios" only once per host

The current behavior is to print a "disabled by bios" message per CPU thread.
As modern CPUs can have up to 64 cores, 128 on a dual socket, and turns this
printk to be a pretty noisy by showing up to 256 times the same line in a row.

This patch offer to only print the message once per host considering the BIOS will
disabled the feature for all sockets/cores at once and not on a per core basis.

Signed-off-by: Erwan Velu <[email protected]>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fbabb2f06273..8f0d7a09d453 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7300,7 +7300,7 @@ int kvm_arch_init(void *opaque)
goto out;
}
if (ops->disabled_by_bios()) {
- printk(KERN_ERR "kvm: disabled by bios\n");
+ printk_once(KERN_ERR "kvm: disabled by bios\n");
r = -EOPNOTSUPP;
goto out;
}
--
2.24.1


2020-02-14 17:06:22

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

On Fri, Feb 14, 2020 at 03:30:35PM +0100, Erwan Velu wrote:
> The current behavior is to print a "disabled by bios" message per CPU thread.
> As modern CPUs can have up to 64 cores, 128 on a dual socket, and turns this
> printk to be a pretty noisy by showing up to 256 times the same line in a row.
>
> This patch offer to only print the message once per host considering the BIOS will
> disabled the feature for all sockets/cores at once and not on a per core basis.

This has come up before[*]. Using _once() doesn't fully solve the issue
when KVM is built as a module. The spam is more than likely a userspace
bug, i.e. userspace is probing KVM on every CPU.

[*] https://lkml.kernel.org/r/[email protected]

> Signed-off-by: Erwan Velu <[email protected]>
> ---
> arch/x86/kvm/x86.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fbabb2f06273..8f0d7a09d453 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7300,7 +7300,7 @@ int kvm_arch_init(void *opaque)
> goto out;
> }
> if (ops->disabled_by_bios()) {
> - printk(KERN_ERR "kvm: disabled by bios\n");
> + printk_once(KERN_ERR "kvm: disabled by bios\n");
> r = -EOPNOTSUPP;
> goto out;
> }
> --
> 2.24.1
>

2020-02-14 17:44:43

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

On Fri, Feb 14, 2020 at 6:33 AM Erwan Velu <[email protected]> wrote:
>
> The current behavior is to print a "disabled by bios" message per CPU thread.
> As modern CPUs can have up to 64 cores, 128 on a dual socket, and turns this
> printk to be a pretty noisy by showing up to 256 times the same line in a row.
>
> This patch offer to only print the message once per host considering the BIOS will
> disabled the feature for all sockets/cores at once and not on a per core basis.

That's quite an assumption you're making. I guess I've seen more
broken BIOSes than you have. :-) Still, I would rather see the message
just once--perhaps with an additional warning if all logical
processors aren't in agreement.

2020-02-18 16:30:45

by Erwan Velu

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

On 14/02/2020 18:05, Sean Christopherson wrote:
> This has come up before[*]. Using _once() doesn't fully solve the issue
> when KVM is built as a module. The spam is more than likely a userspace
> bug, i.e. userspace is probing KVM on every CPU.

I made some progress on this.


That's "/usr/bin/udevadm trigger --type=devices --action=add" the culprit.

It does echo "add" in /sys/devices/system/cpu/cpu<x>/uevent

For the each cpu, it does the 'add' which trigger the "disabled by bios"
message from kvm_arch_init.

Note that doing a "add" on the same processor will trigger the same
message at every "add" event.


So I tried the patch of using pr_err_once() instead of printk() and the
behavior is fine : despite the number of "add" generated, there is a
single line being printed out.

Without the patch, every "add" generates the "disabled by bios" message.


So the question is : do we want to handle the case where a possible bios
missed the configuration of some cores ?

If no, then the patch is fine and could be submitted. I don't see the
need of printing this message at every call as it pollute the kernel log.

If yes, then we need to keep a trace of the number of enabled/disabled
cores so we can report a mismatch. As this message seems printed per
cpu, that would kind of mean a global variable right ?


What are your recommendations on this ?


Erwan,

2020-02-18 18:48:21

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

On Tue, Feb 18, 2020 at 05:28:49PM +0100, Erwan Velu wrote:
> On 14/02/2020 18:05, Sean Christopherson wrote:
> >This has come up before[*]. Using _once() doesn't fully solve the issue
> >when KVM is built as a module. The spam is more than likely a userspace
> >bug, i.e. userspace is probing KVM on every CPU.
>
> I made some progress on this.
>
>
> That's "/usr/bin/udevadm trigger --type=devices --action=add" the culprit.
>
> It does echo "add" in /sys/devices/system/cpu/cpu<x>/uevent
>
> For the each cpu, it does the 'add' which trigger the "disabled by bios"
> message from kvm_arch_init.
>
> Note that doing a "add" on the same processor will trigger the same message
> at every "add" event.
>
>
> So I tried the patch of using pr_err_once() instead of printk() and the
> behavior is fine : despite the number of "add" generated, there is a single
> line being printed out.
>
> Without the patch, every "add" generates the "disabled by bios" message.

That's a sort of unintentional side effect of KVM being split into two
modules, kvm and kvm_{intel,amd}. E.g. if userspace did 'rmmod kvm' on
failure of 'modprobe kvm_intel' then using _*once() would be ineffective.

> So the question is : do we want to handle the case where a possible bios
> missed the configuration of some cores ?

That's a question for AMD/SVM. Starting with kernel 5.6, Intel/VMX checks
for BIOS enabling on all CPUs.

That being said, checking for correct BIOS configuration on all CPUs is
orthogonal to this print statement issue. Probing kvm_intel on every CPU
doesn't do anything to address a misoncifgured BIOS, e.g. if VMX/SVM is
fully supported on CPU0 then additional probes of kvm_{intel,amd} are nops,
they don't actually check for support on other CPUs.

> If no, then the patch is fine and could be submitted. I don't see the need
> of printing this message at every call as it pollute the kernel log.
>
> If yes, then we need to keep a trace of the number of enabled/disabled cores
> so we can report a mismatch. As this message seems printed per cpu, that
> would kind of mean a global variable right ?
>
> What are your recommendations on this ?

Fix userspace to only do the "add" on one CPU.

Changing kvm_arch_init() to use pr_err_once() for the disabled_by_bios()
case "works", but it's effectively a hack to workaround a flawed userspace.

2020-02-19 11:21:48

by Erwan Velu

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

On 18/02/2020 19:48, Sean Christopherson wrote:
[...]
> Fix userspace to only do the "add" on one CPU.
>
> Changing kvm_arch_init() to use pr_err_once() for the disabled_by_bios()
> case "works", but it's effectively a hack to workaround a flawed userspace.

I'll see with the user space tool to sort this out.

I'm also considering how "wrong" is what they do: udevadm trigger is
generating 3500  "uevent add" on my system and I only noticed kvm to
print this noisy message.

So as the print once isn't that "wrong" neither, this simple patch would
avoid polluting the kernel logs.


So my proposal would be

- have this simple patch on the kernel side to avoid having userspace
apps polluting logs

- contacting the udev people to see the rational & fix it too : I'll do that


As you said, once probed, there is no need reprinting the same message
again as the situation cannot have changed.

As we are on the preliminary return code path (to make a EOPNOTSUPP), I
would vote for protecting the print against multiple calls/prints.

The kernel patch isn't impacting anyone (in a regular case) and just
avoid pollution.


Would you agree on that ?

Erwan,

2020-02-19 11:34:05

by Erwan Velu

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

On 19/02/2020 12:19, Erwan Velu wrote:
[...]
> - contacting the udev people to see the rational & fix it too : I'll
> do that

I created https://github.com/systemd/systemd/issues/14906, if some want
to participate/follow the discussions with systemd.

Erwan,

2020-02-19 16:19:21

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

On Wed, Feb 19, 2020 at 12:19:01PM +0100, Erwan Velu wrote:
> On 18/02/2020 19:48, Sean Christopherson wrote:
> [...]
> >Fix userspace to only do the "add" on one CPU.
> >
> >Changing kvm_arch_init() to use pr_err_once() for the disabled_by_bios()
> >case "works", but it's effectively a hack to workaround a flawed userspace.
>
> I'll see with the user space tool to sort this out.
>
> I'm also considering how "wrong" is what they do: udevadm trigger is
> generating 3500? "uevent add" on my system and I only noticed kvm to print
> this noisy message.
>
> So as the print once isn't that "wrong" neither, this simple patch would
> avoid polluting the kernel logs.
>
>
> So my proposal would be
>
> - have this simple patch on the kernel side to avoid having userspace apps
> polluting logs
>
> - contacting the udev people to see the rational & fix it too : I'll do that
>
>
> As you said, once probed, there is no need reprinting the same message again
> as the situation cannot have changed.

For this exact scenario, on Intel/VMX, this is mostly true. But, the MSR
check for AMD/SVM has a disable bit that takes effect irrespective of the
MSR's locked bit, i.e. SVM could theoretically change state without any
super special behavior.

Even on Intel, the state can potentially change, especially on a system
with a misbehaving BIOS. FEATURE_CONTROL is cleared on CPU RESET, e.g. VMX
enabling could change if BIOS "forgets" to reinitialize the MSR upon waking
from S3 (suspend). Things get really weird if we consider the case where
BIOS leaves the MSR unlocked after S3, the user manually writes the MSR,
and then it gets cleared again on a different S3 transition.

SVM is even more sensitive because VM_CR is cleared on INIT, not just RESET.

> As we are on the preliminary return code path (to make a EOPNOTSUPP), I
> would vote for protecting the print against multiple calls/prints.
>
> The kernel patch isn't impacting anyone (in a regular case) and just avoid
> pollution.
>
> Would you agree on that ?

Sadly, no. Don't get me wrong, I completely agree that, ideally, KVM would
not spam the log, even when presented with a misbehaving userspace.

My hesitation about changing the error message to pr_err_once() isn't so
much about right versus wrong as it is about creating misleading and
potentially confusing code in KVM, and setting a precedent that I don't
think we want to carry forward.

E.g. the _once() doesn't hold true if module kvm is unloaded and other
error messages such as basic CPU support would still be unlimited. The
basic CPU support message definitely should *not* be _once() as that would
squash messages when loading the wrong vendor module.

As for setting a precedent, moving the error message to the vendor module
or making kvm a monolithic module would "break" the _once() behavior.

And, the current systemd behavior is actually quite dangerous, e.g. on a
misconfigured system where SVM is enabled on a subset of CPUs, probing KVM
on every CPU essentially guarantees that KVM will be loaded on a broken
system. In that case, I think we actually want the spam. Note, as of
kernel 5.6, this doesn't apply to VMX as kvm_intel will no longer load on a
misconfigured system since FEATURE_CONTROL configuration is incorporated
into the per-CPU checks.

All of that being said, what about converting all of the error messages to
pr_err_ratelimited()? That would take the edge off this particular problem,
wouldn't create incosistencies between error messages, and won't completely
squash error messages in corner case scenarios on misconfigured systems.

2020-02-19 16:56:41

by Erwan Velu

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

[..]

On 19/02/2020 17:18, Sean Christopherson wrote:
> All of that being said, what about converting all of the error messages to
> pr_err_ratelimited()? That would take the edge off this particular problem,
> wouldn't create incosistencies between error messages, and won't completely
> squash error messages in corner case scenarios on misconfigured systems.

Thanks Sean for your very detailed answer.

I've been testing the ratelimited which is far better but still prints
12 messages.

I saw the ratelimit is on about 5 sec, I wonder if we can explicit a
longer one for this one.

I searched around this but it doesn't seems that hacking the delay is a
common usage.

Do you have any insights/ideas around that ?


Switching to ratelimit could be done by replacing the actual call or add
a macro similar to  kvm_pr_unimpl() so it can be reused easily.

I can offer the patches for that.


Erwan,


2020-02-19 17:45:34

by Erwan Velu

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host


On 19/02/2020 12:32, Erwan Velu wrote:
> On 19/02/2020 12:19, Erwan Velu wrote:
> [...]
>> - contacting the udev people to see the rational & fix it too : I'll
>> do that
>
> I created https://github.com/systemd/systemd/issues/14906, if some
> want to participate/follow the discussions with systemd.
>
It appear that its fixed on recent systemd, I tried at least starting
from v239 this is fixed. I didn't tried earlier releases as they are
painful to build.

To me that doesn't means that the ratelimited printing have no interest
for all existing systems unless we weren't at least two reporting this ;)

Erwan,

2020-02-19 17:51:33

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

On 19/02/20 17:53, Erwan Velu wrote:
>
>
> I've been testing the ratelimited which is far better but still prints
> 12 messages.

12 is already much better than 256. Someone else will have an even
bigger system requiring a larger delay, so I'd go with the default.

Paolo

> I saw the ratelimit is on about 5 sec, I wonder if we can explicit a
> longer one for this one.
>
> I searched around this but it doesn't seems that hacking the delay is a
> common usage.
>
> Do you have any insights/ideas around that ?
>
>
> Switching to ratelimit could be done by replacing the actual call or add
> a macro similar to  kvm_pr_unimpl() so it can be reused easily.

2020-02-19 21:52:48

by Erwan Velu

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: Print "disabled by bios" only once per host

I'll send a patch in this direction.
Thanks,

Le mer. 19 févr. 2020 à 18:51, Paolo Bonzini <[email protected]> a écrit :
>
> On 19/02/20 17:53, Erwan Velu wrote:
> >
> >
> > I've been testing the ratelimited which is far better but still prints
> > 12 messages.
>
> 12 is already much better than 256. Someone else will have an even
> bigger system requiring a larger delay, so I'd go with the default.
>
> Paolo
>
> > I saw the ratelimit is on about 5 sec, I wonder if we can explicit a
> > longer one for this one.
> >
> > I searched around this but it doesn't seems that hacking the delay is a
> > common usage.
> >
> > Do you have any insights/ideas around that ?
> >
> >
> > Switching to ratelimit could be done by replacing the actual call or add
> > a macro similar to kvm_pr_unimpl() so it can be reused easily.
>

2020-02-27 18:03:35

by Erwan Velu

[permalink] [raw]
Subject: [PATCH v2] kvm: x86: Limit the number of "kvm: disabled by bios" messages

In older version of systemd(219), at boot time, udevadm is called with :
/usr/bin/udevadm trigger --type=devices --action=add"

This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent,
leading to the "kvm: disabled by bios" message in case of your Bios disabled
the virtualization extensions.

On a modern system running up to 256 CPU threads, this pollutes the Kernel logs.

This patch offers to ratelimit this message to avoid any userspace program triggering
this uevent printing this message too often.

This patch is only a workaround but greatly reduce the pollution without
breaking the current behavior of printing a message if some try to instantiate
KVM on a system that doesn't support it.

Note that recent versions of systemd (>239) do not have trigger this behavior.

This patch will be useful at least for some using older systemd with recent Kernels.

Signed-off-by: Erwan Velu <[email protected]>
---
arch/x86/kvm/x86.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 359fcd395132..c8a90231befe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7308,12 +7308,12 @@ int kvm_arch_init(void *opaque)
}

if (!ops->cpu_has_kvm_support()) {
- printk(KERN_ERR "kvm: no hardware support\n");
+ pr_err_ratelimited("kvm: no hardware support\n");
r = -EOPNOTSUPP;
goto out;
}
if (ops->disabled_by_bios()) {
- printk(KERN_ERR "kvm: disabled by bios\n");
+ pr_err_ratelimited("kvm: disabled by bios\n");
r = -EOPNOTSUPP;
goto out;
}
--
2.24.1

2020-02-28 10:39:16

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2] kvm: x86: Limit the number of "kvm: disabled by bios" messages

On 27/02/20 19:00, Erwan Velu wrote:
> In older version of systemd(219), at boot time, udevadm is called with :
> /usr/bin/udevadm trigger --type=devices --action=add"
>
> This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent,
> leading to the "kvm: disabled by bios" message in case of your Bios disabled
> the virtualization extensions.
>
> On a modern system running up to 256 CPU threads, this pollutes the Kernel logs.
>
> This patch offers to ratelimit this message to avoid any userspace program triggering
> this uevent printing this message too often.
>
> This patch is only a workaround but greatly reduce the pollution without
> breaking the current behavior of printing a message if some try to instantiate
> KVM on a system that doesn't support it.
>
> Note that recent versions of systemd (>239) do not have trigger this behavior.
>
> This patch will be useful at least for some using older systemd with recent Kernels.
>
> Signed-off-by: Erwan Velu <[email protected]>
> ---
> arch/x86/kvm/x86.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 359fcd395132..c8a90231befe 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7308,12 +7308,12 @@ int kvm_arch_init(void *opaque)
> }
>
> if (!ops->cpu_has_kvm_support()) {
> - printk(KERN_ERR "kvm: no hardware support\n");
> + pr_err_ratelimited("kvm: no hardware support\n");
> r = -EOPNOTSUPP;
> goto out;
> }
> if (ops->disabled_by_bios()) {
> - printk(KERN_ERR "kvm: disabled by bios\n");
> + pr_err_ratelimited("kvm: disabled by bios\n");
> r = -EOPNOTSUPP;
> goto out;
> }
>

Queued, thanks.

Paolo