2020-06-12 10:54:49

by Borislav Petkov

[permalink] [raw]
Subject: [RFC PATCH] x86/msr: Filter MSR writes

Hi,

so this has been popping up from time to time in the last couple of
years so let's have a go at it. The reason for it is explained in the
commit message below but basically the goal is to have MSR writes
disabled by default on distros and on the general Linux setup and only
those who know what they're doing, can reenable them if they really need
to.

The other, more important goal is to stop the perpetuation of poking at
naked MSRs from userspace instead of defining proper functionality and
interfaces which synchronize with the kernel's access to those MSRs.

Hell, even while testing this, I did

# wrmsr --all 0x10 12; rdmsr --all 0x10

0x10 being the TSC MSR and the module wouldn't unload after that. And
this is just the latest example of what can go wrong.

Now, a concern with questionable validity has been raised that this
might break "some tools on github" which poke directly at MSRs. And we
don't break userspace.

Well, for starters, the functionality is still there - it is just behind
a module parameter. Then, it'll hopefully convince people providing the
functionality controlled by those MSRs, to design proper interfaces for
that.

For example, the whitelisted MSR_IA32_ENERGY_PERF_BIAS is used by
cpupower in tools/. The hope is that this gets converted to a proper
interface too.

In any case, let me add Linus as I might be missing some angle here.

Thx.

--

Disable writing to MSRs from userspace by default. Writes can still be
allowed by supplying the allow_writes=1 module parameter and the kernel
will be tainted so that it shows in oopses.

Having unfettered access to all MSRs on a system is and has always been
a disaster waiting to happen. Think performance counter MSRs, MSRs with
sticky or locked bits, MSRs making major system changes like loading
microcode, MTRRs, PAT configuration, TSC counter, security mitigations
MSRs, you name it.

This also destroys all the kernel's caching of MSR values for
performance, as the recent case with MSR_AMD64_LS_CFG showed.

Another example is writing MSRs by mistake by simply typing the wrong
MSR address. System freezes have been experienced that way.

In general, poking at MSRs under the kernel's feet is a bad bad idea.

So disable poking directly at the MSRs by default. If userspace still
wants to do that, then proper interfaces should be defined which
are under the kernel's control and accesses to those MSRs can be
synchronized and sanitized properly.

Signed-off-by: Borislav Petkov <[email protected]>

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 1547be359d7f..931e7b00ffb7 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -41,6 +41,7 @@

static struct class *msr_class;
static enum cpuhp_state cpuhp_msr_state;
+static bool allow_writes;

static ssize_t msr_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -70,6 +71,18 @@ static ssize_t msr_read(struct file *file, char __user *buf,
return bytes ? bytes : err;
}

+static int filter_write(u32 reg)
+{
+ switch (reg) {
+ case MSR_IA32_ENERGY_PERF_BIAS:
+ return 0;
+
+ default:
+ pr_err("%s: Filter out MSR write to 0x%x\n", __func__, reg);
+ return -EPERM;
+ }
+}
+
static ssize_t msr_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
@@ -84,6 +97,12 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
if (err)
return err;

+ if (!allow_writes) {
+ err = filter_write(reg);
+ if (err)
+ return err;
+ }
+
if (count % 8)
return -EINVAL; /* Invalid chunk size */

@@ -95,11 +114,18 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
if (err)
break;
+
tmp += 2;
bytes += 8;
}

- return bytes ? bytes : err;
+ if (bytes) {
+ add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+
+ return bytes;
+ }
+
+ return err;
}

static long msr_ioctl(struct file *file, unsigned int ioc, unsigned long arg)
@@ -242,6 +268,8 @@ static void __exit msr_exit(void)
}
module_exit(msr_exit)

+module_param(allow_writes, bool, 0400);
+
MODULE_AUTHOR("H. Peter Anvin <[email protected]>");
MODULE_DESCRIPTION("x86 generic MSR driver");
MODULE_LICENSE("GPL");


2020-06-12 16:36:33

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 12:50:26PM +0200, Borislav Petkov wrote:
> @@ -95,11 +114,18 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
> err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
> if (err)
> break;
> +
> tmp += 2;
> bytes += 8;
> }
>
> - return bytes ? bytes : err;
> + if (bytes) {
> + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);

The kernel should be tainted if the WRMSR is attempted, regardless of
whether it succeeds, and it should happen before the WRMSR. E.g. pointing
MSR_IA32_DS_AREA at a bad address will likely cause an OOPS on the #PF
that would occur the next time the CPU attempts to access the area, which
could happen before wrmsr_safe_on_cpu() even returns. In general, there's
no telling what microcode magic is buried behind WRMSR, i.e. a fault on
WRMSR is not a good indicator that the CPU is still in a sane state.

It might also make sense to do pr_err/warn_ratelimited() before the WRMSR,
e.g. to help triage MSR writes that cause insta-death or lead to known bad
behavior down the line.

> +
> + return bytes;
> + }
> +
> + return err;
> }
>
> static long msr_ioctl(struct file *file, unsigned int ioc, unsigned long arg)
> @@ -242,6 +268,8 @@ static void __exit msr_exit(void)
> }
> module_exit(msr_exit)
>
> +module_param(allow_writes, bool, 0400);

This can be 0600, or maybe 0644, i.e. allow the user to enable/disable
writes after the module has been loaded.

> +
> MODULE_AUTHOR("H. Peter Anvin <[email protected]>");
> MODULE_DESCRIPTION("x86 generic MSR driver");
> MODULE_LICENSE("GPL");

2020-06-12 16:49:26

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 09:34:06AM -0700, Sean Christopherson wrote:
> The kernel should be tainted if the WRMSR is attempted, regardless of
> whether it succeeds, and it should happen before the WRMSR. E.g. pointing
> MSR_IA32_DS_AREA at a bad address will likely cause an OOPS on the #PF

If the MSR write fails, MSR_IA32_DS_AREA won't have the bad address. If
the writes fail, nothing has been changed.

> This can be 0600, or maybe 0644, i.e. allow the user to enable/disable
> writes after the module has been loaded.

What for?

crw------- 1 root root 202, 0 Jun 10 19:54 /dev/cpu/0/msr

You need root to write to the chrdev.

Also, the intent is *not* to open it more but to close it so that the
incentive to design proper interfaces is there.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-06-12 17:01:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 06:46:02PM +0200, Borislav Petkov wrote:
> On Fri, Jun 12, 2020 at 09:34:06AM -0700, Sean Christopherson wrote:
> > The kernel should be tainted if the WRMSR is attempted, regardless of
> > whether it succeeds, and it should happen before the WRMSR. E.g. pointing
> > MSR_IA32_DS_AREA at a bad address will likely cause an OOPS on the #PF
>
> If the MSR write fails, MSR_IA32_DS_AREA won't have the bad address. If
> the writes fail, nothing has been changed.

DS_AREA takes a virtual (linear) address, i.e. the address can be legal from
the CPUs perspective but still lead to a #PF due to the address not being
mapped in the page tables.

> > This can be 0600, or maybe 0644, i.e. allow the user to enable/disable
> > writes after the module has been loaded.
>
> What for?

So users don't have to unload and reload the module just to enable or
disable writes. I don't think it changes the protections in any way, a
priveleged user still needs to explicitly toggle the control.

> crw------- 1 root root 202, 0 Jun 10 19:54 /dev/cpu/0/msr
>
> You need root to write to the chrdev.
>
> Also, the intent is *not* to open it more but to close it so that the
> incentive to design proper interfaces is there.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2020-06-12 17:06:20

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 09:57:09AM -0700, Sean Christopherson wrote:
> DS_AREA takes a virtual (linear) address, i.e. the address can be legal from
> the CPUs perspective but still lead to a #PF due to the address not being
> mapped in the page tables.

It's not that - peterz and tglx - and I assume you meant that too - you
all want to taint on the very *attempt* to WRMSR, regardless of whether
the MSR exists or not.

I don't necessarily agree with that because I don't think we should
taint when the MSR doesn't exist but if you all want it, sure, whatever.
I don't care that deeply.

> So users don't have to unload and reload the module just to enable or
> disable writes. I don't think it changes the protections in any way, a
> priveleged user still needs to explicitly toggle the control.

There's /sys/module/msr/parameters/. A privileged user can do whatever.
A non-privileged should not disable that.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-06-12 17:24:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 3:50 AM Borislav Petkov <[email protected]> wrote:
>
> Disable writing to MSRs from userspace by default. Writes can still be
> allowed by supplying the allow_writes=1 module parameter and the kernel
> will be tainted so that it shows in oopses.

Since you already added the filtering, this looks fairly sane.

IOW, what MSR's do we expect people to maybe write to normally? You
added MSR_IA32_ENERGY_PERF_BIAS as an allowed MST, maybe there are
others?

So I'm not against this, but I suspect the whitelist should be thought
through more, and then maybe the "allow_writes" parameter should be
yes/no/default/<msr-list>, where "default" is that list of
known-normal MSR's.

And I suspect it's a lot longer list than your single one. IIRC,
people were working around CPU bugs or features by doing MSR writes at
startup.

So the first phase might be to introduce this, but have the default
for non-recognized MSR's be "log", not "deny".

Linus

2020-06-12 17:45:22

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 07:03:03PM +0200, Borislav Petkov wrote:
> On Fri, Jun 12, 2020 at 09:57:09AM -0700, Sean Christopherson wrote:
> > DS_AREA takes a virtual (linear) address, i.e. the address can be legal from
> > the CPUs perspective but still lead to a #PF due to the address not being
> > mapped in the page tables.
>
> It's not that - peterz and tglx - and I assume you meant that too - you
> all want to taint on the very *attempt* to WRMSR, regardless of whether
> the MSR exists or not.
>
> I don't necessarily agree with that because I don't think we should
> taint when the MSR doesn't exist but if you all want it, sure, whatever.
> I don't care that deeply.

The problem is a fault on WRMSR doesn't mean the MSR doesn't exist, it only
means WRMSR faulted. WRMSR can for all intents and purpose trigger completely
arbitrary microcode flows, e.g. WRMSR 0x79 can fundamentally change the
behavior of the CPU.

And it's not like the WRMSR->taint is atomic, e.g. changing a platform scoped
MSR that affects voltage settings or something of that nature could easily
tank the system on a successful WRMSR before the kernel can be marked tainted.

> > So users don't have to unload and reload the module just to enable or
> > disable writes. I don't think it changes the protections in any way, a
> > priveleged user still needs to explicitly toggle the control.
>
> There's /sys/module/msr/parameters/. A privileged user can do whatever.
> A non-privileged should not disable that.

0400 only allows a privelged user to read the parameter, e.g. for parameters
that are snapshotted at module load time and/or changing the param while the
module is running would cause breakage.

0600 allows a priveleged user to read and write the parameter, which AFAICT
is safe here.

0644 allows a priveleged user to read and write the parameter, and allows an
unpriveleged user to read the param.

> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2020-06-12 17:51:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 10:20:03AM -0700, Linus Torvalds wrote:
> Since you already added the filtering, this looks fairly sane.
>
> IOW, what MSR's do we expect people to maybe write to normally? You
> added MSR_IA32_ENERGY_PERF_BIAS as an allowed MST, maybe there are
> others?

Right, this MSR is being written by cpupower in tools/. My search was
confined within the kernel source only so there very likely are others.

If we want to add others, though, I think the criteria would be, if
writing to a MSR is safe - and this is where it becomes fuzzy but to use
the above example: if all it does is give hints to the hw but the hw can
ignore those hints and there's no other effect on the hw, then those
writes are fairly safe.

I'm pretty sure we'll massage that definition of "safe" over time.

> So I'm not against this, but I suspect the whitelist should be thought
> through more,

Yap.

> and then maybe the "allow_writes" parameter should be
> yes/no/default/<msr-list>, where "default" is that list of
> known-normal MSR's.

So I know of another out-of-tree tool which has a whole whitelist
management of MSRs: https://github.com/LLNL/msr-safe

Question is, what use cases we want to support. My thinking is to start
small and then keep extending it based on use cases we want to support.

> And I suspect it's a lot longer list than your single one. IIRC,
> people were working around CPU bugs or features by doing MSR writes at
> startup.
>
> So the first phase might be to introduce this, but have the default
> for non-recognized MSR's be "log", not "deny".

Ok. How are we going to "learn" about those non-recognized MSRs? Ask
people to send us a note to lkml so that we can fix the list once they
see the logged message?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-06-12 17:55:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 10:43:07AM -0700, Sean Christopherson wrote:
> The problem is a fault on WRMSR doesn't mean the MSR doesn't exist, it only
> means WRMSR faulted. WRMSR can for all intents and purpose trigger completely
> arbitrary microcode flows, e.g. WRMSR 0x79 can fundamentally change the
> behavior of the CPU.

Yes, that case is in the commit message.

> And it's not like the WRMSR->taint is atomic, e.g. changing a platform scoped
> MSR that affects voltage settings or something of that nature could easily
> tank the system on a successful WRMSR before the kernel can be marked tainted.

Yes, yes, I'll taint before the WRMSR.

> 0400 only allows a privelged user to read the parameter, e.g. for parameters
> that are snapshotted at module load time and/or changing the param while the
> module is running would cause breakage.
>
> 0600 allows a priveleged user to read and write the parameter, which AFAICT
> is safe here.

Ok, we can do that.

> 0644 allows a priveleged user to read and write the parameter, and allows an
> unpriveleged user to read the param.

Not so sure about that. Why would the unprivileged user need to read it?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-06-12 19:50:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 07:48:01PM +0200, Borislav Petkov wrote:
> > So the first phase might be to introduce this, but have the default
> > for non-recognized MSR's be "log", not "deny".
>
> Ok. How are we going to "learn" about those non-recognized MSRs? Ask
> people to send us a note to lkml so that we can fix the list once they
> see the logged message?

We could do something noisy like this:

[ 27.395795] msr: filter_write: Write to unrecognized MSR 0x10
[ 27.396865] msr: Please report to [email protected]
[ 27.397811] ------------[ cut here ]------------
[ 27.398630] WARNING: CPU: 10 PID: 4405 at arch/x86/kernel/msr.c:83 msr_write.cold+0x1f/0x26 [msr]
[ 27.400123] Modules linked in: msr
[ 27.400728] CPU: 10 PID: 4405 Comm: wrmsr Not tainted 5.7.0+ #2
[ 27.400838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[ 27.400838] RIP: 0010:msr_write.cold+0x1f/0x26 [msr]
[ 27.400838] Code: eb e0 49 c7 c4 ea ff ff ff eb d7 48 c7 c6 40 11 00 a0 48 c7 c7 c0 10 00 a0 e8 5f 83 0c e1 48 c7 c7 f0 10 00 a0 e8 53 83 0c e1 <0f> 0b e9 af fe ff ff 8b 3d 7c 20 00 00 be 01 00 00 00 e8 1e c7 06
[ 27.400838] RSP: 0018:ffffc900003d7e98 EFLAGS: 00010282
[ 27.400838] RAX: 0000000000000024 RBX: 0000000000000008 RCX: 0000000000000007
[ 27.400838] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff88807dc98360
[ 27.400838] RBP: 00007fffffffe978 R08: 00000000000001ca R09: 0000000000000000
[ 27.400838] R10: ffffffff82549708 R11: ffffc900003d7d50 R12: 0000000000000008
[ 27.400838] R13: 0000000000000000 R14: 0000000000000010 R15: 0000000000000000
[ 27.400838] FS: 00007ffff7fb7540(0000) GS:ffff88807dc80000(0000) knlGS:0000000000000000
[ 27.400838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 27.400838] CR2: 00007ffff7e672a0 CR3: 0000000078a4c000 CR4: 00000000003406e0
[ 27.400838] Call Trace:
[ 27.400838] vfs_write+0xe4/0x1e0
[ 27.400838] ksys_pwrite64+0x7d/0xb0
[ 27.400838] do_syscall_64+0x48/0x140
[ 27.400838] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 27.400838] RIP: 0033:0x7ffff7edda37
[ 27.400838] Code: ff ff eb b6 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 c9 7c 0d 00 49 89 ca 8b 00 85 c0 75 10 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 c3 41 55 49 89 cd 41 54 49 89 d4 55 48 89
[ 27.400838] RSP: 002b:00007fffffffe968 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
[ 27.400838] RAX: ffffffffffffffda RBX: 00007fffffffeb78 RCX: 00007ffff7edda37
[ 27.400838] RDX: 0000000000000008 RSI: 00007fffffffe978 RDI: 0000000000000003
[ 27.400838] RBP: 0000000000000003 R08: 00007fffffffedb6 R09: 0000000000000000
[ 27.400838] R10: 0000000000000010 R11: 0000000000000246 R12: 00007fffffffeb78
[ 27.400838] R13: 0000000000000010 R14: 0000000000000000 R15: 0000000000000010
[ 27.400838] ---[ end trace 3bd374c0e61187b1 ]---

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-06-12 20:42:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 07:48:01PM +0200, Borislav Petkov wrote:
> On Fri, Jun 12, 2020 at 10:20:03AM -0700, Linus Torvalds wrote:
> > Since you already added the filtering, this looks fairly sane.
> >
> > IOW, what MSR's do we expect people to maybe write to normally? You
> > added MSR_IA32_ENERGY_PERF_BIAS as an allowed MST, maybe there are
> > others?
>
> Right, this MSR is being written by cpupower in tools/. My search was
> confined within the kernel source only so there very likely are others.

So that tool writing to /dev/msr has already caused pain; the direct
result is that the intel pstate driver doesn't want to use an MSR shadow
variable to avoid RDMSR because that'd loose input.

https://lkml.org/lkml/2019/3/25/310

(sorry, that's what google found me)

So ideally we'd just disallow it too. It already has a sysfs file (per
those patches):

Documentation/admin-guide/pm/intel_epb.rst

2020-06-13 05:44:50

by Tony Luck

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 1:41 PM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Jun 12, 2020 at 07:48:01PM +0200, Borislav Petkov wrote:
> > On Fri, Jun 12, 2020 at 10:20:03AM -0700, Linus Torvalds wrote:
> > > Since you already added the filtering, this looks fairly sane.
> > >
> > > IOW, what MSR's do we expect people to maybe write to normally? You
> > > added MSR_IA32_ENERGY_PERF_BIAS as an allowed MST, maybe there are
> > > others?
> >
> > Right, this MSR is being written by cpupower in tools/. My search was
> > confined within the kernel source only so there very likely are others.
>
> So that tool writing to /dev/msr has already caused pain; the direct
> result is that the intel pstate driver doesn't want to use an MSR shadow
> variable to avoid RDMSR because that'd loose input.
>
> https://lkml.org/lkml/2019/3/25/310
>
> (sorry, that's what google found me)
>
> So ideally we'd just disallow it too. It already has a sysfs file (per
> those patches):
>
> Documentation/admin-guide/pm/intel_epb.rst

Some group internal at Intel want something like this, but more extensive,
They want to limit RDMSR to a subset (not exactly sure why, I don't
know of MSRs that have side effects on read ... but then again
not all of the MSR space is documented).

On the write side they divide into categories:
1) Some MSRs can only be cleared.
2) Some MSRs can only have certain bits set
3) Some MSRs allow any write
4) Maybe something else ... this is from memory, and a somewhat
cursory read of their patch.

They have maybe a couple of dozen MSRs split between those classes.

-Tony

2020-06-13 09:41:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/msr: Filter MSR writes

On Fri, Jun 12, 2020 at 10:39:35PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 12, 2020 at 07:48:01PM +0200, Borislav Petkov wrote:
> > On Fri, Jun 12, 2020 at 10:20:03AM -0700, Linus Torvalds wrote:
> > > Since you already added the filtering, this looks fairly sane.
> > >
> > > IOW, what MSR's do we expect people to maybe write to normally? You
> > > added MSR_IA32_ENERGY_PERF_BIAS as an allowed MST, maybe there are
> > > others?
> >
> > Right, this MSR is being written by cpupower in tools/. My search was
> > confined within the kernel source only so there very likely are others.
>
> So that tool writing to /dev/msr has already caused pain; the direct
> result is that the intel pstate driver doesn't want to use an MSR shadow
> variable to avoid RDMSR because that'd loose input.
>
> https://lkml.org/lkml/2019/3/25/310
>
> (sorry, that's what google found me)
>
> So ideally we'd just disallow it too. It already has a sysfs file (per
> those patches):
>
> Documentation/admin-guide/pm/intel_epb.rst

Damn, that has fallen off my radar completely and the reason for me
requesting the sysfs interface is the *same* - kill the direct MSR
access.

Rafael, how about I refresh those patches and teach cpupower to access
the sysfs interface too and we drop that MSR from the whitelist too?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-06-13 15:51:10

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH -v2] x86/msr: Filter MSR writes

Hi,

here's v2 with the requested changes.

The whitelist is still TBD, I might be able to remove it competely and
defer the whole whitelisting to the future. when people start reporting
MSRs (see pr_err_ratelimited() call below).

---
Disable writing to MSRs from userspace by default. Writes can still be
allowed by supplying the allow_writes=1 module parameter and the kernel
will be tainted so that it shows in oopses.

Having unfettered access to all MSRs on a system is and has always been
a disaster waiting to happen. Think performance counter MSRs, MSRs with
sticky or locked bits, MSRs making major system changes like loading
microcode, MTRRs, PAT configuration, TSC counter, security mitigations
MSRs, you name it.

This also destroys all the kernel's caching of MSR values for
performance, as the recent case with MSR_AMD64_LS_CFG showed.

Another example is writing MSRs by mistake by simply typing the wrong
MSR address. System freezes have been experienced that way.

In general, poking at MSRs under the kernel's feet is a bad bad idea.

So disable poking directly at the MSRs by default. If userspace still
wants to do that, then proper interfaces should be defined which
are under the kernel's control and accesses to those MSRs can be
synchronized and sanitized properly.

Changelog:
- taint before WRMSR, all
- make param 0600, Sean.
- do not deny but log writes by default, Linus.

Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/msr.c | 68 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 1547be359d7f..1a4f8b30fb09 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -42,6 +42,14 @@
static struct class *msr_class;
static enum cpuhp_state cpuhp_msr_state;

+enum allow_write_msrs {
+ MSR_WRITES_ON,
+ MSR_WRITES_OFF,
+ MSR_WRITES_DEFAULT,
+};
+
+static enum allow_write_msrs allow_writes = MSR_WRITES_DEFAULT;
+
static ssize_t msr_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
@@ -70,6 +78,24 @@ static ssize_t msr_read(struct file *file, char __user *buf,
return bytes ? bytes : err;
}

+static int filter_write(u32 reg)
+{
+ switch (allow_writes) {
+ case MSR_WRITES_ON: return 0; break;
+ case MSR_WRITES_OFF: return -EPERM; break;
+ default: fallthrough;
+ }
+
+ if (reg == MSR_IA32_ENERGY_PERF_BIAS)
+ return 0;
+
+ pr_err_ratelimited("Write to unrecognized MSR 0x%x by %s\n"
+ "Please report to [email protected]\n",
+ reg, current->comm);
+
+ return 0;
+}
+
static ssize_t msr_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
@@ -84,6 +110,10 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
if (err)
return err;

+ err = filter_write(reg);
+ if (err)
+ return err;
+
if (count % 8)
return -EINVAL; /* Invalid chunk size */

@@ -92,9 +122,13 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
err = -EFAULT;
break;
}
+
+ add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+
err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
if (err)
break;
+
tmp += 2;
bytes += 8;
}
@@ -242,6 +276,40 @@ static void __exit msr_exit(void)
}
module_exit(msr_exit)

+static int set_allow_writes(const char *val, const struct kernel_param *cp)
+{
+ char *s = strstrip(val);
+
+ if (!strcmp(s, "on"))
+ allow_writes = MSR_WRITES_ON;
+ else if (!strcmp(s, "off"))
+ allow_writes = MSR_WRITES_OFF;
+ else
+ allow_writes = MSR_WRITES_DEFAULT;
+
+ return 0;
+}
+
+static int get_allow_writes(char *buf, const struct kernel_param *kp)
+{
+ const char *res;
+
+ switch (allow_writes) {
+ case MSR_WRITES_ON: res = "on"; break;
+ case MSR_WRITES_OFF: res = "off"; break;
+ default: res = "default"; break;
+ }
+
+ return sprintf(buf, "%s\n", res);
+}
+
+static const struct kernel_param_ops allow_writes_ops = {
+ .set = set_allow_writes,
+ .get = get_allow_writes
+};
+
+module_param_cb(allow_writes, &allow_writes_ops, NULL, 0600);
+
MODULE_AUTHOR("H. Peter Anvin <[email protected]>");
MODULE_DESCRIPTION("x86 generic MSR driver");
MODULE_LICENSE("GPL");
--
2.21.0


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-06-15 06:41:41

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH -v2.1] x86/msr: Filter MSR writes

Here's an improved v2 with sparse warnings fixed:

---
Disable writing to MSRs from userspace by default. Writes can still be
allowed by supplying the allow_writes=1 module parameter and the kernel
will be tainted so that it shows in oopses.

Having unfettered access to all MSRs on a system is and has always been
a disaster waiting to happen. Think performance counter MSRs, MSRs with
sticky or locked bits, MSRs making major system changes like loading
microcode, MTRRs, PAT configuration, TSC counter, security mitigations
MSRs, you name it.

This also destroys all the kernel's caching of MSR values for
performance, as the recent case with MSR_AMD64_LS_CFG showed.

Another example is writing MSRs by mistake by simply typing the wrong
MSR address. System freezes have been experienced that way.

In general, poking at MSRs under the kernel's feet is a bad bad idea.

So disable poking directly at the MSRs by default. If userspace still
wants to do that, then proper interfaces should be defined which
are under the kernel's control and accesses to those MSRs can be
synchronized and sanitized properly.

Changelog:
- taint before WRMSR, all
- make param 0600, Sean.
- do not deny but log writes by default, Linus.

[ Fix sparse warnings ]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/msr.c | 69 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 1547be359d7f..576c43e39247 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -42,6 +42,14 @@
static struct class *msr_class;
static enum cpuhp_state cpuhp_msr_state;

+enum allow_write_msrs {
+ MSR_WRITES_ON,
+ MSR_WRITES_OFF,
+ MSR_WRITES_DEFAULT,
+};
+
+static enum allow_write_msrs allow_writes = MSR_WRITES_DEFAULT;
+
static ssize_t msr_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
@@ -70,6 +78,24 @@ static ssize_t msr_read(struct file *file, char __user *buf,
return bytes ? bytes : err;
}

+static int filter_write(u32 reg)
+{
+ switch (allow_writes) {
+ case MSR_WRITES_ON: return 0; break;
+ case MSR_WRITES_OFF: return -EPERM; break;
+ default: break;
+ }
+
+ if (reg == MSR_IA32_ENERGY_PERF_BIAS)
+ return 0;
+
+ pr_err_ratelimited("Write to unrecognized MSR 0x%x by %s\n"
+ "Please report to [email protected]\n",
+ reg, current->comm);
+
+ return 0;
+}
+
static ssize_t msr_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
@@ -84,6 +110,10 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
if (err)
return err;

+ err = filter_write(reg);
+ if (err)
+ return err;
+
if (count % 8)
return -EINVAL; /* Invalid chunk size */

@@ -92,9 +122,13 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
err = -EFAULT;
break;
}
+
+ add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+
err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
if (err)
break;
+
tmp += 2;
bytes += 8;
}
@@ -242,6 +276,41 @@ static void __exit msr_exit(void)
}
module_exit(msr_exit)

+static int set_allow_writes(const char *val, const struct kernel_param *cp)
+{
+ /* val is NUL-terminated, see kernfs_fop_write() */
+ char *s = strstrip((char *)val);
+
+ if (!strcmp(s, "on"))
+ allow_writes = MSR_WRITES_ON;
+ else if (!strcmp(s, "off"))
+ allow_writes = MSR_WRITES_OFF;
+ else
+ allow_writes = MSR_WRITES_DEFAULT;
+
+ return 0;
+}
+
+static int get_allow_writes(char *buf, const struct kernel_param *kp)
+{
+ const char *res;
+
+ switch (allow_writes) {
+ case MSR_WRITES_ON: res = "on"; break;
+ case MSR_WRITES_OFF: res = "off"; break;
+ default: res = "default"; break;
+ }
+
+ return sprintf(buf, "%s\n", res);
+}
+
+static const struct kernel_param_ops allow_writes_ops = {
+ .set = set_allow_writes,
+ .get = get_allow_writes
+};
+
+module_param_cb(allow_writes, &allow_writes_ops, NULL, 0600);
+
MODULE_AUTHOR("H. Peter Anvin <[email protected]>");
MODULE_DESCRIPTION("x86 generic MSR driver");
MODULE_LICENSE("GPL");
--
2.21.0

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [tip: x86/misc] x86/msr: Filter MSR writes

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: 5603119e48d0fee163a827c2342b372f1a0e913c
Gitweb: https://git.kernel.org/tip/5603119e48d0fee163a827c2342b372f1a0e913c
Author: Borislav Petkov <[email protected]>
AuthorDate: Wed, 10 Jun 2020 21:37:49 +02:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Wed, 17 Jun 2020 15:46:52 +02:00

x86/msr: Filter MSR writes

Add functionality to disable writing to MSRs from userspace. Writes can
still be allowed by supplying the allow_writes=on module parameter. The
kernel will be tainted so that it shows in oopses.

Having unfettered access to all MSRs on a system is and has always been
a disaster waiting to happen. Think performance counter MSRs, MSRs with
sticky or locked bits, MSRs making major system changes like loading
microcode, MTRRs, PAT configuration, TSC counter, security mitigations
MSRs, you name it.

This also destroys all the kernel's caching of MSR values for
performance, as the recent case with MSR_AMD64_LS_CFG showed.

Another example is writing MSRs by mistake by simply typing the wrong
MSR address. System freezes have been experienced that way.

In general, poking at MSRs under the kernel's feet is a bad bad idea.

So log writing to MSRs by default. Longer term, such writes will be
disabled by default.

If userspace still wants to do that, then proper interfaces should be
defined which are under the kernel's control and accesses to those MSRs
can be synchronized and sanitized properly.

[ Fix sparse warnings. ]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/msr.c | 69 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 69 insertions(+)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 1547be3..576c43e 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -42,6 +42,14 @@
static struct class *msr_class;
static enum cpuhp_state cpuhp_msr_state;

+enum allow_write_msrs {
+ MSR_WRITES_ON,
+ MSR_WRITES_OFF,
+ MSR_WRITES_DEFAULT,
+};
+
+static enum allow_write_msrs allow_writes = MSR_WRITES_DEFAULT;
+
static ssize_t msr_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
@@ -70,6 +78,24 @@ static ssize_t msr_read(struct file *file, char __user *buf,
return bytes ? bytes : err;
}

+static int filter_write(u32 reg)
+{
+ switch (allow_writes) {
+ case MSR_WRITES_ON: return 0; break;
+ case MSR_WRITES_OFF: return -EPERM; break;
+ default: break;
+ }
+
+ if (reg == MSR_IA32_ENERGY_PERF_BIAS)
+ return 0;
+
+ pr_err_ratelimited("Write to unrecognized MSR 0x%x by %s\n"
+ "Please report to [email protected]\n",
+ reg, current->comm);
+
+ return 0;
+}
+
static ssize_t msr_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
@@ -84,6 +110,10 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
if (err)
return err;

+ err = filter_write(reg);
+ if (err)
+ return err;
+
if (count % 8)
return -EINVAL; /* Invalid chunk size */

@@ -92,9 +122,13 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
err = -EFAULT;
break;
}
+
+ add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+
err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
if (err)
break;
+
tmp += 2;
bytes += 8;
}
@@ -242,6 +276,41 @@ static void __exit msr_exit(void)
}
module_exit(msr_exit)

+static int set_allow_writes(const char *val, const struct kernel_param *cp)
+{
+ /* val is NUL-terminated, see kernfs_fop_write() */
+ char *s = strstrip((char *)val);
+
+ if (!strcmp(s, "on"))
+ allow_writes = MSR_WRITES_ON;
+ else if (!strcmp(s, "off"))
+ allow_writes = MSR_WRITES_OFF;
+ else
+ allow_writes = MSR_WRITES_DEFAULT;
+
+ return 0;
+}
+
+static int get_allow_writes(char *buf, const struct kernel_param *kp)
+{
+ const char *res;
+
+ switch (allow_writes) {
+ case MSR_WRITES_ON: res = "on"; break;
+ case MSR_WRITES_OFF: res = "off"; break;
+ default: res = "default"; break;
+ }
+
+ return sprintf(buf, "%s\n", res);
+}
+
+static const struct kernel_param_ops allow_writes_ops = {
+ .set = set_allow_writes,
+ .get = get_allow_writes
+};
+
+module_param_cb(allow_writes, &allow_writes_ops, NULL, 0600);
+
MODULE_AUTHOR("H. Peter Anvin <[email protected]>");
MODULE_DESCRIPTION("x86 generic MSR driver");
MODULE_LICENSE("GPL");

2020-06-25 06:30:07

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Mon, Jun 15, 2020 at 08:38:37AM +0200, Borislav Petkov wrote:
> Here's an improved v2 with sparse warnings fixed:
>
> ---
> Disable writing to MSRs from userspace by default. Writes can still be
> allowed by supplying the allow_writes=1 module parameter and the kernel
> will be tainted so that it shows in oopses.
>
> Having unfettered access to all MSRs on a system is and has always been
> a disaster waiting to happen. Think performance counter MSRs, MSRs with
> sticky or locked bits, MSRs making major system changes like loading
> microcode, MTRRs, PAT configuration, TSC counter, security mitigations
> MSRs, you name it.
>
> This also destroys all the kernel's caching of MSR values for
> performance, as the recent case with MSR_AMD64_LS_CFG showed.
>
> Another example is writing MSRs by mistake by simply typing the wrong
> MSR address. System freezes have been experienced that way.
>
> In general, poking at MSRs under the kernel's feet is a bad bad idea.
>
> So disable poking directly at the MSRs by default. If userspace still
> wants to do that, then proper interfaces should be defined which
> are under the kernel's control and accesses to those MSRs can be
> synchronized and sanitized properly.
>
> Changelog:
> - taint before WRMSR, all
> - make param 0600, Sean.
> - do not deny but log writes by default, Linus.
>
> [ Fix sparse warnings ]
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>

A few non-functional nits below.

Tested-by: Sean Christopherson <[email protected]>

> ---
> arch/x86/kernel/msr.c | 69 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 69 insertions(+)
>
> diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
> index 1547be359d7f..576c43e39247 100644
> --- a/arch/x86/kernel/msr.c
> +++ b/arch/x86/kernel/msr.c
> @@ -42,6 +42,14 @@
> static struct class *msr_class;
> static enum cpuhp_state cpuhp_msr_state;
>
> +enum allow_write_msrs {
> + MSR_WRITES_ON,
> + MSR_WRITES_OFF,
> + MSR_WRITES_DEFAULT,
> +};
> +
> +static enum allow_write_msrs allow_writes = MSR_WRITES_DEFAULT;
> +
> static ssize_t msr_read(struct file *file, char __user *buf,
> size_t count, loff_t *ppos)
> {
> @@ -70,6 +78,24 @@ static ssize_t msr_read(struct file *file, char __user *buf,
> return bytes ? bytes : err;
> }
>
> +static int filter_write(u32 reg)
> +{
> + switch (allow_writes) {
> + case MSR_WRITES_ON: return 0; break;
> + case MSR_WRITES_OFF: return -EPERM; break;

The breaks after the returns are unnecessary.

> + default: break;
> + }
> +
> + if (reg == MSR_IA32_ENERGY_PERF_BIAS)
> + return 0;
> +
> + pr_err_ratelimited("Write to unrecognized MSR 0x%x by %s\n"
> + "Please report to [email protected]\n",
> + reg, current->comm);


Maybe s/unrecognized/unauthorized? Unrecognized implies the kernel doesn't
know anything about the MSR being written, which may not hold true.

> + return 0;
> +}
> +
> static ssize_t msr_write(struct file *file, const char __user *buf,
> size_t count, loff_t *ppos)
> {
> @@ -84,6 +110,10 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
> if (err)
> return err;
>
> + err = filter_write(reg);
> + if (err)
> + return err;
> +
> if (count % 8)
> return -EINVAL; /* Invalid chunk size */
>
> @@ -92,9 +122,13 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
> err = -EFAULT;
> break;
> }
> +
> + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
> +
> err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
> if (err)
> break;
> +

Random leftover whitespace change.

> tmp += 2;
> bytes += 8;
> }

2020-06-25 09:45:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Wed, Jun 24, 2020 at 10:51:40PM -0700, Sean Christopherson wrote:
> Maybe s/unrecognized/unauthorized? Unrecognized implies the kernel doesn't
> know anything about the MSR being written, which may not hold true.

"unrecognized" in the sense that it is not on the whitelist above. It
contains only one MSR and when the patches for the interface to that MSR
get upstreamed, it won't be needed anymore.

But this all is WIP anyway so we'll adjust that printk when we know
whether we wanna have a whitelist at all...

> Random leftover whitespace change.

That's spreading out the statements for better readability. That file
could use more spreading out too.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [tip: x86/misc] x86/msr: Filter MSR writes

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: a7e1f67ed29f0c339e2aa7483d13b085127566ab
Gitweb: https://git.kernel.org/tip/a7e1f67ed29f0c339e2aa7483d13b085127566ab
Author: Borislav Petkov <[email protected]>
AuthorDate: Wed, 10 Jun 2020 21:37:49 +02:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 25 Jun 2020 10:39:02 +02:00

x86/msr: Filter MSR writes

Add functionality to disable writing to MSRs from userspace. Writes can
still be allowed by supplying the allow_writes=on module parameter. The
kernel will be tainted so that it shows in oopses.

Having unfettered access to all MSRs on a system is and has always been
a disaster waiting to happen. Think performance counter MSRs, MSRs with
sticky or locked bits, MSRs making major system changes like loading
microcode, MTRRs, PAT configuration, TSC counter, security mitigations
MSRs, you name it.

This also destroys all the kernel's caching of MSR values for
performance, as the recent case with MSR_AMD64_LS_CFG showed.

Another example is writing MSRs by mistake by simply typing the wrong
MSR address. System freezes have been experienced that way.

In general, poking at MSRs under the kernel's feet is a bad bad idea.

So log writing to MSRs by default. Longer term, such writes will be
disabled by default.

If userspace still wants to do that, then proper interfaces should be
defined which are under the kernel's control and accesses to those MSRs
can be synchronized and sanitized properly.

[ Fix sparse warnings. ]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Sean Christopherson <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/msr.c | 69 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 69 insertions(+)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 1547be3..49dcfb8 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -42,6 +42,14 @@
static struct class *msr_class;
static enum cpuhp_state cpuhp_msr_state;

+enum allow_write_msrs {
+ MSR_WRITES_ON,
+ MSR_WRITES_OFF,
+ MSR_WRITES_DEFAULT,
+};
+
+static enum allow_write_msrs allow_writes = MSR_WRITES_DEFAULT;
+
static ssize_t msr_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
@@ -70,6 +78,24 @@ static ssize_t msr_read(struct file *file, char __user *buf,
return bytes ? bytes : err;
}

+static int filter_write(u32 reg)
+{
+ switch (allow_writes) {
+ case MSR_WRITES_ON: return 0;
+ case MSR_WRITES_OFF: return -EPERM;
+ default: break;
+ }
+
+ if (reg == MSR_IA32_ENERGY_PERF_BIAS)
+ return 0;
+
+ pr_err_ratelimited("Write to unrecognized MSR 0x%x by %s\n"
+ "Please report to [email protected]\n",
+ reg, current->comm);
+
+ return 0;
+}
+
static ssize_t msr_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
@@ -84,6 +110,10 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
if (err)
return err;

+ err = filter_write(reg);
+ if (err)
+ return err;
+
if (count % 8)
return -EINVAL; /* Invalid chunk size */

@@ -92,9 +122,13 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
err = -EFAULT;
break;
}
+
+ add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+
err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
if (err)
break;
+
tmp += 2;
bytes += 8;
}
@@ -242,6 +276,41 @@ static void __exit msr_exit(void)
}
module_exit(msr_exit)

+static int set_allow_writes(const char *val, const struct kernel_param *cp)
+{
+ /* val is NUL-terminated, see kernfs_fop_write() */
+ char *s = strstrip((char *)val);
+
+ if (!strcmp(s, "on"))
+ allow_writes = MSR_WRITES_ON;
+ else if (!strcmp(s, "off"))
+ allow_writes = MSR_WRITES_OFF;
+ else
+ allow_writes = MSR_WRITES_DEFAULT;
+
+ return 0;
+}
+
+static int get_allow_writes(char *buf, const struct kernel_param *kp)
+{
+ const char *res;
+
+ switch (allow_writes) {
+ case MSR_WRITES_ON: res = "on"; break;
+ case MSR_WRITES_OFF: res = "off"; break;
+ default: res = "default"; break;
+ }
+
+ return sprintf(buf, "%s\n", res);
+}
+
+static const struct kernel_param_ops allow_writes_ops = {
+ .set = set_allow_writes,
+ .get = get_allow_writes
+};
+
+module_param_cb(allow_writes, &allow_writes_ops, NULL, 0600);
+
MODULE_AUTHOR("H. Peter Anvin <[email protected]>");
MODULE_DESCRIPTION("x86 generic MSR driver");
MODULE_LICENSE("GPL");

2020-07-14 12:20:31

by Chris Down

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

Hi Borislav,

This is certainly a good idea, but I wonder whether we should be more pragmatic
about the printk ratelimiting while we give userspace time to react and update
their methodologies.

As one example, there is a common MSR hack which is verging on essential if
you're doing thermally intensive work on some recent ThinkPads[0][1], and this
drastically reduces the signal-to-noise ratio in kmsg (and this is only about
five minutes after boot):

% dmesg | wc -l
2963
% dmesg | grep -c 'unrecognized MSR'
2411

That is, even with pr_err_ratelimited, we still end up logging on basically
every single write, even though it's from the same TGID writing to the same
MSRs, and end up becoming >80% of kmsg.

Of course, one can boot with `allow_writes=1` to avoid these messages at all,
but that then has the downfall that one doesn't get _any_ notification at all
about these problems in the first place, and so is much less likely to forget
to fix it. One might rather it was less binary: it was still logged, just less
often, so that application developers _do_ have the incentive to improve their
current methods, without us having to push other useful stuff out of the kmsg
buffer.

This one example isn't the point, of course: I'm sure there are plenty of other
non-ideal-but-pragmatic cases where people are writing to MSRs from userspace
right now, and it will take time for those people to find other solutions.

I completely agree with you that there should be a better solution for these
cases, and that writing to MSRs from userspace is really not a good idea.
However, going from zero to over 80% of dmesg in cases where these MSRs are
repeatedly used seems too fast to me.

Have you considered perhaps making the ramping up of error logging more gradual
by having this printk have its own, more conservative `struct ratelimit_state`,
as we do in some other places with similar noise concerns? Then we could
gradually make the warnings more aggressive as time goes on, up until the point
where we make allow_writes=0 the default.

Thanks,

Chris

0: Lenovo is supposedly fixing this since last year, but no news yet.
1: https://github.com/erpalma/throttled

2020-07-14 15:50:26

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Jul 14, 2020 at 01:19:55PM +0100, Chris Down wrote:
> That is, even with pr_err_ratelimited, we still end up logging on basically
> every single write, even though it's from the same TGID writing to the same
> MSRs, and end up becoming >80% of kmsg.
>
> Of course, one can boot with `allow_writes=1` to avoid these messages at

Yes, use that.

From a quick scan over that "tool" you pointed me at, it pokes at some
MSRs from userspace which the kernel *also* writes to and this is
exactly what should not be allowed.

As to the "MSR hack", please describe what the issue is exactly so that
we can get the proper people involved. If anything, this needs to be
fixed in the kernel properly. If people are waiting for a year for a
BIOS fix, I'd say there's a very slim chance for that to ever happen...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-07-14 16:06:23

by Chris Down

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

Borislav Petkov writes:
>On Tue, Jul 14, 2020 at 01:19:55PM +0100, Chris Down wrote:
>> That is, even with pr_err_ratelimited, we still end up logging on basically
>> every single write, even though it's from the same TGID writing to the same
>> MSRs, and end up becoming >80% of kmsg.
>>
>> Of course, one can boot with `allow_writes=1` to avoid these messages at
>
>Yes, use that.
>
>From a quick scan over that "tool" you pointed me at, it pokes at some
>MSRs from userspace which the kernel *also* writes to and this is
>exactly what should not be allowed.

I don't think we're in disagreement about that. My concern is strictly about
the amount of spam caused for some of those existing use cases during the
transition phase. People should know that their tools would break, but there
shouldn't be so many messages generated that it inevitably pushes other useful
information out of the kmsg buffer.

>As to the "MSR hack", please describe what the issue is exactly so that
>we can get the proper people involved. If anything, this needs to be
>fixed in the kernel properly. If people are waiting for a year for a
>BIOS fix, I'd say there's a very slim chance for that to ever happen...

Since the issue involves DPTF which is only supported via binary blobs, I can't
say for certain what the issue is. As I understand it, when the throttling
behaviour isn't explicitly configured by the OS kernel, the default policy is
extremely overeager. Matthew also had a look at it[0], but I don't know if
anything eventually happened there. I've cc'ed him.

Either way, again, this isn't really the point. :-) The point is that there
_are_ currently widespread cases involving poking MSRs from userspace, however
sacrilegious or ugly (which I agree with!), and while people should be told
about that, it's excessive to have the potential to take up 80% of kmsg in the
default configuration. It doesn't take thousands of messages to get the message
across, that's what a custom printk ratelimit is for.

0: https://twitter.com/mjg59/status/1034132444201582596

2020-07-14 16:48:44

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Jul 14, 2020 at 05:04:48PM +0100, Chris Down wrote:
> Borislav Petkov writes:
> > On Tue, Jul 14, 2020 at 01:19:55PM +0100, Chris Down wrote:
> > > That is, even with pr_err_ratelimited, we still end up logging on basically
> > > every single write, even though it's from the same TGID writing to the same
> > > MSRs, and end up becoming >80% of kmsg.
> > >
> > > Of course, one can boot with `allow_writes=1` to avoid these messages at
> >
> > Yes, use that.
> >
> > From a quick scan over that "tool" you pointed me at, it pokes at some
> > MSRs from userspace which the kernel *also* writes to and this is
> > exactly what should not be allowed.
>
> I don't think we're in disagreement about that. My concern is strictly about
> the amount of spam caused for some of those existing use cases during the
> transition phase. People should know that their tools would break, but there
> shouldn't be so many messages generated that it inevitably pushes other
> useful information out of the kmsg buffer.

Maybe we just need smarter filtering of warnings. It doesn't
seem at all useful to warn for the same MSR 1000's of times.
Maybe keep a count of warnings for each MSR and just stop
all reports when reach a threshold?

-Tony

2020-07-14 16:58:57

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Jul 14, 2020 at 09:46:12AM -0700, Luck, Tony wrote:
> Maybe we just need smarter filtering of warnings. It doesn't
> seem at all useful to warn for the same MSR 1000's of times.
> Maybe keep a count of warnings for each MSR and just stop
> all reports when reach a threshold?

No, not stop - just make them more rare.

And this is where the bikeshedding starts about how rare.

/me runs away to the beach and lets the others fight it out.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-07-14 16:59:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Jul 14, 2020 at 05:04:48PM +0100, Chris Down wrote:
> Since the issue involves DPTF which is only supported via binary blobs, I
> can't say for certain what the issue is. As I understand it, when the
> throttling behaviour isn't explicitly configured by the OS kernel, the
> default policy is extremely overeager. Matthew also had a look at it[0], but
> I don't know if anything eventually happened there. I've cc'ed him.
>
> Either way, again, this isn't really the point. :-) The point is that there
> _are_ currently widespread cases involving poking MSRs from userspace,
> however sacrilegious or ugly (which I agree with!), and while people should
> be told about that, it's excessive to have the potential to take up 80% of
> kmsg in the default configuration. It doesn't take thousands of messages to
> get the message across, that's what a custom printk ratelimit is for.

Ok, feel free to suggest a fix, better yet send a patch. Otherwise,
you'd have to wait for my vacation to end first. :-)

> 0: https://twitter.com/mjg59/status/1034132444201582596

As to the power issue, lemme CC some Intel folks I found in MAINTAINERS.

Intel folks, pls check the link above and upthread: Why TF do people
need to use some luserspace daemon which pokes at MSRs which the kernel
writes to too, in order to bypass some apparently too conservative
throttling, AFAIU?

And why does this work on windoze reportedly?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-07-14 17:05:42

by Chris Down

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

Luck, Tony writes:
>On Tue, Jul 14, 2020 at 05:04:48PM +0100, Chris Down wrote:
>> Borislav Petkov writes:
>> > On Tue, Jul 14, 2020 at 01:19:55PM +0100, Chris Down wrote:
>> > > That is, even with pr_err_ratelimited, we still end up logging on basically
>> > > every single write, even though it's from the same TGID writing to the same
>> > > MSRs, and end up becoming >80% of kmsg.
>> > >
>> > > Of course, one can boot with `allow_writes=1` to avoid these messages at
>> >
>> > Yes, use that.
>> >
>> > From a quick scan over that "tool" you pointed me at, it pokes at some
>> > MSRs from userspace which the kernel *also* writes to and this is
>> > exactly what should not be allowed.
>>
>> I don't think we're in disagreement about that. My concern is strictly about
>> the amount of spam caused for some of those existing use cases during the
>> transition phase. People should know that their tools would break, but there
>> shouldn't be so many messages generated that it inevitably pushes other
>> useful information out of the kmsg buffer.
>
>Maybe we just need smarter filtering of warnings. It doesn't
>seem at all useful to warn for the same MSR 1000's of times.
>Maybe keep a count of warnings for each MSR and just stop
>all reports when reach a threshold?

That also a fine good solution, albeit more complex than just using the
existing custom ratelimit_state infrastructure. Doing so probably also means
we'd miss out on some of the other stuff that comes for free with it.

My only other concern with ratelimiting per-TGID or per-MSR was that the
ratelimit cache table could become unwieldy, but if we keep it simple by
limiting the size and not printing after we reach that, that sounds fine too.

Any solution which means that we avoid saturating kmsg for workloads which
currently twiddle MSRs sounds fine to me. People should know that we don't
support or encourage this, but it shouldn't be at the cost of potentially
pushing everything else out of the kmsg buffer.

2020-07-14 17:06:06

by Chris Down

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

Borislav Petkov writes:
>On Tue, Jul 14, 2020 at 05:04:48PM +0100, Chris Down wrote:
>> Since the issue involves DPTF which is only supported via binary blobs, I
>> can't say for certain what the issue is. As I understand it, when the
>> throttling behaviour isn't explicitly configured by the OS kernel, the
>> default policy is extremely overeager. Matthew also had a look at it[0], but
>> I don't know if anything eventually happened there. I've cc'ed him.
>>
>> Either way, again, this isn't really the point. :-) The point is that there
>> _are_ currently widespread cases involving poking MSRs from userspace,
>> however sacrilegious or ugly (which I agree with!), and while people should
>> be told about that, it's excessive to have the potential to take up 80% of
>> kmsg in the default configuration. It doesn't take thousands of messages to
>> get the message across, that's what a custom printk ratelimit is for.
>
>Ok, feel free to suggest a fix, better yet send a patch. Otherwise,
>you'd have to wait for my vacation to end first. :-)

Sure thing, I'll send a patch tomorrow, then. :-)

2020-07-14 18:56:56

by srinivas pandruvada

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, 2020-07-14 at 18:56 +0200, Borislav Petkov wrote:
> On Tue, Jul 14, 2020 at 05:04:48PM +0100, Chris Down wrote:
> > Since the issue involves DPTF which is only supported via binary
> > blobs, I
> > can't say for certain what the issue is. As I understand it, when
> > the
> > throttling behaviour isn't explicitly configured by the OS kernel,
> > the
> > default policy is extremely overeager. Matthew also had a look at
> > it[0], but
> > I don't know if anything eventually happened there. I've cc'ed him.
> >
> > Either way, again, this isn't really the point. :-) The point is
> > that there
> > _are_ currently widespread cases involving poking MSRs from
> > userspace,
> > however sacrilegious or ugly (which I agree with!), and while
> > people should
> > be told about that, it's excessive to have the potential to take up
> > 80% of
> > kmsg in the default configuration. It doesn't take thousands of
> > messages to
> > get the message across, that's what a custom printk ratelimit is
> > for.
>
> Ok, feel free to suggest a fix, better yet send a patch. Otherwise,
> you'd have to wait for my vacation to end first. :-)
>
> > 0: https://twitter.com/mjg59/status/1034132444201582596
>
> As to the power issue, lemme CC some Intel folks I found in
> MAINTAINERS.
>
> Intel folks, pls check the link above and upthread: Why TF do people
> need to use some luserspace daemon which pokes at MSRs which the
> kernel
> writes to too, in order to bypass some apparently too conservative
> throttling, AFAIU?
For issues related to thermal or power, we don't expect to poke MSRs
from user space by any daemon. We have sysfs interfaces for the
required controls. This is also true for controls via MMIO space.
Anytime if it is safe to add, we are adding controls via sysfs.

The tool in question from the link (not from Intel), when developed may
not have TCC or RAPL-MMIO controls via sysfs. We have sysfs interfaces
for a while. They can send email to me to justify other controls if
any.

Only time direct MSR access is required is for debug tools like
turbostat.

>
> And why does this work on windoze reportedly?
This is not related to MSR or MMIO. This is related to some ACPI
tables. In Linux, thermald will adjust these knobs like Windows. It was
missing some ACPI details, which Matthew Garrett submitted patches to
kernel and getting merged with 5.8 series.

Thanks,
Srinivas


2020-07-14 19:18:51

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Jul 14, 2020 at 9:04 AM Chris Down <[email protected]> wrote:
> Either way, again, this isn't really the point. :-) The point is that there
> _are_ currently widespread cases involving poking MSRs from userspace, however
> sacrilegious or ugly (which I agree with!), and while people should be told
> about that, it's excessive to have the potential to take up 80% of kmsg in the
> default configuration. It doesn't take thousands of messages to get the message
> across, that's what a custom printk ratelimit is for.

Agreed - we should now offer all the necessary interfaces to avoid
userspace having to hit MSRs directly for thermal management, but that
wasn't always the case, and as a result there's tooling that still
behaves this way.

2020-07-15 04:27:57

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Jul 14, 2020 at 11:52:47AM -0700, Srinivas Pandruvada wrote:
> > As to the power issue, lemme CC some Intel folks I found in
> > MAINTAINERS.
> >
> > Intel folks, pls check the link above and upthread: Why TF do people
> > need to use some luserspace daemon which pokes at MSRs which the
> > kernel
> > writes to too, in order to bypass some apparently too conservative
> > throttling, AFAIU?
> For issues related to thermal or power, we don't expect to poke MSRs
> from user space by any daemon. We have sysfs interfaces for the
> required controls. This is also true for controls via MMIO space.
> Anytime if it is safe to add, we are adding controls via sysfs.
>
> The tool in question from the link (not from Intel), when developed may
> not have TCC or RAPL-MMIO controls via sysfs. We have sysfs interfaces
> for a while. They can send email to me to justify other controls if
> any.

CCed. (I think I got the right email from the repo).

Francesco, see the whole thread starting here:

https://lkml.kernel.org/r/[email protected]

> > And why does this work on windoze reportedly?
> This is not related to MSR or MMIO. This is related to some ACPI
> tables. In Linux, thermald will adjust these knobs like Windows. It was
> missing some ACPI details, which Matthew Garrett submitted patches to
> kernel and getting merged with 5.8 series.

Good.

Which means that that throttled tool could do the same thing thermald is
doing so that they're all on the same page. Or simply not do anything
and tell users to install thermald instead.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

Hello all,

On Tue, Jul 14, 2020 at 12:17:50PM -0700, Matthew Garrett wrote:
> On Tue, Jul 14, 2020 at 9:04 AM Chris Down <[email protected]> wrote:
> > Either way, again, this isn't really the point. :-) The point is that there
> > _are_ currently widespread cases involving poking MSRs from userspace, however
> > sacrilegious or ugly (which I agree with!), and while people should be told
> > about that, it's excessive to have the potential to take up 80% of kmsg in the
> > default configuration. It doesn't take thousands of messages to get the message
> > across, that's what a custom printk ratelimit is for.

> Agreed - we should now offer all the necessary interfaces to avoid
> userspace having to hit MSRs directly for thermal management, but that
> wasn't always the case, and as a result there's tooling that still
> behaves this way.

I'm late to the party but it seems allowing MSR_IA32_ENERGY_PERF_BIAS
has the downside of flagging the kernel as tainted without telling you
why if you use something like x86_energy_perf_policy (from
tools/power/x86/x86_energy_perf_policy) which itself is used by tuned.

I can taint my kernel manually by just running:
x86_energy_perf_policy -c all performance

The net impact is an OOPS triggered on such kernel won't necessarily be
read by anyone nor analyzed by reporting tools as the kernel is now
considered tainted.

For instance abrt reports the following:
===========8<===========8<===========8<===========8<===========8<===========8<
A kernel problem occurred, but your kernel has been tainted (flags:GS).
Explanation:
S - SMP with CPUs not designed for SMP.
Kernel maintainers are unable to diagnose tainted reports.
===========8<===========8<===========8<===========8<===========8<===========8<

To add to the confusion, kernel documentation
(Documentation/admin-guide/tainted-kernels.rst) is not up to date so
while looking for an explanation, one gets to wonder how what used to be
a regular average computer can now be classified as something using "an
officially SMP incapable processor"...

So while both documentation and tools should be updated as to be clearer
and to not taint the kernel respectively, there's something that remains
to be done to explain why or how the kernel got tainted because of
poking into MSRs...

--
Mathieu Chouquet-Stringer
The sun itself sees not till heaven clears.
-- William Shakespeare --

2020-11-17 21:25:07

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Nov 17, 2020 at 10:00:18PM +0100, Mathieu Chouquet-Stringer wrote:
> I'm late to the party but it seems allowing MSR_IA32_ENERGY_PERF_BIAS
> has the downside of flagging the kernel as tainted without telling you
> why if you use something like x86_energy_perf_policy (from
> tools/power/x86/x86_energy_perf_policy) which itself is used by tuned.

Not for long:

https://git.kernel.org/tip/fe0a5788624c8b8f113a35bbe4636e37f9321241

> So while both documentation and tools should be updated as to be clearer
> and to not taint the kernel respectively, there's something that remains
> to be done to explain why or how the kernel got tainted because of
> poking into MSRs...

Because if you poke at random MSRs and you manage to "configure" your
CPU to run "out of spec" - this is what the taint flag is called:
TAINT_CPU_OUT_OF_SPEC - then this is exactly the case you've created: a
CPU executing outside of specifications.

I agree with the update-the-documentation aspect - S does not mean only
SMP kernel on !SMP-capable CPU but the more general, CPU is out of spec.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-11-17 21:25:48

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Nov 17, 2020 at 1:21 PM Matthew Garrett <[email protected]> wrote:
>
> On Tue, Nov 17, 2020 at 1:00 PM Mathieu Chouquet-Stringer
> <[email protected]> wrote:
>
> > I'm late to the party but it seems allowing MSR_IA32_ENERGY_PERF_BIAS
> > has the downside of flagging the kernel as tainted without telling you
> > why if you use something like x86_energy_perf_policy (from
> > tools/power/x86/x86_energy_perf_policy) which itself is used by tuned.
>
> I initially pushed back against a kernel interface for
> MSR_IA32_ENERGY_PERF_BIAS (cc: Len Brown, who tried mightily to
> convince me I was wrong here) on the grounds that it was exporting an
> implementation detail rather than providing a generic interface, and
> that it was something that could be done via userland instead. I
> thought we'd end up with more examples of similar functionality and
> could tie it into something more reasonable - history has proven me
> wrong on that. I think it's probably reasonable to dust off the driver
> that Len submitted however many years ago and push that into the
> kernel now.

But ha ok based on Borislav's response it looks like someone's already
done that.

2020-11-17 21:26:21

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Nov 17, 2020 at 1:00 PM Mathieu Chouquet-Stringer
<[email protected]> wrote:

> I'm late to the party but it seems allowing MSR_IA32_ENERGY_PERF_BIAS
> has the downside of flagging the kernel as tainted without telling you
> why if you use something like x86_energy_perf_policy (from
> tools/power/x86/x86_energy_perf_policy) which itself is used by tuned.

I initially pushed back against a kernel interface for
MSR_IA32_ENERGY_PERF_BIAS (cc: Len Brown, who tried mightily to
convince me I was wrong here) on the grounds that it was exporting an
implementation detail rather than providing a generic interface, and
that it was something that could be done via userland instead. I
thought we'd end up with more examples of similar functionality and
could tie it into something more reasonable - history has proven me
wrong on that. I think it's probably reasonable to dust off the driver
that Len submitted however many years ago and push that into the
kernel now.

Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Nov 17, 2020 at 10:20:16PM +0100, Borislav Petkov wrote:
> Not for long:
>
> https://git.kernel.org/tip/fe0a5788624c8b8f113a35bbe4636e37f9321241

That's fantastic.

> Because if you poke at random MSRs and you manage to "configure" your
> CPU to run "out of spec" - this is what the taint flag is called:
> TAINT_CPU_OUT_OF_SPEC - then this is exactly the case you've created: a
> CPU executing outside of specifications.

Don't get me wrong, it makes total sense to do that, it's just the
original reason of !SMP-capable isn't so clear while CPU out of spec is.

--
Mathieu Chouquet-Stringer [email protected]
The sun itself sees not till heaven clears.
-- William Shakespeare --

Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Nov 17, 2020 at 01:22:49PM -0800, Matthew Garrett wrote:
> > MSR_IA32_ENERGY_PERF_BIAS (cc: Len Brown, who tried mightily to
> > convince me I was wrong here) on the grounds that it was exporting an
> > implementation detail rather than providing a generic interface, and
> > that it was something that could be done via userland instead. I
> > thought we'd end up with more examples of similar functionality and
> > could tie it into something more reasonable - history has proven me
> > wrong on that. I think it's probably reasonable to dust off the driver
> > that Len submitted however many years ago and push that into the
> > kernel now.
>
> But ha ok based on Borislav's response it looks like someone's already
> done that.

Indeed, all is good. Just have to wait for it to be merged and the
proper kernel to be released...

--
Mathieu Chouquet-Stringer [email protected]
The sun itself sees not till heaven clears.
-- William Shakespeare --

Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Tue, Nov 17, 2020 at 10:20:16PM +0100, Borislav Petkov wrote:
> I agree with the update-the-documentation aspect - S does not mean only
> SMP kernel on !SMP-capable CPU but the more general, CPU is out of spec.

Speaking of doc, looking at the patches you submitted, I didn't see any
update to the documentation. Would you like me to create a patch for
that?

--
Mathieu Chouquet-Stringer [email protected]
The sun itself sees not till heaven clears.
-- William Shakespeare --

2020-11-18 11:53:29

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH -v2.1] x86/msr: Filter MSR writes

On Wed, Nov 18, 2020 at 10:09:29AM +0100, Mathieu Chouquet-Stringer wrote:
> Speaking of doc, looking at the patches you submitted, I didn't see any
> update to the documentation. Would you like me to create a patch for
> that?

Sure, that would be appreciated.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [PATCH] x86/msr: Filter MSR writes

On Wed, Nov 18, 2020 at 12:50:27PM +0100, Borislav Petkov wrote:
> On Wed, Nov 18, 2020 at 10:09:29AM +0100, Mathieu Chouquet-Stringer wrote:
> > Speaking of doc, looking at the patches you submitted, I didn't see any
> > update to the documentation. Would you like me to create a patch for
> > that?
>
> Sure, that would be appreciated.

Here you go, let me know if I got that right...

---
TAINT_CPU_OUT_OF_SPEC now means what it says. Historically it was for
SMP kernel oops on an officially SMP incapable processor but now it also
covers CPUs whose MSRs have been incorrectly poked at. Update
documentation and script to reflect that.

Signed-off-by: Mathieu Chouquet-Stringer <[email protected]>
---
Documentation/admin-guide/tainted-kernels.rst | 11 ++++++-----
tools/debugging/kernel-chktaint | 2 +-
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index f718a2eaf1f6..95f432c43ba0 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -84,7 +84,7 @@ Bit Log Number Reason that got the kernel tainted
=== === ====== ========================================================
0 G/P 1 proprietary module was loaded
1 _/F 2 module was force loaded
- 2 _/S 4 SMP kernel oops on an officially SMP incapable processor
+ 2 _/S 4 kernel running on out of spec processor
3 _/R 8 module was force unloaded
4 _/M 16 processor reported a Machine Check Exception (MCE)
5 _/B 32 bad page referenced or some unexpected page flags
@@ -116,10 +116,11 @@ More detailed explanation for tainting
1) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
modules were loaded normally.

- 2) ``S`` if the oops occurred on an SMP kernel running on hardware that
- hasn't been certified as safe to run multiprocessor.
- Currently this occurs only on various Athlons that are not
- SMP capable.
+ 2) ``S`` if the kernel is running on any processor that is out of
+ specifications (writing to MSRs will trigger this behavior).
+ Historically, it could also be if an oops occured on a kernel running on
+ hardware that hasn't been certified as safe to run multiprocessor, such
+ as various Athlons that are not SMP capable.

3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
modules were unloaded normally.
diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
index 2240cb56e6e5..0b9d93e27910 100755
--- a/tools/debugging/kernel-chktaint
+++ b/tools/debugging/kernel-chktaint
@@ -72,7 +72,7 @@ if [ `expr $T % 2` -eq 0 ]; then
addout " "
else
addout "S"
- echo " * SMP kernel oops on an officially SMP incapable processor (#2)"
+ echo " * kernel running on out of spec processor (#2)"
fi

T=`expr $T / 2`

2020-11-18 17:53:07

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Filter MSR writes

On Wed, Nov 18, 2020 at 03:04:27PM +0100, Mathieu Chouquet-Stringer wrote:
> TAINT_CPU_OUT_OF_SPEC now means what it says. Historically it was for
> SMP kernel oops on an officially SMP incapable processor but now it also
> covers CPUs whose MSRs have been incorrectly poked at. Update
> documentation and script to reflect that.
>
> Signed-off-by: Mathieu Chouquet-Stringer <[email protected]>
> ---
> Documentation/admin-guide/tainted-kernels.rst | 11 ++++++-----
> tools/debugging/kernel-chktaint | 2 +-
> 2 files changed, 7 insertions(+), 6 deletions(-)

Please fix the text in Documentation/admin-guide/sysctl/kernel.rst also.

> diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
> index f718a2eaf1f6..95f432c43ba0 100644
> --- a/Documentation/admin-guide/tainted-kernels.rst
> +++ b/Documentation/admin-guide/tainted-kernels.rst
> @@ -84,7 +84,7 @@ Bit Log Number Reason that got the kernel tainted
> === === ====== ========================================================
> 0 G/P 1 proprietary module was loaded
> 1 _/F 2 module was force loaded
> - 2 _/S 4 SMP kernel oops on an officially SMP incapable processor
> + 2 _/S 4 kernel running on out of spec processor
> 3 _/R 8 module was force unloaded
> 4 _/M 16 processor reported a Machine Check Exception (MCE)
> 5 _/B 32 bad page referenced or some unexpected page flags
> @@ -116,10 +116,11 @@ More detailed explanation for tainting
> 1) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
> modules were loaded normally.
>
> - 2) ``S`` if the oops occurred on an SMP kernel running on hardware that
> - hasn't been certified as safe to run multiprocessor.
> - Currently this occurs only on various Athlons that are not
> - SMP capable.
> + 2) ``S`` if the kernel is running on any processor that is out of
> + specifications (writing to MSRs will trigger this behavior).

People might wonder what "out of specifications" means. I'd say
something along the lines of "the CPU has been put into a not supported
configuration, therefore proper execution cannot be guaranteed". Grep
the tree for TAINT_CPU_OUT_OF_SPEC to see when this gets set, might give
you a better idea of what to say.

> + Historically, it could also be if an oops occured on a kernel running on
> + hardware that hasn't been certified as safe to run multiprocessor, such
> + as various Athlons that are not SMP capable.

And here you can expand on the examples by saying that poking at random
MSRs from userspace is one possible way to mis-configure it.

> 3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
> modules were unloaded normally.
> diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
> index 2240cb56e6e5..0b9d93e27910 100755
> --- a/tools/debugging/kernel-chktaint
> +++ b/tools/debugging/kernel-chktaint
> @@ -72,7 +72,7 @@ if [ `expr $T % 2` -eq 0 ]; then
> addout " "
> else
> addout "S"
> - echo " * SMP kernel oops on an officially SMP incapable processor (#2)"
> + echo " * kernel running on out of spec processor (#2)"

Yeah, can you think of a better formulation than "out of spec
processor"?

The CPU is fine, only its current configuration is not.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH] x86/msr: Filter MSR writes

On Wed, Nov 18, 2020 at 06:50:48PM +0100, Borislav Petkov wrote:
> Please fix the text in Documentation/admin-guide/sysctl/kernel.rst also.

Done.

> People might wonder what "out of specifications" means. I'd say
> something along the lines of "the CPU has been put into a not supported
> configuration, therefore proper execution cannot be guaranteed". Grep
> the tree for TAINT_CPU_OUT_OF_SPEC to see when this gets set, might give
> you a better idea of what to say.

Did the grep thing and it showed it's not just a CPU thing as drivers
can also set that flag hence why I use "system" instead of CPU now.

> And here you can expand on the examples by saying that poking at random
> MSRs from userspace is one possible way to mis-configure it.

I added almost all examples I found to the documentation.

> Yeah, can you think of a better formulation than "out of spec
> processor"?
>
> The CPU is fine, only its current configuration is not.

Given it can be something else than just a CPU thing which tainted the
kernel, I use out of spec system as there are too many cases to have a
clear simple short definition. I mean I've yet to find a better explanation...

So what about that patch?

---
TAINT_CPU_OUT_OF_SPEC now means what a bit more than what it implies as
the flag isn't set just because of a CPU misconfiguration or mismatch.
Historically it was for SMP kernel oops on an officially SMP incapable
processor but now it also covers CPUs whose MSRs have been incorrectly
poked at from userspace, drivers being used on non supported
architectures, broken firmware, mismatched CPUs, ...

Update documentation and script to reflect that.

Signed-off-by: Mathieu Chouquet-Stringer <[email protected]>
---
Documentation/admin-guide/sysctl/kernel.rst | 2 +-
Documentation/admin-guide/tainted-kernels.rst | 22 +++++++++++++++++-----
tools/debugging/kernel-chktaint | 2 +-
3 files changed, 19 insertions(+), 7 deletions(-)


diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d4b32cc32bb7..edd89e2d3af7 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1336,7 +1336,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
====== ===== ==============================================================
1 `(P)` proprietary module was loaded
2 `(F)` module was force loaded
- 4 `(S)` SMP kernel oops on an officially SMP incapable processor
+ 4 `(S)` kernel running on an out of specification system
8 `(R)` module was force unloaded
16 `(M)` processor reported a Machine Check Exception (MCE)
32 `(B)` bad page referenced or some unexpected page flags
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index f718a2eaf1f6..5737dfa17cd1 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -84,7 +84,7 @@ Bit Log Number Reason that got the kernel tainted
=== === ====== ========================================================
0 G/P 1 proprietary module was loaded
1 _/F 2 module was force loaded
- 2 _/S 4 SMP kernel oops on an officially SMP incapable processor
+ 2 _/S 4 kernel running on an out of specification system
3 _/R 8 module was force unloaded
4 _/M 16 processor reported a Machine Check Exception (MCE)
5 _/B 32 bad page referenced or some unexpected page flags
@@ -116,10 +116,22 @@ More detailed explanation for tainting
1) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
modules were loaded normally.

- 2) ``S`` if the oops occurred on an SMP kernel running on hardware that
- hasn't been certified as safe to run multiprocessor.
- Currently this occurs only on various Athlons that are not
- SMP capable.
+ 2) ``S`` if the kernel is running on a processor or system that is out of
+ specification: hardware has been put into an unsupported configuration,
+ therefore proper execution cannot be guaranteed.
+ Kernel will be tainted for example if:
+
+ - on x86: you force PAE on intel CPUs, you run a SMP kernel on non
+ officially capable SMP Athlon CPUs, you poke at random MSRs from
+ userspace.
+ - on arm: you run a kernel on certain CPUs (such as Keystone 2) without
+ having certain kernel features enabled.
+ - on arm64: you have mismatched hardware features between CPUs, the
+ bootloader has booted CPUs in different modes.
+ - you use certain drivers on non supported architectures (such as
+ scsi/snic on something else than x86_64, scsi/ips on non
+ x86/x86_64/itanium, have broken firmware settings for the
+ irqchip/irq-gic on arm64 ...).

3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
modules were unloaded normally.
diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
index 2240cb56e6e5..607b2b280945 100755
--- a/tools/debugging/kernel-chktaint
+++ b/tools/debugging/kernel-chktaint
@@ -72,7 +72,7 @@ if [ `expr $T % 2` -eq 0 ]; then
addout " "
else
addout "S"
- echo " * SMP kernel oops on an officially SMP incapable processor (#2)"
+ echo " * kernel running on an out of specification system (#2)"
fi

T=`expr $T / 2`

Subject: Re: [PATCH] x86/msr: Filter MSR writes

Hey Borislav,

On Thu, Nov 19, 2020 at 11:53:44AM +0100, Mathieu Chouquet-Stringer wrote:
> On Wed, Nov 18, 2020 at 06:50:48PM +0100, Borislav Petkov wrote:
> > Please fix the text in Documentation/admin-guide/sysctl/kernel.rst also.
>
> Done.

I haven't heard from you, what did you think of my last patch?

--
Mathieu Chouquet-Stringer [email protected]
The sun itself sees not till heaven clears.
-- William Shakespeare --

2020-11-26 11:40:18

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Filter MSR writes

On Thu, Nov 19, 2020 at 11:53:44AM +0100, Mathieu Chouquet-Stringer wrote:
> ---
> TAINT_CPU_OUT_OF_SPEC now means what a bit more than what it implies as

s/now means what a bit/now means a bit/

> the flag isn't set just because of a CPU misconfiguration or mismatch.
> Historically it was for SMP kernel oops on an officially SMP incapable
> processor but now it also covers CPUs whose MSRs have been incorrectly
> poked at from userspace, drivers being used on non supported
> architectures, broken firmware, mismatched CPUs, ...
>
> Update documentation and script to reflect that.
>
> Signed-off-by: Mathieu Chouquet-Stringer <[email protected]>
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 2 +-
> Documentation/admin-guide/tainted-kernels.rst | 22 +++++++++++++++++-----
> tools/debugging/kernel-chktaint | 2 +-
> 3 files changed, 19 insertions(+), 7 deletions(-)
>
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index d4b32cc32bb7..edd89e2d3af7 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -1336,7 +1336,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
> ====== ===== ==============================================================
> 1 `(P)` proprietary module was loaded
> 2 `(F)` module was force loaded
> - 4 `(S)` SMP kernel oops on an officially SMP incapable processor
> + 4 `(S)` kernel running on an out of specification system
> 8 `(R)` module was force unloaded
> 16 `(M)` processor reported a Machine Check Exception (MCE)
> 32 `(B)` bad page referenced or some unexpected page flags
> diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
> index f718a2eaf1f6..5737dfa17cd1 100644
> --- a/Documentation/admin-guide/tainted-kernels.rst
> +++ b/Documentation/admin-guide/tainted-kernels.rst
> @@ -84,7 +84,7 @@ Bit Log Number Reason that got the kernel tainted
> === === ====== ========================================================
> 0 G/P 1 proprietary module was loaded
> 1 _/F 2 module was force loaded
> - 2 _/S 4 SMP kernel oops on an officially SMP incapable processor
> + 2 _/S 4 kernel running on an out of specification system
> 3 _/R 8 module was force unloaded
> 4 _/M 16 processor reported a Machine Check Exception (MCE)
> 5 _/B 32 bad page referenced or some unexpected page flags
> @@ -116,10 +116,22 @@ More detailed explanation for tainting
> 1) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
> modules were loaded normally.
>
> - 2) ``S`` if the oops occurred on an SMP kernel running on hardware that
> - hasn't been certified as safe to run multiprocessor.
> - Currently this occurs only on various Athlons that are not
> - SMP capable.
> + 2) ``S`` if the kernel is running on a processor or system that is out of
> + specification: hardware has been put into an unsupported configuration,
> + therefore proper execution cannot be guaranteed.
> + Kernel will be tainted for example if:

"Kernel will be tainted if, for example:"

> +
> + - on x86: you force PAE on intel CPUs, you run a SMP kernel on non

"user forces PAE on some Intel CPUs which do not advertize it by default"
- otherwise it doesn't make sense.

Also, please use passive voice: no "we" or "I", etc. Look at how the
rest of the text in that file is written and try to mimick it and get
rid of the "you" formulations.

> + officially capable SMP Athlon CPUs, you poke at random MSRs from
> + userspace.
> + - on arm: you run a kernel on certain CPUs (such as Keystone 2) without
> + having certain kernel features enabled.
> + - on arm64: you have mismatched hardware features between CPUs, the
> + bootloader has booted CPUs in different modes.
> + - you use certain drivers on non supported architectures (such as
> + scsi/snic on something else than x86_64, scsi/ips on non
> + x86/x86_64/itanium, have broken firmware settings for the
> + irqchip/irq-gic on arm64 ...).

Yes, the text should be giving the idea that the configuration the
system is running on, is not supported.

All in all, this is shaping up ok, you can send the next one as
a proper patch with the people on Cc who should take it - (use
./scripts/get_maintainer.pl for that) and not as an inline diff anymore.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette