2021-03-30 08:29:25

by Feng Tang

[permalink] [raw]
Subject: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

Normally the tsc_sync will be checked every time system enters idle state,
but there is still caveat that a system won't enter idle, either because
it's too busy or configured purposely to not enter idle. Setup a periodic
timer to make sure the check is always on.

Signed-off-by: Feng Tang <[email protected]>
---
arch/x86/kernel/tsc_sync.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 3d3c761..1ba3941 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -30,6 +30,7 @@ struct tsc_adjust {
};

static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+static struct timer_list tsc_sync_check_timer;

/*
* TSC's on different sockets may be reset asynchronously.
@@ -77,6 +78,43 @@ void tsc_verify_tsc_adjust(bool resume)
}
}

+/*
+ * Normally the tsc_sync will be checked every time system enters idle state,
+ * but there is still caveat that a system won't enter idle, either because
+ * it's too busy or configured purposely to not enter idle.
+ *
+ * So setup a periodic timer to make sure the check is always on.
+ */
+
+#define SYNC_CHECK_INTERVAL (HZ * 600)
+static void tsc_sync_check_timer_fn(struct timer_list *unused)
+{
+ int next_cpu;
+
+ tsc_verify_tsc_adjust(false);
+
+ /* Loop to do the check for all onlined CPUs */
+ next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
+ if (next_cpu >= nr_cpu_ids)
+ next_cpu = cpumask_first(cpu_online_mask);
+
+ tsc_sync_check_timer.expires += SYNC_CHECK_INTERVAL;
+ add_timer_on(&tsc_sync_check_timer, next_cpu);
+}
+
+static int __init start_sync_check_timer(void)
+{
+ if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+ return 0;
+
+ timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
+ tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
+ add_timer(&tsc_sync_check_timer);
+
+ return 0;
+}
+late_initcall(start_sync_check_timer);
+
static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval,
unsigned int cpu, bool bootcpu)
{
--
2.7.4


2021-04-10 09:27:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

On Tue, Mar 30 2021 at 16:25, Feng Tang wrote:
> Normally the tsc_sync will be checked every time system enters idle state,
> but there is still caveat that a system won't enter idle, either because
> it's too busy or configured purposely to not enter idle. Setup a periodic
> timer to make sure the check is always on.

Bah. I really hate the fact that we don't have a knob to disable writes
to the TSC/TSC_ADJUST msrs. That would spare this business alltogether.

> +/*
> + * Normally the tsc_sync will be checked every time system enters idle state,
> + * but there is still caveat that a system won't enter idle, either because
> + * it's too busy or configured purposely to not enter idle.
> + *
> + * So setup a periodic timer to make sure the check is always on.
> + */
> +
> +#define SYNC_CHECK_INTERVAL (HZ * 600)
> +static void tsc_sync_check_timer_fn(struct timer_list *unused)

I've surely mentioned this before that glueing a define without an empty
newline to a function definition is horrible to read.

> +{
> + int next_cpu;
> +
> + tsc_verify_tsc_adjust(false);
> +
> + /* Loop to do the check for all onlined CPUs */

I don't see a loop here.

> + next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);

Why raw_smp_processor_id()? What's wrong with smp_processor_id()?

> + if (next_cpu >= nr_cpu_ids)
> + next_cpu = cpumask_first(cpu_online_mask);
> +
> + tsc_sync_check_timer.expires += SYNC_CHECK_INTERVAL;
> + add_timer_on(&tsc_sync_check_timer, next_cpu);
> +}
> +
> +static int __init start_sync_check_timer(void)
> +{
> + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> + return 0;
> +
> + timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
> + tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
> + add_timer(&tsc_sync_check_timer);
> +
> + return 0;
> +}
> +late_initcall(start_sync_check_timer);

So right now, if someone adds 'tsc=reliable' on the kernel command line
then all of the watchdog checking, except for the idle enter TSC_ADJUST
check is disabled. The NOHZ full people are probably going to be pretty
unhappy about yet another unconditional timer they have to chase down.

So this needs some more thought.

Thanks,

tglx

2021-04-10 09:49:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

On Sat, Apr 10, 2021 at 11:27:11AM +0200, Thomas Gleixner wrote:
> On Tue, Mar 30 2021 at 16:25, Feng Tang wrote:
> > Normally the tsc_sync will be checked every time system enters idle state,
> > but there is still caveat that a system won't enter idle, either because
> > it's too busy or configured purposely to not enter idle. Setup a periodic
> > timer to make sure the check is always on.
>
> Bah. I really hate the fact that we don't have a knob to disable writes
> to the TSC/TSC_ADJUST msrs. That would spare this business alltogether.

We have the MSR filtering and I'd *love* to add those MSRs to a
permanent ban list of MSRs which will never ever be written to from
luserspace.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-10 12:12:32

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH] x86/msr: Block writes to certain MSRs unconditionally

From: Borislav Petkov <[email protected]>
Date: Sat, 10 Apr 2021 14:08:13 +0200

There are a bunch of MSRs which luserspace has no business poking at,
whatsoever. Add a ban list and put the TSC-related MSRs in there. Issue
a big juicy splat to catch offenders.

Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/msr.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index ed8ac6bcbafb..574bd2c6d4f8 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -78,6 +78,13 @@ static ssize_t msr_read(struct file *file, char __user *buf,
return bytes ? bytes : err;
}

+static const u32 msr_ban_list[] = {
+ MSR_IA32_TSC,
+ MSR_TSC_AUX,
+ MSR_IA32_TSC_ADJUST,
+ MSR_IA32_TSC_DEADLINE,
+};
+
static int filter_write(u32 reg)
{
/*
@@ -89,6 +96,16 @@ static int filter_write(u32 reg)
* avoid saturating the ring buffer.
*/
static DEFINE_RATELIMIT_STATE(fw_rs, 30 * HZ, 1);
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(msr_ban_list); i++) {
+ if (msr_ban_list[i] != reg)
+ continue;
+
+ WARN_ONCE(1, "Blocked write to MSR 0x%x\n", reg);
+
+ return -EINVAL;
+ }

switch (allow_writes) {
case MSR_WRITES_ON: return 0;
--
2.29.2


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-10 14:38:56

by Feng Tang

[permalink] [raw]
Subject: Re: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

Hi Thomas,

On Sat, Apr 10, 2021 at 11:27:11AM +0200, Thomas Gleixner wrote:
> On Tue, Mar 30 2021 at 16:25, Feng Tang wrote:
> > Normally the tsc_sync will be checked every time system enters idle state,
> > but there is still caveat that a system won't enter idle, either because
> > it's too busy or configured purposely to not enter idle. Setup a periodic
> > timer to make sure the check is always on.
>
> Bah. I really hate the fact that we don't have a knob to disable writes
> to the TSC/TSC_ADJUST msrs. That would spare this business alltogether.
>
> > +/*
> > + * Normally the tsc_sync will be checked every time system enters idle state,
> > + * but there is still caveat that a system won't enter idle, either because
> > + * it's too busy or configured purposely to not enter idle.
> > + *
> > + * So setup a periodic timer to make sure the check is always on.
> > + */
> > +
> > +#define SYNC_CHECK_INTERVAL (HZ * 600)
> > +static void tsc_sync_check_timer_fn(struct timer_list *unused)
>
> I've surely mentioned this before that glueing a define without an empty
> newline to a function definition is horrible to read.

Got it, will add a newline.

> > +{
> > + int next_cpu;
> > +
> > + tsc_verify_tsc_adjust(false);
> > +
> > + /* Loop to do the check for all onlined CPUs */
>
> I don't see a loop here.

I meant to loop all onlined CPUs, and the comment could be
changed to

/* Run the check for all onlined CPUs in turn */

> > + next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
>
> Why raw_smp_processor_id()? What's wrong with smp_processor_id()?

Will change to smp_processor_id() for this timer function.

> > + if (next_cpu >= nr_cpu_ids)
> > + next_cpu = cpumask_first(cpu_online_mask);
> > +
> > + tsc_sync_check_timer.expires += SYNC_CHECK_INTERVAL;
> > + add_timer_on(&tsc_sync_check_timer, next_cpu);
> > +}
> > +
> > +static int __init start_sync_check_timer(void)
> > +{
> > + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> > + return 0;
> > +
> > + timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
> > + tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
> > + add_timer(&tsc_sync_check_timer);
> > +
> > + return 0;
> > +}
> > +late_initcall(start_sync_check_timer);
>
> So right now, if someone adds 'tsc=reliable' on the kernel command line
> then all of the watchdog checking, except for the idle enter TSC_ADJUST
> check is disabled. The NOHZ full people are probably going to be pretty
> unhappy about yet another unconditional timer they have to chase down.
>
> So this needs some more thought.

'tsc=reliable' in cmdline will set 'tsc_clocksource_reliable' to 1, so
we can skip starting this timer if 'tsc_clocksource_reliable==1' ?

Thanks,
Feng

>
> Thanks,
>
> tglx

2021-04-10 14:50:32

by Feng Tang

[permalink] [raw]
Subject: Re: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

Hi Boris,

On Sat, Apr 10, 2021 at 11:47:52AM +0200, Borislav Petkov wrote:
> On Sat, Apr 10, 2021 at 11:27:11AM +0200, Thomas Gleixner wrote:
> > On Tue, Mar 30 2021 at 16:25, Feng Tang wrote:
> > > Normally the tsc_sync will be checked every time system enters idle state,
> > > but there is still caveat that a system won't enter idle, either because
> > > it's too busy or configured purposely to not enter idle. Setup a periodic
> > > timer to make sure the check is always on.
> >
> > Bah. I really hate the fact that we don't have a knob to disable writes
> > to the TSC/TSC_ADJUST msrs. That would spare this business alltogether.
>
> We have the MSR filtering and I'd *love* to add those MSRs to a
> permanent ban list of MSRs which will never ever be written to from
> luserspace.

Yep, I just tried changing TSC_ADJUST msr with 'wrmsr' command, and
it did succeed and trigger the warning of tsc_verify_tsc_adjust():

[ 1135.387866] [Firmware Bug]: TSC ADJUST differs: CPU0 0 --> 4096. Restoring

And the bigger risk is still BIOS's writing to TSC_ADJUST MSR beneath
kernel.

Thanks,
Feng

> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2021-04-10 14:53:25

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Block writes to certain MSRs unconditionally



> On Apr 10, 2021, at 5:11 AM, Borislav Petkov <[email protected]> wrote:
>
> From: Borislav Petkov <[email protected]>
> Date: Sat, 10 Apr 2021 14:08:13 +0200
>
> There are a bunch of MSRs which luserspace has no business poking at,
> whatsoever. Add a ban list and put the TSC-related MSRs in there. Issue
> a big juicy splat to catch offenders.

Ack!

Can you add STAR, CSTAR, LSTAR, SYSENTER*, SYSCALL*, and EFER please?

>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> arch/x86/kernel/msr.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
> index ed8ac6bcbafb..574bd2c6d4f8 100644
> --- a/arch/x86/kernel/msr.c
> +++ b/arch/x86/kernel/msr.c
> @@ -78,6 +78,13 @@ static ssize_t msr_read(struct file *file, char __user *buf,
> return bytes ? bytes : err;
> }
>
> +static const u32 msr_ban_list[] = {
> + MSR_IA32_TSC,
> + MSR_TSC_AUX,
> + MSR_IA32_TSC_ADJUST,
> + MSR_IA32_TSC_DEADLINE,
> +};
> +
> static int filter_write(u32 reg)
> {
> /*
> @@ -89,6 +96,16 @@ static int filter_write(u32 reg)
> * avoid saturating the ring buffer.
> */
> static DEFINE_RATELIMIT_STATE(fw_rs, 30 * HZ, 1);
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(msr_ban_list); i++) {
> + if (msr_ban_list[i] != reg)
> + continue;
> +
> + WARN_ONCE(1, "Blocked write to MSR 0x%x\n", reg);
> +
> + return -EINVAL;
> + }
>
> switch (allow_writes) {
> case MSR_WRITES_ON: return 0;
> --
> 2.29.2
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2021-04-10 15:35:03

by Borislav Petkov

[permalink] [raw]
Subject: [PATCH -v1.1] x86/msr: Block writes to certain MSRs unconditionally

On Sat, Apr 10, 2021 at 07:51:58AM -0700, Andy Lutomirski wrote:
> Can you add STAR, CSTAR, LSTAR, SYSENTER*, SYSCALL*, and EFER please?

Sure.

---

From: Borislav Petkov <[email protected]>
Date: Sat, 10 Apr 2021 14:08:13 +0200

There are a bunch of MSRs which luserspace has no business poking
at, whatsoever. Add a ban list and put the TSC-related MSRs and the
ring0-entry and features control MSRs in there. Issue a big juicy splat
to catch offenders.

Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kernel/msr.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index ed8ac6bcbafb..2435a619cd9f 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -78,6 +78,21 @@ static ssize_t msr_read(struct file *file, char __user *buf,
return bytes ? bytes : err;
}

+static const u32 msr_ban_list[] = {
+ MSR_IA32_TSC,
+ MSR_TSC_AUX,
+ MSR_IA32_TSC_ADJUST,
+ MSR_IA32_TSC_DEADLINE,
+ MSR_EFER,
+ MSR_STAR,
+ MSR_CSTAR,
+ MSR_LSTAR,
+ MSR_SYSCALL_MASK,
+ MSR_IA32_SYSENTER_CS,
+ MSR_IA32_SYSENTER_ESP,
+ MSR_IA32_SYSENTER_EIP,
+};
+
static int filter_write(u32 reg)
{
/*
@@ -89,6 +104,16 @@ static int filter_write(u32 reg)
* avoid saturating the ring buffer.
*/
static DEFINE_RATELIMIT_STATE(fw_rs, 30 * HZ, 1);
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(msr_ban_list); i++) {
+ if (msr_ban_list[i] != reg)
+ continue;
+
+ WARN_ONCE(1, "Blocked write to MSR 0x%x.\n", reg);
+
+ return -EINVAL;
+ }

switch (allow_writes) {
case MSR_WRITES_ON: return 0;
--
2.29.2

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-10 15:39:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

On Sat, Apr 10, 2021 at 10:48:56PM +0800, Feng Tang wrote:
> And the bigger risk is still BIOS's writing to TSC_ADJUST MSR beneath
> kernel.

For that we need to do more persuasive work with hw guys. Needs a *lot*
of perseverance.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-10 18:44:49

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

On Sat, Apr 10 2021 at 11:47, Borislav Petkov wrote:

> On Sat, Apr 10, 2021 at 11:27:11AM +0200, Thomas Gleixner wrote:
>> On Tue, Mar 30 2021 at 16:25, Feng Tang wrote:
>> > Normally the tsc_sync will be checked every time system enters idle state,
>> > but there is still caveat that a system won't enter idle, either because
>> > it's too busy or configured purposely to not enter idle. Setup a periodic
>> > timer to make sure the check is always on.
>>
>> Bah. I really hate the fact that we don't have a knob to disable writes
>> to the TSC/TSC_ADJUST msrs. That would spare this business alltogether.
>
> We have the MSR filtering and I'd *love* to add those MSRs to a
> permanent ban list of MSRs which will never ever be written to from
> luserspace.

That's good, but what I really want is a knob which prevents BIOS/SMM
from writing to it. The only reason why BIOS ever needs to write is for
physical hotplug and perhaps for 4+ socket machines on boot. After that
every write is a bug.

Thanks,

tglx

2021-04-10 18:47:52

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

Feng,

On Sat, Apr 10 2021 at 22:38, Feng Tang wrote:
> On Sat, Apr 10, 2021 at 11:27:11AM +0200, Thomas Gleixner wrote:
>> > +static int __init start_sync_check_timer(void)
>> > +{
>> > + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
>> > + return 0;
>> > +
>> > + timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
>> > + tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
>> > + add_timer(&tsc_sync_check_timer);
>> > +
>> > + return 0;
>> > +}
>> > +late_initcall(start_sync_check_timer);
>>
>> So right now, if someone adds 'tsc=reliable' on the kernel command line
>> then all of the watchdog checking, except for the idle enter TSC_ADJUST
>> check is disabled. The NOHZ full people are probably going to be pretty
>> unhappy about yet another unconditional timer they have to chase down.
>>
>> So this needs some more thought.
>
> 'tsc=reliable' in cmdline will set 'tsc_clocksource_reliable' to 1, so
> we can skip starting this timer if 'tsc_clocksource_reliable==1' ?

Then we can just ignore that patch alltogether because of 2/2 doing:

+ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+ boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+ boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+ nr_online_nodes <= 2)
+ tsc_clocksource_reliable = 1;

....

I said for a reason:

>> So this needs some more thought.

Thanks,

tglx

2021-04-10 18:56:01

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Block writes to certain MSRs unconditionally

Borislav Petkov <[email protected]> writes:

> From: Borislav Petkov <[email protected]>
> Date: Sat, 10 Apr 2021 14:08:13 +0200
>
> There are a bunch of MSRs which luserspace has no business poking at,
> whatsoever. Add a ban list and put the TSC-related MSRs in there. Issue
> a big juicy splat to catch offenders.

Have you ever seen any user programs actually write those MSRs?
I don't see why they ever would, it's not that they have any motivation
to do it (unlike SMM), and I don't know of any examples.

The whole MSR blocking seems more like a tilting at windmills
type effort. Root kits typically write from the kernel anyways. And the
only results we have so far is various legitimate debug
and benchmark utilities running much slower due to them flooding the
kernel log with warnings.

I can see that there are security reasons to lock down MSRs, but that is
already handled fine with existing sandbox and lockdown mechanisms. But
on a non locked down system fully accessible MSRs are really useful for
all kind of debugging and tuning, and anything that prevents that
is bad.

-Andi

2021-04-11 07:22:11

by Feng Tang

[permalink] [raw]
Subject: Re: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

On Sat, Apr 10, 2021 at 08:46:38PM +0200, Thomas Gleixner wrote:
> Feng,
>
> On Sat, Apr 10 2021 at 22:38, Feng Tang wrote:
> > On Sat, Apr 10, 2021 at 11:27:11AM +0200, Thomas Gleixner wrote:
> >> > +static int __init start_sync_check_timer(void)
> >> > +{
> >> > + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> >> > + return 0;
> >> > +
> >> > + timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
> >> > + tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
> >> > + add_timer(&tsc_sync_check_timer);
> >> > +
> >> > + return 0;
> >> > +}
> >> > +late_initcall(start_sync_check_timer);
> >>
> >> So right now, if someone adds 'tsc=reliable' on the kernel command line
> >> then all of the watchdog checking, except for the idle enter TSC_ADJUST
> >> check is disabled. The NOHZ full people are probably going to be pretty
> >> unhappy about yet another unconditional timer they have to chase down.
> >>
> >> So this needs some more thought.
> >
> > 'tsc=reliable' in cmdline will set 'tsc_clocksource_reliable' to 1, so
> > we can skip starting this timer if 'tsc_clocksource_reliable==1' ?
>
> Then we can just ignore that patch alltogether because of 2/2 doing:
>
> + if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> + boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> + boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> + nr_online_nodes <= 2)
> + tsc_clocksource_reliable = 1;
>
> ....
>
> I said for a reason:

Sorry, I missed that part and should have put more thought on it,
which is much trickier than I thought.

In the very first patch I set 'tsc_clocksource_reliable' to 1 to
try reusing the logic of clearing CLOCK_SOURCE_MUST_VERIFY bit,
and now we may need to decouple them.

One thing I can think of now is something below:

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..bfd013b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1177,6 +1177,12 @@ void mark_tsc_unstable(char *reason)

EXPORT_SYMBOL_GPL(mark_tsc_unstable);

+static void __init tsc_skip_watchdog_verify(void)
+{
+ clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+ clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+}
+
static void __init check_system_tsc_reliable(void)
{
#if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1193,6 +1199,17 @@ static void __init check_system_tsc_reliable(void)
#endif
if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
tsc_clocksource_reliable = 1;
+
+ /*
+ * Ideally the socket number should be checked, but this is called
+ * by tsc_init() which is in early boot phase and the socket numbers
+ * may not be available. Use 'nr_online_nodes' as a fallback solution
+ */
+ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+ boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+ boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+ nr_online_nodes <= 2)
+ tsc_skip_watchdog_verify();
}

/*
@@ -1384,9 +1401,6 @@ static int __init init_tsc_clocksource(void)
if (tsc_unstable)
goto unreg;

- if (tsc_clocksource_reliable || no_tsc_watchdog)
- clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-
if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;

@@ -1524,7 +1538,7 @@ void __init tsc_init(void)
}

if (tsc_clocksource_reliable || no_tsc_watchdog)
- clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+ tsc_skip_watchdog_verify();

clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
detect_art();

Thanks,
Feng

> >> So this needs some more thought.
>
> Thanks,
>
> tglx

2021-04-11 09:42:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Block writes to certain MSRs unconditionally

On Sat, Apr 10, 2021 at 11:52:17AM -0700, Andi Kleen wrote:
> Have you ever seen any user programs actually write those MSRs?
> I don't see why they ever would, it's not that they have any motivation
> to do it (unlike SMM), and I don't know of any examples.

You'd be surprised how much motivation people have to poke at random
MSRs. Just browse some of those tools on github which think poking at
MSRs is ok.

> The whole MSR blocking seems more like a tilting at windmills
> type effort.

Nope, this is trying to salvage the current situation of people thinking
it is a good idea to poke at random MSRs and develop all kinds of tools
around it and then run those tools ontop of a kernel which pokes at
those same MSRs.

It is not really hard to realize that touching resources in an
unsynchronized way is a disaster waiting to happen. No matter how useful
and how wonderful those tools are.

> But on a non locked down system fully accessible MSRs are really
> useful for all kind of debugging and tuning, and anything that
> prevents that is bad.

If you wanna do that, you can just as well patch your kernel.

We're currently tainting the kernel on MSR writes and perhaps that's
good enough for now but if some tool starts doing dumb crap and pokes at
MSRs it should not be poking at and users start complaining because of
it, I'm committing that.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-11 17:45:13

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Block writes to certain MSRs unconditionally

On Sat, Apr 10, 2021 at 11:52 AM Andi Kleen <[email protected]> wrote:
>
> Borislav Petkov <[email protected]> writes:
>
> > From: Borislav Petkov <[email protected]>
> > Date: Sat, 10 Apr 2021 14:08:13 +0200
> >
> > There are a bunch of MSRs which luserspace has no business poking at,
> > whatsoever. Add a ban list and put the TSC-related MSRs in there. Issue
> > a big juicy splat to catch offenders.
>
> Have you ever seen any user programs actually write those MSRs?
> I don't see why they ever would, it's not that they have any motivation
> to do it (unlike SMM), and I don't know of any examples.

I have actually seen real user programs poke MSR_SYSCALL_MASK.

2021-04-11 17:50:12

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Block writes to certain MSRs unconditionally

> I have actually seen real user programs poke MSR_SYSCALL_MASK.

Hmm, what was the use case?

-Andi

2021-04-11 19:48:00

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Block writes to certain MSRs unconditionally




> On Apr 11, 2021, at 9:43 AM, Andi Kleen <[email protected]> wrote:
>
> 
>>
>> I have actually seen real user programs poke MSR_SYSCALL_MASK.
>
> Hmm, what was the use case?
>
>

Working around a kernel bug. The workaround only worked on AMD systems. The correct solution was to fix the kernel bug, not poke MSRs.

2021-04-11 19:50:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Block writes to certain MSRs unconditionally

On Sun, Apr 11, 2021 at 09:57:20AM -0700, Andy Lutomirski wrote:
> Working around a kernel bug. The workaround only worked on AMD
> systems. The correct solution was to fix the kernel bug, not poke
> MSRs.

Do you remember which program(s) and where I can get them to have a
look?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-12 00:02:59

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Block writes to certain MSRs unconditionally

On Sun, Apr 11, 2021 at 10:04 AM Borislav Petkov <[email protected]> wrote:
>
> On Sun, Apr 11, 2021 at 09:57:20AM -0700, Andy Lutomirski wrote:
> > Working around a kernel bug. The workaround only worked on AMD
> > systems. The correct solution was to fix the kernel bug, not poke
> > MSRs.
>
> Do you remember which program(s) and where I can get them to have a
> look?
>

https://bugs.winehq.org/show_bug.cgi?id=33275#c19

I sure hope no one is still doing this.

--Andy

2021-04-12 23:14:51

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/msr: Block writes to certain MSRs unconditionally

On Sun, Apr 11, 2021 at 04:21:21PM -0700, Andy Lutomirski wrote:
> https://bugs.winehq.org/show_bug.cgi?id=33275#c19
>
> I sure hope no one is still doing this.

Aha, IRET with rFLAGS.NT set. At least it is only an ad-hoc program to
fix this particular issue and I hope too it hasn't propagated somewhere
else.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette