Popular tools, like intel-undervolt, use MSR 0x150 to control the CPU
voltage offset. In fact, evidently the intel_turbo_max_3 driver in-tree
also uses this MSR. So, teach the kernel's MSR list about this, so that
intel-undervolt and other such tools don't spew warnings to dmesg, while
unifying the constant used throughout the kernel.
Fixes: a7e1f67ed29f ("x86/msr: Filter MSR writes")
Cc: Borislav Petkov <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/msr.c | 5 ++++-
drivers/platform/x86/intel_turbo_max_3.c | 6 +++---
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 2859ee4f39a8..0bcb3604d0e2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -132,6 +132,8 @@
#define MSR_IA32_MCU_OPT_CTRL 0x00000123
#define RNGDS_MITG_DIS BIT(0)
+#define MSR_IA32_OC_MAILBOX 0x00000150
+
#define MSR_IA32_SYSENTER_CS 0x00000174
#define MSR_IA32_SYSENTER_ESP 0x00000175
#define MSR_IA32_SYSENTER_EIP 0x00000176
diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 49dcfb85e773..64f6200681e3 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -86,8 +86,11 @@ static int filter_write(u32 reg)
default: break;
}
- if (reg == MSR_IA32_ENERGY_PERF_BIAS)
+ switch (reg) {
+ case MSR_IA32_ENERGY_PERF_BIAS:
+ case MSR_IA32_OC_MAILBOX:
return 0;
+ }
pr_err_ratelimited("Write to unrecognized MSR 0x%x by %s\n"
"Please report to [email protected]\n",
diff --git a/drivers/platform/x86/intel_turbo_max_3.c b/drivers/platform/x86/intel_turbo_max_3.c
index 892140b62898..991cdbc3295b 100644
--- a/drivers/platform/x86/intel_turbo_max_3.c
+++ b/drivers/platform/x86/intel_turbo_max_3.c
@@ -17,8 +17,8 @@
#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
+#include <asm/msr.h>
-#define MSR_OC_MAILBOX 0x150
#define MSR_OC_MAILBOX_CMD_OFFSET 32
#define MSR_OC_MAILBOX_RSP_OFFSET 32
#define MSR_OC_MAILBOX_BUSY_BIT 63
@@ -41,14 +41,14 @@ static int get_oc_core_priority(unsigned int cpu)
value = cmd << MSR_OC_MAILBOX_CMD_OFFSET;
/* Set the busy bit to indicate OS is trying to issue command */
value |= BIT_ULL(MSR_OC_MAILBOX_BUSY_BIT);
- ret = wrmsrl_safe(MSR_OC_MAILBOX, value);
+ ret = wrmsrl_safe(MSR_IA32_OC_MAILBOX, value);
if (ret) {
pr_debug("cpu %d OC mailbox write failed\n", cpu);
return ret;
}
for (i = 0; i < OC_MAILBOX_RETRY_COUNT; ++i) {
- ret = rdmsrl_safe(MSR_OC_MAILBOX, &value);
+ ret = rdmsrl_safe(MSR_IA32_OC_MAILBOX, &value);
if (ret) {
pr_debug("cpu %d OC mailbox read failed\n", cpu);
break;
--
2.28.0
+ Srinivas.
+ kitsunyan.
On Mon, Sep 07, 2020 at 11:48:43AM +0200, Jason A. Donenfeld wrote:
> Popular tools, like intel-undervolt, use MSR 0x150 to control the CPU
> voltage offset. In fact, evidently the intel_turbo_max_3 driver in-tree
> also uses this MSR. So, teach the kernel's MSR list about this, so that
> intel-undervolt and other such tools don't spew warnings to dmesg, while
> unifying the constant used throughout the kernel.
>
> Fixes: a7e1f67ed29f ("x86/msr: Filter MSR writes")
> Cc: Borislav Petkov <[email protected]>
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> ---
> arch/x86/include/asm/msr-index.h | 2 ++
> arch/x86/kernel/msr.c | 5 ++++-
> drivers/platform/x86/intel_turbo_max_3.c | 6 +++---
> 3 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 2859ee4f39a8..0bcb3604d0e2 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -132,6 +132,8 @@
> #define MSR_IA32_MCU_OPT_CTRL 0x00000123
> #define RNGDS_MITG_DIS BIT(0)
>
> +#define MSR_IA32_OC_MAILBOX 0x00000150
> +
> #define MSR_IA32_SYSENTER_CS 0x00000174
> #define MSR_IA32_SYSENTER_ESP 0x00000175
> #define MSR_IA32_SYSENTER_EIP 0x00000176
> diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
> index 49dcfb85e773..64f6200681e3 100644
> --- a/arch/x86/kernel/msr.c
> +++ b/arch/x86/kernel/msr.c
> @@ -86,8 +86,11 @@ static int filter_write(u32 reg)
> default: break;
> }
>
> - if (reg == MSR_IA32_ENERGY_PERF_BIAS)
> + switch (reg) {
> + case MSR_IA32_ENERGY_PERF_BIAS:
> + case MSR_IA32_OC_MAILBOX:
> return 0;
> + }
>
> pr_err_ratelimited("Write to unrecognized MSR 0x%x by %s\n"
> "Please report to [email protected]\n",
> diff --git a/drivers/platform/x86/intel_turbo_max_3.c b/drivers/platform/x86/intel_turbo_max_3.c
> index 892140b62898..991cdbc3295b 100644
> --- a/drivers/platform/x86/intel_turbo_max_3.c
> +++ b/drivers/platform/x86/intel_turbo_max_3.c
> @@ -17,8 +17,8 @@
>
> #include <asm/cpu_device_id.h>
> #include <asm/intel-family.h>
> +#include <asm/msr.h>
>
> -#define MSR_OC_MAILBOX 0x150
> #define MSR_OC_MAILBOX_CMD_OFFSET 32
> #define MSR_OC_MAILBOX_RSP_OFFSET 32
> #define MSR_OC_MAILBOX_BUSY_BIT 63
> @@ -41,14 +41,14 @@ static int get_oc_core_priority(unsigned int cpu)
> value = cmd << MSR_OC_MAILBOX_CMD_OFFSET;
> /* Set the busy bit to indicate OS is trying to issue command */
> value |= BIT_ULL(MSR_OC_MAILBOX_BUSY_BIT);
> - ret = wrmsrl_safe(MSR_OC_MAILBOX, value);
> + ret = wrmsrl_safe(MSR_IA32_OC_MAILBOX, value);
> if (ret) {
> pr_debug("cpu %d OC mailbox write failed\n", cpu);
> return ret;
> }
>
> for (i = 0; i < OC_MAILBOX_RETRY_COUNT; ++i) {
> - ret = rdmsrl_safe(MSR_OC_MAILBOX, &value);
> + ret = rdmsrl_safe(MSR_IA32_OC_MAILBOX, &value);
> if (ret) {
> pr_debug("cpu %d OC mailbox read failed\n", cpu);
> break;
> --
Actually, we added the filtering to catch exactly such misuses and,
lemme check what is the proper word now... /me checks, aha, adding new
MSRs to the "passlist" is the wrong thing to do.
Srinivas, can you pls convert this in-tree driver to use a proper sysfs
interface for that mailbox MSR and also work with the intel-undervolt
author - I hope I have the right person CCed from the git repo on github
- to come up with a proper interface so that we can drop this MSR use
too.
Thx.
--
Regards/Gruss,
Boris.
SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
Hi Borislav,
On Mon, Sep 7, 2020 at 12:06 PM Borislav Petkov <[email protected]> wrote:
>
> + Srinivas.
> + kitsunyan.
>
> On Mon, Sep 07, 2020 at 11:48:43AM +0200, Jason A. Donenfeld wrote:
> > Popular tools, like intel-undervolt, use MSR 0x150 to control the CPU
> > voltage offset. In fact, evidently the intel_turbo_max_3 driver in-tree
> > also uses this MSR. So, teach the kernel's MSR list about this, so that
> > intel-undervolt and other such tools don't spew warnings to dmesg, while
> > unifying the constant used throughout the kernel.
> >
> > Fixes: a7e1f67ed29f ("x86/msr: Filter MSR writes")
> > Cc: Borislav Petkov <[email protected]>
> > Signed-off-by: Jason A. Donenfeld <[email protected]>
> > ---
> > arch/x86/include/asm/msr-index.h | 2 ++
> > arch/x86/kernel/msr.c | 5 ++++-
> > drivers/platform/x86/intel_turbo_max_3.c | 6 +++---
> > 3 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> > - if (reg == MSR_IA32_ENERGY_PERF_BIAS)
> > + switch (reg) {
> > + case MSR_IA32_ENERGY_PERF_BIAS:
> > + case MSR_IA32_OC_MAILBOX:
> > return 0;
> > + }
> Actually, we added the filtering to catch exactly such misuses and,
Are you sure that intel-undervolt using OC_MAILBOX from userspace is
actually a "misuse"? Should the kernel or kernel drivers actually be
involved with the task of underclocking? This seems pretty squarely in
the realm of "hobbyists poking and prodding at their CPUs" rather than
something made for a kernel driver, right? Also, what was the
justification for whitelisting MSR_IA32_ENERGY_PERF_BIAS?
Jason
Hi,
On Mon, Sep 07, 2020 at 12:46:35PM +0200, Jason A. Donenfeld wrote:
> Are you sure that intel-undervolt using OC_MAILBOX from userspace is
> actually a "misuse"? Should the kernel or kernel drivers actually be
> involved with the task of underclocking? This seems pretty squarely in
> the realm of "hobbyists poking and prodding at their CPUs" rather than
> something made for a kernel driver, right?
The only thing I'm sure is that *if* it makes sense for any driver to
control something in the hardware over MSRs, it should *not* poke at
naked MSRs but use a proper interface.
I'd leave it to the people who actually need this interface, to explain
why they do.
> Also, what was the justification for whitelisting
> MSR_IA32_ENERGY_PERF_BIAS?
That's:
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
Once that thing gets converted to a proper interface too, that MSR goes
off the allowlist too.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Sep 07, 2020 at 01:15:14PM +0200, Jason A. Donenfeld wrote:
> Gotcha. So your perspective is that the goal is actually to have no
> list at all in the end, because all MSR writes should go through sysfs
> interfaces and such, always? I certainly like that goal -- it'd make a
> whole lot of CPU functionality a lot more discoverable and easier to
> tinker with. In practice, it seems like that's a hard goal to
> accomplish, with different MSRs having different semantics and
> meanings of different bit offsets, and a great deal of them aren't
> actually publicly documented by Intel. Were you hoping to just handle
> these piece by piece, and eventually Linux will have a decent
> compendium of MSRs? That sure would be nice.
Yes to all of the above.
The MSRs should not have been exposed to userspace in the first place.
See the commit message of:
a7e1f67ed29f ("x86/msr: Filter MSR writes")
for why not.
> Is Intel on board with that plan?
They better be. Like the other vendors who have MSRs too.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Sep 7, 2020 at 1:11 PM Borislav Petkov <[email protected]> wrote:
>
> Hi,
>
> On Mon, Sep 07, 2020 at 12:46:35PM +0200, Jason A. Donenfeld wrote:
> > Are you sure that intel-undervolt using OC_MAILBOX from userspace is
> > actually a "misuse"? Should the kernel or kernel drivers actually be
> > involved with the task of underclocking? This seems pretty squarely in
> > the realm of "hobbyists poking and prodding at their CPUs" rather than
> > something made for a kernel driver, right?
>
> The only thing I'm sure is that *if* it makes sense for any driver to
> control something in the hardware over MSRs, it should *not* poke at
> naked MSRs but use a proper interface.
>
> I'd leave it to the people who actually need this interface, to explain
> why they do.
>
> > Also, what was the justification for whitelisting
> > MSR_IA32_ENERGY_PERF_BIAS?
>
> That's:
>
> tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
>
> Once that thing gets converted to a proper interface too, that MSR goes
> off the allowlist too.
Gotcha. So your perspective is that the goal is actually to have no
list at all in the end, because all MSR writes should go through sysfs
interfaces and such, always? I certainly like that goal -- it'd make a
whole lot of CPU functionality a lot more discoverable and easier to
tinker with. In practice, it seems like that's a hard goal to
accomplish, with different MSRs having different semantics and
meanings of different bit offsets, and a great deal of them aren't
actually publicly documented by Intel. Were you hoping to just handle
these piece by piece, and eventually Linux will have a decent
compendium of MSRs? That sure would be nice. Is Intel on board with
that plan?
Jason
On Mon, 2020-09-07 at 12:06 +0200, Borislav Petkov wrote:
> + Srinivas.
> + kitsunyan.
>
> On Mon, Sep 07, 2020 at 11:48:43AM +0200, Jason A. Donenfeld wrote:
> > Popular tools, like intel-undervolt, use MSR 0x150 to control the
> > CPU
> > voltage offset. In fact, evidently the intel_turbo_max_3 driver in-
> > tree
> > also uses this MSR. So, teach the kernel's MSR list about this, so
> > that
> > intel-undervolt and other such tools don't spew warnings to dmesg,
> > while
> > unifying the constant used throughout the kernel.
> >
[...]
> > - if (reg == MSR_IA32_ENERGY_PERF_BIAS)
> > + switch (reg) {
> > + case MSR_IA32_ENERGY_PERF_BIAS:
There is already sysfs interface for it.
> > + case MSR_IA32_OC_MAILBOX:
> > return 0;
> > + }
> >
[...]
> Actually, we added the filtering to catch exactly such misuses and,
> lemme check what is the proper word now... /me checks, aha, adding
> new
> MSRs to the "passlist" is the wrong thing to do.
>
> Srinivas, can you pls convert this in-tree driver to use a proper
> sysfs
> interface for that mailbox MSR and also work with the intel-undervolt
> author - I hope I have the right person CCed from the git repo on
> github
> - to come up with a proper interface so that we can drop this MSR use
> too.
Overclocking is not architectural I/F and is supported by some special
CPU skews. I can't find any public document to specify the commands
which can be used via this OC mailbox. I have to check internally to
see if there is any. To add a proper sysfs interface we have to make
sure that we are not allowing some random commands to hardware and
crash the system.
Thanks,
Srinivas
On Tue, Sep 8, 2020 at 7:10 PM Srinivas Pandruvada
<[email protected]> wrote:
>
> On Mon, 2020-09-07 at 12:06 +0200, Borislav Petkov wrote:
> > + Srinivas.
> > + kitsunyan.
> >
> > On Mon, Sep 07, 2020 at 11:48:43AM +0200, Jason A. Donenfeld wrote:
> > > Popular tools, like intel-undervolt, use MSR 0x150 to control the
> > > CPU
> > > voltage offset. In fact, evidently the intel_turbo_max_3 driver in-
> > > tree
> > > also uses this MSR. So, teach the kernel's MSR list about this, so
> > > that
> > > intel-undervolt and other such tools don't spew warnings to dmesg,
> > > while
> > > unifying the constant used throughout the kernel.
> > >
>
> [...]
>
> > > - if (reg == MSR_IA32_ENERGY_PERF_BIAS)
> > > + switch (reg) {
> > > + case MSR_IA32_ENERGY_PERF_BIAS:
> There is already sysfs interface for it.
>
> > > + case MSR_IA32_OC_MAILBOX:
> > > return 0;
> > > + }
> > >
>
> [...]
>
> > Actually, we added the filtering to catch exactly such misuses and,
> > lemme check what is the proper word now... /me checks, aha, adding
> > new
> > MSRs to the "passlist" is the wrong thing to do.
> >
> > Srinivas, can you pls convert this in-tree driver to use a proper
> > sysfs
> > interface for that mailbox MSR and also work with the intel-undervolt
> > author - I hope I have the right person CCed from the git repo on
> > github
> > - to come up with a proper interface so that we can drop this MSR use
> > too.
>
> Overclocking is not architectural I/F and is supported by some special
> CPU skews. I can't find any public document to specify the commands
> which can be used via this OC mailbox. I have to check internally to
> see if there is any. To add a proper sysfs interface we have to make
> sure that we are not allowing some random commands to hardware and
> crash the system.
Well you can definitely crash the system this way -- undervolting can
result in all sorts of nice glitching. You might be able to even
programmatically undervolt to compromise the kernel in clever ways (a
lockdown bypass, I guess, but who cares).
That's why I initially suggested this was pretty squarely in the realm
of hobbyists and should just be added to that whitelist.
On Tue, Sep 08, 2020 at 07:12:44PM +0200, Jason A. Donenfeld wrote:
> > Overclocking is not architectural I/F and is supported by some special
> > CPU skews. I can't find any public document to specify the commands
> > which can be used via this OC mailbox. I have to check internally to
> > see if there is any. To add a proper sysfs interface we have to make
> > sure that we are not allowing some random commands to hardware and
> > crash the system.
>
> Well you can definitely crash the system this way -- undervolting can
> result in all sorts of nice glitching. You might be able to even
> programmatically undervolt to compromise the kernel in clever ways (a
> lockdown bypass, I guess, but who cares).
>
> That's why I initially suggested this was pretty squarely in the realm
> of hobbyists and should just be added to that whitelist.
If that MSR can cause all kinds of crazy, I'd prefer writes to it from
userspace to be completely forbidden, actually. And if force-enabled,
with a BIG FAT WARNING each time userspace writes to it.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Sep 8, 2020 at 7:26 PM Borislav Petkov <[email protected]> wrote:
>
> On Tue, Sep 08, 2020 at 07:12:44PM +0200, Jason A. Donenfeld wrote:
> > > Overclocking is not architectural I/F and is supported by some special
> > > CPU skews. I can't find any public document to specify the commands
> > > which can be used via this OC mailbox. I have to check internally to
> > > see if there is any. To add a proper sysfs interface we have to make
> > > sure that we are not allowing some random commands to hardware and
> > > crash the system.
> >
> > Well you can definitely crash the system this way -- undervolting can
> > result in all sorts of nice glitching. You might be able to even
> > programmatically undervolt to compromise the kernel in clever ways (a
> > lockdown bypass, I guess, but who cares).
> >
> > That's why I initially suggested this was pretty squarely in the realm
> > of hobbyists and should just be added to that whitelist.
>
> If that MSR can cause all kinds of crazy, I'd prefer writes to it from
> userspace to be completely forbidden, actually. And if force-enabled,
> with a BIG FAT WARNING each time userspace writes to it.
Well that's not cool. And it's sure to really upset the fairly sizable
crowd of people who rely on undervolting and related things to make
their laptops remotely usable, especially in light of the crazy
thermal designs for late-era 14nm intel cpus. Tools like
intel-undervolt have been a godsend in that regard. I came here
posting a patch to remove the annoying message you added for that use
case. Now you want to just totally remove that feature all together
from the kernel? Sounds like a regression in functionality I simply
can't get behind. I know that my laptop, at least, would suffer.
Jason
On Tue, Sep 08, 2020 at 10:10:17AM -0700, Srinivas Pandruvada wrote:
> > > - if (reg == MSR_IA32_ENERGY_PERF_BIAS)
> > > + switch (reg) {
> > > + case MSR_IA32_ENERGY_PERF_BIAS:
> There is already sysfs interface for it.
So someone needs to convert all those to it:
tools/power/cpupower/utils/helpers/msr.c:14:#define MSR_IA32_ENERGY_PERF_BIAS 0x1b0
tools/power/cpupower/utils/helpers/msr.c:84: ret = read_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, &val);
tools/power/cpupower/utils/helpers/msr.c:97: ret = write_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, val);
tools/power/x86/turbostat/turbostat.8:222:cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
tools/power/x86/turbostat/turbostat.c:3652: if (get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, &msr))
tools/power/x86/turbostat/turbostat.c:3669: fprintf(outf, "cpu%d: MSR_IA32_ENERGY_PERF_BIAS: 0x%08llx (%s)\n", cpu, msr, epb_string);
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8:25:Policy in MSR_IA32_ENERGY_PERF_BIAS (EPB)
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8:33:Note that MSR_IA32_ENERGY_PERF_BIAS is defined per CPU,
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c:84:unsigned int has_epb; /* MSR_IA32_ENERGY_PERF_BIAS */
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c:755: get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, &msr);
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c:1044: get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, &msr);
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c:1045: put_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS, new_epb);
Any takers?
If there are none, it could be made into a small kernelnewbies
project...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Sep 08, 2020 at 07:29:11PM +0200, Jason A. Donenfeld wrote:
> Well that's not cool.
So you're saying that if someone wants to kill its box by poking at that
MSR, we should just let her/him?
If anything, I think that a BIG FAT WARNING at least would make sense.
Now, if there were a proper interface which would allow only valid
commands, now that would be optimal...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Sep 8, 2020 at 7:36 PM Borislav Petkov <[email protected]> wrote:
>
> On Tue, Sep 08, 2020 at 07:29:11PM +0200, Jason A. Donenfeld wrote:
> > Well that's not cool.
>
> So you're saying that if someone wants to kill its box by poking at that
> MSR, we should just let her/him?
>
> If anything, I think that a BIG FAT WARNING at least would make sense.
Are you prepared to track down all the MSRs that might maybe do
something naughty?
After determining optimal voltages, people have systemd running
intel-undervolt for them. It becomes part of the normal system
configuration, is applied all the time, and after figuring it out
once, users forget they ever had enabled it, except when observing
that their laptop works better than it originally did.
Does `dd` warn when you run `dd if=/dev/zero of=/dev/sda`?
> Now, if there were a proper interface which would allow only valid
> commands, now that would be optimal...
Probably not possible. Optimal values are related to the "silicon
lottery" that occurs when you buy a new CPU. Different optimal values
for different individual chips.
On Tue, Sep 08, 2020 at 07:42:12PM +0200, Jason A. Donenfeld wrote:
> Are you prepared to track down all the MSRs that might maybe do
> something naughty?
I'm not prepared - that's why this MSR filtering. To block *all* direct
MSR accesses from userspace in the future.
> Does `dd` warn when you run `dd if=/dev/zero of=/dev/sda`?
Yah, because that's the same as bricking your hardware. Geez.
> Probably not possible. Optimal values are related to the "silicon
> lottery" that occurs when you buy a new CPU. Different optimal values
> for different individual chips.
Let's wait for what Srinivas finds out. I'd let Intel decide what they
wanna do.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Sep 8, 2020 at 8:01 PM Borislav Petkov <[email protected]> wrote:
>
> On Tue, Sep 08, 2020 at 07:42:12PM +0200, Jason A. Donenfeld wrote:
> > Are you prepared to track down all the MSRs that might maybe do
> > something naughty?
>
> I'm not prepared - that's why this MSR filtering. To block *all* direct
> MSR accesses from userspace in the future.
>
> > Does `dd` warn when you run `dd if=/dev/zero of=/dev/sda`?
>
> Yah, because that's the same as bricking your hardware. Geez.
Nobody is talking about bricking any hardware. Sorry if I seemed to
imply that before. In my experience, undervolting improperly results
in the CPU calculating things wrong and eventually crashing, and
overclocking usually will trip some thermal limits where the CPU
powers down. I've never experienced bricked hardware as a result of
this.
Jason
On Tue, Sep 08, 2020 at 08:01:12PM +0200, Borislav Petkov wrote:
> On Tue, Sep 08, 2020 at 07:42:12PM +0200, Jason A. Donenfeld wrote:
> > Are you prepared to track down all the MSRs that might maybe do
> > something naughty?
>
> I'm not prepared - that's why this MSR filtering. To block *all* direct
> MSR accesses from userspace in the future.
>
> > Does `dd` warn when you run `dd if=/dev/zero of=/dev/sda`?
>
> Yah, because that's the same as bricking your hardware. Geez.
>
> > Probably not possible. Optimal values are related to the "silicon
> > lottery" that occurs when you buy a new CPU. Different optimal values
> > for different individual chips.
>
> Let's wait for what Srinivas finds out. I'd let Intel decide what they
> wanna do.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
I'd like to point out that on Intel's recent 14nm parts, undervolting is not so
much for squeezing every last drop of performance out of the SoC as it is for
necessity.
I have an i9-9880H laptop which, at 4000 MHz on all CPUs while compiling a
kernel, uses 107W with the stock configuration. Upon undervolting, this figure
goes down to 88W, a ~22% reduction in power consumption at the maximum pstate.
The point of this is not primarily to reduce the power footprint of my machine,
but to get the thermals of this CPU under control. Without undervolting, my
i9-9880H is quite useless, and quickly reaches 100 degC with the fans at max
speed when placed under load.
The 22% reduction in power consumption also makes it clear that these parts use
very heavy-handed voltages out-of-the-box. I consider this to be a defect in the
part; I've never encountered any other CPUs that run, by default, at a voltage
so high that I could achieve a 22% reduction in power consumption by
undervolting.
I've tested multiple 14nm Intel CPUs, 8th gen and above, and they all exhibit
this behavior. On my i5-8265U laptop, compiling a kernel causes the CPU to hit
its 35W power limit and run at about 3300 MHz instead of its maximum all-core
frequency of 3700 MHz. After undervolting, when compiling a kernel, my i5-8265U
runs at 3700 MHz and uses 27W; not only was power consumption reduced by ~30%,
but also performance was improved. Not to mention, my i5-8265U laptop's thermal
design cannot dissipate 35W for very long, but it can sustain 27W without
reaching 100 degC quite well.
If Intel's recent 14nm CPUs didn't guzzle power, then we would have no need to
undervolt. But they do, and here we are.
Sultan
On Tue, Sep 08, 2020 at 12:18:38PM -0700, Sultan Alsawaf wrote:
> I'd like to point out that on Intel's recent 14nm parts, undervolting
> is not so much for squeezing every last drop of performance out of the
> SoC as it is for necessity.
<snip interesting examples>
Sounds to me that this undervolting functionality should be part of
the kernel and happen automatically. I have no clue, though, whether
people who do it, just get lucky and undervolting doesn't cause any
other hardware issues, or there's a real reason for this power madness
and if not done, power-related failures happen only on some boxes so
they decided to do them on all.
Or maybe BIOS is nuts, which is not a stretch.
Srinivas, what's the story here?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
> On Sep 8, 2020, at 12:30 PM, Borislav Petkov <[email protected]> wrote:
>
> On Tue, Sep 08, 2020 at 12:18:38PM -0700, Sultan Alsawaf wrote:
>> I'd like to point out that on Intel's recent 14nm parts, undervolting
>> is not so much for squeezing every last drop of performance out of the
>> SoC as it is for necessity.
>
> <snip interesting examples>
>
> Sounds to me that this undervolting functionality should be part of
> the kernel and happen automatically. I have no clue, though, whether
> people who do it, just get lucky and undervolting doesn't cause any
> other hardware issues, or there's a real reason for this power madness
> and if not done, power-related failures happen only on some boxes so
> they decided to do them on all.
>
> Or maybe BIOS is nuts, which is not a stretch.
>
> Srinivas, what's the story here?
>
I have seriously mixed opinions about this.
First, there are really two separate issues here. One is undervolting these parts, and one is the MSR interface.
In my mind, the MSR part is an easy call. Userspace has no more business poking some undocumented mailbox MSR than it does poking any other MSR. We should not give special dispensation here. It should be allowed if the MSR restrictions are off, and otherwise we should warn or forbid it.
Undervolting is a bit different. It’s a genuinely useful configuration that can affect system stability. In general, I think it should be allowed, and it should have a real driver in tree.
But this has a tricky interaction with lockdown. An interface that allows root to destabilize a system may well allow root to escalate privileges. But I think that making lockdown=integrity prevent tuning voltages and such would be quite obnoxious.
Should there perhaps be a separate lockdown bit for stability?
I assume that Intel XTU and such work on Secure-Booted Windows.
On Tue, Sep 8, 2020 at 1:35 PM Andy Lutomirski <[email protected]> wrote:
> Undervolting is a bit different. It’s a genuinely useful configuration that can affect system stability. In general, I think it should be allowed, and it should have a real driver in tree.
Agree that this should be a proper driver rather than permitting
arbitrary poking (especially if this isn't an architecturally defined
MSR - there's no guarantee that it'll have the same functionality
everywhere).
> But this has a tricky interaction with lockdown. An interface that allows root to destabilize a system may well allow root to escalate privileges. But I think that making lockdown=integrity prevent tuning voltages and such would be quite obnoxious.
Indeed - plundervolt.com is a demonstration of this. Any realistic
attack involves being able to drop the voltage enough to interfere
with a calculation and then raise it again before everything else
falls over, so simply applying some rate limiting seems like it would
be sufficient.
> Should there perhaps be a separate lockdown bit for stability?
If it's a sysfs interface then I think it'd be easy enough for people
who care to just add an SELinux or Apparmor rule, tbh.
On Tue, 2020-09-08 at 21:30 +0200, Borislav Petkov wrote:
> On Tue, Sep 08, 2020 at 12:18:38PM -0700, Sultan Alsawaf wrote:
> > I'd like to point out that on Intel's recent 14nm parts,
> > undervolting
> > is not so much for squeezing every last drop of performance out of
> > the
> > SoC as it is for necessity.
>
> <snip interesting examples>
>
> Sounds to me that this undervolting functionality should be part of
> the kernel and happen automatically. I have no clue, though, whether
> people who do it, just get lucky and undervolting doesn't cause any
> other hardware issues, or there's a real reason for this power
> madness
> and if not done, power-related failures happen only on some boxes so
> they decided to do them on all.
>
> Or maybe BIOS is nuts, which is not a stretch.
>
The whole OC is based on experiments to come to correct values. This
depends on whole system design not just CPUs.
https://www.intel.com/content/www/us/en/gaming/resources/how-to-overclock.html
It warns about system stability.
> Srinivas, what's the story here?
I checked and there is no public spec. There are several mailbox
commands with version dependent on the processor.
The actual OC mailbox implementation itself is implemented in Linux in
intel_turbo_max_3 driver. So that is public.
So someone can develop a driver and provide some sysfs to send mailbox
commands, but kernel can't validate commands which can cause any
security or stability issues. Not sure if this is acceptable standard.
I don't think there is any precedent of creating such blind sysfs
entries.
Thanks,
Srinivas
On Tue, Sep 8, 2020 at 3:32 PM Matthew Garrett <[email protected]> wrote:
>
> On Tue, Sep 8, 2020 at 1:35 PM Andy Lutomirski <[email protected]> wrote:
>
> > Undervolting is a bit different. It’s a genuinely useful configuration that can affect system stability. In general, I think it should be allowed, and it should have a real driver in tree.
>
> Agree that this should be a proper driver rather than permitting
> arbitrary poking (especially if this isn't an architecturally defined
> MSR - there's no guarantee that it'll have the same functionality
> everywhere).
After looking at the code for intel-undervolt a bit, that definitely
needs kernel or even firmware support. That MSR really is a mailbox.
You write commands to it and read responses. There's no way that user
code can have adequate locking.
On Tue, Sep 8, 2020 at 6:02 PM Srinivas Pandruvada
<[email protected]> wrote:
>
> On Tue, 2020-09-08 at 21:30 +0200, Borislav Petkov wrote:
> > On Tue, Sep 08, 2020 at 12:18:38PM -0700, Sultan Alsawaf wrote:
> > > I'd like to point out that on Intel's recent 14nm parts,
> > > undervolting
> > > is not so much for squeezing every last drop of performance out of
> > > the
> > > SoC as it is for necessity.
> >
> > <snip interesting examples>
> >
> > Sounds to me that this undervolting functionality should be part of
> > the kernel and happen automatically. I have no clue, though, whether
> > people who do it, just get lucky and undervolting doesn't cause any
> > other hardware issues, or there's a real reason for this power
> > madness
> > and if not done, power-related failures happen only on some boxes so
> > they decided to do them on all.
> >
> > Or maybe BIOS is nuts, which is not a stretch.
> >
> The whole OC is based on experiments to come to correct values. This
> depends on whole system design not just CPUs.
> https://www.intel.com/content/www/us/en/gaming/resources/how-to-overclock.html
> It warns about system stability.
>
> > Srinivas, what's the story here?
> I checked and there is no public spec. There are several mailbox
> commands with version dependent on the processor.
>
> The actual OC mailbox implementation itself is implemented in Linux in
> intel_turbo_max_3 driver. So that is public.
> So someone can develop a driver and provide some sysfs to send mailbox
> commands, but kernel can't validate commands which can cause any
> security or stability issues. Not sure if this is acceptable standard.
> I don't think there is any precedent of creating such blind sysfs
> entries.
>
>
Having once written a rejected driver to poke at the Intel Xeon memory
controller SMBUS, there is at least precedent for writing drivers for
dangerous interfaces, if not merging said drivers :)
But we have drivers for various EC interfaces, for other SMBUS hosts,
etc, and all of these can cause all kinds of mischief if misused.
On Tue, Sep 08, 2020 at 06:02:05PM -0700, Srinivas Pandruvada wrote:
> The actual OC mailbox implementation itself is implemented in Linux in
> intel_turbo_max_3 driver. So that is public.
> So someone can develop a driver and provide some sysfs to send mailbox
> commands, but kernel can't validate commands which can cause any
> security or stability issues. Not sure if this is acceptable standard.
> I don't think there is any precedent of creating such blind sysfs
> entries.
So we don't need to validate those commands - we can issue a
pr_warn_once() when something pokes at that to say that issuing those
commands is dangerous.
For example, from looking at
drivers/platform/x86/intel_turbo_max_3.c::get_oc_core_priority()
we should at least provide a well-defined interface to at least
synchronize access to that MSR with the kernel. And then maybe allow a
well-defined set of commands or better yet, we do them ourselves. Here's
what I mean:
Looking at the code in intel-undervolt:
bool undervolt(struct config_t * config, bool * nl, bool write) {
bool success = true;
bool nll = false;
int i;
for (i = 0; config->undervolts && i < config->undervolts->count; i++) {
struct undervolt_t * undervolt = array_get(config->undervolts, i);
static const int mask = 0x800;
uint64_t uvint = ((uint64_t) (mask - absf(undervolt->value) * 1.024f +
0.5f) << 21) & 0xffffffff;
uint64_t rdval = 0x8000001000000000 |
((uint64_t) undervolt->index << 40);
uint64_t wrval = rdval | 0x100000000 | uvint;
bool write_success = !write ||
wr(config, MSR_ADDR_VOLTAGE, wrval);
bool read_success = write_success &&
wr(config, MSR_ADDR_VOLTAGE, rdval) &&
rd(config, MSR_ADDR_VOLTAGE, rdval);
That MSR_ADDR_VOLTAGE is 0x150, i.e., MSR_OC_MAILBOX.
Trying to decipher the MSR accesses, it looks like it does the write
with:
0x8000001000000000 | (0xf << 40) | (0x3 << 21) | 0x100000000
and I've made the uvint 0x3 so that I can see the two 11s in the
bitfield below.
The undervolt index I made 0xffff for a similar reason:
And the result is:
Hex: 0x80000f1100600000 Dec: 9.223.388.602.549.927.936
31 27 23 19 15 11 7 3 31 27 23 19 15 11 7 3
1000_0000_0000_0000_0000_1111_0001_0001_0000_0000_0110_0000_0000_0000_0000_0000
63 59 55 51 47 43 39 35 31 27 23 19 15 11 7 3
With
- bit 63: MSR_OC_MAILBOX_BUSY_BIT
- [47?:40]: that's some index, undervolting index, who knows. I'm assuming this is
a byte, thus the 47?.
- [39?:32]: cmd, in this case, 0x11, gonna assume that the command is bits [39:32]
looking how this is a byte too:
#define OC_MAILBOX_FC_CONTROL_CMD 0x1C
and
- [31:21]: the undervolt value
The second write does:
0x8000001000000000 | (0xf << 40)
Hex: 0x80000f1000000000 Dec: 9.223.388.598.248.669.184
31 27 23 19 15 11 7 3 31 27 23 19 15 11 7 3
1000_0000_0000_0000_0000_1111_0001_0000_0000_0000_0000_0000_0000_0000_0000_0000
63 59 55 51 47 43 39 35 31 27 23 19 15 11 7 3
- bit 63: MSR_OC_MAILBOX_BUSY_BIT
- [47:40] index
- [39:32] cmd - 0x10
All from only staring at this anyway - could very well be wrong.
In any case, my point is that we could have a sysfs interface for
those userspace-suppliable values like the undervolt value at [31:21],
dunno if the index can be inferred by the kernel automatically or
enumerated and the commands we should issue ourselves depending on the
functionality, etc.
And put all that in drivers/platform/x86/intel_turbo_max_3.c instead of
leaving userspace to poke at it.
Thoughts?
Btw, intel-undervolt pokes all in all at:
#define MSR_ADDR_TEMPERATURE 0x1a2
#define MSR_ADDR_UNITS 0x606
#define MSR_ADDR_VOLTAGE 0x150
and those should probably be exposed too.
The temperature target one is read at least by this too:
https://py3status.readthedocs.io/en/latest/modules.html
but at least that MSR is documented so exposing it is trivial.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, 2020-10-19 at 19:15 +0200, Borislav Petkov wrote:
> On Tue, Sep 08, 2020 at 06:02:05PM -0700, Srinivas Pandruvada wrote:
> > The actual OC mailbox implementation itself is implemented in Linux
> > in
> > intel_turbo_max_3 driver. So that is public.
> > So someone can develop a driver and provide some sysfs to send
> > mailbox
> > commands, but kernel can't validate commands which can cause any
> > security or stability issues. Not sure if this is acceptable
> > standard.
> > I don't think there is any precedent of creating such blind sysfs
> > entries.
>
> So we don't need to validate those commands - we can issue a
> pr_warn_once() when something pokes at that to say that issuing those
> commands is dangerous.
>
> For example, from looking at
>
> drivers/platform/x86/intel_turbo_max_3.c::get_oc_core_priority()
>
> we should at least provide a well-defined interface to at least
> synchronize access to that MSR with the kernel. And then maybe allow
> a
> well-defined set of commands or better yet, we do them ourselves.
> Here's
> what I mean:
>
> Looking at the code in intel-undervolt:
>
> bool undervolt(struct config_t * config, bool * nl, bool write) {
> bool success = true;
> bool nll = false;
> int i;
>
> for (i = 0; config->undervolts && i < config->undervolts-
> >count; i++) {
> struct undervolt_t * undervolt = array_get(config-
> >undervolts, i);
>
> static const int mask = 0x800;
> uint64_t uvint = ((uint64_t) (mask - absf(undervolt-
> >value) * 1.024f +
> 0.5f) << 21) & 0xffffffff;
> uint64_t rdval = 0x8000001000000000 |
> ((uint64_t) undervolt->index << 40);
> uint64_t wrval = rdval | 0x100000000 | uvint;
>
> bool write_success = !write ||
> wr(config, MSR_ADDR_VOLTAGE, wrval);
> bool read_success = write_success &&
> wr(config, MSR_ADDR_VOLTAGE, rdval) &&
> rd(config, MSR_ADDR_VOLTAGE, rdval);
>
>
> That MSR_ADDR_VOLTAGE is 0x150, i.e., MSR_OC_MAILBOX.
>
> Trying to decipher the MSR accesses, it looks like it does the write
> with:
>
> 0x8000001000000000 | (0xf << 40) | (0x3 << 21) | 0x100000000
>
> and I've made the uvint 0x3 so that I can see the two 11s in the
> bitfield below.
>
> The undervolt index I made 0xffff for a similar reason:
>
> And the result is:
>
> Hex: 0x80000f1100600000 Dec: 9.223.388.602.549.927.936
> 31 27 23 19 15 11 7 3 31 27 23 19 15 11
> 7 3
> 1000_0000_0000_0000_0000_1111_0001_0001_0000_0000_0110_0000_0000_0000
> _0000_0000
> 63 59 55 51 47 43 39 35 31 27 23 19 15 11
> 7 3
>
> With
>
> - bit 63: MSR_OC_MAILBOX_BUSY_BIT
>
> - [47?:40]: that's some index, undervolting index, who knows. I'm
> assuming this is
> a byte, thus the 47?.
>
>
> - [39?:32]: cmd, in this case, 0x11, gonna assume that the command is
> bits [39:32]
> looking how this is a byte too:
>
> #define OC_MAILBOX_FC_CONTROL_CMD 0x1C
>
> and
>
> - [31:21]: the undervolt value
>
> The second write does:
>
> 0x8000001000000000 | (0xf << 40)
> Hex: 0x80000f1000000000 Dec: 9.223.388.598.248.669.184
> 31 27 23 19 15 11 7 3 31 27 23 19 15 11
> 7 3
> 1000_0000_0000_0000_0000_1111_0001_0000_0000_0000_0000_0000_0000_0000
> _0000_0000
> 63 59 55 51 47 43 39 35 31 27 23 19 15 11
> 7 3
>
> - bit 63: MSR_OC_MAILBOX_BUSY_BIT
> - [47:40] index
> - [39:32] cmd - 0x10
>
> All from only staring at this anyway - could very well be wrong.
>
These command id are model specific. There is no guarantee that even
meaning changes. So I don't think we should write any code in kernel
which can't stick.
> In any case, my point is that we could have a sysfs interface for
> those userspace-suppliable values like the undervolt value at
> [31:21],
> dunno if the index can be inferred by the kernel automatically or
> enumerated and the commands we should issue ourselves depending on
> the
> functionality, etc.
>
> And put all that in drivers/platform/x86/intel_turbo_max_3.c instead
> of
> leaving userspace to poke at it.
>
May be something like this:
- Separate mailbox stuff from intel_turbo_max_3.c
- Create a standalone module which creates a debugfs interface
- This debugs interface takes one 64 bit value from user space and use
protocol to avoid contention
- Warns users on writes via new interfaces you suggested above
> Thoughts?
>
> Btw, intel-undervolt pokes all in all at:
>
> #define MSR_ADDR_TEMPERATURE 0x1a2
Need to check use case for undervolt.
> #define MSR_ADDR_UNITS 0x606
Why not reuse powercap rapl interface. That interface will take care of
units.
> #define MSR_ADDR_VOLTAGE 0x150
This will not be needed once we expose the above debugfs interface.
This is the OC mailbox MSR.
Thanks,
Srinivas
>
> and those should probably be exposed too.
>
> The temperature target one is read at least by this too:
>
> https://py3status.readthedocs.io/en/latest/modules.html
>
> but at least that MSR is documented so exposing it is trivial.
>
> Thx.
>
On Tue, Oct 20, 2020 at 10:21:48AM -0700, Srinivas Pandruvada wrote:
> These command id are model specific. There is no guarantee that even
> meaning changes. So I don't think we should write any code in kernel
> which can't stick.
Ok, is there a common *set* of values present on all models?
A common set which we can abstract out from the MSR and have userspace
write them into sysfs and the kernel does the model-specific write?
The sysfs interface should simply provide the functionality, like, for
example say: "we have X valid undervolt indices, choose one".
Userspace doesn't have to deal with *how* that write happens and which
bits need to be set in the MSR and depend on the model - that's all
abstracted away by the kernel. All userspace needs to care about is
*what* it wants done to the hw. The *how exactly* is done by the kernel.
And then the differences are done with x86 model tests.
Does that make more sense?
> May be something like this:
> - Separate mailbox stuff from intel_turbo_max_3.c
Yah, that makes sense.
> - Create a standalone module which creates a debugfs interface
> - This debugs interface takes one 64 bit value from user space and use
> protocol to avoid contention
We can't make debugfs an API - debugfs can change at any point in time.
If you want an API, you put it in sysfs or in a separate fs.
> - Warns users on writes via new interfaces you suggested above
> > #define MSR_ADDR_TEMPERATURE 0x1a2
> Need to check use case for undervolt.
throttled uses it too. I asked them today to talk to us to design a
proper interface which satisfies their needs:
https://github.com/erpalma/throttled/issues/215
> > #define MSR_ADDR_UNITS 0x606
> Why not reuse powercap rapl interface. That interface will take care of
> units.
Sure.
Btw, you should have a look at those tools - they all poke at all kinds
of MSRs and correcting that is like a whack-a-mole game... ;-\
Oh, and the kernel pokes at them too so imagine the surprise one would have when
some kernel driver like
drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
went and read some MSRs and then all of a sudden they changed because
some userspace daemon wrote them underneath it. Not good.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Oct 20, 2020 at 10:21:48AM -0700, Srinivas Pandruvada wrote:
> On Mon, 2020-10-19 at 19:15 +0200, Borislav Petkov wrote:
> These command id are model specific. There is no guarantee that even
> meaning changes. So I don't think we should write any code in kernel
> which can't stick.
>
>
> > In any case, my point is that we could have a sysfs interface for
> > those userspace-suppliable values like the undervolt value at
> > [31:21],
> > dunno if the index can be inferred by the kernel automatically or
> > enumerated and the commands we should issue ourselves depending on
> > the
> > functionality, etc.
Why not have a full undervolt driver. That is, don't expose OC_MAILBOX
_at_all_, but have a model specific driver that provides undervolt
capabilities.
Someone is now maintaining this thing in userspace, might as well do it
as a kernel driver and keep all the icky bits inside.
On 10/20/20 11:40 AM, Srinivas Pandruvada wrote:
> On Tue, 2020-10-20 at 19:47 +0200, Borislav Petkov wrote:
>> On Tue, Oct 20, 2020 at 10:21:48AM -0700, Srinivas Pandruvada wrote:
>>> These command id are model specific. There is no guarantee that
>>> even
>>> meaning changes. So I don't think we should write any code in
>>> kernel
>>> which can't stick.
>> Ok, is there a common *set* of values present on all models
> Sorry, don't know.
So, the question is: Is Intel willing to document this on a sufficient
number of models that folks can make a sane driver out of it?
Srinivas, that seems like a pretty sane thing for the community to ask.
We've got random folks poking at MSRs and we don't know whether they're
nuts or not and whether we should spew warnings of disdain. Seems like
it would be in Intel's best interests to understand what users are doing
with this MSR and to try to make sure they're not doing stuff which is
too nutty, or at least give them the chance of avoiding warnings if
they're being nice.
Sounds like Borislav is willing to help give Intel's customers a nicer
interface. Mostly we from Intel would have to go dig out the docs for
as many models as we can, and make sure we're allowed to talk about it
publicly.
I dunno. Maybe we should try it for *one* model and see how it goes.
Maybe start with the one we're already poking from inside the kernel.
On Tue, 2020-10-20 at 19:47 +0200, Borislav Petkov wrote:
> On Tue, Oct 20, 2020 at 10:21:48AM -0700, Srinivas Pandruvada wrote:
> > These command id are model specific. There is no guarantee that
> > even
> > meaning changes. So I don't think we should write any code in
> > kernel
> > which can't stick.
>
> Ok, is there a common *set* of values present on all models
Sorry, don't know.
>
> A common set which we can abstract out from the MSR and have
> userspace
> write them into sysfs and the kernel does the model-specific write?
>
> The sysfs interface should simply provide the functionality, like,
> for
> example say: "we have X valid undervolt indices, choose one".
>
> Userspace doesn't have to deal with *how* that write happens and
> which
> bits need to be set in the MSR and depend on the model - that's all
> abstracted away by the kernel. All userspace needs to care about is
> *what* it wants done to the hw. The *how exactly* is done by the
> kernel.
>
> And then the differences are done with x86 model tests.
>
> Does that make more sense?
>
> > May be something like this:
> > - Separate mailbox stuff from intel_turbo_max_3.c
>
> Yah, that makes sense.
>
> > - Create a standalone module which creates a debugfs interface
> > - This debugs interface takes one 64 bit value from user space and
> > use
> > protocol to avoid contention
>
> We can't make debugfs an API - debugfs can change at any point in
> time.
> If you want an API, you put it in sysfs or in a separate fs.
Ok we can create a sysfs entry.
>
> > - Warns users on writes via new interfaces you suggested above
> > > #define MSR_ADDR_TEMPERATURE 0x1a2
> > Need to check use case for undervolt.
>
> throttled uses it too. I asked them today to talk to us to design a
> proper interface which satisfies their needs:
>
> https://github.com/erpalma/throttled/issues/215
>
> > > #define MSR_ADDR_UNITS 0x606
> > Why not reuse powercap rapl interface. That interface will take
> > care of
> > units.
>
> Sure.
>
> Btw, you should have a look at those tools - they all poke at all
> kinds
> of MSRs and correcting that is like a whack-a-mole game... ;-\
>
> Oh, and the kernel pokes at them too so imagine the surprise one
> would have when
> some kernel driver like
>
> drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
>
> went and read some MSRs and then all of a sudden they changed because
> some userspace daemon wrote them underneath it. Not good.
Agree, that poking MSR from user space is not a right thing to do.
Thanks,
Srinivas
>
> Thx.
>
On Tue, 2020-10-20 at 12:40 -0700, Dave Hansen wrote:
> On 10/20/20 11:40 AM, Srinivas Pandruvada wrote:
> > On Tue, 2020-10-20 at 19:47 +0200, Borislav Petkov wrote:
> > > On Tue, Oct 20, 2020 at 10:21:48AM -0700, Srinivas Pandruvada
> > > wrote:
> > > > These command id are model specific. There is no guarantee that
> > > > even
> > > > meaning changes. So I don't think we should write any code in
> > > > kernel
> > > > which can't stick.
> > > Ok, is there a common *set* of values present on all models
> > Sorry, don't know.
>
> So, the question is: Is Intel willing to document this on a
> sufficient
> number of models that folks can make a sane driver out of it?
>
> Srinivas, that seems like a pretty sane thing for the community to
> ask.
> We've got random folks poking at MSRs and we don't know whether
> they're
> nuts or not and whether we should spew warnings of disdain. Seems
> like
> it would be in Intel's best interests to understand what users are
> doing
> with this MSR and to try to make sure they're not doing stuff which
> is
> too nutty, or at least give them the chance of avoiding warnings if
> they're being nice.
We are all for it. We have added several sysfs interfaces and adding
more.
>
> Sounds like Borislav is willing to help give Intel's customers a
> nicer
> interface. Mostly we from Intel would have to go dig out the docs
> for
> as many models as we can, and make sure we're allowed to talk about
> it
> publicly.
>
That is the problem. There is no public document.
> I dunno. Maybe we should try it for *one* model and see how it goes.
> Maybe start with the one we're already poking from inside the kernel.
On Wed, Oct 21, 2020 at 06:11:09AM -0700, Srinivas Pandruvada wrote:
> That is the problem. There is no public document.
Well, considering how important this functionality has become, maybe
they should be made public and maybe there should be a kernel undervolt
driver, as peterz suggests.
> > I dunno. Maybe we should try it for *one* model and see how it goes.
> > Maybe start with the one we're already poking from inside the kernel.
Yes, we can surely give it a try.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette