Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
turns all rdmsr and wrmsr operations into the safe variants without
any checks that the operations actually succeed.
With CONFIG_PARAVIRT=n, unchecked MSR failures OOPS and probably
cause boot to fail if they happen before init starts.
Neither behavior is very good, and it's particularly unfortunate that
the behavior changes depending on CONFIG_PARAVIRT.
In particular, KVM guests might be unwittingly depending on the
PARAVIRT=y behavior because CONFIG_KVM_GUEST currently depends on
CONFIG_PARAVIRT, and, because accesses in that case are completely
unchecked, we wouldn't even see a warning.
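Concretely, with CONFIG_PARAVIRT=y the current rdmsr() macro expands
to roughly the following (see the paravirt.h context quoted in the
rename patch below) -- the error code is captured and then silently
thrown away:

  int _err;
  u64 _l = paravirt_read_msr(msr, &_err);  /* _err is never checked */
  val1 = (u32)_l;
  val2 = _l >> 32;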
This series changes the native behavior, regardless of
CONFIG_PARAVIRT. A non-"safe" MSR failure will give an informative
warning and will be fixed up -- native_read_msr will return zero,
and both reads and writes will continue where they left off.
If panic_on_oops is set, they will still OOPS and panic.
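As an illustration (0x12345678 is a made-up MSR index that no CPU
implements, so the RDMSR faults):

  u64 val;

  rdmsrl(0x12345678, val);  /* WARNs instead of OOPSing */
  /* execution continues here, with val == 0 */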
Since this uses the shiny new custom exception handler infrastructure,
there should be no overhead on the success paths.
I didn't change the behavior on Xen, but, with this series applied,
it would be straightforward for the Xen maintainers to make the
corresponding change -- knowledge of whether the access is "safe" is
now propagated into the pvops.
Doing this is probably a prerequisite to sanely decoupling
CONFIG_KVM_GUEST and CONFIG_PARAVIRT, which would probably make
Arjan and the rest of the Clear Containers people happy :)
There's also room to reduce the code size of the "safe" variants
using custom exception handlers in the future.
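As a sketch of what that could look like (hypothetical, not part of
this series, and it would still need a way to report the error back to
the caller -- e.g. a register nominated in the exception table entry),
the fixup could move entirely into a handler so that call sites no
longer need a .fixup section:

  bool ex_handler_rdmsr_safe(const struct exception_table_entry *fixup,
                             struct pt_regs *regs, int trapnr)
  {
          regs->ax = 0;   /* failed reads return zero */
          regs->dx = 0;
          regs->ip = ex_fixup_addr(fixup);  /* resume after the rdmsr */
          return true;
  }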
Andy Lutomirski (5):
x86/paravirt: Add _safe to the read_msr and write_msr PV hooks
x86/msr: Carry on after a non-"safe" MSR access fails unless
    panic_on_oops is set
x86/paravirt: Add paravirt_{read,write}_msr
x86/paravirt: Make "unsafe" MSR accesses unsafe even if PARAVIRT=y
x86/msr: Set the return value to zero when native_read_msr_safe fails
arch/x86/include/asm/msr.h | 20 ++++++++++++----
arch/x86/include/asm/paravirt.h | 45 +++++++++++++++++++++--------------
arch/x86/include/asm/paravirt_types.h | 14 +++++++----
arch/x86/kernel/paravirt.c | 6 +++--
arch/x86/mm/extable.c | 33 +++++++++++++++++++++++++
arch/x86/xen/enlighten.c | 27 +++++++++++++++++++--
6 files changed, 114 insertions(+), 31 deletions(-)
--
2.5.0
This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN and, for RDMSR, a return value of zero. If
panic_on_oops is set, then failed unsafe MSR accesses will still
oops and panic.
To be clear, this type of failure should *not* happen. This patch
exists to minimize the chance of nasty undebuggable failures on
systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/include/asm/msr.h | 10 ++++++++--
arch/x86/mm/extable.c | 33 +++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 93fb7c1cffda..1487054a1a70 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -92,7 +92,10 @@ static inline unsigned long long native_read_msr(unsigned int msr)
{
DECLARE_ARGS(val, low, high);
- asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr));
+ asm volatile("1: rdmsr\n"
+ "2:\n"
+ _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_unsafe)
+ : EAX_EDX_RET(val, low, high) : "c" (msr));
if (msr_tracepoint_active(__tracepoint_read_msr))
do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), 0);
return EAX_EDX_VAL(val, low, high);
@@ -119,7 +122,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
static inline void native_write_msr(unsigned int msr,
unsigned low, unsigned high)
{
- asm volatile("wrmsr" : : "c" (msr), "a"(low), "d" (high) : "memory");
+ asm volatile("1: wrmsr\n"
+ "2:\n"
+ _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)
+ : : "c" (msr), "a"(low), "d" (high) : "memory");
if (msr_tracepoint_active(__tracepoint_read_msr))
do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
}
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 9dd7e4b7fcde..f310714e6e6d 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -49,6 +49,39 @@ bool ex_handler_ext(const struct exception_table_entry *fixup,
}
EXPORT_SYMBOL(ex_handler_ext);
+bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
+ struct pt_regs *regs, int trapnr)
+{
+ WARN(1, "unsafe MSR access error: RDMSR from 0x%x",
+ (unsigned int)regs->cx);
+
+ /* If panic_on_oops is set, don't try to recover. */
+ if (panic_on_oops)
+ return false;
+
+ regs->ip = ex_fixup_addr(fixup);
+ regs->ax = 0;
+ regs->dx = 0;
+ return true;
+}
+EXPORT_SYMBOL(ex_handler_rdmsr_unsafe);
+
+bool ex_handler_wrmsr_unsafe(const struct exception_table_entry *fixup,
+ struct pt_regs *regs, int trapnr)
+{
+ WARN(1, "unsafe MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x)\n",
+ (unsigned int)regs->cx,
+ (unsigned int)regs->dx, (unsigned int)regs->ax);
+
+ /* If panic_on_oops is set, don't try to recover. */
+ if (panic_on_oops)
+ return false;
+
+ regs->ip = ex_fixup_addr(fixup);
+ return true;
+}
+EXPORT_SYMBOL(ex_handler_wrmsr_unsafe);
+
bool ex_has_fault_handler(unsigned long ip)
{
const struct exception_table_entry *e;
--
2.5.0
This will cause unchecked native_read_msr_safe failures to return a
deterministic result (zero) instead of leftover register contents.
Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/include/asm/msr.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 13da359881d7..e97e79f8a22b 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -109,7 +109,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
asm volatile("2: rdmsr ; xor %[err],%[err]\n"
"1:\n\t"
".section .fixup,\"ax\"\n\t"
- "3: mov %[fault],%[err] ; jmp 1b\n\t"
+ "3: mov %[fault],%[err]\n\t"
+ "xorl %%eax, %%eax\n\t"
+ "xorl %%edx, %%edx\n\t"
+ "jmp 1b\n\t"
".previous\n\t"
_ASM_EXTABLE(2b, 3b)
: [err] "=r" (*err), EAX_EDX_RET(val, low, high)
--
2.5.0
Enabling CONFIG_PARAVIRT had an unintended side effect: rdmsr turned
into rdmsr_safe and wrmsr turned into wrmsr_safe, even on bare
metal. Undo that by using the new unsafe paravirt MSR hooks.
Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/include/asm/paravirt.h | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 68297d87e85c..0c99f10874e4 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -151,24 +151,21 @@ static inline int paravirt_write_msr_safe(unsigned msr,
return PVOP_CALL3(int, pv_cpu_ops.write_msr_safe, msr, low, high);
}
-/* These should all do BUG_ON(_err), but our headers are too tangled. */
#define rdmsr(msr, val1, val2) \
do { \
- int _err; \
- u64 _l = paravirt_read_msr_safe(msr, &_err); \
+ u64 _l = paravirt_read_msr(msr); \
val1 = (u32)_l; \
val2 = _l >> 32; \
} while (0)
#define wrmsr(msr, val1, val2) \
do { \
- paravirt_write_msr_safe(msr, val1, val2); \
+ paravirt_write_msr(msr, val1, val2); \
} while (0)
#define rdmsrl(msr, val) \
do { \
- int _err; \
- val = paravirt_read_msr_safe(msr, &_err); \
+ val = paravirt_read_msr(msr); \
} while (0)
static inline void wrmsrl(unsigned msr, u64 val)
--
2.5.0
This adds paravirt hooks for unsafe MSR access. On native, they
call native_{read,write}_msr. On Xen, they use
xen_{read,write}_msr_safe.
Nothing uses them yet for ease of bisection. The next patch will
use them in rdmsrl, wrmsrl, etc.
I intentionally didn't make them OOPS on #GP on Xen. I think that
should be done separately by the Xen maintainers.
Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/include/asm/msr.h | 5 +++--
arch/x86/include/asm/paravirt.h | 11 +++++++++++
arch/x86/include/asm/paravirt_types.h | 10 ++++++++--
arch/x86/kernel/paravirt.c | 2 ++
arch/x86/xen/enlighten.c | 23 +++++++++++++++++++++++
5 files changed, 47 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 1487054a1a70..13da359881d7 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -119,8 +119,9 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
return EAX_EDX_VAL(val, low, high);
}
-static inline void native_write_msr(unsigned int msr,
- unsigned low, unsigned high)
+/* Can be uninlined because referenced by paravirt */
+notrace static inline void native_write_msr(unsigned int msr,
+ unsigned low, unsigned high)
{
asm volatile("1: wrmsr\n"
"2:\n"
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 2e49228ed9a3..68297d87e85c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -129,6 +129,17 @@ static inline void wbinvd(void)
#define get_kernel_rpl() (pv_info.kernel_rpl)
+static inline u64 paravirt_read_msr(unsigned msr)
+{
+ return PVOP_CALL1(u64, pv_cpu_ops.read_msr, msr);
+}
+
+static inline void paravirt_write_msr(unsigned msr,
+ unsigned low, unsigned high)
+{
+ return PVOP_VCALL3(pv_cpu_ops.write_msr, msr, low, high);
+}
+
static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
{
return PVOP_CALL2(u64, pv_cpu_ops.read_msr_safe, msr, err);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 5a06cccd36f0..b85aec45a1b8 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -155,8 +155,14 @@ struct pv_cpu_ops {
void (*cpuid)(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx);
- /* MSR operations.
- err = 0/-EIO. wrmsr returns 0/-EIO. */
+ /* Unsafe MSR operations. These will warn or panic on failure. */
+ u64 (*read_msr)(unsigned int msr);
+ void (*write_msr)(unsigned int msr, unsigned low, unsigned high);
+
+ /*
+ * Safe MSR operations.
+ * read sets err to 0 or -EIO. write returns 0 or -EIO.
+ */
u64 (*read_msr_safe)(unsigned int msr, int *err);
int (*write_msr_safe)(unsigned int msr, unsigned low, unsigned high);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 8aad95478ae5..f9583917c7c4 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -339,6 +339,8 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
.write_cr8 = native_write_cr8,
#endif
.wbinvd = native_wbinvd,
+ .read_msr = native_read_msr,
+ .write_msr = native_write_msr,
.read_msr_safe = native_read_msr_safe,
.write_msr_safe = native_write_msr_safe,
.read_pmc = native_read_pmc,
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff2df20d0067..548cd3e02b9e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1092,6 +1092,26 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
return ret;
}
+static u64 xen_read_msr(unsigned int msr)
+{
+ /*
+ * This will silently swallow a #GP from RDMSR. It may be worth
+ * changing that.
+ */
+ int err;
+
+ return xen_read_msr_safe(msr, &err);
+}
+
+static void xen_write_msr(unsigned int msr, unsigned low, unsigned high)
+{
+ /*
+ * This will silently swallow a #GP from WRMSR. It may be worth
+ * changing that.
+ */
+ xen_write_msr_safe(msr, low, high);
+}
+
void xen_setup_shared_info(void)
{
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
@@ -1222,6 +1242,9 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
.wbinvd = native_wbinvd,
+ .read_msr = xen_read_msr,
+ .write_msr = xen_write_msr,
+
.read_msr_safe = xen_read_msr_safe,
.write_msr_safe = xen_write_msr_safe,
--
2.5.0
These hooks match the _safe variants, so name them accordingly.
This will make room for unsafe PV hooks.
Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/include/asm/paravirt.h | 33 +++++++++++++++++----------------
arch/x86/include/asm/paravirt_types.h | 8 ++++----
arch/x86/kernel/paravirt.c | 4 ++--
arch/x86/xen/enlighten.c | 4 ++--
4 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f6192502149e..2e49228ed9a3 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -129,34 +129,35 @@ static inline void wbinvd(void)
#define get_kernel_rpl() (pv_info.kernel_rpl)
-static inline u64 paravirt_read_msr(unsigned msr, int *err)
+static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
{
- return PVOP_CALL2(u64, pv_cpu_ops.read_msr, msr, err);
+ return PVOP_CALL2(u64, pv_cpu_ops.read_msr_safe, msr, err);
}
-static inline int paravirt_write_msr(unsigned msr, unsigned low, unsigned high)
+static inline int paravirt_write_msr_safe(unsigned msr,
+ unsigned low, unsigned high)
{
- return PVOP_CALL3(int, pv_cpu_ops.write_msr, msr, low, high);
+ return PVOP_CALL3(int, pv_cpu_ops.write_msr_safe, msr, low, high);
}
/* These should all do BUG_ON(_err), but our headers are too tangled. */
#define rdmsr(msr, val1, val2) \
do { \
int _err; \
- u64 _l = paravirt_read_msr(msr, &_err); \
+ u64 _l = paravirt_read_msr_safe(msr, &_err); \
val1 = (u32)_l; \
val2 = _l >> 32; \
} while (0)
#define wrmsr(msr, val1, val2) \
do { \
- paravirt_write_msr(msr, val1, val2); \
+ paravirt_write_msr_safe(msr, val1, val2); \
} while (0)
#define rdmsrl(msr, val) \
do { \
int _err; \
- val = paravirt_read_msr(msr, &_err); \
+ val = paravirt_read_msr_safe(msr, &_err); \
} while (0)
static inline void wrmsrl(unsigned msr, u64 val)
@@ -164,23 +165,23 @@ static inline void wrmsrl(unsigned msr, u64 val)
wrmsr(msr, (u32)val, (u32)(val>>32));
}
-#define wrmsr_safe(msr, a, b) paravirt_write_msr(msr, a, b)
+#define wrmsr_safe(msr, a, b) paravirt_write_msr_safe(msr, a, b)
/* rdmsr with exception handling */
-#define rdmsr_safe(msr, a, b) \
-({ \
- int _err; \
- u64 _l = paravirt_read_msr(msr, &_err); \
- (*a) = (u32)_l; \
- (*b) = _l >> 32; \
- _err; \
+#define rdmsr_safe(msr, a, b) \
+({ \
+ int _err; \
+ u64 _l = paravirt_read_msr_safe(msr, &_err); \
+ (*a) = (u32)_l; \
+ (*b) = _l >> 32; \
+ _err; \
})
static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
{
int err;
- *p = paravirt_read_msr(msr, &err);
+ *p = paravirt_read_msr_safe(msr, &err);
return err;
}
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 77db5616a473..5a06cccd36f0 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -155,10 +155,10 @@ struct pv_cpu_ops {
void (*cpuid)(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx);
- /* MSR, PMC and TSR operations.
- err = 0/-EFAULT. wrmsr returns 0/-EFAULT. */
- u64 (*read_msr)(unsigned int msr, int *err);
- int (*write_msr)(unsigned int msr, unsigned low, unsigned high);
+ /* MSR operations.
+ err = 0/-EIO. wrmsr returns 0/-EIO. */
+ u64 (*read_msr_safe)(unsigned int msr, int *err);
+ int (*write_msr_safe)(unsigned int msr, unsigned low, unsigned high);
u64 (*read_pmc)(int counter);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index f08ac28b8136..8aad95478ae5 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -339,8 +339,8 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
.write_cr8 = native_write_cr8,
#endif
.wbinvd = native_wbinvd,
- .read_msr = native_read_msr_safe,
- .write_msr = native_write_msr_safe,
+ .read_msr_safe = native_read_msr_safe,
+ .write_msr_safe = native_write_msr_safe,
.read_pmc = native_read_pmc,
.load_tr_desc = native_load_tr_desc,
.set_ldt = native_set_ldt,
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index d09e4c9d7cc5..ff2df20d0067 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1222,8 +1222,8 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
.wbinvd = native_wbinvd,
- .read_msr = xen_read_msr_safe,
- .write_msr = xen_write_msr_safe,
+ .read_msr_safe = xen_read_msr_safe,
+ .write_msr_safe = xen_write_msr_safe,
.read_pmc = xen_read_pmc,
--
2.5.0
* Andy Lutomirski <[email protected]> wrote:
> +bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
> + struct pt_regs *regs, int trapnr)
> +{
> + WARN(1, "unsafe MSR access error: RDMSR from 0x%x",
> + (unsigned int)regs->cx);
Please make this WARN_ONCE(). There's no point in locking up the system with
WARN() spam, should this trigger frequently.
> + WARN(1, "unsafe MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x)\n",
> + (unsigned int)regs->cx,
> + (unsigned int)regs->dx, (unsigned int)regs->ax);
Ditto.
Thanks,
Ingo
* Andy Lutomirski <[email protected]> wrote:
> [...]
> +bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
> + struct pt_regs *regs, int trapnr)
> +{
> + WARN(1, "unsafe MSR access error: RDMSR from 0x%x",
> + (unsigned int)regs->cx);
Btw., instead of the safe/unsafe naming (which carries emotional and
security connotations), shouldn't we move this over to the _check()
(or _checking()) naming we use in other places in the kernel?
I.e.:
rdmsr(msr, l, h);
and:
if (rdmsr_check(msr, l, h)) {
...
}
and then we could name the helpers as _check() and _nocheck() - which is neutral
naming.
Thanks,
Ingo
On Sat, Mar 12, 2016 at 7:36 AM, Ingo Molnar <[email protected]> wrote:
>
> * Andy Lutomirski <[email protected]> wrote:
>
>> [...]
>> +bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
>> + struct pt_regs *regs, int trapnr)
>> +{
>> + WARN(1, "unsafe MSR access error: RDMSR from 0x%x",
>> + (unsigned int)regs->cx);
>
> Btw., instead of the safe/unsafe naming (which carries emotional and
> security connotations), shouldn't we move this over to the _check()
> (or _checking()) naming we use in other places in the kernel?
>
> I.e.:
>
> rdmsr(msr, l, h);
>
> and:
>
> if (rdmsr_check(msr, l, h)) {
> ...
> }
>
> and then we could name the helpers as _check() and _nocheck() - which is neutral
> naming.
Will do as a separate followup series.
At least with this series applied, the functions named _safe all point
to each other correctly.
--Andy
On 11/03/2016 20:06, Andy Lutomirski wrote:
> This adds paravirt hooks for unsafe MSR access. On native, they
> call native_{read,write}_msr. On Xen, they use
> xen_{read,write}_msr_safe.
>
> Nothing uses them yet for ease of bisection. The next patch will
> use them in rdmsrl, wrmsrl, etc.
>
> I intentionally didn't make them OOPS on #GP on Xen. I think that
> should be done separately by the Xen maintainers.
Please do the same for KVM.
Paolo
On Mar 14, 2016 7:03 AM, "Paolo Bonzini" <[email protected]> wrote:
>
>
>
> On 11/03/2016 20:06, Andy Lutomirski wrote:
> > This adds paravirt hooks for unsafe MSR access. On native, they
> > call native_{read,write}_msr. On Xen, they use
> > xen_{read,write}_msr_safe.
> >
> > Nothing uses them yet for ease of bisection. The next patch will
> > use them in rdmsrl, wrmsrl, etc.
> >
> > I intentionally didn't make them OOPS on #GP on Xen. I think that
> > should be done separately by the Xen maintainers.
>
> Please do the same for KVM.
Can you clarify? KVM uses the native version, and the native version
only oopses with this series applied if panic_on_oops is set.
--Andy
On Mon, Mar 14, 2016 at 9:58 AM, Linus Torvalds
<[email protected]> wrote:
>
> On Mar 14, 2016 9:53 AM, "Andy Lutomirski" <[email protected]> wrote:
>>
>> Can you clarify? KVM uses the native version, and the native version
>> only oopses with this series applied if panic_on_oops is set.
>
> Can we please remove that idiocy?
>
> There is no reason to panic whatsoever. Seriously. What's the upside of that
> logic?
I imagine that people who set panic_on_oops want their systems to stop
running user code if something happens that could corrupt the state or
if there's any sign that user code is trying some non-deterministic
exploit. So I'm guessing that they'd want this type of "the kernel
screwed up -- abort" to actually result in a panic.
As a concrete, although somewhat silly, example, suppose that a write
to MSR_SYSENTER_STACK fails. If that happened, then user code could
subsequently try to take over the kernel by evil manipulation of TF
and/or perf.
I'd be okay with removing this too, though, since arranging for MSR
access to fail seems unlikely as an exploit vector.
Borislav: SUSE actually uses panic_on_oops, right? What's their goal?
--Andy
On 14/03/2016 18:02, Andy Lutomirski wrote:
> On Mon, Mar 14, 2016 at 9:58 AM, Linus Torvalds
> <[email protected]> wrote:
>>
>> On Mar 14, 2016 9:53 AM, "Andy Lutomirski" <[email protected]> wrote:
>>>
>>> Can you clarify? KVM uses the native version, and the native version
>>> only oopses with this series applied if panic_on_oops is set.
>>
>> Can we please remove that idiocy?
>>
>> There is no reason to panic whatsoever. Seriously. What's the upside of that
>> logic?
>
> I imagine that people who set panic_on_oops want their systems to stop
> running user code if something happens that could corrupt the state or
> if there's any sign that user code is trying some non-deterministic
> exploit. So I'm guessing that they'd want this type of "the kernel
> screwed up -- abort" to actually result in a panic.
>
> As a concrete, although somewhat silly, example, suppose that a write
> to MSR_SYSENTER_STACK fails. If that happened, then user code could
> subsequently try to take over the kernel by evil manipulation of TF
> and/or perf.
>
> I'd be okay with removing this too, though, since arranging for MSR
> access to fail seems unlikely as an exploit vector.
>
> Borislav: SUSE actually uses panic_on_oops, right? What's their goal?
RHEL also does, and it's mostly to trap kernel page faults before they
do more damage such as filesystem corruption. The debug kernel has
panic_on_oops=0, while the production kernel has panic_on_oops=1.
Paolo
On 14/03/2016 17:53, Andy Lutomirski wrote:
> > Please do the same for KVM.
>
> Can you clarify? KVM uses the native version, and the native version
> only oopses with this series applied if panic_on_oops is set.
Since you fixed the panic_on_oops thing, I'm okay with letting KVM use
the unsafe version and seeing what breaks...
Paolo