2019-10-04 13:47:10

by Changbin Du

[permalink] [raw]
Subject: [PATCH] x86/mm: determine whether the fault address is canonical

We know the answer, so don't ask the user.

Signed-off-by: Changbin Du <[email protected]>
---
arch/x86/mm/extable.c | 5 ++++-
arch/x86/mm/mm_internal.h | 11 +++++++++++
2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 4d75bc656f97..5196e586756f 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -8,6 +8,8 @@
#include <asm/traps.h>
#include <asm/kdebug.h>

+#include "mm_internal.h"
+
typedef bool (*ex_handler_t)(const struct exception_table_entry *,
struct pt_regs *, int, unsigned long,
unsigned long);
@@ -123,7 +125,8 @@ __visible bool ex_handler_uaccess(const struct exception_table_entry *fixup,
unsigned long error_code,
unsigned long fault_addr)
{
- WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
+ WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
+ is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
regs->ip = ex_fixup_addr(fixup);
return true;
}
diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
index eeae142062ed..4c8a0fdd1c64 100644
--- a/arch/x86/mm/mm_internal.h
+++ b/arch/x86/mm/mm_internal.h
@@ -2,6 +2,17 @@
#ifndef __X86_MM_INTERNAL_H
#define __X86_MM_INTERNAL_H

+static inline bool is_canonical_addr(u64 addr)
+{
+#ifdef CONFIG_X86_64
+ int shift = 64 - boot_cpu_data.x86_phys_bits;
+
+ return ((int64_t)addr << shift >> shift) == addr;
+#else
+ return true;
+#endif
+}
+
void *alloc_low_pages(unsigned int num);
static inline void *alloc_low_page(void)
{
--
2.20.1


2019-10-04 14:42:55

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: determine whether the fault address is canonical

On 10/4/19 6:45 AM, Changbin Du wrote:
> +static inline bool is_canonical_addr(u64 addr)
> +{
> +#ifdef CONFIG_X86_64
> + int shift = 64 - boot_cpu_data.x86_phys_bits;

I think you mean to check the virtual bits member, not "phys_bits".

BTW, I also prefer the IS_ENABLED(CONFIG_) checks to explicit #ifdefs.
Would one of those work in this case?

As for the error message:

> {
> - WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> + WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> + is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");

I've always read that as "the GP might have been caused by a
non-canonical access". The main nit I'd have with the change is that I
don't think all #GP's during user access functions which are given a
non-canonical address *necessarily* caused the #GP.

There are a billion ways you can get a #GP and I bet canonical
violations aren't the only way you can get one in a user copy function.

2019-10-04 15:00:07

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: determine whether the fault address is canonical

On Fri, Oct 4, 2019 at 6:45 AM Changbin Du <[email protected]> wrote:
>
> We know the answer, so don't ask the user.
>
> Signed-off-by: Changbin Du <[email protected]>
> ---
> arch/x86/mm/extable.c | 5 ++++-
> arch/x86/mm/mm_internal.h | 11 +++++++++++
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
> index 4d75bc656f97..5196e586756f 100644
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -8,6 +8,8 @@
> #include <asm/traps.h>
> #include <asm/kdebug.h>
>
> +#include "mm_internal.h"
> +
> typedef bool (*ex_handler_t)(const struct exception_table_entry *,
> struct pt_regs *, int, unsigned long,
> unsigned long);
> @@ -123,7 +125,8 @@ __visible bool ex_handler_uaccess(const struct exception_table_entry *fixup,
> unsigned long error_code,
> unsigned long fault_addr)
> {
> - WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> + WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> + is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");

Unless the hardware behaves rather differently from the way I think it
does, fault_addr is garbage for anything other than #PF and sometimes
for #DF. (And maybe the virtualization faults?) I don't believe that
#GP fills in CR2.

2019-10-04 15:14:58

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: determine whether the fault address is canonical

On 10/4/19 7:59 AM, Andy Lutomirski wrote:
>> @@ -123,7 +125,8 @@ __visible bool ex_handler_uaccess(const struct exception_table_entry *fixup,
>> unsigned long error_code,
>> unsigned long fault_addr)
>> {
>> - WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
>> + WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
>> + is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
> Unless the hardware behaves rather differently from the way I think it
> does, fault_addr is garbage for anything other than #PF and sometimes
> for #DF. (And maybe the virtualization faults?) I don't believe that
> #GP fills in CR2.

For #GP, we do:

do_general_protection(struct pt_regs *regs, long error_code)
{
...
if (!user_mode(regs)) {
if (fixup_exception(regs, X86_TRAP_GP, error_code, 0))
return;

Where the 0 is 'fault_addr'. I'm not sure any other way that
ex_handler_uaccess() can get called with trapnr == X86_TRAP_GP. 0 is
canonical last I checked, which would make this patch a bit academic. :)

2019-10-04 15:33:17

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: determine whether the fault address is canonical

On Fri, Oct 04, 2019 at 07:39:08AM -0700, Dave Hansen wrote:
> On 10/4/19 6:45 AM, Changbin Du wrote:
> > +static inline bool is_canonical_addr(u64 addr)
> > +{
> > +#ifdef CONFIG_X86_64
> > + int shift = 64 - boot_cpu_data.x86_phys_bits;
>
> I think you mean to check the virtual bits member, not "phys_bits".
>
> BTW, I also prefer the IS_ENABLED(CONFIG_) checks to explicit #ifdefs.
> Would one of those work in this case?
>
> As for the error message:
>
> > {
> > - WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> > + WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> > + is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
>
> I've always read that as "the GP might have been caused by a
> non-canonical access". The main nit I'd have with the change is that I
> don't think all #GP's during user access functions which are given a
> non-canonical address *necessarily* caused the #GP.
>
> There are a billion ways you can get a #GP and I bet canonical
> violations aren't the only way you can get one in a user copy function.

All the other reasons would require a fairly egregious kernel bug, hence
the speculation that the #GP is due to a non-canonical address. Something
like the following would be more precise, though highly unlikely to ever
be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
error code that went unnoticed for years.

WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
(IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
"Segmentation bug");

2019-10-06 02:32:58

by Changbin Du

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: determine whether the fault address is canonical

On Fri, Oct 04, 2019 at 08:14:25AM -0700, Dave Hansen wrote:
> On 10/4/19 7:59 AM, Andy Lutomirski wrote:
> >> @@ -123,7 +125,8 @@ __visible bool ex_handler_uaccess(const struct exception_table_entry *fixup,
> >> unsigned long error_code,
> >> unsigned long fault_addr)
> >> {
> >> - WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> >> + WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> >> + is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
> > Unless the hardware behaves rather differently from the way I think it
> > does, fault_addr is garbage for anything other than #PF and sometimes
> > for #DF. (And maybe the virtualization faults?) I don't believe that
> > #GP fills in CR2.
>
> For #GP, we do:
>
> do_general_protection(struct pt_regs *regs, long error_code)
> {
> ...
> if (!user_mode(regs)) {
> if (fixup_exception(regs, X86_TRAP_GP, error_code, 0))
> return;
>
> Where the 0 is 'fault_addr'. I'm not sure any other way that
> ex_handler_uaccess() can get called with trapnr == X86_TRAP_GP. 0 is
> canonical last I checked, which would make this patch a bit academic. :)
My fault. I thought the 'fault_addr' is filled with a valid value. So we really
don't know the answer without decoding the instruction which causes this #GP. :)

--
Cheers,
Changbin Du

2019-10-07 14:35:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: determine whether the fault address is canonical


* Sean Christopherson <[email protected]> wrote:

> On Fri, Oct 04, 2019 at 07:39:08AM -0700, Dave Hansen wrote:
> > On 10/4/19 6:45 AM, Changbin Du wrote:
> > > +static inline bool is_canonical_addr(u64 addr)
> > > +{
> > > +#ifdef CONFIG_X86_64
> > > + int shift = 64 - boot_cpu_data.x86_phys_bits;
> >
> > I think you mean to check the virtual bits member, not "phys_bits".
> >
> > BTW, I also prefer the IS_ENABLED(CONFIG_) checks to explicit #ifdefs.
> > Would one of those work in this case?
> >
> > As for the error message:
> >
> > > {
> > > - WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> > > + WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> > > + is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
> >
> > I've always read that as "the GP might have been caused by a
> > non-canonical access". The main nit I'd have with the change is that I
> > don't think all #GP's during user access functions which are given a
> > non-canonical address *necessarily* caused the #GP.
> >
> > There are a billion ways you can get a #GP and I bet canonical
> > violations aren't the only way you can get one in a user copy function.
>
> All the other reasons would require a fairly egregious kernel bug, hence
> the speculation that the #GP is due to a non-canonical address. Something
> like the following would be more precise, though highly unlikely to ever
> be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
> error code that went unnoticed for years.
>
> WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
> (IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
> "Segmentation bug");

Instead of trying to guess the reason of the #GPF (which guess might be
wrong), please just state it as the reason if we are sure that the cause
is a non-canonical address - and provide a best-guess if it's not but
clearly signal that it's a guess.

I.e. if I understood all the cases correctly we'd have three types of
messages generated:

!error_code:
"General protection fault in user access, due to non-canonical address."

error_code && !is_canonical_addr(fault_addr):
"General protection fault in user access. Non-canonical address?"

error_code && is_canonical_addr(fault_addr):
"General protection fault in user access. Segmentation bug?"

Only the first one is declarative, because we know we got a #GP with a
zero error code which should denote a non-canonical address access.

The second and third ones are guesses with question marks to communicate
the uncertainty.

Assuming that !error_code always means non-canonical access?

And hopefully "!error_code && !is_canonical_addr(fault_addr)" is not
possible?

Thanks,

Ingo

2019-10-07 14:47:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: determine whether the fault address is canonical


* Ingo Molnar <[email protected]> wrote:

> > All the other reasons would require a fairly egregious kernel bug, hence
> > the speculation that the #GP is due to a non-canonical address. Something
> > like the following would be more precise, though highly unlikely to ever
> > be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
> > error code that went unnoticed for years.
> >
> > WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
> > (IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
> > "Segmentation bug");
>
> Instead of trying to guess the reason of the #GPF (which guess might be
> wrong), please just state it as the reason if we are sure that the cause
> is a non-canonical address - and provide a best-guess if it's not but
> clearly signal that it's a guess.
>
> I.e. if I understood all the cases correctly we'd have three types of
> messages generated:
>
> !error_code:
> "General protection fault in user access, due to non-canonical address."
>
> error_code && !is_canonical_addr(fault_addr):
> "General protection fault in user access. Non-canonical address?"
>
> error_code && is_canonical_addr(fault_addr):
> "General protection fault in user access. Segmentation bug?"

Now that I've read the rest of the thread, since fault_addr is always 0
we can ignore most of this I suspect ...

Thanks,

Ingo

2019-10-07 15:13:54

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: determine whether the fault address is canonical

On Mon, Oct 07, 2019 at 04:44:23PM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > > All the other reasons would require a fairly egregious kernel bug, hence
> > > the speculation that the #GP is due to a non-canonical address. Something
> > > like the following would be more precise, though highly unlikely to ever
> > > be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
> > > error code that went unnoticed for years.
> > >
> > > WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
> > > (IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
> > > "Segmentation bug");
> >
> > Instead of trying to guess the reason of the #GPF (which guess might be
> > wrong), please just state it as the reason if we are sure that the cause
> > is a non-canonical address - and provide a best-guess if it's not but
> > clearly signal that it's a guess.
> >
> > I.e. if I understood all the cases correctly we'd have three types of
> > messages generated:
> >
> > !error_code:
> > "General protection fault in user access, due to non-canonical address."

A non-canonical #GP always has an error code of '0', but the reverse isn't
technically true. And 32-bit mode obviously can't generate a non-canonical
address.

But practically speaking, since _ASM_EXTABLE_UA() should only be used for
reg<->mem instructions, the only way to get a #GP on a usercopy instruction
would be to corrupt the code itself or have a bad segment loaded in 32-bit
mode. So qualifying the non-canonical message on '64-bit && !error_code'
is techncally more precise/correct, but likely meaningless in practice.

> > error_code && !is_canonical_addr(fault_addr):
> > "General protection fault in user access. Non-canonical address?"
> >
> > error_code && is_canonical_addr(fault_addr):
> > "General protection fault in user access. Segmentation bug?"
>
> Now that I've read the rest of the thread, since fault_addr is always 0
> we can ignore most of this I suspect ...