...instead of manually handling the case where error_code=0, e.g. to
display "[SUPERVISOR] [READ]" instead of "normal kernel read fault".
This makes the zero case consistent with all other messages and also
provides additional information for other error code combinations,
e.g. error_code==1 will display "[PROT] [SUPERVISOR] [READ]" instead
of simply "[PROT]".
Print unique names for the negative cases as opposed to e.g. "[!USER]"
to avoid mixups due to users missing a single "!" character, and to be
more concise for the !INSTR && !WRITE case.
Print "SUPERVISOR" in favor of "KERNEL" to reduce the likelihood that
the message is misinterpreted as a generic kernel/software error and
to be consistent with the SDM's nomenclature.
An alternative to passing a negated error code to err_str_append() would
be to expand err_str_append() to take a second string for the negative
test, but that approach complicates handling the "[READ]" case, which
looks for !INSTR && !WRITE, e.g. it would require an extra call to
err_str_append() and logic in err_str_append() to allow null messages
for both the positive and negative tests. Printing "[INSTR] [READ]"
wouldn't be the end of the world, but a little bit of trickery in the
kernel is a relatively small price to pay in exchange for the ability
to unequivocally know the access type by reading a single word.
Now that all components of the message use the [<code>] format,
explicitly state that it's the error *code* that's being printed and
group the err_str_append() calls by type so that the resulting print
messages are consistent, e.g. the deciphered codes will always be:
[PROT] [USER|SUPERVISOR] [WRITE|INSTR|READ] [RSDV] [PK]
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yu-cheng Yu <[email protected]>
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/fault.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..0b4ce5d2b461 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -609,7 +609,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
*/
static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
{
- if (error_code & mask) {
+ if ((error_code & mask) == mask) {
if (buf[0])
strcat(buf, " ");
strcat(buf, txt);
@@ -655,13 +655,16 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
* zero delimiter must fit into err_txt[].
*/
err_str_append(error_code, err_txt, X86_PF_PROT, "[PROT]" );
- err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
err_str_append(error_code, err_txt, X86_PF_USER, "[USER]" );
- err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
+ err_str_append(~error_code, err_txt, X86_PF_USER, "[SUPERVISOR]");
+ err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
+ err_str_append(~error_code, err_txt, X86_PF_WRITE | X86_PF_INSTR,
+ "[READ]");
+ err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
err_str_append(error_code, err_txt, X86_PF_PK, "[PK]" );
- pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+ pr_alert("#PF error code: %s\n", err_txt);
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
struct desc_ptr idt, gdt;
--
2.19.2
On 12/5/18 8:36 AM, Sean Christopherson wrote:
> ...instead of manually handling the case where error_code=0, e.g. to
> display "[SUPERVISOR] [READ]" instead of "normal kernel read fault".
>
> This makes the zero case consistent with all other messages and also
> provides additional information for other error code combinations,
> e.g. error_code==1 will display "[PROT] [SUPERVISOR] [READ]" instead
> of simply "[PROT]".
>
> Print unique names for the negative cases as opposed to e.g. "[!USER]"
> to avoid mixups due to users missing a single "!" character, and to be
> more concise for the !INSTR && !WRITE case.
>
> Print "SUPERVISOR" in favor of "KERNEL" to reduce the likelihood that
> the message is misinterpreted as a generic kernel/software error and
> to be consistent with the SDM's nomenclature.
>
> An alternative to passing a negated error code to err_str_append() would
> be to expand err_str_append() to take a second string for the negative
> test, but that approach complicates handling the "[READ]" case, which
> looks for !INSTR && !WRITE, e.g. it would require an extra call to
> err_str_append() and logic in err_str_append() to allow null messages
> for both the positive and negative tests. Printing "[INSTR] [READ]"
> wouldn't be the end of the world, but a little bit of trickery in the
> kernel is a relatively small price to pay in exchange for the ability
> to unequivocally know the access type by reading a single word.
>
> Now that all components of the message use the [<code>] format,
> explicitly state that it's the error *code* that's being printed and
> group the err_str_append() calls by type so that the resulting print
> messages are consistent, e.g. the deciphered codes will always be:
>
> [PROT] [USER|SUPERVISOR] [WRITE|INSTR|READ] [RSDV] [PK]
RSVD
but it's correct in the patch.
BTW, what does PK mean?
thanks.
>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Yu-cheng Yu <[email protected]>
> Cc: [email protected]
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/mm/fault.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 2ff25ad33233..0b4ce5d2b461 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -609,7 +609,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
> */
> static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
> {
> - if (error_code & mask) {
> + if ((error_code & mask) == mask) {
> if (buf[0])
> strcat(buf, " ");
> strcat(buf, txt);
> @@ -655,13 +655,16 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
> * zero delimiter must fit into err_txt[].
> */
> err_str_append(error_code, err_txt, X86_PF_PROT, "[PROT]" );
> - err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
> err_str_append(error_code, err_txt, X86_PF_USER, "[USER]" );
> - err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
> + err_str_append(~error_code, err_txt, X86_PF_USER, "[SUPERVISOR]");
> + err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
> err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
> + err_str_append(~error_code, err_txt, X86_PF_WRITE | X86_PF_INSTR,
> + "[READ]");
> + err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
> err_str_append(error_code, err_txt, X86_PF_PK, "[PK]" );
>
> - pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
> + pr_alert("#PF error code: %s\n", err_txt);
>
> if (!(error_code & X86_PF_USER) && user_mode(regs)) {
> struct desc_ptr idt, gdt;
>
--
~Randy
On Wed, Dec 5, 2018 at 11:27 AM Randy Dunlap <[email protected]> wrote:
>
> BTW, what does PK mean?
"Protection Key"
Linus
* Sean Christopherson <[email protected]> wrote:
> ...instead of manually handling the case where error_code=0, e.g. to
> display "[SUPERVISOR] [READ]" instead of "normal kernel read fault".
>
> This makes the zero case consistent with all other messages and also
> provides additional information for other error code combinations,
> e.g. error_code==1 will display "[PROT] [SUPERVISOR] [READ]" instead
> of simply "[PROT]".
>
> Print unique names for the negative cases as opposed to e.g. "[!USER]"
> to avoid mixups due to users missing a single "!" character, and to be
> more concise for the !INSTR && !WRITE case.
>
> Print "SUPERVISOR" in favor of "KERNEL" to reduce the likelihood that
> the message is misinterpreted as a generic kernel/software error and
> to be consistent with the SDM's nomenclature.
>
> An alternative to passing a negated error code to err_str_append() would
> be to expand err_str_append() to take a second string for the negative
> test, but that approach complicates handling the "[READ]" case, which
> looks for !INSTR && !WRITE, e.g. it would require an extra call to
> err_str_append() and logic in err_str_append() to allow null messages
> for both the positive and negative tests. Printing "[INSTR] [READ]"
> wouldn't be the end of the world, but a little bit of trickery in the
> kernel is a relatively small price to pay in exchange for the ability
> to unequivocally know the access type by reading a single word.
>
> Now that all components of the message use the [<code>] format,
> explicitly state that it's the error *code* that's being printed and
> group the err_str_append() calls by type so that the resulting print
> messages are consistent, e.g. the deciphered codes will always be:
>
> [PROT] [USER|SUPERVISOR] [WRITE|INSTR|READ] [RSDV] [PK]
>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Yu-cheng Yu <[email protected]>
> Cc: [email protected]
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/mm/fault.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 2ff25ad33233..0b4ce5d2b461 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -609,7 +609,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
> */
> static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
> {
> - if (error_code & mask) {
> + if ((error_code & mask) == mask) {
> if (buf[0])
> strcat(buf, " ");
> strcat(buf, txt);
> @@ -655,13 +655,16 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
> * zero delimiter must fit into err_txt[].
> */
> err_str_append(error_code, err_txt, X86_PF_PROT, "[PROT]" );
> - err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
> err_str_append(error_code, err_txt, X86_PF_USER, "[USER]" );
> - err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
> + err_str_append(~error_code, err_txt, X86_PF_USER, "[SUPERVISOR]");
> + err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
> err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
> + err_str_append(~error_code, err_txt, X86_PF_WRITE | X86_PF_INSTR,
> + "[READ]");
> + err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
> err_str_append(error_code, err_txt, X86_PF_PK, "[PK]" );
>
> - pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
> + pr_alert("#PF error code: %s\n", err_txt);
>
> if (!(error_code & X86_PF_USER) && user_mode(regs)) {
> struct desc_ptr idt, gdt;
Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat
inconsistent, sporadic handling of negatives. Here's our error code bits:
/*
* Page fault error code bits:
*
* bit 0 == 0: no page found 1: protection fault
* bit 1 == 0: read access 1: write access
* bit 2 == 0: kernel-mode access 1: user-mode access
* bit 3 == 1: use of reserved bit detected
* bit 4 == 1: fault was an instruction fetch
* bit 5 == 1: protection keys block access
*/
enum x86_pf_error_code {
X86_PF_PROT = 1 << 0,
X86_PF_WRITE = 1 << 1,
X86_PF_USER = 1 << 2,
X86_PF_RSVD = 1 << 3,
X86_PF_INSTR = 1 << 4,
X86_PF_PK = 1 << 5,
};
While not all of these combinations will happen on real hardware, I think
the message should nevertheless be fixed length and be of a predictable
nature.
I like your '!' idea, but with a further simplification: how about using
'-/+' differentiation and a single character and a fixed-length message.
The new output will be lines of:
#PF error code: -P -W -U -S -I -K (0x00)
...
#PF error code: -P -W +U +S -I -K (0x0c)
...
#PF error code: +P +W +U +S +I +K (0x3f)
The symbol abbreviations are pretty self-explanatory:
P = protection fault (X86_PF_PROT)
W = write access (X86_PF_WRITE)
U = user-mode access (X86_PF_USER)
S = supervisor mode (X86_PF_RSVD)
I = instruction fault (X86_PF_INSTR)
K = keys fault (X86_PF_PK)
Misc notes:
- In principle the new text is now short enough to include it in one of
the existing output lines, further shortening the oops output - but I
havent done that in this patch.
- Another question is the ordering of the bits: the symbolic display is
actually big endian, while the numeric hexa printout is little endian.
I kind of still like it that way, not just because the decoding loop is
more natural, but because the bits are actually ordered by importance:
the PROT bits is more important than the INSTR or the PK bits - and the
more important bits are displayed first.
- Only build-tested the patch and looked at the generated assembly, but
it all looks sane enough so will obviously work just fine! ;-)
Thanks,
Ingo
======================>
Subject: x86/mm/fault: Streamline the fault error_code decoder some more
From: Ingo Molnar <[email protected]>
Date: Thu, 6 Dec 2018 08:12:06 +0100
Sean Christopherson pointed out that the newfangled human-readable page
fault oops error_code decoder we added in:
a1a371c468f7: ("x86/fault: Decode page fault OOPSes better")
a2aa52ab16ef: ("x86/fault: Clean up the page fault oops decoder a bit")
*still* confuses humans due to the hiding of the negative case and due to the
special casing of the all-zeroes error code, which is suboptimal.
Improve it some more:
- Change the text from variable-length string to a fixed-length string,
- display non-set bits,
- include the error code itself as well numerically,
- get rid of the '[normal kernel read fault]' special case,
- factor out the code, simplify and speed up the string generation logic.
The new output will be lines of:
#PF error code: -P -W -U -S -I -K (0x00)
...
#PF error code: -P -W +U +S -I -K (0x0c)
...
#PF error code: +P +W +U +S +I +K (0x3f)
The symbol abbreviations are pretty self-explanatory:
P = protection fault (X86_PF_PROT)
W = write access (X86_PF_WRITE)
U = user-mode access (X86_PF_USER)
S = supervisor mode (X86_PF_RSVD)
I = instruction fault (X86_PF_INSTR)
K = keys fault (X86_PF_PK)
In principle this is now short enough to include it in one of the
existing output lines, further shortening the oops output.
( Also clean up some nearby line breaks while at it. )
Suggested-by: Sean Christopherson <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yu-cheng Yu <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/mm/fault.c | 67 +++++++++++++++++++++++++++++++---------------------
1 file changed, 40 insertions(+), 27 deletions(-)
Index: tip/arch/x86/mm/fault.c
===================================================================
--- tip.orig/arch/x86/mm/fault.c
+++ tip/arch/x86/mm/fault.c
@@ -604,23 +604,51 @@ static void show_ldttss(const struct des
}
/*
- * This helper function transforms the #PF error_code bits into
- * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
+ * This maps the somewhat obscure error_code number to symbolic text:
+ *
+ * P = protection fault (X86_PF_PROT)
+ * W = write access (X86_PF_WRITE)
+ * U = user-mode access (X86_PF_USER)
+ * S = supervisor mode (X86_PF_RSVD)
+ * I = instruction fault (X86_PF_INSTR)
+ * K = keys fault (X86_PF_PK)
*/
-static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
+static const char error_code_chars[] = "PWUSIK";
+
+/*
+ * This helper function transforms the #PF error_code bits into " +P -W +U -R -I -K"
+ * type of descriptive, almost human-readable error strings:
+ */
+static void show_error_code(struct pt_regs *regs, unsigned long error_code)
{
- if (error_code & mask) {
- if (buf[0])
- strcat(buf, " ");
- strcat(buf, txt);
+ unsigned int bit, mask;
+ char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at the end */
+
+ /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
+
+ for (bit = 0; bit < 6; bit++) {
+ unsigned int offset = bit*3;
+
+ err_txt[offset+0] = ' ';
+
+ mask = 1 << bit;
+ if (error_code & mask)
+ err_txt[offset+1] = '+';
+ else
+ err_txt[offset+1] = '-';
+
+ err_txt[offset+2] = error_code_chars[bit];
}
+
+ /* Close the string: */
+ err_txt[sizeof(err_txt)-1] = 0;
+
+ pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
}
static void
show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
{
- char err_txt[64];
-
if (!oops_may_print())
return;
@@ -648,20 +676,7 @@ show_fault_oops(struct pt_regs *regs, un
address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
(void *)address);
- err_txt[0] = 0;
-
- /*
- * Note: length of these appended strings including the separation space and the
- * zero delimiter must fit into err_txt[].
- */
- err_str_append(error_code, err_txt, X86_PF_PROT, "[PROT]" );
- err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
- err_str_append(error_code, err_txt, X86_PF_USER, "[USER]" );
- err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
- err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
- err_str_append(error_code, err_txt, X86_PF_PK, "[PK]" );
-
- pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+ show_error_code(regs, error_code);
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
struct desc_ptr idt, gdt;
@@ -698,8 +713,7 @@ show_fault_oops(struct pt_regs *regs, un
}
static noinline void
-pgtable_bad(struct pt_regs *regs, unsigned long error_code,
- unsigned long address)
+pgtable_bad(struct pt_regs *regs, unsigned long error_code, unsigned long address)
{
struct task_struct *tsk;
unsigned long flags;
@@ -719,8 +733,7 @@ pgtable_bad(struct pt_regs *regs, unsign
oops_end(flags, regs, sig);
}
-static void set_signal_archinfo(unsigned long address,
- unsigned long error_code)
+static void set_signal_archinfo(unsigned long address, unsigned long error_code)
{
struct task_struct *tsk = current;
On Thu, Dec 06, 2018 at 08:34:09AM +0100, Ingo Molnar wrote:
> I like your '!' idea, but with a further simplification: how about using
> '-/+' differentiation and a single character and a fixed-length message.
>
> The new output will be lines of:
>
> #PF error code: -P -W -U -S -I -K (0x00)
> ...
> #PF error code: -P -W +U +S -I -K (0x0c)
> ...
> #PF error code: +P +W +U +S +I +K (0x3f)
>
> The symbol abbreviations are pretty self-explanatory:
>
> P = protection fault (X86_PF_PROT)
> W = write access (X86_PF_WRITE)
> U = user-mode access (X86_PF_USER)
> S = supervisor mode (X86_PF_RSVD)
> I = instruction fault (X86_PF_INSTR)
> K = keys fault (X86_PF_PK)
>
> Misc notes:
>
> - In principle the new text is now short enough to include it in one of
> the existing output lines, further shortening the oops output - but I
> havent done that in this patch.
>
> - Another question is the ordering of the bits: the symbolic display is
> actually big endian, while the numeric hexa printout is little endian.
>
> I kind of still like it that way, not just because the decoding loop is
> more natural, but because the bits are actually ordered by importance:
> the PROT bits is more important than the INSTR or the PK bits - and the
> more important bits are displayed first.
Hmm, my eyes tend to be drawn to the end of the line, e.g. having PROT
be the last thing makes it stand out more than being buried in the middle
of the line. Inverting the ordering between raw and decoded also makes
it very difficult to correlate the raw value with each bit.
> - Only build-tested the patch and looked at the generated assembly, but
> it all looks sane enough so will obviously work just fine! ;-)
...
> /*
> - * This helper function transforms the #PF error_code bits into
> - * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
> + * This maps the somewhat obscure error_code number to symbolic text:
> + *
> + * P = protection fault (X86_PF_PROT)
> + * W = write access (X86_PF_WRITE)
> + * U = user-mode access (X86_PF_USER)
> + * S = supervisor mode (X86_PF_RSVD)
> + * I = instruction fault (X86_PF_INSTR)
> + * K = keys fault (X86_PF_PK)
> */
> -static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
> +static const char error_code_chars[] = "PWUSIK";
> +
> +/*
> + * This helper function transforms the #PF error_code bits into " +P -W +U -R -I -K"
> + * type of descriptive, almost human-readable error strings:
> + */
> +static void show_error_code(struct pt_regs *regs, unsigned long error_code)
No need for @regs.
> {
> - if (error_code & mask) {
> - if (buf[0])
> - strcat(buf, " ");
> - strcat(buf, txt);
> + unsigned int bit, mask;
> + char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at the end */
Assuming the error code bits are contiguous breaks if/when SGX gets added,
which uses bit 15. Oopsing on an SGX fault should be nigh impossible, but
it might be nice to be able to reuse show_error_code in other places.
Hardcoding "6" is also a bit painful.
> +
> + /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
> +
> + for (bit = 0; bit < 6; bit++) {
> + unsigned int offset = bit*3;
> +
> + err_txt[offset+0] = ' ';
> +
> + mask = 1 << bit;
> + if (error_code & mask)
> + err_txt[offset+1] = '+';
> + else
> + err_txt[offset+1] = '-';
To me, using '!' contrasts better when side-by-side with '+'.
> +
> + err_txt[offset+2] = error_code_chars[bit];
> }
> +
> + /* Close the string: */
> + err_txt[sizeof(err_txt)-1] = 0;
> +
> + pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
The changelog example has a leading "0x" on the error code. That being
said, I actually like it without the "0x".
How about printing the raw value before the colon? Having it at the end
makes it look like extra noise. And for me, seeing the raw code first
(reading left to right) cue's my brain that it's about to decode some
bits.
SGX will also break the two digit printing. And for whatever reason four
digits makes me think "this is an error code!". IIRC the vectoring ucode
makes a lot of assumptions about the error code being at most 16 bits, so
in theory four digits is all we'll ever need.
E.g.
[ 0.144247] #PF error code: +P -W -U -S -I -K (01)
[ 0.144411] #PF error code: +P +W -U -S -I -K (03)
[ 0.144826] #PF error code: +P +W +U -S -I -K (07)
[ 0.145252] #PF error code: +P -W +U -S -I +K (25)
[ 0.145706] #PF error code: -P +W -U -S -I -K (02)
[ 0.146111] #PF error code: -P -W +U -S -I -K (04)
[ 0.146521] #PF error code: -P +W +U -S -I -K (06)
[ 0.146934] #PF error code: -P -W +U -S +I -K (14)
[ 0.147348] #PF error code: +P -W -U -S +I -K (11)
[ 0.147767] #PF error code: -P -W -U -S -I -K (00)
vs. (with SGX added as 'G' for testing purposes)
[ 0.158849] #PF error code(0001): +P !W !U !S !I !K !G
[ 0.159292] #PF error code(0003): +P +W !U !S !I !K !G
[ 0.159742] #PF error code(0007): +P +W +U !S !I !K !G
[ 0.160190] #PF error code(0025): +P !W +U !S !I +K !G
[ 0.160638] #PF error code(0002): !P +W !U !S !I !K !G
[ 0.161087] #PF error code(0004): !P !W +U !S !I !K !G
[ 0.161538] #PF error code(0006): !P +W +U !S !I !K !G
[ 0.161992] #PF error code(0014): !P !W +U !S +I !K !G
[ 0.162450] #PF error code(0011): +P !W !U !S +I !K !G
[ 0.162667] #PF error code(8001): +P !W !U !S !I !K +G
[ 0.162667] #PF error code(8003): +P +W !U !S !I !K +G
[ 0.162667] #PF error code(8007): +P +W +U !S !I !K +G
[ 0.162667] #PF error code(8025): +P !W +U !S !I +K +G
[ 0.162667] #PF error code(8002): !P +W !U !S !I !K +G
[ 0.162667] #PF error code(8004): !P !W +U !S !I !K +G
[ 0.162667] #PF error code(8006): !P +W +U !S !I !K +G
[ 0.162667] #PF error code(8014): !P !W +U !S +I !K +G
[ 0.162667] #PF error code(8011): +P !W !U !S +I !K +G
[ 0.162667] #PF error code(0000): !P !W !U !S !I !K !G
vs. (with consistent ordering between raw and decoded)
[ 0.153004] #PF error code(0001): !K !I !S !U !W +P
[ 0.153004] #PF error code(0003): !K !I !S !U +W +P
[ 0.153004] #PF error code(0007): !K !I !S +U +W +P
[ 0.153004] #PF error code(0025): +K !I !S +U !W +P
[ 0.153004] #PF error code(0002): !K !I !S !U +W !P
[ 0.153004] #PF error code(0004): !K !I !S +U !W !P
[ 0.153004] #PF error code(0006): !K !I !S +U +W !P
[ 0.153362] #PF error code(0014): !K +I !S +U !W !P
[ 0.153788] #PF error code(0011): !K +I !S !U !W +P
[ 0.154104] #PF error code(0000): !K !I !S !U !W !P
A patch with the kitchen sink...
================================>
From 22e6d40e52b4341a424f065a138be54bc79d4db4 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <[email protected]>
Date: Thu, 6 Dec 2018 07:25:13 -0800
Subject: [PATCH] x86/fault: Make show_error_code() more extensible and tweak
its formatting
- Initialize each bit individually in the error_code_chars array. This
allows for non-contiguous bits and is self-documenting. Define a
macro to make the initialization code somewhatmore readable.
- Reverse the decode order so it's consistent with the raw display.
- Use ARRAY_SIZE instead of hardcoding '6' in multiple locations.
- Use '!' for the negative case to better contrast against '+'.
- Display four digits (was two) when printing the raw error code.
- Remove @regs from show_error_code().
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/fault.c | 47 ++++++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 22 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 23dc7163f6ac..cd28d058ed39 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -605,45 +605,48 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
/*
* This maps the somewhat obscure error_code number to symbolic text:
- *
- * P = protection fault (X86_PF_PROT)
- * W = write access (X86_PF_WRITE)
- * U = user-mode access (X86_PF_USER)
- * S = supervisor mode (X86_PF_RSVD)
- * I = instruction fault (X86_PF_INSTR)
- * K = keys fault (X86_PF_PK)
*/
-static const char error_code_chars[] = "PWUSIK";
+#define EC(x) ilog2(X86_PF_##x)
+static const char error_code_chars[] = {
+ [EC(PROT)] = 'P',
+ [EC(WRITE)] = 'W',
+ [EC(USER)] = 'U',
+ [EC(RSVD)] = 'S',
+ [EC(INSTR)] = 'I',
+ [EC(PK)] = 'K',
+};
/*
- * This helper function transforms the #PF error_code bits into " +P -W +U -R -I -K"
+ * This helper function transforms the #PF error_code bits into " +P !W +U !R !I !K"
* type of descriptive, almost human-readable error strings:
*/
-static void show_error_code(struct pt_regs *regs, unsigned long error_code)
+static void show_error_code(unsigned long error_code)
{
- unsigned int bit, mask;
- char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at the end */
+ char err_txt[ARRAY_SIZE(error_code_chars)*3+1]; /* 3 chars per bit plus zero at the end */
+ unsigned offset = 0;
+ unsigned long mask;
+ int i;
- /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
-
- for (bit = 0; bit < 6; bit++) {
- unsigned int offset = bit*3;
+ for (i = ARRAY_SIZE(error_code_chars) - 1; i >= 0; i--) {
+ if (!error_code_chars[i])
+ continue;
err_txt[offset+0] = ' ';
- mask = 1 << bit;
+ mask = 1 << i;
if (error_code & mask)
err_txt[offset+1] = '+';
else
- err_txt[offset+1] = '-';
+ err_txt[offset+1] = '!';
- err_txt[offset+2] = error_code_chars[bit];
+ err_txt[offset+2] = error_code_chars[i];
+ offset += 3;
}
/* Close the string: */
- err_txt[sizeof(err_txt)-1] = 0;
+ err_txt[offset] = 0;
- pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
+ pr_alert("#PF error code(%04lx): %s\n", error_code, err_txt);
}
static void
@@ -676,7 +679,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
(void *)address);
- show_error_code(regs, error_code);
+ show_error_code(error_code);
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
struct desc_ptr idt, gdt;
--
2.19.2
* Sean Christopherson <[email protected]> wrote:
> On Thu, Dec 06, 2018 at 08:34:09AM +0100, Ingo Molnar wrote:
> > I like your '!' idea, but with a further simplification: how about using
> > '-/+' differentiation and a single character and a fixed-length message.
> >
> > The new output will be lines of:
> >
> > #PF error code: -P -W -U -S -I -K (0x00)
> > ...
> > #PF error code: -P -W +U +S -I -K (0x0c)
> > ...
> > #PF error code: +P +W +U +S +I +K (0x3f)
> >
> > The symbol abbreviations are pretty self-explanatory:
> >
> > P = protection fault (X86_PF_PROT)
> > W = write access (X86_PF_WRITE)
> > U = user-mode access (X86_PF_USER)
> > S = supervisor mode (X86_PF_RSVD)
> > I = instruction fault (X86_PF_INSTR)
> > K = keys fault (X86_PF_PK)
> >
> > Misc notes:
> >
> > - In principle the new text is now short enough to include it in one of
> > the existing output lines, further shortening the oops output - but I
> > havent done that in this patch.
> >
> > - Another question is the ordering of the bits: the symbolic display is
> > actually big endian, while the numeric hexa printout is little endian.
> >
> > I kind of still like it that way, not just because the decoding loop is
> > more natural, but because the bits are actually ordered by importance:
> > the PROT bits is more important than the INSTR or the PK bits - and the
> > more important bits are displayed first.
>
> Hmm, my eyes tend to be drawn to the end of the line, e.g. having PROT
> be the last thing makes it stand out more than being buried in the middle
> of the line. Inverting the ordering between raw and decoded also makes
> it very difficult to correlate the raw value with each bit.
>
> > - Only build-tested the patch and looked at the generated assembly, but
> > it all looks sane enough so will obviously work just fine! ;-)
>
> ...
>
> > /*
> > - * This helper function transforms the #PF error_code bits into
> > - * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
> > + * This maps the somewhat obscure error_code number to symbolic text:
> > + *
> > + * P = protection fault (X86_PF_PROT)
> > + * W = write access (X86_PF_WRITE)
> > + * U = user-mode access (X86_PF_USER)
> > + * S = supervisor mode (X86_PF_RSVD)
> > + * I = instruction fault (X86_PF_INSTR)
> > + * K = keys fault (X86_PF_PK)
> > */
> > -static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
> > +static const char error_code_chars[] = "PWUSIK";
> > +
> > +/*
> > + * This helper function transforms the #PF error_code bits into " +P -W +U -R -I -K"
> > + * type of descriptive, almost human-readable error strings:
> > + */
> > +static void show_error_code(struct pt_regs *regs, unsigned long error_code)
>
> No need for @regs.
>
> > {
> > - if (error_code & mask) {
> > - if (buf[0])
> > - strcat(buf, " ");
> > - strcat(buf, txt);
> > + unsigned int bit, mask;
> > + char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at the end */
>
> Assuming the error code bits are contiguous breaks if/when SGX gets added,
> which uses bit 15. Oopsing on an SGX fault should be nigh impossible, but
> it might be nice to be able to reuse show_error_code in other places.
>
> Hardcoding "6" is also a bit painful.
>
> > +
> > + /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
> > +
> > + for (bit = 0; bit < 6; bit++) {
> > + unsigned int offset = bit*3;
> > +
> > + err_txt[offset+0] = ' ';
> > +
> > + mask = 1 << bit;
> > + if (error_code & mask)
> > + err_txt[offset+1] = '+';
> > + else
> > + err_txt[offset+1] = '-';
>
> To me, using '!' contrasts better when side-by-side with '+'.
>
> > +
> > + err_txt[offset+2] = error_code_chars[bit];
> > }
> > +
> > + /* Close the string: */
> > + err_txt[sizeof(err_txt)-1] = 0;
> > +
> > + pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
>
> The changelog example has a leading "0x" on the error code. That being
> said, I actually like it without the "0x".
>
> How about printing the raw value before the colon? Having it at the end
> makes it look like extra noise. And for me, seeing the raw code first
> (reading left to right) cue's my brain that it's about to decode some
> bits.
>
> SGX will also break the two digit printing. And for whatever reason four
> digits makes me think "this is an error code!". IIRC the vectoring ucode
> makes a lot of assumptions about the error code being at most 16 bits, so
> in theory four digits is all we'll ever need.
>
> E.g.
>
> [ 0.144247] #PF error code: +P -W -U -S -I -K (01)
> [ 0.144411] #PF error code: +P +W -U -S -I -K (03)
> [ 0.144826] #PF error code: +P +W +U -S -I -K (07)
> [ 0.145252] #PF error code: +P -W +U -S -I +K (25)
> [ 0.145706] #PF error code: -P +W -U -S -I -K (02)
> [ 0.146111] #PF error code: -P -W +U -S -I -K (04)
> [ 0.146521] #PF error code: -P +W +U -S -I -K (06)
> [ 0.146934] #PF error code: -P -W +U -S +I -K (14)
> [ 0.147348] #PF error code: +P -W -U -S +I -K (11)
> [ 0.147767] #PF error code: -P -W -U -S -I -K (00)
>
> vs. (with SGX added as 'G' for testing purposes)
>
> [ 0.158849] #PF error code(0001): +P !W !U !S !I !K !G
> [ 0.159292] #PF error code(0003): +P +W !U !S !I !K !G
> [ 0.159742] #PF error code(0007): +P +W +U !S !I !K !G
> [ 0.160190] #PF error code(0025): +P !W +U !S !I +K !G
> [ 0.160638] #PF error code(0002): !P +W !U !S !I !K !G
> [ 0.161087] #PF error code(0004): !P !W +U !S !I !K !G
> [ 0.161538] #PF error code(0006): !P +W +U !S !I !K !G
> [ 0.161992] #PF error code(0014): !P !W +U !S +I !K !G
> [ 0.162450] #PF error code(0011): +P !W !U !S +I !K !G
> [ 0.162667] #PF error code(8001): +P !W !U !S !I !K +G
> [ 0.162667] #PF error code(8003): +P +W !U !S !I !K +G
> [ 0.162667] #PF error code(8007): +P +W +U !S !I !K +G
> [ 0.162667] #PF error code(8025): +P !W +U !S !I +K +G
> [ 0.162667] #PF error code(8002): !P +W !U !S !I !K +G
> [ 0.162667] #PF error code(8004): !P !W +U !S !I !K +G
> [ 0.162667] #PF error code(8006): !P +W +U !S !I !K +G
> [ 0.162667] #PF error code(8014): !P !W +U !S +I !K +G
> [ 0.162667] #PF error code(8011): +P !W !U !S +I !K +G
> [ 0.162667] #PF error code(0000): !P !W !U !S !I !K !G
>
> vs. (with consistent ordering between raw and decoded)
>
> [ 0.153004] #PF error code(0001): !K !I !S !U !W +P
> [ 0.153004] #PF error code(0003): !K !I !S !U +W +P
> [ 0.153004] #PF error code(0007): !K !I !S +U +W +P
> [ 0.153004] #PF error code(0025): +K !I !S +U !W +P
> [ 0.153004] #PF error code(0002): !K !I !S !U +W !P
> [ 0.153004] #PF error code(0004): !K !I !S +U !W !P
> [ 0.153004] #PF error code(0006): !K !I !S +U +W !P
> [ 0.153362] #PF error code(0014): !K +I !S +U !W !P
> [ 0.153788] #PF error code(0011): !K +I !S !U !W +P
> [ 0.154104] #PF error code(0000): !K !I !S !U !W !P
Ok, looks nice enough to me, with one request: please make it 0x prefixed
as is common with hexa code: "0010" could be binary, octal or decimal as
well. Since this is a separate line we don't lack the space.
Also some nits:
> A patch with the kitchen sink...
> ================================>
>
> From 22e6d40e52b4341a424f065a138be54bc79d4db4 Mon Sep 17 00:00:00 2001
> From: Sean Christopherson <[email protected]>
> Date: Thu, 6 Dec 2018 07:25:13 -0800
> Subject: [PATCH] x86/fault: Make show_error_code() more extensible and tweak
> its formatting
>
> - Initialize each bit individually in the error_code_chars array. This
> allows for non-contiguous bits and is self-documenting. Define a
> macro to make the initialization code somewhatmore readable.
>
> - Reverse the decode order so it's consistent with the raw display.
>
> - Use ARRAY_SIZE instead of hardcoding '6' in multiple locations.
>
> - Use '!' for the negative case to better contrast against '+'.
>
> - Display four digits (was two) when printing the raw error code.
>
> - Remove @regs from show_error_code().
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/mm/fault.c | 47 ++++++++++++++++++++++++---------------------
> 1 file changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 23dc7163f6ac..cd28d058ed39 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -605,45 +605,48 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
>
> /*
> * This maps the somewhat obscure error_code number to symbolic text:
> - *
> - * P = protection fault (X86_PF_PROT)
> - * W = write access (X86_PF_WRITE)
> - * U = user-mode access (X86_PF_USER)
> - * S = supervisor mode (X86_PF_RSVD)
> - * I = instruction fault (X86_PF_INSTR)
> - * K = keys fault (X86_PF_PK)
> */
> -static const char error_code_chars[] = "PWUSIK";
> +#define EC(x) ilog2(X86_PF_##x)
> +static const char error_code_chars[] = {
> + [EC(PROT)] = 'P',
> + [EC(WRITE)] = 'W',
> + [EC(USER)] = 'U',
> + [EC(RSVD)] = 'S',
> + [EC(INSTR)] = 'I',
> + [EC(PK)] = 'K',
> +};
Please use an extra newline between the #define and the variable
definition.
>
> /*
> - * This helper function transforms the #PF error_code bits into " +P -W +U -R -I -K"
> + * This helper function transforms the #PF error_code bits into " +P !W +U !R !I !K"
> * type of descriptive, almost human-readable error strings:
> */
> -static void show_error_code(struct pt_regs *regs, unsigned long error_code)
> +static void show_error_code(unsigned long error_code)
> {
> - unsigned int bit, mask;
> - char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at the end */
> + char err_txt[ARRAY_SIZE(error_code_chars)*3+1]; /* 3 chars per bit plus zero at the end */
> + unsigned offset = 0;
> + unsigned long mask;
> + int i;
>
> - /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
> -
> - for (bit = 0; bit < 6; bit++) {
> - unsigned int offset = bit*3;
> + for (i = ARRAY_SIZE(error_code_chars) - 1; i >= 0; i--) {
> + if (!error_code_chars[i])
> + continue;
>
> err_txt[offset+0] = ' ';
>
> - mask = 1 << bit;
> + mask = 1 << i;
> if (error_code & mask)
> err_txt[offset+1] = '+';
> else
> - err_txt[offset+1] = '-';
> + err_txt[offset+1] = '!';
>
> - err_txt[offset+2] = error_code_chars[bit];
> + err_txt[offset+2] = error_code_chars[i];
> + offset += 3;
> }
>
> /* Close the string: */
> - err_txt[sizeof(err_txt)-1] = 0;
> + err_txt[offset] = 0;
>
> - pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
> + pr_alert("#PF error code(%04lx): %s\n", error_code, err_txt);
0x%04lx here, but other than that looks good to me!
Thanks,
Ingo
> On Dec 6, 2018, at 7:53 AM, Sean Christopherson <[email protected]> wrote:
>
>> On Thu, Dec 06, 2018 at 08:34:09AM +0100, Ingo Molnar wrote:
>> I like your '!' idea, but with a further simplification: how about using
>> '-/+' differentiation and a single character and a fixed-length message.
>>
>> The new output will be lines of:
>>
>> #PF error code: -P -W -U -S -I -K (0x00)
>> ...
>> #PF error code: -P -W +U +S -I -K (0x0c)
>> ...
>> #PF error code: +P +W +U +S +I +K (0x3f)
>>
>> The symbol abbreviations are pretty self-explanatory:
>>
>> P = protection fault (X86_PF_PROT)
>> W = write access (X86_PF_WRITE)
>> U = user-mode access (X86_PF_USER)
>> S = supervisor mode (X86_PF_RSVD)
>> I = instruction fault (X86_PF_INSTR)
>> K = keys fault (X86_PF_PK)
>>
>> Misc notes:
>>
>> - In principle the new text is now short enough to include it in one of
>> the existing output lines, further shortening the oops output - but I
>> havent done that in this patch.
>>
>> - Another question is the ordering of the bits: the symbolic display is
>> actually big endian, while the numeric hexa printout is little endian.
>>
>> I kind of still like it that way, not just because the decoding loop is
>> more natural, but because the bits are actually ordered by importance:
>> the PROT bits is more important than the INSTR or the PK bits - and the
>> more important bits are displayed first.
>
> Hmm, my eyes tend to be drawn to the end of the line, e.g. having PROT
> be the last thing makes it stand out more than being buried in the middle
> of the line. Inverting the ordering between raw and decoded also makes
> it very difficult to correlate the raw value with each bit.
>
>> - Only build-tested the patch and looked at the generated assembly, but
>> it all looks sane enough so will obviously work just fine! ;-)
>
> ...
>
>> /*
>> - * This helper function transforms the #PF error_code bits into
>> - * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
>> + * This maps the somewhat obscure error_code number to symbolic text:
>> + *
>> + * P = protection fault (X86_PF_PROT)
>> + * W = write access (X86_PF_WRITE)
>> + * U = user-mode access (X86_PF_USER)
>> + * S = supervisor mode (X86_PF_RSVD)
>> + * I = instruction fault (X86_PF_INSTR)
>> + * K = keys fault (X86_PF_PK)
>> */
>> -static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
>> +static const char error_code_chars[] = "PWUSIK";
>> +
>> +/*
>> + * This helper function transforms the #PF error_code bits into " +P -W +U -R -I -K"
>> + * type of descriptive, almost human-readable error strings:
>> + */
>> +static void show_error_code(struct pt_regs *regs, unsigned long error_code)
>
> No need for @regs.
>
>> {
>> - if (error_code & mask) {
>> - if (buf[0])
>> - strcat(buf, " ");
>> - strcat(buf, txt);
>> + unsigned int bit, mask;
>> + char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at the end */
>
> Assuming the error code bits are contiguous breaks if/when SGX gets added,
> which uses bit 15. Oopsing on an SGX fault should be nigh impossible, but
> it might be nice to be able to reuse show_error_code in other places.
>
> Hardcoding "6" is also a bit painful.
>
>> +
>> + /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
>> +
>> + for (bit = 0; bit < 6; bit++) {
>> + unsigned int offset = bit*3;
>> +
>> + err_txt[offset+0] = ' ';
>> +
>> + mask = 1 << bit;
>> + if (error_code & mask)
>> + err_txt[offset+1] = '+';
>> + else
>> + err_txt[offset+1] = '-';
>
> To me, using '!' contrasts better when side-by-side with '+'.
>
>> +
>> + err_txt[offset+2] = error_code_chars[bit];
>> }
>> +
>> + /* Close the string: */
>> + err_txt[sizeof(err_txt)-1] = 0;
>> +
>> + pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
>
> The changelog example has a leading "0x" on the error code. That being
> said, I actually like it without the "0x".
>
> How about printing the raw value before the colon? Having it at the end
> makes it look like extra noise. And for me, seeing the raw code first
> (reading left to right) cue's my brain that it's about to decode some
> bits.
>
> SGX will also break the two digit printing. And for whatever reason four
> digits makes me think "this is an error code!". IIRC the vectoring ucode
> makes a lot of assumptions about the error code being at most 16 bits, so
> in theory four digits is all we'll ever need.
>
> E.g.
>
> [ 0.144247] #PF error code: +P -W -U -S -I -K (01)
> [ 0.144411] #PF error code: +P +W -U -S -I -K (03)
> [ 0.144826] #PF error code: +P +W +U -S -I -K (07)
> [ 0.145252] #PF error code: +P -W +U -S -I +K (25)
> [ 0.145706] #PF error code: -P +W -U -S -I -K (02)
> [ 0.146111] #PF error code: -P -W +U -S -I -K (04)
> [ 0.146521] #PF error code: -P +W +U -S -I -K (06)
> [ 0.146934] #PF error code: -P -W +U -S +I -K (14)
> [ 0.147348] #PF error code: +P -W -U -S +I -K (11)
> [ 0.147767] #PF error code: -P -W -U -S -I -K (00)
>
> vs. (with SGX added as 'G' for testing purposes)
>
> [ 0.158849] #PF error code(0001): +P !W !U !S !I !K !G
> [ 0.159292] #PF error code(0003): +P +W !U !S !I !K !G
> [ 0.159742] #PF error code(0007): +P +W +U !S !I !K !G
> [ 0.160190] #PF error code(0025): +P !W +U !S !I +K !G
> [ 0.160638] #PF error code(0002): !P +W !U !S !I !K !G
> [ 0.161087] #PF error code(0004): !P !W +U !S !I !K !G
> [ 0.161538] #PF error code(0006): !P +W +U !S !I !K !G
> [ 0.161992] #PF error code(0014): !P !W +U !S +I !K !G
> [ 0.162450] #PF error code(0011): +P !W !U !S +I !K !G
> [ 0.162667] #PF error code(8001): +P !W !U !S !I !K +G
> [ 0.162667] #PF error code(8003): +P +W !U !S !I !K +G
> [ 0.162667] #PF error code(8007): +P +W +U !S !I !K +G
> [ 0.162667] #PF error code(8025): +P !W +U !S !I +K +G
> [ 0.162667] #PF error code(8002): !P +W !U !S !I !K +G
> [ 0.162667] #PF error code(8004): !P !W +U !S !I !K +G
> [ 0.162667] #PF error code(8006): !P +W +U !S !I !K +G
> [ 0.162667] #PF error code(8014): !P !W +U !S +I !K +G
> [ 0.162667] #PF error code(8011): +P !W !U !S +I !K +G
> [ 0.162667] #PF error code(0000): !P !W !U !S !I !K !G
>
Please don’t. The whole reason I added the decoding was to make it easy to read without a cheat sheet. This is incomprehensible without reference to the code, and I’m familiar with it to begin with.
How about:
#PF error code: 0001 [PROT read kernel]
#PF error code: 0001 [PROT WRITE kernel]
#PF error code: 0001 [PROT read kernel]
#PF error code: 8011 [PROT INSTR kernel SGX]
This has no noise from unset bits except that we add lowercase “read” or “kernel” as appropriate. Even “kernel” seems barely necessary.
* Andy Lutomirski <[email protected]> wrote:
> > vs. (with SGX added as 'G' for testing purposes)
> >
> > [ 0.158849] #PF error code(0001): +P !W !U !S !I !K !G
> > [ 0.159292] #PF error code(0003): +P +W !U !S !I !K !G
> > [ 0.159742] #PF error code(0007): +P +W +U !S !I !K !G
> > [ 0.160190] #PF error code(0025): +P !W +U !S !I +K !G
> > [ 0.160638] #PF error code(0002): !P +W !U !S !I !K !G
> > [ 0.161087] #PF error code(0004): !P !W +U !S !I !K !G
> > [ 0.161538] #PF error code(0006): !P +W +U !S !I !K !G
> > [ 0.161992] #PF error code(0014): !P !W +U !S +I !K !G
> > [ 0.162450] #PF error code(0011): +P !W !U !S +I !K !G
> > [ 0.162667] #PF error code(8001): +P !W !U !S !I !K +G
> > [ 0.162667] #PF error code(8003): +P +W !U !S !I !K +G
> > [ 0.162667] #PF error code(8007): +P +W +U !S !I !K +G
> > [ 0.162667] #PF error code(8025): +P !W +U !S !I +K +G
> > [ 0.162667] #PF error code(8002): !P +W !U !S !I !K +G
> > [ 0.162667] #PF error code(8004): !P !W +U !S !I !K +G
> > [ 0.162667] #PF error code(8006): !P +W +U !S !I !K +G
> > [ 0.162667] #PF error code(8014): !P !W +U !S +I !K +G
> > [ 0.162667] #PF error code(8011): +P !W !U !S +I !K +G
> > [ 0.162667] #PF error code(0000): !P !W !U !S !I !K !G
> >
>
> Please don’t. The whole reason I added the decoding was to make it easy
> to read without a cheat sheet. This is incomprehensible without
> reference to the code, and I’m familiar with it to begin with.
Dunno, I can deduct the meaning from the above abbreviations without a
cheat sheet and I'm sure you'll be able to too from now on. All the
letters are very obvious references - to me at least, and brevity and
predictable, fixed-length output matters.
> How about:
>
> #PF error code: 0001 [PROT read kernel]
>
> #PF error code: 0001 [PROT WRITE kernel]
>
> #PF error code: 0001 [PROT read kernel]
>
> #PF error code: 8011 [PROT INSTR kernel SGX]
>
> This has no noise from unset bits except that we add lowercase “read”
> or “kernel” as appropriate. Even “kernel” seems barely necessary.
The thing is, the 'noise' from unset bits is actually important
information as well - at least for the major bits: it was a mostly random
choice that Intel defined '1' for write access and not for read access.
Thanks,
Ingo
> On Dec 6, 2018, at 8:47 AM, Ingo Molnar <[email protected]> wrote:
>
>
> * Andy Lutomirski <[email protected]> wrote:
>
>>> vs. (with SGX added as 'G' for testing purposes)
>>>
>>> [ 0.158849] #PF error code(0001): +P !W !U !S !I !K !G
>>> [ 0.159292] #PF error code(0003): +P +W !U !S !I !K !G
>>> [ 0.159742] #PF error code(0007): +P +W +U !S !I !K !G
>>> [ 0.160190] #PF error code(0025): +P !W +U !S !I +K !G
>>> [ 0.160638] #PF error code(0002): !P +W !U !S !I !K !G
>>> [ 0.161087] #PF error code(0004): !P !W +U !S !I !K !G
>>> [ 0.161538] #PF error code(0006): !P +W +U !S !I !K !G
>>> [ 0.161992] #PF error code(0014): !P !W +U !S +I !K !G
>>> [ 0.162450] #PF error code(0011): +P !W !U !S +I !K !G
>>> [ 0.162667] #PF error code(8001): +P !W !U !S !I !K +G
>>> [ 0.162667] #PF error code(8003): +P +W !U !S !I !K +G
>>> [ 0.162667] #PF error code(8007): +P +W +U !S !I !K +G
>>> [ 0.162667] #PF error code(8025): +P !W +U !S !I +K +G
>>> [ 0.162667] #PF error code(8002): !P +W !U !S !I !K +G
>>> [ 0.162667] #PF error code(8004): !P !W +U !S !I !K +G
>>> [ 0.162667] #PF error code(8006): !P +W +U !S !I !K +G
>>> [ 0.162667] #PF error code(8014): !P !W +U !S +I !K +G
>>> [ 0.162667] #PF error code(8011): +P !W !U !S +I !K +G
>>> [ 0.162667] #PF error code(0000): !P !W !U !S !I !K !G
>>>
>>
>> Please don’t. The whole reason I added the decoding was to make it easy
>> to read without a cheat sheet. This is incomprehensible without
>> reference to the code, and I’m familiar with it to begin with.
>
> Dunno, I can deduct the meaning from the above abbreviations without a
> cheat sheet and I'm sure you'll be able to too from now on. All the
> letters are very obvious references - to me at least, and brevity and
> predictable, fixed-length output matters.
>
>> How about:
>>
>> #PF error code: 0001 [PROT read kernel]
>>
>> #PF error code: 0001 [PROT WRITE kernel]
>>
>> #PF error code: 0001 [PROT read kernel]
>>
>> #PF error code: 8011 [PROT INSTR kernel SGX]
>>
>> This has no noise from unset bits except that we add lowercase “read”
>> or “kernel” as appropriate. Even “kernel” seems barely necessary.
>
> The thing is, the 'noise' from unset bits is actually important
> information as well - at least for the major bits: it was a mostly random
> choice that Intel defined '1' for write access and not for read access.
>
>
That’s why I suggested “read,” in lowercase, for reads. Other than that, most of the unset bits are uninteresting. An OOPS is so likely to be a kernel fault that it’s barely worth mentioning, and I even added a whole separate diagnostic for user oopses. Similarly, I don’t think we need to remind the reader that an oops wasn’t an SGX error or that it wasn’t a PK error. So I think my idea highlights the interesting bits and avoids distraction from the uninteresting bits.
* Andy Lutomirski <[email protected]> wrote:
> That’s why I suggested “read,” in lowercase, for reads. Other than
> that, most of the unset bits are uninteresting. An OOPS is so likely to
> be a kernel fault that it’s barely worth mentioning, and I even added a
> whole separate diagnostic for user oopses. Similarly, I don’t think we
> need to remind the reader that an oops wasn’t an SGX error or that it
> wasn’t a PK error. So I think my idea highlights the interesting bits
> and avoids distraction from the uninteresting bits.
Ok - all good points.
Thanks,
Ingo
On Wed, Dec 5, 2018 at 11:34 PM Ingo Molnar <[email protected]> wrote:
>
> Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat
> inconsistent, sporadic handling of negatives. Here's our error code bits:
>
> /*
> * Page fault error code bits:
> *
> * bit 0 == 0: no page found 1: protection fault
> * bit 1 == 0: read access 1: write access
> * bit 2 == 0: kernel-mode access 1: user-mode access
No. Really not at all.
Bit 2 is *not* "kernel vs user". Never has been. Never will be.
It's a single bit that mixes up *three* different cases:
- regular user mode access (value: 1)
- regular CPL0 access (value: 0)
- CPU system access (value: 0)
and that third case really is important and relevant. And importantly,
it can happen from user space.
In fact, these days we possibly have a fourth case:
- kernel access using wruss (value: 1)
and I'd rather see just the numbers (which you have to know to decode)
than see the simplified AND WRONG decoding of those numbers.
Please don't ever confuse the fault U/S bit with "user vs kernel".
It's just not true, and people should be very very aware of it now
being true.
If you care whether a page fault happened in user mode or not, you
have to look at the register state (ie "user_mode(regs)").
Please call the U/S bit something else than "user" or "kernel".
Linus
On Thu, Dec 6, 2018 at 10:15 AM Linus Torvalds
<[email protected]> wrote:
>
> On Wed, Dec 5, 2018 at 11:34 PM Ingo Molnar <[email protected]> wrote:
> >
> > Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat
> > inconsistent, sporadic handling of negatives. Here's our error code bits:
> >
> > /*
> > * Page fault error code bits:
> > *
> > * bit 0 == 0: no page found 1: protection fault
> > * bit 1 == 0: read access 1: write access
> > * bit 2 == 0: kernel-mode access 1: user-mode access
>
> No. Really not at all.
>
> Bit 2 is *not* "kernel vs user". Never has been. Never will be.
>
> It's a single bit that mixes up *three* different cases:
>
> - regular user mode access (value: 1)
>
> - regular CPL0 access (value: 0)
>
> - CPU system access (value: 0)
>
> and that third case really is important and relevant. And importantly,
> it can happen from user space.
>
> In fact, these days we possibly have a fourth case:
>
> - kernel access using wruss (value: 1)
Indeed.
>
> and I'd rather see just the numbers (which you have to know to decode)
> than see the simplified AND WRONG decoding of those numbers.
That's why the very next line in the OOPS explains this.
>
> Please don't ever confuse the fault U/S bit with "user vs kernel".
> It's just not true, and people should be very very aware of it now
> being true.
>
> If you care whether a page fault happened in user mode or not, you
> have to look at the register state (ie "user_mode(regs)").
The code we're arguing over was part of a patch set to fix this
confusion in the page fault code for real.
>
> Please call the U/S bit something else than "user" or "kernel".
Dunno. I kind of like following the SDM.
How do you like the attached patch?
On Thu, Dec 6, 2018 at 11:07 AM Andy Lutomirski <[email protected]> wrote:
>
> How do you like the attached patch?
I agree with whoever thought it's odd that "read" is in lower case
when everything else is in upper case.
And honestly, I'd just siggest making the err_text simply have the
real user/kernel difference in it too, using something like
pr_alert("#PF error: 0x%04lx%s from %s mode\n", error_code, err_txt,
user_mode(regs) ? "user" : "kernel" );
The "oh, PF_USER and user_mode differs, so let's point that out
explicitly" thing is a good thing regardless.
Linus
On Thu, Dec 6, 2018 at 12:24 PM Linus Torvalds
<[email protected]> wrote:
>
> On Thu, Dec 6, 2018 at 11:07 AM Andy Lutomirski <[email protected]> wrote:
> >
> > How do you like the attached patch?
>
> I agree with whoever thought it's odd that "read" is in lower case
> when everything else is in upper case.
"read" isn't an actual bit in the error code, so I thought it would be
polite to make it look a little bit different. That's not a big deal,
though.
>
> And honestly, I'd just siggest making the err_text simply have the
> real user/kernel difference in it too, using something like
>
> pr_alert("#PF error: 0x%04lx%s from %s mode\n", error_code, err_txt,
> user_mode(regs) ? "user" : "kernel" );
>
> The "oh, PF_USER and user_mode differs, so let's point that out
> explicitly" thing is a good thing regardless.
Sure. Although it's extremely odd for us to OOPS from user mode, so
maybe the OOPS code in general should print a big fat warning, and
we'll just otherwise assume it was from kernel mode. (In your tree
right now, we print the wrong value for CS for OOPSes from user mode,
which is extra confusing. That's fixed in -tip.)
--Andy
On Thu, Dec 6, 2018 at 12:28 PM Andy Lutomirski <[email protected]> wrote:
>
> "read" isn't an actual bit in the error code, so I thought it would be
> polite to make it look a little bit different.
If you care about the bits in the error code, then just look at the number.
And if you care about what the numbers mean, it doesn't matter how it's encoded.
I don't think you should mix up the two concepts.
> Sure. Although it's extremely odd for us to OOPS from user mode, so
> maybe the OOPS code in general should print a big fat warning, and
> we'll just otherwise assume it was from kernel mode.
Yeah, the "from user mode" case is generally really something horribly
bad (reserved bits being set or us screwing up LDT's etc), so yeah,
maybe a better model is indeed to point that out explicitly, the same
way the "user/kernel doesn't match U/S bit" is pointed out.
Linus
Linus pointed out[1] that deciphering the raw #PF error code and
printing a more human readable message are two different things,
e.g. not-executable, not-writable, protection keys and even SGX faults
are all some form of permission violation faults, and a #PF with the
USER bit cleared in the error code does not indicate that the fault
originated in kernel code.
Remove the per-bit decoding of the error code and instead print the raw
error code followed by a brief description of what caused the fault, the
effective privilege level of the faulting access, and whether the fault
originated in user code or kernel code.
This provides the user with the information needed to do basic triage
on 99.9% oopses without polluting the message with information that is
useless the vast majority of the time, e.g. an oops on a protection keys
violation is beyond rare, stating that an oops *wasn't* due to a keys
violation is pointless noise.
[1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com
Suggested-by: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yu-cheng Yu <[email protected]>
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/mm/fault.c | 41 ++++++++++-------------------------------
1 file changed, 10 insertions(+), 31 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..85de05d41f58 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -603,24 +603,9 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
name, index, addr, (desc.limit0 | (desc.limit1 << 16)));
}
-/*
- * This helper function transforms the #PF error_code bits into
- * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
- */
-static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
-{
- if (error_code & mask) {
- if (buf[0])
- strcat(buf, " ");
- strcat(buf, txt);
- }
-}
-
static void
show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
{
- char err_txt[64];
-
if (!oops_may_print())
return;
@@ -648,27 +633,21 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
(void *)address);
- err_txt[0] = 0;
-
- /*
- * Note: length of these appended strings including the separation space and the
- * zero delimiter must fit into err_txt[].
- */
- err_str_append(error_code, err_txt, X86_PF_PROT, "[PROT]" );
- err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
- err_str_append(error_code, err_txt, X86_PF_USER, "[USER]" );
- err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
- err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
- err_str_append(error_code, err_txt, X86_PF_PK, "[PK]" );
-
- pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+ pr_alert("#PF: error_code(0x%04lx) - %s\n", error_code,
+ !(error_code & X86_PF_PROT) ? "not-present page" :
+ (error_code & X86_PF_RSVD) ? "reserved bit violation" :
+ "permissions violation");
+ pr_alert("#PF: %s-privileged %s from %s code\n",
+ (error_code & X86_PF_USER) ? "user" : "supervisor",
+ (error_code & X86_PF_INSTR) ? "instruction fetch" :
+ (error_code & X86_PF_WRITE) ? "write access" :
+ "read access",
+ user_mode(regs) ? "user" : "kernel");
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
struct desc_ptr idt, gdt;
u16 ldtr, tr;
- pr_alert("This was a system access from user code\n");
-
/*
* This can happen for quite a few reasons. The more obvious
* ones are faults accessing the GDT, or LDT. Perhaps
--
2.19.2
On Fri, Dec 7, 2018 at 10:44 AM Sean Christopherson
<[email protected]> wrote:
>
> Remove the per-bit decoding of the error code and instead print the raw
> error code followed by a brief description of what caused the fault, the
> effective privilege level of the faulting access, and whether the fault
> originated in user code or kernel code.
This doesn't quite work as-is, though.
For example, at least the PK bit is independent of the other bits and
would be interesting in the human-legible version, but doesn't show up
there at all.
That said, I think the end result might be more legible than the
previous version, so this approach may well be good, it just needs at
least that "permissions violation" part to be extended with whether
it was PK or not.
Also, shouldn't we show the SGX bit too as some kind of "during SGX"
extension on the "in user/kernel space" part?
Linus
On Fri, Dec 07, 2018 at 10:52:49AM -0800, Linus Torvalds wrote:
> On Fri, Dec 7, 2018 at 10:44 AM Sean Christopherson
> <[email protected]> wrote:
> >
> > Remove the per-bit decoding of the error code and instead print the raw
> > error code followed by a brief description of what caused the fault, the
> > effective privilege level of the faulting access, and whether the fault
> > originated in user code or kernel code.
>
> This doesn't quite work as-is, though.
>
> For example, at least the PK bit is independent of the other bits and
> would be interesting in the human-legible version, but doesn't show up
> there at all.
Heh, I actually intentionally omitted protection keys thinking it'd be
superfluous, i.e. "go look at the error code bits if you care that much".
> That said, I think the end result might be more legible than the
> previous version, so this approach may well be good, it just needs at
> least that "permissions violation" part to be extended with whether
> it was PK or not.
>
> Also, shouldn't we show the SGX bit too as some kind of "during SGX"
> extension on the "in user/kernel space" part?
The SGX bit isn't defined in mainline yet. But yeah, I can see how
printing e.g. "SGX EPCM violation" would be a lot more helpful than
a vanilla "permissions violation". I'll send a v2 with the PK bit
added and a slightly reworded changelog.
Linus pointed out that deciphering the raw #PF error code and printing
a more human readable message are two different things, and also that
printing the negative cases is mostly just noise[1]. For example, the
USER bit doesn't mean the fault originated in user code and stating
that an oops wasn't due to a protection keys violation isn't interesting
since an oops on a keys violation is a one-in-a-million scenario.
Remove the per-bit decoding of the error code and instead print:
- the raw error code
- why the fault occurred
- the effective privilege level of the access
- the type of access
- whether the fault originated in user code or kernel code
This provides the user with the information needed to triage 99.9% of
oopses without polluting the log with useless information or conflating
the error_code with the CPL.
[1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com
Suggested-by: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yu-cheng Yu <[email protected]>
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
v2:
- Explicitly call out protection keys violations
- "Slightly" reword the changelog
arch/x86/mm/fault.c | 42 +++++++++++-------------------------------
1 file changed, 11 insertions(+), 31 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..26feb8c525c1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -603,24 +603,9 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
name, index, addr, (desc.limit0 | (desc.limit1 << 16)));
}
-/*
- * This helper function transforms the #PF error_code bits into
- * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
- */
-static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
-{
- if (error_code & mask) {
- if (buf[0])
- strcat(buf, " ");
- strcat(buf, txt);
- }
-}
-
static void
show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
{
- char err_txt[64];
-
if (!oops_may_print())
return;
@@ -648,27 +633,22 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
(void *)address);
- err_txt[0] = 0;
-
- /*
- * Note: length of these appended strings including the separation space and the
- * zero delimiter must fit into err_txt[].
- */
- err_str_append(error_code, err_txt, X86_PF_PROT, "[PROT]" );
- err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
- err_str_append(error_code, err_txt, X86_PF_USER, "[USER]" );
- err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
- err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
- err_str_append(error_code, err_txt, X86_PF_PK, "[PK]" );
-
- pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+ pr_alert("#PF: error_code(0x%04lx) - %s\n", error_code,
+ !(error_code & X86_PF_PROT) ? "not-present page" :
+ (error_code & X86_PF_RSVD) ? "reserved bit violation" :
+ (error_code & X86_PF_PK) ? "protection keys violation" :
+ "permissions violation");
+ pr_alert("#PF: %s-privileged %s from %s code\n",
+ (error_code & X86_PF_USER) ? "user" : "supervisor",
+ (error_code & X86_PF_INSTR) ? "instruction fetch" :
+ (error_code & X86_PF_WRITE) ? "write access" :
+ "read access",
+ user_mode(regs) ? "user" : "kernel");
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
struct desc_ptr idt, gdt;
u16 ldtr, tr;
- pr_alert("This was a system access from user code\n");
-
/*
* This can happen for quite a few reasons. The more obvious
* ones are faults accessing the GDT, or LDT. Perhaps
--
2.19.2
On Fri, Dec 7, 2018 at 11:52 AM Sean Christopherson
<[email protected]> wrote:
>
> Remove the per-bit decoding of the error code and instead print:
The patch looks fine to me, so feel free to add an acked-by, but:
(a) I'm not the one who wanted the human-legible version in the first
place, since I'm also perfectly ok with just the error code, so my
judgement is obviously garbage wrt this whole thing
(b) it would be good to have a couple of actual examples of the
printout to judge.
I can certainly imagine how it looks just from the patch, but maybe if
I actually see reality I'd go "eww". Or somebody else goes "eww".
Linus
On Fri, Dec 07, 2018 at 12:46:30PM -0800, Linus Torvalds wrote:
> On Fri, Dec 7, 2018 at 11:52 AM Sean Christopherson
> <[email protected]> wrote:
> >
> > Remove the per-bit decoding of the error code and instead print:
>
> The patch looks fine to me, so feel free to add an acked-by, but:
>
> (a) I'm not the one who wanted the human-legible version in the first
> place, since I'm also perfectly ok with just the error code, so my
> judgement is obviously garbage wrt this whole thing
>
> (b) it would be good to have a couple of actual examples of the
> printout to judge.
>
> I can certainly imagine how it looks just from the patch, but maybe if
> I actually see reality I'd go "eww". Or somebody else goes "eww".
Here's what it looks like in the full oops context:
[ 14.109977] BUG: unable to handle kernel paging request at ffffbeef00000000
[ 14.110604] #PF: error_code(0x0000) - not-present page
[ 14.111048] #PF: supervisor-privileged read access from kernel code
[ 14.111594] PGD 0 P4D 0
[ 14.111820] Oops: 0000 [#1] SMP
[ 14.112098] CPU: 4 PID: 1007 Comm: modprobe Not tainted 4.20.0-rc2+ #91
[ 14.112668] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 14.113344] RIP: 0010:hardware_setup+0xab/0x6f1 [kvm_intel]
[ 14.113825] Code: 05 30 8c ff ff 0f 84 56 05 00 00 0f 31 83 e0 03 75 13 48 b8 00 00 00 00 ef be ff ff 48 c7 00 00 00 00 00 eb 2b 48 ff c8 75 0c <48> a1 00 00 00 00 ef be ff ff eb 1a 48 8b 15 50 ff f6 ff 48 b8 00
[ 14.115421] RSP: 0018:ffffc9000058bbb8 EFLAGS: 00010246
[ 14.115878] RAX: 0000000000000000 RBX: 0000000000000006 RCX: ffffea0009d42148
[ 14.116497] RDX: 000000000000000a RSI: dead000000000200 RDI: 00000000006000c0
[ 14.117115] RBP: ffffc9000058bc50 R08: 0000000000000006 R09: 000000000016a6d0
[ 14.117731] R10: ffffc9000058bc68 R11: ffff8882750ef5e8 R12: 0000000000005e40
[ 14.118346] R13: ffffffffa014dd00 R14: ffff888270273480 R15: ffffc9000058be98
[ 14.118961] FS: 00007ffac1c70700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000
[ 14.119661] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.120160] CR2: ffffbeef00000000 CR3: 0000000270352004 CR4: 0000000000360ee0
[ 14.120778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 14.121393] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 14.122006] Call Trace:
[ 14.122235] kvm_arch_hardware_setup+0x42/0x200 [kvm]
[ 14.122685] ? vmx_check_processor_compat+0x8d/0x8d [kvm_intel]
[ 14.123201] ? kvm_init+0x79/0x280 [kvm]
[ 14.123552] kvm_init+0x79/0x280 [kvm]
[ 14.123883] ? vmx_check_processor_compat+0x8d/0x8d [kvm_intel]
[ 14.124394] vmx_init+0x9d/0x400 [kvm_intel]
[ 14.124766] ? vmx_check_processor_compat+0x8d/0x8d [kvm_intel]
[ 14.125277] do_one_initcall+0x4d/0x1c4
[ 14.125613] ? kmem_cache_alloc_trace+0x30/0x1a0
[ 14.126014] do_init_module+0x5b/0x20d
[ 14.126342] load_module+0x2389/0x2980
[ 14.126670] ? vfs_read+0x117/0x130
[ 14.126976] ? __do_sys_finit_module+0xd2/0x100
[ 14.127369] __do_sys_finit_module+0xd2/0x100
[ 14.127751] do_syscall_64+0x4f/0x100
[ 14.128078] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 14.128504] RIP: 0033:0x7ffac178e4d9
[ 14.128810] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01 48
[ 14.130358] RSP: 002b:00007ffc5b68d0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 14.131023] RAX: ffffffffffffffda RBX: 00005581920445c0 RCX: 00007ffac178e4d9
[ 14.131658] RDX: 0000000000000000 RSI: 0000558192048850 RDI: 0000000000000005
[ 14.132256] RBP: 0000558192048850 R08: 0000000000000000 R09: 0000000000000008
[ 14.132856] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000000000
[ 14.133454] R13: 0000558192044640 R14: 0000000000040000 R15: 0000000000000008
[ 14.134055] Modules linked in: kvm_intel(+) kvm irqbypass bridge stp llc
[ 14.134649] CR2: ffffbeef00000000
[ 14.134933] ---[ end trace 8dce06ca17fa1e39 ]---
Looking at it again, my own personal preference would be to swap the order
of the #PF lines. For most cases the three main lines show up in reverse
fir-tree ordering and the cause of the fault is easy to pick out since it's
the last thing highlighted by pr_alert (excepting when we dump the IDT,
GDT, etc...).
[ 160.246820] BUG: unable to handle kernel paging request at ffffbeef00000000
[ 160.247517] #PF: supervisor-privileged instruction fetch from kernel code
[ 160.248085] #PF: error_code(0x0010) - not-present page
[ 160.248517] PGD 0 P4D 0
[ 160.248738] Oops: 0010 [#1] SMP
[ 160.249012] CPU: 4 PID: 1003 Comm: modprobe Not tainted 4.20.0-rc2+ #93
[ 160.249566] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 160.250221] RIP: 0010:0xffffbeef00000000
[ 160.250556] Code: Bad RIP value.
[ 160.250833] RSP: 0018:ffffc90000447bf8 EFLAGS: 00010246
[ 160.251274] RAX: ffffbeef00000000 RBX: 00000000ffffffff RCX: 0000000000000000
[ 160.251871] RDX: ffff8882733dd7c0 RSI: 000000000000000c RDI: ffffffff820379c0
[ 160.252469] RBP: ffffc90000447c50 R08: 0000000000000006 R09: 0000000000000000
[ 160.253120] R10: ffffc90000447c68 R11: ffff88827518cfe8 R12: 0000000000005e40
[ 160.253717] R13: ffffffffa014dd00 R14: ffff888274431900 R15: ffffc90000447e98
[ 160.254322] FS: 00007f54cc42b700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000
[ 160.254982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 160.255487] CR2: ffffbeeeffffffd6 CR3: 0000000271020006 CR4: 0000000000360ee0
[ 160.256074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 160.256660] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 160.257297] Call Trace:
[ 160.257521] ? kvm_arch_hardware_setup+0x42/0x200 [kvm]
[ 160.257964] ? vmx_check_processor_compat+0x41d/0x41d [kvm_intel]
[ 160.258481] ? kvm_init+0x79/0x280 [kvm]
[ 160.258820] ? kvm_init+0x79/0x280 [kvm]
[ 160.259156] ? vmx_check_processor_compat+0x41d/0x41d [kvm_intel]
[ 160.259669] ? vmx_init+0x9d/0xaee [kvm_intel]
[ 160.260047] ? vmx_check_processor_compat+0x41d/0x41d [kvm_intel]
[ 160.260560] ? do_one_initcall+0x4d/0x1c4
[ 160.260911] ? kmem_cache_alloc_trace+0x30/0x1a0
[ 160.261307] ? do_init_module+0x5b/0x20d
[ 160.261641] ? load_module+0x2389/0x2980
[ 160.261974] ? vfs_read+0x117/0x130
[ 160.262272] ? __do_sys_finit_module+0xd2/0x100
[ 160.262655] ? __do_sys_finit_module+0xd2/0x100
[ 160.263038] ? do_syscall_64+0x4f/0x100
[ 160.263364] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 160.263804] Modules linked in: kvm_intel(+) kvm irqbypass bridge stp llc
[ 160.264367] CR2: ffffbeef00000000
[ 160.264651] ---[ end trace 2b9acada0f38bc58 ]---
Other combinations with the current order:
[ 0.143436] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.143985] #PF: error_code(0x000b) - reserved bit violation
[ 0.144505] #PF: supervisor-privileged write access from kernel code
[ 0.145090] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.145755] #PF: error_code(0x0019) - reserved bit violation
[ 0.146244] #PF: supervisor-privileged instruction fetch from kernel code
[ 0.146847] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0009) - reserved bit violation
[ 0.147040] #PF: supervisor-privileged read access from kernel code
[ 0.147040] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0023) - protection keys violation
[ 0.147040] #PF: supervisor-privileged write access from kernel code
[ 0.147040] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0031) - protection keys violation
[ 0.147040] #PF: supervisor-privileged instruction fetch from kernel code
[ 0.147040] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0021) - protection keys violation
[ 0.147040] #PF: supervisor-privileged read access from kernel code
[ 0.147040] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0025) - protection keys violation
[ 0.147040] #PF: user-privileged read access from kernel code
[ 0.147040] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0007) - permissions violation
[ 0.147040] #PF: user-privileged write access from kernel code
[ 0.147040] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0015) - permissions violation
[ 0.147040] #PF: user-privileged instruction fetch from kernel code
[ 0.147040] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0005) - permissions violation
[ 0.147040] #PF: user-privileged read access from kernel code
[ 0.147040] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0004) - not-present page
[ 0.147040] #PF: user-privileged read access from kernel code
[ 0.147040] BUG: unable to handle kernel paging request at ffffc90000230000
[ 0.147040] #PF: error_code(0x0000) - not-present page
[ 0.147040] #PF: supervisor-privileged read access from kernel code
On Fri, Dec 7, 2018 at 2:06 PM Sean Christopherson
<[email protected]> wrote:
>
> Looking at it again, my own personal preference would be to swap the order
> of the #PF lines.
Yeah, probably.
Also:
> [ 160.246820] BUG: unable to handle kernel paging request at ffffbeef00000000
> [ 160.247517] #PF: supervisor-privileged instruction fetch from kernel code
> [ 160.248085] #PF: error_code(0x0010) - not-present page
With this form, I think the "kernel" in the first line is actually
misleading. Yes, it's a #PF for the kernel, but then the "kernel" on
the second line talks about what mode we were in when it happened, so
we have two different meanings of "kernel" on two adjacent lines.
So maybe that "BUG: unable to handle kernel paging request" message
should be something like
"BUG: unable to handle page fault for address ffffbeef00000000"
instead? Does that make sense to people?
Anyway, enough bike-shedding from me, I'll just shut up about this,
since I don't really care all that deeply, and I wasn't really the
target audience anyway. Sorry for the noise, and I'll leave the
decision to the people who actually wanted this.
Linus
On Fri, Dec 7, 2018 at 2:14 PM Linus Torvalds
<[email protected]> wrote:
>
> On Fri, Dec 7, 2018 at 2:06 PM Sean Christopherson
> <[email protected]> wrote:
> >
> > Looking at it again, my own personal preference would be to swap the order
> > of the #PF lines.
>
> Yeah, probably.
>
> Also:
>
> > [ 160.246820] BUG: unable to handle kernel paging request at ffffbeef00000000
> > [ 160.247517] #PF: supervisor-privileged instruction fetch from kernel code
> > [ 160.248085] #PF: error_code(0x0010) - not-present page
>
> With this form, I think the "kernel" in the first line is actually
> misleading. Yes, it's a #PF for the kernel, but then the "kernel" on
> the second line talks about what mode we were in when it happened, so
> we have two different meanings of "kernel" on two adjacent lines.
I'm okay with this variant. I have a slight preference for:
#PF: supervisor-privileged instruction fetch from kernel code
#PF error_code: 0x0010 [READ]
Which is what we'd get from Sean's patch plus my patch here:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm&id=ccfb1941f90153818c07fb1a7dc22121a970d252
Sean, what do you think?
> So maybe that "BUG: unable to handle kernel paging request" message
> should be something like
>
> "BUG: unable to handle page fault for address ffffbeef00000000"
>
> instead? Does that make sense to people?
Yes please.
On Fri, Dec 07, 2018 at 03:57:10PM -0800, Andy Lutomirski wrote:
> On Fri, Dec 7, 2018 at 2:14 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Fri, Dec 7, 2018 at 2:06 PM Sean Christopherson
> > <[email protected]> wrote:
> > >
> > > Looking at it again, my own personal preference would be to swap the order
> > > of the #PF lines.
> >
> > Yeah, probably.
> >
> > Also:
> >
> > > [ 160.246820] BUG: unable to handle kernel paging request at ffffbeef00000000
> > > [ 160.247517] #PF: supervisor-privileged instruction fetch from kernel code
> > > [ 160.248085] #PF: error_code(0x0010) - not-present page
> >
> > With this form, I think the "kernel" in the first line is actually
> > misleading. Yes, it's a #PF for the kernel, but then the "kernel" on
> > the second line talks about what mode we were in when it happened, so
> > we have two different meanings of "kernel" on two adjacent lines.
>
> I'm okay with this variant. I have a slight preference for:
>
> #PF: supervisor-privileged instruction fetch from kernel code
> #PF error_code: 0x0010 [READ]
[INSTR], but I get the gist :)
> Which is what we'd get from Sean's patch plus my patch here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm&id=ccfb1941f90153818c07fb1a7dc22121a970d252
>
> Sean, what do you think?
Munging the two concepts is my least favorite approach. Printing the
individual bits becomes redundant (with the first line) in many cases,
and superfluous in other cases, e.g. [PROT] is effectively implied by
[RSVD], [PK] and [SGX].
In the example above, printing "[INSTR]" doesn't provide any new info
since the line above already states it was an instruction fetch, and
it never provides a human-readable message describing *why* the fault
occurred.
It'd be more palatable if we printed the negative case for PROT, e.g.
"[!PROT]", but that re-opens the discussion on which bits should be
printed in the negative case. Like Ingo said, it's rather arbitrary
that USER=1 instead of SUPERVISOR=1.
> > So maybe that "BUG: unable to handle kernel paging request" message
> > should be something like
> >
> > "BUG: unable to handle page fault for address ffffbeef00000000"
> >
> > instead? Does that make sense to people?
>
> Yes please.