Linus,
Please pull the safe-poison-pointers commit from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git safe-poison-pointers
this didnt fit into any of the other categories and affects all
architectures (via the opt-in CONFIG_ILLEGAL_POINTER_VALUE).
Thanks,
Ingo
------------------>
Avi Kivity (1):
core, x86: make LIST_POISON less deadly
arch/x86/Kconfig | 5 +++++
include/linux/poison.h | 10 ++++++++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e0edaaa..f09b3e5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1019,6 +1019,11 @@ config ARCH_MEMORY_PROBE
def_bool X86_64
depends on MEMORY_HOTPLUG
+config ILLEGAL_POINTER_VALUE
+ hex
+ default 0 if X86_32
+ default 0xffffc10000000000 if X86_64
+
source "mm/Kconfig"
config HIGHPTE
diff --git a/include/linux/poison.h b/include/linux/poison.h
index 9f31683..0d105a5 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -1,14 +1,20 @@
#ifndef _LINUX_POISON_H
#define _LINUX_POISON_H
+#ifdef CONFIG_ILLEGAL_POINTER_VALUE
+#define POISON_POINTER_DELTA CONFIG_ILLEGAL_POINTER_VALUE
+#else
+#define POISON_POINTER_DELTA 0L
+#endif
+
/********** include/linux/list.h **********/
/*
* These are non-NULL pointers that will result in page faults
* under normal circumstances, used to verify that nobody uses
* non-initialized list entries.
*/
-#define LIST_POISON1 ((void *) 0x00100100)
-#define LIST_POISON2 ((void *) 0x00200200)
+#define LIST_POISON1 ((void *) 0x00100100 + POISON_POINTER_DELTA)
+#define LIST_POISON2 ((void *) 0x00200200 + POISON_POINTER_DELTA)
/********** include/linux/timer.h **********/
/*
On Mon, 14 Jul 2008, Ingo Molnar wrote:
>
> +config ILLEGAL_POINTER_VALUE
> + hex
> + default 0 if X86_32
> + default 0xffffc10000000000 if X86_64
This looks like a singularly bad pointer value on x86-64.
Why not pick something that is *guaranteed* to fault? The above looks like
any future setup that supports 41 bits of addressing and has extended the
page tables (yes, it will happen eventually) will find that to be a
perfectly valid address?
It's also visually confusing, since it's visually very close to a real
kernel pointer too.
Grr.
Why not use something sane like 0xdead000000000000, which has the high bit
set but very fundamentally isn't a valid pointer, and never will be? And
which is a *lot* more visually obvious too!
Linus
* Linus Torvalds <[email protected]> wrote:
> On Mon, 14 Jul 2008, Ingo Molnar wrote:
> >
> > +config ILLEGAL_POINTER_VALUE
> > + hex
> > + default 0 if X86_32
> > + default 0xffffc10000000000 if X86_64
>
> This looks like a singularly bad pointer value on x86-64.
>
> Why not pick something that is *guaranteed* to fault? The above looks
> like any future setup that supports 41 bits of addressing and has
> extended the page tables (yes, it will happen eventually) will find
> that to be a perfectly valid address?
>
> It's also visually confusing, since it's visually very close to a real
> kernel pointer too.
>
> Grr.
>
> Why not use something sane like 0xdead000000000000, which has the high
> bit set but very fundamentally isn't a valid pointer, and never will
> be? And which is a *lot* more visually obvious too!
initially i suggested that too - but such addresses raise a #GP instead
of a page fault so their decoding is a bit harder.
We dont do any instruction decoding in #GP handlers to figure out what
happened, while in the pagefault case we know which address faulted,
etc.
Perhaps we could try to make #GP handlers a bit more informative -
although decoding instructions will make things a bit more fragile
inevitably.
Perhaps make it 0xffffcdead0000000 ?
in any case, please ignore this topic until it's all worked out - no
other topics depend on this one so it can be skipped safely.
Ingo
Ingo Molnar wrote:
> * Linus Torvalds <[email protected]> wrote:
>
>
>> On Mon, 14 Jul 2008, Ingo Molnar wrote:
>>
>>>
>>> +config ILLEGAL_POINTER_VALUE
>>> + hex
>>> + default 0 if X86_32
>>> + default 0xffffc10000000000 if X86_64
>>>
>> This looks like a singularly bad pointer value on x86-64.
>>
>> Why not pick something that is *guaranteed* to fault? The above looks
>> like any future setup that supports 41 bits of addressing and has
>> extended the page tables (yes, it will happen eventually) will find
>> that to be a perfectly valid address?
>>
>> It's also visually confusing, since it's visually very close to a real
>> kernel pointer too.
>>
>> Grr.
>>
>> Why not use something sane like 0xdead000000000000, which has the high
>> bit set but very fundamentally isn't a valid pointer, and never will
>> be? And which is a *lot* more visually obvious too!
>>
>
> initially i suggested that too - but such addresses raise a #GP instead
> of a page fault so their decoding is a bit harder.
>
> We dont do any instruction decoding in #GP handlers to figure out what
> happened, while in the pagefault case we know which address faulted,
> etc.
>
> Perhaps we could try to make #GP handlers a bit more informative -
> although decoding instructions will make things a bit more fragile
> inevitably.
>
> Perhaps make it 0xffffcdead0000000 ?
>
We could have the oops handler detect this address range, and point out
the problem in plain English.
--
error compiling committee.c: too many arguments to function
On Mon, 14 Jul 2008, Ingo Molnar wrote:
> >
> > Why not use something sane like 0xdead000000000000, which has the high
> > bit set but very fundamentally isn't a valid pointer, and never will
> > be? And which is a *lot* more visually obvious too!
>
> initially i suggested that too - but such addresses raise a #GP instead
> of a page fault so their decoding is a bit harder.
But raising a GP is exactly what you want: a PF is an indication that the
address was actually half-way valid, and will not fault at all on some
(possibly future) machine.
> We dont do any instruction decoding in #GP handlers to figure out what
> happened, while in the pagefault case we know which address faulted,
> etc.
Why would we care? It would be very obvious from the instruction
disassembly plus the register contents. No need to decode instructions.
> Perhaps we could try to make #GP handlers a bit more informative -
> although decoding instructions will make things a bit more fragile
> inevitably.
>
> Perhaps make it 0xffffcdead0000000 ?
I'm really not seeing the reason for not just doing it right.
Linus
* Linus Torvalds <[email protected]> wrote:
> On Mon, 14 Jul 2008, Ingo Molnar wrote:
> > >
> > > Why not use something sane like 0xdead000000000000, which has the high
> > > bit set but very fundamentally isn't a valid pointer, and never will
> > > be? And which is a *lot* more visually obvious too!
> >
> > initially i suggested that too - but such addresses raise a #GP instead
> > of a page fault so their decoding is a bit harder.
>
> But raising a GP is exactly what you want: a PF is an indication that the
> address was actually half-way valid, and will not fault at all on some
> (possibly future) machine.
>
> > We dont do any instruction decoding in #GP handlers to figure out what
> > happened, while in the pagefault case we know which address faulted,
> > etc.
>
> Why would we care? It would be very obvious from the instruction
> disassembly plus the register contents. No need to decode instructions.
>
> > Perhaps we could try to make #GP handlers a bit more informative -
> > although decoding instructions will make things a bit more fragile
> > inevitably.
> >
> > Perhaps make it 0xffffcdead0000000 ?
>
> I'm really not seeing the reason for not just doing it right.
ok. Find the updated pull request below. I've added your Acked-by to the
last commit.
Please pull the latest safe-poison-pointers git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git safe-poison-pointers
Thanks,
Ingo
------------------>
Avi Kivity (1):
core, x86: make LIST_POISON less deadly
Ingo Molnar (1):
x86: change LIST_POISON to 0xdead000000000000
arch/x86/Kconfig | 5 +++++
include/linux/poison.h | 10 ++++++++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e0edaaa..0dbd040 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1019,6 +1019,11 @@ config ARCH_MEMORY_PROBE
def_bool X86_64
depends on MEMORY_HOTPLUG
+config ILLEGAL_POINTER_VALUE
+ hex
+ default 0 if X86_32
+ default 0xdead000000000000 if X86_64
+
source "mm/Kconfig"
config HIGHPTE
diff --git a/include/linux/poison.h b/include/linux/poison.h
index 9f31683..0d105a5 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -1,14 +1,20 @@
#ifndef _LINUX_POISON_H
#define _LINUX_POISON_H
+#ifdef CONFIG_ILLEGAL_POINTER_VALUE
+#define POISON_POINTER_DELTA CONFIG_ILLEGAL_POINTER_VALUE
+#else
+#define POISON_POINTER_DELTA 0L
+#endif
+
/********** include/linux/list.h **********/
/*
* These are non-NULL pointers that will result in page faults
* under normal circumstances, used to verify that nobody uses
* non-initialized list entries.
*/
-#define LIST_POISON1 ((void *) 0x00100100)
-#define LIST_POISON2 ((void *) 0x00200200)
+#define LIST_POISON1 ((void *) 0x00100100 + POISON_POINTER_DELTA)
+#define LIST_POISON2 ((void *) 0x00200200 + POISON_POINTER_DELTA)
/********** include/linux/timer.h **********/
/*
Linus Torvalds wrote:
>> We dont do any instruction decoding in #GP handlers to figure out what
>> happened, while in the pagefault case we know which address faulted,
>> etc.
>>
>
> Why would we care? It would be very obvious from the instruction
> disassembly plus the register contents. No need to decode instructions.
>
We could add a printk to the #GP handler that alerts the reader that a
poisoned list is suspected, if we find the address pattern in one of the
registers.
--
error compiling committee.c: too many arguments to function
On Mon, 14 Jul 2008, Avi Kivity wrote:
>
> We could add a printk to the #GP handler that alerts the reader that a
> poisoned list is suspected, if we find the address pattern in one of the
> registers.
I wouldn't worry too much - it's going to be pretty visible anyway.
The only thing I _would_ worry about is the AMD prefetch bug - some AMD
cores raise a spurious page fault for prefetch instructions, and we ignore
it.
I _think_ that bug is a pure TLB issue and would never trigger for an
address that doesn't do page fault handling at all because it gets caught
in the "is the address valid" code, but it does make me go "Hmm".
See
http://lkml.org/lkml/2003/9/10/397
from Rich Brunner. The AMD errata listing does say just page fault, and
does talk about speculative TLB reloads, so I think we're all good.
Linus
* Linus Torvalds <[email protected]> wrote:
> On Mon, 14 Jul 2008, Avi Kivity wrote:
> >
> > We could add a printk to the #GP handler that alerts the reader that a
> > poisoned list is suspected, if we find the address pattern in one of the
> > registers.
>
> I wouldn't worry too much - it's going to be pretty visible anyway.
>
> The only thing I _would_ worry about is the AMD prefetch bug - some AMD
> cores raise a spurious page fault for prefetch instructions, and we ignore
> it.
>
> I _think_ that bug is a pure TLB issue and would never trigger for an
> address that doesn't do page fault handling at all because it gets caught
> in the "is the address valid" code, but it does make me go "Hmm".
>
> See
>
> http://lkml.org/lkml/2003/9/10/397
>
> from Rich Brunner. The AMD errata listing does say just page fault,
> and does talk about speculative TLB reloads, so I think we're all
> good.
one of my testboxes has this erratum and i saw related bugs in the past
during stress-tests so it will show up if the non-canonical address
causes any problems. So i think 0xdead000000000000 will be fine, and
even if it isnt, we'll catch any problems fast enough.
Ingo
Linus Torvalds <[email protected]> writes:
>
>> We dont do any instruction decoding in #GP handlers to figure out what
>> happened, while in the pagefault case we know which address faulted,
>> etc.
>
> Why would we care? It would be very obvious from the instruction
> disassembly plus the register contents. No need to decode instructions.
The issue is that the kernel cannot detect it (short of running the
KVM x86 emulator on #GP, but surely you're not suggesting that), so it
cannot print something out.
You would always need someone skilled in x86 assembler code
[a lot of kernel developers actually aren't]
to look at the oops and say: ok it was a bad list poison.
Currently it is very obvious because the fault address is printed
even if you don't know any x86 assembler.
I would consider that a regression.
That is why I suggested using a canonical address.
Also BTW if a CPU ever supports more than 40 bits virtual there
will be so many kernel changes needed (5 level page tables) that
changing the poison too is the smallest of your worries.
-Andi
Linus Torvalds <[email protected]> writes:
>
> Why not pick something that is *guaranteed* to fault? The above looks like
> any future setup that supports 41 bits of addressing and has extended the
> page tables (yes, it will happen eventually) will find that to be a
> perfectly valid address?
When that happens then it will be either unmapped (and fault anyways)
or the kernel has to be changed to 5 level page tables first and then
surely the poison value could change too.
-Andi
On Mon, 14 Jul 2008, Andi Kleen wrote:
>
> The issue is that the kernel cannot detect it (short of running the
> KVM x86 emulator on #GP, but surely you're not suggesting that), so it
> cannot print something out.
Don't be silly. It prints out the oops message.
People who cannot see where that oops is, and cannot be bothered to look
at the register state aren't going to help _anyway_.
In fact, with the 0xdead... sequence, it's going to be *more* obvious than
with some almost-kernel 0xffffc.. address, even if it's not showing up in
the first line.
In other words, your whole argument is pure and utter sh*t. The page fault
is _less_ readable than the GP fault.
> That is why I suggested using a canonical address.
And I disagree. Violently.
The whole and ONLY point of poisoning is to get the fault.
With the canonical address, you won't get it reliably, and when you do get
it, it's not obvious to decode.
End of discussion.
Linus
On Mon, 14 Jul 2008, Andi Kleen wrote:
>
> When that happens then it will be either unmapped (and fault anyways)
> or the kernel has to be changed to 5 level page tables first and then
> surely the poison value could change too.
You're still ignoring the other point: ffffc.. is much _less_ visually
obvious than dead0...
Linus
Linus Torvalds wrote:
>
> On Mon, 14 Jul 2008, Andi Kleen wrote:
>> The issue is that the kernel cannot detect it (short of running the
>> KVM x86 emulator on #GP, but surely you're not suggesting that), so it
>> cannot print something out.
>
> Don't be silly. It prints out the oops message.
>
> People who cannot see where that oops is, and cannot be bothered to look
> at the register state aren't going to help _anyway_.
I've seen a lot of people who don't know too much assembler just work based
on the RIP. As in look it up in the source using addr2line or gdb and then try
to figure it out based on the source. Actually looking at the register contents
and how the instruction uses it is pretty arcane knowledge and usually
not even needed. And with more and more kernel developers being
newbie friendly in this area is a good thing.
>
> In fact, with the 0xdead... sequence, it's going to be *more* obvious than
> with some almost-kernel 0xffffc.. address, even if it's not showing up in
> the first line.
> In other words, your whole argument is pure and utter sh*t. The page fault
> is _less_ readable than the GP fault.
How about if the page fault handler checks for the value and prints
a obvious string? It could do this reliably, unlike the "grep
all registers for poison on #GP" method that was earlier proposed.
>> That is why I suggested using a canonical address.
>
> And I disagree. Violently.
>
> The whole and ONLY point of poisoning is to get the fault.
>
> With the canonical address, you won't get it reliably, and when you do get
> it, it's not obvious to decode.
I discussed this with Avi earlier and we were careful to put it into
one of the guaranteed to be unmapped holes. This should actually not
change if the CPU changes because it's defined by the kernel mapping.
If these holes ever change the poison would need to change too, but
otherwise I don't really see how it should be particularly unreliable.
Ok it might break if someone messes up the direct mapping, but if that
happens you typically don't even go through the oops handler completely
and won't see the message.
-Andi
On Mon, 14 Jul 2008, Andi Kleen wrote:
>
> How about if the page fault handler checks for the value and prints
> a obvious string? It could do this reliably, unlike the "grep
> all registers for poison on #GP" method that was earlier proposed.
The GP handler, you mean.
Sure, that might work. I'm not convinced it's worth it, but it's certainly
not difficult to do either. Something like this (TOTALLY UNTESTED!) might
work.
It puts it in the general "__die()" routine because it was easier this way
(we do want it below the "oops" thing, so that it gets captured by Arjan's
scripts!), in case the deathnotes are useful for other cases too.
It could perhaps be extended to allow negative offsets off the 0xdead000..
pointer (right now those would be ignored because the '0xdead00..' part
would turns into '0xdeacff..') but that would not matter for the
particular case of the list poisoning (since that one adds the 0x0100100
thing to it anyway).
Again: totally untested.
Linus
---
arch/x86/kernel/traps_64.c | 24 ++++++++++++++++++++++++
1 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/traps_64.c b/arch/x86/kernel/traps_64.c
index f1a95d1..7487ce2 100644
--- a/arch/x86/kernel/traps_64.c
+++ b/arch/x86/kernel/traps_64.c
@@ -434,6 +434,29 @@ void dump_stack(void)
EXPORT_SYMBOL(dump_stack);
+/*
+ * This depends intimately on the layout of pt_regs to
+ * show deathnotes in the GP registers.
+ */
+static void show_deathnotes(struct pt_regs *regs)
+{
+ static const char *names[16] = {
+ "r15", "r14", "r13", "r12",
+ "rbp", "rbx", "r11", "r10",
+ "r9", "r8", "rax", "rcx",
+ "rdx", "rsi", "rdi",
+ };
+ const unsigned long *r = ®s->r15;
+ int i;
+
+ for (i = 0; i < 15; i++) {
+ unsigned long value = r[i];
+ if ((value >> 48) != 0xdead)
+ continue;
+ printk("Deathnote %lx in register %s\n", value, names[i]);
+ }
+}
+
void show_registers(struct pt_regs *regs)
{
int i;
@@ -554,6 +577,7 @@ int __kprobes __die(const char * str, struct pt_regs * regs, long err)
printk("\n");
if (notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV) == NOTIFY_STOP)
return 1;
+ show_deathnotes(regs);
show_registers(regs);
add_taint(TAINT_DIE);
/* Executive summary in case the oops scrolled away */
Linus Torvalds wrote:
>
> On Mon, 14 Jul 2008, Andi Kleen wrote:
>> How about if the page fault handler checks for the value and prints
>> a obvious string? It could do this reliably, unlike the "grep
>> all registers for poison on #GP" method that was earlier proposed.
>
> The GP handler, you mean.
No i meant #PF because #PF can do it reliably (sorry wasn't fully
convinced by you earlier)
But ok doing it a little unreliably in #GP is better than nothing.
It would be cool if that function knew about the various poisons
in poison.h and printed them out by name.
That would indeed improve newbie friendliness of oopses significantly
I believe. Kind of like a mini AI in the oops printer.
I can look at that later if nobody beats me to it.
-Andi