From: Andy Lutomirski <[email protected]>
The IRQ stack lives in percpu space, so an IRQ handler that overflows it
will overwrite other data structures.
Use vmap() to remap the IRQ stack so that it will have the usual guard
pages that vmap/vmalloc allocations have. With this the kernel will panic
immediately on an IRQ stack overflow.
[ tglx: Move the map code to a proper place and invoke it only when a CPU
is about to be brought online. No point in installing the map at
early boot for all possible CPUs. Fail the CPU bringup if the vmap
fails as done for all other preparatory stages in cpu hotplug. ]
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/include/asm/irq.h | 4 ---
arch/x86/kernel/irq_64.c | 45 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/setup_percpu.c | 4 ---
3 files changed, 45 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -16,11 +16,7 @@ static inline int irq_canonicalize(int i
return ((irq == 2) ? 9 : irq);
}
-#ifdef CONFIG_X86_32
extern int irq_init_percpu_irqstack(unsigned int cpu);
-#else
-static inline int irq_init_percpu_irqstack(unsigned int cpu) { return 0; }
-#endif
#define __ARCH_HAS_DO_SOFTIRQ
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -89,3 +89,48 @@ bool handle_irq(struct irq_desc *desc, s
generic_handle_irq_desc(desc);
return true;
}
+
+#ifdef CONFIG_VMAP_STACK
+/*
+ * VMAP the backing store with guard pages
+ */
+static int map_irq_stack(unsigned int cpu)
+{
+ char *stack = (char *)per_cpu_ptr(&irq_stack_backing_store, cpu);
+ struct page *pages[IRQ_STACK_SIZE / PAGE_SIZE];
+ void *va;
+ int i;
+
+ for (i = 0; i < IRQ_STACK_SIZE / PAGE_SIZE; i++) {
+ phys_addr_t pa = per_cpu_ptr_to_phys(stack + (i << PAGE_SHIFT));
+
+ pages[i] = pfn_to_page(pa >> PAGE_SHIFT);
+ }
+
+	va = vmap(pages, IRQ_STACK_SIZE / PAGE_SIZE, VM_MAP, PAGE_KERNEL);
+ if (!va)
+ return -ENOMEM;
+
+ per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
+ return 0;
+}
+#else
+/*
+ * If VMAP stacks are disabled due to KASAN, just use the per cpu
+ * backing store without guard pages.
+ */
+static int map_irq_stack(unsigned int cpu)
+{
+ void *va = per_cpu_ptr(&irq_stack_backing_store, cpu);
+
+ per_cpu(hardirq_stack_ptr, cpu) = va + IRQ_STACK_SIZE;
+ return 0;
+}
+#endif
+
+int irq_init_percpu_irqstack(unsigned int cpu)
+{
+ if (per_cpu(hardirq_stack_ptr, cpu))
+ return 0;
+ return map_irq_stack(cpu);
+}
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -244,10 +244,6 @@ void __init setup_per_cpu_areas(void)
per_cpu(x86_cpu_to_logical_apicid, cpu) =
early_per_cpu_map(x86_cpu_to_logical_apicid, cpu);
#endif
-#ifdef CONFIG_X86_64
- per_cpu(hardirq_stack_ptr, cpu) = (struct irq_stack *)
- per_cpu_ptr(&irq_stack_backing_store, cpu) + 1;
-#endif
#ifdef CONFIG_NUMA
per_cpu(x86_cpu_to_node_map, cpu) =
early_per_cpu_map(x86_cpu_to_node_map, cpu);
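(A quick aside on why the vmap()ed stack traps overflows -- toy code, not
part of the patch: vmap() inserts the mapping into the vmalloc area, whose
allocator leaves an unmapped guard page behind each area unless VM_NO_GUARD
is passed.)

#include <linux/mm.h>
#include <linux/vmalloc.h>

/*
 * Toy illustration: map caller-supplied pages virtually contiguously.
 * The trailing guard page left by the vmalloc area allocator means that
 * running off the end of the mapping hits a non-present page and faults
 * immediately, instead of silently scribbling over whatever sits next
 * to the backing store in percpu space.
 */
static void *toy_map_with_guard(struct page **pages, unsigned int nr)
{
	return vmap(pages, nr, VM_MAP, PAGE_KERNEL);
}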
On Fri, Apr 5, 2019 at 8:11 AM Thomas Gleixner <[email protected]> wrote:
>
> From: Andy Lutomirski <[email protected]>
>
> The IRQ stack lives in percpu space, so an IRQ handler that overflows it
> will overwrite other data structures.
>
> Use vmap() to remap the IRQ stack so that it will have the usual guard
> pages that vmap/vmalloc allocations have. With this the kernel will panic
> immediately on an IRQ stack overflow.
The 0day bot noticed that this dies with DEBUG_PAGEALLOC on. This is
because the store_stackinfo() function is utter garbage and this patch
correctly detects just how broken it is. The attached patch "fixes"
it. (It also contains a reliability improvement that should probably
get folded in, but is otherwise unrelated.)

A real fix would remove the generic kstack_end() function entirely
along with __HAVE_ARCH_KSTACK_END and would optionally replace
store_stackinfo() with something useful. Josh, do we have a generic
API to do a little stack walk like this? Otherwise, I don't think it
would be the end of the world to just remove the offending code.
--Andy
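(For reference, the generic kstack_end() in question is roughly the
following, from include/linux/sched/task_stack.h. It bakes in the assumption
that the stack is a THREAD_SIZE sized, THREAD_SIZE aligned block, so a stack
walker that loops until kstack_end() marches straight off a page aligned,
guard-paged IRQ stack and faults under DEBUG_PAGEALLOC.)

#ifndef __HAVE_ARCH_KSTACK_END
static inline int kstack_end(void *addr)
{
	/*
	 * "End of stack" is pure address arithmetic: mask the low
	 * THREAD_SIZE bits (minus one pointer) and test for zero.
	 * Only meaningful if the stack really is THREAD_SIZE aligned.
	 */
	return !(((unsigned long)addr) & (THREAD_SIZE - sizeof(void *)));
}
#endif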
On Sat, 6 Apr 2019, Andy Lutomirski wrote:
> The 0day bot noticed that this dies with DEBUG_PAGEALLOC on. This is
> because the store_stackinfo() function is utter garbage and this patch
> correctly detects just how broken it is. The attached patch "fixes"
> it. (It also contains a reliability improvement that should probably
> get folded in, but is otherwise unrelated.)
>
> A real fix would remove the generic kstack_end() function entirely
> along with __HAVE_ARCH_KSTACK_END and would optionally replace
> store_stackinfo() with something useful. Josh, do we have a generic
> API to do a little stack walk like this? Otherwise, I don't think it
> would be the end of the world to just remove the offending code.
Yes, I found the same yesterday before heading out. It's already broken
with the percpu stack because there is no guarantee that the per cpu stack
is THREAD_SIZE aligned. It's guaranteed to be page aligned, not more.

I'm all for removing that nonsense, but the real question is whether there
is more code which assumes THREAD_SIZE aligned stacks aside from the thread
stack itself.
Thanks,
tglx
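(Concretely, the backing store added earlier in this series is only asked to
be page aligned -- a sketch of its declaration, which may differ in detail:)

/* Page aligned, nothing more -- THREAD_SIZE alignment is not guaranteed. */
DECLARE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store);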
> On Apr 6, 2019, at 11:08 PM, Thomas Gleixner <[email protected]> wrote:
> Yes, I found the same yesterday before heading out. It's already broken
> with the percpu stack because there is no guarantee that the per cpu stack
> is THREAD_SIZE aligned. It's guaranteed to be page aligned, not more.
>
> I'm all for removing that nonsense, but the real question is whether there
> is more code which assumes THREAD_SIZE aligned stacks aside from the thread
> stack itself.
>
Well, any code like this is already busted, since the stacks' alignment doesn’t really change with these patches applied.
On Sun, 7 Apr 2019, Andy Lutomirski wrote:
> Well, any code like this is already busted, since the stacks' alignment
> doesn’t really change with these patches applied.
It does. The existing code has it aligned by chance because the irq stack
is the first entry in the per cpu area. But yes, there is no reason to require
that alignment for irqstacks.
Thanks,
tglx
> On Apr 7, 2019, at 2:34 AM, Thomas Gleixner <[email protected]> wrote:
>
> It does. The existing code has it aligned by chance because the irq stack
> is the first entry in the per cpu area. But yes, there is no reason to require
> that alignment for irqstacks.
>
Isn’t it the first entry in the percpu area (the normal one, not cea)? Is that aligned beyond PAGE_SIZE?
On Sat, 6 Apr 2019, Andy Lutomirski wrote:
> A real fix would remove the generic kstack_end() function entirely
> along with __HAVE_ARCH_KSTACK_END and would optionally replace
> store_stackinfo() with something useful. Josh, do we have a generic
> API to do a little stack walk like this? Otherwise, I don't think it
> would be the end of the world to just remove the offending code.
Actually we have: save_stack_trace()
On Sun, Apr 7, 2019 at 3:44 PM Thomas Gleixner <[email protected]> wrote:
>
> Actually we have: save_stack_trace()
>
Like I did here:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=WIP.x86/stackguards

(Link is bad right now but will hopefully be okay when you read it.
I'm still fiddling with the other patches in there -- I'd like to kill
kstack_end() entirely.)
On Sun, 7 Apr 2019, Andy Lutomirski wrote:
> On Sun, Apr 7, 2019 at 3:44 PM Thomas Gleixner <[email protected]> wrote:
> > Actually we have: save_stack_trace()
> >
>
> Like I did here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=WIP.x86/stackguards
Kinda, but what that code wants is to skip any entry before 'caller'. So we
either add something like save_stack_trace_from(), which is trivial on x86
because unwind_start() already has an argument to hand in the start of the
stack, or we filter out the entries up to 'caller' in that code.

Btw, your patch will explode badly because stack_trace::entries is just a
pointer. It does not provide a storage array :)
Thanks,
tglx
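(To make the storage point concrete, a minimal sketch against the
then-current stacktrace API -- the caller has to supply the array that
stack_trace::entries points at:)

#include <linux/kernel.h>
#include <linux/stacktrace.h>

static void sketch_record_stack(void)
{
	unsigned long entries[32];			/* caller-provided storage */
	struct stack_trace trace = {
		.entries	= entries,		/* just a pointer ... */
		.max_entries	= ARRAY_SIZE(entries),	/* ... bounded by this */
		.skip		= 0,
	};

	save_stack_trace(&trace);
	/* trace.nr_entries entries are now valid in entries[] */
}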
On Mon, Apr 08, 2019 at 09:18:00AM -0700, Andy Lutomirski wrote:
> Whoops!
>
> I could add a save_stack_trace_from() or I could add a "caller"
> argument to struct stack_trace. Any preference as to which looks
> better? The latter seems a little nicer to me.
The current official way to do that is to use the stack_trace "skip"
field. That's a hack though because it relies on inlining decisions.
It would be nicer if the skip interface were pointer-based like your
suggestion.
--
Josh
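(Sketched, the 'skip' hack just described -- the magic count is only correct
for one particular set of inlining decisions:)

	unsigned long entries[32];
	struct stack_trace trace = {
		.entries	= entries,
		.max_entries	= ARRAY_SIZE(entries),
		/*
		 * Drop our own frame from the trace. The right number
		 * depends on whether the surrounding helper is inlined --
		 * exactly the fragility described above.
		 */
		.skip		= 1,
	};

	save_stack_trace(&trace);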
On Mon, 8 Apr 2019, Andy Lutomirski wrote:
> Whoops!
>
> I could add a save_stack_trace_from() or I could add a "caller"
> argument to struct stack_trace. Any preference as to which looks
> better? The latter seems a little nicer to me.
The whole interface with struct stack_trace sucks. Why are 'skip' and
'max_entries' in that struct and not arguments? I went through all the call
sites and it just makes me shudder. That terminate-the-trace-with-ULONG_MAX
thing is another horrible hack which is then undone at several call sites
again. Before we add more hacky stuff to it, let's clean up that whole mess
first.
Thanks,
tglx
On Mon, 8 Apr 2019, Thomas Gleixner wrote:
> On Mon, 8 Apr 2019, Andy Lutomirski wrote:
> > Whoops!
> >
> > I could add a save_stack_trace_from() or I could add a "caller"
> > argument to struct stack_trace. Any preference as to which looks
> > better? The latter seems a little nicer to me.
Bah, all that sucks, because 'caller' comes from _RET_IP_ and is not a
pointer to the stack. So this really needs to be a filter which prevents
storing any entry _before_ 'caller' is seen on the stack.

ftrace and kasan do some post-stacktrace filtering as well, just differently.
That's all bonkers.
tglx
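(A hypothetical sketch of such a filter -- the names here are invented, this
is not an existing kernel interface. The idea: store nothing until the
unwinder produces an address equal to 'caller', then record from that point
on.)

#include <linux/types.h>

struct trace_filter_ctx {
	unsigned long	caller;		/* a _RET_IP_ style text address */
	bool		seen_caller;
	unsigned long	*entries;
	unsigned int	nr, max;
};

/* Called for each unwound address; returns false to stop the walk. */
static bool store_after_caller(struct trace_filter_ctx *ctx, unsigned long addr)
{
	if (!ctx->seen_caller) {
		if (addr != ctx->caller)
			return true;	/* keep unwinding, store nothing yet */
		ctx->seen_caller = true;
	}
	if (ctx->nr >= ctx->max)
		return false;		/* out of storage */
	ctx->entries[ctx->nr++] = addr;
	return true;
}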