by Andy Lutomirski

Subject: Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

On Apr 29, 2016 3:11 PM, "Jiri Kosina" <[email protected]> wrote:
>
> On Fri, 29 Apr 2016, Andy Lutomirski wrote:
>
> > > NMI, MCE and interrupts aren't a problem because they have dedicated
> > > stacks, which are easy to detect. If the tasks' stack is on an
> > > exception stack or an irq stack, we consider it unreliable.
> >
> > Only on x86_64.
>
> Well, MCEs are more or less x86-specific as well. But otherwise good
> point, thanks Andy.
>
> So, how does stack layout generally look like in case when NMI is actually
> running on proper kernel stack? I thought it's guaranteed to contain
> pt_regs anyway in all cases. Is that not guaranteed to be the case?
>

On x86, at least, there will still be pt_regs for the NMI. For the
interrupted state, though, there might not be pt_regs, as the NMI
might have happened while still populating pt_regs. In fact, the NMI
stack could overlap task_pt_regs.

For x86_32, there's no guarantee that pt_regs contains sp due to
hardware silliness. You need to parse it more carefully: if
!user_mode(regs), then the old sp is just above pt_regs.
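
To make the x86_32 case concrete, here is a rough user-space sketch, not
kernel code. The struct fake_pt_regs32 layout, the came_from_user() check
and interrupted_sp() are all made-up names for illustration; the real
pt_regs has more fields and the real user_mode() also has to consider
vm86. The only point is that for a kernel-mode frame the sp/ss slots were
never written by hardware, so the old stack pointer is the address where
the sp slot would be, not the value stored in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the tail of x86_32 pt_regs. */
struct fake_pt_regs32 {
	uint32_t ip;
	uint32_t cs;
	uint32_t flags;
	uint32_t sp;	/* only pushed by hardware on a privilege change */
	uint32_t ss;	/* likewise */
};

/* Simplified stand-in for user_mode(regs): CPL 3 in the saved cs. */
static bool came_from_user(const struct fake_pt_regs32 *regs)
{
	return (regs->cs & 3) == 3;
}

/* Return the stack pointer of the interrupted context. */
static uintptr_t interrupted_sp(const struct fake_pt_regs32 *regs)
{
	assert(regs != NULL);

	if (came_from_user(regs))
		return regs->sp;	/* hardware saved sp and ss */

	/*
	 * Kernel mode: nothing was pushed for sp/ss, so the interrupted
	 * stack starts right where the sp slot sits.
	 */
	return (uintptr_t)&regs->sp;
}
```

An unwinder that blindly read regs->sp for a kernel-mode frame would pick
up whatever the interrupted code last had on its stack.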

--Andy

2016-04-30 00:10:28

by Andy Lutomirski

Subject: Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks

On Apr 29, 2016 4:27 PM, "Josh Poimboeuf" <[email protected]> wrote:
>
> On Fri, Apr 29, 2016 at 02:38:02PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 29, 2016 at 1:50 PM, Josh Poimboeuf <[email protected]> wrote:
> > > On Fri, Apr 29, 2016 at 12:39:16PM -0700, Andy Lutomirski wrote:
> > >> On Thu, Apr 28, 2016 at 1:44 PM, Josh Poimboeuf <[email protected]> wrote:
> > >> > Thanks to all the recent x86 entry code refactoring, most tasks' kernel
> > >> > stacks start at the same offset right above their saved pt_regs,
> > >> > regardless of which syscall was used to enter the kernel. That creates
> > >> > a nice convention which makes it straightforward to identify the
> > >> > "bottom" of the stack, which can be useful for stack walking code which
> > >> > needs to verify the stack is sane.
> > >> >
> > >> > However there are still a few types of tasks which don't yet follow that
> > >> > convention:
> > >> >
> > >> > 1) CPU idle tasks, aka the "swapper" tasks
> > >> >
> > >> > 2) freshly forked TIF_FORK tasks which don't have a stack at all
> > >> >
> > >> > Make the idle tasks conform to the new stack bottom convention by
> > >> > starting their stack at a sizeof(pt_regs) offset from the end of the
> > >> > stack page.
> > >> >
> > >> > Signed-off-by: Josh Poimboeuf <[email protected]>
> > >> > ---
> > >> > arch/x86/kernel/head_64.S | 7 ++++---
> > >> > 1 file changed, 4 insertions(+), 3 deletions(-)
> > >> >
> > >> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > >> > index 6dbd2c0..0b12311 100644
> > >> > --- a/arch/x86/kernel/head_64.S
> > >> > +++ b/arch/x86/kernel/head_64.S
> > >> > @@ -296,8 +296,9 @@ ENTRY(start_cpu)
> > >> > * REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> > >> > * address given in m16:64.
> > >> > */
> > >> > - movq initial_code(%rip),%rax
> > >> > - pushq $0 # fake return address to stop unwinder
> > >> > + call 1f # put return address on stack for unwinder
> > >> > +1: xorq %rbp, %rbp # clear frame pointer
> > >> > + movq initial_code(%rip), %rax
> > >> > pushq $__KERNEL_CS # set correct cs
> > >> > pushq %rax # target address in negative space
> > >> > lretq
> > >> > @@ -325,7 +326,7 @@ ENDPROC(start_cpu0)
> > >> > GLOBAL(initial_gs)
> > >> > .quad INIT_PER_CPU_VAR(irq_stack_union)
> > >> > GLOBAL(initial_stack)
> > >> > - .quad init_thread_union+THREAD_SIZE-8
> > >> > + .quad init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
> > >>
> > >> As long as you're doing this, could you also set orig_ax to -1? I
> > >> remember running into some oddities resulting from orig_ax containing
> > >> garbage at some point.
> > >
> > > I assume you mean to initialize the orig_rax value in the pt_regs at the
> > > bottom of the stack of the idle task?
> > >
> > > How could that cause a problem? Since the idle task never returns from
> > > a system call, I'd assume that memory never gets accessed?
> > >
> >
> > Look at collect_syscall in lib/syscall.c
>
> I don't see how collect_syscall() can be called for the per-cpu idle
> "swapper" tasks (which is what the above code affects). They don't have
> pids or /proc entries so you can't do /proc/<pid>/syscall on them.

If so, then never mind.

--Andy

>
> --
> Josh
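
For reference, the stack-bottom convention from the quoted patch
(initial_stack = init_thread_union + THREAD_SIZE - SIZEOF_PTREGS) can be
sketched in user space roughly as below. This is only an illustration:
FAKE_THREAD_SIZE, struct fake_pt_regs64, stack_bottom() and
walk_ended_at_bottom() are assumed names and values, not the kernel's
definitions, and a real unwinder would check more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stack size for the sketch; the kernel's THREAD_SIZE is configuration dependent. */
#define FAKE_THREAD_SIZE 16384u

/* Opaque stand-in for x86_64 pt_regs: 21 eight-byte slots. */
struct fake_pt_regs64 {
	uint64_t slot[21];
};

/* Address where the saved pt_regs begin, i.e. the conventional stack "bottom". */
static uintptr_t stack_bottom(uintptr_t stack_base)
{
	return stack_base + FAKE_THREAD_SIZE - sizeof(struct fake_pt_regs64);
}

/* A stack walk is considered sane only if it stops exactly at that address. */
static bool walk_ended_at_bottom(uintptr_t final_sp, uintptr_t stack_base)
{
	return final_sp == stack_bottom(stack_base);
}
```

With the old "- 8" offset the idle task's walk would end somewhere other
than this address, which is why the patch moves initial_stack.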