Date: Thu, 29 Jun 2017 21:12:49 -0500
From: Josh Poimboeuf <jpoimboe@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        live-patching@vger.kernel.org,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Jiri Slaby <jslaby@suse.cz>, Ingo Molnar <mingo@kernel.org>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Peter Zijlstra <peterz@infradead.org>, Mike Galbraith <efault@gmx.de>
Subject: Re: [PATCH v2 6/8] x86/entry: add unwind hint annotations
Message-ID: <20170630021249.cqkszxaqtwakmzpg@treble>
References: <cover.1498659915.git.jpoimboe@redhat.com>
 <f068d37f70b7f5d7cc2e3b2bdccec2a1932802ad.1498659915.git.jpoimboe@redhat.com>
 <20170629175333.bicpvbwo4d5pdbak@treble>
 <CALCETrWJeNXiEPPVecc2OGRENj220zNeFCYgm+Q2P7ZiMvoVFg@mail.gmail.com>
 <20170629190559.ttw52ahwtsjynayx@treble>
 <CALCETrUNhPgrQV9Sj+ZP3A_jSHbww4XPpBLJQ15OmH3nsHBsdg@mail.gmail.com>
 <20170629214134.c36krjhvzegwkfjk@treble>
 <F1E11F3A-039D-48B2-A57D-7881E93028DD@amacapital.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <F1E11F3A-039D-48B2-A57D-7881E93028DD@amacapital.net>
User-Agent: Mutt/1.6.0.1 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org

On Thu, Jun 29, 2017 at 03:59:04PM -0700, Andy Lutomirski wrote:
> > On Jun 29, 2017, at 2:41 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> On Thu, Jun 29, 2017 at 02:09:54PM -0700, Andy Lutomirski wrote:
> >>> On Thu, Jun 29, 2017 at 12:05 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>>> On Thu, Jun 29, 2017 at 11:50:18AM -0700, Andy Lutomirski wrote:
> >>>>> On Thu, Jun 29, 2017 at 10:53 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>>>> There's a bug here that will need a small change to the entry code.
> >>>>> 
> >>>>> Mike Galbraith reported:
> >>>>> 
> >>>>>  WARNING: can't dereference registers at ffffc900089d7e08 for ip ffffffff81740bbb
> >>>>> 
> >>>>> After some looking I found that it's caused by the following code
> >>>>> snippet in the 'interrupt' macro in entry_64.S:
> >>>>> 
> >>>>>        /*
> >>>>>         * Save previous stack pointer, optionally switch to interrupt stack.
> >>>>>         * irq_count is used to check if a CPU is already on an interrupt stack
> >>>>>         * or not. While this is essentially redundant with preempt_count it is
> >>>>>         * a little cheaper to use a separate counter in the PDA (short of
> >>>>>         * moving irq_enter into assembly, which would be too much work)
> >>>>>         */
> >>>>>        movq    %rsp, %rdi
> >>>>>        incl    PER_CPU_VAR(irq_count)
> >>>>>        cmovzq  PER_CPU_VAR(irq_stack_ptr), %rsp
> >>>>>        UNWIND_HINT_REGS base=rdi
> >>>>>        pushq   %rdi
> >>>>>        UNWIND_HINT_REGS indirect=1
> >>>>> 
> >>>>> The problem is that it's changing the stack pointer *before* writing the
> >>>>> previous stack pointer (push %rdi).  So when unwinding from an NMI which
> >>>>> hit between the rsp write and the rdi push, the unwinder tries to access
> >>>>> the regs on the previous stack (by reading rdi), but the previous stack
> >>>>> pointer isn't there yet, so the access is considered out of bounds.
> >>>> 
> >>>> Ugh, that code.  Does this problem go away with this patch applied:
> >>>> 
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=2231ec7e0bcc1a2bc94a17081511ab54cc6badd1
> >>>> 
> >>>> If so, want to update the patch for new kernels (shouldn't conflict
> >>>> with anything except your unwind hints)?
> >>> 
> >>> I don't think that patch will fix it, because it still updates rsp
> >>> *before* writing the old rsp on the new stack.  So there's still a
> >>> window where the "previous stack" pointer is missing.
> >> 
> >> But it's in a register.  Is undwarf not able to grok that?
> > 
> > Sorry, I didn't explain it very well.  Undwarf can find the regs pointer
> > in rdi, it just doesn't trust its value.
> > 
> > See the stack_info.next_sp field, which is set in in_irq_stack():
> > 
> >    /*
> >     * The next stack pointer is the first thing pushed by the entry code
> >     * after switching to the irq stack.
> >     */
> >    info->next_sp = (unsigned long *)*(end - 1);
> > 
> > It's a safety mechanism.  The unwinder needs the last word of the irq
> > stack page to point to the previous stack.  That way it can double check
> > that the stack pointer it calculates is within the bounds of either the
> > current stack or the previous stack.
> > 
> > In the above code, the previous stack pointer (or next stack pointer,
> > depending on your perspective) hasn't been set up before it switches
> > stacks.  So the unwinder reads an uninitialized value into
> > info->next_sp, and compares that with the regs pointer, and then stops
> > the unwind because it thinks it went off into the weeds.
> > 
> 
> That should be manageable, though, I think.  With my patch applied
> (and maybe even without it), the only exception to that rule is if
> regs->sp points just above the top of the IRQ stack and the next
> instruction is push reg.  In that case, the reg is exactly as
> trustworthy as the normal rule.*  Can you teach the unwinding code
> that this is okay?
> 
> * If an NMI hits right there, then it relies on unwinding out of the
> NMI correctly.  But the usual checks that the target stack is a valid
> stack should prevent us from going off into the weeds regardless.

But that would remove a safeguard against the undwarf data being
corrupt.  Sure, it would only affect the rare case where the stack
pointer is at the top of the IRQ stack, but still...

Also, the frame pointer and guess unwinders have the same issue, and
this solution wouldn't work for them.

And, worst of all, the oops stack dumping code in show_trace_log_lvl()
also has this issue.  It relies on those previous stack pointers.  And
it's separated from the unwinder logic by design, so it can't ask the
unwinder where the next stack is.

-- 
Josh