Date: Thu, 29 Jun 2017 21:12:49 -0500
From: Josh Poimboeuf
To: Andy Lutomirski
Cc: Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org, live-patching@vger.kernel.org, Linus Torvalds, Jiri Slaby, Ingo Molnar, "H. Peter Anvin", Peter Zijlstra, Mike Galbraith
Subject: Re: [PATCH v2 6/8] x86/entry: add unwind hint annotations
Message-ID: <20170630021249.cqkszxaqtwakmzpg@treble>

On Thu, Jun 29, 2017 at 03:59:04PM -0700, Andy Lutomirski wrote:
> > On Jun 29, 2017, at 2:41 PM, Josh Poimboeuf wrote:
> >> On Thu, Jun 29, 2017 at 02:09:54PM -0700, Andy Lutomirski wrote:
> >>> On Thu, Jun 29, 2017 at 12:05 PM, Josh Poimboeuf wrote:
> >>>> On Thu, Jun 29, 2017 at 11:50:18AM -0700, Andy Lutomirski wrote:
> >>>>> On Thu, Jun 29, 2017 at 10:53 AM, Josh Poimboeuf wrote:
> >>>>> There's a bug here that will need a small change to the entry code.
> >>>>>
> >>>>> Mike Galbraith reported:
> >>>>>
> >>>>>   WARNING: can't dereference registers at ffffc900089d7e08 for ip ffffffff81740bbb
> >>>>>
> >>>>> After some looking I found that it's caused by the following code
> >>>>> snippet in the 'interrupt' macro in entry_64.S:
> >>>>>
> >>>>> 	/*
> >>>>> 	 * Save previous stack pointer, optionally switch to interrupt stack.
> >>>>> 	 * irq_count is used to check if a CPU is already on an interrupt stack
> >>>>> 	 * or not. While this is essentially redundant with preempt_count it is
> >>>>> 	 * a little cheaper to use a separate counter in the PDA (short of
> >>>>> 	 * moving irq_enter into assembly, which would be too much work)
> >>>>> 	 */
> >>>>> 	movq	%rsp, %rdi
> >>>>> 	incl	PER_CPU_VAR(irq_count)
> >>>>> 	cmovzq	PER_CPU_VAR(irq_stack_ptr), %rsp
> >>>>> 	UNWIND_HINT_REGS base=rdi
> >>>>> 	pushq	%rdi
> >>>>> 	UNWIND_HINT_REGS indirect=1
> >>>>>
> >>>>> The problem is that it's changing the stack pointer *before* writing the
> >>>>> previous stack pointer (push %rdi).  So when unwinding from an NMI which
> >>>>> hit between the rsp write and the rdi push, the unwinder tries to access
> >>>>> the regs on the previous stack (by reading rdi), but the previous stack
> >>>>> pointer isn't there yet, so the access is considered out of bounds.
> >>>>
> >>>> Ugh, that code.  Does this problem go away with this patch applied:
> >>>>
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=2231ec7e0bcc1a2bc94a17081511ab54cc6badd1
> >>>>
> >>>> If so, want to update the patch for new kernels (shouldn't conflict
> >>>> with anything except your unwind hints)?
> >>>
> >>> I don't think that patch will fix it, because it still updates rsp
> >>> *before* writing the old rsp on the new stack.  So there's still a
> >>> window where the "previous stack" pointer is missing.
> >>
> >> But it's in a register.
> >> Is undwarf not able to grok that?
> >
> > Sorry, I didn't explain it very well.  Undwarf can find the regs pointer
> > in rdi, it just doesn't trust its value.
> >
> > See the stack_info.next_sp field, which is set in in_irq_stack():
> >
> > 	/*
> > 	 * The next stack pointer is the first thing pushed by the entry code
> > 	 * after switching to the irq stack.
> > 	 */
> > 	info->next_sp = (unsigned long *)*(end - 1);
> >
> > It's a safety mechanism.  The unwinder needs the last word of the irq
> > stack page to point to the previous stack.  That way it can double-check
> > that the stack pointer it calculates is within the bounds of either the
> > current stack or the previous stack.
> >
> > In the above code, the previous stack pointer (or next stack pointer,
> > depending on your perspective) hasn't been set up before it switches
> > stacks.  So the unwinder reads an uninitialized value into
> > info->next_sp, and compares that with the regs pointer, and then stops
> > the unwind because it thinks it went off into the weeds.
> >
>
> That should be manageable, though, I think.  With my patch applied
> (and maybe even without it), the only exception to that rule is if
> regs->sp points just above the top of the IRQ stack and the next
> instruction is push reg.  In that case, the reg is exactly as
> trustworthy as the normal rule.*  Can you teach the unwinding code
> that this is okay?
>
> * If an NMI hits right there, then it relies on unwinding out of the
> NMI correctly.  But the usual checks that the target stack is a valid
> stack should prevent us from going off into the weeds regardless.

But that would remove a safeguard against the undwarf data being
corrupt.  Sure, it would only affect the rare case where the stack
pointer is at the top of the IRQ stack, but still...

Also, the frame pointer and guess unwinders have the same issue, and
this solution wouldn't work for them.

And, worst of all, the oops stack dumping code in show_trace_log_lvl()
also has this issue.
It relies on those previous stack pointers.  And it's separated from the
unwinder logic by design, so it can't ask the unwinder where the next
stack is.

-- 
Josh