Subject: Re: [PATCH v2 6/8] x86/entry: add unwind hint annotations
From: Andy Lutomirski
In-Reply-To: <20170629214134.c36krjhvzegwkfjk@treble>
Date: Thu, 29 Jun 2017 15:59:04 -0700
Cc: Andy Lutomirski, X86 ML, "linux-kernel@vger.kernel.org", live-patching@vger.kernel.org, Linus Torvalds, Jiri Slaby, Ingo Molnar, "H. Peter Anvin", Peter Zijlstra, Mike Galbraith
References: <20170629175333.bicpvbwo4d5pdbak@treble> <20170629190559.ttw52ahwtsjynayx@treble> <20170629214134.c36krjhvzegwkfjk@treble>
To: Josh Poimboeuf
X-Mailing-List: linux-kernel@vger.kernel.org

--Andy

> On Jun 29, 2017, at 2:41 PM, Josh Poimboeuf wrote:
>
>> On Thu, Jun 29, 2017 at 02:09:54PM -0700, Andy Lutomirski wrote:
>>> On Thu, Jun 29, 2017 at 12:05 PM, Josh Poimboeuf wrote:
>>>> On Thu, Jun 29, 2017 at 11:50:18AM -0700, Andy Lutomirski wrote:
>>>>> On Thu, Jun 29, 2017 at 10:53 AM, Josh Poimboeuf wrote:
>>>>> There's a bug here that will need a small change to the entry code.
>>>>>
>>>>> Mike Galbraith reported:
>>>>>
>>>>>   WARNING: can't dereference registers at ffffc900089d7e08 for ip ffffffff81740bbb
>>>>>
>>>>> After some looking I found that it's caused by the following code
>>>>> snippet in the 'interrupt' macro in entry_64.S:
>>>>>
>>>>>   /*
>>>>>    * Save previous stack pointer, optionally switch to interrupt stack.
>>>>>    * irq_count is used to check if a CPU is already on an interrupt stack
>>>>>    * or not. While this is essentially redundant with preempt_count it is
>>>>>    * a little cheaper to use a separate counter in the PDA (short of
>>>>>    * moving irq_enter into assembly, which would be too much work)
>>>>>    */
>>>>>   movq    %rsp, %rdi
>>>>>   incl    PER_CPU_VAR(irq_count)
>>>>>   cmovzq  PER_CPU_VAR(irq_stack_ptr), %rsp
>>>>>   UNWIND_HINT_REGS base=rdi
>>>>>   pushq   %rdi
>>>>>   UNWIND_HINT_REGS indirect=1
>>>>>
>>>>> The problem is that it's changing the stack pointer *before* writing the
>>>>> previous stack pointer (push %rdi). So when unwinding from an NMI which
>>>>> hit between the rsp write and the rdi push, the unwinder tries to access
>>>>> the regs on the previous stack (by reading rdi), but the previous stack
>>>>> pointer isn't there yet, so the access is considered out of bounds.
>>>>
>>>> Ugh, that code. Does this problem go away with this patch applied:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=2231ec7e0bcc1a2bc94a17081511ab54cc6badd1
>>>>
>>>> If so, want to update the patch for new kernels (shouldn't conflict
>>>> with anything except your unwind hints)?
>>>
>>> I don't think that patch will fix it, because it still updates rsp
>>> *before* writing the old rsp on the new stack. So there's still a
>>> window where the "previous stack" pointer is missing.
>>
>> But it's in a register. Is undwarf not able to grok that?
>
> Sorry, I didn't explain it very well. Undwarf can find the regs pointer
> in rdi, it just doesn't trust its value.
>
> See the stack_info.next_sp field, which is set in in_irq_stack():
>
>   /*
>    * The next stack pointer is the first thing pushed by the entry code
>    * after switching to the irq stack.
>    */
>   info->next_sp = (unsigned long *)*(end - 1);
>
> It's a safety mechanism. The unwinder needs the last word of the irq
> stack page to point to the previous stack.
> That way it can double check
> that the stack pointer it calculates is within the bounds of either the
> current stack or the previous stack.
>
> In the above code, the previous stack pointer (or next stack pointer,
> depending on your perspective) hasn't been set up before it switches
> stacks. So the unwinder reads an uninitialized value into
> info->next_sp, and compares that with the regs pointer, and then stops
> the unwind because it thinks it went off into the weeds.
>

That should be manageable, though, I think. With my patch applied (and
maybe even without it), the only exception to that rule is if regs->sp
points just above the top of the IRQ stack and the next instruction is
push reg. In that case, the reg is exactly as trustworthy as the normal
rule.* Can you teach the unwinding code that this is okay?

* If an NMI hits right there, then it relies on unwinding out of the NMI
correctly. But the usual checks that the target stack is a valid stack
should prevent us from going off into the weeds regardless.

> --
> Josh
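
For illustration only, here is a rough userspace sketch of the check being discussed, including the proposed exception for the window between the rsp switch and the push. This is not the kernel's unwinder code. The names stack_range, on_stack() and regs_ptr_trusted() are hypothetical, the "previous stack" is modelled as a single known task stack, and the caller is assumed to already know whether the next instruction is the push.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one stack: [begin, end). */
struct stack_range {
	uintptr_t begin;	/* lowest valid address */
	uintptr_t end;		/* one past the highest valid address */
};

/* True if [addr, addr + len) lies entirely within the stack. */
static bool on_stack(const struct stack_range *s, uintptr_t addr, uintptr_t len)
{
	assert(s->begin <= s->end);
	return addr >= s->begin && addr <= s->end && len <= s->end - addr;
}

/*
 * Decide whether a regs pointer found in a register may be trusted.
 *
 * next_sp is the value read from the last word of the IRQ stack; it may be
 * uninitialized if the entry code has switched stacks but not pushed yet.
 * sp is the stack pointer at the interrupted instruction.
 */
static bool regs_ptr_trusted(const struct stack_range *irq,
			     const struct stack_range *task,
			     uintptr_t next_sp, uintptr_t sp,
			     bool next_insn_is_push,
			     uintptr_t regs, uintptr_t regs_size)
{
	/* Normal rule: regs lies on the current (IRQ) stack. */
	if (on_stack(irq, regs, regs_size))
		return true;

	/* Normal rule: the link word names the previous stack and regs is on it. */
	if (on_stack(task, next_sp, sizeof(uintptr_t)) &&
	    on_stack(task, regs, regs_size))
		return true;

	/*
	 * Proposed exception: sp sits just above the top of the IRQ stack and
	 * the next instruction is the push, so the link word is not written
	 * yet. Trust the register, but still require a valid target stack.
	 */
	if (sp == irq->end && next_insn_is_push)
		return on_stack(task, regs, regs_size);

	return false;
}
```

The last branch still validates the target stack, which mirrors the footnote above: even if the register value is garbage, the unwind stops rather than wandering off.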