From: Andy Lutomirski <luto@kernel.org>
Date: Thu, 29 Jun 2017 22:05:14 -0700
Subject: Re: [PATCH v2 6/8] x86/entry: add unwind hint annotations
To: Josh Poimboeuf
Cc: Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org,
    live-patching@vger.kernel.org, Linus Torvalds, Jiri Slaby,
    Ingo Molnar, "H. Peter Anvin", Peter Zijlstra, Mike Galbraith

On Thu, Jun 29, 2017 at 7:12 PM, Josh Poimboeuf wrote:
> On Thu, Jun 29, 2017 at 03:59:04PM -0700, Andy Lutomirski wrote:
>> > Sorry, I didn't explain it very well.  Undwarf can find the regs pointer
>> > in rdi, it just doesn't trust its value.
>> >
>> > See the stack_info.next_sp field, which is set in in_irq_stack():
>> >
>> >	/*
>> >	 * The next stack pointer is the first thing pushed by the entry code
>> >	 * after switching to the irq stack.
>> >	 */
>> >	info->next_sp = (unsigned long *)*(end - 1);
>> >
>> > It's a safety mechanism.  The unwinder needs the last word of the irq
>> > stack page to point to the previous stack.  That way it can double-check
>> > that the stack pointer it calculates is within the bounds of either the
>> > current stack or the previous stack.
>> >
>> > In the above code, the previous stack pointer (or next stack pointer,
>> > depending on your perspective) hasn't been set up before the entry code
>> > switches stacks.  So the unwinder reads an uninitialized value into
>> > info->next_sp, compares it with the regs pointer, and then stops the
>> > unwind because it thinks it went off into the weeds.
>>
>> That should be manageable, though, I think.  With my patch applied
>> (and maybe even without it), the only exception to that rule is if
>> regs->sp points just above the top of the IRQ stack and the next
>> instruction is a push of a register.  In that case, the pushed value is
>> exactly as trustworthy as it would be under the normal rule.*  Can you
>> teach the unwinding code that this is okay?
>>
>> * If an NMI hits right there, then it relies on unwinding out of the
>> NMI correctly.  But the usual checks that the target stack is a valid
>> stack should prevent us from going off into the weeds regardless.
>
> But that would remove a safeguard against the undwarf data being
> corrupt.  Sure, it would only affect the rare case where the stack
> pointer is at the top of the IRQ stack, but still...
>
> Also, the frame pointer and guess unwinders have the same issue, and
> this solution wouldn't work for them.
>
> And, worst of all, the oops stack dumping code in show_trace_log_lvl()
> also has this issue.  It relies on those previous stack pointers.  And
> it's separated from the unwinder logic by design, so it can't ask the
> unwinder where the next stack is.
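As a stand-alone illustration of the cross-check described above, the sketch
below models an unwinder that only trusts a computed stack pointer if it lies
on the current IRQ stack or on the previous stack found through the IRQ
stack's last word.  All names, sizes, and the layout here are invented for the
example; only the *(end - 1) idea comes from the quoted code.

/*
 * Toy user-space model of the bounds check: not kernel code.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define STACK_WORDS 64

static uint64_t task_stack[STACK_WORDS];
static uint64_t irq_stack[STACK_WORDS];

static bool in_range(const uint64_t *p, const uint64_t *lo, const uint64_t *hi)
{
	return p >= lo && p < hi;
}

/* Mirrors "info->next_sp = (unsigned long *)*(end - 1);" quoted above. */
static bool computed_sp_ok(const uint64_t *sp)
{
	const uint64_t *irq_end = irq_stack + STACK_WORDS;
	const uint64_t *prev_sp = (const uint64_t *)(uintptr_t)irq_end[-1];

	if (in_range(sp, irq_stack, irq_end))
		return true;

	/* Trust sp only if the saved pointer really leads to the previous
	 * stack and sp lies at or above it (stacks grow down). */
	return in_range(prev_sp, task_stack, task_stack + STACK_WORDS) &&
	       in_range(sp, prev_sp, task_stack + STACK_WORDS);
}

int main(void)
{
	const uint64_t *sp = &task_stack[30];

	/* Entry code never wrote the last word: a perfectly valid pointer
	 * into the task stack is rejected ("went off into the weeds"). */
	irq_stack[STACK_WORDS - 1] = 0xdeadbeef;
	printf("without saved pointer: %d\n", computed_sp_ok(sp));

	/* ENTER_IRQ_STACK-style fix: the old stack pointer was saved in
	 * the last word before the switch, so the check passes. */
	irq_stack[STACK_WORDS - 1] = (uintptr_t)&task_stack[20];
	printf("with saved pointer:    %d\n", computed_sp_ok(sp));
	return 0;
}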
Ugh.  I feel like we had this debate before, and I thought it was rather
silly that the unwinder cared.  After all, we already have separate safety
mechanisms to make sure that the unwinder never wanders off of the valid
stacks and that it never touches any given stack more than once.  But it
is indeed useful for the oops unwinder, so c'est la vie.

That being said, I bet we could get away with this (sorry for immense
whitespace damage):

.macro ENTER_IRQ_STACK old_rsp scratch_reg
	DEBUG_ENTRY_ASSERT_IRQS_OFF
	movq	%rsp, \old_rsp
	incl	PER_CPU_VAR(irq_count)
	jnz	.Lrecurse_irq_stack_\@

	/*
	 * Right now, we just incremented irq_count to zero, so we've
	 * claimed the IRQ stack but we haven't switched to it yet.
	 * Anything that can interrupt us here without using IST
	 * must be *extremely* careful to limit its stack usage.
	 *
	 * We write old_rsp to the top of the IRQ stack before switching
	 * %rsp to it, for the benefit of the OOPS unwinder.
	 */
	movq	PER_CPU_VAR(irq_stack_ptr), \scratch_reg
	movq	\old_rsp, -8(\scratch_reg)
	leaq	-8(\scratch_reg), %rsp
	jmp	.Lout_\@

.Lrecurse_irq_stack_\@:
	pushq	\old_rsp

.Lout_\@:
.endm

After all, it looks like all the users have a scratch reg available.

Hmm.  There's another option that might be considerably nicer, though:
put the IRQ stack at a known (at link time) position *in percpu space*.
(Presumably it already is -- I haven't checked.)  Then we do:

.macro ENTER_IRQ_STACK old_rsp
	DEBUG_ENTRY_ASSERT_IRQS_OFF
	movq	%rsp, \old_rsp
	incl	PER_CPU_VAR(irq_count)

	/*
	 * Right now, if we just incremented irq_count to zero, we've
	 * claimed the IRQ stack but we haven't switched to it yet.
	 * Anything that can interrupt us here without using IST
	 * must be *extremely* careful to limit its stack usage.
	 */
	jnz	.Lpush_old_rsp_\@

	movq	\old_rsp, PER_CPU_VAR(top_word_in_irq_stack)
	movq	PER_CPU_VAR(irq_stack_ptr), %rsp

.Lpush_old_rsp_\@:
	pushq	\old_rsp
.endm

This pushes the old pointer twice, but that's easy enough to fix if we
really cared.

I think I like this variant better.  What do you think?

--Andy
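To make the "pushes the old pointer twice" remark concrete, here is a small
user-space model of the second variant's control flow.  The layout and names,
including top_word_in_irq_stack, are stand-ins invented for the example; the
real proposal is the asm macro above, not this C code.

#include <stdint.h>
#include <stdio.h>

#define STACK_WORDS 64

static uint64_t irq_stack[STACK_WORDS];

/* Fixed, link-time-known slot: here simply the last word of the stack. */
#define top_word_in_irq_stack	irq_stack[STACK_WORDS - 1]

/* Initial stack pointer: points at the reserved top word, so the first
 * push lands just below it. */
static uint64_t *const irq_stack_ptr = &irq_stack[STACK_WORDS - 1];

static int irq_count = -1;

static uint64_t *enter_irq_stack(uint64_t *rsp)
{
	uint64_t *const old_rsp = rsp;		/* movq %rsp, \old_rsp */

	if (++irq_count == 0) {			/* asm jnz skips this when nonzero */
		top_word_in_irq_stack = (uintptr_t)old_rsp;
		rsp = irq_stack_ptr;		/* switch to the IRQ stack */
	}
	*--rsp = (uintptr_t)old_rsp;		/* the unconditional pushq */
	return rsp;
}

int main(void)
{
	uint64_t task_stack[STACK_WORDS];
	uint64_t *rsp = &task_stack[STACK_WORDS - 4];

	rsp = enter_irq_stack(rsp);

	/* On first entry the old pointer now appears twice: in the fixed
	 * top word and in the slot pushed just below it. */
	printf("top word: %#llx\n", (unsigned long long)top_word_in_irq_stack);
	printf("pushed:   %#llx\n", (unsigned long long)*rsp);
	return 0;
}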