Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932162AbdHWPVD (ORCPT ); Wed, 23 Aug 2017 11:21:03 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53196 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932090AbdHWPVC (ORCPT ); Wed, 23 Aug 2017 11:21:02 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 200723DE40 Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=jpoimboe@redhat.com Date: Wed, 23 Aug 2017 10:20:59 -0500 From: Josh Poimboeuf To: oliver yang Cc: yang oliver , "tglx@linutronix.de" , "mingo@redhat.com" , "hpa@zytor.com" , "luto@kernel.org" , "x86@kernel.org" , "rostedt@goodmis.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH] x86/nmi/64: avoid passing user space rsp of pt_regs to nmi handler Message-ID: <20170823152059.aix2zb24xsndfntq@treble> References: <20170822175115.k3tdjlkltua7lkiu@treble> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 23 Aug 2017 15:21:02 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2959 Lines: 78 On Wed, Aug 23, 2017 at 10:19:02PM +0800, oliver yang wrote: > 2017-08-23 1:51 GMT+08:00 Josh Poimboeuf : > > On Tue, Aug 22, 2017 at 05:09:19PM +0000, yang oliver wrote: > >> From: Yong Yang > >> > >> While NMI interrupts the very beginning of SYSCALL, the rsp could > >> be still user space address without loading kernel address. Then > >> the pt_regs constructed by the NMI entry point could have a user > >> space rsp. If a NMI handler tried to dump stack by using this rsp, > >> it can cause the kernel panic. > > > > To me this sounds like an unwinder bug. The unwinder is supposed to > > have checks to prevent it from accessing user space. > > What kind of checking should we do here? > > While the NMI interrupts the first instruction of entry_SYSCALL_64, > the rip is kernel address, but rsp is still a user space address. > > Usually, kernel uses the "user_mode" check to determine user space > pt_regs, but this check doesn't work for this situation. The unwinder tries to make sure that all its memory accesses are within the bounds of the kernel stacks (task, irq, and exception stacks). See update_stack_state() in a recent kernel: /* * If the next bp isn't on the current stack, switch to the next one. * * We may have to traverse multiple stacks to deal with the possibility * that info->next_sp could point to an empty stack and the next bp * could be on a subsequent stack. */ while (!on_stack(info, frame, len)) if (get_stack_info(info->next_sp, state->task, info, &state->stack_mask)) return false; > > I know you had > > previously reported this for an older (pre-4.9) kernel. Have you > > verified the bug still exists on a recent kernel? > > Yes, I ran into kernel panic on 3.10 kernel, and the root case is > show_stack_log_lvl in nmi handler dump the user space stack, > but that memory got swapped out or freed. > > > On latest kernel, show_stack_log_lvl never dump stack contents. > But I'm not sure other NMI handler, or kernel trace code could dump > a user space stack dump while the kernel mode pt_regs which has > a user space sp. > > I wrote a module which register a NMI handler, it did the check like > below, > > if (!user_mode(regs)) { /* kernel mode, but could contain user address */ > rsp = regs->sp; > > if (rsp < __PAGE_OFFSET) { > pr_err("User Stack Pointer: %016lx", rsp); > pr_err("User Stack Dump: %016lx", *((unsigned long *)rsp)); > } > } > > I manage to make NMI interrupted the entry_SYSCALL_64 boundary, > then I could trigger the issue. > > Not sure whether it is good enough. > > Let me know if you have other suggestions. The frame pointer unwinder was completely rewritten in 4.9, and show_stack_log_lvl() was removed. AFAIK, it shouldn't panic when unwinding from an NMI in early syscall code. If you can recreate the problem on a recent kernel, please let me know. -- Josh