From: Joerg Roedel
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com, Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel@suse.de, joro@8bytes.org
Subject: [PATCH 13/31] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
Date: Fri, 9 Feb 2018 10:25:22 +0100
Message-Id: <1518168340-9392-14-git-send-email-joro@8bytes.org>
In-Reply-To: <1518168340-9392-1-git-send-email-joro@8bytes.org>
References: <1518168340-9392-1-git-send-email-joro@8bytes.org>

From: Joerg Roedel

It can happen that we enter the kernel from kernel-mode and on the
entry-stack. The most common way this happens is when we get an
exception while loading the user-space segment registers on the
kernel-to-userspace exit path. The segment loading has to happen after
the entry-stack switch, because the stack-switch needs kernel %fs for
per_cpu access.

When this happens, we need to make sure that we leave the kernel with
the entry-stack again, so that the interrupted code-path runs on the
right stack when switching to the user-cr3.

We detect this condition on kernel-entry by checking CS.RPL and %esp,
and if it occurs, we copy the complete contents of the entry-stack over
to the task-stack. This is necessary because once we enter the
exception handlers we might be scheduled out or even migrated to a
different CPU, so we can't rely on the entry-stack contents. We also
leave a marker in the stack-frame to detect this condition on the exit
path.
On the exit path the copy is reversed: we copy all of the remaining
task-stack back to the entry-stack and switch to it.

Signed-off-by: Joerg Roedel
---
 arch/x86/entry/entry_32.S | 109 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 108 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b5ef003..d94dab6 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -358,6 +358,9 @@
  * copied there. So allocate the stack-frame on the task-stack and
  * switch to it before we do any copying.
  */
+
+#define CS_FROM_ENTRY_STACK	(1 << 31)
+
 .macro SWITCH_TO_KERNEL_STACK
 
 	ALTERNATIVE	"", "jmp .Lend_\@", X86_FEATURE_XENPV
@@ -381,6 +384,10 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry_stack(%edi), %edi
 
+	/* Special case - entry from kernel mode via entry stack */
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jz	.Lentry_from_kernel_\@
+
 	/* Bytes to copy */
 	movl	$PTREGS_SIZE, %ecx
 
@@ -394,8 +401,8 @@
 	 */
 	addl	$(4 * 4), %ecx
 
-.Lcopy_pt_regs_\@:
 #endif
+.Lcopy_pt_regs_\@:
 
 	/* Allocate frame on task-stack */
 	subl	%ecx, %edi
@@ -410,6 +417,56 @@
 	cld
 	rep movsb
 
+	jmp	.Lend_\@
+
+.Lentry_from_kernel_\@:
+
+	/*
+	 * This handles the case when we enter the kernel from
+	 * kernel-mode and %esp points to the entry-stack. When this
+	 * happens we need to switch to the task-stack to run C code,
+	 * but switch back to the entry-stack again when we approach
+	 * iret and return to the interrupted code-path. This usually
+	 * happens when we hit an exception while restoring user-space
+	 * segment registers on the way back to user-space.
+	 *
+	 * When we switch to the task-stack here, we can't trust the
+	 * contents of the entry-stack anymore, as the exception handler
+	 * might be scheduled out or moved to another CPU. Therefore we
+	 * copy the complete entry-stack to the task-stack and set a
+	 * marker in the iret-frame (bit 31 of the CS dword) to detect
+	 * what we've done on the iret path.
+	 *
+	 * On the iret path we copy everything back and switch to the
+	 * entry-stack, so that the interrupted kernel code-path
+	 * continues on the same stack it was interrupted with.
+	 *
+	 * Be aware that an NMI can happen anytime in this code.
+	 *
+	 * %esi: Entry-Stack pointer (same as %esp)
+	 * %edi: Top of the task stack
+	 */
+
+	/* Calculate number of bytes on the entry stack in %ecx */
+	movl	%esi, %ecx
+
+	/* %ecx to the top of entry-stack */
+	andl	$(MASK_entry_stack), %ecx
+	addl	$(SIZEOF_entry_stack), %ecx
+
+	/* Number of bytes on the entry stack to %ecx */
+	sub	%esi, %ecx
+
+	/* Mark stackframe as coming from entry stack */
+	orl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+
+	/*
+	 * %esi and %edi are unchanged, %ecx contains the number of
+	 * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
+	 * the stack-frame on task-stack and copy everything over
+	 */
+	jmp .Lcopy_pt_regs_\@
+
 .Lend_\@:
 .endm
 
@@ -467,6 +524,55 @@
 .endm
 
 /*
+ * This macro handles the case when we return to kernel-mode on the iret
+ * path and have to switch back to the entry stack.
+ *
+ * See the comments below the .Lentry_from_kernel_\@ label in the
+ * SWITCH_TO_KERNEL_STACK macro for more details.
+ */
+.macro PARANOID_EXIT_TO_KERNEL_MODE
+
+	/*
+	 * Test if we entered the kernel with the entry-stack. Most
+	 * likely we did not, because this code only runs on the
+	 * return-to-kernel path.
+	 */
+	testl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+	jz	.Lend_\@
+
+	/* Unlikely slow-path */
+
+	/* Clear marker from stack-frame */
+	andl	$(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
+
+	/* Copy the remaining task-stack contents to entry-stack */
+	movl	%esp, %esi
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+
+	/* Bytes on the task-stack to ecx */
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %ecx
+	subl	%esi, %ecx
+
+	/* Allocate stack-frame on entry-stack */
+	subl	%ecx, %edi
+
+	/*
+	 * Save future stack-pointer, we must not switch until the
+	 * copy is done, otherwise the NMI handler could destroy the
+	 * contents of the task-stack we are about to copy.
+	 */
+	movl	%edi, %ebx
+
+	/* Do the copy */
+	cld
+	rep movsb
+
+	/* Safe to switch to entry-stack now */
+	movl	%ebx, %esp
+
+.Lend_\@:
+.endm
+/*
  * %eax: prev task
  * %edx: next task
  */
@@ -837,6 +943,7 @@ restore_all:
 
 restore_all_kernel:
 	TRACE_IRQS_IRET
+	PARANOID_EXIT_TO_KERNEL_MODE
 	RESTORE_REGS 4
 	jmp	.Lirq_return
-- 
2.7.4