Date: Fri, 13 Jul 2018 12:56:20 +0200
From: Joerg Roedel
To: Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, "H .
 Peter Anvin", x86@kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Linus Torvalds, Andy Lutomirski, Dave Hansen,
 Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
 Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
 Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon,
 aliguori@amazon.com, daniel.gruss@iaik.tugraz.at, hughd@google.com,
 keescook@google.com, Andrea Arcangeli, Waiman Long, Pavel Machek,
 "David H . Gutteridge", jroedel@suse.de
Subject: Re: [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack
Message-ID: <20180713105620.z6bjhqzfez2hll6r@8bytes.org>
References: <1531308586-29340-1-git-send-email-joro@8bytes.org>
 <1531308586-29340-8-git-send-email-joro@8bytes.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Andy,

thanks for your valuable feedback.

On Thu, Jul 12, 2018 at 02:09:45PM -0700, Andy Lutomirski wrote:
> > On Jul 11, 2018, at 4:29 AM, Joerg Roedel wrote:
> > -.macro SAVE_ALL pt_regs_ax=%eax
> > +.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
> >      cld
> > +    /* Push segment registers and %eax */
> >      PUSH_GS
> >      pushl    %fs
> >      pushl    %es
> >      pushl    %ds
> >      pushl    \pt_regs_ax
> > +
> > +    /* Load kernel segments */
> > +    movl    $(__USER_DS), %eax
>
> If \pt_regs_ax != %eax, then this will behave oddly.  Maybe it’s okay.
> But I don’t see why this change was needed at all.

This is a left-over from a previous approach I tried and then abandoned
later. You are right, it is not needed.

> > +/*
> > + * Called with pt_regs fully populated and kernel segments loaded,
> > + * so we can access PER_CPU and use the integer registers.
> > + *
> > + * We need to be very careful here with the %esp switch, because an NMI
> > + * can happen everywhere. If the NMI handler finds itself on the
> > + * entry-stack, it will overwrite the task-stack and everything we
> > + * copied there. So allocate the stack-frame on the task-stack and
> > + * switch to it before we do any copying.
>
> Ick, right.  Same with machine check, though.  You could alternatively
> fix it by running NMIs on an irq stack if the irq count is zero.  How
> confident are you that you got #MC right?

Pretty confident. #MC uses the exception entry path, which also handles
entry-stack and user-cr3 correctly. It might go through the slow
paranoid exit path, but that's okay for #MC, I guess.

And when the #MC happens while we switch to the task stack and do the
copying, the same precautions as for NMI apply.

> > + */
> > +.macro SWITCH_TO_KERNEL_STACK
> > +
> > +    ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
> > +
> > +    /* Are we on the entry stack? Bail out if not! */
> > +    movl    PER_CPU_VAR(cpu_entry_area), %edi
> > +    addl    $CPU_ENTRY_AREA_entry_stack, %edi
> > +    cmpl    %esp, %edi
> > +    jae    .Lend_\@
>
> That’s an alarming assumption about the address space layout. How
> about an xor and an and instead of cmpl?  As it stands, if the address
> layout ever changes, the failure may be rather subtle.

Right, I will implement a more restrictive check.

> Anyway, wouldn’t it be easier to solve this by just not switching
> stacks on entries from kernel mode and making the entry stack bigger?
> Stick an assertion in the scheduling code that we’re not on an entry
> stack, perhaps.

That would save us the check whether we are on the entry stack and
replace it with a check whether we are coming from user/vm86 mode. I
don't think that this will simplify things much, and I am a bit afraid
that it'll break unwritten assumptions elsewhere. It is probably
something we can look into later, separately from the basic pti-x32
enablement.

Thanks,

	Joerg
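
For illustration, the entry-stack check discussed above can be modeled
in plain C. This is only a sketch under stated assumptions, not code
from the patch: ENTRY_STACK_SIZE and its 4 KiB value, the function
names, and the premise that the entry stack is aligned to its own size
(needed for the xor/and variant) are all hypothetical. The first
function mirrors the quoted cmpl/jae sequence, the second is one way to
read the "xor and and" suggestion, and the third is an alternative
range check that needs no alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry-stack size, chosen only for this illustration. */
#define ENTRY_STACK_SIZE 4096u

/* The xor/and variant below only works for power-of-two sizes. */
static_assert((ENTRY_STACK_SIZE & (ENTRY_STACK_SIZE - 1)) == 0,
              "ENTRY_STACK_SIZE must be a power of two");

/*
 * Model of the quoted check: anything above the entry-stack base is
 * treated as "on the entry stack". This silently relies on no other
 * stack ever living at a higher address.
 */
static inline bool on_entry_stack_cmp(uint32_t sp, uint32_t base)
{
	return sp > base;
}

/*
 * xor/and variant: sp and base must agree in all bits above the
 * stack-size bits. Assumes base is aligned to ENTRY_STACK_SIZE.
 * Accepts addresses in [base, base + ENTRY_STACK_SIZE).
 */
static inline bool on_entry_stack_xor(uint32_t sp, uint32_t base)
{
	return ((sp ^ base) & ~(ENTRY_STACK_SIZE - 1)) == 0;
}

/*
 * Range variant: one subtraction and one unsigned compare. Addresses
 * below base wrap around to a large value and are rejected, so no
 * alignment assumption is needed. Accepts the same range as above.
 */
static inline bool on_entry_stack_range(uint32_t sp, uint32_t base)
{
	return (uint32_t)(sp - base) < ENTRY_STACK_SIZE;
}
```

With a base of 0xff801000, a stack pointer of 0xffd00000 passes the
base-only compare but is rejected by both stricter variants, which is
the kind of subtle failure a layout change could trigger. A stack
pointer of 0xff801f00 is accepted by all three.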