Received: by 2002:ac2:48a3:0:0:0:0:0 with SMTP id u3csp568636lfg; Fri, 11 Mar 2022 13:34:19 -0800 (PST) X-Google-Smtp-Source: ABdhPJxYRiPTuHhNnvRp7sweWCH8Ht/SC5zSKT0K7wfcCvIUAN99g1WuMT1gG6yPxOZoXib49zEA X-Received: by 2002:a17:90b:1bc5:b0:1bf:1c96:66ac with SMTP id oa5-20020a17090b1bc500b001bf1c9666acmr23483788pjb.167.1647034459468; Fri, 11 Mar 2022 13:34:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1647034459; cv=none; d=google.com; s=arc-20160816; b=wnF0bU3q4vvYjySU9PxwYwLIJ/X5lGX6pNX+WuL4PEi4cFegzc/kRBO5i2CZQxlx5j 0kDxAOokO6U3wpIJ1busNzbaUNk0Lqm74RhX7XoDsptXr1PZD+iz85cifqiRnOSN4SHQ 0KnOTEjrJLgVApq61635GZE8V9pwQ4zAwC8tbbfFIoMs5ZPyWQ2IhhbSvtOrTmdSKngh VwzmOtjWVI6dSdkHBv9xyqHNzSvdKwUGDRJGNljrwsfn3CoVjLLxwIX/Xu11PeUn+p4R f/tfw+4xQeCS4/xT/rL7dv4SaZvAf92ehv7YWelAuQInBJ9fW7ok7ZilJDNJegHEH7jm WE8Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :from:references:cc:to:content-language:user-agent:mime-version:date :message-id:dkim-signature; bh=MoZ2qNKrT/Lc488k6fmLJIUtTw3bOntVQ/JflyXWKyk=; b=xwbUutHwpMMygtP0MQtFYCo3nUZA1gc49/lpCEazVEnhIFWQbLhGh9S6TPYsfeyy2r lakzbk+stTCJS0+6UGbc5lkf5LxS5B4ptOxH+3qCznsiXfaW0jjkLTmA3uTnLyoxfpGx qIsDYKxBaYp+vnrdk390rMS8ny6r0qQpdMVq2gVE+If9qb/5vJ5IGK3wRcPTvYFbaa+s CQzJgcz42rvgjIdGJzKlboJEUNpXMzjfyKxWsJeDEXcP4Ev/kNM1gBrIRCmcOxSb7L1K 540lLrP+FOVpyZGPdEbulk2tZfmemcu+LX9qGNp+FkEQETQf9RS/rzMoRyDJJYPrfROd Krlw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=dZPSAZML; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id v202-20020a6361d3000000b00380fb4dd9dfsi4655099pgb.595.2022.03.11.13.34.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Mar 2022 13:34:19 -0800 (PST) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=dZPSAZML; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id C8EC31070A5; Fri, 11 Mar 2022 13:03:21 -0800 (PST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351156AbiCKTnK (ORCPT + 99 others); Fri, 11 Mar 2022 14:43:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232941AbiCKTnJ (ORCPT ); Fri, 11 Mar 2022 14:43:09 -0500 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 48EC51A39CE for ; Fri, 11 Mar 2022 11:42:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647027726; x=1678563726; h=message-id:date:mime-version:to:cc:references:from: subject:in-reply-to:content-transfer-encoding; bh=mQiNdXMDSm6S2pRJONnpTeru2KszX9Gpzb6dViTFnCk=; b=dZPSAZMLEeZDXOK8xTUio//EH5Zn3J/OEszEIoSc94bjLNdk2+dD/VGM eFcNxS9b9ZtwvkCsYzVccQe0PcqFaoXmcI/2UK7AY7ceT8AzDruVZhaMu fYIFKEzQoUck5v0pd6JAo3Phh01aJqXnWLn8xZfmuLDwXiliaP4mBYUb2 +oRJI30BX468sYvKHynuDNJdNe5xrxuOyXsqqpF4Vb4ftzyjIqJiQg1+g eCCbQfifX3se+Yu1uu8UW+2Hwv+79kIZbEuDwq4qyDtWN41vk20PxQFIN XdfRW/VexOaJQtS1S0zgltr58Vm5Pixuk8+Vn1gVdM508YeKLSbjW9v7R Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10283"; a="280386875" X-IronPort-AV: E=Sophos;i="5.90,174,1643702400"; d="scan'208";a="280386875" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Mar 2022 11:42:05 -0800 X-IronPort-AV: E=Sophos;i="5.90,174,1643702400"; d="scan'208";a="645033254" Received: from cpeirce-mobl1.amr.corp.intel.com (HELO [10.212.128.243]) ([10.212.128.243]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Mar 2022 11:42:05 -0800 Message-ID: Date: Fri, 11 Mar 2022 11:41:58 -0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0 Content-Language: en-US To: Nadav Amit , linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton , Nadav Amit , Andrea Arcangeli , Andrew Cooper , Andy Lutomirski , Dave Hansen , Peter Xu , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin , x86@kernel.org References: <20220311190749.338281-1-namit@vmware.com> <20220311190749.338281-3-namit@vmware.com> From: Dave Hansen Subject: Re: [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault In-Reply-To: <20220311190749.338281-3-namit@vmware.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,RDNS_NONE,SPF_HELO_NONE, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 3/11/22 11:07, Nadav Amit wrote: > From: Nadav Amit > > access_error() currently does not check for execution permission > violation. As a result, spurious page-faults due to execution permission > violation cause SIGSEGV. This is a bit muddy on the problem statement. I get that spurious faults can theoretically cause this, but *do* they in practice on current kernels? > It appears not to be an issue so far, but the next patches avoid TLB > flushes on permission promotion, which can lead to this scenario. nodejs > for instance crashes when TLB flush is avoided on permission promotion. By "it appears not to be an issue", do you mean that this suboptimal behavior can not be triggered, period? Or, it can be triggered but folks seem not to care that it can be triggered? I *think* these can be triggered today. I think it takes two threads that do something like: Thread 1 Thread 2 ======== ======== ptr = malloc(); memcpy(ptr, &code, len); exec_now = 1; while (!exec_now); call(ptr); // fault mprotect(ptr, PROT_EXEC, len); // fault sees VM_EXEC But that has a bug: exec_now is set before the mprotect(). It's not sane code. Can any sane code trigger this? > Add a check to prevent access_error() from returning mistakenly that > spurious page-faults due to instruction fetch are a reason for an access > error. > > It is assumed that error code bits of "instruction fetch" and "write" in > the hardware error code are mutual exclusive, and the change assumes so. > However, to be on the safe side, especially if hypervisors misbehave, > assert this is the case and warn otherwise. > > Cc: Andrea Arcangeli > Cc: Andrew Cooper > Cc: Andrew Morton > Cc: Andy Lutomirski > Cc: Dave Hansen > Cc: Peter Xu > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Cc: Will Deacon > Cc: Yu Zhao > Cc: Nick Piggin > Cc: x86@kernel.org > Signed-off-by: Nadav Amit > --- > arch/x86/mm/fault.c | 22 ++++++++++++++++++++-- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > index d0074c6ed31a..ad0ef0a6087a 100644 > --- a/arch/x86/mm/fault.c > +++ b/arch/x86/mm/fault.c > @@ -1107,10 +1107,28 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) > (error_code & X86_PF_INSTR), foreign)) > return 1; > > - if (error_code & X86_PF_WRITE) { > + if (error_code & (X86_PF_WRITE | X86_PF_INSTR)) { > + /* > + * CPUs are not expected to set the two error code bits > + * together, but to ensure that hypervisors do not misbehave, > + * run an additional sanity check. > + */ > + if ((error_code & (X86_PF_WRITE|X86_PF_INSTR)) == > + (X86_PF_WRITE|X86_PF_INSTR)) { > + WARN_ON_ONCE(1); > + return 1; > + } access_error() is only used on the do_user_addr_fault() side of things. Can we stick this check somewhere that also works for kernel address faults? This is a generic sanity check. It can also be in a separate patch. Also, we should *probably* stop talking about CPUs here. If there's ever something wonky with error code bits, I'd put my money on a weird hypervisor before any kind of CPU issue. > /* write, present and write, not present: */ > - if (unlikely(!(vma->vm_flags & VM_WRITE))) > + if ((error_code & X86_PF_WRITE) && > + unlikely(!(vma->vm_flags & VM_WRITE))) > + return 1; > + > + /* exec, present and exec, not present: */ > + if ((error_code & X86_PF_INSTR) && > + unlikely(!(vma->vm_flags & VM_EXEC))) > return 1; > + > return 0; > } This is getting really ugly. I think we've gone over this before, but it escapes me. Why do we need a common (X86_PF_WRITE | X86_PF_INSTR) block of code? Why can't we just add a simple X86_PF_INSN if() that mirrors the current X86_PF_WRITE one? if (error_code & X86_PF_INSN) { /* present and not exec: */ if (unlikely(!(vma->vm_flags & VM_EXEC))) return 1; return 0; }