Received: by 2002:a05:6a10:413:0:0:0:0 with SMTP id 19csp802958pxp; Fri, 11 Mar 2022 15:33:46 -0800 (PST) X-Google-Smtp-Source: ABdhPJwTWugncHdkJnk2jQM1SOL0QRLkXtC9l7fVCgqLfww2JYDctZML7sBfFV6upWy0XKrAztae X-Received: by 2002:a17:903:234e:b0:151:f6b0:9e70 with SMTP id c14-20020a170903234e00b00151f6b09e70mr13332362plh.27.1647041625885; Fri, 11 Mar 2022 15:33:45 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1647041625; cv=none; d=google.com; s=arc-20160816; b=hWrrBkItzifw6+Raah4GsOX0I1922v93ov4NDYg/O5yt1FErUqT/Dli6QiTM50DZge L+Ra6btvpn6FY0eRMoiaX77jjw6j0qSctwekr8gjB7OoFW6IMsvP8qPQRWZBWCAZiHZu LeWBKAICqn+/qDkUwNQYG6MTGO2zn5XTp1syLxUeZ26MkiK6hALWwf0kupgkN/F7AdHX Zsq6eJpZCGpCLnt1Fflnc2OLHv0ZqU1eiYQoF9uqGhMMZPi9wJ91FYlQyya8PL3pj4kI qGr0maE1NEZexnKlRNG3Qj3s15w+mB3kl89gz6AA2IliXYGorKwJWtK4F5cXvp6H7SRA OjhA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature; bh=XH+5/+cW8VQKO81AORhyT2Hmdy1TzbLeTlOyFWJWYeA=; b=Gcj/luuA70zkQCziegmKoLB7tktCaTLiSPG/D9MGtQE+BS8G0WP5KZHngf5KzLWhP2 HluEq1CsQ2LXwxRWygxqq3ebDhk3JSRgdTN/OJ9g6pnFw2LaI9i08jF+5HA8FiAAegD7 W0/hww01bY4GnoGrsY9Y/ZJAyDrtQwuWT9jp/z2JvJdRXyRCYMmnyDJROJEFNvxOx0iX ufJci6tiJbUze2sv4jEqvJ25szx2hDCNuzE5udVnyb4yaH0xXj6GiC3xhNzpaUpP+yt7 9ecBHIKiUER0kr28Ef2Ro6QQzjJAzBTkdoVYg9X9j5AxfNfZe2/TmEBUXrMX6edxGnaf vfcA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=PgciNoUU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id n8-20020a170903110800b0015345edee13si1118752plh.299.2022.03.11.15.33.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Mar 2022 15:33:45 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=PgciNoUU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A753119457D; Fri, 11 Mar 2022 14:29:32 -0800 (PST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229481AbiCKW3T (ORCPT + 99 others); Fri, 11 Mar 2022 17:29:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229472AbiCKW3R (ORCPT ); Fri, 11 Mar 2022 17:29:17 -0500 Received: from mail-oi1-x22e.google.com (mail-oi1-x22e.google.com [IPv6:2607:f8b0:4864:20::22e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC62F2706FB for ; Fri, 11 Mar 2022 14:18:34 -0800 (PST) Received: by mail-oi1-x22e.google.com with SMTP id o64so10914189oib.7 for ; Fri, 11 Mar 2022 14:18:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=XH+5/+cW8VQKO81AORhyT2Hmdy1TzbLeTlOyFWJWYeA=; b=PgciNoUU40YKRk+bsPBh194RRGa5sI3NFQvsgyxxdemnlRf1lRQ/7uF7PvVTwNVjfQ 7Wyakm1uYkqo7VUjKAM5orbK+8q8lq7akwX+fqKy6nN1+sjl1DfS0c8YnPPtBGF5jDg6 fJkE1+lwwLz8zeQVLEywqWQi90ErE7gVWl4dNjehnQZsCox+D81F4cPth65VOI7g0VKD WfK1qgaHCY4UHfnAsl3pcdK/OEzZAO2tLDrXaen8KPpom8gQnUTQbnAxvLrka4s29uMb eS9Up5m0pSRi6L7s5+L8KqFaRtMyIWUWJDKnkL7XPpKByOKQcdqnCmmVV9QkS2tr+VFX 4SGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=XH+5/+cW8VQKO81AORhyT2Hmdy1TzbLeTlOyFWJWYeA=; b=H1xl/B+2hmGtgEFO6e9KRYIX3sq8iumGRNqsB4RbYNvI8LNfJVr0tYci8jMge2ln3N l5kIiVIglYJKC4qVg//B3P+hVA1r4vO5NvneErudujMwCz8a0qWlBZddpWvZhJOGjoHt oD4b2WOrJqbb7gtdBivGgya0gNjkFjRESLPGcT4rvBd0hucmFioKlL+R80ryIadP4bSJ qytn3yW8diGD1UaJ8QZ3OECAfrmWCD/rceVJ9Oe5iJKfMKygDtp/i56e842Lw0kmXh5K /HgSr7wR28vJ4Dy7Td/ko9arpIFy7miCNQEBxUSjt4fNJTa+SpXutAKIrk58MJ9EsrrD 0XWQ== X-Gm-Message-State: AOAM532eQn+jq29d6OpmRb3BUDgFZ5DtwUN7+8MPy7JMDZ/M5TmIx3lq hrlW4Y5kERqORmQmYgjMnVS2oBRNnq4= X-Received: by 2002:a17:90b:4ac1:b0:1bf:6d51:1ad9 with SMTP id mh1-20020a17090b4ac100b001bf6d511ad9mr23648278pjb.199.1647033421514; Fri, 11 Mar 2022 13:17:01 -0800 (PST) Received: from smtpclient.apple ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id c14-20020a056a00248e00b004f77e0fbfc0sm6864387pfv.185.2022.03.11.13.17.00 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 11 Mar 2022 13:17:01 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 15.0 \(3693.60.0.1.1\)) Subject: Re: [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault From: Nadav Amit In-Reply-To: <70e08bd5-187a-daee-2822-1d9a437a9cff@intel.com> Date: Fri, 11 Mar 2022 13:16:59 -0800 Cc: Linux-MM , Linux Kernel Mailing List , Andrew Morton , Andrea Arcangeli , Andrew Cooper , Andy Lutomirski , Dave Hansen , Peter Xu , Peter Zijlstra , Thomas Gleixner , Will Deacon , Yu Zhao , Nick Piggin , "x86@kernel.org" Content-Transfer-Encoding: quoted-printable Message-Id: References: <20220311190749.338281-1-namit@vmware.com> <20220311190749.338281-3-namit@vmware.com> <70e08bd5-187a-daee-2822-1d9a437a9cff@intel.com> To: Dave Hansen X-Mailer: Apple Mail (2.3693.60.0.1.1) X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RDNS_NONE, SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Mar 11, 2022, at 12:59 PM, Dave Hansen = wrote: >=20 > On 3/11/22 12:38, Nadav Amit wrote: >>> On Mar 11, 2022, at 11:41 AM, Dave Hansen = wrote: > ... >>> Can any sane code trigger this? >>=20 >> Well, regarding this question and the previous one: I do not think = that >> this scenario is possible today since mprotect() holds the mmap_lock >> for write. There is no other code that I am aware of that toggles >> the NX bit on a present entry. >>=20 >> But I will not bet my life on it. That=E2=80=99s the reason for the = somewhat >> vague phrasing that I used. >=20 > =46rom the userspace perspective, mmap(MAP_FIXED) can do this too. = But, > sane userspace can't rely on the syscall to have done any work and the > TLB flushing is currently done before the syscall returns. >=20 > I'd put it this way: >=20 > Today, it is possible for a thread to end up in access_error() > for a PF_INSN fault and observe a VM_EXEC VMA. If you are > generous, this could be considered a spurious fault. >=20 > However, the faulting thread would have had to race with the > thread which was changing the PTE and the VMA and is currently > *in* mprotect() (or some other syscall). In other words, the > faulting thread can encounter this situation, but it never had > any assurance from the kernel that it wouldn't fault. This is > because the faulting thread never had a chance to observe the > syscall return. >=20 > There is no evidence that the existing behavior can cause any > issues with sane userspace. Done. Thanks. >=20 >>>> index d0074c6ed31a..ad0ef0a6087a 100644 >>>> --- a/arch/x86/mm/fault.c >>>> +++ b/arch/x86/mm/fault.c >>>> @@ -1107,10 +1107,28 @@ access_error(unsigned long error_code, = struct vm_area_struct *vma) >>>> (error_code & X86_PF_INSTR), = foreign)) >>>> return 1; >>>>=20 >>>> - if (error_code & X86_PF_WRITE) { >>>> + if (error_code & (X86_PF_WRITE | X86_PF_INSTR)) { >>>> + /* >>>> + * CPUs are not expected to set the two error code bits >>>> + * together, but to ensure that hypervisors do not = misbehave, >>>> + * run an additional sanity check. >>>> + */ >>>> + if ((error_code & (X86_PF_WRITE|X86_PF_INSTR)) =3D=3D >>>> + (X86_PF_WRITE|X86_PF_INSTR)) { >>>> + WARN_ON_ONCE(1); >>>> + return 1; >>>> + } >>>=20 >>> access_error() is only used on the do_user_addr_fault() side of = things. >>> Can we stick this check somewhere that also works for kernel address >>> faults? This is a generic sanity check. It can also be in a = separate >>> patch. >>=20 >> I can wrap it in a different function and also call it from >> do_kern_addr_fault() or spurious_kernel_fault(). >>=20 >> Anyhow, spurious_kernel_fault() should handle spurious faults on >> executable code correctly.=20 >=20 > This is really about checking the sanity of the "hardware"-provided > error code. Let's just do it in handle_page_fault(), maybe hidden in = a > function like: >=20 > void check_error_code_sanity(unsigned long error_code) > { > WARN_ON_ONCE(...); > } >=20 > You can leave the X86_PF_PK check in place for now. It's probably = going > away soon anyway. Done. Thanks. But note that removing the check from access_error() means that if the assertion is broken, userspace might crash inadvertently (in contrast to the version I sent, which would have potentially led to infinite stream of page-faults). I don=E2=80=99t know which behavior is = better, so let=E2=80=99s go with your version and just hope it doesn=E2=80=99t = happen. >=20 >>> Also, we should *probably* stop talking about CPUs here. If there's >>> ever something wonky with error code bits, I'd put my money on a = weird >>> hypervisor before any kind of CPU issue. >>=20 >> I thought I manage to convey exactly that in the comment. Can you = provide >> a better phrasing? >=20 > Maybe: >=20 > /* > * X86_PF_INSTR for instruction _fetches_. Fetches never write. > * X86_PF_WRITE should never be set with X86_PF_INSTR. > * > * This is most likely due to a buggy hypervisor. > */ Done, thank you.