Date: Thu, 29 Feb 2024 10:40:03 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240228024147.41573-1-seanjc@google.com> <20240228024147.41573-3-seanjc@google.com>
Subject: Re: [PATCH 02/16] KVM: x86: Remove separate "bit" defines for page fault error code masks
From: Sean Christopherson
To: Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yan Zhao, Isaku Yamahata, Michael Roth, Yu Zhang, Chao Peng, Fuad Tabba, David Matlack

On Thu, Feb 29, 2024, Paolo Bonzini wrote:
> On Wed, Feb 28, 2024 at 3:46 AM Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 60f21bb4c27b..e8b620a85627 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -213,7 +213,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> >          */
> >         u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
> >         bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
> > -       int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
> > +       int index = (pfec + (not_smap << ilog2(PFERR_RSVD_MASK))) >> 1;
>
> Just use "(pfec + (not_smap ? PFERR_RSVD_MASK : 0)) >> 1".
>
> Likewise below, "pte_access & PT_USER_MASK ? PFERR_RSVD_MASK : 0".
>
> No need to even check what the compiler produces, it will be either
> exactly the same code or a bunch of cmov instructions.

I couldn't resist :-)

The second one generates identical code, but for this one:

	int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;

gcc generates almost bizarrely different code in the call from vcpu_mmio_gva_to_gpa().
clang is clever enough to realize "pfec" can only contain USER_MASK and/or WRITE_MASK,
and so does a ton of dead code elimination and other optimizations.  But for some
reason, gcc doesn't appear to realize that, and generates a MOVSX when computing
"index", i.e. sign-extends the result of the ADD (at least, I think that's what it's
doing).

There's no actual bug today, and the vcpu_mmio_gva_to_gpa() path is super safe
since KVM fully controls the error code.  But the call from FNAME(walk_addr_generic)
uses a _much_ more dynamic error code.
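
For reference, a minimal userspace sketch of why the shift form and the suggested ternary form are interchangeable for a bool operand. The EX_* names and both helper functions are hypothetical stand-ins, not the kernel's definitions; the only fact assumed is that RSVD is bit 3 of the x86 #PF error code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel defines (RSVD assumed to be bit 3). */
#define EX_PFERR_RSVD_BIT  3
#define EX_PFERR_RSVD_MASK (UINT64_C(1) << EX_PFERR_RSVD_BIT)

/* Original form: shift the bool into the RSVD bit position. */
static inline uint64_t rsvd_by_shift(bool not_smap)
{
	return (uint64_t)not_smap << EX_PFERR_RSVD_BIT;
}

/* Suggested form: select the mask directly, so no bit number is needed. */
static inline uint64_t rsvd_by_select(bool not_smap)
{
	return not_smap ? EX_PFERR_RSVD_MASK : 0;
}
```

Because a bool is only ever 0 or 1, both helpers return the same value for either input, which is why dropping the separate "bit" define costs nothing semantically. Whether a given compiler emits the same instructions for both is a separate question, as the gcc observation above shows.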

If an error code with unexpected bits set managed to get into permission_fault(),
I'm pretty sure we'd end up with out-of-bounds accesses.  KVM sanity checks that
PK and RSVD aren't set,

	WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));

but KVM unnecessarily uses an ADD instead of OR, here

	int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;

and here

		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
		offset = (pfec & ~1) +
			((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));

i.e. if the WARN fired, KVM would generate completely unexpected values due to
adding two RSVD bit flags.

And if _really_ unexpected flags make their way into permission_fault(), e.g. the
upcoming RMP flag (bit 31) or Intel's SGX flag (bit 15), then the use of index

	fault = (mmu->permissions[index] >> pte_access) & 1;

could generate a read waaay outside of the array.  It can't/shouldn't happen in
practice since KVM shouldn't be trying to emulate RMP violations or faults in SGX
enclaves, but it's unnecessarily dangerous.

Long story short, I think we should get to the below (I'll post a separate series,
assuming I'm not missing something).

	unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
	unsigned int pfec = access & (PFERR_PRESENT_MASK |
				      PFERR_WRITE_MASK |
				      PFERR_USER_MASK |
				      PFERR_FETCH_MASK);

	/*
	 * For explicit supervisor accesses, SMAP is disabled if EFLAGS.AC = 1.
	 * For implicit supervisor accesses, SMAP cannot be overridden.
	 *
	 * SMAP works on supervisor accesses only, and not_smap can
	 * be set or not set when user access with neither has any bearing
	 * on the result.
	 *
	 * We put the SMAP checking bit in place of the PFERR_RSVD_MASK bit;
	 * this bit will always be zero in pfec, but it will be one in index
	 * if SMAP checks are being disabled.
	 */
	u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
	bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
	int index = (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1;
	u32 errcode = PFERR_PRESENT_MASK;
	bool fault;

	kvm_mmu_refresh_passthrough_bits(vcpu, mmu);

	fault = (mmu->permissions[index] >> pte_access) & 1;

	/*
	 * Sanity check that no bits are set in the legacy #PF error code
	 * (bits 31:0) other than the supported permission bits (see above).
	 */
	WARN_ON_ONCE(pfec != (unsigned int)access);

	if (unlikely(mmu->pkru_mask)) {
		u32 pkru_bits, offset;

		/*
		 * PKRU defines 32 bits, there are 16 domains and 2
		 * attribute bits per domain in pkru.  pte_pkey is the
		 * index of the protection domain, so pte_pkey * 2 is
		 * the index of the first bit for the domain.
		 */
		pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;

		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
		offset = (pfec & ~1) | (pte_access & PT_USER_MASK ? PFERR_RSVD_MASK : 0);

		pkru_bits &= mmu->pkru_mask >> offset;
		errcode |= -pkru_bits & PFERR_PK_MASK;
		fault |= (pkru_bits != 0);
	}

	return -(u32)fault & errcode;
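
To make the ADD-versus-OR hazard concrete, here is a small userspace sketch of the two index computations. Everything in it is illustrative: the EX_* constants and the index_add()/index_or() helpers are hypothetical names, the low bit positions are assumed from the x86 #PF error code layout, and the 16-entry bound is inferred from the four index bits that survive the mask and shift rather than taken from the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PFERR_* masks (assumed bit positions). */
#define EX_PRESENT   (1u << 0)
#define EX_WRITE     (1u << 1)
#define EX_USER      (1u << 2)
#define EX_RSVD      (1u << 3)
#define EX_FETCH     (1u << 4)
#define EX_SGX       (1u << 15)
#define EX_SUPPORTED (EX_PRESENT | EX_WRITE | EX_USER | EX_FETCH)

/* Current scheme: no masking, and ADD folds not_smap into the RSVD slot. */
static inline unsigned int index_add(uint32_t pfec, bool not_smap)
{
	return (pfec + (not_smap ? EX_RSVD : 0)) >> 1;
}

/* Proposed scheme: drop unsupported bits first, then OR in the SMAP bit. */
static inline unsigned int index_or(uint32_t access, bool not_smap)
{
	uint32_t pfec = access & EX_SUPPORTED;

	return (pfec | (not_smap ? EX_RSVD : 0)) >> 1;
}
```

With a stray RSVD bit and not_smap set, index_add(EX_WRITE | EX_RSVD, true) evaluates to 9 because the ADD carries into the FETCH position, while index_or() yields 5. With the SGX bit set, index_add(EX_SGX | EX_WRITE, false) evaluates to 16385, far past a 16-entry table, whereas index_or() stays at 1 and can never exceed 15.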