Date: Fri, 3 Jul 2020 21:48:14 +0000
From: "Andersen, John"
To: Andy Lutomirski
Cc: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML, "H.
Peter Anvin", Shuah Khan, "Christopherson, Sean J", Liran Alon, Andrew Jones,
 Rick Edgecombe, Kristen Carlson Accardi, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, mchehab+huawei@kernel.org, Greg KH,
 "Paul E. McKenney", pawan.kumar.gupta@linux.intel.com, Juergen Gross,
 Mike Kravetz, Oliver Neukum, Peter Zijlstra, Fenghua Yu,
 reinette.chatre@intel.com, vineela.tummalapalli@intel.com, Dave Hansen,
 Arjan van de Ven, caoj.fnst@cn.fujitsu.com, Baoquan He, Arvind Sankar,
 Kees Cook, Geremy Condra, Dan Williams, eric.auger@redhat.com,
 aaronlewis@google.com, Peter Xu, makarandsonare@google.com,
 "open list:DOCUMENTATION", LKML, kvm list,
 "open list:KERNEL SELFTEST FRAMEWORK", Kernel Hardening
Subject: Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
Message-ID: <20200703214814.GA25@0e1a9e0069b7>
References: <20200617190757.27081-1-john.s.andersen@intel.com>
 <20200617190757.27081-5-john.s.andersen@intel.com>
 <20200623200334.GA23@6540770db1d7>
In-Reply-To: <20200623200334.GA23@6540770db1d7>
X-Mailing-List: linux-kernel@vger.kernel.org

> > Is there a plan for fixing this for real? I'm wondering if there is a
> > sane weakening of this feature that still allows things like kexec.
> >
>
> I'm pretty sure kexec can be fixed. I had it working at one point, I'm
> currently in the process of revalidating this. The issue was though that
> kexec only worked within the guest, not on the physical host, which I suspect
> is related to the need for supervisor pages to be mapped, which seems to be
> required before enabling SMAP (based on what I'd seen with the selftests and
> unittests). I was also just blindly turning on the bits without checking for
> support when I'd tried this, so that could have been the issue too.
>
> I think most of the changes for just blindly enabling the bits were in
> relocate_kernel, secondary_startup_64, and startup_32.
>

So I have a naive fix for kexec which has only been tested to work under KVM.
When tested on a physical host, it did not boot when SMAP or UMIP was set.
It is certainly not the correct way to do this, as it skips CPU feature
identification and instead sets the bits blindly.

The physical host I tested this on does not have UMIP, so that's likely why it
failed to boot when UMIP gets set blindly. Within kvm-unit-tests, the test for
SMAP maps memory as supervisor pages before enabling SMAP. I suspect this is
why setting SMAP blindly causes the physical host not to boot.

Within trampoline_32bit_src(), if I add more instructions I get an error about
"attempt to move .org backwards", which as I understand it means there is only
so much room for instructions in each of those functions.

My suspicion is that someone with more knowledge of this area has a good idea
on how best to handle this. Feedback would be much appreciated.

> > There's no SMEP or SMAP in real mode, and real mode has basically no security
> > mitigations at all.
> >
>
> We'd thought about the switch to real mode being a case where we'd want to drop
> pinning. However, we weren't sure how much weaker, if at all, it makes this
> protection.
>
> Unless someone knows, I'll probably need to do some digging into what an
> exploit might look like that tries switching to real mode and switching back as
> a way around this protection.
>

TL;DR We probably shouldn't use the switch to real mode as a trigger to drop
pinning.

This protection assumes that the attacker is at the point where they have the
ability to write a payload for a ROP/JOP attack and gain control of execution.
For this case, where we are going to switch to real mode, we need to add the
assumption that the attacker has a write primitive that allows them to write
part of their payload to memory that will be addressable in 16-bit mode. If
the attacker has this write primitive, the attack becomes: write the payloads;
in the first stage, switch to real mode; then run stage two in real mode, via
JOP or just machine code (since we don't have to worry about NX), to set up
protected mode and jump back into the kernel with protections disabled.

> > PCID is an odd case. I see no good reason to pin it, and pinning PCID
> > on prevents use of 32-bit mode.
>
> Maybe it makes sense to default to the values we have, but allow host userspace
> to overwrite the allowed values, in case some other guest OS wants to do
> something that Linux doesn't with PCID or other bits.

In the next version of this patchset I've made it so that the default allowed
values are WP, SMEP, SMAP, and UMIP. However, a write to the allowed MSR from
the host VMM (QEMU) can change which bits are allowed.
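
As a rough illustration of the allowed-mask behaviour described above, here is
a minimal userspace sketch in C. It is not code from the patchset: the struct
cr_pin_state and the helpers host_set_allowed(), guest_pin() and
cr_write_permitted() are hypothetical names, and the model makes two
assumptions the mail does not state (pinning is sticky, and the host changes
the allowed mask before the guest pins anything). Only the CR4 bit positions
are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions. */
#define X86_CR4_UMIP  (UINT64_C(1) << 11)
#define X86_CR4_PCIDE (UINT64_C(1) << 17)
#define X86_CR4_SMEP  (UINT64_C(1) << 20)
#define X86_CR4_SMAP  (UINT64_C(1) << 21)

/* Default CR4 bits a guest may pin, per the mail: SMEP, SMAP and UMIP.
 * (WP is the CR0 counterpart and would use a separate state.) */
#define CR4_PIN_ALLOWED_DEFAULT (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP)

/* Hypothetical per-register state; not the patchset's actual layout. */
struct cr_pin_state {
    uint64_t allowed; /* bits the host lets the guest pin */
    uint64_t pinned;  /* bits the guest has pinned so far */
};

/* Models the host VMM writing the allowed MSR.
 * Assumption: this happens before the guest pins anything. */
static inline void host_set_allowed(struct cr_pin_state *s, uint64_t mask)
{
    s->allowed = mask;
}

/* Models the guest asking to pin bits. The request is refused if it names
 * any bit outside the allowed mask. Assumption: pinned bits accumulate. */
static inline bool guest_pin(struct cr_pin_state *s, uint64_t request)
{
    if (request & ~s->allowed)
        return false;
    s->pinned |= request;
    return true;
}

/* A control register write is acceptable only if every pinned bit
 * is still set in the new value. */
static inline bool cr_write_permitted(const struct cr_pin_state *s,
                                      uint64_t new_value)
{
    return (new_value & s->pinned) == s->pinned;
}
```

With the default mask, a request to pin PCIDE is refused while SMEP and SMAP
can be pinned. Once the host widens the allowed mask to include PCIDE, the
same request succeeds, which is the case of a guest OS that wants to do
something Linux doesn't.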