Date: Tue, 23 Jun 2020 20:03:35 +0000
From: "Andersen, John"
To: Andy Lutomirski
Cc: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, X86 ML, "H. Peter Anvin", Shuah Khan,
 "Christopherson, Sean J", Liran Alon, Andrew Jones, Rick Edgecombe,
 Kristen Carlson Accardi, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, mchehab+huawei@kernel.org, Greg KH, "Paul E. McKenney",
 pawan.kumar.gupta@linux.intel.com, Juergen Gross, Mike Kravetz,
 Oliver Neukum, Peter Zijlstra, Fenghua Yu, reinette.chatre@intel.com,
 vineela.tummalapalli@intel.com, Dave Hansen, Arjan van de Ven,
 caoj.fnst@cn.fujitsu.com, Baoquan He, Arvind Sankar, Kees Cook,
 Dan Williams, eric.auger@redhat.com, aaronlewis@google.com, Peter Xu,
 makarandsonare@google.com, "open list:DOCUMENTATION", LKML, kvm list,
 "open list:KERNEL SELFTEST FRAMEWORK", Kernel Hardening
Subject: Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
Message-ID: <20200623200334.GA23@6540770db1d7>
References: <20200617190757.27081-1-john.s.andersen@intel.com>
 <20200617190757.27081-5-john.s.andersen@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 19, 2020 at 10:13:25PM -0700, Andy Lutomirski wrote:
> On Wed, Jun 17, 2020 at 12:05 PM John Andersen wrote:
> >
> > Guests using the kexec system call currently do not support
> > paravirtualized control register pinning. This is due to early boot
> > code writing known good values to control registers, these values do
> > not contain the protected bits. This is due to CPU feature
> > identification being done at a later time, when the kernel properly
> > checks if it can enable protections. As such, the pv_cr_pin command line
> > option has been added which instructs the kernel to disable kexec in
> > favor of enabling paravirtualized control register pinning. crashkernel
> > is also disabled when the pv_cr_pin parameter is specified due to its
> > reliance on kexec.
>
> Is there a plan for fixing this for real? I'm wondering if there is a
> sane weakening of this feature that still allows things like kexec.
>

I'm pretty sure kexec can be fixed. I had it working at one point, and
I'm currently in the process of revalidating this. The issue, though,
was that kexec only worked within the guest, not on the physical host.
I suspect that is related to the need for supervisor pages to be mapped,
which seems to be required before enabling SMAP (based on what I'd seen
with the selftests and unittests). I was also just blindly turning on
the bits without checking for support when I tried this, so that could
have been the issue too.

I think most of the changes for just blindly enabling the bits were in
relocate_kernel, secondary_startup_64, and startup_32.

> What happens if a guest tries to reset? For that matter, what happens
> when a guest vCPU sends SIPI to another guest vCPU? The target CPU
> starts up in real mode, right?
>

In this case we hit kvm_vcpu_reset, where we clear pinning. Yes, I
believe it starts up in real mode.

> There's no SMEP or SMAP in real mode, and real mode has basically no security
> mitigations at all.
>

We'd thought about the switch to real mode being a case where we'd want
to drop pinning. However, we weren't sure how much, if at all, that
weakens this protection.

Unless someone knows, I'll probably need to do some digging into what an
exploit might look like that tries switching to real mode and switching
back as a way around this protection.

If we can use the switch to real mode as a trigger to drop pinning, then
I think that might just solve the kexec problem.

> PCID is an odd case. I see no good reason to pin it, and pinning PCID
> on prevents use of 32-bit mode.

Maybe it makes sense to default to the values we have, but allow host
userspace to overwrite the allowed values, in case some other guest OS
wants to do something that Linux doesn't with PCID or other bits.
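
To make the behavior discussed above concrete, here is a toy Python model
of "pinned on" bits with a host-controlled allowed mask. It is only a
sketch under stated assumptions: the class and method names are
hypothetical and are not taken from the patch series, it models only bits
pinned to 1, and it stands in for the idea of kvm_vcpu_reset clearing
pinning rather than reproducing any KVM code. The CR4 bit positions
(PCIDE = 17, SMEP = 20, SMAP = 21) are the architectural ones.

```python
# Toy model only. Names are hypothetical; this is not the KVM implementation.

CR4_PCIDE = 1 << 17  # architectural CR4 bit positions
CR4_SMEP = 1 << 20
CR4_SMAP = 1 << 21


class PinnedCR4:
    """Models a guest CR4 whose bits can be pinned to 1."""

    def __init__(self, allowed_mask):
        # Bits the host permits a guest to pin. Host userspace overriding
        # the defaults would amount to passing a different mask here.
        self.allowed_mask = allowed_mask
        self.pinned = 0
        self.value = 0

    def pin(self, bits):
        """Guest asks to pin bits; refuse anything the host does not allow."""
        if bits & ~self.allowed_mask:
            return False
        self.pinned |= bits
        return True

    def write(self, new_value):
        """Reject a CR4 write that would clear any pinned bit."""
        if self.pinned & ~new_value:
            return False
        self.value = new_value
        return True

    def reset(self):
        """vCPU reset drops pinning, so early boot code can write CR4 freely."""
        self.pinned = 0
        self.value = 0


if __name__ == "__main__":
    cr4 = PinnedCR4(allowed_mask=CR4_SMEP | CR4_SMAP)
    cr4.write(CR4_SMEP | CR4_SMAP)
    cr4.pin(CR4_SMEP | CR4_SMAP)
    print("clear SMAP while pinned:", cr4.write(CR4_SMEP))
    cr4.reset()
    print("clear everything after reset:", cr4.write(0))
```

In this model, leaving PCIDE out of the allowed mask means a guest cannot
pin it at all, which is one way to express the "no good reason to pin
PCID" position. Whether a switch to real mode should behave like reset()
is exactly the open question in the thread, and the sketch takes no
position on it.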