Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp26255imm; Wed, 29 Aug 2018 12:58:28 -0700 (PDT) X-Google-Smtp-Source: ANB0VdaOdmllMyb0UBzIfF/la4kCqBuS3UADA8xgudZk5urDBXvAk3MT1aX/Tfu9BtfjMrPm8dUK X-Received: by 2002:a63:d20e:: with SMTP id a14-v6mr4847381pgg.226.1535572708037; Wed, 29 Aug 2018 12:58:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1535572708; cv=none; d=google.com; s=arc-20160816; b=onGv8PvlYMjZRdk4Af4t7uJYV+yWHuKR1vvvCiq092wSxi7cu6bBDPVfvQI8bmei0w C2kt0pD1NvYnJLspHGLuT905Pt/20EGQzJslLmKcs4+dip2qokZ6XqTtSqF9FQc+xwRf fOxEEJHpIeKwtaFtu/Ui5RLwNuWDPiVe1v+s64hUrECBEGvHxg7EVTDJhidKYTqYCfna Z+lxhmZF2lmv/HgCq5jiMWcKbPdBtjYg0vmjujV1cwRd+tztjFTcfgosykOhZt161o5T k773D0WXI5VTZQIkizug5xnOpAuiAlE4BIaN4XYmssWCseQ/qjp2cK2j1XrBc/Pfnmp2 HUQQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date :arc-authentication-results; bh=vtUWS8XgeSRwh9px562zhz22x72BIIc9h2mnI+0Po3g=; b=RAZa82++biqCCl1gO6S9eBsun91DIjE49ICTAU6+qFHSJc+lqXYkhGvu9I6fgIyMYL aWAogaJ/PXANScr1aPgapz6KZKEbjyMmdOg0fz9/W0lql3mTfhZMh5Ut0QQYoPiuesWB UT4wQB3bw6SWpmvNhbvyyOpVotwTu7ATXU7M2Hqt66Ppu35j6u96wR+FvIJnoATZPY0M 1QCNlHVSAEPJx5qGfZgDarf1cZ8UzD5Ar/EPl3gRDdNsKewQYszZ6XPVUCqayzQUH3nt C3SutGP3oZ5mjx5+uiRfLX/VsGtAeu+2aeTVF+wlwUe9xU2S3eJoqFTCM9Wxv+fiUEsU oAnQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id bb4-v6si4462100plb.467.2018.08.29.12.58.11; Wed, 29 Aug 2018 12:58:28 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728139AbeH2Xyd (ORCPT + 99 others); Wed, 29 Aug 2018 19:54:33 -0400 Received: from mga18.intel.com ([134.134.136.126]:26602 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727245AbeH2Xyd (ORCPT ); Wed, 29 Aug 2018 19:54:33 -0400 X-Amp-Result: UNSCANNABLE X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 29 Aug 2018 12:56:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,304,1531810800"; d="scan'208";a="85570799" Received: from sjchrist-coffee.jf.intel.com (HELO linux.intel.com) ([10.54.74.20]) by fmsmga001.fm.intel.com with ESMTP; 29 Aug 2018 12:56:07 -0700 Date: Wed, 29 Aug 2018 12:56:07 -0700 From: Sean Christopherson To: Brijesh Singh Cc: x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Tom Lendacky , Thomas Gleixner , Borislav Petkov , "H. Peter Anvin" , Paolo Bonzini , Radim =?utf-8?B?S3LEjW3DocWZ?= Subject: Re: [PATCH v3 4/4] x86/kvm: use __decrypted attribute in shared variables Message-ID: <20180829195558.GA6801@linux.intel.com> References: <1535567040-1370-1-git-send-email-brijesh.singh@amd.com> <1535567040-1370-5-git-send-email-brijesh.singh@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1535567040-1370-5-git-send-email-brijesh.singh@amd.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 29, 2018 at 01:24:00PM -0500, Brijesh Singh wrote: > The following commit: > > 368a540e0232 (x86/kvmclock: Remove memblock dependency) Checkpatch prefers: Commit 368a540e0232 ("x86/kvmclock: Remove memblock dependency") That'll also save three lines in the commit message. > caused SEV guest regression. When SEV is active, we map the shared > variables (wall_clock and hv_clock_boot) with C=0 to ensure that both > the guest and the hypervisor is able to access the data. To map the Nit: s/is/are > variables we use kernel_physical_mapping_init() to split the large pages, > but this routine fails to allocate a new page. Before the above commit, > kvmclock initialization was called after memory allocator was available > but now its called early during boot. What about something like this to make the issue a bit clearer: variables we use kernel_physical_mapping_init() to split the large pages, but splitting large pages requires allocating a new PMD, which fails now that kvmclock initialization is called early during boot. > Recently we added a special .data..decrypted section to hold the shared > variables. This section is mapped with C=0 early during boot. Use > __decrypted attribute to put the wall_clock and hv_clock_boot in > .data..decrypted section so that they are mapped with C=0. > > Signed-off-by: Brijesh Singh > Fixes: 368a540e0232 ("x86/kvmclock: Remove memblock dependency") > Cc: Tom Lendacky > Cc: kvm@vger.kernel.org > Cc: Thomas Gleixner > Cc: Borislav Petkov > Cc: "H. Peter Anvin" > Cc: linux-kernel@vger.kernel.org > Cc: Paolo Bonzini > Cc: Sean Christopherson > Cc: kvm@vger.kernel.org > Cc: "Radim Krčmář" > --- > arch/x86/kernel/kvmclock.c | 30 +++++++++++++++++++++++++----- > 1 file changed, 25 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > index 1e67646..08f5f8a 100644 > --- a/arch/x86/kernel/kvmclock.c > +++ b/arch/x86/kernel/kvmclock.c > @@ -28,6 +28,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -61,8 +62,8 @@ early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall); > (PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info)) > > static struct pvclock_vsyscall_time_info > - hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __aligned(PAGE_SIZE); > -static struct pvclock_wall_clock wall_clock; > + hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE); > +static struct pvclock_wall_clock wall_clock __decrypted; > static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu); > > static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void) > @@ -267,10 +268,29 @@ static int kvmclock_setup_percpu(unsigned int cpu) > return 0; > > /* Use the static page for the first CPUs, allocate otherwise */ > - if (cpu < HVC_BOOT_ARRAY_SIZE) > + if (cpu < HVC_BOOT_ARRAY_SIZE) { > p = &hv_clock_boot[cpu]; > - else > - p = kzalloc(sizeof(*p), GFP_KERNEL); > + } else { > + int rc; > + unsigned int sz = sizeof(*p); > + > + if (sev_active()) > + sz = PAGE_ALIGN(sz); This is a definite downside to the section approach. Unless I missed something, the section padding goes to waste since we don't have a mechanism in place to allocate into that section, e.g. as is we're burning nearly 2mb of data since we're only using 4k of the 2mb page. And every decrypted allocation can potentially fracture a large page since the allocator is unaware of the decrypted requirement. Might not be an issue for kvmclock since it's a one-time allocation, but we could suffer death by a thousand cuts if there are scenarios where a decrypted allocation isn't be persistent (VirtIO queues maybe?). Duplicating the full kernel tables for C=0 accesses doesn't suffer from these issues. And I think potential corruption issues due to mis-{aligned,size} objects can be detected through static analysis, build assertions and/or runtime checks. > + p = kzalloc(sz, GFP_KERNEL); For the SEV case, can't we do a straight kmalloc() since we zero out the page after decrypting it? > + > + /* > + * The physical address of per-cpu variable will be shared with > + * the hypervisor. Let's clear the C-bit before we assign the > + * memory to per_cpu variable. > + */ > + if (p && sev_active()) { > + rc = set_memory_decrypted((unsigned long)p, sz >> PAGE_SHIFT); > + if (rc) > + return rc; > + memset(p, 0, sz); > + } > + } > > per_cpu(hv_clock_per_cpu, cpu) = p; > return p ? 0 : -ENOMEM; > -- > 2.7.4 >