Received: by 10.223.185.116 with SMTP id b49csp3210494wrg; Mon, 5 Mar 2018 16:37:27 -0800 (PST) X-Google-Smtp-Source: AG47ELsA6n1p6MV/WhrEgeXDrEoEpC7mliCja4v4Je4xOBsBoDgJVd3K873r9gPcGGriupxn27bJ X-Received: by 10.98.16.131 with SMTP id 3mr17128346pfq.188.1520296647252; Mon, 05 Mar 2018 16:37:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1520296647; cv=none; d=google.com; s=arc-20160816; b=Abv2/LMWcRSD8vag1N7SSQTk8Y1CiiNDanWfWdWbnw17wYsU3GerVUC2N/6+sIY0RS NlWSN6sjvP7Vicn6m8vJD/mCDplMd1IGCo5npExBHFGz/X9YXw9Lu+NEGOg51ir4w1AF 4oIvuv4ziXRfHuRyMSln7UNYGftsqZ0w3mIeSoPALI/DdrxsZBYnkgA81JjnFoqwHY6Y 21wXDzTTUhZh+inSrfbdVnt9yovd+txrSVQ0oe+S59rpMS2Fm8dNMc+Cz16kuMw+IGiu qA2tZBR95KIwm8vkeBOftHAMC0IT4A4YtY7Xrzz/d0V+Ew5ByYltmk3k06CFTjRVVBGJ Tfdg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:to:from:dkim-signature:arc-authentication-results; bh=b0sYpNZcTYiUhKugHUt8j7U/BMw7oc5hNhgDLi3c3Og=; b=SHyk2RBaEYzFLXlqvBUg5UQsGnCBzOx6+f0zCVdHdiiMNZrndJ6gUe1StXMMXbK5S8 0EBZhymtmjmA4DspIiOTxvdRHIBayyth0pj9N/7MDk3zs/JrQ34db9iLTPwAq6H9KUbD mrPcpYI+170wsRfe8fMlZhc8pqOhfYQPJZKFDefOBcMXxltoy0kx2z5IgIqkXcBqsxAN 2SBrJFtFdHKE2y5s6I2o4ZluVTLSAWJHjwklf0Wa+ZDq/Hxz7wUF61syre7RPdS/BQOZ uXQLxvPRW8HY6kvdhWay5YWmH7K7hZwKXH0HN35K/p1LQIkrWmEVvQVt040EILWs66Tn JD2Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2017-10-26 header.b=g5TJO54h; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id q126si11071487pfc.43.2018.03.05.16.37.13; Mon, 05 Mar 2018 16:37:27 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2017-10-26 header.b=g5TJO54h; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933248AbeCFA0d (ORCPT + 99 others); Mon, 5 Mar 2018 19:26:33 -0500 Received: from aserp2130.oracle.com ([141.146.126.79]:34950 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933192AbeCFA02 (ORCPT ); Mon, 5 Mar 2018 19:26:28 -0500 Received: from pps.filterd (aserp2130.oracle.com [127.0.0.1]) by aserp2130.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w260Mbmc115618; Tue, 6 Mar 2018 00:26:25 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id : in-reply-to : references; s=corp-2017-10-26; bh=b0sYpNZcTYiUhKugHUt8j7U/BMw7oc5hNhgDLi3c3Og=; b=g5TJO54hqKjqzZJVD14ipgxLFq9eEbJhxBPr74JXLPOwpl43uBGssdweuNbEp7KQZyZt 5+CPDKuDnDew4YQLghoTKJ6clnI/tu9HAGmt0eJ5C/exLZqttRolCu60lpYeS3p591hG QFq5bKQunMFr4s+ZOXJyXlK4egq55D787HwEKWOrkcffAzuuQUwUvMojPtzv+jOGPN56 GJhz/QCP0AxY3BamPC4WtZuWE8qz4B/i0wvG8t9Aosb0FjBLGfXuVsKPYKHAteVqRTk8 ecqkkREAFwAa6JHQmHXYzP+AwZFo6rlXn8BErcOvFLPVMqvFzgrcFWuhRVBgsDvZZvAa gg== Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by aserp2130.oracle.com with ESMTP id 2ghdxf8k4w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 06 Mar 2018 00:26:24 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w260QNGx025523 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 6 Mar 2018 00:26:24 GMT Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w260QNh2022161; Tue, 6 Mar 2018 00:26:23 GMT Received: from localhost.localdomain (/98.216.35.41) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 05 Mar 2018 16:26:23 -0800 From: Pavel Tatashin To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com, linux-kernel@vger.kernel.org, Alexander.Levin@microsoft.com, dan.j.williams@intel.com, sathyanarayanan.kuppuswamy@intel.com, pankaj.laxminarayan.bharadiya@intel.com, akuster@mvista.com, cminyard@mvista.com, pasha.tatashin@oracle.com, gregkh@linuxfoundation.org, stable@vger.kernel.org Subject: [PATCH 4.1 28/65] kaiser: do not set _PAGE_NX on pgd_none Date: Mon, 5 Mar 2018 19:25:01 -0500 Message-Id: <20180306002538.1761-29-pasha.tatashin@oracle.com> X-Mailer: git-send-email 2.16.2 In-Reply-To: <20180306002538.1761-1-pasha.tatashin@oracle.com> References: <20180306002538.1761-1-pasha.tatashin@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8823 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1803060003 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Hugh Dickins native_pgd_clear() uses native_set_pgd(), so native_set_pgd() must avoid setting the _PAGE_NX bit on an otherwise pgd_none() entry: usually that just generated a warning on exit, but sometimes more mysterious and damaging failures (our production machines could not complete booting). The original fix to this just avoided adding _PAGE_NX to an empty entry; but eventually more problems surfaced with kexec, and EFI mapping expected to be a problem too. So now instead change native_set_pgd() to update shadow only if _PAGE_USER: A few places (kernel/machine_kexec_64.c, platform/efi/efi_64.c for sure) use set_pgd() to set up a temporary internal virtual address space, with physical pages remapped at what Kaiser regards as userspace addresses: Kaiser then assumes a shadow pgd follows, which it will try to corrupt. This appears to be responsible for the recent kexec and kdump failures; though it's unclear how those did not manifest as a problem before. Ah, the shadow pgd will only be assumed to "follow" if the requested pgd is on an even-numbered page: so I suppose it was going wrong 50% of the time all along. What we need is a flag to set_pgd(), to tell it we're dealing with userspace. Er, isn't that what the pgd's _PAGE_USER bit is saying? Add a test for that. But we cannot do the same for pgd_clear() (which may be called to clear corrupted entries - set aside the question of "corrupt in which pgd?" until later), so there just rely on pgd_clear() not being called in the problematic cases - with a WARN_ON_ONCE() which should fire half the time if it is. But this is getting too big for an inline function: move it into arch/x86/mm/kaiser.c (which then demands a boot/compressed mod); and de-void and de-space native_get_shadow/normal_pgd() while here. Signed-off-by: Hugh Dickins Acked-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman (cherry picked from commit edde73205b3fdde8c8a3adfce78cc6d0de72386b) Signed-off-by: Pavel Tatashin --- arch/x86/boot/compressed/misc.h | 1 + arch/x86/include/asm/pgtable_64.h | 51 ++++++++++----------------------------- arch/x86/mm/kaiser.c | 42 ++++++++++++++++++++++++++++++++ 3 files changed, 56 insertions(+), 38 deletions(-) diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h index 805d25ca5f1d..cbaa2ebfa88d 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -9,6 +9,7 @@ */ #undef CONFIG_PARAVIRT #undef CONFIG_PARAVIRT_SPINLOCKS +#undef CONFIG_KAISER #undef CONFIG_KASAN #include diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index b0adf0d11c9a..8629be9e6649 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -107,61 +107,36 @@ static inline void native_pud_clear(pud_t *pud) } #ifdef CONFIG_KAISER -static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp) +extern pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd); + +static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp) { - return (pgd_t *)(void*)((unsigned long)(void*)pgdp | (unsigned long)PAGE_SIZE); + return (pgd_t *)((unsigned long)pgdp | (unsigned long)PAGE_SIZE); } -static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp) +static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp) { - return (pgd_t *)(void*)((unsigned long)(void*)pgdp & ~(unsigned long)PAGE_SIZE); + return (pgd_t *)((unsigned long)pgdp & ~(unsigned long)PAGE_SIZE); } #else -static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp) +static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd) +{ + return pgd; +} +static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp) { BUILD_BUG_ON(1); return NULL; } -static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp) +static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp) { return pgdp; } #endif /* CONFIG_KAISER */ -/* - * Page table pages are page-aligned. The lower half of the top - * level is used for userspace and the top half for the kernel. - * This returns true for user pages that need to get copied into - * both the user and kernel copies of the page tables, and false - * for kernel pages that should only be in the kernel copy. - */ -static inline bool is_userspace_pgd(void *__ptr) -{ - unsigned long ptr = (unsigned long)__ptr; - - return ((ptr % PAGE_SIZE) < (PAGE_SIZE / 2)); -} - static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd) { -#ifdef CONFIG_KAISER - pteval_t extra_kern_pgd_flags = 0; - /* Do we need to also populate the shadow pgd? */ - if (is_userspace_pgd(pgdp)) { - native_get_shadow_pgd(pgdp)->pgd = pgd.pgd; - /* - * Even if the entry is *mapping* userspace, ensure - * that userspace can not use it. This way, if we - * get out to userspace running on the kernel CR3, - * userspace will crash instead of running. - */ - extra_kern_pgd_flags = _PAGE_NX; - } - pgdp->pgd = pgd.pgd; - pgdp->pgd |= extra_kern_pgd_flags; -#else /* CONFIG_KAISER */ - *pgdp = pgd; -#endif + *pgdp = kaiser_set_shadow_pgd(pgdp, pgd); } static inline void native_pgd_clear(pgd_t *pgd) diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c index f29c0db1c9b4..df7f6591d5aa 100644 --- a/arch/x86/mm/kaiser.c +++ b/arch/x86/mm/kaiser.c @@ -303,4 +303,46 @@ void kaiser_remove_mapping(unsigned long start, unsigned long size) unmap_pud_range_nofree(pgd, addr, end); } } + +/* + * Page table pages are page-aligned. The lower half of the top + * level is used for userspace and the top half for the kernel. + * This returns true for user pages that need to get copied into + * both the user and kernel copies of the page tables, and false + * for kernel pages that should only be in the kernel copy. + */ +static inline bool is_userspace_pgd(pgd_t *pgdp) +{ + return ((unsigned long)pgdp % PAGE_SIZE) < (PAGE_SIZE / 2); +} + +pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd) +{ + /* + * Do we need to also populate the shadow pgd? Check _PAGE_USER to + * skip cases like kexec and EFI which make temporary low mappings. + */ + if (pgd.pgd & _PAGE_USER) { + if (is_userspace_pgd(pgdp)) { + native_get_shadow_pgd(pgdp)->pgd = pgd.pgd; + /* + * Even if the entry is *mapping* userspace, ensure + * that userspace can not use it. This way, if we + * get out to userspace running on the kernel CR3, + * userspace will crash instead of running. + */ + pgd.pgd |= _PAGE_NX; + } + } else if (!pgd.pgd) { + /* + * pgd_clear() cannot check _PAGE_USER, and is even used to + * clear corrupted pgd entries: so just rely on cases like + * kexec and EFI never to be using pgd_clear(). + */ + if (!WARN_ON_ONCE((unsigned long)pgdp & PAGE_SIZE) && + is_userspace_pgd(pgdp)) + native_get_shadow_pgd(pgdp)->pgd = pgd.pgd; + } + return pgd; +} #endif /* CONFIG_KAISER */ -- 2.16.2