Date: Thu, 29 Aug 2019 00:31:34 +0200 (CEST)
From: Thomas Gleixner
To: Song Liu
Cc: Dave Hansen, LKML, "x86@kernel.org", Joerg Roedel, Andy Lutomirski,
    Peter Zijlstra, Rik van Riel, Steven Rostedt
Subject: [PATCH] x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text
In-Reply-To: <9B34E971-20ED-4A58-B086-AB94990B5A26@fb.com>
References: <20190828142445.454151604@linutronix.de>
    <20190828143123.971884723@linutronix.de>
    <55bb026c-5d54-6ebf-608f-3f376fbec4e5@intel.com>
    <309E5006-E869-4761-ADE2-ADB7A1A63FF1@fb.com>
    <9B34E971-20ED-4A58-B086-AB94990B5A26@fb.com>

ftrace does not use text_poke() for enabling trace functionality. It uses
its own mechanism and flips the whole kernel text to RW and back to RO.

The CPA rework removed a loop based check of 4k pages which tried to
preserve a large page by checking each 4k page whether the change would
actually cover all pages in the large page.

This resulted in endless loops for nothing as in testing it turned out
that it actually never preserved anything. Of course testing failed to
include ftrace, which is the one and only case which benefitted from the
4k loop.

As a consequence enabling function tracing or ftrace based kprobes results
in a full 4k split of the kernel text, which affects iTLB performance.

The kernel RO protection is the only valid case where this can actually
preserve large pages.

All other static protections (RO data, data NX, PCI, BIOS) are truly
static. So a conflict with those protections which results in a split
should only ever happen when a change of memory next to a protected region
is attempted. But these conflicts are rightfully splitting the large page
to preserve the protected regions. In fact a change to the protected
regions itself is a bug and is warned about.

Add an exception for the static protection check for kernel text RO when
the to be changed region spans a full large page, which allows the large
mappings to be preserved. This also prevents the syslog from being spammed
about CPA violations when ftrace is used.

The exception needs to be removed once ftrace has switched over to
text_poke(), which avoids the whole issue.

Fixes: 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
Reported-by: Song Liu
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org
---
 arch/x86/mm/pageattr.c |   26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -516,7 +516,7 @@ static inline void check_conflict(int wa
  */
 static inline pgprot_t static_protections(pgprot_t prot, unsigned long start,
 					  unsigned long pfn, unsigned long npg,
-					  int warnlvl)
+					  unsigned long lpsize, int warnlvl)
 {
 	pgprotval_t forbidden, res;
 	unsigned long end;
@@ -535,9 +535,17 @@ static inline pgprot_t static_protection
 	check_conflict(warnlvl, prot, res, start, end, pfn, "Text NX");
 	forbidden = res;
 
-	res = protect_kernel_text_ro(start, end);
-	check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO");
-	forbidden |= res;
+	/*
+	 * Special case to preserve a large page. If the change spawns the
+	 * full large page mapping then there is no point to split it
+	 * up. Happens with ftrace and is going to be removed once ftrace
+	 * switched to text_poke().
+	 */
+	if (lpsize != (npg * PAGE_SIZE) || (start & (lpsize - 1))) {
+		res = protect_kernel_text_ro(start, end);
+		check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO");
+		forbidden |= res;
+	}
 
 	/* Check the PFN directly */
 	res = protect_pci_bios(pfn, pfn + npg - 1);
@@ -819,7 +827,7 @@ static int __should_split_large_page(pte
 	 * extra conditional required here.
 	 */
 	chk_prot = static_protections(old_prot, lpaddr, old_pfn, numpages,
-				      CPA_CONFLICT);
+				      psize, CPA_CONFLICT);
 
 	if (WARN_ON_ONCE(pgprot_val(chk_prot) != pgprot_val(old_prot))) {
 		/*
@@ -855,7 +863,7 @@ static int __should_split_large_page(pte
 	 * protection requirement in the large page.
 	 */
 	new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
-				      CPA_DETECT);
+				      psize, CPA_DETECT);
 
 	/*
 	 * If there is a conflict, split the large page.
@@ -906,7 +914,8 @@ static void split_set_pte(struct cpa_dat
 	if (!cpa->force_static_prot)
 		goto set;
 
-	prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT);
+	/* Hand in lpsize = 0 to enforce the protection mechanism */
+	prot = static_protections(ref_prot, address, pfn, npg, 0, CPA_PROTECT);
 
 	if (pgprot_val(prot) == pgprot_val(ref_prot))
 		goto set;
@@ -1503,7 +1512,8 @@ static int __change_page_attr(struct cpa
 		pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
 
 		cpa_inc_4k_install();
-		new_prot = static_protections(new_prot, address, pfn, 1,
+		/* Hand in lpsize = 0 to enforce the protection mechanism */
+		new_prot = static_protections(new_prot, address, pfn, 1, 0,
 					      CPA_PROTECT);
 
 		new_prot = pgprot_clear_protnone_bits(new_prot);
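
For readers following the condition in static_protections(), here is a minimal user-space sketch of the same predicate. It is an illustration only, not kernel code: the helper name covers_full_large_page() is hypothetical, PAGE_SIZE is assumed to be 4k, and the large page is assumed to be 2M. The patch tests the negated form of this predicate to decide whether the "Text RO" protection still has to be applied.

```c
#include <stdbool.h>

/* Assumption for this sketch: 4k base pages, as on x86. */
#define PAGE_SIZE 4096UL

/* Assumption for this sketch: a 2M large page (512 base pages). */
#define LPSIZE_2M (512UL * PAGE_SIZE)

/*
 * Hypothetical helper, not part of the patch. Returns true when the
 * range [start, start + npg * PAGE_SIZE) is exactly one large page of
 * size lpsize: the sizes match and start is aligned to lpsize.
 *
 * The patch skips the kernel text RO check only in that case. Callers
 * that pass lpsize == 0 with npg >= 1 can never match the size test,
 * so for them the protection is always enforced.
 */
static bool covers_full_large_page(unsigned long start, unsigned long npg,
				   unsigned long lpsize)
{
	/* The change must cover the large page completely ... */
	if (lpsize != npg * PAGE_SIZE)
		return false;

	/* ... and must begin on a large page boundary. */
	if (start & (lpsize - 1))
		return false;

	return true;
}
```

With these assumptions, a 512 page change starting at a 2M aligned address satisfies the predicate, while a misaligned start, a single 4k page, or lpsize = 0 does not, which matches the two call sites that hand in 0 to force the check.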