From: Pavel Tatashin
To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
        linux-kernel@vger.kernel.org, Alexander.Levin@microsoft.com,
        dan.j.williams@intel.com, sathyanarayanan.kuppuswamy@intel.com,
        pankaj.laxminarayan.bharadiya@intel.com, akuster@mvista.com,
        cminyard@mvista.com, pasha.tatashin@oracle.com,
        gregkh@linuxfoundation.org, stable@vger.kernel.org
Subject: [PATCH 4.1 41/65] kaiser: vmstat show NR_KAISERTABLE as nr_overhead
Date: Mon, 5 Mar 2018 19:25:14 -0500
Message-Id: <20180306002538.1761-42-pasha.tatashin@oracle.com>
X-Mailer: git-send-email 2.16.2
In-Reply-To: <20180306002538.1761-1-pasha.tatashin@oracle.com>
References: <20180306002538.1761-1-pasha.tatashin@oracle.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Hugh Dickins

The kaiser update made an interesting choice, never to free any shadow
page tables.  Contention on the global spinlock was worrying,
particularly with it held across page table scans when freeing.
Something had to be done: I was going to add refcounting; but simply
never to free them is an appealing choice, minimizing contention
without complicating the code (the more often a page table is found
already in place, the less the spinlock is used).

But leaking pages in this way is also a worry: can we get away with
it?  At the very least, we need a count to show how bad it actually
gets: in principle, one might end up wasting about 1/256 of memory
that way (1/512 for when direct-mapped pages have to be user-mapped,
plus 1/512 for when they are user-mapped from the vmalloc area on
another occasion; but we don't have vmalloc'ed stacks, so only large
ldts are vmalloc'ed).

Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).

In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too).  But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).
Signed-off-by: Hugh Dickins
Acked-by: Jiri Kosina
Signed-off-by: Greg Kroah-Hartman
(cherry picked from commit 3e3d38fd9832e82a8cb1a5b1154acfa43ac08d15)
Signed-off-by: Pavel Tatashin
---
 arch/x86/mm/kaiser.c   | 16 +++++++++++-----
 include/linux/mmzone.h |  3 ++-
 mm/vmstat.c            |  1 +
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
index 8996f3292596..50d650799f39 100644
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -122,9 +122,11 @@ static pte_t *kaiser_pagetable_walk(unsigned long address, bool is_atomic)
 		if (!new_pmd_page)
 			return NULL;
 		spin_lock(&shadow_table_allocation_lock);
-		if (pud_none(*pud))
+		if (pud_none(*pud)) {
 			set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
-		else
+			__inc_zone_page_state(virt_to_page((void *)
+						new_pmd_page), NR_KAISERTABLE);
+		} else
 			free_page(new_pmd_page);
 		spin_unlock(&shadow_table_allocation_lock);
 	}
@@ -140,9 +142,11 @@ static pte_t *kaiser_pagetable_walk(unsigned long address, bool is_atomic)
 		if (!new_pte_page)
 			return NULL;
 		spin_lock(&shadow_table_allocation_lock);
-		if (pmd_none(*pmd))
+		if (pmd_none(*pmd)) {
 			set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
-		else
+			__inc_zone_page_state(virt_to_page((void *)
+						new_pte_page), NR_KAISERTABLE);
+		} else
 			free_page(new_pte_page);
 		spin_unlock(&shadow_table_allocation_lock);
 	}
@@ -206,11 +210,13 @@ static void __init kaiser_init_all_pgds(void)
 	pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
 	for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
 		pgd_t new_pgd;
-		pud_t *pud = pud_alloc_one(&init_mm, PAGE_OFFSET + i * PGDIR_SIZE);
+		pud_t *pud = pud_alloc_one(&init_mm,
+					   PAGE_OFFSET + i * PGDIR_SIZE);
 		if (!pud) {
 			WARN_ON(1);
 			break;
 		}
+		inc_zone_page_state(virt_to_page(pud), NR_KAISERTABLE);
 		new_pgd = __pgd(_KERNPG_TABLE |__pa(pud));
 		/*
 		 * Make sure not to stomp on some other pgd entry.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 54d74f6eb233..42c56e0c947f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -131,8 +131,9 @@ enum zone_stat_item {
 	NR_SLAB_RECLAIMABLE,
 	NR_SLAB_UNRECLAIMABLE,
 	NR_PAGETABLE,		/* used for pagetables */
-	NR_KERNEL_STACK,
 	/* Second 128 byte cacheline */
+	NR_KERNEL_STACK,
+	NR_KAISERTABLE,
 	NR_UNSTABLE_NFS,	/* NFS unstable pages */
 	NR_BOUNCE,
 	NR_VMSCAN_WRITE,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4f5cd974e11a..8e0cbcd0fccc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -714,6 +714,7 @@ const char * const vmstat_text[] = {
 	"nr_slab_unreclaimable",
 	"nr_page_table_pages",
 	"nr_kernel_stack",
+	"nr_overhead",
 	"nr_unstable",
 	"nr_bounce",
 	"nr_vmscan_write",
-- 
2.16.2