From: Saravanan D
Subject: [PATCH V2] x86/mm: Tracking linear mapping split events
Date: Wed, 27 Jan 2021 09:51:24 -0800
Message-ID: <20210127175124.3289879-1-saravanand@fb.com>
X-Mailer: git-send-email 2.24.1
X-Mailing-List: linux-kernel@vger.kernel.org

Numerous hugepage splits in the linear mapping give admins a signal
for narrowing down the sluggishness caused by TLB miss/reload.

To help with debugging, we introduce monotonic lifetime hugepage
split event counts since SYSTEM_RUNNING, displayed as part of
/proc/vmstat on x86 servers.

The lifetime split event information will be displayed at the bottom of
/proc/vmstat
....
swap_ra 0
swap_ra_hit 0
direct_map_2M_splits 139
direct_map_4M_splits 0
direct_map_1G_splits 7
nr_unstable 0
....

Ancillary debugfs split event counts are exported to userspace via
read-write endpoints: /sys/kernel/debug/x86/direct_map_[2M|4M|1G]_split

dmesg log when the user resets the debugfs split event count for
debugging
....
[  232.470531] debugfs 2M Pages split event count(128) reset to 0
....

One of the many lasting (as we don't coalesce back) sources of huge page
splits is tracing, as the granular page attribute/permission changes
force the kernel to split code segments mapped to huge pages into smaller
ones, thereby increasing the probability of TLB miss/reload even after
tracing has been stopped.

Signed-off-by: Saravanan D
---
 arch/x86/mm/pat/set_memory.c  | 117 ++++++++++++++++++++++++++++++++++
 include/linux/vm_event_item.h |   8 +++
 mm/vmstat.c                   |   8 +++
 3 files changed, 133 insertions(+)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 16f878c26667..97b6ef8dbd12 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -16,6 +16,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include
 #include
@@ -76,6 +78,104 @@ static inline pgprot_t cachemode2pgprot(enum page_cache_mode pcm)
 
 #ifdef CONFIG_PROC_FS
 static unsigned long direct_pages_count[PG_LEVEL_NUM];
+static unsigned long split_page_event_count[PG_LEVEL_NUM];
+
+#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
+static int direct_map_2M_split_set(void *data, u64 val)
+{
+	switch (val) {
+	case 0:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	pr_info("debugfs 2M Pages split event count(%lu) reset to 0",
+		split_page_event_count[PG_LEVEL_2M]);
+	split_page_event_count[PG_LEVEL_2M] = 0;
+
+	return 0;
+}
+
+static int direct_map_2M_split_get(void *data, u64 *val)
+{
+	*val = split_page_event_count[PG_LEVEL_2M];
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(fops_direct_map_2M_split, direct_map_2M_split_get,
+			 direct_map_2M_split_set, "%llu\n");
+#else
+static int direct_map_4M_split_set(void *data, u64 val)
+{
+	switch (val) {
+	case 0:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	pr_info("debugfs 4M Pages split event count(%lu) reset to 0",
+		split_page_event_count[PG_LEVEL_2M]);
+	split_page_event_count[PG_LEVEL_2M] = 0;
+
+	return 0;
+}
+
+static int direct_map_4M_split_get(void *data, u64 *val)
+{
+	*val = split_page_event_count[PG_LEVEL_2M];
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(fops_direct_map_4M_split, direct_map_4M_split_get,
+			 direct_map_4M_split_set, "%llu\n");
+#endif
+
+static int direct_map_1G_split_set(void *data, u64 val)
+{
+	switch (val) {
+	case 0:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	pr_info("debugfs 1G Pages split event count(%lu) reset to 0",
+		split_page_event_count[PG_LEVEL_1G]);
+	split_page_event_count[PG_LEVEL_1G] = 0;
+
+	return 0;
+}
+
+static int direct_map_1G_split_get(void *data, u64 *val)
+{
+	*val = split_page_event_count[PG_LEVEL_1G];
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(fops_direct_map_1G_split, direct_map_1G_split_get,
+			 direct_map_1G_split_set, "%llu\n");
+
+static __init int direct_map_split_debugfs_init(void)
+{
+#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
+	debugfs_create_file("direct_map_2M_split", 0600,
+			    arch_debugfs_dir, NULL,
+			    &fops_direct_map_2M_split);
+#else
+	debugfs_create_file("direct_map_4M_split", 0600,
+			    arch_debugfs_dir, NULL,
+			    &fops_direct_map_4M_split);
+#endif
+	if (direct_gbpages)
+		debugfs_create_file("direct_map_1G_split", 0600,
+				    arch_debugfs_dir, NULL,
+				    &fops_direct_map_1G_split);
+	return 0;
+}
+
+late_initcall(direct_map_split_debugfs_init);
 
 void update_page_count(int level, unsigned long pages)
 {
@@ -85,12 +185,29 @@ void update_page_count(int level, unsigned long pages)
 	spin_unlock(&pgd_lock);
 }
 
+void update_split_page_event_count(int level)
+{
+	if (system_state == SYSTEM_RUNNING) {
+		split_page_event_count[level]++;
+		if (level == PG_LEVEL_2M) {
+#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
+			count_vm_event(DIRECT_MAP_2M_SPLIT);
+#else
+			count_vm_event(DIRECT_MAP_4M_SPLIT);
+#endif
+		} else if (level == PG_LEVEL_1G) {
+			count_vm_event(DIRECT_MAP_1G_SPLIT);
+		}
+	}
+}
+
 static void split_page_count(int level)
 {
 	if (direct_pages_count[level] == 0)
 		return;
 
 	direct_pages_count[level]--;
+	update_split_page_event_count(level);
 	direct_pages_count[level - 1] += PTRS_PER_PTE;
 }
 
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 18e75974d4e3..439742d2435e 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -120,6 +120,14 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_SWAP
 		SWAP_RA,
 		SWAP_RA_HIT,
+#endif
+#if defined(__x86_64__)
+#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
+		DIRECT_MAP_2M_SPLIT,
+#else
+		DIRECT_MAP_4M_SPLIT,
+#endif
+		DIRECT_MAP_1G_SPLIT,
 #endif
 		NR_VM_EVENT_ITEMS
 };
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f8942160fc95..beaa2bb4f9dc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1350,6 +1350,14 @@ const char * const vmstat_text[] = {
 	"swap_ra",
 	"swap_ra_hit",
 #endif
+#if defined(__x86_64__)
+#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
+	"direct_map_2M_splits",
+#else
+	"direct_map_4M_splits",
+#endif
+	"direct_map_1G_splits",
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
-- 
2.24.1
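
For anyone wanting to consume the new counters from userspace, a minimal C sketch could look like the following. It assumes the counter names from the commit message (direct_map_2M_splits, direct_map_1G_splits) show up in /proc/vmstat as one "name value" pair per line. The helper names vmstat_lookup() and vmstat_read() are hypothetical and not part of this patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in text laid out like /proc/vmstat, i.e. one
 * "name value" pair per line. Hypothetical helper, not part of the patch.
 * Returns 0 and stores the value in *val, or -1 if the name is absent.
 */
int vmstat_lookup(const char *text, const char *name,
		  unsigned long long *val)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Require the full name followed by the separating space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%llu", val) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next entry */
	}
	return -1;
}

/*
 * Read /proc/vmstat into buf and NUL-terminate it. The caller is assumed
 * to pass a buffer large enough for the whole file; anything beyond
 * size - 1 bytes is silently dropped. Returns 0 on success, -1 on error.
 */
int vmstat_read(char *buf, size_t size)
{
	FILE *f;
	size_t n;

	if (size == 0)
		return -1;
	f = fopen("/proc/vmstat", "r");
	if (!f)
		return -1;
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

A caller would fill a buffer with vmstat_read() and then call vmstat_lookup(buf, "direct_map_2M_splits", &count). A return of -1 from the lookup would suggest the running kernel does not export that counter, for example because this patch is not applied.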