From: Rick Edgecombe
To: x86@kernel.org, bp@alien8.de, mingo@redhat.com, tglx@linutronix.de, hpa@zytor.com, peterz@infradead.org, luto@kernel.org, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org, wency@cn.fujitsu.com
Subject: [PATCH] x86/mm: Flush before free in remove_pagetable()
Date: Wed, 18 Aug 2021 15:10:26 -0700
Message-Id: <20210818221026.10794-1-rick.p.edgecombe@intel.com>

In remove_pagetable(), page
tables may be freed before the TLB is flushed. The upper page table entries
are zapped before the lower-level tables are freed. Without a flush, however,
the lower-level tables can remain in the paging-structure caches, so data
written to a re-allocated table page could still be interpreted by the CPU
as live page table entries. There is a TLB flush lower down in
remove_pte_table(), but it is not reached in the common case of large pages
on the direct map.

remove_pagetable() is currently called from a few places in the memory
hot-unplug codepath and from memremap unmapping operations. To properly tear
down these mappings, gather the page tables on a simple linked list rooted
in each table's struct page, then flush the TLB before actually freeing the
pages.

Cc: stable@vger.kernel.org
Fixes: ae9aae9eda2d ("memory-hotplug: common APIs to support page tables hot-remove")
Acked-by: Dave Hansen
Signed-off-by: Rick Edgecombe
---
This wasn't observed causing any functional problem at normal runtime.
AFAICT it cannot be triggered from unprivileged userspace.
 arch/x86/mm/init_64.c | 60 ++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ddeaba947eb3..3c0323ad99da 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -992,6 +992,23 @@ static void __meminit free_pagetable(struct page *page, int order)
 		free_pages((unsigned long)page_address(page), order);
 }
 
+static void __meminit gather_table(struct page *page, struct list_head *tables)
+{
+	list_add(&page->lru, tables);
+}
+
+static void __meminit gather_table_finish(struct list_head *tables)
+{
+	struct page *page, *next;
+
+	flush_tlb_all();
+
+	list_for_each_entry_safe(page, next, tables, lru) {
+		list_del(&page->lru);
+		free_pagetable(page, 0);
+	}
+}
+
 static void __meminit free_hugepage_table(struct page *page,
 					  struct vmem_altmap *altmap)
 {
@@ -1001,7 +1018,7 @@ static void __meminit free_hugepage_table(struct page *page,
 		free_pagetable(page, get_order(PMD_SIZE));
 }
 
-static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd, struct list_head *tables)
 {
 	pte_t *pte;
 	int i;
@@ -1012,14 +1029,14 @@ static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
 		return;
 	}
 
-	/* free a pte talbe */
-	free_pagetable(pmd_page(*pmd), 0);
+	/* gather a pte table */
+	gather_table(pmd_page(*pmd), tables);
 	spin_lock(&init_mm.page_table_lock);
 	pmd_clear(pmd);
 	spin_unlock(&init_mm.page_table_lock);
 }
 
-static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud, struct list_head *tables)
 {
 	pmd_t *pmd;
 	int i;
@@ -1030,14 +1047,14 @@ static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
 		return;
 	}
 
-	/* free a pmd talbe */
-	free_pagetable(pud_page(*pud), 0);
+	/* gather a pmd table */
+	gather_table(pud_page(*pud), tables);
 	spin_lock(&init_mm.page_table_lock);
 	pud_clear(pud);
 	spin_unlock(&init_mm.page_table_lock);
 }
 
-static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
+static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d, struct list_head *tables)
 {
 	pud_t *pud;
 	int i;
@@ -1048,8 +1065,8 @@ static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
 		return;
 	}
 
-	/* free a pud talbe */
-	free_pagetable(p4d_page(*p4d), 0);
+	/* gather a pud table */
+	gather_table(p4d_page(*p4d), tables);
 	spin_lock(&init_mm.page_table_lock);
 	p4d_clear(p4d);
 	spin_unlock(&init_mm.page_table_lock);
@@ -1057,7 +1074,7 @@ static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
 
 static void __meminit
 remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
-		 bool direct)
+		 bool direct, struct list_head *tables)
 {
 	unsigned long next, pages = 0;
 	pte_t *pte;
@@ -1100,7 +1117,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 
 static void __meminit
 remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
-		 bool direct, struct vmem_altmap *altmap)
+		 bool direct, struct vmem_altmap *altmap, struct list_head *tables)
 {
 	unsigned long next, pages = 0;
 	pte_t *pte_base;
@@ -1138,8 +1155,8 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 		}
 
 		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
-		remove_pte_table(pte_base, addr, next, direct);
-		free_pte_table(pte_base, pmd);
+		remove_pte_table(pte_base, addr, next, direct, tables);
+		free_pte_table(pte_base, pmd, tables);
 	}
 
 	/* Call free_pmd_table() in remove_pud_table(). */
@@ -1149,7 +1166,7 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 
 static void __meminit
 remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
-		 struct vmem_altmap *altmap, bool direct)
+		 struct vmem_altmap *altmap, bool direct, struct list_head *tables)
 {
 	unsigned long next, pages = 0;
 	pmd_t *pmd_base;
@@ -1173,8 +1190,8 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 		}
 
 		pmd_base = pmd_offset(pud, 0);
-		remove_pmd_table(pmd_base, addr, next, direct, altmap);
-		free_pmd_table(pmd_base, pud);
+		remove_pmd_table(pmd_base, addr, next, direct, altmap, tables);
+		free_pmd_table(pmd_base, pud, tables);
 	}
 
 	if (direct)
@@ -1183,7 +1200,7 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 
 static void __meminit
 remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end,
-		 struct vmem_altmap *altmap, bool direct)
+		 struct vmem_altmap *altmap, bool direct, struct list_head *tables)
 {
 	unsigned long next, pages = 0;
 	pud_t *pud_base;
@@ -1199,14 +1216,14 @@ remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end,
 		BUILD_BUG_ON(p4d_large(*p4d));
 
 		pud_base = pud_offset(p4d, 0);
-		remove_pud_table(pud_base, addr, next, altmap, direct);
+		remove_pud_table(pud_base, addr, next, altmap, direct, tables);
 		/*
 		 * For 4-level page tables we do not want to free PUDs, but in the
 		 * 5-level case we should free them. This code will have to change
 		 * to adapt for boot-time switching between 4 and 5 level page tables.
 		 */
 		if (pgtable_l5_enabled())
-			free_pud_table(pud_base, p4d);
+			free_pud_table(pud_base, p4d, tables);
 	}
 
 	if (direct)
@@ -1220,6 +1237,7 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct,
 {
 	unsigned long next;
 	unsigned long addr;
+	LIST_HEAD(tables);
 	pgd_t *pgd;
 	p4d_t *p4d;
 
@@ -1231,10 +1249,10 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct,
 			continue;
 
 		p4d = p4d_offset(pgd, 0);
-		remove_p4d_table(p4d, addr, next, altmap, direct);
+		remove_p4d_table(p4d, addr, next, altmap, direct, &tables);
 	}
 
-	flush_tlb_all();
+	gather_table_finish(&tables);
 }
 
 void __ref vmemmap_free(unsigned long start, unsigned long end,
-- 
2.17.1