Date: Mon, 27 Nov 2017 17:38:36 +0000
From: Andrea Reale
To: Robin Murphy
Cc: linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com,
	realean2@ie.ibm.com, mhocko@suse.com,
	m.bielski@virtualopensystems.com, scott.branden@broadcom.com,
	catalin.marinas@arm.com, will.deacon@arm.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	arunks@qti.qualcomm.com, qiuxishi@huawei.com
Subject: Re: [PATCH v2 3/5] mm: memory_hotplug: memblock to track partially removed vmemmap mem
Message-Id: <20171127173835.GC12687@samekh>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
User-Agent: Mutt/1.5.24 (2015-08-30)
Hi Robin,

On Mon 27 Nov 2017, 15:20, Robin Murphy wrote:
> On 23/11/17 11:14, Andrea Reale wrote:
> > When hot-removing memory we need to free vmemmap memory.
>
> What problems arise if we don't? Is it only for the sake of freeing up
> some pages here and there, or is there something more fundamental?

It is just for freeing up pages, but IMHO we are talking about a
significant number of pages. For example, assuming 4K pages, to
describe one hot-added section of 1GB of new memory we need ~14MB of
vmemmap space (if my back-of-the-envelope math is not wrong). This
memory would be leaked if we do not do the cleanup at hot-remove time.
If we hot-remove sections many times in the lifetime of a system, this
quantity can become sizeable.

> > However, depending on which memory is being removed, it might not
> > always be possible to free a full vmemmap page / huge-page, because
> > part of it might still be used.
> >
> > Commit ae9aae9eda2d ("memory-hotplug: common APIs to support page
> > tables hot-remove") introduced a workaround for x86 hot-remove, by
> > which partially unused areas are filled with the 0xFD constant. Full
> > pages are only removed when fully filled by 0xFDs.
> >
> > This commit introduces a MEMBLOCK_UNUSED_VMEMMAP memblock flag, with
> > the goal of using it in place of 0xFDs. For now, this will be used
> > for the arm64 port of memory hot remove, but the idea is to
> > eventually use the same mechanism for x86 as well.
> > Signed-off-by: Andrea Reale
> > Signed-off-by: Maciej Bielski
> > ---
> >  include/linux/memblock.h | 12 ++++++++++++
> >  mm/memblock.c            | 32 ++++++++++++++++++++++++++++++++
> >  2 files changed, 44 insertions(+)
> >
> > diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> > index bae11c7..0daec05 100644
> > --- a/include/linux/memblock.h
> > +++ b/include/linux/memblock.h
> > @@ -26,6 +26,9 @@ enum {
> >  	MEMBLOCK_HOTPLUG	= 0x1,	/* hotpluggable region */
> >  	MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
> >  	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
> > +#ifdef CONFIG_MEMORY_HOTREMOVE
> > +	MEMBLOCK_UNUSED_VMEMMAP	= 0x8,	/* Mark VMEMAP blocks as dirty */
>
> I'm not sure I get what "dirty" is supposed to mean in this context.
> Also, this appears to be specific to CONFIG_SPARSEMEM_VMEMMAP, whilst
> only tangentially related to CONFIG_MEMORY_HOTREMOVE, so the
> dependencies look a bit off.
>
> In fact, now that I think about it, why does this need to be in
> memblock at all? If it is specific to sparsemem, shouldn't the section
> map already be enough to tell us what's supposed to be present or not?
>
> Robin.

The story is: when we are hot-removing one section, we cannot be sure
that the full block can be fully removed, for example, because we might
have used only a portion of it at hot-add time and the rest might have
been used by other hot-adds we are not aware of. So when we hot-remove,
we mark the page structs of the removed memory, and we only remove the
full page when it is all marked.

This is exactly symmetrical to the issue described in commit
ae9aae9eda2d ("memory-hotplug: common APIs to support page tables
hot-remove"), which introduced hot-remove for x86. In that commit,
partially unused vmemmap pages were filled with the 0xFD constant. In
the previous iteration of this patchset, it was rightfully suggested
that marking the pages by writing inside them was not the best way to
achieve the result. That's why we reverted to doing this marking using
memblock. This is only used in memory hot-remove; that's why the
CONFIG_MEMORY_HOTREMOVE dependency.

Right now, I cannot think of how I could use sparsemem to tell: the
only thing I know at the moment of trying to free a vmemmap block is
that I have some physical addresses that might or might not be in use
to describe some pages. I cannot think of any way to know which struct
pages could be occupying this vmemmap block, besides maybe walking all
page tables and checking if I have some matching mapping. However, I
might be missing something, so suggestions are welcome.

Thanks,
Andrea

> > +#endif
> >  };
> >
> >  struct memblock_region {
> > @@ -90,6 +93,10 @@ int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
> >  int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
> >  int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
> >  ulong choose_memblock_flags(void);
> > +#ifdef CONFIG_MEMORY_HOTREMOVE
> > +int memblock_mark_unused_vmemmap(phys_addr_t base, phys_addr_t size);
> > +int memblock_clear_unused_vmemmap(phys_addr_t base, phys_addr_t size);
> > +#endif
> >
> >  /* Low level functions */
> >  int memblock_add_range(struct memblock_type *type,
> > @@ -182,6 +189,11 @@ static inline bool memblock_is_nomap(struct memblock_region *m)
> >  	return m->flags & MEMBLOCK_NOMAP;
> >  }
> >
> > +#ifdef CONFIG_MEMORY_HOTREMOVE
> > +bool memblock_is_vmemmap_unused_range(struct memblock_type *mt,
> > +		phys_addr_t start, phys_addr_t end);
> > +#endif
> > +
> >  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> >  int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
> >  			    unsigned long *end_pfn);
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 9120578..30d5aa4 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -809,6 +809,18 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
> >  	return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP);
> >  }
> >
> > +#ifdef CONFIG_MEMORY_HOTREMOVE
> > +int __init_memblock memblock_mark_unused_vmemmap(phys_addr_t base,
> > +		phys_addr_t size)
> > +{
> > +	return memblock_setclr_flag(base, size, 1, MEMBLOCK_UNUSED_VMEMMAP);
> > +}
> > +
> > +int __init_memblock memblock_clear_unused_vmemmap(phys_addr_t base,
> > +		phys_addr_t size)
> > +{
> > +	return memblock_setclr_flag(base, size, 0, MEMBLOCK_UNUSED_VMEMMAP);
> > +}
> > +#endif
> >
> >  /**
> >   * __next_reserved_mem_region - next function for for_each_reserved_region()
> >   * @idx: pointer to u64 loop variable
> > @@ -1696,6 +1708,26 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
> >  	}
> >  }
> >
> > +#ifdef CONFIG_MEMORY_HOTREMOVE
> > +bool __init_memblock memblock_is_vmemmap_unused_range(struct memblock_type *mt,
> > +		phys_addr_t start, phys_addr_t end)
> > +{
> > +	u64 i;
> > +	struct memblock_region *r;
> > +
> > +	i = memblock_search(mt, start);
> > +	r = &(mt->regions[i]);
> > +	while (r->base < end) {
> > +		if (!(r->flags & MEMBLOCK_UNUSED_VMEMMAP))
> > +			return 0;
> > +
> > +		r = &(memblock.memory.regions[++i]);
> > +	}
> > +
> > +	return 1;
> > +}
> > +#endif
> > +
> >  void __init_memblock memblock_set_current_limit(phys_addr_t limit)
> >  {
> >  	memblock.current_limit = limit;
> >