Date: Thu, 23 Nov 2017 17:33:31 +0000
From: Andrea Reale
To: Michal Hocko
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, m.bielski@virtualopensystems.com,
 arunks@qti.qualcomm.com, mark.rutland@arm.com,
 scott.branden@broadcom.com, will.deacon@arm.com, qiuxishi@huawei.com,
 catalin.marinas@arm.com, realean2@ie.ibm.com
Subject: Re: [PATCH v2 0/5] Memory hotplug support for arm64 - complete patchset v2
References: <20171123160258.xmw5lxnjfch2dxfw@dhcp22.suse.cz>
In-Reply-To: <20171123160258.xmw5lxnjfch2dxfw@dhcp22.suse.cz>
Message-Id: <20171123173331.GA15535@samekh>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 23 Nov 2017, 17:02, Michal Hocko wrote:

Hi Michal,

> I will try to have a look but I do not expect to understand any of arm64
> specific changes so I will focus on the generic code but it would help a
> _lot_ if the cover letter provided some overview of what has been done
> from a higher level POV. What are the arch pieces and what is the
> generic code missing. A quick glance over patches suggests that
> changelogs for specific patches are modest as well. Could you give us
> more information please? Reviewing hundreds lines of code without
> context is a pain.

sorry for the lack of details. I will try to provide a better overview
in the following. Please feel free to ask for more details where needed.

Overall, the goal of the patchset is to implement arch_add_memory and
arch_remove_memory for arm64, to support the generic memory_hotplug
framework.

Hot add
-------
Not so many surprises here. We implement the arch specific
arch_add_memory, which builds the kernel page tables via hotplug_paging()
and then calls the arch specific add_pages(). We need the arch specific
add_pages() to implement a trick that makes the status of the pages being
added acceptable to the assumptions made in the generic __add_pages (see
code comments).

Hot remove
----------
The code is basically a port of x86_64 hot remove, with several relevant
changes that I am highlighting below.

* Architecture specific code:
  - We implement arch_remove_memory(), which takes care of i) calling
    the generic __remove_pages and ii) tearing down kernel page tables
    (remove_pagetable()).

  - We implement the arch specific vmemmap_free(), which is called by the
    generic code to free vmemmap for memory being removed.
    vmemmap_free(), in turn, reuses the code of remove_pagetable() to do
    its job.

  - remove_pagetable() (called by the two functions above) removes
    kernel page tables and, in the case of vmemmap, also removes the
    actual vmemmap pages.
    The function never splits P[UM]D mapped page table entries, and fails
    if such a split is requested. To implement this behavior, we call
    remove_pagetable() in two passes from arch_remove_memory(): the first
    pass does not alter any of the pagetable contents, but only checks
    whether some P[UM]D split would occur; if the first pass succeeds,
    the second pass does the actual removal job.

    The case where a P[UM]D would be split should be extremely rare, so
    denying the removal should not be a big deal: hot-add and hot-remove
    operate on memory at section granularity, i.e. 1 << SECTION_SIZE_BITS
    bytes, and SECTION_SIZE_BITS is hardcoded to 30 (1GB sections) for
    arm64 at the moment, while with 4K pages PMDs and PUDs map 2MB and
    1GB, respectively. For a split to occur, someone would first have to
    decrease SECTION_SIZE_BITS and then ask to remove some p[um]d sub
    area that was mapped at boot as a full p[um]d.

* Generic code
  - [SYSFS and x86 ACPI changes]. In x86, hot remove is triggered by ACPI,
    which performs memory offlining and removal in one atomic step. To
    enable memory removal in the absence of ACPI, we add a sysfs `remove`
    handle (/sys/devices/system/memory/remove), symmetrically to the
    existing memory probe device (existing since the beginning of time
    with commit 3947be1969a9 ("memory hotplug: sysfs and add/remove
    functions")). To hot-remove a section, one would first offline it
    (echo offline > /sys/devices/system/memory/memoryXX/state) and then
    call remove on this new remove handle, passing the physical address
    of the section being removed.

    Now, the x86 code assumes that offline and remove are done in one
    single atomic step (ACPI, commit 242831eb15a0 ("Memory hotplug / ACPI:
    Simplify memory removal")). In this spirit, the generic code also
    assumed that when someone called memory_hotplug.c:remove_memory, that
    memory would already have been offlined. If that was not the case,
    it would raise a BUG().
    In our case, offlining and removal are done in separate steps, so we
    remove this assumption and fail the removal if the memory was not
    previously offlined. We also consider the possibility that
    arch_remove_memory itself might fail. As explained above, in some
    rare cases it actually might in our arm64 implementation.
    While this change is driven by our implementation, I believe that
    the assumption of offlining and removal happening in one atomic step
    is not an obvious one for all architectures in general.

  - [Memblock changes]. In the x86 hot-remove implementation (commit
    ae9aae9eda2d ("memory-hotplug: common APIs to support page tables
    hot-remove")), when freeing vmemmap, if a vmemmap page is only
    partially cleared and some of its content is still used, then the
    vmemmap page is obviously not freed. Instead, the unused part of
    that page is memset to the seemingly totally arbitrary 0xFD constant.
    When all the page content is found to be set to 0xFD, the page is
    freed.

    After some good feedback received on the v1 of this patchset, we
    decided to get rid of this 0xFD trick for our arm64 port. Instead, we
    added a memblock flag that we use to mark partially unused vmemmap
    areas (like 0xFD was doing before). We then check memblock rather
    than the content of the page to decide whether we can free it or not.

I hope this is a better cover letter.

Best regards,
Andrea

> > Changes v1->v2:
> > - swapper pgtable updated in place on hot add, avoiding unnecessary copy
> > - stop_machine used to updated swapper on hot add, avoiding races
> > - introduced check on offlining state before hot remove
> > - new memblock flag used to mark partially unused vmemmap pages, avoiding
> >   the nasty 0xFD hack used in the prev rev (and in x86 hot remove code)
> > - proper cleaning sequence for p[um]ds,ptes and related TLB management
> > - Removed macros that changed hot remove behavior based on number
> >   of pgtable levels. Now this is hidden in the pgtable traversal macros.
> > - Check on the corner case where P[UM]Ds would have to be split during
> >   hot remove: now this is forbidden.
> > - Minor fixes and refactoring.
> >
> > Andrea Reale (4):
> >   mm: memory_hotplug: Remove assumption on memory state before hotremove
> >   mm: memory_hotplug: memblock to track partially removed vmemmap mem
> >   mm: memory_hotplug: Add memory hotremove probe device
> >   mm: memory-hotplug: Add memory hot remove support for arm64
> >
> > Maciej Bielski (1):
> >   mm: memory_hotplug: Memory hotplug (add) support for arm64
> >
> >  arch/arm64/Kconfig             |  15 +
> >  arch/arm64/configs/defconfig   |   2 +
> >  arch/arm64/include/asm/mmu.h   |   7 +
> >  arch/arm64/mm/init.c           | 116 ++++++++
> >  arch/arm64/mm/mmu.c            | 609 ++++++++++++++++++++++++++++++++++++++++-
> >  drivers/acpi/acpi_memhotplug.c |   2 +-
> >  drivers/base/memory.c          |  34 ++-
> >  include/linux/memblock.h       |  12 +
> >  include/linux/memory_hotplug.h |   9 +-
> >  mm/memblock.c                  |  32 +++
> >  mm/memory_hotplug.c            |  13 +-
> >  11 files changed, 835 insertions(+), 16 deletions(-)
> >
> > --
> > 2.7.4
> >
>
> --
> Michal Hocko
> SUSE Labs
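
P.S. In case it helps the review, here is a rough userspace sketch of the
two-pass idea described under "Hot remove" above: a first walk that only
checks whether a P[UM]D-sized block mapping would have to be split, and a
second walk that actually removes. This is a toy model and not the code in
the patches: struct toy_block, toy_needs_split(), toy_remove_pagetable()
and toy_arch_remove_memory() are hypothetical names made up for this
illustration, and a flat array stands in for the real page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy constants, assuming 4K pages: a PMD block maps 2MB, a PUD block 1GB. */
#define TOY_PMD_SHIFT         21
#define TOY_PUD_SHIFT         30
#define TOY_SECTION_SIZE_BITS 30   /* hot-add/remove granularity (1GB) */

/* Hypothetical stand-in for one block (P[UM]D) mapping of the linear map. */
struct toy_block {
	uint64_t start;    /* physical start address of the block */
	uint64_t size;     /* 1 << TOY_PMD_SHIFT or 1 << TOY_PUD_SHIFT */
	bool     present;  /* still mapped? */
};

/* True if [start, end) overlaps the block without covering it entirely. */
static bool toy_needs_split(const struct toy_block *b,
			    uint64_t start, uint64_t end)
{
	uint64_t b_end = b->start + b->size;
	bool overlaps = start < b_end && end > b->start;
	bool covers = start <= b->start && end >= b_end;

	return overlaps && !covers;
}

/*
 * One walk over the toy "page table". With dry_run set, nothing is
 * modified and the walk only reports whether a split would be needed.
 */
static int toy_remove_pagetable(struct toy_block *blocks, size_t n,
				uint64_t start, uint64_t end, bool dry_run)
{
	assert(start < end);

	for (size_t i = 0; i < n; i++) {
		struct toy_block *b = &blocks[i];

		if (!b->present)
			continue;
		if (toy_needs_split(b, start, end))
			return -1;  /* splitting is never done: fail */
		if (!dry_run && start <= b->start &&
		    end >= b->start + b->size)
			b->present = false;
	}
	return 0;
}

/* First pass checks, second pass removes; a failed check changes nothing. */
static int toy_arch_remove_memory(struct toy_block *blocks, size_t n,
				  uint64_t start, uint64_t size)
{
	uint64_t end = start + size;

	if (toy_remove_pagetable(blocks, n, start, end, true))
		return -1;
	return toy_remove_pagetable(blocks, n, start, end, false);
}
```

With two 1GB blocks in the array, removing a whole section
(1 << TOY_SECTION_SIZE_BITS bytes) that matches one block succeeds, while
asking for only the first 2MB of a 1GB block is denied by the first pass
and leaves the table untouched. That is the behavior the real
remove_pagetable() is meant to have when a split would be required.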