Subject: Re: [PATCH 2/6] arm64/mm: Enable memory hot remove
To: Logan Gunthorpe, Anshuman Khandual, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 akpm@linux-foundation.org, will.deacon@arm.com, catalin.marinas@arm.com
Cc: mark.rutland@arm.com, mhocko@suse.com, david@redhat.com, cai@lca.pw,
 pasha.tatashin@oracle.com, Stephen Bates, james.morse@arm.com,
 cpandya@codeaurora.org, arunks@codeaurora.org, dan.j.williams@intel.com,
 mgorman@techsingularity.net, osalvador@suse.de
References: <1554265806-11501-1-git-send-email-anshuman.khandual@arm.com>
 <1554265806-11501-3-git-send-email-anshuman.khandual@arm.com>
From: Robin Murphy
Message-ID: <85fbfe49-d49e-fd6e-21dd-ff4d9808610b@arm.com>
Date: Wed, 3 Apr 2019 18:57:02 +0100

On 03/04/2019 18:32, Logan Gunthorpe wrote:
>
>
> On 2019-04-02 10:30 p.m., Anshuman Khandual wrote:
>> Memory removal from an arch perspective involves tearing down two different
>> kernel based mappings, i.e. vmemmap and linear, while releasing related page
>> table pages allocated for the physical memory range to be removed.
>>
>> Define a common kernel page table tear down helper remove_pagetable() which
>> can be used to unmap given kernel virtual address range. In effect it can
>> tear down both vmemmap and kernel linear mappings. This new helper is called
>> from both vmemmap_free() and ___remove_pgd_mapping() during memory removal.
>> The argument 'direct' here identifies kernel linear mappings.
>>
>> Vmemmap mappings page table pages are allocated through sparse mem helper
>> functions like vmemmap_alloc_block() which does not cycle the pages through
>> pgtable_page_ctor() constructs. Hence while removing it skips corresponding
>> destructor construct pgtable_page_dtor().
>>
>> While here update arch_add_memory() to handle __add_pages() failures by
>> just unmapping recently added kernel linear mapping. Now enable memory hot
>> remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.
>>
>> This implementation is overall inspired from kernel page table tear down
>> procedure on X86 architecture.
>
> I've been working on very similar things for RISC-V. In fact, I'm
> currently in progress on a very similar stripped down version of
> remove_pagetable(). (Though I'm fairly certain I've done a bunch of
> stuff wrong.)
>
> Would it be possible to move this work into common code that can be used
> by all arches? Seems like, to start, we should be able to support both
> arm64 and RISC-V... and maybe even x86 too.
>
> I'd be happy to help integrate and test such functions in RISC-V.

Indeed, I had hoped we might be able to piggyback off generic code for
this anyway, given that we have generic pagetable code which knows how to
free process pagetables, and kernel pagetables are also pagetables. I did
actually hack up such a patch[1], and other than p?d_none_or_clear_bad()
being loud it does actually appear to function OK in terms of
withstanding repeated add/remove cycles and not crashing, but all the
pagetable accounting and other stuff I don't really know about mean it's
probably not viable without a lot more core work.

Robin.

[1] http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=75934a2c4f737ad9f26903861108d5b0658e86bb
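
For readers unfamiliar with the teardown described in the quoted commit
message, here is a rough user-space sketch of the idea, not kernel code and
not the patch itself. It models a two-level table and a range unmap that
takes a 'direct' flag. Every toy_* name, the table sizes and the counters are
hypothetical. It also assumes, as in the x86 code the commit message cites as
its inspiration, that backing pages are freed only for non-direct
(vmemmap-like) mappings, while empty table pages are freed in both cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE    4096UL
#define TOY_LEAF_ENTRIES 4
#define TOY_TOP_ENTRIES  4

/* Hypothetical toy types: a leaf table holds pointers to mapped pages. */
struct toy_leaf { void *page[TOY_LEAF_ENTRIES]; };
struct toy_top  { struct toy_leaf *leaf[TOY_TOP_ENTRIES]; };

static int toy_freed_pages;   /* backing pages released so far */
static int toy_freed_tables;  /* leaf tables released so far */

/* Map one page at addr, allocating the leaf table on demand. */
static int toy_map(struct toy_top *top, unsigned long addr, void *page)
{
    unsigned long idx = addr / TOY_PAGE_SIZE;
    unsigned long t = idx / TOY_LEAF_ENTRIES;
    unsigned long l = idx % TOY_LEAF_ENTRIES;

    if (t >= TOY_TOP_ENTRIES)
        return -1;
    if (!top->leaf[t]) {
        top->leaf[t] = calloc(1, sizeof(struct toy_leaf));
        if (!top->leaf[t])
            return -1;
    }
    top->leaf[t]->page[l] = page;
    return 0;
}

/* True when no entry in the leaf table is populated. */
static bool toy_leaf_empty(const struct toy_leaf *leaf)
{
    for (size_t i = 0; i < TOY_LEAF_ENTRIES; i++)
        if (leaf->page[i])
            return false;
    return true;
}

/*
 * Unmap [start, end). With direct == false (vmemmap-like) the backing page
 * is freed as well; with direct == true (linear-like) it is only unmapped.
 * A leaf table is freed once its last entry has been cleared.
 */
static void toy_remove_pagetable(struct toy_top *top, unsigned long start,
                                 unsigned long end, bool direct)
{
    assert(start <= end);

    for (unsigned long addr = start; addr < end; addr += TOY_PAGE_SIZE) {
        unsigned long idx = addr / TOY_PAGE_SIZE;
        unsigned long t = idx / TOY_LEAF_ENTRIES;
        unsigned long l = idx % TOY_LEAF_ENTRIES;
        struct toy_leaf *leaf;

        if (t >= TOY_TOP_ENTRIES)
            break;
        leaf = top->leaf[t];
        if (!leaf || !leaf->page[l])
            continue;               /* nothing mapped here */

        if (!direct) {
            free(leaf->page[l]);
            toy_freed_pages++;
        }
        leaf->page[l] = NULL;

        if (toy_leaf_empty(leaf)) {
            free(leaf);
            top->leaf[t] = NULL;
            toy_freed_tables++;
        }
    }
}
```

The real helper additionally has to deal with multiple table levels, block
mappings, TLB invalidation and the page table accounting mentioned above,
none of which this sketch attempts.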