Subject: Re: [PATCH v1 00/10] mm: online/offline 4MB chunks controlled by device driver
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Alexander Potapenko, Andrew Morton, Andrey Ryabinin, Balbir Singh,
 Baoquan He, Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
 Dave Young, Dmitry Vyukov, Greg Kroah-Hartman, Hari Bathini,
 Huang Ying, Hugh Dickins, Ingo Molnar, Jaewon Kim, Jan Kara,
 Jérôme Glisse, Joonsoo Kim, Juergen Gross, Kate Stewart,
 "Kirill A. Shutemov", Matthew Wilcox, Mel Gorman, Michael Ellerman,
 Miles Chen, Oscar Salvador, Paul Mackerras, Pavel Tatashin,
 Philippe Ombredanne, Rashmica Gupta, Reza Arbab, Souptick Joarder,
 Tetsuo Handa, Thomas Gleixner, Vlastimil Babka
References: <20180523151151.6730-1-david@redhat.com>
 <20180524075327.GU20441@dhcp22.suse.cz>
 <14d79dad-ad47-f090-2ec0-c5daf87ac529@redhat.com>
 <20180524093121.GZ20441@dhcp22.suse.cz>
 <20180524120341.GF20441@dhcp22.suse.cz>
 <1a03ac4e-9185-ce8e-a672-c747c3e40ff2@redhat.com>
 <20180524142241.GJ20441@dhcp22.suse.cz>
 <819e45c5-6ae3-1dff-3f1d-c0411b6e2e1d@redhat.com>
 <20180718131905.GB7193@dhcp22.suse.cz>
From: David Hildenbrand
Organization: Red Hat GmbH
Date: Wed, 18 Jul 2018 15:39:29 +0200
In-Reply-To: <20180718131905.GB7193@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 18.07.2018 15:19, Michal Hocko wrote:
> [got back to this really late. Sorry about that]
>
> On Thu 24-05-18 23:07:23, David Hildenbrand wrote:
>> On 24.05.2018 16:22, Michal Hocko wrote:
>>> I will go over the rest of the email later I just wanted to make this
>>> point clear because I suspect we are talking past each other.
>>
>> It sounds like we are now talking about how to solve the problem. I like
>> that :)
>>
>>>
>>> On Thu 24-05-18 16:04:38, David Hildenbrand wrote:
>>> [...]
>>>> The point I was making is: I cannot allocate 8MB/128MB using the buddy
>>>> allocator. All I want to do is manage the memory a virtio-mem device
>>>> provides as flexible as possible.
>>>
>>> I didn't mean to use the page allocator to isolate pages from it. We do
>>> have other means. Have a look at the page isolation framework and have a
>>> look how the current memory hotplug (ab)uses it. In short you mark the
>>> desired physical memory range as isolated (nobody can allocate from it)
>>> and then simply remove it from the page allocator. And you are done with
>>> it. Your particular range is gone, nobody will ever use it. If you mark
>>> those struct pages reserved then pfn walkers should already ignore them.
>>> If you keep those pages with ref count 0 then even hotplug should work
>>> seamlessly (I would have to double check).
>>>
>>> So all I am arguing is that whatever your driver wants to do can be
>>> handled without touching the hotplug code much. You would still need
>>> to add new ranges in the mem section units and manage on top of that.
>>> You need to do that anyway to keep track of what parts are in use or
>>> offlined anyway right? Now the mem sections. You have to do that anyway
>>> for memmaps. Our sparse memory model simply works in those units. Even
>>> if you make a part of that range unavailable then the section will still
>>> be there.
>>>
>>> Do I make at least some sense or am I completely missing your point?
>>>
>>
>> I think we're heading somewhere. I understand that you want to separate
>> this "semi" offline part from the general offlining code. If so, we
>> should definitely enforce segment alignment for online_pages/offline_pages.
>>
>> Importantly, what I need is:
>>
>> 1. Indicate and prepare memory sections to be used for adding memory
>>    chunks (right now add_memory())
>
> Yes, this is section based. So you will always get memmap (struct page)
> for the whole section.
>
>> 2. Make memory chunks of a section available to the system (right now
>>    online_pages())
>
> Yes, this doesn't have to be section based. All you need is to mark
> remaining pages as offline. They are reserved at this moment so nobody
> should touch them.
>
>> 3. Remove memory chunks of a section from the system (right now
>>    offline_pages())
>
> Yes. All we need is to note that those reserved pages are actually good
> to offline. I have mentioned that reserved pages are yours at this stage
> so you can note the special state without an additional page flag.
>
> The generic hotplug code just has to learn about this new state.
> has_unmovable_pages sounds like a proper place to do that. You simply
> clear the offline state and the PageReserved and you are done with the
> page.
>

I agree. This would be minimally invasive - notifiers are still called
on the whole segment.

>> 4. Remove memory sections from the system (right now remove_memory())
>
> no change needed
>
>> 5. Hinder dumping tools from reading memory chunks that are logically
>>    offline (right now PageOffline())
>
> I still fail to see why we even care about some dumping tools. Pages
> are reserved so they simply shouldn't touch that memory at all.
>

Thanks for having a look!

I wonder why reserved pages never got excluded by dump tools. So I
assume there is some kind of magic hidden in it.

`git grep SetPageReserved` returns a number of buffers that are not to
be swapped. So "reserved" there is used for:

"PG_reserved is set for special pages, which can never be swapped out"

And my point would be that these pages are still to be dumped (just as
it is being done now). They are valid memory.

It seems like this bit is used for two different purposes. My take would
then be to have another way of indicating "don't swap" vs. "page not
accessible / offline". And that's why I propose PageOffline.

I would even go one step further and rename "reserved" to "dontswap".

-- 

Thanks,

David / dhildenb