Subject: Re: [PATCH v1 00/10] mm: online/offline 4MB chunks controlled by device driver
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Alexander Potapenko, Andrew Morton, Andrey Ryabinin, Balbir Singh,
 Baoquan He, Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
 Dave Young, Dmitry Vyukov, Greg Kroah-Hartman, Hari Bathini, Huang Ying,
 Hugh Dickins, Ingo Molnar, Jaewon Kim, Jan Kara, Jérôme Glisse,
 Joonsoo Kim, Juergen Gross, Kate Stewart, "Kirill A. Shutemov",
 Matthew Wilcox, Mel Gorman, Michael Ellerman, Miles Chen, Oscar Salvador,
 Paul Mackerras, Pavel Tatashin, Philippe Ombredanne, Rashmica Gupta,
 Reza Arbab, Souptick Joarder, Tetsuo Handa, Thomas Gleixner,
 Vlastimil Babka
References: <20180524075327.GU20441@dhcp22.suse.cz>
 <14d79dad-ad47-f090-2ec0-c5daf87ac529@redhat.com>
 <20180524093121.GZ20441@dhcp22.suse.cz>
 <20180524120341.GF20441@dhcp22.suse.cz>
 <1a03ac4e-9185-ce8e-a672-c747c3e40ff2@redhat.com>
 <20180524142241.GJ20441@dhcp22.suse.cz>
 <819e45c5-6ae3-1dff-3f1d-c0411b6e2e1d@redhat.com>
 <20180718131905.GB7193@dhcp22.suse.cz>
 <20180718134308.GF7193@dhcp22.suse.cz>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <84568c21-bdbd-5769-56dd-64d5e2378b91@redhat.com>
Date: Wed, 18 Jul 2018 15:47:33 +0200
In-Reply-To: <20180718134308.GF7193@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 18.07.2018 15:43, Michal Hocko wrote:
> On Wed 18-07-18 15:39:29, David Hildenbrand wrote:
>> On 18.07.2018 15:19, Michal Hocko wrote:
>>> [got back to this really late. Sorry about that]
>>>
>>> On Thu 24-05-18 23:07:23, David Hildenbrand wrote:
>>>> On 24.05.2018 16:22, Michal Hocko wrote:
>>>>> I will go over the rest of the email later I just wanted to make this
>>>>> point clear because I suspect we are talking past each other.
>>>>
>>>> It sounds like we are now talking about how to solve the problem. I like
>>>> that :)
>>>>
>>>>>
>>>>> On Thu 24-05-18 16:04:38, David Hildenbrand wrote:
>>>>> [...]
>>>>>> The point I was making is: I cannot allocate 8MB/128MB using the buddy
>>>>>> allocator. All I want to do is manage the memory a virtio-mem device
>>>>>> provides as flexible as possible.
>>>>>
>>>>> I didn't mean to use the page allocator to isolate pages from it. We do
>>>>> have other means. Have a look at the page isolation framework and have a
>>>>> look how the current memory hotplug (ab)uses it. In short you mark the
>>>>> desired physical memory range as isolated (nobody can allocate from it)
>>>>> and then simply remove it from the page allocator. And you are done with
>>>>> it.
>>>>> Your particular range is gone, nobody will ever use it. If you mark
>>>>> those struct pages reserved then pfn walkers should already ignore them.
>>>>> If you keep those pages with ref count 0 then even hotplug should work
>>>>> seamlessly (I would have to double check).
>>>>>
>>>>> So all I am arguing is that whatever your driver wants to do can be
>>>>> handled without touching the hotplug code much. You would still need
>>>>> to add new ranges in the mem section units and manage on top of that.
>>>>> You need to do that anyway to keep track of what parts are in use or
>>>>> offlined anyway right? Now the mem sections. You have to do that anyway
>>>>> for memmaps. Our sparse memory model simply works in those units. Even
>>>>> if you make a part of that range unavailable then the section will still
>>>>> be there.
>>>>>
>>>>> Do I make at least some sense or am I completely missing your point?
>>>>>
>>>>
>>>> I think we're heading somewhere. I understand that you want to separate
>>>> this "semi" offline part from the general offlining code. If so, we
>>>> should definitely enforce segment alignment for online_pages/offline_pages.
>>>>
>>>> Importantly, what I need is:
>>>>
>>>> 1. Indicate and prepare memory sections to be used for adding memory
>>>> chunks (right now add_memory())
>>>
>>> Yes, this is section based. So you will always get memmap (struct page)
>>> for the whole section.
>>>
>>>> 2. Make memory chunks of a section available to the system (right now
>>>> online_pages())
>>>
>>> Yes, this doesn't have to be section based. All you need is to mark
>>> remaining pages as offline. They are reserved at this moment so nobody
>>> should touch them.
>>>
>>>> 3. Remove memory chunks of a section from the system (right now
>>>> offline_pages())
>>>
>>> Yes. All we need is to note that those reserved pages are actually good
>>> to offline.
>>> I have mentioned that reserved pages are yours at this stage
>>> so you can note the special state without an additional page flag.
>>>
>>> The generic hotplug code just has to learn about this new state.
>>> has_unmovable_pages sounds like a proper place to do that. You simply
>>> clear the offline state and the PageReserved and you are done with the
>>> page.
>>>
>>
>> I agree. This would be minimally invasive - notifiers are still called on
>> the whole segment.
>
> That shouldn't matter because notifiers should never step on pages they
> do not manage or own.
>
>>>> 4. Remove memory sections from the system (right now remove_memory())
>>>
>>> no change needed
>>>
>>>> 5. Hinder dumping tools from reading memory chunks that are logically
>>>> offline (right now PageOffline())
>>>
>>> I still fail to see why we even care about some dumping tools. Pages
>>> are reserved so they simply shouldn't touch that memory at all.
>>>
>>
>> Thanks for having a look!
>>
>> I wonder why reserved pages never got excluded by dump tools. So I
>> assume there is some kind of magic hidden in it.
>>
>> `git grep SetPageReserved` returns a number of buffers that are not to
>> be swapped. So "reserved" there is used for:
>> "PG_reserved is set for special pages, which can never be swapped out"
>
> That was an ancient meaning of the flag. The flag in general means that
> you shouldn't touch it unless you own it.
>
>> And my point would be that these pages are still to be dumped (just as
>> it is being done now). They are valid memory.
>
> Then fix kdump or whatever is touching them.

If the rule is really reserved -> don't touch, then I agree.

>
>> It seems like this bit is used for two different purposes. My take would
>> be then to have another way of indicating "don't swap" vs. "page not
>> accessible / offline". And that's why I propose PageOffline.
>>
>> I would even go one step further and rename "reserved" to "dontswap".
>
> No, it really hasn't had that meaning for years.
>

So would you agree to change the comment in page-flags.h to something like

"PG_reserved is set for special pages that should never be touched
(read/written). Some of them might not even exist."

Thanks!

-- 

Thanks,

David / dhildenb