Subject: Re: [PATCH v1 00/10] mm: online/offline 4MB chunks controlled by device driver
From: David Hildenbrand
Organization: Red Hat GmbH
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Alexander Potapenko, Andrew Morton, Andrey Ryabinin, Balbir Singh,
    Baoquan He, Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
    Dave Young, Dmitry Vyukov, Greg Kroah-Hartman, Hari Bathini,
    Huang Ying, Hugh Dickins, Ingo Molnar, Jaewon Kim, Jan Kara,
    Jérôme Glisse, Joonsoo Kim, Juergen Gross, Kate Stewart,
    "Kirill A. Shutemov", Matthew Wilcox, Mel Gorman, Michael Ellerman,
    Miles Chen, Oscar Salvador, Paul Mackerras, Pavel Tatashin,
    Philippe Ombredanne, Rashmica Gupta, Reza Arbab, Souptick Joarder,
    Tetsuo Handa, Thomas Gleixner, Vlastimil Babka
Date: Thu, 24 May 2018 23:07:23 +0200
Message-ID: <819e45c5-6ae3-1dff-3f1d-c0411b6e2e1d@redhat.com>
In-Reply-To: <20180524142241.GJ20441@dhcp22.suse.cz>
References: <20180523151151.6730-1-david@redhat.com>
    <20180524075327.GU20441@dhcp22.suse.cz>
    <14d79dad-ad47-f090-2ec0-c5daf87ac529@redhat.com>
    <20180524093121.GZ20441@dhcp22.suse.cz>
    <20180524120341.GF20441@dhcp22.suse.cz>
    <1a03ac4e-9185-ce8e-a672-c747c3e40ff2@redhat.com>
    <20180524142241.GJ20441@dhcp22.suse.cz>

On 24.05.2018 16:22, Michal Hocko wrote:
> I will go over the rest of the email later I just wanted to make this
> point clear because I suspect we are talking past each other.

It sounds like we are now talking about how to solve the problem. I like
that :)

>
> On Thu 24-05-18 16:04:38, David Hildenbrand wrote:
> [...]
>> The point I was making is: I cannot allocate 8MB/128MB using the buddy
>> allocator.
>> All I want to do is manage the memory a virtio-mem device
>> provides as flexible as possible.
>
> I didn't mean to use the page allocator to isolate pages from it. We do
> have other means. Have a look at the page isolation framework and have a
> look how the current memory hotplug (ab)uses it. In short you mark the
> desired physical memory range as isolated (nobody can allocate from it)
> and then simply remove it from the page allocator. And you are done with
> it. Your particular range is gone, nobody will ever use it. If you mark
> those struct pages reserved then pfn walkers should already ignore them.
> If you keep those pages with ref count 0 then even hotplug should work
> seamlessly (I would have to double check).
>
> So all I am arguing is that whatever your driver wants to do can be
> handled without touching the hotplug code much. You would still need
> to add new ranges in the mem section units and manage on top of that.
> You need to do that anyway to keep track of what parts are in use or
> offlined anyway right? Now the mem sections. You have to do that anyway
> for memmaps. Our sparse memory model simply works in those units. Even
> if you make a part of that range unavailable then the section will still
> be there.
>
> Do I make at least some sense or I am completely missing your point?
>

I think we're heading somewhere. I understand that you want to separate
this "semi" offline part from the general offlining code. If so, we
should definitely enforce segment alignment for
online_pages/offline_pages.

Importantly, what I need is:

1. Indicate and prepare memory sections to be used for adding memory
   chunks (right now add_memory())
2. Make memory chunks of a section available to the system (right now
   online_pages())
3. Remove memory chunks of a section from the system (right now
   offline_pages())
4. Remove memory sections from the system (right now remove_memory())
5. Prevent dumping tools from reading memory chunks that are logically
   offline (right now PageOffline())
6. For 3. find removable memory chunks in a certain memory range with a
   variable size.

In an ideal world, 2. would never fail (in contrast to online_pages()
right now). This might make some further developments I have in mind
easier :) So if we can come up with an approach that can guarantee that,
extra points.

So what I think you are talking about is the following.

For 1. Use add_memory() followed by online_pages(). Don't actually
       online the pages, keep them reserved (like XEN balloon). Fixup
       stats.
For 2. Expose reserved pages to Buddy allocator. Clear reserved bit.
       Fixup stats. This can never fail. (yay)
For 3. Isolate pages, try to move everything away (basically but not
       completely offlining code). Set reserved flag. Fixup flags.
For 4. offline_pages() followed by remove_memory().
       -> Q: How to distinguish reserved offline from other reserved
          pages? offline_pages() has to be able to deal with that
For 5. I don't think we can use reserved flag here.
       -> Q: What else to use?
For 6. Scan for movable ranges.

Regarding

"You need to do that anyway to keep track of what parts are in use or
 offlined anyway right?"

I would manually track which chunks of a section are logically offline
(I do that right now already).

Is that what you had in mind? If not, where does your idea differ?
How could we solve 4/5? Of course, PageOffline() is again an option.

Thanks!

-- 
Thanks,

David / dhildenb
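
As a rough illustration of the manual per-chunk tracking mentioned
above, here is a minimal user-space sketch of the bookkeeping behind
requirements 1. to 5. It is not kernel code. The struct, the function
names and the 128MB section / 4MB chunk sizes are all assumptions made
up for this example. Nothing here touches struct page, the buddy
allocator or the isolation framework. It only models which chunks of a
section are logically online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes, for illustration only: 128MB sections, 4MB chunks. */
#define SECTION_SIZE_MB    128u
#define CHUNK_SIZE_MB      4u
#define CHUNKS_PER_SECTION (SECTION_SIZE_MB / CHUNK_SIZE_MB) /* 32 */

/* Hypothetical per-section tracking state kept by the driver. */
struct section_state {
	bool added;          /* between step 1 and step 4 */
	uint32_t online_map; /* bit i set: chunk i is logically online */
};

static uint32_t chunk_bit(unsigned int chunk)
{
	assert(chunk < CHUNKS_PER_SECTION);
	return UINT32_C(1) << chunk;
}

/* Step 1: prepare a section; all of its chunks start logically offline. */
static bool section_add(struct section_state *s)
{
	if (s->added)
		return false;
	s->added = true;
	s->online_map = 0;
	return true;
}

/* Step 2: make one chunk available. Only a bookkeeping error is refused. */
static bool chunk_online(struct section_state *s, unsigned int chunk)
{
	uint32_t bit = chunk_bit(chunk);

	if (!s->added || (s->online_map & bit))
		return false;
	s->online_map |= bit;
	return true;
}

/* Step 3: take one chunk away again (page migration is not modeled). */
static bool chunk_offline(struct section_state *s, unsigned int chunk)
{
	uint32_t bit = chunk_bit(chunk);

	if (!s->added || !(s->online_map & bit))
		return false;
	s->online_map &= ~bit;
	return true;
}

/* Step 4: a section can only go away once no chunk is online. */
static bool section_remove(struct section_state *s)
{
	if (!s->added || s->online_map != 0)
		return false;
	s->added = false;
	return true;
}

/* Step 5: the question a dump tool would need answered per chunk. */
static bool chunk_is_logically_offline(const struct section_state *s,
				       unsigned int chunk)
{
	return s->added && !(s->online_map & chunk_bit(chunk));
}
```

In this model step 2 has no failure path of its own: it is only refused
when the caller's bookkeeping is wrong (section not added, chunk already
online), which matches the "can never fail" property hoped for above.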