Date: Thu, 24 May 2018 09:53:27 +0200
From: Michal Hocko
To: David Hildenbrand
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Alexander Potapenko, Andrew Morton, Andrey Ryabinin, Balbir Singh,
 Baoquan He, Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
 Dave Young, Dmitry Vyukov, Greg Kroah-Hartman, Hari Bathini,
 Huang Ying, Hugh Dickins, Ingo Molnar, Jaewon Kim, Jan Kara,
 Jérôme Glisse, Joonsoo Kim, Juergen Gross, Kate Stewart, "Kirill A.
Shutemov", Matthew Wilcox, Mel Gorman, Michael Ellerman, Miles Chen,
 Oscar Salvador, Paul Mackerras, Pavel Tatashin, Philippe Ombredanne,
 Rashmica Gupta, Reza Arbab, Souptick Joarder, Tetsuo Handa,
 Thomas Gleixner, Vlastimil Babka
Subject: Re: [PATCH v1 00/10] mm: online/offline 4MB chunks controlled by device driver
Message-ID: <20180524075327.GU20441@dhcp22.suse.cz>
In-Reply-To: <20180523151151.6730-1-david@redhat.com>

I've had some questions before and I am not sure they are fully covered.
At least not in the cover letter (I didn't get much further yet), which
should give us a high-level overview of the feature.

On Wed 23-05-18 17:11:41, David Hildenbrand wrote:
> This is now the !RFC version. I did some additional tests and inspected
> all memory notifiers. At least page_ext and kasan need fixes.
>
> ==========
>
> I am right now working on a paravirtualized memory device ("virtio-mem").
> These devices control a memory region and the amount of memory available
> via it. Memory will not be indicated/added/onlined via ACPI and friends,
> the device driver is responsible for it.
>
> When the device driver starts up, it will add and online the requested
> amount of memory from its assigned physical memory region. On request, it
> can add (online) either more memory or try to remove (offline) memory. As
> it will be a virtio module, we also want to be able to have it as a loadable
> kernel module.

How do you handle the offline case? Do you online all the memory to
zone_movable?

> Such a device can be thought of like a "resizable DIMM" or a "huge
> number of 4MB DIMMS" that can be automatically managed.

Why do we need such a small granularity?
The whole memory hotplug is centered around memory sections, and those
are 128MB in size. Smaller sizes simply do not fit into that concept.
How do you deal with that?

> As we want to be able to add/remove small chunks of memory to a VM without
> fragmenting guest memory ("it's not what the guest pays for" and "what if
> the hypervisor wants to use huge pages"), it looks like we can do that
> under Linux in a 4MB granularity by using online_pages()/offline_pages()

Please expand on this some more. Larger logical units usually lead to
less fragmentation.

> We add a segment and online only 4MB blocks of it on demand. So the other
> memory might not be accessible.

But you still allocate vmemmap for the full memory section, right? That
would mean that you spend 2MB to online 4MB of memory. Sounds quite
wasteful to me.

> For kdump and onlining/offlining code, we
> have to mark pages as offline before a new segment is visible to the system
> (e.g. as these pages might not be backed by real memory in the hypervisor).

Please expand on the kdump part. That is really confusing because
hotplug should simply not depend on kdump at all. Moreover, why don't
you simply mark those pages reserved and pull them out from the page
allocator?
-- 
Michal Hocko
SUSE Labs