Date: Fri, 11 Sep 2020 11:12:52 +0200
From: Michal Hocko
To: David Hildenbrand
Cc: Dave Hansen, Gerald Schaefer, akpm@linux-foundation.org, Greg KH, Jan Höppner, Heiko Carstens, linux-mm@kvack.org, linux-api@vger.kernel.org, Dave Hansen, linux-kernel@vger.kernel.org
Subject: Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
Message-ID: <20200911091252.GD7986@dhcp22.suse.cz>
In-Reply-To: <02cdbf90-b29f-a9ec-c83d-49f2548e3e91@redhat.com>

On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
[...]
> Consider two cases:
>
> 1. Hot(un)plugging huge DIMMs: many (not all!)
> use cases want to online/offline the whole thing. HW can effectively
> only plug/unplug the whole thing. It makes sense in some (most?) setups
> to represent one DIMM as one memory block device.

Yes, for physical hotplug it doesn't really make much sense to me to
offline portions that the HW cannot hot-remove.

> 2. Hot(un)plugging small memory increments. This is mostly the case in
> virtualized environments - especially hyper-v balloon, xen balloon,
> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
> you want at least all (16MB!) memory block devices that can get
> unplugged again individually ("LMBs") as separate memory blocks. Same on
> s390x on memory increment size (currently effectively the memory block
> size).

Yes, I do recognize those use cases, even though I won't hide that I
consider them questionable. E.g. any hotplug with a smaller granularity
than the memory model in Linux allows is just dubious. We simply cannot
implement that without a lot of waste, and then the question is what the
real point is.

> In summary, larger memory block devices mostly only make sense with
> DIMMs (and for boot memory in some cases). We will still end up with
> many memory block devices in other configurations.

And that is fine, because boot-time memory is still likely the primary
source of memory. Reducing the number of memory block devices for it is
already a huge improvement (just think of a multi-TB system with
gazillions of pointless memory devices).

> I do agree that a "disable sysfs" option is interesting - even with
> memory hotplug (we mostly need a way to configure it and a way to notify
> kexec-tools about memory hot(un)plug events). I am currently (once
> again) looking into improving auto-onlining support in the kernel.
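Rough arithmetic behind the "gazillions" remark: each memoryX directory under /sys/devices/system/memory covers one block of block_size_bytes, so the device count scales as memory size divided by block size. The Python sketch below is illustrative only. The 4 TiB machine and the 128 MiB and 2 GiB block sizes are assumed example values, 16 MiB is the PPC LMB size quoted above, and block_size_bytes is assumed to hold a hex string without a "0x" prefix (e.g. "8000000").

```python
from __future__ import annotations

from pathlib import Path

MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40

# Assumed sysfs location; the file is assumed to hold the block size in hex.
BLOCK_SIZE_PATH = Path("/sys/devices/system/memory/block_size_bytes")


def parse_block_size(text: str) -> int:
    """Parse block_size_bytes contents, e.g. "8000000" (hex)."""
    return int(text.strip(), 16)


def block_count(memory_bytes: int, block_size: int) -> int:
    """Number of memoryX devices needed to cover memory_bytes, rounded up."""
    return -(-memory_bytes // block_size)


def block_range(block_id: int, block_size: int) -> tuple[int, int]:
    """Physical address range [start, end) covered by memory<block_id>."""
    start = block_id * block_size
    return start, start + block_size


if __name__ == "__main__":
    # Example block sizes for a hypothetical 4 TiB machine.
    for label, size in [("16 MiB", 16 * MiB), ("128 MiB", 128 * MiB), ("2 GiB", 2 * GiB)]:
        print(f"4 TiB in {label} blocks: {block_count(4 * TiB, size)} devices")

    # Show the real block size when running on a Linux system that exposes it.
    try:
        size = parse_block_size(BLOCK_SIZE_PATH.read_text())
        print(f"this machine: {size // MiB} MiB blocks")
    except (OSError, ValueError):
        print("block_size_bytes not available here")
```

With these assumed numbers, 4 TiB needs 262144 devices at 16 MiB, 32768 at 128 MiB and 2048 at 2 GiB, which is why larger blocks for boot memory help while small hotplug increments pull in the opposite direction.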
>
> Having said that, I would much rather see smaller improvements (that
> can be fine-tuned individually - like allowing variable-sized memory
> blocks) than switching to "new shiny" and figuring out after a while
> that we need "new shiny2".

There is only one certainty: providing a long-term interface with
ever-growing (ab)users is a hard target. And shinyN might be needed in
the end. Who knows. My main point is that the existing interface is
hitting a wall on use cases which _do_not_care_ about memory hotplug.
And that is something we should be looking at.

> I consider removing "phys_device" as one of these tunables. The question
> would be how to make such sysfs changes easy to configure
> ("-phys_device", "+variable_sized_blocks" ...)

I am with you on that. There are more candidates in memory block
directories which have dubious value. The deprecation process is a PITA,
and that's why I thought it would make sense to focus on something that
we can mis^Wdesign with existing and emerging use cases in mind, and
that would get rid of all the cruft we know doesn't work (removable
would be another one).

I am definitely not going to insist, and I appreciate that you are trying
to clean this up. That is highly appreciated, of course.

--
Michal Hocko
SUSE Labs
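The per-attribute switches suggested in the thread ("-phys_device", "+variable_sized_blocks") could be parsed roughly as below. This is a hypothetical Python sketch, not an existing kernel or sysfs interface: the comma-separated +name/-name syntax, the apply_toggles() helper and the default attribute set (a subset of the files found in memoryX directories) are all assumptions.

```python
from __future__ import annotations

# Hypothetical default set: a subset of the files in each memoryX directory.
DEFAULT_ATTRS = frozenset({"phys_index", "phys_device", "state", "removable"})


def apply_toggles(spec: str, defaults: frozenset[str] = DEFAULT_ATTRS) -> set[str]:
    """Apply comma-separated "+name"/"-name" tokens to a default set.

    Hypothetical syntax modelled on the thread; not a real kernel interface.
    """
    enabled = set(defaults)
    for raw in spec.split(","):
        token = raw.strip()
        if not token:
            continue  # tolerate empty entries such as trailing commas
        op, name = token[0], token[1:]
        if op not in ("+", "-") or not name:
            raise ValueError(f"bad token {token!r}: expected +name or -name")
        if op == "+":
            enabled.add(name)
        else:
            enabled.discard(name)  # removing an absent name is not an error
    return enabled


if __name__ == "__main__":
    print(sorted(apply_toggles("-phys_device,+variable_sized_blocks")))
    print(sorted(apply_toggles("-phys_device,-removable")))
```

Tokens are applied left to right, so a later token wins for the same name ("-removable,+removable" keeps the attribute). Where such a knob would live (command line, config option, or elsewhere) is left open in the thread.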