Date: Sat, 16 Apr 2016 19:00:31 -0700 (PDT)
From: Hugh Dickins
To: "Kirill A. Shutemov"
cc: Hugh Dickins, Andrew Morton, "Kirill A. Shutemov", Andrea Arcangeli,
    Andres Lagar-Cavilla, Yang Shi, Ning Qu, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH 03/31] huge tmpfs: huge=N mount option and /proc/sys/vm/shmem_huge
In-Reply-To: <20160411111705.GE22996@node.shutemov.name>
References: <20160411111705.GE22996@node.shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 11 Apr 2016, Kirill A. Shutemov wrote:
> On Tue, Apr 05, 2016 at 02:15:05PM -0700, Hugh Dickins wrote:
> > Plumb in a new "huge=1" or "huge=0" mount option to tmpfs: I don't
> > want to get into a maze of boot options, madvises and fadvises at
> > this stage, nor extend the use of the existing THP tuning to tmpfs;
> > though either might be pursued later on. We just want a way to ask
> > a tmpfs filesystem to favor huge pages, and a way to turn that off
> > again when it doesn't work out so well. Default of course is off.
> >
> > "mount -o remount,huge=N /mountpoint" works fine after mount:
> > remounting from huge=1 (on) to huge=0 (off) will not attempt to
> > break up huge pages at all, just stop more from being allocated.
> >
> > It's possible that we shall allow more values for the option later,
> > to select different strategies (e.g. how hard to try when allocating
> > huge pages, or when to map hugely and when not, or how sparse a huge
> > page should be before it is split up), either for experiments, or well
> > baked in: so use an unsigned char in the superblock rather than a bool.
>
> Make the value a string from beginning would be better choice in my
> opinion. As more allocation policies would be implemented, number would
> not make much sense.

I'll probably agree about the strings. Though we have not in fact
devised any more allocation policies so far, and perhaps never will
at this mount level.

>
> For record, my implementation has four allocation policies: never, always,
> within_size and advise.

I'm sceptical who will get into choosing "within_size".

>
> >
> > No new config option: put this under CONFIG_TRANSPARENT_HUGEPAGE,
> > which is the appropriate option to protect those who don't want
> > the new bloat, and with which we shall share some pmd code. Use a
> > "name=numeric_value" format like most other tmpfs options. Prohibit
> > the option when !CONFIG_TRANSPARENT_HUGEPAGE, just as mpol is invalid
> > without CONFIG_NUMA (was hidden in mpol_parse_str(): make it explicit).
> > Allow setting >0 only if the machine has_transparent_hugepage().
> >
> > But what about Shmem with no user-visible mount? SysV SHM, memfds,
> > shared anonymous mmaps (of /dev/zero or MAP_ANONYMOUS), GPU drivers'
> > DRM objects, ashmem. Though unlikely to suit all usages, provide
> > sysctl /proc/sys/vm/shmem_huge to experiment with huge on those. We
> > may add a memfd_create flag and a per-file huge/non-huge fcntl later.
>
> I use sysfs knob instead:
>
>     /sys/kernel/mm/transparent_hugepage/shmem_enabled
>
> And string values there as well. It's better match current THP interface.

It's certainly been easier for me, to get it up and running without
having to respect all the anon THP knobs. But I do expect some
pressure to conform a bit more now.

Hugh

>
> > And allow shmem_huge two further values: -1 for use in emergencies,
> > to force the huge option off from all mounts; and (currently) 2,
> > to force the huge option on for all - very useful for testing.
>
> In my case, it's "deny" and "force".
>
> --
> Kirill A. Shutemov
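
For readers following the thread, the way the per-mount "huge=N" option and
the global /proc/sys/vm/shmem_huge value are described as combining can be
sketched roughly as below. This is only an illustration in Python, not code
from the patch series: the function name, the constant names and the
"internal_mount" flag are all hypothetical, and the assumption that
shmem_huge values 0 and 1 affect only the internal (no user-visible mount)
case is a reading of the description above, not something the thread
confirms.

```python
# Illustrative sketch only: all names here are hypothetical, not taken
# from the patch. Values follow the numbers mentioned in the thread.
SHMEM_HUGE_DENY = -1   # emergency: force huge off for all mounts
SHMEM_HUGE_OFF = 0     # assumed: huge off for internal (invisible) mounts
SHMEM_HUGE_ON = 1      # assumed: huge on for internal (invisible) mounts
SHMEM_HUGE_FORCE = 2   # testing: force huge on for all mounts


def effective_huge(mount_huge: int, shmem_huge: int, internal_mount: bool) -> bool:
    """Return True if huge pages would be favored for this mount.

    mount_huge     -- the per-mount "huge=N" option, 0 or 1
    shmem_huge     -- the global sysctl value, one of -1, 0, 1, 2
    internal_mount -- True for shmem with no user-visible mount
                      (SysV SHM, memfds, shared anonymous mmaps, ...)
    """
    if mount_huge not in (0, 1):
        raise ValueError("huge= accepts only 0 or 1 in this sketch")
    if shmem_huge not in (SHMEM_HUGE_DENY, SHMEM_HUGE_OFF,
                          SHMEM_HUGE_ON, SHMEM_HUGE_FORCE):
        raise ValueError("shmem_huge must be -1, 0, 1 or 2")

    # The two extreme values override every mount, visible or not.
    if shmem_huge == SHMEM_HUGE_DENY:
        return False
    if shmem_huge == SHMEM_HUGE_FORCE:
        return True

    # Assumption: 0 and 1 only steer the internal mount, which has no
    # mount option of its own; visible mounts follow their huge=N.
    if internal_mount:
        return shmem_huge == SHMEM_HUGE_ON
    return mount_huge == 1
```

With string-valued knobs as Kirill suggests, the same decision would key on
names such as "deny" and "force" instead of -1 and 2; only the spelling of
the values would change, not the shape of the override.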