Date: Fri, 4 Oct 2019 11:28:08 +0200
From: Michal Hocko
To: David Rientjes
Cc: Vlastimil Babka, Mike Kravetz, Linus Torvalds, Andrea Arcangeli, Andrew Morton, Mel Gorman, "Kirill A. Shutemov", Linux Kernel Mailing List, Linux-MM
Subject: Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively reclaim
Message-ID: <20191004092808.GC9578@dhcp22.suse.cz>

On Thu 03-10-19 12:52:33, David Rientjes wrote:
> On Thu, 3 Oct 2019, Vlastimil Babka wrote:
>
> > I think the key differences between Mike's tests and Michal's is this part
> > from Mike's mail linked above:
> >
> > "I 'tested' by simply creating some background activity and then seeing
> > how many hugetlb pages could be allocated. Of course, many tries over
> > time in a loop."
> >
> > - "some background activity" might be different than Michal's pre-filling
> >   of the memory with (clean) page cache
> > - "many tries over time in a loop" could mean that kswapd has time to
> >   reclaim and eventually the new condition for pageblock order will pass
> >   every few retries, because there's enough memory for compaction and it
> >   won't return COMPACT_SKIPPED
> >
>
> I'll rely on Mike, the hugetlb maintainer, to assess the trade-off between
> the potential for encountering very expensive reclaim as Andrea did and
> the possibility of being able to allocate additional hugetlb pages at
> runtime if we did that expensive reclaim.

That tradeoff has been expressed by __GFP_RETRY_MAYFAIL which got broken
by b39d0ee2632d.

> For parity with previous kernels it seems reasonable to ask that this
> remains unchanged since allocating large amounts of hugetlb pages has
> different latency expectations than during page fault. This patch is
> available if he'd prefer to go that route.
>
> On the other hand, userspace could achieve similar results if it were to
> use vm.drop_caches and explicitly triggered compaction through either
> procfs or sysfs before writing to vm.nr_hugepages, and that would be much
> faster because it would be done in one go. Users who allocate through the
> kernel command line would obviously be unaffected.

Requesting the userspace to drop _all_ page cache in order to allocate a
number of hugetlb pages or any other affected __GFP_RETRY_MAYFAIL
requests is simply not reasonable IMHO.
-- 
Michal Hocko
SUSE Labs
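
For readers unfamiliar with the userspace sequence David describes (and which the reply above objects to), here is a minimal sketch of it, not something taken from the thread. The function name `preallocate_hugepages` and its `sysctl_root` parameter are hypothetical, introduced only so the sequence can be exercised against a scratch directory. The three file names are the standard vm sysctls; `compact_memory` is assumed to be present, which requires a kernel built with CONFIG_COMPACTION. Writing to the real files needs root.

```python
import os
from pathlib import Path


def preallocate_hugepages(count, sysctl_root="/proc/sys/vm"):
    """Drop caches, compact memory, then request `count` hugetlb pages.

    `sysctl_root` is a hypothetical parameter for illustration; the default
    points at the real vm sysctl directory and needs root to write.
    Returns the hugetlb pool size reported after the request.
    """
    root = Path(sysctl_root)

    # Write back dirty pages first so more page cache becomes droppable.
    os.sync()

    # 3 = drop page cache plus reclaimable slab objects (dentries, inodes).
    # This discards *all* clean page cache, which is the cost the reply
    # above calls unreasonable.
    (root / "drop_caches").write_text("3\n")

    # Trigger compaction of all zones through procfs.
    (root / "compact_memory").write_text("1\n")

    # Ask for the persistent hugetlb pool to be resized.
    (root / "nr_hugepages").write_text(f"{count}\n")

    # The kernel may allocate fewer pages than requested, so read back.
    return int((root / "nr_hugepages").read_text())
```

On a real system the value read back can be lower than `count` when contiguous memory is still unavailable, so callers should compare the two rather than assume success.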