Date: Thu, 3 Oct 2019 07:00:10 +0200
From: Michal Hocko
To: Linus Torvalds
Cc: David Rientjes, Mike Kravetz, Vlastimil Babka, Andrea Arcangeli,
    Andrew Morton, Mel Gorman, Kirill A. Shutemov,
    Linux Kernel Mailing List, Linux-MM
Subject: Re: [rfc] mm, hugetlb: allow hugepage allocations to excessively reclaim
Message-ID: <20191003050010.GA24174@dhcp22.suse.cz>

On Wed 02-10-19 16:37:57, Linus Torvalds wrote:
> On Wed, Oct 2, 2019 at 4:03 PM David Rientjes wrote:
> >
> > Since hugetlb allocations have explicitly preferred to loop and do reclaim
> > and compaction, exempt them from this new behavior at least for the time
> > being. It is not shown that hugetlb allocation success rate has been
> > impacted by commit b39d0ee2632d but hugetlb allocations are admittedly
> > beyond the scope of what the patch is intended to address (thp
> > allocations).
>
> I'd like to see some numbers to show that this special case makes sense.

http://lkml.kernel.org/r/20191001054343.GA15624@dhcp22.suse.cz

While the test is somewhat artificial, it is not very different from real
workloads which preallocate a non-trivial share (50% in my case) of memory
for hugetlb pages. Having moderately utilized memory (by page cache in my
case) is not really unexpected.

> I understand the "this is what it used to do, and hugetlbfs wasn't the
> intended recipient of the new semantics", and I don't think the patch
> is wrong.

This is not only about "this used to work". It is the expected and
documented semantic of __GFP_RETRY_MAYFAIL:

 * %__GFP_RETRY_MAYFAIL: The VM implementation will retry memory reclaim
 * procedures that have previously failed if there is some indication
 * that progress has been made elsewhere. It can wait for other
 * tasks to attempt high level approaches to freeing memory such as
 * compaction (which removes fragmentation) and page-out.
 * There is still a definite limit to the number of retries, but it is
 * a larger limit than with %__GFP_NORETRY.
 * Allocations with this flag may fail, but only when there is
 * genuinely little unused memory. While these allocations do not
 * directly trigger the OOM killer, their failure indicates that
 * the system is likely to need to use the OOM killer soon. The
 * caller must handle failure, but can reasonably do so by failing
 * a higher-level request, or completing it only in a much less
 * efficient manner.

> But at the same time, we do know that swap storms happen for other
> loads, and if we say "hugetlbfs is different" then there should at
> least be some rationale for why it's different other than "history".
> Some actual "yes, we _want_ the possible swap storms, because load
> XYZ".
>
> And I don't mean microbenchmark numbers for "look, behavior changed".
> I mean "look, this is a real load, and now it runs X% slower because
> it relied on this hugetlbfs behavior".

It is not about running slower. It is about not getting the expected
number of hugetlb pages requested by an admin who knows that size is
needed.

-- 
Michal Hocko
SUSE Labs