Date: Tue, 4 Dec 2018 09:48:21 +0100
From: Michal Hocko
To: David Rientjes
Cc: Linus Torvalds, ying.huang@intel.com, Andrea Arcangeli,
    s.priebe@profihost.ag, mgorman@techsingularity.net,
    Linux List Kernel Mailing, alex.williamson@redhat.com, lkp@01.org,
    kirill@shutemov.name, Andrew Morton, zi.yan@cs.rutgers.edu,
    Vlastimil Babka
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Message-ID: <20181204084821.GB1286@dhcp22.suse.cz>
References: <20181203181456.GK31738@dhcp22.suse.cz>
 <20181203183050.GL31738@dhcp22.suse.cz>
 <20181203185954.GM31738@dhcp22.suse.cz>
 <20181203212539.GR31738@dhcp22.suse.cz>
List-ID: linux-kernel@vger.kernel.org

On Mon 03-12-18 13:53:21, David Rientjes wrote:
> On Mon, 3 Dec 2018, Michal Hocko wrote:
> > > I think extending functionality so thp can be allocated remotely if
> > > truly desired is worthwhile
> >
> > This is a complete NUMA policy antipattern that we have for all other
> > user memory allocations. So far you have to be explicit about your NUMA
> > requirements. You are trying to conflate the NUMA API with MADV, and
> > that is just conflating two orthogonal things, which is just wrong.
>
> No, the page allocator change for both my patch and __GFP_COMPACT_ONLY
> has nothing to do with any madvise() mode. It has to do with where thp
> allocations are preferred. Yes, this is different than other memory
> allocations, where it doesn't cause a 13.9% access latency regression
> for the lifetime of a binary for users who back their text with
> hugepages. MADV_HUGEPAGE still has its purpose to try synchronous memory
> compaction at fault time under all thp defrag modes other than "never".
> The specific problem being reported here, and that both my patch and
> __GFP_COMPACT_ONLY address, is the pointless reclaim activity that does
> not assist in making compaction more successful.

You do not address my concern, though. Sure, there are reclaim related
issues. Nobody is questioning that. But that is only half of the
problem. The thing I am really getting at here is that the
reintroduction of __GFP_THISNODE, which you are pushing for, will
conflate the madvise mode (resp. defrag=always) with a NUMA placement
policy, because the allocation doesn't fall back to a remote node. And
that is a fundamental problem and the antipattern I am talking about.

Look at it this way. All normal allocations utilize all the available
memory, even though they might hit a remote latency penalty. If you do
care about NUMA placement, you have an API to enforce a specific
placement. What is so different about THP that it should behave
differently?
Do we really want to later invent an API to allow utilizing all the
memory again? There are certainly usecases (the ones that triggered this
discussion previously) that do not mind the remote latency, because all
the other benefits simply outweigh it. That being said, what should
users who want to use all the memory do to get as many THPs as possible?
--
Michal Hocko
SUSE Labs