Date: Fri, 7 Dec 2018 08:34:44 +0100
From: Michal Hocko
To: David Rientjes
Cc: Linus Torvalds, Andrea Arcangeli, mgorman@techsingularity.net,
 Vlastimil Babka, ying.huang@intel.com, s.priebe@profihost.ag,
 Linux List Kernel Mailing, alex.williamson@redhat.com, lkp@01.org,
 kirill@shutemov.name, Andrew Morton, zi.yan@cs.rutgers.edu
Subject: Re: MADV_HUGEPAGE vs.
 NUMA semantic (was: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression)
Message-ID: <20181207073444.GQ1286@dhcp22.suse.cz>
References: <64a4aec6-3275-a716-8345-f021f6186d9b@suse.cz>
 <20181204104558.GV23260@techsingularity.net>
 <20181205204034.GB11899@redhat.com>
 <20181205233632.GE11899@redhat.com>
 <20181206091405.GD1286@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 06-12-18 15:49:04, David Rientjes wrote:
> On Thu, 6 Dec 2018, Michal Hocko wrote:
>
> > MADV_HUGEPAGE changes the picture because the caller expressed a need
> > for THP and is willing to go extra mile to get it. That involves
> > allocation latency and as of now also a potential remote access. We do
> > not have complete agreement on the later but the prevailing argument is
> > that any strong NUMA locality is just reinventing node-reclaim story
> > again or makes THP success rate down the toilet (to quote Mel). I agree
> > that we do not want to fallback to a remote node overeagerly. I believe
> > that something like the below would be sensible
> > 	1) THP on a local node with compaction not giving up too early
> > 	2) THP on a remote node in NOWAIT mode - so no direct
> > 	   compaction/reclaim (trigger kswapd/kcompactd only for
> > 	   defrag=defer+madvise)
> > 	3) fallback to the base page allocation
> >
>
> I disagree that MADV_HUGEPAGE should take on any new semantic that
> overrides the preference of node local memory for a hugepage, which is the
> nearly four year behavior. The order of MADV_HUGEPAGE preferences listed
> above would cause current users to regress who rely on local small page
> fallback rather than remote hugepages because the access latency is much
> better.
> I think the preference of remote hugepages over local small pages
> needs to be expressed differently to prevent regression.

Such a model would be broken. It doesn't provide consistent semantics and
leads to surprising results. MADV_HUGEPAGE with local node binding will
not prevent remote base pages from being used, and you are back to square
one.

It has been a huge mistake to merge your __GFP_THISNODE patch back then in
4.1, especially with an absolute lack of numbers for a variety of
workloads. I still believe we can do better and offer a sane memory policy
to help workloads with higher locality demands, but it is outright wrong to
conflate demand for THP with the locality semantic.

If this is an absolute no-go then we need a MADV_HUGEPAGE_SANE...
-- 
Michal Hocko
SUSE Labs