Date: Wed, 28 Nov 2018 15:10:04 -0800 (PST)
From: David Rientjes
To: Linus Torvalds
cc: ying.huang@intel.com, Andrea Arcangeli, Michal Hocko,
    s.priebe@profihost.ag, mgorman@techsingularity.net,
    Linux List Kernel Mailing, alex.williamson@redhat.com, lkp@01.org,
    kirill@shutemov.name, Andrew Morton, zi.yan@cs.rutgers.edu,
    Vlastimil Babka
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
References: <20181127062503.GH6163@shao2-debian>
    <20181127205737.GI16136@redhat.com>
    <87tvk1yjkp.fsf@yhuang-dev.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 28 Nov 2018, Linus Torvalds wrote:

> On Tue, Nov 27, 2018 at 7:20 PM Huang, Ying wrote:
> >
> > From the above data, for the parent commit 3 processes exited within
> > 14s, another 3 exited within 100s. For this commit, the first process
> > exited at 203s.
> > That is, this commit makes memory allocation more fair
> > among processes, so that processes proceeded at more similar speed. But
> > this raises system memory footprint too, so triggered much more swap,
> > thus lower benchmark score.
> >
> > In general, memory allocation fairness among processes should be a good
> > thing. So I think the report should have been a "performance
> > improvement" instead of "performance regression".
>
> Hey, when you put it that way...
>
> Let's ignore this issue for now, and see if it shows up in some real
> workload and people complain.
>

Well, I originally complained[*] when the change was first proposed and
when the stable backports were proposed[**]. On a fragmented host, the
change itself showed a 13.9% access latency regression on Haswell and up
to 40% allocation latency regression. This is more substantial on Naples
and Rome. I also measured similar numbers to this for Haswell.

We are particularly hit hard by this because we have libraries that remap
the text segment of binaries to hugepages; hugetlbfs is not widely used,
so this normally falls back to transparent hugepages. We mmap(),
madvise(MADV_HUGEPAGE), memcpy(), mremap(). We fully accept the latency
to do this when the binary starts because the access latency at runtime is
so much better.

With this change, however, we have no userspace workaround other than
mbind() to prefer the local node. On all of our platforms, native-sized
pages are always a win over remote hugepages, and they leave open the
opportunity to collapse memory into hugepages later via khugepaged if
fragmentation is the issue. mbind() is not viable if the local node is
saturated: we are ok with falling back to remote pages of the native page
size when the local node is oom, whereas using mbind() to retain the old
behavior would result in an oom kill.

Given this severe access and allocation latency regression, we must revert
this patch in our own kernel; there is simply no path forward without
doing so.
 [*] https://marc.info/?l=linux-kernel&m=153868420126775
[**] https://marc.info/?l=linux-kernel&m=154269994800842
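
For readers who have not seen the mmap(), madvise(MADV_HUGEPAGE), memcpy()
sequence mentioned above, here is a minimal sketch of the first three steps
using Python's mmap module. It is illustrative only and is not the library
referred to in this message. The names stage_in_thp_region, round_up and
HUGEPAGE_SIZE are hypothetical, and the 2 MiB hugepage size is an
assumption (the usual x86-64 value). The final mremap() step, which would
move the populated region over the original text mapping, has no direct
equivalent in the mmap module and is left out.

```python
import mmap

HUGEPAGE_SIZE = 2 * 1024 * 1024  # assumption: 2 MiB transparent hugepages


def round_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def stage_in_thp_region(payload: bytes):
    """Copy payload into an anonymous mapping advised for hugepages.

    Returns (region, advised). advised is False when MADV_HUGEPAGE is not
    available or is rejected, in which case native-sized pages are used.
    """
    length = round_up(max(len(payload), 1), HUGEPAGE_SIZE)

    # Step 1: mmap() an anonymous, private, writable region.
    # Note: Python does not let us request hugepage alignment; a real
    # implementation would align the region to HUGEPAGE_SIZE.
    region = mmap.mmap(
        -1,
        length,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )

    # Step 2: madvise(MADV_HUGEPAGE) before the pages are first touched.
    advised = False
    flag = getattr(mmap, "MADV_HUGEPAGE", None)  # only defined on Linux
    if flag is not None:
        try:
            region.madvise(flag)
            advised = True
        except OSError:
            # For example, THP is not enabled in the running kernel.
            advised = False

    # Step 3: the memcpy() analogue; the first write faults the pages in.
    region[: len(payload)] = payload
    return region, advised
```

Calling stage_in_thp_region(data) returns the populated mapping together
with a flag saying whether the hint was accepted. Whether the kernel
actually backs the region with hugepages, and from which node, is decided
at fault time, which is where the allocation latency discussed above shows
up.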