From: Arvind Sankar
Date: Tue, 10 Mar 2020 23:35:54 -0400
To: "Kirill A. Shutemov"
Cc: Cannon Matthews, Matthew Wilcox, Andi Kleen, Michal Hocko,
 Mike Kravetz, Andrew Morton, David Rientjes, Greg Thelen, Salman Qazi,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86
Message-ID: <20200311033552.GA3657254@rani.riverdale.lan>
References: <20200307010353.172991-1-cannonmatthews@google.com>
 <20200309000820.f37opzmppm67g6et@box>
 <20200309090630.GC8447@dhcp22.suse.cz>
 <20200309153831.GK1454533@tassilo.jf.intel.com>
 <20200309183704.GA1573@bombadil.infradead.org>
 <20200311005447.jkpsaghrpk3c4rwu@box>
In-Reply-To: <20200311005447.jkpsaghrpk3c4rwu@box>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 11, 2020 at 03:54:47AM +0300, Kirill A.
Shutemov wrote:
> On Tue, Mar 10, 2020 at 05:21:30PM -0700, Cannon Matthews wrote:
> > On Mon, Mar 9, 2020 at 11:37 AM Matthew Wilcox wrote:
> > >
> > > On Mon, Mar 09, 2020 at 08:38:31AM -0700, Andi Kleen wrote:
> > > > > Gigantic huge pages are a bit different. They are much less dynamic from
> > > > > the usage POV in my experience. Micro-optimizations for the first access
> > > > > tend to not matter at all as it is usually a pre-allocation scenario. On
> > > > > the other hand, speeding up the initialization sounds like a good thing
> > > > > in general. It will be a single time benefit but if the additional code
> > > > > is not hard to maintain then I would be inclined to take it even with
> > > > > "artificial" numbers stated above. There really shouldn't be other downsides
> > > > > except for the code maintenance, right?
> > > >
> > > > There's a cautionary tale of the old crappy RAID5 XOR assembler functions which
> > > > were optimized a long time ago for the Pentium1, and stayed around,
> > > > even though the compiler could actually do a better job.
> > > >
> > > > String instructions are constantly improving in performance (Broadwell is
> > > > very old at this point). Most likely over time (and maybe even today
> > > > on newer CPUs) you would need much more sophisticated unrolled MOVNTI variants
> > > > (or maybe even AVX-*) to be competitive.
> > >
> > > Presumably you have access to current and maybe even some unreleased
> > > CPUs ... I mean, he's posted the patches, so you can test this hypothesis.
> >
> > I don't have the data at hand, but could reproduce it if strongly
> > desired, but I've also tested this on Skylake and Cascade Lake, and
> > we've had success running with this for a while now.
> >
> > When developing this originally, I tested all of this compared with
> > AVX-* instructions as well as the string ops, they all seemed to be
> > functionally equivalent, and all were beat out by this MOVNTI thing for
> > large regions of 1G pages.
> >
> > There is probably room to further optimize the MOVNTI stuff with better
> > loop unrolling or optimizations, if anyone has specific suggestions I'm
> > happy to try to incorporate them, but this has shown to be effective as
> > written so far, and I think I lack that assembly expertise to micro
> > optimize further on my own.
>
> Andi's point is that string instructions might be a better bet in the long
> run. You may win something with MOVNTI on current CPUs, but it may become
> a burden on newer microarchitectures when string instructions improve.
> Nobody realistically would re-validate if the MOVNTI micro-optimization still
> makes sense for every new microarchitecture.
>

The rationale for the MOVNTI instruction is supposed to be that it avoids
cache pollution. Aside from the benchmark that shows MOVNTI to be faster
for the move itself, shouldn't it have an additional benefit in not
trashing the CPU caches?

As string instructions improve, why wouldn't the same improvements be
applied to MOVNTI?
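
To make the comparison concrete, here is a rough userspace sketch of the
two approaches being discussed. It is not the code from the patch: the
helper names clear_region_nt() and clear_region_plain() are made up for
illustration, the alignment and length requirements are assumptions of
this sketch, and the non-x86 fallback exists only so it builds anywhere.
The first helper issues 8-byte non-temporal stores through the
_mm_stream_si64() intrinsic, which compiles to MOVNTI on x86-64. The
second simply calls memset(), which may or may not end up using string
instructions depending on the C library and CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_stream_si64 (MOVNTI) and _mm_sfence */
#endif

/*
 * Hypothetical helper: zero len bytes at dst using non-temporal stores.
 * Assumptions of this sketch: dst is 8-byte aligned and len is a
 * multiple of 8.
 */
static void clear_region_nt(void *dst, size_t len)
{
    assert(((uintptr_t)dst % 8) == 0 && (len % 8) == 0);
#if defined(__x86_64__)
    long long *p = dst;

    /* One MOVNTI per 8 bytes; the stores carry a hint to avoid
     * filling the caches with the zeroed data. */
    for (size_t i = 0; i < len / 8; i++)
        _mm_stream_si64(p + i, 0);

    /* Non-temporal stores are weakly ordered, so fence before any
     * other code is allowed to rely on the region being zero. */
    _mm_sfence();
#else
    /* Fallback for other architectures: an ordinary clear. */
    memset(dst, 0, len);
#endif
}

/* Baseline for comparison: an ordinary clear that goes through the
 * cache hierarchy as usual. */
static void clear_region_plain(void *dst, size_t len)
{
    memset(dst, 0, len);
}
```

Both helpers leave the region in the same state. The open question in
this thread is only which one is faster on a given microarchitecture
and how much of the cache each one evicts along the way, which needs
measuring on real hardware with buffers far larger than the
last-level cache.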