Date: Wed, 24 Jan 2018 21:12:29 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Dave Hansen
Cc: Aaron Lu, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Huang Ying, Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka
Subject: Re: [PATCH 2/2] free_pcppages_bulk: prefetch buddy while not holding lock
Message-ID: <20180124211228.3k7tuuji7a7mvyh2@techsingularity.net>
In-Reply-To: <525a20be-dea9-ed54-ca8e-8c4bc5e8a04f@intel.com>
On Wed, Jan 24, 2018 at 11:23:49AM -0800, Dave Hansen wrote:
> On 01/24/2018 10:19 AM, Mel Gorman wrote:
> >> IOW, I don't think this has the same downsides normally associated with
> >> prefetch() since the data is always used.
> >
> > That doesn't side-step that the calculations are done twice in the
> > free_pcppages_bulk path and there is no guarantee that one prefetch
> > in the list of pages being freed will not evict a previous prefetch
> > due to collisions.
>
> Fair enough. The description here could probably use some touchups to
> explicitly spell out the downsides.
>

It would be preferable. As I said, I'm not explicitly NAKing this, but
spelling out the downsides might push someone else over the edge into an
outright ACK. I think patch 1 should go ahead as-is unconditionally as I
see no reason to hold that one back.

I would suggest adding the detail to the changelog that a prefetch will
potentially evict an earlier prefetch from the L1 cache, but it is
expected the data would still be preserved in an L2 or L3 cache. Further
note that while there is some additional instruction overhead, the data
must be fetched eventually, and it's expected that in many cases the
cycles spent early will be offset by reduced memory latency later.
Finally, note that the actual benefit will be workload- and
CPU-dependent.

Also consider adding a comment above the actual prefetch, because it
deserves one; otherwise it looks like a fast path is being sprinkled
with magic dust from the prefetch fairy.
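To make that concrete, the sort of thing I have in mind is roughly the
following. This is a sketch only, not the patch as posted; the helper
name, its placement and the use of __find_buddy_pfn() for the order-0
buddy are illustrative:

static inline void prefetch_buddy(struct page *page)
{
	unsigned long pfn = page_to_pfn(page);
	unsigned long buddy_pfn = __find_buddy_pfn(pfn, 0);
	struct page *buddy = page + (buddy_pfn - pfn);

	/*
	 * Prefetch the order-0 buddy so its struct page is likely
	 * cache-hot by the time merging is attempted under zone->lock.
	 * A later prefetch in the same batch may evict this line from
	 * L1, but it is expected to survive in L2/L3, and the cycles
	 * spent here should be repaid by reduced memory latency under
	 * the lock. The actual benefit is workload and CPU dependent.
	 */
	prefetch(buddy);
}

If the drain case is ever shown to matter, the caller could also cap
the number of prefetches issued at pcp->batch pages.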
> I do agree with you that there is no guarantee that this will be
> resident in the cache before use. In fact, it might be entertaining to
> see if we can show the extra conflicts in the L1 from this change given
> a large enough PCP batch size.
>

Well, I wouldn't bother worrying about different PCP batch sizes. In
typical operation it's going to be the pcp->batch size. Even if you were
dumping the entire PCP due to a drain, that is still smaller than many
L1 cache sizes, on x86 at least, and those drains usually happen in the
context of a much larger operation where the overhead of the buddy
calculations will be negligible in comparison.

> But, this isn't just about the L1. If the results of the prefetch()
> stay in *ANY* cache, then the memory bandwidth impact of this change is
> still zero. You'll have a lot harder time arguing that we're likely to
> see L2/L3 evictions in this path for our typical PCP batch sizes.
>

s/lot harder time/impossible without making crap up/

> Do you want to see some analysis for less-frequent PCP frees?

Actually, no, I don't. While it would be interesting to see, it's a
waste of your time. Less-frequent PCP frees mean that the impact of the
extra cycles is *also* marginal, and a PCP drain must fetch the data
eventually anyway.

> We could pretty easily instrument the latency doing normal-ish things
> to see if we can measure a benefit from this rather than a tight-loop
> micro.

I believe the only realistic scenario where it's going to be detectable
is a network-intensive application on very high-speed networks, where
the cost of the alloc/free paths tends to be more noticeable. I suspect
anything else will be so far into the noise as to be unnoticeable.

-- 
Mel Gorman
SUSE Labs