Date: Fri, 13 May 2022 19:23:01 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Nicolas Saenz Julienne
Cc: Andrew Morton, Marcelo Tosatti, Vlastimil Babka, Michal Hocko,
	LKML, Linux-MM
Subject: Re: [PATCH 6/6] mm/page_alloc: Remotely drain per-cpu lists
Message-ID: <20220513182301.GK3441@techsingularity.net>
References: <20220512085043.5234-1-mgorman@techsingularity.net>
	<20220512085043.5234-7-mgorman@techsingularity.net>
	<20220512123743.5be26b3ad4413f20d5f46564@linux-foundation.org>
	<20220513150402.GJ3441@techsingularity.net>
	<167d30f439d171912b1ef584f20219e67a009de8.camel@redhat.com>
In-Reply-To: <167d30f439d171912b1ef584f20219e67a009de8.camel@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 13, 2022 at 05:19:18PM +0200, Nicolas Saenz Julienne wrote:
> On Fri, 2022-05-13 at 16:04 +0100, Mel Gorman wrote:
> > On Thu, May 12, 2022 at 12:37:43PM -0700, Andrew Morton wrote:
> > > On Thu, 12 May 2022 09:50:43 +0100 Mel Gorman wrote:
> > > >
> > > > From: Nicolas Saenz Julienne
> > > >
> > > > Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu
> > > > drain work queued by __drain_all_pages(). So introduce a new mechanism to
> > > > remotely drain the per-cpu lists. It is made possible by remotely locking
> > > > 'struct per_cpu_pages' new per-cpu spinlocks. A benefit of this new scheme
> > > > is that drain operations are now migration safe.
> > > >
> > > > There was no observed performance degradation vs. the previous scheme.
> > > > Both netperf and hackbench were run in parallel with triggering the
> > > > __drain_all_pages(NULL, true) code path around ~100 times per second.
> > > > The new scheme performs a bit better (~5%), although the important point
> > > > here is that there are no performance regressions vs. the previous
> > > > mechanism. Per-cpu list draining happens only in slow paths.
> > > >
> > > > Minchan Kim tested this independently and reported:
> > > >
> > > >    My workload is not NOHZ CPUs but runs apps under heavy memory
> > > >    pressure, so they go into direct reclaim and get stuck on
> > > >    drain_all_pages until the work on the workqueue runs.
> > > >
> > > >    unit: nanosecond
> > > >    max(dur)   avg(dur)            count(dur)
> > > >    166713013  487511.77786438033  1283
> > > >
> > > >    From traces, the system encountered drain_all_pages 1283 times;
> > > >    the worst case was 166ms and the average was 487us.
> > > >
> > > >    The other problem was alloc_contig_range in CMA. The PCP draining
> > > >    sometimes takes several hundred milliseconds even when there is no
> > > >    memory pressure or only a few pages to migrate out, because the
> > > >    CPUs are fully booked.
> > > >
> > > >    Your patch completely removed that wasted time.
> > >
> > > I'm not getting a sense here of the overall effect upon userspace
> > > performance. As Thomas said last year in
> > > https://lkml.kernel.org/r/87v92sgt3n.ffs@tglx
> > >
> > > : The changelogs and the cover letter have a distinct void vs. that which
> > > : means this is just another example of 'scratch my itch' changes w/o
> > > : proper justification.
> > >
> > > Is there more to all of this than itchiness and if so, well, you know
> > > the rest ;)
> > >
> >
> > I think Minchan's example is clear-cut. The draining operation can take
> > an arbitrary amount of time waiting for the workqueue to run on each
> > CPU, and can cause severe delays under reclaim or CMA, and the patch
> > fixes it. Maybe most users won't even notice, but I bet phone users do
> > if a camera app takes too long to open.
> >
> > The first paragraph was written by Nicolas, and I did not want to
> > modify it heavily and still put his Signed-off-by on it.
> > Maybe it could
> > have been clearer, though, because "too busy" is vague when the actual
> > intent is to avoid interfering with RT tasks. Does this sound better
> > to you?
> >
> >    Some setups, notably NOHZ_FULL CPUs, may be running realtime or
> >    latency-sensitive applications that cannot tolerate interference
> >    due to per-cpu drain work queued by __drain_all_pages(). Introduce
> >    a new mechanism to remotely drain the per-cpu lists. It is made
> >    possible by remotely locking 'struct per_cpu_pages' new per-cpu
> >    spinlocks. This has two advantages: the time to drain is more
> >    predictable, and other unrelated tasks are not interrupted.
> >
> > You raise a very valid point with Thomas' mail and it is a concern
> > that the local_lock is no longer strictly local. We still need
> > preemption to be disabled between the percpu lookup and the lock
> > acquisition, but that can be done with get_cpu_var() to make the
> > scope clear.
>
> This isn't going to work in RT :(
>
> get_cpu_var() disables preemption, hampering RT spinlock use. There is
> more to it in Documentation/locking/locktypes.rst.
>

Bah, you're right. A helper that called preempt_disable() on !RT and
migrate_disable() on RT would work, although it would be similar to
local_lock with a different name.

I'll look on Monday to see how the code could be restructured to always
have the get_cpu_var() call immediately before the lock acquisition. Once
that is done, I'll look at what sort of helper would "disable
preempt/migration, look up the pcp structure, acquire the lock, enable
preempt/migration". It's effectively the magic trick that local_lock uses
to always lock the right pcpu lock, but we want spinlock semantics for the
remote drain.

-- 
Mel Gorman
SUSE Labs